How to access GPU nodes
In order to use a node that has a GPU, you need to request it from the job scheduler. Nodes may have two GPUs (Tesla T10) or three GPUs (Tesla M2070). To begin an interactive session, at the shell prompt enter:
qrsh -l gpu
On Hoffman2 there are currently four publicly available GPU nodes with CUDA compute capability 6.1 (all other publicly available GPU nodes have a compute capability below 3); each of these nodes is equipped with one P4 card. To request one of these nodes, please use:
qrsh -l gpu,P4
The above qrsh command will reserve an entire GPU node with its 2 or 3 GPU processors. An interactive session started with the above qrsh command will expire after 2 hours. (A sample batch-job script is sketched after the list below.)
- To specify a different time limit for your session, use the h_rt or time parameter. For example, to request 9 hours:
qrsh -l gpu,h_rt=9:00:00
- To reserve two GPU nodes, use:
qrsh -l gpu -pe dc_gpu 2
- To see which node(s) were reserved, at a GPU-node shell prompt enter:
get_pe_hostfile
- To see whether the GPU nodes are up and/or in use, at any shell prompt enter:
qhost_gpu_nodes
- To see the specifics of a particular GPU node, at a GPU-node shell prompt enter:
gpu-device-query.sh
- To get a quick session for compiling or testing your code:
qrsh -l i,gpu
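The same resource requests can also be made in a batch job submitted with qsub. The following is a minimal sketch of a UGE job script; the script name gpu_job.sh and the program my_gpu_app are placeholders for your own files:

#!/bin/bash
#$ -cwd                    # run the job from the submission directory
#$ -o joblog.$JOB_ID       # write scheduler output to this file
#$ -j y                    # merge the error stream into the output file
#$ -l gpu,h_rt=2:00:00     # request a GPU node for 2 hours
./my_gpu_app               # replace with your own GPU program

To submit it, at a login-node shell prompt enter:
qsub gpu_job.sh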
How to Specify GPU Types
There are multiple GPU types available in the cluster. Each type of GPU has a different compute capability, memory size, and clock speed, among other things. If your GPU program requires a specific GPU type to run, you need to specify it explicitly; if you do not specify a GPU type, UGE may pick any available GPU for your job. You may need to compile your code on a machine that has the required type of GPU. Currently, the following GPU types are available:
| GPU type | Compute Capability | Number of Cores | Global Memory Size | UGE option |
|---|---|---|---|---|
| Tesla V100 | 7.0 | 5120 | 32 GB | -l gpu,V100 |
| Tesla P4 | 6.1 | 2560 | 8 GB | -l gpu,P4 |
| Tesla T10 | 1.3 | 240 | 4.3 GB | -l gpu,T10 |
| Tesla M2070 | 2.0 | 448 | 5.6 GB | -l gpu,fermi (see [1] below) |
| Tesla M2090 | 2.0 | 512 | 6 GB | -l gpu,fermi (see [1] below) |
The UGE options in the table above can be combined with other UGE options, for example:
qrsh -l gpu,fermi,h_rt=3:00:00
References:
[1] If you specify -l fermi, the job will go to either M2070 or M2090 GPU nodes. If you specify -l M2070, the job will go only to M2070 nodes and will not go to M2090 nodes even when the latter are available; likewise, if you specify -l M2090, the job will go only to M2090 nodes and will not go to M2070 nodes even when the latter are available. Restricting the job to a single type implies a potentially longer wait time. For most users, we recommend using -l fermi instead of -l M2070 or -l M2090, unless you specifically want one of them (e.g., for benchmarking the differences between M2070 and M2090).
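The same resource list can also appear on a #$ -l line of a batch job script (see the sketch after the list in the previous section). As a minimal illustration, to send a batch job to a V100 node with a three-hour time limit, the directive would be:

#$ -l gpu,V100,h_rt=3:00:00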
CUDA
CUDA is installed in /u/local/cuda/ on the Hoffman2 Cluster. There are several versions available. To see which versions of CUDA are available, issue:
module av cuda
To load version 10.0, use:
module load cuda/10.0
Note: you will be able to load a CUDA module only when on a GPU node (either in an interactive session requested with qrsh -l gpu, or within batch jobs in which you have requested one or more GPU nodes with "-l gpu").
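Putting the pieces together, a typical interactive compile-and-run workflow might look like the following sketch; the source file vector_add.cu and the executable name are placeholders for your own code:

qrsh -l gpu,P4,h_rt=2:00:00       # request an interactive session on a P4 GPU node
module load cuda/10.0             # load the CUDA 10.0 module on the GPU node
nvcc -o vector_add vector_add.cu  # compile your CUDA source with nvcc
./vector_add                      # run the resulting executable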
Precompiled samples from the NVIDIA GPU Computing Software Development Kit are available, for example for CUDA 10.0, in:
/u/local/cuda/10.0/NVIDIA_CUDA-10.0_Samples/bin/x86_64/linux/release
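For example, assuming the standard deviceQuery sample binary is present in that directory, you can run it on a GPU node with:

module load cuda/10.0
/u/local/cuda/10.0/NVIDIA_CUDA-10.0_Samples/bin/x86_64/linux/release/deviceQuery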
To install CUDA in your home directory, please see the instructions in the /u/local/cuda/README_ATS file. To install the NVIDIA GPU Computing Software Development Kit in your home directory, please see the instructions in the