
Hoffman2 GPU Queue

Authorization to use Hoffman2 GPU nodes

To use a Hoffman2 node that has GPUs (graphics processing units), you need to have your account added to the gpu group. To do this, contact User Support.

How to access GPU nodes

To use a node that has a GPU, you need to request it from the job scheduler. Nodes have either two GPUs (Tesla T10 nodes) or three GPUs (Tesla M2070 nodes). To begin an interactive session, at the shell prompt enter:

qrsh -l gpu

The above qrsh command reserves an entire GPU node with its 2 or 3 GPU processors. The maximum amount of memory (h_data or mem) that you can request is 24G on the Tesla T10 nodes or 48G on the Tesla M2070 nodes. An interactive session started with the above qrsh command expires after 2 hours by default; the maximum session length is 9 hours. A batch-submission sketch is shown after the list below.

  • To specify a different time limit for your session, use the h_rt or time parameter. For example, to request 9 hours:

    qrsh -l gpu,h_rt=9:00:00

  • To reserve two GPU nodes, at a login node shell prompt enter:

    qrsh -l gpu -pe dc_gpu 2

  • To see which node(s) were reserved, at a g-node shell prompt enter:

    get_pe_hostfile

  • To see if the gpu nodes are up and/or in use, at any shell prompt enter:

    qhost_gpu_nodes

  • To see the specifics for a particular gpu node, at a g-node shell prompt enter:

    gpu-device-query.sh

  • To get a quick session for compiling or testing your code (this does not give you exclusive use of the GPU node):

    qrsh -l i,gpu
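
The same resource requests also work for batch jobs submitted with qsub. The following is a minimal sketch; the script name gpu_job.sh and the executable my_gpu_program are hypothetical:

    #!/bin/bash
    #$ -cwd                    # run the job from the submission directory
    #$ -o gpu_job.joblog       # write job output here (hypothetical file name)
    #$ -j y                    # merge stderr into stdout
    #$ -l gpu,h_rt=2:00:00     # request a GPU node for 2 hours
    ./my_gpu_program           # hypothetical GPU executable

Submit the script from a login node with:

    qsub gpu_job.sh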

How to Specify GPU Types

There are multiple GPU types available in the cluster. Each GPU type has a different compute capability, memory size, and clock speed, among other things. If your GPU program requires a specific GPU type to run, you need to specify it explicitly; without a type specification, UGE may pick any available GPU for your job. You may need to compile your code on a machine that has the required type of GPU. Currently, the following GPU types are available:

GPU type      Compute     Number    Global        UGE option
              Capability  of Cores  Memory Size
Tesla T10     1.3         240       4.3 GB        -l T10
Tesla M2070   2.0         448       5.6 GB        -l fermi (see [1] below)
Tesla M2090   2.0         512       6 GB          -l fermi (see [1] below)

The UGE options in the table above can be combined with other UGE options, for example:

    qrsh -l gpu,fermi,h_rt=3:00:00

References:

[1] If you specify -l fermi, the job can go to either M2070 or M2090 GPU nodes. If you specify -l M2070, the job will go only to M2070 nodes and will not go to M2090 nodes even when the latter are available. Likewise, if you specify -l M2090, the job will go only to M2090 nodes even when M2070 nodes are available. Requesting a specific type therefore implies a potentially longer wait time.

For most users, we recommend using -l fermi instead of -l M2070 or -l M2090, unless you specifically want one of them (e.g., for benchmarking the differences between M2070 and M2090).
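
By analogy with the fermi example above, the type-specific selectors combine with the gpu resource in the same way. For example, to request a session pinned to the Tesla T10 type:

    qrsh -l gpu,T10,h_rt=1:00:00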

CUDA

CUDA is installed in /u/local/cuda/ on the Hoffman2 Cluster. There are several versions available; the most recent as of December 2011 is 4.0.17. You can refer to the current production version with /u/local/cuda/current/. To install CUDA in your home directory, please see the instructions in the /u/local/cuda/README_ATS file. To install the NVIDIA GPU Computing Software Development Kit in your home directory, please see the instructions in the /u/local/cuda/README_SDK_ATS file.
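
As a minimal sketch of compiling and running code against this install (the lib64 subdirectory and the file name my_kernel.cu are assumptions; see the README files above for the exact setup on your account):

    # start a shared quick session on a GPU node (see above)
    qrsh -l i,gpu
    # put the production CUDA toolkit on your path
    export PATH=/u/local/cuda/current/bin:$PATH
    export LD_LIBRARY_PATH=/u/local/cuda/current/lib64:$LD_LIBRARY_PATH   # assumed layout
    # compile for the GPU types in the table above
    nvcc -arch=sm_13 -o my_kernel my_kernel.cu   # Tesla T10 (compute capability 1.3)
    nvcc -arch=sm_20 -o my_kernel my_kernel.cu   # Fermi M2070/M2090 (compute capability 2.0)
    ./my_kernel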

 
