Computing¶
Upon connecting/logging into the cluster (unless Connecting via Jupyter Notebook/Lab), users access the cluster via its login nodes. Login nodes are special hosts whose sole purpose is to provide a gateway to the compute nodes and their computational resources. For more information, see Role of login nodes.
Computational resources (such as memory, cores, runtime, CPU type, GPU type, etc.) on the compute nodes are managed by a job scheduler. Any CPU-, GPU-, or memory-intensive computing task should be performed within either interactive sessions or batch jobs scheduled on the cluster's compute nodes.
This page describes:
Computational resources on the Hoffman2 Cluster¶
Node types¶
A summary of the types of nodes that you will encounter while using the Hoffman2 Cluster and a description of their intended use is given in the Types of nodes on the Hoffman2 Cluster table:
| Node type | Description and intended use |
|---|---|
| Login nodes | Meant for lightweight tasks such as editing your code and submitting jobs to the scheduler. Login nodes are a resource shared by many concurrent users and are not intended for CPU- or memory-intensive tasks (see Role of login nodes). All CPU-, memory-, and GPU-intensive computations need to run on compute nodes accessed via the scheduler. |
| Data transfer nodes | The Hoffman2 Cluster has two dedicated, performance-tuned data transfer nodes with advanced parallel transfer tools to support your research workflows. |
| CPU-based compute nodes | Most of the nodes on the Hoffman2 Cluster are CPU-based compute nodes. These are where your jobs execute; they can be accessed interactively via the qrsh command or through batch job execution. |
| GPU-based compute nodes | A portion of the compute nodes on the Hoffman2 Cluster is equipped with one or more GPU cards of various types. Please refer to Role of GPU nodes to see which workloads are best suited to these nodes, and to GPU access to learn how to request an interactive session or a batch job on a GPU node. |
The Hoffman2 Cluster has a number of compute nodes available to the entire UCLA community. Additionally, research groups can purchase dedicated compute nodes. Users in groups who have contributed nodes to the cluster can access their nodes in a preferential fashion and for extended runtimes or access unused cores across the wider cluster (see: Highp vs shared vs campus jobs).
Group-owned nodes¶
Group-owned nodes allow users in the owning group to run jobs (interactive or batch) on their computational resources for an extended runtime (up to fourteen days). Moreover, the portion of jobs submitted to owned resources that can be concurrently allocated on them is guaranteed to start within twenty-four hours of submission (the wait time is typically much shorter). Node ownership also allows users in that group to access any currently unused resources owned by other groups for runtimes of up to 24 hours.
If your group is interested in purchasing nodes, please visit: Purchasing additional resources.
Jobs and resources¶
To prevent resource contention and to distribute computations across the cluster's many compute nodes, any CPU-, GPU-, or memory-intensive task should be executed on compute nodes, either by requesting an interactive session (for interactive work) or by submitting a batch job to the scheduler.
The command that requests interactive sessions is: qrsh.
The command that submits batch jobs (whose instructions are contained in a shell script) is: qsub.
The command that terminates a job is qdel.
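For orientation, the following is a minimal sketch of the submit/monitor/terminate cycle; my_job.sh and the job ID shown are hypothetical placeholders, and the myjobs command is described later on this page:
$ qsub my_job.sh        # submit a batch job (my_job.sh is a hypothetical submission script)
$ myjobs                # check the status of your pending/running jobs and note the job ID
$ qdel 1234567          # terminate the job, using the job ID reported by myjobs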
Note
Compute nodes on the Hoffman2 Cluster generally run simultaneous jobs from multiple users. To prevent automatic job termination and to ensure the performance of every job, it is important to request the right amount of resources when submitting a batch job or requesting an interactive session.
To learn which computational resources such as (but not limited to) run time, memory, and number of computing cores can be requested for your interactive sessions or batch jobs, please see:
Note
If no attributes are specified, the scheduler assumes that a batch job or interactive session will use 1 core and 1 GB of memory and will run for 2 hours on any available compute node on the cluster, and it will dispatch the job accordingly.
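In other words, the two requests below are equivalent; the second simply spells out the default values explicitly (an illustrative sketch only):
$ qrsh
$ qrsh -l h_data=1G,h_rt=2:00:00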
Requesting resources (other than cores)¶
Within the Univa Grid Engine (UGE), the job scheduler currently running on the cluster, any resource other than the number of computing cores (to request cores, see Requesting multiple cores) can be requested via key-value pairs, known as complexes, passed as arguments to the -l option of qsub or qrsh. Some of the complexes that you might need to use are shown in the Principal requestable resources table:
| name of key | type | default value | specifies |
|---|---|---|---|
| h_rt | TIME | 2:00:00 | runtime |
| h_data | MEMORY | 1G | memory per process |
| h_vmem | MEMORY | 1G | memory per job |
| highp | BOOLEAN | TRUE | run on owned resources |
| exclusive | BOOLEAN | TRUE | run on owned resources, reserving the entire node |
| arch | STRING | NONE | specify processor type |
| gpu | BOOLEAN | TRUE | run on GPU nodes |
| P4 | BOOLEAN | TRUE | run on GPU node w/ P4 card |
| RTX2080Ti | BOOLEAN | TRUE | run on GPU node w/ RTX2080Ti cards |
| V100 | BOOLEAN | TRUE | run on GPU node w/ V100 card |
| A100 | BOOLEAN | TRUE | run on GPU node w/ A100 cards |
| A6000 | BOOLEAN | TRUE | run on GPU node w/ A6000 cards |
| cuda | RSMAP | 1 | number of GPU cards on the same node |
A complete list of complexes defined on the Hoffman2 Cluster can be obtained by issuing the following command in a terminal connected to the Hoffman2 Cluster:
$ qconf -sc
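The full list is long; assuming standard command-line filtering with grep, you can narrow it down, for example:
$ qconf -sc | grep -i gpu     # show only the GPU-related complexes
$ qconf -sc | grep h_rt       # show the definition of the h_rt complex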
Examples of how to request resources¶
To request a runtime of, for example, 12 hours, use:
$ qrsh -l h_rt=12:00:00
$ qsub -l h_rt=12:00:00
To request, for example, 4GB of memory, use:
$ qrsh -l h_data=4G
$ qsub -l h_data=4G
To request a node in exclusive mode (e.g., all of its cores and memory):
$ qrsh -l exclusive
$ qsub -l exclusive
Warning
This applies only if your group has purchased computational resources. If you are unsure, you can check with the command:
$ myresources
If you are in the campus resource group, using this option will cause your job to never start.
To request to run on owned nodes:
$ qrsh -l highp
$ qsub -l highp
To request to run on any GPU node (except nodes with A100 cards, which must be explicitly requested):
$ qrsh -l gpu
$ qsub -l gpu
Tip
To check the specifications and the types of GPU cards publicly available on the Hoffman2 Cluster please refer to the: GPU cards available on the Hoffman2 Cluster table.
To request to run on a GPU node with a specific GPU card, for example V100
:
$ qrsh -l gpu,V100
$ qsub -l gpu,V100
To request to run on a node with a specific CPU type, for example a CPU in the intel-gold class:
$ qrsh -l arch=intel-gold\*
$ qsub -l arch=intel-gold\*
Note
Possible values of the arch
complex can be queried by issuing the command:
$ qhost -F arch | awk -F = '{print $2}' | grep -v ^$ | sort | uniq
Note
Several resources can be requested either as a comma-separated list of key=value pairs following a single -l option, or as separate -l key=value options. For example, to request a runtime of 3 hours and 4GB of memory:
$ qrsh -l h_rt=3:00:00,h_data=4G
or:
$ qrsh -l h_rt=3:00:00 -l h_data=4G
$ qsub -l h_rt=3:00:00,h_data=4G
or:
$ qsub -l h_rt=3:00:00 -l h_data=4G
Requesting multiple cores¶
If you are planning to run an application that will use more than one CPU core, you should request cores using the -pe <parallel environment> <n>
directive (where: <parallel environment>
is the name of the parallel environment and <n>
is the number of cores that you are planning to use) to the qrsh
or qsub
commands.
A list of the principal parallel environment names and their role is given in the parallel environment table.
| name | allocation rule | use |
|---|---|---|
| shared | cores are allocated on a single host | shared memory jobs |
| dc* | cores are allocated on any host | distributed memory jobs |
| node | one core per node | use with -l exclusive to reserve entire nodes |
Examples of how to request multiple cores¶
To run an application that uses multiple cores in shared memory (e.g., threads, OpenMP, etc.), use, for example, the following to request 12 cores:
$ qrsh -pe shared 12
$ qsub -pe shared 12
To run an application that uses multiple cores in a distributed-memory fashion (e.g., MPI), use, for example, to request 72 cores:
$ qrsh -pe dc\* 72
$ qsub -pe dc\* 72
To run an application that uses multiple cores in a combination of shared-memory (within a node) and distributed-memory (across nodes), use, for example, to run across 6 nodes (using all the resources on each of them):
$ qrsh -l exclusive -pe node 6
$ qsub -l exclusive -pe node 6
Note
If, for symmetry of computational speed and to have the same number of cores on each node, you need to request a specific CPU type, use the following, for example, to request 6 nodes each with an intel-gold type of CPU:
$ qrsh -l exclusive,arch=intel-gold\* -pe node 6
$ qsub -l exclusive,arch=intel-gold\* -pe node 6
Possible values of the arch
complex can be queried by issuing the command:
$ qhost -F arch | awk -F = '{print $2}' | grep -v ^$ | sort | uniq
A complete list of parallel environments available on the Hoffman2 Cluster can be obtained by issuing the command:
$ qconf -spl
Requesting multiple GPU cards on the same node¶
GPU nodes with RTX2080Ti or A100 cards on the Hoffman2 Cluster have multiple GPU cards. To request more than one GPU card on the same node, users should add -l cuda=N to the resources requested for their batch job or interactive session (where N depends on the GPU node type; see Number of cards per GPU node).
| GPU card type | Number of cards per node | Scheduler option to request number of cards |
|---|---|---|
| A100 | 4 | -l gpu,A100,cuda={1,4} |
| V100 | 1 | -l gpu,V100,cuda=1 |
| RTX2080Ti | 2 | -l gpu,RTX2080Ti,cuda={1,2} |
| P4 | 1 | -l gpu,P4,cuda=1 |
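For example, based on the table above, an interactive session with two RTX2080Ti cards, or a batch job using all four A100 cards on a node, could be requested as follows (my_gpu_job.sh is a hypothetical submission script):
$ qrsh -l gpu,RTX2080Ti,cuda=2,h_rt=4:00:00
$ qsub -l gpu,A100,cuda=4 my_gpu_job.sh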
Requesting interactive sessions¶
Basic usage¶
An interactive session allows you to access computing resources (e.g., cores, memory, GPUs, etc.) on the nodes comprising the cluster for a given amount of time. To request an interactive session, from a terminal connected to the Hoffman2 cluster issue the command:
$ qrsh
after issuing the command above, the shell prompt will typically return after a short wait and will change to display the compute node on which your interactive session is running. For example, user joebruin could experience the following change in prompt:
[joebruin@login3 ~]$ qrsh
[joebruin@n2001 ~]$
from the login node, login3, to the compute node, n2001.
Note
Unless you have requested otherwise, by default you have access to 1GB of memory and one computing core for a 2-hour runtime on any node on the cluster that is available to you.
Customizing the qrsh command¶
The qrsh command can be customized to request the needed runtime, amount of memory, number of cores, whether the requested cores should come from one or more compute nodes, the type of CPU, the type of GPU, and many other requestable characteristics. Each resource that a user can request is specified by a comma-separated list of key=value pairs, known as complexes, which follow the -l option to the qrsh command, while the number of cores is specified by the -pe option to the qrsh command, followed by a space-separated list of two items: the name of the parallel environment needed (suitable for shared-, distributed-, or hybrid-memory use) and the integer number of cores requested.
To learn more how to request resources or compute cores, please see: Requesting resources (other than cores) and Requesting multiple cores.
qrsh command to run serial jobs¶
Serial jobs use one compute core and therefore there is no need to specify the parallel environment and the number of cores. To get an interactive session with a runtime longer than the default 2 hours and more memory than the default 1GB, you will need to specify a value for the scheduler complex h_rt
(runtime) and a value for the complex h_data
(memory).
For example, to request an interactive session with a runtime of 3 hours and a total of 4GB of memory, issue at the Hoffman2 command prompt:
$ qrsh -l h_rt=3:00:00,h_data=4G
Warning
The scheduler is configured to automatically terminate jobs that attempt to use more memory than requested or to run past the requested time limit. Make sure to request enough memory and runtime to keep your interactive session active.
qrsh command to run distributed memory jobs¶
If the program you intend to run in the interactive session can run across multiple nodes (using message-passing libraries), you will need to request cores with -pe dc\* <n> (where <n> is the number of cores requested).
For example, to request 16 CPU cores, a runtime of 1 hour, and 2GB of memory per core, issue:
$ qrsh -l h_rt=1:00:00,h_data=2G -pe dc\* 16
qrsh command to run on exclusively reserved nodes¶
When invoking an interactive session with qrsh, the proper memory size needs to be specified via h_data. If you are unsure what amount of memory is appropriate for your interactive session, you can add -l exclusive to your qrsh command. In that case, the h_data value is used by the scheduler to select a compute node with a total amount of memory equal to or greater than what is specified with h_data, and the memory limit for the job is the compute node's physical memory size.
For example, the command:
$ qrsh -l h_rt=8:00:00,h_data=32G,exclusive
will start an interactive session on a compute node equipped with at least 32G of physical memory. The node will be exclusively reserved for you, and you can therefore use all of its cores and memory (regardless of the h_data value).
Note
You can only request as much memory as is available on nodes on the cluster. Interactive sessions requested via qrsh without specifying an h_data value are automatically assigned h_data=1G, which may be too small for your application.
qrsh command to run on your group’s nodes¶
Warning
The following section does not apply to you if your research group has not purchased Hoffman2 compute nodes.
To run on your group nodes, add the -l highp
switch to your qrsh
command. For example, to request an interactive session with a duration of two days (48 hours), 4GB of memory (and one core), issue the command:
$ qrsh -l highp,h_rt=48:00:00,h_data=4G
You can also request multiple cores using -pe dc\* <n>, -pe shared <n>, or -l exclusive -pe node <n>, as described in Requesting multiple cores. When combining these with -l highp, the number of cores and the memory requested need to be compatible with what is available on your group's compute nodes. Contact user support should you have any questions.
Although you are allowed to specify h_rt
as high as 336 hours (14 days) for a qrsh
session, it is not recommended. For example, if the network connection is interrupted (e.g. your laptop or desktop computer goes into sleep mode), the qrsh
session may be lost, possibly terminating all running programs within it.
qrsh examples¶
Note
Multiple resources can be requested with the -l option to qrsh. Each key=value complex needs to be given in a comma-separated list without any white space in between (e.g., -l key1=value1,key2=value2). Alternatively, separate -l options can be specified (e.g., -l key1=value1 -l key2=value2).
To request a single processor for 24 hours from the interactive queues, issue the command:
$ qrsh -l h_rt=24:00:00,h_data=1G
To request 8 processors for 4 hours (total 8*1G=8GB memory) on a single node from the interactive queues, issue the command:
$ qrsh -l h_rt=4:00:00,h_data=1G -pe shared 8
To request 4 processors for 3 hours (total 4*1G=4GB memory) on a single node, issue the command:
$ qrsh -l h_rt=3:00:00,h_data=1G -pe shared 4
To request 12 processors, 1GB of memory per processor, for 2 hours, issue the command:
$ qrsh -l h_data=1G,h_rt=2:00:00 -pe dc\* 12
Note
The 12 CPUs are distributed across multiple compute nodes. The backslash \ in dc\* is significant when you issue this command in an interactive csh/tcsh unix shell.
qrsh startup time¶
A qrsh session is scheduled along with all other jobs managed by the scheduler. The shorter the runtime (the -l h_rt option) and the fewer the processors requested (the -pe option), the better your chance of getting a session quickly. Request just what you need for the best use of computing resources, and be considerate of other users by exiting your qrsh session when you are done, so that the computing resources are released to others.
Resource limitation¶
Hoffman2 Cluster compute nodes have different memory sizes. When you request more than one core (using -pe shared <n>), the total memory requested on the node will be the product of the number of cores times the memory per core (h_data). In general, the larger the total memory requested, the longer the wait. Please refer to the output of the command:
$ qhost
to see what total memory is available on the various nodes on the cluster, keeping in mind that not all hosts may be accessible to you.
When you request multiple cores, or a large amount of total memory, you may or may not get the interactive session immediately, depending on how busy the cluster is and the permission level of your account. To see which classes of nodes (memory, number of cores, etc.) you have access to, enter the following at the Hoffman2 command prompt:
$ myresources
Interpreting error messages¶
Occasionally, you may encounter one of the following messages: error: no suitable queues
or qrsh: No match.
If you receive the no suitable queues
message and you are requesting the interactive queues (-l i
), be sure you have not requested more than 24 hours. This message may mean there is something incompatible with the various parameters you have specified and your qrsh
session can never start. For example, you have requested -l h_rt=25:00:00
but your userid
is not authorized to run sessions or jobs for more than 24 hours.
If your session could not be scheduled, first try your qrsh
command again in case it was a momentary problem with the scheduler.
If your session still cannot be scheduled, try lowering either the value of h_rt
, the number of processors requested, or both, if possible.
Contact user support should you still have problems.
Running MPI with qrsh¶
The following instructions apply to the IntelMPI and the OpenMPI libraries. They may not apply to other MPI implementations.
After requesting an interactive session to run distributed memory jobs, you will need to select the version of IntelMPI/OpenMPI and to set the environment for the scheduled job. In the following example the executable MPI program is named foo
.
In the qrsh
session at the shell prompt, enter one of the following commands:
If you are in bash
or sh
-type shell and you need a specific version of the IntelMPI (say: intel/19.0.5
):
$ module load intel/19.0.5 # load the intel/19.0.5 module
$ . /u/local/bin/set_qrsh_env.sh # set the environment for the scheduled job
$ `which mpirun` -n $NSLOTS ./foo # run the foo MPI executable
If you are in tcsh
or csh
shell and you need a specific version of the IntelMPI (say: intel/19.0.5
):
$ module load intel/19.0.5 # load the intel/19.0.5 module
$ source /u/local/bin/set_qrsh_env.csh # set the environment for the scheduled job
$ `which mpirun` -n $NSLOTS ./foo # run the foo MPI executable
If you are in bash
or sh
-type shell and you need a specific version of the OpenMPI (say: openmpi/3.1.6
):
$ module load openmpi/3.1.6 # load the openmpi/3.1.6 module
$ . /u/local/bin/set_qrsh_env.sh # set the environment for the scheduled job
$ `which mpirun` -n $NSLOTS ./foo # run the foo MPI executable
If you are in tcsh
or csh
shell and you need a specific version of the OpenMPI (say: openmpi/3.1.6
):
$ module load openmpi/3.1.6 # load the openmpi/3.1.6 module
$ source /u/local/bin/set_qrsh_env.csh # set the environment for the scheduled job
$ `which mpirun` -n $NSLOTS ./foo # run the foo MPI executable
You can, if needed, replace $NSLOTS with an integer smaller than the number of processors you requested in your qrsh command.
You do not have to create a hostfile and pass it to mpiexec.hydra
with its -machinefile
or -hostfile
option because mpiexec.hydra
automatically retrieves that information from UGE
.
Additional tools¶
Additional scripts are available that may help you run other parallel distributed memory software. You can enter these commands at the compute node’s shell prompt:
$ get_pe_hostfile
Returns the contents of the UGE pe_hostfile
file for the current qrsh
session. If you have used the -pe
directive to request multiple processors on multiple nodes, you will probably need to tell your program the names of those nodes and how many processors have been allocated on each node. This information is unique to your current qrsh
session.
To create an MPI-style hostfile named hfile
in the current directory:
$ get_pe_hostfile | awk '{print $1" slots="$2}' > hfile
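The resulting hfile lists one host per line with the number of slots allocated on it; for a job that received 8 slots on each of two nodes it would look like the sketch below (the host names are hypothetical, the format is the one produced by the awk command above):
n6288 slots=8
n7659 slots=8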
The UGE pe_hostfile
is located:
$SGE_ROOT/$SGE_CELL/spool/node/active_jobs/sge_jobid.1/pe_hostfile
where node
and sge_jobid
are the hostname
and UGE $JOB_ID
, respectively, of the current qrsh
session.
To return the value of JOB_ID
for the current qrsh
session, issue the command:
$ get_sge_jobid
To return the contents of the scheduler environment file for the current qrsh
session, issue:
$ get_sge_env
which is used by the set_qrsh_env
scripts.
UGE
-specific environment variables are defined in the file:
$SGE_ROOT/$SGE_CELL/spool/node/active_jobs/sge_jobid.1/environment
or,
$SGE_ROOT/$SGE_CELL/spool/node/active_jobs/sge_jobid.sge_taskid/environment
where node
and sge_jobid
are the hostname
and UGE $JOB_ID
, respectively, of the current qrsh
session. sge_taskid is the task number of an array job task (the value of $SGE_TASK_ID).
Submitting batch jobs¶
In order to run a non-interactive batch job under the Univa Grid Engine (UGE), you need to specify the resources and the number of cores that your job will need and the actual command (or a recipe consisting of multiple commands) to execute.
In this section the following topics are discussed:
Use qsub with a submission script¶
A submission script allows you to set the environment for your job (for example by loading a needed module) and/or to codify a sequence of commands (for example for actions that need to occur in sequence).
Once you have generated a submission script you can submit your job with:
$ qsub <submission-script>
where: <submission-script>
is the name of your submission script.
You can also define (or redefine) resources at the command line. For example, to request the complexes key1=value1 and key2=value2 and to change the parallel environment or the number of cores requested (say, to shared and 8), you could use:
$ qsub -l key1=value1,key2=value2 -pe shared 8 <submission-script>
Note
The resources, parallel environment and number of cores requested as options to qsub on the command line take precedence over the resources, parallel environment and number of cores specified within the submission script.
Use qdel to terminate a job¶
After a job is submitted, you can use qdel
to terminate it:
$ qdel <JOB_ID>
where <JOB_ID>
is the job ID of the job being terminated. The job ID can be displayed by the myjobs command.
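For example, assuming the job ID reported by myjobs is 1234567, and noting that qdel also accepts the standard UGE -u option to remove all of your own jobs:
$ myjobs                 # list your jobs and their job IDs
$ qdel 1234567           # terminate a single job (hypothetical job ID)
$ qdel -u $USER          # terminate all of your own jobs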
How to build a submission script¶
In this section an example of a basic submission script (written in the bash shell scripting language) is described. You can copy and paste the script into a file on the cluster. The script should be modified (as instructed in its comment lines) to suit your requirements in terms of resources, number of cores, job environment, and the actual commands that you need to run.
#### submit_job.sh START ####
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
## Edit the line below as needed:
#$ -l h_rt=1:00:00,h_data=1G
## Modify the parallel environment
## and the number of cores as needed:
#$ -pe shared 1
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
# echo job info on joblog:
echo "Job $JOB_ID started on: " `hostname -s`
echo "Job $JOB_ID started on: " `date `
echo " "
# load the job environment:
. /u/local/Modules/default/init/modules.sh
## Edit the line below as needed:
module load gcc/4.9.3
## substitute the command to run your code
## in the two lines below:
echo '/usr/bin/time -v hostname'
/usr/bin/time -v hostname
# echo job info on joblog:
echo "Job $JOB_ID ended on: " `hostname -s`
echo "Job $JOB_ID ended on: " `date `
echo " "
#### submit_job.sh STOP ####
To submit the job issue at the command line:
$ chmod u+x submit_job.sh
$ qsub submit_job.sh
To understand the Basic submission script, its parts are analyzed in the following sections.
Submission script preamble¶
#### submit_job.sh START ####
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
## Edit the line below as needed:
#$ -l h_rt=1:00:00,h_data=1G
## Modify the parallel environment
## and the number of cores as needed:
#$ -pe shared 1
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
The submission script preamble contains the resources information (lines starting with: #$ -l
and #$ -pe
) that the scheduler needs to properly dispatch the job. You will need to edit these lines to match your needs (see: Requesting resources (other than cores) and Requesting multiple cores to learn how to do so). The meaning of other scheduler-specific lines is explained in the Principal options to the qsub command.
Lines starting with #$
are interpreted by the scheduler, while lines starting with #
are comments inserted for clarity and lines starting with ##
are meant to inform you which lines you should modify.
Submission script logging abilities¶
Once the job is running, the lines starting with echo will write to the file joblog.$JOB_ID useful information about the node on which the job is running, the start and end times, and the command being executed.
Submission script: setting the job environment¶
The part of submit_job.sh that loads the environment for the job is:
# load the job environment:
. /u/local/Modules/default/init/modules.sh
## Edit the line below as needed:
module load gcc/4.9.3
you should modify the module load gcc/4.9.3
line and add any number of module load <app>
lines as needed (see: Environmental modules).
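For example, a job that needs both a compiler and an interpreter could load several modules; the versions below are hypothetical, so check module av to see what is actually installed:
# load the job environment:
. /u/local/Modules/default/init/modules.sh
module load gcc/10.2.0        # hypothetical version; check `module av gcc`
module load python/3.9.6      # hypothetical version; check `module av python`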
Submission script: recipe to run the command¶
Finally, the part of submit_job.sh that actually expresses the command(s) to run is:
## substitute the command to run your code
## in the two lines below:
echo '/usr/bin/time -v hostname'
/usr/bin/time -v hostname
in this example the command to be run is the unix command hostname, which simply returns the name of the host on which the job is running. The command is executed from within /usr/bin/time -v, which will write to the file joblog.$JOB_ID useful information about the job's resource consumption.
Note
The environment variable $JOB_ID
is set up by the Univa Grid Engine scheduler to uniquely identify each of your jobs. Should you need to contact support about a job please provide its $JOB_ID
.
Running array jobs¶
If you need to perform a series of operations, each independent of the others, you can consider breaking them into independent tasks, each running as its own job. In this circumstance you can use the Univa Grid Engine array job function. An array job is an array of identical tasks, differentiated only by an index number and treated by the scheduler as a series of jobs.
To access this function of the Univa Grid Engine scheduler you will need to add to the submission script preamble the line:
#$ -t lower-upper:interval
where the arguments lower, upper and interval of the -t option represent the boundaries of the index associated with each task in the series of jobs. Their values are available within each job in the array through the environment variables $SGE_TASK_FIRST, $SGE_TASK_LAST and $SGE_TASK_STEPSIZE.
The environment variable $SGE_TASK_ID is the index variable for each task in the array job; it can be used as the index of a loop which, instead of being executed serially, is executed in parallel by the independent tasks. To clarify this, an array job example is given below.
Array job example¶
As an example of an array job, let's consider the operation of adding two vectors. In this particular example, vector v1 (whose 49 components go sequentially from 1 to 49) and vector v2 (whose 49 components go in decreasing order from 99 to 51) are added to form vector v3 (whose 49 components will all be equal to 100). This is a toy example with a purely didactic purpose.
To understand how the process works, we will first perform the operation sequentially with, for example, the following script:
#### add_two_vectors_sequentially.sh START ####
#!/bin/bash
# create new vector data files for v1 and v2
for i in `seq 1 49`;do
if [ $i == 1 ]; then
echo $i > v1.dat
echo $((100-$i)) > v2.dat
else
echo $i >> v1.dat
echo $((100-$i)) >> v2.dat
fi
done
# now add and save in v3.dat:
for i in `seq 1 49`;do
# use the unix command sed -n ${line_number}p to read by line
v1_c=`sed -n ${i}p v1.dat`
v2_c=`sed -n ${i}p v2.dat`
v3_c=$((v1_c+v2_c))
if [ $i == 1 ]; then
echo $v3_c > v3.dat
else
echo $v3_c >> v3.dat
fi
done
#### add_two_vectors_sequentially.sh STOP ####
after creating this script you could submit it for batch execution with:
$ chmod u+x add_two_vectors_sequentially.sh # mark the script as executable
$ qsub -l h_rt=200,h_data=100M -o joblog -j y add_two_vectors_sequentially.sh
This computation, however, could also be broken into a number of tasks, each of which performs the addition of one particular component of the vectors v1 and v2. To do so, you will first need to create the files for the vectors v1 and v2, which you can do, for example, with the script:
#### create_vectors.sh START ####
#!/bin/bash
# create new vector data files for v1 and v2
for i in `seq 1 49`;do
if [ $i == 1 ]; then
echo $i > v1.dat
echo $((100-$i)) > v2.dat
else
echo $i >> v1.dat
echo $((100-$i)) >> v2.dat
fi
done
#### create_vectors.sh STOP ####
which you can execute by issuing at the command line:
$ chmod u+x create_vectors.sh
$ ./create_vectors.sh
you will then need to modify the add_two_vectors_sequentially.sh script that performs the addition to look like:
#### add_by_component.sh START ####
#!/bin/bash
if [ -e v1.dat ]; then
# use the unix command sed -n ${line_number}p to read by line
c_v1=`sed -n ${SGE_TASK_ID}p v1.dat`
else
c_v1=0
fi
if [ -e v2.dat ]; then
# use the unix command sed -n ${line_number}p to read by line
c_v2=`sed -n ${SGE_TASK_ID}p v2.dat`
else
c_v2=0
fi
c_v3=$((c_v1+c_v2))
echo $c_v3 > v3_${SGE_TASK_ID}.dat
#### add_by_component.sh STOP ####
Note
that the index $i of the add_two_vectors_sequentially.sh script has been replaced by the $SGE_TASK_ID environment variable in the add_by_component.sh script and that the for loop is gone.
To submit the script add_by_component.sh for batch execution you could use the submission script:
#### submit_arrayjob.sh START ####
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID.$TASK_ID
#$ -j y
## Edit the line below as needed:
#$ -l h_rt=200,h_data=50M
## Modify the parallel environment
## and the number of cores as needed:
#$ -pe shared 1
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
#$ -t 1-49:1
# echo job info on joblog:
echo "Job $JOB_ID.$SGE_TASK_ID started on: " `hostname -s`
echo "Job $JOB_ID.$SGE_TASK_ID started on: " `date `
echo " "
# load the job environment:
. /u/local/Modules/default/init/modules.sh
## Edit the line below as needed:
#module load gcc/4.9.3
## substitute the command to run your code
## in the two lines below:
echo '/usr/bin/time -v ./add_by_component.sh'
/usr/bin/time -v ./add_by_component.sh
# echo job info on joblog:
echo "Job $JOB_ID.$SGE_TASK_ID ended on: " `hostname -s`
echo "Job $JOB_ID.$SGE_TASK_ID ended on: " `date `
echo " "
#### submit_arrayjob.sh STOP ####
which you can then submit with:
$ chmod u+x submit_arrayjob.sh
$ qsub submit_arrayjob.sh
In this example the script add_by_component.sh will run 49 times, each time operating on one of the components of vectors v1 and v2 by reading the line corresponding to $SGE_TASK_ID
of the files v1.dat
and v2.dat
.
To stitch back the vector v3 you can use a script like:
#### stitch_v3.sh START ####
#!/bin/bash
for i in `seq 1 49`;do
if [ $i == 1 ]; then
cat v3_$i.dat > v3.dat
else
cat v3_$i.dat >> v3.dat
fi
done
#### stitch_v3.sh STOP ####
which you can then execute with:
$ chmod u+x stitch_v3.sh # mark the script as executable
$ ./stitch_v3.sh
The file v3.dat
will now contain the 49 components of the vector v3.
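Equivalently, since bash brace expansion generates the indices in increasing order, the loop in stitch_v3.sh can be replaced by a single command (a minimal sketch):
$ cat v3_{1..49}.dat > v3.dat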
Note
You can run the scripts create_vectors.sh and stitch_v3.sh from the command line (without being in an interactive session) because the two scripts do not require much in terms of resources, as this is a toy example. Should your pre- and post-array-job tasks require more resources, you should submit them as batch jobs or run them within an interactive session.
Parallel MPI jobs¶
For a parallel MPI job you need to have a line that specifies a parallel environment:
#$ -pe dc* n
The maximum number of cores requested, n, that you should use depends on your account's access level.
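Putting this together, a minimal preamble for an MPI batch job might look like the sketch below; the module version is taken from the MPI examples earlier on this page, and ./foo is a hypothetical MPI executable:
#!/bin/bash
#$ -cwd
#$ -o joblog.$JOB_ID
#$ -j y
#$ -l h_rt=4:00:00,h_data=2G
#$ -pe dc* 32
# load the job environment:
. /u/local/Modules/default/init/modules.sh
module load intel/19.0.5
# run the (hypothetical) MPI executable on the allocated slots:
`which mpirun` -n $NSLOTS ./foo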
Multi-threaded/OpenMP jobs¶
For a multi-threaded OpenMP job you need to request that all processors be on the same node by using the shared parallel environment.
#$ -pe shared n
where the maximum n, the number of slots requested, can be no larger than the number of CPU cores on a compute node.
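A minimal sketch of an OpenMP submission script preamble follows; ./my_openmp_program is a hypothetical executable, and OMP_NUM_THREADS is set to the number of slots granted by the scheduler:
#!/bin/bash
#$ -cwd
#$ -o joblog.$JOB_ID
#$ -j y
#$ -l h_rt=2:00:00,h_data=2G
#$ -pe shared 8
# use as many OpenMP threads as slots allocated by the scheduler:
export OMP_NUM_THREADS=$NSLOTS
./my_openmp_program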
How to reserve one (or more) entire node(s)¶
To get one or more entire nodes for parallel jobs, use -pe node* n -l exclusive
, where n
is the number of nodes you are requesting.
Example of requesting 2 whole nodes with qsub
:
$ qsub -pe node 2 -l exclusive mysubmissionscript.sh
Example of requesting 3 whole nodes in the preamble of a submission script:
#$ -l exclusive
#$ -pe node 3
How to run on owned nodes¶
To run a batch job on owned nodes:
$ qsub -l highp[,other-options] mysubmissionscript.sh
Example of requesting to run on owned nodes in the preamble of a submission script:
#$ -l highp[,other-options]
Use qsub to submit a binary from the command line¶
For example, suppose that you want to run the binary program $HOME/fortran/hello_world
, you may submit the job from the Hoffman2 command line with:
$ qsub -l h_rt=200,h_data=50M -o $SCRATCH/hello_world.out -j y -M $USER@mail -m bea -b y $HOME/fortran/hello_world
Principal options to the qsub command¶
| option | description |
|---|---|
| -l | requests the type of resources to be used by the job |
| -o | sets the path where the standard output stream of the job will be written |
| -j y | specifies that the standard error stream of the job is merged into the standard output stream |
| -M | defines the email address to notify (please leave this field unchanged) |
| -m | defines when to notify the recipient with an email (in this case at the beginning, end, and abort of the job) |
| -b y | indicates explicitly that the command to be executed is a binary rather than a submission script |
| <command> | the input command to qsub (here the binary to be executed) |
To see a complete list of options that you can pass to the command qsub
please issue:
$ man qsub
and also refer to the Requesting resources (other than cores) and Requesting multiple cores sections.
hello_world example¶
As an example of batch submission of a binary, the procedure to generate the binary hello_world
and submit it to the queues for batch execution is described in what follows.
Note
The steps below can be performed either in an interactive session or on a login node, as the editing and compilation of this particular example do not represent a demanding computational task.
create the directory $HOME/fortran if needed and cd to it:
$ bash
$ module unload gcc   # fall back to default gcc in case a different one was loaded
$ if [ ! -d $HOME/fortran ]; then mkdir $HOME/fortran; fi; cd fortran
using any of the editors available on the cluster, paste the following lines into the file hello_world.f in the current directory (i.e., $HOME/fortran):
c     hello_world START
      program hello_world
      print *, 'Hello World!'
      end program hello_world
c     hello_world STOP
compile the program:
$ gfortran -o hello_world hello_world.f
check that the executable binary, hello_world, runs:
$ ./hello_world
should give you:
Hello World!
submit the executable binary, hello_world, for batch execution:
$ qsub -l h_rt=200,h_data=50M -o $SCRATCH/hello_world.out -j y -M $USER@mail -m bea -b y $HOME/fortran/hello_world
once you get the email notifying you that the job has completed, you can check the output with:
$ cat $SCRATCH/hello_world.out
which should look like:
Hello World!
Queue scripts¶
Each IDRE-provided queue script is named for a type of job or application. The queue script builds a UGE command file for that particular type of job or application. A queue script can be run either as a single command to which you provide appropriate options, or as an interactive application, which presents you with a menu of choices and prompts you for the values of options.
For example, if you simply enter a queue script command such as:
job.q
without any command-line arguments, the queue script will enter its interactive mode and present you with a menu of tasks you can perform. One of these tasks is to build the command file, another is to submit a command file that has already been built, and another is to show the status of jobs you have already submitted. See queue scripts
for details, or select Info
from any queue script menu, or enter man queue
at a shell prompt.
You can also enter myjobs
at the shell prompt to show the status of jobs you have submitted and which have not already completed. You can also enter groupjobs
at the shell prompt to show the status of pending jobs everyone in your group has submitted. Enter groupjobs -help
for options.
IDRE-provided queue scripts can be used to run the following types of jobs:
Serial Jobs
Serial Array Jobs
Multi-threaded Jobs
MPI Parallel Jobs
Application Jobs
Serial jobs¶
A serial job runs on a single thread on a single node. It does not take advantage of multi-processor nodes or the multiple compute nodes available with a cluster.
To build or submit an UGE
command file for a serial job, you can either enter:
job.q [queue-script-options]
or, you can provide the name of your executable on the command line:
job.q [queue-script-options] name_of_executable [executable-arguments]
When you enter job.q
without the name of your executable, it will interactively ask you to enter any needed memory, wall-clock time limit, and other options, and ask you if you want to submit the job. You can quit out of the queue script menu and edit the UGE
command file, which the script built, if you want to change or add other Univa Grid Engine options before you submit your job.
If you did not submit the command file at the end of the menu dialog and decided to edit the file before submitting it, you can submit your command file using either a queue script Submit
menu item, or the qsub
command:
qsub executable.cmd
When you enter job.q
with the name of your executable, it will by default build the command file using defaults for any queue script options that you did not specify, submit it to the job scheduler, and delete the command file that it built.
Serial array jobs¶
Array jobs are serial jobs or multi-threaded jobs that use the same executable but different input variables or input files, as in parametric studies. Users typically run thousands of jobs with one submission.
The UGE
command file for a serial array job will, at the minimum, contain the UGE
keyword statement for a lower index value and an upper index value. By default, the index interval is one. UGE
keeps track of the jobs using the environment variable SGE_TASK_ID
, which varies from the lower index value to the upper index value for each job. Your program can use SGE_TASK_ID
to select the input files to read or the options to be used for that particular run.
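For example, a task in a serial array job can use SGE_TASK_ID to pick its own input and output files; the file-naming scheme and my_program below are hypothetical:
#!/bin/bash
#$ -cwd
#$ -o joblog.$JOB_ID.$TASK_ID
#$ -j y
#$ -l h_rt=1:00:00,h_data=1G
#$ -t 1-100:1
# each task reads its own input file and writes its own output file
# (hypothetical naming scheme: input_1.dat ... input_100.dat):
./my_program < input_${SGE_TASK_ID}.dat > output_${SGE_TASK_ID}.dat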
If your program is multi-threaded, you must edit the UGE
command file built by the jobarray.q
script and add an UGE
keyword statement that specifies the shared parallel environment and the number of processors your job requires. In most cases you should request no more than 8 processors because the maximum number of processors on most nodes is 8. See the Multi-threaded/OpenMP jobs section for more information.
To build or submit an UGE
command file for a serial array job, enter:
jobarray.q
For details, see the section Running an Array of Jobs Using UGE.
Multi-threaded jobs¶
Multi-threaded jobs are jobs which will run on more than one thread on the same node. Programs using the OpenMP-based threaded library are a typical example of those that can take advantage of multi-core nodes.
If you know your program is multi-threaded, you need to request that UGE
allocate multiple processors. Otherwise your job will contend for resources with other jobs that are running on the same node, and all jobs on that node may be adversely affected. The queue script will prompt you to enter the number of tasks for your job. The queue script default is 4 tasks. You should request at least as many tasks as your program has threads, but usually no more than 8 tasks because the maximum number of processors on most nodes is 8. See the scalability benchmarks in the GPU cards available on the Hoffman2 Cluster table for information on how to determine the optimal number of tasks.
To build or submit an UGE
command file for a multi-threaded job, enter:
openmp.q
For details, see OpenMP programs and Multi threaded programs.
MPI parallel jobs¶
MPI parallel jobs are executable programs linked with one of the message passing libraries, such as OpenMPI. These applications explicitly send messages from one node to another using either a Gigabit Ethernet (GE) interface or an Infiniband (IB) interface. IDRE recommends that everyone use the Infiniband interface because the latency for message passing is lower with the IB interface than with the GE interface.
When MPI jobs are submitted to the cluster, one needs to tell the UGE scheduler how many processors are needed to run the jobs. The queue script will prompt you to enter the number of tasks for your job. The queue script default for generic jobs is 4 parallel tasks. Please see the scalability benchmarks at GPU cards available on the Hoffman2 Cluster table for information on how to determine the optimal number of tasks.
To build or submit an UGE
command file for a parallel job, enter:
mpi.q
For details, see the How to Run MPI section.
Application jobs¶
An application job is one which runs software provided by a commercial vendor or is open source. It is usually installed in system directories (e.g., MATLAB).
To build or submit an UGE
command file for an application job, enter:
application.q
where application is replaced with the name of the application. For example, use matlab.q
to run MATLAB batch jobs. For details, see Software and its subsequent links for each package or program on how to run them.
Batch job output files¶
When a job has completed, UGE
messages will be available in the stdout
and stderr
files that were defined in your
command file with the -o
and -e
or -j
keywords. Program output will be available in any files that your program has written.
If your UGE
command file was built using a queue script, stdout
and stderr
from UGE
will be found in one of:
jobname.joblog
jobname.joblog.$JOB_ID
jobname.joblog.$JOB_ID.$SGE_TASK_ID (for array jobs)
Output from your program will be found in one of:
jobname.out
jobname.out.$JOB_ID
jobname.output.$JOB_ID
jobname.output.$JOB_ID.$SGE_TASK_ID (for array jobs)
GPU access¶
How to access GPU nodes¶
There are multiple GPU types available in the cluster. Each type of GPU has a different compute capability, memory size and clock speed, among other things. Please refer to table GPU cards available on the Hoffman2 Cluster to see what GPUs are currently available.
| GPU type | Compute capability | # of CUDA cores | GPU memory | scheduler options |
|---|---|---|---|---|
| H100 | 9.0 | 16896 | 94 GB | -l gpu,H100,cuda=1 |
| A6000 | 8.6 | 10752 | 48 GB | -l gpu,A6000,cuda=1 |
| A100 | 8.0 | 6912 | 80 GB or 40 GB | -l gpu,A100,cuda=1 |
| RTX2080Ti | 7.5 | 4352 | 10 GB | -l gpu,RTX2080Ti,cuda=1 |
| V100 | 7.0 | 5120 | 32 GB | -l gpu,V100,cuda=1 |
| GTX1080Ti | 6.1 | 3584 | 12 GB | -l gpu,GTX1080Ti,cuda=1 |
| Tesla P4 | 6.1 | 2560 | 8 GB | -l gpu,P4,cuda=1 |
| Tesla K40 | 3.5 | 2880 | 12 GB | -l gpu,K40,cuda=1 |
If GPU memory is important for your job and you will need to run on one (or more) A100 cards with 80GB of memory, please use:
$ qrsh/qsub -l A100,gpu,gpu_mem=80G,cuda=1
To see a full list of GPU nodes on the Hoffman2 Cluster which includes the number of GPU cards per host, at the cluster prompt issue:
$ GPU_NODES_AT_A_GLANCE
To see the queue status of GPU nodes on the Hoffman2 Cluster, at the cluster prompt issue:
$ CURRENT_GPU_JOBS
To request multiple GPUs on nodes with A100, A6000, RTX2080Ti, GTX1080Ti, P4 or K40 cards, increase the cuda value in the scheduler resource request (see the scheduler options in the GPU cards available on the Hoffman2 Cluster table) to the desired number (up to 4 on nodes with A100 cards, up to 3 on nodes with P4 cards, and up to 2 on nodes with RTX2080Ti, GTX1080Ti, A6000 and K40 cards). The scheduler options reported in the GPU cards available on the Hoffman2 Cluster table can be combined with other scheduler options (see the Principal requestable resources and Principal parallel environments tables), for example:
$ qrsh -l gpu,P4,cuda=1,h_rt=3:00:00
To see the specifications of a particular GPU node, enter the following at a g-node shell prompt:
$ gpu-device-query.sh
CUDA¶
Various CUDA versions are installed on the Hoffman2 Cluster. To see which versions of CUDA are available, please issue:
$ module av cuda
Note
You will be able to load a cuda module only when on a GPU node; you can, however, see how a cuda modulefile will change your environment by issuing:
$ module show cuda
After requesting an interactive session on a GPU node, to load a specific version, use:
$ module load cuda/<VERSION>
where VERSION
is one of the versions listed in the output of: module av cuda
.
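For example, once an interactive session on a GPU node has started, you could load a CUDA module and verify the toolkit and the card; the version below is hypothetical, so pick one listed by module av cuda:
$ qrsh -l gpu,RTX2080Ti,cuda=1,h_rt=2:00:00
$ module load cuda/11.8       # hypothetical version; pick one from `module av cuda`
$ nvcc --version              # check the CUDA compiler
$ nvidia-smi                  # check the GPU card and driver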
Monitoring resource utilization¶
While the job is running¶
Open a terminal on the Hoffman2 cluster and issue at the command line:
$ check_usage
If you have any interactive session or batch job running, the command check_usage will give you a snapshot of the current resource utilization of each of your jobs on each node on which they are running. The command will also report the resources you have requested for each job.
After the job has completed¶
To check the scheduler accounting logs you will need to know your $JOB_ID
. For example, for $JOB_ID
equal to 4753410
you will use:
$ qacct -j 4753410
and inspect the maxvmem field.
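For example, assuming the standard UGE accounting fields, you can extract the most relevant ones directly:
$ qacct -j 4753410 | grep -E 'maxvmem|ru_wallclock|exit_status'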