Computing

Upon connecting to or logging into the cluster, unless Connecting via Jupyter Notebook/Lab, users access the cluster via its login nodes. Login nodes are special hosts whose sole purpose is to provide a gateway to the compute nodes and their computational resources. For more information, see Role of the login nodes.

Computational resources (such as memory, cores, runtime, CPU type, GPU type, etc.) on the compute nodes are managed by a job scheduler. Any CPU-, GPU- or memory-intensive computing task should be performed within either interactive sessions or batch jobs scheduled on the cluster's compute nodes.

This page describes:

Computational resources on the Hoffman2 Cluster

Node types

A summary of the types of nodes that you will encounter while using the Hoffman2 Cluster and a description of their intended use are given in the Types of nodes on the Hoffman2 Cluster table:

Types of nodes on the Hoffman2 Cluster

login nodes

Upon connecting to the Hoffman2 Cluster via a terminal and SSH at the fully qualified domain name hoffman2.idre.ucla.edu, or connecting via remote desktop at either nx.hoffman2.idre.ucla.edu or x2go.hoffman2.idre.ucla.edu, you access a login node. Login nodes are meant for light-weight tasks such as working on your code and submitting jobs to the scheduler. Login nodes are a resource shared by many concurrent users and are not intended for heavy-weight tasks. Please see Role of the login nodes.

CPU-based compute nodes

Most of the nodes on the Hoffman2 Cluster are CPU-based compute nodes. These are the nodes where your jobs execute; they can be accessed interactively via the qrsh command or used for batch job execution.

GPU-based compute nodes

A portion of the compute nodes on the Hoffman2 Cluster is equipped with one or more GPU cards of various types (see the GPU cards publicly available on the Hoffman2 Cluster table). Please refer to Role of GPU nodes to see which workloads are best suited to run on these nodes, and to GPU access to learn how to request an interactive session or a batch job on a GPU node.

The Hoffman2 Cluster has a number of compute nodes available to the entire UCLA community. Additionally, research groups can purchase dedicated compute nodes.

Group-owned nodes

Group-owned nodes allow users in the owning group to run jobs (interactive or batch) on their computational resources for an extended runtime (up to fourteen days). Moreover, the portion of the jobs submitted to owned resources that can be concurrently allocated on them is guaranteed to start within twenty-four hours of submission (the wait time is typically shorter). Node ownership also allows users in the group to access any currently unused resources owned by other groups for a runtime of up to 24 hours.

If your group is interested in purchasing nodes, please visit: Purchasing additional resources.

Highp vs shared vs campus jobs

In the Hoffman2 Cluster jargon, jobs submitted to owned resources are referred to as highp jobs, while jobs submitted to other groups' currently unused resources are referred to as shared jobs. Jobs submitted by users in groups that have not purchased nodes are limited to running on IDRE-owned resources for up to 24 hours; jobs from these users are referred to as campus jobs and the users as campus users.

See also: Job scheduling policy.

Jobs and resources

To prevent resource contention and to distribute computations across the cluster's compute nodes, any CPU-, GPU- or memory-intensive task should be executed on compute nodes, either by requesting an interactive session (for interactive work) or by submitting a batch job to the scheduler.

Note

Compute nodes on the Hoffman2 Cluster generally run simultaneous jobs from multiple users. To prevent automatic job termination and to ensure the performance of every job, it is important to request the right amount of resources when submitting a batch job or requesting an interactive session.

To learn which computational resources can be requested and how, please see:

Note

If no attributes are specified, the scheduler assumes that a batch job or interactive session will use 1 core and 1 GB of memory, and that it will run for 2 hours on any available compute node on the cluster; the job is dispatched accordingly.
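
For reference, a request that spells out these defaults explicitly (the runtime and memory complexes are described in the next section) is equivalent to issuing qrsh with no options at all:

$ qrsh -l h_rt=2:00:00,h_data=1G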

Requesting resources (other than cores)

Within the Univa Grid Engine (UGE) (the job scheduler currently running on the cluster), any resource other than the number of computing cores (to request cores, see Requesting multiple cores) can be requested via key-value pairs, known as complexes, passed as arguments of the -l option to qsub or qrsh. Some of the complexes that you might need to use are shown in the Principal requestable resources table:

Principal requestable resources

name of key   type      default value   specifies
h_rt          TIME      2:00:00         runtime
h_data        MEMORY    1G              memory per process
h_vmem        MEMORY    1G              memory per job
exclusive     BOOLEAN   TRUE            reserve node(s) for exclusive use
highp         BOOLEAN   TRUE            run on owned resources
arch          STRING    NONE            specify processor type
gpu           BOOLEAN   TRUE            run on GPU nodes
P4            BOOLEAN   TRUE            run on GPU node w/ P4 card
RTX2080Ti     BOOLEAN   TRUE            run on GPU node w/ RTX2080Ti cards
V100          BOOLEAN   TRUE            run on GPU node w/ V100 cards
A100          BOOLEAN   TRUE            run on GPU node w/ A100 cards
cuda          RSMAP     1               number of GPU cards on same node

A complete list of complexes defined on the Hoffman2 Cluster can be obtained by issuing the following command in a terminal connected to the Hoffman2 Cluster:

$ qconf -sc

Examples of how to request resources

To request a runtime of, for example, 12 hours, use:

$ qrsh -l h_rt=12:00:00

Note

Several resources can be requested either as a comma-separated list of key-value pairs following a single -l option, or as separate -l key=value options. For example, to request a runtime of 3 hours and 4GB of memory:

$ qrsh -l h_rt=3:00:00,h_data=4G

or:

$ qrsh -l h_rt=3:00:00 -l h_data=4G

Requesting multiple cores

If you are planning to run an application that will use more than one CPU core, you should request cores by passing the -pe <parallel environment> <n> option (where <parallel environment> is the name of the parallel environment and <n> is the number of cores that you are planning to use) to the qrsh or qsub commands.

A list of the principal parallel environment names and their role is given in the parallel environment table.

Principal parallel environments table

name     allocation rule                        use
shared   cores are allocated on a single host   shared-memory jobs
dc\*     cores are allocated on any host        distributed-memory jobs
node     one core per node                      use with -l exclusive for hybrid distributed/shared memory jobs

Examples of how to request multiple cores

To run an application that uses multiple cores in shared memory (e.g., threads, OpenMP, etc.), request, for example, 12 cores with:

$ qrsh -pe shared 12

A complete list of parallel environments available on the Hoffman2 Cluster can be obtained by issuing the command:

$ qconf -spl

Requesting multiple GPU cards on the same node

GPU nodes with RTX2080Ti or A100 cards on the Hoffman2 Cluster have multiple GPU cards per node, as shown in the Number of cards per GPU node table:

Number of cards per GPU node

GPU card type   Number of cards per node   Scheduler option to request number of cards
A100            4                          -l gpu,A100,cuda={1,4}
V100            1                          -l gpu,V100,cuda=1
RTX2080Ti       2                          -l gpu,RTX2080Ti,cuda={1,2}
P4              1                          -l gpu,P4,cuda=1

Requesting interactive sessions

Basic usage

An interactive session allows you to access computing resources (e.g., cores, memory, GPUs, etc.) on the nodes comprising the cluster for a given amount of time. To request an interactive session, from a terminal connected to the Hoffman2 cluster issue the command:

$ qrsh

After issuing the command above, the shell prompt typically returns after a short wait, and your prompt changes to display the compute node on which your interactive session is running. For example, user joebruin could experience the following change in prompt:

[joebruin@login3 ~]$ qrsh
[joebruin@n2001 ~]$

from the login node, login3, to the compute node, n2001.

Note

Unless you have requested otherwise, by default you have access to 1GB of memory, one computing core and a 2-hour runtime on any node on the cluster that is available to you.

Customizing the qrsh command

The qrsh command can be customized to request the needed runtime, amount of memory, number of cores, whether the requested cores should come from one or more compute nodes, the type of CPU, the type of GPU, and many other requestable characteristics. Each resource is specified by a comma-separated list of key-value pairs, known as complexes, following the -l option to qrsh. The number of cores is specified by the -pe option to qrsh, followed by a space-separated pair of items: the name of the parallel environment (suitable for shared, distributed or hybrid memory use) and the integer number of cores requested.

qrsh command to run serial jobs

Serial jobs use one compute core and therefore there is no need to specify the parallel environment and the number of cores. To get an interactive session with a runtime longer than the default 2 hours and more memory than the default 1GB, you will need to specify a value for the scheduler complex h_rt (runtime) and a value for the complex h_data (memory).

For example, to request an interactive session with a runtime of 3 hours and a total of 4GB of memory, issue at the Hoffman2 command prompt:

$ qrsh -l h_rt=3:00:00,h_data=4G

Warning

The scheduler is configured to automatically terminate jobs that attempt to use more memory than requested or that run past the requested time limit. Make sure to request enough memory and runtime to keep your interactive session active.

qrsh command to run shared memory jobs

If your application spawns multiple threads or, more generally, uses multiple cores in a shared-memory parallelization paradigm, you will need to request the number of cores you are planning to use with the -pe shared <n> directive (where <n> is the number of cores requested).

For example, to request 4 CPU cores, a runtime of 8 hours, and 2GB of memory per core, issue:

$  qrsh -l h_rt=8:00:00,h_data=2G,h_vmem=8G -pe shared 4

Note

h_data is memory per CPU process. If your job spawns threads under one single CPU process, the memory limit on that process is h_data, despite the fact that you have reserved multiple cores. To ensure that the scheduler does not automatically terminate a shared-memory job that uses threads, request h_vmem equal to the product of h_data and the number of cores requested.
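
For instance, for an 8-core shared-memory session with 2GB per core, h_vmem should be set to 8 x 2GB = 16GB:

$ qrsh -l h_rt=2:00:00,h_data=2G,h_vmem=16G -pe shared 8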

qrsh command to run distributed memory jobs

If the program you intend to run in the interactive session can run across multiple nodes (using message-passing libraries), you will need to request cores with -pe dc\* <n> (where <n> is the number of cores requested).

For example, to request 16 CPU cores, a runtime of 1 hour, and 2GB of memory per core, issue:

$ qrsh -l h_rt=1:00:00,h_data=2G -pe dc\*  16

Warning

Do not specify h_vmem when choosing the -pe dc* parallel environment!

qrsh command to run hybrid distributed/shared memory jobs

If your program can execute in shared memory within a node and in distributed memory across nodes (for example, it can use OpenMP in combination with MPI), you should request an interactive session with multiple nodes and all the cores within them. To do so, use the combination of the node parallel environment and the exclusive complex.

For example, to request 3 entire nodes for a runtime of 1 hour, with each node having at least 36 GB of memory, issue:

$ qrsh -l h_rt=1:00:00,h_data=36G,exclusive -pe node  3 -now n

qrsh attempts to start interactive sessions immediately; to prevent your request from exiting when resources are not immediately available, add -now n, as in the example above.

Warning

Requesting one or more nodes in exclusive mode may cause a relatively long wait time before the interactive session is awarded. If you need this type of resources, consider running your job in batch.

qrsh command to run on exclusively reserved nodes

When invoking an interactive session with qrsh, the proper memory size needs to be specified via h_data. If you are unsure of what amount of memory is appropriate for your interactive session, you can add -l exclusive to your qrsh command. In this case, the h_data value is used by the scheduler to select a compute node with a total amount of memory equal to or greater than what is specified with h_data, and the memory limit for the job is the compute node's physical memory size.

For example, the command:

$ qrsh -l h_rt=8:00:00,h_data=32G,exclusive

will start an interactive session on a compute node equipped with at least 32G of physical memory. The node will be exclusively reserved for you, and you can therefore use all of its cores and memory (regardless of the h_data value).

Note

You can only request as much memory as is available on nodes on the cluster. Interactive sessions requested via qrsh without specifying an h_data value are automatically assigned h_data=1G, which may be too small for your application.

qrsh command to run on your group’s nodes

Warning

The following section does not apply to you if your research group has not purchased Hoffman2 compute nodes.

To run on your group nodes, add the -l highp switch to your qrsh command. For example, to request an interactive session with a duration of two days (48 hours), 4GB of memory (and one core), issue the command:

$ qrsh -l highp,h_rt=48:00:00,h_data=4G

You can also request multiple cores using -pe dc\* <n>, -pe shared <n> or -l exclusive -pe node <n>, as described in Requesting multiple cores. When combining these with -l highp, the number of cores, or the memory requested, needs to be compatible with what is available on your group's compute nodes. Contact user support should you have any questions.

Although you are allowed to specify h_rt as high as 336 hours (14 days) for a qrsh session, it is not recommended: if the network connection is interrupted (e.g., your laptop or desktop computer goes into sleep mode), the qrsh session may be lost, possibly terminating all programs running within it.

qrsh examples

Note

Multiple resources can be requested with the -l option to qrsh. The key=value complexes need to be given as a comma-separated list without any white space in between (e.g., -l key1=value1,key2=value2). Alternatively, separate -l options can be specified (e.g., -l key1=value1 -l key2=value2).

  • To request a single processor for 24 hours from the interactive queues, issue the command:

$ qrsh -l h_rt=24:00:00,h_data=1G
  • To request 8 processors for 4 hours (total 8*1G=8GB memory) on a single node from the interactive queues, issue the command:

$ qrsh -l h_rt=4:00:00,h_data=1G,h_vmem=8G -pe shared 8
  • To request 4 processors for 3 hours (total 4*1G=4GB memory) on a single node, issue the command:

$ qrsh -l h_rt=3:00:00,h_data=1G,h_vmem=4G -pe shared 4
  • To request 12 processors, 1GB of memory per processor, for 2 hours, issue the command:

Warning

Do not specify h_vmem with -pe dc\*!

$ qrsh -l h_data=1G,h_rt=2:00:00 -pe dc\* 12

Note

The 12 CPUs are distributed across multiple compute nodes. The backslash \ in dc\* is significant when you issue this command in an interactive csh/tcsh unix shell.

qrsh startup time

A qrsh session is scheduled along with all other jobs managed by the scheduler software. The shorter the requested runtime (the -l h_rt option) and the fewer the processors requested (the -pe option), the better your chance of getting a session quickly. Request just what you need for the best use of computing resources, and be considerate to other users by exiting your qrsh session when you are done, so that the computing resources are released to others.

Resource limitation

The Hoffman2 Cluster's compute nodes have different memory sizes. When you request more than one core (using -pe shared <n>), the total memory requested on the node will be the product of the number of cores and the memory per core (h_data). In general, the larger the total memory requested, the longer the wait. Please refer to the output of the command:

$ qhost

to see what total memory is available on the various nodes on the cluster, keeping in mind that not all hosts may be accessible to you.

When you request multiple cores, or a large amount of total memory, you may or may not get the interactive session immediately, depending on how busy the cluster is and the permission level of your account. To see which classes of nodes (memory, number of cores, etc.) you have access to, enter the following at the Hoffman2 command prompt:

$  myresources

Interpreting error messages

Occasionally, you may encounter one of the following messages: error: no suitable queues or qrsh: No match.

If you receive the no suitable queues message and you are requesting the interactive queues (-l i), be sure you have not requested more than 24 hours. This message may mean there is something incompatible with the various parameters you have specified and your qrsh session can never start. For example, you have requested -l h_rt=25:00:00 but your userid is not authorized to run sessions or jobs for more than 24 hours.

If your session could not be scheduled, first try your qrsh command again in case it was a momentary problem with the scheduler.

If your session still cannot be scheduled, try lowering either the value of h_rt, the number of processors requested, or both, if possible.

Contact user support should you still have problems.

Running MPI with qrsh

The following instructions apply to the IntelMPI and the OpenMPI libraries. They may not apply to other MPI implementations.

After requesting an interactive session to run distributed memory jobs, you will need to select the version of IntelMPI/OpenMPI and to set the environment for the scheduled job. In the following example the executable MPI program is named foo.

In the qrsh session at the shell prompt, enter one of the following commands:

If you are in a bash or sh-type shell and you need a specific version of IntelMPI (say: intel/19.0.5):

$ module load intel/19.0.5           # load the intel/19.0.5 module
$ . /u/local/bin/set_qrsh_env.sh     # set the environment for the scheduled job
$ `which mpirun` -n $NSLOTS ./foo    # run the foo MPI executable

If needed, you can replace $NSLOTS with an integer smaller than the number of processors you requested on your qrsh command.
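
For example, to launch foo on only 4 of the granted slots (the count 4 is purely illustrative):

$ `which mpirun` -n 4 ./foo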

You do not have to create a hostfile and pass it to mpiexec.hydra with its -machinefile or -hostfile option because mpiexec.hydra automatically retrieves that information from UGE.

Additional tools

Additional scripts are available that may help you run other parallel distributed memory software. You can enter these commands at the compute node’s shell prompt:

$ get_pe_hostfile

Returns the contents of the UGE pe_hostfile file for the current qrsh session. If you have used the -pe directive to request multiple processors on multiple nodes, you will probably need to tell your program the names of those nodes and how many processors have been allocated on each node. This information is unique to your current qrsh session.

To create an MPI-style hostfile named hfile in the current directory:

$ get_pe_hostfile | awk '{print $1" slots="$2}' > hfile

The UGE pe_hostfile is located:

$SGE_ROOT/$SGE_CELL/spool/node/active_jobs/sge_jobid.1/pe_hostfile

where node and sge_jobid are the hostname and UGE $JOB_ID, respectively, of the current qrsh session.

To return the value of JOB_ID for the current qrsh session, issue the command:

$ get_sge_jobid

To return the contents of the scheduler environment file for the current qrsh session, issue:

$ get_sge_env

which is used by the set_qrsh_env scripts.

UGE-specific environment variables are defined in the file:

$SGE_ROOT/$SGE_CELL/spool/node/active_jobs/sge_jobid.1/environment

or,

$SGE_ROOT/$SGE_CELL/spool/node/active_jobs/sge_jobid.sge_taskid/environment

where node and sge_jobid are the hostname and UGE $JOB_ID, respectively, of the current qrsh session, and sge_taskid is the task number of an array job ($SGE_TASK_ID).

Problems with the instructions on this section? Please send comments here.

Submitting batch jobs

In order to run a non-interactive batch job under the Univa Grid Engine (UGE), you need to specify the resources and the number of cores that your job will need and the actual command (or a recipe consisting of multiple commands) to execute.

In this section the following topics are discussed:

Use qsub with a submission script

A submission script allows you to set the environment for your job (for example by loading a needed module) and/or to codify a sequence of commands (for example for actions that need to occur in sequence).

Once you have generated a submission script you can submit your job with:

$ qsub <submission-script>

where: <submission-script> is the name of your submission script.

You can also define (or redefine) resources at the command line. For example, to request the complexes key1=value1 and key2=value2 and to change the parallel environment or the number of cores requested (say, to shared and 8), you could use:

$ qsub -l key1=value1,key2=value2 -pe shared 8 <submission-script>

Note

The resources, parallel environment and number of cores requested as options to qsub on the command line take precedence over the resources, parallel environment and number of cores specified within the submission script.
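
For example, if a submission script requests #$ -l h_rt=1:00:00,h_data=1G and #$ -pe shared 1 in its preamble (as the basic submission script below does), the following command overrides those values for this submission only:

$ qsub -l h_rt=8:00:00,h_data=4G -pe shared 4 submit_job.sh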

Use qdel to terminate a job

After a job is submitted, you can use qdel to terminate it:

$ qdel <JOB_ID>

where <JOB_ID> is the job ID of the job being terminated. The job ID can be displayed by the myjobs command.
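
For example, to look up the job ID of a running job and terminate it (the job ID below is purely illustrative):

$ myjobs            # list your pending and running jobs with their job IDs
$ qdel 4753410      # terminate the job with job ID 4753410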

How to build a submission script

In this section an example of a basic submission script (written in the bash shell scripting language) is described. You can copy and paste the script into a file on the cluster. The script should be modified (as instructed in its comment lines) to suit your requirements in terms of resources, number of cores, job environment and the actual commands that you need to run.

Basic submission script
#### submit_job.sh START ####
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
## Edit the line below as needed:
#$ -l h_rt=1:00:00,h_data=1G
## Modify the parallel environment
## and the number of cores as needed:
#$ -pe shared 1
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea

# echo job info on joblog:
echo "Job $JOB_ID started on:   " `hostname -s`
echo "Job $JOB_ID started on:   " `date `
echo " "

# load the job environment:
. /u/local/Modules/default/init/modules.sh
## Edit the line below as needed:
module load gcc/4.9.3

## substitute the command to run your code
## in the two lines below:
echo '/usr/bin/time -v hostname'
/usr/bin/time -v hostname

# echo job info on joblog:
echo "Job $JOB_ID ended on:   " `hostname -s`
echo "Job $JOB_ID ended on:   " `date `
echo " "
#### submit_job.sh STOP ####

To submit the job issue at the command line:

$ chmod u+x submit_job.sh
$ qsub submit_job.sh

To help you understand the Basic submission script, its parts are analyzed in the following sections.

Submission script preamble

#### submit_job.sh START ####
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
## Edit the line below as needed:
#$ -l h_rt=1:00:00,h_data=1G
## Modify the parallel environment
## and the number of cores as needed:
#$ -pe shared 1
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea

The submission script preamble contains the resources information (lines starting with: #$ -l and #$ -pe) that the scheduler needs to properly dispatch the job. You will need to edit these lines to match your needs (see: Requesting resources (other than cores) and Requesting multiple cores to learn how to do so). The meaning of other scheduler-specific lines is explained in the Principal options to the qsub command.

Lines starting with #$ are interpreted by the scheduler, while lines starting with # are comments inserted for clarity and lines starting with ## are meant to inform you which lines you should modify.

Submission script logging abilities

Lines starting with echo, once the job is running, will output to the file joblog.$JOB_ID useful information about the node on which the job is running, the start and end time, and the command that is being executed.

Submission script: setting the job environment

The part of submit_job.sh that loads the environment for the job is:

# load the job environment:
. /u/local/Modules/default/init/modules.sh
## Edit the line below as needed:
module load gcc/4.9.3

You should modify the module load gcc/4.9.3 line and add as many module load <app> lines as needed (see: Environmental modules).

Submission script: recipe to run the command

Finally, the part of submit_job.sh that actually expresses the command(s) to run is:

## substitute the command to run your code
## in the two lines below:
echo '/usr/bin/time -v hostname'
/usr/bin/time -v hostname

In this example the command to be run is the unix command hostname, which simply returns the name of the host on which the job is running. The command is executed from within /usr/bin/time -v, which will output to the file joblog.$JOB_ID useful information about the job's resource consumption.

Note

The environment variable $JOB_ID is set up by the Univa Grid Engine scheduler to uniquely identify each of your jobs. Should you need to contact support about a job please provide its $JOB_ID.

Running array jobs

If you need to perform a series of operations, each independent of the others, you can consider breaking them into independent tasks, each running as its own job. In this circumstance you can use the Univa Grid Engine array job function. An array job is an array of identical tasks, differentiated only by an index number and treated by the scheduler as a series of jobs.

To access this function of the Univa Grid Engine scheduler you will need to add to the submission script preamble the line:

#$ -t lower-upper:interval

where the arguments lower, upper and interval of the -t option represent the boundaries of the index associated with each task in the series of jobs. Their values are available within each job in the array through the environment variables $SGE_TASK_FIRST, $SGE_TASK_LAST and $SGE_TASK_STEPSIZE.
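
For example, a line such as (the values are purely illustrative):

#$ -t 1-10:2

creates five tasks whose $SGE_TASK_ID values are 1, 3, 5, 7 and 9; within each task, $SGE_TASK_FIRST is 1, $SGE_TASK_LAST is 10 and $SGE_TASK_STEPSIZE is 2.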

The environment variable $SGE_TASK_ID is the index variable for each task in the array job; it can be used as the index of a loop which, instead of being executed serially, is executed in parallel by the independent tasks. To clarify this, an array job example is given below.

Array job example

As an example of an array job, let's consider the operation of adding two vectors. In this particular example, vector v1 (whose 49 components go sequentially from 1 to 49) and vector v2 (whose 49 components go in decreasing order from 99 to 51) are added to form vector v3 (whose 49 components are all equal to 100). This is a toy example with a merely didactic purpose.

To understand how the process works, we will first perform the operation sequentially with, for example, the following script:

add_two_vectors_sequentially.sh
#### add_two_vectors_sequentially.sh START ####
#!/bin/bash

# create new vector data files for v1 and v2
for i in `seq 1 49`;do
 if [ $i == 1 ]; then
     echo $i > v1.dat
     echo $((100-$i)) > v2.dat
 else
     echo $i >> v1.dat
     echo $((100-$i)) >> v2.dat
 fi
done

# now add and save in v3.dat:
for i in `seq 1 49`;do
 # use the unix command sed -n ${line_number}p to read by line
 v1_c=`sed -n ${i}p v1.dat`
 v2_c=`sed -n ${i}p v2.dat`
 v3_c=$((v1_c+v2_c))
 if [ $i == 1 ]; then
     echo $v3_c > v3.dat
 else
     echo $v3_c >> v3.dat
 fi
done
#### add_two_vectors_sequentially.sh STOP ####

After creating this script, you can submit it for batch execution with:

$ chmod u+x add_two_vectors_sequentially.sh # mark the script as executable
$ qsub -l h_rt=200,h_data=100M -o joblog -j y add_two_vectors_sequentially.sh

This computation, however, could also be broken into a number of tasks, each of which performs the addition of one particular component of the vectors v1 and v2. To do so, you first need to create the files for the vectors v1 and v2, for example with the script:

create_vectors.sh
#### create_vectors.sh START ####
#!/bin/bash

# create new vector data files for v1 and v2
for i in `seq 1 49`;do
 if [ $i == 1 ]; then
     echo $i > v1.dat
     echo $((100-$i)) > v2.dat
 else
     echo $i >> v1.dat
     echo $((100-$i)) >> v2.dat
 fi
done
#### create_vectors.sh STOP ####

which you can execute by issuing at the command line:

$ chmod u+x create_vectors.sh
$ ./create_vectors.sh

You will then need to modify the add_two_vectors_sequentially.sh script that performs the addition to look like:

add_by_component.sh
#### add_by_component.sh START ####
#!/bin/bash

if [ -e  v1.dat ]; then
   # use the unix command sed -n ${line_number}p to read by line
   c_v1=`sed -n ${SGE_TASK_ID}p v1.dat`
else
   c_v1=0
fi

if [ -e v2.dat ]; then
   # use the unix command sed -n ${line_number}p to read by line
   c_v2=`sed -n ${SGE_TASK_ID}p v2.dat`
else
   c_v2=0
fi

c_v3=$((c_v1+c_v2))

echo $c_v3 > v3_${SGE_TASK_ID}.dat
#### add_by_component.sh STOP ####

Note

The index $i of the add_two_vectors_sequentially.sh script has been replaced by the $SGE_TASK_ID environment variable in the add_by_component.sh script, and the for loop is gone.

To submit the script add_by_component.sh for batch execution you could use the submission script:

Array Job submission script
#### submit_arrayjob.sh START ####
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID.$TASK_ID
#$ -j y
## Edit the line below as needed:
#$ -l h_rt=200,h_data=50M
## Modify the parallel environment
## and the number of cores as needed:
#$ -pe shared 1
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
#$ -t 1-49:1

# echo job info on joblog:
echo "Job $JOB_ID.$SGE_TASK_ID started on:   " `hostname -s`
echo "Job $JOB_ID.$SGE_TASK_ID started on:   " `date `
echo " "

# load the job environment:
. /u/local/Modules/default/init/modules.sh
## Edit the line below as needed:
#module load gcc/4.9.3

## substitute the command to run your code
## in the two lines below:
echo '/usr/bin/time -v ./add_by_component.sh'
/usr/bin/time -v ./add_by_component.sh

# echo job info on joblog:
echo "Job $JOB_ID.$SGE_TASK_ID ended on:   " `hostname -s`
echo "Job $JOB_ID.$SGE_TASK_ID ended on:   " `date `
echo " "
#### submit_arrayjob.sh STOP ####

which you can then submit with:

$ chmod u+x submit_arrayjob.sh
$ qsub submit_arrayjob.sh

In this example the script add_by_component.sh will run 49 times, each time operating on one of the components of vectors v1 and v2 by reading the line corresponding to $SGE_TASK_ID of the files v1.dat and v2.dat.

To stitch the vector v3 back together, you can use a script like:

stitch_v3.sh
#### stitch_v3.sh START ####
#!/bin/bash

for i in `seq 1 49`;do
 if [ $i == 1 ]; then
   cat v3_$i.dat > v3.dat
 else
   cat v3_$i.dat >> v3.dat
  fi
done
#### stitch_v3.sh STOP ####

which you can then execute with:

$ chmod u+x stitch_v3.sh   # mark the script as executable
$ ./stitch_v3.sh

The file v3.dat will now contain the 49 components of the vector v3.

Note

You can run the scripts create_vectors.sh and stitch_v3.sh from the command line (without being in an interactive session) because the two scripts do not require much in terms of resources, as this is a toy example. Should your pre- and post-array-job tasks require more resources, you should submit them as batch jobs or run them from within an interactive session.

Problem with these instructions? Please let us know.

Parallel MPI jobs

For a parallel MPI job you need to have a line that specifies a parallel environment:

#$ -pe dc* n

The maximum number of cores requested, n, that you should use depends on your account's access level.
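
A minimal submission-script sketch for an MPI job, following the conventions of the basic submission script above; the module version intel/19.0.5 and the executable name foo are taken from the qrsh MPI example earlier on this page and should be adapted to your case:

#### submit_mpi.sh START ####
#!/bin/bash
#$ -cwd
#$ -o joblog.$JOB_ID
#$ -j y
## Edit runtime, memory and number of cores as needed:
#$ -l h_rt=4:00:00,h_data=2G
#$ -pe dc* 16

# load the job environment and an MPI implementation:
. /u/local/Modules/default/init/modules.sh
module load intel/19.0.5

# run the MPI executable foo on all slots granted by the scheduler:
`which mpirun` -n $NSLOTS ./foo
#### submit_mpi.sh STOP ####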

Multi-threaded/OpenMP jobs

For a multi-threaded OpenMP job you need to request that all processors be on the same node by using the shared parallel environment.

#$ -pe shared n

where the maximum n, the number of slots requested, can be no larger than the number of CPU cores of a compute node.
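
For example, a minimal sketch of the relevant submission-script lines for an 8-thread OpenMP program (the executable name my_openmp_prog is a placeholder); exporting OMP_NUM_THREADS=$NSLOTS keeps the number of threads consistent with the number of slots requested:

#$ -l h_rt=2:00:00,h_data=2G
#$ -pe shared 8

# match the number of OpenMP threads to the slots granted by the scheduler:
export OMP_NUM_THREADS=$NSLOTS
./my_openmp_prog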

How to reserve one (or more) entire node(s)

To get one or more entire nodes for parallel jobs, use -pe node* n -l exclusive, where n is the number of nodes you are requesting.

Example of requesting 2 whole nodes with qsub:

$ qsub -pe node 2 -l exclusive mysubmissionscript.sh

Example of requesting 3 whole nodes in the preamble of a submission script:

#$ -l exclusive
#$ -pe node 3

How to run on owned nodes

To run a batch job on owned nodes:

$  qsub -l highp[,other-options] mysubmissionscript.sh

Example of requesting to run on owned nodes in the preamble of a submission script:

#$ -l highp[,other-options]

Use qsub to submit a binary from the command line

For example, suppose that you want to run the binary program $HOME/fortran/hello_world; you can submit the job from the Hoffman2 command line with:

$ qsub -l h_rt=200,h_data=50M -o $SCRATCH/hello_world.out -j y -M $USER@mail -m bea -b y $HOME/fortran/hello_world

Principal options to the qsub command

-l h_rt=200,h_data=50M

requests the type of resources to be used by the command hello_world

-o $SCRATCH/hello_world.out

sets the path to where the standard output stream of the job will be written

-j y

specifies that the standard error stream of the job is merged into the standard output stream

-M $USER@mail

defines the email address to notify (please leave this field unchanged)

-m bea

defines when to notify the recipient with an email (in this case it will notify at the beginning of the job, b, at the end of the job, e, and if the job is aborted or rescheduled, a)

-b y

gives the user the possibility to indicate explicitly that the command to be executed (the binary hello_world in this case) is to be treated as a binary (by default qsub assumes -b n and therefore expects a script as the input command to run)

$HOME/fortran/hello_world

the input command to qsub

To see a complete list of options that you can pass to the command qsub please issue:

$ man qsub

and also refer to the Requesting resources (other than cores) and Requesting multiple cores sections.

hello_world example

As an example of batch submission of a binary, the procedure to generate the binary hello_world and submit it to the queues for batch execution is described in what follows.

Note

The steps below can be performed either in an interactive session or on a login node, as the editing and compilation of this particular example do not represent a demanding computational task.

  1. create the directory $HOME/fortran if needed and cd to it:

    $ bash
    $ module unload gcc # fall back to default gcc in case a different one was loaded
    $ if [ ! -d $HOME/fortran ]; then mkdir $HOME/fortran; fi; cd $HOME/fortran
    
  2. using any of the editors available on the cluster, paste the following lines into the file hello_world.f in the current directory (i.e., $HOME/fortran):

    c     hello_world START
          program hello_world
          print *, 'Hello World!'
          end program hello_world
    c     hello_world STOP
    
  3. compile the program:

    $ gfortran -o hello_world hello_world.f
    
  4. check that the executable binary, hello_world, runs:

    $ ./hello_world
    

    should give you:

    Hello World!
    
  5. submit the executable binary, hello_world, for batch execution:

    $ qsub -l h_rt=200,h_data=50M -o $SCRATCH/hello_world.out -j y -M $USER@mail -m bea -b y $HOME/fortran/hello_world
    
  6. once you get your email that the job has completed you can check the output with:

    $ cat $SCRATCH/hello_world.out
    

    which should look like:

    Hello World!
    

Queue scripts

Each IDRE-provided queue script is named for a type of job or application. The queue script builds a UGE command file for that particular type of job or application. A queue script can be run either as a single command to which you provide appropriate options, or as an interactive application, which presents you with a menu of choices and prompts you for the values of options.

For example, if you simply enter a queue script command such as:

job.q

without any command-line arguments, the queue script will enter its interactive mode and present you with a menu of tasks you can perform. One of these tasks is to build the command file, another is to submit a command file that has already been built, and another is to show the status of jobs you have already submitted. See queue scripts for details, or select Info from any queue script menu, or enter man queue at a shell prompt.

You can also enter myjobs at the shell prompt to show the status of jobs you have submitted and which have not already completed. You can also enter groupjobs at the shell prompt to show the status of pending jobs everyone in your group has submitted. Enter groupjobs -help for options.

IDRE-provided queue scripts can be used to run the following types of jobs:

  • Serial Jobs

  • Serial Array Jobs

  • Multi-threaded Jobs

  • MPI Parallel Jobs

  • Application Jobs

Serial jobs

A serial job runs on a single thread on a single node. It does not take advantage of multi-processor nodes or the multiple compute nodes available with a cluster.

To build or submit an UGE command file for a serial job, you can either enter:

job.q [queue-script-options]

or, you can provide the name of your executable on the command line:

job.q [queue-script-options] name_of_executable [executable-arguments]

When you enter job.q without the name of your executable, it will interactively ask you to enter any needed memory, wall-clock time limit, and other options, and ask you if you want to submit the job. You can quit out of the queue script menu and edit the UGE command file, which the script built, if you want to change or add other Univa Grid Engine options before you submit your job.

If you did not submit the command file at the end of the menu dialog and decided to edit the file before submitting it, you can submit your command file using either a queue script Submit menu item, or the qsub command:

qsub executable.cmd

When you enter job.q with the name of your executable, it will by default build the command file using defaults for any queue script options that you did not specify, submit it to the job scheduler, and delete the command file that it built.

Serial array jobs

Array jobs are serial jobs or multi-threaded jobs that use the same executable but different input variables or input files, as in parametric studies. Users typically run thousands of jobs with one submission.

The UGE command file for a serial array job will, at the minimum, contain the UGE keyword statement for a lower index value and an upper index value. By default, the index interval is one. UGE keeps track of the jobs using the environment variable SGE_TASK_ID, which varies from the lower index value to the upper index value for each job. Your program can use SGE_TASK_ID to select the input files to read or the options to be used for that particular run.

If your program is multi-threaded, you must edit the UGE command file built by the jobarray.q script and add an UGE keyword statement that specifies the shared parallel environment and the number of processors your job requires. In most cases you should request no more than 8 processors because the maximum number of processors on most nodes is 8. See the For a multi-threaded OpenMP job section for more information.

To build or submit an UGE command file for a serial array job, enter:

jobarray.q

For details, see the section Running an Array of Jobs Using UGE.

Multi-threaded jobs

Multi-threaded jobs are jobs which will run on more than one thread on the same node. Programs using the OpenMP-based threaded library are a typical example of those that can take advantage of multi-core nodes.

If you know your program is multi-threaded, you need to request that UGE allocate multiple processors. Otherwise your job will contend for resources with other jobs that are running on the same node, and all jobs on that node may be adversely affected. The queue script will prompt you to enter the number of tasks for your job. The queue script default is 4 tasks. You should request at least as many tasks as your program has threads, but usually no more than 8 tasks because the maximum number of processors on most nodes is 8. See the scalability benchmarks in the GPU cards publicly available on the Hoffman2 Cluster table for information on how to determine the optimal number of tasks.

To build or submit an UGE command file for a multi-threaded job, enter:

openmp.q

For details, see OpenMP programs and Multi threaded programs.

MPI parallel jobs

MPI parallel jobs are executable programs linked with one of the message-passing libraries, such as OpenMPI. These applications explicitly send messages from one node to another using either a Gigabit Ethernet (GE) interface or an Infiniband (IB) interface. IDRE recommends that everyone use the Infiniband interface because the latency for message passing is shorter with the IB interface than with the GE interface.

When MPI jobs are submitted to the cluster, one needs to tell the UGE scheduler how many processors are needed to run the jobs. The queue script will prompt you to enter the number of tasks for your job. The queue script default for generic jobs is 4 parallel tasks. Please see the scalability benchmarks at GPU cards publicly available on the Hoffman2 Cluster table for information on how to determine the optimal number of tasks.

To build or submit an UGE command file for a parallel job, enter:

mpi.q

For details, see the How to Run MPI section.

Application jobs

An application job is one which runs software provided by a commercial vendor or is open source. It is usually installed in system directories (e.g., MATLAB).

To build or submit an UGE command file for an application job, enter:

application.q

where application is replaced with the name of the application. For example, use matlab.q to run MATLAB batch jobs. For details, see Software and its subsequent links on how to run each package or program.

Batch job output files

When a job has completed, UGE messages will be available in the stdout and stderr files that were defined in your UGE command file with the -o and -e or -j keywords. Program output will be available in any files that your program has written.

If your UGE command file was built using a queue script, stdout and stderr from UGE will be found in one of:

jobname.joblog
jobname.joblog.$JOB_ID
jobname.joblog.$JOB_ID.$SGE_TASK_ID (for array jobs)

Output from your program will be found in one of:

jobname.out
jobname.out.$JOB_ID
jobname.output.$JOB_ID
jobname.output.$JOB_ID.$SGE_TASK_ID (for array jobs)

Problems with the instructions on this section? Please send comments here.

GPU access

How to access GPU nodes

There are multiple GPU types available in the cluster. Each type of GPU has a different compute capability, memory size and clock speed, among other things. Please refer to table GPU cards publicly available on the Hoffman2 Cluster to see what GPUs are currently available.

GPU cards publicly available on the Hoffman2 Cluster

GPU type              Compute capability   Number of CUDA cores   Global memory size
A100                  8.0                  6912                   80 GB
Tesla V100            7.0                  5120                   32 GB
GeForce RTX 2080 Ti   7.5                  4352                   11 GB
Tesla P4              6.1                  2560                   8 GB

Note

Your group may have access to other types of GPU cards not listed here.

In order to use one or more nodes that have one or more GPU cards, your qrsh/qsub request will need to include both the gpu keyword and the keyword referring to a specific GPU card. To see a list of such keywords, please refer to the table How to request publicly available GPU cards on the Hoffman2 Cluster. You may need to compile your code on a machine that has the required type of GPU.

How to request publicly available GPU cards on the Hoffman2 Cluster

GPU type              scheduler complex   scheduler option
A100                  A100                -l gpu,A100,cuda=1
Tesla V100            V100                -l gpu,V100,cuda=1
GeForce RTX 2080 Ti   RTX2080Ti           -l gpu,RTX2080Ti,cuda=1
Tesla P4              P4                  -l gpu,P4,cuda=1

To request multiple GPUs on A100 or RTX2080Ti nodes, increase the value of the cuda complex to the desired number (up to 4 on A100 nodes and up to 2 on RTX2080Ti nodes). The scheduler options reported in the table How to request publicly available GPU cards on the Hoffman2 Cluster can be combined with other scheduler options (for a list, see: Principal requestable resources and the Principal parallel environments table), for example:

$ qrsh -l gpu,P4,cuda=1,h_rt=3:00:00

To see the specifics of a particular GPU node, enter the following at a GPU node (g-node) shell prompt:

$ gpu-device-query.sh

CUDA

Various CUDA versions are installed on the Hoffman2 Cluster. To see which versions of CUDA are available, please issue:

$ module av cuda

Note

You will be able to load a cuda module only when on a GPU node; you can, however, see how a cuda modulefile will change your environment by issuing:

$ module show cuda

After requesting an interactive session on a GPU node, to load a specific version, use:

$ module load cuda/<VERSION>

where VERSION is one of the versions listed in the output of: module av cuda.

CUDA Samples

Precompiled samples from the NVIDIA GPU Computing Software Development Kit are generally available in the directory pointed to by the $CUDA_SAMPLES environment variable:

$ module load cuda
$ echo $CUDA_SAMPLES
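
For example, to browse the samples and copy one of them into your current directory for testing on a GPU node (replace the <sample-dir> placeholder with one of the directories listed by ls):

$ module load cuda
$ ls $CUDA_SAMPLES                      # browse the available precompiled samples
$ cp -r $CUDA_SAMPLES/<sample-dir> .    # copy a sample of your choice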

Problems with the instructions on this section? Please send comments here.

Monitoring resource utilization

While the job is running

Open a terminal on the Hoffman2 cluster and issue at the command line:

$ check_usage

If you have any interactive sessions or batch jobs running, the check_usage command will give you a snapshot of the current resource utilization of each job on each node on which your jobs are running. The command will also inform you of the resources you have requested for each job.

After the job has completed

To check the scheduler accounting logs you will need to know your job's $JOB_ID. For example, for a $JOB_ID equal to 4753410 you would use:

$ qacct -j 4753410

and inspect the maxvmem field.
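
For example, to extract the memory high-water mark along with a few other accounting fields from the qacct output:

$ qacct -j 4753410 | grep -E 'maxvmem|ru_wallclock|exit_status|failed'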