
Running a Batch Job

The Hoffman2 Cluster uses the Univa Grid Engine (UGE) for scheduling jobs and interactive compute sessions.

Submit batch jobs from the cluster nodes

In order to run a job under the Univa Grid Engine, you need to supply UGE directives and their arguments to the job scheduler. The easiest way to do this is to create a UGE command file that consists of a set of UGE directives along with the commands required to execute the actual job. You can build the command file either with a queue script provided by IDRE, or by writing a UGE command file yourself. We recommend you use an IDRE-provided queue script if you are not familiar with UGE directives.

Queue scripts

Each IDRE-provided queue script is named for a type of job or application. The queue script builds a UGE command file for that particular type of job or application. A queue script can be run either as a single command to which you provide appropriate options, or as an interactive application which presents you with a menu of choices and prompts you for the values of options.

For example, if you simply enter a queue script command such as:

job.q

without any command-line arguments, the queue script will enter its interactive mode and present you with a menu of tasks you can perform: building a command file, submitting a command file that has already been built, or showing the status of jobs you have already submitted. See Queue scripts for details, select Info from any queue script menu, or enter man queue at a shell prompt.

You can also enter myjobs at the shell prompt to show the status of your submitted jobs that have not yet completed, or groupjobs to show the status of pending jobs everyone in your group has submitted. Enter groupjobs -help for options.

IDRE-provided queue scripts can be used to run the following types of jobs:

Serial Jobs

A serial job runs on a single thread on a single node. It does not take advantage of multi-processor nodes or of the multiple compute nodes available in a cluster.

To build or submit a UGE command file for a serial job, you can either enter:

job.q [queue-script-options]

or, you can provide the name of your executable on the command line:

job.q [queue-script-options] name_of_executable [executable-arguments]

When you enter job.q without the name of your executable, it will interactively ask you for the memory your job needs, its wall-clock time limit and other options, and then ask whether you want to submit the job. You can quit out of the queue script menu and edit the UGE command file the script built if you want to change or add other Univa Grid Engine options before you submit your job.

If you did not submit the command file at the end of the menu dialog and decided to edit the file before submitting it, you can submit your command file using either a queue script Submit menu item, or the qsub command:

qsub executable.cmd

When you enter job.q with the name of your executable, it will, by default, build the command file using default values for any queue script options that you did not specify, submit it to the job scheduler, and delete the command file that it built.

Serial Array Jobs

Array jobs are serial jobs or multi-threaded jobs that use the same executable but different input variables or input files, as in parametric studies. Users typically run thousands of jobs with one submission.

The UGE command file for a serial array job will, at a minimum, contain a UGE keyword statement that specifies a lower index value and an upper index value. By default, the index interval is one. UGE keeps track of the tasks using the environment variable SGE_TASK_ID, which varies from the lower index value to the upper index value across the tasks. Your program can use SGE_TASK_ID to select the input files to read or the options to be used for that particular run.
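For illustration, a minimal command file for a serial array job might look like the following sketch; the program name and the input/output file naming scheme are placeholders to adapt to your own job:

#!/bin/bash
#$ -cwd
#$ -o myarrayjob.joblog.$JOB_ID.$SGE_TASK_ID
#$ -j y
#  run 100 tasks, with SGE_TASK_ID taking the values 1 through 100
#$ -t 1-100
#  each task selects its own input and output files using SGE_TASK_ID
./my_program input.$SGE_TASK_ID > output.$SGE_TASK_ID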

If your program is multi-threaded, you must edit the UGE command file built by the jobarray.q script and add a UGE keyword statement that specifies the shared parallel environment and the number of processors your job requires. In most cases you should request no more than 8 processors because the maximum number of processors on most nodes is 8. See For a multi-threaded OpenMP job below.

To build or submit a UGE command file for a serial array job, enter:

jobarray.q

For details, see Running an Array of Jobs Using UGE.

Multi-threaded Jobs

Multi-threaded jobs are jobs that run on more than one thread on the same node. Programs using an OpenMP-based threaded library are a typical example of those that can take advantage of multi-core nodes.

If you know your program is multi-threaded, you need to request that UGE allocate multiple processors. Otherwise your job will contend for resources with other jobs that are running on the same node, and all jobs on that node may be adversely affected. The queue script will prompt you to enter the number of tasks for your job. The queue script default is 4 tasks. You should request at least as many tasks as your program has threads, but usually no more than 8 tasks because the maximum number of processors on most nodes is 8. See Scalability Benchmark below for information on how to determine the optimal number of tasks.

To build or submit a UGE command file for a multi-threaded job, enter:

openmp.q

For details, see OpenMP programs and Multi-threaded programs.

MPI Parallel Jobs

MPI parallel jobs are executable programs linked with one of the message passing libraries, such as OpenMPI. These applications explicitly send messages from one node to another using either a Gigabit Ethernet (GE) interface or an Infiniband (IB) interface. IDRE recommends that everyone use the Infiniband interface because its message-passing latency is much lower than that of the GE interface.

When you submit an MPI job to the cluster, you need to tell the UGE scheduler how many processors are needed to run the job. The queue script will prompt you to enter the number of tasks for your job. The queue script default for generic jobs is 4 parallel tasks. Please see Scalability Benchmark below for information on how to determine the optimal number of tasks.

To build or submit a UGE command file for a parallel job, enter:

mpi.q

For details, see How to Run MPI.

Application Jobs

An application job runs software that is provided by a commercial vendor or is open source, and that is usually installed in system directories (e.g., MATLAB).

To build or submit a UGE command file for an application job, enter:

application.q

where application is replaced with the name of the application. For example, use matlab.q to run MATLAB batch jobs. For details, see Software and follow its links to the How to Run section for each package or program.

Batch Job Output Files

When a job has completed, UGE messages will be available in the stdout and stderr files that were defined in your UGE command file with the -o and -e or -j keywords. Program output will be available in any files that your program has written.

If your UGE command file was built using a queue script, stdout and stderr from UGE will be found in one of:

  • jobname.joblog
  • jobname.joblog.$JOB_ID
  • jobname.joblog.$JOB_ID.$SGE_TASK_ID (for array jobs)

Output from your program will be found in one of:

  • jobname.out
  • jobname.out.$JOB_ID
  • jobname.output.$JOB_ID
  • jobname.output.$JOB_ID.$SGE_TASK_ID (for array jobs)

Build a UGE command file for your job and use UGE commands directly

This section describes how to build a UGE command file yourself, instead of letting a queue script build it for you. You can also use the information presented here to modify a UGE command file that a queue script has built.

For parallel jobs, IDRE strongly recommends that you use the queue script mpi.q to initially create the UGE command file. In addition to the UGE keyword statements that specify time and memory resources appropriate for your job, a queue script-built UGE command file contains shell commands that initialize the environment for the job, invoke the program that the job will run, and perform any job post-processing needed.

The UGE keyword statements in a command file are called active comments because they begin with #$, whereas ordinary comments in a script file begin with #.

Any qsub command line option can be used in the command file as an active comment. The qsub command line options are listed on the submit man page.

Each UGE keyword statement begins with #$ followed by the UGE keyword and its value, if any. For example:

#$ -cwd
#$ -o jobname.joblog
#$ -j y

where jobname is the name of your job. Here the first UGE statement, #$ -cwd, specifies that the current working directory is to be used for the job. The second, #$ -o jobname.joblog, names the output file in which UGE will write the job's standard output messages. The third, #$ -j y, specifies that any messages UGE writes to standard error are to be merged with those it writes to standard output.
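Putting these together, a minimal hand-built command file might look like the following sketch; the resource values and program name are placeholders to adapt to your own job:

#!/bin/bash
#$ -cwd
#$ -o jobname.joblog
#$ -j y
#  request 1GB of memory and a one-hour wall-clock limit
#$ -l h_data=1G,h_rt=1:00:00
#  shell commands to run the job follow the active comments
./my_program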

After you have created the UGE command file, issue the appropriate UGE commands from a login node to submit and monitor the job. See Commonly-Used UGE Commands.

For a serial or multi-threaded job using job arrays

For a serial or multi-threaded job using job arrays you need to use a UGE keyword statement of the form:

#$ -t lower-upper:interval
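For example, to run tasks with index values 1 through 100 in steps of one:

#$ -t 1-100:1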

Please see Running an Array of Jobs Using UGE for more information.

For a parallel MPI job

For a parallel MPI job you need to have a line that specifies a parallel environment, similar to one of the examples below.

If you are in the campus group, or if you are a member of a resource group whose nodes are located in the IDRE Data Center, you can use the dc_idre parallel environment:

#$ -pe dc_idre number_of_slots_requested

If you are in the campus group, or if you are a member of a resource group whose nodes are located in the MSA Data Center, you can use the dc_msa parallel environment:

#$ -pe dc_msa number_of_slots_requested

If you are in the campus group, or if you are a member of a resource group whose nodes are located in the POD Data Center, you can use the dc_pod parallel environment:

#$ -pe dc_pod number_of_slots_requested

If you are not sure in which data center your shared cluster nodes are located, or if you belong to more than one group and are authorized to run on nodes in more than one data center, you can use -pe dc* number_of_slots_requested and let UGE decide in which data center to run your job.

The maximum number_of_slots_requested value that you should use depends not only on the number of processors authorized for your resource group, but also on the parallel scalability of your program. You need to verify your program's actual speed-up before making long production runs. If your code does not scale, it may run slower with more processors, which wastes your time and uses cluster resources inefficiently. See Scalability Benchmark below for information on how to determine the optimal number of processors.

A UGE slot usually corresponds to a single core or processor on a multiple-core node. The slots will be distributed among many nodes, and MPI needs to be told on which nodes the allocated slots reside. A different set of nodes, or a different number of slots on a particular node, may be reserved each time a job runs. That information can be retrieved from a UGE run-time file named by the UGE environment variable $PE_HOSTFILE, which is available inside your UGE command file. If you initially use the mpi.q or mpishm.q script to build your UGE command file, it will make the MPI hostfile for you.
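As a sketch, a hand-built command file for an MPI job might look like the following; the module name, resource values and program name are assumptions to adapt to your own setup:

#!/bin/bash
#$ -cwd
#$ -o mympijob.joblog.$JOB_ID
#$ -j y
#  request 16 slots in the IDRE Data Center
#$ -pe dc_idre 16
#$ -l h_data=2G,h_rt=8:00:00
#  load an MPI implementation (module name is an assumption)
module load openmpi
#  UGE sets $NSLOTS to the number of slots granted; with tight UGE
#  integration, OpenMPI learns the node list from $PE_HOSTFILE
mpiexec -n $NSLOTS ./my_mpi_program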

For a multi-threaded OpenMP job

For a multi-threaded OpenMP job you need to request that all processors be on the same node by using the shared parallel environment:

#$ -pe shared number_of_slots_requested

where the maximum number_of_slots_requested is usually 8 because the maximum number of processors per node on most Hoffman2 Cluster nodes is 8. You should request at least as many processors as your program has threads. See Scalability Benchmark below for information on how to determine the optimal number of processors.
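As a sketch, a command file for an OpenMP job might look like the following; the resource values and program name are placeholders:

#!/bin/bash
#$ -cwd
#$ -o myopenmpjob.joblog.$JOB_ID
#$ -j y
#  request 8 processors on a single node
#$ -pe shared 8
#$ -l h_data=1G,h_rt=2:00:00
#  run one thread per slot that UGE granted
export OMP_NUM_THREADS=$NSLOTS
./my_openmp_program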

For an OpenMP job which combines OpenMP and MPI

For an OpenMP job which combines OpenMP and MPI you need to specify one of the nthreads, nthreads_msa or nthreads_pod parallel environments. See the Hoffman2 Cluster Parallel Environments table below for a list of parallel environment names.

Example of using a possible mix of 8-processor, 12-processor and 16-processor nodes:

#$ -pe 4threads number_of_slots_requested

where number_of_slots_requested is a multiple of 4. Specifying the 4threads* parallel environments gives UGE more flexibility to choose nodes, so your job might start sooner.
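As a sketch of a hybrid job, the following runs one MPI process per node with 4 OpenMP threads each; it assumes OpenMPI's -npernode option, and the program name is a placeholder:

#!/bin/bash
#$ -cwd
#$ -o myhybridjob.joblog.$JOB_ID
#$ -j y
#  16 slots at 4 per node: 4 nodes in all
#$ -pe 4threads 16
#$ -l h_data=2G,h_rt=4:00:00
export OMP_NUM_THREADS=4
#  one MPI rank per node (16 slots / 4 threads = 4 ranks), 4 threads per rank
mpiexec -npernode 1 -n $((NSLOTS / OMP_NUM_THREADS)) ./my_hybrid_program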

If you are not sure in which data center your resource group's nodes are located, or if you belong to more than one group and are authorized to run on nodes in more than one data center, you can use -pe nthreads* number_of_slots_requested or -pe shared number_of_slots_requested and let UGE decide in which data center to run your job.

 

Hoffman2 Cluster Parallel Environments (PE)

Parallel Environments for Threaded Programs (OpenMP)

Any nodes in IDRE, MSA or POD Data Centers.

PE                                                Description
shared p                                          p processors on a single node, p ≤ 16
2threads p / 2threads_msa p / 2threads_pod p      2 processors per node, total p processors
3threads p / 3threads_msa p / 3threads_pod p      3 processors per node, total p processors
4threads p / 4threads_msa p / 4threads_pod p      4 processors per node, total p processors
5threads p / 5threads_msa p / 5threads_pod p      5 processors per node, total p processors
6threads p / 6threads_msa p / 6threads_pod p      6 processors per node, total p processors
7threads p / 7threads_msa p / 7threads_pod p      7 processors per node, total p processors
8threads p / 8threads_msa p / 8threads_pod p      8 processors per node, total p processors
12threads p / 12threads_msa p / 12threads_pod p   12 processors per node, total p processors
16threads p / 16threads_msa p / 16threads_pod p   16 processors per node, total p processors, p ≤ 32

Parallel Environments for MPI Programs (OpenMPI)

PE                   Description
dc_idre p / mpi p    p processors on multiple nodes in the IDRE Data Center; see notes. mpi is deprecated.
dc_msa p             p processors on multiple nodes in the MSA Data Center; see notes.
dc_pod p             p processors on multiple nodes in the POD Data Center; see notes.
alldc p              p processors on multiple nodes irrespective of Data Center; see notes.

Parallel Environments for Threaded MPI Programs

Other nthreads parallel environments are also available.

PE                                                Description
4threads p / 4threads_msa p / 4threads_pod p      4 processors per 8-, 12- or 16-core node in the IDRE, MSA or POD Data Centers; see notes.
8threads p / 8threads_msa p / 8threads_pod p      8 processors per 8-, 12- or 16-core node in the IDRE, MSA or POD Data Centers; see notes.
12threads p / 12threads_msa p / 12threads_pod p   12 processors per 12- or 16-core node in the IDRE, MSA or POD Data Centers; see notes.
16threads_msa p / 16threads_pod p                 16 processors per 16-core node in the MSA or POD Data Centers, p ≤ 32; see notes.

Parallel Environments that request whole nodes

PE            Description
node* n       n nodes in any Data Center
node n        n nodes in the IDRE Data Center
node_msa n    n nodes in the MSA Data Center
node_pod n    n nodes in the POD Data Center

Number of processors notes

  1. For campus users, currently p ≤ 128.
  2. For shared cluster users, p can be as large as the entire shared cluster for jobs of 24 hours or less, but is limited to the number of processors contributed by the research group for longer-running jobs.
  3. Permission to use more processors is available on request to IDRE.
  4. When using alldc, the job will span data centers over a long-distance network. Due to the added latency, parallel applications may not perform as well as they would within a single data center.

 

Benchmarking

If you use the node* parallel environment, no other jobs will run or start on the same nodes while your job is running. This may be useful for benchmarking purposes, where you want to avoid contention from other jobs running on the same node. See also How to reserve an entire node below.

Scalability Benchmark

Before submitting your parallel code for production runs, you should determine the optimal number of processors to use. You can do this by performing a scalability benchmark.

A common way to carry out a so-called strong scaling benchmark is to examine the wall-clock elapsed time of your code for runs with different numbers of processors. You can time your code with the /usr/bin/time command inside your UGE command file or, after a job completes, get the wall-clock seconds with this command:

qacct -j jobid | grep ru_wallclock

where jobid is your UGE job ID.

Start with a small number of processors (say 2 or 4), and increase the number of processors in each run. Your code scales well if doubling the number of processors halves the wall-clock elapsed time.
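One way to run such a benchmark, assuming a command file mympijob.cmd and that the -pe request on the qsub command line overrides the one embedded in the file, is a simple loop:

#  submit the same job with 2, 4, 8, 16 and 32 processors
for p in 2 4 8 16 32; do
    qsub -pe dc_idre $p -N scale_$p mympijob.cmd
done

When the runs complete, compare the ru_wallclock value that qacct reports for each job.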

In many cases, for a given problem or data size, the code’s wall-clock elapsed time does not decrease when the number of processors is increased beyond a certain point. That is, beyond some point, using more processors slows down your job. Normally, the number of processors you should use for production runs is less than this number.

High-memory job requirements

Using the shared, nthreads, nthreads_msa or nthreads_pod parallel environments, where n is between 2 and 16, may provide a work-around for serial jobs that require more than the default amount of memory per processor. For example, if a serial job requires 4GB of memory, you may specify -pe shared 4 in addition to -l h_data=1G. Example for a job that will have access to a total of 4GB of memory:

#$ -pe shared 4
#$ -l h_data=1G [other options]

If your job requires more than 64GB memory, you can use the -l highmem parameter. Example for a job which will have access to a total of 80GB memory:

#$ -pe shared 10
#$ -l h_data=8G,highmem [other options]

highmem will restrict your job to running on the few nodes that have 128GB of total memory per node. Specifying the additional highmem parameter further limits the nodes the job scheduler can select, so the wait time before your job starts may increase.

How to reserve an entire node

There are two approaches to getting an entire node:

  • using -l exclusive for a single whole node for a serial job
  • using -pe node* n for one or more whole nodes for parallel jobs

A single whole node for a serial job

The recommended way to get an entire node for serial jobs is to use -l exclusive in your qsub or qrsh command, or add -l exclusive to your UGE command file.

Example of requesting a whole node with qsub:

qsub -l exclusive [other options]

Example of requesting a whole node in a UGE command file:

#$ -l exclusive [other options]

Using this approach, you will get a whole node with 8, 12 or 16 processors. If you want to get a whole node with a certain number of processors, use the num_proc option in addition to the exclusive option.

For example, to get an 8-core whole node:

qsub -l exclusive,num_proc=8 [other options]

Specifying the additional num_proc parameter further limits the nodes the job scheduler can select, so the wait time before your job starts may increase.

Notice:

  • You cannot request multiple whole nodes using -l exclusive.
  • -l exclusive does not generate a PE_HOSTFILE.

One or more whole nodes for parallel jobs

The recommended way to get one or more entire nodes for parallel jobs is to use -pe node* n in your qsub or qrsh command or add -pe node* n to your UGE command file, where n is the number of nodes you are requesting.

Example of requesting 2 whole nodes with qsub:

qsub -pe node 2 [other options]

Example of requesting 3 whole nodes in a UGE command file:

#$ -pe node 3 [other options]

Using this approach, you will get whole nodes with 8, 12 or 16 processors. If your application requires a certain number of processors, you need to retrieve that information at run time in your job script.

If you want whole nodes with a certain number of processors, use the num_proc option in addition to the node parallel environment. Example to get two 8-processor whole nodes:

qsub -pe node 2 -l num_proc=8 [other arguments]

Specifying the additional num_proc parameter further limits the nodes the job scheduler can select, so the wait time before your job starts may increase.

Running Multiple Serial Jobs on a Single Node through UGE

If you want to request a complete node for your job and run multiple serial jobs on that node at the same time, place the execution commands on separate lines inside the same script and send each one to the background. The last line in your script must contain the command:

wait

An example script is given below. You may use this procedure when you are not sure how much memory your jobs will use and you do not want their memory consumption to slow down another user's job running on the same node; it also lets you run multiple jobs at once.

#!/bin/sh
#$ -o sleep.sh.joblog.$JOB_ID
#$ -j y
#  job duration is one hour
#$ -l h_data=4G,h_rt=1:00:00,exclusive

#  Three serial jobs run at the same time
sleep.sh > sleep.sh.output  2>&1  &
sleep10.sh > sleep10.sh.output  2>&1  &
sleep20.sh > sleep20.sh.output  2>&1  &

#  wait is important so that UGE waits for all jobs to finish
wait
exit

Submit a batch job from the UCLA Grid Portal

Note: The UCLA Grid Portal will be taken down in the near future.

The UCLA Grid Portal provides a web portal interface to the Hoffman2 Cluster. Every Hoffman2 Cluster user can access the UCLA Grid Portal. To submit a batch job from the UCLA Grid Portal, click the Job Services tab, then click one of: Generic Jobs, Applications or Multi-Jobs.

Generic Jobs
Use this page to submit a job that runs a program or script that either you or a colleague have written and that is usually installed in your home directory. In the fill-in form provided, supply the name of the executable, any job parameters, the time request, and the number of processors.
Applications
Use this page to submit a commonly used application. Normally, you need to know less about an application than about a generic job, because the UCLA Grid Portal keeps track of the location of the executable and other information about the application. You normally must prepare an input file that the application will read. Some applications can present forms on the UCLA Grid Portal that you can fill in to create the input file if you are not familiar with the application's requirements.
Multi-Jobs
Use this page to submit multiple jobs that run a program or script that either you or a colleague have written. For details, see Running an Array of Jobs Using UGE.

After you submit a job, click Job Status where you can monitor its progress, and view and download its output after your job completes.
