Running a Batch Job

The Hoffman2 Cluster uses the Univa Grid Engine (UGE) for scheduling jobs and interactive compute sessions.

Submit batch jobs from the cluster nodes

In order to run a job under the Univa Grid Engine, you need to supply UGE directives and their arguments to the job scheduler. The easiest way to do this is to create a UGE command file that consists of a set of UGE directives along with the commands required to execute the actual job. You can build the command file either with one of the queue scripts provided by IDRE or by writing it yourself.

Queue scripts

Each IDRE-provided queue script is named for a type of job or application. The queue script builds a UGE command file for that particular type of job or application. A queue script can be run either as a single command to which you provide appropriate options, or as an interactive application which presents you with a menu of choices and prompts you for the values of options.

For example, if you simply enter a queue script command such as:

job.q

without any command-line arguments, the queue script enters its interactive mode and presents a menu of tasks you can perform, such as building the command file, submitting a command file that has already been built, or showing the status of jobs you have already submitted. See Queue scripts for details, select Info from any queue script menu, or enter man queue at a shell prompt.

You can also enter myjobs at the shell prompt to show the status of your submitted jobs that have not yet completed, or groupjobs to show the status of pending jobs submitted by everyone in your group. Enter groupjobs -help for options.

IDRE-provided queue scripts can be used to run the following types of jobs:

Serial Jobs

A serial job runs as a single thread on a single node. It does not take advantage of multi-processor nodes or of the multiple compute nodes available in a cluster.

To build or submit a UGE command file for a serial job, you can either enter:

job.q [queue-script-options]

or, you can provide the name of your executable on the command line:

job.q [queue-script-options] name_of_executable [executable-arguments]

When you enter job.q without the name of your executable, it interactively prompts you for the memory, wall-clock time limit, and other options your job needs, and then asks whether you want to submit the job. If you want to change or add other Univa Grid Engine options before submitting, you can quit the queue script menu and edit the UGE command file that the script built.

If you chose not to submit the command file at the end of the menu dialog so that you could edit it first, you can submit it afterward using either the Submit item in a queue script menu or the qsub command:

qsub executable.cmd

When you enter job.q with the name of your executable, it by default builds the command file using default values for any queue script options you did not specify, submits it to the job scheduler, and then deletes the command file it built.

Serial Array Jobs

Array jobs are serial jobs or multi-threaded jobs that use the same executable but different input variables or input files, as in parametric studies. Users typically run thousands of jobs with one submission.

The UGE command file for a serial array job must, at a minimum, contain a UGE keyword statement giving a lower and an upper index value. By default, the index interval is one. UGE identifies each task in the array with the environment variable SGE_TASK_ID, whose value runs from the lower index value to the upper index value. Your program can use SGE_TASK_ID to select the input files to read or the options to use for that particular run.
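The following bash fragment is a minimal sketch of how a job script might use SGE_TASK_ID to pick a per-task input file. The naming scheme input.N.dat / output.N.dat and the program ./myprogram are hypothetical placeholders. UGE sets SGE_TASK_ID inside each running array task; the fallback value of 1 is only there so the fragment can be tried outside the scheduler.

```bash
#!/bin/bash
# Inside an array task UGE sets SGE_TASK_ID; fall back to 1 for testing outside UGE.
task_id="${SGE_TASK_ID:-1}"

# Hypothetical naming scheme: task N reads input.N.dat and writes output.N.dat.
infile="input.${task_id}.dat"
outfile="output.${task_id}.dat"

echo "Task ${task_id}: reading ${infile}, writing ${outfile}"
# ./myprogram "$infile" > "$outfile"   # hypothetical program; uncomment in a real job
```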

If your program is multi-threaded, you must edit the UGE command file built by the jobarray.q script and add a UGE keyword statement that specifies the shared parallel environment and the number of processors your job requires. In most cases you should request no more than 8 processors, because most nodes have at most 8 processors. See “For a multi-threaded (OpenMP) job” below.

To build or submit an UGE command file for a serial array job, enter:

jobarray.q

For details, see Running an Array of Jobs Using UGE.

Multi-threaded Jobs

Multi-threaded jobs are jobs that run on more than one thread on the same node. Programs that use the OpenMP-based threading library are a typical example of those that can take advantage of multi-core nodes.

If you know your program is multi-threaded, you need to request that UGE allocate multiple processors. Otherwise your job will contend for resources with other jobs that are running on the same node, and all jobs on that node may be adversely affected. The queue script will prompt you to enter the number of tasks for your job. The queue script default is 4 tasks. You should request at least as many tasks as your program has threads, but usually no more than 8 tasks because the maximum number of processors on most nodes is 8. See Scalability Benchmark below for information on how to determine the optimal number of tasks.
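As a sketch, assuming your program honors the standard OpenMP variable OMP_NUM_THREADS, a job script can tie the thread count to the number of slots UGE actually granted (UGE exports this count as NSLOTS) and warn if that exceeds the processors online on the node. The fallback of one slot is only for running the fragment outside UGE.

```bash
#!/bin/bash
# UGE exports NSLOTS, the number of slots granted to the job.
# Fall back to 1 so the fragment also runs outside the scheduler.
slots="${NSLOTS:-1}"

# Processors currently online on this node (Linux/glibc getconf variable).
cores="$(getconf _NPROCESSORS_ONLN)"

if [ "$slots" -gt "$cores" ]; then
    echo "Warning: ${slots} slots granted but only ${cores} processors online on ${HOSTNAME}" >&2
fi

# Start exactly one OpenMP thread per granted slot.
export OMP_NUM_THREADS="$slots"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```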

To build or submit a UGE command file for a multi-threaded job, enter:

openmp.q

For details, see OpenMP programs and Multi threaded programs.

MPI Parallel Jobs

MPI parallel jobs are executable programs linked with a message-passing library such as OpenMPI. These applications explicitly send messages from one node to another over either a Gigabit Ethernet (GE) or an InfiniBand (IB) interface. IDRE recommends that everyone use the InfiniBand interface, because message-passing latency is lower over IB than over GE.

When you submit an MPI job to the cluster, you need to tell the UGE scheduler how many processors the job requires. The queue script will prompt you for the number of tasks; its default for generic jobs is 4 parallel tasks. See Scalability Benchmark below for information on how to determine the optimal number of tasks.

To build or submit a UGE command file for a parallel job, enter:

mpi.q

For details, see How to Run MPI.

Application Jobs

An application job runs commercial or open-source software, usually installed in system directories (e.g., MATLAB).

To build or submit a UGE command file for an application job, enter:

application.q

where application is replaced with the name of the application. For example, use matlab.q to run MATLAB batch jobs. For details, see Software and follow the How to run link for each package or program.

Batch Job Output Files

When a job has completed, UGE messages will be available in the stdout and stderr files defined in your UGE command file with the -o and -e (or -j) keywords. Program output will be available in whatever files your program has written.

If your UGE command file was built using a queue script, stdout and stderr from UGE will be found in one of:

  • jobname.joblog
  • jobname.joblog.$JOB_ID
  • jobname.joblog.$JOB_ID.$SGE_TASK_ID (for array jobs)

Output from your program will be found in one of:

  • jobname.out
  • jobname.out.$JOB_ID
  • jobname.output.$JOB_ID
  • jobname.output.$JOB_ID.$SGE_TASK_ID (for array jobs)
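To check which of these names a particular job used, a small helper like the hypothetical job_files function below prints the candidate names for a job name, job ID and optional task ID, following the lists above, and then reports which ones exist in the current directory. The job name myjob and job ID 123456 are placeholders.

```bash
#!/bin/bash
# Hypothetical helper: print the file names a queue-script-built job may use.
# Usage: job_files JOBNAME JOB_ID [TASK_ID]
job_files() {
    local name="$1" id="$2" task="${3:-}"
    if [ -n "$task" ]; then
        # Array jobs: one log file and one output file per task.
        echo "${name}.joblog.${id}.${task}"
        echo "${name}.output.${id}.${task}"
    else
        echo "${name}.joblog"
        echo "${name}.joblog.${id}"
        echo "${name}.out"
        echo "${name}.out.${id}"
        echo "${name}.output.${id}"
    fi
}

# Report which candidates exist in the current directory (placeholder name and ID).
for f in $(job_files myjob 123456); do
    if [ -e "$f" ]; then
        echo "found: $f"
    fi
done
```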

Build a UGE command file for your job and use UGE commands directly

This section describes how to build a UGE command file yourself instead of letting a queue script build it for you. You can also use the information presented here to modify a UGE command file that a queue script has built.

The UGE keyword statements in a command file are called active comments: because they begin with #$, and comments in a script file begin with #, the shell ignores them, while UGE reads them as directives.

Any qsub command-line option can be used in the command file as an active comment. The qsub command-line options are listed on the submit man page.

Each UGE keyword statement begins with #$ followed by the UGE keyword and its value, if any. For example:

#$ -cwd
#$ -o jobname.joblog
#$ -j y

where jobname is the name of your job. The first statement, #$ -cwd, specifies that the job runs in the current working directory (the directory from which you submit it). The second, #$ -o jobname.joblog, names the file to which standard output is written. The third, #$ -j y, specifies that any messages written to standard error are merged with those written to standard output.

After you have created the UGE command file, issue the appropriate UGE commands from a login node to submit and monitor the job. See Commonly-Used UGE Commands.
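Putting keyword statements like these together, a complete command file for a serial job might look like the sketch below. The job name myjob, the commented-out ./myprogram line, and the one-hour h_rt wall-clock request are illustrative assumptions; check the site's documentation for the resources and limits it accepts. Inside the -o path, UGE replaces $JOB_ID with the job's ID.

```bash
#!/bin/bash
#$ -N myjob
#$ -cwd
#$ -o myjob.joblog.$JOB_ID
#$ -j y
#$ -l h_rt=1:00:00
# UGE reads the #$ lines above as directives; bash treats them as comments.
# Assumption: the site accepts the standard h_rt (wall-clock) resource.

echo "Job ${JOB_ID:-unknown} started on ${HOSTNAME} at $(date)"

# Hypothetical program; replace with the command that runs your job.
# ./myprogram input.dat > "myjob.out.${JOB_ID}"

echo "Job ${JOB_ID:-unknown} finished at $(date)"
```

Saved as myjob.cmd (a hypothetical file name), it would be submitted from a login node with qsub myjob.cmd.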

Using job arrays

For job arrays you need a UGE keyword statement of the form:

#$ -t lower-upper:interval

Please see Running an Array of Jobs Using UGE for more information.
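When interval is greater than one, each task can process a block of inputs rather than a single one. The sketch below assumes the UGE variables SGE_TASK_ID and SGE_TASK_STEPSIZE; the total item count n_items and the ./myprogram line are hypothetical. With -t 1-100:10, UGE runs ten tasks whose SGE_TASK_ID values are 1, 11, 21, ..., 91, and each task below handles the ten items starting at its own ID.

```bash
#!/bin/bash
#$ -t 1-100:10
# With -t 1-100:10, UGE runs tasks with SGE_TASK_ID = 1, 11, 21, ..., 91
# and sets SGE_TASK_STEPSIZE to 10. The fallbacks below are for testing only.
first="${SGE_TASK_ID:-1}"
step="${SGE_TASK_STEPSIZE:-10}"
n_items=100   # hypothetical total number of inputs (matches the upper index)

# This task owns items first .. first + step - 1, capped at n_items.
end=$(( first + step - 1 ))
if [ "$end" -gt "$n_items" ]; then
    end="$n_items"
fi

count=0
for (( i = first; i <= end; i++ )); do
    echo "task ${first}: processing item ${i}"
    # ./myprogram "input.${i}.dat"   # hypothetical program
    count=$(( count + 1 ))
done
```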

For a parallel MPI job

For a parallel MPI job you need to have a line that specifies a parallel environment:

#$ -pe dc* number_of_slots_requested

The maximum number_of_slots_requested value that you should use depends on your account’s access level.
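A minimal MPI command file built around this parallel environment might look like the sketch below. The slot count of 16, the executable name ./my_mpi_program, and the availability of an mpirun launcher on PATH are assumptions for illustration; NSLOTS is set by UGE to the number of slots granted.

```bash
#!/bin/bash
#$ -cwd
#$ -o mpijob.joblog.$JOB_ID
#$ -j y
#$ -pe dc* 16
# The shell never expands #$ lines, so dc* needs no quoting here.
# 16 slots is only an illustration; your limit depends on your access level.

# UGE sets NSLOTS to the number of slots granted to the job.
# Assumption: an MPI launcher named mpirun (e.g. from OpenMPI) is on PATH.
mpirun -n "$NSLOTS" ./my_mpi_program   # hypothetical executable name
```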

For a multi-threaded (OpenMP) job

For a multi-threaded OpenMP job you need to request that all processors be on the same node by using the shared parallel environment.

#$ -pe shared number_of_slots_requested

where number_of_slots_requested should be no larger than the number of CPUs/cores on a compute node.
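A corresponding sketch for an OpenMP job follows. It requests 4 slots from the shared parallel environment and sets OMP_NUM_THREADS from NSLOTS so the program starts one thread per granted slot; ./my_openmp_program is a hypothetical executable name.

```bash
#!/bin/bash
#$ -cwd
#$ -o ompjob.joblog.$JOB_ID
#$ -j y
#$ -pe shared 4
# The shared PE places all 4 slots on one node.

# UGE sets NSLOTS to the number of slots granted (4 here);
# start exactly one OpenMP thread per slot.
export OMP_NUM_THREADS="$NSLOTS"
./my_openmp_program   # hypothetical executable name
```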

Parallel Environments (PE)

For Threaded Programs (e.g. OpenMP)

PE          Description
shared p    p processors on a single node

For MPI Programs

PE          Description
dc* p       p processors across multiple nodes. The * is significant: there is no space between “dc” and “*”, but there is a space between “*” and the value of p.
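The asterisk in dc* (and in node*) is also a shell glob character. Inside a command file this does not matter, because the shell treats #$ lines as comments, but on a qsub or qrsh command line an unquoted dc* can be expanded by the shell if files whose names begin with "dc" exist in the current directory. The self-contained demonstration below, which works in a throwaway temporary directory, shows the difference; quoting, as in qsub -pe 'dc*' 8 myjob.cmd (myjob.cmd being a hypothetical file name), keeps the PE name literal.

```bash
#!/bin/bash
# Demonstrate why dc* should be quoted on a qsub command line.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
touch dc_notes.txt            # any file whose name starts with "dc"

unquoted=$(printf '%s' dc*)   # the shell expands the glob
quoted=$(printf '%s' 'dc*')   # quoting keeps the literal PE name

echo "unquoted: $unquoted"    # unquoted: dc_notes.txt
echo "quoted:   $quoted"      # quoted:   dc*

cd / && rm -rf "$demo_dir"
```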

Requesting whole nodes

PE          Description
node* n     n nodes (normally used with -l exclusive)

How to reserve an entire node

One or more whole nodes for parallel jobs

To get one or more entire nodes for parallel jobs, use -pe node* n -l exclusive in your qsub or qrsh command or add them to your UGE command file, where n is the number of nodes you are requesting.

Example of requesting 2 whole nodes with qsub:

qsub -pe node* 2 -l exclusive [other options]

Example of requesting 3 whole nodes in a UGE command file:

#$ -l exclusive
#$ -pe node* 3

Submit a batch job from the UCLA Grid Portal

Note: The UCLA Grid Portal will be taken down in the near future.

The UCLA Grid Portal provides a web portal interface to the Hoffman2 Cluster. Every Hoffman2 Cluster user can access the UCLA Grid Portal. To submit a batch job from the UCLA Grid Portal, click the Job Services tab then click one of: Generic Jobs, Applications or Multi-Jobs.

Generic Jobs
Use this page to submit a job that runs a program or script that you or a colleague have written, usually installed in your home directory. In the fill-in form provided, supply the name of the executable, any job parameters, the time request, and the number of processors.
Applications
Use this page to submit a commonly used application. You normally need to know less about an application than about a generic job, because the UCLA Grid Portal keeps track of the location of the executable and other information about the application. You normally must prepare an input file that the application will read or run. If you are not familiar with an application's requirements, some applications present forms on the UCLA Grid Portal that you can fill in to create the input file.
Multi-Jobs
Use this page to submit multiple jobs that run a program or script that either you or a colleague have written. For details, see Running an Array of Jobs Using UGE.

After you submit a job, click Job Status to monitor its progress and to view and download its output after the job completes.

UCLA OIT

© 2016 UC Regents