Matlab is a multi-purpose numerical computing environment.

# General information

On Hoffman2 there are only a small number of publicly available matlab licenses. To maximize availability, we encourage users, whenever possible, to compile their matlab scripts with the matlab compiler, mcc, and then submit the resulting matlab executables as batch jobs.

Alternatively, if you need a toolbox that is not available on the cluster or if you need a dedicated matlab license for your group, you should purchase a network-type license from MathWorks and we will gladly install it and reserve it for your group.

# Interactive use

To run matlab interactively you have to request an interactive session via qrsh. For example, for serial use of matlab on a modest-size problem with a run time of 4 hours, you would enter:

`qrsh -l h_data=20g,h_rt=4:00:00`

## Invoking matlab

When you get the prompt on the interactive node, start matlab with:

```
module load matlab
matlab
```
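
If you prefer to run a script non-interactively within the same session, a minimal sketch is a small wrapper like the following (the script name `myscript.m` and the wrapper name are hypothetical; the `module load` line assumes the Hoffman2 module environment):

```shell
# Sketch: write a wrapper that launches matlab without a display and runs a script.
# myscript.m is a placeholder for your own matlab script.
cat > run_noninteractive.sh <<'EOF'
#!/bin/bash
module load matlab
matlab -nodisplay -nosplash -r "run('myscript.m'); exit"
EOF
chmod +x run_noninteractive.sh
```

The `exit` at the end of the `-r` string matters: without it matlab stays at its prompt and the session never terminates.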

## Matlab Virtual Memory Size Issue

The Hoffman2 cluster’s job scheduler currently enforces the virtual memory limit on jobs based on the `h_data` value in job submissions. It is important to set `h_data` large enough to run the job. On the other hand, setting `h_data` too large limits the number of nodes available to run the job, or results in a job that cannot start. During the runtime of a job, if the virtual memory limit is exceeded, the job is terminated instantly.

Matlab consumes a large amount of virtual memory when the Java-based graphics interface is used. Depending on the Matlab version and the CPU model, we have measured that launching the Matlab GUI requires 15-20GB of virtual memory (without loading any user data). For example, on an Intel Xeon Gold 6140 CPU the virtual memory size of the MATLAB process is 20GB; on an Intel Xeon E5-2670 v3 CPU it is 16GB.

One way to reduce the virtual memory use of Matlab is to use the text-based user interface, i.e. launch matlab with the command:

`matlab -nojvm -nodisplay -nosplash`

In this case, the virtual memory usage to launch Matlab is about 1.8GB.

## Using the Distributed Computing Toolbox

The Distributed Computing Toolbox allows jobs to take advantage of multiple computational cores, either on one cluster node (that is, a single physical host) or across several cluster nodes. This mode of computation distributes the computational load across several workers. The code needs to either use MATLAB’s own parallel instructions (such as parfor) or make explicit calls to an external mpirun command.
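
As a minimal sketch of this mode of computation (names and worker count are illustrative; assumes the Parallel Computing Toolbox is licensed on your session):

```
p = parpool('local',4);   % start 4 workers on the local node
s = zeros(1,8);
parfor i = 1:8
    s(i) = i^2;           % iterations are distributed across the workers
end
delete(p)                 % release the workers and their licenses
```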

### Using resources local to one computational node

By default, when the Distributed Computing Toolbox encounters parallel instructions it will spawn worker threads on the local host where MATLAB was first invoked. You can control the number of workers by explicitly opening a parpool with a given number of workers:

```
p = parpool('local',no_of_tasks);
<parallel matlab code (i.e., parfor loop)>
delete(p)
```

where *no_of_tasks* is the number of parallel workers (an integer). In this case your qrsh session will need to have been invoked requesting an equivalent number of cores:

`qrsh -l h_data=1g,h_rt=4:00:00 -pe shared no_of_tasks`

### Using distributed resources across more than one computational node

In certain circumstances the parallel workers may need to be started on hosts other than the node where matlab was started. In this case MATLAB can take advantage of the Distributed Computing Server and spawn batch submission of its workers to other nodes via the Hoffman2 scheduler. A non-local parallel cluster profile must be created before this mode of computing can be used.

#### Create a cluster configuration profile

- Open an interactive MATLAB desktop, click Parallel > Manage Cluster Profiles.
- Click Add -> Select Custom -> Select Generic, then click the Edit button in the lower right corner.
- Use the following parameters for the profile:

**Description=**myparallelmatlab - could be anything
**JobStorageLocation=**Current working folder
**NumWorkers=**16
**ClusterMatlabRoot=**/u/local/apps/matlab/current
**IndependentSubmitFcns=**independentSubmitFcn
**CommunicatingSubmitFcn=**communicatingSubmitFcn
**OperatingSystem=**unix
**HasSharedFilesystem=**true
**NumWorkersRange=**[1 16]
**CaptureDiary=**true
**GetJobStateFcn=**getJobStateFcn
**DeleteJobFcn=**deleteJobFcn

#### Use the distributed cluster profile

The new profile can be used with code such as:

```
p = parpool('myparallelmatlab',no_of_tasks);
<parallel matlab code (i.e., parfor loop)>
delete(p)
```

An example of parallel code is given here:

```
% make sure you have the following line before opening parpool
n=100;
% make sure you have the following line before opening parpool
pctconfig('preservejobs',true);
% parpool open statement, assuming myparallelmatlab is the profile you created
% and you want to use 2 workers out of the available 16 workers
p = parpool('myparallelmatlab',2);
% do parallel work here, for example use parfor to generate random numbers
parfor ii = 1:n
    rand(n,n);
end
% finish parallel work and close the parpool
delete(p);
```

## Creating matlab standalone executables with the matlab compiler

To use the MATLAB compiler to create a stand-alone matlab executable, enter:

```
module load matlab
mcc -m [mcc-options] <space separated list of matlab functions to be compiled>
```

Note: if more than one matlab function needs to be included in the compilation, list them with the main function first. If, for example, your matlab code consists of a main function, written in a separate file (for example: main.m), that calls functions written in the separate files f1.m and f2.m, you would use:

`mcc -m [mcc-options] main.m f1.m f2.m `
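
As a concrete (hypothetical) sketch, the following creates a trivial one-function program and shows the corresponding compile step; the mcc lines are commented out because they require the matlab module on the cluster:

```shell
# Create a trivial main function to compile (placeholder example).
cat > main.m <<'EOF'
function main
    disp('hello from compiled matlab')
end
EOF
# Then, on the cluster:
# module load matlab
# mcc -m -R -singleCompThread main.m
```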

To create an executable that will run on a single processor, include this mcc option:

`-R -singleCompThread`

If your matlab code contains calls to the parallel matlab toolbox, it is possible to create an executable that will spawn several parallel workers. Your code will need to explicitly invoke a parpool with the parallel cluster profile specified as an argument. For example, to use the ‘local’ cluster profile (which will start parallel workers on the same node) use:

```
p = parpool('local',no_of_tasks);
<parallel matlab code (i.e., parfor loop)>
delete(p)
```

where *no_of_tasks* is the number of parallel workers (an integer). In this case your qrsh session will need to have been invoked requesting an equivalent number of cores:

`qrsh -l h_data=1g,h_rt=4:00:00 -pe shared no_of_tasks`

## Running matlab standalone executables

After compilation and within the same interactive session you can run your matlab standalone executable with:

```
module load matlab
./<name_of_your_matlab_script_with_no_m_extension>
```

### Running matlab standalone executables for matlab version 9.1 (R2016b)

Two modulefiles are available for matlab version 9.1 (R2016b):

```
matlab/9.1
matlab/9.1_MCR
```

While the `matlab/9.1` modulefile can be used to run matlab or its compiler, to run standalone matlab executables built with matlab version 9.1 you will need to load the `matlab/9.1_MCR` modulefile:

`module load matlab/9.1_MCR`

# Batch use

## How to run serial or multi-threaded MATLAB jobs using the Queue Scripts

The easiest way to run MATLAB in batch (assuming that your matlab code does not use a `parfor` loop; otherwise see below) is to use the queue scripts, which can be used from the login node. See Running a Batch Job for a discussion of the queue scripts and how they are used.

The following queue scripts are available for MATLAB:

- matlab.q
- Runs single or multi-processor in two steps: compile and execute.
- mcc.q
- Use the MATLAB compiler to create a stand-alone executable. If you are using mcc.q interactively, it will ask you if you want the executable produced by mcc to use a single processor or not. If you are using mcc.q in command line mode, to create an executable that will run on a single processor specify this argument:

`-R -singleCompThread`

- matexe.q
- Run a MATLAB stand-alone executable created with mcc.

The matlab.q script will compile the MATLAB files into a stand-alone program, so that executing them does not require a MATLAB license at run time. For serial or implicitly multi-threaded MATLAB code, no extra work is needed to run MATLAB .m files using matlab.q in most cases.

## How to run MATLAB jobs that contain parallel instructions

The script matlab_compile_and_submit.sh generates and submits a batch job that builds and runs a matlab standalone application out of one or more matlab functions. Matlab standalone executables support the use of the Distributed Computing Toolbox. The maximum number of parallel workers supported on Hoffman2 is 16. If any part of your matlab code includes a parfor loop, you will need to include the following lines:

```
% before the parfor loop, for example for 5 workers and the local profile:
p = parpool('local',5);
% after the parfor loop:
delete(p)
```

The script matlab_compile_and_submit.sh can be used as follows:

```
Usage:
./matlab_compile_and_submit.sh [-t time in hours]
[ -s number of processes ] [-m memory per process (in GB)]
[-f main matlab function] [-f matlab function 2] ... [-f matlab function n]
[-ns (to build a submission script without submitting the job)]
[ --help ]
```

## How to run MATLAB using UGE commands

See Running a Batch Job for guidelines on creating the required job scheduler command file. Alternatively, you can create a job scheduler command file with one of the queue scripts listed above; after saving the command file, you can modify it if necessary. See Commonly-Used UGE Commands for a list of the most commonly used job scheduler commands.
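
As a minimal sketch, a command file for running a compiled matlab executable might look like the following (the executable name `mymatlabexe`, the file name, and the resource limits are placeholders to adapt to your job):

```shell
# Write a minimal UGE command file (all names and limits are placeholders).
cat > matlab_job.cmd <<'EOF'
#!/bin/bash
#$ -cwd
#$ -o matlab_job.joblog
#$ -j y
#$ -l h_data=4G,h_rt=2:00:00
module load matlab
./mymatlabexe
EOF
# Submit it with:
# qsub matlab_job.cmd
```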

# Some toolboxes

See vendor documentation for MATLAB toolboxes at http://www.mathworks.com/help/

The following additional MATLAB toolboxes are available to all users on the Hoffman2 Cluster. Some resource groups have purchased additional toolbox licenses; ask your faculty sponsor for details.

- Compiler
- Compiles a MATLAB application into a standalone application or software component.
- Control System Toolbox
- A collection of MATLAB functions for classical and modern control system design, analysis, and modeling.
- Image Processing Toolbox
- A suite of digital image processing and analysis tools.
- Optimization Toolbox
- A collection of functions for: unconstrained/constrained nonlinear minimization, quadratic and linear programming, curve-fitting, solving nonlinear systems of equations and solving constrained linear least squares.
- Parallel Computing Toolbox and Distributed Computing Server
- Allows you to run as many as eight MATLAB workers on a single machine in addition to your MATLAB client session. The MATLAB Distributed Computing Server allows you to run multiple MATLAB workers on a cluster of computers. Several MathWorks products offer built-in support for the parallel computing products without requiring extra coding. For the current list of these products and their parallel functionality, see Built-in Parallel Computing Support in MathWorks Products.
- Signal Processing Toolbox
- Provides a customizable framework for digital signal processing (DSP), including tools for algorithm development, signal and linear system analysis, and time-series data modeling.
- Simulink
- An interactive environment for modeling, analyzing, and simulating a wide variety of dynamic systems. Simulink provides a graphical user interface for constructing block diagram models using “drag-and-drop” operations.
- Simulink Control Design
- Lets you design and analyze control systems modeled in Simulink.
- Simulink Design Optimization
- Lets you estimate and optimize model parameters using numerical optimization. You can also use this software to estimate initial conditions and lookup table values, and test and optimize designs for robustness.
- Statistics Toolbox
- Combines statistical algorithms with interactive graphical interfaces.
- Symbolic Math Toolbox
- Combines the symbolic mathematics and variable-precision arithmetic capabilities of Maple with MATLAB numeric and visualization capabilities. The toolbox offers more than 100 symbolic functions for performing algebraic, calculus, and integral transform operations.

# Running Mapreduce using Matlab Parallel Pool

(This section is under construction.) You can run mapreduce through the Matlab Parallel Pool in the Parallel Computing Toolbox on Hoffman2. The user needs to provide the data set, mapper, and reducer for Matlab. In the following example, we show how to find the maximum value of a single variable in a data set using mapreduce. The data set and the map and reduce functions are all available in the Matlab demo directory (`$MATLABROOT/toolbox/matlab/demos`).

**Load matlab in an interactive session**

```
[jbruin@login2 ~] qrsh -l exclusive
[jbruin@nxxxx ~] module load matlab
[jbruin@nxxxx ~] matlab
```

**Create and preview a datastore in Matlab**

```
>> ds = datastore('airlinesmall.csv','TreatAsMissing','NA', ...
        'SelectedVariableNames','ArrDelay','ReadSize',1000);
>> preview(ds)

ans =

    ArrDelay
    ________
     8
     8
    21
    13
     4
    59
     3
    11
```

Note: the demo data set above is a 12-megabyte data set containing 29 columns of flight information for several airline carriers, including arrival and departure times. This example selects `ArrDelay` (flight arrival delay) as the variable of interest.

**Start a 4-worker parallel pool on a local cluster in Matlab**

```
>> p = parpool('local',4);
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
```

Note: we use a 4-worker local cluster, which works for all Parallel Computing Toolbox installations on compute nodes.

**Create a MapReducer object in Matlab**

>> mr = mapreducer(p);

Note: mapreducer sets the global execution environment for mapreduce using the created MapReducer object, mr.

**Run the mapreduce calculation in the MATLAB client session**

```
>> maxDelay = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer, mr);
Parallel mapreduce execution on the parallel pool:
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map  50% Reduce   0%
Map 100% Reduce   0%
Map 100% Reduce 100%
>> readall(maxDelay)

ans =

           Key            Value
    _________________    ______
    'MaxArrivalDelay'    [1014]
```

Note: the mapper finds the maximum arrival delay in each chunk of data, then stores these maximum values as the intermediate values associated with the key `'PartialMaxArrivalDelay'`. The reducer receives a list of the maximum arrival delays for each chunk and finds the overall maximum arrival delay from the list of values. `mapreduce` calls this reducer only once, since the mapper adds only a single unique key. The reducer uses `add` to add a final key-value pair to the output.
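
For reference, the demo mapper and reducer have roughly the following shape (paraphrased from the MathWorks example shipped in the demos directory; treat the details as approximate):

```
function maxArrivalDelayMapper(data, info, intermKVStore)
    % emit the maximum delay seen in this chunk, all under one shared key
    partMax = max(data.ArrDelay);
    add(intermKVStore, 'PartialMaxArrivalDelay', partMax);
end

function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
    % fold the per-chunk maxima into one overall maximum
    maxVal = -inf;
    while hasnext(intermValIter)
        maxVal = max(getnext(intermValIter), maxVal);
    end
    add(outKVStore, 'MaxArrivalDelay', maxVal);
end
```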

Further detailed information about running mapreduce on a Parallel Pool is available in the MathWorks documentation.