MATLAB is a multi-purpose numerical computing environment.

# General information

On Hoffman2 there are only a small number of publicly available MATLAB licenses. To maximize availability, we encourage users, whenever possible, to compile their MATLAB scripts with the MATLAB compiler, mcc, and then submit the resulting executables as batch jobs.

Alternatively, if you need a toolbox that is not available on the cluster, or a dedicated MATLAB license for your group, you should purchase a network-type license from MathWorks; we will gladly install it and reserve it for your group.

# Interactive use

To run MATLAB interactively, first request an interactive session with qrsh. When you get a prompt on the interactive node, load the module and start MATLAB:

`module load matlab`

`matlab`

or, to use the MATLAB compiler, enter:

`mcc [mcc-options]`

To create an executable that will run on a single processor, include this mcc option:

`-R -singleCompThread`
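As a rough illustration of the compiler options above, the following shell sketch assembles an mcc command line for a single-threaded standalone executable. The script name `myscript.m` is a hypothetical placeholder, and the sketch assumes `-m` (build a standalone application) is the mode you want. It only calls mcc if the command is on the PATH and the script exists; otherwise it prints the command it would run.

```shell
#!/bin/sh
# Sketch: build a single-threaded standalone executable with mcc.
# "myscript.m" is a hypothetical file name; replace it with your own script.
SCRIPT="myscript.m"

# -m builds a standalone application; -R -singleCompThread makes the
# resulting executable run on a single computational thread.
MCC_CMD="mcc -m -R -singleCompThread $SCRIPT"

if command -v mcc >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    # mcc is available (for example after "module load matlab")
    # and the script exists, so run the compiler.
    $MCC_CMD
else
    # Otherwise only show the command that would be run.
    echo "would run: $MCC_CMD"
fi
```

Run this inside an interactive session after loading the matlab module so that mcc is available.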

## Running MATLAB standalone executables for MATLAB version 9.1 (R2016b)

Two modulefiles are available for MATLAB version 9.1 (R2016b):

`matlab/9.1`

`matlab/9.1_MCR`

The matlab/9.1 modulefile can be used to run MATLAB or its compiler. To run standalone MATLAB executables built with MATLAB version 9.1, you need to load matlab/9.1_MCR:

`module load matlab/9.1_MCR`

# Batch use

## How to Run MATLAB using the Queue Scripts

The easiest way to run MATLAB in batch from the login node is to use the queue scripts. See Running a Batch Job for a discussion of the queue scripts and how they are used.

The following queue scripts are available for MATLAB:

- matlab.q
  - Runs single- or multi-processor MATLAB jobs in two steps: compile and execute.
- mcc.q
  - Uses the MATLAB compiler to create a stand-alone executable. If you are using mcc.q interactively, it will ask whether the executable produced by mcc should use a single processor. If you are using mcc.q in command line mode, specify this argument to create an executable that will run on a single processor:

`-R -singleCompThread`

- matexe.q
  - Runs a MATLAB stand-alone executable created with mcc.

## How to Run MATLAB Using UGE Commands

See Running a Batch Job for guidelines on creating the required job scheduler command file. Alternatively, you can create a job scheduler command file with one of the queue scripts listed above. After saving the command file, you can modify it if necessary. See Commonly-Used UGE Commands for a list of the most commonly used job scheduler commands.
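For orientation, here is a minimal sketch of what a hand-written command file for a compiled MATLAB executable might look like. It is an illustration only, not the output of the queue scripts: the file name `run_myprog.sh`, the executable `./myprog`, the log file name, and the one-hour run-time request are all hypothetical placeholders. The `#$` lines use standard Grid Engine options (`-cwd`, `-o`, `-j`, `-l h_rt`). The sketch writes the file but does not submit it.

```shell
#!/bin/sh
# Sketch: write a UGE command file for a compiled MATLAB executable.
# File name, executable name, and resource values are placeholders.
cat > run_myprog.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -o myprog.joblog
#$ -j y
#$ -l h_rt=1:00:00
# Load the runtime needed by executables built with MATLAB 9.1 (R2016b).
module load matlab/9.1_MCR
# Run the standalone executable created with mcc (hypothetical name).
./myprog
EOF

echo "wrote run_myprog.sh; submit it with: qsub run_myprog.sh"
```

Depending on how your batch shell is set up, the module command may need to be initialized in the command file before `module load` works. Adjust the resource requests to what your job actually needs.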

# How to Run Parallel MATLAB in Batch

The matlab.q script compiles the MATLAB files into a stand-alone program, so that executing them does not require a MATLAB license at run time. For serial or implicitly multi-threaded MATLAB code, in most cases no extra work is needed to run MATLAB .m files with matlab.q.

To run parallel MATLAB with the Distributed Computing Toolbox, you have to start MATLAB interactively and configure a cluster profile so that your parallel MATLAB work can be executed via batch submission on compute nodes of the Hoffman2 cluster. The following steps walk through the process of running parallel MATLAB code with the Distributed Computing Toolbox on Hoffman2.

## Step 1: Create a cluster configuration profile

- Open an interactive MATLAB desktop and click Parallel > Manage Cluster Profiles.
- Click Add > Custom > Generic, then click the Edit button in the lower right corner.
- Use the following parameters for the profile:

Description=`parallelmatlab` (can be anything)

JobStorageLocation=`Current working folder;`

NumWorkers=`16;`

ClusterMatlabRoot=`/u/local/apps/matlab/current;`

IndependentSubmitFcn=`independentSubmitFcn;`

CommunicatingSubmitFcn=`communicatingSubmitFcn;`

OperatingSystem=`unix;`

HasSharedFilesystem=`true;`

NumWorkersRange=`1 16;`

CaptureDiary=`true;`

GetJobStateFcn=`getJobStateFcn;`

DeleteJobFcn=`deleteJobFcn;`

Note that if you try to validate the profile using MATLAB’s profile validation tool, you will run into problems because of an issue with the current version of MATLAB on Hoffman2. If you copied the parameters correctly, make the newly created parallel profile the default and use it to run the following example code. This will validate your profile.

## Step 2: Follow the Example

- The following example illustrates a simple use of parallel processing in MATLAB:

**% make sure you have the following line before opening the parallel pool**

`n=100;`

**% make sure you have the following line before opening the parallel pool**

`pctconfig('preservejobs',true);`

**% open the parallel pool, assuming GenericProfile1 is the default profile you created as per Step 1**

**% and you want to use 2 of the 16 available workers**

`parpool('GenericProfile1',2);`

**% do parallel work here, for example use parfor to generate random numbers**

`parfor ii = 1:n`

`rand(n,n);`

`end`

**% finish parallel work and close the parallel pool**

`delete(gcp('nocreate'));`

# Some toolboxes

See vendor documentation for MATLAB toolboxes at http://www.mathworks.com/help/

The following additional MATLAB toolboxes are available to all users on the Hoffman2 Cluster. Some resource groups have purchased additional toolbox licenses; ask your faculty sponsor for details.

- Compiler
  - Compiles a MATLAB application into a standalone application or software component.
- Control System Toolbox
  - A collection of MATLAB functions for classical and modern control system design, analysis, and modeling.
- Image Processing Toolbox
  - A suite of digital image processing and analysis tools.
- Optimization Toolbox
  - A collection of functions for: unconstrained/constrained nonlinear minimization, quadratic and linear programming, curve-fitting, solving nonlinear systems of equations and solving constrained linear least squares.
- Parallel Computing Toolbox and Distributed Computing Server
  - Allows you to run as many as eight MATLAB workers on a single machine in addition to your MATLAB client session. The MATLAB Distributed Computing Server allows you to run multiple MATLAB workers on a cluster of computers. Several MathWorks products offer built-in support for the parallel computing products without requiring extra coding. For the current list of these products and their parallel functionality, see Built-in Parallel Computing Support in MathWorks Products.
- Signal Processing Toolbox
  - Provides a customizable framework for digital signal processing (DSP), including tools for algorithm development, signal and linear system analysis, and time-series data modeling.
- Simulink
  - An interactive environment for modeling, analyzing, and simulating a wide variety of dynamic systems. Simulink provides a graphical user interface for constructing block diagram models using “drag-and-drop” operations.
- Simulink Control Design
  - Lets you design and analyze control systems modeled in Simulink.
- Simulink Design Optimization
  - Lets you estimate and optimize model parameters using numerical optimization. You can also use this software to estimate initial conditions and lookup table values, and test and optimize designs for robustness.
- Statistics Toolbox
  - Combines statistical algorithms with interactive graphical interfaces.
- Symbolic Math Toolbox
  - Combines the symbolic mathematics and variable-precision arithmetic capabilities of Maple with MATLAB numeric and visualization capabilities. The toolbox offers more than 100 symbolic functions for performing algebraic, calculus, and integral transform operations.

# Running Mapreduce using MATLAB Parallel Pool

(This section is under construction.) You can run mapreduce through a MATLAB parallel pool in Parallel Computing Toolbox on Hoffman2. You need to provide the data set, the mapper, and the reducer. The following example shows how to find the maximum value of a single variable in a data set using mapreduce. The data set and the map and reduce functions are all available in the MATLAB demo directory (`$MATLABROOT/toolbox/matlab/demos`).

**Load MATLAB in an interactive session**

`[jbruin@login2 ~] qrsh -l exclusive`

`[jbruin@nxxxx ~] module load matlab`

`[jbruin@nxxxx ~] matlab`

**Create and preview a datastore in MATLAB**

`>> ds = datastore('airlinesmall.csv','TreatAsMissing','NA', 'SelectedVariableNames','ArrDelay','ReadSize',1000);`

`>> preview(ds)`

The preview returns a table with a single column, `ArrDelay`, whose first eight values are 8, 8, 21, 13, 4, 59, 3, 11.

Note: the demo data set above is a 12-megabyte data set that contains 29 columns of flight information for several airline carriers, including arrival and departure times. This example selects `ArrDelay` (flight arrival delay) as the variable of interest.

**Start a 4-worker parallel pool on a local cluster in MATLAB**

`>> p = parpool('local',4);`

`Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.`

Note: we use a 4-worker local cluster, which works for all Parallel Computing Toolbox installations on compute nodes.

**Create a MapReducer object in MATLAB**

`>> mr = mapreducer(p);`

Note: mapreducer sets the global execution environment for mapreduce using the created MapReducer object, mr.

**Run the mapreduce calculation in the MATLAB client session**

`>> maxDelay = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer, mr);`

`Parallel mapreduce execution on the parallel pool:`

`MAPREDUCE PROGRESS`

`Map 0% Reduce 0%`

`Map 50% Reduce 0%`

`Map 100% Reduce 0%`

`Map 100% Reduce 100%`

`>> readall(maxDelay)`

The result is a table with a single row: Key `'MaxArrivalDelay'`, Value `[1014]`.

Note: the mapper finds the maximum arrival delay in each chunk of data. The mapper then stores these maximum values as the intermediate values associated with the key `'PartialMaxArrivalDelay'`. The reducer receives a list of the maximum arrival delays for each chunk and finds the overall maximum arrival delay from the list of values. `mapreduce` calls this reducer only once, since the mapper adds only a single unique key. The reducer uses `add` to add a final key-value pair to the output.

Further details about running mapreduce on a parallel pool can be found in the linked documentation.