*********
Computing
*********

.. sidebar:: What is a compute cluster? What is a node?

   A :term:`compute cluster` is made of individual :term:`nodes`, interconnected with one another, whose aggregate power can be harnessed to address problems whose nature requires either distributed or capacity computing. A :term:`node`, or :term:`host`, is a physical server comprising multiple CPU cores (referred to as :term:`slots` in the :term:`scheduler` lingo), rack-mounted in one of our two data centers and interconnected with the other nodes (as well as with the storage) to form the cluster.

Upon connecting/logging into the cluster (unless :ref:`Connecting via Jupyter Notebook/Lab`), users access the cluster via its :term:`login nodes`. Login nodes are special hosts whose sole purpose is to provide a gateway to the :term:`compute nodes` and their computational resources. For more information, see :ref:`Role of the login nodes`.

Computational resources (such as memory, cores, runtime, CPU type, GPU type, etc.) on the :term:`compute nodes` are managed by a :term:`job scheduler`. Any CPU-, GPU- or memory-intensive computing task should be performed within either :ref:`interactive sessions <Requesting interactive sessions>` or :ref:`batch jobs <Submitting batch jobs>` scheduled on the cluster's :term:`compute nodes`.

This page describes:

* :ref:`resources available on the Hoffman2 Cluster <Computational resources on the Hoffman2 Cluster>`
* :ref:`how to request interactive sessions <Requesting interactive sessions>`
* :ref:`how to submit a non-interactive job for batch execution <Submitting batch jobs>`
* :ref:`how to monitor resource utilization <Monitoring resource utilization>`

Computational resources on the Hoffman2 Cluster
===============================================

* :ref:`Node types`
* :ref:`Group-owned nodes`
* :ref:`*Highp* vs *shared* vs *campus* jobs`
* :ref:`Jobs and resources`
* :ref:`Requesting resources (other than cores)`
* :ref:`Requesting multiple cores`

Node types
----------

A summary of the types of :term:`nodes` that you will encounter while using the Hoffman2 Cluster, and a description of their intended use, is given in the :ref:`node-types` table:

.. _node-types:

.. list-table:: Types of nodes on the Hoffman2 Cluster
   :widths: 10 50
   :class: tight-table

   * - :term:`login nodes`
     - Upon connecting to the Hoffman2 Cluster via a terminal and SSH at the fully qualified domain name ``hoffman2.idre.ucla.edu``, or via remote desktop at either ``nx.hoffman2.idre.ucla.edu`` or ``x2go.hoffman2.idre.ucla.edu``, you access a login node. Login nodes are meant for light-weight tasks such as working on your code and submitting jobs to the scheduler. Login nodes are a resource shared by many concurrent users and are not intended for heavy-weight tasks. Please see :ref:`Role of the login nodes`.
   * - CPU-based :term:`compute nodes`
     - Most of the nodes on the Hoffman2 Cluster are CPU-based compute nodes. These are where your jobs execute; they can be accessed interactively via the ``qrsh`` command (see :ref:`Requesting interactive sessions`) or used for batch execution (see :ref:`Submitting batch jobs`). A minimal interactive-session example is sketched at the end of this section.
   * - GPU-based :term:`compute nodes`
     - A portion of the compute nodes on the Hoffman2 Cluster is equipped with one or more :ref:`GPU-cards` of various types. Please refer to :ref:`Role of GPU nodes` to see what workload is best suited to run on these nodes, and to :ref:`GPU access` to learn how to request an :ref:`interactive session <Requesting interactive sessions>` or a :ref:`batch job <Submitting batch jobs>` on a GPU node.

The Hoffman2 Cluster has a number of :term:`compute nodes` available to the entire UCLA community. Additionally, research groups can :ref:`purchase dedicated compute nodes <Purchasing additional resources>`.
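As noted in the table above, interactive work belongs on the compute nodes rather than on the login nodes. The following is a minimal sketch of how such a session is typically requested from a login node; the resource values shown (memory via ``h_data``, runtime via ``h_rt``) are illustrative, and the full set of options and defaults is described under :ref:`Requesting interactive sessions`.

.. code-block:: bash

   # Request a single-core interactive session on a compute node,
   # asking for 2 GB of memory (h_data) and 2 hours of runtime (h_rt).
   qrsh -l h_data=2G,h_rt=2:00:00

   # When the session starts, the shell prompt is on a compute node, where
   # CPU/memory intensive work is appropriate; `exit` returns to the login node.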
Group-owned nodes
-----------------

Group-owned nodes allow users in the owning group to run :term:`jobs` (interactive or batch) on their computational resources for an extended runtime (up to fourteen days). Moreover, the portion of the :term:`jobs` submitted to owned resources that can be concurrently allocated on them is guaranteed to start within twenty-four hours of submission (the wait time is typically shorter). Node ownership also allows users in that group to access any currently unused resources owned by a different group for a runtime of up to 24 hours. If your group is interested in purchasing nodes, please visit :ref:`Purchasing additional resources`.

*Highp* vs *shared* vs *campus* jobs
------------------------------------

In Hoffman2 Cluster jargon, :term:`jobs` submitted to owned resources are referred to as :term:`highp jobs`, while jobs submitted to other groups' currently unused resources are referred to as :term:`shared jobs`. Jobs submitted by users in groups that have not purchased nodes are limited to run on IDRE-owned resources for up to 24 hours; jobs from these users are referred to as :term:`campus jobs` and the users as :term:`campus users`. See also: :ref:`Job scheduling policy`.

Jobs and resources
------------------

.. include:: Requestable-resources.inc

Requesting interactive sessions
===============================

* :ref:`Basic usage`
* :ref:`Customizing the qrsh command`

  - :ref:`qrsh command to run serial jobs`
  - :ref:`qrsh command to run shared memory jobs`
  - :ref:`qrsh command to run distributed memory jobs`
  - :ref:`qrsh command to run hybrid distributed/shared memory jobs`
  - :ref:`qrsh command to run on exclusively reserved nodes`
  - :ref:`qrsh command to run on your group’s nodes`
  - :ref:`qrsh examples`

* :ref:`qrsh startup time`
* :ref:`Resource limitation`
* :ref:`Interpreting error messages`
* :ref:`Running MPI with qrsh`
* :ref:`Additional tools`

.. include:: Interactive-sessions.inc

Submitting batch jobs
=====================

.. include:: Batch-jobs.inc

GPU access
==========

.. include:: GPU-access.inc

Monitoring resource utilization
===============================

.. include:: monitoring-resource-utilization.inc
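For quick reference, the sketch below shows how the concepts discussed on this page typically come together: a batch script is submitted from a login node with a memory and runtime request, optionally targeting group-owned nodes, and its status is then checked with the scheduler's client tools. The script name ``my_job.sh`` is a placeholder, the resource values are illustrative, and the ``highp`` complex is assumed here to be the way group-owned nodes are targeted; see :ref:`Submitting batch jobs` and :ref:`Monitoring resource utilization` above for the authoritative details.

.. code-block:: bash

   # Submit a batch script requesting 4 GB of memory for 8 hours.
   # Without further options this runs as a shared/campus job, subject to
   # the 24-hour runtime limit discussed above.
   qsub -l h_data=4G,h_rt=8:00:00 my_job.sh

   # Groups that own nodes can add the highp complex to run on their own
   # hardware (runtimes of up to 14 days):
   qsub -l h_data=4G,h_rt=72:00:00,highp my_job.sh

   # Check the state of your pending and running jobs:
   qstat -u $USER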