*********
Computing
*********

.. sidebar:: What is a compute cluster? What is a node?

   A :term:`compute cluster` is made of individual :term:`nodes`, interconnected with one another, whose aggregate power can be harnessed to address problems whose nature requires either distributed or capacity computing. A :term:`node`, or :term:`host`, is a physical server comprising multiple CPU cores (referred to as :term:`slots` in the :term:`scheduler` lingo), rack-mounted in one of our two data centers and interconnected with the other nodes (as well as with the storage) to form the cluster.

Upon connecting/logging into the cluster (unless :ref:`Connecting via Jupyter Notebook/Lab`), users access the cluster via its :term:`login nodes`. Login nodes are special hosts whose sole purpose is to provide a gateway to the :term:`compute nodes` and their computational resources. For more information, see :ref:`Role of the login nodes`.

Computational resources (such as memory, cores, runtime, CPU type, GPU type, etc.) on the :term:`compute nodes` are managed by a :term:`job scheduler`. Any CPU-, GPU- or memory-intensive computing task should be performed within either :ref:`interactive sessions <Requesting interactive sessions>` or :ref:`batch jobs <Submitting batch jobs>` scheduled on the cluster's :term:`compute nodes`.

This page describes:

* :ref:`resources available on the Hoffman2 Cluster <Computational resources on the Hoffman2 Cluster>`
* :ref:`how to request interactive sessions <Requesting interactive sessions>`
* :ref:`how to submit a non-interactive job for batch execution <Submitting batch jobs>`
* :ref:`how to monitor resource utilization <Monitoring resource utilization>`

Computational resources on the Hoffman2 Cluster
===============================================

* :ref:`Node types`
* :ref:`Group-owned nodes`
* :ref:`*Highp* vs *shared* vs *campus* jobs`
* :ref:`Jobs and resources`
* :ref:`Requesting resources (other than cores)`
* :ref:`Requesting multiple cores`

Node types
----------

A summary of the types of :term:`nodes` that you will encounter while using the Hoffman2 Cluster, and a description of their intended use, is given in the :ref:`node-types` table:

.. _node-types:

.. list-table:: Types of nodes on the Hoffman2 Cluster
   :widths: 10 50
   :class: tight-table

   * - :term:`login nodes`
     - Upon connecting to the Hoffman2 Cluster via a terminal and SSH at the fully qualified domain name ``hoffman2.idre.ucla.edu``, or via remote desktop at either ``nx.hoffman2.idre.ucla.edu`` or ``x2go.hoffman2.idre.ucla.edu``, you access a login node. Login nodes are meant for light-weight tasks such as working on your code and submitting jobs to the scheduler. Login nodes are a resource shared by many concurrent users and are not intended for heavy-weight tasks. Please see :ref:`Role of the login nodes`.
   * - CPU-based :term:`compute nodes`
     - Most of the nodes on the Hoffman2 Cluster are CPU-based compute nodes. These are where your jobs execute; they can be accessed interactively via the ``qrsh`` command (see :ref:`Requesting interactive sessions`) or used for batch execution (see :ref:`Submitting batch jobs`). A minimal interactive-session example is sketched at the end of this section.
   * - GPU-based :term:`compute nodes`
     - A portion of the compute nodes on the Hoffman2 Cluster is equipped with one or more :ref:`GPU-cards` of various types. Please refer to :ref:`Role of GPU nodes` to see what workload is best suited to run on these nodes, and to :ref:`GPU access` to learn how to request an :ref:`interactive session <Requesting interactive sessions>` or a :ref:`batch job <Submitting batch jobs>` on a GPU node.

The Hoffman2 Cluster has a number of :term:`compute nodes` available to the entire UCLA community. Additionally, research groups can :ref:`purchase dedicated compute nodes <Purchasing additional resources>`.
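As noted in the table above, interactive work belongs on the compute nodes rather than on the login nodes. The following is a minimal sketch of how such a session is typically requested from a login node; the resource values shown (memory via ``h_data``, runtime via ``h_rt``) are illustrative, and the full set of options and defaults is described under :ref:`Requesting interactive sessions`.

.. code-block:: bash

   # Request a single-core interactive session on a compute node,
   # asking for 2 GB of memory (h_data) and 2 hours of runtime (h_rt).
   qrsh -l h_data=2G,h_rt=2:00:00

   # When the session starts, the shell prompt is on a compute node, where
   # CPU/memory intensive work is appropriate; `exit` returns to the login node.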
Group-owned nodes
-----------------

Group-owned nodes allow users in the owning group to run :term:`jobs` (interactive or batch) on their computational resources for an extended runtime (up to fourteen days). Moreover, the portion of the :term:`jobs` submitted to owned resources that can be concurrently allocated on them is guaranteed to start within twenty-four hours of submission (the wait time is typically shorter). Node ownership also allows users in that group to access any currently unused resources owned by a different group for a runtime of up to 24 hours. If your group is interested in purchasing nodes, please visit :ref:`Purchasing additional resources`.

*Highp* vs *shared* vs *campus* jobs
------------------------------------

In Hoffman2 Cluster jargon, :term:`jobs` submitted to owned resources are referred to as :term:`highp jobs`, while jobs submitted to other groups' currently unused resources are referred to as :term:`shared jobs`. Jobs submitted by users in groups that have not purchased nodes are limited to run on IDRE-owned resources for up to 24 hours; jobs from these users are referred to as :term:`campus jobs` and the users as :term:`campus users`. See also: :ref:`Job scheduling policy`.

Jobs and resources
------------------

.. include:: Requestable-resources.inc

Requesting interactive sessions
===============================

* :ref:`Basic usage`
* :ref:`Customizing the qrsh command`

  - :ref:`qrsh command to run serial jobs`
  - :ref:`qrsh command to run shared memory jobs`
  - :ref:`qrsh command to run distributed memory jobs`
  - :ref:`qrsh command to run hybrid distributed/shared memory jobs`
  - :ref:`qrsh command to run on exclusively reserved nodes`
  - :ref:`qrsh command to run on your group’s nodes`
  - :ref:`qrsh examples`

* :ref:`qrsh startup time`
* :ref:`Resource limitation`
* :ref:`Interpreting error messages`
* :ref:`Running MPI with qrsh`
* :ref:`Additional tools`

.. include:: Interactive-sessions.inc

Submitting batch jobs
=====================

.. include:: Batch-jobs.inc

GPU access
==========

.. include:: GPU-access.inc

Monitoring resource utilization
===============================

.. include:: monitoring-resource-utilization.inc
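For quick reference, the sketch below shows how the concepts discussed on this page typically come together: a batch script is submitted from a login node with a memory and runtime request, optionally targeting group-owned nodes, and its status is then checked with the scheduler's client tools. The script name ``my_job.sh`` is a placeholder, the resource values are illustrative, and the ``highp`` complex is assumed here to be the way group-owned nodes are targeted; see :ref:`Submitting batch jobs` and :ref:`Monitoring resource utilization` above for the authoritative details.

.. code-block:: bash

   # Submit a batch script requesting 4 GB of memory for 8 hours.
   # Without further options this runs as a shared/campus job, subject to
   # the 24-hour runtime limit discussed above.
   qsub -l h_data=4G,h_rt=8:00:00 my_job.sh

   # Groups that own nodes can add the highp complex to run on their own
   # hardware (runtimes of up to 14 days):
   qsub -l h_data=4G,h_rt=72:00:00,highp my_job.sh

   # Check the state of your pending and running jobs:
   qstat -u $USER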