Node utilization policy
The Hoffman2 Cluster, a High-Performance Computing (HPC) Linux platform, is composed of a large group of interconnected computers (referred to as nodes). These fall into three main classes: access nodes, compute nodes, and utility nodes. Access nodes include login nodes, data transfer servers, and remote desktop servers. Compute nodes, intended to run batch jobs and interactive sessions, are categorized into two major groups: group-owned nodes, contributed by PIs and available to users in their groups to run highp or shared jobs; and campus nodes, provided by OARC for general use. Compute nodes shall be accessed only via interactive sessions, requested with the appropriate commands, or via batch jobs submitted to the job scheduler. Utility nodes provide services necessary to the cluster (e.g., the license server) and do not provide login services to users.
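For example, an interactive session on a compute node can be requested through the scheduler, and longer computations can be submitted as batch jobs. The commands below are a minimal sketch assuming the Grid Engine-style qrsh/qsub interface used on the cluster; the resource values (run time, memory) and the script name are illustrative placeholders, not recommended settings:

    # Request an interactive session on a compute node
    # (2 hours of run time and 4 GB of memory per core are example values)
    qrsh -l h_rt=2:00:00,h_data=4G

    # Submit a batch job script to the scheduler instead of running the work directly
    qsub my_job.sh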
Various scheduling policies are implemented to maximize cluster utilization, ensure that PI groups have access to the nodes they have purchased, and enforce fair-share use of available cluster resources.
All user jobs, whether interactive or batch, shall be run on compute nodes via the cluster’s job scheduler. No resource-intensive jobs or programs shall be run on access nodes (see also: Role of login nodes). The Hoffman2 Cluster team is available to assist users in resolving workflow issues related to resource consumption and node usage (as well as job submission, software installation and optimization, etc.).
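As an illustration, a compute-heavy command that should not be run on a login node can instead be wrapped in a small batch script and handed to the scheduler. This is a minimal sketch using Grid Engine-style directives; the script name, resource requests, and the analysis command are hypothetical placeholders:

    #!/bin/bash
    #$ -cwd                       # run the job from the current working directory
    #$ -o joblog.$JOB_ID          # write scheduler output to a log file
    #$ -j y                       # merge standard error into the output log
    #$ -l h_rt=4:00:00,h_data=8G  # example run-time and per-core memory requests

    # Placeholder for the actual resource-intensive work:
    ./my_analysis --input data.txt

The script is then submitted with qsub (e.g., qsub my_job.sh) rather than executing the analysis command directly on a login node.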
The Hoffman2 Cluster team reserves the right to preemptively terminate jobs that inappropriately utilize scarce cluster resources (or to limit jobs that can affect scheduler stability). A non-exhaustive list includes:
CPU- or memory-intensive processes running on any login or remote desktop nodes
Jobs running on compute nodes that were not submitted through, or are no longer controlled by, the job scheduler
Requesting, but not utilizing, resources in an attempt to circumvent normal scheduling rules
Running non-GPU jobs on GPU resources
Running array jobs with a large number of tasks whose runtimes are one minute or less, submitted within a short period of time (see How do I pack multiple short tasks of a job array?, and the sketch after this list)
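Very short array tasks place a disproportionate load on the scheduler. One common mitigation, sketched below under the assumption of a Grid Engine-style array job, is to pack several short tasks into each array task by using a step size, so that each scheduled task performs a batch of work. The task range, step size, resource requests, and the short_task command are illustrative placeholders:

    #!/bin/bash
    #$ -cwd
    #$ -l h_rt=1:00:00,h_data=2G   # example resource requests
    #$ -t 1-1000:50                # 1000 units of work, 50 handled per array task

    # Each array task processes IDs SGE_TASK_ID .. SGE_TASK_ID + SGE_TASK_STEPSIZE - 1,
    # clamped to SGE_TASK_LAST so the final task does not run past the end of the range.
    last=$(( SGE_TASK_ID + SGE_TASK_STEPSIZE - 1 ))
    if [ "$last" -gt "$SGE_TASK_LAST" ]; then last=$SGE_TASK_LAST; fi

    for i in $(seq "$SGE_TASK_ID" "$last"); do
        ./short_task "$i"   # placeholder for the actual short-running task
    done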
Users found to be intentionally abusing the scheduling system or cluster resources may have their job-submission privileges temporarily reduced or suspended. Egregious or repeated violations may result in suspension or revocation of the user’s account.