Frequently asked questions¶
Frequently asked questions and their answers are organized by the following categories:
Getting help¶
Note
If you do not find answers to your question or issue, please open a support ticket via our online help desk.
For faster resolution, please provide helpful details, e.g. your username, the relevant files and directories (and whether you grant technicians access to any of them), any job scripts or job IDs affected, and the steps needed to reproduce the issue.
Accounts¶
Accounts are created and managed via the Hoffman2 Cluster System Account Management (SIM) at: https://sim.idre.ucla.edu.
How do I create an account?¶
Visit our Requesting an account section to make sure you qualify. You can then request an account via our SIM at: https://sim.idre.ucla.edu.
What is the status of my user account application?¶
Most likely your application is pending sponsor approval. You may want to consider asking your sponsor to approve your application by logging into the SIM new account page at: https://sim.idre.ucla.edu/sim/account/new.
I no longer need my user account. What should I do?¶
If you no longer need your user account, you can send an email to accounts@idre.ucla.edu requesting its deletion.
Questions or comments? Visit our support online help desk at: https://support.idre.ucla.edu.
My SSH client says: Permission denied, please try again¶
In the example below, user joebruin is having a problem connecting via ssh to the Hoffman2 Cluster:
$ ssh hoffman2.idre.ucla.edu -l joebruin
joebruin@hoffman2.idre.ucla.edu's password:
Permission denied, please try again.
A permission denied error could be due to several reasons:
Verify that you are using your cluster username (Hoffman2 Cluster usernames are limited to 8 characters) and password. You can check your username and change your password by logging into the My Account page of SIM at: https://sim.idre.ucla.edu/sim/account/view. To change your password, follow the link Change the password for <USERNAME> on the H2 cluster on the My Account page of SIM.
The system may not be accepting logins due to a scheduled maintenance (check your email for maintenance notifications or visit https://www.hoffman2.idre.ucla.edu/).
If you continue to have problems, please submit a support ticket on our online help desk at: https://support.idre.ucla.edu/helpdesk/.
Questions or comments? Visit our support online help desk at: https://support.idre.ucla.edu.
I need to change my Hoffman2 sponsor, how do I do that?¶
Please open a support ticket on our online help desk at: https://support.idre.ucla.edu
Getting access to project folders in a different research group¶
In order to get access to another research group’s purchased project storage volume, your cluster account will need to be a member of their Unix group. Please open a support ticket via our online help desk at: https://support.idre.ucla.edu and include your cluster username and the full path to the project folder to request access.
Acknowledging the Hoffman2 Cluster¶
Applications, compilers and libraries¶
How to load certain applications in your path / How to set up your environment¶
In Unix-like systems, the process that interprets user commands, called the shell, maintains a list of variables, called environment variables, and their values. For example, in order for the shell to find an executable, users should add its location to their $PATH variable.
Users can permanently add values to their shell environment variables by editing their shell initialization files (such as .bash_profile, .profile, etc.) located in their $HOME directories.
Alternatively, Hoffman2 users can dynamically change their shell environment using the environmental modules utility.
How to use environmental modules interactively¶
Users can load a certain application or compiler into their environment (e.g.: $PATH, $LD_LIBRARY_PATH, etc.) by issuing the command:
$ module load application/compiler
where application/compiler is the name of the modulefile for the application or compiler you need (for example: matlab, intel, etc.).
To see a list of available modulefiles for applications and compilers, issue the command:
$ module avail
To learn about the application/compiler loaded by a certain module, issue:
$ module whatis application/compiler
or:
$ module help application/compiler
To see how a module for a certain application/compiler will modify your environment, issue:
$ module show application/compiler
To check which modules are currently loaded, issue:
$ module list
To unload a previously loaded application/compiler from your environment, issue:
$ module unload application/compiler
For a full list of module commands, issue:
$ module help
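For example, a typical interactive session might look like the following sketch (the matlab modulefile name is used only for illustration; run module avail to see the exact names and versions installed on the cluster):
$ module avail matlab      # list matlab-related modulefiles
$ module load matlab       # add matlab to $PATH, $LD_LIBRARY_PATH, etc.
$ module list              # confirm which modules are now loaded
$ which matlab             # verify that the executable is found in your path
$ module unload matlab     # remove it from your environment when done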
Users are encouraged to write their own modulefiles to load their own applications; you can learn how to do so here.
After loading the Intel compiler module, why is mpicc/mpicxx/mpif90 still not using Intel compiler?¶
Intel MPI compiler wrappers have unconventional names. After loading the Intel compiler module, use:
mpiicc for C programs
mpiicpc for C++ programs
mpiifort for Fortran programs
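As a sketch, a compile session with the Intel wrappers might look like the following (the intel modulefile name is only an example; check module avail for the exact module that provides the Intel compilers and MPI on the cluster):
$ module load intel                  # load the Intel compiler/MPI environment (name may differ)
$ mpiicc -o hello_c hello.c          # C
$ mpiicpc -o hello_cpp hello.cpp     # C++
$ mpiifort -o hello_f hello.f90      # Fortran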
Questions or comments? Click here.
Connecting, Authentication, SSH public-keys¶
Connecting for the first time¶
As all connections are based on a secure protocol, the first time you connect from a local computer to the Hoffman2 Cluster you will be prompted with a message similar to:
The authenticity of host 'HOSTNAME (HOST IP)' can't be established.
ED25519 key fingerprint is SHA256:lZdo2eNOmwgroOyCOXXFFdQjfQQA1vMpBxgwhGwirwY.
Are you sure you want to continue connecting (yes/no)?
where HOSTNAME and HOST IP are the hostname and IP address of one of the various classes of public hosts.
Warning
Only proceed to connect if the ED25519 key fingerprint displayed in the prompted message corresponds to one of the ED25519 fingerprints listed in the Hoffman2 Cluster Public hosts hostkey fingerprints section.
If the ED25519 fingerprint displayed by your SSH client does not match one of the ED25519 fingerprints above for the Hoffman2 Cluster public hosts, when attempting to connect you will get a message similar to:
@@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the ED25519 host key has just been changed.
In this case, do not continue authentication; instead, contact us here or by email at: support@idre.ucla.edu.
Problems with this answer? Please send comments here.
Public hosts hostkey fingerprints¶
The Hoffman2 Cluster has the following classes of public, or external facing, hosts that are used to connect or to transfer data to and from the cluster:
Login nodes
Data transfer nodes
NX nodes
x2go nodes
All public-facing hosts have the following hostkey fingerprints (the same keys shown in several hash/encoding formats):
ED25519 MD5-hex:a4:eb:80:cd:84:8d:e3:69:62:a2:4a:3c:7b:f6:6d:f7
ED25519 SHA1-hex:/vL4oZulkOMuLnA1hd0EGZx0GcI
ED25519 SHA256-hex:lZdo2eNOmwgroOyCOXXFFdQjfQQA1vMpBxgwhGwirwY
ED25519 MD5-base64:a4eb80cd848de36962a24a3c7bf66df7
ED25519 SHA1-base64:fef2f8a19ba590e32e2e703585dd04199c7419c2
ED25519 SHA256-base64:959768d9e34e9b082ba0ec823975c515d4237d0400d6f329071830846c22af06
RSA MD5-hex:3c:9c:67:d8:c5:a4:ae:77:07:5f:10:2f:20:4a:75:0f
RSA SHA1-hex:t+AS3JPkPxJvcsD7z63Vekcamt8
RSA SHA256-hex:kah9BJwSzrlFnVp9Tg+El2IdcCN7JgN5+Ur2RyIdvwM
RSA MD5-base64:3c9c67d8c5a4ae77075f102f204a750f
RSA SHA1-base64:b7e012dc93e43f126f72c0fbcfadd57a471a9adf
RSA SHA256-base64:91a87d049c12ceb9459d5a7d4e0f8497621d70237b260379f94af647221dbf03
Even though all of our public, external-facing hosts use the same ED25519 (or RSA) public hostkey, depending on the software package you use to connect to the cluster, that public key can be represented with any one of the different fingerprint hashes listed above.
Warning
If the fingerprint hash doesn’t match one listed above, do not continue authentication and contact us here or by email at: support@idre.ucla.edu.
Problems with this answer? Please send comments here.
Set-up SSH public-key authentication¶
Using SSH public-key authentication to connect to a remote system is a robust, more secure alternative to logging in with an account password. SSH public-key authentication relies on asymmetric cryptographic algorithms that generate a pair of separate keys (a key pair), one “private” and the other “public”. You keep the private key on the local computer you use to connect to the remote system. The public key is “public” and can be stored on each remote system in its ~/.ssh/authorized_keys file.
Note
You need to be able to transfer your public key to the Hoffman2 Cluster. Therefore, you must be able to log in with your password in order to add the public key to the ~/.ssh/authorized_keys file in your home directory.
To set-up public-key authentication via SSH on macOS and Linux:
Use the terminal application to generate a key pair using the RSA algorithm.
To generate RSA keys, at the prompt, enter:
$ ssh-keygen -t rsa
You will be prompted to supply a filename (for saving the key pair) and a passphrase (for protecting your private key):
filename: press Enter to accept the default filename (id_rsa)
passphrase: enter a passphrase to protect your private key
Warning
If you don’t passphrase protect your private key, anyone with access to your computer can SSH (without being prompted for the passphrase) to your account on any remote system that has the corresponding public key.
Your private key will be generated using the default filename (id_rsa) or the filename you specified, and stored on your local computer in your home directory, in a subdirectory named .ssh.
The public key will be generated using the same filename but with a .pub extension added to it (id_rsa.pub). The public key file is stored in the same location (~/.ssh/).
Now the public key needs to be transferred to the remote system (the Hoffman2 Cluster). You can use the program ssh-copy-id or scp to copy the public key file to the remote system. It is preferable to use ssh-copy-id because the contents of the public key are added directly to your ~/.ssh/authorized_keys file. If you use scp, you will need to connect to the remote computer and append the contents of id_rsa.pub to the authorized_keys file manually. You will be prompted for your account password to complete the copy to the remote system.
To transfer the public key via ssh-copy-id:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub login_id@hoffman2.idre.ucla.edu
where login_id is replaced with your cluster username.
To transfer the public key via scp:
$ scp ~/.ssh/id_rsa.pub login_id@hoffman2.idre.ucla.edu
$ ssh login_id@hoffman2.idre.ucla.edu
$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
$ rm ~/id_rsa.pub
You should now be able to SSH to your Hoffman2 Cluster user account from your local computer with the private key. Replace joebruin with your cluster username.
[joebruin@macintosh ~]$ ssh joebruin@hoffman2.idre.ucla.edu
Enter passphrase for key '/Users/joebruin/.ssh/id_rsa':
Last login: Mon Jul 10 06:01:17 2020 from vpn.ucla.edu
SSH public-key authentication not working?¶
Please verify the file permissions. Typically, you will want:
the $HOME/.ssh directory to be 700 (drwx------)
the public key ($HOME/.ssh/id_rsa.pub) to be 644 (-rw-r--r--)
the private key ($HOME/.ssh/id_rsa) to be 600 (-rw-------)
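A minimal sketch of the commands that set those permissions, assuming the default key filename id_rsa (adjust the filenames if you chose different ones):
$ chmod 700 ~/.ssh                 # directory: accessible by the owner only
$ chmod 644 ~/.ssh/id_rsa.pub      # public key: world-readable is fine
$ chmod 600 ~/.ssh/id_rsa          # private key: owner read/write only
On the cluster side, ~/.ssh/authorized_keys should also not be writable by anyone but you (600 works).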
Problems with this answer? Please send comments here.
Data transfers¶
When I Log in to the Globus web application, I get a “Missing Identity Information” error¶
Missing Identity Information - Unable to complete the authentication process. Your identity provider did not release the attribute(s): {{eppn}}
To resolve this issue with missing attributes not being released to 3rd parties, you will need to contact the UCLA IT Support Center. UCLA Logon ID is not a service of the IDRE Research Technology Group.
Job Errors/ Job Scheduler¶
An IDRE consultant sent me an email about a lot of left over jobs running under my userid. How do I delete them?¶
You can get the process IDs using the ps command, filter them with the grep command to select only the processes you want to delete, and feed the result to the kill command.
To list the processes, use the command:
$ ps -u loginid | grep myjob | awk '{print $1}'
To kill the processes, use the command:
$ ps -u loginid | grep myjob | awk '{print $1}' | xargs kill
Replace loginid with your username and myjob with the executable name (e.g. bash or python).
Problems with this answer? Please send comments here.
In an interactive session (via qrsh), I am getting “not enough memory” error messages and my application is terminated abruptly. Why?¶
When issuing the qrsh command, you must specify the memory size via -l h_data, which is also imposed as the virtual memory limit for the qrsh session. If the application (e.g. matlab) exceeds this limit, it will be automatically terminated by the scheduler. Each application has a different error message, but it usually contains key words like “not enough memory”, “increase your virtual memory”, “cannot allocate memory” or something similar. In this case, you will have to re-run the qrsh command with an increased h_data value.
Please also note that requesting an excessive amount of h_data might cause the qrsh session to wait for a long time, or even fail to start, because fewer and fewer compute nodes can meet your criterion as you increase the h_data value. If this is your first time launching the application in a qrsh session, we recommend gradually increasing h_data until the application runs successfully.
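As a sketch of this trial-and-error approach (the h_data and h_rt values below are only examples):
$ qrsh -l h_data=4G,h_rt=2:00:00     # first attempt
# if the application is killed with a memory-related error, retry with a larger limit:
$ qrsh -l h_data=8G,h_rt=2:00:00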
Why is my job still waiting in the queue?¶
The following factors may contribute to a longer wait time, or to jobs not starting (depending on your account’s access level):
Larger memory request: h_data, or the product of (h_data)*(-pe shared #), is large
Longer run time, requested with: h_rt
Specific CPU model, requested with: arch
Many CPUs, requested with: -pe dc* #
You are already running on some number of CPUs or nodes
For high priority jobs (requested with -l highp): your group members are already running on your purchased nodes and there are not enough left for your job to start, or your request exceeds what your group’s nodes have
The Hoffman2 Cluster’s overall load
Problems with this answer? Please send comments here.
I have a lot of jobs in error state E. How do I find out what the problem is?¶
When the myjobs script or qstat -u $USER shows that you have jobs in an error state (“E”, “Eqw”, etc.), you can use the error_reason script to show you why. It will print the error-reason line from the qstat -j jobid output for all of your jobs that are in an error state.
$ error_reason -u loginid
Replace loginid with your username.
What queues can I run my jobs in?¶
The qquota command lists which of the resources available to your userid were in use at the moment the qquota command was run. The purpose of qquota is not to provide a complete list of the resources available to your userid. If no resources are in use at the moment, qquota will not return any information.
For example:
resource quota rule limit filter
rulset1/10 slots=123/256 users @campus hosts @idre-amd_01g
where slots=123/256 means 123 slots (cores) are in use by your group out of your group’s total allocation of 256. Enter man qquota at the shell prompt for more information.
When will my job run?¶
The qstat command will list all the jobs which are running (r) or waiting to run (qw), in order of priority (the “prior” column). If all jobs requested the same resources, this would also be the order in which they start running. In reality, some jobs will request more nodes or a longer run time than is presently available, so the job scheduler will “back-fill” and try to start jobs which require fewer resources and will complete without slowing down the start time of a job higher in the list.
If you are in a research group which has purchased nodes for the Hoffman2 Cluster, you can use the highp complex to request that your job run on your group’s purchased nodes. It is guaranteed that some job submitted by someone in your research group will start within 24 hours. To see where your highp job stands with respect to the waiting jobs that everyone else in your group has submitted, you can use the groupjobs script. It will display a list of pending jobs, or pending and running jobs, similar to regular qstat output but only for members of your resource group. The job at the top of the list will in most cases start running before those later in the list. For help and a list of options, enter groupjobs -h.
Problems with this answer? Please send comments here.
Why did my highp job not start within 24 hours?¶
A highp job will start in 24 hours provided that your group does not overuse purchased resources.
The common reasons a highp job did not start in 24 hours are:
You did not specify the highp option in your job script. Check your job script and look for a line that starts with #$ -l; highp should be one of its parameters. For example, the line should look like: #$ -l h_data=1G,h_rt=48:00:00,highp
The pending job in question does not have the highp option. (See below for how to check this.)
Members of your group are already running long jobs on the purchased compute nodes. In this case, your highp job will be queued until resources become available. (You still need to add highp to the job script as described above.)
Your research group is not a Hoffman2 shared cluster program participant. Consider joining the program and enjoying the benefits.
The product of h_data and the number of slots is greater than the per-node memory size of your group’s nodes.
For example, if you have h_data=8G and -pe shared 7, you are requesting a node with 56 GB (=8G*7) of memory. If each of your group’s nodes has, say, 32 GB of memory, your highp job will not start.
To check whether your pending job has the highp option, use the following commands and steps:
Find out the job ID of the pending job:
$ qstat -s p -u $USER
Check whether highp is specified for the job in question:
$ qstat -j job_id | grep ^'hard resource_list' | grep highp
If you see no output from the command above, the job does not have the highp option. You need to specify highp; see below for how to use the qalter command to fix this.
If you see something like:
hard resource_list: h_data=1024M,h_rt=259200,highp=TRUE
then the job does have the highp option specified.
To alter an already-pending job (without re-submitting it) from non-highp to highp, use the following command:
$ qalter -mods l_hard highp true job_id
For more information about qalter, use the command:
$ man qalter
Problems with this answer? Please send comments here.
How much virtual memory should I request in job submission?¶
It is important to request the correct amount of memory when submitting a job. If the request is too small, the job may be killed at run time due to memory overuse. If the request is too large (e.g. larger than the memory of the compute nodes you intend to run the job on), the job may not start.
The following are a few common techniques that can help you determine the virtual memory size of your program.
If your job has completed, run the command:
$ qacct -j job_ID
Look for the maxvmem value. This is the virtual memory size that your program consumed, as seen by the scheduler. Specify h_data so that (h_data)*(number of slots) is no less than this value. For example, if maxvmem shows 11 GB, you can request 12 GB of memory on a compute node to run the job, using one of the following:
#$ -l h_data=12GB                 # for a single-core run (if your program is sequential)
#$ -l h_data=6GB -pe shared 2     # for a 2-core run (if your program is shared-memory parallel)
#$ -l h_data=2GB -pe shared 6     # for a 6-core run (if your program is shared-memory parallel)
Note
In these examples, the product of h_data * (number of slots) is always 12GB. If you specify -l h_data=12GB -pe shared 6, you are actually requesting 12GB*6=72GB of memory on a node.
Note
If you are running multiple slots on a node, h_data * (number of slots) needs to be smaller than the total memory size of the node.
If you are not sure about the virtual memory size, run your program in “exclusive” mode first. Once done, use Method 1 above to determine the virtual memory size. To submit a job in exclusive mode, qsub the job with the command:
$ qsub -l exclusive your_job_script
where your_job_script is replaced by the actual file name of your job script. In this case, you should also specify h_data for node-selection purposes. If you are running a sequential or shared-memory parallel program (i.e. using only one compute node), we recommend using h_data=32GB without specifying the number of slots. You can also append the exclusive option to the line starting with #$ -l in your job script, e.g.:
#$ -l h_rt=24:00:00,h_data=32G,exclusive
Again, if your program is sequential or shared-memory parallel, DO NOT specify the number of slots (i.e. there should be no -pe option in your job script or command line); otherwise you may over-request memory, causing the job to be unable to start.
Problems with this answer? Please send comments here.
How do I pack multiple job-array tasks into one run?¶
Using a job array is a way to submit a large number of similar jobs. In some cases each job task takes only a few minutes to compute. Running a large number of extremely short jobs through the scheduler is very inefficient: the system is likely to be busier finding nodes and sending jobs in and out than doing the actual computing. With a simple change to your job script, you can pack multiple job-array tasks into one run (or dispatch), so you can benefit from the convenience of using job arrays and at the same time use the computing resources efficiently.
If you run too many short jobs (e.g. more than 200 less-than-3-minute jobs within an hour), your other pending jobs may be temporarily throttled. Please understand that this is a way to ensure the scheduler’s normal operation, not intended to inconvenience users.
At run time, the environment variable $SGE_TASK_ID uniquely identifies a task. The main ideas for packing multiple tasks into one run, with minimal change to your job script, are to:
change the job task step size, and
create a loop inside the job script to execute multiple tasks (equal to the step size).
Of course, you may need to adjust h_rt to allocate sufficient wall-clock time to run the “packed” version of the job script.
If your original job script (bash) looks like:
#!/bin/bash
...
#$ -t 1-2000
...
./a.out $SGE_TASK_ID ...
To pack, say, 100 tasks into one run, change your job script to:
#!/bin/bash
...
#$ -t 1-2000:100
...
for i in `seq 0 99`; do
  my_task_id=$((SGE_TASK_ID + i))
  ./a.out $my_task_id ...
done
If your original job script (csh) looks like:
#!/bin/csh
...
#$ -t 1-2000
...
./a.out $SGE_TASK_ID ...
To pack, say, 100 tasks into one run, change your job script to:
#!/bin/csh
...
#$ -t 1-2000:100
...
foreach i (`seq 0 99`)
  @ my_task_id = $SGE_TASK_ID + $i
  ./a.out $my_task_id ...
end
Problems with this answer? Please send comments here.
How do I request large memory to run sequential (1-core) program?¶
If you are requesting up to 512 GB, use h_data to specify the requested memory size, e.g.:
$ qsub -l h_data=512G ...
You can also put -l h_data=512G in your job script file.
In this case, you are requesting a single core (slot), so you should not specify any -pe option.
If you are requesting more than 512GB, please contact us.
Problems with this answer? Please send comments here.
How do I request large memory to run multi-threaded (single node) program?¶
You will use -pe (number of cores) and -l h_data (memory per core) together to specify the total amount of memory you want. Note that the product of (number of cores)*(h_data) must be smaller than the total memory of a compute node, otherwise your job will never start.
For example, request 8 cores with 512G total memory (shared by all 8 cores):
$ qsub -l h_data=64G -pe shared 8 # any other needed resource
If your multi-threaded program will automatically use all CPUs available on the node, add the -l exclusive option, e.g.:
$ qsub -l h_data=64G,exclusive -pe shared 8 # any other needed resource
You can also put -pe shared 8 -l h_data=64G in your job script file.
If you are requesting more than 512 GB total memory, please contact us.
Why can’t I submit too many individual jobs?¶
When there are too many pending jobs, the scheduler may fail to process all of them, causing scheduling problems. Therefore, to maintain stability, the system limits how many jobs a user can submit. This limit is usually in the hundreds, and may vary depending on the system’s load.
Most users who submit a huge number of individual jobs should consider using job arrays, for one obvious benefit: one job-array job can hold thousands of “tasks” (or individual “runs”) and consumes only one (1) job out of the user’s job-count limit. A user can then submit hundreds of job arrays (each containing thousands of “runs”). This usually covers some of the largest “throughput” workloads on the cluster.
If each individual task is very short (e.g. finishes in a few minutes), users should pack several tasks into one run to increase throughput efficiency. See this FAQ for more details. Running a large number of short jobs is a severe waste of the cluster’s computing power.
For more information about job arrays, see this page.
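As a minimal sketch of a job-array script (the resource values and the a.out executable are placeholders):
#!/bin/bash
#$ -cwd
#$ -l h_data=2G,h_rt=1:00:00
#$ -t 1-1000                # one job with 1000 tasks; counts as a single job toward your limit
./a.out $SGE_TASK_ID        # each task selects its own input via $SGE_TASK_ID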
Problems with this answer? Please send comments here.
When submitting a job, I get “Unable to run job: got no response from JSV script…”.¶
This can happen when the scheduler (software) is too busy handling jobs. One way to overcome this problem is to increase the default timeout limit by adding the following line at the bottom of your shell initialization file. For bash, add to your ~/.bashrc:
$ export SGE_JSV_TIMEOUT=60
Then run:
$ source ~/.bashrc
(or just log out and log back in), and try to submit your job again.
For csh/tcsh, add the following line to your ~/.cshrc instead:
$ setenv SGE_JSV_TIMEOUT 60
Then run:
$ source ~/.cshrc
(or just log out and log in), and try to submit your job again.
Problems with this answer? Please send comments here.
Storage and File systems¶
What file systems are backed up?¶
The home and project file systems are backed up to disk-based storage, with a target backup window of once per 24 hours. See Backups.
Protecting data from accidental loss¶
Here are several ways to protect your files from accidental loss.
Back up your files to another place, e.g. the hard drive on another computer. See File transfer.
Make backup copies of files and directories in a compressed tar file. For example, to create a compressed tar file (.tgz) of all files under the directory “myproject”:
$ tar -czf myproject.tgz myproject/
Enter man tar at the shell prompt for more information.
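For example, to check or restore the contents of such an archive later:
$ tar -tzf myproject.tgz     # list the archive contents without extracting
$ tar -xzf myproject.tgz     # extract the archive into the current directory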
Modify your personal Linux environment to change the cp (copy), mv (move), and rm (remove) commands so that you are prompted for confirmation before any existing file is deleted or overwritten.
bash shell: add the following commands to your $HOME/.bashrc file:
$ alias cp='cp -i'
$ alias mv='mv -i'
$ alias rm='rm -i'
tcsh shell: Add the following commands to your $HOME/.cshrc file:
$ alias cp 'cp -i'
$ alias mv 'mv -i'
$ alias rm 'rm -i'
Modify your personal Linux environment to prevent any existing file from being overwritten by the output redirection (>) symbol.
bash shell: add the following command to your $HOME/.bashrc file:
$ set -o noclobber
tcsh shell: Add the following command to your $HOME/.cshrc file:
$ set noclobber
Use the chmod command to remove your own write access from files you do not intend to change or delete. Example:
$ chmod -w myfile
You will be unable to accidentally modify such a file in the future. If you try to delete a file for which you have removed your own write access without specifying the -f (force) flag on the rm command, you will be prompted and have to reply affirmatively before the file will be removed. Enter man chmod at the shell prompt for more information.
Data Sharing on Hoffman2¶
There are a few things to keep in mind. Users on the cluster are organized into groups. Every user belongs to a primary group and may be in several secondary groups. You can see the list of groups you belong to with the groups command, e.g.
joebruin@login2:~$ groups joebruin
joebruin : web gpu
In the previous example, you see my primary group is web (first in the list), and I only belong to one secondary group, gpu.
A user can belong to many groups, but a file or directory can be owned by only one owner and one group. Group membership can give you access to files and directories belonging to that group.
So, in order to share data, you must have a common group to which both users belong. For example, if I want to share a folder with Hoffman2 user, sambruin, I would check for a common group that we both belong to with the groups command:
joebruin@login2:~$ groups $USER sambruin
joebruin : web gpu
sambruin : acct gpu
In this example, gpu is a group we’re both members of, so I could share data as long as it is owned by the group “gpu” and the permissions on the directory and files give the group members access. Group ownership does not imply group access; you must set the file access permissions so that your group can use the files. For that you would use the chown and chmod commands, or newgrp to change your working group in your current shell.
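A minimal sketch, assuming a shared group named gpu and a hypothetical directory ~/shared_data (adjust names and permissions to your needs):
$ chown -R :gpu ~/shared_data     # change only the group ownership to the common group
$ chmod -R g+rX ~/shared_data     # grant the group read access and directory traversal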
My program writes a lot of scratch files in my home directory. This results in exceeding my disk space quota. What is the solution?¶
There are several things you can do:
If you are a member of a research group which has contributed nodes to the Hoffman2 Cluster, your PI can purchase additional disk space for use by the members of your group.
Each process in your parallel program can write to the local /work directory on the node it is running on. When the program finishes, you can copy the files off to a place where you have more space. Since /work is local to the nodes, using it is very efficient.
You can write to /u/scratch; you have 7 days after the job completes to copy the files somewhere else (see the sketch below).
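A minimal sketch of the /u/scratch approach inside a job script; the per-user directory layout and filenames here are assumptions, not a documented convention:
# create a per-job directory under /u/scratch (hypothetical layout)
SCRATCHDIR=/u/scratch/$USER/job_$JOB_ID
mkdir -p "$SCRATCHDIR"
./a.out > "$SCRATCHDIR/output.dat"     # write large output to scratch instead of $HOME
cp "$SCRATCHDIR/output.dat" "$HOME/"   # copy back what you need to keep within 7 days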
How do I transfer my files from the Hoffman2 Cluster to my machine?¶
For any size file, you can use the scp command to transfer a file or directory from one machine or system to another. For safety reasons, as outlined in the Security Policy for IDRE-Hosted Clusters, always issue the scp command from your local machine when transferring data to or from the IDRE-Hosted cluster. NEVER initiate an scp from the IDRE-Hosted cluster back to your local machine.
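For example, run the following on your local machine to pull files from your Hoffman2 home directory into the current local directory (login_id, results.tar.gz, and myproject are placeholders):
$ scp login_id@hoffman2.idre.ucla.edu:results.tar.gz .
$ scp -r login_id@hoffman2.idre.ucla.edu:myproject .     # copy a whole directory recursively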
Is there a simpler way to copy all my files to my new Hoffman2 account?¶
Once you have been notified that your login ID has been added to the Hoffman2 Cluster, log in to your local machine and, from your local machine’s home directory, enter the command:
tar -clpzf - * | ssh loginid@hoffman2.idre.ucla.edu tar -xpzf -
Replace loginid with your Hoffman2 Cluster loginid.
Note that this transfer will not copy any of the hidden (dot) files from your local home directory to your new home directory on the Hoffman2 Cluster. Since many of the dot files in your home directory are operating system version specific, it would not be appropriate or useful to transfer these files.
What is my disk storage quota and usage?¶
From the Hoffman2 Cluster login nodes, at the shell prompt, enter:
myquota
The myquota command will report the usage and quota for filesystems where your userid has saved files, including /u/scratch as well as your home directory. Use the myquota command instead of the quota command.
Problems with the answers in this section? Please send comments here.
Other¶
How do I print my output?¶
There is no printer directly associated with the Hoffman2 Cluster. If you have a printer attached to your local desktop machine, you can copy your file to your local machine and print your file locally. Recall that for security reasons you should issue the scp command from your local machine, and not from the Hoffman2 command line.
Here is a little script that you could save on a unix/linux machine that might make printing a text file easier. You might name this script h2print.
#!/bin/bash
# h2print: copy a text file from your Hoffman2 home directory and print it locally
scp login_id@hoffman2.idre.ucla.edu:$* .
lpr $*
where login_id is your Hoffman2 Cluster user name (i.e., login ID). You can omit login_id@ if your user_id on your local machine is the same as your Hoffman2 Cluster login ID. Note the period (.) at the end of the scp command line. Mark the script as executable with the chmod command:
$ chmod +x h2print
To print a Hoffman2 text file in your home directory, from your local machine’s command prompt, enter:
$ h2print hoffman2_filename
where hoffman2_filename is the name of the text file on the Hoffman2 Cluster that you want to print.
The scp command will prompt you for your Hoffman2 Cluster password, unless you have previously set up an RSA key pair on your local machine with the ssh-keygen -t rsa command and appended a copy of the public key (id_rsa.pub) to ~/.ssh/authorized_keys in your Hoffman2 Cluster account.