Introduction
TEMPO is the computing cluster of the Bernardi Research Group at Caltech. As of October 2020, it has 35 compute nodes.
Queues
TEMPO has several queues. The following table lists each queue with its maximum walltime and the maximum number of nodes per request:
Queue | Walltime (hours) | Max # of nodes |
---|---|---|
shortJobsQ | 12 | 20 |
longJobsQ | 168 | 20 |
debugQ | 1 | 2 |
singleNodeQ | 168 | 1 |
veryShortJobsQ | 2 | NA |
To see the current queue limits (walltime, maximum number of nodes, etc.), type `qstat -q` on TEMPO.
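Before submitting, it can help to check a request against these limits. The function below is a hypothetical helper (it is not installed on TEMPO) that encodes the table above as of Oct. 2020 and accepts whole hours only. For example, the sample job script further down asks for 30 hours, which fits longJobsQ but not shortJobsQ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_request QUEUE HOURS NODES
# Compares a request with the queue limits copied from the table above.
# Returns 0 if the request fits, 1 if it exceeds a limit, 2 for an unknown queue.
check_request() {
  local queue=$1 hours=$2 nodes=$3 max_hours max_nodes
  case "$queue" in
    shortJobsQ)     max_hours=12;  max_nodes=20 ;;
    longJobsQ)      max_hours=168; max_nodes=20 ;;
    debugQ)         max_hours=1;   max_nodes=2  ;;
    singleNodeQ)    max_hours=168; max_nodes=1  ;;
    veryShortJobsQ) max_hours=2;   max_nodes=NA ;;  # node limit not listed
    *) echo "unknown queue: $queue" >&2; return 2 ;;
  esac
  if (( hours > max_hours )); then
    echo "$queue: ${hours} h exceeds the ${max_hours} h walltime limit" >&2
    return 1
  fi
  if [[ $max_nodes != NA ]] && (( nodes > max_nodes )); then
    echo "$queue: ${nodes} nodes exceeds the limit of ${max_nodes}" >&2
    return 1
  fi
  echo "OK: $queue accepts ${nodes} node(s) for ${hours} h"
}
```

For example, `check_request longJobsQ 30 2` prints `OK: longJobsQ accepts 2 node(s) for 30 h`, while `check_request shortJobsQ 30 2` reports that 30 h exceeds the 12 h limit.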
TEMPO has, in principle, 35 working nodes (each node has 10 cores, and each core has 2 CPUs, i.e., hardware threads, giving 20 processors per node). If some of the nodes are not working, please discuss the issue first in our HPC Slack channel (hpc). If we cannot solve it ourselves, we usually contact IMSS through Caltech Help.
Access
- Log in to help.caltech.edu and open a ticket to request access to TEMPO.
- Fill in the relevant details as specified below:
  - outline the reason for the request in the request details
  - attach your public SSH key to the ticket (see below for how to generate one)
  - provide one or two preferred usernames (you could write this information in “request details”)
  - specify which shell you would like to use (we usually choose `bash`; `zsh` is not available on TEMPO)
  - cc the Principal Investigator, Prof. Bernardi
Generate an SSH key…
An SSH key is a credential used by the Secure Shell (SSH) protocol to log in to remote servers from a local machine. If you are not familiar with SSH keys, search for “how to generate an SSH key” to learn how to create one.
- An SSH key pair can be generated with the command `ssh-keygen -t rsa` (available to Linux, macOS, and Windows users), or, to give the key a name, with `ssh-keygen -t rsa -f ~/.ssh/tempo`.
- Change into the directory `~/.ssh`, where you will see two files: `id_rsa` (or `tempo`) and `id_rsa.pub` (or `tempo.pub`). Attach `id_rsa.pub` (or `tempo.pub`) to the TEMPO access request ticket, but never share your private key (`id_rsa` or `tempo`) with anyone.
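The steps above can be wrapped in a small function. This is a sketch: the name `make_tempo_key` is hypothetical, the 4096-bit RSA key is an assumption consistent with the `id_rsa` file names above, and the default path `~/.ssh/tempo` matches the `IdentityFile` used in the SSH config later on this page. Any extra arguments are passed straight to `ssh-keygen`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make_tempo_key [KEY_PATH] [extra ssh-keygen options...]
# Generates an RSA key pair and prints the public half, which is the
# part to attach to the access-request ticket.
make_tempo_key() {
  local key="${1:-$HOME/.ssh/tempo}"
  shift || true   # remaining arguments go to ssh-keygen
  # create the key directory (mode 700) if it does not exist yet
  mkdir -p -m 700 "$(dirname "$key")"
  if [[ -e "$key" ]]; then
    echo "$key already exists; not overwriting it" >&2
    return 1
  fi
  # prompts for a passphrase unless -N is given
  ssh-keygen -t rsa -b 4096 -f "$key" "$@" || return 1
  cat "$key.pub"   # attach this file, never "$key" itself
}
```

Running `make_tempo_key` with no arguments creates `~/.ssh/tempo` and `~/.ssh/tempo.pub`, asks for a passphrase, and prints the public key.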
When your TEMPO account is approved…
- We suggest you edit your SSH configuration file (`~/.ssh/config`). If the file does not exist, just create it. Add the following lines (use `~/.ssh/id_rsa` as the `IdentityFile` if you kept the default key name):
    Host tempo
        HostName tempo.caltech.edu
        User <your username>
        IdentityFile ~/.ssh/tempo
  or a more complete version, according to your preference:
    Host tempo
        HostName tempo.caltech.edu
        User <your username>
        IdentityFile ~/.ssh/tempo
        ForwardX11 yes
        ForwardX11Timeout 0
        ServerAliveInterval 60
        ForwardAgent yes
Once the request has been approved, open a terminal and type `ssh tempo` (or, without the config file, `ssh <username>@tempo.caltech.edu`). You are all set. Now you can set up your account and start running calculations.
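If you prefer not to edit `~/.ssh/config` by hand, the minimal `Host tempo` block above can be appended by a small function. This is a sketch: `add_tempo_host` is a hypothetical name, it refuses to add a second `Host tempo` entry, and it assumes your key is `~/.ssh/tempo`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_tempo_host USERNAME [CONFIG_FILE]
# Appends the minimal "Host tempo" block unless one is already present.
add_tempo_host() {
  local user=$1 config="${2:-$HOME/.ssh/config}"
  mkdir -p -m 700 "$(dirname "$config")"
  touch "$config" && chmod 600 "$config"   # ssh rejects configs writable by others
  if grep -qE '^[[:space:]]*Host[[:space:]]+tempo([[:space:]]|$)' "$config"; then
    echo "$config already has a 'Host tempo' entry" >&2
    return 1
  fi
  cat >> "$config" <<EOF

Host tempo
    HostName tempo.caltech.edu
    User $user
    IdentityFile ~/.ssh/tempo
EOF
}
```

After `add_tempo_host <your username>`, running `ssh -G tempo` prints the options ssh would use for the alias without connecting, which is a quick way to check the entry.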
Troubleshooting
If you get the following error when you try to log in to TEMPO via `ssh`,
Unable to negotiate with 131.215.148.172 port 22: no matching host key type found. Their offer: ssh-rsa,ssh-dss
it means that your OpenSSH version is not compatible with the one on TEMPO: the SSH server on TEMPO is old and only offers the `ssh-rsa` and `ssh-dss` host key types, which recent OpenSSH clients disable by default. One solution is to follow the steps below:
- Generate a DSA key pair instead of an RSA one:
    ssh-keygen -t dsa -f ~/.ssh/tempo_new
- Contact IMSS to add the new public key (`tempo_new.pub`) to the `~/.ssh/authorized_keys` file of your TEMPO account (not of your own computer).
- Add two lines, `HostKeyAlgorithms +ssh-dss` and `PubkeyAcceptedKeyTypes +ssh-dss`, to your local SSH configuration file `~/.ssh/config`, and point `IdentityFile` to the new key:
    Host tempo
        HostName tempo.caltech.edu
        User <your username>
        IdentityFile ~/.ssh/tempo_new
        HostKeyAlgorithms +ssh-dss
        PubkeyAcceptedKeyTypes +ssh-dss
        ForwardX11 yes
        ForwardX11Timeout 0
        ServerAliveInterval 60
        ForwardAgent yes
- Then try to log in to TEMPO again (`ssh tempo`); everything should work. If, unluckily, you get an error like
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that a host key has just been changed.
    The fingerprint for the DSA key sent by the remote host is SHA256:IO46356sdfgsf234r3dsgsgfsdgtf554.
    Please contact your system administrator.
    Add correct host key in /Users/yourcomputer/.ssh/known_hosts to get rid of this message.
    Offending RSA key in /Users/yourcomputer/.ssh/known_hosts:20
    Host key for tempo.caltech.edu has changed and you have requested strict checking.
    Host key verification failed.
  open `~/.ssh/known_hosts` and delete the line containing `tempo` (the error message gives its line number, here 20).
- Try again (`ssh tempo`); everything should be fine now.
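Instead of editing `known_hosts` by hand, `ssh-keygen -R` can remove the stale entries; unlike a text search for “tempo”, it also works when your client stores hostnames hashed (`HashKnownHosts yes`). The function below is a sketch with a hypothetical name; it also removes the IP address shown in the error message above, in case your client recorded the key under it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: forget_tempo_host_keys [KNOWN_HOSTS_FILE]
# Removes stored host keys for TEMPO's hostname and IP address.
forget_tempo_host_keys() {
  local known_hosts="${1:-$HOME/.ssh/known_hosts}" host
  for host in tempo.caltech.edu 131.215.148.172; do
    # ssh-keygen -R deletes every key stored for $host (hashed or not)
    # and keeps the previous contents in "${known_hosts}.old"
    ssh-keygen -R "$host" -f "$known_hosts"
  done
  return 0
}
```

After running `forget_tempo_host_keys`, the next `ssh tempo` asks you to confirm the server's current host key, as on a first login.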
Organization
Home directory
When you log in to TEMPO (type `ssh <username>@tempo.caltech.edu` in your terminal), you start in your home directory. For example, my username is `louis`, so the current directory will be `/home/louis`.
Other users’ directories and the shared directories
Going up one level from the home directory (`cd ..`), we can see the directories of all the other users as well as some other directories. Among them, the directory `timescale` contains all the shared data, files, software, and subdirectories.
If you want to share data and files on TEMPO, please put them in the directory `timescale`. Make sure that you change the permissions of your files and data (e.g., with `chmod 777 <file>`) so that others can edit or copy them in `timescale`.
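As a sketch of the sharing step, the hypothetical function below copies a file or directory into `timescale` and opens up its permissions. It uses `chmod -R a+rwX` rather than `chmod 777`, so everyone can read and write, but the execute bit is only added to directories (and to files that were already executable). The default destination `/home/timescale` is taken from the path used in the `htop` example on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: share_to_timescale SOURCE [SHARED_ROOT]
# Copies SOURCE into SHARED_ROOT and lets everyone read and edit the copy.
share_to_timescale() {
  local src=${1%/} root="${2:-/home/timescale}"
  local dest="$root/$(basename "$src")"
  if [[ -e "$dest" ]]; then
    echo "$dest already exists; not overwriting it" >&2
    return 1
  fi
  cp -R "$src" "$dest" || return 1
  # a+rwX: read/write for all; execute only for directories and for
  # files that were already executable (unlike chmod 777)
  chmod -R a+rwX "$dest"
  echo "$dest"
}
```

For example, `share_to_timescale ~/demo/dir01` would create `/home/timescale/dir01` and print its path.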
The directory `timescale` also contains shared libraries and software in the subdirectory `comsoftware`. You can add the corresponding software paths to the `.bash_profile` in your home directory. For example, add the line `export PATH="/home/timescale/comsoftware/htop-2.1.0:$PATH"` to `~/.bash_profile` and run `source ~/.bash_profile`. Now you can use `htop` to check the status of the CPUs on the login node.
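Adding the same `export` line repeatedly (for example, by re-running a setup script) makes `PATH` grow with duplicates. A sketch of an idempotent way to add a shared software directory, using a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_path_to_profile DIR [PROFILE]
# Appends 'export PATH="DIR:$PATH"' to PROFILE unless that exact line exists.
add_path_to_profile() {
  local dir=$1 profile="${2:-$HOME/.bash_profile}"
  local line="export PATH=\"$dir:\$PATH\""
  touch "$profile"
  if ! grep -qxF "$line" "$profile"; then
    echo "$line" >> "$profile"
  fi
}
```

Usage: `add_path_to_profile /home/timescale/comsoftware/htop-2.1.0 && source ~/.bash_profile`.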
If you don’t want to use the common software or libraries and prefer to build your own in your home directory, please see the Compile Codes section.
A few rules
Each member can use at most 500 GB in his/her home directory. To check the disk usage of a directory, use `du -sh <target-directory>`, which prints the total size of the directory in human-readable units.
More than 500 GB? Please delete some of your data, or move it from TEMPO to an external hard disk (don’t have one? Please ask Marco to purchase one). See the section Copy Data for how to copy your data to your laptop or a hard disk.
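A sketch for checking your usage against the 500 GB limit, listing the largest entries when you are over it. The function name is hypothetical, and it assumes 1 GB = 1024³ bytes, since `du -k` reports sizes in 1024-byte blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_home_usage [DIR] [LIMIT_GB]
# Returns 1 (and lists the five largest entries) if DIR exceeds LIMIT_GB.
check_home_usage() {
  local dir="${1:-$HOME}" limit_gb="${2:-500}" used_kb limit_kb
  used_kb=$(du -sk -- "$dir" | cut -f1)
  limit_kb=$(( limit_gb * 1024 * 1024 ))   # assumes 1 GB = 1024^3 bytes
  echo "$dir: $(( used_kb / 1024 / 1024 )) GB used (limit ${limit_gb} GB)"
  if (( used_kb > limit_kb )); then
    echo "Over the limit. Largest entries (KB):" >&2
    du -sk -- "$dir"/* 2>/dev/null | sort -n | tail -n 5 >&2
    return 1
  fi
}
```

Running `check_home_usage` on TEMPO checks your home directory against 500 GB.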
Job Submission
Running calculations on the login node is bad practice: the login node is shared by all users and becomes significantly slower when someone runs an expensive calculation on it.
Calculations should instead be run by submitting jobs to the queue system. TEMPO uses the PBS queue system to manage jobs. Each job is submitted to one of the queues, depending on the walltime and resources it requires.
Please do not run large calculations on the login node!!!
Script example (job name, number of nodes and cores)
A sample job script (here, e.g., job_script.pbs) using PBS is shown below:
#!/bin/bash
#PBS -N Job_Name # Job name, displayed in the queue
#PBS -j oe # Join standard output and standard error into one output file
#PBS -l walltime=30:00:00 # Max. job walltime; the job is terminated once it reaches this walltime
#PBS -V # Export your current environment variables to the job
#PBS -l nodes=2:ppn=20 # Number of nodes and processors per node
#PBS -q queue_type # Queue to submit the job to (e.g., longJobsQ for a 30-hour job)
# Specify OpenMP parallelization
# (MPI processes x threads per process should not exceed nodes x ppn = 2 x 20 = 40)
export OMP_NUM_THREADS=1 # Number of OpenMP threads per MPI process
# Change to the directory from which the job was submitted
cd "$PBS_O_WORKDIR"
# Run the job using the mpirun command (40 MPI processes x 1 thread = 40 processors)
mpirun -np 40 myexecutable.x -npools 10 -i input_file.in > input_file.out
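To make communication within and between nodes efficient, please tune the number of MPI processes and the number of OpenMP threads in your job script. As a rule, (number of MPI processes) × (`OMP_NUM_THREADS`) should not exceed the number of processors you request (nodes × ppn); otherwise the processes compete for the same cores.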
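One way to keep these numbers consistent is to derive the number of MPI processes from the resources you request: with nodes × ppn processors and `OMP_NUM_THREADS` threads per process, use nodes × ppn / `OMP_NUM_THREADS` MPI processes. The hypothetical function below writes a job script following the sample above under that rule; the executable and its arguments are passed in as the last argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper:
#   make_job_script NAME QUEUE WALLTIME NODES PPN THREADS COMMAND
# Prints a PBS script in which MPI processes x threads = NODES x PPN.
make_job_script() {
  local name=$1 queue=$2 walltime=$3 nodes=$4 ppn=$5 threads=$6 cmd=$7
  if (( threads < 1 || ppn % threads != 0 )); then
    echo "PPN ($ppn) must be a multiple of THREADS ($threads)" >&2
    return 1
  fi
  local np=$(( nodes * ppn / threads ))
  cat <<EOF
#!/bin/bash
#PBS -N $name
#PBS -j oe
#PBS -l walltime=$walltime
#PBS -V
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -q $queue
export OMP_NUM_THREADS=$threads
cd "\$PBS_O_WORKDIR"
mpirun -np $np $cmd
EOF
}
```

For example, `make_job_script Job_Name longJobsQ 30:00:00 2 20 1 'myexecutable.x -npools 10 -i input_file.in > input_file.out' > job_script.pbs` reproduces the sample script with `mpirun -np 40`; with 2 threads per process it would use `-np 20` instead (check that your code's own options, such as `-npools`, still divide the new process count).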
Simple job commands
- To submit your job to a queue, use `qsub <job_script.pbs>`.
- To view job status, use the commands `qstat` and `showq`. For example, `qstat -u louis` lists the jobs of the user louis.
- To delete a job, whether queued or running, use `qdel <jobid>`. You can find `<jobid>` in the output of `qstat`.
- To access a compute node, type, e.g., `ssh tempo001` to enter the compute node tempo001. To leave the node, type `exit`.
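For scripted workflows, `qsub` prints the ID of the new job, which can be captured and passed to `qstat`. A sketch with a hypothetical function name; it assumes that `qstat <jobid>` exits with a non-zero status once the job has left the queue (on some PBS versions, finished jobs stay listed with state C for a while, so the loop may run a little longer).

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit_and_wait JOB_SCRIPT [POLL_SECONDS]
# Submits JOB_SCRIPT and polls qstat until the job is no longer listed.
submit_and_wait() {
  local script=$1 interval="${2:-60}" jobid
  jobid=$(qsub "$script") || return 1
  echo "submitted $jobid"
  # assumption: qstat <jobid> fails once the job has left the queue
  while qstat "$jobid" >/dev/null 2>&1; do
    sleep "$interval"
  done
  echo "$jobid is no longer in the queue"
}
```

Usage: `submit_and_wait job_script.pbs 300` checks every five minutes; please keep the interval long so you do not overload the PBS server.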
Copy Data
There are several tools for transferring files and directories between two machines; two common examples are `scp` and `rsync`.
If the data you would like to copy is a small file, for example, file01, use the command:
>> scp <username>@tempo.caltech.edu:<dir-path>/file01 <local-path>
for example,
>> scp louis@tempo.caltech.edu:/home/louis/demo/file01 ./
This copies file01 to the current directory on my laptop.
If the data you want to copy is a directory, for example, dir01, add the `-r` (recursive) flag:
>> scp -r <username>@tempo.caltech.edu:<dir-path>/dir01 <local-path>
for example,
>> scp -r louis@tempo.caltech.edu:/home/louis/demo/dir01 ./
This copies dir01 to the current directory on my laptop.
Alternatively, you can copy files from and to remote servers with `rsync`, which transfers only the files (or parts of files) that differ between source and destination and is therefore often faster, especially when repeating a copy. If you do not have this command on your laptop, please consider installing it using Homebrew or another package manager (see Other Research Tools). A typical rsync command looks like this:
>> rsync --progress <username>@tempo.caltech.edu:<dir-path>/file01 <local-path>
The flag `--progress` displays the progress of the file transfer.
A shell script to back up a large amount of data and files:
#!/usr/bin/env bash
Backup_dir=./
# please modify the username; Rmt means remote
RmtUser=ilu
# please modify the cluster that you would like to copy data from
RmtIP=tempo.caltech.edu
# please modify the directory path that you would like to copy from the remote cluster
RmtPath="/global/cscratch1/sd/ilu/charged-defect-dojo"
# please modify the directory path where you are going to back up your data
# here I have a disk called ThousandSunny-Drive
LocPath="/Volumes/ThousandSunny-Drive/NERSC/${Backup_dir}"
# path to your rsync (e.g., a Homebrew install); adjust if needed
RSYNC=/usr/local/bin/rsync
# -a: archive mode (recursive, keeps permissions and timestamps); -v: verbose
# -z: compress during transfer; -P: same as --partial --progress
# If you would like to synchronize several directories listed (one per line)
# in dirs-list-wdir.dat, here is a simple example
##while read -r dir; do
##  "$RSYNC" -avzP "${RmtUser}@${RmtIP}:${RmtPath}/${dir}" "${LocPath}"
##done < dirs-list-wdir.dat
"$RSYNC" -avzP "${RmtUser}@${RmtIP}:${RmtPath}" "${LocPath}"