Table of contents
  1. Introduction
  2. Access
  3. Organization
  4. Job Submission
  5. Copy Data

Introduction

TEMPO is the computing cluster of the Bernardi Research Group at Caltech. It has 35 compute nodes (as of Oct. 2020).

Queues

TEMPO has several queues. The following table lists each queue with its maximum walltime and the maximum number of nodes per request:

Queue            Walltime (hours)   Max # of nodes
shortJobsQ       12                 20
longJobsQ        168                20
debugQ           1                  2
singleNodeQ      168                1
veryShortJobsQ   2                  NA

To list the queues and their limits, type qstat -q on TEMPO.
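As a rough illustration of how to read the table, the sketch below suggests a queue for a given request. The function name pick_queue is hypothetical (it is not a TEMPO or PBS command), the limits are copied from the table, walltimes are whole hours, and veryShortJobsQ is left out because its node limit is listed as NA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a queue for a job, using the limits in the queue table.
# Usage: pick_queue <walltime_hours> <nodes>
pick_queue() {
    local hours=$1 nodes=$2
    if   [ "$hours" -le 1 ]   && [ "$nodes" -le 2 ];  then echo debugQ
    elif [ "$hours" -le 12 ]  && [ "$nodes" -le 20 ]; then echo shortJobsQ
    elif [ "$hours" -le 168 ] && [ "$nodes" -le 1 ];  then echo singleNodeQ
    elif [ "$hours" -le 168 ] && [ "$nodes" -le 20 ]; then echo longJobsQ
    else
        echo "no queue fits ${hours} h on ${nodes} node(s)" >&2
        return 1
    fi
}

pick_queue 30 2   # a 30-hour job on 2 nodes
```

The last line prints longJobsQ, because 30 hours exceeds the 12-hour limit of shortJobsQ and the job needs more than one node.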

TEMPO has, in principle, 35 working nodes (each node has 10 cores and each core has 2 CPUs). If some of them are not working, please raise the issue first in our HPC Slack channel (hpc). If we cannot solve it ourselves, we contact IMSS through Caltech Help.


Access

  • log into help.caltech.edu and open a ticket to request access to tempo.

    https://i.imgur.com/wWYnZ0E.png

  • fill out the relevant details as specified below:

    • outline the reason for request details
    • attach your public SSH key with the ticket (see below to get one)
    • provide one or two preferred username options (you can write this information in “request details”)
    • specify which shell you would like to use (we normally choose bash; zsh is not available on TEMPO)
    • cc the Principal Investigator, Prof. Bernardi

Generate an SSH Key…

An SSH key is a credential used by the secure shell (SSH) protocol to log in to remote servers from a local machine. If you are not familiar with SSH keys, please search for “how to generate an SSH key” to find out how to get one.

  • an SSH key pair can be generated with the command ssh-keygen on Linux, macOS, and Windows (-t rsa sets the key type explicitly, so that the file names below apply regardless of your ssh-keygen default).
    ssh-keygen -t rsa


    or with a name

    ssh-keygen -t rsa -f ~/.ssh/tempo

  • change into the directory ~/.ssh, where you will see two files: the private key id_rsa (or tempo) and the public key id_rsa.pub (or tempo.pub). Attach id_rsa.pub (or tempo.pub) to the TEMPO access request ticket, but never share your private key with anyone.

When your TEMPO account is approved…

  • We suggest you edit your ssh configuration file (~/.ssh/config). If the file does not exist, just create it. Add the lines below,
    Host tempo
    HostName tempo.caltech.edu
    User <your username>
    IdentityFile ~/.ssh/tempo 
    

    or a more complete one, according to your preference

    Host tempo
    HostName tempo.caltech.edu
    User <your username>
    IdentityFile ~/.ssh/tempo 
    ForwardX11 yes
    ForwardX11Timeout 0
    ServerAliveInterval 60
    ForwardAgent yes
    

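Once the request has been approved, open a terminal and type ssh tempo (tempo is the Host alias defined in ~/.ssh/config above; without that file, use ssh <username>@tempo.caltech.edu). You are all set. Now you can set up your account and start running calculations.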
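If you prefer to script the configuration step, here is a minimal sketch. The function name add_tempo_host and its arguments are hypothetical; it appends the short Host block shown above to a config file, but only if the file has no Host tempo entry yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a "Host tempo" block to an ssh config file
# unless one is already there.
# Usage: add_tempo_host <username> [config_file]
add_tempo_host() {
    local user=$1
    local config=${2:-$HOME/.ssh/config}

    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"          # keep the config readable and writable by you only

    # Do nothing if an entry for this alias already exists.
    if grep -qE '^[[:space:]]*Host[[:space:]]+tempo[[:space:]]*$' "$config"; then
        echo "Host tempo already present in $config"
        return 0
    fi

    cat >> "$config" <<EOF

Host tempo
    HostName tempo.caltech.edu
    User $user
    IdentityFile ~/.ssh/tempo
EOF
}

# Example (commented out so that nothing is modified by accident):
# add_tempo_host louis
```

Running the function twice leaves a single entry, so it is safe to keep in a setup script.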

Troubleshooting

If you get the following error when you try to log in to TEMPO over ssh,

Unable to negotiate with 131.215.148.172 port 22: no matching host key type found. Their offer: ssh-rsa,ssh-dss

that means your OpenSSH version is not compatible with the one on TEMPO, because the latter is too old. One solution is to follow the steps below,

  • generate DSA keys instead of RSA,
    ssh-keygen -t dsa -f ~/.ssh/tempo_new

  • contact IMSS to add this public key (tempo_new.pub) to the ~/.ssh/authorized_keys file of your TEMPO account (not on your own computer).
  • in your local ssh configuration file ~/.ssh/config, point IdentityFile at the new key and add the two lines HostKeyAlgorithms and PubkeyAcceptedKeyTypes
    Host tempo
    HostName tempo.caltech.edu
    User <your username>
    IdentityFile ~/.ssh/tempo_new
    HostKeyAlgorithms +ssh-dss
    PubkeyAcceptedKeyTypes +ssh-dss
    ForwardX11 yes
    ForwardX11Timeout 0
    ServerAliveInterval 60
    ForwardAgent yes
    
  • then try to log in to TEMPO (ssh tempo); everything should work. If you instead get an error like,
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that a host key has just been changed.
    The fingerprint for the DSA key sent by the remote host is
    SHA256:IO46356sdfgsf234r3dsgsgfsdgtf554.
    Please contact your system administrator.
    Add correct host key in /Users/yourcomputer/.ssh/known_hosts to get rid of this message.
    Offending RSA key in /Users/yourcomputer/.ssh/known_hosts:20
    Host key for tempo.caltech.edu has changed and you have requested strict checking.
    Host key verification failed.
    

    Open ~/.ssh/known_hosts and delete the line containing ‘tempo’ (the message above reports its line number, here 20).

  • try again (ssh tempo); everything should be fine now.
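The known_hosts cleanup in the steps above can also be done from the command line. OpenSSH provides ssh-keygen -R tempo.caltech.edu for this, which also handles hashed host names. The sketch below does the same thing by hand for plain-text entries; the function name forget_host is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop every known_hosts line that mentions a host,
# keeping a backup copy next to the original file.
# Usage: forget_host <host> [known_hosts_file]
forget_host() {
    local pattern=$1
    local file=${2:-$HOME/.ssh/known_hosts}

    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    cp "$file" "$file.bak"
    # grep -v keeps the lines that do NOT match; -F treats the host as plain text.
    # grep exits with status 1 when no line is left, which is fine here.
    grep -vF "$pattern" "$file.bak" > "$file" || true
}

# Example (commented out so that nothing is modified by accident):
# forget_host tempo.caltech.edu
```

The next ssh login will then ask you to confirm the new host key.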

Organization

Home directory

When you log into TEMPO (type ssh user@tempo.caltech.edu in your terminal), you are in your home directory. For example, my username is louis, so my home directory is /home/louis.

Other users’ directories and the shared directories

Go up one level from your home directory (cd ..) to see the directories of all other users and the shared directories. Among them, the directory timescale contains all the shared data, files, and software.

If you want to share data and files on TEMPO, please put them in the directory timescale. Make sure that you modify the permissions of your files and data (e.g., by chmod 777 <file>) so that others can edit or copy them.

In the directory timescale, we also have some shared libraries and software in the subdirectory comsoftware. You can add the corresponding software paths to the .bash_profile in your home directory. For example, add export PATH="/home/timescale/comsoftware/htop-2.1.0:$PATH" to your .bash_profile and run source ~/.bash_profile. You can then use htop to check the status of the CPUs on the login node.
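If you add several shared tools this way, it is easy to end up with duplicate lines in .bash_profile. Here is a minimal sketch that appends the export line only once; the function name add_to_profile is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a directory to PATH in a profile file, once.
# Usage: add_to_profile <directory> [profile_file]
add_to_profile() {
    local dir=$1
    local profile=${2:-$HOME/.bash_profile}
    # \$PATH stays literal, so it is expanded when the profile is sourced.
    local line="export PATH=\"$dir:\$PATH\""

    touch "$profile"
    # -x matches whole lines, -F disables regex interpretation of the path.
    if ! grep -qxF "$line" "$profile"; then
        echo "$line" >> "$profile"
    fi
}

# Example (commented out so that nothing is modified by accident):
# add_to_profile /home/timescale/comsoftware/htop-2.1.0 && source ~/.bash_profile
```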

If you don’t want to use the common software or libraries and would rather build your own in your home directory, please see the Compile Codes section.

A few rules

Each member can use at most 500 GB in their home directory. To check the disk usage of a directory, use the command du -sh <target-directory>, which prints the total size of the directory in human-readable units.

More than 500 GB? Please delete some of your data or move it from TEMPO to an external hard disk (don’t have one? Please ask Marco about purchasing one). The section Copy Data explains how to copy your data to your laptop or hard disk.
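To check the 500 GB rule without reading the du output by eye, something like the sketch below can help. The function name check_quota is hypothetical, and the 500 GB default simply mirrors the rule above; sizes are rounded down to whole GB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the size of a directory with a limit in GB.
# Usage: check_quota [directory] [limit_gb]
check_quota() {
    local dir=${1:-$HOME}
    local limit_gb=${2:-500}
    local used_kb limit_kb

    # du -sk prints the total size in 1024-byte blocks, followed by the path.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    limit_kb=$((limit_gb * 1024 * 1024))

    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "OVER: $dir uses $((used_kb / 1024 / 1024)) GB (limit ${limit_gb} GB)"
        return 1
    fi
    echo "OK: $dir uses $((used_kb / 1024 / 1024)) GB of ${limit_gb} GB"
}

# Example (commented out because du on a large home directory can take a while):
# check_quota "$HOME" 500 || echo "Please move some data off TEMPO."
```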


Job Submission

Running calculations on the login node is not good practice, since it is shared by all users and becomes significantly slower when an expensive calculation is running on it.

Calculations are performed by submitting jobs through the queue system. TEMPO uses the PBS queue system to manage jobs. Each job is submitted to one of the queues, depending on the walltime and resources it requires.

 Please do not run large calculations on the login node!!!

Script example (name of your job/ # of nodes and cores)

A sample job script (here, e.g., job_script.pbs) using PBS is shown below:

#!/bin/bash
#PBS -N Job_Name                 # Specify job name. Will be displayed in the queue
#PBS -j oe                       # Join standard output and standard error into one file
#PBS -l walltime=30:00:00        # Max. job walltime. The job is terminated once it reaches this walltime.
#PBS -V                          # Export your current environment variables to the job
#PBS -l nodes=2:ppn=20           # Number of nodes and processors per node
#PBS -q queue_type               # Queue you want to submit your job to

# Specify OMP parallelization
# (MPI processes) x (OpenMP threads) should not exceed nodes x ppn = 40
export OMP_NUM_THREADS=1         # Number of threads per MPI process

# change into the directory from which the job was submitted
cd "$PBS_O_WORKDIR"

# Run the job using the mpirun command
mpirun -np 40 myexecutable.x -npools 10 -i input_file.in > input_file.out

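To make efficient use of the nodes, please tune the number of MPI processes and the number of OpenMP threads in your job script. Their product should match the number of processors you request (nodes × ppn); which split is fastest depends on the code, so it is worth testing a few combinations.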
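As a small aid for choosing these numbers, the sketch below lists the MPI process counts that fill the nodes=2:ppn=20 request of the sample script for several thread counts. The function name mpi_procs is hypothetical; it only does the arithmetic and does not tell you which combination runs fastest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of MPI processes that fills the requested
# processors when every process runs a given number of OpenMP threads.
# Usage: mpi_procs <nodes> <ppn> <omp_threads>
mpi_procs() {
    local nodes=$1 ppn=$2 threads=$3
    if [ $((ppn % threads)) -ne 0 ]; then
        echo "ppn=$ppn is not a multiple of $threads threads" >&2
        return 1
    fi
    echo $((nodes * ppn / threads))
}

# Combinations for nodes=2:ppn=20
for t in 1 2 4 5 10 20; do
    echo "OMP_NUM_THREADS=$t  ->  mpirun -np $(mpi_procs 2 20 "$t")"
done
```

Each printed pair uses all 40 requested processors, so the pairs can be benchmarked against each other on a short test run.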

Simple job commands

  • To submit your job to a queue, use qsub <job_script.pbs>.
  • To view the status of jobs, use the commands qstat and showq. For example, qstat -u louis lists the jobs of the user louis.
  • To delete a queued or running job, use qdel <jobid>. The <jobid> is shown in the output of qstat.
  • To access a compute node, type ssh followed by the node name, for example, ssh tempo001. To leave the node, type exit.

Copy Data

Several tools can transfer files and directories between two machines. Two common ones are scp and rsync.

If the data you would like to copy is a small file, for example, file01, please use the command:

>> scp <usrname>@tempo.caltech.edu:<dir-path>/file01 <local-path>

for example,
>> scp louis@tempo.caltech.edu:/home/louis/demo/file01 ./
this will copy file01 to the current directory on my laptop.

If the data you want to copy is a directory, for example, dir01, please use the command with the -r (recursive) flag:

>> scp -r <usrname>@tempo.caltech.edu:<dir-path>/dir01 <local-path>

for example,
>> scp -r louis@tempo.caltech.edu:/home/louis/demo/dir01 ./
this will copy dir01 to the current directory on my laptop.

Alternatively, rsync is a more modern and sometimes faster command for copying files to and from remote servers; for example, it skips files that are already up to date at the destination. If you do not have this command on your laptop, please consider installing it using Homebrew or another package manager (see Other Research Tools). A typical rsync command looks like this:

>> rsync --progress <usrname>@tempo.caltech.edu:<dir-path>/file01 <local-path>

The flag --progress displays the progress of the file transfer.

A shell script to easily back up a lot of data and files:

#!/usr/bin/env bash

# subdirectory of LocPath that receives the backup
Backup_dir=./

# rsync command; use a full path such as /usr/local/bin/rsync to pick a specific installation
Rsync=rsync

# please modify the username; Rmt means remote
RmtUser=ilu
# please modify the cluster that you would like to copy data from
RmtIP=tempo.caltech.edu

# please modify the directory path that you would like to copy from the remote cluster
RmtPath="/global/cscratch1/sd/ilu/charged-defect-dojo"
# please modify the directory path where you are going to back up your data
# here I have a disk called ThousandSunny-Drive
LocPath="/Volumes/ThousandSunny-Drive/NERSC/${Backup_dir}"

# If you would like to synchronize several directories, here is a simple example
# (dirs-list-wdir.dat lists one directory name per line)
##while IFS= read -r dir; do
##    "$Rsync" -avzP "${RmtUser}@${RmtIP}:${RmtPath}/${dir}" "${LocPath}"
##done < dirs-list-wdir.dat

# -a: archive mode, -v: verbose, -z: compress during transfer, -P: show progress and keep partial files
"$Rsync" -avzP "${RmtUser}@${RmtIP}:${RmtPath}" "${LocPath}"
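
The commented-out loop in the script above can be turned into a small function with a preview mode, so that you can check the transfers before starting them. This is only a sketch: the function name backup_dirs and the DRY_RUN variable are hypothetical, and the list file is assumed to contain one directory name per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rsync every directory named in a list file (one per line).
# With DRY_RUN=1 the rsync commands are printed instead of executed.
# Usage: backup_dirs <list_file> <user@host> <remote_path> <local_path>
backup_dirs() {
    local list=$1 remote=$2 rmt_path=$3 loc_path=$4
    local dir
    # "|| [ -n "$dir" ]" also handles a last line without a trailing newline.
    while IFS= read -r dir || [ -n "$dir" ]; do
        # Skip blank lines and lines starting with '#'.
        case $dir in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rsync -avzP ${remote}:${rmt_path}/${dir} ${loc_path}"
        else
            # </dev/null keeps rsync from reading the list file on stdin.
            rsync -avzP "${remote}:${rmt_path}/${dir}" "${loc_path}" < /dev/null
        fi
    done < "$list"
}

# Example: preview the transfers first, then run them for real.
# DRY_RUN=1 backup_dirs dirs-list-wdir.dat louis@tempo.caltech.edu /home/louis/demo ./backup
# backup_dirs dirs-list-wdir.dat louis@tempo.caltech.edu /home/louis/demo ./backup
```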