Running Perturbo on the NERSC Supercomputer (GPU)

PERTURBO module

On the NERSC supercomputer, users can run PERTURBO by loading a module. Currently, we provide the module for NERSC Perlmutter GPU.

Load the module:

# Load the perturbo module folder into the module environment
module use /global/cfs/cdirs/m2626/perturbo/

# For Perlmutter GPU
module load perturbo-3.0.0-perlmutter-gpu

# Additional environment to load
module load cudatoolkit/12.0 PrgEnv-nvidia nvidia/23.1 
module load cray-hdf5-parallel/1.12.2.3 cray-libsci/23.02.1.1

These commands can be added to your $HOME/.bashrc file to simplify the submission scripts.

The modules set up the perturbo.x and qe2pert.x executables. One can check that a module was loaded correctly by verifying the paths to executables (e.g. with the command which). For example, for the Perlmutter GPU module, the perturbo.x path is the following:

which perturbo.x
>>> /global/cfs/cdirs/m2626/perturbo/bin/3.0.0-perlmutter-gpu/perturbo.x
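
To check both executables in one step, a small helper like the one below could be used. This is a minimal sketch: the function name check_executables is hypothetical and not part of the PERTURBO module, and it relies only on the standard shell builtin command -v.

```shell
# Hypothetical helper (not part of the PERTURBO module): report whether each
# named executable can be found on the current PATH.
check_executables() {
    rc=0
    for exe in "$@"; do
        if exe_path=$(command -v "$exe"); then
            printf 'found: %s -> %s\n' "$exe" "$exe_path"
        else
            printf 'missing: %s\n' "$exe"
            rc=1
        fi
    done
    # Non-zero return status means at least one executable was not found.
    return $rc
}
```

After loading the modules, running check_executables perturbo.x qe2pert.x should report both executables as found; a non-zero return status indicates that at least one of them is missing.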

PERTURBO Slurm scripts

Here, we provide the recommended OpenACC, MPI and OpenMP settings for PERTURBO on the NERSC supercomputer, together with example submission scripts.

Note: Replace m1234_g in the submission scripts with the code of your NERSC allocation.

Perlmutter GPU

For Perlmutter GPU nodes, we recommend using 4 MPI tasks per node, each with 1 GPU card and 16 OpenMP threads. Here is a typical submission script (in this example, we use 2 Perlmutter GPU nodes):

#!/bin/bash
#SBATCH --account m1234_g
#SBATCH -N 2
#SBATCH -C gpu,hbm40g
#SBATCH -q regular
#SBATCH -J perturbo
#SBATCH --ntasks-per-node=4
#SBATCH -t 01:00:00
#SBATCH --cpus-per-task=16
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=single:1

# Load Perturbo module
module use /global/cfs/cdirs/m2626/perturbo/
module load perturbo-3.0.0-perlmutter-gpu
module unload cudatoolkit/12.2
module load cudatoolkit/12.0 PrgEnv-nvidia nvidia/23.1
module load cray-hdf5-parallel/1.12.2.3 cray-libsci/23.02.1.1
module load conda python/3.11

mpi_tasks_per_node=4
cpus_per_node=$SLURM_CPUS_ON_NODE
total_nodes=$SLURM_NNODES

total_mpi_tasks=`expr $mpi_tasks_per_node \* $total_nodes`
cpus_per_task=`expr $cpus_per_node / $mpi_tasks_per_node`

# OpenMP settings (set after cpus_per_task has been computed)
export OMP_NUM_THREADS=$cpus_per_task
export OMP_PLACES=threads
export OMP_PROC_BIND=true


# Run perturbo.x (only perturbo.x supports GPUs; qe2pert.x does not)
srun -N $total_nodes -n $total_mpi_tasks -c $cpus_per_task \
--cpu-bind=cores --gpus-per-task=1 --gpu-bind=single:1 \
perturbo.x -npools $total_mpi_tasks -i pert.in
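
The task and thread counts passed to srun in the script above follow from simple arithmetic, which the sketch below reproduces with example values. The function name layout is hypothetical. The figure of 64 CPUs per node is an illustrative assumption chosen to match the recommended 16 threads per task; in the script itself the value comes from $SLURM_CPUS_ON_NODE and may differ, for example if hardware threads are counted.

```shell
# Hypothetical helper: print "<total MPI tasks> <CPUs per task>" for a given
# number of nodes, MPI tasks per node, and CPUs per node.
layout() {
    n_nodes=$1
    n_tasks_per_node=$2
    n_cpus_per_node=$3
    n_total_tasks=$((n_tasks_per_node * n_nodes))
    n_cpus_per_task=$((n_cpus_per_node / n_tasks_per_node))
    echo "$n_total_tasks $n_cpus_per_task"
}

# 2 nodes, 4 MPI tasks per node, 64 CPUs per node (assumed value)
layout 2 4 64   # prints "8 16"
```

The first number corresponds to the value given to both -n and -npools, and the second to the value given to -c and OMP_NUM_THREADS.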

Installation from scratch

In this section, we demonstrate how to compile the GPU version of QE/Perturbo on the NERSC Perlmutter supercomputer. Users who only need to run Perturbo on Perlmutter GPUs can instead load the module described above.

Building Perturbo on NERSC Perlmutter is somewhat fussy because the NVIDIA compiler toolchain is itself a bit fussy. As of this writing (May 2024), the nvfortran compiler is the only compiler we are aware of that can use both OpenMP and OpenACC at the same time, with OpenMP generating parallel CPU code, and OpenACC generating NVIDIA GPU code. At the same time, Perturbo invariably seems to expose nvfortran compiler bugs and other dependency issues. Thus, moving to a new version of NVHPC is usually a challenge.

These instructions have been verified to work with NVHPC Toolkit 23.1. We have identified an nvfortran compiler bug in version 23.9 that prevents us from using that version of the Toolkit. After detailed testing, versions 23.1 and below work correctly, while the bug is present in versions 23.7 through 24.5 and was fixed in 24.7.
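
The version ranges above can be encoded in a small check, for instance before starting a build. This is only a sketch: the function name nvhpc_status is hypothetical, it assumes version strings of the form YY.M without leading zeros (such as 23.1 or 23.11), and it assumes that releases after 24.7 keep the fix. Versions that the text does not mention (such as 23.3 or 23.5) are reported as unknown rather than guessed.

```shell
# Hypothetical helper: classify an NVHPC version string "YY.M" according to
# the ranges reported in this section.
nvhpc_status() {
    ver_major=${1%%.*}
    ver_minor=${1#*.}
    # Encode YY.M as a single integer so that versions compare numerically.
    ver_code=$((ver_major * 100 + ver_minor))
    if [ "$ver_code" -le 2301 ]; then
        echo "ok"          # 23.1 and below
    elif [ "$ver_code" -ge 2307 ] && [ "$ver_code" -le 2405 ]; then
        echo "affected"    # 23.7 through 24.5
    elif [ "$ver_code" -ge 2407 ]; then
        echo "fixed"       # 24.7 and later (assumes no regression)
    else
        echo "unknown"     # not covered by the text, e.g. 23.3 or 23.5
    fi
}
```

For example, nvhpc_status 23.1 reports ok, while nvhpc_status 23.9 reports affected.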

NOTE: Currently, NERSC doesn’t keep up with the absolute latest versions of the NVIDIA HPC Toolkit (NVHPC) on Perlmutter. The reason is that NERSC also tries to provide a CUDA-aware MPI implementation usable across multiple compilers; upgrading the NVHPC Toolkit requires the rebuilding of many other dependencies on the cluster.

NOTE: The NVIDIA compilers are fussy in general, but the fussiness is exacerbated by using OpenBLAS. The NVIDIA HPC Toolkit includes GPU-aware libraries like cublas, and using the NVIDIA-provided libraries seems to produce a much more stable program.

Step 1: Modules to load for compilation

Make sure to switch to NVIDIA HPC Toolkit version 23.1 (as of this writing, December 2024). Because PrgEnv-nvhpc will be deprecated starting with CPE/24.11 due to conflicts with cudatoolkit (see here for details), we use the PrgEnv-nvidia toolchain instead.

module load cudatoolkit/12.0 PrgEnv-nvidia nvidia/23.1 
module load cray-hdf5-parallel/1.12.2.3 cray-libsci/23.02.1.1

The details of what modules are loaded may vary depending on what you are doing, but this is what the output of module list should roughly look like after everything is configured properly:

Currently Loaded Modules:
  1) craype-x86-milan
  2) libfabric/1.20.1
  3) craype-network-ofi
  4) xpmem/2.6.2-2.5_2.40__gd067c3f.shasta
  5) perftools-base/23.12.0 
  6) cpe/23.12
  7) craype-accel-nvidia80
  8) gpu/1.0
  9) sqs/2.0 
 10) darshan/3.4.4
 11) cudatoolkit/12.0    (g)
 12) craype/2.7.30       (c)
 13) cray-dsmml/0.2.2
 14) PrgEnv-nvidia/8.5.0 (prgenv)
 15) nvidia/23.1                      (g,c)
 16) cray-mpich/8.1.28                (mpi)
 17) cray-hdf5-parallel/1.12.2.3
 18) cray-libsci/23.02.1.1
 19) conda/Miniconda3-py311_23.11.0-2
 20) evp-patch
 21) python/3.11

  Where:
   mpi:   MPI Providers
   math:  Mathematical libraries
   io:    Input/output software
   c:     Compiler
   dev:   Development Tools and Programming Languages
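
Rather than comparing the listing by eye, the required modules can be checked with a short helper. The sketch below is an assumption-laden example: the function name missing_modules is hypothetical, and it does a plain substring match on whatever listing it reads from standard input, so a name such as nvidia/23.1 would also match nvidia/23.11.

```shell
# Hypothetical helper: read a module listing on standard input and print each
# required module (given as arguments) that does not appear in it.
missing_modules() {
    listing=$(cat)
    for mod in "$@"; do
        case "$listing" in
            *"$mod"*) ;;                    # found somewhere in the listing
            *) printf '%s\n' "$mod" ;;      # not found: report it
        esac
    done
}
```

A possible use is module list 2>&1 | missing_modules cudatoolkit/12.0 PrgEnv-nvidia nvidia/23.1 cray-hdf5-parallel/1.12.2.3 cray-libsci/23.02.1.1, which prints nothing when all five are loaded. The 2>&1 redirection assumes that the module command writes its listing to standard error, which is common but should be verified on your system.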

Step 2: Download Quantum ESPRESSO 7.3.1

Quantum ESPRESSO (QE) has a generally straightforward build process.

First, clone the QE repository and switch to version 7.3.1.

# Go to the home directory to download Quantum ESPRESSO
cd ~
git clone https://gitlab.com/QEF/q-e.git

# Switch to the version of Quantum ESPRESSO tagged as 7.3.1
cd q-e
git checkout qe-7.3.1

Step 3: Configure and build Quantum ESPRESSO 7.3.1

Next, configure QE to build with the NVIDIA HPC compilers, using MPI, OpenMP and OpenACC. (Examples are given later of how to disable various technologies.) NERSC strongly recommends the use of their compiler-wrapper programs, to ensure that all essential parameters are passed to the compilers.

NOTE: Specifying optimization levels above -O2 or -fast causes nondeterminism bugs in Perturbo! Thus, we strongly recommend sticking with -fast for optimized builds using the NVIDIA compilers.

# Configure Quantum ESPRESSO with compiler switches required by Perturbo.
# Note that this is one single command wrapped across multiple lines.
FC=ftn F90=ftn MPIF90=ftn CC=cc \
  FFLAGS="-fast -Mlarge_arrays -mcmodel=medium" CFLAGS="-fast -mcmodel=medium" \
  LDFLAGS="-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/math_libs/lib64" \
  BLAS_LIBS="-lblas" LAPACK_LIBS="-llapack" \
  ./configure --enable-parallel=yes --enable-openmp=yes \
  --with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda \
  --with-cuda-cc=80 --with-cuda-runtime=12.0

Finally, build the necessary parts of Quantum ESPRESSO 7.3.1.

NOTE: Parallel compilation with -j (e.g. make pw ph pp w90 -j) sometimes breaks the build, so we build serially here, which takes some time. Go get a cup of coffee and read a nice paper about condensed matter physics while you wait.

# Build the necessary parts of QE 7.3.1.
make clean
make pw ph pp w90

NOTE: If you see the C preprocessor cpp explicitly invoked during the Quantum ESPRESSO build process then something has gone wrong with the QE configure step. The nvfortran compiler has a built-in preprocessor, and QE knows to use this if it detects the NVHPC compiler.
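
One way to look for this symptom is to save the build output to a file and search it afterwards. The helper below is a sketch under stated assumptions: the function name log_invokes_cpp and the log file name build.log are hypothetical, and the search pattern is only a guess at how a standalone cpp invocation would appear in the QE build output (the word cpp at the start of a line, or after a space or slash, followed by a space).

```shell
# Hypothetical helper: succeed if the given build log contains a line that
# appears to invoke a standalone "cpp" command. Compiler options such as
# "-cpp" and file names such as "foo.cpp" are deliberately not matched.
log_invokes_cpp() {
    grep -Eq '(^|[[:space:]/])cpp[[:space:]]' "$1"
}
```

For example, the build could be run as make pw ph pp w90 2>&1 | tee build.log, followed by log_invokes_cpp build.log && echo "cpp was invoked; recheck the configure step".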

OPTIONAL: Run QE 7.3.1 Tests

Once the Quantum ESPRESSO build process finishes, you may want to run the QE 7.3.1 test suite. However, you should note that it’s common for at least a few tests to fail, so this may be of questionable value. It can be useful for seeing whether the QE 7.3.1 build fails catastrophically; if many tests fail, then the QE 7.3.1 build is probably not reliable.

cd test-suite
make run-tests
# [lots of test output!]
cd ..

Step 4: Download, configure, and build Perturbo

Next, clone the Perturbo repository as a subdirectory of the Quantum ESPRESSO codebase.

# Clone Perturbo's codebase inside Quantum ESPRESSO.
# NOTE:  Still in the q-e directory!!
git clone git@github.com:perturbo-code/perturbo.git

Now, there should be a perturbo subdirectory within the q-e directory.

The Perturbo build process must be configured by putting a make.sys file in the perturbo directory; this file includes Perturbo-specific build settings. Multiple example files are included in the perturbo/config directory. The perturbo/config/make_nersc_nvhpc.sys file is provided specifically for NERSC Perlmutter, and can simply be copied to the path perturbo/make.sys.

# NOTE:  Start in the ~/q-e directory!!
cd perturbo
cp config/make_nersc_nvhpc.sys ./make.sys

With make.sys in place, building Perturbo should be straightforward.

# The "make clean" is just in case any build artifacts are hanging around.
make clean
make

NOTE: Certain flags must be passed to the NVIDIA compilers to ensure that Perturbo will run correctly. If you see bad behavior from Perturbo, a good first step is to rebuild it from scratch (make clean && make), and ensure that the necessary flags are present. These are as follows:

-Mpreprocess to enable the C preprocessor in nvfortran, which is off by default

-Mlarge_arrays -mcmodel=medium to allow for arrays and allocations larger than 2GB in size (see here for more details)

-fast (or -O2) should be the maximum optimization level; -O3 or -O4 will generate spurious results

If the above instructions are carefully followed, these flags will be picked up from the Quantum ESPRESSO configuration step.
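
To confirm that the flags were picked up, the generated build configuration can be searched for them. The sketch below assumes that the Quantum ESPRESSO configure step wrote its settings to make.inc in the q-e directory; the function name missing_flags is hypothetical, and the check is a plain text search that does not verify which variable a flag belongs to.

```shell
# Hypothetical helper: print each flag (arguments after the file name) that
# does not occur anywhere in the given file.
missing_flags() {
    cfg_file=$1
    shift
    for flag in "$@"; do
        # -F: match a fixed string; -e: treat the flag as the pattern even
        # though it starts with a dash.
        grep -qF -e "$flag" "$cfg_file" || printf '%s\n' "$flag"
    done
}
```

From the q-e directory, missing_flags make.inc -Mlarge_arrays -mcmodel=medium -fast prints nothing if all three flags are present, and otherwise lists the ones that were not found.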