Mercury Architecture and Usage Limits

Partitions and Limits

Mercury is made up of compute nodes with a variety of architectures and configurations. A partition is a collection of compute nodes that share the same, or a similar, architecture and configuration. While the standard partition meets most users’ needs, we also offer specialized partitions for specific purposes: the long partition accommodates jobs that need a longer wall clock time, the highmem partition is suited to jobs requiring more than 32 GB per core, and the gpu partition should be used only for jobs that use the NVIDIA K80 GPU cards. Mercury is currently configured with the following partitions:

Partition     Nodes (Def/Max)   Cores (Def/Max)   Mem-per-CPU (Def/Max)   Wall clock (Def/Max)
standard      1 / 1             1 / 28            2GB / 32GB              4h / 7d
long          1 / 1             1 / 24            2GB / 32GB              1d / 30d
highmem       1 / 1             1 / 32            32GB / 512GB            4h / 2d
gpu           1 / 1             1 / 28            2GB / 242GB             4h / 2d
interactive   1 / 1             1 / 2             2GB / 32GB              2h / 4h
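Before submitting, it can help to compare a request against the Max values above. The bash function below is a hypothetical helper, not a Mercury tool: it hard-codes the core and per-CPU memory maxima from the table (whole GB only) and does not check wall clock time, so it will drift if the limits change. The limits reported by sinfo remain authoritative.

```bash
#!/bin/bash

# Hypothetical pre-submission check: compare a request with the Max core
# count and Max memory per CPU listed for each partition (whole GB only).
# Wall clock limits are not checked. Values are copied from the table and
# may fall out of date if the limits change.
check_request() {
  local partition=$1 cores=$2 mem_per_cpu_gb=$3
  local max_cores max_mem

  case "$partition" in
    standard)    max_cores=28; max_mem=32 ;;
    long)        max_cores=24; max_mem=32 ;;
    highmem)     max_cores=32; max_mem=512 ;;
    gpu)         max_cores=28; max_mem=242 ;;
    interactive) max_cores=2;  max_mem=32 ;;
    *) echo "unknown partition: $partition" >&2; return 2 ;;
  esac

  if (( cores > max_cores )); then
    echo "$partition: $cores cores exceeds the max of $max_cores" >&2
    return 1
  fi
  if (( mem_per_cpu_gb > max_mem )); then
    echo "$partition: ${mem_per_cpu_gb}GB per CPU exceeds the max of ${max_mem}GB" >&2
    return 1
  fi
  return 0
}
```

For example, check_request standard 28 2 succeeds, while check_request long 28 2 fails because the long partition allows at most 24 cores. The three arguments correspond to what you would request from sbatch with --partition, a core count, and --mem-per-cpu.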

To see a list of available partitions, use the sinfo command:

$ sinfo

Each user association has a concurrent service unit (SU) limit that depends on the account used to submit the job. The ‘Job Billing Factor’ column in the table below indicates how to estimate the number of SUs a given job consumes. Notice that jobs requesting large amounts of memory or GPUs, as is typical in the highmem and gpu partitions, incur a higher billing factor and thus reduce the number of jobs allowed to run concurrently.

Affiliation                      Concurrent Limit
Default (--account=basic)        2,000 SU
PhD (--account=phd)              224,000 SU
Collaborator (--account=pi-*)    224,000 SU
Faculty (--account=faculty)      400,000 SU

Partition   Partition Limit   Job Billing Factor
standard    N/A               \(\max\left\{1000\times\text{Ncpus};125\times\text{GBmem}\right\}\)
long        N/A               \(\max\left\{1000\times\text{Ncpus};125\times\text{GBmem}\right\}\)
highmem     32,000 SU         \(\max\left\{1000\times\text{Ncpus};32\times\text{GBmem}\right\}\)
gpu         32,000 SU         \(\max\left\{1000\times\text{Ncpus};125\times\text{GBmem};7000\times\text{Ngpus}\right\}\)
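To make the billing formulas concrete, the hypothetical bash function below evaluates them for a single request. It assumes GBmem is the job's total requested memory in whole GB, that CPU and GPU counts are whole numbers, and that the result is the job's charge against the concurrent SU limit; it handles only the four partitions listed in the billing table, and the scheduler's own accounting is authoritative.

```bash
#!/bin/bash

# Hypothetical helper: estimate a job's SU charge from the billing-factor
# formulas for the standard, long, highmem, and gpu partitions.
# Arguments: partition, number of CPUs, total memory in whole GB,
# and (optionally) number of GPUs.
su_estimate() {
  local partition=$1 ncpus=$2 gbmem=$3 ngpus=${4:-0}
  local mem_rate

  # SUs charged per GB of memory differ by partition
  case "$partition" in
    standard|long|gpu) mem_rate=125 ;;
    highmem)           mem_rate=32 ;;
    *) echo "no billing formula for partition: $partition" >&2; return 1 ;;
  esac

  # Take the maximum of the CPU term and the memory term
  local su=$(( 1000 * ncpus ))
  local mem_su=$(( mem_rate * gbmem ))
  if (( mem_su > su )); then su=$mem_su; fi

  # The gpu partition also takes a per-GPU term into account
  if [[ $partition == gpu ]]; then
    local gpu_su=$(( 7000 * ngpus ))
    if (( gpu_su > su )); then su=$gpu_su; fi
  fi

  echo "$su"
}
```

For example, su_estimate standard 1 2 prints 1000 (a one-core job with the default 2GB), so under these assumptions the basic account's 2,000 SU limit allows two such jobs at once. su_estimate standard 4 40 prints 5000 because the memory term dominates, and su_estimate gpu 4 16 1 prints 7000 because the GPU term dominates.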

Note

Concurrent and partition limits are subject to change based on cluster usage.

To check which accounts and QOS levels you have access to, view your associations:

$ sacctmgr show association where user=<BoothID> format=cluster,account%24,user%24,qos

Cluster                  Account                     User                  QOS
------- ------------------------ ------------------------ --------------------
mercury                      phd                <BoothID>               bronze
mercury                    basic                <BoothID>                 clay

Scratch Space

Mercury has 6 TB of shared scratch space in /scratch. This is the recommended place to store temporary job files such as transformed data, temporary logs, parallel job metadata, or intermediate output. Do not use it to store files or data unrelated to running jobs. We recommend placing your scratch files in a personal folder organized by job ID: /scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}/. Here, ${SLURM_JOB_USER} and ${SLURM_JOB_ID} are environment variables that you can read in your job script. Your code should automatically delete the job-specific files and folder upon successful completion.

Warning

You should delete all unused scratch files as soon as they are no longer needed. All scratch files will be automatically deleted 35 days after creation without notice. If scratch space fills up, the oldest files will be deleted first without notice.
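To act on this warning before the automatic cleanup does, you can list your own scratch files by age. The hypothetical function below uses find with -mtime, which tests last-modification time rather than creation time, so it only approximates the 35-day rule. It assumes your files live under a folder named after $USER, and SCRATCH_BASE is an assumed override (defaulting to /scratch) so the function can be tried elsewhere.

```bash
#!/bin/bash

# Hypothetical helper: list your scratch files whose last modification is
# more than the given number of days ago (default 30).
# SCRATCH_BASE is an assumed override; on Mercury it would be /scratch.
old_scratch_files() {
  local days=${1:-30}
  local dir="${SCRATCH_BASE:-/scratch}/${USER:?}"

  # -mtime +N matches files last modified more than N full days ago
  find "$dir" -type f -mtime +"$days" -print
}
```

Running old_scratch_files 30 lists candidates for cleanup; after reviewing the list, replacing -print with -delete in the find command removes them.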

Below is an example of creating a temporary directory in /scratch and deleting it once the job finishes.

#!/bin/bash

#SBATCH --job-name=mystatajob  # name of job
#SBATCH --partition=standard   # assign the job to the "standard" partition

# create a new scratch directory for this job
scratch_dir="/scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}"
mkdir -p "$scratch_dir"

# point Stata's temporary files at the scratch directory
export STATATMP="$scratch_dir"

# run Stata in batch mode on the do-file
dofile="choosevars.do"
/apps/bin/stataMP -b do "$PWD/$dofile"

# remove the scratch directory when done
rm -rf "$scratch_dir"
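The script above removes the scratch directory only if it reaches its last line. A variant, sketched here as a hypothetical helper function rather than an official Mercury script, uses a bash EXIT trap inside a subshell so the directory is removed whether the wrapped command succeeds or fails. SCRATCH_BASE is an assumed override (defaulting to /scratch) added so the helper can be tried outside Mercury. No trap can run if the shell is killed with SIGKILL, so files from forcibly terminated jobs may still need manual cleanup.

```bash
#!/bin/bash

# Hypothetical helper: run a command with a job-specific scratch directory
# that is removed when the command finishes, whether it succeeds or fails.
# SCRATCH_BASE is an assumed override for trying this outside Mercury.
run_with_scratch() {
  (
    # Exit the subshell with an error if the Slurm variables are missing
    scratch_dir="${SCRATCH_BASE:-/scratch}/${SLURM_JOB_USER:?}/${SLURM_JOB_ID:?}"
    mkdir -p "$scratch_dir" || exit 1

    # The EXIT trap runs when this subshell exits, after the command returns
    trap 'rm -rf "$scratch_dir"' EXIT

    # Point Stata's temporary files at the scratch directory
    export STATATMP="$scratch_dir"

    # Run the command; its exit status becomes the subshell's exit status
    "$@"
  )
}

# Usage inside a job script:
#   run_with_scratch /apps/bin/stataMP -b do "$PWD/choosevars.do"
```

Because the cleanup lives in a trap, it also runs when the command fails. If you prefer to keep scratch files from failed runs for debugging, the original script's unconditional rm at the end is the simpler choice to adapt.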