Mercury Architecture and Usage Limits

Partitions and Limits

Mercury is made up of compute nodes with a variety of architectures and configurations. A partition is a collection of compute nodes that share the same, or a similar, architecture and configuration. The standard partition meets most users’ needs, and specialized partitions are available for specific purposes. The long partition accommodates jobs that need a longer wall clock time. The highmem partition is for jobs that require more than 32 GB of memory per core. The gpu_h100 partition should be used only for jobs that run on the NVIDIA H100 GPUs. Mercury is currently configured with the following partitions:

Partition	Nodes	Cores	Mem-per-CPU	Wall clock
standard	Def: 1 Max: 1	Def: 1 Max: 64	Def: 2GB Max: 32GB	Def: 4h Max: 7d
long	Def: 1 Max: 1	Def: 1 Max: 24	Def: 2GB Max: 32GB	Def: 1d Max: 14d
highmem	Def: 1 Max: 1	Def: 1 Max: 32	Def: 32GB Max: 512GB	Def: 4h Max: 4d
gpu_h100	Def: 1 Max: 1	Def: 1 Max: 28	Def: 2GB Max: 242GB	Def: 4h Max: 2d
interactive	Def: 1 Max: 1	Def: 1 Max: 1	Def: 2GB Max: 64GB	Def: 2h Max: 4h
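
As a minimal sketch of how the limits in the table translate into a job request, the script header below asks for resources on the highmem partition. The resource values and the phd account are illustrative assumptions; substitute an account you belong to and values that fit your job.

```shell
#!/bin/bash

#SBATCH --account=phd           # example account; use one you have access to
#SBATCH --partition=highmem     # jobs needing more than 32 GB per core
#SBATCH --cpus-per-task=4       # within the highmem maximum of 32 cores
#SBATCH --mem-per-cpu=64G       # within the highmem maximum of 512 GB per CPU
#SBATCH --time=1-00:00:00       # 1 day, within the highmem maximum of 4 days

# Slurm sets these variables inside a running job; the fallbacks only apply
# when the script is run outside of Slurm.
echo "Partition: ${SLURM_JOB_PARTITION:-unknown}, cores: ${SLURM_CPUS_PER_TASK:-1}"
```

Any option left out falls back to the partition default shown in the table, for example 2 GB per CPU and a 4 hour wall clock on standard.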

To see a list of available partitions, use the sinfo command:

$ sinfo

Each user association has a limit on the number of service units (SUs) that can be in use concurrently, determined by the account used to submit the job. The ‘Job Billing Factor’ column in the second table below shows how to estimate the number of SUs for a given job. Jobs that bill more SUs, such as large-memory jobs or GPU jobs billed at 1,000 SU per GPU, reduce the number of jobs that can run concurrently. The highmem and gpu_h100 partitions also have their own partition limits.

Affiliation	Concurrent Limits
Default (`--account=basic`)	2,000 SU
PhD (`--account=phd`)	650,000 SU
Collaborator (`--account=pi-*`)	650,000 SU
Faculty (`--account=faculty`)	650,000 SU

Partition	Job Billing Factor	Partition Limits
standard	max{1000 x Ncpus; 125 x GBmem}	N/A
long	max{1000 x Ncpus; 125 x GBmem}	N/A
highmem	max{1000 x Ncpus; 32 x GBmem}	200,000 SU
gpu_h100	max{62.5 x Ncpus; 5.492 x GBmem; 1000 x Ngpus}	4,000 SU

Note

Concurrent and partition limits are subject to change based on cluster usage.
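
As a rough illustration of the billing factors in the table above, the sketch below estimates the SUs billed by a single job. The function name billing_su is hypothetical, and the sketch assumes that GBmem means the total memory requested by the job in GB, which the table does not state explicitly.

```shell
#!/bin/bash

# Hypothetical helper, not a Mercury or Slurm command.
# Usage: billing_su <partition> <ncpus> <total_mem_gb> [ngpus]
billing_su() {
    local partition=$1 ncpus=$2 gbmem=$3 ngpus=${4:-0}
    awk -v p="$partition" -v c="$ncpus" -v m="$gbmem" -v g="$ngpus" '
        function max(a, b) { return a > b ? a : b }
        BEGIN {
            # Factors copied from the Job Billing Factor column.
            if (p == "standard" || p == "long")
                su = max(1000 * c, 125 * m)
            else if (p == "highmem")
                su = max(1000 * c, 32 * m)
            else if (p == "gpu_h100")
                su = max(max(62.5 * c, 5.492 * m), 1000 * g)
            else {
                print "unknown partition: " p > "/dev/stderr"
                exit 1
            }
            print su
        }'
}

billing_su standard 4 16     # prints 4000  (CPU term dominates)
billing_su highmem 8 512     # prints 16384 (memory term dominates)
billing_su gpu_h100 4 32 1   # prints 1000  (GPU term dominates)
```

Under this reading, a one-core standard job with the default 2 GB bills max{1000; 250} = 1,000 SU, so the Default affiliation's 2,000 SU limit would allow two such jobs to run at the same time.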

To check which accounts and QOS levels you have access to, view your associations with sacctmgr:

$ sacctmgr show association where user=<BoothID> format=cluster,account%24,user%24,qos

Cluster                  Account                     User                  QOS
------- ------------------------ ------------------------ --------------------
mercury                      phd                <BoothID>               bronze
mercury                    basic                <BoothID>                 clay

Scratch Space

Mercury has 6 TB of shared scratch space in /scratch. This is the recommended place to store temporary job files such as transformed data, temporary logs, parallel job metadata, or intermediate output. Do not use it to store files or data unrelated to running jobs. We recommend placing your scratch files in a personal folder organized by job number: /scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}/. Here, ${SLURM_JOB_USER} and ${SLURM_JOB_ID} are environment variables that you can reference in your job script. Your code should automatically delete the job-specific files and folder upon successful completion.

Note

The scratch directory is available only on the compute nodes, not on the front-end nodes.

Warning

You should delete all unused scratch files as soon as they are no longer needed. All scratch files will be automatically deleted 35 days after creation without notice. If scratch space fills up, the oldest files will be deleted first without notice.
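
To spot files that are approaching the 35-day purge, something like the sketch below may help. The helper name old_scratch_files is hypothetical, and it uses modification time as a proxy because find cannot reliably test creation time; it also assumes your personal folder is named after $USER.

```shell
#!/bin/bash

# Hypothetical helper: list regular files under a directory that were last
# modified more than the given number of days ago (default 30).
old_scratch_files() {
    local dir=$1 days=${2:-30}
    find "$dir" -type f -mtime +"$days" -print
}

# Example, to be run on a compute node where /scratch is mounted:
#   old_scratch_files "/scratch/$USER" 30
```

Files listed this way are candidates to copy elsewhere or delete before the automatic purge removes them.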

Below is an example of creating a temporary directory in /scratch and deleting it once the job finishes.

#!/bin/bash

#SBATCH --job-name=mystatajob  # name of job
#SBATCH --partition=standard   # assign the job to the "standard" partition

# create a new scratch directory for this job
# (":?" aborts the script if either variable is unset, so the path can never
# collapse to /scratch itself)
scratch_dir="/scratch/${SLURM_JOB_USER:?}/${SLURM_JOB_ID:?}"
mkdir -p "$scratch_dir"

# point Stata at the scratch directory for its temporary files
export STATATMP="$scratch_dir"

# run the do-file in batch mode
dofile="choosevars.do"
/apps/bin/stataMP -b do "$PWD/$dofile"

# remove the scratch directory when done
rm -r "$scratch_dir"
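
The script above removes the scratch directory whether or not the Stata run succeeded. If you would rather keep the files for debugging when a job fails, a variant along the following lines may be useful. The function run_with_scratch and the SCRATCH_ROOT override are hypothetical names, and the sketch relies on the wrapped command returning a non-zero exit status on failure, which you should verify for your program.

```shell
#!/bin/bash

# Hypothetical wrapper: run a command with STATATMP pointing at a per-job
# scratch directory, and delete that directory only if the command succeeds.
run_with_scratch() {
    if [ -z "${SLURM_JOB_USER:-}" ] || [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "SLURM_JOB_USER and SLURM_JOB_ID must be set" >&2
        return 2
    fi

    # SCRATCH_ROOT is an assumed override for testing; it defaults to /scratch.
    local scratch_dir="${SCRATCH_ROOT:-/scratch}/${SLURM_JOB_USER}/${SLURM_JOB_ID}"
    mkdir -p -- "$scratch_dir" || return 1

    local status=0
    STATATMP="$scratch_dir" "$@" || status=$?

    if [ "$status" -eq 0 ]; then
        rm -r -- "$scratch_dir"
    else
        echo "command failed with status $status; keeping $scratch_dir" >&2
    fi
    return "$status"
}

# Example use inside a job script:
#   run_with_scratch /apps/bin/stataMP -b do "$PWD/choosevars.do"
```

Files kept after a failure still count against the shared scratch space, so remove them by hand once you have finished inspecting them.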