Python

Serial Python

The following Python script (hello_world.py) prints a single line of text. It is run as a serial job with the submission script submit.sh shown after it.

hello_world.py

print("Hello, World!")

submit.sh

#!/bin/bash
#SBATCH --job-name=serial_python
#SBATCH --output=serial_python.out
#SBATCH --partition=standard

# execute Python script
srun python3 hello_world.py

Data Parallelism in Python

The multiprocessing module in Python lets you distribute multiple inputs to a single function across multiple processor cores (data parallelism). This is useful when the same function must be evaluated many times, because the evaluations are shared among several worker processes.

Python script

The following example (mp.py) distributes a simple function over multiple processor cores and compares its run time with serial execution.

import os
import time
import multiprocessing as mproc

def myfunc(x):
    # 1 second delay
    time.sleep(1)
    xsq = x**2

    # get information on the current process and print to stdout
    proc = mproc.current_process()
    print('{0:2}^2 = {1:4} calculated on {2} (pid {3})'.format(x, xsq, proc.name, proc.pid), flush=True)

    return xsq


#------------------------------------------------------------------------------
# USE A TIMER FOR TESTING SERIAL EXECUTION

# start timer
start = time.time()

# serial processes
result_serial = [myfunc(i) for i in range(8)]

# stop timer and calculate elapsed time
stop = time.time()
time_elapsed = stop - start
print('\n Serial job completed in {:.2f} seconds\n'.format(time_elapsed))

#------------------------------------------------------------------------------
# USE A TIMER FOR TESTING PARALLEL EXECUTION

# get the number of processors directly from the submit.sh script
pool_size = int(os.environ['SLURM_JOB_CPUS_PER_NODE'])

# start timer
start = time.time()

# data parallel execution -- "with" statement closes pool when done
with mproc.Pool(processes=pool_size) as pool:
    result_parallel = pool.map(myfunc, range(8))

# stop timer and calculate elapsed time
stop = time.time()
time_elapsed = stop - start
print('\n Parallel job ({} CPUs) completed in {:.2f} seconds\n'.format(pool_size, time_elapsed))

#------------------------------------------------------------------------------
# VERIFY THAT SERIAL AND PARALLEL METHODS PRODUCE SAME RESULTS

# check that serial and parallel results are equivalent
if result_serial == result_parallel:
    print('\n Results from serial and parallel calculation are equal!\n')
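The script above reads the pool size from SLURM_JOB_CPUS_PER_NODE, which raises a KeyError when the script is run outside a Slurm job. On multi-node allocations Slurm also reports this variable in a form such as "4(x2)", which int() cannot parse. The following is a minimal sketch of a more forgiving lookup, not part of the original example. The helper name get_pool_size and the fallback order are assumptions made for illustration; SLURM_CPUS_PER_TASK is the variable Slurm sets when --cpus-per-task is requested.

```python
import os
import re


def get_pool_size(env=None):
    """Guess a worker count from Slurm variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # set by Slurm when the job requests --cpus-per-task
    value = env.get('SLURM_CPUS_PER_TASK', '')
    if value.isdigit() and int(value) > 0:
        return int(value)

    # may look like "4" or "4(x2)"; keep only the leading integer
    match = re.match(r'\d+', env.get('SLURM_JOB_CPUS_PER_NODE', ''))
    if match and int(match.group()) > 0:
        return int(match.group())

    # assumption: outside Slurm, fall back to the local CPU count
    return os.cpu_count() or 1
```

With a helper like this, the line that sets pool_size in mp.py could be written as pool_size = get_pool_size(), which also lets the script run on a workstation for testing.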

Submission script

To submit the job for parallel execution on Mercury, use a submission script similar to the following:

 1  #!/bin/bash
 2
 3  #SBATCH --cpus-per-task=4      # launch 4 CPUs (cores) per task
 4  #SBATCH --job-name=smp-py      # name of job
 5  #SBATCH --partition=standard   # assign the job to a specific queue
 6  #SBATCH --output=smp-py.log    # join the output and error files
 7
 8  # load desired python version (check availability with 'module avail')
 9  module load python/booth/3.6/3.6.12
10
11  # execute python script
12  srun python3 mp.py

In submit.sh, line #3 requests four CPU cores for the task; this job has only one task. You can scale up to the Max Cores value listed in the Partitions and Limits table, but we generally recommend 12 as a practical upper bound. Parallel efficiency drops as the core count increases, so a high core count should be justified by profiling the parallel efficiency of your code. Jobs that request many cores also tend to wait longer in the queue until resources become available. The name of the job on line #4 is user-specified and should be descriptive, subject to the following rule:

The name may be any arbitrary alphanumeric ASCII string, but
may not contain  "\n", "\t", "\r", "/", ":", "@", "\", "*",
or "?".
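As an illustration of the rule above, a small helper could check a candidate job name before the script is submitted. This is only a sketch. The function name is_valid_job_name is hypothetical, and it assumes that "ASCII string" means every character has a code below 128 and that an empty name is not acceptable. The rule does not state either point explicitly, and Slurm itself may apply different checks.

```python
# characters the rule above lists as not allowed in a job name
FORBIDDEN_CHARS = set('\n\t\r/:@\\*?')


def is_valid_job_name(name):
    """Return True if name passes the rule quoted above (hypothetical helper)."""
    # assumption: an empty name is treated as invalid
    if not name:
        return False
    # assumption: "ASCII" means every character code is below 128
    if any(ord(ch) > 127 for ch in name):
        return False
    return not any(ch in FORBIDDEN_CHARS for ch in name)


print(is_valid_job_name('smp-py'))     # job name used in submit.sh
print(is_valid_job_name('run:01'))     # contains ":"
```

The first call returns True and the second returns False, because ":" is one of the listed characters.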

Line #5 assigns the job to a specific queue (for more info, see Queues and Limits). Line #9 loads the requested software into the user’s environment. More information on modules can be found in Software Modules.

Job Submission

To submit the job from a login node on Mercury, type sbatch submit.sh at the prompt. Note that both files (mp.py and submit.sh) should be in the same directory.

Sample Output

Running this example produces the following output:

0^2 =    0 calculated on MainProcess (pid 30017)
1^2 =    1 calculated on MainProcess (pid 30017)
2^2 =    4 calculated on MainProcess (pid 30017)
3^2 =    9 calculated on MainProcess (pid 30017)
4^2 =   16 calculated on MainProcess (pid 30017)
5^2 =   25 calculated on MainProcess (pid 30017)
6^2 =   36 calculated on MainProcess (pid 30017)
7^2 =   49 calculated on MainProcess (pid 30017)

Serial job completed in 8.02 seconds


0^2 =    0 calculated on ForkPoolWorker-1 (pid 30019)
3^2 =    9 calculated on ForkPoolWorker-4 (pid 30022)
2^2 =    4 calculated on ForkPoolWorker-3 (pid 30021)
1^2 =    1 calculated on ForkPoolWorker-2 (pid 30020)
4^2 =   16 calculated on ForkPoolWorker-1 (pid 30019)
5^2 =   25 calculated on ForkPoolWorker-4 (pid 30022)
6^2 =   36 calculated on ForkPoolWorker-3 (pid 30021)
7^2 =   49 calculated on ForkPoolWorker-2 (pid 30020)

Parallel job (4 CPUs) completed in 2.10 seconds


Results from serial and parallel calculation are equal!
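The timings in the sample output can be turned into a speedup and a parallel efficiency figure, which is one simple way to judge whether a larger core count is worthwhile. The sketch below uses the usual definitions (speedup = serial time / parallel time, efficiency = speedup / number of CPUs). The function name parallel_efficiency is chosen for illustration and is not part of mp.py.

```python
def parallel_efficiency(t_serial, t_parallel, n_cpus):
    """Return (speedup, efficiency) for one serial/parallel timing pair."""
    if t_parallel <= 0 or n_cpus <= 0:
        raise ValueError('t_parallel and n_cpus must be positive')
    speedup = t_serial / t_parallel
    efficiency = speedup / n_cpus
    return speedup, efficiency


# timings taken from the sample output above (4 CPUs)
speedup, efficiency = parallel_efficiency(8.02, 2.10, 4)
print('speedup = {:.2f}x, efficiency = {:.1%}'.format(speedup, efficiency))
# prints: speedup = 3.82x, efficiency = 95.5%
```

An efficiency close to 100% is expected here because each call spends its time in time.sleep(1). Real workloads with communication or memory contention will typically show lower values as cores are added.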

GPU Python

How To

Virtual Environment

Virtual environments are a convenient way to manage Python environments. You can reuse virtual environments and modify them by installing or removing components at any time.

# start an interactive session
[mfe01] $ srun --pty bash --login

# load the python module of your choice
[mcn01] $ module load python/booth/3.6/3.6.12

# create a new virtual environment (here named "myenv")
[mcn01] $ virtualenv --system-site-packages -p python3 ~/venv/myenv

# activate the virtual environment
[mcn01] $ source ~/venv/myenv/bin/activate

# install packages from pip
[mcn01] (myenv) $ pip install <python-package>

# when finished, leave the venv
[mcn01] (myenv) $ deactivate

# log out of the compute node
[mcn01] $ exit

Python packages installed in a virtual environment are available only inside the virtual environment. A submit script to activate the virtual environment called “myenv” may look like the following:

#!/bin/bash

#SBATCH --account=<account-name>

# load the python module
module load python/booth/3.6/3.6.12

# activate the venv
source ~/venv/myenv/bin/activate

# import a package that was installed in the venv to confirm it is available
srun python3 -c "import <python-package>"
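To confirm that a job is really running inside the virtual environment, a short Python check can be run instead of, or in addition to, the import test above. The following is a sketch under stated assumptions. The helper names in_virtualenv and package_available are hypothetical, and the check relies on the common convention that older virtualenv releases set sys.real_prefix while newer releases and the venv module make sys.prefix differ from sys.base_prefix.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # older virtualenv releases set sys.real_prefix
    if hasattr(sys, 'real_prefix'):
        return True
    # venv and newer virtualenv releases change sys.prefix only
    return sys.prefix != getattr(sys, 'base_prefix', sys.prefix)


def package_available(name):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


print('virtual environment active:', in_virtualenv())
print('interpreter prefix:', sys.prefix)
```

In a submit script, printing these values to the job log makes it easier to spot a job that picked up the system Python because the activate line was skipped or failed.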