TensorFlow with GPU

Python

Singularity Container (recommended)

Users can run TensorFlow with GPU support from Singularity containers. We encourage users to try these containers before installing TensorFlow themselves.

# request a node with one GPU in interactive mode
mfe01 ~ $ srun --partition=gpu_h100 --gres=gpu:1 --pty bash --login

# load CUDA environment
mgpu01 ~ $ module load cuda/12.2

# set the container name
mgpu01 ~ $ container=/apps/containers/tensorflow-gpu/tensorflow-2.4.3-gpu-jupyter.sif

# launch the container environment interactively
mgpu01 ~ $ singularity run --nv ${container} bash

# alternatively, run a python script directly and exit container
mgpu01 ~ $ singularity run --nv ${container} python myscript.py

Singularity containers can also be used in batch jobs by creating a submit script similar to the following.

submit.sh

#!/bin/bash
#SBATCH --partition=gpu_h100
#SBATCH --gres=gpu:1

# load CUDA environment
module load cuda/12.2

# set the container name
container=/apps/containers/tensorflow-gpu/tensorflow-2.4.3-gpu-jupyter.sif

# run Python script inside the container
srun singularity run --nv ${container} python3 myscript.py
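
The commands above assume that a script named myscript.py exists in the working directory. As a minimal sketch of what such a script might contain, the hypothetical example below reports how many GPUs TensorFlow can see inside the container. The helper name count_gpus is illustrative only, and it returns None rather than failing when TensorFlow cannot be found.

```python
import importlib.util


def count_gpus():
    """Return the number of GPUs visible to TensorFlow, or None if TensorFlow is not installed."""
    # Look for the package first so the script fails softly outside the container.
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    n = count_gpus()
    if n is None:
        print("TensorFlow is not available in this environment")
    else:
        print(f"GPUs visible to TensorFlow: {n}")
```

A count of 0 inside the container usually suggests that the --nv flag was omitted or that the job was not allocated a GPU.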

Virtual Environment

To use TensorFlow with GPU support outside a container, first create a virtual environment. Virtual environments are reusable, so you typically need to do this only once.

# start an interactive session
srun --pty bash --login

# load the python module
module load python/booth/3.12

# create a new virtual environment
virtualenv --system-site-packages -p python3 ~/venv/tensorflow-gpu

# activate the virtual environment
source ~/venv/tensorflow-gpu/bin/activate

# upgrade pip (inside the venv)
pip install --upgrade pip

# install or upgrade tensorflow (inside the venv)
# note: the tensorflow package includes GPU support; the separate
# tensorflow-gpu package is deprecated and can no longer be installed
pip install --upgrade tensorflow

# when finished, leave the venv
deactivate

# log out of the compute node
exit

To verify that TensorFlow works and uses the GPU, create a simple example file and run it on a GPU node.

example.py

import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# use this setting to find out which devices your operations and tensors are assigned to
tf.debugging.set_log_device_placement(True)

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

print(c)

# request a node with one GPU in interactive mode
srun --partition=gpu_h100 --gres=gpu:1 --pty bash --login

# load the python and CUDA modules
module load python/booth/3.12
module load cuda/12.2

# activate the virtual environment created in the previous step
source ~/venv/tensorflow-gpu/bin/activate

# run the example script
python example.py

# leave the venv
deactivate

# log out of the node
exit
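
If example.py reports zero GPUs, one thing worth checking is whether the job was actually allocated a GPU. The sketch below assumes that Slurm exports CUDA_VISIBLE_DEVICES for jobs started with --gres=gpu:N, which is common but depends on how the cluster is configured. The function name allocated_gpu_ids is hypothetical.

```python
import os


def allocated_gpu_ids(env=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES as a list of strings."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # An unset or empty variable yields an empty list. This only means that no
    # device list was exported, not necessarily that the node has no GPU.
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    ids = allocated_gpu_ids()
    print(f"CUDA_VISIBLE_DEVICES lists {len(ids)} device(s): {ids}")
```

Running this inside the interactive GPU session, before activating the virtual environment, separates allocation problems from TensorFlow installation problems.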

Finally, to submit the example file as a batch job, create a submit script and submit it using the sbatch command.

submit.sh

#!/bin/bash

#---------------------------------------------------------------------------------
# Accounting information

#SBATCH --account=phd

#---------------------------------------------------------------------------------
# Resources requested

#SBATCH --partition=gpu_h100
#SBATCH --gres=gpu:1

#---------------------------------------------------------------------------------
# Job specific name (helps organize and track progress of jobs)

#SBATCH --job-name="tf-gpu_example"

#---------------------------------------------------------------------------------
# Commands to execute

# load the python and CUDA modules
module load python/booth/3.12
module load cuda/12.2

# activate the virtual environment
source ~/venv/tensorflow-gpu/bin/activate

# run the example script
srun python3 example.py

Submit the job by typing: sbatch submit.sh
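
When jobs are submitted from a Python workflow rather than by hand, the job ID can be captured for later use with tools such as squeue. The following is a minimal sketch, assuming sbatch prints its usual "Submitted batch job <id>" line on success. Both function names are hypothetical.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch output, or return None if absent."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script="submit.sh"):
    """Run sbatch on the given script and return the job ID (requires Slurm)."""
    result = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)


# Example usage on a login node (not executed here):
#   job_id = submit("submit.sh")
#   print(f"Submitted job {job_id}")
```

Keeping the parsing in its own function means it can be checked without access to a Slurm cluster.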