TensorFlow with GPU
Python
Singularity Container (recommended)
Users can run TensorFlow with GPU support from the provided Singularity containers. We encourage you to try these before installing your own version of TensorFlow.
# request a node with one GPU in interactive mode
mfe01 ~ $ srun --partition=gpu_h100 --gres=gpu:1 --pty bash --login
# load CUDA environment
mgpu01 ~ $ module load cuda/12.2
# set the container name
mgpu01 ~ $ container=/apps/containers/tensorflow-gpu/tensorflow-2.4.3-gpu-jupyter.sif
# launch the container environment interactively
mgpu01 ~ $ singularity run --nv ${container} bash
# alternatively, run a python script directly and exit container
mgpu01 ~ $ singularity run --nv ${container} python myscript.py
Singularity containers can also be used in batch jobs by creating a submit script similar to the following.
#!/bin/bash
#SBATCH --partition=gpu_h100
#SBATCH --gres=gpu:1
# load CUDA environment
module load cuda/12.2
# set the container name
container=/apps/containers/tensorflow-gpu/tensorflow-2.4.3-gpu-jupyter.sif
# run Python script inside the container
srun singularity run --nv ${container} python3 myscript.py
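As a quick sanity check, a script run this way can begin by reporting the job environment it sees. The following is a minimal sketch, not a site-provided tool: the function name `job_environment` is hypothetical, and it assumes Slurm exports `SLURM_JOB_ID`, `SLURM_JOB_PARTITION`, and `CUDA_VISIBLE_DEVICES` into the job, which depends on how the cluster is configured.

```python
import os

# Variables Slurm typically exports inside a job (an assumption; not confirmed for this cluster)
VARS = ("SLURM_JOB_ID", "SLURM_JOB_PARTITION", "CUDA_VISIBLE_DEVICES")


def job_environment(environ=os.environ):
    """Return the selected variables, using '<unset>' for any that are missing."""
    return {name: environ.get(name, "<unset>") for name in VARS}


# print one NAME=value line per variable
for name, value in job_environment().items():
    print(f"{name}={value}")
```

If `CUDA_VISIBLE_DEVICES` is reported as unset, the job may not have been allocated a GPU, so it is worth checking the `--gres` request first.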
Virtual Environment
To use TensorFlow with GPU support outside of a container, first create a virtual environment. Virtual environments are reusable, so you typically only need to do this once.
# start an interactive session
srun --pty bash --login
# load the python module
module load python/booth/3.12
# create a new virtual environment
virtualenv --system-site-packages -p python3 ~/venv/tensorflow-gpu
# activate the virtual environment
source ~/venv/tensorflow-gpu/bin/activate
# upgrade pip (inside the venv)
pip install --upgrade pip
# install or upgrade tensorflow (inside the venv)
# note: since TensorFlow 2.1 the tensorflow package includes GPU support;
# the separate tensorflow-gpu package is deprecated and should not be installed
pip install --upgrade tensorflow
# when finished, leave the venv
deactivate
# log out of the compute node
exit
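Before running `pip install`, it can help to confirm that the virtual environment is actually active, since otherwise packages go to a different Python installation. A minimal sketch using only the standard library follows; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment,
    # while sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment created above is active, the interpreter path printed should sit under `~/venv/tensorflow-gpu`.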
To verify that TensorFlow works and is using the GPU, test the installation on a GPU node. Create a simple example file, for instance example.py, with the following contents.
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# use this setting to find out which devices your operations and tensors are assigned to
tf.debugging.set_log_device_placement(True)
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
print(c)
# request a node with one GPU in interactive mode
srun --partition=gpu_h100 --gres=gpu:1 --pty bash --login
# load the Python and CUDA modules
module load python/booth/3.12
module load cuda/12.2
# activate the virtual environment created in the previous step
source ~/venv/tensorflow-gpu/bin/activate
# run the example script
python example.py
# leave the venv
deactivate
# log out of the node
exit
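The GPU check in the example file can be made a little more defensive. The sketch below is an illustration rather than a required step: the function name `describe_gpus` is hypothetical, and enabling memory growth is an optional choice that makes TensorFlow allocate GPU memory on demand instead of reserving it all at startup. It returns an empty list when TensorFlow cannot be imported or no GPU is visible.

```python
import importlib.util


def describe_gpus():
    """Return the names of GPUs visible to TensorFlow, or [] if there are none."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment
        return []
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is present but failed to load (e.g. missing libraries)
        return []

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            # allocate GPU memory as needed rather than all at once
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError:
            # memory growth must be set before the GPU is initialized
            pass
    return [gpu.name for gpu in gpus]


names = describe_gpus()
print("GPUs visible to TensorFlow:", names if names else "none")
```

On a GPU node with the modules loaded and the virtual environment active, the list should contain at least one device; an empty list on a login node is expected.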
Finally, to submit the example file as a batch job, create a submit script and submit it using the sbatch command.
#!/bin/bash
#---------------------------------------------------------------------------------
# Accounting information
#SBATCH --account=phd
#---------------------------------------------------------------------------------
# Resources requested
#SBATCH --partition=gpu_h100
#SBATCH --gres=gpu:1
#---------------------------------------------------------------------------------
# Job specific name (helps organize and track progress of jobs)
#SBATCH --job-name="tf-gpu_example"
#---------------------------------------------------------------------------------
# Commands to execute
# load the Python and CUDA modules
module load python/booth/3.12
module load cuda/12.2
# activate the virtual environment
source ~/venv/tensorflow-gpu/bin/activate
# run the example script
srun python3 example.py
Save the script, for example as submit.sh, and submit the job by typing: sbatch submit.sh
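Submission can also be scripted when many jobs need to be launched. The sketch below is one possible approach, not a site-provided tool: the helper names `parse_job_id` and `submit` are hypothetical, and it assumes `sbatch` prints its usual confirmation line of the form "Submitted batch job <id>".

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's confirmation message, or None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script="submit.sh"):
    """Submit a script with sbatch and return the job ID (None if not found)."""
    # check=True raises CalledProcessError if sbatch exits with an error
    result = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

Calling `submit("submit.sh")` on a login node returns the job ID, which can then be passed to commands such as `squeue -j <id>` to follow the job.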