TensorFlow with GPU

Python

Virtual Environment

To use TensorFlow with GPU support, first create a virtual environment. A virtual environment can be reused, so you typically need to do this only once.

# start an interactive session
srun --pty bash --login

# load the python module
module load python/booth/3.8

# create a new virtual environment
virtualenv --system-site-packages -p python3 ~/venv/tensorflow-gpu

# activate the virtual environment
source ~/venv/tensorflow-gpu/bin/activate

# upgrade pip (inside the venv)
pip install --upgrade pip

# install or upgrade tensorflow (inside the venv)
# since TensorFlow 2.1 this package includes GPU support; the separate
# tensorflow-gpu package is deprecated and no longer needs to be installed
pip install --upgrade tensorflow

# when finished, leave the venv
deactivate

# log out of the compute node
exit
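
To confirm that a later session is really using the virtual environment rather than the module's own Python, a small check like the following can help. This is a minimal sketch, not part of the cluster's tooling: the function name `in_virtualenv` is hypothetical, and the check relies only on the standard `sys.prefix` and `sys.base_prefix` attributes, with a fallback to `sys.real_prefix`, which older virtualenv releases set.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and current virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Run it after activating the environment; the interpreter path should point into ~/venv/tensorflow-gpu.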

To verify that TensorFlow works and uses the GPU, create a simple example file and run it on a GPU node.

example.py
import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# use this setting to find out which devices your operations and tensors are assigned to
tf.debugging.set_log_device_placement(True)

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

print(c)

# request a node with one GPU in interactive mode
srun --partition=gpu_h100 --gres=gpu:1 --pty bash --login

# load the python and cuda modules
module load python/booth/3.8
module load cuda/12.2

# activate the virtual environment created in the previous step
source ~/venv/tensorflow-gpu/bin/activate

# run the example script
python example.py

# leave the venv
deactivate

# log out of the node
exit
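
If the example reports zero GPUs, it can help to check what the job was actually given before suspecting TensorFlow. The sketch below assumes that Slurm exports `CUDA_VISIBLE_DEVICES` for jobs that request `--gres=gpu:N`, which is its usual behavior but depends on how the cluster is configured. The helper name `visible_gpus` is hypothetical.

```python
import os


def visible_gpus(environ=None):
    """Return the device entries listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (no restriction was exported),
    and an empty list when it is set but empty (all GPUs hidden).
    """
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [entry.strip() for entry in value.split(",") if entry.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("CUDA_VISIBLE_DEVICES is not set")
    else:
        print(f"{len(gpus)} GPU(s) visible to this job: {gpus}")
```

An empty list on a GPU node suggests the session was started without a GPU request, so check the srun options before reinstalling anything.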

Finally, to submit the example file as a batch job, create a submit script and submit it using the sbatch command.

submit.sh
#!/bin/bash

#---------------------------------------------------------------------------------
# Accounting information

#SBATCH --account=phd

#---------------------------------------------------------------------------------
# Resources requested

#SBATCH --partition=gpu_h100
#SBATCH --gres=gpu:1

#---------------------------------------------------------------------------------
# Job specific name (helps organize and track progress of jobs)

#SBATCH --job-name="tf-gpu_example"

#---------------------------------------------------------------------------------
# Commands to execute

# load the python and cuda modules
module load python/booth/3.8
module load cuda/12.2

# activate the virtual environment
source ~/venv/tensorflow-gpu/bin/activate

# run the example script
srun python3 example.py

Submit the job by typing: sbatch submit.sh
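
When submissions are scripted, it is often useful to capture the job ID that sbatch prints. The following is a rough sketch under stated assumptions: it assumes sbatch's standard confirmation line, "Submitted batch job <id>", and the function names `parse_job_id` and `submit` are hypothetical helpers, not part of Slurm.

```python
import re
import shutil
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's confirmation line, or None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script="submit.sh"):
    """Run sbatch on the given script and return the job ID, or None."""
    if shutil.which("sbatch") is None:
        return None  # sbatch is not on PATH, so this is not a Slurm host
    result = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

The returned ID can be passed to squeue -j to follow the job. Unless the submit script sets another location, Slurm writes the job's output to slurm-<id>.out in the directory the job was submitted from.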