TensorFlow

From the TensorFlow GitHub page:

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working within the Machine Intelligence team at Google Brain to conduct research in machine learning and neural networks. However, the framework is versatile enough to be used in other areas as well.

TensorFlow provides stable Python and C++ APIs, as well as a non-guaranteed backward compatible API for other languages.

The easiest way to install and use TensorFlow is inside a Python virtual environment. For Intel hardware, the Python package intel-extension-for-tensorflow is also available.

Running TensorFlow with GPU acceleration on CSD3

To deploy TensorFlow with GPU support, run the following in an interactive session on a GPU node (see Interactive jobs via the scheduler for how to start one).

# 1. Load appropriate modules and environment
module purge
module load rhel8/slurm rhel8/global
module load python/3.8.11/gcc-9.4.0-yb6rzr6
module load cuda/12.1 cudnn/8.9_cuda-12.1
# Access to optional ampere modules (uncomment if needed)
#module use /usr/local/software/spack/spack-modules/a100-20210927/linux-centos8-zen2
#module use /usr/local/software/spack/spack-modules/a100-20210927/linux-centos8-zen3

# 2. Install the latest TensorFlow in a Python virtualenv
python -m venv tensorflow-ampere-env
source ./tensorflow-ampere-env/bin/activate
python -m pip install tensorflow
export CUDNN_PATH=/usr/local/Cluster-Apps/cudnn/8.9_cuda-12.1/lib64
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CUDA_PATH

# 3. Create a script to test your installation (run it from an interactive
# session via sintr, or submit it as a job to a GPU compute node)
cat << 'EOF' > helloworld.py
#!/usr/bin/env python
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(512, activation=tf.nn.relu),
          tf.keras.layers.Dropout(0.2),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
          ])
model.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
EOF
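
Before running the MNIST example, it can help to confirm that TensorFlow actually sees a GPU. The following is a minimal sketch rather than part of the official instructions: the function name visible_gpus and the file name check_gpu.py are hypothetical, and the only TensorFlow call used is tf.config.list_physical_devices.

```python
#!/usr/bin/env python
"""Report which GPUs TensorFlow can see (illustrative sketch)."""
import importlib.util


def visible_gpus():
    """Return the names of GPUs visible to TensorFlow.

    Returns an empty list if TensorFlow is not installed in the
    current environment, so the check never raises ImportError.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return []
    import tensorflow as tf

    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

Saved as check_gpu.py, it can be run with python check_gpu.py from the activated virtual environment. An empty list on a GPU node may indicate that the CUDA or cuDNN modules were not loaded before Python started.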

Running TensorFlow

TensorFlow can be run with an sbatch script similar to:

#!/bin/bash
#SBATCH --account CHANGE_TO_YOUR_ACCOUNT-GPU
#! Update the next line to the GPU partition of your choice.
#SBATCH --partition ampere
#SBATCH -t 00:20:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH -o tensorflow.out
#SBATCH -e tensorflow.err

module purge

module load rhel8/slurm rhel8/global
module load python/3.8.11/gcc-9.4.0-yb6rzr6
module load cuda/12.1 cudnn/8.9_cuda-12.1
# Access to optional ampere modules (uncomment if needed)
#module use /usr/local/software/spack/spack-modules/a100-20210927/linux-centos8-zen2
#module use /usr/local/software/spack/spack-modules/a100-20210927/linux-centos8-zen3

# Activate locally installed python virtual environment
source ./tensorflow-ampere-env/bin/activate
export CUDNN_PATH=/usr/local/Cluster-Apps/cudnn/8.9_cuda-12.1/lib64
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CUDA_PATH

# Run the script (replace helloworld.py with your own)
python helloworld.py
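
When reading tensorflow.out after a batch job, it is often useful to know which allocation the script ran under. The sketch below prints a few environment variables at the start of a run. SLURM_JOB_ID, SLURM_JOB_PARTITION and SLURM_CPUS_PER_TASK are standard Slurm variables, and CUDA_VISIBLE_DEVICES is typically set by Slurm when GPUs are requested with --gres. The function name slurm_context is hypothetical.

```python
#!/usr/bin/env python
"""Print the Slurm allocation a job is running under (illustrative sketch)."""
import os

# Variables worth recording in the job's output file.
KEYS = (
    "SLURM_JOB_ID",
    "SLURM_JOB_PARTITION",
    "SLURM_CPUS_PER_TASK",
    "CUDA_VISIBLE_DEVICES",
)


def slurm_context(env=None):
    """Return the selected variables, using '<unset>' for missing ones."""
    if env is None:
        env = os.environ
    return {key: env.get(key, "<unset>") for key in KEYS}


if __name__ == "__main__":
    for key, value in slurm_context().items():
        print(f"{key}={value}")
```

Calling slurm_context() near the top of your own script, before TensorFlow is imported, places this information at the start of the output file. Outside a Slurm job, every value is reported as <unset>.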

Running TensorFlow in containers

CSD3 also supports running TensorFlow inside Apptainer containers. A TensorFlow container can be downloaded from Docker Hub and run with:

# The next command creates a file larger than 3.8 GB in the directory where it is run. Use with care.
apptainer pull docker://tensorflow/tensorflow:latest-gpu
apptainer exec --nv ./tensorflow_latest-gpu.sif python ./helloworld.py
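
Because the latest-gpu tag changes over time, it can be worth checking which TensorFlow version the container provides and which CUDA and cuDNN versions it was built against. The sketch below is one way to do that, assuming a TensorFlow release that provides tf.sysconfig.get_build_info. The function name tensorflow_build_summary and the file name check_build.py are hypothetical.

```python
#!/usr/bin/env python
"""Summarise the TensorFlow build in the current environment (illustrative sketch)."""
import importlib.util


def tensorflow_build_summary():
    """Return version and CUDA build details, or None if TensorFlow is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    build = tf.sysconfig.get_build_info()
    # CPU-only builds may omit the CUDA keys, so fall back to defaults.
    return {
        "tensorflow": tf.__version__,
        "is_cuda_build": build.get("is_cuda_build", False),
        "cuda_version": build.get("cuda_version", "unknown"),
        "cudnn_version": build.get("cudnn_version", "unknown"),
    }


if __name__ == "__main__":
    print(tensorflow_build_summary())
```

Saved as check_build.py, it could be run inside the container with apptainer exec --nv ./tensorflow_latest-gpu.sif python ./check_build.py.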