TensorFlow

From the TensorFlow website, tensorflow.org:

TensorFlow™ is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.

TensorFlow is most easily installed and used from within a Python virtual environment. Packages for TensorFlow with GPU acceleration (tensorflow-gpu) and with optimizations for Intel hardware (intel-tensorflow) are available.
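
For example, a CPU-only environment using the Intel-optimized package could be set up along the following lines (a minimal sketch, reusing the python/3.6 module that the GPU instructions below load; adjust to whichever Python module is available):

# minimal sketch of a CPU-only install into a virtualenv
module load python/3.6
python -m venv ./tensorflow-cpu-env
source tensorflow-cpu-env/bin/activate
pip install --upgrade pip
pip install intel-tensorflow    # or "pip install tensorflow" for the standard CPU build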

Running TensorFlow with GPU acceleration on CSD3

To deploy TensorFlow with GPU support, execute the following:

# 1. Load appropriate modules and environment
ssh login-gpu.hpc.cam.ac.uk
module purge
module load rhel7/default-gpu
module unload cuda/8.0
module load python/3.6 cuda/10.0 cudnn/7.5_cuda-10.0

# 2. Install latest Tensorflow in a python virtualenv
python -m venv ./tensorflow-env
source tensorflow-env/bin/activate
pip install tensorflow-gpu

# 3. Test your installation (be in an interactive session
# via sintr or submit a job to a gpu compute node)
cat << 'EOF' > helloworld.py
#!/usr/bin/env python
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(512, activation=tf.nn.relu),
          tf.keras.layers.Dropout(0.2),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
          ])
model.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
EOF
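
Before training, it is worth confirming that TensorFlow can actually see the GPU from within the activated environment (run this on a GPU node, e.g. in an sintr session). A quick check, assuming a TensorFlow 2.x installation:

# quick GPU visibility check; expects a non-empty device list on a GPU node
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# on TensorFlow 1.x the equivalent check is tf.test.is_gpu_available()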

Running TensorFlow

TensorFlow can be run with an sbatch script similar to:

#!/bin/bash
#SBATCH --account CHANGE_TO_YOUR_ACCOUNT-GPU
#SBATCH --partition pascal
#SBATCH -t 00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

module purge
module load rhel7/default-gpu
module unload cuda/8.0
module load python/3.6 cuda/10.0 cudnn/7.5_cuda-10.0

source ./tensorflow-env/bin/activate
python ./helloworld.py
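
Assuming the script above has been saved as tensorflow-job.sbatch (a hypothetical filename), it can be submitted and monitored in the usual way:

sbatch tensorflow-job.sbatch     # submit; Slurm prints the assigned job id
squeue -u $USER                  # check the job's state in the queue
less slurm-<jobid>.out           # inspect the output file once the job has finished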

Running TensorFlow in containers

CSD3 also supports running TensorFlow inside Singularity containers. A TensorFlow container can be downloaded from Docker Hub and run with:

singularity pull docker://tensorflow/tensorflow:latest-gpu
# the filename of the pulled image depends on the Singularity version
# (e.g. tensorflow-latest-gpu.simg or tensorflow_latest-gpu.sif); adjust the path below to match
singularity exec --nv ./tensorflow-gpu.simg python ./helloworld.py
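
To run the container as a batch job on a GPU node rather than interactively, a job script along the lines of the one in the previous section can be used. A minimal sketch (the image filename is an assumption and should match the file produced by the pull):

#!/bin/bash
#SBATCH --account CHANGE_TO_YOUR_ACCOUNT-GPU
#SBATCH --partition pascal
#SBATCH -t 00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1

module purge
module load rhel7/default-gpu

# --nv makes the host GPU driver visible inside the container
singularity exec --nv ./tensorflow-gpu.simg python ./helloworld.py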

Building TensorFlow from source on CSD3

The following instructions have been adapted from the TensorFlow documentation at tensorflow.org/install/source.

  1. Repeat step 1 above and then clone the latest TensorFlow repository:

    git clone https://github.com/tensorflow/tensorflow
    
  2. The preceding git clone command creates a subdirectory named tensorflow. After cloning, you may optionally check out a specific branch (such as a release branch) by invoking the following commands:

    cd tensorflow
    git checkout Branch # where Branch is the desired branch
    
  3. Repeat step 2 above: load Google’s Bazel, activate a virtualenv, and install all necessary dependencies (including mock) using pip:

    module load bazel-0.13.0-gcc-5.4.0-6hnokt7
    module unload cuda/8.0
    module load python-3.6.2-gcc-5.4.0-me5fsee cuda/10.0 cudnn/7.5_cuda-10.0
    virtualenv --system-site-packages ./tensorflow-env
    source ./tensorflow-env/bin/activate
    pip install --upgrade pip
    pip install --upgrade numpy scipy wheel cryptography
    pip install --upgrade mock
    
  4. cd to the top-level directory created and run the configure script. The following is an example transcript for TensorFlow built with MPI enabled; the paths and version numbers offered as defaults will reflect the modules you have loaded (the build and installation of the resulting wheel follow in step 5 below):

    cd tensorflow
    ./configure
    
    Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
    Found possible Python library paths:
      /usr/local/lib/python2.7/dist-packages
      /usr/lib/python2.7/dist-packages
    Please input the desired Python library path to use.  Default is [/usr/lib/python2.7/dist-packages]
    
    Using python library path: /usr/local/lib/python2.7/dist-packages
    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
    Do you wish to use jemalloc as the malloc implementation? [Y/n] n
    no jemalloc support enabled
    Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
    No Google Cloud Platform support will be enabled for TensorFlow
    Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
    No Hadoop File System support will be enabled for TensorFlow
    Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
    No XLA support will be enabled for TensorFlow
    Do you wish to build TensorFlow with VERBS support? [y/N] N
    No VERBS support will be enabled for TensorFlow
    Do you wish to build TensorFlow with OpenCL support? [y/N] N
    No OpenCL support will be enabled for TensorFlow
    Do you wish to build TensorFlow with CUDA support? [y/N] Y
    CUDA support will be enabled for TensorFlow
    Do you want to use clang as CUDA compiler? [y/N]
    nvcc will be used as CUDA compiler
    Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: (empty)
    Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
    /usr/local/Cluster-Apps/cuda/9.0/
    Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
    /usr/local/software/spack/spack-0.11.2/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-5.4.0-fis24ggupugiobii56fesif2y3qulpdr/bin/gcc
    Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7
    Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
    /usr/local/Cluster-Apps/cudnn/7.0_cuda-9.0/
    Please specify a list of comma-separated CUDA compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
    Please note that each additional compute capability significantly increases your build time and binary size.
    
    Do you wish to build TensorFlow with MPI support? [y/N] y
    MPI support will be enabled for TensorFlow
    Configuration finished
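
  5. The configure step only generates the build configuration; the build itself is not shown in the transcript above. Following the upstream instructions at tensorflow.org/install/source, the remaining steps are typically along these lines (a sketch; the /tmp/tensorflow_pkg output directory is an arbitrary choice):

    # build the pip-package builder with CUDA support (this can take a long time)
    bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

    # create a wheel and install it into the active virtualenv
    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
    pip install /tmp/tensorflow_pkg/tensorflow-*.whl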