TensorFlow
==========

From the TensorFlow website (https://www.tensorflow.org):

   TensorFlow™ is an open source software library for high performance
   numerical computation. Its flexible architecture allows easy deployment of
   computation across a variety of platforms (CPUs, GPUs, TPUs), and from
   desktops to clusters of servers to mobile and edge devices. Originally
   developed by researchers and engineers from the Google Brain team within
   Google’s AI organization, it comes with strong support for machine learning
   and deep learning and the flexible numerical computation core is used
   across many other scientific domains.

The easiest way to use TensorFlow on CSD3 is to install it into a Python
virtual environment. Python packages are available for TensorFlow with GPU
acceleration (``tensorflow-gpu``) and with optimisations for Intel hardware
(``intel-tensorflow``).

Running TensorFlow with GPU acceleration on CSD3
------------------------------------------------

To deploy TensorFlow with GPU support, execute the following::

    # 1. Load appropriate modules and environment
    ssh login-gpu.hpc.cam.ac.uk
    module purge
    module load rhel7/default-gpu
    module unload cuda/8.0
    module load python/3.6 cuda/10.0 cudnn/7.5_cuda-10.0

    # 2. Install the latest TensorFlow in a Python virtualenv
    python -m venv ./tensorflow-env
    source ./tensorflow-env/bin/activate
    pip install tensorflow-gpu

    # 3. Test your installation (run this in an interactive session
    #    via sintr, or submit it as a job to a GPU compute node)
    cat << 'EOF' > helloworld.py
    #!/usr/bin/env python
    import tensorflow as tf

    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test)
    EOF
    python ./helloworld.py

Running TensorFlow
------------------

TensorFlow can be run with an sbatch script similar to::

    #!/bin/bash
    #SBATCH --account CHANGE_TO_YOUR_ACCOUNT-GPU
    #SBATCH --partition pascal
    #SBATCH -t 00:10:00
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1

    module purge
    module load rhel7/default-gpu
    module unload cuda/8.0
    module load python/3.6 cuda/10.0 cudnn/7.5_cuda-10.0

    source ./tensorflow-env/bin/activate
    python ./helloworld.py

Running TensorFlow in containers
--------------------------------

CSD3 also supports running TensorFlow inside Singularity containers. A
TensorFlow container can be downloaded from Docker Hub and run as follows. The
``--name`` option sets the image file name so that it matches the one used by
the ``exec`` command::

    singularity pull --name tensorflow-gpu.simg docker://tensorflow/tensorflow:latest-gpu
    singularity exec --nv ./tensorflow-gpu.simg python ./helloworld.py

Building TensorFlow from source on CSD3
---------------------------------------

The following instructions have been adapted from the TensorFlow documentation
at https://www.tensorflow.org/install/source.

1. Repeat step 1 above and then clone the latest TensorFlow repository::

      git clone https://github.com/tensorflow/tensorflow

2. The preceding ``git clone`` command creates a subdirectory named
   ``tensorflow``. After cloning, you may *optionally* build a specific branch
   (such as a release branch) by invoking the following commands::

      cd tensorflow
      git checkout Branch # where Branch is the desired branch

3. Repeat step 2 above: activate a virtualenv and install all necessary
   dependencies using pip. Load Google's Bazel and install mock::

      module load bazel-0.13.0-gcc-5.4.0-6hnokt7
      module unload cuda/8.0
      module load python-3.6.2-gcc-5.4.0-me5fsee cuda/10.0 cudnn/7.5_cuda-10.0
      virtualenv --system-site-packages ./tensorflow-env
      source ./tensorflow-env/bin/activate
      pip install --upgrade pip
      pip install --upgrade numpy scipy wheel cryptography
      pip install --upgrade mock

4. Change to the top-level directory created by the clone and run the
   configure script. The following is an example session for a TensorFlow
   build with MPI enabled. The Python, CUDA and cuDNN versions and paths shown
   are those of the recorded session; substitute the ones that match the
   modules you have loaded::

      cd tensorflow
      ./configure
      Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
      Found possible Python library paths:
        /usr/local/lib/python2.7/dist-packages
        /usr/lib/python2.7/dist-packages
      Please input the desired Python library path to use. Default is [/usr/lib/python2.7/dist-packages]
      Using python library path: /usr/local/lib/python2.7/dist-packages
      Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
      Do you wish to use jemalloc as the malloc implementation? [Y/n] n
      no jemalloc support enabled
      Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
      No Google Cloud Platform support will be enabled for TensorFlow
      Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
      No Hadoop File System support will be enabled for TensorFlow
      Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
      No XLA support will be enabled for TensorFlow
      Do you wish to build TensorFlow with VERBS support? [y/N] N
      No VERBS support will be enabled for TensorFlow
      Do you wish to build TensorFlow with OpenCL support? [y/N] N
      No OpenCL support will be enabled for TensorFlow
      Do you wish to build TensorFlow with CUDA support? [y/N] Y
      CUDA support will be enabled for TensorFlow
      Do you want to use clang as CUDA compiler? [y/N]
      nvcc will be used as CUDA compiler
      Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: (empty)
      Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/Cluster-Apps/cuda/9.0/
      Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/local/software/spack/spack-0.11.2/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-5.4.0-fis24ggupugiobii56fesif2y3qulpdr/bin/gcc
      Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7
      Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/Cluster-Apps/cudnn/7.0_cuda-9.0/
      Please specify a list of comma-separated CUDA compute capabilities you want to build with.
      You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
      Please note that each additional compute capability significantly increases your build time and binary size.
      Do you wish to build TensorFlow with MPI support? [y/N] y
      MPI support will be enabled for TensorFlow
      Configuration finished

.. #here we are using an input file ``lammps.in`` to run on the GPU system,
   making use of 16 GPUs on 4 compute nodes. To use more or fewer GPUs the
   ``-N`` and ``-n`` options should be changed, bearing in mind that our GPU
   compute nodes have 4 GPUs per node. As always the ``-A`` option should be
   changed to your specific Slurm account and the job time limit can be
   adjusted with the ``-t`` option.
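
Whichever route above is used (pip, container or source build), it can be useful to confirm that TensorFlow actually sees the GPUs on the node before launching a long job. The following is a minimal sketch rather than an official CSD3 tool. The script name ``check_devices.py`` and the helper names ``summarise_devices`` and ``visible_devices`` are hypothetical, and it assumes that ``tensorflow.python.client.device_lib`` is importable in the installed TensorFlow version.

```python
"""Report which devices TensorFlow can see (illustrative sketch only)."""


def summarise_devices(devices):
    """Group device names by device type, e.g. {'CPU': [...], 'GPU': [...]}."""
    summary = {}
    for dev in devices:
        summary.setdefault(dev.device_type, []).append(dev.name)
    return summary


def visible_devices():
    """Return TensorFlow's local devices, or [] if TensorFlow cannot be imported."""
    try:
        # Assumption: device_lib is present in the installed TensorFlow release.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable in this environment")
        return []
    return device_lib.list_local_devices()


if __name__ == "__main__":
    summary = summarise_devices(visible_devices())
    gpus = summary.get("GPU", [])
    print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

Saved as ``check_devices.py``, it can be run in place of ``helloworld.py`` in the sbatch script or the ``singularity exec --nv`` command shown earlier. If no GPU is listed on a GPU node, the loaded CUDA and cuDNN modules are a sensible first thing to check.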