Jupyter Notebooks

In this guide we’ll demonstrate how to request cluster resources using Jupyter.

Setup Jupyter on CSD3

  • load a recent python:

    $ module load python-3.6.1-gcc-5.4.0-64u3a4w py-numpy-1.12.1-gcc-5.4.0-cjrgw2k py-matplotlib-2.2.2-gcc-5.4.0-6oe6fph
    $ module load py-virtualenv-15.1.0-gcc-5.4.0-gu4wi6c
    
  • create and activate a virtual environment:

    $ virtualenv --system-site-packages ~/jupyter-env
    $ source ~/jupyter-env/bin/activate
    
  • install Jupyter into your virtual environment:

    $ pip install jupyter
    $ pip install kiwisolver
    

Running Jupyter

  • activate the virtual environment:

    $ source ~/jupyter-env/bin/activate
    
  • start notebook server:

    $ jupyter notebook --no-browser --ip=127.0.0.1 --port=8081
    

This will print a number of log messages, finishing with a web address. Copy this web address to the clipboard and make a note of which login node you used.

Then from your local machine, forward a port to be able to connect to the notebook server:

$ ssh -L 8081:127.0.0.1:8081 -fN login-e-1.hpc.cam.ac.uk

and ensure that you pick the same login node (here we are assuming login-e-1) that you started the notebook server on. You can then open the web address and connect to your notebook server.
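
If you connect often, the same port forward can be kept in your SSH configuration instead of being retyped each time. A minimal sketch of a `~/.ssh/config` entry, where the host alias `csd3-jupyter` is just an example name and the login node should be adjusted to match the one you use:

```
Host csd3-jupyter
    HostName login-e-1.hpc.cam.ac.uk
    LocalForward 8081 127.0.0.1:8081
```

The tunnel can then be started with `ssh -fN csd3-jupyter`.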

Running Jupyter on a compute node

  • On Wilkes-2, submit a GPU job by running the following, or by setting the appropriate SLURM variables in a submission script:

    $ srun -t 02:00:00 --nodes=1 --gres=gpu:1 --ntasks-per-node=1 --cpus-per-task=3 --partition=pascal -A YOURACCOUNT --pty bash
    

    (or use sintr).
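
Equivalently, the resource request can be placed in a submission script. A minimal sketch, where the account, partition, and time limit are placeholders to adjust; this variant also starts the notebook server directly on the allocated node:

```
#!/bin/bash
#SBATCH -A YOURACCOUNT              # your project account
#SBATCH -p pascal                   # Wilkes-2 GPU partition
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=3
#SBATCH -t 02:00:00

source ~/jupyter-env/bin/activate
jupyter notebook --no-browser --ip='*' --port=8081
```

Submit it with sbatch and, once the job starts, check which node it was allocated (e.g. with squeue) before setting up the port forward.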

  • Now start the notebook server as follows:

    $ jupyter notebook --no-browser --ip='*' --port=8081
    

As before, this will print a web address; copy it to the clipboard.

  • From your local machine, forward a port to be able to connect to the notebook server. Replace gpu-e-X with the compute node you were allocated, and login-e-Y with a login node (for example, the one from which you submitted the job):

    $ ssh -L 8081:gpu-e-X:8081 -fN login-e-Y.hpc.cam.ac.uk
    
  • Copy the address into your browser and change the “gpu-e-X” string to 127.0.0.1.
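
This substitution can also be scripted. A small sketch, where the node number and token below are made-up examples:

```shell
# Address printed by the notebook server (node number and token are
# made-up examples; yours will differ):
url='http://gpu-e-12:8081/?token=abc123'

# Rewrite the compute-node hostname to the local end of the tunnel.
local_url=$(printf '%s\n' "$url" | sed 's/gpu-e-[0-9]*/127.0.0.1/')
echo "$local_url"    # http://127.0.0.1:8081/?token=abc123
```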

  • If you encounter a “bind: Address already in use” error, the local port is already in use, typically by an earlier tunnel. In that case, stop the process associated with that port and try again:

    $ lsof -ti:8081 | xargs kill -9
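
A slightly gentler variant of the one-liner above tries a plain TERM before escalating to kill -9 (PORT is just the local forwarding port used in this guide):

```shell
PORT=8081

# Find any processes holding the port (empty if the port is free,
# or if lsof is unavailable).
pids=$(lsof -ti:"$PORT" 2>/dev/null || true)

if [ -n "$pids" ]; then
    kill $pids                          # polite TERM first
    sleep 1
    kill -9 $pids 2>/dev/null || true   # escalate only if still running
    echo "freed port $PORT"
else
    echo "port $PORT already free"
fi
```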
    

Running jobs from within Jupyter

  • It is possible to bypass the interactive steps above and have Jupyter notebooks submit jobs on the cluster for you. To do this, follow the documentation above to install Jupyter and run it on a login node (not a compute node). Begin by installing remote_ikernel:

    $ pip install remote_ikernel
    
  • The following is an example of configuring a remote kernel that uses the SLURM interface:

    $ remote_ikernel manage --add \
        --kernel_cmd="ipython kernel -f {connection_file}" \
        --name="Kernel name" --cpus=32 --interface=slurm \
        --remote-precmd "source $VENV_PATH/bin/activate" \
        --remote-launch-args '-A your-account -p skylake -t 12:00:00' --verbose
    
  • VENV_PATH needs to be changed to point to your virtual environment location, but the {connection_file} placeholder can be left as-is, as above. The --remote-precmd option avoids having to modify your .bashrc.

  • Launch the Jupyter notebook and select the kernel with the name provided in the string above. This will request the resources for the time specified in --remote-launch-args when the kernel is executed. You may need to refresh the Jupyter notebook page for the kernel to appear; it will then be listed under New -> KERNEL_NAME.

  • You can delete the kernel by running:

    $ remote_ikernel manage --delete KERNEL_NAME
    
  • You can list the currently installed kernels with:

    $ jupyter kernelspec list
    
  • It is possible to add other Jupyter kernels, including gnuplot, Julia, …

  • We still need to look into multi-node support.