Using Python

CSD3 provides central installations of both Python 2 and Python 3. These include some of the most common packages used for scientific computation and data analysis. Users can add further packages themselves, using either virtual environments or conda environments.

Recent versions of the Python interpreter are available as the modules python/2.7, python/3.5, python/3.6, python/3.7 and python/3.8. The default is currently python/3.6. These modules will be automatically upgraded to the latest point release.

Using Virtual Environments

Virtualenv provides a method for installing new or upgraded Python packages as a user without the need to ask support to make changes centrally. It is currently supported in all the Python modules.

Step-by-step guide

We have installed virtualenv into the centrally available Python modules, and it is also available for the native Python installed as part of Scientific Linux 7 on CSD3. If you are happy to use the latter (which is version 2.7.5), there is no need to load a Python module; otherwise, please load the desired Python module first.

A guide to using virtualenv can be found here: https://pypi.Python.org/pypi/virtualenv.

In short, you can create a sandboxed version of Python in the directory of your choice, e.g. YOUR_PYTHON (if it does not exist, you must create it first), via:

virtualenv YOUR_PYTHON

Then activate this via:

source YOUR_PYTHON/bin/activate

and deactivate via:

deactivate

You can get it to inherit the central packages via:

virtualenv --system-site-packages YOUR_PYTHON

Once the virtualenv environment is activated, use the normal methods for downloading and installing Python packages (e.g. pip) and the packages will be installed into your YOUR_PYTHON directory, where they will override the contents of the central Python installation. Invoke Python as normal and the new components should be visible (as long as the virtualenv environment is activated).
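
To confirm that the interpreter you are running really is the sandboxed one, you can inspect a few attributes of the sys module. The following is a minimal sketch, not part of the CSD3 setup: the function name in_virtual_environment is made up for illustration, and it assumes the usual behaviour of virtualenv (older releases set sys.real_prefix, while newer releases and the standard venv module make sys.base_prefix differ from sys.prefix).

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter runs inside a virtual environment.

    Hypothetical helper, for illustration only.
    """
    # Older virtualenv releases record the original installation
    # in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and venv keep the original installation
    # in sys.base_prefix, while sys.prefix points at the sandbox.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("Interpreter: " + sys.executable)
    print("Prefix:      " + sys.prefix)
    print("Inside a virtual environment: " + str(in_virtual_environment()))
```

With YOUR_PYTHON activated, the reported prefix should be your YOUR_PYTHON directory. After deactivate, it should be the central or native installation again.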

Additional notes

If your current directory is not the one containing YOUR_PYTHON, you can use a full path instead of a relative path (e.g. /home/my-crsid/YOUR_PYTHON) to activate the virtual environment from any location in the filesystem.

The virtualenv command needs to be run only once, to create and initialize the sandbox. After that, you only have to activate and deactivate it as needed.

Using Anaconda Python

To set up your environment to use the Anaconda distributions you should use:

for Python 2:

[user@login-e-17 ~]$ module load miniconda/2

or for Python 3:

[user@login-e-17 ~]$ module load miniconda/3

You can verify the current version of Python with:

[user@login-e-17 ~]$ module load miniconda/3
[user@login-e-17 ~]$ python --version
Python 3.7.4

You can also verify the current version of conda with:

[user@login-e-17 ~]$ conda --version
conda 4.7.12

If this is the first time you are using Anaconda Python, it is important to first run the conda init command, which prepares your shell environment for using the full suite of conda commands. This only needs to be done once.

[user@login-e-17 ~]$ module load miniconda/3
[user@login-e-17 ~]$ which conda
/usr/local/software/master/miniconda/3/bin/conda
[user@login-e-17 ~]$ conda init
[... list of modifications not made ...]
modified      /home/user/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

Using Anaconda Python in SLURM scripts

Without running conda init, the commands conda activate and conda deactivate will fail with the following error:

[user@login-e-15 ~]$ conda activate
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

Since conda init injects some logic into your .bashrc file, that file must be sourced before the conda commands will work. This is not an issue for interactive shell sessions, as it happens automatically when you log in. However, this is not the case for SLURM jobs, so it is important to add source ~/.bashrc to your submission script before any conda activate or conda deactivate commands.
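
A typical symptom of a job that has not sourced .bashrc or activated its environment is a script that quietly runs with the wrong interpreter. As a minimal sketch, the Python script launched by the job can check this for itself. The example assumes that conda activate exports the CONDA_DEFAULT_ENV variable with the name of the active environment (for environments created with -n); the function name check_conda_env and the environment name biopython are placeholders.

```python
import os
import sys


def check_conda_env(expected, environ=None):
    """Return True if the conda environment named `expected` is active.

    Hypothetical helper: relies on CONDA_DEFAULT_ENV, which is assumed
    to be set by `conda activate` to the active environment's name.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    # "biopython" is an example name; use your own environment here.
    if check_conda_env("biopython"):
        print("Running in 'biopython' with " + sys.executable)
    else:
        sys.stderr.write(
            "Warning: conda environment 'biopython' is not active; check "
            "that the job script sources .bashrc and runs conda activate\n"
        )
```

Running a check like this at the top of the job's Python script makes a misconfigured submission script visible in the job output straight away, rather than as a later ImportError.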

Installing additional Anaconda Python modules

If the central base installation does not have a package or module that you require, you can install this yourself by using conda environments.
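
Before creating an environment, it can be worth checking whether the package is already importable from the installation you have loaded. The following is a small sketch for Python 3 using the standard importlib.util.find_spec function; the helper name is_available and the package names in the loop are examples only.

```python
import importlib.util


def is_available(package):
    """Return True if the top-level `package` can be found by this interpreter."""
    # find_spec locates a top-level module without importing it and
    # returns None when the module cannot be found.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Example names only; replace them with the packages your code needs.
    for name in ("numpy", "Bio"):
        status = "available" if is_available(name) else "missing"
        print(name + ": " + status)
```

Note that the import name can differ from the name used to install the package: the biopython package, for instance, is imported as Bio.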

A conda environment is a local copy of the central install that you can then modify with additional modules/packages or even use different versions of existing packages.

Full documentation on using conda environments can be found online in the conda documentation, under Managing conda environments.

Below we show a short example of creating a local Python environment and installing the biopython package.

[user@login-e-1 ~]$ module load miniconda/3
[user@login-e-1 ~]$ conda create -n biopython biopython
Collecting package metadata: done
Solving environment: done

## Package Plan ##

environment location: /home/user/.conda/envs/biopython

added / updated specs:
   - biopython


The following packages will be downloaded:
[... list of packages ...]

The following NEW packages will be INSTALLED:
[... list of packages ...]

Proceed ([y]/n)? y

[... download and install the packages ...]


[user@login-e-1 ~]$ conda info --envs
# conda environments:
#
biopython                /home/user/.conda/envs/biopython
base                  *  /usr/local/software/archive/linux-scientific7-x86_64/gcc-9/miniconda3-4.7.12.1-rmuek6r3f6p3v6fdj7o2klyzta3qhslh

[user@login-e-1 ~]$ conda activate biopython
(biopython) [user@login-e-1 ~]$ python
Python 3.8.5 (default, Aug  5 2020, 08:36:46)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import unambiguous_dna
>>> new_seq = Seq('GATCAGAAG', unambiguous_dna)
>>> new_seq[0:2]
Seq('GA', IUPACUnambiguousDNA())
>>> new_seq.translate()
Seq('DQK', IUPACProtein())
>>>
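
The session above was recorded with an older Biopython. Biopython 1.78 and later no longer include the Bio.Alphabet module, so the import of unambiguous_dna fails in an environment that has a newer release installed. A minimal sketch of the same steps without an alphabet argument follows. The helper name translate_dna and the import guard are additions for illustration, so that the script gives a clear message when it is run outside the biopython environment.

```python
try:
    from Bio.Seq import Seq
except ImportError:
    # Biopython is not importable, e.g. the environment is not activated.
    Seq = None


def translate_dna(dna):
    """Translate a DNA string into a protein string (illustrative helper)."""
    if Seq is None:
        raise RuntimeError(
            "Biopython is not available; activate the environment first"
        )
    # No alphabet argument: the Seq object is built from the string alone.
    return str(Seq(dna).translate())


if __name__ == "__main__":
    if Seq is None:
        print("Biopython is not available in this interpreter")
    else:
        new_seq = Seq("GATCAGAAG")
        print(new_seq[0:2])                # the first two bases
        print(translate_dna("GATCAGAAG"))  # the translated protein
```

As in the recorded session, translating GATCAGAAG gives DQK; only the alphabet annotations in the printed Seq objects are gone.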