AlphaFold
=========

From the Alphafold website https://github.com/deepmind/alphafold

    This package provides an implementation of the inference pipeline of AlphaFold v2.0. 
    This is a completely new model that was entered in CASP14 and published in Nature.
    For simplicity, we refer to this model as AlphaFold throughout the rest of this document.

    We also provide an implementation of AlphaFold-Multimer. This represents a work in progress and AlphaFold-Multimer isn't expected to be as stable as our monomer AlphaFold system. Read the guide for how to upgrade and update code.


AlphaFold data on CSD3 
----------------------
The 2.8TB dataset is stored in::

  /datasets/public/AlphaFold/data

Note that you may need to `ls` the directory in order for it to be mounted.  There are example sequences stored in the `input` directory.

The dataset has been recently updated (November 2023) so scripts may not work without pointing to new versions of the files. In addition, to address this `issue <https://github.com/google-deepmind/alphafold/issues/810>`_, newer versions of the ``uniref`` can be found here `/datasets/public/AlphaFold/data/uniref30/UniRef30_2023_02*`.

Running AlphaFold2 on CSD3
--------------------------

There are various ways to run AlphaFold2 on CSD3. We encourage the use of:

  * `ParaFold <https://github.com/RSE-Cambridge/ParallelFold>`_
  * `AlphaPullDown <https://github.com/KosinskiLab/AlphaPulldown>`_
  * `ColabFold <https://github.com/sokrypton/ColabFold>`_

To get up and running quickly on CSD3 it is possible to run the Singularity container provided as a module::

  module load alphafold/2.3.2-singularity

See `Singularity`_ for more information. This is not performant as it runs the slow CPU step and GPU step in sequence meaning that 4 GPUs are sitting idle for most of the time. Instead, see the `ParaFold section`_ for instructions on how to obtain better performance.

.. The easiest way to get started with AlphaFold2 on CSD3 is to use the docker build ported to Singularity. Unfortunately, it is not easy to decouple the CPU and GPU steps meaning we are confined to a certain node type for both steps.

If you would like us to support other implementations of AlphaFold2 or if anything here is unclear or incorrect please contact `support <support@hpc.cam.ac.uk>`_.

.. _ParaFold section:

Separating CPU and GPU steps using ParallelFold and Conda
---------------------------------------------------------
`ParaFold` https://parafold.sjtu.edu.cn is a fork of AlphaFold2 that separates the CPU MSA step from the GPU prediction so that it can be executed in a two step process. This is more desirable because the GPU remains idle for most of the running time when using DeepMind's Singularity build as shown :ref:`below <Singularity>`.

To install, create a Conda environment using CSD3's Conda module or download it yourself:

.. literalinclude:: scripts/alphafold/alphafold_parafold_setup.sh
    :language: bash

If downloading yourself, use `Miniforge <https://github.com/conda-forge/miniforge>`_ as it is shipped with `mamba <https://github.com/mamba-org/mamba>`_ which is an optimised implementation on Conda.

Then follow the instructions https://github.com/RSE-Cambridge/ParallelFold, with usage information `here <https://github.com/RSE-Cambridge/ParallelFold/blob/main/docs/usage.md>`_. Note that this `fork <https://github.com/RSE-Cambridge/ParallelFold>`_ is an optimised form of the `original <https://github.com/Zuricho/ParallelFold>`_ to run on CSD3. 

First we run the CPU MSA step on an Icelake node. The `-f` flag means that we only run the featurisation step:

.. literalinclude:: scripts/alphafold/alphafold_parafold_cpu_msa_cpu_sbatch
    :language: bash
		    
The featurisation step will output feature.pkl and an MSA directory in the `output` directory. To run a monomer prediction, execute the following command on the GPU:

.. literalinclude:: scripts/alphafold/alphafold_parafold_gpu_sbatch
    :language: bash

Optimising the hhblits step
---------------------------
Follow the advice `here <https://github.com/soedinglab/hh-suite/issues/281#issuecomment-888689484>`_ for speeding up the hhblits step on CPU. The optimisation involves copying the two `cs219` files from bfd directory and creating symbolic links to the remaining four on the local ssd::

  ln -s /datasets/public/AlphaFold/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex /local/

Then point to the correct path by modifying the Parafold script (a similar apprach should work for other implementations of AlphaFold).  

If running as part of a Slurm script, be sure to add the copy and symlink commands to the beginning of the script. 

.. _Singularity:

Running AlphaFold2 using Singularity on CSD3
--------------------------------------------

Load the Singularity image which exposes a `run_alphafold` script into the environment. The script sets some default paths to the dataset.::

  module load alphafold/2.3.2-singularity

Create a slurm script with the following contents to predict the structure of the T1050 sequence (779 residues). The script assumes that an input subdirectory exists containing T1050.fasta file:

.. literalinclude:: scripts/alphafold/alphafold_singularity_gpu_sbatch
    :language: bash

OR execute the full singularity command::

  SIMAGE=/usr/local/Cluster-Apps/singularity/images/alphafold-2.3.2.sif
  # point to location of AlphaFold data
  DATA=/datasets/public/AlphaFold/data

  singularity run --env \
    TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=32 \
    -B $DATA:/data \
    -B .:/etc \
    --pwd /app/alphafold \
    --nv ${SIMAGE} \
    --data_dir /data/ \
    --fasta_paths $PWD/input/T1050.fasta \
    --output_dir $PWD/output/T1050/ \
    --use_gpu_relax=True \
    --max_template_date=2020-05-14 \
    --uniref90_database_path=/data/uniref90/uniref90.fasta \
    --mgnify_database_path /data/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
    --bfd_database_path /data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref30_database_path /data/uniref30/UniRef30_2021_03 \
    --pdb70_database_path=/data/pdb70/pdb70 


.. SINGULARITY_DIR=/usr/local/Cluster-Apps/singularity/images/alphafold-2.1.2.sif

..   singularity run --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=32 \
..   -B $DATA:/data -B .:/etc --pwd /app/alphafold --nv $SINGULARITY_DIR/alphafold-2.1.2.sif \
..     --benchmark \
..     --output_dir  $PWD/output/T1050 \
..     --fasta_paths $PWD/input/T1050.fasta \
..     --data_dir /data/ \
..     --max_template_date=2020-05-14 \
..     --uniref90_database_path /data/uniref90/uniref90.fasta \
..     --mgnify_database_path /data/mgnify/mgy_clusters_2018_12.fa \
..     --uniclust30_database_path /data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
..     --bfd_database_path /data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
..     --pdb70_database_path /data/pdb70/pdb70 \
..     --template_mmcif_dir /data/pdb_mmcif/mmcif_files \
..     --obsolete_pdbs_path /data/pdb_mmcif/obsolete.dat \
..     --preset full_dbs \
..     --cpus 32

.. OR execute full singularity command with reduced dataset::


..   DATA=/rds/project/rds-5mCMIDBOkPU/ma595/alphafold/data_reduced-2
..   singularity run --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=32 \
..   -B $DATA:/data -B .:/etc --pwd /app/alphafold --nv $SINGULARITY_DIR/alphafold-2.1.2.sif \
..     --benchmark \
..     --output_dir  $PWD/output/T1050 \
..     --fasta_paths $PWD/input/T1050.fasta \
..     --data_dir /data/ \
..     --max_template_date=2020-05-14 \
..     --uniref90_database_path /data/uniref90/uniref90.fasta \
..     --mgnify_database_path /data/mgnify/mgy_clusters_2018_12.fa \
..     --bfd_database_path /data/small_bfd/bfd-first_non_consensus_sequences.fasta \
..     --pdb70_database_path /data/pdb70/pdb70 \
..     --template_mmcif_dir /data/pdb_mmcif/mmcif_files \
..     --obsolete_pdbs_path /data/pdb_mmcif/obsolete.dat \
..     --db_preset=reduced_dbs \
..     --cpus 32
 
.. Note that the `cpus` flag is set to 32 (or higher if desired), this change is required to get good performance on the CPU hhblits preprocessing step. 

Timings are reported below running on 5 models for T1050 (779 residues) sequence::

  real    149m52.862s
  user    1111m23.014s
  sys     22m55.353s
  {
      "features": 5646.51958489418,
      "process_features_model_1": 95.72981929779053,
      "predict_and_compile_model_1": 233.02064847946167,
      "predict_benchmark_model_1": 130.08757734298706,
      "relax_model_1": 334.7365086078644,
      "process_features_model_2": 4.438706398010254,
      "predict_and_compile_model_2": 184.557687997818,
      "predict_benchmark_model_2": 116.91508865356445,
      "relax_model_2": 307.3584554195404,
      "process_features_model_3": 3.6764779090881348,
      "predict_and_compile_model_3": 163.3666865825653,
      "predict_benchmark_model_3": 121.80361533164978,
      "relax_model_3": 420.58361291885376,
      "process_features_model_4": 4.023890972137451,
      "predict_and_compile_model_4": 169.06972408294678
      "predict_benchmark_model_4": 121.70339488983154,
      "relax_model_4": 300.7459502220154,
      "process_features_model_5": 4.179120063781738,
      "predict_and_compile_model_5": 154.17626547813416,
      "predict_benchmark_model_5": 108.35132598876953,
      "relax_model_5": 329.9167058467865
  }

The Singularity image was built from Deepmind's docker script and has been tested on the A100 nodes. The MSA construction and model inference are done on the same node type - it isn't easy to decouple the two steps without using separate implementations (see below). Users can choose to run on CPU but the inference step takes considerably longer than on a GPU. Running on a GPU means that the CPU preprocessing (MSA) step can dominate the running time (depending on the particular sequence we wish to predict the structure of). 

Current Issues
--------------
We are aware of the slow preprocessing time of hhblits on CSD3 and are working to improve this. For the small database it's possible to pre-stage the data on the local ssd drive (with rsync), but this is not possible for the full database as it exceeds the capacity of the local ssd drive.  

Appendices
----------

Building the singularity image on CSD3::

  git clone alphafold
  cd ./alphafold
  docker build -f docker/Dockerfile -t alphafold .
  docker tag alphafold:latest ma595/alphafold:latest
  docker push alphafold:latest