Best Practice

Data written to RCS is cached on fast storage before being written to tape. Policies govern when data is sent from the cache to the tape library; typically this happens when storage thresholds are reached. Data on the first tape library is duplicated to a second library.

How data moves around RCS

Data is sent to a cache before being written to tape.

Writing Data to RCS

Compressing Data

When writing data to tape, it is always better to have fewer, larger files than many smaller ones. You can achieve this by archiving (and optionally compressing) your data before uploading it to RCS. The common ways to do this are the zip and tar utilities. Another reason for bundling data is to stay within the quota limit on the number of files.
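
Before bundling, it can help to see how many files a directory actually holds. The following is a minimal sketch; count_files is a hypothetical helper name, not an RCS tool, and it assumes no file name contains a newline.

```shell
# Hypothetical helper: count the regular files under a directory,
# to judge whether it should be bundled into a single archive first.
# Assumes file names do not contain newline characters.
count_files() {
    # $1: directory to inspect
    find "$1" -type f | wc -l | tr -d ' '
}
```

For example, count_files "$HOME/directory" prints the number of files that would otherwise each count towards the file quota.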

Important

On RCS we enforce quotas not only on space but also on the number of files per TB of data. This is to avoid large numbers of small files being transferred to RCS. We strongly recommend archiving directories into tar or zip bundles when moving them to RCS. See the examples below.

For users with HPC accounts using tar and ssh

Archive a large directory on your local computer with tar and move it to RCS, splitting it into smaller parts.

tar -cvf - $HOME/directory/ | ssh login.hpc.cam.ac.uk 'cd /rcs/project/wjt27/rcs-wjt27-test-project/ && split -b 50G - files.tar.'

The split parts of a file can be joined together again with the “cat” command.
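
As a sketch of that rejoining step, assuming the parts carry the files.tar. prefix used in the split example above (join_parts is a hypothetical helper name):

```shell
# Rejoin parts written by split (files.tar.aa, files.tar.ab, ...) into one file.
# The shell expands the glob in sorted order, which matches split's suffix order.
join_parts() {
    # $1: common prefix of the parts, e.g. "files.tar."
    # $2: name of the reassembled file (should not itself start with the prefix)
    cat "$1"* > "$2"
}
```

For example, join_parts files.tar. files.tar rebuilds the archive, after which tar tvf files.tar should list its contents.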

For any user using SFTP to the RCS gateway

Archive a large directory on your local computer with tar and move it to RCS.

outfile=backup_$(date +%Y%m%d_%H%M%S).tar

tar cvf "$outfile" "$HOME/directory/" && echo "put $outfile" | sftp rcs.uis.cam.ac.uk:/rcs/project/wjt27/rcs-wjt27-test-project/

Copying and packing data from RDS to RCS using tar

Archive a folder on RDS to RCS. You need to be logged on to an HPC login node.

source=$HOME/rds/rds-wjt27-test-project/software

outfile=/rcs/project/wjt27/rcs-wjt27-test-project/backup_$(date +%Y%m%d_%H%M%S).tar

tar cvf "$outfile" "$source"

You can list the contents of your archive by running the following command:

tar tvf $outfile

Naming Conventions

Folders in RCS follow the naming convention rcs-<PIs CRSID>-<Project Name>. Below that, however, you are free to create your own folder structure and upload files. We suggest that the file names of your compressed data either describe the data held within, or that you keep a reference of which data is held in which archive. This could be stored in a text file in the same folder as the compressed data.
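
One possible way to keep such a reference is to save a listing of each archive next to it, ideally when the archive is created so that the archive itself does not have to be read back later. This is only a sketch; write_manifest and the .contents.txt suffix are hypothetical choices, not an RCS convention.

```shell
# Hypothetical helper: save a listing of a tar archive's contents
# in a text file alongside the archive.
write_manifest() {
    # $1: path to an existing tar archive
    tar tf "$1" > "$1.contents.txt"
}
```

For example, write_manifest "$outfile" produces a text file that can be searched with grep to find which archive holds a given file.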

Recovering Data from RCS

When you attempt to download a file, RCS pulls the file from tape and puts it back on the fast tier of storage. If the file is large, this can take a while, which may cause your FTP client to show a timeout error or disconnect from the session entirely.

There are two ways of dealing with this:

  • Try to pull the data from the store. This will trigger RCS to pull the file back from tape to the fast tier. Give it some time to complete, then retry the download.

  • Alter your client’s default timeout settings. This varies from client to client, but an example screenshot for FileZilla is below.

Changing FileZilla settings

FileZilla’s timeout settings. Try changing it to something higher…
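
For command-line transfers, the first option above (trigger the recall, wait, then try again) could be scripted roughly as follows. This is a sketch under assumptions: retry is a hypothetical helper, and it relies on the transfer command exiting with a non-zero status when it times out or disconnects.

```shell
# Hypothetical helper: run a command up to a maximum number of times,
# pausing between attempts so a recall from tape has time to finish.
retry() {
    # $1: maximum attempts, $2: seconds to wait between attempts,
    # remaining arguments: the command to run
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        echo "attempt $n failed; retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}
```

For example, retry 5 600 sftp rcs.uis.cam.ac.uk:/rcs/project/wjt27/rcs-wjt27-test-project/backup.tar . would attempt the download up to five times, ten minutes apart; the file name backup.tar is illustrative only.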