Tutorial: Migrating your cryoSPARC Instance

A guide to moving cryoSPARC v2+ from one location to another.

Introduction

There may come a time when you want to move your cryoSPARC instance from one location to another. This may be between folders, different network storage locations, or even different host machines entirely. There are four main areas we will focus on:

  1. The paths of any raw particle, micrograph, or movie data imported into cryoSPARC

  2. All cryoSPARC project directories

  3. The cryoSPARC database and its (new) location

  4. The identities/hostnames of compute nodes or the master node & the cryoSPARC binaries

Each of the four areas above can be handled in isolation. Taken together, they amount to a full migration of your cryoSPARC instance.

Requirements

We will use a combination of the shell and an interactive python session to complete this migration. You will need access to the master node on which the cryoSPARC system is hosted.

We also recommend creating a database backup before you start. More details are at: Setup, Configuration and Management

Use Cases

A. Moving only raw particle, micrograph or movie data already imported into cryoSPARC

If you're moving data that you used an "Import Particles", "Import Micrographs" or "Import Movies" job to bring into cryoSPARC, you will need to repair these jobs. When cryoSPARC imports these three types of data, it creates symlinks to each file inside the job's imported directory. These symlinks may become broken if the original path to the file no longer exists. You can check the status of the symlinks by running ls -l inside the imported directory of the job. Note: The "Import Templates" and "Import Volumes" jobs copy the specified files directly into the job directory.
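
The same check can be scripted when a job contains too many links to inspect by eye. The following is a minimal sketch and not part of cryoSPARC: find_broken_symlinks is a hypothetical helper, and it assumes a POSIX filesystem where the job's imported directory is readable.

```python
import os

def find_broken_symlinks(directory):
    """Return sorted (link_path, target) pairs for symlinks whose target is missing.

    Hypothetical helper for illustration; not a cryoSPARC function.
    """
    broken = []
    for root, dirs, files in os.walk(directory):
        # os.walk reports symlinks to directories under `dirs`, so check both lists
        for name in dirs + files:
            path = os.path.join(root, name)
            # os.path.exists follows the link, so it is False for a dangling symlink
            if os.path.islink(path) and not os.path.exists(path):
                broken.append((path, os.readlink(path)))
    return sorted(broken)
```

Calling find_broken_symlinks on a job's imported directory returns an empty list when every link still resolves, in which case that job needs no repair.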

Step One - Identify Affected Jobs

Start up an interactive python session

cryosparcm icli

Execute a MongoDB query for all potentially affected "import" jobs

>>> jobs = list(db.jobs.find({ 'deleted' : False, 'job_type' : { '$in' : [ 'import_particles', 'import_micrographs', 'import_movies' ] } },{ '_id' : 0, 'project_uid' : 1, 'uid' : 1, 'job_type' : 1 } ) )
>>> print(jobs)
[{u'job_type': u'import_movies', u'project_uid': u'P1', u'uid': u'J3'},
{u'job_type': u'import_movies', u'project_uid': u'P2', u'uid': u'J3'},
{u'job_type': u'import_movies', u'project_uid': u'P1', u'uid': u'J41'},
{u'job_type': u'import_particles', u'project_uid': u'P1', u'uid': u'J42'},
{u'job_type': u'import_movies', u'project_uid': u'P2', u'uid': u'J29'},
{u'job_type': u'import_particles', u'project_uid': u'P2', u'uid': u'J51'},
{u'job_type': u'import_movies', u'project_uid': u'P2', u'uid': u'J55'},
{u'job_type': u'import_movies', u'project_uid': u'P2', u'uid': u'J64'},
...]

From this point, you can look inside each listed job's imported directory

>>> cli.get_job_dir_abs('P1', 'J3')
u'/data/cryosparc_projects/P1/J3'
>>> !ls -l /data/cryosparc_projects/P1/J3/imported
total 99
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31 2018 14sep05c_00024sq_00003hl_00002es.frames.mrc -> /data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00003hl_00002es.frames.mrc
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31 2018 14sep05c_00024sq_00003hl_00005es.frames.mrc -> /data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00003hl_00005es.frames.mrc
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31 2018 14sep05c_00024sq_00004hl_00002es.frames.mrc -> /data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00004hl_00002es.frames.mrc
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31 2018 14sep05c_00024sq_00006hl_00003es.frames.mrc -> /data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00006hl_00003es.frames.mrc
...

Step Two - Repair the Symlinks

Use the command job_import_replace_symlinks(project_uid, job_uid, prefix_cut, prefix_new), where prefix_cut is the beginning of the link target you'd like to cut (e.g. /data/EMPIAR) and prefix_new is what you'd like to replace it with (e.g. /data). This function loops through every file inside the job directory, finds all symlinks, and modifies only those whose target starts with prefix_cut. The function returns the number of links it modified. Below, it is used in a loop to repair all jobs across all projects at once.

>>> failed_jobs = []
>>> for job in jobs:
...     try:
...         print("Repairing %s %s" % (job['project_uid'], job['uid']))
...         modified_count = cli.job_import_replace_symlinks(job['project_uid'], job['uid'], '/data/EMPIAR', '/data')
...         print("Finished. Modified %d links." % modified_count)
...     except Exception as e:
...         # record the failure as a (project_uid, job_uid) tuple and keep going
...         failed_jobs.append((job['project_uid'], job['uid']))
...         print("Failed to repair %s %s: %s" % (job['project_uid'], job['uid'], str(e)))
...
>>> failed_jobs
[]
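
To make the prefix swap concrete, here is a minimal sketch of the same idea in plain Python. It is not cryoSPARC's implementation of job_import_replace_symlinks: replace_symlink_prefix is a hypothetical helper written for illustration, and it assumes a POSIX filesystem.

```python
import os

def replace_symlink_prefix(directory, prefix_cut, prefix_new):
    """Re-point symlinks under `directory` whose target starts with prefix_cut.

    Returns the number of links modified. Illustrative only; this is an
    assumption about the general technique, not cryoSPARC's own code.
    """
    modified = 0
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                continue  # regular files and directories are left alone
            target = os.readlink(path)
            if target.startswith(prefix_cut):
                new_target = prefix_new + target[len(prefix_cut):]
                os.remove(path)  # a symlink cannot be edited in place
                os.symlink(new_target, path)
                modified += 1
    return modified
```

With prefix_cut set to /data/EMPIAR and prefix_new set to /data, a link pointing at /data/EMPIAR/10025/data/... would be re-pointed at /data/10025/data/..., while links with any other prefix are left untouched.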

B. Moving Only cryoSPARC Project Directories (and all jobs inside them)

If you're moving the projects and their jobs, you will need to point cryoSPARC to the new directory where the projects reside. cryoSPARC references each job by its location relative to its project directory, so you only need to specify a new location for each project directory rather than for each job. This is easier because a typical cryoSPARC instance has far fewer projects than jobs.

Step One - Identify All Project Directories

Start up an interactive python session

cryosparcm icli

Execute a MongoDB query to list all project directories

>>> projects = list(db['projects'].find({},{'uid':1, "project_dir":1, '_id':0}))
>>> projects
[{u'project_dir': u'/data/cryosparc_projects/P1', u'uid': u'P1'},
{u'project_dir': u'/data/cryosparc_projects/P2', u'uid': u'P2'},
{u'project_dir': u'/data/cryosparc_projects/P3', u'uid': u'P3'},
{u'project_dir': u'/data/cryosparc_projects/P4', u'uid': u'P4'},
...]

Step Two - Modify One or Many Project Directories

Use the command update_project(project_uid, attrs, operation='$set') where attrs is a dictionary whose keys correspond to the fields in the project document to update. Below it is used in a loop to modify all project directory paths.

>>> import os
>>> failed_projects = []
>>> new_parent_dir = '/cryoem/cryosparc_projects'
>>> for project in projects:
...     # keep the project's folder name, but place it under the new parent directory
...     new_project_dir = os.path.join(new_parent_dir, os.path.basename(project['project_dir']))
...     try:
...         print("Modifying project directory for %s: %s --> %s" % (project['uid'], project['project_dir'], new_project_dir))
...         cli.update_project(project['uid'], {'project_dir' : new_project_dir})
...     except Exception as e:
...         failed_projects.append(project['uid'])
...         print("Failed to update %s: %s" % (project['uid'], str(e)))
...
>>> failed_projects
[]
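
Before pointing cryoSPARC at new paths, it can help to confirm that each new project directory actually exists on disk. The following is a minimal sketch under that assumption: plan_project_moves is a hypothetical helper, not a cryoSPARC function, and it expects a list of dictionaries shaped like the query result from Step One.

```python
import os

def plan_project_moves(projects, new_parent_dir):
    """Map each project uid to its proposed new directory, and list the
    uids whose new directory does not exist on disk yet.

    Hypothetical helper for illustration; not part of cryoSPARC.
    """
    plan, missing = {}, []
    for project in projects:
        # rstrip guards against a trailing slash, which would make basename() return ''
        name = os.path.basename(project['project_dir'].rstrip('/'))
        new_dir = os.path.join(new_parent_dir, name)
        plan[project['uid']] = new_dir
        if not os.path.isdir(new_dir):
            missing.append(project['uid'])
    return plan, missing
```

An empty missing list suggests every project folder has already been copied under the new parent directory; any uid that appears there is worth investigating before its project_dir is updated.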

C. Moving the cryoSPARC database

The cryoSPARC database doesn't have to be in the same location as the cryoSPARC binaries. To move the database, copy it to its new location and give cryoSPARC master the absolute path to it.

Step One - Turn off cryoSPARC

cryosparcm stop

Step Two - Move the Database

rsync -r --links /data/cryosparc/cryosparc_database/* /cryoem/cryosparc/cryosparc_database
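
After the copy finishes, and while cryoSPARC is still stopped, you may want to confirm that nothing was left behind. The following is a minimal sketch of such a check: missing_or_different is a hypothetical helper, and comparing file sizes is only a coarse test that does not prove the contents are identical.

```python
import os

def missing_or_different(src, dst):
    """Return relative paths of files under src that are absent from dst
    or whose size differs. Hypothetical helper; a coarse check only.
    """
    problems = []
    for root, dirs, files in os.walk(src):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, src)
            dst_path = os.path.join(dst, rel)
            if not os.path.lexists(dst_path):
                problems.append(rel)  # never copied
            elif os.lstat(src_path).st_size != os.lstat(dst_path).st_size:
                problems.append(rel)  # copied, but the sizes disagree
    return sorted(problems)
```

Calling missing_or_different('/data/cryosparc/cryosparc_database', '/cryoem/cryosparc/cryosparc_database') should return an empty list if every file arrived with the same size.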

Step Three - Modify Configurations

Navigate to the cryosparc_master directory

cd /data/cryosparc/cryosparc_master

Modify config.sh to contain the new directory path to the database

nano config.sh
#modify the line below
export CRYOSPARC_DB_PATH="/cryoem/cryosparc/cryosparc_database"
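
If you would rather script the edit than open an editor, the change amounts to rewriting a single export line. The following is a minimal sketch: set_export is a hypothetical helper that operates on the text of config.sh, and it assumes each variable is declared on its own line in the form export NAME=value.

```python
import re

def set_export(config_text, name, value):
    """Return config_text with `export NAME=...` set to value.

    Appends the line if the variable is not declared yet.
    Hypothetical helper for illustration; not part of cryoSPARC.
    """
    new_line = 'export %s="%s"' % (name, value)
    pattern = re.compile(r'^export\s+%s=.*$' % re.escape(name), re.MULTILINE)
    if pattern.search(config_text):
        # a function replacement avoids backslash escapes being interpreted in the value
        return pattern.sub(lambda match: new_line, config_text)
    if config_text and not config_text.endswith('\n'):
        config_text += '\n'
    return config_text + new_line + '\n'
```

Read config.sh into a string, pass it through set_export with 'CRYOSPARC_DB_PATH' and the new path, and write the result back. Keeping a copy of the original file first makes it easy to revert.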

Step Four - Restart cryoSPARC

cryosparcm start

D. Hosting cryoSPARC on another machine

Assuming that the cryoSPARC instance is located on a shared storage layer, if you want to host it on another machine (i.e. server1:39000 -> server2:39000), you just need to give cryoSPARC master the new hostname. Read further if the new machine doesn't have access to the same file system.

Step Zero - Reinstall cryoSPARC on new machine (Optional)

NOTE: Skip this step if you're using a shared filesystem (e.g. a remote storage server that is mounted on all machines). In that case, the cryoSPARC binaries, database and project directories are already accessible on the new machine.

First, follow all previous parts of this guide in order. Read this entire guide thoroughly before starting anything. You will need your old instance started and working, so make sure you still have access to it. Then, follow the install guide to install cryoSPARC normally using the migrated database path. After this step, you are done.

Step One - Turn off cryoSPARC

cryosparcm stop

Step Two - Modify Configurations

Navigate to the cryosparc_master directory

cd /data/cryosparc/cryosparc_master

Modify config.sh to list the new master node

nano config.sh
#modify the line below
export CRYOSPARC_MASTER_HOSTNAME="newnode"

Step Three - Restart cryoSPARC

On the new machine, start cryoSPARC. Make sure you start it as the same user that ran it before.

cryosparcm start