# Guide: Migrating your CryoSPARC Instance

## Introduction

There may come a time when you want to move your CryoSPARC instance from one location to another. This may be between folders, different network storage locations, or even different host machines entirely. There are four main areas we will focus on:

1. The paths of any raw particle, micrograph, or movie data imported into CryoSPARC
2. All CryoSPARC project directories
3. The CryoSPARC database and its (new) location
4. The identities/hostnames of compute nodes or the master node and the CryoSPARC binaries

Each of these four areas can be handled in isolation; **combined**, they amount to a full migration of your CryoSPARC instance.

## Requirements

We will use a combination of the shell and an interactive Python session to complete this migration. You will need access to the master node on which the CryoSPARC system is hosted.

We also recommend creating a database backup before you start, and not using CryoSPARC between creating the backup and completing the migration. More details are at: [Setup, Configuration and Management](https://guide.cryosparc.com/setup-configuration-and-management/hardware-and-system-requirements)

## Use Cases

### A. Moving only raw particle, micrograph or movie data already imported into CryoSPARC

When raw data is imported into a CryoSPARC project, rather than copying the data into the project directory, symlinks are created inside the import job directories pointing to the original data files. Read [#7.-imported-data-and-symlinks-in-project-directories](https://guide.cryosparc.com/setup-configuration-and-management/guide-data-management-in-cryosparc-v4.0#7.-imported-data-and-symlinks-in-project-directories "mention") for more details.

If you're moving data that you used an "Import Particles", "Import Micrographs" or "Import Movies" job to bring into CryoSPARC, you will need to repair these jobs. When CryoSPARC imports these three types of data, it creates [symlinks](https://devdojo.com/tutorials/what-is-a-symlink) to each file inside the job's `imported` directory. These symlinks may become broken if the original path to the file no longer exists. You can check the status of the symlinks by running `ls -l` inside the `imported` directory of the job. Note: The "Import Templates" and "Import Volumes" jobs copy the specified files directly into the job directory.
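
As a complement to `ls -l`, the following minimal sketch lists the symlinks in a directory whose targets no longer exist. It uses only the Python standard library; the function name `find_broken_symlinks` and the example path are illustrative assumptions and not part of CryoSPARC.

```python
import os
from pathlib import Path


def find_broken_symlinks(directory):
    """Return (link_path, link_target) pairs for symlinks whose target is missing."""
    broken = []
    for entry in sorted(Path(directory).iterdir()):
        # is_symlink() is True even when the target is gone;
        # exists() follows the link, so it is False for a broken one.
        if entry.is_symlink() and not entry.exists():
            broken.append((str(entry), os.readlink(entry)))
    return broken


if __name__ == "__main__":
    # Hypothetical job directory; substitute the path of your own import job.
    example_dir = "/data/cryosparc_projects/P1/J3/imported"
    if os.path.isdir(example_dir):
        for link_path, link_target in find_broken_symlinks(example_dir):
            print(f"{link_path} -> {link_target} (missing)")
```

An empty result suggests that every symlink in that directory still resolves, so the job probably does not need repairing.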

#### Modify Project/Job symlinks

{% tabs %}
{% tab title="CryoSPARC v5.0+" %}
Start an interactive Python session

```bash
cryosparcm icli
```

Use the cli to find all the symlinks for an entire project or a single job.

```python
>>> api.projects.get_symlinks('P1')

[{'exists': True,
'link_path': '/bulk8/data/dev_nwong_projects/P1/J2/imported/004525579726026751633_14sep05c_c_00003gr_00014sq_00011hl_00003es.frames.tif',
'link_target': '/bulk8/data/dev_nwong_projects/testdata/empiar_10025_subset/14sep05c_c_00003gr_00014sq_00011hl_00003es.frames.tif'},
…]
```

where

* `link_path` is the path to the symlink file
* `link_target` is the file the symlink points to
* `exists` indicates if the target file exists

Use the command `api.jobs.update_directory_symlinks(project_uid, job_uid, prefix_cut, prefix_new)`, where `prefix_cut` is the beginning of the link target you'd like to cut (e.g. `/data/EMPIAR`) and `prefix_new` is what you'd like to replace it with (e.g. `/data`). This function loops through every file inside the job directory, finds all symlinks, and modifies only those that start with `prefix_cut`. It returns the number of links it modified. Below, it is used in a loop to modify every job in project `P1`.

```python
>>> jobs = api.jobs.find(project_uid="P1")
>>> failed_jobs = []
>>> for job in jobs:
        try:
            print(f"Repairing {job.full_uid}")
            modified_count = api.jobs.update_directory_symlinks(job.project_uid, job.uid, '/data/EMPIAR', '/data')
            print(f"Finished. Modified {modified_count} links.")
        except Exception as e:
            failed_jobs.append((job.project_uid, job.uid))
            print(f"Failed to repair {job.full_uid}: {str(e)}")
...
    
>>> failed_jobs
[]
```

{% endtab %}

{% tab title="CryoSPARC v4.0-v4.7.1" %}
Start an interactive Python session

```bash
cryosparcm icli
```

Use the cli to find all the symlinks for an entire project or a single job.

```python
>>> cli.get_project_symlinks('P1')

# or, for a single job:

>>> cli.get_job_symlinks('P1', 'J3')

[{'exists': True,
'link_path': '/bulk8/data/dev_nwong_projects/P1/J2/imported/004525579726026751633_14sep05c_c_00003gr_00014sq_00011hl_00003es.frames.tif',
'link_target': '/bulk8/data/dev_nwong_projects/testdata/empiar_10025_subset/14sep05c_c_00003gr_00014sq_00011hl_00003es.frames.tif'},
…]
```

where

* `link_path` is the path to the symlink file
* `link_target` is the file the symlink points to
* `exists` indicates if the target file exists

Use the command `cli.job_import_replace_symlinks(project_uid, job_uid, prefix_cut, prefix_new)`, where `prefix_cut` is the beginning of the link target you'd like to cut (e.g. `/data/EMPIAR`) and `prefix_new` is what you'd like to replace it with (e.g. `/data`). This function loops through every file inside the job directory, finds all symlinks, and modifies only those that start with `prefix_cut`. It returns the number of links it modified. Below, it is used in a loop to modify all jobs across all projects at once.

```python
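>>> # Build the list of import jobs to repair. This reuses the query shown in the
>>> # "CryoSPARC ≤v3.3" tab and assumes `db` is available in this icli session.
>>> jobs = list(db.jobs.find({'deleted': False, 'job_type': {'$in': ['import_particles', 'import_movies', 'import_micrographs']}}, {'_id': 0, 'project_uid': 1, 'uid': 1, 'job_type': 1}))
>>> failed_jobs = []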
>>> for job in jobs:
        try:
            print("Repairing %s %s" % (job['project_uid'], job['uid']))
            modified_count = cli.job_import_replace_symlinks(job['project_uid'], job['uid'], '/data/EMPIAR', '/data')
            print("Finished. Modified %d links." % (modified_count))
        except Exception as e:
            failed_jobs.append((job['project_uid'], job['uid']))
            print("Failed to repair %s %s: %s" % (job['project_uid'], job['uid'], str(e)))
...
    
>>> failed_jobs
[]
```

{% endtab %}

{% tab title="CryoSPARC ≤v3.3" %}
Start an interactive Python session

```
cryosparcm icli
```

Execute a MongoDB query for all potentially affected "import" jobs

```python
>>> jobs = list(db.jobs.find({'deleted': False, 'job_type': {'$in': ['import_particles', 'import_movies', 'import_micrographs']}}, {'_id': 0, 'project_uid': 1, 'uid': 1, 'job_type': 1}))
>>> print(jobs)
[{'job_type': 'import_movies', 'project_uid': 'P1', 'uid': 'J3'},
{'job_type': 'import_movies', 'project_uid': 'P2', 'uid': 'J3'},
{'job_type': 'import_movies', 'project_uid': 'P1', 'uid': 'J41'},
{'job_type': 'import_particles', 'project_uid': 'P1', 'uid': 'J42'},
{'job_type': 'import_movies', 'project_uid': 'P2', 'uid': 'J29'},
{'job_type': 'import_particles', 'project_uid': 'P2', 'uid': 'J51'},
{'job_type': 'import_movies', 'project_uid': 'P2', 'uid': 'J55'},
{'job_type': 'import_movies', 'project_uid': 'P2', 'uid': 'J64'},
...]
```

From this point, you can look inside each listed job's `imported` directory

```python
>>> cli.get_job_dir_abs('P1', 'J3')
'/data/cryosparc_projects/P1/J3'
    
>>> !ls -l /data/cryosparc_projects/P1/J3/imported
total 99
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31  2018 14sep05c_00024sq_00003hl_00002es.frames.mrc -> /data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00003hl_00002es.frames.mrc
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31  2018 14sep05c_00024sq_00003hl_00005es.frames.mrc -> /data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00003hl_00005es.frames.mrc
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31  2018 14sep05c_00024sq_00004hl_00002es.frames.mrc -> /data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00004hl_00002es.frames.mrc
lrwxrwxrwx 1 cryosparcuser cryosparcuser 83 Jan 31  2018 14sep05c_00024sq_00006hl_00003es.frames
```

Use the command `cli.job_import_replace_symlinks(project_uid, job_uid, prefix_cut, prefix_new)`, where `prefix_cut` is the beginning of the link target you'd like to cut (e.g. `/data/EMPIAR`) and `prefix_new` is what you'd like to replace it with (e.g. `/data`). This function loops through every file inside the job directory, finds all symlinks, and modifies only those that start with `prefix_cut`. It returns the number of links it modified. Below, it is used in a loop to modify all jobs across all projects at once.

```python
>>> failed_jobs = []
>>> for job in jobs:
        try:
            print("Repairing %s %s" % (job['project_uid'], job['uid']))
            modified_count = cli.job_import_replace_symlinks(job['project_uid'], job['uid'], '/data/EMPIAR', '/data')
            print("Finished. Modified %d links." % (modified_count))
        except Exception as e:
            failed_jobs.append((job['project_uid'], job['uid']))
            print("Failed to repair %s %s: %s" % (job['project_uid'], job['uid'], str(e)))
...
    
>>> failed_jobs
[]
```

{% endtab %}
{% endtabs %}
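
To make the effect of `prefix_cut` and `prefix_new` concrete, here is a minimal stand-alone sketch of the prefix replacement described above. It assumes a plain string-prefix match; the helper name `replace_prefix` is hypothetical, and the guide does not show how CryoSPARC implements the matching internally.

```python
def replace_prefix(link_target, prefix_cut, prefix_new):
    """Return the rewritten target, or None if it does not start with prefix_cut."""
    if not link_target.startswith(prefix_cut):
        return None  # this link would be left untouched
    return prefix_new + link_target[len(prefix_cut):]


old_target = "/data/EMPIAR/10025/data/14sep05c_raw_196/14sep05c_00024sq_00003hl_00002es.frames.mrc"

# Prints /data/10025/data/14sep05c_raw_196/14sep05c_00024sq_00003hl_00002es.frames.mrc
print(replace_prefix(old_target, "/data/EMPIAR", "/data"))

# Prints None, because the target does not start with the prefix
print(replace_prefix("/scratch/other/file.mrc", "/data/EMPIAR", "/data"))
```

Under this assumption, a prefix such as `/data/EMPIAR` would also match a sibling directory like `/data/EMPIAR2`. Previewing the rewritten targets before running the real command is a cheap way to catch that kind of surprise.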

### B. Moving Only CryoSPARC Project Directories (and all jobs inside them)

For instructions on moving CryoSPARC project directories in v4.0+, see [#use-case-moving-a-project-directory-from-one-storage-location-to-another](https://guide.cryosparc.com/setup-configuration-and-management/guide-data-management-in-cryosparc-v4.0#use-case-moving-a-project-directory-from-one-storage-location-to-another "mention")

<details>

<summary>Instructions for CryoSPARC ≤ v3.3</summary>

If you're moving the locations of the projects and their jobs, you will need to point CryoSPARC to the new directory where the projects reside. Jobs inside CryoSPARC are referenced by their relative location to their project directory. This allows a user to specify a new location for the project directory only, rather than each job.

#### Update a Single Project

`cryosparcm cli "update_project('PXX', {'project_dir' : '/new/abs/path/PXX'})"`

* Where `'PXX'` is the project UID and `'/new/abs/path/PXX'` is the new directory.

#### Updating Multiple Projects

#### Step One - Identify All Project Directories

Start an interactive Python session

```
cryosparcm icli
```

Execute a MongoDB query to list all project directories

```python
>>> projects = list(db['projects'].find({}, {'uid': 1, 'project_dir': 1, '_id': 0}))
>>> projects
    [{'project_dir': '/data/cryosparc_projects/P1', 'uid': 'P1'},
     {'project_dir': '/data/cryosparc_projects/P2', 'uid': 'P2'},
     {'project_dir': '/data/cryosparc_projects/P3', 'uid': 'P3'},
     {'project_dir': '/data/cryosparc_projects/P4', 'uid': 'P4'},
    ...]
```

#### Step Two - Modify One or Many Project Directories

Use the command `update_project(project_uid, attrs, operation='$set')`, where `attrs` is a dictionary whose keys correspond to the fields in the project document to update. In the following example, `update_project` is used in a loop to modify all project directory paths.

```python
>>> import os  # needed for os.path below
>>> failed_projects = []
>>> new_parent_dir = '/cryoem/cryosparc_projects'
>>> for project in projects:
        new_project_dir = os.path.join(new_parent_dir, os.path.basename(project['project_dir']))
        try:
            print("Modifying project directory for %s: %s --> %s" % (project['uid'], project['project_dir'], new_project_dir))
            cli.update_project(project['uid'], {'project_dir': new_project_dir})
        except Exception as e:
            failed_projects.append(project['uid'])
            print("Failed to update %s: %s" % (project['uid'], str(e)))
...

>>> failed_projects
[]
```
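
Before changing anything in the database, it can help to preview the old and new paths. The sketch below is a dry run that only computes and prints the mapping; the function name `plan_project_moves` is hypothetical, and the sample `projects` list simply mirrors the shape of the query result shown above.

```python
import os


def plan_project_moves(projects, new_parent_dir):
    """Map each project UID to (old_dir, new_dir) without changing anything."""
    plan = {}
    for project in projects:
        old_dir = project["project_dir"]
        # rstrip guards against a trailing slash, which would make basename return ''
        new_dir = os.path.join(new_parent_dir, os.path.basename(old_dir.rstrip("/")))
        plan[project["uid"]] = (old_dir, new_dir)
    return plan


# Sample data in the same shape as the MongoDB query result
projects = [
    {"project_dir": "/data/cryosparc_projects/P1", "uid": "P1"},
    {"project_dir": "/data/cryosparc_projects/P2/", "uid": "P2"},
]

for uid, (old_dir, new_dir) in plan_project_moves(projects, "/cryoem/cryosparc_projects").items():
    status = "found" if os.path.isdir(new_dir) else "not found yet"
    print(f"{uid}: {old_dir} --> {new_dir} ({status})")
```

A "not found yet" status for a project whose directory has already been moved is a hint that the new parent directory or the directory name is not what you expected.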

</details>

### C. Moving the CryoSPARC database

The CryoSPARC database doesn't necessarily have to be in the same location as the CryoSPARC installation directories. To move the database location, use the following steps.

{% hint style="warning" %}
Ensure that CryoSPARC project directories are not modified while the database is being moved or copied. If the database and project directories become out of sync, CryoSPARC may not function correctly.
{% endhint %}

#### Step One - Shut down CryoSPARC

```
cryosparcm stop
```

#### Step Two - Move the Database

```
rsync -r --links /data/cryosparc/cryosparc_database/* /new/path/cryosparc_database
```
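
After the copy finishes, you may want to confirm that nothing was left behind before editing the configuration. The following is a minimal sketch that compares file names and sizes between the two directories. It is not a CryoSPARC tool, the function names are hypothetical, and it does not compare file contents.

```python
import os


def list_files(root):
    """Map each file's path relative to root to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so a link is compared as a link
            sizes[os.path.relpath(full_path, root)] = os.lstat(full_path).st_size
    return sizes


def compare_copies(source_root, dest_root):
    """Return (missing, size_mismatch) lists of paths relative to source_root."""
    source, dest = list_files(source_root), list_files(dest_root)
    missing = sorted(path for path in source if path not in dest)
    mismatched = sorted(p for p in source if p in dest and source[p] != dest[p])
    return missing, mismatched


if __name__ == "__main__":
    # Example paths taken from the rsync command above; adjust to your setup.
    src, dst = "/data/cryosparc/cryosparc_database", "/new/path/cryosparc_database"
    if os.path.isdir(src) and os.path.isdir(dst):
        missing, mismatched = compare_copies(src, dst)
        print(f"{len(missing)} missing, {len(mismatched)} with different sizes")
```

Note that the shell glob `*` does not match names beginning with a dot, so any such top-level entries in the source directory would be reported as missing.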

#### Step Three - Modify Configurations

Navigate to the `cryosparc_master` directory

```
cd /data/cryosparc/cryosparc_master
```

Modify `config.sh` to contain the new directory path to the database

```
nano config.sh
    
#modify the line below
export CRYOSPARC_DB_PATH="/new/path/cryosparc_database"
```
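
If you would rather script this edit than use an interactive editor, the sketch below rewrites a single `export` line in the text of a `config.sh` file. It is an illustration under the assumption that the variable appears as `export NAME=...` at the start of a line; the helper name `set_export` and the sample file content are made up for the example.

```python
import re


def set_export(config_text, name, value):
    """Return config_text with `export NAME=...` set to value (appended if absent)."""
    new_line = f'export {name}="{value}"'
    pattern = re.compile(rf"^export {re.escape(name)}=.*$", flags=re.MULTILINE)
    if pattern.search(config_text):
        # A function replacement avoids backslash interpretation in the value
        return pattern.sub(lambda _match: new_line, config_text)
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


# Made-up sample content standing in for cryosparc_master/config.sh
old_config = (
    'export CRYOSPARC_MASTER_HOSTNAME="node1"\n'
    'export CRYOSPARC_DB_PATH="/data/cryosparc/cryosparc_database"\n'
)
print(set_export(old_config, "CRYOSPARC_DB_PATH", "/new/path/cryosparc_database"))
```

The function works on a string rather than on the file itself, so you can inspect the result and keep a copy of the original `config.sh` before writing anything back.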

#### Step Four - Start CryoSPARC again

```
cryosparcm start
```

### D. Hosting CryoSPARC on another machine

If you have a CryoSPARC instance whose master application was running on one machine and you now need to run the master on a new machine, use the following steps.

If the CryoSPARC master installation directory (i.e. `cryosparc_master`) resides on a filesystem that is shared and mounted at the same location on both the old and new machines, use **Option 1**, which is the simplest. This may be the case, for example, if CryoSPARC was installed in a user home directory such as `/home/cryosparcuser/cryosparc_master` and home directories are shared across machines in your setup.

If the CryoSPARC installation directory is not on shared storage, use **Option 2**.

{% hint style="warning" %}
In either case, the new machine must have project directories and raw data directories mounted at the same locations as the old machine, and must have access to the same worker nodes and cluster schedulers as the old machine.
{% endhint %}

#### Option 1: `cryosparc_master` is on a shared Filesystem

1. Shut down CryoSPARC on the old machine.

```
cryosparcm stop
```

2. On the new machine, **log in as the same user as on the old machine**, then update the configuration as follows.

Navigate to the `cryosparc_master` directory

```
cd /data/cryosparc/cryosparc_master
```

Modify `config.sh` to list the new master node hostname

```
nano config.sh
    
#modify the line below
export CRYOSPARC_MASTER_HOSTNAME="newnode"
```

3. On the new machine, start CryoSPARC using `cryosparcm start`.

#### Option 2: Installation is not shared

In this case, follow the [Installation guide](https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc) to install a fresh copy of CryoSPARC on the new machine, **using the same LICENSE\_ID as was used on the old machine.** Then:

1. On the old machine, shut down CryoSPARC using `cryosparcm stop`.
2. Copy the CryoSPARC database directory from the old machine to the new machine, for example to `/new/path/cryosparc_database`.
3. On the new machine, shut down CryoSPARC using `cryosparcm stop`.
4. On the new machine, edit the `cryosparc_master/config.sh` file to point to the new database path by setting `export CRYOSPARC_DB_PATH=/new/path/cryosparc_database`.
5. On the new machine, in the `cryosparc_master/config.sh` file, add any configuration variables that were present in the same file on the old machine, if relevant.
6. On the new machine, start CryoSPARC using `cryosparcm start`.
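
For step 5, one way to spot variables that exist only in the old `config.sh` is to compare the `export` lines of both files. The sketch below is a rough illustration: the function names are hypothetical, `EXAMPLE_SITE_SETTING` is a made-up variable used only for the demonstration, and the parser handles only simple `export NAME=value` lines.

```python
import re

EXPORT_LINE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_exports(config_text):
    """Return {name: value} for each `export NAME=value` line; comments are skipped."""
    exports = {}
    for line in config_text.splitlines():
        match = EXPORT_LINE.match(line)
        if match:
            name, value = match.groups()
            exports[name] = value.strip().strip('"').strip("'")
    return exports


def missing_on_new(old_text, new_text):
    """Variables exported in the old config.sh but absent from the new one."""
    old, new = parse_exports(old_text), parse_exports(new_text)
    return {name: old[name] for name in old if name not in new}


# Made-up sample contents standing in for the two config.sh files
old_config = (
    'export CRYOSPARC_MASTER_HOSTNAME="oldnode"\n'
    'export CRYOSPARC_DB_PATH="/data/cryosparc/cryosparc_database"\n'
    'export EXAMPLE_SITE_SETTING="1"\n'
)
new_config = (
    'export CRYOSPARC_MASTER_HOSTNAME="newnode"\n'
    'export CRYOSPARC_DB_PATH="/new/path/cryosparc_database"\n'
)
print(missing_on_new(old_config, new_config))  # {'EXAMPLE_SITE_SETTING': '1'}
```

Variables that appear in both files but with different values, such as the hostname and database path here, are expected to differ and are deliberately not reported.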
