Automated Workflows

Get started with automated, end-to-end data processing in CryoSPARC.

Automated repeat-target structure determination

Single particle cryo-EM is a highly valuable technique for life sciences and drug discovery. Currently, obtaining state-of-the-art results from cryo-EM data analysis requires a human in the loop to analyze intermediate results and make image processing decisions. This bottleneck limits the throughput of structural characterization, especially in high-throughput settings such as structure-based drug design.

In the link below, we describe the development of an end-to-end automation strategy using new tools built in CryoSPARC for “repeat-target” structure determination. We demonstrate our automation strategy on 21 G protein-coupled receptor (GPCR) datasets including both active and inactive states. In nearly all cases, the automated workflow meets or exceeds the published resolution and map quality. Our results demonstrate that it is now possible to completely automate the data processing workflow for repeat-target scenarios, and to obtain a high quality particle stack, consensus reconstruction and local refinement in the ligand binding region that are suitable for model building. Using the Workflows tool in CryoSPARC and the strategy described here, users can replicate, adapt and extend the automated workflow for their own targets.

Download Workflow JSON and Inputs

Below we provide a link to download a .zip archive containing CryoSPARC Workflow JSON files and required inputs which can be imported into CryoSPARC v4.7.1+ and used to process GPCR datasets like the ones we have tested. Two versions of the Workflow file are provided: one that processes raw movies directly from import, and another that processes micrographs that are generated after pre-processing in CryoSPARC Live. Please refer to the paper for details.

Using the GPCR workflow file on your own GPCR datasets

The instructions below outline how to take our GPCR Workflow file and use it on your own GPCR datasets (i.e., targets of the same “class”).

  1. Upload the volumes you downloaded above to your compute infrastructure. They do not need to be located within the project directory.

  2. Import the Workflow JSON file into CryoSPARC using the Workflows panel in the sidebar.

  3. Select the workflow from the list, set all parameters, and choose the compute node to use. Parameters that might need to be adjusted on a per-dataset basis are listed below:

    Import movies

    • Movies data path

    • Gain reference path

    • Raw pixel size (Å)

    • Accelerating voltage (kV)

    • Spherical aberration (mm)

    • Total exposure dose (e/Å^2)

    • Flip gain ref and defect file in Y

    Import 3D Volumes

    • Path to volumes/masks for GPCR reference, junk volumes, and the receptor mask

    Patch Motion Correction

    • Output F-crop factor

    Exposure Group Utilities

    • Regular expression string

    Extract from micrographs (x2)

    • Extraction box size

      Note: if using our reference, this should be 310.6 divided by the pixel size of the motion-corrected micrographs, rounded to the nearest even number of pixels (e.g., 361.18 → 362).

  4. Click on the green “Apply” button at the bottom of the workflow GUI.

  5. Download and inspect your final volumes, and proceed with any advanced processing if desired.
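
The box-size note in step 3 can be worked through in a few lines. The following is a minimal sketch, not part of CryoSPARC: the function names are hypothetical, the value 310.6 is taken from the note above (the unit is not stated there, so it is assumed to be in the same length unit as the pixel size, Å), and resolving exact ties upward is an arbitrary choice.

```python
import math

# Value quoted in the note above for the provided GPCR reference.
# Assumption: expressed in the same length unit as the pixel size (Å).
REFERENCE_EXTENT = 310.6


def nearest_even(value: float) -> int:
    """Round a positive value to the nearest even integer (ties go up)."""
    return 2 * math.floor(value / 2 + 0.5)


def extraction_box_size(micrograph_pixel_size: float) -> int:
    """Box size in pixels for motion-corrected micrographs of this pixel size."""
    if micrograph_pixel_size <= 0:
        raise ValueError("pixel size must be positive")
    return nearest_even(REFERENCE_EXTENT / micrograph_pixel_size)


if __name__ == "__main__":
    print(nearest_even(361.18))       # the example from the note
    print(extraction_box_size(0.86))  # a hypothetical micrograph pixel size
```

Note that the pixel size passed in is that of the motion-corrected micrographs, which may differ from the raw pixel size if the Output F-crop factor is changed.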

Adapting the GPCR workflow for other target classes

Performing automated processing for a new target class is straightforward in CryoSPARC. Our strategy can be applied to any type of target, including, but not limited to, membrane proteins, soluble proteins, nucleic acid samples, nucleoprotein targets, small proteins, and large complexes. As described in the paper, it is necessary to gather and set the target-class-level inputs (a low-resolution reference map and masks, junk volumes, particle diameter, and particle separation distance) and, optionally, the workflow-level inputs (number of rounds of decoy classification, 2D and 3D selection thresholds). These can then remain fixed for each dataset in the class.

Below is a simple protocol for setting up a new automation Workflow for a new target class, by manually processing an exemplar dataset and then saving the resulting processing steps as a new Workflow. Once this setup is done, the Workflow can be saved and re-used in a single click for new datasets.

  1. Decide on a definition of the target class (e.g., GLP-1 receptor with Nb6 nanobody)

  2. Choose an exemplar cryo-EM dataset from the target class to use for instantiating the workflow.

  3. Find a reference density map for the target class, for example from previous manual processing of the exemplar dataset, from EMDB, or from structure prediction tools. This reference should be as similar as possible to the target class, but only needs to be approximately 15 Å in resolution. Care should be taken to ensure that box sizes are appropriate.

  4. As a starting point, import the GPCR automation Workflow JSON file (available here) into CryoSPARC v4.7.1+. Apply the workflow in a new project, but do not queue all the jobs.

    1. Modify the Import Movies job to import the exemplar dataset and associated microscope parameters including exposure groups.

    2. Modify the Import Volumes jobs to import the reference density for this target class.

  5. Run each job in the Workflow to process the exemplar dataset. At each of the following points, inspect results and re-run jobs to set parameters, as appropriate, before proceeding to the next job:

    1. Blob picking: adjust particle diameter and particle separation distance for blob picking, and visualize picks using Inspect Particle Picks to confirm.

    2. Reference Based Auto Select 2D: adjust selection thresholds if necessary so that only good classes are selected.

    3. Template picking: update parameters using the values from Blob picking.

    4. Decoy classification: import existing junk volumes from a similar target class, or produce new junk volumes by running Ab-initio Reconstruction and terminating it early or manually editing and importing volumes. Add additional rounds of decoy classification if a single round has not sufficiently curated particles.

    5. Reference Based Auto Select 3D: adjust selection thresholds if necessary.

    6. Local Refinement: produce a local mask around the region of interest using Volume Tools. Masks should have adequate dilation (3-5 Å) and a very soft edge (3× the dilation distance).

  6. In addition to the above, if prior processing experience with the target class is available, any other processing parameters (e.g., in exposure curation, 2D classification, or refinements) can be modified to optimize for the target class.

  7. Once the Workflow is working for the exemplar dataset, select all the jobs and save as a new Workflow.

  8. The new Workflow can now be re-used in a single click for fully automated processing of new datasets from the target class.
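
The mask guidance in step 5.6 is given in Å. A minimal sketch of converting it to pixels is shown here, assuming that the mask tool expects dilation and soft-edge widths in pixels and that the pixel size of the reference volume is known. The function name and the default dilation of 4 Å (the middle of the 3-5 Å range) are illustrative choices, not CryoSPARC settings.

```python
def mask_padding_pixels(pixel_size_A: float, dilation_A: float = 4.0) -> tuple[int, int]:
    """Return (dilation, soft edge) widths in pixels.

    The soft edge is taken as 3x the dilation distance, as suggested above.
    Both values are rounded to the nearest whole pixel, with a minimum of 1.
    """
    if pixel_size_A <= 0:
        raise ValueError("pixel size must be positive")
    dilation_px = max(1, round(dilation_A / pixel_size_A))
    soft_edge_px = max(1, round(3 * dilation_A / pixel_size_A))
    return dilation_px, soft_edge_px


if __name__ == "__main__":
    # Hypothetical volume pixel size of 0.86 Å with a 4 Å dilation.
    print(mask_padding_pixels(0.86))
```

The soft edge is computed from the distance in Å rather than from the already-rounded dilation in pixels, so rounding error is not multiplied by three.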

Practical tips: uploading files and using Workflows

Uploading volumes and masks for use in workflow

All reference volumes, junk volumes, and masks need to be on a filesystem accessible to your compute infrastructure, but they do not need to be located within the project where you intend to launch the automation workflow. The easiest way to upload files from your local filesystem to your compute infrastructure for use within CryoSPARC is to drag the file (or multiple files) to any CryoSPARC browser window:

  • Files uploaded to CryoSPARC through the browser are added to a directory named uploads in the selected CryoSPARC project directory. The upload dialog lists all files already in the uploads directory in the first panel.

  • Once the uploads have completed, the Upload Files dialog can be closed (by clicking the green Done button) or more files can be uploaded by dragging and dropping into the dialog.

  • The Upload Local Files guide page has more tips and guidance on the CryoSPARC browser upload system.

If these volumes will be used across multiple CryoSPARC projects, you can create a directory on your compute filesystem to hold them. Use scp from the command line or an SFTP client to upload them to that directory.

Importing the workflow JSON file

A workflow can be imported by clicking the “Import Workflow” button in the footer of the Workflows sidebar. This will open a file browser where you can find and upload a workflow .json file from your local filesystem.

Once selected, the file will be imported into your instance and will appear in the Workflows sidebar like any other workflow. The imported workflow has no special properties other than an imported attribute that marks it as created outside of the instance. It can be used, modified, and exported like any other workflow.

The automated processing workflows for GPCRs will be placed into a group titled Automated Workflows.

Applying the workflow

Navigate to the “Workflows” sidebar panel:

From here, click on the appropriate workflow to open the Workflow Apply Dialog:

  • The settings section contains a Queue on Apply option. This allows you to set all jobs to queue as soon as the workflow is applied. If Queue on Apply is enabled, the Queue to Lane option allows you to choose the lane that the jobs will be queued onto.

  • The subsequent job panels include all of the parameters that were exposed during the workflow’s creation.

    • Jobs that had no parameters set to a custom value or made visible during creation will not be shown, and will be coloured grey in the tree view.

    • Locked parameters are read-only in this view, and are denoted with a lock icon.

    • Resetting a parameter in this view will set it back to the custom value defined during creation.

    • Parameters that are flagged for adjustment prior to running are highlighted in orange and can easily be navigated to using the ‘Flagged’ parameters menu.

Adjusting parameters in a workflow

The Workflows functionality allows parameters to be flagged as a pseudo-requirement for running the workflow. The provided workflows contain multiple flagged parameters.

On the dialog footer you will see a “Flagged Parameters” tracker that shows the total number of flagged parameters and number of updated flagged parameters. Clicking this button will reveal a menu checklist of flagged parameters, organized by job, with a check mark-circle indicating whether it has been updated or not. Clicking a parameter in the menu will navigate to it.

Changing the value of a flagged parameter will update its colour to green and mark it with a green check mark.

In these workflows the parameters that will need to be updated include:

  1. Import movies

    1. Movies data path

    2. Gain reference path

    3. Raw pixel size (Å)

    4. Accelerating voltage (kV)

    5. Spherical aberration (mm)

    6. Total exposure dose (e/Å^2)

    7. Flip gain ref and defect file in Y

  2. Import 3D Volumes

    1. Path to volumes/masks for GPCR reference, junk volumes, and the receptor mask

  3. Patch Motion Correction

    1. Output F-crop factor

  4. Exposure Group Utilities

    1. Regular expression string

  5. Extract from micrographs (x2)

    1. Extraction box size
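
The right Regular expression string depends on how your movie files are named. Before entering a pattern in the workflow, it can help to check what it captures from a few example file names. The sketch below does only that. The file names, the pattern, and the helper function are all hypothetical, and how CryoSPARC itself applies the pattern is not covered here.

```python
import re
from collections import defaultdict


def group_by_capture(pattern: str, file_names: list[str]) -> dict:
    """Group file names by the text matched by the pattern's first capture group.

    Names that do not match are collected under the key None.
    The pattern must contain at least one capture group.
    """
    compiled = re.compile(pattern)
    groups = defaultdict(list)
    for name in file_names:
        match = compiled.search(name)
        key = match.group(1) if match else None
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical movie file names and pattern, for illustration only.
    example_names = [
        "session1_grid2_00001_fractions.tiff",
        "session1_grid2_00002_fractions.tiff",
        "session1_grid3_00001_fractions.tiff",
        "gain_reference.tiff",
    ]
    for key, names in group_by_capture(r"_grid(\d+)_", example_names).items():
        print(key, names)
```

Any file names listed under None were not matched at all, which usually means the pattern needs adjusting before it is used.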

Flagged parameters are not hard requirements: a workflow can be deployed without its flagged parameters having been updated. This maintains flexibility and avoids forcing a user to change a parameter they do not wish to change.

Executing the workflow

The footer includes an “Apply” button to deploy the workflow into the current workspace, and a “Repeat” button, which deploys the workflow and then opens an identically configured Apply Dialog for quickly repeating the pipeline.

For this example, we will maintain our default parameter values, select the “Queue on Apply” option, and apply the workflow in the tree view by clicking the “Apply” button. All of the jobs included in the workflow will now be automatically created, connected together, and queued onto the selected lane.

More details about all things workflow related can be found on the Workflows guide page, including how to modify or rebuild a workflow for further customization.
