Workflows

Introduction

Workflows are a new system in CryoSPARC for quickly and easily populating a workspace with pre-defined sets of jobs. The system is designed with flexibility at its core: it can be used to construct a top-to-bottom automated pipeline that takes you from import to refinement, a predefined branch of jobs for exploratory processing, or an arbitrary set of disconnected jobs and branches that can then be run independently or connected later using CryoSPARC’s existing tools.

Workflows can be self-contained trees extending from import jobs, or branches dependent on any number of parent jobs created independently. This design allows workflows to build repeatable pipelines when the optimal path is known, while retaining the ability to rapidly create branches for exploratory processing.

At a high level, workflows are designed to facilitate consistency when processing data using repeatable strategies. They are meant as a way to preload a series of jobs with their inputs connected and parameters set.

The process of using workflows is broken down into two steps. First, a user selects a set of jobs as a template and uses those to create the workflow. Once created, the workflow can be applied in any number of project workspaces.

Creating a Workflow

We will use the EMPIAR-10025 T20S Proteasome dataset and the general processing strategy presented in the CryoSPARC introductory tutorial to demonstrate how workflows can be created and applied.

We will begin in a workspace with a successful run-through of this processing pipeline. If you have processed this dataset before, the job chain will likely look familiar. In this chain, we perform motion correction and CTF estimation of the movies, pick particles, clean with 2D classification, and finally perform ab-initio and homogeneous reconstruction.

First, we will select all of the jobs in this chain, either with the multi-select mechanic (command/control + click) or by selecting the first and last job in the chain and using the “Select Job Chain” quick action.

Once all of the jobs in the chain have been selected, the “Create Workflow” dialog can be opened by either clicking on the option in the quick actions menu or on the footer button in the multi-selection sidebar.

Using either of these options will open the “Create Workflow” dialog, which is automatically populated with the selected jobs.

The “Create Workflow” dialog is broken into two panels:

  • Configuration Panel:

    • A settings section at the top allows you to set a title, category, and description for the workflow.

    • Below the settings section is a sequential set of job panels with configurable details and parameters.

      • Each parameter can be customized with a predefined value that is the default when the workflow is applied.

      • Resetting a parameter in this view will set it back to the job’s default parameter value.

      • Visibility can be toggled for each parameter, which defines whether or not the parameter can be seen when applying the workflow.

      • Locked status can be toggled for each parameter, which defines whether or not the parameter value can be changed from the default when applying the workflow.

  • Tree View:

    • The tree view is a graph representing all of the jobs and input/output connections in the workflow. It allows the workflow to be visualized more easily from a spatial perspective and can be used as a map for navigating the workflow jobs when setting parameters.

    • Job nodes are colour coded for legibility.

      • Default job nodes are set to a light grey with a blue accent for their ID.

      • The currently selected (building) job will be coloured purple.

      • Jobs that have modified parameters will be coloured green.

      • Disabled job nodes will appear light grey (these are nodes with no editable parameters).

      • Parent job nodes will appear light purple with a dotted border during creation of the workflow. During application of the workflow these jobs will be colour coded to their status (as denoted at the top of the configuration panel).

    • Clicking on a job node will select the job and automatically scroll the configuration panel to it.

To create our example T20S workflow we will add a title of “Processing Pipeline”, a category of “T20S”, and we will leave all of the default parameters as they are. To complete the creation process we will click the green “Create” button at the bottom right of the dialog.

Applying a Workflow

Now that we’ve created our T20S workflow we can apply it. Navigate to the “Workflows” sidebar panel:

From here we can click on our template to open the Apply Dialog:

The Apply Dialog is largely identical to the Create Dialog in composition and layout. The major differences reside in the left-hand configuration panel.

  • The settings section contains a Queue on Apply option, which sets all jobs to queue as soon as the workflow is applied. If Queue on Apply is toggled, the Queue to Lane option lets you choose the lane the jobs will be queued onto during application.

  • The job panels that follow are structured very similarly to those in the Create Dialog and include all of the parameters that were exposed during the workflow’s creation.

    • Jobs that had no parameters set to a custom value or made visible during creation will not be shown, and will be coloured grey in the tree view.

    • Locked parameters are read-only in this view, and are demarcated with a lock icon.

    • Resetting a parameter in this view will set it back to the custom value defined during creation.

  • The footer includes an “Apply” button to deploy the workflow into the current workspace, and a “Repeat” button, which deploys the workflow and then opens an identically configured Apply Dialog for quickly creating divergent branches or multiple exploratory pipelines.

For this example we will keep our default parameter values, select the “Queue on Apply” option, and click the “Apply” button. All of the jobs included in the workflow will now be automatically created, connected together, and queued onto the selected lane.

Modifying a Workflow

Workflows can only be edited or deleted by their creator or an administrator.

Workflows can be modified from the Workflows sidebar panel by clicking the triple dot overflow button beside the individual workflow you would like to modify. This will open an overflow menu with multiple options for modifying and interacting with the workflow.

Editing a Workflow

Clicking the “Edit” option in the overflow menu will open the Edit Dialog. This dialog is identical to the Create Dialog and allows you to modify all parameters and settings to your liking. Clicking “Update” will save the workflow with all of the updated values.

Forking a Workflow

Clicking the “Fork” option will open the Fork Dialog and allow you to create a new workflow using the selected workflow as the base. This dialog is identical to the Create Dialog but with all of the settings (minus the title) and all of the custom parameters preset to mirror the workflow that is being forked. From here the new workflow can be modified to your liking and created with a new title.

Deleting a Workflow

Clicking the “Delete” option will open a confirmation popover which, upon confirming the deletion, will irreversibly remove the workflow from the instance.

Workflows with Parent Connections

Workflows are designed to support both self-contained pipelines as well as exploratory processing. So far in this example we have been focused primarily on the former, so now let’s turn our attention to the latter.

Exploratory processing requires branching off from a singular pipeline in order to try different processing strategies simultaneously. For example, you may want to try building multiple refinement jobs of the same or different types, with a variety of divergent parameters set, and run them in parallel with the intention of seeing which strategy yields the best results. In this scenario, you might create a workflow containing a preferred setup of these jobs, beginning with a job to generate the initial volume (Ab-Initio Reconstruction) and a spread of refinement jobs.

In order to accommodate this style of processing, the workflows feature includes the concept of “Parent Connections”. Parent jobs are jobs that a workflow does not contain, but that the workflow requires externally. The parent of the aforementioned example workflow would be a job that outputs particles, such as a Select 2D, which would be connected into the Ab-Initio. Let’s take a look at how this is done in practice.

Creating a Workflow with Parent Connections

We will create a new workflow that begins with a “Select 2D” job as the parent, an “Ab-Initio” job as the first child, and a variety of refinement jobs connected to its outputs.

First we select the “Ab-initio” job and each of the refinement jobs using the multi-select mechanic (or the “Select Descendant Jobs” option from the quick actions menu). From here we will open the Create Dialog using the “Create Workflow” option in the quick actions menu. We now see our selected jobs in the tree view, with a “Select 2D” parent job connected to the “Ab-initio”. This parent job is indicated by its divergent colouring and “Px” ID (as opposed to “Jx”).

The “Select 2D” parent job is a dependency of this workflow, because the “Ab-initio” relies on its outputs to function. The parent job will not be created by the workflow when it is deployed, but must be selected in order to allow the workflow to connect to its outputs.

Applying a Workflow with Parent Connections

We will now take our “Refinement Fork” workflow and apply it to a partially completed version of the T20S pipeline we worked on previously, which is currently stopped after a “Select 2D” job.

We will now use that “Select 2D” job to connect its particles output to our workflow, by selecting it and then opening the Apply Dialog from the sidebar. Once the dialog is open we can see that the “Select 2D” job is indicated as connected by the green notice at the top of the configuration panel, as well as by its corresponding green colouring in the tree view.

Applying the workflow will create and connect all of the workflow jobs as expected, as well as automatically connecting the outputs of the parent “Select 2D” job with the created “Ab-initio” job.

We now have our “Refinement Fork” workflow deployed and queued. It has been automatically connected to our “Select 2D” job and will use the particle output to populate its input requirements.

Missing Parent Connections

If a workflow requires a parent connection to populate all of its necessary inputs, but no suitable parent job has been selected, it will display a “Missing Parent Connections” notice and the parent node in the tree view will be coloured orange.

You must exit the Apply Dialog by clicking the “Cancel” button or pressing the Escape key, then select a suitable parent job within your workspace in order to create the connection. Once you have selected a suitable job, open the dialog again by selecting the workflow from the sidebar. It will now display a “Connected Parents” notice and the parent job node in the tree view will be coloured green.

Rerouted Parent Connections

In some circumstances you may want to use a particular workflow with a different parent job than the one it was initially created with, for example connecting an “Import Movies” job instead of an “Import Micrographs” job. The workflows feature includes a system to reroute input connections to facilitate this. Any replacement job must output all of the required inputs, and their necessary lower-level slots, that are consumed by jobs across the workflow. If these requirements are met, the original parent job will be replaced by the newly selected parent job and all of the inputs will be connected as expected.

This system is automatic and does not need any configuration. If a suitable replacement job is selected it will be substituted in the workflow automatically.
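
To make the rule concrete, the check can be thought of as a simple subset test: every output slot that the workflow consumes from the original parent must also be provided by the candidate replacement. The sketch below is purely illustrative (the slot names are examples, and this is not CryoSPARC’s internal reroute code).

```python
# Illustrative sketch of the reroute rule described above, not CryoSPARC's
# internal implementation. Slot names are examples only.

def can_replace_parent(required_slots: set[str], replacement_slots: set[str]) -> bool:
    """A replacement parent is acceptable only if it provides every slot
    (including lower-level slots) that workflow jobs consume from the parent."""
    return required_slots.issubset(replacement_slots)

# Slots the workflow's jobs consume from the original parent job:
required = {"particles.blob", "particles.ctf"}

# A candidate that exposes all consumed slots can be substituted:
print(can_replace_parent(required, {"particles.blob", "particles.ctf", "particles.location"}))  # True

# One that is missing a consumed slot cannot:
print(can_replace_parent(required, {"particles.blob"}))  # False
```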

Annotating a Workflow

Workflows are meant to facilitate consistency in complex processing, and as such will often be composed of a variety of interconnected jobs with pre-set parameters. In order to maintain intelligibility, especially over time and between users, workflows include a suite of annotation options.

Workflow Details

This is the highest level of annotation, and is meant to be used to give details relevant to the entire workflow. This may include links to related documentation, academic papers, and/or a relevant dataset. It can also be helpful to add any notes here about how the workflow is expected to be used. Workflow details can be added or changed in the configuration section of the Create Dialog or Edit Dialog.

These notes are displayed in the workflow sidebar tooltip that appears when hovering over a workflow, as well as in the configuration section of the Apply Dialog in a “Details” panel that is closed by default.

Job Details

Job details operate nearly identically to those found in the job builder. They allow you to apply annotations on a job-by-job basis by adding a title and/or description. When creating a workflow, these fields will be pre-populated with any preexisting titles or descriptions that the original jobs were given. When applying the workflow, the titles and descriptions can be edited in the Apply Dialog and will be added to the jobs that are created by the workflow.

Parameter Details

Each parameter includes an option for annotation. This can be accessed by clicking the “Additional Options” toggle button to the far right of the parameter.

From here you can add any relevant notes about the parameter, for instance why it was set to the current value and/or in what situations it should be changed. When viewing this workflow in the Apply Dialog an info icon will be shown beside the annotated parameter title. When mousing over the icon the added note will appear inside of an info tooltip.

Flagging Parameters

It can be desirable to require that certain job parameters be changed in a workflow to facilitate its use with different datasets. There are many situations where a parameter will almost always need to be changed in order to run the workflow successfully, for example import paths or box size. The workflows feature allows you to flag a parameter as a pseudo-requirement to address this situation.

The flag button is accessed from the same “Additional Options” parameter toggle used to add parameter notes; clicking the flag button in the top right of the panel flags the parameter.

When viewing the Apply Dialog, the flagged parameter will now have an orange outline and a flag icon beside its title, and the corresponding job node in the tree view will be accompanied by a flag icon. On the dialog footer you will see a “Flagged Parameters” tracker that shows the total number of flagged parameters and the number of updated flagged parameters. Clicking this button will reveal a menu checklist of flagged parameters, organized by job, with a check mark circle indicating whether each has been updated. Clicking a parameter in the menu will navigate you to it.

It is important to note that flagged parameters are not hard requirements. A workflow can be deployed without its flagged parameters having been updated. This is to maintain flexibility and not lock a user into updating a parameter they do not wish to.

Once you have added a title and modified the included job parameters to your liking, you can navigate to the dialog footer and click the “Create” button to create the workflow.

Importing / Exporting a Workflow

Workflows are designed from the ground up for portability. This means that they can be easily saved, stored, and shared between instances. Workflows contain no identifying information of the instance that they were created in, and no references to the jobs that were used to create them. This allows workflows to be a powerful tool for sharing successful data processing pipelines or strategies, maintaining a catalog of your proprietary processing workflows, or cooperatively iterating on a processing approach agnostic of institution or instance.

Exporting a Workflow

A workflow can be exported by clicking on the triple dot overflow button beside the individual workflow you would like to export. Clicking on the “Export” option in the menu will download the workflow to your device. The downloaded file is a .json (JavaScript Object Notation) file and can be easily inspected using any modern web browser. This compact file includes all of the necessary information to recreate a workflow in any CryoSPARC instance (running a version at or above the one in which the workflows feature was introduced).
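
If you would rather inspect the file from a terminal than a browser, a short Python snippet can list its top-level structure. The filename below is hypothetical, and since the file’s exact schema is internal to CryoSPARC and may change between versions, this sketch deliberately avoids assuming any particular keys.

```python
import json

# Hypothetical filename; substitute the path of your own exported workflow.
with open("processing_pipeline_workflow.json") as f:
    workflow = json.load(f)

# List the top-level keys and the type of each value without assuming
# anything about the (undocumented) schema.
for key, value in workflow.items():
    print(f"{key}: {type(value).__name__}")
```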

Importing a Workflow

A workflow can be imported by clicking the “Import Workflow” button on the footer of the Workflows sidebar. This will open a file browser where you can find and upload a previously exported workflow .json file.

Once selected, the file will be imported into your instance and will appear in the Workflows sidebar like any other workflow. The imported workflow has no special properties outside of an imported attribute to demarcate it as created outside of the instance. It can be used, modified, and exported like any other.

Rebuilding a Workflow

In order to add or remove jobs from a workflow, or to modify a job’s input connections within a workflow, you must create a new version of said workflow. The best practice is to apply a workflow into a new (or existing) workspace without queueing it, modify the jobs as needed, and then select all jobs and create a new workflow. This allows you to leverage the powerful tools that already exist in CryoSPARC for creating, deleting, and connecting or disconnecting jobs.

Rebuilding Walkthrough

Enter a new or existing workspace in which you would like to edit your workflow. Navigate to your workflow using the “Workflows” sidebar, open the Apply Dialog, and apply the workflow into the workspace.

Once the workflow has been successfully applied, you can modify it as you would any other set of jobs in CryoSPARC.

After modification is complete, you can select all jobs using any of the multi-selection methods available and access the Create Workflow option through the multi-select sidebar footer or the job card quick actions menu.

From here, parameters, annotations, and flags can be set for the new workflow. Note that your workflow category and description, as well as your parameter annotations, will not be retained in the new workflow. Job titles and descriptions will be, as they live with the jobs created by the original workflow.

Workflow category, description, and parameter annotations cannot be retained in a new workflow.

Summary

CryoSPARC's Workflows allow for the creation of pre-defined sets of jobs for data processing. Workflows can be self-contained trees or branches dependent on parent jobs. They facilitate consistent data processing and can be used to pipeline a known strategy or to branch off for exploratory processing. Workflows are created by selecting jobs and opening the "Create Workflow" dialog, which allows for customization of parameter values and visibility. Workflows can be modified, forked, and deleted, and can be exported and imported as JSON files.

Limitations

Currently, jobs that create their outputs while in running status (rather than in building status) cannot be automatically linked to their children when applying a workflow. Any job chain containing such a job will break at that point.

However, each chain will still have all of its parameters set, applicable jobs queued, and all jobs created with a workflow tag. The branches can then be connected once the initial trunk has run to completion and its last job has generated its outputs (a scripted example of this step is sketched after the list below).

Example jobs that cause this behaviour:

  • Import Particle Sets

  • Particle Sets Tool
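
If you prefer to script this reconnection rather than drag the outputs in the UI, the separately documented cryosparc-tools Python package can make the connection once the trunk job has completed. The sketch below is an assumption-laden example, not part of the workflows feature: the project and job UIDs and the output/input group names are placeholders, and the CryoSPARC client, find_project, find_job, connect, and queue calls should be verified against the cryosparc-tools documentation for your CryoSPARC version.

```python
from cryosparc.tools import CryoSPARC

# Placeholder credentials/host values; fill in your own instance details.
cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)

project = cs.find_project("P1")        # hypothetical project UID
trunk_job = project.find_job("J10")    # e.g. a completed Particle Sets Tool job
branch_job = project.find_job("J11")   # the workflow-created job left unconnected

# Connect the trunk job's output group to the branch job's input, then queue it.
# The group name "particles" is an example; check the actual jobs in your workspace.
branch_job.connect("particles", trunk_job.uid, "particles")
branch_job.queue(lane="default")
```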
