Workflows
Workflows are a new system in CryoSPARC for quickly and easily populating a workspace with pre-defined sets of jobs. The system is designed with flexibility at its core. It can be used to construct a top-to-bottom automated pipeline of jobs that takes you from import to refinement, a predefined branch of jobs for exploratory processing, or an arbitrary set of disconnected jobs and branches that can then be run independently or connected afterwards using CryoSPARC’s existing job-connection tools.
Workflows can be self-contained trees extending from import jobs, or branches dependent on any number of independently created parent jobs. This design allows workflows to build repeatable pipelines when the optimal path is known, while retaining the ability to rapidly create branches for exploratory processing.
At a high level, workflows are designed to facilitate consistency when processing data using repeatable strategies. They are meant as a way to preload a series of jobs with their inputs connected and parameters set.
The process of using workflows is broken down into two steps. First, a user selects a set of jobs as a template and uses them to create the workflow. Once created, the workflow can be applied in any number of project workspaces.
We will use the EMPIAR-10025 T20S Proteasome dataset and the general processing strategy presented in the CryoSPARC introductory tutorial to demonstrate how workflows can be created and applied.
We will begin in a workspace with a successful run-through of this processing pipeline. If you have processed this dataset before, the job chain will likely look familiar. In this chain, we perform motion correction and CTF estimation of the movies, pick particles, clean the particle set with 2D classification, and finally perform ab-initio reconstruction and homogeneous refinement.
First, we will select all of the jobs in this chain using either the multi-select mechanic (command/control + click), or by selecting the first and last job in the chain and using the “Select Job Chain” quick action.
Once all of the jobs in the chain have been selected, the “Create Workflow” dialog can be opened by either clicking on the option in the quick actions menu or on the footer button in the multi-selection sidebar.
Using either of these options will open the “Create Workflow” dialog which is automatically populated with the selected jobs.
The “Create Workflow” dialog is broken into two panels:
Configuration Panel:
A settings section at the top allows you to set a title, category, and description for the workflow.
Below the settings section is a sequential set of job panels with configurable details and parameters.
Each parameter can be customized with a predefined value that is the default when the workflow is applied.
Resetting a parameter in this view will set it back to the job’s default parameter value.
Visibility can be toggled for each parameter, which defines whether or not the parameter can be seen when applying the workflow.
Locked status can be toggled for each parameter, which defines whether or not the parameter value can be changed from the default when applying the workflow.
Tree View
The tree view is a graph representing all of the jobs and input/output connections in the workflow. It allows the workflow to be visualized more easily from a spatial perspective and can be used as a map for navigating the workflow’s jobs when setting parameters.
Job nodes are colour coded for legibility.
Default job nodes are set to a light grey with a blue accent for their ID.
The currently selected (building) job will be coloured purple.
Jobs that have modified parameters will be coloured green.
Disabled job nodes will appear light grey (these are nodes with no editable parameters).
Parent job nodes will appear light purple with a dotted border during creation of the workflow. During application of the workflow these jobs will be colour coded to their status (as denoted at the top of the configuration panel).
Clicking on a job node will select the job and automatically scroll the configuration panel to it.
To create our example T20S workflow we will add a title of “Processing Pipeline”, a category of “T20S”, and we will leave all of the default parameters as they are. To complete the creation process we will click the green “Create” button at the bottom right of the dialog.
Now that we’ve created our T20S workflow we can apply it. Navigate to the “Workflows” sidebar panel:
From here we can click on our template to open the Apply Dialog:
The Apply Dialog is largely identical to the Create Dialog in composition and layout. The major differences reside in the left-hand configuration panel.
The settings section contains a Queue on Apply option, which allows you to set all jobs to queue as soon as the template is applied. If Queue on Apply is toggled on, the Queue to Lane option allows you to choose the lane the jobs will queue onto during application.
The succeeding job panels are structured very similarly to those in the Create Dialog and include all of the parameters that were exposed during the workflow’s creation.
Jobs that had no parameters set to a custom value or made visible during creation will not be shown, and will be coloured grey in the tree view.
Locked parameters are read-only in this view, and are demarcated with a lock icon.
Resetting a parameter in this view will set it back to the custom value defined during creation.
The footer includes an “Apply” button to deploy the workflow into the current workspace, and a “Repeat” button, which allows you to deploy the workflow and then open an identically configured Apply Dialog for quickly creating divergent branches or multiple exploratory pipelines.
For this example we will keep our default parameter values, select the “Queue on Apply” option, and apply the workflow by clicking the “Apply” button. All of the jobs included in the workflow will now be automatically created, connected together, and queued onto the selected lane.
Workflows cannot be edited or deleted unless you are the creator of the workflow or an administrator.
Workflows can be modified from the Workflows sidebar panel by clicking the triple dot overflow button beside the individual workflow you would like to modify. This will open an overflow menu with multiple options for modifying and interacting with the workflow.
Clicking the “Edit” option in the overflow menu will open the Edit Dialog. This dialog is identical to the Create Dialog and allows you to modify all parameters and settings to your liking.
Clicking “Save” will overwrite the Workflow with any changes you have made. Clicking “Save New” will save a new Workflow that is identical to the old one but with all of the changes you have made; you must change the Workflow title in order to use the “Save New” option.
Clicking the “Delete” option will open a confirmation popover which, upon confirming the deletion, will irreversibly remove the workflow from the instance.
If you are simply updating a Workflow’s parameters and/or annotations, it is recommended to use the Edit functionality. Rebuilding is specifically designed for fundamentally modifying input/output connections or adding/removing jobs from a Workflow.
Rebuilding a Workflow allows you to add or remove jobs from the Workflow, or change the input/output connections between jobs in the Workflow. If these jobs were created using the workflow that you are rebuilding, then all of their annotations will be carried over (notes, locked status, visibility status, etc.).
The best practice for rebuilding an existing workflow is as follows:
Apply a workflow into a new (or existing) workspace without queueing it.
Add new jobs, remove existing jobs, or change input connections as needed.
Select all of the relevant jobs that you would like to include in the rebuilt workflow.
Navigate to the workflow in the sidebar panel and select the “Rebuild” option from the triple dot overflow menu. This will launch the Rebuild Dialog, allowing you to see which jobs have been maintained or updated, and to create your rebuilt Workflow.
This system allows you to leverage the powerful tools that already exist in CryoSPARC for creating, deleting, and connecting or disconnecting jobs, while still retaining all annotations from the previous Workflow.
Enter a new or existing workspace in which you would like to edit your Workflow. Navigate to your Workflow using the “Workflows” sidebar, open the Apply Dialog, and apply the workflow into the workspace.
Once the workflow has been successfully applied, you can modify it as you would any other set of jobs in CryoSPARC.
After modification is complete, you can select all jobs using any of the multi-selection methods available and access the Create Workflow option through the multi-select sidebar footer or the job card quick actions menu.
The Rebuild Dialog is largely identical to the Create Dialog, but includes a set of indicators at the top right of the dialog header providing contextual information about differences from the original Workflow. Rebuilt jobs are those that existed in the original Workflow and have had all of their annotations copied over to the new Workflow. The “Rebuilt” and “New” indicators can be clicked to open a context menu listing all of the jobs with those statuses. Clicking on a job in the menu will select it and navigate you to it in the Workflow tree and sidebar.
A Rebuilt Workflow must be saved as a completely new Workflow, distinct from the original. Rebuilding is treated as a fundamental modification of the original Workflow, so overwriting it is disallowed. We recommend adding a modifier to the title to indicate that this is a new version of the original (e.g. “T20S Base Processing [v2]”).
Workflows are designed to support both self-contained pipelines as well as exploratory processing. So far in this example we have been focused primarily on the former, so now let’s turn our attention to the latter.
Exploratory processing requires branching off from a singular pipeline in order to try different processing strategies simultaneously. For example, you may want to try building multiple refinement jobs of the same or different types, with a variety of divergent parameters set, and run them in parallel with the intention of seeing which strategy yields the best results. In this scenario, you might create a workflow containing a preferred setup of these jobs, beginning with a job to generate the initial volume (Ab-Initio Reconstruction) and a spread of refinement jobs.
In order to accommodate this style of processing, the workflows feature includes a concept of “Parent Connections”. Parent jobs are jobs that a workflow does not contain, but that the workflow requires externally. The parent of the aforementioned example workflow would be a job that outputs particles, such as a Select 2D, which would be connected into the Ab-Initio. Let’s take a look at how this is done in practice.
We will create a new workflow that begins with a “Select 2D” job as the parent, an “Ab-Initio” job as the first child, and a variety of refinement jobs connected to its outputs.
First, we select the “Ab-Initio” job and each of the refinement jobs using the multi-select mechanic (or the “Select Descendant Jobs” option from the quick actions menu). From here we will open the Create Dialog using the “Create Workflow” option in the quick actions menu. We now see our selected jobs in the tree view, with a “Select 2D” parent job connected to the “Ab-Initio”. This parent job is indicated by its distinct colouring and “Px” ID (as opposed to “Jx”).
The “Select 2D” parent job is a dependency of this workflow, because the “Ab-Initio” relies on its outputs to populate its inputs. The parent job will not be created by the workflow when it is deployed, but must be selected in order to allow the workflow to connect to its outputs.
We will now take our “Refinement Fork” workflow and apply it to a partially completed version of the T20S pipeline we worked on previously, which is currently stopped after a “Select 2D” job.
We will use that “Select 2D” job to connect its particles output to our workflow by selecting it and then opening the workflow Apply Dialog from the sidebar. Once the dialog is open, we can see that the “Select 2D” job is indicated as connected by the green notice at the top of the configuration panel, as well as by its corresponding green colouring in the tree view.
Applying the workflow will create and connect all of the workflow jobs as expected, as well as automatically connect the outputs of the parent “Select 2D” job to the created “Ab-Initio” job.
We now have our “Refinement Fork” workflow deployed and queued. It has been automatically connected to our “Select 2D” job and will use the particle output to populate its input requirements.
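As an aside, the connection the workflow makes here is the same kind of parent-to-child input/output connection you can make programmatically with the cryosparc-tools Python package. The sketch below is illustrative only: the license, credentials, project/workspace/job UIDs, and lane name are all placeholders, and it simply mirrors what applying the workflow does automatically.

```python
# Illustrative sketch using the cryosparc-tools Python package. All
# credentials and UIDs below are placeholders; this mirrors the
# connection the workflow makes automatically: feeding a Select 2D
# parent's selected particles into a new Ab-Initio Reconstruction job.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # placeholder
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)

project = cs.find_project("P1")  # placeholder project UID

# Create an Ab-Initio job whose "particles" input is connected to the
# "particles_selected" output of an existing Select 2D job (J20 here).
abinit = project.create_job(
    "W1",           # placeholder workspace UID
    "homo_abinit",  # Ab-Initio Reconstruction job type
    connections={"particles": ("J20", "particles_selected")},
)

abinit.queue("default")  # queue to a lane, as "Queue on Apply" would
```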
If a workflow requires a parent connection to populate all of its necessary inputs, but no suitable parent job has been selected, it will display a “Missing Parent Connections” notice and the parent node in the tree view will be coloured orange.
You must exit the Apply Dialog by clicking the “Cancel” button or pressing the Escape key, then select a suitable parent job within your workspace in order to create the connection. Once you have selected a suitable job, open the dialog again by selecting the workflow from the sidebar. It will now display a “Connected Parents” notice and the parent job node in the tree view will be coloured green.
In some circumstances you may want to use a particular workflow with a different parent job type than the one it was initially created with, for example connecting an “Import Movies” job instead of an “Import Micrographs” job. Workflows includes a system to reroute input connections to facilitate this. Any replacement job must output all of the required inputs, and their necessary lower-level slots, that are consumed by jobs across the workflow. If these requirements are met, the original parent job will be replaced by the newly selected parent job and all of the inputs will be connected as expected.
This system is automatic and does not need any configuration. If a suitable replacement job is selected it will be substituted in the workflow automatically.
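As a rough mental model of this compatibility rule (not CryoSPARC’s actual implementation), you can think of the check as a superset test: the replacement parent must output every low-level result slot that jobs across the workflow consume from the original parent. A minimal sketch, with illustrative slot names:

```python
# Conceptual sketch of the rerouting rule; not CryoSPARC's actual code.
# A replacement parent is suitable only if its outputs cover every slot
# that jobs across the workflow consume from the original parent.

def can_replace_parent(required_slots: set[str], replacement_slots: set[str]) -> bool:
    """True if the replacement job provides a superset of the consumed slots."""
    return required_slots.issubset(replacement_slots)

# Illustrative slot names: a workflow that consumes micrographs and
# microscope parameters from its parent.
required = {"micrograph_blob", "mscope_params"}

# A micrograph-producing parent satisfies the requirement...
print(can_replace_parent(required, {"micrograph_blob", "mscope_params", "background_blob"}))  # True

# ...while a movie-only parent does not, so it cannot be substituted.
print(can_replace_parent(required, {"movie_blob", "mscope_params"}))  # False
```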
Workflows are meant to facilitate consistency in complex processing, and as such will often be composed of a variety of interconnected jobs with pre-set parameters. In order to maintain intelligibility, especially over time and between users, workflows include a suite of annotation options.
This is the highest level of annotation, and is meant to be used to give details relevant to the entire workflow. This may include links to related documentation, academic papers, and/or a relevant dataset. It can also be helpful to add any notes here about how the workflow is expected to be used. Workflow details can be added or changed in the configuration section of the Create Dialog or Edit Dialog.
These notes are displayed in the workflow sidebar tooltip that appears when hovering a workflow, as well as in the configuration section in the Apply Dialog in a “Details” panel that is closed by default.
Job details operate nearly identically to those found in the job builder. They allow you to apply annotations on a job by job basis by adding a title and/or description. When creating a workflow these fields will be pre-populated with any preexisting titles or descriptions that the original jobs were given. When applying the workflow the titles and descriptions can be edited in the Apply Dialog, and will be added to the jobs that are created by the workflow.
Each parameter includes an option for annotation. This can be accessed by clicking the “Additional Options” toggle button to the far right of the parameter.
From here you can add any relevant notes about the parameter, for instance why it was set to the current value and/or in what situations it should be changed. When viewing this workflow in the Apply Dialog, an info icon will be shown beside the annotated parameter’s title. Mousing over the icon will reveal the added note in an info tooltip.
It can be desirable to add a requirement to change certain job parameters in a workflow to facilitate its use with different datasets. There are many situations where a parameter will almost always need to be changed in order to run the workflow successfully, for example import paths or box size. Workflows allows you to flag a parameter as a pseudo-requirement to address this situation.
You can access the flag button from the same “Additional Options” parameter toggle button used to add parameter notes. By clicking the flag button in the top right of the panel you can flag the parameter.
When viewing the Apply Dialog, the flagged parameter will now have an orange outline and a flag icon beside its title, and the corresponding job node in the tree view will be accompanied by a flag icon. In the dialog footer you will see a “Flagged Parameters” tracker that shows the total number of flagged parameters and the number of updated flagged parameters. Clicking this button will reveal a menu checklist of flagged parameters, organized by job, with a check-mark circle indicating whether each has been updated. Clicking a parameter in the menu will navigate you to it.
It is important to note that flagged parameters are not hard requirements. A workflow can be deployed without its flagged parameters having been updated. This is to maintain flexibility and not lock a user into updating a parameter they do not wish to.
Once you have added a title and modified the included job parameters to your liking, you can navigate to the dialog footer and click the “Create” button to create the workflow.
Workflows are designed from the ground up for portability. This means that they can be easily saved, stored, and shared between instances. Workflows contain no identifying information of the instance that they were created in, and no references to the jobs that were used to create them. This allows workflows to be a powerful tool for sharing successful data processing pipelines or strategies, maintaining a catalog of your proprietary processing workflows, or cooperatively iterating on a processing approach agnostic of institution or instance.
A workflow can be exported by clicking on the triple dot overflow button beside the individual workflow you would like to export. Clicking on the “Export” option in the menu will download the workflow to your device. The downloaded file is a .json (JavaScript Object Notation) file and can be easily inspected using any modern web browser. This compact file includes all of the necessary information to recreate the workflow in any CryoSPARC instance (running a version at or above the one in which the workflows feature was introduced).
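If you would like to inspect an exported file programmatically rather than in a browser, a few lines of Python are enough. Note that the exact schema of the export is not documented here, so the file name below is a placeholder and the printed keys will depend on your CryoSPARC version:

```python
# Minimal sketch for inspecting an exported workflow file. The file
# name is a placeholder; the keys printed depend on the export schema
# of your CryoSPARC version.
import json

with open("T20S_Processing_Pipeline.json") as f:  # placeholder name
    workflow = json.load(f)

# List the top-level keys to get a feel for the structure.
print(sorted(workflow.keys()))

# Pretty-print the first part of the document for closer inspection.
print(json.dumps(workflow, indent=2)[:2000])
```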
A workflow can be imported by clicking the “Import Workflow” button on the footer of the Workflows sidebar. This will open a file browser where you can find and upload a previously exported workflow .json file.
Once selected, the file will be imported into your instance and will appear in the Workflows sidebar like any other workflow. The imported template has no special properties apart from an imported attribute that demarcates it as created outside of the instance. The imported workflow can be used, modified, and exported like any other.
Any Workflow that has a parent connection can be pinned to the quick actions menu, allowing it to be accessed in the menu of any job matching the Workflow’s parent job type (e.g. if the Workflow has a parent connection to a Patch CTF Estimation job, pinning that Workflow allows you to access it through the quick actions menu of any Patch CTF Estimation job).
In order to pin a Workflow, navigate to it in the Workflows sidebar and open its overflow menu from the triple dot button. Click the “Pin” option from the menu. The Workflow will now appear in the quick actions menu for any job with the parent job’s type. You can “Unpin” the Workflow from the overflow menu in the same way, if you no longer want it to appear in the quick actions menu.
The pinned Workflow will only appear in the quick actions menu for exact matches of the parent job’s type. If you would like to reroute the Workflow’s parent connections from a different job type, you will need to select the job and open the Apply Dialog from the sidebar.
Using the pinned quick action will still open the Apply Dialog, allowing you to set all preferred options before applying into a workspace.
CryoSPARC's Workflows allow for the creation of pre-defined sets of jobs for data processing. Workflows can be self-contained trees or branches dependent on parent jobs. They facilitate consistency when processing data, and can be used both to pipeline well-understood strategies and to break processing into independent steps during exploration. Workflows are created by selecting jobs and opening the "Create Workflow" dialog, which allows for customization of parameters and visibility. Workflows can be modified and deleted, and can be exported and imported as JSON files.
Jobs that create their outputs while in running status rather than in building status cannot be automatically linked together when applying a Workflow. This will cause any job chain to break where these jobs are present. This is a fundamental limitation of Workflows, as they must create all input/output connections on building jobs.
Current jobs known to exhibit this behaviour are:
Import Result Groups: This job type cannot be used directly in a Workflow, however it may still be used as a parent connection.
Flex Generate: Can still be used in a Workflow without issue so long as it does not have any jobs connected to its outputs.
You will be warned by the Workflow if these jobs are present as parents, and you will be unable to create the Workflow until they are updated or removed from the selection.
There are also jobs that will exhibit this behaviour if they were created in earlier versions of CryoSPARC:
Import Particle Sets: Includes a parameter Enable Strict Checking, which was added in v4.4.0 and needs to be toggled on in order for outputs to be generated when building. This parameter is on by default in newer CryoSPARC versions.
Exposure Group Utilities: Has been updated in v4.4.1 to generate outputs while in building status instead of in running status. This will not be an issue for instances running versions after v4.4.0.
Jobs which create their outputs while in running status will be disconnected from subsequent jobs created by the Workflow. To create a workflow which includes this type of job, you can create separate workflows for before and after these jobs. The first Workflow would include all jobs up to the job that generates its outputs while running, and the second Workflow would use that job as a parent connection and run the rest of the jobs in the pipeline. In this scenario, you would apply the first Workflow and allow all jobs to run to completion, then select the final job and apply the second Workflow.

In most cases it is possible to simply run a single workflow until the job in question is complete and then connect its generated outputs to the disconnected branch of the Workflow. However, re-connecting the workflow in this way can cause certain passthroughs to be missing in jobs downstream of the re-connected jobs. This can lead to downstream jobs failing due to not having access to required input group slots. It is therefore not the recommended way of handling jobs which create their outputs in the running status.