
Tutorial: Job Builder

An in-depth explanation of CryoSPARC's Job Builder, inputs and outputs.



For general guidelines on how to create and run jobs in CryoSPARC v4.0+, please see: Creating and Running Jobs

Introduction

One of CryoSPARC's staple features is its job builder, which lets you quickly create jobs by dragging and dropping the outputs of one job into the inputs of another.

Inputs and Outputs in CryoSPARC

CryoSPARC handles bookkeeping and management of all files, inputs and outputs for every job type. The structure of inputs and outputs in CryoSPARC is designed to allow flexibility while removing ambiguities.

In CryoSPARC, the basic unit of data/metadata transferred between jobs is an item. Every item has a type, for example an exposure, particle, volume, or mask. Every item also has properties, each with a name (e.g. ctf) and sub-properties containing the actual metadata values (e.g. ctf/defocus, ctf/astigmatism, etc.). A collection of items with the same type and the same properties constitutes a dataset. A dataset is essentially a table where each row is a single item and the columns are properties/sub-properties. Jobs can only input and output datasets, so every type of data/metadata is stored in a dataset. On disk, datasets are stored in the .cs file format, a binary numpy format described in a later section.
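
A minimal sketch of this data model using plain numpy (illustrative only, not CryoSPARC source code): a dataset is a structured array whose rows are items and whose columns are properties/sub-properties. The field names and values below follow the ctf example above and are hypothetical; real datasets contain many more fields.

```python
import numpy as np

# Column layout: one uid per item, plus property/sub-property fields.
dtype = [
    ("uid", np.uint64),               # unique 64-bit identifier of each item
    ("ctf/defocus", np.float64),      # sub-property of the "ctf" property
    ("ctf/astigmatism", np.float64),
]

# Each row is one item (e.g. one particle); each column is a sub-property.
dataset = np.array(
    [(101, 17250.0, 120.5),
     (102, 18100.0, 98.2)],
    dtype=dtype,
)

print(dataset["ctf/defocus"])  # -> [17250. 18100.]
```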

Each job defines the outputs it creates in the following way:

  • the job defines the types of items it will output

  • for each type of item, the job defines certain properties that will be output. Each property is called a result. For example, a CTF estimation job would output a ctf result, containing sub-properties like defocus, astigmatism, spherical aberration, etc. The results that a job outputs are the basic components of what gets connected to other jobs.

  • the job defines certain result-groups, each a set of results that describe the same type of item. Thus a job can output a result-group defining particles, with two results: one for ctf and another for alignments.

Each job also defines the inputs that it takes in:

  • the job defines input-groups, each allowing a certain type of item like particles, volumes, etc.

  • each input-group has certain slots, each taking in a particular kind of result. For example, an input-group taking in particles may have a slot for ctf and another for alignments.

  • each input-group also defines the number of different result-groups that can be connected to it. In general, all the items from all connected result-groups are appended together to make one larger dataset that forms the input to the job. So, for example, connecting two particle stacks to a single input-group will cause those stacks to be appended together, as sketched below.
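
As a sketch of this appending behaviour (plain numpy with hypothetical values, not CryoSPARC code): connecting two particle stacks to one input-group is conceptually a row-wise concatenation of two tables that share the same columns.

```python
import numpy as np

dtype = [("uid", np.uint64), ("ctf/defocus", np.float64)]
stack_a = np.array([(101, 17250.0), (102, 18100.0)], dtype=dtype)
stack_b = np.array([(201, 15900.0)], dtype=dtype)

# Two connected result-groups become one larger input dataset.
combined = np.concatenate([stack_a, stack_b])
print(len(combined))  # -> 3 items
```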

The reason for this abstraction into results, result-groups, etc. is that in CryoSPARC, most connections between jobs can be made simply at the group level, without having to specify particular files, paths, columns or rows in tables or text files. Subsets of datasets can be easily defined and passed around, and different subsets can be joined together as inputs to a further job. For advanced uses, however, the lower-level results allow a user to connect only certain metadata about an item from one job to another, or to override the metadata for certain properties in a result-group. Examples of how and when to use this capability follow.

Passthrough results

To simplify long chains of processing, each job can input an arbitrary number of extra results that it does not actually need, and then output those results as "passthrough" metadata. Passthrough results are not read or modified by the job; they are simply passed along in its output, so that subsequent jobs can use them without being manually connected to an earlier output in the chain.
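
A hedged sketch of the passthrough idea (hypothetical job logic and field names, not CryoSPARC internals): the job computes only the new property it is responsible for, and every other input column is carried through to the output unchanged.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

def toy_alignment_job(particles):
    """Compute a new property; all existing columns pass through untouched."""
    shifts = np.zeros(len(particles))  # stand-in for a real computation
    return rfn.append_fields(particles, "alignments/shift", shifts, usemask=False)

particles = np.array(
    [(101, 17250.0)],
    dtype=[("uid", np.uint64), ("ctf/defocus", np.float64)],
)
out = toy_alignment_job(particles)
print(out.dtype.names)  # -> ('uid', 'ctf/defocus', 'alignments/shift')
```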

CryoSPARC .cs file format

CryoSPARC uses a simple, common tabular format to store metadata about all types of items managed by the CryoSPARC system. Items include movies, micrographs, particles, volumes, and masks. Each item can have many different properties, which are tracked as the items progress through processing. Only some job types create items: imports, particle extraction, ab-initio reconstruction, volume tools, etc. Most job types simply load items, process them to compute new properties, and output the new properties. A collection of items of the same kind is called a dataset and can be represented in a single table of rows and columns.

In CryoSPARC, each managed item is assigned a unique identifier, uid (a 64-bit integer). The uid is used to maintain correspondences across chains of processing jobs and to ensure that, regardless of the order in which a job outputs items, the properties of each item are always assigned to the correct item.

The tabular format that CryoSPARC uses for this metadata and uid is an array of C structures, implemented using numpy structured arrays. These arrays are stored in memory and on disk in the same format. On disk, the arrays are stored in binary format in .cs files. Each .cs file in CryoSPARC contains a single table, and each row corresponds to a single item. A .cs file must contain a column for the uid of each item; further columns define properties/sub-properties of that item. Multiple .cs files can therefore be used in aggregate to define all the properties of a set of items, since the rows in every table carry a uid that can be used to join the tables. In general, when multiple tables are used to specify a dataset, the dataset contains only the intersection of items included in each table.
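
Because each .cs file is a numpy-format binary file, it can be loaded directly as a structured array. The file paths and field layouts below are examples (substitute files from your own project directory); the intersection-on-uid behaviour described above can be sketched with np.intersect1d.

```python
import numpy as np

blob = np.load("J12/extracted_particles.cs")     # example: particle blob table
ctf = np.load("J15/particles_ctf_estimated.cs")  # example: ctf table

# Join the two tables on uid: the dataset contains only the intersection
# of items present in both files.
common_uids, idx_blob, idx_ctf = np.intersect1d(
    blob["uid"], ctf["uid"], return_indices=True
)
print(f"{len(common_uids)} items carry both blob and ctf properties")
```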

Job Outputs Tab

There is an outputs tab in the details view of every job. It contains sections for each output group, and within each section a list of all individual results, including passthroughs.

You can easily copy the path of an individual output or download the file directly using the copy and download buttons, respectively. It's also now possible to inspect or select different versions of an individual output by toggling the 'versions' section.

When building a job, you can drag and drop the header of the output group section to add the whole group. If you'd like to override a particular input slot, you can drag and drop the header of the individual output to a matching input slot. We'll see examples of how this can be useful later in this tutorial.

Job Builder Inputs Section

The job builder's inputs section allows you not only to remove an existing group, but also to clear individual input slots that are not required. You can use the outputs tab to drag and drop output groups and individual outputs into the matching slots. It is possible to override both optional and required input slots by connecting matching individual outputs.

There is a requirements section for each group which specifies the minimum and maximum number of groups accepted, and whether or not repeat groups are accepted. The requirements section is highlighted in green when you start to drag a matching output group, and in red when the dragged group does not meet that input group's requirements.

Building Jobs with Input and Output Groups

Building a job from output groups covers most use cases. Simply drag and drop an output group from the overview or outputs tab in the job details view.

Overview Tab Output Groups

The output groups list in the overview tab has been updated to be more user-friendly and to highlight key data. You can download the latest version of all individual outputs by selecting from the download dropdown menu.

Fine-tuned control over Individual Results

The addition of the outputs tab and the updated job builder inputs section allows low-level (individual) outputs to be connected into an input group, overriding specific slots. This functionality lets advanced users experiment with their data more freely, and also makes certain tasks in CryoSPARC possible. Next, we'll cover two such use cases for fine-tuned control over individual results.

Use Case: Local Resolution Estimation

When building a local resolution estimation job, it's now possible to use the outputs tab and override the half_map_A and half_map_B inputs with half-maps from different jobs. The example below outlines the three-step process of using one input group in the local resolution estimation job builder to populate volume data from three separate jobs.

Use Case: Downsampled Particles

If you use the Downsample Particles job to shrink particles so that other jobs, such as 2D classification, run faster, you will eventually end up with a curated subset of particles that still references the downsampled images. To run a refinement at full resolution, you need the original (non-downsampled) particle data for that subset. In this case, you can use the outputs tab and override the particles.blob input slot with the non-downsampled data, on top of the downsampled particles that you previously connected as an input group.
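
As a sketch of what this particles.blob override achieves, expressed with numpy on .cs files (file names here are hypothetical): the uids of the curated, downsampled subset select the matching rows, i.e. the same particles, out of the original full-resolution table.

```python
import numpy as np

selected = np.load("J30/particles_selected.cs")   # curated, downsampled subset
fullres = np.load("J10/extracted_particles.cs")   # original full-size extraction

# Keep only the full-resolution rows whose uid appears in the curated subset.
mask = np.isin(fullres["uid"], selected["uid"])
fullres_subset = fullres[mask]
print(f"{mask.sum()} of {len(fullres)} original particles retained")
```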