Job: Ab-Initio Reconstruction

At a Glance

Generate a number of initial 3D models directly from particle images without requiring an input model. Ab initio reconstruction is unique among 3D jobs in that it does not require input particle poses or starting volumes — the job generates a 3D volume from the beginning (hence ab initio).

Inputs

Particles

Note that when ab initio reconstruction is used to generate a single volume from a large number of particles, some particles may be omitted from the reconstruction, since the algorithm does not need to see every particle to arrive at a converged result. The particles used to generate the volume are output in a separate group from those that were not used. The particles that are used are selected at random from the entire dataset.

Outputs

This job creates the requested number of volumes and a particle stack for each. If only one volume is requested, an additional particle output containing unused particles is also created.

New in CryoSPARC v4.5: the job also produces an All volumes output, which is a volumes group. This output includes a series result that contains a downloadable zip file of all volumes.

Commonly Adjusted Parameters

Number of classes

Increasing this number generates more volumes. In an ideal (clean) particle stack, this parameter should match the number of distinct structures present. In practice, providing a few extra classes into which junk particles can be classified often improves results.

Number of particles to use

Force the job to use this many particles (taken from the beginning of the stack, not at random) to generate volumes. The outputs will likewise contain only this many particles.

We generally recommend leaving this blank (i.e., use all particles) unless there are a great number of input particles, in which case setting it to a lower number will speed up the job.

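As a concrete illustration of the distinction, here is a minimal numpy sketch (made-up particle counts, not CryoSPARC code): this parameter takes the first N particles of the stack, in contrast to the random subset described under Inputs.

```python
import numpy as np

# Hypothetical counts: 100,000 input particles, use 20,000 of them.
n_total, n_use = 100_000, 20_000
particle_ids = np.arange(n_total)

# "Number of particles to use": the first N particles in the stack...
first_n = particle_ids[:n_use]

# ...unlike the random subset drawn when a single-class job omits particles.
rng = np.random.default_rng(seed=0)
random_n = rng.choice(particle_ids, size=n_use, replace=False)
```
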
Maximum resolution

Use information up to and including this resolution during reconstruction.

For highly symmetric particles or small membrane proteins, it can help to use higher resolutions (i.e., lower numbers), but keep in mind that ab initio reconstruction does not use gold-standard half sets and so can be sensitive to overfitting.

Initial/final minibatch size

Sets the number of particles that are included in each initial or final online-EM iteration, respectively.

Increasing this parameter increases memory demand and slows down each iteration, but may improve results with highly symmetric targets, small targets, or noisy images.

What are initial and final iterations?

When Ab-Initio Reconstruction starts, the proper values of several "hyperparameters" are unknown. Examples of hyperparameters are the learning rate, maximum resolution, noise model, etc.

These hyperparameters are fixed at a starting value during the initial iterations of Ab-Initio Reconstruction. This allows the class volumes to approach a more reasonable starting topology before the hyperparameters are optimized.

Ab-Initio Reconstruction then enters the intermediate iterations, during which hyperparameters "anneal" from their arbitrary starting values to more optimal values determined from the data itself. Annealing describes a process of slowly transitioning from the starting values to the data-determined values rather than jumping directly from one to the other. Ab-Initio Reconstruction uses a dynamic number of intermediate iterations for this annealing process.

Once the job is using the data-determined hyperparameter values, it does a number of final iterations to optimize the volumes under the new values.

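The following is a minimal Python sketch of what such an annealing schedule could look like. The phase lengths, the linear blend, and the example values are all assumptions for illustration; CryoSPARC's actual schedule and iteration counts are internal to the job.

```python
def annealed_value(iteration, start_value, data_value,
                   n_initial=50, n_intermediate=100):
    """Toy schedule: hold a hyperparameter at its starting value during
    the initial iterations, then blend linearly toward the value
    determined from the data. (Illustrative only.)"""
    if iteration < n_initial:          # initial iterations: fixed value
        return start_value
    t = min(1.0, (iteration - n_initial) / n_intermediate)
    # intermediate iterations: anneal; once t == 1 we are in the
    # final iterations and the data-determined value is used as-is
    return (1.0 - t) * start_value + t * data_value

# e.g. a noise sigma annealing from an arbitrary guess to an estimate
print([annealed_value(i, 1.0, 0.3) for i in (0, 75, 200)])
```
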
Symmetry

Symmetry to impose on the volumes.

We strongly recommend keeping this option at C1 unless results show flattened, disc-like volumes and a different source of information supports a higher symmetry. In particular, for proteins with suspected octahedral, tetrahedral, or icosahedral symmetry, it may be necessary to impose symmetry to avoid converging to a flattened model.

Class similarity

A higher value for this parameter forces particle assignment to blur between classes during the early iterations. If particles are expected to look very similar, this blurring encourages the algorithm to initially create several similar classes which then can diverge during iterations. Without blurring, all particles which appeared similar would be placed in a single class, preventing the desired classification.

The two Class similarity anneal parameters control when the algorithm begins moving from this input similarity value to the value detected from the data.

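As a rough intuition for this blurring, one can think of the similarity value as a temperature on soft class assignments. The sketch below is an illustrative stand-in; the softmax form and the numbers are assumptions, not CryoSPARC's actual formula.

```python
import numpy as np

def class_responsibilities(neg_errors, similarity):
    """Toy soft assignment: per-class (negative) errors for one particle
    become class probabilities. Higher `similarity` flattens the
    distribution, so similar-looking classes all receive particles."""
    logits = np.asarray(neg_errors) / max(similarity, 1e-6)
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

errs = [-1.0, -1.1, -3.0]           # classes 0 and 1 fit almost equally well
print(class_responsibilities(errs, similarity=0.05))  # sharp: class 0 dominates
print(class_responsibilities(errs, similarity=2.0))   # blurred: classes share particles
```
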
Common Uses

The most common use cases for ab initio reconstruction are to perform initial cleaning of a particle stack in 3D and to generate one or more starting volumes for future refinements. Both goals can be achieved with a single ab initio reconstruction job, provided the number of requested classes is chosen well. Finding a good number typically requires launching several jobs, each with a different number of classes, and inspecting the results.

It has been reported that performing extensive particle cleaning in 2D (i.e., by repeatedly selecting 2D classes which look clean) can sometimes result in a loss of rare or difficult-to-distinguish particle views. Cleaning in 3D seems to be less susceptible to this problem.

Common Problems

Ab initio reconstruction may fail when the particle stack has extreme orientation bias. Rebalance 2D Classes can help reduce bias in these cases to generate the initial model.

Highly symmetric particles occasionally produce flattened, disc-like volumes in ab initio reconstruction. These are the only cases in which we recommend imposing symmetry on the ab initio volumes.

Ab initio reconstruction does not use half-sets or any other type of regularization. It can thus be susceptible to overfitting at high resolutions or with particles with low signal-to-noise ratios, and so we do not recommend that it is used to generate higher-resolution models.

Recommended Alternatives

When a particle stack is believed to be relatively free of junk, Heterogeneous Refinement can be a better tool for detecting distinct conformations or compositions of the target.

Next Steps

Heterogeneous Refinement is useful for further cleaning, or for detecting conformational or compositional differences in the target.

Homogeneous Refinement or Non-Uniform Refinement jobs are useful for creating a single, "consensus" refinement of the particles at high resolution.

Implementation Details

The problem of volume generation in single particle analysis can be stated in broad terms as follows: given a set of input images, what volume most likely produced those images? This question can be further subdivided into two parts:

  • Given a volume and a set of images, what is the most likely pose of the volume in each image?

  • Given a set of images and a set of orientations, how can we improve the volume to make the images more likely?

In the cryo-EM literature, methods that seek the density most likely to have generated the particle images are referred to as Maximum Likelihood (ML) or Maximum A Posteriori (MAP) methods. All 2D and 3D reconstruction algorithms in CryoSPARC fall under the umbrella of ML methods, because they seek to maximize likelihood in this way.

What is a pose?

In cryo-EM image processing, a pose is a set of five numbers: three describe the rotation of the volume in 3D space, and two describe the X- and Y-shifts of the volume (to account for imperfect centering of the extracted particle image). Searching for the correct pose is thus a 5D problem that must be solved for each particle image, and can therefore become very computationally expensive.

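For illustration, a pose can be represented as a small container like the following (a hypothetical Python sketch; CryoSPARC itself stores poses per particle in .cs fields, e.g. alignments3D/pose and alignments3D/shift):

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """The five numbers describing one particle's pose (illustrative)."""
    rot: tuple[float, float, float]   # 3D rotation (e.g. axis-angle), radians
    shift: tuple[float, float]        # in-plane X/Y shift, pixels

p = Pose(rot=(0.1, -0.4, 2.0), shift=(1.5, -0.8))
```
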
CryoSPARC’s ab initio algorithm approaches the first part using an algorithm called branch and bound (BnB), and the second part using an algorithm called stochastic gradient descent (SGD).

For a full description of CryoSPARC's ab initio reconstruction algorithm, see the accompanying paper (Punjani et al., 2017).

Branch and Bound

To find the most likely pose, one could create a 5D grid of finely-spaced points and check all of those poses for each particle image. This can be prohibitively slow. Branch and bound creates the same grid of finely-spaced points, where each point is associated with a point from a coarser grid via a subdivision scheme. This builds a tree of branching points at which to calculate how well the volume at the given pose fits the image.

The search for the correct pose starts at the coarsest grid. It then proceeds to the next coarsest grid, but only at points which branch from points that were above a certain probability threshold in the previous iteration. The selection of this threshold is the bound portion of branch and bound and is described in detail in Punjani et al. 2017.

Every pose of the volume has some error associated with it. The higher this error, the less likely the pose is correct. To find the optimal pose, one could calculate this error for every pose and pick the pose with the lowest error. However, the true error is extremely expensive to compute.

We therefore instead calculate a lower bound on the error. This function is orders of magnitude faster to compute and gives us the best-case scenario — we know that the true error cannot possibly be smaller than this lower bound.

We calculate this lower bound for each pose. We then calculate the expensive true error only once, for the pose with the lowest lower bound (put another way, the pose that could potentially have the lowest error). This pose's true error is almost certainly higher than its lower bound, but it also has a good chance of being the best pose as measured by the true error.

We can then compare each pose's best case to this true error. Any pose whose lower bound is higher than this calculated true error cannot possibly be the best pose, since we have already found a pose better than its best case. We can thus discard this pose, and all poses that branch from it, from future iterations.

Thus, the search for the optimal pose for a given image is accelerated by the two prongs of the branch and bound algorithm:

  • the branching structure enables us to rule out a large number of poses without ever calculating any error for them

  • the cheap lower bound lets us rule out a large number of poses without having to calculate their expensive true error

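To make the pruning logic concrete, here is a toy one-dimensional version of the search. The error functions and grid sizes are synthetic stand-ins; the real search is 5D, and the actual bound is the one described in the note below, not this simplified one.

```python
import numpy as np

def true_error(theta):
    # "expensive" error of a pose; synthetic function for illustration
    return np.sin(3 * theta) + 1.2 + 0.5 * (1.0 - np.cos(theta - 1.0))

def lower_bound(theta):
    # cheap bound: drop the second, nonnegative term, so bound <= true error
    return np.sin(3 * theta) + 1.2

def branch_and_bound(lo=0.0, hi=2 * np.pi, levels=4, n0=16):
    step = (hi - lo) / n0
    candidates = np.arange(lo, hi, step)          # coarsest grid
    for _ in range(levels):
        bounds = lower_bound(candidates)
        promising = candidates[np.argmin(bounds)]
        e_ref = true_error(promising)             # expensive error, once per level
        survivors = candidates[bounds <= e_ref]   # prune provably worse poses
        step /= 2.0                               # branch: refine around survivors
        candidates = np.unique(np.concatenate(
            [s + np.array([-step, 0.0, step]) for s in survivors]))
    return candidates[np.argmin(true_error(candidates))]

print(branch_and_bound())   # approximate minimizer of true_error
```
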
The branch-and-bound algorithm is so fast that it is used for all pose searches in CryoSPARC, not just ab initio reconstruction.

What exactly is this inexpensive lower bound?

Although it is easy to understand what branch and bound is doing, it is not obvious how exactly one might calculate a lower bound on the possible error. Full details are available in the paper's supplementary information, but briefly, the error is first split into two parts: error due to signal below a critical frequency L and error due to signal above that frequency. The low-frequency error is relatively easy to calculate and is computed directly. The high-frequency error is bounded by further splitting it into a combination of error due to noise and the worst possible mismatch the reference volume could create.

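Schematically (with our own notation here, not the paper's), the bound has the following shape:

```latex
% Error at pose \phi, split at the critical frequency L:
% the low-frequency part is computed exactly, the high-frequency part
% is bounded from below by noise and worst-case mismatch terms.
\[
E(\phi)
  = \underbrace{E_{<L}(\phi)}_{\text{computed directly}}
  + \underbrace{E_{\geq L}(\phi)}_{\text{bounded from below}}
  \;\geq\;
  E_{<L}(\phi)
  + \underbrace{B_{\mathrm{noise}} + B_{\mathrm{mismatch}}}_{\text{independent of }\phi}
\]
```
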
Stochastic Gradient Descent

Once the optimal poses for all particles have been found, the information in the particle images can be used to improve the volume. In a homogeneous refinement, the existing volume would be updated to the most likely result given the new particle poses. This approach is called Maximum Likelihood (often referred to as ML in the literature) or Maximum A Posteriori (often MAP), and the direction in which the volume is updated is given by the gradient.

Maximum likelihood methods work well when the starting volume is close to correct. However, ab initio volumes are likely quite poor, especially in early iterations when the volume is essentially just noise. Poor starting volumes are unlikely to find the correct solution using maximum likelihood approaches because these techniques can only search locally — they will get stuck in suboptimal structures from where there is no clear direction that will improve the quality of the map. To overcome this issue, ab initio reconstruction uses stochastic gradient descent.

In stochastic gradient descent, the gradient is calculated using only a very small subset of the particle images. The current volume is then updated according to this gradient. Importantly, the minibatch sampling and the approximate nature of the gradient create noise in both the direction and size of the step the algorithm takes. This noise helps the volume escape local optima and settle into a global optimum near the correct structure. Thus, stochastic gradient descent can provide good initial volumes for maximum likelihood methods to optimize.

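To illustrate the idea, the toy Python sketch below runs minibatch SGD on a linear stand-in problem (recovering a vector from noisy linear measurements). The model, batch size, and learning rate are arbitrary assumptions; CryoSPARC's SGD operates on 3D maps with the poses found by branch and bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_voxels = 2000, 50
x_true = rng.normal(size=n_voxels)                 # the "true structure"
A = rng.normal(size=(n_images, n_voxels))          # toy image-formation model
y = A @ x_true + 0.5 * rng.normal(size=n_images)   # noisy observations

x = np.zeros(n_voxels)                             # poor starting "volume"
batch_size, lr = 64, 0.05
for _ in range(300):
    idx = rng.choice(n_images, size=batch_size, replace=False)
    # approximate gradient computed from a small random minibatch of images
    grad = A[idx].T @ (A[idx] @ x - y[idx]) / batch_size
    x -= lr * grad                                 # noisy step in direction and size
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # relative error
```
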
References

Ali Punjani et al., "cryoSPARC: Algorithms for Rapid Unsupervised Cryo-EM Structure Determination," Nature Methods 14 (2017): 290–296, https://doi.org/10.1038/nmeth.4169.
[Figure: Ab initio reconstruction generates a 3D volume from particle images without prior pose information. Data from EMPIAR 10025.]
[Figure: Increasing class similarity forces particles to distribute evenly among classes during early iterations of ab initio reconstruction. This can improve discrimination between different particle types.]
[Figure: A comparison of the volumes resulting from ab initio reconstruction (left) and homogeneous refinement (right). Data from EMPIAR 10025.]
[Figure: The evolution of the volume during an ab initio reconstruction job.]
[Figure: In a branch and bound algorithm, progressively finer grids are searched until the optimal result is found.]
[Figure: For each pose, a lower bound on the error is calculated.]
[Figure: Calculation of the true error is expensive, so it is only performed once.]
[Figure: Poses with best-case error worse than the calculated true error are removed from further consideration.]
[Figure: Stochastic gradient descent prevents ab initio reconstruction from becoming trapped in local minima.]