Job: 3D Classification (BETA)

Description

3D Classification (BETA) is a new job in cryoSPARC v3.3+ to analyze discrete heterogeneity in single particle cryo-EM datasets. This job currently implements a version of 3D classification without alignment — a classification routine that can complement the existing Heterogeneous Refinement job in finding new discrete classes of data. Read the tutorial here.
Similar to Heterogeneous Refinement, 3D Classification (BETA) uses Online Expectation Maximization (O-EM) to alternate between (1) computing the most likely class assignments for each particle image based on 3D maps for each class and (2) updating each 3D map based on these assignments. 3D Classification (BETA) differs from Heterogeneous Refinement in four important ways:
  1. 3D Classification does not update the 3D orientation or 2D shift associated with each particle image. All orientations and shifts are assumed to be known and fixed (i.e., set based on the alignments3D key of the input particles). Although these alignments may not be optimal, fixing them reduces the dimensionality of the search space and can produce classes that may be missed by Heterogeneous Refinement.
  2. 3D Classification can 'bootstrap' initial volume densities, obviating the need for volume inputs. By leveraging the known orientations and shifts, the job can back-project initial structures. It offers two modes for this: simple and PCA. Please refer to the tutorial for more information on these two modes.
  3. 3D Classification supports 'focussed' classification through an (optional) mask input. If no mask is provided, the job can also automatically generate a mask based on the initial 3D maps. This allows for classification based on heterogeneity in only a specific region of a density, ignoring variation that may be present elsewhere.
  4. 3D Classification uses higher-order interpolation instead of zero padding for back-projection. This reduces the size of 3D maps used during classification and reduces computation time, while ensuring minimal artefacts from interpolation.
In sum, these distinctions permit 3D classification to remain computationally tractable for large numbers of classes — up to 100 in our testing as of cryoSPARC v3.3.
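The alternation described above (class assignment with poses held fixed, followed by a map update) can be sketched on toy data. This is a minimal illustration, not cryoSPARC's implementation: the function names `class_posteriors` and `classify_fixed_pose`, the Gaussian noise model, the flattened-vector stand-ins for aligned images, and the decaying step size are all assumptions made for the example.

```python
import numpy as np

def class_posteriors(images, maps, noise_var):
    """E-step: posterior over classes for each image (assumed Gaussian noise)."""
    sq_err = ((images[:, None, :] - maps[None, :, :]) ** 2).sum(axis=2)
    log_p = -0.5 * sq_err / noise_var
    log_p -= log_p.max(axis=1, keepdims=True)   # stabilize before exponentiating
    post = np.exp(log_p)
    return post / post.sum(axis=1, keepdims=True)

def classify_fixed_pose(images, init_maps, n_epochs=3, batch_per_class=50,
                        noise_var=1.0, seed=0):
    """Toy online EM over class labels only; poses are never searched or updated."""
    rng = np.random.default_rng(seed)
    maps = np.array(init_maps, dtype=float)
    n_classes = maps.shape[0]
    n = images.shape[0]
    # Batch size per iteration = per-class batch size * number of classes.
    batch_size = batch_per_class * n_classes
    step = 0
    for _ in range(n_epochs):                   # one epoch = one pass over the data
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = images[order[start:start + batch_size]]
            post = class_posteriors(batch, maps, noise_var)
            # M-step on the batch: posterior-weighted mean per class.
            weights = post.sum(axis=0)
            batch_maps = (post.T @ batch) / np.maximum(weights, 1e-12)[:, None]
            # Online update with a decaying step size (an assumed schedule);
            # classes that received no weight in this batch are left unchanged.
            step += 1
            lr = np.where(weights > 1e-8, 1.0 / (step + 1), 0.0)[:, None]
            maps = (1.0 - lr) * maps + lr * batch_maps
    labels = class_posteriors(images, maps, noise_var).argmax(axis=1)
    return labels, maps

# Usage: two synthetic "classes" of 16-voxel vectors with unit-variance noise.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 200)
centers = np.array([[-2.0] * 16, [2.0] * 16])
images = centers[truth] + rng.normal(size=(400, 16))
labels, maps = classify_fixed_pose(images, init_maps=[[-0.1] * 16, [0.1] * 16])
```

Because no pose search happens inside the loop, the cost of each iteration in this sketch grows only with the batch size and the number of classes, which is the intuition behind running many classes at once.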

Input

  • Particles (with alignments3D)
  • [Optional] Initial Volumes
  • [Optional] Mask (recommended even if not performing focussed classification)

Common Parameters

  • Number of classes: Number of classes to use in job. Note that this can be significantly larger than Heterogeneous Refinement for the same computational cost.
  • Target resolution: Desired resolution of each 3D map. Together with the extent of the particle images, this determines the 3D box size.
  • Number of O-EM epochs: Number of passes through the data to perform during classification.
  • Batch size per class: This parameter multiplied by the number of classes sets the batch size in each O-EM iteration.
  • Initialization mode: Determines how the initial 3D maps are set:
    • simple: for K classes, select K random subsets of particle images and back-project K structures;
    • PCA: for K classes, select M >> K subsets of particle images, back-project M structures, apply Principal Component Analysis (PCA) on the space of 3D voxels, cluster into K subsets in principal component space, average volumes in each cluster for K initial structures;
    • input: for K classes, use K input volumes (please note that initial volumes should be distinct or the job will throw an error).
  • Auto-tune initial class similarity: Adjusts the expected similarity of structures until the empirically observed ESS (effective sample size) matches a target. Typically, the ESS should be near the number of classes at the start.
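Two of the parameters above can be illustrated with a short sketch: the PCA initialization mode and the ESS used by the auto-tune option. Everything here is hypothetical and for illustration only. The helper names `pca_initial_maps` and `effective_sample_size`, the use of plain k-means for the clustering step, the number of principal components, and the ESS formula (one common definition, the inverse of the summed squared class posteriors) are assumptions; the job's actual choices may differ.

```python
import numpy as np

def pca_initial_maps(volumes, n_classes, n_components=4, n_iter=20, seed=0):
    """Cluster M flattened volumes into K groups in PCA space; average each group."""
    rng = np.random.default_rng(seed)
    centered = volumes - volumes.mean(axis=0)
    # PCA via SVD: rows of vt are principal axes in voxel space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T          # shape (M, n_components)
    # Plain k-means (Lloyd's algorithm) in principal-component space.
    centers = coords[rng.choice(len(coords), n_classes, replace=False)]
    for _ in range(n_iter):
        dist = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_classes):
            if np.any(labels == k):                  # keep old center if a cluster empties
                centers[k] = coords[labels == k].mean(axis=0)
    # Average the volumes in each cluster to obtain K initial maps.
    maps = []
    for k in range(n_classes):
        members = volumes[labels == k]
        if len(members) == 0:                        # fallback: pick one volume at random
            members = volumes[rng.integers(len(volumes))][None, :]
        maps.append(members.mean(axis=0))
    return np.stack(maps)

def effective_sample_size(posteriors):
    """Per-particle ESS over classes, 1 / sum_k p_k^2; ranges from 1 to K."""
    return 1.0 / (posteriors ** 2).sum(axis=1)

# Usage: M = 40 stand-in "back-projected" volumes of 64 voxels, K = 3 classes.
rng = np.random.default_rng(0)
volumes = rng.normal(size=(40, 64))
init_maps = pca_initial_maps(volumes, n_classes=3)

uniform = np.full((5, 3), 1.0 / 3.0)                 # maximally uncertain assignments
ess_uniform = effective_sample_size(uniform)
ess_certain = effective_sample_size(np.eye(3))       # each particle fully in one class
```

Under this definition, fully uncertain assignments give an ESS equal to the number of classes and fully certain assignments give an ESS of 1, which matches the expectation that the ESS starts near the number of classes.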

Output

  • All particles
  • Particles for each class
  • 3D maps for each class
  • Volume series of all 3D maps (.zip)
  • Mask (passthrough input or auto-generated)

Common Next Steps

  • Further refinement of a subset of classes