Job: 3D Classification

Description

3D Classification, first introduced in v3.3, can help discover discrete heterogeneity in single particle cryo-EM datasets. This job currently implements a version of 3D classification without alignment, a classification routine that can complement the Heterogeneous Refinement and 3D Variability jobs in finding new discrete classes of data.

In CryoSPARC v4.0, 3D Classification was updated with several notable improvements, including FSC regularization, focus and solvent mask inputs, new convergence criteria, and a number of new diagnostic plots and outputs.

Note that in CryoSPARC v4.0+, cloning a 3D classification job that was created in CryoSPARC v3.3 will fail to launch due to a change in the inputs and parameters of the job type. Instead, please create a 3D classification job from scratch in v4.0 and re-connect the desired inputs and set parameters.

Under the hood, 3D Classification uses a combination of Online and Full-Batch Expectation Maximization (O-EM and F-EM, respectively). These algorithms alternate between (1) computing the most likely class assignments for each particle image in a batch based on the current 3D class volumes, and (2) updating each 3D volume based on these assignments.
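
As a rough illustration of this alternation, the sketch below runs a full-batch EM loop on a toy problem in which each "particle" and each class "volume" is a single number. The function names (`e_step`, `m_step`, `run_em`), the Gaussian noise model, and `sigma` are assumptions made for the example; this is not CryoSPARC's implementation, and the online (O-EM) variant with batches and a learning rate is not shown.

```python
import math

def e_step(particles, volumes, sigma=1.0):
    """Step (1): posterior class probabilities for every particle."""
    posteriors = []
    for x in particles:
        log_w = [-(x - v) ** 2 / (2.0 * sigma ** 2) for v in volumes]
        top = max(log_w)  # subtract the max for numerical stability
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        posteriors.append([wi / total for wi in w])
    return posteriors

def m_step(particles, posteriors, old_volumes):
    """Step (2): update each class 'volume' as a posterior-weighted mean."""
    volumes = []
    for k, old in enumerate(old_volumes):
        weight = sum(p[k] for p in posteriors)
        if weight == 0.0:
            volumes.append(old)  # empty class: keep the previous estimate
        else:
            volumes.append(sum(p[k] * x for p, x in zip(posteriors, particles)) / weight)
    return volumes

def run_em(particles, volumes, n_iter=20):
    """Alternate the two steps for a fixed number of full-batch iterations."""
    posteriors = e_step(particles, volumes)
    for _ in range(n_iter):
        volumes = m_step(particles, posteriors, volumes)
        posteriors = e_step(particles, volumes)
    return volumes, posteriors
```

Starting from initial values `[1.0, 4.0]` with particles clustered near 0 and 5, the two class values drift toward the two clusters.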

Please also refer to the 3D Classification tutorial, which has been updated for v4.1 with new considerations and example datasets.

Input

  • Particles (with alignments3D)

  • [Optional] Initial Volumes

    • To be used with the input initialization mode. The number of initial volumes must match the number of classes.

  • [Optional] Solvent mask

    • If not supplied, a solvent mask is computed by dilating and soft-padding the consensus volume.

  • [Optional] Focus mask

    • If not supplied, only the solvent mask will be used (i.e., the focus mask will be set to a volume of all ones).

Commonly Adjusted Parameters

Number of classes

Number of classes to use in the job. Note that 3D Classification requires far less computational effort than jobs which perform particle alignments (such as Heterogeneous Refinement), so a far greater number of classes can be requested for the same computational cost.

Filter resolution

In v4.5 this parameter name and its default value changed from Target resolution, default 6 Å, to its current name of Filter resolution, default unset.

Classification is performed at this resolution. This parameter must be set for the classification to run. Results are best when the resolution is just high enough to see the difference of interest. For instance:

  • 3-6 Å for small changes in density (presence/absence of ligand),

  • 6-10 Å for conformational changes between one domain in relation to another,

  • and >10 Å for presence/absence of a domain or binding partner.

For more information on selecting a filter resolution, see the TRPV5 case study.

This parameter also controls the box size and pixel size of the output class volumes. To reconstruct classes at their extracted box size, use the Heterogeneous Reconstruction Only job.
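
To get a feel for how a filter resolution could translate into an output box size, the sketch below assumes the volume is Fourier-cropped so that its new Nyquist limit (twice the pixel size) lands at the filter resolution. That rule, the rounding to an even box, and the function name `downsampled_box` are assumptions for illustration only; this page does not state the exact rule CryoSPARC applies.

```python
import math

def downsampled_box(box_size, pixel_size_A, filter_res_A):
    """Estimate (box size, pixel size in Å) after cropping to the filter resolution.

    Assumption: the new Nyquist limit (2 * pixel size) is set to filter_res_A.
    """
    nyquist_A = 2.0 * pixel_size_A
    if filter_res_A <= nyquist_A:
        # Already sampled no finer than the requested resolution needs.
        return box_size, pixel_size_A
    new_box = math.ceil(box_size * nyquist_A / filter_res_A)
    new_box += new_box % 2           # round up to an even box size
    new_box = min(new_box, box_size)
    new_pixel_A = pixel_size_A * box_size / new_box
    return new_box, new_pixel_A
```

Under these assumptions, a 256-voxel box at 1.0 Å/pixel classified at 8 Å would shrink to 64 voxels at 4.0 Å/pixel, which is why a separate reconstruction job is needed to recover the extracted box size.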

Use latent mixing coefficients

This parameter is new in CryoSPARC v5.0, and is turned off by default. In older versions of CryoSPARC, 3D Classification behaves in the same way as if this parameter is turned off.

By default, 3D Classification distributes each particle among classes based solely on how well the particle matches each class's volume. This corresponds to assuming that no information is available about how large each class is.

When this parameter is turned on, 3D Classification instead treats the current class sizes at a given iteration as a representative example of the underlying, true class sizes (formally, it applies a prior over class posteriors based on the current sizes of classes). This means that when a particle matches equally well to the volumes of two classes, the algorithm assigns a higher probability to the larger class. Turning this parameter on encourages diverse class sizes and reduces the likelihood of encountering a great number of similar-looking classes with equal particle counts.
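
The effect of this prior can be seen in a small worked example. The sketch below computes one particle's class posteriors with and without weighting by current class sizes. The function `class_posteriors` and the choice of normalized particle counts as the prior are assumptions for illustration, not CryoSPARC's exact formulation.

```python
def class_posteriors(likelihoods, class_sizes=None):
    """Posterior over classes for one particle.

    likelihoods: how well the particle matches each class volume.
    class_sizes: current particle count per class, or None for a uniform prior
                 (the behaviour when the parameter is turned off).
    """
    if class_sizes is None:
        priors = [1.0] * len(likelihoods)
    else:
        total = sum(class_sizes)
        priors = [size / total for size in class_sizes]
    weighted = [lik * prior for lik, prior in zip(likelihoods, priors)]
    norm = sum(weighted)
    return [w / norm for w in weighted]

# A particle that matches two classes equally well:
uniform = class_posteriors([0.2, 0.2])               # split evenly
by_size = class_posteriors([0.2, 0.2], [300, 100])   # favours the larger class
```

With equal likelihoods, the uniform case splits the particle evenly, while the size-weighted case gives roughly three quarters of the probability to the class holding 300 of the 400 particles.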

O-EM learning rate init

For a fixed O-EM batch size and epoch value, larger values will generally result in fewer populated classes.

Symmetry

Enforce point-group symmetry during back-projection of every class volume.

Generate solvent mask from consensus

This parameter is new in CryoSPARC v5.0. In older versions of CryoSPARC, if a focus mask was provided and a solvent mask was not, the focus mask would be used for both.

By default, 3D Classification automatically generates a solvent mask from the consensus volume. If this parameter is turned off, a spherical mask is used instead.

Use FSC to filter each class

FSC filtering may be turned off to match the filtering behaviour of 3D classification in CryoSPARC v3.3.

Convergence criterion (%)

Primary stopping criterion — percentage of particles that have switched classes across F-EM iterations. Increasing this value may result in ‘early stopping’ of the optimization.

RMS Density change convergence check

The default value for this parameter changed in v4.5.

If some particles have high probability of being in two or more different classes, the primary switching criterion may result in several F-EM iterations where a substantial number of particles switch classes but the class volumes do not differ significantly. To prevent unnecessary computation, this secondary criterion tracks the root mean square difference of the real-space class volumes across iterations. The job will converge when either criterion is satisfied.
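
The two stopping checks can be sketched as follows, using flat lists of voxel values as stand-ins for volumes. The function names, the threshold arguments, and the choice to weight each class's RMS change by its current particle count are assumptions for illustration; the page only states that a weighted mean RMS density change is compared against a threshold.

```python
import math

def percent_switched(prev_assign, curr_assign):
    """Percentage of particles whose class changed between two F-EM iterations."""
    switched = sum(1 for a, b in zip(prev_assign, curr_assign) if a != b)
    return 100.0 * switched / len(curr_assign)

def rms_change(prev_vol, curr_vol):
    """Root mean square difference between two real-space volumes."""
    sq = sum((a - b) ** 2 for a, b in zip(prev_vol, curr_vol))
    return math.sqrt(sq / len(curr_vol))

def has_converged(prev_assign, curr_assign, prev_vols, curr_vols,
                  switch_pct_threshold, rms_threshold):
    """Converged when either the primary or the secondary criterion is met."""
    pct = percent_switched(prev_assign, curr_assign)
    # Assumption: weight each class's RMS change by its current particle count.
    counts = [curr_assign.count(k) for k in range(len(curr_vols))]
    mean_rms = sum(
        c * rms_change(p, q) for c, p, q in zip(counts, prev_vols, curr_vols)
    ) / sum(counts)
    return pct < switch_pct_threshold or mean_rms < rms_threshold
```

In this sketch, a run where a quarter of the particles still switch classes but the volumes no longer change would stop on the secondary criterion alone.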

Per-particle scale

Per-particle scale optimization can be turned off, in which case scales are set either to their upstream values (input) or to a constant value of 1.0 (none).

Force hard classification

Turn off weighted back projection — this may improve performance for small(er) targets where the standard optimization may ‘smear’ a portion of particles across several classes.
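
One way to picture the difference: with weighted back projection, a particle contributes to every class in proportion to its posterior probability; with hard classification, it contributes only to its most likely class. The helper below, `harden`, is a hypothetical illustration of that conversion and is not taken from CryoSPARC.

```python
def harden(posteriors):
    """Replace each particle's soft posterior with a one-hot weight vector."""
    hard = []
    for p in posteriors:
        best = max(range(len(p)), key=p.__getitem__)  # index of the most likely class
        hard.append([1.0 if k == best else 0.0 for k in range(len(p))])
    return hard

soft = [[0.6, 0.4], [0.1, 0.9]]
hard = harden(soft)  # each particle now counts fully toward a single class
```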

Reorder classes by size

The default value for this parameter changed in v4.5.

By default, output classes are not reordered during 3D Classification. This means that the output Class 0 refers to the same volume as class 0 during classification.

If Reorder classes by size is turned on, classes will be reordered according to their size (i.e., number of assigned particles) at the end of classification, prior to output generation. This means that the output Class 0 is the class with the most particles, which is not necessarily the same as class 0 while the job was running.

To avoid potential confusion regarding class outputs, this option must be turned off if Keep intermediate results is turned on.
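
The relabelling that reordering implies can be sketched as below. The function `reorder_by_size` and its tie-breaking rule (ties keep their original index order) are assumptions for illustration.

```python
from collections import Counter

def reorder_by_size(assignments, n_classes):
    """Relabel classes so that class 0 is the largest.

    Returns the relabelled assignments and `order`, where order[new] is the
    class index that was used while the job was running.
    """
    counts = Counter(assignments)
    # sorted() is stable, so equally sized classes keep their original order.
    order = sorted(range(n_classes), key=lambda k: -counts[k])
    remap = {old: new for new, old in enumerate(order)}
    return [remap[a] for a in assignments], order

new_assign, order = reorder_by_size([2, 2, 2, 0, 1, 1], 3)
```

Here the class that was index 2 during classification becomes output Class 0, which is exactly the mismatch that makes reordering incompatible with intermediate results.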

Keep intermediate results

In versions of CryoSPARC older than v5.0, this parameter is called Output data after every F-EM iter.

This option may be useful for larger datasets where one may want to monitor the 3D volumes prior to the completion of the job. This option can only be turned on if Reorder classes by size is turned off.

Output

  • All particles

  • All volumes

  • Solvent mask (passthrough input or auto-generated)

  • Consensus volume

  • Focus mask (passthrough input if provided)

  • Particles for each class

  • 3D volumes for each class

Common next steps

  • Job: Heterogeneous Reconstruction Only

    • This job can be useful to reconstruct classes at a larger box size than the one set by the 3D Classification Filter resolution (named Target resolution before v4.5).

  • Job: Regroup 3D

    • For large sets of classes (e.g., 50+), this job can quickly group these classes into a smaller set of 'superclasses' based on real-space voxel correlations.

  • Further classification of subsets of classes

New in CryoSPARC v4.0+

A number of significant improvements to 3D Classification were added in CryoSPARC v4.0. We list them below.

Algorithmic Changes

  • Per-particle scale optimization (v4.1+)

    • By default, 3D Classification will perform per-particle scale optimization before starting the main EM classification loop.

  • FSC-based filtering (v4.0+)

    • By default, during both O-EM and F-EM iterations, 3D Classification will filter each class volume by its intra-class FSC curve.

  • Convergence criteria (v4.0+)

    • F-EM iterations will conclude when one of two stopping criteria is met:

      • % of particles that switch classes (primary stopping criterion)

      • weighted mean RMS density change falls below a threshold (optional, secondary criterion)

  • Separate focus and solvent mask inputs (v4.0+)

    • 3D Classification accepts two different types of masks: a solvent mask, $S$, and a focus mask, $F$. During optimization we use the following real-space volume for all likelihood computations of class $k$:

$$V_k \leftarrow S * (F * V_k + (1-F) * \bar{V}),$$

where $\bar{V}$ is the consensus reconstruction.

If $F$ is not provided, we set $F = 1$ and apply $V_k \leftarrow S * V_k$. Otherwise, we also plot real-space slices and projections of the mask overlaid on the consensus volume map:

Focus mask overlaid on real-space slices.

  • Filtered consensus volume output (v4.4+)

    • The consensus map is now filtered according to its FSC. The resulting map is output by the job for inspection.
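
The mask composition described under "Separate focus and solvent mask inputs" can be sketched voxel by voxel. Volumes and masks are represented here as flat lists of floats purely for illustration, and the function name `masked_class_volume` is hypothetical.

```python
def masked_class_volume(class_vol, consensus_vol, solvent_mask, focus_mask=None):
    """Apply V_k <- S * (F * V_k + (1 - F) * V_bar) to every voxel.

    Inside the focus mask (F = 1) the class volume is kept; outside it (F = 0)
    the consensus volume is substituted; the solvent mask S scales the result.
    """
    if focus_mask is None:
        focus_mask = [1.0] * len(class_vol)  # F = 1 everywhere: only S applies
    return [
        s * (f * v + (1.0 - f) * c)
        for v, c, s, f in zip(class_vol, consensus_vol, solvent_mask, focus_mask)
    ]

v_k = [2.0, 4.0, 6.0]
v_bar = [1.0, 1.0, 1.0]
S = [1.0, 1.0, 0.0]
F = [1.0, 0.0, 1.0]
focused = masked_class_volume(v_k, v_bar, S, F)
```

In this example the second voxel lies outside the focus mask and takes the consensus value, so it cannot contribute to differences between classes, while the third voxel is zeroed by the solvent mask.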

Diagnostic plots

Starting with CryoSPARC v4.0, 3D Classification outputs several new diagnostic plots listed below.

Per-particle Class ESS Histogram (added in v4.0)

This histogram can help diagnose poor classification results by showing if some particles have significant probability mass in more than one class. The ESS (Effective Sample Size) is a measure of how many classes each particle appears to belong to with significant probability. An ESS of 1.0 indicates that a particle is confidently assigned to only one class. An ESS of 2.0 would mean that a particle belongs with substantial probability to two classes. When many particles have a large ESS (> 1), this indicates that there is significant uncertainty in classification, and the classes may be overlapping or similar.
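
A common definition of effective sample size for a probability vector is the inverse of the sum of squared probabilities. The sketch below uses that definition as an assumption; it reproduces the two reference values described above, but the page does not state the exact formula CryoSPARC uses.

```python
def effective_sample_size(posterior):
    """ESS of one particle's class posterior: 1 / sum(p_k ** 2)."""
    return 1.0 / sum(p * p for p in posterior)

confident = effective_sample_size([1.0, 0.0, 0.0])   # one class only
ambiguous = effective_sample_size([0.5, 0.5, 0.0])   # split across two classes
```

Under this definition a fully confident particle has an ESS of 1.0 and a particle split evenly between two classes has an ESS of 2.0.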

Difference from Consensus Real-Space Slices (added in v3.3, updated in v4.0)

This plot shows the real-space difference between the consensus map and each class map, regularized by the class FSC (if FSC regularization is turned on). This can quickly show areas of heterogeneity.

Class Flow Diagram (added in v4.0, updated in v4.1)

This diagram shows how many particles switched classes across F-EM iterations (output starts at the second F-EM iteration). An edge, (i,j), is drawn with a thickness, colour, and opacity defined by the number of particles that switch from class i to class j.

Class Flow Matrix (added in v4.1)

This diagram visualizes class flow in a matrix format. Each column represents a 1D distribution of the particles in a given class at the current F-EM iteration. Each row represents the class which the particles belonged to at the previous iteration. In other words, each square in this grid represents an edge in the bipartite class flow graph above. This form of class flow can be useful in visualizing 'minor' edges that are difficult to see in the bipartite graph, and it can greatly improve clarity for class flow with large (25+) numbers of classes.
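
The counts behind such a matrix can be tallied from the hard class assignments at two consecutive iterations. The function name `class_flow_matrix` is hypothetical, and the sketch produces raw counts only, without the normalization or colouring used in the plot.

```python
def class_flow_matrix(prev_assign, curr_assign, n_classes):
    """flow[i][j] = number of particles that moved from class i to class j.

    Rows index the class at the previous F-EM iteration and columns index the
    class at the current iteration; diagonal entries are particles that stayed.
    """
    flow = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(prev_assign, curr_assign):
        flow[i][j] += 1
    return flow

flow = class_flow_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], 2)
```

Each off-diagonal entry corresponds to one edge of the bipartite class flow graph, so small flows that are hard to see as thin edges remain readable as individual cells.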

Class Assignment Histogram (added in v3.3, updated in v4.0)

This histogram now includes both total assignments and the ‘effective size’ of the class. The latter is a sum of the probability mass in that class. When the assignments and effective size bars are differently sized, this indicates that there is uncertainty in the classification, as many particles have probabilities that are spread out between classes (an effect included in the effective size) compared to the class where they have the maximum probability (the assignments).
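
The two bar heights can be computed from per-particle posteriors as in the sketch below. The function name `class_sizes` is hypothetical, and ties for the most likely class are broken by taking the lowest class index, which is an assumption.

```python
def class_sizes(posteriors):
    """Return (hard assignment counts, effective sizes) per class.

    Assignments count each particle once, in its most likely class.
    Effective size sums each particle's probability mass in the class.
    """
    n_classes = len(posteriors[0])
    assignments = [0] * n_classes
    effective = [0.0] * n_classes
    for p in posteriors:
        best = max(range(n_classes), key=p.__getitem__)
        assignments[best] += 1
        for k in range(n_classes):
            effective[k] += p[k]
    return assignments, effective

counts, eff = class_sizes([[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]])
```

In this example class 0 receives two of the three assignments, yet both classes have an effective size of about 1.5, which is the kind of mismatch that signals uncertain classification.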
