Job: 3D Classification (BETA)

3D classification without alignment.

This job type has been substantially improved from its original release in CryoSPARC v3.3. Changes in v4.0 and v4.1 are described below.

Description

3D Classification (BETA), first introduced in v3.3, can help discover discrete heterogeneity in single particle cryo-EM datasets. This job currently implements a version of 3D classification without alignment — a classification routine that can complement the Heterogeneous Refinement and 3D Variability jobs in finding new discrete classes of data.

In CryoSPARC v4.0, 3D Classification was updated with several notable improvements, including FSC regularization, focus and solvent mask inputs, new convergence criteria, and a number of new diagnostic plots and outputs.

Note that in CryoSPARC v4.0+, cloning a 3D classification job that was created in CryoSPARC v3.3 will fail to launch due to a change in the inputs and parameters of the job type. Instead, please create a 3D classification job from scratch in v4.0 and re-connect the desired inputs and set parameters.

Under the hood, 3D Classification (BETA) uses a combination of Online and Full-Batch Expectation Maximization (O-EM and F-EM, respectively). These algorithms alternate between (1) computing the most likely class assignments for each particle image in a batch based on known 3D class volumes, and (2) updating each 3D volume based on these assignments.
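The alternation described above can be illustrated with a deliberately simplified toy example. This is a minimal sketch, not CryoSPARC's implementation: it treats each "image" and each "volume" as a short list of numbers, assumes isotropic Gaussian noise, and omits projection, CTF, masking, and the online (mini-batch) variant entirely. The function names `e_step` and `m_step` are hypothetical.

```python
import math

def e_step(images, volumes, noise_var=1.0):
    """Step (1): posterior class probabilities for each image."""
    posteriors = []
    for img in images:
        # Gaussian log-likelihood of the image under each class (up to a constant).
        log_l = [-sum((x - v) ** 2 for x, v in zip(img, vol)) / (2.0 * noise_var)
                 for vol in volumes]
        m = max(log_l)  # subtract the maximum for numerical stability
        w = [math.exp(l - m) for l in log_l]
        total = sum(w)
        posteriors.append([wi / total for wi in w])
    return posteriors

def m_step(images, posteriors, n_classes):
    """Step (2): update each class as the posterior-weighted mean image."""
    dim = len(images[0])
    volumes = []
    for k in range(n_classes):
        weight = max(sum(p[k] for p in posteriors), 1e-12)
        volumes.append([
            sum(p[k] * img[d] for p, img in zip(posteriors, images)) / weight
            for d in range(dim)
        ])
    return volumes

# Toy data: two well-separated groups of 2-"voxel" images.
images = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
volumes = [[1.0, 1.0], [4.0, 4.0]]  # initial class "volumes"
for _ in range(5):  # a few full-batch iterations
    posteriors = e_step(images, volumes)
    volumes = m_step(images, posteriors, n_classes=2)

# Most likely class for each image.
assignments = [max(range(2), key=lambda k: p[k]) for p in posteriors]
```

In this toy setting the first two images end up in class 0 and the last two in class 1, and each class "volume" moves to the mean of its assigned images.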

Please also refer to the 3D Classification tutorial, which has been updated for v4.1 with new considerations and example datasets.

Input

  • Particles (with alignments3D)

  • [Optional] Initial Volumes

    • To be used with the input initialization mode. The number of initial volumes must match the number of classes.

  • [Optional] Solvent mask

    • If not supplied, a solvent mask is computed by dilating and soft-padding the consensus volume.

  • [Optional] Focus mask

    • If not supplied, only the solvent mask will be used (i.e., the focus mask will be set to a volume of all ones).

Common parameters

  • Number of classes: Number of classes to use in the job. Note that this can be significantly larger than in Heterogeneous Refinement for the same computational cost.

  • Target resolution: Desired resolution of each 3D map. This, combined with the extent of the particle images, determines the 3D box size.

  • Output data after every F-EM iter (updated in v4.1.2): This option may be useful for larger datasets where one may want to monitor the 3D volumes prior to the completion of the job. Note that as of CryoSPARC v4.1.2, this option can only be turned on if class re-ordering is turned off (see below).
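The relationship between target resolution, particle extent, and box size mentioned above can be illustrated with the Nyquist limit: a map sampled at pixel size p can represent detail no finer than 2p. The sketch below is an assumption about how such a box size could be derived, not a statement of the exact rule the job applies. The function name and the rounding to an even box size are hypothetical.

```python
import math

def box_size_for_target(extent_angstrom, target_res_angstrom):
    """Smallest even box (in voxels) that covers the extent at Nyquist sampling."""
    # Nyquist: the coarsest usable pixel size is half the target resolution.
    pixel_size = target_res_angstrom / 2.0
    n = math.ceil(extent_angstrom / pixel_size)
    return n + (n % 2)  # round up to an even voxel count (assumption)

box_6A = box_size_for_target(extent_angstrom=300.0, target_res_angstrom=6.0)
box_8A = box_size_for_target(extent_angstrom=300.0, target_res_angstrom=8.0)
```

Under these assumptions, a coarser target resolution gives a smaller box, which is one reason a large number of classes can remain computationally affordable.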

If 3D Classification is not producing good results, adjusting the following parameters may be a good starting point to get improved results:

  • O-EM learning rate init (default updated in v4.0): For a fixed O-EM batch size and epoch value, larger values will generally result in fewer populated classes.

  • Use FSC to filter each class (new in v4.0): FSC filtering may be turned off to match the filtering behaviour of 3D classification in CryoSPARC v3.3.x.

  • Convergence criterion (%) (new in v4.0): Primary stopping criterion — percentage of particles that have switched classes across F-EM iterations. Increasing this value may result in ‘early stopping’ of the optimization.

  • RMS density change convergence check (new in v4.0): If some particles have high probability of being in two or more different classes, the primary criterion may not be sufficient. Turning on this parameter will force the job to also monitor the root mean square of the class volumes directly, which will provide a secondary source of convergence information.

  • Per-particle scale (new in v4.1): Per-particle optimization can be turned off and scales can be set to their upstream values (input) or to a constant value of 1.0 (none).

  • Force hard classification (new in v4.0): Turn off weighted back projection — this may improve performance for small(er) targets where the standard optimization may ‘smear’ a portion of particles across several classes.
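The two convergence checks described above can be sketched as follows. This is an illustrative sketch only: the function names and threshold arguments are hypothetical, volumes are represented as flat lists of voxel values, and weighting the RMS change by class size is an assumption (the source does not specify the weights).

```python
import math

def percent_switched(prev_assign, curr_assign):
    """Primary criterion: % of particles whose class changed between iterations."""
    switched = sum(1 for a, b in zip(prev_assign, curr_assign) if a != b)
    return 100.0 * switched / len(curr_assign)

def weighted_rms_change(prev_vols, curr_vols, class_sizes):
    """Secondary criterion: RMS voxel change per class, weighted by class size."""
    total = sum(class_sizes)
    change = 0.0
    for pv, cv, n in zip(prev_vols, curr_vols, class_sizes):
        rms = math.sqrt(sum((c - p) ** 2 for p, c in zip(pv, cv)) / len(cv))
        change += (n / total) * rms
    return change

def has_converged(prev_assign, curr_assign, switch_threshold_pct,
                  prev_vols=None, curr_vols=None, class_sizes=None,
                  rms_threshold=None):
    """Stop when either criterion is met; the RMS check is optional."""
    if percent_switched(prev_assign, curr_assign) < switch_threshold_pct:
        return True
    if rms_threshold is not None:
        return weighted_rms_change(prev_vols, curr_vols, class_sizes) < rms_threshold
    return False

prev_assign = [0, 0, 1, 1, 2]
curr_assign = [0, 1, 1, 1, 2]  # one of five particles switched class
pct = percent_switched(prev_assign, curr_assign)
stops_loose = has_converged(prev_assign, curr_assign, switch_threshold_pct=25.0)
stops_strict = has_converged(prev_assign, curr_assign, switch_threshold_pct=10.0)
```

The example shows why increasing the convergence criterion can cause early stopping: with 20% of particles still switching, a threshold of 25% stops the loop while a threshold of 10% does not.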

Other salient considerations with regards to parameters:

  • Reorder classes by size (new in v4.1.2): With this parameter turned on (default), classes will be reordered according to their size (i.e., assigned particles) at the end of classification, prior to output generation. To avoid potential confusion regarding class outputs, this option must be turned off if Output data after every F-EM iter is turned on.
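As an illustration of what reordering by size involves, the sketch below relabels classes so that indices follow the number of assigned particles. It is a hypothetical sketch: the function name is invented, and sorting from largest to smallest is an assumption about the ordering direction.

```python
from collections import Counter

def reorder_by_size(assignments, n_classes):
    """Relabel classes so that index 0 is the most populated class."""
    counts = Counter(assignments)
    # Largest class first (the sort direction is an assumption).
    order = sorted(range(n_classes), key=lambda k: -counts.get(k, 0))
    relabel = {old: new for new, old in enumerate(order)}
    return [relabel[a] for a in assignments], order

new_assign, order = reorder_by_size([2, 2, 2, 0, 1, 1], n_classes=3)
# order lists the original class indices in their new positions.
```

Because the relabelling happens only at the end of classification, class indices in the final outputs would no longer match those in any intermediate outputs, which is presumably why the two options cannot be combined.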

Output

  • All particles

  • Particles for each class

  • 3D volumes for each class

  • Volume series of all 3D maps (.zip)

  • Solvent mask (passthrough input or auto-generated)

  • Focus mask (passthrough input if provided)

  • Consensus volume

Common next steps

  • Job: Heterogeneous Reconstruction Only

    • This job can be useful to reconstruct classes at a larger box size than the one set by the 3D classification target resolution.

  • Job: Regroup 3D

    • For large sets of classes (e.g., 50+), this job can quickly group these classes into a smaller set of 'superclasses' based on real-space voxel correlations.

  • Further classification of subsets of classes

New in CryoSPARC v4.0+

A number of significant improvements to 3D Classification were added in CryoSPARC v4.0. We list them below.

Algorithmic Changes

  • Per-particle scale optimization (v4.1+)

    • By default, 3D Classification will perform per-particle scale optimization before starting the main EM classification loop.

  • FSC-based filtering (v4.0+)

    • By default, during both O-EM and F-EM iterations, 3D Classification will filter each class volume by its intra-class FSC curve.

  • Convergence criteria (v4.0+)

    • F-EM iterations will conclude when one of two stopping criteria is met:

      • % of particles that switch classes (primary stopping criterion)

      • weighted mean RMS density change falls below a threshold (optional, secondary criterion)

  • Separate focus and solvent mask inputs (v4.0+)

    • 3D Classification accepts two different types of masks: a solvent mask, $S$, and a focus mask, $F$. During optimization we use the following real-space volume for all likelihood computations of class $k$:

$$V_k \leftarrow S * (F * V_k + (1-F) * \bar{V}),$$

where $\bar{V}$ is the consensus reconstruction.

If $F$ is not provided, we set $F = 1$ and apply $V_k \leftarrow S * V_k$. Otherwise, we also plot real-space slices and projections of the mask overlaid on the consensus volume map.

  • Filtered consensus volume output (v4.4+)

    • The consensus map is now filtered according to its FSC. The resulting map is output by the job for inspection.
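The way the focus and solvent masks are combined for each class volume can be written out voxel by voxel. The sketch below is illustrative only: volumes and masks are represented as flat lists of voxel values, and the function name `masked_class_volume` is hypothetical.

```python
def masked_class_volume(v_k, v_bar, solvent, focus=None):
    """Apply S * (F * V_k + (1 - F) * V_bar) voxel by voxel."""
    if focus is None:
        # No focus mask supplied: F = 1 everywhere, so only S is applied.
        focus = [1.0] * len(v_k)
    return [s * (f * vk + (1.0 - f) * vb)
            for vk, vb, s, f in zip(v_k, v_bar, solvent, focus)]

v_k = [2.0, 4.0, 6.0]      # class volume
v_bar = [1.0, 1.0, 1.0]    # consensus reconstruction
solvent = [1.0, 1.0, 0.0]  # last voxel lies in solvent
focus = [1.0, 0.5, 1.0]    # middle voxel sits on the soft edge of the focus mask

with_focus = masked_class_volume(v_k, v_bar, solvent, focus)
without_focus = masked_class_volume(v_k, v_bar, solvent)
```

Inside the focus mask the class density is used as is, outside it the consensus density is substituted, and on the soft edge the two are blended. Classes can therefore differ only where the focus mask is non-zero.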

Diagnostic plots

Starting with CryoSPARC v4.0, 3D Classification outputs several new diagnostic plots listed below.

Per-particle Class ESS Histogram (added in v4.0)

This histogram can help diagnose poor classification results by showing if some particles have significant probability mass in more than one class. The ESS (Effective Sample Size) is a measure of how many classes each particle appears to belong to with significant probability. An ESS of 1.0 indicates that a particle is confidently assigned to only one class. An ESS of 2.0 would mean that a particle belongs with substantial probability to two classes. When many particles have a large ESS (> 1), this indicates that there is significant uncertainty in classification, and the classes may be overlapping or similar.
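One common definition of effective sample size for a discrete distribution is the inverse of the sum of squared probabilities. The sketch below assumes that definition, which reproduces the interpretation given above; the exact formula used by the job is not stated here, and the function name `class_ess` is hypothetical.

```python
def class_ess(probs):
    """Effective number of classes for one particle: 1 / sum(p_k^2)."""
    return 1.0 / sum(p * p for p in probs)

ess_confident = class_ess([1.0, 0.0, 0.0])  # all mass in one class
ess_split = class_ess([0.5, 0.5, 0.0])      # mass split evenly over two classes
ess_mixed = class_ess([0.7, 0.2, 0.1])      # mostly one class, some spread
```

Under this definition a fully confident particle has an ESS of 1.0, an even split over two classes gives 2.0, and intermediate cases fall in between.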

Difference from Consensus Real-Space Slices (added in v3.3, updated in v4.0)

This plot shows the real-space difference between the consensus map and each class map, regularized by the class FSC (if FSC regularization is turned on). This can quickly show areas of heterogeneity.

Class Flow Diagram (added in v4.0, updated in v4.1)

This diagram shows how many particles switched classes across F-EM iterations (output starts at the second F-EM iteration). An edge, (i,j), is drawn with a thickness, colour, and opacity defined by the number of particles that switch from class i to class j.

Class Flow Matrix (added in v4.1)

This diagram visualizes class flow in a matrix format. Each column represents a 1D distribution of the particles in a given class at the current F-EM iteration. Each row represents the class which the particles belonged to at the previous iteration. In other words, each square in this grid represents an edge in the bipartite class flow graph above. This form of class flow can be useful in visualizing 'minor' edges that are difficult to see in the bipartite graph, and it can greatly improve clarity for class flow with large (25+) numbers of classes.
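A matrix of this kind can be built from the class assignments at two consecutive iterations. The following is a minimal sketch under the row and column convention described above (rows are previous classes, columns are current classes, and each column is normalized to a distribution). The function name is hypothetical and this is not the job's plotting code.

```python
def class_flow_matrix(prev_assign, curr_assign, n_classes):
    """Return (counts, flow): raw transition counts and column-normalized flow."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for prev, curr in zip(prev_assign, curr_assign):
        counts[prev][curr] += 1  # row = previous class, column = current class
    col_totals = [sum(counts[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    # Each column becomes a distribution over the previous classes.
    flow = [[counts[i][j] / col_totals[j] if col_totals[j] else 0.0
             for j in range(n_classes)]
            for i in range(n_classes)]
    return counts, flow

prev_assign = [0, 0, 0, 1, 1]
curr_assign = [0, 0, 1, 1, 1]  # one particle moved from class 0 to class 1
counts, flow = class_flow_matrix(prev_assign, curr_assign, n_classes=2)
```

Off-diagonal entries correspond to the edges of the class flow diagram, so a nearly diagonal matrix indicates that few particles are still switching classes.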

Class Assignment Histogram (added in v3.3, updated in v4.0)

This histogram now includes both total assignments and the ‘effective size’ of the class. The latter is a sum of the probability mass in that class. When the assignments and effective size bars are differently sized, this indicates that there is uncertainty in the classification, as many particles have probabilities that are spread out between classes (an effect included in the effective size) compared to the class where they have the maximum probability (the assignments).
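The two quantities in this histogram can be computed from per-particle class probabilities as sketched below. The function name `class_sizes` and the example probabilities are hypothetical and chosen only to show how the two bars can diverge.

```python
def class_sizes(posteriors):
    """Return (assignments, effective): hard counts and summed probability mass."""
    n_classes = len(posteriors[0])
    assignments = [0] * n_classes
    effective = [0.0] * n_classes
    for p in posteriors:
        # Hard assignment: the class with the maximum probability.
        assignments[max(range(n_classes), key=lambda k: p[k])] += 1
        # Effective size: every class receives its share of probability mass.
        for k in range(n_classes):
            effective[k] += p[k]
    return assignments, effective

posteriors = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
assignments, effective = class_sizes(posteriors)
```

Here class 0 receives three hard assignments but an effective size of only 2.25, because two of its particles carry substantial probability for class 1. A gap of this kind is the signature of uncertain classification described above.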
