Job: Rebalance 2D Classes

At a Glance

Group 2D class averages into superclasses and, optionally, balance the number of particles in each superclass.

Description

Rebalance 2D Classes analyzes the 2D class averages produced by a 2D Classification job and groups similar class averages into superclasses. Optionally, particles can be randomly excluded from the more populated superclasses to balance the number of particles in each superclass.

Inputs

Particles

Particles must have been 2D Classified, and should come from the same job as the 2D Class Averages.

2D Class Averages

Particles and 2D class averages should come from the same job. Note that Rebalance 2D Classes analyzes the class averages and not the particles themselves, so results will be better with higher-quality (i.e., less noisy) class averages.

Commonly Adjusted Parameters

Rebalance factor

Particles are randomly dropped from the larger superclasses so that the smallest superclass contains at least this fraction of the number of particles in the largest superclass. A Rebalance factor of 0.0 does not discard any particles. A Rebalance factor of 1.0 randomly discards particles from every superclass except the smallest until all superclasses are the same size.
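The effect of the Rebalance factor can be illustrated with a minimal sketch. This is not the job's actual implementation: the function name `rebalance`, the dictionary layout mapping particle IDs to superclass labels, and the choice to round the size cap down are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def rebalance(superclass_of, factor, seed=0):
    """Split particle IDs into (selected, excluded) lists.

    superclass_of: dict of particle ID -> superclass label (hypothetical layout).
    factor: 0.0 keeps every particle; 1.0 trims all superclasses to the
            size of the smallest one.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0.0 and 1.0")

    # Collect the particles belonging to each superclass.
    members = defaultdict(list)
    for particle, label in superclass_of.items():
        members[label].append(particle)

    if factor == 0.0:
        return sorted(superclass_of), []

    smallest = min(len(particles) for particles in members.values())
    # Largest allowed size such that smallest / largest >= factor.
    # Rounding down is an assumption of this sketch.
    cap = math.floor(smallest / factor)

    rng = random.Random(seed)
    selected, excluded = [], []
    for particles in members.values():
        rng.shuffle(particles)            # random choice of which particles to drop
        selected.extend(particles[:cap])
        excluded.extend(particles[cap:])
    return sorted(selected), sorted(excluded)


# Three superclasses with 1000, 400 and 100 particles.
labels = {i: ("A" if i < 1000 else "B" if i < 1400 else "C") for i in range(1500)}
kept, dropped = rebalance(labels, 0.5)
print(len(kept), len(dropped))  # 500 1000
```

With a factor of 0.5 the smallest superclass (100 particles) allows at most 200 particles in any other superclass, so A and B are both trimmed to 200 while C is left untouched.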

Number of superclasses or templates (integer)

Ideally, set this to the number of unique views in the 2D class averages. This number is typically not known precisely, so some experimentation is often necessary. Note that if Do superclassification is turned off, this number must equal the number of 2D class averages in the input.

Override maximum superclass size

When this parameter is set and the Rebalance factor is not 0.0, it determines the maximum superclass size in place of the Rebalance factor. Setting this parameter to some integer $N$ is functionally equivalent to setting the Rebalance factor to $n/N$, where $n$ is the number of particles in the smallest superclass.
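As a worked example of the $n/N$ equivalence, the short sketch below caps a set of superclass sizes at $N$ and reports the Rebalance factor that would give the same result. The function names, the superclass labels, and the particle counts are hypothetical, and the sketch assumes $N$ is at least as large as the smallest superclass.

```python
from fractions import Fraction


def sizes_after_cap(sizes, max_size):
    """Particle count per superclass after capping each one at max_size (N)."""
    return {label: min(count, max_size) for label, count in sizes.items()}


def equivalent_rebalance_factor(smallest_size, max_size):
    """Rebalance factor n/N implied by a maximum superclass size of N.

    Assumes max_size >= smallest_size, so the result lies in (0, 1].
    """
    if smallest_size <= 0 or max_size <= 0:
        raise ValueError("sizes must be positive")
    return Fraction(smallest_size, max_size)


# Illustrative particle counts per superclass (made-up numbers).
sizes = {"top": 12000, "side": 3000, "tilted": 1500}
capped = sizes_after_cap(sizes, 6000)
factor = equivalent_rebalance_factor(min(sizes.values()), 6000)

print(capped)         # {'top': 6000, 'side': 3000, 'tilted': 1500}
print(float(factor))  # 0.25
```

Here $n = 1500$ and $N = 6000$, so the cap behaves like a Rebalance factor of 0.25: only the "top" superclass is large enough to lose particles.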

Outputs

Particles selected

Particles remaining in the dataset after rebalancing classes according to the Rebalance factor (or Override maximum superclass size).

Templates

The templates are unchanged from the input.

Particles excluded

Particles excluded from the dataset after rebalancing classes according to the Rebalance factor (or Override maximum superclass size).

Plots

Rebalance 2D Classes creates an Affinity Matrix which displays how similar 2D classes are to each other. It is this affinity that is used to group the class averages into the requested number of superclasses.

Say we start with ten class averages and we want to group them into two superclasses.

First, we calculate the affinity of the classes for each other. The affinity is a measure of how similar the two classes look, and varies from 0.0 (not similar at all) to 1.0 (identical).

We can map the pairwise affinities onto a matrix in which each row and each column represents a specific class, and each cell is colored by the similarity between the class in its row and the class in its column.

If we rearrange the rows and columns such that classes with a high affinity for each other are adjacent, the superclasses become visible as square blocks in the matrix. These blocks arise naturally in a well-clustered matrix, because the classes within a superclass have high affinity for each other and low affinity for all other classes.

If the matrix does not have a clear pattern of squares for each superclass, or if the superclasses have members which “project” darker colors in their row and column, it may be that a different number of superclasses is needed.
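The walk-through above can be sketched on toy data. The source does not specify how the job measures affinity or how it clusters the classes, so this sketch substitutes two stand-ins: Pearson correlation clipped to the range 0.0 to 1.0 as the affinity, and a greedy average-linkage merge as the grouping step. It also ignores the rotational and translational alignment that real class averages would need. All function names are hypothetical.

```python
import math
import random


def affinity(a, b):
    """Pearson correlation of two flattened images, clipped to [0.0, 1.0]."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if norm == 0.0:
        return 0.0
    corr = sum(x * y for x, y in zip(da, db)) / norm
    return max(0.0, min(1.0, corr))


def affinity_matrix(images):
    """Pairwise affinities: cell [r][c] compares class r with class c."""
    return [[affinity(a, b) for b in images] for a in images]


def group(matrix, n_groups):
    """Greedily merge the two most similar groups until n_groups remain."""
    groups = [[i] for i in range(len(matrix))]

    def linkage(g, h):
        # Average affinity between all members of two groups.
        return sum(matrix[i][j] for i in g for j in h) / (len(g) * len(h))

    while len(groups) > n_groups:
        candidates = [
            (linkage(groups[i], groups[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        ]
        _, i, j = max(candidates)
        merged = groups.pop(j)  # j > i, so index i is still valid
        groups[i].extend(merged)
    return groups


def reorder(matrix, groups):
    """Rearrange rows and columns so members of each group are adjacent."""
    order = [i for g in groups for i in g]
    return order, [[matrix[r][c] for c in order] for r in order]


# Ten noisy "class averages" drawn from two underlying views, interleaved.
rng = random.Random(0)
views = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(2)]
images = [[p + rng.gauss(0, 0.3) for p in views[k % 2]] for k in range(10)]

matrix = affinity_matrix(images)
groups = group(matrix, 2)
order, blocked = reorder(matrix, groups)

print("row/column order:", order)
for row in blocked:
    print(" ".join(f"{v:.1f}" for v in row))
```

Printing the reordered matrix shows two square blocks of high values along the diagonal, one per superclass, with low values elsewhere. Asking `group` for the wrong number of groups on the same data breaks this block pattern, which mirrors the diagnostic use of the plot described above.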

Common Next Steps

This job is most useful as a diagnostic to assess the distribution of particles among views before moving to 3D, and the outputs are often not used directly in subsequent jobs.

In rare cases of severe orientation bias, rebalancing particles among views can improve the initial results of Ab initio reconstruction. If your Ab initio reconstruction shows evidence of severe bias (such as a flat map or a map with severe streaking), setting the Rebalance factor relatively high (e.g., 0.8) can improve results slightly.

The improved map may be useful for downstream tasks or for repeating particle picking, if the underrepresented views are present but not being picked. If, however, underrepresented views are simply not present in the micrographs, it is unlikely that this technique (or any other) will recover an isotropic map.

Recommended Alternatives

If a 3D refinement of the particles exists, Orientation Diagnostics will provide a quantitative description of any orientation bias in the particles and indicate whether that bias results in a significantly anisotropic map.

Similarly, if a 3D refinement of the particles exists, Rebalance Orientations will directly rebalance the particles based on viewing direction rather than by 2D Class.

