Job: Rebalance 2D Classes (BETA)

Description

Rebalance the number of particles in the output of a Select 2D Classes or 2D Classification job, by clustering the class averages into similar views (i.e., "superclasses") and removing particles from oversampled views. The amount of balancing can be controlled (from none to uniform), so this job is also useful for simply clustering class averages by similarity without rebalancing. One can also choose to rebalance the number of particles in each individual 2D class, without performing the clustering at all.

Input

  • Particles (blob and alignments2D)
  • Templates (blob)

Output

  • Class averages
  • Particles (selected and excluded)

When to use Rebalance 2D Classes

Rebalance 2D Classes addresses particle datasets that are highly imbalanced in orientation sampling. This issue, known as preferred particle orientation, can occur when particles tend to "stick" to the air-water interface. One effect of strong orientation bias that can be observed in 3D maps is "smearing" of the overall map along a particular axis. It is generally recommended to use this job only after an initial Ab-initio Reconstruction job has been run and erroneous features have been observed, because datasets with moderate orientation bias often still produce reasonable quality ab-initio reconstructions. For some datasets, however, simply discarding particles from the oversampled views has been observed to help with preferred orientation, and doing so is the main function of this job.
Rebalance 2D Classes is best used with the output of a Select 2D Classes job in which certain classes are significantly overweighted and others underweighted. For example, in the set of selected 2D classes, you may notice that there is:
  • a large number of very similar classes, alongside a small number of more unique classes, or
  • a disproportionately large number of particles per class in the most populated classes
In either of these cases, Rebalance 2D Classes can help even out the sampling of views in the dataset.
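A quick way to check for the second symptom is to count particles per class and compare the smallest class to the largest. The sketch below is illustrative only: the `assignments` list is made-up data, and `class_balance` is a hypothetical helper, not part of CryoSPARC.

```python
from collections import Counter


def class_balance(class_ids):
    """Return (particle count per class, smallest/largest count ratio)."""
    counts = Counter(class_ids)
    ratio = min(counts.values()) / max(counts.values())
    return counts, ratio


# Hypothetical per-particle 2D class assignments (one class index per particle).
assignments = [0] * 900 + [1] * 850 + [2] * 60 + [3] * 40

counts, ratio = class_balance(assignments)
print(counts.most_common())
print(f"smallest/largest ratio: {ratio:.3f}")
```

A ratio close to 0 suggests that a few classes dominate the dataset; a ratio close to 1 suggests the classes are already roughly balanced.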

Common Parameters

  • Rebalance factor: Factor by which the superclasses are rebalanced. Must be between 0 and 1. For non-zero values, this approximately corresponds to the ratio between the number of particles in the smallest superclass and the number of particles in the largest superclass after rebalancing.
    • Set to 0 to have no rebalancing done (all particles kept)
    • Set to 1 to have uniform rebalancing done (all superclasses have the same size)
  • Do superclassification: Whether rebalancing should be based on superclasses, or based directly on the templates passed.
    • If true, will use spectral clustering to form superclasses, and then rebalance the number of particles in each superclass
    • If false, will simply rebalance the number of particles in each class average
  • Number of superclasses or templates: Corresponds to the number of unique views that are present in the set of templates passed.
    • If Do superclassification is false, this must be exactly equal to the number of templates passed
    • If Do superclassification is true, this must be an integer strictly less than the number of templates passed
    • Tip: Running multiple jobs with different numbers of superclasses may help to find the best clustering
  • Override maximum superclass size: This can be set to override the maximum desired number of particles in each superclass. If left as None, the maximum will be calculated using the Rebalance factor parameter.
  • Split outputs: Whether the outputs (templates and particles) should be split by superclass/template, or should be merged together. This can be useful if one wants to run further 2D classification jobs on a particular view.
    • If true, the templates and selected particles will be split into individual groups based on their assigned superclass
    • If false, the templates and selected particles will each be output together with all superclasses merged
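The interaction between Rebalance factor, Override maximum superclass size, and the constraints on the number of superclasses can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the job's actual implementation: it assumes the size cap is the smallest group size divided by the rebalance factor, that surplus particles are removed at random, and that every group is non-empty. The function names `validate_num_groups` and `rebalance` are hypothetical.

```python
import math
import random


def validate_num_groups(num_templates, num_groups, do_superclassification):
    """Check the 'Number of superclasses or templates' constraint."""
    if do_superclassification:
        # Must be strictly less than the number of templates passed.
        if not 0 < num_groups < num_templates:
            raise ValueError("need 0 < number of superclasses < number of templates")
    elif num_groups != num_templates:
        raise ValueError("number of groups must equal the number of templates")


def rebalance(groups, factor, max_size_override=None, seed=0):
    """Cap each group's size; return (selected, excluded) dicts of particle ids.

    groups: dict mapping superclass (or class) id -> list of particle ids.
    ASSUMPTION: cap = ceil(smallest group size / factor); random removal.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("rebalance factor must be between 0 and 1")
    sizes = [len(p) for p in groups.values()]
    if max_size_override is not None:
        cap = max_size_override          # explicit override wins
    elif factor == 0:
        cap = max(sizes)                 # no rebalancing: keep everything
    else:
        cap = math.ceil(min(sizes) / factor)

    rng = random.Random(seed)
    selected, excluded = {}, {}
    for gid, particles in groups.items():
        shuffled = list(particles)
        rng.shuffle(shuffled)
        selected[gid] = shuffled[:cap]
        excluded[gid] = shuffled[cap:]
    return selected, excluded


# Hypothetical superclasses with 1000, 400 and 100 particles.
groups = {"a": list(range(1000)), "b": list(range(400)), "c": list(range(100))}
selected, excluded = rebalance(groups, factor=0.5)
print({gid: len(p) for gid, p in selected.items()})
```

With a factor of 0.5 in this sketch, the largest groups are capped at twice the size of the smallest one, so the smallest-to-largest ratio after rebalancing equals the factor. A factor of 1 caps every group at the smallest group's size, and a factor of 0 keeps all particles.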

Example Images

Example of a clustering of class averages by similarity

Notes and Tips

  • A typical workflow may look like:
    • 2D Classification with 30-60 classes on a pre-curated stack of particles
    • Select 2D Classes to discard junk classes and particles, if any are present
    • Rebalance 2D Classes with the selected classes and particles
  • If Do superclassification is true, then the runtime is approximately proportional to the square of the number of input class averages
  • To speed up the job, the Downsampling factor parameter can be increased to 2 or 4, which will use downscaled copies of the templates for the clustering computation
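The two runtime notes above can be made concrete with some simple arithmetic. The sketch below assumes that the quadratic scaling comes from comparing every pair of class averages, and it uses plain block averaging as a stand-in for downsampling; neither is confirmed to be how the job works internally. `num_pairs` and `downsample` are hypothetical helpers.

```python
def num_pairs(n_templates):
    """Number of distinct template pairs, which grows as roughly n^2 / 2."""
    return n_templates * (n_templates - 1) // 2


def downsample(image, factor):
    """Block-average a square 2D list of numbers by an integer factor.

    ASSUMPTION: block averaging is only a stand-in for the job's own method.
    """
    n = len(image) // factor
    return [
        [
            sum(
                image[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ) / factor ** 2
            for j in range(n)
        ]
        for i in range(n)
    ]


# Doubling the number of templates roughly quadruples the pair count.
for n in (30, 60, 120):
    print(n, "templates ->", num_pairs(n), "pairs")

# A downsampling factor of 2 leaves one quarter of the pixels per template.
image = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(downsample(image, 2))
```

Under these assumptions, going from 30 to 60 templates raises the pair count from 435 to 1770, while a downsampling factor of 2 or 4 reduces the pixels per comparison by a factor of 4 or 16.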

Limitations

  • Templates passed must all be located in the same .mrc file: this will be the case for the output of a Select 2D Classes job

Common Next Steps

  • Ab-initio reconstruction to generate one or more initial models from the selected particles
  • Subsequent rounds of 2D Classification using the selected particles, with a varied number of classes, to continue removing junk particles