Job: 2D Classification
2D classification for stack cleaning and exploration of heterogeneity.
Description
2D Classification rapidly classifies particles into multiple 2D classes to facilitate stack cleaning and the removal of junk particles. It is also useful for investigating particle quality before moving into the 3D reconstruction stage, and for qualitatively exploring the distribution of views within the dataset.
CryoSPARC v4.4+ features a re-implementation of the Legacy 2D Classification job that is more computationally efficient. It also adds three new parameters:
Minimum alignment res (A): This parameter is also added to the Legacy 2D Classification job. It is the minimum resolution, in Angstroms, to consider when aligning 2D classes. If a value is provided, it acts like a high-pass filter, which may help prevent overfitting to micelles. A minimum alignment res between 40 and 60 A usually gives decent classifications.
Sort classes by number of particles: By default this is turned on, so classes are displayed in order of descending particle count. Turning it off keeps the order of classes the same, which makes it easier to track changes in individual classes over iterations.
Hard classify for last iteration: Whether hard classification is used for the last iteration. Hard classification backprojects each particle only into its best class, and treats every particle in the class with equal weight.
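The difference between hard and soft classification can be illustrated with a toy NumPy sketch. This is not CryoSPARC's implementation: it ignores alignment, CTF correction and noise weighting, and the function names `class_weights` and `class_averages`, as well as the per-class log-likelihood input, are assumptions made purely for illustration.

```python
import numpy as np


def class_weights(log_likelihoods: np.ndarray, hard: bool) -> np.ndarray:
    """Return weights of shape (n_particles, n_classes).

    log_likelihoods[i, k] is a hypothetical score of particle i under class k.
    """
    log_likelihoods = np.asarray(log_likelihoods, dtype=float)
    if hard:
        # Hard classification: each particle contributes only to its best
        # class, and with the same weight (1.0) as every other particle.
        weights = np.zeros_like(log_likelihoods)
        best = np.argmax(log_likelihoods, axis=1)
        weights[np.arange(len(best)), best] = 1.0
        return weights
    # Soft classification: each particle is spread over all classes in
    # proportion to its (normalized) class probability.
    shifted = log_likelihoods - log_likelihoods.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    return probs / probs.sum(axis=1, keepdims=True)


def class_averages(images: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted average of images (n, h, w) into classes (k, h, w)."""
    sums = np.einsum("nk,nhw->khw", weights, images)
    totals = weights.sum(axis=0)
    totals = np.where(totals > 0, totals, 1.0)  # leave empty classes at zero
    return sums / totals[:, None, None]
```

With `hard=True`, every row of the weight matrix is one-hot, so a particle that is ambiguous between two classes still ends up in exactly one of them.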
💡 In CryoSPARC v4.4+ the previous 2D Classification implementation can still be used by creating a 2D Classification (Legacy) job type.
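As a rough illustration of why a minimum alignment resolution behaves like a high-pass filter, the sketch below builds a Fourier-space mask that keeps only spatial frequencies between two resolution limits given in Angstroms. The helper `alignment_band_mask`, its arguments and the hard-edged cutoff are assumptions for illustration; how CryoSPARC applies these limits internally is not described here.

```python
import numpy as np


def alignment_band_mask(box_size_px, pixel_size_A, min_res_A=None, max_res_A=None):
    """Boolean mask over a 2D FFT grid (unshifted layout, as np.fft.fft2).

    Frequencies coarser than min_res_A (e.g. 40 A) are excluded, which is
    the high-pass behaviour; frequencies finer than max_res_A are excluded,
    which is the low-pass behaviour.
    """
    # Spatial frequencies in cycles per Angstrom along one axis.
    freqs = np.fft.fftfreq(box_size_px, d=pixel_size_A)
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.hypot(fx, fy)

    mask = np.ones((box_size_px, box_size_px), dtype=bool)
    if min_res_A is not None:
        mask &= radius >= 1.0 / min_res_A
    if max_res_A is not None:
        mask &= radius <= 1.0 / max_res_A
    return mask


# Example: 64 px box at 2 A/px, aligning on information between 40 A and 8 A.
mask = alignment_band_mask(64, 2.0, min_res_A=40.0, max_res_A=8.0)
```

In this example the lowest few Fourier components, where a micelle or detergent belt would typically dominate, are left out of the mask, while the mid-resolution band is retained.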
Input
Particles (blob and ctf required)
Common Parameters
Number of 2D Classes: In a typical dataset comprising hundreds of thousands of particles, the Number of 2D Classes is usually set between 50 and 200, or even as high as 300. In general, as the Number of 2D Classes increases, the likelihood of finding "junk" classes also increases, because "good" classes become visually more obvious. With too few classes, "junk" particles may be hidden within what otherwise looks like a "good" class.
Maximum alignment res (A): This is the highest resolution used to align particles to classes. By default, this is the same as the maximum reconstruction resolution. It can be set to a higher value (lower resolution), which may help with overfitting, along with disabling the Force Max over poses/shifts parameter.
Initial classification uncertainty factor: The Initial Classification Uncertainty Factor (ICUF) tries to capture the user's knowledge of the similarity in quality of particles within a dataset. When the ICUF is set to a value of 1, this reflects that "junk" particles look very different from good particles within the same dataset. On the other hand, a larger ICUF means that the "junk" may look very similar to good particles, and therefore the algorithm should at first be more uncertain about assigning particles to classes. Modifying this parameter instructs the optimization algorithm to search for 2D classes that are more similar (ICUF large) or less similar (ICUF small) to each other.
Circular mask diameter (A): This controls the diameter of the circular mask applied to the 2D classes at each iteration. For crowded particles, setting the circular mask diameter to a value slightly greater than the maximum particle diameter helps prevent the algorithm from converging to classes with two particles in them. This should generally be used in combination with the Re-center 2D classes parameter.
Force Max over poses/shifts: This controls whether, during reconstruction, the algorithm uses only the maximum posterior pose or marginalizes over poses. By default this is true, meaning maximization is used, but it can be set to false to help achieve better 2D classes, especially for very small molecules. Note: When the Number of 2D Classes is set to 20 or fewer, Force Max over poses/shifts will turn off by default.
Align filament classes vertically: For filaments, this can be set to align all class averages vertically in the final iteration, enabling estimation of in-plane rotation. Note that this is approximate, and will not attempt to estimate the relative polarity of class averages.
Remove duplicate particles: Added in CryoSPARC v4.1+. Whether duplicate particles will be removed at the end of processing. This is turned on by default, except when the input particles are filaments, in which case it is turned off because filament picks are often intentionally dense.
Minimum separation distance (A): Duplicate particles are considered to be those spaced closer together than this parameter's value (in the same way as Remove Duplicate Particles).
Micrograph pixel size (A): Added in CryoSPARC v4.5+. If the pixel size of the micrographs during picking is known, this may be set to override the micrograph pixel size value used for computing inter-particle distances. Particle datasets picked prior to v4.4 may lack the micrograph pixel size metadata; setting this parameter ensures the correct pixel size is used. Particle datasets picked in v4.4+ do not require setting this parameter.
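To make the interaction between Minimum separation distance (A) and Micrograph pixel size (A) concrete, here is a minimal sketch of distance-based duplicate removal on a single micrograph. The greedy strategy, the use of a per-particle score to decide which pick survives, and the function name `remove_duplicates` are all assumptions for illustration, not a description of CryoSPARC's internal method.

```python
import numpy as np


def remove_duplicates(coords_px, scores, min_separation_A, micrograph_pixel_size_A):
    """Split picks on one micrograph into kept and rejected indices.

    coords_px: (n, 2) pick centres in micrograph pixels.
    scores:    (n,) hypothetical quality score; higher-scoring picks win.
    """
    coords_px = np.asarray(coords_px, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # The separation is specified in Angstroms, so the micrograph pixel size
    # is needed to compare it against pixel coordinates.
    min_sep_px = min_separation_A / micrograph_pixel_size_A

    kept, rejected = [], []
    # Visit picks from best to worst score; reject any pick that lies closer
    # than the minimum separation to a pick that has already been kept.
    for idx in np.argsort(-scores, kind="stable"):
        too_close = any(
            np.linalg.norm(coords_px[idx] - coords_px[k]) < min_sep_px for k in kept
        )
        (rejected if too_close else kept).append(int(idx))
    return sorted(kept), sorted(rejected)


picks = [[0, 0], [10, 0], [200, 0]]
scores = [0.9, 0.5, 0.7]
kept, rejected = remove_duplicates(picks, scores, 20.0, 1.0)
```

With a 1 A pixel size, the 20 A separation corresponds to 20 pixels, so the second pick (10 pixels from the first) is rejected. If the pixel size were wrongly taken to be 4 A, the same threshold would shrink to 5 pixels and nothing would be rejected, which is why supplying the correct micrograph pixel size matters for older particle datasets.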
Number of online-EM iterations: By default, 20 iterations of Expectation-Maximization are done, but this can be increased for particularly small or low-SNR particles.
Batchsize per class: This controls the number of particles that are used for each iteration of Expectation-Maximization, per class. This can also be increased for particularly small or low-SNR particles.
2D Zeropad Factor: This is the factor by which the classes are padded in Fourier space. By default, classes are padded to twice their box size, but for particles that already have a large box size, this can be set as low as 1 to reduce GPU memory requirements.
Min over scale after first iteration: This enables estimation of per-particle scale factors during Expectation-Maximization. This can help if ice thickness varied greatly during data collection.
Depending on the results, in subsequent rounds of 2D Classification, you may also wish to adjust the following to achieve better visual class averages:
Use clamp-solvent: If the classes appear to have a lot of unwanted artefacts in the background, you can use a special optimization method to ensure that all classes have a blank background. Set Use clamp-solvent to solve 2D classes to true.
Enforce non-negativity: Along with activating the clamp-solvent parameter, non-negativity can be enforced in the class averages by setting the Enforce non-negativity parameter to true.
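The effect of the 2D Zeropad Factor and Batchsize per class parameters on problem size can be estimated with some simple arithmetic. The sketch below is only a back-of-the-envelope calculation: the helper names, the assumption of 8 bytes per Fourier value, and the example numbers are illustrative, and the real GPU memory footprint of the job will include other buffers.

```python
def padded_box_size(box_size_px: int, zeropad_factor: float) -> int:
    """Box size after padding by the zeropad factor."""
    return int(round(box_size_px * zeropad_factor))


def class_memory_bytes(box_size_px, zeropad_factor, n_classes, bytes_per_value=8):
    """Rough size of the padded class images alone.

    bytes_per_value=8 assumes one single-precision complex number per pixel.
    """
    padded = padded_box_size(box_size_px, zeropad_factor)
    return padded * padded * n_classes * bytes_per_value


def particles_per_iteration(batchsize_per_class: int, n_classes: int) -> int:
    """Particles seen in one online-EM iteration, reading the batch size
    as a per-class quantity."""
    return batchsize_per_class * n_classes


# Example with hypothetical values: 256 px box, 100 classes.
default_bytes = class_memory_bytes(256, 2, 100)  # padded to 512 px
reduced_bytes = class_memory_bytes(256, 1, 100)  # no padding
ratio = default_bytes / reduced_bytes            # 4.0
```

Because the classes are two-dimensional, halving the zeropad factor from 2 to 1 reduces the padded class storage by a factor of four in this estimate.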
Output
Class averages
Particles, with assignments (and alignments) to classes
Rejected Particles (only if Remove duplicate particles was on): the particles that were rejected as duplicate particle picks.
Notes and Limitations
For particularly small or low signal-to-noise ratio particles, the following may improve results of 2D Classifications:
Increasing the Batchsize per class parameter, which increases the number of images seen during each EM iteration.
Increasing the Number of online-EM iterations parameter.
Deactivating the Force Max over poses/shifts parameter, which enables marginalization and may help with overfitting.
Common Next Steps
After 2D Classification, some classes may turn out to be "junk" classes (e.g., corresponding to non-particle images, ice crystals, or two particles stuck together), and you may want to filter the associated particles out of your particle dataset. This can be done with the interactive Select 2D Classes job.
If your dataset has strong preferred orientation, you can use the Rebalance 2D Classes job (after removing junk classes using the Select 2D Classes job) to cluster together similar views and remove particles from over-represented views.
You may also want to iterate the process of template-based particle picking, in which case good classes (selected using a Select 2D Classes job) can be input back into a Template Picker or Filament Tracer job.