Job: Homogeneous Refinement

Homogeneous refinement.

At a Glance

Perform a global alignment of input particles to a reference volume, filtered by the GSFSC resolution.

Description

Homogeneous Refinement improves the input volume by iteratively aligning particles to the volume, then using the new poses to improve the volume. For more information on this algorithm (called Expectation Maximization), see the Expectation Maximization in Cryo-EM page.

Homogeneous Refinement is a global refinement, meaning that prior pose estimates are neither used nor required. It uses half sets and its output volume is GSFSC-filtered.

Homogeneous Refinement will generally produce a high-quality volume for most samples, but we encourage users to read the Recommended Alternatives section since other types of refinement are better suited to some goals or samples.

Inputs

Particles

As Homogeneous Refinement is a global refinement, no prior pose information will be used or is needed. Particles must have CTF information in order for refinement to proceed.

Initial Volume

An initial volume is used in the first iteration of a Homogeneous Refinement, since the images do not yet have 3D pose estimates and therefore cannot be used to create a reference volume. In subsequent iterations, the volume created by backprojection of the particles is used for alignment. The initial volume is low-pass filtered before alignment, as specified by the Initial lowpass resolution parameter.

Mask (optional)

If a mask is provided, at each iteration of refinement the volume will be masked using this mask instead of the dynamic masking routine. This can be helpful if dynamic masking fails, or if unstructured regions (e.g., micelles) interfere with alignment. If the mask is being used to focus refinement on a particular region, Local Refinement may perform better — see Recommended Alternatives.

Note that the provided mask is only used for alignment. FSC is always calculated with a dynamic mask. You can calculate the FSC with your own mask using Validation (FSC) once the refinement finishes. For masking behaviour in CryoSPARC v5+, see the dedicated guide page.

Commonly Adjusted Parameters

Window dataset (real-space)

In general, particle images are expected to be well-centered. This means that (ignoring signal delocalization due to the Contrast Transfer Function) the edges and corners of the particle image do not contain information about the particle. They may, however, contain noise or adjacent particles which interfere with alignment. Refinement algorithms therefore typically window these particle images before comparing them to reference projections.

The window gently transitions from 1.0 at the Window inner radius to 0.0 at the Window outer radius. In very crowded grids, it may help to use a tighter window (i.e., reduce both radii) to exclude neighboring particles. Note that windowing is performed before the images are centered, so if particle images are not well centered windowing may remove particle information.
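The shape of such a window can be sketched as follows. This is a minimal illustration, not CryoSPARC's implementation: the function name, the cosine falloff, and the convention that radii are fractions of the half box size are all assumptions made for the example, as are the radius values used.

```python
import numpy as np

def soft_circular_window(box_size, inner_radius, outer_radius):
    """Return a (box_size, box_size) window that is 1.0 inside inner_radius
    and 0.0 outside outer_radius, with a smooth falloff in between.

    Radii are given as fractions of the half box size (assumed convention).
    The cosine falloff is an assumption; only the end values are from the text.
    """
    if not 0 <= inner_radius < outer_radius:
        raise ValueError("need 0 <= inner_radius < outer_radius")
    # Pixel coordinates centred on the box, normalised by the half box size.
    coords = (np.arange(box_size) - box_size // 2) / (box_size / 2)
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    r = np.sqrt(xx**2 + yy**2)
    # t runs from 0 at the inner radius to 1 at the outer radius.
    t = np.clip((r - inner_radius) / (outer_radius - inner_radius), 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))

# Illustrative radii; a tighter window would use smaller values for both.
window = soft_circular_window(box_size=64, inner_radius=0.85, outer_radius=0.99)
# A particle image of the same shape would be multiplied by `window`.
```

Because the window is applied around the box center, an off-center particle can extend into the region where the window is below 1.0, which is why poorly centered images may lose particle information.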

Symmetry

The symmetry operator entered here is used to enforce (or relax) symmetry during refinement. By default it is C1 (i.e., no symmetry). Allowed symmetry operators are:

  • Any cyclic group, e.g., C3. Cyclic groups have the following properties:

    • N-fold symmetry around a single axis. By convention in CryoSPARC this axis is aligned to the Z-axis.

  • Any dihedral group, e.g., D5. Dihedral groups have the following properties:

    • N-fold symmetry around one axis. By convention in CryoSPARC this axis is aligned to the Z-axis.

    • Symmetry order of 2N

  • The tetrahedral group T, which has the following properties:

    • 3-fold symmetry around four axes. By convention in CryoSPARC one of these 3-fold symmetry axes is aligned with the Z-axis.

    • Symmetry order of 12

  • The octahedral group O, with the following properties:

    • 4-fold symmetry along three orthogonal axes. By convention in CryoSPARC these are aligned to the X, Y, and Z axes.

    • Symmetry order of 24

  • The icosahedral group I with the following properties:

    • Six 5-fold axes

    • Symmetry order of 60

      • I1 (or just I): 2-fold axes on X, Y, Z; the 5-fold axis point with greatest Z value in the YZ plane; the 3-fold axes with greatest Z value in the XZ plane

      • I2: 2-fold axes on X, Y, Z; the 5-fold axis point with greatest Z in the XZ plane; the 3-fold axes with greatest Z value in the YZ plane
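The symmetry orders listed above, and the Z-axis convention for cyclic groups, can be summarized in a short sketch. The helper names `symmetry_order` and `cyclic_rotations` are hypothetical and not part of CryoSPARC; the code only restates the orders given in the list (N for cyclic groups, 2N for dihedral groups, 12, 24, and 60 for T, O, and I).

```python
import re
import numpy as np

def symmetry_order(point_group):
    """Number of symmetry-related poses for a point group label such as 'C3'."""
    label = point_group.strip().upper()
    fixed = {"T": 12, "O": 24, "I": 60, "I1": 60, "I2": 60}
    if label in fixed:
        return fixed[label]
    match = re.fullmatch(r"([CD])(\d+)", label)
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"unrecognised point group: {point_group!r}")
    n = int(match.group(2))
    # Cyclic groups have order N; dihedral groups have order 2N.
    return n if match.group(1) == "C" else 2 * n

def cyclic_rotations(n):
    """The n rotation matrices of the cyclic group C_n about the Z axis."""
    angles = 2.0 * np.pi * np.arange(n) / n
    c, s = np.cos(angles), np.sin(angles)
    zeros, ones = np.zeros(n), np.ones(n)
    # Result has shape (n, 3, 3); element k rotates by k * 360/n degrees.
    return np.stack([
        np.stack([c, -s, zeros], axis=-1),
        np.stack([s, c, zeros], axis=-1),
        np.stack([zeros, zeros, ones], axis=-1),
    ], axis=-2)

rotations = cyclic_rotations(4)
```

For C4, `rotations[1]` maps the X axis onto the Y axis and leaves the Z axis unchanged, matching the convention that the N-fold axis lies along Z.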

Symmetry relaxation method

This parameter can take one of three possible values: “none”, “maximization”, or “marginalization”. For more information on symmetry relaxation, see the symmetry relaxation tutorial.

None

The point group is used to enforce symmetry. Each particle is inserted in each of the N different symmetry-related poses, where N is the symmetry order. This effectively increases the signal to noise ratio by a factor of N, but can produce invalid maps or other poor results if the target is not truly symmetric.

Maximization

Each particle is aligned to the reference as in an asymmetric reconstruction, but a small neighborhood of each of the N symmetry-related poses is then checked. Only the best of all symmetry-related poses is used. Note that this means particle images are only used once, so the map is not forced to be symmetric.

Marginalization

Each particle is aligned to the reference as in an asymmetric reconstruction, but a small neighborhood of each of the N symmetry-related poses is then checked. Particles contribute to the reconstruction in each symmetry-related pose, weighted by the probability of that pose being correct. Note that this means particle images are only used once — the map is not forced to be symmetric.
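The difference between the three settings can be illustrated by the weight each symmetry-related pose receives when a particle is inserted into the reconstruction. This is a simplified sketch under stated assumptions: the function `pose_weights` is hypothetical, the per-pose log-likelihood scores are made-up numbers, and the neighborhood search around each pose is omitted.

```python
import numpy as np

def pose_weights(log_likelihoods, method):
    """Weight given to each of the N symmetry-related poses of one particle.

    log_likelihoods: one (hypothetical) alignment score per symmetry-related pose.
    method: "none", "maximization", or "marginalization".
    """
    ll = np.asarray(log_likelihoods, dtype=float)
    if method == "none":
        # Symmetry enforced: the particle is inserted in every related pose.
        return np.ones_like(ll)
    if method == "maximization":
        # Only the single best pose is used.
        weights = np.zeros_like(ll)
        weights[np.argmax(ll)] = 1.0
        return weights
    if method == "marginalization":
        # Each pose is weighted by its normalised probability.
        shifted = ll - ll.max()  # subtract the maximum for numerical stability
        probs = np.exp(shifted)
        return probs / probs.sum()
    raise ValueError(f"unknown method: {method!r}")

# Made-up scores for the four related poses of a C4 pseudosymmetric particle.
scores = [-10.0, -10.5, -12.0, -9.0]
hard = pose_weights(scores, "maximization")
soft = pose_weights(scores, "marginalization")
```

With maximization all of the weight goes to the last pose, while with marginalization the weights sum to 1.0 and the last pose receives the largest share.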

Do symmetry alignment

If this parameter is on, the input volume is transformed and shifted such that the symmetry axes are aligned to map axes (e.g., the four-fold axis of a C4 symmetric input map is aligned to the Z axis).

Re-estimate greyscale level of input reference

Cryo-EM maps comprise a grid of voxels, with each voxel containing some value which is related to the Coulomb potential of the target at that position. However, these values only provide information about the relative potential within a single map, not the absolute potential of the target. In general, maps created from different sets of images will not have the same values in the same voxels. The range of values across all voxels is called the greyscale.

Since alignments are scored by assessing the difference between each image and the volume, a difference in greyscale leads to poor alignments. If this parameter is on, the greyscale of the input map will be adjusted to match that of the input particles. In general, we recommend that this parameter is on for Homogeneous Refinements.

This parameter ensures that the volume starts near the mean particle’s greyscale. Each particle will have slightly different contrast due to ice thickness, beam effects, etc. These per-particle differences in scale are fit by Per-Particle Scale.

Number of extra final passes

This many EM iterations will be performed with the full particle stack after the GSFSC resolution stops improving.

By default, refinement is considered complete after the first iteration in which the GSFSC does not improve. In most cases, this is sufficient. However, the GSFSC is only one measure of map quality. In some cases, continuing refinement after GSFSC resolution stops improving can still result in an overall higher-quality map.

The most common situation in which extra final passes improve the final result is symmetry relaxation. As yet, there is no good automated metric by which the refinement can validate whether the symmetry-relaxed poses of the particles have converged. As such, terminating the refinement upon GSFSC convergence may prevent the algorithm from sufficiently breaking pseudosymmetry. For example, consider data from EMPIAR-10256 (Dang et al. 2019).

GSFSC stops improving after the second iteration. However, signal from the symmetry-breaking CaM molecule is not fully resolved until iteration 32.

Adaptive Marginalization

When this parameter is turned on, particle poses will be marginalized, meaning that each particle image contributes to the 3D reconstruction from multiple poses, each weighted by their probability of being correct. Marginalization can improve the results of refinement, with small particles or low-SNR images benefiting the most. For medium and large particles or high-SNR images, maximization (Adaptive Marginalization off, the default) works just as well and is computationally less expensive.

Maximum align resolution (A)

During alignment (not reconstruction) the map uses frequencies only up to this resolution. If left blank, the map uses all frequencies up to the current resolution. Keep in mind that in both cases the map is also filtered by the GSFSC curve, so in practice maps may use coarser resolutions than this parameter dictates.

Setting this parameter to a higher numeric value (lower resolution) may reduce overfitting due to high-frequency noise for some datasets. Note that much of the alignable information in an individual particle image comes from the low frequencies. Thus, the reconstruction may achieve a higher resolution than that of the alignment limit. For example, consider data from EMPIAR-10424 (Nakane et al. 2020).

The map produced when this parameter is left empty (left) is of slightly higher quality (for example, the indicated histidine appears slightly more isotropic), but both maps achieve better than 1.5 Å resolution.

In addition to potentially preventing overfitting, setting the Maximum align resolution to a higher numeric value (lower resolution) may help symmetry relaxation converge if the asymmetric feature is large and the reconstruction goes to high resolution.

In this example, again using data from EMPIAR-10256 (Dang et al. 2019), setting the maximum alignment resolution to 6 Å significantly improved the breaking of the C4 pseudosymmetry after the same number of iterations. Note that when using the GSFSC resolution (blue, left), significant density remains in all four positions (indicated with arrows). When limiting alignment to 6 Å, pseudosymmetry is successfully broken. Both maps are lowpass filtered to 6 Å to aid comparison.

Initial lowpass resolution (A)

Before the first iteration, the input volume is lowpass filtered to this resolution in Å. Typically, the default value of 20 Å does not need to be changed. For highly symmetric or very small particles, a finer resolution may improve results.

GSFSC split resolution (A)

Half sets share information during refinement, up to this resolution. Put another way, half sets are only truly independent at frequencies finer than this value.

If a refinement was run with two completely independent half maps, over iterations the two maps might adopt different orientations in 3D space. The correlation between two half maps in different orientations would be very low, meaning that the GSFSC resolution would be extremely poor even if the half maps were identical.

To avoid this scenario, the components of each half map below the resolution specified by this parameter are averaged together in every iteration. This forces the half maps to adopt the same overall pose in 3D space, but retains their independence at higher resolutions.
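The averaging step described above can be sketched in Fourier space as follows. This is an illustration under stated assumptions, not CryoSPARC's code: the function `share_low_resolution` is hypothetical, and a hard cutoff at the split resolution is assumed.

```python
import numpy as np

def share_low_resolution(half_a, half_b, pixel_size, split_resolution):
    """Average two half maps at resolutions coarser than split_resolution (Å).

    Fourier components with spatial frequency below 1 / split_resolution are
    replaced by their mean in both maps; finer components are left untouched.
    """
    if half_a.shape != half_b.shape:
        raise ValueError("half maps must have the same shape")
    ft_a = np.fft.fftn(half_a)
    ft_b = np.fft.fftn(half_b)
    # Spatial frequency magnitude (1/Å) of every Fourier voxel.
    axes = [np.fft.fftfreq(n, d=pixel_size) for n in half_a.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    freq = np.sqrt(sum(g**2 for g in grids))
    shared = freq < 1.0 / split_resolution
    mean_ft = 0.5 * (ft_a + ft_b)
    ft_a = np.where(shared, mean_ft, ft_a)
    ft_b = np.where(shared, mean_ft, ft_b)
    return np.fft.ifftn(ft_a).real, np.fft.ifftn(ft_b).real

# Two random volumes stand in for half maps (2 Å per pixel, 20 Å split).
rng = np.random.default_rng(0)
map_a = rng.normal(size=(16, 16, 16))
map_b = rng.normal(size=(16, 16, 16))
out_a, out_b = share_low_resolution(map_a, map_b, pixel_size=2.0, split_resolution=20.0)
```

After this step the two maps agree at low resolution, which keeps them in the same overall pose, while every component finer than the split resolution still comes from only one half set.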

This parameter should almost always be left at its default setting of 20 Å. If the GSFSC resolution for a highly symmetrical particle is surprisingly poor and the particles generate good 2D classes, you should first download and inspect the half maps. If they are clearly in different orientations, setting this parameter to a higher resolution may help. Keep in mind that the half-sets are not independent at resolutions coarser than this parameter, so it should be kept as coarse as possible.

Auto batchsize

For a typical dataset, the map used for alignment in the first iterations of a refinement is a poor estimate of the true, final volume. Poses aligned to this reference are therefore also poor. It is thus wasteful to align every particle to these early volumes. By default, CryoSPARC therefore automatically estimates the number of particles (called a batch) to align before generating a new reference for the following iterations.

For small particles or particles with poor signal-to-noise ratio, larger batch sizes may be necessary for optimal reconstruction. The automatic estimate of the optimal batch size can be changed using two similar but distinct parameters, described below. In general, adjusting the batch size with Batchsize snrfactor is recommended, since the effect of changing it to a specific value is more predictable across datasets.

  • Batchsize epsilon controls the estimated proportion of Fourier pixels which will be missed by the minibatch. Setting this value higher allows for fewer particles in the minibatch, while a lower value creates minibatches with more particles. Note that this parameter should always remain above 0.

  • Batchsize snrfactor directly multiplies the batch size calculated using Batchsize epsilon. Setting this parameter higher by a factor of 2 (e.g., 100 instead of 50) doubles the number of particles in the minibatch.

Auto batch sizing can be disabled entirely by turning Disable auto batchsize on. In this case, the entire particle stack is used in each iteration. In general, this significantly slows jobs without appreciable improvement in the final result.

Minimize over per-particle scale

If this parameter is on, each particle’s optimal scale is calculated at each iteration. If this parameter is off, the particles’ input scales are used during each iteration.

The per-particle scale is a value for each particle image which adjusts the contrast of the reference volume to the contrast in the individual particle image.

For example, consider two particles produced by the same volume in the same pose, but in different ice thicknesses. The particle in thinner ice will have more contrast than the particle in thick ice, but the reference volume should have the same voxel values for both. Per-particle scale is used to adjust the greyscale of individual particle images to account for this fact. As the name implies, each particle has a scale value which relates its greyscale to that of the volume.
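The two-particle example can be made concrete with a plain least-squares fit. This sketch is only an illustration of what a scale value means: the function `optimal_scale` is hypothetical, the images are synthetic, and the CTF and noise model that a real refinement would account for are ignored.

```python
import numpy as np

def optimal_scale(image, reference_projection):
    """Least-squares scale alpha minimising ||image - alpha * reference||^2."""
    x = np.ravel(image)
    p = np.ravel(reference_projection)
    denom = np.dot(p, p)
    if denom == 0:
        raise ValueError("reference projection is all zeros")
    return float(np.dot(x, p) / denom)

# Synthetic stand-ins: the same projection seen with two different contrasts.
rng = np.random.default_rng(0)
projection = rng.normal(size=(32, 32))
thin_ice = 1.2 * projection + 0.1 * rng.normal(size=(32, 32))
thick_ice = 0.8 * projection + 0.1 * rng.normal(size=(32, 32))

scale_thin = optimal_scale(thin_ice, projection)
scale_thick = optimal_scale(thick_ice, projection)
```

Here the particle simulated in thinner ice receives the larger scale, while the reference projection itself is the same for both.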

While per-particle scale in theory corrects for each image’s greyscale, particles with a low per-particle scale tend to be poorer quality than particles with high per-particle scale. For this reason, it may be beneficial to filter out particles with low scale. See the Subset Particles by Statistic job page for more information on this process.

Reset input per-particle scale

If this parameter is on, all particles’ scale is set to 1.0 at the beginning of the refinement. If this parameter is off, particles’ scales are retained from the input. Note that if particles have not yet been refined, their starting scales are all 1.0.

If Minimize over per-particle scale is off and per-particle scales have previously been fit (in an earlier refinement, for example), you may wish to turn this parameter off to retain the previously-found scales.

Initialize noise model from images

If this parameter is on, the noise model is directly estimated from the images. If this parameter is off, a constant value is used to initialize the noise model.

In theory, a noise model inferred directly from particle images may help when Adaptive marginalization is on, since marginalization tends to be more sensitive to the choice of noise model. In practice, the noise model typically converges during the first or second iteration, so this setting has little impact on the final result.

Dynamic masking

Starting in CryoSPARC v5, the dynamic mask near/far parameters are multiples of the current resolution instead of raw physical units. See 3D Masking in Refinement for more information.

If a static mask is provided to the Mask input, that mask will be applied at each iteration. If a mask is not provided and Use dynamic refinement mask is turned on, a mask will be dynamically generated by the following process.

Once the GSFSC resolution is the same as or finer than Dynamic mask start resolution (A), a mask will be generated by thresholding each half map at the Dynamic mask threshold (0-1).

If Dynamic mask use absolute value is turned on, this thresholding is performed on the absolute value of the map (i.e., a threshold of 0.2 would include voxels with value less than -0.2 or greater than 0.2). This is useful if there are regions of the map which are expected to be empty. Since empty pockets will have lower density than the corners of the image (which may have neighboring particles or contaminants), they will tend to have negative map values. However, these pockets are typically small and near regions of high density, so this parameter is rarely required in practice.

The mask is then padded with 1.0 for a distance of Dynamic mask near (A) and a soft edge is added, reaching 0.0 at a distance of Dynamic mask far (A).

Dynamic masking can effectively be disabled by setting Dynamic mask start resolution (A) to an unrealistically fine resolution (a very low numeric value), such as 0.1 Å, so that the GSFSC resolution never reaches it. Starting in CryoSPARC v5, dynamic masking can instead be disabled by turning off Use dynamic refinement mask.

Cryo-EM maps can have very different absolute voxel values. To account for this, the Dynamic mask threshold (0-1) parameter is a relative threshold. The map is thresholded at a voxel value of Dynamic mask threshold times the maximum voxel value in the map.

For instance, consider a map with voxels ranging from -0.10 to 0.23. If Dynamic mask threshold (0-1) is set to 0.5, all values greater than 0.115 are set to 1.0 and all values less than or equal to 0.115 are set to 0.0. The mask is then dilated and padded using the Dynamic mask {near, far} parameters.
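The thresholding step in this example can be written out as a short sketch. The function `threshold_mask` is hypothetical, and the handling of the absolute-value option (taking the absolute value before comparing against the same cutoff) is an assumption based on the description above. The subsequent padding and soft edge are not shown.

```python
import numpy as np

def threshold_mask(volume, relative_threshold, use_absolute_value=False):
    """Binary mask from a relative threshold (0-1) on the map's voxel values.

    The cutoff is relative_threshold times the maximum voxel value in the map.
    Returns the mask (1.0 inside, 0.0 outside) and the cutoff that was used.
    """
    volume = np.asarray(volume, dtype=float)
    cutoff = relative_threshold * volume.max()
    values = np.abs(volume) if use_absolute_value else volume
    # Values equal to the cutoff are excluded, as in the worked example.
    return (values > cutoff).astype(np.float32), cutoff

# A few voxel values spanning the range used in the example (-0.10 to 0.23).
voxels = np.array([-0.10, 0.05, 0.115, 0.12, 0.23])
mask, cutoff = threshold_mask(voxels, relative_threshold=0.5)
```

With these values the cutoff is 0.115, so only the voxels at 0.12 and 0.23 are included in the mask.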

GPU batch size of images

Reading images from the filesystem is slow. To speed up refinement, CryoSPARC will try to load as many images into the GPU at once as it can. However, it is challenging to precisely determine the space required by a given refinement, so the number of images that fits can only be estimated. If you run out of GPU memory during a refinement, you may be able to complete the refinement by manually setting this parameter to a low number of images.

Note that GPU batch size is a purely computational consideration — it will not have an effect on the final result. It differs in this way from the batch size of the refinement, which controls the number of images seen in each iteration and is controlled by the Batchsize epsilon and Batchsize snrfactor parameters.

Defocus Refinement and Global CTF refinement

CryoSPARC can estimate per-particle defocus and per-group higher-order CTF aberrations during a refinement. On-the-fly CTF estimation is controlled by Optimize per-particle defocus (for Local CTF Refinement) and Optimize per-group CTF params (for Global CTF Refinement). See the associated pages for information about the other CTF refinement parameters.

We recommend first performing separate Local and Global CTF Refinements and assessing whether the datasets benefit from these optimizations before performing them on-the-fly during refinement. For more information, see the guide page on CTF refinement.

Do EWS correction

Whether to correct for the curvature of the Ewald sphere. This typically produces moderate resolution improvements for large particles which are already at high resolution without Ewald sphere correction. If this option is turned on, ensure that the correct curvature is selected in EWS curvature sign. For more information on these parameters, see the Ewald Sphere Correction section of this page, or the dedicated guide page.

Outputs

All particles

Particles are output with updated poses.

If CTF parameters were refined during the job, these output particles also have updated CTF estimates. Note that this may mean that exposure group parameters for these particles differ from those of the micrographs. If the particles are re-extracted, ensure that Force re-extract CTFs from micrographs is off (the default setting) to retain these refined CTF parameters.

Refined volume

The final volume produced by the refinement is output as map. It is filtered to the GSFSC resolution.

Additionally, a sharpening B-factor is automatically estimated and applied to the volume to produce a sharp volume (map_sharp). The B-factor is estimated using the Guinier plot.
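To give a sense of what applying a sharpening B-factor does, the sketch below uses the conventional form in which Fourier amplitudes at spatial frequency s = 1/d are multiplied by exp(-B·s²/4). The function `sharpening_gain` is hypothetical, and whether CryoSPARC applies exactly this expression, or any additional filtering, is an assumption here.

```python
import math

def sharpening_gain(resolution_angstrom, b_factor):
    """Amplitude multiplier at a given resolution for a B-factor (Å^2).

    Uses the conventional form exp(-B * s^2 / 4) with s = 1 / resolution.
    A negative B-factor boosts high-resolution amplitudes (sharpening).
    """
    if resolution_angstrom <= 0:
        raise ValueError("resolution must be positive")
    s = 1.0 / resolution_angstrom
    return math.exp(-b_factor * s**2 / 4.0)

# Gain applied at 10 Å and at 2 Å for a B-factor of -50 Å^2.
gain_low_res = sharpening_gain(10.0, -50.0)
gain_high_res = sharpening_gain(2.0, -50.0)
```

With a B-factor of -50, amplitudes near 10 Å are barely changed, while amplitudes at 2 Å are boosted by a factor of about 23. This is why sharpening brings out high-resolution detail and also amplifies any high-resolution noise.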

Masks used for refinement and FSC calculation are available as mask_refine and mask_fsc, respectively. Typically, the mask used for FSC calculation is tighter than the mask used for refinement.

Half maps are available as map_half_A and map_half_B.

Masks

In versions of CryoSPARC before v5, there is a single mask output, mask, which contains the refinement mask. It is the same as the mask_refine part of the Refined volume output.

Starting with CryoSPARC v5, each mask has its own output, typically mask_refine for the mask used during refinement, mask_fsc for the mask used to calculate the FSC, and mask_fsc_auto for the autotightened mask used to calculate the final FSC. See Dynamic Masking in Refinements (v5.0+) for more on mask generation in v5.

Plots

The plots produced by Homogeneous Refinement are explained in the Common CryoSPARC Plots guide page.

Common Problems

Refinement is the central process of single particle analysis. As such, it is difficult to provide an exhaustive overview of potential problems arising during refinements. However, a few pathologies are more common than others.

Map has spikes or shells

Spikes of density radiating away from the center of the map or shells of density surrounding the map are both signs of overfitting. Often, this means that there is still a significant amount of “junk” in the particle stack, and more particle curation is necessary. If you’re unfamiliar with techniques for particle curation, they are covered in detail in a case study in this guide.

Map has streaks or blurring along a single viewing direction

This effect is called anisotropy, and is a telltale sign of orientation bias, also known as preferred orientation. Typically, correcting this issue requires new data, but some cases of anisotropy can be corrected with careful particle picking. Maps with these issues usually have low cFAR scores; cFAR is a measure of map anisotropy. More information about orientation bias is available in the Orientation Diagnostics tutorial and a case study on EMPIAR 10096.

Common Next Steps

Typically a volume needs to be visually inspected to understand the results of a refinement. In most cases, an improvement of visible features in the 3D map and/or a reduction in noise is desirable.

A better GSFSC resolution alone may not be indicative of a truly improved map — visual inspection is an important component of the single particle analysis pipeline.

If noise features are visible (see Common Problems), the input particle stack should be cleaned and a new refinement run. At this stage, Heterogeneous Refinement or Ab-Initio Reconstruction are typically most suited to particle curation (rather than 2D methods).

If one region of the map is high quality and others are blurry, a Local Refinement with a focused mask on the blurry region may be useful. Alternatively, 3D Classification may be able to separate particles into separate conformations, which can then be refined independently.

If the map looks symmetric (or if symmetry was imposed), performing a refinement with symmetry relaxation turned on may reveal some asymmetry hidden in the data. 3D Classification of Symmetry Expanded particles with a mask around a single asymmetric unit of the map can also help classify asymmetry.

Recommended Alternatives

Non-Uniform Refinement may perform better than Homogeneous Refinement when the particles have unstructured density (e.g., a micelle or nanodisc) or large flexible regions.

Local Refinement may perform better than Homogeneous Refinement when a particle has multiple domains which move relative to each other. In such cases, Homogeneous Refinement tends to align the larger domain. This process is discussed in more detail in the TRPV1 case study in this guide, as well as in a workshop recording working on the same dataset.

Ewald Sphere Correction

The interaction of the electron beam with the sample on the grid is a complex process which is difficult to precisely understand and model. Instead, for nearly all cryo-EM methods, the input particle images are treated as projections of the volume corrupted by the CTF. This is called the projection approximation, and works well for almost all datasets.

However, if a reconstruction of a large particle is going to high resolution, this approximation may degrade the quality of the final map. In these cases, the projection approximation must be corrected by turning on Do EWS correction (i.e., “Do Ewald sphere correction”). This name is due to the fact that the projection approximation is equivalent to considering a mathematical construction called the Ewald sphere as if it were flat.

In cases where Ewald sphere correction may improve the map, there are two possible solutions: the sphere may have either a positive or a negative curvature. There is no way of knowing the curvature of the sphere ahead of time, so separate reconstructions with positive and negative curvature must be run. If the two reconstructions produce similar GSFSC resolutions, the effect of Ewald sphere curvature is likely negligible for this dataset and the correction can be kept off in future refinements. If one is better than the other, that curvature is correct for all future refinements of this data. In other words, the Ewald sphere curvature only needs to be determined once per dataset.

For example, consider this reconstruction of an Adeno-Associated Virus from EMPIAR 10202 (Tan et al. 2018), which goes to 1.87 Å without Ewald sphere correction:

A comparison of reconstructions of AAV particles with no EWS correction (left), negative curvature correction (center), and positive curvature correction (right). All three reconstructions are sharpened with a B-factor of -50 and had identical settings aside from EWS correction and curvature.

In this case, correcting for the Ewald sphere does improve the reconstruction, both in terms of the GSFSC resolution (improved to 1.74 Å) and visual inspection of the resulting maps. It is also clear that, for this dataset, the Ewald sphere has a negative curvature: the positive curvature reconstruction is worse (GSFSC 1.97 Å) than when we do not correct for the Ewald sphere curvature at all.

Compare this result to that of performing Ewald sphere correction on a GPCR from EMPIAR 10673 (Zhang et al. 2020):

A comparison of reconstructions of a GPCR with no EWS correction (left), negative curvature correction (center), and positive curvature correction (right). All three reconstructions are sharpened with a B-factor of -50 and had identical settings aside from EWS correction and curvature.

The initial reconstruction, without Ewald sphere correction, goes to 2.28 Å. Correcting for the Ewald sphere with either curvature produces an essentially identical reconstruction. This particle is simply too small, and this map at too low a resolution, for the sphere’s curvature to have a noticeable effect on the reconstruction.

References

Dang, S. et al. Structural insight into TRPV5 channel function and modulation. Proceedings of the National Academy of Sciences 116, 8869–8878 (2019).

Nakane, T. et al. Single-particle cryo-EM at atomic resolution. Nature 587, 152–156 (2020).

Tan, Y. Z. et al. Sub-2 Å Ewald curvature corrected structure of an AAV2 capsid variant. Nature Communications 9, 3628 (2018).

Zhang, X. et al. Differential GLP-1R Binding and Activation by Peptide and Non-peptide Agonists. Molecular Cell 80, 485-500.e7 (2020).
