Job: Reference Based Motion Correction (BETA)
CryoSPARC v4.4 includes a new motion correction job, called “Reference Based Motion Correction”, which uses a 3D map as a reference to improve per-particle motion estimates.
Reference-based motion correction is an extension of Patch Motion Correction. Using known particle poses and positions, precise movement trajectories can be calculated for each particle. In addition, the effect of radiation damage during an exposure can be empirically measured and accounted for by weighting. In some cases, these procedures yield a significant improvement in final map quality.
The concepts and method in CryoSPARC’s Reference Based Motion Correction are inspired by Bayesian Polishing (Zivanov, Nakane & Scheres, 2019). CryoSPARC’s implementation includes a new method for hyperparameter optimization, is multi-GPU accelerated and optimized, and includes support for multiple reference volumes enabling simultaneous motion correction for particles from different conformations/species. In addition, patch motion correction can take as input the empirical dose weights estimated by Reference Based Motion Correction on a different dataset.
Following (Zivanov, Nakane & Scheres, 2019), Reference Based Motion Correction proceeds as follows:
1. For each particle in the input dataset, a synthetic reference image is created by projecting the reference volume in the particle’s pose and applying simulated CTF corruption to the resulting 2D reference.
2. A patch is extracted around the particle’s pick location from each frame in the movie it came from.
3. Each of these patches is then assigned a shift; the set of shifts across all frames makes up the particle trajectory.
4. The optimal trajectory is computed by finding the set of shifts which minimizes the error between the reference image and the shifted patches.
5. Particle images are reconstructed from frames by applying the optimal trajectories and dose weights.
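The steps above can be illustrated with a deliberately simplified toy. This is a sketch under stated assumptions, not CryoSPARC's implementation: it uses a synthetic rectangle as the "reference", whole-pixel shifts found by brute force, no CTF, no priors, and an unweighted mean in place of dose weighting. The function names `best_shift` and `estimate_trajectory` are hypothetical.

```python
import numpy as np

def best_shift(reference, patch, max_shift=3):
    """Brute-force the integer (dy, dx) shift that minimizes squared error to the reference."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(patch, shift=(dy, dx), axis=(0, 1))
            err = np.sum((shifted - reference) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def estimate_trajectory(reference, patches, max_shift=3):
    """One corrective shift per frame; together they form the particle trajectory."""
    return [best_shift(reference, p, max_shift) for p in patches]

# Toy data (assumption): the "particle" drifts by a known offset in each frame.
rng = np.random.default_rng(0)
reference = np.zeros((32, 32))
reference[12:20, 10:22] = 1.0
true_offsets = [(0, 0), (1, 0), (2, 1), (2, 2)]
patches = [
    np.roll(reference, off, axis=(0, 1)) + 0.1 * rng.standard_normal(reference.shape)
    for off in true_offsets
]

trajectory = estimate_trajectory(reference, patches)

# Reconstruct the particle image by undoing the drift and averaging the frames
# (a real reconstruction would apply dose weights instead of a plain mean).
corrected = np.mean(
    [np.roll(p, s, axis=(0, 1)) for p, s in zip(patches, trajectory)], axis=0
)
```

Each recovered shift is the negative of the simulated drift. With realistic per-frame noise levels this unregularized search would latch onto noise, which is what motivates the priors described next.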
An example of realistic trajectories. Each black dot is a particle pick location, and the tail associated with it is the motion estimate for that particle over the length of the exposure.
Because individual movie frames have a particularly low signal-to-noise ratio, the procedure just described would naturally overfit to noise, producing wild and nonsensical trajectories. To mitigate this, and following (Zivanov, Nakane & Scheres, 2019), the reference-based motion correction job uses two kinds of regularization: the spatial and acceleration priors.
The spatial prior penalizes candidate trajectories that exhibit low spatial correlation; in other words, the spatial prior encourages solutions where the trajectories of nearby particles are similar to each other. This prior has two parameters that tune its behaviour: an overall strength parameter (how strongly to penalize non-spatially-correlated trajectories) and a correlation distance (over what distance do we expect the trajectories to be similar).
The acceleration prior penalizes trajectories that have high acceleration (i.e. non-smooth trajectories). This prior has one tuning parameter: the overall strength (how strong of a penalty to apply to non-smooth trajectories).
Together the three prior parameters are called the hyperparameters of this motion estimation method. If the priors are too strong, the output will have the trajectories set to zero at every frame because the method is ignoring the data and producing a set of trajectories that satisfy the priors. If the priors are too weak they will not achieve their goal, and the method will simply align the references to the noise in the data rather than the signal.
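To make the roles of the three hyperparameters concrete, here is a minimal sketch of what such penalties could look like. The functional forms are assumptions chosen for illustration (squared second differences for acceleration, and an exponentially distance-weighted velocity mismatch for the spatial term); the source does not state the exact form used by the job, and all function names are hypothetical.

```python
import numpy as np

def acceleration_penalty(traj):
    """traj has shape (n_particles, n_frames, 2). Sum of squared second differences."""
    accel = np.diff(traj, n=2, axis=1)
    return float(np.sum(accel ** 2))

def spatial_penalty(traj, positions, corr_dist):
    """Penalize velocity differences between particles, weighted by proximity (assumed form)."""
    velocity = np.diff(traj, axis=1)                                   # (P, F-1, 2)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    weight = np.exp(-dist / corr_dist)                                 # nearby pairs count more
    diff = velocity[:, None] - velocity[None, :]                       # (P, P, F-1, 2)
    per_pair = np.sum(diff ** 2, axis=(2, 3))
    return float(0.5 * np.sum(weight * per_pair))                      # 0.5: each pair appears twice

def regularized_objective(data_error, traj, positions,
                          accel_strength, spatial_strength, corr_dist):
    """Data term plus the two priors, scaled by their strength hyperparameters."""
    return (data_error
            + accel_strength * acceleration_penalty(traj)
            + spatial_strength * spatial_penalty(traj, positions, corr_dist))

# Two particles 10 pixels apart, both moving along the same straight line.
positions = np.array([[0.0, 0.0], [10.0, 0.0]])
t = np.arange(5.0)
line = np.stack([t, np.zeros_like(t)], axis=-1)
coherent = np.stack([line, line])

# Same trajectories, but one particle jumps sideways in a single frame.
jagged = coherent.copy()
jagged[0, 2] += [0.0, 1.0]
```

Smooth, shared motion incurs no penalty under either term, while the single-frame jump is penalized by both. Very large strengths would make the penalty terms dominate the data term, which corresponds to the "priors too strong" failure mode described above.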
If hyperparameters are not supplied to the job, it will estimate them using the following method.
1. For each particle, two references are created: one from the half-map that the particle contributed to, and one from the opposite half map. This allows for downstream cross-validation, similar to the analysis performed for non-uniform refinement.
2. For a given set of hyperparameters, trajectories are estimated using the low-frequency data from the particle’s half-map.
3. To assess the quality of a set of hyperparameters, the particle trajectories are applied only to the high-frequency part of the data. The resulting corrected images are compared to the opposite half-map. Since high-frequency noise from a particle in half-set A should not correlate with high-frequency signal in half-map B, we can trust the high-frequency correlations from this comparison. This measurement is the cross-validation score, and a more negative number indicates better agreement between the two half-sets.
4. The set of hyperparameters that yields the lowest total cross-validation score is deemed the best. Said another way, the trajectory which correlates best with the opposite half map is considered best.
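The cross-validation idea can be sketched as follows. This is an illustration under assumptions, not the job's actual scoring code: the score here is a negative normalized correlation over Fourier components above a hypothetical radial `cutoff` (in cycles per pixel), a random image stands in for the opposite half-map projection, and the two "candidates" stand in for images corrected with good and bad hyperparameters.

```python
import numpy as np

def radial_frequency(n):
    """Radial spatial frequency (cycles/pixel) of each component of an n x n FFT."""
    f = np.fft.fftfreq(n)
    fy, fx = np.meshgrid(f, f, indexing="ij")
    return np.hypot(fy, fx)

def split_masks(n, cutoff):
    """Low frequencies are used for alignment, the remainder for validation."""
    low = radial_frequency(n) <= cutoff
    return low, ~low

def cv_score(corrected, opposite_ref, high_mask):
    """Negative normalized correlation over high frequencies (more negative is better)."""
    a = np.fft.fft2(corrected)[high_mask]
    b = np.fft.fft2(opposite_ref)[high_mask]
    num = np.real(np.vdot(b, a))
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return -num / den

rng = np.random.default_rng(1)
opposite_ref = rng.standard_normal((32, 32))   # stand-in for the opposite half-map projection
_, high = split_masks(32, cutoff=0.15)

# Stand-ins for images corrected with two different hyperparameter sets.
candidates = {
    "good": opposite_ref.copy(),                         # perfectly aligned
    "bad": np.roll(opposite_ref, (3, 3), axis=(0, 1)),   # misaligned by 3 pixels
}
scores = {name: cv_score(img, opposite_ref, high) for name, img in candidates.items()}
best = min(scores, key=scores.get)   # lowest score wins
```

In the job the score is accumulated over many particles; this toy uses a single image only to keep the selection rule (take the minimum) visible.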
Changing the number of reference volumes will cause more Particle, Reference Volume, and Mask inputs to appear.
As inputs, the job requires movies, particles, and one or more reference volumes. The connected movies must have rigid motion estimates and background estimates. A patch motion correction job provides these estimates.
The connected reference volumes must have half-maps and a mask. Jobs such as homogeneous refinement include a mask with the volume output, but the user can always provide a mask of their own by connecting an input to the optional “Static mask N” slots.
Reference-based motion correction supports heterogeneous datasets. A parameter titled “Number of reference volumes” (default 1) can be increased to allow multiple particle stacks and reference volumes to be connected.
It is recommended that you provide particles from the same refinement as the reference volume, and that the refinement job had the “minimize over per-particle scale” switch on.
If you wish, you can connect dose weights and/or motion hyperparameters from a previous job into the “hyperparameters” input group. If you do so, the job will use the supplied motion hyperparameters and/or dose weights instead of recomputing them. If re-processing the same dataset, or processing a separate dataset with similar collection circumstances, the hyperparameters will likely be transferable.
The “final processing stage” parameter can be used to stop the job early, for example after computing motion hyperparameters or dose weights.
The “Save results in 16-bit floating point” toggle causes the motion-corrected particles to be written to disk in half precision (float16 format, see the guide page for more information). The toggle is off by default, but half precision is not known to harm subsequent refinement quality in most cases and reduces the disk space consumed by 50%. Its use is encouraged.
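The 50% figure follows directly from the storage width of the two formats. A small NumPy illustration, using random data as a stand-in for a particle stack (this says nothing about how CryoSPARC writes its files):

```python
import numpy as np

rng = np.random.default_rng(0)
particles32 = rng.standard_normal((4, 64, 64)).astype(np.float32)   # stand-in particle stack
particles16 = particles32.astype(np.float16)                         # half precision copy

size_ratio = particles16.nbytes / particles32.nbytes
max_abs_error = float(
    np.max(np.abs(particles32 - particles16.astype(np.float32)))
)
```

Here `size_ratio` is 0.5, and `max_abs_error` shows the rounding introduced by half precision for values of roughly unit scale.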
The method for searching hyperparameters in reference-based motion correction is important because the resulting trajectories depend heavily on hyperparameter selection, as discussed above. The following hyperparameter search method is designed to avoid poor selections on a wide range of test datasets.
The hyperparameter search is done in a cylindrical coordinate system of the 3-dimensional hyperparameter space. The hyperparameters consist of the total prior strength r, acceleration/spatial prior balance Θ, and spatial correlation distance z. At the start of the search, a number of “rays” (each of which has a fixed z and Θ) are created at predetermined positions. During each iteration, the search proceeds outwards along each ray. Once the hyperparameters determined by a ray result in no particle motion, that ray is retired since further increasing the prior strength will have no effect.
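A minimal sketch of a ray search of this kind is shown below. Several parts are assumptions made for illustration: the mapping from (r, Θ, z) to individual prior strengths via cosine and sine, the `evaluate` callback that returns a (score, activity) pair, the tolerance used to decide that motion has stopped, and the toy scoring function. None of these are taken from CryoSPARC.

```python
import math

def ray_to_hyperparameters(r, theta, z):
    """ASSUMED mapping: theta splits total strength r between the two priors."""
    return {
        "accel_strength": r * math.cos(theta),
        "spatial_strength": r * math.sin(theta),
        "corr_dist": z,
    }

def search_rays(rays, radii, evaluate, activity_tol=1e-3):
    """March outwards along every ray; retire a ray once its trajectories stop moving."""
    best_score, best_params = math.inf, None
    active = list(rays)
    for r in radii:                       # radii must be increasing
        survivors = []
        for theta, z in active:
            params = ray_to_hyperparameters(r, theta, z)
            score, activity = evaluate(params)
            if score < best_score:
                best_score, best_params = score, params
            if activity > activity_tol:   # still moving: keep searching this ray
                survivors.append((theta, z))
        active = survivors
        if not active:
            break
    return best_score, best_params

calls = []

def toy_evaluate(params):
    """Hypothetical stand-in for 'estimate trajectories, then cross-validate'."""
    calls.append(params)
    a, s = params["accel_strength"], params["spatial_strength"]
    score = ((a - math.sqrt(2)) ** 2 + (s - math.sqrt(2)) ** 2
             + abs(params["corr_dist"] - 500.0) / 1000.0)
    activity = max(0.0, 1.0 - math.hypot(a, s) / 4.0)   # motion vanishes at r = 4
    return score, activity

rays = [(theta, z)
        for theta in (math.pi / 6, math.pi / 4, math.pi / 3)
        for z in (100.0, 500.0)]
best_score, best_params = search_rays(
    rays, radii=[1.0, 2.0, 3.0, 4.0, 5.0], evaluate=toy_evaluate
)
```

In this toy every ray is retired after r = 4, so the r = 5 step is never evaluated, mirroring the statement that increasing the prior strength further has no effect once motion has vanished.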
- The number of rays that are searched is controlled by the “hyperparameter search thoroughness” parameter, which has 3 options: Fast, Balanced, and Extensive. The fast setting is usually sufficient, and completes in the shortest amount of time. The other settings use more rays, at the expense of more computation time.
- The parameter “maximum total prior strength” limits the search distance along the rays (i.e., keeping the cylindrical search space “skinnier”). To determine the necessary total prior strength, monitoring trajectory activity and the cross-validation score is helpful. Trajectory activity is the average (across the micrographs used in the hyperparameter search) of the per-micrograph 75th percentiles of trajectory length (relative to rigid motion). If, on the last iteration, the trajectory activity hasn’t reached a value very close to zero, the maximum total prior strength may need to be increased.
- “Fraction of FCs to use for alignment” determines how many of the Fourier components are used when computing the trajectories, with the remainder being used for cross-validation (see the theoretical overview section for details). The default setting usually does not need to be changed.
- Only a subset of the overall dataset is needed to estimate hyperparameters. The parameter “Target number of particles” determines the number of particles to be used (micrographs are randomly selected from the dataset until this threshold is exceeded or the entire dataset has been used).
- If you wish to skip the hyperparameter optimization stage entirely, you can do so either by connecting hyperparameters from a previous job, or by manually entering numerical values in the three “override:” parameters. You must supply either all three of these overrides, or none of them.
- Hyperparameter search only uses the lower-frequency Fourier components when computing trajectories. By default, the final iteration uses all frequency components instead. This improves the quality of the final step and is usually best, but can be turned off by turning “Use all Fourier components” off.
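The trajectory activity statistic described in the list above (the mean over micrographs of each micrograph's 75th-percentile trajectory length) can be sketched with NumPy. One assumption is made here: "trajectory length" is taken to be the summed path length of a particle's positions after rigid motion has been subtracted. The function names are hypothetical.

```python
import numpy as np

def trajectory_length(traj):
    """traj: (n_frames, 2) positions relative to rigid motion. Returns the path length."""
    steps = np.diff(traj, axis=0)
    return float(np.sum(np.linalg.norm(steps, axis=1)))

def trajectory_activity(micrographs):
    """Mean over micrographs of the per-micrograph 75th percentile of trajectory length."""
    per_micrograph = [
        np.percentile([trajectory_length(t) for t in trajs], 75)
        for trajs in micrographs
    ]
    return float(np.mean(per_micrograph))

def straight(length):
    """Two-frame toy trajectory that moves `length` pixels along x."""
    return np.array([[0.0, 0.0], [length, 0.0]])

mic_a = [straight(length) for length in (0.0, 1.0, 2.0, 3.0)]   # moving particles
mic_b = [straight(0.0), straight(0.0)]                          # fully static particles
activity = trajectory_activity([mic_a, mic_b])
```

With NumPy's default linear interpolation, micrograph A contributes 2.25 and micrograph B contributes 0, giving an activity of 1.125. A value that stays well above zero at the largest prior strength searched is the signal, per the text above, that the maximum total prior strength may need to be raised.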
The “Output F-crop factor” parameter can be used to reduce the size of the output particles by Fourier-space cropping. By default, the particles are extracted using the raw pixel size of the movies (including the upsampling factor, in the case of EER movies), and at whatever box size is necessary for the extracted particles to have the same physical extent as the reference volume. Note that this raw pixel size is not necessarily the same as the motion-corrected pixel size used in earlier processing steps (e.g., for super-resolution movies).
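Fourier-space cropping in general can be sketched as follows: keep the central (low-frequency) block of the image's Fourier transform and invert it, which shrinks the box while preserving the physical extent. This is a generic illustration for square, even-sized images, not CryoSPARC's routine; the function name, the 0.5 Å starting pixel size, and the interpretation of an F-crop factor of 1/2 as halving the box size are assumptions.

```python
import numpy as np

def fourier_crop(image, out_size):
    """Downsample a square, even-sized image by keeping its central Fourier components."""
    n = image.shape[0]
    ft = np.fft.fftshift(np.fft.fft2(image))          # move zero frequency to the centre
    start = (n - out_size) // 2
    cropped = ft[start:start + out_size, start:start + out_size]
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return out * (out_size / n) ** 2                  # keep mean intensity unchanged

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))                 # stand-in for an extracted particle
pixel_size = 0.5                                      # Å/pixel (assumed for the example)

cropped = fourier_crop(image, 32)                     # F-crop factor of 1/2 (assumed meaning)
output_pixel_size = pixel_size * image.shape[0] / cropped.shape[0]
```

The box shrinks from 64 to 32 pixels while the pixel size doubles to 1.0 Å, so the physical extent (32 Å) is unchanged; only information beyond the new Nyquist limit is discarded.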
Although motion correction calculations are performed on GPUs, a fast CPU is necessary to load data into the GPU. Thus, a given configuration of GPUs may be “too fast” for a given CPU, which would result in GPUs being occupied but not performing at their best.
- Increasing the “Number of GPUs” parameter can speed up processing. Good performance scaling to more than 3 GPUs usually requires a reasonably modern and fast CPU (e.g., 3rd generation Intel Xeon scalable, AMD Epyc Rome, etc).
- Any GPUs with more VRAM than the “GPU oversubscription memory threshold” will work on two micrographs at a time instead of one. This can speed up processing, but increases the demand on the CPU. Setting this greater than or equal to GPU VRAM will force a single movie per GPU.
- The “in-memory cache size” parameter controls how much RAM is set aside for caching data in the hyperparameter estimation step. This parameter should be set to between 60% and 80% of your machine’s RAM, preferably toward the lower end unless the machine has more than 256 GB of RAM.
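The two sizing rules in the list above reduce to simple arithmetic. The helpers below are hypothetical and only restate the text; in particular, expressing the threshold and memory sizes in GB is an assumption.

```python
def movies_per_gpu(vram_gb, threshold_gb):
    """A GPU with more VRAM than the threshold works on two movies at once, otherwise one."""
    return 2 if vram_gb > threshold_gb else 1

def suggested_cache_range_gb(total_ram_gb):
    """The 60% to 80% of system RAM suggested for the in-memory cache."""
    return 0.6 * total_ram_gb, 0.8 * total_ram_gb

oversubscribed = movies_per_gpu(vram_gb=24, threshold_gb=20)   # VRAM above the threshold
single = movies_per_gpu(vram_gb=24, threshold_gb=24)           # threshold equal to VRAM
cache_low, cache_high = suggested_cache_range_gb(128)
```

The second call shows why setting the threshold at or above the GPU's VRAM forces one movie per GPU: the "more than" comparison can no longer succeed.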
Most cryo-EM motion correction methods, including Patch Motion Correction, use a dose-weighting scheme predicted from the physics of beam-induced radiation damage along with experimental data on a well-characterized specimen (Grant & Grigorieff, 2015). By contrast, Reference-Based Motion Correction calculates empirical dose weights, on a per-dataset basis, based on the Fourier Cylinder Correlation, or “FCC” (Zivanov, Nakane & Scheres, 2019). The FCC is a measure of how well the aligned frames correlate with the reference volume projections as a function of frame number and spatial frequency. First, the reference volume is projected using the particle’s pose. Then, for each frame, the correlation between the reference projection and the frame image is calculated at each resolution.
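A per-frame, per-resolution correlation of this kind can be sketched as below. This is a simplified illustration rather than the job's implementation: it handles a single particle (in practice the statistic would be accumulated over many particles), ignores the CTF, bins Fourier components into integer-radius rings, and uses random arrays as stand-ins for the reference projection and the frame patches. The function names are hypothetical.

```python
import numpy as np

def ring_index(n):
    """Integer resolution-ring index of every component of an n x n FFT."""
    f = np.fft.fftfreq(n) * n
    fy, fx = np.meshgrid(f, f, indexing="ij")
    return np.rint(np.hypot(fy, fx)).astype(int)

def fcc(reference, frames):
    """Return an (n_frames, n_rings) matrix of normalized correlations with the reference."""
    n = reference.shape[0]
    rings = ring_index(n)
    n_rings = n // 2 + 1                      # rings beyond Nyquist are ignored
    ref_ft = np.fft.fft2(reference)
    out = np.zeros((len(frames), n_rings))
    for i, frame in enumerate(frames):
        frame_ft = np.fft.fft2(frame)
        for k in range(n_rings):
            sel = rings == k
            num = np.real(np.vdot(ref_ft[sel], frame_ft[sel]))
            den = np.linalg.norm(ref_ft[sel]) * np.linalg.norm(frame_ft[sel])
            out[i, k] = num / den if den > 0 else 0.0
    return out

rng = np.random.default_rng(2)
reference = rng.standard_normal((16, 16))     # stand-in for a reference projection
frames = [
    reference.copy(),                                    # an ideal, undamaged frame
    reference + 3.0 * rng.standard_normal((16, 16)),     # a noisy, degraded frame
]
fcc_matrix = fcc(reference, frames)
```

Each row of `fcc_matrix` corresponds to a frame and each column to a resolution ring. The ideal frame correlates perfectly at every ring, while the degraded frame scores lower.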
During reconstruction, particle images contribute information to the volume across all frequencies, and the images themselves are averages of the patches from movie frames. Empirical dose weighting allows for these sums to be weighted by the “quality” of the particle, as measured by each frame’s correlation with the reference volume at each frequency. These dose-weights are calculated by fitting a model to the FCC, then normalizing each column (i.e., each resolution).
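The column normalization step can be sketched as follows. The model-fitting step mentioned above is skipped here, and two choices are assumptions made for this illustration: negative correlations are clipped to zero, and a column with no positive correlation falls back to uniform weights. The function name is hypothetical.

```python
import numpy as np

def dose_weights_from_fcc(fcc_matrix, eps=1e-12):
    """Turn an (n_frames, n_frequencies) FCC into per-frequency weights that sum to 1."""
    w = np.clip(fcc_matrix, 0.0, None)                 # assumption: ignore negative correlations
    col_sums = w.sum(axis=0, keepdims=True)
    uniform = np.full_like(w, 1.0 / w.shape[0])        # fallback for empty columns
    return np.where(col_sums > eps, w / np.maximum(col_sums, eps), uniform)

# Toy FCC: 3 frames (rows) by 2 resolution shells (columns).
toy_fcc = np.array([
    [0.2, 0.1],
    [0.6, 0.1],
    [0.2, 0.0],
])
weights = dose_weights_from_fcc(toy_fcc)
```

In the first column the middle frame receives the largest weight (0.6), and in the second column the last frame receives none. Applying these weights when summing frames at each frequency gives more influence to the frames that correlate best with the reference at that resolution.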
The first frame has the least radiation damage, and so, for a perfectly static sample, it is theoretically the best source of high-resolution information. However, it is somewhat common for the first frame to exhibit poor correlation at high frequencies due to initial beam-induced motion. In these cases it’s best to trust a slightly later frame (e.g. 2 or 3) for the most high frequency detail. Empirical dose weights account for this; we have found that this effect is responsible for a significant proportion of the typical resolution improvement from the reference motion job overall.
The final stage of processing shows an overall progress bar and prints out a few example diagnostic plots. The following pair of plots is generated for the first 20 movies processed. After 20 movies, processing continues, but no further plots are generated (refer to the progress bar at the top of the log checkpoint to see overall progress).
A schematic overview of a micrograph showing particle locations and trajectories (axis labels are in pixels)
Example motion-corrected particles.
Also as of CryoSPARC v4.4, Patch Motion Correction has a new optional input for dose weights.
A hyperparameter output group from a reference-based motion correction job can be connected here to use the computed empirical dose weights instead of the standard dose-weighting curve. This might be of use if, for example, there are several datasets to process which were collected at the same time under the same conditions. Since the empirical dose weight computation is sometimes a significant portion of the overall benefit of doing reference-based motion correction, and since reference-based motion correction is quite computationally expensive, it may be possible and convenient to capture some of the benefit at much lower cost in this fashion.
In our testing an improvement in FSC resolution of about 0.2 Å is common.
In the following images, reference-based motion correction is compared against Patch Motion Correction on EMPIAR-10061 (beta-galactosidase). The blue mesh is the result from Patch Motion Correction, and the red mesh is the result from Reference Based Motion Correction. The model (PDB 6DRV) is for illustrative purposes only and has not been refined against the improved map.
In the following images of the same maps, gray is the result from Patch Motion, while cyan is the result from Reference Based Motion Correction.
Finally, for this dataset, we can see from overlaid FSC curves that within reference-based motion correction, the trajectory optimization and empirical dose weighting both contribute to the improvement in resolution, with dose weighting providing a slightly larger part of the improvement. This finding means that for subsequent similar dataset collections, a sizeable improvement could be had by reusing the dose weights estimated from this data in a new patch motion correction job.
The following plots compare the FSC curves from Patch Motion Correction versus reference-based motion correction on the heterogeneous EMPIAR-10261 (Nav1.7 ion channel) dataset. In this case, two volumes were connected as input as both the open and closed conformations of the channel are present in the data. The results show that resolutions for both classes improved.
Class 1, patch motion correction
Class 2, patch motion correction
Class 1, reference-based motion correction
Class 2, reference-based motion correction
Jasenko Zivanov, Takanori Nakane and Sjors H. W. Scheres (2019). A Bayesian approach to beam-induced motion correction in cryo-EM single-particle analysis. IUCrJ, 6, 5-17.
Timothy Grant and Nikolaus Grigorieff (2015). Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6. eLife 4:e06980.