Job: Reference Based Motion Correction (BETA)
Use a high-quality reference volume, particle poses, and particle positions to estimate per-particle movement trajectories and empirical dose weights.
Reference-based motion correction is an extension of Patch Motion Correction. Using known particle poses and positions, precise movement trajectories can be calculated for each particle. In addition, the effect of radiation damage during an exposure can be empirically measured and accounted for by weighting. In some cases, these procedures yield a significant improvement in final map quality.
The concepts and method in CryoSPARC’s Reference Based Motion Correction are inspired by Bayesian Polishing (Zivanov, Nakane & Scheres, 2019). CryoSPARC’s implementation includes a new method for hyperparameter optimization, is multi-GPU accelerated and optimized, and includes support for multiple reference volumes, thereby enabling simultaneous motion correction for particles from different conformations/species. In addition, patch motion correction can take as input the empirical dose weights estimated by Reference Based Motion Correction on a different dataset.
This job type has an accompanying tutorial video:
To properly match a particle with its given reference, Reference Based Motion Correction accepts sets of Particles, Volumes, and Masks. These sets are given in separate numbered inputs. To include more than one set, increase the Number of Reference Volumes parameter.
As inputs, the job requires movies, particles, and one or more reference volumes. The connected movies must have rigid motion estimates and background estimates. A patch motion correction job provides these estimates.
The connected reference volumes must have half-maps and a mask. Jobs such as homogeneous refinement include a mask with the volume output, but the user can always provide a mask of their own by connecting an input to the optional “Static mask N” slots.
Reference-based motion correction supports heterogeneous datasets. A parameter titled Number of reference volumes (default 1) can be increased to allow multiple particle stacks and reference volumes to be connected.
It is recommended that you provide particles from the same refinement as the reference volume, and that the refinement job was run with the minimize over per-particle scale switch on.
If you wish, you can connect dose weights and/or motion hyperparameters from a previous job into the hyperparameters input group. If you do so, the job will use the supplied motion hyperparameters and/or dose weights instead of recomputing them. If re-processing the same dataset, or processing a separate dataset with similar collection circumstances, the hyperparameters will likely be transferable.
This parameter can be used to stop the job early, for example after computing motion hyperparameters or dose weights.
Turning this setting on will cause the motion-corrected particles to be written to disk in half precision (float16 format, see the guide page for more information). Though off by default, this is not known to harm subsequent refinement quality in most cases, and reduces the disk space consumed by 50%. Its use is encouraged.
Normally, the reference-based motion correction job will divide an EER file into a number of fractions that was specified when the movies were imported. This parameter allows the fraction count to be overridden. This parameter can only be used if no frames were discarded in patch motion correction (through the use of the start and end frame parameters).
If this parameter is active, then the input particles will be re-centered (their pick locations on the movie will be adjusted) based on their optimized shifts from the upstream refinement.
If some input movies have a different number of frames from the rest, the job will fail. If the skip movies with wrong frame count switch is on, then the most common frame count will be assumed to be correct and all movies that don't have that frame count will be discarded by the job.
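The majority-vote rule described above can be illustrated with a minimal sketch. This is not CryoSPARC's code; the function name `keep_majority_frame_count` and the dictionary input are hypothetical and exist only for illustration.

```python
from collections import Counter


def keep_majority_frame_count(frame_counts):
    """Return (majority frame count, names of movies that have it).

    frame_counts: dict mapping a movie name to its number of frames
    (a hypothetical input format chosen for this sketch).
    """
    if not frame_counts:
        return None, []
    # The most common frame count is assumed to be the correct one.
    majority, _ = Counter(frame_counts.values()).most_common(1)[0]
    kept = [name for name, n in frame_counts.items() if n == majority]
    return majority, kept


counts = {"movie_a": 40, "movie_b": 40, "movie_c": 38, "movie_d": 40}
majority, kept = keep_majority_frame_count(counts)
print(majority, kept)
```

Here `movie_c` would be discarded because its frame count differs from the most common one.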
The number of rays that are searched is controlled by the hyperparameter search thoroughness parameter, which has 3 options: Fast, Balanced, and Extensive. The fast setting is usually sufficient, and completes in the shortest amount of time. The other settings use more rays, at the expense of more computation time. For an explanation of what these rays represent, see the Hyperparameter Search section.
The parameter maximum total prior strength limits the total strength of the priors. To determine the necessary total prior strength, monitoring trajectory activity and the cross-validation score is helpful. Trajectory activity is the average (across the micrographs used in the hyperparameter search) of the per-micrograph 75th percentiles of trajectory length (relative to rigid motion). If, on the last iteration, the trajectory activity hasn’t reached a value very close to zero, the maximum total prior strength may need to be increased.
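As a rough illustration of the trajectory activity statistic, the sketch below averages per-micrograph 75th percentiles of trajectory length. It assumes that "trajectory length" means the summed frame-to-frame displacement of a trajectory expressed relative to rigid motion; that interpretation, the array layout, and the function name are assumptions rather than details taken from the job itself.

```python
import numpy as np


def trajectory_activity(micrographs):
    """Mean over micrographs of the 75th percentile of trajectory length.

    micrographs: list of arrays, each shaped (n_particles, n_frames, 2),
    holding per-frame shifts relative to rigid motion (assumed layout).
    """
    per_micrograph = []
    for trajs in micrographs:
        steps = np.diff(trajs, axis=1)  # frame-to-frame displacements
        lengths = np.linalg.norm(steps, axis=2).sum(axis=1)  # path length per particle
        per_micrograph.append(np.percentile(lengths, 75))
    return float(np.mean(per_micrograph))


still = np.zeros((4, 10, 2))                   # particles that do not move
moving = np.array([[[0.0, 0.0], [3.0, 4.0]]])  # one particle, one step of length 5
print(trajectory_activity([still, moving]))
```

When the priors are strong enough to suppress all per-particle motion, every trajectory collapses onto the rigid motion estimate and this statistic approaches zero.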
This parameter determines how many of the Fourier components are used when computing the trajectories, with the remainder being used for cross-validation (see the theoretical overview section for details). The default setting usually does not need to be changed.
Only a subset of the overall dataset is needed to estimate hyperparameters. The Target number of particles parameter sets the number of particles to be used. Micrographs are randomly selected from the dataset one-by-one until they have at least this many particles, or the entire dataset is used.
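The selection rule can be sketched as follows. The function name, the `seed` argument, and the dictionary of per-micrograph particle counts are hypothetical; the job's actual random selection is not documented here.

```python
import random


def select_micrographs(particle_counts, target, seed=0):
    """Pick micrographs in random order until `target` particles are reached.

    particle_counts: dict mapping micrograph name -> number of particles.
    Returns the chosen names and the total number of particles they hold.
    """
    rng = random.Random(seed)
    names = list(particle_counts)
    rng.shuffle(names)
    chosen, total = [], 0
    for name in names:
        if total >= target:
            break  # target reached, stop adding micrographs
        chosen.append(name)
        total += particle_counts[name]
    return chosen, total


counts = {"mic_1": 100, "mic_2": 150, "mic_3": 200}
chosen, total = select_micrographs(counts, target=250)
print(chosen, total)
```

If the target exceeds the number of particles in the dataset, every micrograph ends up selected.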
If you wish to skip the hyperparameter optimization stage entirely, you can do so either by connecting hyperparameters from a previous job, or by manually entering numerical values in the three override: parameters.
You must supply either all three of these overrides, or none of them.
Hyperparameter search only uses the lower-frequency Fourier components when computing trajectories. By default, the final iteration uses all frequency components instead. This improves the quality of the final step and is usually best, but it can be disabled by turning Use all Fourier components off.
The Fourier-crop to box size parameter can be used to reduce the pixel resolution of the output particles by Fourier-space cropping. By default, the particles are extracted using the raw pixel size of the movies (including the upsampling factor, in the case of EER movies) and whatever box size is necessary for the extracted particles to have the same physical extent as the reference volume.
The default box size and pixel size do not necessarily match the motion-corrected pixel size used in earlier processing steps (e.g., for super-resolution movies).
Increasing the Number of GPUs parameter can speed up processing. Good performance scaling to more than 3 GPUs usually requires a reasonably modern and fast CPU (e.g., 3rd generation Intel Xeon scalable, AMD Epyc Rome, etc).
Although motion correction calculations are performed on GPUs, a fast CPU is necessary to load data into the GPU. Thus, a given configuration of GPUs may be “too fast” for a given CPU, which would result in GPUs being occupied but not performing at their best.
Any GPUs with more VRAM than the GPU oversubscription memory threshold will work on two micrographs at a time instead of one. This can speed up processing, but increases the demand on the CPU. Setting this greater than or equal to GPU VRAM will force a single movie per GPU.
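The threshold rule amounts to a strict comparison, sketched below. The function name and the use of gigabytes are illustrative assumptions.

```python
def micrographs_per_gpu(vram_gb, oversubscription_threshold_gb):
    """Two micrographs at once only if VRAM strictly exceeds the threshold."""
    return 2 if vram_gb > oversubscription_threshold_gb else 1


print(micrographs_per_gpu(24, 20))  # VRAM above the threshold
print(micrographs_per_gpu(24, 24))  # threshold equal to VRAM
print(micrographs_per_gpu(11, 20))  # VRAM below the threshold
```

Because the comparison is strict, a threshold equal to the card's VRAM already forces one micrograph per GPU.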
The in-memory cache size parameter controls how much RAM is set aside for caching data in the hyperparameter estimation step. This parameter should be set between 60% and 80% of your machine’s RAM, preferably lower unless the machine has more than 256 GB of RAM.
Normally, the fastest available GPU serves two simultaneous roles: it is responsible for creating particle references by projecting the reference volume through Fourier-space slicing, and it also acts as one of the workers computing trajectory estimates. In problems that have very high VRAM requirements, this can cause the job to fail due to insufficient GPU memory. Turning this switch off will isolate the first GPU for computing references only, thereby reducing VRAM pressure on that GPU. However, doing so also means that the job cannot run unless it is assigned at least two GPUs.
Most cryo-EM motion correction methods, including Patch Motion Correction, use a dose-weighting scheme predicted from the physics of beam-induced radiation damage along with experimental data on a well-characterized specimen (Grant & Grigorieff, 2015). By contrast, Reference-Based Motion Correction calculates empirical dose weights, on a per-dataset basis, based on the Fourier Cylinder Correlation, or “FCC” (Zivanov, Nakane & Scheres, 2019). The FCC is a measure of how well the aligned frames correlate with the reference volume projections as a function of frame number and spatial frequency. First, the reference volume is projected using the particle’s pose. Then, for each frame, the correlation between the reference projection and the frame image is calculated at each resolution.
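To make the per-frame, per-resolution correlation concrete, here is a simplified single-particle sketch. It computes a normalized correlation between each frame and one reference projection over integer-radius Fourier rings. The real FCC is accumulated over many particles, and the binning, normalization, and function names below are assumptions made for illustration only.

```python
import numpy as np


def ring_indices(n):
    """Integer Fourier-ring index for every pixel of an n x n transform."""
    fy = np.fft.fftfreq(n)[:, None] * n
    fx = np.fft.fftfreq(n)[None, :] * n
    return np.rint(np.hypot(fy, fx)).astype(int)


def frame_ring_correlation(frames, reference):
    """Correlation of each frame with the reference in each Fourier ring.

    frames: (n_frames, N, N) array; reference: (N, N) array.
    Returns an (n_frames, N // 2) array: rows are frames, columns are rings.
    Column 0 (the DC term) is left at zero.
    """
    n = reference.shape[0]
    rings = ring_indices(n)
    n_rings = n // 2
    ref_f = np.fft.fft2(reference)
    out = np.zeros((len(frames), n_rings))
    for i, frame in enumerate(frames):
        frame_f = np.fft.fft2(frame)
        for k in range(1, n_rings):
            m = rings == k
            num = np.real(np.sum(frame_f[m] * np.conj(ref_f[m])))
            den = np.sqrt(np.sum(np.abs(frame_f[m]) ** 2) * np.sum(np.abs(ref_f[m]) ** 2))
            out[i, k] = num / den if den > 0 else 0.0
    return out


rng = np.random.default_rng(0)
reference = rng.normal(size=(16, 16))
frames = np.stack([reference, -reference])
fcc_like = frame_ring_correlation(frames, reference)
print(fcc_like.shape)
```

A frame identical to the reference correlates at +1 in every ring, and its negative at -1; real frames fall in between, with the correlation typically dropping at high frequency and at late frame numbers.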
During reconstruction, particle images contribute information to the volume across all frequencies, and the images themselves are averages of the patches from movie frames. Empirical dose weighting allows for these sums to be weighted by the “quality” of the particle, as measured by each frame’s correlation with the reference volume at each frequency. These dose-weights are calculated by fitting a model to the FCC, then normalizing each column (i.e., each resolution).
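The column normalization step can be sketched as below. The model-fitting stage is skipped: the input is assumed to be an already fitted FCC table with frames as rows and resolution shells as columns. Clipping negative values to zero is an additional assumption of this sketch, not a documented behaviour.

```python
import numpy as np


def normalize_dose_weights(fitted_fcc, eps=1e-12):
    """Turn a fitted (n_frames, n_shells) FCC table into per-shell weights.

    Each column (resolution shell) is scaled so that the weights of all
    frames sum to 1 at that resolution.
    """
    w = np.clip(np.asarray(fitted_fcc, dtype=float), 0.0, None)
    column_sums = w.sum(axis=0, keepdims=True)
    return w / np.maximum(column_sums, eps)


fitted = np.array([
    [0.8, 0.2],  # frame 1: good at low resolution, poor at high resolution
    [1.0, 0.6],  # frame 2: best at both
    [0.6, 0.2],  # frame 3: some radiation damage
])
weights = normalize_dose_weights(fitted)
print(weights)
```

In this toy table the second frame receives the largest weight in the high-resolution column, which mirrors the situation where initial beam-induced motion degrades the first frame.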
The first frame has the least radiation damage, and so, for a perfectly static sample, it is theoretically the best source of high-resolution information. However, it is somewhat common for the first frame to exhibit poor correlation at high frequencies due to initial beam-induced motion. In these cases it’s best to trust a slightly later frame (e.g. 2 or 3) for the most high frequency detail. Empirical dose weights account for this; we have found that this effect is responsible for a significant proportion of the typical resolution improvement from the reference motion job overall.
As of CryoSPARC v4.4, Patch Motion Correction also has a new optional input for dose weights.
A hyperparameter output group from a reference-based motion correction job can be connected here to use the computed empirical dose weights instead of the standard dose-weighting curve. This might be of use if, for example, there are several datasets to process which were collected at the same time under the same conditions. Since the empirical dose weight computation is sometimes a significant portion of the overall benefit of doing reference-based motion correction, and since reference-based motion correction is quite computationally expensive, it may be possible and convenient to capture some of the benefit at much lower cost in this fashion.
The final stage of processing shows an overall progress bar and prints out a few example diagnostic plots. The following pair of plots is generated for the first 20 movies processed. After 20 movies, processing continues, but no further plots are generated (refer to the progress bar at the top of the log checkpoint to see overall progress).
Particle images from Reference Based Motion Correction are typically used toward the end of analysis in final refinements, such as Non-Uniform or Local Refinements.
Following (Zivanov, Nakane & Scheres, 2019), Reference Based Motion Correction proceeds as follows:
For each particle in the input dataset, a synthetic reference image is created by projecting the reference volume in the particle’s pose and applying simulated CTF corruption to the resulting 2D reference.
A patch is extracted around the particle’s pick location from each frame in the movie it came from.
Each of these patches is then assigned a shift; the set of shifts across all frames makes up the particle trajectory.
The optimal trajectory is computed by finding the set of shifts which minimizes the error between the reference image and the shifted patches.
Particle images are reconstructed from frames by applying the optimal trajectories and dose weights.
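The last step above, applying a trajectory and weights to per-frame patches, can be sketched with the Fourier shift theorem. This is a minimal illustration under stated assumptions: shifts are given as (dy, dx) in pixels and move image content in the positive direction, and `weights` may be one scalar per frame or one (N, N) Fourier-space array per frame. The function name and sign convention are not taken from CryoSPARC.

```python
import numpy as np


def shift_and_sum(patches, shifts, weights):
    """Shift each frame patch in Fourier space and form a weighted sum.

    patches: (n_frames, N, N) array of real-space patches.
    shifts:  (n_frames, 2) array of (dy, dx) shifts in pixels.
    weights: per-frame scalars or per-frame (N, N) Fourier-space weights.
    """
    n = patches.shape[-1]
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    acc = np.zeros((n, n), dtype=complex)
    for patch, (dy, dx), w in zip(patches, shifts, weights):
        # Shift theorem: a phase ramp in Fourier space is a translation in real space.
        phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
        acc += w * np.fft.fft2(patch) * phase
    return np.fft.ifft2(acc).real


rng = np.random.default_rng(0)
patches = rng.normal(size=(2, 8, 8))
shifts = np.array([[1, 2], [0, 0]])
summed = shift_and_sum(patches, shifts, weights=np.array([1.0, 0.0]))
print(summed.shape)
```

Working in Fourier space allows sub-pixel shifts and per-frequency weights to be applied in the same pass.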
Due to the particularly low signal-to-noise ratio of individual movie frames, the procedure just described would naturally overfit to noise, causing wild and nonsensical trajectories. To mitigate this, and following (Zivanov, Nakane & Scheres, 2019), the reference-based motion correction job uses two kinds of regularization: the spatial and acceleration priors.
The spatial prior penalizes candidate trajectories that exhibit low spatial correlation; in other words, the spatial prior encourages solutions where the trajectories of nearby particles are similar to each other. This prior has two parameters that tune its behaviour: an overall strength parameter (how strongly to penalize non-spatially-correlated trajectories) and a correlation distance (over what distance do we expect the trajectories to be similar).
The acceleration prior penalizes trajectories that have high acceleration (i.e. non-smooth trajectories). This prior has one tuning parameter: the overall strength (how strong of a penalty to apply to non-smooth trajectories).
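One simple way to express a smoothness penalty of this kind is the sum of squared second differences of the trajectory, scaled by the prior strength. The exact functional form used by the job is not given here, so the sketch below is an assumption meant only to show why straight, constant-velocity trajectories are not penalized while jagged ones are.

```python
import numpy as np


def acceleration_penalty(trajectory, strength):
    """Strength times the sum of squared second differences.

    trajectory: (n_frames, 2) array of per-frame shifts.
    """
    accel = np.diff(trajectory, n=2, axis=0)  # discrete acceleration
    return strength * float(np.sum(accel ** 2))


smooth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # constant velocity
jagged = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])  # reverses direction
print(acceleration_penalty(smooth, 0.5), acceleration_penalty(jagged, 0.5))
```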
Together the three prior parameters are called the hyperparameters of this motion estimation method. If the priors are too strong, the output will have the trajectories set to zero at every frame because the method is ignoring the data and producing a set of trajectories that satisfy the priors. If the priors are too weak they will not achieve their goal, and the method will simply align the references to the noise in the data rather than the signal.
If hyperparameters are not supplied to the job, it will estimate them using the following method.
For each particle, two references are created: one from the half-map that the particle contributed to, and one from the opposite half map. This allows for downstream cross-validation, similar to the analysis performed for non-uniform refinement.
For a given set of hyperparameters, trajectories are estimated using the low-frequency data from the particle’s half-map.
To assess the quality of a set of hyperparameters, the particle trajectories are applied only to the high-frequency part of the data. The resulting corrected images are compared to the opposite half-map. Since high-frequency noise from a particle in half-set A should not correlate with high-frequency signal in half-map B, we can trust the high-frequency correlations from this comparison. This measurement is the cross-validation score, and a more negative number indicates better agreement between the two half-sets.
The set of hyperparameters that yields the lowest total cross-validation score is deemed the best. Said another way, the trajectory which correlates best with the opposite half map is considered best.
The following hyperparameter search method is designed to avoid poor selections on a wide range of test datasets.
The hyperparameter search is done in a cylindrical coordinate system of the 3-dimensional hyperparameter space. The hyperparameters consist of the total prior strength r, the acceleration/spatial prior balance Θ, and the spatial correlation distance z. At the start of the search, a number of “rays” (each of which has a fixed z and Θ) are created at predetermined positions. During each iteration, the search proceeds outwards along each ray. Once the hyperparameters determined by a ray result in no particle motion, that ray is retired, since further increasing the prior strength will have no effect.
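The ray search can be sketched as below. Several details are assumptions: the mapping from (r, Θ, z) to individual prior strengths (here spatial = r cos Θ and acceleration = r sin Θ), the `evaluate` callback that returns a cross-validation score and a trajectory activity, and the tolerance used to decide that a ray produces no motion. The toy evaluation function stands in for the real trajectory estimation.

```python
import math


def ray_to_hyperparameters(r, theta, z):
    """Assumed mapping from cylindrical coordinates to the three hyperparameters."""
    return {
        "spatial_strength": r * math.cos(theta),
        "acceleration_strength": r * math.sin(theta),
        "correlation_distance": z,
    }


def search_along_rays(rays, radii, evaluate, activity_tol=1e-6):
    """Walk outwards along each (theta, z) ray, retiring rays with no motion."""
    active = list(rays)
    best_score, best_hp = math.inf, None
    for r in radii:  # radii must be increasing
        survivors = []
        for theta, z in active:
            hp = ray_to_hyperparameters(r, theta, z)
            score, activity = evaluate(hp)
            if score < best_score:  # lower (more negative) is better
                best_score, best_hp = score, hp
            if activity > activity_tol:
                survivors.append((theta, z))  # still moving, keep searching
        active = survivors
        if not active:
            break
    return best_score, best_hp


calls = []


def toy_evaluate(hp):
    """Toy stand-in: best score at total strength 2, no motion at strength 3."""
    total = math.hypot(hp["spatial_strength"], hp["acceleration_strength"])
    calls.append(total)
    return (total - 2.0) ** 2 - 10.0, max(0.0, 3.0 - total)


rays = [(math.pi / 6, 500.0), (math.pi / 3, 1000.0)]
best_score, best_hp = search_along_rays(rays, [1.0, 2.0, 3.0, 4.0], toy_evaluate)
print(best_score, best_hp)
```

In the toy run both rays are retired at r = 3, so r = 4 is never evaluated. This is the saving that ray retirement provides.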
In our testing an improvement in FSC resolution of about 0.2 Å is common.
In the following images, Reference Based Motion Correction is compared against Patch Motion Correction on EMPIAR-10061 (beta-galactosidase). The blue mesh is the result from Patch Motion Correction and the red mesh is the result from Reference Based Motion Correction. The model (PDB 6DRV) is for illustrative purposes only and has not been refined against the improved map.
In the following images of the same maps, gray is the result from Patch Motion, while cyan is the result from Reference Based Motion Correction.
Finally, for this dataset, the overlaid FSC curves show that within reference-based motion correction, the trajectory optimization and empirical dose weighting both contribute to the improvement in resolution, with dose weighting providing a slightly larger part of the improvement. This finding means that for subsequent similar dataset collections, a sizeable improvement could be had by reusing the dose weights estimated from this data in a new patch motion correction job.
The following plots compare the FSC curves from Patch Motion Correction versus reference-based motion correction on the heterogeneous EMPIAR-10261 (Nav1.7 ion channel) dataset. In this case, two volumes were connected as input as both the open and closed conformations of the channel are present in the data. The results show that resolutions for both classes improved.
Jasenko Zivanov, Takanori Nakane and Sjors H. W. Scheres (2019). A Bayesian approach to beam-induced motion correction in cryo-EM single-particle analysis. IUCrJ, 6, 5-17.
Timothy Grant and Nikolaus Grigorieff (2015). Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6. eLife 4:e06980.