Expectation Maximization in Cryo-EM
An overview of the expectation maximization algorithm.
In statistical estimation, the Expectation Maximization algorithm (hereafter E-M) is a broadly useful method for finding the most likely values of unknown or unobserved quantities given a set of observed data points. E-M is used in a wide range of fields for a variety of purposes, and an in-depth explanation of the theory behind it is beyond the scope of this guide. The goal of this page is to explain the basic concepts of E-M as it pertains to estimation of 2D and 3D reconstructions from cryo-EM data. A basic understanding will make it easier to understand what CryoSPARC jobs are doing, when you might want to change the values of various parameters, and the useful ranges of parameter values.
For cryo-EM data processing, the E-M algorithm is used whenever there is:
an unknown, unobserved, but important quantity that we wish to estimate (such as a 2D class average or a 3D density map),
a set of observed data that are related to the unknown quantity (such as particle images), and
for each observed datapoint, one or more "nuisance" quantities that are unobserved and unknown and are not themselves important (such as the pose of the target molecule in each particle image).
This guide explains E-M in the context of 2D classification, which is the simplest scenario where E-M is applied. The concepts explained herein can also be helpful in understanding how 3D refinement and 3D classification work.
To simplify discussion of E-M, we will first define a few terms.
You’re likely already familiar with the concept of probability. For example, if you’re asked how often a fair coin comes up heads or tails, you’d say 50% of the time (because the probability of a fair coin coming up heads is 0.5).
For the purposes of understanding cryo-EM reconstruction, you can consider the terms probability and likelihood to be interchangeable. We will often consider the likelihood of an unknown variable taking on a particular value, given the data we have observed. This is known as a conditional likelihood.
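To make the idea of a conditional likelihood concrete, here is a minimal Python sketch using the coin example above. The observed counts (7 heads in 10 flips) are made up for illustration and are unrelated to anything CryoSPARC computes:

```python
from math import comb

def likelihood(p, heads=7, flips=10):
    """Conditional likelihood that the coin's unknown heads-probability is `p`,
    given that we observed 7 heads out of 10 flips (made-up data)."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Given these observations, p = 0.7 is a more likely value of the
# unknown variable than p = 0.5.
print(likelihood(0.7))  # ~0.267
print(likelihood(0.5))  # ~0.117
```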
The goal of most cryo-EM methods is finding the reference which produced a given set of particle images. In 2D Classification the reference would be the 2D class averages, since we assume that all particles are images of the class averages. In 3D refinements the reference would be the 3D volume, since we assume that all particle images are projections of a 3D volume.
The combination of rotation and translation which describe the relationship between the reference and the particle image is known as that image’s pose. In two dimensions, the pose comprises a single rotation and two translations (X and Y). In three dimensions, there are still only two translations (since the Z translation is essentially modeled by the defocus), but three rotations.
The 3D pose consists of the three rotations and two translations which make the projection of the reference match a given particle image.
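As a rough illustration only (real packages, including CryoSPARC, compute projections differently and in Fourier space), applying a 3D pose to a toy NumPy volume might look like the sketch below. The function name and the ZYZ Euler-angle convention are arbitrary choices for this example:

```python
import numpy as np
from scipy.ndimage import affine_transform, shift
from scipy.spatial.transform import Rotation

def project_at_pose(volume, euler_deg, dx, dy):
    """Apply a 3D pose (three rotations + two translations) to a cubic volume
    and return the resulting 2D projection."""
    rot = Rotation.from_euler("ZYZ", euler_deg, degrees=True).as_matrix()
    center = (np.array(volume.shape) - 1) / 2
    # affine_transform maps output coordinates to input coordinates,
    # so pass the inverse (transpose) of the rotation matrix.
    rotated = affine_transform(volume, rot.T, offset=center - rot.T @ center, order=1)
    projection = rotated.sum(axis=0)              # integrate along the beam direction
    return shift(projection, (dy, dx), order=1)   # in-plane X/Y translation
```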
If we knew each particle’s pose with perfect accuracy, we could recreate a 2D class average simply by rotating and translating each particle belonging to that class as described by its pose and averaging the results. Unfortunately, we do not know each particle’s pose. This is the missing, unobserved data. We therefore aim to estimate the most likely class average using our incomplete knowledge of the system, and we will use E-M to do so.
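If the poses really were known, that averaging could be sketched in a few lines of Python. This is a toy real-space version; the pose convention (rotation angle in degrees plus X/Y shift in pixels) is an assumption made for this example:

```python
import numpy as np
from scipy.ndimage import rotate, shift

def average_with_known_poses(particles, poses):
    """Given each particle's exact pose (theta, dx, dy), undo the pose for
    every particle and average the aligned images into a class average."""
    aligned = []
    for image, (theta, dx, dy) in zip(particles, poses):
        centered = shift(image, (-dy, -dx), order=1)                       # undo the translation
        aligned.append(rotate(centered, -theta, reshape=False, order=1))   # undo the rotation
    return np.mean(aligned, axis=0)
```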
E-M involves the repeated application of two steps: the expectation step and the maximization step.
In the expectation step, we use our current estimate of the reference to update the particles’ poses. In the case of 2D Classification, this means comparing the particle image to the current reference and choosing the pose which minimizes the discrepancy between the reference and the image.
In the maximization step, we use the new poses found in the expectation step to update the class average. In the case of a 2D Classification, this means taking all of the particle images belonging to a class, rotating and translating them according to their pose, and averaging them together. The maximization step is also often referred to as backprojection or reconstruction.
A single application of these two steps will not find the optimal class average. The poses found during the expectation step likely have some amount of error because they were aligned to a poor initial class average. Likewise, the new class average created by the maximization step was made with imperfect poses and so has some amount of blurring and error. To improve the poses and the class averages, these two steps are iterated. The expectation step is repeated with the new class average, yielding improved pose estimates. These new estimates are used to produce a new class average, which can in turn be used to create a third set of even better pose estimates, etc. This forms the E-M algorithm.
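A deliberately simplified sketch of this loop is shown below, with a single class, in-plane rotations only, no translations or noise weighting, and a fixed number of iterations. It is not how CryoSPARC is implemented, but it captures the alternation between the two steps:

```python
import numpy as np
from scipy.ndimage import rotate

def em_2d(particles, n_iter=20, angles=np.arange(0.0, 360.0, 5.0)):
    """Alternate expectation (re-pick each particle's best in-plane rotation)
    and maximization (re-average the aligned particles) for n_iter iterations."""
    reference = np.mean(particles, axis=0)  # crude initial class average
    for _ in range(n_iter):
        # Expectation step: choose the rotation minimizing the squared error
        # between the rotated particle and the current reference.
        best = []
        for image in particles:
            errors = [np.sum((rotate(image, a, reshape=False, order=1) - reference) ** 2)
                      for a in angles]
            best.append(angles[int(np.argmin(errors))])
        # Maximization step: rebuild the class average from the new poses.
        reference = np.mean([rotate(img, a, reshape=False, order=1)
                             for img, a in zip(particles, best)], axis=0)
    return reference, best
```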
A useful property of the E-M algorithm is that it is guaranteed to converge after a sufficient number of iterations. Moreover, it will converge to the most likely estimate of the unknown variables (the class averages and the poses) if the initial guesses were sufficiently close to correct. There is no universally applicable way to know when the E-M algorithm has converged. In the context of cryo-EM, the steps are often either repeated a pre-defined number of times, or repeated until some metric (like the GSFSC resolution) stops improving.
During the expectation step, we check how well the image matches the reference from each possible pose. Up to this point, we have been using the single most likely pose as our best guess. Instead, we can assign a likelihood to each pose based on how well the image matches from that pose. Then, during the maximization step, we can “smear” the particle across every pose, weighted by how likely that pose is. This process of “smearing” is called marginalization.
For a concrete example, consider the following particle image and class average. When the particle image and class average are both noiseless (top) we can assign its pose with a high degree of confidence — it would be surprising if we were wrong. However, when the class average and the particle image are noisy (bottom), we are less confident in our assignment.
When we assess poses, we measure the error between the particle image in each candidate pose and the reference (in this case, the reference is the class average). The pose with the notch at the bottom has essentially no error, while the other poses disagree with the reference to varying degrees. The notch-down pose would be used with a weight of 1.0 in backprojection. More abstractly, we could think of that particle image as contributing all of its “signal mass” to the class average in that pose.
In the noisy case, each of the three poses have some degree of error due to noise in both the image and the reference. Despite the fact that the pose with the notch at the bottom is still the best, we may decide we do not want to discard these other poses which aren’t quite as good, since we cannot be sure they are wrong. We would then marginalize over pose by including this particle in the average in each of the three poses, weighted by the likelihood we calculated for that pose. In this case, perhaps we’d assign a weight of 0.75 to the first pose and 0.125 to each of the other two. The particle would then contribute 75% of its “signal mass” to the average in the first pose and 12.5% in each of the other two poses.
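In code, turning per-pose errors into marginalization weights can be as simple as the sketch below. The error values are made up to roughly reproduce the 0.75 / 0.125 / 0.125 split described above:

```python
import numpy as np

# Hypothetical squared errors (in units of the noise variance) for the three
# candidate poses of one noisy particle. Lower error means a better match.
errors = np.array([1.0, 4.6, 4.6])

# Convert errors to relative likelihoods and normalize so the weights sum to 1.
weights = np.exp(-0.5 * errors)
weights /= weights.sum()

# During backprojection the particle contributes weights[i] of its "signal
# mass" in pose i, rather than everything in the single best pose.
print(weights)  # approximately [0.75, 0.12, 0.12]
```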
Marginalization is the general process of allowing a particle to contribute information while accounting for uncertainty in a variable. For 2D classification, for example, those variables are pose and class. The choice to marginalize can be made separately for each variable. For instance, the Hard classify for last iteration parameter turns off marginalization over class for the final iteration by forcing each particle to contribute to only one class. This case, in which particles contribute only in their single most likely assignment, is called maximization instead. For a final example, allowing particles to contribute only to their best class, but in all poses weighted by the probability of that pose, would be maximizing over class but marginalizing over pose.
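A small sketch of these different choices, using a made-up 2-class-by-3-pose likelihood table for one particle:

```python
import numpy as np

# Hypothetical (unnormalized) likelihoods for one particle:
# rows are classes, columns are poses.
likelihood = np.array([[0.30, 0.20, 0.10],
                       [0.25, 0.10, 0.05]])

# Marginalize over both class and pose: contribute everywhere with these weights.
full_marginal = likelihood / likelihood.sum()

# Maximize over class, marginalize over pose: keep only the most likely class,
# then spread the particle across that class's poses.
best_class = int(np.argmax(likelihood.sum(axis=1)))
pose_weights = likelihood[best_class] / likelihood[best_class].sum()

# Maximize over both: contribute only at the single most likely (class, pose).
best_class_and_pose = np.unravel_index(np.argmax(likelihood), likelihood.shape)
```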
The probability of a pose (equivalently, the error between the particle image and the class average) is a function of three things:
The pose itself, i.e., intrinsic error. There is, ultimately, only one correct pose.
Noise in the particle images. Noisy images make the correct pose seem worse, since the image is no longer a perfect match for the projected volume. Noisy images may also make wrong poses seem better, if by random chance the noise improves agreement between a wrong pose and the reference.
Noise/errors in the reference. For example, in the first iteration of 2D Classification, the class averages are initialized essentially randomly. There is not a correct pose in this case — the class averages are fundamentally wrong. On the other hand, in the final iteration of a 2D Classification, the averages are likely quite good, so one particular pose should be better than the rest.
Unfortunately, we have no way of knowing how much of the error in a particular pose is due to each of the three possible sources, but we only want to discard a pose for having a high intrinsic error.
To disentangle the three sources of error, we use a noise model. The noise model tells us how reliable our current poses, averages, and images are at each frequency. We can calculate a noise model directly from the images using the current best poses and class averages.
By incorporating the noise model into the error calculation, we become more or less confident in the quality of a pose, depending on the level of noise.
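A minimal sketch of this idea, assuming a simple radially-averaged noise model estimated from already-aligned particles. The function names here are invented for illustration and are unrelated to CryoSPARC’s internals:

```python
import numpy as np

def radial_shells(shape):
    """Integer radial-frequency shell index for every Fourier coefficient."""
    ny, nx = shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    return np.round(np.hypot(fy, fx) * max(ny, nx)).astype(int)

def estimate_noise_model(aligned_particles, reference):
    """Average residual power in each frequency shell, using the current best
    poses (particles are assumed already aligned) and the class average."""
    shells = radial_shells(reference.shape)
    power = np.zeros(shells.max() + 1)
    counts = np.zeros_like(power)
    for image in aligned_particles:
        residual_power = np.abs(np.fft.fft2(image - reference)) ** 2
        np.add.at(power, shells, residual_power)
        np.add.at(counts, shells, 1)
    return power / np.maximum(counts, 1)

def noise_weighted_error(image, reference, sigma2):
    """Squared error between an image and the reference, down-weighting
    frequencies where the noise model says the data are unreliable."""
    shells = radial_shells(reference.shape)
    diff_power = np.abs(np.fft.fft2(image - reference)) ** 2
    return float(np.sum(diff_power / sigma2[shells]))
```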
Thus far in this page, we have focused on 2D classification and not 3D refinement. This is solely because it is easier to think about and display 2D operations — the same algorithms and principles apply in the 3D context. For instance, in a Homogeneous or Non-Uniform Refinement, the reference is now a volume and the pose now has three rotations, but the overall iterative process remains the same.
Just as in the two-dimensional case, this iterative process improves the quality and accuracy of both the volume and the particles’ pose estimates. Eventually, the process converges — the GSFSC resolution stops improving and the particles’ poses stop changing.
In this animation, each iteration of a Non-Uniform Refinement is shown. The map used for alignment at each iteration is shown on the left. The poses of a random selection of five particles are plotted on the right. In the first iteration, the particles’ poses change by a lot (note the large movement of the yellow triangle). This is because the volume also improves significantly between the first two iterations, so the pose estimates become much more accurate. Subsequent iterations fine-tune the poses, but none of the particles move as dramatically after the first.
E-M is used to find the most likely pose and class for each particle. Pose marginalization is controlled by Force max over poses/shifts, while class marginalization can be turned off for the final Maximization step by turning on Hard classify for last iteration.
E-M is used to find the most likely pose for each particle. Pose marginalization is controlled by the Adaptive Marginalization parameter. Adaptive Marginalization is off by default in Homogeneous Refinement, but on by default in Non-Uniform Refinement.
E-M is used to find the most likely pose and class for each particle. Class marginalization is controlled by the Force hard classification parameter (off by default). Poses are always maximized.
Poses and class probabilities are fixed in this job, but class membership is marginalized using existing probabilities unless Force hard classification is turned on.
E-M is used to find the most likely class for each particle (poses are fixed). Class membership is marginalized unless Force hard classification is turned on. Class similarity provides an additional control over how particles are marginalized over class membership.
E-M is used to find the most likely pose for each particle. Poses are marginalized if Marginalization is turned on. Additionally, the likelihood of each pose can be fine-tuned with a prior probability if Use pose/shift gaussian prior during alignment is turned on.
E-M is used to find the pose for each image. No marginalization is performed.