Case Study: Pseudosymmetry in TRPV5 and Calmodulin (EMPIAR-10256)
Processing EMPIAR-10256 with a focus on handling pseudosymmetry and 3D Classification parameter choices.
This case study focuses on breaking TRPV5/CaM’s pseudosymmetry using two approaches:
Symmetry relaxation, in which symmetry-related poses are specifically checked at each iteration of a refinement
3D Classification and subsequent realignment of the particles based on their CaM position.
This case study uses the pseudosymmetric TRPV5 channel to explore the following workflow:
Refinement-based symmetry breaking
C1 symmetric “naive” refinement
Refinement using C4 symmetry relaxation
Classification-based symmetry breaking
3D Classification
Map alignment
C1 Local Refinement
3D Classification parameter investigation
Solvent and focus masks
Filter resolution and hard classification
Class number
This case study processes TRPV5 particle images from EMPIAR-10256 (Dang et al., 2019).
TRPV5 (Figure 1) is a member of the Transient Receptor Potential (TRP) family of ion channels. Unlike many other TRP channels, TRPV5 is highly selective for calcium. This selectivity is, in part, regulated by the ubiquitous calcium-binding protein calmodulin (CaM).
TRPV5 has a transmembrane domain (TMD) and a C-terminal domain (CTD). TRPV5 is a homotetramer, meaning the channel has C4 symmetry around an axis passing through the pore of the channel. However, only a single CaM binds the channel’s CTD, which breaks the C4 symmetry, making the overall symmetry of the complex C1. Note that a single TRPV5 tetramer has four binding sites where CaM could bind, but CaM’s size means that almost all particles will have only one CaM within the CTD cavity.
In the early stages of processing, the C4 symmetry of the channel (which is much larger than the CaM) may dominate particle alignments, or the user may enforce C4 symmetry without realizing that the overall complex is not actually C4 symmetric. In either case, the particles will be aligned with CaM randomly positioned in one of the four symmetry-related positions in each particle. This will cause blurring of the CaM density into a C4-symmetric blob with no interpretable features. This situation, where a symmetry-breaking feature is present in an otherwise symmetric molecule, is commonly called “pseudosymmetry”.
This data is available as a set of 66,071 particle images and a STAR file containing the poses refined by the authors. You must download these images to a filesystem accessible to your CryoSPARC installation. For example:
The resulting directory should have this structure:
Particle meta path
path/to/rawdata/EMPIAR/10256/particles_cs5040.star
Replace with the correct path to the particles_cs5040.star file
Particle data path
path/to/rawdata/EMPIAR/10256/Micrographs
Replace with the correct path to the directory which contains the .mrcs files
Spherical Aberration (mm)
2.7
In the particles star file, most of the particles have a spherical aberration (Cs) of 2.6. However, the paper does not mention why this is, and some particles have a Cs of 2.7 (as expected for a Titan Krios). We therefore set the Cs to 2.7 for all particles.
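To check how the Cs values are distributed before overriding them, one could tally the relevant column of the STAR file. The sketch below is a minimal pure-Python reader, assuming the file uses the RELION column label `_rlnSphericalAberration` and whitespace-separated rows; the function name and the three example rows are made up for illustration and are not taken from the real particles_cs5040.star.

```python
from collections import Counter

def count_column_values(star_text, label):
    """Count the values found in column `label` of any loop_ table in a STAR file."""
    columns = []        # column labels of the table currently being read
    in_header = False   # True while we are still reading "_label" lines after loop_
    counts = Counter()
    for raw in star_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            columns, in_header = [], True
            continue
        if line.startswith("data_"):
            columns, in_header = [], False
            continue
        if line.startswith("_"):
            if in_header:
                columns.append(line.split()[0])
            continue
        # The first non-label line after the header starts the data rows.
        in_header = False
        if label in columns:
            fields = line.split()
            index = columns.index(label)
            if index < len(fields):
                counts[fields[index]] += 1
    return counts

# Hypothetical three-particle example (not real data).
example = """
data_particles

loop_
_rlnImageName #1
_rlnSphericalAberration #2
000001@Micrographs/a.mrcs 2.6
000002@Micrographs/a.mrcs 2.6
000003@Micrographs/b.mrcs 2.7
"""
cs_counts = count_column_values(example, "_rlnSphericalAberration")
print(dict(cs_counts))
```

Running the same function on the text of the real STAR file would show how many particles carry each Cs value.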
Input: Particle stacks
Imported particles from the Import Particle Stack job.
This map of TRPV5 (Figure 2) produced using Homogeneous Reconstruct Only has a GSFSC resolution of 3.25 Å without imposing symmetry (i.e., using the default C1 symmetry). The TMD and CTD are both visible. If we view the channel along the symmetry axis (Figure 3), there’s a floating blob of density that does not seem connected to the rest of the channel. When we view a lower contour, it’s clear that there’s some large object in the center of the CTD.
This must be calmodulin (CaM). Because TRPV5 itself is C4 symmetric and dominates particle alignments, each particle has CaM aligned in one of the four possible binding positions, so when the particles are backprojected to create the 3D map shown here, the signal for CaM is spread out over each of the four symmetry related positions. This blurs the density for CaM (which should be C1 symmetric) among the four positions, creating a nonsensical four-fold symmetric density in the middle of the ion channel (Figure 4). Note that C4 symmetry is not explicitly applied in this reconstruction; it is the incorrect alignment of particles amongst the four symmetry-related poses that causes this effect.
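The blurring described above can be reproduced with a small 2D toy model. The sketch below is only an illustration with made-up shapes (a four-armed "channel" and a square "CaM" blob); it averages copies of one asymmetric image after rotating each by a random multiple of 90°, which mimics particles landing in arbitrary symmetry-related poses.

```python
import numpy as np

size = 64
rng = np.random.default_rng(0)

# A C4-symmetric "channel": one arm copied into all four 90-degree rotations.
arm = np.zeros((size, size))
arm[8:12, 20:44] = 1.0
channel = sum(np.rot90(arm, k) for k in range(4))

# A single off-axis blob standing in for CaM.
cam = np.zeros((size, size))
cam[24:30, 24:30] = 1.0
particle = channel + cam

# Each particle ends up in one of the four symmetry-related poses at random,
# because the channel looks identical in all of them.
n_particles = 1000
average = np.zeros((size, size))
for k in rng.integers(0, 4, size=n_particles):
    average += np.rot90(particle, int(k))
average /= n_particles

# Mean density at the center of each of the four possible CaM positions.
cam_sites = [average[26, 26], average[36, 26], average[36, 36], average[26, 36]]
print(np.round(cam_sites, 2))
```

The channel arms keep their full density in the average, while the blob is split into four copies of roughly one quarter strength each, even though no symmetry was imposed during averaging.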
In this case, we might say there is a “symmetry mismatch” between TRPV5 and CaM, or that the TRPV5/CaM complex has “pseudosymmetry”. In the rest of this case study, we will focus on different strategies for recovering the correct C1 symmetric map of the CaM/TRPV5 complex from the starting C4 map.
In general, the larger an asymmetric feature is, the easier it will be to break pseudosymmetry. Indeed, for this dataset, using an Ab-Initio Reconstruction job (where symmetry is not enforced) followed by a C1 Non-Uniform Refinement can produce a map with only one copy of CaM.
We chose this dataset as a teaching example because CaM makes it visually obvious when techniques have or have not worked. On other, more challenging datasets, the most effective jobs and the best parameter values for those jobs will depend on the size of the symmetry-breaking feature.
The first strategy we’ll try is breaking the symmetry through re-refinement of particle poses. Taking a closer look at our current map (Figure 5), we can see that the map is not perfectly symmetrical. Some positions for CaM have more map density than others, likely because a slightly greater number of particles have CaM in that position than the others.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Volume
Volume from the Homogeneous Reconstruct Only job
This map (Figure 6) still has density in all four CaM positions, but it looks better than the input map, with the positions in the left and right of this image clearly having more density than the others. If we download the results of each iteration, we can see that the map is clearly improving as the job progresses.
It’s possible that this process would continue if more iterations were performed. Indeed, if you run the above job, your job might have a different number of iterations: by default, Non-Uniform Refinement and other global refinements stop once the GSFSC resolution stops improving (Figure 7). However, in this case, we’re evaluating the quality of these maps by how well CaM is resolved, not by their GSFSC. We therefore may need to add some additional iterations after the GSFSC resolution stops improving. We can do this with the Number of extra final passes parameter in a second Non-Uniform Refinement job.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Volume
Volume from the Homogeneous Reconstruct Only job
Number of extra final passes
30
There’s not currently a good way to know ahead of time how many additional iterations might be required to resolve issues like this. Since jobs can be terminated early but not continued once they’re complete, we generally recommend setting this value to a higher number and terminating early if the volume seems to have converged.
When the refinement gets 30 extra passes, the CaM density in the top and bottom positions has completely disappeared, but the left and right positions look almost identical (Figure 8).
Inspecting the map from each iteration (Figure 9), we see that the initial iterations produce a high-quality map of TRPV5, and the later iterations slowly align CaM molecules to the left and right positions, reducing density in the top and bottom positions.
Although this TRPV5/CaM map is significantly more interpretable than the original C4 map, there are two major problems with this approach:
We aren’t taking advantage of the fact that we know CaM is in one of four positions — we’re relying on random chance to pull the CaM molecules to the same position. Because of this, many extra iterations are needed even for this incomplete alignment. The job with the default number of iterations took 16 minutes to run, whereas the second job with extra iterations took 87 minutes and still did not properly align every CaM molecule.
It’s difficult to know how many particles have been aligned correctly. In the image above it looks like there is only CaM density in the left and right positions, but at a lower contour there is still some density visible in the bottom position as well. With a smaller asymmetric domain this problem would be even more significant, as misalignment would be less obvious.
We can at least partially alleviate these problems by taking advantage of the C4 symmetry using Symmetry Relaxation.
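Symmetry relaxation is an extension of the normal refinement algorithm which excels at solving structures of pseudosymmetric particles or particles with symmetry mismatch. Refinements with symmetry relaxation start the same way as a typical C1 refinement. Particles are aligned to an asymmetric reference, and their best pose is found. In a normal refinement, this would be the particle’s final pose, and the next iteration would begin. With symmetry relaxation, the poses related to this best pose by the specified symmetry (for C4, rotations of 90°, 180°, and 270° about the symmetry axis) are also checked before the next iteration begins, so a particle whose CaM sits in a different symmetry-related position than in the reference can be moved to the pose that matches.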
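As a toy illustration of checking symmetry-related poses, the sketch below scores a noisy 2D "particle" against a weakly asymmetric reference in each of the four C4-related orientations and keeps the best one. This is a simplified stand-in written for this example, not CryoSPARC's implementation; the function names, image shapes, noise level, and reference weights are all assumptions.

```python
import numpy as np

def c4_related_scores(particle, reference):
    """Correlation of the particle with the reference in each of the four C4-related poses."""
    return np.array([np.sum(np.rot90(particle, -k) * reference) for k in range(4)])

def relax_c4_maximization(particle, reference):
    """Rotate the particle into the best-scoring of the four symmetry-related poses."""
    best_k = int(np.argmax(c4_related_scores(particle, reference)))
    return np.rot90(particle, -best_k), best_k

rng = np.random.default_rng(0)
size = 32
arm = np.zeros((size, size))
arm[4:6, 10:22] = 1.0
channel = sum(np.rot90(arm, k) for k in range(4))   # C4-symmetric part
cam = np.zeros((size, size))
cam[11:15, 11:15] = 1.0                             # symmetry-breaking part
truth = channel + cam

# Reference with CaM density in all four sites, slightly stronger in one of them.
site_weights = [0.4, 0.2, 0.2, 0.2]
reference = channel + sum(w * np.rot90(cam, k) for k, w in enumerate(site_weights))

correct = []
for _ in range(200):
    k_true = int(rng.integers(0, 4))
    particle = np.rot90(truth, k_true) + rng.normal(0.0, 0.5, truth.shape)
    _, k_best = relax_c4_maximization(particle, reference)
    correct.append(k_best == k_true)
fraction_correct = float(np.mean(correct))
print(fraction_correct)
```

Because the symmetric channel contributes equally to all four scores, the choice between the related poses is decided only by the symmetry-breaking density, which is why even a small imbalance in the reference is enough to pull particles toward one site.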
We can set up a Non-Uniform Refinement with C4 symmetry relaxation and see how the results compare to a normal C1 refinement.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Volume
Volume from the Homogeneous Reconstruct Only job
Symmetry
C4
Even though we know the complex is truly C1 symmetric, we provide the order of the pseudosymmetry to inform the refinement of the relevant symmetry-related poses to check.
Symmetry relaxation method
maximization
Setting this to Maximization or Marginalization turns on symmetry relaxation, as opposed to enforced C4 symmetry. See the guide page for an explanation of these two modes.
Number of extra final passes
20
Just as with the Non-Uniform Refinement above, we need extra final passes since the map will continue to improve even as the GSFSC stays constant.
This job performs better than Non-Uniform Refinement without symmetry relaxation, but it still fails to break out of the two-CaM local minimum. One possible reason for this is that there are high-resolution features (for instance, in the TRPV5 CTD) that align better when the CaM is in the wrong position. In effect, the alignment is getting “distracted” by the movement of individual helices when we’d prefer that it aligns the large CaM density. We can control this behavior using the Maximum align resolution (A) parameter.
In all refinements, information only up to a certain resolution is used when aligning the 3D map to particle images in each iteration. Usually, this resolution is determined by the GSFSC. However, in some cases (like this one), results may be better if we set a custom maximum alignment resolution. We know the feature we are trying to align is still visible at 10 Å (and lower than that!), so using only information up to 10 Å may help the refinement align the particles using information we care more about, and ignore high resolution details that may be distracting. We can set up another Non-Uniform Refinement using this parameter to test this hypothesis.
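To get a feel for what "using only information up to 10 Å" means, the sketch below applies a hard low-pass cutoff to a volume in Fourier space. It is an illustration only: the pixel size is a hypothetical 1 Å, the test volume is synthetic, and a real refinement would not necessarily use a sharp cutoff like this one.

```python
import numpy as np

def lowpass(volume, pixel_size_A, resolution_A):
    """Zero every Fourier component finer than `resolution_A` (hard cutoff)."""
    freqs = [np.fft.fftfreq(n, d=pixel_size_A) for n in volume.shape]
    grids = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))   # spatial frequency in 1/A
    keep = radius <= 1.0 / resolution_A
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * keep))

# Synthetic volume: a 16 A wave (coarse feature) plus a 4 A wave (fine detail).
x = np.arange(32)
coarse = np.cos(2 * np.pi * x / 16)[None, None, :] * np.ones((32, 32, 1))
fine = np.cos(2 * np.pi * x / 4)[None, None, :] * np.ones((32, 32, 1))

filtered = lowpass(coarse + fine, pixel_size_A=1.0, resolution_A=10.0)
print(np.abs(filtered - coarse).max())
```

With a 10 Å cutoff the 16 Å wave passes through unchanged and the 4 Å wave is removed, which is the behavior we want when a large feature like CaM should drive the alignment.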
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Volume
Volume from the Homogeneous Reconstruct Only job
Symmetry
C4
Symmetry relaxation method
maximization
Number of extra final passes
20
Maximum align resolution (A)
10
Other values around 10 would likely produce similar results. The important consideration is that we want this value to be low enough that irrelevant signals are ignored, but high enough that the CaM is still visible to the alignment.
This map is lower-resolution than the previous result because it was not aligned at a high resolution. However, the CaM is markedly improved, with the second position being significantly weaker than the primary position.
Compare the CTD across the iterations of this job (Figure 11) to the CTD across the iterations of the previous job (Figure 10). In Figure 10, the CTD is essentially unchanging across the later iterations, whereas in Figure 11 the CTD undergoes minute conformational changes as the CaM molecules consolidate into a single position. This supports our hypothesis that the high-resolution features (the CTD) were dominating the alignment, and that ignoring them using the Maximum align resolution (A) parameter improved the CaM alignment.
One additional optimization we could make is minimizing over per-particle scale during the refinement. The per-particle scale is an adjustment factor used to account for contrast differences between the volume and the particle images. This is typically thought to relate to differences in ice thickness between the particles, which would in turn change their contrast. However, it is also useful in down-weighting particles which don’t match the volume for other reasons. We set up another Non-Uniform Refinement the same as the last one, but with Minimize over per-particle scale turned on.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Volume
Volume from the Homogeneous Reconstruct Only job
Symmetry
C4
Symmetry relaxation method
maximization
Number of extra final passes
20
Maximum align resolution (A)
10
Minimize over per-particle scale
True
Generally, per-particle scales should only be refined when the entire map is being aligned (i.e., when the alignment mask covers the entire target).
Minimizing over per-particle scale improves the map again, with the second CaM molecule almost entirely disappearing. In fact, the second CaM density is still getting weaker during the final iterations; it may be that adding a few more iterations would finish the process and produce a map with only a single CaM molecule. Indeed, setting Number of extra final passes to 30 produces a map with only a single CaM molecule (Figure 13).
It is not straightforward to determine why per-particle scale helps at this stage. It may be that particles with the CaM in the second position (left in Figure 12) have a slightly lower per-particle scale. In the next iteration, these particles will contribute slightly less to the map, further weakening the secondary CaM position.
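The hypothesis above can be illustrated with a least-squares scale factor. The sketch assumes the scale for a particle is the value a that minimizes ||image - a * projection||², which has the closed form a = ⟨image, projection⟩ / ⟨projection, projection⟩; whether CryoSPARC computes its per-particle scale exactly this way is not stated in this case study. The six-pixel "images" are made up.

```python
import numpy as np

def optimal_scale(image, projection):
    """Scale a that minimizes ||image - a * projection||^2 (closed-form least squares)."""
    return float(np.sum(image * projection) / np.sum(projection ** 2))

# Toy six-pixel "images": four channel pixels and two possible CaM pixels.
channel = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
cam_in_reference_site = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])
cam_in_other_site = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

projection = channel + cam_in_reference_site   # what the current map predicts
true_contrast = 0.8                            # same ice thickness for both particles

scale_matching = optimal_scale(true_contrast * (channel + cam_in_reference_site), projection)
scale_mismatched = optimal_scale(true_contrast * (channel + cam_in_other_site), projection)
print(scale_matching, scale_mismatched)
```

Both toy particles have the same true contrast, but the one whose CaM is in a different site than the reference receives a lower fitted scale, so it would contribute less to the next reconstruction.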
Input: Particles
Particles from the single-CaM Non-Uniform Refinement
Input: Volume
Volume from the single-CaM Non-Uniform Refinement
Input: Mask
Mask from the single-CaM Non-Uniform Refinement
The mask generated by Non-Uniform Refinement is typically sufficient for good results in cases like this. Making your own mask which excludes the nanodisc might produce slightly better results.
Use pose/shift gaussian prior during alignment
True
Re-center rotations each iteration?
True
Re-center shifts each iteration?
True
Initial lowpass resolution (A)
5
Generally, this parameter can be set to a few Å worse than (i.e., higher value than) the GSFSC of the consensus refinement. Higher values would likely work just as well.
Minimize over per-particle scale
True
You only want to minimize the per-particle scale when you’re refining the whole target. In this job we are refining the whole ion channel, so it can be turned on.
The Local Refinement produces a high-resolution map (Figure 14, 3.3 Å) of TRPV5 and CaM. This two-step procedure, in which we first align the particles using only low-resolution information, then locally refine the poses, is a useful technique for a wide variety of pseudosymmetric samples. However, it generally works best when the symmetry-breaking feature is large. In cases where the symmetry-breaking feature is small, or when the initial map is highly symmetric, a classification-based workflow may perform better.
In each of the refinement-based solutions above, we attempted to iteratively align all of the particles to a single reference. However, the reference starts out being nearly C4 symmetric, so there is little difference between correct alignment of the CaM molecule and the other three symmetry related poses, relative to the reference. We may get better results if we first classify the particles into different classes based on how CaM is currently aligned, then align each of the resulting volumes such that the CaM density is in the same place (Figure 15).
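The "align each of the resulting volumes" step can be sketched for the idealized case in which the class volumes differ only by a multiple of 90° about the symmetry axis. The code below is a toy under that assumption, with synthetic volumes indexed as [z, y, x]; a general map alignment would search over arbitrary rotations and shifts rather than four fixed ones.

```python
import numpy as np

def best_c4_rotation(volume, reference):
    """Number of 90-degree turns about z that best overlays `volume` on `reference`."""
    scores = [np.sum(np.rot90(volume, k, axes=(1, 2)) * reference) for k in range(4)]
    return int(np.argmax(scores))

size = 16
arm = np.zeros((size, size, size))
arm[:, 2:4, 5:11] = 1.0
channel = sum(np.rot90(arm, k, axes=(1, 2)) for k in range(4))   # C4-symmetric part
cam = np.zeros((size, size, size))
cam[4:8, 4:7, 4:7] = 1.0                                         # single CaM
truth = channel + cam

# Four idealized classes, each with CaM in a different symmetry-related site.
class_volumes = [np.rot90(truth, j, axes=(1, 2)) for j in range(4)]
reference = class_volumes[0]

rotations = [best_c4_rotation(v, reference) for v in class_volumes]
aligned = [np.rot90(v, k, axes=(1, 2)) for v, k in zip(class_volumes, rotations)]
print(rotations)
```

Applying each class's rotation to the poses of the particles in that class would then place every particle's CaM in the same site as the reference class.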
When setting up any 3D Classification job, the most important considerations are
the number of classes
the focus and solvent masks
the filter resolution
In this case, the number of classes is obvious — there are four symmetry related positions, so we should ask for four classes. If we later suspect there may be some particles with more than one (or no) CaM molecules, we may come back and try 3D Classification jobs with more classes.
3D Classification can use two masks: a solvent mask and a focus mask (Figure 16). The solvent mask defines the regions in which density from the target molecule could exist. Outside the solvent mask, all classes are held to be zero. The focus mask defines the region in which we expect the classes to differ. Inside this mask, density is allowed to be different in each class. Density outside this mask is forced to be the same across all classes.
For a specific class, the density inside the focus mask is produced by backprojecting the particles in that specific class, just like in a refinement. Thus, each class’s final density is composed of the shared density inside the solvent mask plus the class’s unique density inside the focus mask. These density volumes still match the experimental images well (because they contain density for all parts of the molecule) but the job will not try to classify particles based on features we’re not interested in (i.e., features outside the focus mask).
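The way the two masks combine can be written as a small formula: inside the focus mask use the class's own density, elsewhere inside the solvent mask use the shared density, and outside the solvent mask use zero. The sketch below expresses this with masks whose values lie between 0 and 1; it is an interpretation of the description above with made-up one-dimensional "volumes", not code from the job itself.

```python
import numpy as np

def compose_class_volume(shared, class_specific, solvent_mask, focus_mask):
    """Per-class density inside the focus mask, shared density elsewhere, zero outside solvent."""
    inside_focus = focus_mask * class_specific
    outside_focus = (1.0 - focus_mask) * shared
    return solvent_mask * (inside_focus + outside_focus)

shared = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # density common to all classes
class_specific = np.array([9.0, 9.0, 9.0, 9.0, 9.0])  # density from this class's particles
solvent_mask = np.array([0.0, 1.0, 1.0, 1.0, 1.0])    # where the molecule may exist
focus_mask = np.array([0.0, 0.0, 1.0, 1.0, 0.0])      # where classes may differ

volume = compose_class_volume(shared, class_specific, solvent_mask, focus_mask)
print(volume)
```

Only the two voxels inside the focus mask take the class's own values; the rest of the molecule is identical across classes, so it cannot drive the classification.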
The filter resolution sets a lowpass filter applied to the volumes during classification. This filter should generally be set to the lowest resolution for which the difference we’re interested in is still visible. In essence, this helps the 3D Classification job “ignore” differences which don’t matter (for example, the movement of individual helices in the CTD) and focus on the domain we’re interested in.
To start with, let’s create a 3D Classification job with a solvent mask (dark blue, Figure 17) around the entire ion channel and a focus mask (yellow) covering the four CaM positions.
We cover the entire protein and nanodisc with the solvent mask. If the solvent mask covered only the four CaM positions, the majority of the channel would be missing from the volume. These volumes would not match the images well (which still have the whole channel), and so classification may suffer.
We cover only the four CaM positions with the focus mask. This lets classes differ from each other only in the region where CaM binds. If the focus mask covered the whole channel, the classification may focus on uninteresting differences in the rest of the channel.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Solvent mask
Solvent mask (dark blue mesh in Figure 17)
Input: Focus mask
Focus mask (yellow mesh in Figure 17)
Number of classes
4
Note that 3D Classification does not require input volumes. It will make 4 reconstructions using a small number of particles to seed the classes.
Filter resolution (Å)
4
This resolution might be too high (i.e., too low a numeric value). Remember, generally the filter resolution should be set to the lowest resolution at which you can still see the difference between the classes.
Init structure lowpass resolution (Å)
7
Generally, the initial structure lowpass resolution can be set a few Å greater than the filter resolution. Other values are likely acceptable as well.
While the job runs, we can inspect a few plots to monitor its progress. The per-particle class ESS (Figure 18) tells us, essentially, how many classes each particle is assigned to. We see this job is assigning most particles to only a single class (i.e., their ESS is close to 1.0). Generally this is a good sign, as it means most particles can be assigned to a class with high confidence.
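One common definition of effective sample size for a discrete probability distribution is the inverse of the sum of squared probabilities. The sketch below assumes that definition for the per-particle class ESS; the case study does not give the exact formula used by the job, so treat this as an aid to reading the plot rather than a specification.

```python
import numpy as np

def class_ess(posteriors):
    """Effective number of classes per particle: 1 / sum_k p_k^2."""
    p = np.asarray(posteriors, dtype=float)
    p = p / p.sum(axis=-1, keepdims=True)   # make sure each row sums to 1
    return 1.0 / np.sum(p ** 2, axis=-1)

posteriors = [
    [1.00, 0.00, 0.00, 0.00],   # confidently assigned to a single class
    [0.50, 0.50, 0.00, 0.00],   # split evenly between two classes
    [0.25, 0.25, 0.25, 0.25],   # no preference at all
]
ess = class_ess(posteriors)
print(ess)
```

Under this definition an ESS of 1 means a confident assignment to one class, and an ESS equal to the number of classes means the particle is spread evenly across all of them.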
Next, we can monitor the real-space difference from consensus (Figure 19). In this plot, red indicates a greater density than the consensus map while blue indicates weaker density. The consensus map has equal density in each of the four CaM positions. Thus, if classification is successful, we’d see a single CaM position turn red (higher density than the consensus) and the others turn blue (lower density than the consensus).
It’s clear here that each class has stronger density for one CaM and weaker density for the others. Moreover, each class seems to have more density in a distinct CaM position, so we have likely caught all four possible orientations of the particle!
Once the job finishes, we can download the volume series and see that we’ve successfully separated out the four orientations (Figure 20).
Input: Reference map
Class 0 from the previous 3D Classification
The choice of class here is arbitrary.
Input: Maps to align (volumes group)
All volumes output from the previous 3D Classification
Input: Particles (all)
All particles output from the previous 3D Classification
This input will not appear until you turn on Update particle alignments.
Update particle alignments
True
Without this option, the particles input will not appear and the job will only align the volumes.
Input: Particle stacks
All particles output from Align 3D Maps
Input: Initial volume
Class 0 from the previous 3D Classification
This class must be the same one selected as the reference volume in Align 3D Maps.
Input: Static mask
A mask covering the entire channel and the single CaM position
Use pose/shift gaussian prior during alignment
True
Standard deviation (deg) of prior over rotation
5
The choice here is somewhat arbitrary; many other values would likely produce good results. We know that the particles should start fairly well aligned, so we can use a small search prior and extent.
Standard deviation (A) of prior over shifts
2
Re-center rotations each iteration?
True
Re-center shifts each iteration?
True
Initial lowpass resolution (A)
6
Generally, this parameter can be set to a few Å worse than (i.e., higher value than) the GSFSC of the consensus refinement. Higher values would likely work just as well.
Minimize over per-particle scale
True
Typically, this is left turned off for Local Refinements, since you only want to minimize the per-particle scale when you’re refining the whole complex. In this case, we are refining the whole complex, and some particles may have a scale that is too low if their CaM was initially misaligned.
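To see what a 5° standard deviation implies for the pose prior configured above, the sketch below evaluates the log of an unnormalized zero-mean Gaussian at several rotation offsets. How the job combines this prior with the image data is not described in this case study, so the function is only a way to build intuition for the parameter.

```python
import numpy as np

def gaussian_log_prior(offset, sigma):
    """Log of an unnormalized zero-mean Gaussian prior evaluated at `offset`."""
    return -0.5 * (np.asarray(offset, dtype=float) / sigma) ** 2

rotation_offsets_deg = np.array([0.0, 5.0, 10.0, 20.0])
log_prior = gaussian_log_prior(rotation_offsets_deg, sigma=5.0)
print(log_prior)
```

The penalty grows with the square of the offset, so a pose 20° from the starting orientation is penalized sixteen times more heavily (in log terms) than one 5° away. That is appropriate here because the particles are expected to start close to their correct poses.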
While the map of CaM is obviously significantly improved in the C1 case, the surrounding TRPV5 density looks similar in the two maps (Figure 23). There is perhaps slightly more noise in the C4 map from unmodeled CaM density, but enforcing C4 symmetry yields slightly more defined side chains in the channel. A good model of TRPV5 (not including CaM) could be built into either map.
Because CaM is so large, the first parameters we chose were able to adequately separate the particles. This is rarely the case! It often takes testing several combinations of parameters over many different jobs to produce good results. In this section we walk through several other settings which produce worse results in this case, but may be useful for your own studies later on.
As mentioned above, 3D Classification uses two masks (Figure 16). The solvent mask exists to ensure that the parts of the volume not being classified still match the particle images, but you could run a classification job providing only the focus mask. In this case, the focus mask is also used as the solvent mask, so the classes are zero outside the focus mask and differ per-class inside it. This is essentially how masks operate in other jobs which only accept one mask.
We can assess the importance of using a separate solvent and focus mask by creating a job which is identical to the 3D Classification job above, but providing the focus mask to the solvent mask slot.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Solvent mask
Focus mask (yellow mesh in Figure 17)
If we only provide a focus mask, a solvent mask will be automatically generated from the structure and cover the entire ion channel. To use the focus mask as the only mask we must explicitly provide it to the solvent mask slot.
Input: Focus mask
We could provide the focus mask here too but it is unnecessary and will produce the same result as providing it only to the solvent mask slot.
Number of classes
4
Filter resolution (Å)
4
Init structure lowpass resolution (Å)
7
The classes from this job (Figure 24) have much greater density for other CaM positions than when we used both masks (Figure 20). This may be due to a mismatch between the volume used for classification (which only has the central portion of the channel) and the particle images (which still have the entire ion channel) when large portions of the target are outside the solvent mask (and therefore set to 0).
Next, we can investigate the effect of including only a solvent mask by creating another copy of the first 3D Classification job with only the solvent mask. Note that this is nearly equivalent to providing no masks at all, because 3D Classification will create a solvent mask from the consensus refinement if none are provided.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Solvent mask
Solvent mask (dark blue mesh in Figure 17)
Input: Focus mask
Number of classes
4
Filter resolution (Å)
4
Init structure lowpass resolution (Å)
7
Only classes 1 and 2 look anything like TRPV5, and class 1 has very poor resolution (Figure 25). This is because, of the input 66k particles:
65k are in class 2,
1k are in class 1,
20 are in class 3,
and 10 are in class 0.
In other words, particles weren’t classified at all! This result is sometimes called “class collapse” and can be caused by a variety of factors. Generally, the following adjustments to a 3D Classification job may help prevent class collapse:
Turning off hard classification, if it was on. This helps by allowing good particles to “blur” across classes, reducing the likelihood that one class gets all of the particles.
Increasing the Filter resolution (Å) parameter (i.e., to a higher numeric value and “lower” resolution). This helps by reducing the amount of high-frequency noise in the volumes. High-frequency noise in one class may cause particles to accumulate in another.
Using both a focus and solvent mask. Using both mask types creates a region of the volume which is the same in all volumes, preventing one map from becoming better than the others on a global scale.
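The class populations reported above make the collapse easy to quantify. The sketch below computes the fraction of particles in each class and flags the run as collapsed when one class holds more than a chosen share; the 90% threshold and the function names are arbitrary choices for illustration, and the counts are the rounded values quoted in the text.

```python
def class_fractions(counts):
    """Fraction of all particles assigned to each class."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def is_collapsed(counts, threshold=0.9):
    """True if a single class holds more than `threshold` of all particles."""
    return max(class_fractions(counts).values()) > threshold

# Approximate per-class particle counts from the solvent-mask-only job.
collapsed_counts = {0: 10, 1: 1000, 2: 65000, 3: 20}
# A hypothetical well-balanced four-class result for comparison.
balanced_counts = {0: 16000, 1: 17000, 2: 16500, 3: 16500}

fractions = class_fractions(collapsed_counts)
print({cls: round(f, 4) for cls, f in fractions.items()})
print(is_collapsed(collapsed_counts), is_collapsed(balanced_counts))
```

For a four-way symmetry-breaking problem like this one, a successful classification would be expected to place roughly a quarter of the particles in each class.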
Filter resolution sets the resolution to which volumes are filtered during classification. In other words, 3D Classification ignores all frequencies beyond the filter resolution value. In this way, it is similar to the Maximum align resolution parameter in Non-Uniform or Homogeneous Refinement.
Generally, it's best to set the Filter resolution to the coarsest resolution at which you can still see the feature of interest. For example, consider the single-CaM Local Refinement map filtered to various resolutions (Figure 26).
At 4 Å, alpha helices are still clearly visible in the CTD, along with some density for bulky side chains. This may bias the classification to focus on finer details, disregarding the movement of CaM entirely. At the other end of the scale, the 20 Å map has lost the distinction between the CaM in the top-right and the lower-right CTD, so it may be difficult to tell the difference between a particle with CaM in the top right and one with CaM in the bottom right. With this information, we can set up a 3D Classification job with the filter resolution set much coarser than 4 Å (i.e., to a larger numeric value).
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Solvent mask
Solvent mask (dark blue mesh in Figure 17)
Input: Focus mask
Focus mask (yellow mesh in Figure 17)
Number of classes
4
Filter resolution (Å)
12
The exact choice of resolution here is not critical. Values from 10 to 14 Å would likely produce comparable results.
Init structure lowpass resolution (Å)
14
Generally, the initial structure lowpass resolution can be set a few Å greater than the filter resolution. Other values are likely acceptable as well.
Increasing the filter resolution to 12 Å seems to have slightly degraded performance, as most of the volumes have a small amount of density in secondary CaM positions (Figure 27). However, it is somewhat challenging to assess these volumes due to their low resolution. When you download volumes from 3D Classification, these volumes:
Are filtered to the resolution set by the Filter resolution (Å) parameter
Are not “hard classified” (unless Force hard classification is turned on). Put another way, particles contribute to each volume weighted by the probability that the particle belongs to that class.
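The difference between probability-weighted and hard-classified volumes can be shown with a toy weighted average. In the sketch below each "particle" is a two-element vector marking which CaM site it occupies, and each class "volume" is the weighted mean of those vectors; the posteriors are invented, and the averaging stands in for a real reconstruction.

```python
import numpy as np

def class_weights(posteriors, hard=False):
    """Per-particle, per-class weights: the posteriors themselves, or one-hot if `hard`."""
    p = np.asarray(posteriors, dtype=float)
    if not hard:
        return p
    w = np.zeros_like(p)
    w[np.arange(p.shape[0]), p.argmax(axis=1)] = 1.0
    return w

def weighted_class_averages(signals, weights):
    """Weighted mean of the particle signals for each class (toy 'reconstruction')."""
    return (weights.T @ signals) / weights.sum(axis=0)[:, None]

# Three toy particles: two with CaM in site 0, one with CaM in site 1.
signals = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Invented class posteriors; every particle's most likely class is the correct one.
posteriors = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])

soft_volumes = weighted_class_averages(signals, class_weights(posteriors))
hard_volumes = weighted_class_averages(signals, class_weights(posteriors, hard=True))
print(np.round(soft_volumes, 3))
print(hard_volumes)
```

Even though every particle's most probable class is correct, the probability-weighted volumes still show density in the other site, whereas the hard-classified volumes do not. This is why downloaded soft-weighted volumes can look worse than the underlying particle assignments.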
Input: Particles (all classes)
All particles from the 3D Classification with Filter Resolution 12
Input: Static mask
Solvent mask from the 3D Classification with Filter Resolution 12
Force hard classification
True
Turning on hard classification here lets us assess classes based on volumes produced only using particles from that class. Since each class’s particle output always contains only the particles assigned to that class, volumes produced with Force hard classification turned on will more accurately reflect the class composition.
The reconstructed maps (Figure 28) certainly show some remaining CaM in secondary positions. Compared to the classification performed with a 4 Å filter resolution (Figure 20), this is a worse result.
To resolve this problem, we could try a different Filter resolution value. However, sometimes turning on Force hard classification during the 3D Classification job improves results without other changes. To test this, we can create a clone of the 12 Å job and simply turn on Force hard classification.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Solvent mask
Solvent mask (dark blue mesh in Figure 17)
Input: Focus mask
Focus mask (yellow mesh in Figure 17)
Number of classes
4
Force hard classification
True
Filter resolution (Å)
12
Init structure lowpass resolution (Å)
14
Turning on hard classification improved results, with all four classes having reduced CaM density in secondary positions (Figure 29). This may be because, when particles were forced to contribute density only to the class they matched best, the classes rapidly lost CaM in secondary positions.
Of course, the original 4 Å filter resolution job performed this well (if not better) with hard classification off. Perhaps turning on hard classification in the original job would further improve results. We can test this with another 3D Classification job.
Input: Particles
Particles from the Homogeneous Reconstruct Only job
Input: Solvent mask
Solvent mask (dark blue mesh in Figure 17)
Input: Focus mask
Focus mask (yellow mesh in Figure 17)
Number of classes
4
Force hard classification
True
Filter resolution (Å)
4
Init structure lowpass resolution (Å)
7
Surprisingly, the results of this classification are far worse than the original 4 Å filtered classification and both of the 12 Å classifications (Figure 30)! The most likely explanation is that, in early iterations, particles match high-resolution features in a class which has CaM in a different position. If hard classification is off, such a particle can still contribute some weight to the class with CaM in the right position, and will contribute less to the class it is assigned to. Compared to a hard-classified job, this weakens the wrong CaM and strengthens the right CaM. This effect compounds until the particle is moved to the correct class.
This interplay between hard classification and filter resolution is an important observation. In general (but not always), classifications with a finer filter resolution (a “higher resolution”) will perform better with hard classification off. Conversely, classifications with a coarser filter resolution (a “lower resolution”) may benefit from turning hard classification on.
All of the 3D Classification jobs we’ve run so far have requested four classes. In this case, we know this is the correct number: there are four possible positions for CaM, so we need four classes. When you are classifying based on symmetry, you can rely on combinatorics in this way to calculate the theoretical number of classes. However, when working with more complex classification scenarios, the optimal number of classes is harder to determine.
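As a small worked example of this counting argument, the sketch below computes the number of ways to place a given number of ligands on equivalent binding sites. The function name is hypothetical, and the count treats every symmetry-related arrangement as a separate class, since those are exactly the states the classification has to separate.

```python
from math import comb

def n_occupancy_classes(n_sites, n_bound):
    """Number of distinct ways to place n_bound ligands on n_sites equivalent sites."""
    return comb(n_sites, n_bound)

# One CaM on a channel with four symmetry-related sites
one_cam_classes = n_occupancy_classes(4, 1)
```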
As an example, consider a job identical to the first 3D Classification, except that we request six classes instead of four:
Parameter | Value
Input: Particles | Particles from the Homogeneous Reconstruct Only job
Input: Solvent mask | Solvent mask (dark blue mesh in Figure 17)
Input: Focus mask | Focus mask (yellow mesh in Figure 17)
Number of classes | 6
Filter resolution (Å) | 4
Init structure lowpass resolution (Å) | 7
Among the six volumes produced by this job (Figure 31), there is at least one volume with CaM in each position. Class 0 has CaM in the top-left, classes 1 and 5 have CaM in the bottom-left, class 3 in the bottom-right, and classes 2 and 4 in the top-right.
Increasing the number of classes increased the density in secondary CaM positions in all of the classes compared to the four-class job. This effect is especially noticeable in classes 1 and 2, each of which shares its primary CaM position with a second class. Most likely, this is a result of particles with one CaM being incorrectly assigned to these classes.
However, we do not (and cannot) know for sure that there are no particles with a second CaM molecule. It may be that there are in fact twelve distinct classes with varying positions and numbers of CaM. Determining the exact true number of classes (or clusters) in a dataset is generally impossible with existing methods. You must bring in orthogonal data (for instance, mass spec data indicating that each channel has one or no CaM bound) and your own intuition when deciding whether the results of a classification make sense, or whether you ought to tweak some parameters.
Parameter | Value | Notes
Input: Particles (all classes) | All particles from the six-class 3D Classification |
Input: Static mask | Focus mask (yellow mesh in Figure 17) | We want to cluster the classes based on the position of CaM. We therefore use the focus mask, which excludes the rest of the channel.
Number of super classes | 4 | As with the number of classes in a 3D Classification job, there is not one right number of super classes. However, based on an inspection of the 3D Classes, you can generally decide on a reasonable range of values to try for this parameter.
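To make the idea of grouping classes inside a mask concrete, here is a minimal sketch of one way such a clustering could work. It is an assumption for illustration, not the algorithm CryoSPARC uses: class volumes are compared by Pearson correlation over the voxels inside a focus mask, and the most similar groups are merged (average linkage) until the requested number of superclasses remains. The toy volumes, mask, and function names are all hypothetical.

```python
import numpy as np

def masked_correlation(volumes, mask):
    """Pairwise Pearson correlation between volumes, using only voxels inside mask."""
    vecs = np.stack([v[mask] for v in volumes]).astype(float)
    return np.corrcoef(vecs)

def group_classes(volumes, mask, n_super):
    """Merge the most similar groups (average linkage) until n_super groups remain."""
    corr = masked_correlation(volumes, mask)
    groups = [[i] for i in range(len(volumes))]
    while len(groups) > n_super:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                sim = corr[np.ix_(groups[a], groups[b])].mean()
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return [sorted(g) for g in groups]

# Toy data: six noisy volumes with a blob in one of four positions.
# Classes 1 and 5 share a position, as do classes 2 and 4.
rng = np.random.default_rng(0)
positions = {0: (2, 2), 1: (5, 2), 2: (2, 5), 3: (5, 5), 4: (2, 5), 5: (5, 2)}
volumes = []
for k in range(6):
    vol = rng.normal(0.0, 0.05, size=(8, 8, 8))
    x, y = positions[k]
    vol[x - 1:x + 2, y - 1:y + 2, 3:6] += 1.0
    volumes.append(vol)

# Hypothetical focus mask: only the slab containing the blobs is compared
mask = np.zeros((8, 8, 8), dtype=bool)
mask[:, :, 3:6] = True

superclasses = group_classes(volumes, mask, n_super=4)
```

Restricting the comparison to the mask matters for the same reason it does in the job: outside the mask the volumes are nearly identical, so including those voxels would dilute the differences that define the superclasses.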
The superclasses (Figure 32) look similar to those of the original 3D Classification (Figure 20). From this point, the same process (aligning the classes and locally refining them) could be followed to produce a final map.
In this case study, we investigated two different means of properly aligning TRPV5 particles which have a C4 pseudosymmetry.
In the first, we relied on using a standard refinement to properly align particles to a single reference. The reference begins as a nearly-C4 symmetric map. The symmetry is not perfect because we do not impose symmetry, so slight fluctuations in the map due to noise exist. Importantly, these asymmetries will make some CaM positions (ideally one CaM position) stronger than the others. We next refine these particles against the map. As the iterations proceed, particles align such that their CaM molecule is in the slightly-stronger orientation (encouraged by symmetry relaxation). This proceeds until the map contains, ideally, one CaM molecule.
In the second strategy, we first classify the particles based on the orientation of CaM in the consensus refinement. The quality of this classification is sensitive to various parameters, especially the mask design, filter resolution, hard classification, and number of classes. Once the CaM orientations have been separated, they can be aligned to a single common orientation and locally refined to produce the final C1 symmetric map.
CaM is a relatively large symmetry-breaking feature. Other targets with smaller symmetry-breaking features will likely require very different parameter settings and mask design. This case study is meant to be a starting point for your own work on pseudosymmetric targets or 3D Classification, and to guide you in your thinking about which parameters to change and how to change them.
Dang, S. et al. Structural insight into TRPV5 channel function and modulation. Proceedings of the National Academy of Sciences 116, 8869–8878 (2019).
This study assumes you have the ability to view 3D volumes. CryoSPARC has a built-in volume viewer, but we recommend downloading and installing ChimeraX, as we refer to this program throughout the case study. ChimeraX is a powerful 3D visualization tool which can display and modify atomic models and cryo-EM maps (from CryoSPARC and elsewhere), prepare publication-quality images, and much more. In this tutorial, we use ChimeraX and not Chimera (without the X), which is an older program that is no longer under active development.
This study also assumes passing familiarity with viewing 3D volumes in your rendering software of choice. Throughout, terms like “contour up” and “contour down” refer to viewing the volume at a higher or lower isosurface threshold, respectively. The process of making masks is also not covered in detail here; a walkthrough for mask creation using ChimeraX is available separately.
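For readers new to isosurface thresholds, the following sketch shows what contouring up or down does numerically. It uses a synthetic Gaussian blob in place of a real map, and the threshold values are arbitrary: raising the level keeps only the strongest density, so the displayed surface encloses fewer voxels.

```python
import numpy as np

def enclosed_fraction(volume, level):
    """Fraction of voxels whose density is at or above the isosurface level."""
    return float(np.mean(np.asarray(volume) >= level))

# Synthetic Gaussian blob standing in for a cryo-EM map
grid = np.linspace(-1.0, 1.0, 32)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
volume = np.exp(-(x**2 + y**2 + z**2) / 0.1)

contoured_down = enclosed_fraction(volume, 0.2)  # lower threshold, larger surface
contoured_up = enclosed_fraction(volume, 0.6)    # higher threshold, smaller surface
```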
First, we create an Import Particles job. Checking the EMPIAR entry, we see that the metadata for particles used in the final reconstruction is in particles_cs5040.star. We can provide this file as the Particle meta path for the Import Particles job. We’ll provide the path to the Micrographs directory (where the particle images themselves are) as the Particle data path, which overrides whatever path is in the .star file.
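Conceptually, overriding the data path means that the directory recorded in each particle's image entry is discarded and only the file name is looked up in the directory you supply. The sketch below illustrates that idea for a RELION-style `index@path` entry. The function name, example entry, and directory are hypothetical, and CryoSPARC's actual path resolution may differ in its details.

```python
from pathlib import PurePosixPath

def rebase_image_name(image_name, data_dir):
    """Point an 'index@path' entry at data_dir, keeping the index and the file name."""
    index, sep, path = image_name.rpartition("@")
    new_path = PurePosixPath(data_dir) / PurePosixPath(path).name
    return f"{index}{sep}{new_path}"

# Hypothetical entry and directory, for illustration only
rebased = rebase_image_name(
    "000012@Extract/job042/mic_0001.mrcs",
    "/data/EMPIAR-10256/Micrographs",
)
```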
To assess the alignment of the imported particles, we can create a 3D map using a Homogeneous Reconstruction Only job, which uses the particles’ existing poses to produce a 3D volume.
If the authors had deposited movies or micrographs instead of refined particle stacks, we could have followed a standard preprocessing and particle picking/curation workflow to yield a particle stack similar to the one shown here.
We could try re-aligning particles to this map with no symmetry enforced. At each iteration, particles may be slightly more likely to align with their CaM in the position with the most density. This would increase the density of that position in the next iteration, which may pull more CaMs into that position, and so on. To test this, we can simply run a refinement job with C1 symmetry.
In a refinement with symmetry relaxation turned on, we additionally search all symmetry-related poses to see if they are a better match than the current best pose. In the C4 case, we check the four poses related by a 0, 90, 180, and 270° rotation about the Z axis. We pick the best of these poses (for each particle) for the next iteration. In this way, we produce a map which is still C1 symmetric, but for which we have explicitly found the best of the symmetry-related poses for each particle. More information about symmetry relaxation is available in the CryoSPARC documentation.
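The search over symmetry-related poses can be sketched in a few lines. This is a toy illustration under stated assumptions, not CryoSPARC's code: a pose is taken to be a 3×3 rotation matrix mapping volume coordinates into the particle's frame, so the symmetry operation is applied on the right, and the scoring function is a stand-in for the real comparison between a particle image and a projection of the reference. All names are hypothetical.

```python
import numpy as np

def rot_x(angle_deg):
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(angle_deg):
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

SYMMETRY_ANGLES = (0, 90, 180, 270)  # C4 about the Z axis

def c4_related_poses(pose):
    """The four poses related to `pose` by the C4 operations."""
    return [pose @ rot_z(a) for a in SYMMETRY_ANGLES]

def best_symmetry_pose(pose, score):
    """Keep whichever symmetry-related pose scores highest."""
    candidates = c4_related_poses(pose)
    scores = [score(p) for p in candidates]
    return candidates[int(np.argmax(scores))]

# Toy setup: a marker stands in for CaM, located off the symmetry axis.
marker = np.array([1.0, 0.0, 0.0])
current_pose = rot_x(40.0)               # aligned to the channel, CaM in the wrong site
true_pose = current_pose @ rot_z(180.0)  # the pose that puts CaM in the right site
observed = true_pose @ marker

def toy_score(pose):
    # Stand-in for an image-to-projection similarity score
    return float((pose @ marker) @ observed)

relaxed_pose = best_symmetry_pose(current_pose, toy_score)
```

Because only four candidates are compared per particle, the search is cheap, and the channel itself (which looks the same in all four candidates) cannot influence which one wins.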
In any case, we now have a C1-symmetric map with only one CaM molecule. However, the map is very low-resolution because we limited the maximum alignment resolution. We can now produce a high-resolution map using Local Refinement. We do not want to use a global refinement (like Non-Uniform Refinement) because some particles may return to an incorrect CaM position if we allow poses to be re-aligned from scratch.
To demonstrate this workflow, we return to the originally imported particles. Our imported particles are already well-aligned to the C4 symmetric channel, though they may have CaM in the wrong position. For the classification workflow, we will not re-align the particles. Instead, we classify them based on their CaM position. We should therefore use 3D Classification rather than Heterogeneous Refinement, which would also perform alignments while classifying.
With the classes separated in this way, we can use an Align 3D Maps job to put all four volumes into register. This job compares all of the input volumes to a reference volume and puts them in register (Figure 21). It will apply the same transformation to each volume’s associated particles. The output particles will therefore also be in the same alignment, even though we have not yet aligned the input particles themselves to a C1 reference.
Now that all of the particles have CaM in the same position, we can simply perform a Local Refinement to produce a high-resolution map, as before.
This local refinement (Figure 22) reached a GSFSC resolution of 3.13 Å and has clear density for only one CaM. The CaM molecule itself has clearly visible side chains. Compare this map with a similar Local Refinement of the initial, C4 symmetric map, which has a nominally better GSFSC resolution of 3.02 Å. Note that both of these maps have been sharpened with a B-factor of -100, unlike the other maps shown in this case study so far.
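To give a sense of what a sharpening B-factor of -100 does, the sketch below evaluates the amplitude scale factor under a commonly used convention, exp(-B / (4 d²)) at resolution d in Å. Whether the sharpening tool applies exactly this convention is an assumption here, and the function name is hypothetical.

```python
import math

def sharpening_gain(b_factor, resolution_angstrom):
    """Amplitude scale factor exp(-B / (4 d^2)) at resolution d (in Å)."""
    return math.exp(-b_factor / (4.0 * resolution_angstrom ** 2))

gain_at_10A = sharpening_gain(-100.0, 10.0)     # modest boost at low resolution
gain_at_3_13A = sharpening_gain(-100.0, 3.13)   # much larger boost near the GSFSC limit
```

A negative B-factor boosts high-resolution amplitudes far more than low-resolution ones, which is why sharpened maps show crisper side chains but also more noise.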
Of course, you will not know ahead of time what the map looks like when it is completely classified and aligned. Over time, you will develop intuition regarding what resolutions work well for different features of your target, but we recommend some general resolution ranges to try for various features in the job documentation and the tooltip for this parameter.
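When choosing a filter resolution, it can help to see how much of Fourier space a given value retains. The sketch below converts a resolution in Å to a shell radius in Fourier pixels using the standard relation radius = box size × pixel size / resolution. The box size and pixel size in the example are hypothetical, not those of this dataset.

```python
def fourier_shell_radius(box_size_px, pixel_size_angstrom, resolution_angstrom):
    """Radius, in Fourier pixels, of the shell corresponding to a resolution in Å."""
    return box_size_px * pixel_size_angstrom / resolution_angstrom

def nyquist_resolution(pixel_size_angstrom):
    """Finest resolution representable at a given pixel size."""
    return 2.0 * pixel_size_angstrom

# Hypothetical 256-pixel box at 1.0 Å/pixel
radius_12A = fourier_shell_radius(256, 1.0, 12.0)
radius_4A = fourier_shell_radius(256, 1.0, 4.0)
```

In this hypothetical box, a 4 Å filter keeps shells out to three times the radius of a 12 Å filter, so far more high-resolution signal (and noise) takes part in the classification.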
For these reasons, it is sometimes useful to run a reconstruction-only job after 3D Classification. This will produce volumes filtered based on the FSC, optionally with hard classification turned on (regardless of the setting during the 3D Classification job).
With only six classes, it was relatively straightforward to identify which classes should be combined for subsequent analysis. We identified four superclasses (groups of classes which we consider the same) based on the dominant CaM position. However, as class number increases, it can become infeasible to manually annotate superclasses. We can instead create a job to automatically cluster input classes into a user-specified number of superclasses.