Case Study: Yeast U4/U6.U5 tri-snRNP
Processing EMPIAR-10073 with a focus on Local Refinement.
Last updated
Overview
In this tutorial we will work step-by-step through an ideal use of Local Refinement. Although we will explain the motivation behind our choice of jobs and parameter settings, the main Local Refinement guide page is an excellent resource for explanations of the theoretical and practical meanings of the parameters.
The tri-snRNP complex is a core component of the spliceosome. It comprises four main domains: the body, head, arm, and foot. These domains are arranged in a triskelion-like shape, with the head, foot, and body radiating from the center and the arm extending past the distal end of the body.
For this tutorial we will use the clean particle set from EMPIAR-10073. This dataset was originally collected and processed by Nguyen et al.
Note that at each step your results may not look exactly like those in the guide due to randomness inherent in the alignment algorithms. As long as your map looks similar overall, and you see similar increases in quality in the focused regions, you are on the right track.
Before beginning this tutorial, you should create a new project and a workspace within that project. Download the particle stack to a location of your choosing. Our data is downloaded to a directory called rawdata in the project directory using the command:
Next, download the STAR file. This file has starting poses for the particle images, skipping the initial volume generation steps.
Finally, import the data using an Import Particles job. The Particle meta path should match the location of the STAR file, and the Particle data path should be the directory containing the downloaded .mrcs files.
To ensure that your particles were loaded correctly, plug the Imported particles output into the Particle stacks input of a Homogeneous Reconstruction Only job. This will use the poses from the STAR file and the imported particle images to build a 3D map, without performing any alignment.
At a low contour, all four domains are visible. However, at a higher contour the arm and head disappear entirely, and the quality of the foot also degrades.
In cases like this, where different regions of the target have dramatically different resolutions, Non-Uniform Refinement often performs exceptionally well compared to traditional Homogeneous Refinements. The poses in the STAR file were generated with homogeneous refinement, so the map may improve simply by performing a Non-Uniform Refinement instead. Plug the Homogeneous Reconstruction Only job’s particles and volume into a Non-Uniform Refinement job as inputs. Leave the mask blank to generate a dynamic mask. Leave all settings as default and launch the job.
Non-Uniform Refinement outperforms Homogeneous Refinement in cases like this because, in each iteration, the map is filtered based on its local quality rather than the global quality. For more information on this algorithm and how it is implemented in CryoSPARC, see the Non-Uniform Refinement page.
The Non-Uniform Refinement significantly improved the map, both as assessed by GSFSC resolution (4.17 Å → 3.55 Å) and by visual inspection, especially in the foot and body regions. However, the head and arm are still not visible at medium or high contours.
To summarize:
We can successfully align the particles to a consensus reconstruction with a nominal resolution of 3.5 Å
We can resolve the head and arm domains at a very low resolution, so we know they are present in the particles
When we increase the contour, the head and arm disappear, indicating they are poorly aligned
The reason the head is blurred is that the particles can only align to either the head or the body or the foot — there is no pose which will perfectly align every domain of the particle. Since the body is the largest domain, that region of the particle is preferentially aligned. Relative to the body, the foot moves the least, the arm the most, and the head somewhere in between. This is why the three domains are blurred, and why we can see more of the foot than the head or arm.
Local refinement solves this problem by creating a mask around a sub-volume of choice (for instance, the head). Using this mask eliminates the rest of the volume. When the search volume only contains the head, an image’s assigned pose will only improve when the head is well aligned. In other words, aligning the larger body/foot region will result in a poorer score.
In a global alignment, it’s possible that the head would be too small to align on its own, or the masked head-only volume might incorrectly align to the foot at low resolutions. Local refinement solves this problem by incorporating pre-existing knowledge about these particles. We know the approximate pose of the head in all of our images. We use Local Refinement to fine-tune it, while not allowing the head to move so far that it aligns to the wrong domain or to background noise.
Let’s proceed to a local refinement of the head domain. The first step in doing so is creating the mask we will use to select only that sub-volume.
Mask generation is a complicated skill that is essential for cryoEM image processing. For more information and guidance about making and using masks, see the Mask Creation guide page.
To generate our mask around the head, first load the Non-Uniform Refinement result volume into ChimeraX. To smooth the map and attenuate high-frequency noise in the head, apply a Gaussian filter to the map (replace #1 with the number for your input map):
Removing the high-frequency noise makes it much easier to see each domain and aids in building masks that are nice and smooth.
Next, we segment this map using Segger, which segments a volume using watershed segmentation. A GUI for the tool is opened with Tools > Volume Data > Segment Map. Click the “Shortcuts Options” dropdown to display buttons which run convenient commands. Select your Gaussian-filtered map in the “Segment map” dropdown and click the “Segment” button, leaving all other settings as default. The results should look something like this:
Segger has split the map into (in this case) 61 regions. You can build your mask by selectively hiding these regions, leaving only the part of the map that is to be included in the mask.
Control-clicking a region selects it
Control-shift-clicking a region adds it to the current selection
Clicking “Hide” hides a region without deleting or un-selecting it
Clicking “Show” shows a region
Clicking “Delete” deletes a region
Clicking “Ungroup” splits a region into smaller subregions
Clicking “Group” combines two or more regions into one larger region
There is no undo feature in Segger! We highly recommend that you click “Hide” before deleting a region. If more is hidden than you expect, you can click “Show” and ungroup the region before trying again. If you go straight to “Delete”, you’ll have to start over!
To make a mask around the head, hide all of the regions corresponding to the foot, body, and arm. The final Segger model looks like this, with the map displayed in grey:
Next, the Segger model must be converted into an .mrc file which CryoSPARC can read. To do this, first select the remaining regions by control-clicking and dragging over them. Then click File > Save selected regions to .mrc file… in the Segger panel. You can name this file anything you like. This saves an .mrc map file with only the selected regions included. However, it is the wrong box size!
In this guide we call the volume we just generated a “mask base” because we will use it to create a mask, but it has not yet been dilated and padded, and so should not be used as a mask in any refinements.
To make sure the mask base is on the same box as the input map, it must be resampled. Luckily, ChimeraX has a function to do this. In the example command below, #4 should be your mask base .mrc volume and #1 should be your original, unblurred volume. Change the numbers as necessary to match your work.
Now your mask base and volume are in the same box size! To upload the mask base, it must first be saved to the local computer. In the command below, we save the mask (#5, adjust as necessary) to the desktop with a filename indicating:
On what job is this mask based? Project 300 (P300), job 3 (J3)
Which regions of the map are included in this mask base? Head
Note that this naming convention is entirely optional; you can choose any name you like.
Regardless of the name, the mask base can now be uploaded to the compute system which runs CryoSPARC. In our case, mask bases are stored in separate directories per-target, but you can use any organizational scheme that helps you!
Back in the CryoSPARC UI, run an Import Volumes job to import the mask base you just uploaded. You can leave everything else as default, including that we are importing a map. Since the mask base has not yet been binarized and padded, we don’t want to accidentally use it where we need a mask!
Binarization is the process of converting a map (which has smoothly varying values ranging from, typically, 0.0 to 1.0) to a mask that has only 0.0 or 1.0. Padding is the process of adding a soft edge to the binary mask to reduce ringing artifacts.
The next step is dilating and padding the mask base to produce our final mask. Create a Volume Tools job, connect the imported volume as the Input Volume, and change the settings so that the threshold is 0.05, the Dilation radius is 5 pixels, and the Soft padding width is 17 pixels.
These settings will binarize our mask so that everything we included during segmentation (which all has a value greater than 0.05) is set to 1.0 and everything else is 0.0.
Then, the region set to 1.0 will be expanded outward by 5 pixels (7 Å). This setting is the Dilation radius. We dilate the mask to make sure that all of the information in the volume is covered by 1.0 once the alignment improves and the amount of the head we can see increases.
Finally, the mask is padded with a soft edge that is 17 pixels (23.8 Å) wide. This is the Soft padding width parameter. This is a bit wider than the minimum we recommend (in this case, 13 pixels), but it is generally better to start with too large of a soft edge and decrease it if alignments don’t improve.
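The three operations described above (binarize, dilate, soft-pad) can be sketched on a 1D density profile. This is only an illustration, not the Volume Tools implementation: the raised-cosine falloff, the function name, and the toy profile are all assumptions, and the 1.4 Å pixel size is inferred from the 5 px = 7 Å figure quoted above.

```python
import math

def make_soft_mask(profile, threshold, dilation_px, soft_px):
    """Build a 1D soft mask from a density profile (toy stand-in for a 3D map)."""
    # Step 1: binarize. Everything above the threshold is part of the mask core.
    core = [i for i, v in enumerate(profile) if v > threshold]
    mask = []
    for i in range(len(profile)):
        # Distance (in pixels) from this voxel to the nearest core voxel.
        dist = min((abs(i - j) for j in core), default=math.inf)
        if dist <= dilation_px:
            # Step 2: dilation. Voxels within the dilation radius are set to 1.0.
            mask.append(1.0)
        elif dist <= dilation_px + soft_px:
            # Step 3: soft padding. ASSUMED raised-cosine falloff from 1.0 to 0.0.
            t = (dist - dilation_px) / soft_px
            mask.append(0.5 * (1.0 + math.cos(math.pi * t)))
        else:
            mask.append(0.0)
    return mask

PIXEL_SIZE_A = 1.4  # Å per pixel, inferred from "5 pixels (7 Å)"

# Hypothetical density profile along a line through the mask base.
profile = [0.0] * 30 + [0.2, 0.6, 0.9, 0.6, 0.2] + [0.0] * 30
mask = make_soft_mask(profile, threshold=0.05, dilation_px=5, soft_px=17)

print(f"dilation: {5 * PIXEL_SIZE_A:.1f} Å, soft edge: {17 * PIXEL_SIZE_A:.1f} Å")
print([round(m, 2) for m in mask[30:60]])
```

The printed slice starts inside the mask at 1.0 and decays smoothly to 0.0, which is the behavior you should see when contouring the real mask down in ChimeraX.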
It is absolutely critical that any mask which is used to cut through map density has a soft edge. See the Mask Creation page for more discussion of artifacts caused by masks with a hard edge.
Launch the Volume Tools job. It should run relatively quickly. Once it is complete, download the result and open it in the same ChimeraX window as your map. It should cover the head but not the rest of the tri-snRNP. As you contour the mask down, it should slowly expand away from your selection.
Before creating our first Local Refinement job, we will cover a few commonly-changed parameters. A full discussion of all the settings is available in the main job page.
One of the major differences between Local Refinement and other types of refinement is that Local Refinement uses our existing knowledge about the particle poses, rather than starting from scratch. During each iteration, the refinement algorithm checks what the particle’s pose currently is. Then it checks the poses within a certain distance from that starting pose to see which one matches the volume best. The distance the algorithm checks is the Search Extent.
For example, we know the head is not currently well aligned, but it is also not totally out of alignment. In other words, we expect that there is a moderate amount of rotation and movement, so the algorithm should check poses that are a moderate distance away from the current pose. If, however, we were aligning the body (which is already quite well-aligned), we could reduce the search extents significantly.
Counterintuitively, if you are working with a small flexible domain (such as the arm), you may want to reduce the search extents even if you believe the domain is quite flexible. When the algorithm is aligning a small domain it doesn’t have much information to work with. Only letting it move a small distance from the initial alignment prevents it moving these small domains far away from the main bulk of the protein due to nearby noise.
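A toy 1D example can make the effect of the search extent concrete. This sketch is not how Local Refinement scores poses (the real job searches 3D rotations and 2D shifts with a noise-weighted score); the function names, the plain correlation score, and the image values are all made up for illustration.

```python
def score(image, template, shift):
    """Correlation of the template placed at `shift` within the image."""
    return sum(t * image[shift + k] for k, t in enumerate(template))

def local_search(image, template, current_shift, extent):
    """Return the best-scoring shift within +/- extent pixels of the current shift."""
    lo = max(0, current_shift - extent)
    hi = min(len(image) - len(template), current_shift + extent)
    return max(range(lo, hi + 1), key=lambda s: score(image, template, s))

template = [1.0, 2.0, 1.0]        # small masked domain
image = [0.0] * 40
image[12:15] = [1.0, 2.0, 1.0]    # true position of the domain: shift 12
image[30:33] = [1.5, 3.0, 1.5]    # brighter, unrelated density (decoy): shift 30

current = 10  # approximate pose carried over from the consensus refinement
print(local_search(image, template, current, extent=4))    # 12
print(local_search(image, template, current, extent=25))   # 30
```

With a small extent the search settles on the nearby true position. With a large extent the small template is captured by the brighter decoy, which is the failure mode described above for small domains.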
Consider the alignment algorithm again:
Mask out the subvolume
Search local translations and rotations for a better pose
Generate a new volume and repeat
It’s step 2 that’s important here: when the particle is rotated, what is it rotating around?
There’s no obvious best answer, so by default we rotate around the center of the mask. This works well when the mask covers a large proportion of the total volume.
The head domain does not cover a large proportion of the volume. In this case, it might be better to rotate the particles around the center of the box, since that’s closer to the real hinging motion we expect to see. Rotation around the box center tends to work better when the mask covers a moderately-sized proportion which rotates relative to the main volume.
But there’s no reason we have to pick one of these two points. Our intuition tells us that the head ought to rotate around some point on the head/body interface. In the next step you will pick a point on this interface and set that as the fulcrum.
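To see why the choice of rotation center matters, consider a 2D sketch of rotating a point about three candidate fulcrums. The coordinates below are hypothetical and chosen only for illustration; they are not measured from the tri-snRNP map.

```python
import math

def rotate_about(point, center, angle_deg):
    """Rotate a 2D point counter-clockwise about an arbitrary center."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

# Hypothetical layout in pixels (all three positions are assumptions).
box_center = (180.0, 180.0)
mask_center = (180.0, 260.0)   # center of the head mask
hinge = (180.0, 220.0)         # a point on the head/body interface

drifts = {}
for name, fulcrum in [("box center", box_center),
                      ("mask center", mask_center),
                      ("hinge", hinge)]:
    moved = rotate_about(hinge, fulcrum, 10.0)
    drifts[name] = math.dist(moved, hinge)
    print(f"{name:12s}: interface point moves {drifts[name]:.2f} px")
```

Rotating about the hinge leaves the interface point where it is, while a 10° rotation about either other center displaces it by about 7 px in this layout. That displacement would have to be compensated by a shift, which consumes part of the shift search extent.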
Look back at your mask base in ChimeraX. To set the fulcrum, CryoSPARC needs the coordinates (in pixels, counting from the corner of the box) of the point we want to rotate about. So we need to determine the coordinates of a position on the surface of the interface, which is the edge of our mask base.
To get these coordinates in ChimeraX:
Navigate to the “Markers” ribbon menu.
Click “Surface” in the “Place markers” group.
Orient the mask so that you can see the surface you want to place the fulcrum on.
Right-click the surface to place a marker.
Read the coordinates in Å from the log.
Divide the coordinates by the pixel size (which is available with the command info #1, replacing #1 with the correct number for your map) to get the pixel coordinates.
In this example, the marker was placed at (249.5, 250.1, 253.3) Å, so the fulcrum position will be (178.2, 178.6, 180.9) px.
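The division is easy to script. The helper below is a hypothetical convenience function, not part of CryoSPARC or ChimeraX; it assumes a pixel size of 1.4 Å, which is the value implied by the numbers in this example, and prints the result as a comma-separated string.

```python
def fulcrum_in_pixels(coords_angstrom, pixel_size_angstrom):
    """Convert marker coordinates in Å to a comma-separated pixel string."""
    px = [c / pixel_size_angstrom for c in coords_angstrom]
    return ",".join(f"{v:.1f}" for v in px)

marker = (249.5, 250.1, 253.3)          # Å, read from the ChimeraX log
print(fulcrum_in_pixels(marker, 1.4))   # 178.2,178.6,180.9
```

Always use the pixel size reported for your own map rather than the 1.4 Å assumed here.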
It is finally time to build the first Local Refinement job! Create a Local Refinement job and connect the particles and volume from the Non-Uniform Refinement to the correct slots. Then connect your mask to the static mask slot. Finally, set the fulcrum you calculated in the last step.
CryoSPARC expects the fulcrum in the form x,y,z, with no parentheses or spaces. For example, 178.2,178.6,180.9.
Leave all the other parameters set to their default and launch the job!
Local Refinement takes about as long as other refinement jobs, depending on the search extents and the quality of the initial alignment. Once the first iteration finishes, you will have access to several diagnostic plots. These plots are useful in assessing job progress and ensuring that parameters were set as expected.
The first three plots that Local Refinement shows you are slices through the real space of your map, the Fourier space of your map, and the real space of your mask.
These plots largely exist to give you a sense of how the refinement is progressing without having to download the map at each stage of the refinement. One annotation to note is the pair of white dotted lines in the map and mask slices, one vertical and one horizontal. The intersection of these lines shows you the fulcrum point. Note that the fulcrum looks like it’s positioned at the head/body interface as expected!
Plots of the Gold Standard Fourier Shell Correlation (GSFSC) demonstrate the correlation between the two independent half maps and determine the resolution to which we can trust our maps. It’s not uncommon for the unmasked GSFSC curve to be relatively poor during a Local Refinement. The alignment ignores everything outside the mask, which means the score of a given pose is not affected by even significant mismatches outside the mask.
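As a rough illustration of how a resolution number is read off an FSC curve, the sketch below finds where a curve first drops below the conventional 0.143 gold-standard criterion. The curve values are invented for the example, and the linear interpolation between shells is an assumption rather than a description of CryoSPARC's reporting.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Return the resolution (Å) where the FSC first drops below the threshold.

    freqs: spatial frequencies in 1/Å, increasing; fsc: correlation per shell.
    Uses linear interpolation between the two shells bracketing the crossing.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None  # the curve never crosses the threshold

# Hypothetical FSC curve (not taken from this dataset).
freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
fsc = [1.0, 0.98, 0.9, 0.6, 0.243, 0.043]

print(f"{resolution_at_threshold(freqs, fsc):.2f} Å")  # 3.64 Å
```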
A Guinier plot visualizes the contribution of a given resolution shell to the final map. As discussed in Rosenthal and Henderson, this plot is used to determine the optimal sharpening factor (the B-factor). The “sharp” map output has this B-factor applied to it, but you can always generate maps with other sharpening factors using the Sharpening Tools job.
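The idea behind reading a B-factor off a Guinier plot can be sketched with a straight-line fit. This assumes the common model in which amplitudes fall off as exp(-B / (4 d²)), so that ln(amplitude) plotted against 1/d² has slope -B/4. The data are synthetic, and the resolution range, weighting, and sign convention used by CryoSPARC are not specified here.

```python
import math

def fit_b_factor(resolutions_angstrom, amplitudes):
    """Least-squares fit of ln(amplitude) against 1/d^2; returns B in Å^2."""
    xs = [1.0 / d ** 2 for d in resolutions_angstrom]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -4.0 * slope

# Synthetic amplitudes generated with a known B of 100 Å^2 (illustrative only).
true_b = 100.0
resolutions = [10.0, 8.0, 6.0, 5.0, 4.5, 4.0]
amplitudes = [math.exp(-true_b / 4.0 / d ** 2) for d in resolutions]

print(round(fit_b_factor(resolutions, amplitudes), 3))  # 100.0
```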
The noise model is an important component of any cryoEM image processing algorithm. Briefly, the noise model is used to modulate the penalties associated with poor correlation in a frequency shell by the expected quality of signal in that frequency shell. Put another way, if a particular frequency is very noisy, it should not surprise us when the 3D model does not agree well with the images in that frequency.
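One common way to express this idea is a Gaussian noise model, in which the squared mismatch in each frequency shell is divided by twice that shell's noise variance. The sketch below uses that form as an assumption; it is not taken from the CryoSPARC source, and the per-shell numbers are invented.

```python
def shell_penalties(residual_power, noise_variance):
    """Per-shell penalty: squared residual divided by twice the noise variance."""
    return [r / (2.0 * s) for r, s in zip(residual_power, noise_variance)]

# Hypothetical values: the same mismatch in every shell, but different noise.
residual_power = [4.0, 4.0, 4.0]
noise_variance = [1.0, 2.0, 8.0]   # the last shell is the noisiest

penalties = shell_penalties(residual_power, noise_variance)
print(penalties)        # [2.0, 1.0, 0.25]
print(sum(penalties))   # 3.25
```

The same mismatch costs eight times less in the noisiest shell than in the cleanest one, so disagreement at noisy frequencies has little influence on the chosen pose.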
Noise models for cryoEM generally have a high peak at the low resolutions (left), rapidly drop in the moderate resolutions (middle), and steadily rise as the resolution increases.
The viewing direction and posterior precision distributions are used to determine whether a particle stack suffers from orientation bias. The direction distribution directly plots the number of particles with a given pose. The posterior precision distribution is a measure of how confident we are in the volume’s quality when viewed from each direction. As long as your lowest and highest values in this plot are within an order of magnitude, your dataset likely samples all orientations enough to avoid significant anisotropy.
These histograms display how much each particle moved during this iteration. In this first iteration, particles are moving a lot and, especially in the shifts, bumping up against our search extent. However, right now the map of this region is not very good (remember that the input map was lowpass filtered, so these particles are aligning to a 12 Å map). If we see large peaks at the edge of our search extents in late iterations, we will have to consider re-running the job with larger search extents.
Finally, per-particle scale is a way of accounting for the fact that different images will have different absolute contrast due to different ice thickness, defocus, etc. Generally one should only refine per-particle scale when looking at the entire volume, so we have not refined these values. Thus, all particles are still at 1.0.
Once the job has finished, look at the pose change histograms again. These look good: they are both smooth, and there are no large spikes at the edge of our search extents. There are some particles shifting all the way out to 10 pixels, so future similar jobs may benefit from increasing the shift search extent, but this looks fine for now! The subvolume is indeed quite flexible, with thousands of particles rotating 20° or more!
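If you have the per-particle pose changes available as a list of numbers, the "pile-up at the edge" check can be made quantitative. The helper and the shift values below are hypothetical; they only illustrate the idea of counting particles that end up close to the search extent.

```python
def fraction_near_extent(changes, extent, tolerance=0.05):
    """Fraction of particles whose pose change is within `tolerance` of the extent."""
    cutoff = extent * (1.0 - tolerance)
    return sum(1 for c in changes if abs(c) >= cutoff) / len(changes)

# Hypothetical per-particle shift changes in pixels.
shifts = [0.4, 1.2, 2.5, 3.1, 9.8, 10.0, 0.9, 4.4, 9.9, 1.7]

print(fraction_near_extent(shifts, extent=10.0))  # 0.3
```

A large fraction here corresponds to a spike in the last histogram bin and suggests re-running with a larger search extent.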
Next, take a look at the GSFSC curve:
It is not surprising that the unmasked FSC curve is poor, since we’re only aligning a small subvolume. It’s good that all three curves are smooth and decrease all the way to zero. The mask in this case may have been a little tight — the Corrected curve does not closely track the Tight curve until higher resolutions. However, it does “catch up” eventually, so these results are likely still trustworthy.
When performing Local Refinements, pay attention to the Corrected GSFSC curve. If it does not closely track the Tight curve, your mask may be too tight. See the Mask Creation page for more information.
Finally, download the sharp map. This map has the sharpening factor from the Guinier plot automatically applied and will give you a good sense of the quality of the alignment.
This is a dramatic improvement in how much of the head is visible. Note, though, that the maps of the body and foot are much worse — when we align the head, the head/body flexibility causes these domains to blur out instead!
The GSFSC resolution (4 Å) is already slightly improved over that of the published result (4.2 Å) for the head. However, there is another step which may further improve the results.
Recall that Local Refinement aligns a masked volume to the full particle images. In this case, the masked volume is just the head, while the images contain all four domains. Many of the particle images have the body, foot, or arm directly above or below the head. When the electron beam passes through, these other domains “cast a shadow” on the image of the head domain. This can hurt the alignment, since the masked volume does not have information from these domains but the images do.
To fix this problem, we can first project the volume of the foot, body and arm (but not head) for each image, and then subtract this projection from each image. This leaves images containing only the information from the head domain, the same as our masked volume.
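The subtraction itself is simple arithmetic, as the toy example below shows on a 2×2×2 volume. It deliberately ignores everything that makes the real job harder (particle poses, the CTF, noise, and scale), and the volumes and helper names are invented for illustration.

```python
def project_z(volume):
    """Sum a volume indexed [z][y][x] along z to produce a 2D projection."""
    ny, nx = len(volume[0]), len(volume[0][0])
    return [[sum(plane[y][x] for plane in volume) for x in range(nx)]
            for y in range(ny)]

def subtract(image, projection):
    """Pixel-wise difference between an image and a projection."""
    return [[i - p for i, p in zip(irow, prow)]
            for irow, prow in zip(image, projection)]

# "head" is the domain to keep; "rest" stands in for the foot, body, and arm.
head = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
rest = [[[0.0, 0.0], [0.0, 2.0]],
        [[3.0, 0.0], [0.0, 0.0]]]   # the 3.0 sits directly below the head along z

full = [[[h + r for h, r in zip(hrow, rrow)] for hrow, rrow in zip(hpl, rpl)]
        for hpl, rpl in zip(head, rest)]

image = project_z(full)                        # [[4.0, 0.0], [0.0, 2.0]]
subtracted = subtract(image, project_z(rest))
print(subtracted)                              # [[1.0, 0.0], [0.0, 0.0]]
```

After subtraction, the image matches the projection of the head alone, including the pixel where the other density overlapped it.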
The efficacy of this technique depends on the quality of the subtracted domains’ alignment. Subtracting a blurry or flexible domain from images will leave shadows and other artifacts, which wouldn’t improve the results. Ultimately, whether Particle Subtraction helps or hurts with a particular dataset is empirical: you won’t know until you try!
For this job, we will make a mask using the same process as for the head, except including everything other than the head. Be sure you build this mask using the Non-Uniform Refinement, since we need the body and foot to be well-aligned!
If in the future you find that you are often performing both Particle Subtraction and Local Refinement, going through the process of map segmentation twice can be irritating. To avoid this, after saving your first mask (of the region you want to keep), you can delete the regions used to create the mask. Finally, show all regions to bring back the hidden regions. You can then save these regions to make your mask for Particle Subtraction without having to re-select anything.
More tips on mask generation can be found in our guide page.
Plug the particles and volume from the Non-Uniform Refinement into a Particle Subtraction job along with the mask you just made. All default parameters are fine, so go ahead and launch the job!
Once the job completes, we recommend performing a Homogeneous Reconstruction Only job to ensure that the results are as expected. This step is optional, as you won’t use the resulting map for anything, but it only takes a few minutes to run and can save you a lot of time if you catch a failed subtraction before running a whole refinement! The reconstructed map from the subtracted particles in this example looks like this:
Clearly, the body and foot have been subtracted successfully. At lower contours there is still some remaining signal from the arm. This isn’t entirely surprising, since the arm’s alignment was bad to begin with. In any case, the arm is small, so we have successfully subtracted most of the volume that lies outside our mask.
Use these subtracted particles in a Local Refinement to see if you can improve the resulting map. Clone your previous local refinement and replace the particles with the new, subtracted particle stack. You should also slightly increase the shift search extent to 14 Å, since some particles were against the edge of the extent in the first refinement. Leave all other settings as you had them previously and launch the job.
With our masks, particle subtraction improved the GSFSC resolution by an additional 0.1 Å, which is within the realm of how much any two reconstructions might differ by chance. More importantly, signal subtraction kept the GSFSC curve higher in the middle resolutions, which has a significant impact on the overall quality of the map despite the similarity of the GSFSC resolution.
When directly comparing maps with and without particle subtraction, it appears that some regions benefit greatly from subtracting away the body and foot:
while other regions only benefit modestly:
In the end, like many other steps of a cryoEM workflow, the optimal combination of subtraction, masking, and parameters must be determined empirically for each dataset.
Local Refinement is an essential tool in the analysis of targets with rigid domains separated by a hinge. In each step of a Local Refinement, the optimal pose for a masked subvolume is found for a given set of particle images, which may or may not have signal from other regions of the particle subtracted.
This leaves three major domains of optimization for the user:
The mask
Search extent and other refinement parameters
Particle subtraction
These domains can be optimized together or independently, and often several iterations are necessary to achieve the best result.
Perform a local refinement of the foot domain. The published map for the foot domain alone is at 3.7 Å. We were able to improve the map of the foot all the way to 3.4 Å using the same techniques as for the head domain!
Make a series of masks using the same mask base but varying the dilation radius and soft padding width. Run Local Refinements with all settings the same, except using a different mask in each (you may want to use a subset of particles to speed things up).
Compare the results. Do you notice any trends? Which mask do you consider optimal for this refinement? Are the same settings best for other domains?
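One way to keep such a comparison organized is to write out the parameter grid before creating any jobs. The values below are only a suggested starting point (5 and 17 pixels are the settings used earlier and 13 pixels is the minimum soft padding quoted earlier; the rest are arbitrary), and the 1.4 Å pixel size is inferred from this tutorial's numbers.

```python
from itertools import product

PIXEL_SIZE_A = 1.4                 # Å per pixel, inferred from "5 pixels (7 Å)"
dilation_radii_px = [0, 5, 10]     # hypothetical values to test
soft_padding_px = [13, 17, 25]     # hypothetical values to test

grid = list(product(dilation_radii_px, soft_padding_px))
for dilation, padding in grid:
    print(f"dilation {dilation:2d} px ({dilation * PIXEL_SIZE_A:4.1f} Å), "
          f"soft padding {padding:2d} px ({padding * PIXEL_SIZE_A:4.1f} Å)")
```

Each printed line corresponds to one Volume Tools job and one Local Refinement job, so the grid above means nine masks and nine refinements.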
What is the smallest domain you can mask before the results become unreliable? Why do you think it’s harder to align smaller domains? Can you think of any settings you could change to improve the results?
Thi Hoang Duong Nguyen et al., “Cryo-EM Structure of the Yeast U4/U6.U5 Tri-snRNP at 3.7 Å Resolution,” Nature 530, no. 7590 (February 1, 2016): 298–302, https://doi.org/10.1038/nature16940.
Grigore Pintilie and Wah Chiu, “Comparison of Segger and Other Methods for Segmentation and Rigid-Body Docking of Molecular Components in Cryo-EM Density Maps,” Biopolymers 97, no. 9 (September 2012): 742–60, https://doi.org/10.1002/bip.22074.
Peter B. Rosenthal and Richard Henderson, “Optimal Determination of Particle Orientation, Absolute Hand, and Contrast Loss in Single-Particle Electron Cryomicroscopy,” Journal of Molecular Biology 333, no. 4 (October 31, 2003): 721–45, https://doi.org/10.1016/j.jmb.2003.07.013.
For the Particle Subtraction mask, you should not dilate the mask base at all, since you do not want to subtract surrounding noise from the particle images. A soft edge is still necessary. For this example, we used the recommended minimum soft padding width, which works out to be 12 pixels.