Tutorial: Helical Processing using EMPIAR-10031 (MAVS)
Case study on using helical processing tools.
This page focuses on applying the helical processing tools to the EMPIAR-10031 dataset (mitochondrial antiviral signalling (MAVS) filaments). We cover the workflow from particle picking to reconstruction and refinement. Before following this case study, we recommend reading the page detailing helical symmetry in CryoSPARC for more information on how helical symmetry is treated during reconstruction. If you are new to CryoSPARC, we also strongly recommend completing the T20S Proteasome tutorial first.
Before we pick particles, we must import the raw movies, motion correct the movies, and perform CTF estimation. This can be done using any of the motion correction and CTF estimation jobs within CryoSPARC. For this dataset, we use the following preprocessing steps:
- Patch Motion Correction (multi) with maximum alignment resolution of 3 Å
For this case study, we also use the Manually Curate Exposures job to remove exposures with CTF fit scores under 6 Å. After these steps have been done, we are ready to move on to particle picking. Below is the workflow tree up until this point.
Preprocessing workflow tree.
There are three main methods in CryoSPARC for particle picking on helical datasets. One can first manually pick a subset of micrographs and generate templates from 2D classification. Using these templates, one can do either template-based filament tracing or standard template picking. Alternatively, one can avoid manual picking by launching the filament tracer without providing templates, and enable template-free tracing by setting the minimum and maximum filament diameter parameters. Finally, one can also use any of the deep picker jobs available, including Topaz. For more details on these picking options, please refer to the linked job pages. Note that if you use pickers other than the filament tracer or template picker, you may need to set certain additional parameters, such as the number of times to apply helical symmetry. This is detailed in the helical refinement job page.
Relative to the template or blob pickers, the advantages of the filament tracer are that it allows the specification of a fixed inter-box distance, the detection of individual filaments, and the rejection of filaments that are too highly bent. The disadvantages are that the filament tracer assumes that all filaments are roughly cylindrical (at least at low resolution), and depends on two extra hysteresis thresholding parameters.
For this case study, we will manually pick a subset of micrographs, generate templates, and then use the template-based filament tracer. To start, launch a Manual Picker job and manually pick around 150 to 200 particles from at least 10 different micrographs with varying defoci. Manual picking for filaments works the same as for globular proteins, and you can pick overlapping segments of the filament. Try to avoid areas with crowding, intersection points, and picks near the edge of the micrograph. At this stage, the distance between particle picks does not matter, as we will later specify a constant inter-box distance between picks during filament tracing.
Example of manual picks on one micrograph.
Once the manual picking is complete, launch a 2D classification job and connect the manually picked particles to the 2D classification job. Set:
Number of 2D classes to 5, and
Force Max over poses/shifts to true.
We don't need many classes as all views are very similar, and the micrographs are too noisy to produce high resolution templates. After the 2D classification is complete, launch a Select 2D job to select all of the good classes that show a clear bright filament against a dark background, with as little noise as possible. Here, we selected three of the five classes to serve as templates for filament tracing.
Selected classes for initial template-based filament tracing.
Next, we build a Filament Tracer (BETA) job. To this job, we connect the templates from the previous Select 2D job, and the micrographs from the Manually Curate Exposures job. The filament tracer needs two parameters to be set:
Filament diameter (A): The estimated diameter of the filament, in Angstroms
Separation distance between segments (diameters): The distance between adjacent picks along a filament, in terms of multiples of the filament diameter
We choose a diameter of 90 Å, and a separation distance of 0.25 (corresponding to 90/4 = 22.5 Angstroms). Note that there are also various advanced parameters in the filament tracer, giving finer control over the ridge detection filter and the thresholding parameters. For more information and tips on adjusting these parameters, refer to the Filament Tracer (BETA) job page. For this dataset, we can leave all parameters as defaults, and launch the job.
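The relationship between the two tracer parameters and the resulting pick spacing is simple arithmetic, and can be sketched as follows. The function names and the 900 Å filament length are hypothetical and used only for illustration; they are not part of CryoSPARC.

```python
import math


def segment_spacing_angstrom(diameter_a: float, separation_diameters: float) -> float:
    """Distance between adjacent picks along a filament, in Angstroms."""
    return diameter_a * separation_diameters


def picks_along_filament(length_a: float, spacing_a: float) -> int:
    """Number of evenly spaced picks that fit on a straight filament of this length."""
    return math.floor(length_a / spacing_a) + 1


# Values used in this case study: 90 A diameter, separation of 0.25 diameters.
spacing = segment_spacing_angstrom(90.0, 0.25)
print(spacing)  # 22.5

# Hypothetical 900 A long straight filament, for illustration only.
print(picks_along_filament(900.0, spacing))  # 41
```

Smaller separation values give denser, more strongly overlapping picks along each filament.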
Once the filament tracer is complete, we should inspect filament picks to remove contaminant picks and highly bent filaments. Often, removing highly bent filaments can make a significant difference in the reconstruction quality. In addition to the normalized cross-correlation (NCC) score and the local power, the filament tracer tracks two dataset fields that measure how bent a filament is:
curvature: This is the estimated curvature (1/Å) of the filament at the pick location. Local curvature is useful to prune out the most bent locations along a filament, without removing all picks from that filament
sinuosity: This is the ratio between the actual filament contour length, and the straight line start-to-end distance. Filament sinuosity is useful to remove entire filaments that may correspond to contaminants or aggregated filaments
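To make these two quantities concrete, here is a minimal sketch that computes them from pick coordinates along a single filament. It assumes the picks are available as an ordered list of (x, y) positions in Ångstroms, and it estimates local curvature from three consecutive points (the Menger curvature). That estimator is our choice for illustration and is not necessarily how the filament tracer computes its curvature field.

```python
import numpy as np


def sinuosity(points):
    """Contour length divided by the straight-line start-to-end distance."""
    pts = np.asarray(points, dtype=float)
    contour = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    chord = np.linalg.norm(pts[-1] - pts[0])
    return contour / chord


def three_point_curvature(p0, p1, p2):
    """Curvature (1/length unit) of the circle through three points."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    a = np.linalg.norm(p1 - p0)
    b = np.linalg.norm(p2 - p1)
    c = np.linalg.norm(p2 - p0)
    d1, d2 = p1 - p0, p2 - p0
    area = 0.5 * abs(d1[0] * d2[1] - d1[1] * d2[0])  # triangle area
    return 4.0 * area / (a * b * c)


# A perfectly straight filament has sinuosity 1 and zero curvature.
straight = [(0.0, 0.0), (22.5, 0.0), (45.0, 0.0)]
print(sinuosity(straight), three_point_curvature(*straight))

# Three picks on a circle of radius 1000 A have curvature 1/1000 per A.
print(three_point_curvature((1000.0, 0.0), (0.0, 1000.0), (-1000.0, 0.0)))
```

A sinuosity close to 1 indicates a nearly straight filament, while larger values indicate a contour that wanders away from the straight line between its end points.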
Launch an Inspect Picks job, and connect the outputs of the filament tracer to it. In the Inspect Picks job, you will see sliders for NCC score and local power (as usual), in addition to the local curvature and filament sinuosity sliders and their accompanying histograms. In addition to the curvature and sinuosity thresholds, adjusting the power threshold is often very useful to filter out contaminants, or ice/carbon picks. For this dataset, we stringently remove a large portion of curved picks, as well as picks with too high a power score. The specific thresholds we changed are listed below; with different templates, the ideal values will likely differ.
Local Power: under 1296550
Curvature: under 0.0004
Sinuosity: under 1.08
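The effect of these three thresholds can be sketched as a simple boolean filter. This is an illustration only: the array names are hypothetical, the per-pick values are made up, and the Inspect Picks job applies its thresholds interactively rather than through code like this.

```python
import numpy as np


def keep_mask(power, curvature, sinuosity,
              max_power=1296550.0, max_curvature=0.0004, max_sinuosity=1.08):
    """Boolean mask of picks that fall under all three upper thresholds."""
    power = np.asarray(power, dtype=float)
    curvature = np.asarray(curvature, dtype=float)
    sinuosity = np.asarray(sinuosity, dtype=float)
    return (power < max_power) & (curvature < max_curvature) & (sinuosity < max_sinuosity)


# Made-up values for four picks, for illustration only.
power = [900000.0, 1500000.0, 1000000.0, 1100000.0]
curvature = [0.0001, 0.0001, 0.0010, 0.0002]
sinuosity = [1.01, 1.02, 1.03, 1.20]

mask = keep_mask(power, curvature, sinuosity)
print(mask)  # only the first pick passes all three thresholds
```

Each threshold removes a different kind of pick: high power tends to flag contaminants or ice/carbon, high curvature flags locally bent segments, and high sinuosity flags whole filaments that are far from straight.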
Screenshot of the inspect picks interface with the additional filament sliders.
Now, we can click "Done Picking! Output Locations" to complete the job. The particle picking workflow up until this point is shown in the tree below.
Workflow tree from exposure curation to filament tracing and inspect picks.
The final step we need to complete is extraction. Build an Extract From Micrographs job, and connect the outputs of the Inspect Picks job to it. Here, we use a box size of 300. Once the job is complete, we can move onto 2D classification.
2D classification remains largely the same for helical proteins as with globular proteins. The main change in 2D classification is the vertical alignment of filament classes, which helps when visually comparing classes to each other during Select 2D. Note that vertical alignment is not enabled by default, and must be activated by turning on the Align filament classes vertically parameter in any 2D classification job.
For this dataset, we run a 2D classification from defaults and adjust the following parameters:
Number of 2D classes: 100
Align filament classes vertically: True
Remove duplicate particles: False
Batchsize per class: 400 (since these filaments are rather small)
Remove duplicate particles in 2D Classification is activated by default for globular proteins. Since particle picks for filaments are often intentionally very dense, this parameter should generally be deactivated when processing filaments. Helical Refinement can account for the proximity of adjacent picks by constructing gold-standard splits that prevent overlapping particles from being randomized to different half-sets.
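One simple way to build half-sets in which overlapping picks never end up on opposite sides is to assign whole filaments, rather than individual particles, to a half-set. The sketch below illustrates that idea under the assumption that every particle carries a filament identifier. It is not a description of how Helical Refinement constructs its splits internally, and the function name is hypothetical.

```python
import numpy as np


def split_by_filament(filament_ids, seed=0):
    """Assign every particle of a filament to the same half-set (0 or 1)."""
    ids = np.asarray(filament_ids)
    unique_ids = np.unique(ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_ids)
    # The first half of the shuffled filaments goes to half-set 0, the rest to 1.
    half_a = set(shuffled[: len(shuffled) // 2].tolist())
    return np.array([0 if f in half_a else 1 for f in ids.tolist()])


# Nine particles picked from four filaments (made-up identifiers).
filament_ids = [0, 0, 0, 1, 1, 2, 2, 2, 3]
halves = split_by_filament(filament_ids)
print(halves)
```

Because neighbouring picks on a filament share most of their signal, placing them in different half-sets would correlate the two half-maps and inflate the FSC.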
Poor classes are those that include filament crossings, breaks, or end points, or otherwise have little or no high resolution detail. Good classes are those that are straight and have high resolution detail all the way out to the box edges (near where the filament touches the circular window). With this dataset, we obtained 21 good classes and 79 poor classes, with a total yield of 88,033 particles. A subset of the good and poor classes is shown in the image below.
A subset of 5 good classes (top) and 5 poor classes (bottom).
Note that the results from filament tracing can often be significantly improved by feeding these high quality 2D classes back into the filament tracer, and running inspect picks, extract from micrographs, and 2D classification one more time. In particular, low SNR micrographs and small filaments benefit the most from the use of higher quality templates during picking.
From here, we can either refine the model with imposed symmetry, or we can attempt to reconstruct without imposing any symmetry. Note that many helical datasets are not amenable to reconstruction without symmetry estimates, and it's currently an open research problem to characterize the symmetry of helical assemblies without prior knowledge. Thus, if symmetry estimates are already present for your dataset, you can skip down to the symmetric helical refinement section. On the other hand, if you do not have initial symmetry estimates for your dataset, you can attempt a similar approach to the one presented in the asymmetric helical refinement and symmetry search sections.
Next, we will reconstruct the filament from the particles without applying symmetry. To do this, build a Helical Refinement (BETA) job, and connect the input particles from the select 2D job. The helical refinement job will use the standard maximum likelihood optimization done during any standard refinement, and it will not apply helical symmetry unless you provide initial estimates for the helical twist and rise.
Note: In some datasets, prior estimates of the helical symmetry parameters are necessary to obtain a correct structure. These can often be informed from prior similar structures, or Fourier-Bessel indexing of high quality 2D class averages. In these challenging datasets, asymmetric refinements (or refinements with incorrect symmetry imposed) may result in incorrect structures, and thus the outputs from a symmetry search utility job will not be useful.
The tools presented here do not circumvent this issue for all datasets. Regardless, we hope that these tools can be useful for many datasets in the exploratory phase of data processing, as well as the high-resolution refinement stage.
The helical refinement job also doesn't require an input volume. If a volume isn't provided, it will generate an initial density using the filament in-plane rotation estimates, which are written during either filament tracing or 2D classification. Note that this generates a "cylindrical" density from the input particles; for filaments with highly oblong cross sections (e.g. amyloid filaments), Ab-Initio Reconstruction may produce a better initial model. We have generally seen that for filaments that are approximately cylindrical and have constant diameter, directly running a helical refinement from the particles is more successful than running an ab-initio reconstruction job. Conversely, for filaments that are distinctly not cylindrical, ab-initio reconstruction can take advantage of the diversity of views along the helical axis, and often results in a better initial model.
For this dataset, we leave all parameters as default and run the asymmetric helical refinement. The slice plots from the final iteration are shown below.
Slice plots from the final iteration of asymmetric helical refinement.
Next, we can use the symmetry search utility (BETA) job to take a look at the symmetry that is present in the reconstruction. Before running this job, ensure you have read the page for more information on how helical symmetry is treated during reconstruction, as well as the job information for the symmetry search utility.
Since the reconstruction was done without applying symmetry, analyzing how well the volume fits different symmetry parameters can give us an "unbiased" look into the symmetry present in the dataset. To start, build a symmetry search utility job, and connect the volume and mask from the asymmetric helical refinement. We also must tell the job over what parameter ranges it should search. We can either define a 2D grid of rise/twist values, or a 2D grid of pitch/number-of-subunit values. Here, we choose the latter option, with pitch values between 5 Å and 50 Å, and number of subunits per turn ranging from 3 to 8. In many cases, the best search ranges can be obtained by manually inspecting the asymmetric reconstruction in UCSF Chimera, and looking for clear signs of helical symmetry.
Once complete, this job will output various plots and tables that display the mean squared error (MSE) associated with all of the different symmetry parameters that were tested. By default, it will search over both right and left handed helical symmetry parameters, where checkpoint 1 shows MSE values over right handed parameters, and checkpoint 2 shows MSE values over left handed parameters. For each hand searched over, the job will produce:
- 2D plots of the error surface, with the global optimum highlighted by intersecting horizontal/vertical gray dashed lines;
- 1D plots of the error, evaluated along the horizontal and vertical dashed lines;
- A table with the first 20 local minima in the MSE surface, listed in order of increasing error
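To illustrate what a table of local minima of an error surface means, here is a minimal sketch that finds grid points strictly lower than all eight of their neighbours and lists them in order of increasing error. The function name and the toy grid are hypothetical; the symmetry search utility's own procedure for locating minima may differ.

```python
import numpy as np


def local_minima(mse, max_count=20):
    """Return (row, col, value) for strict local minima, sorted by increasing value."""
    mse = np.asarray(mse, dtype=float)
    rows, cols = mse.shape
    # Pad with +inf so that edge points are compared only with real neighbours.
    padded = np.pad(mse, 1, mode="constant", constant_values=np.inf)
    is_min = np.ones(mse.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_min &= mse < neighbour
    found = [(int(r), int(c), float(mse[r, c])) for r, c in np.argwhere(is_min)]
    found.sort(key=lambda item: item[2])
    return found[:max_count]


# Toy 5x5 error surface with two separate minima.
grid = np.full((5, 5), 10.0)
grid[1, 1] = 1.0
grid[3, 3] = 2.0
print(local_minima(grid))  # [(1, 1, 1.0), (3, 3, 2.0)]
```

The first entry of the sorted list is the global minimum on the grid, and the remaining entries are the alternative candidates worth testing if the global minimum turns out to be wrong.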
Below are the 2D error surface plots for this asymmetric helical refinement, one for left-handed parameters and the other for right-handed parameters.
2D error surface plots for asymmetric refinement of the MAVS dataset.
If the symmetry is indeed discernible from the input volume, the global minima of MSE values should provide the best estimate of symmetry parameters. In some cases, the correct symmetry parameters may only be local optima of the MSE surface. For this reason, the job prints out a table of local minima, which are also produced as outputs of the job in the
For the MAVS dataset, asymmetric refinements tend to give accurate estimates for initial symmetry parameters. Across both plots, the global minimum is at an MSE value of ~1217, with the right handed symmetry parameters printed below.
Showing the 20 best local minima.
No. | p (A) | n | dz (A) | dphi (deg) | mse
00 | 018.059 | 003.549 | 005.088 | +101.436 | 1216.663
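The two parameterizations in this table are related by simple geometry: the rise is the pitch divided by the number of subunits per turn, and the twist magnitude is 360° divided by the number of subunits per turn. The sketch below checks this against the row above. The function name is hypothetical, and treating a positive twist as right handed is an assumption taken from how this row is reported.

```python
def pitch_subunits_to_rise_twist(pitch_a, subunits_per_turn, right_handed=True):
    """Convert (pitch, subunits per turn) into (rise in A, twist in degrees)."""
    rise = pitch_a / subunits_per_turn
    twist = 360.0 / subunits_per_turn
    # Assumed sign convention: positive twist for right handed parameters.
    return rise, (twist if right_handed else -twist)


# p = 18.059 A and n = 3.549 from the table row above.
rise, twist = pitch_subunits_to_rise_twist(18.059, 3.549)
print(f"rise = {rise:.3f} A, twist = {twist:+.3f} deg")
```

The result agrees with the tabulated dz and dphi to within the rounding of the printed p and n values.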
When estimates for symmetry parameters are known, it is always recommended to run a helical refinement with the estimated twist (°) and rise (Å) as parameters to the job. This is because enforcing symmetry has two important roles:
- boosting the effective amount of signal in the dataset, and
- imposing a strong structural constraint on the reconstruction
The first point is important, as it can increase the resolution of the reconstruction. For the MAVS dataset, since we already picked particles with a fairly small inter-box distance, each image is used up to 4 times by default. For helices with particularly small rises, such as the Tobacco Mosaic Virus (EMPIAR-10022) with a helical rise of ~1.41 Å, each image can be used 20 or 30 times in reconstruction, providing a dramatic increase in resolution. The second point is especially important for helical reconstruction, as it is often necessary to prevent the refinement from falling into a local maximum in the likelihood landscape.
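As a rough illustration of where these numbers come from, the sketch below assumes that the number of times each image is reused equals the number of whole rises that fit into the inter-box distance. This rule is an assumption that happens to match the "up to 4 times" figure for this dataset; the helical refinement job page is the authority on the actual default. The 35 Å inter-box distance in the second example is hypothetical.

```python
import math


def symmetry_applications(inter_box_distance_a, rise_a):
    """Assumed rule: number of whole helical rises per inter-box distance (at least 1)."""
    return max(1, math.floor(inter_box_distance_a / rise_a))


# MAVS case study: 22.5 A between picks, 5.088 A rise.
print(symmetry_applications(22.5, 5.088))  # 4

# Rise of ~1.41 A with a hypothetical 35 A inter-box distance.
print(symmetry_applications(35.0, 1.41))   # 24
```

Under this assumption, a smaller rise or a larger inter-box distance both increase how many symmetry-related copies each image contributes.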
We now build a helical refinement job. From the symmetry search utility, we plug in a twist of +101.436° and a rise of 5.088 Å as the initial symmetry parameters. We also connect the same set of particles to the refinement, along with the volume from the asymmetric refinement. By default, optimization of the symmetry parameters will begin when the GSFSC cutoff resolution exceeds 5 Å; however, this threshold can be changed in the Helical Symmetry Search parameter section. Here, we leave it as default, and queue the job.
When the refinement is complete, we should always download and inspect the map to ensure that secondary structure is apparent. If you have experience interpreting cryo-EM density maps, you may notice that the alpha helices present in the refined map are left handed! Generally, left handed alpha helices are unexpected, and indicate that the handedness of our overall reconstruction is inverted. This can happen because the handedness of a cryo-EM map is ambiguous: particles can reconstruct a density map equally well as they can its mirror image. In helical reconstruction, this ambiguity is linked to the value of the helical twist. Specifically, inverting the sign of the helical twist is equivalent to flipping the hand of the reconstruction.
Left handed alpha helices.
To correct this, we can build a Volume Tools job, connect the output of the helical refinement to the volume tools job, and set the Flip Hand parameter as true.
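The link between mirroring a map and negating the twist can be checked numerically. The sketch below places points on a helix with a given twist and rise, mirrors them across a plane containing the helical axis, and confirms that the result is a helix with the same rise and the opposite twist. It also shows a generic array mirror with NumPy. This is an illustration only: the function names are hypothetical, and it is not a description of how the Volume Tools job implements Flip Hand.

```python
import numpy as np


def helix_points(twist_deg, rise_a, n_subunits, radius_a=1.0):
    """Subunit positions on a helix around the z axis."""
    k = np.arange(n_subunits)
    angle = np.deg2rad(twist_deg) * k
    return np.stack([radius_a * np.cos(angle),
                     radius_a * np.sin(angle),
                     rise_a * k], axis=1)


def mirror_y(points):
    """Mirror points across the x-z plane (y -> -y), which inverts handedness."""
    mirrored = points.copy()
    mirrored[:, 1] *= -1.0
    return mirrored


def flip_hand(volume):
    """Generic mirror of a 3D array along its first axis."""
    return np.flip(volume, axis=0)


original = helix_points(101.436, 5.088, 10)
flipped = mirror_y(original)
print(np.allclose(flipped, helix_points(-101.436, 5.088, 10)))  # True
```

This is why a refinement started from the flipped map needs the same rise but a twist with the opposite sign.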
To obtain a final reconstruction with the correct hand, as well as a final set of aligned particles, we may want to run one last helical refinement with the corrected hand. To do this, we can take the output volume from the volume tools job, and connect it as an input to one final helical refinement. Similarly, connect the particles from the original select 2D job to the helical refinement. Be sure to input the symmetry parameters with the same rise, but inverted twist (-101.436°). Finally, we can optionally activate the Non-Uniform refinement switch, which will invoke an adaptive regularization process that often leads to higher map quality. Our final helical refinement (with non-uniform refinement enabled) reached a resolution of 3.6 Å. Below is the output sharpened and symmetrized map.
The workflow, from particle extraction to symmetric refinement, is shown in the tree below.
Workflow tree from particle extraction and 2D classification to symmetric refinement.