Case Study: DkTx-bound TRPV1 (EMPIAR-10059)

Processing EMPIAR-10059 in CryoSPARC v4.6 with a focus on particle curation, developing an intuition for when a domain may be blurred due to flexibility, and how to handle this type of flexibility.

Case Study: DkTx-bound TRPV1

Processing EMPIAR-10059 in CryoSPARC v4.6 with a focus on particle curation, developing an intuition for when a domain may be blurred due to flexibility, and how to handle this type of flexibility.

Overview

This case study assumes familiarity with CryoSPARC’s UI and job workflow. If you haven’t processed data in CryoSPARC before, you may be more comfortable starting with the T20S Proteasome tutorial. This case study also assumes familiarity with foundational cryo-EM concepts like particle pose and alignment. We provide more detail on these foundational concepts in the first video of the 2024 Image Processing Workshop recordings.

This dataset was also processed as part of the S2C2 Single Particle Analysis workshop in 2024. A recording is available here.

This case study provides a basic introduction to the fundamentals of cryo-EM single particle analysis and the study of symmetric ion channels. We will cover:

How to pick particles using the blob and template pickers
How to curate particle picks using 2D and 3D methods
How to create a consensus refinement
How to perform a Local Refinement of a specific domain

We also provide self-guided opportunities to investigate alternatives to enforcing symmetry and handling symmetry mismatch.

Outline

This case study covers the following workflow:

Preprocessing - preparing the data for single particle analysis
- CTF Estimation
- Exposure Curation
Particle Picking and Extraction - generating initial particle pick locations
- Blob Picker
- Extract from Micrographs
Coarse Particle Curation - removing junk picks from the particle stack
- 2D Classification and Selection
- Iterative Ab-Initio Reconstruction and Heterogeneous Refinement
Template Picking - re-picking micrographs using templates which match the target
- Template Generation using 2D Classification
- Template Picking
- Extract from Micrographs
Fine Curation - producing a clean final particle stack
- 2D Classification and Selection
- Iterative Ab-Initio Reconstruction and Heterogeneous Refinement
Consensus Refinement - generating a map covering the whole target
Masked Refinement of the CTD - improving the quality of the CTD map
- Mask Creation
- Masked Global Refinement
- Local Refinement

Data Background

The data used in this case study is from EMPIAR 10059, originally processed by Gao and colleagues (2016).

In this study, the authors purified rat TRPV1 from mammalian cells. The purified channel was reconstituted into soybean lipid nanodiscs and combined with vanniloid agonist resiniferatoxin (RTX) and double-knot toxin (DkTx), derived from tarantulas. Both of these molecules together trap the channel in an open state.

TRPV1 is a homotetramer. The channel (4 copies of TRPV1) and two copies of DkTx together weigh approximately 309 kDa.

The authors aimed to use this data to investigate lipids’ involvement in channel gating and toxin binding. To do so, a high-resolution map of the transmembrane domain (TMD) and DkTx is necessary.

Before you begin

Downloading the Data

This data is available as a set of 1200 micrographs. Download them to a filesystem accessible to your CryoSPARC installation. For example:

cd /path/to/rawdata/EMPIAR
mkdir 10059
cd 10059
wget -m -nH --cut-dirs 4 \
    ftp://ftp.ebi.ac.uk/empiar/world_availability/10059/data/micrographs

While the data is downloading, you may want to create a new CryoSPARC Project and Workspace to contain the jobs for this case study.

Viewing and Preparing 3D Volumes

This study assumes you have the ability to view 3D Volumes. CryoSPARC has a built in Volume Viewer, but we recommend downloading and installing UCSF ChimeraX, as we refer to this program throughout the case study. ChimeraX is a powerful 3D visualization tool which can display and modify atomic models and cryo-EM maps (from CryoSPARC and elsewhere), prepare publication-quality images, and many other features. In this tutorial, we use ChimeraX and not Chimera (without the X), which is an older version that is no longer under active development and is no longer recommended by the developers.

This study also assumes passing familiarity with viewing 3D Volumes in your rendering software of choice. Throughout, terms like “contour up” and “contour down” are used to refer to viewing the volume with a higher or lower isosurface, respectively. The process of making masks is also not covered in detail here — a walkthrough for mask creation using ChimeraX is available elsewhere in the guide.

Data Import and Preprocessing

In these early steps, we are not yet working at the level of individual particles. Rather, with a typical new dataset, we would perform motion correction of movies to produce micrographs, then estimate the CTF of these micrographs.

This particular dataset was deposited as motion-corrected micrographs, so we can skip Patch Motion Correction. If we had movies, Patch Motion Correction would be run with default settings.

In this section, we walk through performing Patch CTF Estimation to account for defocus, and discarding micrographs with especially poor CTF fit, since the particles from those micrographs would probably not be useful anyway. Discarding poor data at this early stage makes subsequent steps (especially particle picking and extraction) faster and easier.

Throughout this case study, job inputs and parameters will be presented in tables, like below. Inputs are prefixed with Input:. Any other row is a parameter. Any parameters not listed are set to their default value. The “Notes and Rationale” column describes why a parameter value or input is used, and presents alternate values which may also work.

In cases like the Micrographs data path parameter, the Notes and Rationale column will also highlight when your value will likely be different based on the location of data files in your system.

Once the data has downloaded, you can import it using an Import Micrographs job. This job simply imports micrographs to the CryoSPARC workspace and makes them available for downstream processing.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Micrographs data path

path/to/rawdata/EMPIAR/10059/micrographs/*.mrc

Replace with the correct path to the data you downloaded in the previous section

Pixel size (A)

1.2156

From the EMPIAR entry

Accelerating Voltage (kV)

300

From the EMPIAR entry

Spherical Aberration (mm)

2.0

The data was collected on a TF30 Polara electron microscope, which has a nominal Cs of 2.0 mm.

Total exposure dose (e/A^2)

From the EMPIAR entry

Next, we estimate the Contrast Transfer Function (CTF). Because cryo-EM data is collected away from focus, we must estimate the CTF to computationally correct for its corruption of the particle images. For most datasets, this is a relatively straightforward process using a Patch CTF Estimation job.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Micrographs

Micrographs output from Import Micrographs

Number of GPUs to parallelize

Using more GPUs speeds up the job, but using more than 1 is not required.

Finally, we discard micrographs with poor CTF fit using a Manually Curate Exposures job. Although the CTF fit resolution at the micrograph stage is not an absolute limit on final 3D map resolution, it is generally a reliable indicator of overall data quality. Generally, we will discard micrographs with a CTF fit worse than 6 Å. In this case, that is only 4 micrographs.

Other parameters often used to filter micrographs are the Total full-frame motion distance (pixels), Total local motion distance (pixels), and Relative Ice Thickness, but good thresholds for these values are dataset-dependent and so need to be set interactively within the Manually Curate Exposures job.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Exposures

Micrographs output from Patch CTF Estimation

CTF fit resolution (Å)

0,6

Setting this parameter skips the interactive phase and removes micrographs with CTF fit worse than 6 Å. If you want to pick any threshold(s) interactively, all parameters should be left blank.

Particle Picking and Extraction

After preprocessing is complete, we need to identify positions in the micrograph which are likely to contain particles, then extract square images of those regions. These processes are called particle picking and particle extraction, respectively. Since we have not yet produced any reference images of TRPV1, we will start with blob picking.

Blob picking uses simple “blobs” (Gaussians, ellipses, or rings) to pick particles, rather than a template image which looks like the target molecule. For many datasets, it is best to start with blob picking and move on to template picking only once templates produced from the data itself are available.

We recommend that template images produced from molecular models, or from maps not derived from the same dataset in question, are only used when the sample is well-characterized and particles are clearly visible by eye in the micrographs. Even then, extreme care should be taken to avoid template bias.

How does particle picking work in CryoSPARC?

CryoSPARC has three native particle picking jobs: Blob Picker, Template Picker, and Filament Tracer. Filament Tracer is a distinct algorithm used for helical proteins, and is covered in more detail in the job page and in the Helical Processing case study. CryoSPARC also includes a wrapper for Topaz, a neural-network based trainable particle picker.

Blob Picker and Template Picker are both used for picking globular or membrane proteins, and operate on the same principle. Both pickers use a set of template images to score every region of the micrograph. Blob Picker’s template images are simple geometric shapes generated based on the parameters, while Template Picker’s template images are specific to the dataset, and must be provided as an input.

The cross-correlation between each of the templates and the micrograph is calculated. Any region which is a local cross-correlation maximum is considered a particle pick.

For new samples, it is generally best to start with Blob Picker to avoid template bias. After generating good 2D references from blob-picked particles, the micrographs can be re-picked using Template Picker. Note that this job produces particle locations — the particles have not yet been extracted from the micrographs.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Micrographs

Micrographs output from Manually Curate Exposures

Minimum particle diameter (A)

100

Blob picker, by default, generates three Gaussian blobs of varying width from the minimum to the maximum particle diameter. TRPV1 is approximately 120 Å wide and relatively spherical, so a range of 100 — 130 Å works well. Wider ranges would likely also produce adequate results.

Maximum particle diameter (A)

130

In Figure 2 we plot the positions of particle picks (white circles) on a region of a single micrograph. This micrograph has been lowpass filtered to make the TRPV1 particles more visible. Because simple Gaussian blobs are used by Blob Picker, the picks are not very well centered, and some junk picks remain. This is an unavoidable consequence of the fact that we are using a template (i.e. circular blob) which does not look much like our target. Extracted particle images will naturally center themselves during later alignment steps (like 2D Classification and 3D refinements).

On the other hand, we can remove some “bad” picks (on empty ice or contaminants) by eye using a few simple thresholds. Inspect Particle Picks allows you to set thresholds on the Power and Normalized Cross-Correlation (NCC) sores to remove obviously bad picks.

The Power score is a measure of the total contrast of the region surrounding the particle pick. Contaminants (like carbon edges or crystalline ice) often have very high power — higher than would be present in a good particle image. We therefore will want to remove picks with a power score above some threshold.

Empty ice, on the other hand, will have a power score slightly lower than a good particle image. We will therefore also want to reject particle picks below some second threshold. The exact values of these two thresholds will differ between different datasets.

The Normalized Cross-Correlation score measures how similar the particle pick is to the template used to pick it. For Blob Picker, the template images are simple Gaussian blobs, ellipses, or rings. It is therefore hard to predict what particle or contaminant NCC scores will be. However, empty ice has essentially no features, and so is always expected to have a low NCC score. We therefore want to remove particle picks with an NCC score below some threshold, while keeping everything above it.

A typical workflow for Inspect Particle picks (Video 1) involves:

Finding a micrograph with high-contrast (i.e., dark) feature that is not a particle. This may be the edge of a hole, crystalline ice, or some other contaminant.
Adjust the high-power slider downward. At first, only picks on the contaminant will disappear. Continue to decrease the high-power slider until particles start to disappear, then bring the slider back up until all particles are picked again. At this stage, it is better to keep some junk than to remove potentially good particles.
Repeat this process in reverse with the low-power slider. Find picks on empty ice and increase the low-power slider until particle start to disappear.
Finally, investigate the effect of the NCC slider. Depending on the topology of the target and the sizes of blob used during the Blob Picker job, the effectiveness of the NCC slider will vary. Generally, it is best to find a few particle picks and a few remaining non-particle picks and watch what happens when the NCC slider is changed.

We can apply this technique to our blob picked particles using an Inspect Particle Picks job.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Micrographs

Micrographs output from Blob Picker

Input: Particles

Particles output from Blob Picker

In this example, Blob Picker produced 1.1M particle picks. Curating the particle picks reduced this by almost half (down to 680k particles) before extraction. This equates to a 50% savings in disk space for the extracted particle images and, more importantly, means downstream classification steps will require fewer classes to adequately sample the variability in the particle stack.

Extract from Micrographs

In the early stages of processing, it is reasonable to expect that a large fraction of our particle stack is still junk. We therefore recommend extracting with the appropriate extraction box size and downsampling to a final box size such that the final pixel size is around 2 Å (making the Nyquist limit 4 Å). This level of downsampling significantly speeds the initial processing steps, and the clean particle stack can be re-extracted at a smaller pixel size for subsequent refinements.

To extract particles, we set up an Extract from Micrographs job:

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Micrographs

Micrographs output from Inspect Particle Picks

Input: Particles

Particles output from Inspect Particle Picks

Extraction box size (pix)

256

The micrographs have a pixel size of 1.22 Å/pix. Thus, a 256 pixel extraction box size equates to a physical size of 312 Å, which is at least twice as large as TRPV1.

Fourier-crop to box size (pix)

128

Downsampling to 128 pixels produces a final pixel size of 1.22 * 256 / 128 = 2.44 Å/pix, meaning the Nyquist limit will be approximately 5 Å. This is sufficient for course particle curation, but we will need to re-extract before refining to high resolution.

Save results in 16-bit floating point

True

This setting makes images take up half as much disk space for no significant loss in accuracy. We recommend it is always turned on. More information is available in the relevant guide page.

This job produces individual, extracted images of the particles picked during Inspect Particle Picks. This collection of images is typically called a “particle stack” (evoking a physical stack of square images). At this point, because we expect there to be many non-target images in the particle stack, this would often be called a “dirty” particle stack.

When CryoSPARC extracts particles, it excludes particles that are so close to the edge that the box would extend outside the micrograph. In this case, the input contained 678k particle picks, but there are only 578k extracted particle images due to this effect. The job event log details how many particles were not included.

Blob Picking Overview

Comparing a region of a representative micrograph, we can observe a few features of blob picking:

Picks are not well-centered. For TRPV1, picks tend to accumulate around the periphery of the protein, especially on top-down views, due to the increased density in those regions
We removed a good number of picks on empty ice using Inspect Particle Picks, but some picks which look like they may have been good were also lost. There are also still picks on regions of the micrograph which look like higher-contrast noise rather than identifiable TRPV1 particles, but it is difficult to tell by directly inspecting the micrograph.

Our current particle stack still has too many off-target images to be useful for high-resolution cryo-EM. We will therefore proceed to using 2D and 3D methods to remove junk images from the particle stack.

Coarse Particle Curation

In this section, we will remove junk picks from our particle stack. This process is called “particle curation” or “particle cleaning”, and produces a “clean particle stack”.

Note that in most cases, it is impossible to be sure that your particle stack has no off-target images, i.e., you can’t have an absolutely clean particle stack. However, the fewer junk picks in the stack, the better results will generally be. For most new samples, particle curation begins with 2D Classification.

2D Curation

2D Classification is the simplest method of particle curation, but it is also one of the fastest and does not require an input 3D model. This technique groups and averages particle images based on their orientation in the ice. Note that the alignments found here are not used in subsequent 3D steps. We will set up an initial 2D Classification job like so:

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particle stack extracted in the previous Extract Micrographs job.

Number of 2D classes

200

The number of classes is somewhat arbitrary. More classes will be able to capture more heterogeneity, but the job will take longer and may not have enough particles per class.

This job produces 200 2D classes and their class averages. This class average is what each particle is aligned to during the job, and is the result of rotating, shifting, and averaging all particles in that class. A particle belongs to a class if it has, essentially, greater correlation to that class’s average than any other.

The resulting class averages for this job are shown below. Note that class averages depend heavily on the number of particles, the cleanliness of the particle stack, and the number of classes, so your results may differ. If your classes all look identical (e.g., if they are all top views), you may have been too strict while manually curating the particle picks. Conversely, if there are too many junk classes (classes which are very noisy, or are unrecognizable blobs), you may not have manually filtered out enough particles, or you may just need more classes.

The first thirty of these classes (the top two rows) are of reasonably high quality. Features of the ion channel are easily distinguishable against the background, and there is no noise in the background. Classes 20 and 25 are blurry, which many mean that they contain some junk, or that that view is simply harder to align.

The classes after class 33 contain:

some good but blurry views (like classes 47 and 77)
broken particles (maybe class 71)
particles too close to other objects to be properly aligned (like class 135)
class averages that are likely mostly empty ice or contaminants (like class 81 and 193)

Note that particles are always placed in the class they match best. There is no concept of a “junk class” which catches “bad particles”. It is important, therefore, to provide enough classes such that there are many different types of junk class, so that particles which look like that specific type of junk are filtered out in subsequent steps.

2D classification is useful because it takes advantage of the improved signal-to-noise ratio of these class averages to let us filter particles by class. We do this using the Select 2D Classes job.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles from the previous 2D Classification

Input: 2D Class Averages

2D Class averages from the previous 2D Classification

This job produces two particle stacks. The first comprises only particles belonging to classes we manually selected because the class average which looks like TRPV1. Of the starting 578k particles, 314k were in these selected classes.

The second comprises particles which belong to “junk” classes, which do not look like good TRPV1 classes.

Note that we reject many classes that likely contain many good particles, like excluded classes 3 and 25 (Figure 7). At this stage, our goal is to quickly arrive at class averages which are clean and high quality for template picking. As such, we can afford to throw away some good particles to more quickly arrive at a clean stack. Later, when we are creating our final particle stack, the priority will flip: we will only throw away the very worst classes.

Before proceeding to template picking we will perform one more step of particle curation in 3D rather than 2D. First, we generate starting volumes using a subset of the good particles from 2D Classification.

3D Curation

When we curated particles in 2D, we performed the following procedure:

Create class averages (usually between 100 and 200) and group particles into those classes
Select classes which look like the target
Discard particles which do not belong to the selected classes

We need hundreds of classes because a single 2D class might represent a distinct particle type or a distinct viewing direction. We therefore need enough classes to cover all of the viewing directions of our particle(s) and all the distinct types of junk in the particle stack. Moreover, rare views (which would need their own class) may be lumped into a junk class, which would create orientation bias in the final particle stack. Because of this, we recommend that as much particle curation is done in 3D as possible.

The 3D curation procedure is essentially the same as that in 2D:

Create class volumes (usually between 3 and 5) and group particles into those classes
Select volumes which look like the target
Discard particles which do not belong to the selected volumes
Repeat if necessary

The most obvious difference here is that we use far fewer classes than the 2D process, since each 3D class contains all of the viewing directions for that class. We therefore need only a class for each distinct type of junk in the particle stack.

We generate the class volumes using Ab-Initio Reconstruction. This job quickly generates initial volumes without any input references. However, it is not the preferred method for classifying particles. Heterogeneous Refinement is better at classifying particles, though it requires input references. Thus, we split the generation of the volumes and the classification of the particles into two separate steps.

Generate volumes using Ab-Initio Reconstruction
Classify particles into those volumes using Heterogeneous Refinement
Select volumes which look like the target
Discard particles which do not belong to the selected volumes
Repeat if necessary

Why choose Heterogeneous Refinement over Ab-Initio Reconstruction for classifying particles?

Ab-Initio Reconstruction uses small subsets of the total particle stack to rapidly update volumes to look more and more like the particles that make up their classes. This is one reason the job is so fast, and also very important for being able to produce reasonable initial volumes from scratch. To avoid overfitting and increase the speed of this process, Ab-Initio Reconstruction is limited to using low resolution information. This means that, although the job quickly produces useable volumes, its particle classification is noisy and uncertain.

Heterogeneous Refinement, on the other hand, uses more high resolution information to produce higher quality maps of each class. This comes at the cost of a slower job which requires input volumes, but produces more reliable particle classifications. Thus, when using 3D techniques to sort particles, Heterogeneous Refinement should be the job of choice.

We begin by creating volumes using Ab-Initio Reconstruction:

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles output from the Select 2D Classes job

Number of Ab-Initio classes

We request more than one class because the input particle stack is still dirty. The exact number is not too important.

Num particles to use

100,000

(do not include a comma in the input)

We only use a subset of the particles to generate starting volumes faster. We will classify every particle using the next job, so we do not need to include them all here.

Ab-Initio Reconstruction produced four volumes, as we requested (Figure 8). The first volume looks the most like TRPV1, with a recognizable nanodisc, CTD, and TMD. Class 1 looks like it may be a mixture of some junk picks and some top-down views. We should keep an eye on this class after Heterogeneous Refinement to make sure it doesn’t look like it contains too many good particles. Classes 2 and 3 are clear junk classes, with no recognizable TRPV1 features.

Note that none of these volumes are “good” in the sense that you could interpret molecular features or use them to build a model. For example, when we view higher contour levels of the best volume (Figure 9) it rapidly breaks into disconnected blobs. The goal of Ab-Initio Reconstruction is to produce volumes from scratch that are good enough that other refinement jobs have something to start from.

The next step in the 3D curation pipeline is a re-classification using the volumes we just generated and all of the particles (instead of the subset used to make the volumes) using Heterogeneous Refinement.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles output from the Select 2D Classes job

Initial Volumes

All four volumes from the Ab-Initio Reconstruction job

Force hard classification

We are using this job to classify particles, so we want volumes to remain very distinct. Since the volumes look so different from each other in this case, the job would still likely work correctly if this parameter was left off, but in general hard classification works best for junk/not-junk classification jobs.

Again, class 0 (blue, Figure 10) contains almost certainly contains all of the TRPV1 particles. Of the input 314k particles, 152k are in this class.

The other three classes are likely mostly junk, or particles which have degraded on the grids, since the volumes don't at first glance appear to resemble TRPV1. However, it is also almost certain that class 0 also contains junk and the other three contain some good TRPV1 particle images — as mentioned earlier, particle curation is never perfect.

If you’re not familiar with plots produced by CryoSPARC during refinements or any other job, many of them are documented in the Common CryoSPARC Plots page.

Just as it did in the Ab-Initio Reconstruction job, Class 1 looks like a distorted top-down view of TRPV1 (Figure 11). Inspecting the viewing angle distribution, we also see two peaks (Figure 12), indicating that most of the images in this class have been aligned to either top-down or bottom-up views:

It may be that some (or even most) of the particles in this class are good, and the over-represented top views have spilled over into this second class. The orientation distribution of class 0 (Figure 13 has coverage of the top and bottom views, so it’s alright to discard class 1 in this case. However, if class 0 was missing the top and bottom views, we would probably keep class 1 and hope that some of the particles in that class are in fact high quality.

In any case, class 0 is now clean enough that we can generate templates for template picking.

Template Picking

Template Generation

Recall that when we pick particles, we measure the correlation between positions on the micrograph and some search image. With blob picking, the search image is a simple Gaussian blob. Now that we have a clean particle stack, we can produce high-quality 2D class averages and use those images as the search images. Re-picking is not strictly necessary, but Template Picker generally picks fewer empty or junk picks than Blob Picker, and the particle picks are more centered.

To generate templates, we will run another 2D Classification job.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles output from class 0 of the previous Heterogeneous Refinement

Number of 2D classes

100

As with the first 2D Classification, the number of classes is somewhat arbitrary. Many other values would work just as well.

Next, we use Select 2D Classes to choose the class averages which will become our templates. Note that we are using 2D Classification in a fundamentally different way here than we were when performing coarse particle curation. When curating particle cases, we plan to use the particles for further downstream analysis. We therefore want to keep as many particles as we can so that we can filter out good particles using 3D methods.

Here, on the other hand, we will not use the particles in our selected classes at all. We only care about the class average images themselves. We thus will focus on selecting only high-quality class averages with a variety of viewing directions. Also note that the class averages are rotated during template picking, so there is no need to pick classes which look the same aside from an in-plane rotation.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles from the previous 2D Classification

Input: 2D Class Averages

2D Class Averages from the previous 2D Classification

Note that the majority of excluded classes are also good, in that they are most likely also composed of TRPV1 particles. For example, excluded classes 0, 1, 3, 4, and 7 are all very high-quality templates for a top view, but there are already several top view templates in the selected classes, so we did not include these additional top views for particle picking.

Template Picking

With our templates in hand, we can set up a Template Picker job. Like the Blob Picker, this job compares each position in the micrograph to our provided templates and calculates the Power score and Normalized Cross-Correlation score which we will later use to filter the particle picks.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Templates

“Templates selected” output from the previous Select 2D Classes job

Input: Micrographs

Micrographs output from the Manually Curate Exposures job

Particle diameter (A)

130

Measuring the good map from the Heterogeneous Refinement job gave an approximate diameter of 130 Å. Other values will likely work similarly well.

We used a particle diameter measured from class 0 of the Heterogeneous Refinement job. This parameter typically has a relatively wide range of tolerated values, so many ways of measuring it will produce acceptable results. A comparison of 130 Å with 100 Å and 180 Å (Figure 17) show that the particle picks are overall similar, with the smaller diameter having denser picks (meaning more particles may be picked multiple times) and a larger diameter having sparser picks (meaning the particles may be better-centered, but some particles may be missed).

We will take these picks directly to particle extraction rather than use Inspect Particle Picks as we did for the picks from Blob Picking.

Since the result of this particle extraction will be our final particle stack, we want to keep as many good particles as possible. Inspect Particle Picks, which could be used at this point, is a simple method for filtering, relying on inspection of individual micrographs to set thresholds. For this dataset, we will instead rely on 2D and 3D particle curation methods later in the workflow to filter this particle set. For the final stack, we would rather extract too many particle images than miss any good images due to poor selection of Power or NCC thresholds.

Note, too, that this dataset is quite dense. Because of the density of particles, we expect that most of the particle picks from the template picker are on TRPV1. If the particles had been more sparse, it may still have been worth performing Inspect Particle Picks at this stage, since a greater proportion of the picks would have been on contaminants like empty ice.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Micrographs

Micrographs output from the Template Picker job

Input: Particles

Particles output from the Template Picker job

Extraction box size (pix)

256

This is the same box size as was used for the blob picks.

Fourier-crop to box size (pix)

None

We are not downsampling these particles (i.e., this parameter is left blank). This means we are using the full-size images, with a pixel size of 1.22 Å, equating to a Nyquist resolution of 2.44 Å.

Save results in 16-bit floating point

True

With the parameters above, Template Picker picked 1.7 M particles, only 1.5M of which are extracted (similar to the particles lost during extraction of the Blob Picker particles, the rest were too close to the micrograph edge).

Fine Curation

2D Curation

2D Classification is a fast, but simplistic, means of separating junk particles from good particles. For two main reasons, we generally recommend that only one round of 2D Classification is performed when curating particles:

It is difficult to determine whether a 2D class average is junk/off-target/noise, or whether it is an unfamiliar view of the target.
Unlike 3D methods, 2D Classification uses distinct classes to represent both different types of particle and different viewing directions of the same particle. This means that rare viewing directions can be subsumed into junk classes and discarded, introducing orientation bias in the final particle stack.

The above reasons mean that it can often be best to skip curation using 2D classification entirely. However, 2D classification’s speed means that it is worthwhile to run to get a visual sense of the contents of a particle set. It also can be beneficial for removing the very worst particles when the dataset is large and contains junk. In such a case, we recommend that any class which could conceivably contain good particles is kept. In other words: when selecting 2D classes, focus on rejecting clearly bad classes, not on retaining good classes.

For this case study, we will perform a single round of 2D classification of the particle stacks before proceeding to Heterogeneous Refinement.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Extracted Template Picker particles (from the most recent Extract Micrographs job)

Number of 2D classes

200

Circular mask diameter (A)

200

In previous 2D Classifications, some of the class averages drift off-center. Making the mask slightly tighter during 2D Classification can help prevent this problem by excluding information from neighboring particles from the class average.

Circular mask diameter outer (A)

240

We will now select good classes for further processing. Recall that at this stage, our focus is on rejecting the very worst classes, not keeping only the best classes. Also, even with the mask diameters set a bit smaller, many of the classes are off-center. We retain these off-center classes when one or both of the particles in the image look like they may contain TRPV1.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles from the 2D Classification

Input: 2D Class Averages

2D Class Averages from the 2D Classification

Note that we retained 653k of the 1.5M input particles — a similar percentage as we did during template generation despite including some classes which look quite bad (like class 52 or 60). Many of these classes are clearly off center, perhaps because our particle diameter for Template Picking was too small. This is a clear illustration of why it is important to retain all classes which could conceivably contain good particles. Once we are working in 3D we will be able to re-center these particles, so it’s good to keep them now even though the class averages aren’t centered.

We still reject the majority of the class averages. There are a few rejected classes which may contain a number of good particles (like rejected classes 0—6 in the top row), but in general, we try to only reject the very worst classes. Ultimately, this is a subjective process; as long as the general principle of “Try to keep everything which could conceivably contain good particles” is followed, the result should be satisfactory.

3D Curation

Next, we begin the same iterative process of 3D particle curation as we did for the blob picked particles. In this case study, we will perform two rounds of Heterogeneous Refinement, but the number of rounds/iterations generally depend on the cleanliness of the starting particle stack, which itself depends on the quality of the micrographs.

We already have a set of one good and three bad volumes, produced by the Heterogeneous Refinement of the blob-picked particles. We can use those volumes directly as the inputs for the first Heterogeneous Refinement.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles output from Select 2D Classes

Input: Initial Volumes

All four volumes from the final Heterogeneous Refinement of the blob-picked particles

When using Heterogeneous Refinement to clean particles, we often turn on Force hard classification. This forces particles to assign all of their probability to the class they match best, rather than allowing them to spread their probability across multiple classes weighted by the image’s correlation with that class.

However, early on in particle curation, leaving Force hard classification off often produces volumes which look more like the target. This in turn may allow for the creation of a class which captures images of malformed particles. We therefore leave it off here, but turning it on produces similar results in this case.

As with the blob-picked particles Heterogeneous Refinement, volume 0 (blue) is obviously of the highest quality, while volume 1 (orange) appears similar to TRPV1 from certain viewing directions. We include class 1 in the next curation step, but discard the particles from volumes 2 and 3 (pink and green) which appear to be entirely junk. Class 0 contains 251k of the 653k input particles, while class 1 contains 160k particles.

At this point, we want to remove more junk particles from our combined particle stack of classes 0 and 1. However, if we re-use the same volumes, classification will not be very effective because particles which look like those junk volumes have already been filtered out of the dataset. We must therefore use Ab-Initio Reconstruction to generate new junk volumes, and use these new junk volumes in a subsequent Heterogeneous Refinement.

Recall that existing cryo-EM classification methods always put images in the class they look most like. There is no concept of a “junk class” which captures “bad images”. We therefore need volumes which look like specific off-target images in order to filter those images out.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles class 0 and particles class 1 from the Heterogeneous Refinement job we just performed.

Number of Ab-initio classes

Three classes is arbitrary. We are trying to strike a balance between sampling the full variety of particle types in the particle stack and giving a small enough number of classes that different viewing directions of the same object do not start to separate out into separate classes.

Num particles to use

100 000

Just as before, we are only using Ab Initio Reconstruction to generate new volumes, not classify particles, so we do not need to give it every particle image. If this parameter was left blank, the final result would be similar, but the job would take much longer.

The resulting volumes (Figure 21) are similar to those we’ve seen before:

a volume which looks like TRPV1 (class 0, blue)
a volume which looks like mostly top-views (class 1, orange)
a junk volume (class 2, pink).

As mentioned previously, in some cases Ab-Initio Reconstruction can split out distinct viewing directions into their own classes, even when the particles are good. Moving forward, if the “good” class (class 0 in this job) seems to be missing top views, we may want to come back and take some of the particles out of the top-view “bad” class and return them to our particle stack.

With these new volumes in hand, we can perform a second round of Heterogeneous Refinement to classify all of the particles.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

All particles and Unused particles from the previous Ab-Initio Reconstruction

The Unused particles output contains, perhaps obviously, the particles which were not used during Ab initio. Older versions of CryoSPARC may not have this output; if so, you can connect the same input particles to this job as to the previous Ab-Initio Reconstruction

Input: Initial Volumes

All three volumes from the previous Ab Initio Reconstruction plus the good volume from the Heterogeneous Refinement performed at the end of the Blob Picked particles.

We include the good volume from a previous job to ensure that good particles are collected into a single class. Including two classes which look like TRPV1 might help separate low-quality images of the ion channel as well.

Force hard classification

True

At this stage, we start enforcing hard classification. We would typically expect this to make the job perform better at removing junk which looks like TRPV1 (for example, TRPV1 particles which have denatured) at the cost of potentially reducing the quality of the volumes.

Again, Heterogeneous Refinement produced one good volume (Figure 22, volume 3, green) and several junk volumes. The good volume has 241k of the input 412k particles. This iterative process of Ab-Initio Reconstruction followed by Heterogeneous Refinement can be iterated until most (typically ~90%) particles are in the “good” volume or until different views of the same particle are placed in separate classes.

For this case study, we consider the particle stack from volume 3 to be clean enough to make a consensus refinement.

Consensus Refinement

The consensus refinement is a general term for a global refinement using all of the particles in the stack. Consensus refinements are useful because they give all particles a pose relative to the same volume. Jobs like 3D Classification or 3D Variability Analysis take advantage of these shared references to perform classification or analyze flexibility.

Since TRPV1 is a membrane protein, this consensus refinement will use Non-Uniform Refinement, since that job tends to perform better on membrane proteins.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles from class 3 of the last Heterogeneous Refinement

Input: Initial Volume

Volume from class 3 of the last Heterogeneous Refinement

Symmetry

To this point, we have not enforced symmetry on the volume, but it does appear to be C4 symmetric. It is generally best to wait until you are certain of the symmetry order to enforce symmetry.

Minimize over per-particle scale

True

Per-particle scale is a way of accounting for the varying contrast of individual particle images. In general, we recommend turning thus parameter on when refining the whole volume, and leaving it off when refining a smaller region of the map.

For all maps, but especially for consensus refinements, we recommend downloading maps and inspecting them using ChimeraX. Its ability to update the map as you adjust the threshold in real time is essential for developing intuition about various map pathologies and detecting regions of your map which may have flexibility or unresolved heterogeneity.

When you download the map, look at each of the regions highlighted in Figure 1. Look at several different contour levels of each region. Do the different regions behave differently as you adjust the contour? What does this tell you about how well particles are aligned to each region?

In addition to looking at the map itself, you should investigate several of the diagnostic plots produced by Non-Uniform Refinement.

The FSC plot (Figure 24) tells us this map reached a GSFSC resolution 2.69 Å. The Corrected and Tight curves stay separated by a gap for the whole range after the Corrected curve first separates from the Tight curve. This tells us that the auto-generated mask may be too tight — before performing more refinements we should make our own mask with additional padding. More information about this plot is available elsewhere in the guide.

CryoSPARC refinements also assess whether particles have orientation bias. The orientation distribution plot (Figure 25) shows the FSC of conical regions of Fourier space and calculates a score called the cFAR. Generally, a cFAR score better than 0.5 indicates that a map is isotropic — that is, it has information from all viewing directions. We can therefore assume that the top views filtered out during particle curation are not necessary for a high-quality map. cFAR (and other ways of assessing orientation bias in maps) is discussed in greater detail in the Orientation Diagnostics page.

As mentioned in the parameter table, during this job we refined the per-particle scales, which are a measure of how well the particle image matches the reference in its best pose. The per-particle scales show a bimodal distribution (Figure 26). Sometimes this indicates that there are still contaminating particles, but in some cases there are simply two populations of particle images with different mean scales (perhaps due to differing ice thickness).

The TMD has clearly-visible density for side chains (Figure 27), as expected for a 2.7 Å map. On the other hand, the C-terminal domain has relatively poor resolution. The blurry region of the CTD makes few contacts with the rest of the protein, so it would make sense that it is slightly flexible. When two domains in a particle are flexible, the images must align either one domain or the other to the reference; it cannot align both. This both blurs out the domain and weakens the map density in that region. You may have noticed that the CTD disappears at a lower contour than the rest of the map (Figure 28).

We can further improve the quality of the CTD map using a mask.

Masked Refinement of the CTD

Mask design and local refinement are important skills and concepts in cryo-EM image processing. They are discussed in broad strokes here, and are covered in-depth in a dedicated tutorial.

In our consensus map, all of the particles have aligned to the TMD. This produces a map with a high-quality TMD, but the CTD of each particle is now in a slightly different position due to flexibility. Thus, the map of the CTD is low quality. We can align the particles to the CTD instead using a mask.

Masks are simply a second volume with the same size as the map. The mask has values ranging from zero to one in each voxel. When we use a mask in a refinement, each voxel in the map is multiplied by the corresponding voxel in the map. This produces a masked volume, to which we align the particle images.

In this case, if we create a mask which has a value of 1 in the same voxels as the CTD and a value of 0 elsewhere, we create an alignment map that only has the CTD. When we align particles to this map, they will align to the CTD instead of the TMD, since the map no longer has any density in the TMD.

We first create a mask base by erasing the TMD and nanodisc using the volume eraser tool in ChimeraX. We upload this mask base to CryoSPARC and import it using an Import 3D Volumes job.

We can then add padding and a soft edge using Volume Tools. It is important to add a soft edge to masks, allowing them to smoothly transition from values of 1.0 inside the mask to 0.0 outside the mask. This is discussed more in the mask creation tutorial.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Volume

Uploaded mask base

Type of output volume

Mask

This sets the metadata such that you can plug the output volume into Mask slots, rather than Volume slots.

Threshold

0.164

Set this value to whatever the threshold was when you were using the Volume Eraser tool. It will depend on your map and will likely be different from this value.

Dilation radius (pix)

This is not strictly necessary, but in general it is good to include a bit of extra space around the volume.

Soft padding width (pix)

When creating a mask, you must include soft padding for good results. Reasons and recommended values are discussed in the mask creation tutorial page. The exact amount of padding necessary for good performance will depend on the job type and the target. This value is on the low end of what one might try — if the resulting map has large, blurry features right at the edge of the mask we likely need more soft padding.

Your mask may look different depending on your exact choices during creation, but in general, it should loosely follow the topology of the entire CTD.

Using this mask and the particles and volume from the consensus refinement, we can create a new Non-Uniform Refinement job. We’ll call this the masked CTD refinement.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles from the consensus refinement

Input: Initial Volume

Volume from the consensus refinement.

Input: Static mask

A mask created in ChimeraX surrounding the CTD

See below for a visualization of the mask. Note that a soft edge of 12 pixels was added with Volume Tools.

Symmetry

It is possible that the four CTDs flex in asymmetric ways. Later we will discuss how this could be handled using other jobs, but here we continue to enforce C4 symmetry.

Minimize over per-particle scale

False (default)

When masking out a small region of the target, it is important not to minimize over per-particle scale. The measurement of per-particle scale can be thrown off by the large amount of density missing in the masked volume.

Initial lowpass resolution (A)

We use a less aggressive initial lowpass filter here because the default of 30 Å may blur out all features of the CTD which could be aligned.

Recall that the map is multiplied by the mask during alignment, and the mask covers only the CTD with values of 1.0. Thus, the volume that Non-Uniform Refinement aligns particles only has density in the CTD — the TMD is completely empty (Figure 31).

Figure 32 is the result of this job, in which we aligned the particles from the consensus refinement (which are not changed in any way by the mask) to this masked volume:

Both the CTD and TMD of this map are significantly worse than those of the input map. We expect the TMD alignments to degrade when we focus on the CTD (as discussed above, these domains are flexible and so can never both be well-aligned in the same map). But why did the CTD also get worse?

There two likely reasons for this degradation:

Although we masked out the TMD from our volume, the TMD is still present in the particle images. The CTD alignments are degraded by interfering signal from the TMD.
There is not enough signal in the CTD alone to properly align the particles.

Point 1 is an important consideration when performing local refinements, especially if the region outside the mask is much larger than the region inside the mask. Particle Subtraction is the main way of handling this issue, and is covered in the Yeast Spliceosome case study. However, the TMD is not too much larger than the CTD, so we do not expect it to be a significant effect here.

Point 2, on the other hand, is very important. The masked CTD refinement can only align the particles based on information from the CTD, and it discards the particles’ input poses from the consensus refinement. Thus, it must check every pose of every particle again, but this time using far less information. As we see above, this can lead to poor alignments. We can solve this problem by using a Local Refinement instead of a global refinement.

Local Refinement

Local Refinements start from a particle’s existing pose and searches nearby poses to find the local optimum. Thus, a global refinement must be performed first to establish the initial poses. The better the input poses, the better the result of local refinement generally is. Here, we will run a Local Refinement with the same particles, input map, and mask as the masked CTD refinement. The only difference is that this job starts from the upstream Non-Uniform Refinement’s poses.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particles

Particles from the consensus refinement

Input: Initial Volume

Volume from the consensus refinement

Input: Static mask

The same mask was used for the masked CTD refinement

Symmetry

Minimize over per-particle scale

False (default)

As mentioned above, we should not change per-particle scales when we are aligning only a subregion of the overall complex.

Initial lowpass resolution (A)

12 (default)

The initial lowpass resolution is the same as it was for the masked CTD refinement.

Compared to the consensus refinement (yellow), the CTD of the local refinement (green) is significantly improved. This improvement comes at the cost of the CTD, which is invisible in the map since particles are no longer aligned to that domain. This trade-off is quite striking when comparing the two maps side-by-side:

Using these two maps together allows for confident model building of much more of the channel than either map on their own.

Conclusion

In this case study, we processed micrographs of TRPV1 using the following steps:

Preprocessing
Blob Picking
Coarse Particle Curation
Template Picking
Fine Particle Curation
Global Refinement
Local Refinement of Flexible Domains

We consider this one example of a “standard workflow” for initial single-particle processing. There are, of course, many more steps that one could take to improve their particle stack, or other branches of investigation that may yield interesting results. This is not an exhaustive list of jobs that must be performed for each dataset — rather, these are simply the jobs that are typically performed at the beginning of any processing pipeline, and they are often performed in this order.

At several stages throughout this case study, we had to make decisions when we had no clear right answer to guide us. How many 2D or 3D classes should we request? What is the right particle diameter? What is a good starting filter resolution? How should we design the mask for Local Refinement? At each stage, we tried the value or option we thought would work best. If it failed, we closely investigated the map and other outputs from the job and thought about why it failed and how we might change parameters to improve performance. This ability to diagnose why a job failed is an important skill built through practice. To that end, we present some exercises below for further self-guided investigation of the TRPV1 dataset.

Exercises

CTD and C4 symmetry

As mentioned in the parameter table for the masked CTD refinement, the CTD of TRPV1 is almost certainly not C4 symmetric. When one subunit’s CTD bends in toward the symmetry axis, there’s no reason to expect the subunit on the other side of the channel will bend in the same way (indeed, one might expect the opposite).

How would you produce a map of TRPV1’s CTD without imposing C4 symmetry while still taking advantage of the four-fold increase in SNR from the fact that we have four copies of the same domain? Do you think this strategy would improve the TMD portion of the map as well?

Hint

If you had four copies of the particle, each rotated by 90°, you could align each monomer independently in a Local Refinement. Is there a job that produces these image copies? Once you have the copies, how many subunits should your Local Refinement mask cover?

We produced the map below. Note that the TMD and CTD are both visible, but only for a single subunit! Why do you think that is? Do you think we used the same strategy as you came up with? Why or why not? Try your strategy and see if the results match!

How we produced the map above

First, we performed a C4 Symmetry Expansion on the consensus refinement particles. This creates a particle stack in which each image is replicated four times, each one rotated 90 degrees.

Next, we created a mask around a single subunit of TRPV1. Because there are really four copies of the channel, all four subunits of each particle are in this mask, but they can be aligned independently!

We finally performed a local refinement using this mask and similar settings to the local CTD refinement. Because the particles could align each subunit independently (rather than forcing them to be symmetric), the map accounts for asymmetry in the particles and produces a high-quality (3.0 Å) map of the entire subunit — but only one subunit! The others are misaligned, because they are outside the mask. Remember, though, that information from all four subunits is used to make this map of a single subunit due to symmetry expansion.

Double-Knot Toxin Linkers (advanced)

Recall that this dataset uses two copies of the double-knot toxin (DkTx) to hold the channel in an open state. If we take a closer look at the C4 symmetric consensus refinement map, we can see the knots bound to the extracellular side of the TMD (Figure 36).

As expected, four knots are visible: two toxins with two knots each. However, at lower contour, we see four linkers when there should only be two (Figure 37).

When we imposed C4 symmetry on the channel, the signal from each of the two linkers was spread out across all four symmetry-related positions. Thus, even though there are only two linkers on each particle, we see four in the map. How would you produce a C2 symmetric map in which the particle has only two DkTx linkers like the one in Figure 38?

Hint

The particles are already quite well-aligned. All you need to do is rotate a subset of particles by 90° about the Z axis and they will all be properly aligned. Is there a job that can classify particles based on the orientation of the linkers using the existing poses you’ve found so far? Are there specific parameters you can set to make this job more sensitive to the changes you’re looking for?

How we made this map

First, we performed a 3D Classification with 2 classes and C1 symmetry. We turned on force hard classification for this job to ensure the linkers were properly separated and set the filter resolution to 10 Å. This job produced two maps: one still had four linkers, but the other had 2 linkers, as expected! You may have to adjust some parameters to get this result — perhaps increasing the O-EM batch size and learning rate init, or maybe increasing the number of classes.

Once we had one class with only two linkers, we used Volume Alignment Tools to rotate this volume by 90° about the Z axis. This produces a second map with the linkers in the other position. We provided both of these two-linker maps to another 3D Classification job with Initialization mode set to “input”. This classified the particles into the two orientations.

We used another Volume Alignment Tools job to rotate one set of particles by 90°, followed by a final Local Refinement (with C1 symmetry) to align all of the particles (which now have the linkers in the same place) to the same reference.

References

Gao, Y., Cao, E., Julius, D. & Cheng, Y. TRPV1 structures in nanodiscs reveal mechanisms of ligand and lipid action. Nature 534, 347–351 (2016).

PreviousCase study: End-to-end processing of a ligand-bound GPCR (EMPIAR-10853)NextCase Study: Pseudosymmetry in TRPV5 and Calmodulin (EMPIAR-10256)

Last updated 6 months ago