Case Study: Discrete and Continuous Heterogeneity in FaNaC1 (EMPIAR-11631 and -11632)

Processing a combined dataset of an ion channel with a focus on techniques for separating conformational states from a heterogeneous mixture.

Overview

This case study assumes familiarity with CryoSPARC’s UI and job workflow. If you haven’t processed data in CryoSPARC before, you may be more comfortable starting with the T20S Proteasome tutorial or the TRPV1 case study. This case study also assumes familiarity with foundational cryo-EM concepts like particle pose and alignment. We provide more detail on these foundational concepts in the first video of the 2024 Image Processing Workshop recordings.

This dataset was also processed as part of the S2C2 Single Particle Analysis workshop in 2024. A recording is available here.

This case study provides an overview of the various techniques available to process data with distinct populations of particles mixed together. In this case, the two populations are ligand-bound and apo ion channels, but similar mixtures could exist if, e.g., a protein occupies two or more conformations. These datasets present exciting challenges, since the changes represent both a continuous deformation and a discrete state change.

Outline

Here, we work through the following steps:

3D Classification
- Classification from a consensus refinement
- Re-refinement of each class to update particle poses
Heterogeneous Refinement
- As a classification tool
- As a particle curation tool
3D Variability Analysis
- 3DVA theory and parameter choices
- The 3DVA display modes
- 3DVA as a classification tool
3D Flexible Refinement
- 3D Flex theory
- Important choices when preparing 3D Flex data
- Iterative optimization of 3D Flex Training parameters
- Ensuring modeled movements are physically plausible

Data Background

In this case study, we combine two datasets from the same paper: EMPIAR 11631 (apo state) and 11632 (FMRFa-bound). These datasets were collected by the same authors on the same microscope. We believe this is a useful case because we can quantitatively assess the quality of classification — we know the true state of each particle based on which dataset they come from.

The data used in this case study is from EMPIAR 11631 and 11632, originally processed by Kalienkova and colleagues. These two datasets together comprise 9,840 movies of the FMRFa-gated Sodium Channel (FaNaC1) from Malacoceros fuliginosus. FMRFa is a triheteromeric sodium-selective ion channel and is a member of the broader ENaC/DEG family of ion channels (Figure 1). ENaC/DEG channels are expressed in a variety of tissues in humans, including neurons, the kidney, and muscles.

Kalienkova and colleagues aimed to understand the conformational changes associated with ligand binding, and so collected two datasets. The first, EMPIAR 11631, is in the apo state — there is no ligand bound to this channel. The second, EMPIAR 11632, is bound to FMRFa and so is expected to be in the open state.

Before you begin

Downloading and Importing the Data

First, download the datasets from EMPIAR (11631 and 11632). For example:

cd /path/to/rawdata/EMPIAR
mkdir 11631
cd 11631
wget -m -nH --cut-dirs 4 \
    ftp://ftp.ebi.ac.uk/empiar/world_availability/11631/data/
mkdir ../11632 && cd ../11632
wget -m -nH --cut-dirs 4 \
    ftp://ftp.ebi.ac.uk/empiar/world_availability/11632/data/

EMPIAR 11631 contains three subsets, with a total count of 5,361 movies between them. Each of these three subsets has their own gain reference, named {date}_gainreference.MRC, where {date} is replaced with the date on which the dataset was collected. EMPIAR 11632 has 4,478 movies in two subsets, each of which also had their own gain reference with the same naming convention.

We can import each subset using a separate Import Movies job (i.e., creating 5 Import Movies jobs in total):

Throughout this case study, job inputs and parameters will be presented in tables, like below. Inputs are prefixed with Input:. Any other row is a parameter. Any parameters not listed are set to their default value. The “Notes and Rationale” column describes why a parameter value or input is used, and presents alternate values which may also work.

In cases like the Movies data path parameter, the Notes and Rationale column will also highlight when your value will likely be different based on the location of data files in your system.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Movies data path

/path/to/rawdata/EMPIAR/{number}/{collection}/*.tif

Replace with the correct path to a single dataset

Gain reference path

path/to/rawdata/EMPIAR/{number}/{collection}/gainreference.MRC

Replace with the path to the same dataset’s gain reference.

Flip gain ref & defect file in Y?

True

These datasets’ gain references are flipped in Y relative to the movies.

Raw pixel size

1.022

From the EMPIAR entry and paper

Accelerating Voltage (kV)

200

These data were collected on a Talos Arctica.

Spherical Aberration (mm)

2.7

The Talos Arctica has a spherical aberration (Cs) of 2.7 mm

Total exposure dose (e/A^2)

{Differs per dataset}

Each dataset has its own total dose, noted in the EMPIAR entry.

Viewing and Preparing 3D Volumes

This study assumes you have the ability to view 3D Volumes. CryoSPARC has a built in Volume Viewer, but we recommend downloading and installing UCSF ChimeraX, as we refer to this program throughout the case study. ChimeraX is a powerful 3D visualization tool which can display and modify atomic models and cryo-EM maps (from CryoSPARC and elsewhere), prepare publication-quality images, and many other features. In this tutorial, we use ChimeraX and not Chimera (without the X), which is an older version that is no longer under active development and is no longer recommended by the developers.

This study also assumes passing familiarity with viewing 3D Volumes in your rendering software of choice. Throughout, terms like “contour up” and “contour down” are used to refer to viewing the volume with a higher or lower isosurface, respectively. The process of making masks is also not covered in detail here — a walkthrough for mask creation using ChimeraX is available elsewhere in the guide.

Preprocessing and Standard Workflow

Once the datasets have been imported, they are ready for pre-processing, particle picking, and particle curation. For this case study, note that you can (and should) process all of the movies as if they come from a single data collection. For the sake of brevity, these common steps are listed but not reproduced in detail here. If you are not familiar with a standard cryo-EM single particle analysis pipeline, it is covered in detail in the TRPV1 case study and the accompanying S2C2 workshop recording.

What is a standard workflow?

A standard workflow is a set of steps commonly applied in every SPA data collection. While not a hard-and-fast rule, a standard workflow generally includes:

Patch Motion Correction
Patch CTF Estimation
Curate Exposures (removing micrographs with poor CTF fit)
Micrograph Denoiser
Blob Picker
2D Classification
3D particle curation
Template picker
3D particle curation

From this point forward, we will process these datasets as though they come from a single dataset and as though we do not know that both apo and FMRFa-bound particles exist. At the end of the case study, we will use the ground-truth knowledge of which particles came from which dataset to assess how effective our classification jobs were.

Although we recommend pre-processing and particle picking on your own, if you are already familiar with this workflow or want to compare your results to ours, you can import our particle picks from this STAR file.

Our combined, consensus refinement (Figure 2) is a Non-Uniform Refinement of all input particles with the automatically generated mask. It contains 1.3M particles and refined to 2.73 Å. Depending on decisions you made during particle curation, your final numbers may differ from these. The original authors report final particle counts several times greater than this!

This map has features we’d expect to see at this resolution, with clear density for ɑ-helices and β-strands as well as good density for some side chains. Additionally, clear density is visible for the ligand FMRFa (Figure 3).

However, upon closer inspection, we see that parts of the map are better than others. For example, compare the side chain density for residues in the palm and residues in the thumb (Figure 4).

Although both regions have sufficient density to build a model, the palm is significantly stronger and follows the full contour of even long residues like arginine. Similar to the thumb, the transmembrane domain (TMD) is fragmented and at a lower resolution than the core of the protein. We can confirm these observations with a local resolution analysis (Figure 5).

The thumb and TMD have a lower local resolution than the rest of the ion channel. In some ways, this is not surprising. Both of these domains are at the periphery of the protein, and may be somewhat flexible, which would result in a lower resolution. However, we know from biochemistry and related ion channels that coordinated movement of the thumb and TMD occurs during ligand binding. Perhaps some subset of the particles that produced this FMRFa-bound map are in fact missing the ligand.

3D Classification

To attempt separating the apo particles from the ligand-bound particles, we will first use 3D Classification. 3D Classification classifies particles using their existing poses (i.e., it does not perform any alignments to the new volumes generated for each class). We opt to use 3D Classification instead of Heterogeneous Refinement at this point because we already have high-quality pose estimates for these particles, as evidenced by the high resolution of the consensus refinement.

3D Classification requires two masks: a solvent mask, which covers the entire protein, and a focus mask, which covers the region we expect to differ between classes. For the solvent mask we can use the same mask used during refinement the consensus Non-Uniform Refinement. For the focus mask, we want to focus on the extracellular domain (ECD) (where FMRFa binds), so we will design a mask in ChimeraX which covers the ECD but excludes the TMD (Figure 6). Even though the TMD conformation is also coupled to ligand binding, the majority of the changes occur in the ECD, so this mask should still allow for successfully classifying the particles. Excluding the TMD has the benefit of ignoring disorder in the nanodisc region, which may hamper classification.

We created a mask around the ECD only by opening the consensus refinement in ChimeraX, selecting the Volume Eraser and centering the sphere over the ECD, then clicking “Erase outside sphere”. Once you have created your mask, first upload it to CryoSPARC and import it with an Import 3D Volumes job:

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Volume data path

The path to your uploaded volume

Then, run a Volume Tools job with the following parameters:

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Input volume

Imported volume from previous job

Type of output volume

mask

Threshold

0.05

The exact choice of threshold here depends on your consensus refinement and how you created the mask base.

Soft padding width (pix)

When your mask “cuts through” density, it is very important to give it a soft edge. More information is available in the mask creation guide page.

Finally, create a 3D Classification job with the following inputs and parameters:

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

Particles from the consensus refinement

Input: Solvent mask

A mask covering the entire protein

Input: Focus mask

A mask covering the ECD only (shown in Figure 6)

Number of classes

Although there may be less-populated classes with different states, we want to focus solely on apo vs. FMRFa-bound particles.

Filter resolution (Å)

The exact choice of filter resolution here can be important, but a range of similar values would likely produce similar results. The main point is that we lowpass filter enough that the major difference between particles will be presence/absence of FMRFa, rather than other features we are less interested in at this time.

Symmetry

Although there may be particles with sub-stoichiometric binding or other symmetry-breaking features, for now we impose symmetry to improve overall ligand detection.

Init structure lowpass resolution (Å)

The exact choice of resolution is not critical. Generally, we set this parameter to a few Å coarser than the filter resolution.

Force hard classification

True

Since most of the channel will look very similar in apo vs. FMRFa-bound states, we force hard classification here to avoid two identical classes. If the filter resolution value was lower (i.e., higher resolution) we may have left this false, as a high filter resolution and hard classification together can sometimes put all particles into a single class.

The results of this 3D Classification job are run through a Heterogeneous Reconstruct Only job and then shown in Figure 7. Class 0 looks similar to the consensus volume, while Class 1 has a blurry thumb domain which has been shifted down and away from the finger. Class 1 therefore likely contains the apo particles we were searching for. However, closer inspection of the ligand-binding region shows that both maps still have density for FRMFa (Figure 8).

This is a surprising result! Biochemically, we know that FaNaC1 opens extremely rapidly upon FMRFa binding (see electrophysiology experiments in Kalienkova et al. 2024). We thus expect any channel with FRMFa present in the binding pocket to be in the open state, with the thumb nearer to the finger. However, Class 1 appears to have FMRFa in the binding pocket and be in the closed state.

Updating Poses

There are several possible explanations for this seeming contradiction, but there’s one more fact to take note of: Class 1 has blurry, smeared-out thumb and TMD domains. Again, there are a number of explanations for this phenomenon. It may be that apo particles are more flexible than the FMRFa-bound particles, which would blur out these domains. It’s also possible that the apo channels are sampling conformations similar to the ligand-bound conformation, even when FMRFa is not present. Furthermore, we imposed C3 symmetry on this job, so it may be that the weaker ligand and blurry thumb domain is a result of substoichiometric binding. All of these hypotheses are reasonable and worth testing. However, there is another consideration we should generally check first for results like these: particles in Class 1 (closed state) have poses that have only been refined against a map in the open state.

Recall that 3D Classification does not change particles’ poses — it only classifies them. The particles in class 1 have the same pose from the consensus refinement. The consensus map has density for FMRFa and is in the open conformation, so if Class 1 is mostly composed of particles without FMRFa, their poses will be poorly estimated and could be biased towards the consensus density. If this is the cause of the unexpected results from 3D Classification, we should be able to see a change in the map by re-refining the particles. We can create a Non-Uniform Refinement of each of the apo and FMRFa-bound particles.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

Particles from 3D Classification

In this case, the apo job will take particles from Class 1 and the FMRFa job from Class 0.

Input: Initial volume

Volume from 3D Classification

In this case, the apo job will take the volume from Class 1 and the FMRFa job from Class 0.

Input: Static mask

Solvent mask from 3D Classification

Symmetry

Again, it is possible that the channels have substoichiometric binding and so are not C3 symmetric, but for now we will continue to impose symmetry.

Minimize over per-particle scale

True

Per-particle scale accounts for varying contrast among particle images. Typically this varying contrast is assumed to result from ice thickness effects. However, per-particle scale can also account for mismatch between the particle image and the volume. Since these particles were refined against a volume with FMRFa present, they may have a per-particle scale that is too low. Turning this parameter on will find new per-particle scales which reflect the apo map.

The FMRFa-bound map looks essentially the same before and after Non-Uniform Refinement. This is expected, since the poses from 3D Classification were from an FMRFa-bound volume. However, the apo class is significantly improved. Density for the thumb and TMD is sharper and less fragmented. Moreover, the FMRFa density has completely disappeared from the finger/thumb interface (Figure 10).

Note that the apo map of Figure 10 and Class 1 of the 3D Classification are made from the exact same particles. The only difference between the two is the poses of those particles. This is an important point — we originally chose 3D Classification because we thought we did not need to refine the poses. They were already good enough to give us a 2.7 Å map. However, those poses were only good for the FMRFa map.

3D Classification was ultimately successful at separating the particles. Class 0 contained FMRFa-bound particles and Class 1 contained apo particles. However, at first glance, the two maps looked somewhat similar due to this pose issue. Would the results have been more obviously correct if we’d used Heterogeneous Refinement, which allows poses to change during classification?

Heterogeneous Refinement

Heterogeneous Refinement simultaneously classifies particles and refines their poses. It is therefore possible that the classification would have been more obvious, if not more successful, using this job as opposed to 3D Classification. We will set up a Heterogeneous Refinement of the original consensus refinement particles to test this hypothesis.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

Particles from the consensus refinement

Input: Initial volumes

Two copies of the consensus volume

When two identical volumes are given, Heterogeneous Refinement forces particles to assign randomly between the two, in the first few iterations. This is a useful way to use Heterogeneous Refinement to create two volumes which look similar to each other.

Symmetry

Force hard classification

True

Since the two volumes look somewhat similar, we will force hard classification to encourage the maps to be distinct.

Indeed, it is immediately obvious that these two maps are significantly different. The apo map shows the expected movement in the thumb and does not have any density for FMRFa, while the FMRFa-bound map has clear density from the ECD all the way through the TMD.

Particle Cleanup

The apo map from Heterogeneous Refinement looks much worse than the FMRFa-bound map. This may indicate that there are still low-quality particles which made it through our early particle curation steps. These particles may be reducing the quality of both maps. To check for bad particles, we can try another Heterogeneous Refinement. This time, we will refine each of the apo and FMRFa-bound particle sets against four copies their own map.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

Particles from the apo class of the previous Heterogeneous Refinement

Input: Initial volumes

Four copies of the apo volume from Heterogeneous Refinement

We will use identical volumes here again, for the same reason we did in the previous Heterogeneous Refinement. Four copies is somewhat arbitrary — anywhere from two to six volumes would likely produce similar results. The point is to provide enough classes that bad particles separate from good ones, while also encouraging all of the good particles to stay in a single class.

Symmetry

Force hard classification

True

For the apo state, Class 2 looks significantly better than the others (Figure 12). Interestingly, only about half of the particles (303k of 551k) particles are in Class 2, meaning that the particle stack was still contaminated by a significant number of low-quality particles!

Cloning this job and replacing the particles and volumes with the FMRFa-bound class, we see the same pattern persists in the FMRFa set, with Class 0 being significantly better than the others (Figure 13). Only 430k of the 790k FMRFa-bound particles are in Class 0.

Refining these clean subsets into new maps improves both the apo and FMRFa-bound maps somewhat (Figure 14).

Removing bad particles should, in theory, improve particle maps. There are several possible explanations for why this did not happen in this case.

The relationship between particle count and resolution is decidedly non-linear. Put another way, going from 10k to 110k particles will result in a significantly greater resolution improvement than going from 900k to 1M particles. This relationship can be seen in ResLog plots, and is further explored by Rosenthal and Henderson (2003).

Another possibility is that the quality of the particles was already being handled by the refinement algorithm. Recall that during refinements, we have been turning on the Minimize over per-particle scale parameter. Most often, per-particle scale is thought of as accounting for varying ice thickness, but it may incorporate error from other sources as well. For more information, see the Per-Particle Scale section of the Subset Particles by Statistic page

Indeed, if we compare the distribution of the particles’ per-particle scale before and after filtering (Figure 15), we see that particles with higher per-particle-scale tended to be retained, and particles with low per-particle scale tended to be rejected.

Discrete Analysis Conclusion

Both of the techniques we have discussed so far treat the particle stack as a combination of particles which either do or do not have FMRFa bound. They differ in how the particles were classified and aligned.

In the first technique, we used 3D Classification to separate particles, in their existing poses, into apo- or ligand-bound classes. We then independently refined these particles to produce the associated maps. The final maps in this case were high quality, but the apo map produced initially by 3D Classification was poor due to the fact that the initial poses came from a ligand-bound map.

Starting from the consensus refinement, running the 3D Classification job, and then the two Non-Uniform Refinements (in parallel, each on a single GPU) took approximately 2 hours.

In the second technique, we used Heterogeneous Refinement to classify and align the particles at the same time. The intermediate maps produced by Heterogeneous Refinement did not have the blurring in the thumb that was present in the 3D Classification maps. Non-Uniform Refinement was then used to produce the highest quality map.

Starting from the consensus refinement, running the Heterogeneous Refinement followed again by two Non-Uniform Refinements also took approximately 2 hours, comparable to the 3D Classification technique.

Classification quality

This particle stack is composed of two separate datasets which we can assume are entirely apo or ligand-bound. We can therefore assess how successful each technique was at correctly identifying a particle as FMRFa-bound or apo by looking at the final classes and checking which dataset the particles came from (Figure 16).

Approximately 10% of the misclassified particles were misclassified by both job types, which may represent a relatively small population of particles that are inherently difficult to classify. Notably, a refinement of all of the FMRFa particles misclassified by either job (i.e., all of the particles in both of the pink hatched areas of Figure 16) produces a high-quality map (Figure 17). Put another way, there is nothing wrong with these particles per se which causes them to be misclassified. Instead, they are likely misclassified simply because it is difficult to achieve 100% accuracy when classifying low-contrast, noisy images. Given this inherent limitation, there will always be some degree of misclassification. By optimizing job parameters and careful inspection of the particles we aim to minimize, not eliminate, this problem.

Continuous Heterogeneity

In the processing of this dataset so far, we have treated each particle as belonging to either the FMRFa class or the apo class. Within each of these classes, the volume is treated as static — FMRFa is either bound or unbound; the channel is either open or closed, etc.

In reality, we know that proteins often sample a range of conformations; the same is likely true of the particle images. The broad categories of FMRFa-bound and apo still apply, but within and between these categories, individual particles may have slight differentiation in terms of the exact position of the thumb domain and TMD helices.

CryoSPARC provides two main means of analyzing data with continuous heterogeneity: 3D Variability Analysis and 3D Flexible Refinement. These techniques both attempt to infer information about the conformation of an individual particle image. Because of the low signal-to-noise ratio of individual images, each technique includes different constraints and regularization to avoid overfitting.

In the following sections, we walk step-by-step through the processes of running these jobs, as well as some theoretical background on each. However, interested readers should be sure to refer to the main guide pages for each job, where more information can be found. More discussion on the theoretical background of these techniques is also available in the recording of the presentation of this case study at the S2C2 image processing workshop.

Ligand binding is closer to a binary state than something like the movement of a flexible domain, which is clearly continuous. This is especially true in this case, where the dataset is composed of two discrete particle types combined. However, using a mixture where the ground-truth labels are known makes for a useful case study, since the results are easy to see and compare.

Results with your own continuous dataset will likely require adjustments to job parameters, and distributions of particles within each latent space will likely (but not certainly) be more continuous and smooth.

Aligning the Particles

Before we perform any continuous heterogeneity analysis, we must align particles so that they are “in register”. Typically, one would simply use the consensus refinement for this purpose. However, we performed some filtering during the Heterogeneous Refinement section above and it would be nice to use only the retained particles.

There are several ways to put the particles in register, including performing another consensus refinement, or using Particle Sets Tools to keep the filtered particles while using the pose information from the consensus refinement. However, we will instead use Align 3D Maps. This job aligns all input volumes to a reference volume, and can apply the same rotation and translation to particles as well.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Maps to align (individual volumes)

Both of the maps produced by the Non-Uniform Refinements performed at the end of the Heterogeneous Refinement section above.

We provide both maps here so that we can easily use the particles outputs from this job. This does mean that one of the maps will be aligned to itself, but that won’t cause any problems.

Input: Particles (map to align, connection 1)

Particles from whichever of the two Non-Uniform Refinements you connected first.

Be sure you connect the particles in the same order you connected the maps!

Input: Particles (map to align, connection 2)

Particles from whichever of the two Non-Uniform Refinements you connected second.

Update particle alignments

True

The particle input slots won’t appear until you turn this parameter on

The particles outputs from this job will have the same alignments as the maps, so they are in register with each other and ready for continuous heterogeneity analysis. This job took only two minutes to run, making it much faster than re-running a consensus refinement to produce similar results.

The particles output of this job provides the starting point for most of the conformational variability jobs from this point onward.

3D Variability Analysis

3D Variability Analysis (3DVA) uses a form of Principle Component Analysis to describe particle heterogeneity. In this technique, the variability of each particle image is modeled using a linear combination of difference components. Put another way, 3DVA generates two things:

Additional volumes (the difference components) which represent the major ways in which the map changes across the particle population. Each of these kinds of changes is known as a mode and is captured in one difference component per mode. The user sets the number of modes that 3DVA will solve.
A coordinate for each particle. A particle’s coordinate is a set of real numbers, one for each mode. The numbers represent how much each difference component should be scaled before being added to the consensus map in order to optimally model the particle.

More information on the underlying theory of 3DVA is available in the guide page.

3DVA Modes and Filter Resolution

The most important parameters to consider when beginning a 3DVA job are the Number of modes to solve and Filter resolution (A).

As discussed above, 3DVA models movement in terms of linear combinations of difference components. The Number of modes to solve parameter decides how many of these difference components the job should solve for. Generally, one can think of each component as a “type” of movement, or as a degree of freedom. Some concrete examples of components may be opening and closing of a ligand binding domain, or movement of the TMD in a particular direction, etc.

A 3DVA job run with more components may be able to solve for more types of motion at the same time. Typically, the modes solved by 3DVA will be output in order of how much of the variability in the particles that the modes explain; lower numbered modes explain more variability. Put another way: early modes will capture the largest motions and changes, and later modes will be noisier than early modes, so even if there are 10 types of movement among the particles, the later modes may be too noisy to model and interpret.

The Filter resolution (A) parameter sets the resolution to which the particles are lowpass filtered during 3DVA. The filter resolution serves two purposes: filtering noise and setting the scale of motion that 3DVA will model.

Filtering noise is perhaps the more obvious of the two. No particle image will be a perfect projection of the consensus volume; they all differ in some way for a variety of reasons. For a single particle, it is inherently not possible to tell whether a difference between an image and the volume is due to interesting changes in the particle’s conformation, or whether it is due to random noise. However, we do know that in general, higher frequencies tend to be noisier than low frequencies. We thus filter out high frequencies to reduce the contribution of uninteresting noise to the difference components produced by 3DVA.

The second purpose is more specific to 3DVA. Consider again how 3DVA models difference. We add the difference component, scaled by some particle-specific value, to the consensus volume to produce the particle-specific volume. The important point here is that the same difference volume is used for every particle.

Consider the simple case of a Gaussian peak moving back and forth (Figure 19).

Because the Gaussian is moving only a small fraction of its overall size, we can relatively accurately model this movement by adding or subtracting a certain amplitude of a single difference component. In this case, the difference component is positive on one side of the peak and negative on the other. However, consider what happens when the Gaussian moves a distance greater than its width (Figure 20).

The difference between the image and the consensus (shown in pink) now differs in both size and shape across the range of motion. Therefore, there is no single difference component that could be added or subtracted from the consensus to produce all of the shifted versions of the Gaussian. 3DVA will therefore not be able to model this motion with as much accuracy as the previous plot.

Instead, to model movement of this scale, we should first lowpass filter the images and consensus. This has the effect of blurring them. In this example, the blurring makes the same amount of movement equal to just a fraction of the peaks’ new, blurred size (Figure 21).

Now, a single difference component can model this amount of motion well. However, when the Gaussians are lowpass filtered, we lose the ability to accurately model the subtle movements from Figure 19. The fine details of the Gaussians and the motion have been lost due to the blurring effect (Figure 22).

Figure 22. The curves have the same colors and move the same distance as in 19.

Thus, setting the filter resolution parameter sets the scale of motion that 3DVA will model best. If large, sweeping movements of whole domains are expected, the filter resolution should be low even if the consensus map is very high resolution. Conversely, if fine movements of small regions of the target are expected, the filter resolution can be set closer to the GSFSC resolution (keeping in mind the other important purpose of the filter resolution in filtering out noise).

The idea of lowpass filtering your maps and particles to 10 Å (or more) may seem counterproductive — the ultimate goal is to understand the movement and interactions of domains with higher precision than that. Remember that the filter resolution is used during 3DVA to model these movements — we will later use jobs like 3D Variability Display and Homogeneous Reconstruction Only to reconstruct subsets of particles at their full resolution.

Setting Up a 3DVA Job

We will now set up a 3D Variability Analysis job to model the movement of the finger and thumb domain.

First, we must design a mask. 3DVA only models changes inside the mask; put another way, the difference components will be zero outside the mask. Therefore, the mask should cover not only the region of the consensus volume we expect to move, but also the region we expect it to move to. For this reason, it is generally good to use a mask with significant dilation and padding.

On the other hand, a mask covering the entire box would include too much solvent, and may produce difference components which model the movement and appearance of uninteresting contaminants rather than the target. This is especially important with membrane proteins like FaNaC1. Micelles and nanodiscs are highly dynamic structures, but their dynamics are typically not biologically relevant. Therefore, masks are often designed to exclude as much of the micelle and empty solvent as possible, while still covering the entire region in which the target is expected to lie.

For this example, we generated a mask using Volume Tools:

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Static mask

The refinement mask from the refinement used as the reference model in the Align 3D Maps job.

Type of input volume

mask

This parameter must be set to match the input used. If you used a map input, leave this as the default value of map.

Type of output volume

mask

This parameter sets which types of slots will accept the output. Since we want to use this result as a mask, it must be mask.

Threshold

0.99

Since masks range from 0 to 1, this results in a very slight padding of the input mask. We use a value of 0.99 rather than 1.0 to avoid issues resulting from floating point error or scaling artifacts. If you are using a map input, this threshold should be set depending on the values in the map.

Dilation radius (pix)

The exact value of this parameter is arbitrary. It should be large enough to ensure that the mask covers all potential movements of the finger and thumb domains.

Soft padding width (pix)

Again, this exact value is arbitrary, but it should be large enough to allow the difference components to model the movement finger and thumb.

The resulting mask is overlaid on the FMRFa-bound map at contour 0.75 in Figure 23. This mask is probably on the large side — a tighter mask would likely produce similar results.

Next, we must decide on a number of components. In general, it is best to start with a smaller number of components and increase them if an expected motion is missing, or if you see two motions coupled (that is, two things changing at once) when you expect them to move separately.

In this case, we certainly expect that the finger and thumb will move closer together and further apart. Theoretically, each FaNaC1 monomer could bind ligand independently; this would require three components just for ligand binding. We will start by assuming that the binding motion is concerted and only use one component. The movement of the finger and thumb is the only one we’re currently interested in, so we can create a 3DVA job requesting only the one component.

Now we must decide on the filter resolution. Since we already have reconstructions of the apo and FRMFa-bound maps, we can measure the distance the thumb moves between the two states using ChimeraX’s “Tape” tool.

Depending on the selected position, the movement ranges from 5.5 to 6.3 Å. Since we want to set the filter resolution to be on the same scale as the movement, any value near 6 Å should work well. For this first job, we will select a filter resolution of 4.

Typically you will not have access to clean maps of two conformations of a flexible molecule like we do here. In these cases, you can apply the same rules of thumb for 3DVA’s filter resolution as you would for the filter resolution in 3D Classification.

We now have all of the information needed to set up a 3DVA job.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

Both of the particles outputs from the Align 3D Maps job

You could equivalently use particles from a consensus refinement, such as a Non-Uniform or Homogeneous Refinement with particles from both classes.

Input: Mask

The mask designed in the previous Volume Tools job

Number of modes to solve

Although we expect that there will be only one major motion (opening and closing of the channel as a result of FMRFa binding), it is often helpful to include at least one additional component to absorb noise (in this case, perhaps fluctuations of the micelle).

Filter resolution (A)

This is on a similar scale (slightly smaller in this case) as the expected motion.

Diagnostic Plots

The first diagnostic is the plot of the difference components themselves (Figure 25). These are plots of the difference components projected along each of the three cardinal directions (the X, Y, and Z axes). Recall that the difference components represent the difference between a particle with coordinate 1.0 in that component and the consensus map. In these projections, red represents adding density, and blue represents removing density.

We see that in early iterations, 3DVA has not yet found meaningful components and is merely modeling noise. By the time the job finishes, there is a clear pair of components. The adjacent red and blue in the thumb domain of component 0 (top) tells us that this component is moving the thumb domain back and forth, while the coloration of component 1 (bottom) is more difficult to interpret, but perhaps related to the micelle.

3DVA also plots the distribution of particles in each pair of components (in this case, only one plot). We see that in the first iteration particles seem to be normally distributed across both components. This is expected, because at this point the components are merely noise. However, in the final iteration, we see clear separation between two clusters of particles along component 0. We know this component is related to movement of the thumb, so this may be a separation of particles between apo and FRMFa-bound states!

3DVA Display

Although the diagnostic plots produced by 3DVA are useful while the job is running, it is difficult to know exactly what they mean for the maps themselves. To understand the meaning of the 3DVA components, we will use 3D Variability Display.

Simple Mode

The simplest way to visualize a 3DVA result is by inspecting what happens when we directly add the component volumes to the consensus, without performing any reconstruction steps.

Note that Simple Mode does not take into account how many particles exist at or near a certain coordinate. It is therefore not guaranteed that the volumes produced by simple mode actually exist in your data.

Here we will create a 3D Variability Display job in simple mode which will produce 30 volumes for each component. To do this, the job first calculates the range of values which cover a given proportion of the particles. By default, the coordinates will range from the 3rd to the 97th percentile.

Starting at the low limit, the difference component is scaled and added to the consensus volume to produce a single frame. This process is repeated for the requested number of steps, producing a final set of volumes which are available for download.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

All particles output from 3D Variability

Input: Components

3D Variability volumes output from 3D Variability

Output mode

simple (default)

Number of frames/clusters

This parameter controls how many positions we create a map for along each component. More maps will produces a finer sampling of the component at the cost of a larger job and download size.

Downsample to box size

128

Since this 3DVA job was performed with a filter resolution of 8 Å, the high resolution information is uninteresting. Downsampling the volumes makes the download and animation processes faster. In this case, the final pixel size will be approximately 2 Å, yielding a Nyquist resolution of 4 Å, still well above the filter resolution.

Crop to sizer (after downsample)

100

Cropping the volume to remove the surrounding empty solvent further reduces the file size, speeding up the download and animation process.

When the job launches, it will first re-plot the relevant coordinates. Then, in simple mode, it will display the positions along each components for which maps are generated. For example, in this case, the log displays the following messages for component 0:

outputting component 0 frame 0 scale -32.24262
outputting component 0 frame 1 scale -30.094368
outputting component 0 frame 2 scale -27.946114
outputting component 0 frame 3 scale -25.797861
outputting component 0 frame 4 scale -23.649609
outputting component 0 frame 5 scale -21.501354
outputting component 0 frame 6 scale -19.353102
outputting component 0 frame 7 scale -17.20485
outputting component 0 frame 8 scale -15.056595
outputting component 0 frame 9 scale -12.908342
outputting component 0 frame 10 scale -10.760089
outputting component 0 frame 11 scale -8.6118355
outputting component 0 frame 12 scale -6.4635825
outputting component 0 frame 13 scale -4.3153296
outputting component 0 frame 14 scale -2.1670763
outputting component 0 frame 15 scale -0.01882326
outputting component 0 frame 16 scale 2.1294298
outputting component 0 frame 17 scale 4.277683
outputting component 0 frame 18 scale 6.425936
outputting component 0 frame 19 scale 8.574189
outputting component 0 frame 20 scale 10.722443
outputting component 0 frame 21 scale 12.870695
outputting component 0 frame 22 scale 15.018949
outputting component 0 frame 23 scale 17.167202
outputting component 0 frame 24 scale 19.315454
outputting component 0 frame 25 scale 21.463709
outputting component 0 frame 26 scale 23.611961
outputting component 0 frame 27 scale 25.760214
outputting component 0 frame 28 scale 27.908468
outputting component 0 frame 29 scale 30.05672

This means that the first volume of the series is made by multiplying the component 0 difference volume by -32.24262 and adding the result to the consensus volume. Similarly, the last volume in the series is made by multiplying the component 0 difference volume by 30.05672 and adding the result to the consensus volume.

To visualize the volume series, first download the series from the Volume series 0 output. This is a .zip file containing the 30 volumes generated for this component. Once the file has downloaded, extract it (typically by double-clicking the .zip) and open the volumes as a Volume Series in ChimeraX. There are several ways to do this. One is entering a command like the following into the command line:

open /path/to/volume/series/*.mrc vseries true

This opens all of the volumes as a single volume series which can be played with the slider.

The results of 3D Variability Display in simple mode (Figure 28) show that, as we expected from the 3DVA job’s plots, component 0 captures the major movement of the thumb and finger. Component 1 models some movement of the TMD, but it is not immediately clear how this movement is related to the finger/thumb position.

In the apo state of component 0, the lower part of the TMD disappears. From figure 14, we know that the maps of each class should have similar strength in the TMD. Perhaps each particle’s combination of components 0 and 1 produce a TMD with consistent strength, and we are not seeing this because each volume series generated in simple mode holds the other components at zero (ignoring their contribution to the final volume).

This is one of the downsides of simple mode. Although it produces clean volumes which make it easy to interpret the individual effect of each component, the missing contributions from the other components can make it difficult to construct a holistic picture of the particle conformational distribution. For this, the other two modes of 3D Variability Display sometimes provide more insight.

Intermediates Mode

Instead of adding together the difference components and consensus volume, intermediates mode splits the entire particle stack into subsets (called “frames”) based on the value of a component. 3D Variability Display then performs individual reconstructions of each frame, using the particles’ existing poses. This has the advantage of creating volumes based on the actual particle images rather than only using the volumes created by 3DVA.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

All particles output from 3D Variability

Input: Components

3D Variability volumes output from 3D Variability

Output mode

intermediates

Number of frames/clusters

Downsample to box size

128

Crop to sizer (after downsample)

100

Cropping the volume to remove the surrounding empty solvent further reduces the file size, speeding up the download and animation process.

Only use these components

If this parameter is left blank, series are made for each component. In this case, we are most interested in what particles look like along component 0, so we select only that component to save time.

Intermediates: window (frames)

2 (default)

In some cases, there are not enough particles in each frame to produce a high-quality map. The intermediates window allows the job to take particles from adjacent frames to improve the SNR of each reconstruction.

In intermediates mode, 3D Variability Display produces a plot of the weight of each frame, for each reconstruction (Figure 30). Each of the colored lines represents the weight with which particles in each frame are used during the reconstruction for that frame. The colored curves peak at the center of the frame and fall off according to the weight of particles at that position.

In this case, the intermediates maps look better than the simple maps, while still showing the interesting movement of the finger and thumb. This is likely for two reasons. First, the reconstructions are performed at the full image size (then downsampled to 128 px), so the final Nyquist resolution is higher. Second, the TMD retains more density because reconstruction does not ignore the contribution of the second component, unlike simple mode.

If we reduce the Intermediates: window (frames) parameter to 0, particles have a weight of 1.0 if they are in the frame and 0.0 outside the frame (Figure 32). Because the particles used to produce these maps come from a narrower segment of the coordinate, the maps may be more distinct from one another (Figure 33). However, they will also include fewer particles, making them noisier.

Sometimes, the maps further from the center of the coordinate will be noisier than maps closer to the center. This is especially likely when the particles are normally distributed about 0, because the frames are the same width at each position, but particles are less dense at the ends of the coordinate axis. In these cases, the Intermediates: window (frames) parameter can be set to a negative number. This will produce maps with the same number of particles in each frame (meaning that the each frame may cover a different amount of coordinate space).

In some cases, these maps are less noisy than those produced when setting Intermediates: window (frames) to 0. Note, however, that this mode does not sample the coordinate space evenly, so the “time” represented by the animation is no longer necessarily linear (Figure 35).

An important consideration when using intermediates mode is that when each coordinate is sampled, particles are only split by that component. For example, if particles are in two or more clear clusters in the component space, they may be inappropriately sampled by intermediates mode (Figure 36). In cases like these, it is best to use cluster mode instead.

Cluster Mode

Unlike the other two modes of 3D Variability Display, cluster mode does not sample along one of the variability components. Instead, it uses a Gaussian mixture model to find clusters of particles, and reconstructs the particles in that cluster. This mode is this able to account for all of a given particle’s coordinates, as opposed to intermediates mode, which only separates particles by the component it is sampling across.

We will set up a 3D Variability job in cluster mode requesting two clusters, since the FaNaC particles seem to form two clusters in this 3D Variability job (Figure 26).

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

All particles output from 3D Variability

Input: Components

3D Variability volumes output from 3D Variability

Output mode

cluster

Number of frames/clusters

Note that this parameter will almost always be set to a much lower value in cluster mode than in simple or intermediates modes.

Downsample to box size

128

Crop to sizer (after downsample)

100

In cluster mode, 3D Variability Display produces a plot of the clusters. If there are more than 2 difference components, this plot is an interactive 3D scatter plot.

Since cluster mode generally produces a smaller number of discrete volumes, it is generally best to view them side by side as opposed to as an animation.

The volumes produced by cluster mode clearly represent apo and FMRFa-bound FaNaC. This is perhaps unsurprising given the clear separation between the two clusters. From this point, we could produce higher-quality maps of these particle stacks by refining them independently. We can also assess the separation of apo and FMRFa-bound particles in these jobs as we did for 3D Classification and Heterogeneous Refinement (Figure 40).

Although we found interesting movements with the parameters we chose above, it is often informative to try a range of values for each parameter and see how they affect the results.

Number of components

In the 3DVA job above we used two components. This was a best guess, given that the major movement of the channel is expected to be a coupled finger/thumb movement and TMD twisting. In the two-component results, these movements we expected to be coupled were separated into the two components. It’s possible that if we requested only one component from 3DVA we would see a coupled movement, as we expect. If we run a 3DVA job with the same settings as the first one, but with Number of modes to solve set to 1 instead of 2, then visualize the results with 3DVA Display’s simple mode over 30 frames, the two movements are coupled (Figure 41).

Additionally, the particles are still well-separated into two clear clusters along the single component (Figure 42).

If we instead request three components and visualize the result in the same way, the modes represent slightly different movements of the finger, thumb, and TMD (Figure 43).

Thus, in the case, the choice of number of components does not seem to be too consequential. This is likely because the movement is large and concerted, and there are very few particles in the transition state between the two distinct modes. In other datasets, the choice of components may be very important. It is therefore good practice to investigate a few different component numbers as we have done here when working on a new sample.

Filter Resolution

As explained above, 3DVA works best when the filter resolution is set to be similar to the size of the expected movement. In some cases, the results may not depend very strongly on the choice of filter resolution; for example when the movement is large and particle images are high quality (as is true in this case), the results may be similar and useful over a range of Filter Resolution values.

When comparing 3DVA filter resolution, it is generally best to assess the results using Intermediates or Cluster mode. In simple mode, the differing filter resolution makes the volume series difficult to compare, whereas intermediate or cluster reconstructions are all be performed at the same pixel size.

If we run a 3DVA job with the same settings as the first one, but with Filter resolution (A) set to 8 instead of 4 and display the results with 3DVA Display in intermediates mode, the results are very similar to the 4 Å case (Figure 44).

3D Flexible Refinement

3DVA performed well on this dataset. It was able to capture the movement of the finger, thumb, and TMD. It also separated particles in the apo and ligand-bound states as well as other classification techniques. However, it does have some drawbacks. Chief among those are:

Each 3DVA job captures conformational change at a user-specified size scale — that of the filter resolution.
Each conformation must be modeled by a linear combination of difference components. Thus, for some targets, a great number of components are needed to capture the particle’s motion.

3D Flexible Refinement (hereafter 3D Flex) aims to resolve both of these drawbacks by directly modeling deformation and movement rather than addition and subtraction of density.

3D Flex Theory

For a more complete discussion of 3D Flex’s theory and practice, see the 3D Flex guide page.

When modeling deformation of a protein, the conformational state of each particle must be determined. However, the particle images are simply too noisy for us to perfectly estimate their conformation. The major hurdle to effective modeling of the conformational heterogeneity of a particle stack is, therefore, overfitting.

3D Flex overcomes this challenge by modeling the deformation of a tetrahedral mesh rather than directly modeling the deformation of every voxel in the volume. This construction reduces the degrees of freedom in the problem, significantly reducing overfitting. In addition, the mesh prevents unrealistic motion (like the separation or stretching of individual amino acids) by enforcing smooth deformation within a single tetrahedral cell.

Thus, to analyze a dataset using 3D Flex, we must first construct a tetrahedral mesh. Additionally, to save computation time, we will downsample and crop the input particle images such that their final Nyquist resolution is on the same order as the expected deformation (similar in theory to the filter resolution parameters of 3D Classification and 3DVA).

Data Preparation

First, we will downsample and crop the aligned particle images. Particle downsampling must occur for two main reasons:

Performing 3D Flex Training with the full-size images would require unreasonable amounts of memory and time. Downsampling significantly reduces the pressure on computational resources.
3D Flex Train does not use half sets. 3D Flex Reconstruction, on the other hand, does use half sets, but these half sets are only independent for the resolutions between the training Nyquist resolution and the reconstruction Nyquist resolution. Therefore, the training resolution must be set lower than the reconstruction resolution.

In addition to downsampling, particle images may be cropped to further speed training and reconstruction. Since particles are typically downsampled for training to well below the resolutions at which CTF-delocalized information is relevant, this is generally a good idea. Keep in mind, though, that the final full-size reconstruction is still performed in the cropped box. For more information about signal delocalization due to the CTF, see the CTF guide pages.

Finally, 3D Flex Data Prep also crops and downsamples the input consensus refinement to be the same size and shape as the prepared particle images, and also associates the prepared images with their original, full-size images for later reconstruction.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Particle stacks

All particles output from Align 3D Maps (performed in Aligning the Particles)

Input: Initial volume

The map to which particles were aligned in Align 3D Maps

Alternately, a consensus refinement of all particles together could be used here.

Crop box size (pix)

180

Particles are cropped, then downsampled. Cropping to 180 pixels removes most of the solvent surrounding the channel. Depending on how well centered FaNaC is in your map, you may need to use a Volume Tools job to re-center it before cropping.

Training box size (pix)

The training pixel size will be approximately 2.9 Å/px, meaning the Nyquist resolution will be nearly 6 Å. This is on the same scale as the observed motion.

When this job completes, it can be helpful to download and view the downsampled consensus volume. Ensure that the features you are interested in are still visible, and that no parts of the map have been cropped out. In this case, the finger and thumb are still visible, and the channel is intact (Figure 46).

Mask Creation

For guidance on the design and creation of masks, see the Mask Creation guide page.

The mask in 3D Flex defines the edges of the tetrahedral mesh (equivalently, the region of the volume that is allowed to flex). Unlike other masks in cryo-EM, it does not need a soft edge, since it is used to define a boundary, not multiply a volume. It can still be created using the usual processes. In general, results are best if the mask is created from the downsampled and cropped volume produced by 3D Flex Data Prep to ensure that the box and pixel sizes are consistent. In this case, we will use a Volume Tools job to lowpass filter and threshold the consensus volume to produce the mask (Figure 47).

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Input volume

Downsampled and cropped volume from 3D Flex Data Prep

Lowpass filter (A)

A range of settings near 12 would probably produce similar results. The important point is to produce a smooth mask.

Lowpass Filter Type

butterworth

Rectangular filters can occasionally produce ringing artifacts, which become more challenging when padding and dilation are low

Lowpass Filter Order

Type of output volume

mask

Threshold

0.095

This value will depend on your map

Dilation radius (pix)

Only a small dilation is needed since the volume is in a small box

Soft padding width (pix)

Ringing is not a concern since the volume is not multiplied by the mask.

Mesh Preparation

Finally, we must prepare the tetrahedral mesh that will be used to model deformation. For most samples, the default mesh preparation settings are sufficient. We will use the defaults here, but more information on when and how to design a custom mesh is available in a dedicated guide page.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Consensus volume

Downsampled and cropped volume from 3D Flex Data Prep

Input: Solvent mask

Mask prepared in the last step

When this job completes, you can download and inspect the resulting PDB file of the mesh. It can be difficult to see the mesh with ChimeraX’s default settings — running the following commands might make it easier to see your map inside the mesh:

lighting gentle
size stickRadius 0.15
transparency #2 1 All # replace #2 with whatever the model number for the mesh is

Then, use the Side View panel to show only the front half of the channel. Ideally, the tetrahedral cells are large enough to contain secondary structure elements like helices, but small enough that the finger and thumb will be able to move apart from one another. The mesh we generated here (Figure 48) covers the entire channel and nanodisc, but in some cases 3D Flex performs better when excluding the nanodisc, or when splitting it into its own, rigid mesh segment. One could test all three of these mesh types, as well as a coarser or finer mesh if they were in search of the best possible result. For this case study, we will use only this mesh.

Training

With prepared particle images and a mesh in hand, we are ready to begin training the 3D Flex model. This job does two things:

Train a neural network to turn a latent space coordinate into a specific deformation of the tetrahedral mesh
Assign each particle to the coordinate in that latent space which corresponds to its deformation

To begin, we will run a training job with all default parameters.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Prepared particles

Prepared particles from the Flex Data Prep job

Input: 3DFlex mesh

Mesh prepared in the Flex Mesh Prep job

While this job runs, it produces several diagnostic plots which provide valuable information about the training process. The first is the latent distribution in each dimension (49).

These plots show the distribution of particles’ latent coordinates. Ideally, these range from -1.5 to +1.5 in all dimensions. This gives the neural network “room” in latent space to place particles with different conformation. In this example, the particles are far too tightly clustered, ranging from -0.4 to +0.4. In our next training job, we should reduce the Latent centering strength parameter. This parameter controls how tightly the coordinates cluster around 0, and must be tuned for each new dataset.

The other plot to pay attention to during training is the Training and validation loss plot (Figure 50). This plot shows the validation loss at each training iteration. Validation loss is a measurement of how well the neural network is modeling the particle images, with a lower validation loss being indicating a better model in general. The Train curve (grey) is very noisy because the particle subset differs at each iteration. The validation curves are more informative, since the validation set is the same each time. The Val. (rigid) line shows how well the network is able to model the particle images without flexing the volume at all. The Val. (flex, no penalty) shows how well the network models the images when it is not penalized for flexing the mesh, and the Val. (full) line assesses the model when the rigidity penalty is taken into account.

Generally, a gap between the rigid and full lines means that the network is modeling flexibility that improves the correlation between the volume and the particle images. If the rigid and full lines are too close, the rigidity may be too high. The lines in Figure 50 do look close, indicating that another training job with a lower flexibility may be necessary. However, the appearance of this plot is somewhat dataset dependent — it is better to directly asses the flexible movements produced by the network when determining whether parameters need to change.

3D Flex Generate

3D Flex and 3DVA both assign each particle a coordinate to describe its conformation. 3DVA’s coordinates are linear. In the 3DVA results for this dataset, for instance, moving along component zero always describes movement of the finger and thumb, regardless of the starting coordinate.

3D Flex, on the other hand, is nonlinear. The neural network assigns a deformation to each coordinate, but there is no guarantee that each conformation exists at only one point, or that moving along a single axis always describes the same type of motion. As a result of this, there is no analog to 3DVA’s difference components (such as those in Figure 25). We must instead use the neural network to create maps at each coordinate of interest and manually inspect them.

3D Flex Generate takes a list of coordinates and applies the deformation encoded at that position by the neural network to the consensus volume. This results in movies which look similar to those of 3DVA’s simple mode.

By default, 3D Flex Generate will create equally-spaced maps along each latent axis. This is similar to the coordinates chosen by 3DVA Display’s simple mode, but recall that the 3D Flex latent space is not linear — important conformations may only exist at positions like (1, 1), which are not sampled by this technique. Depending on the distribution of particles in the latent space, it can sometimes be important to generate and sample your own latent coordinates.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: 3DFlex model

The 3DFlex model output of the 3D Flex Training job

Number of frames per series

It is best to pick an odd number such that the center of the latent space is always sampled.

Override scheduler

On (default)

When this parameter is turned on, 3D Flex Generate runs immediately rather than waiting for 3D Flex train to finish. This lets you inspect the current state of the training job while it runs. If you observe maps which look too rigid, you can kill the upstream training job and launch a new one with lower rigidity without having to wait for the first job to finish.

3D Flex Generate produces a map series just like 3DVA Display, and it can be inspected in the same way using ChimeraX.

The two volume series show only small changes in channel conformation (Figure 51). Most of the movement appears to be rigid translation and rotation of the channel, with the TMD slightly flexing or twisting. The neural network was not able to model flexible movement of the channel, most likely for two reasons:

The centering strength was too high, so particles were not able to “spread out” and differentiate the latent space. We suspect this may be the case because of the particle distribution shown in Figure 49.
The rigidity was too high, so the model did not stretch or compress any tetrahedral cells. We suspect this may be the case because the maps produced by 3D Flex Generate do not move very much, and the rigid and full curves in Figure 50 are close together.

Tuning 3D Flex Training

We will run another 3D Flex Training job with reduced centering strength and rigidity.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Prepared particles

Prepared particles from the Flex Data Prep job

Input: 3DFlex mesh

Mesh prepared in the Flex Mesh Prep job

Rigidity (lambda)

0.5

The default value for this parameter is 2.0. The choice of 0.5 is somewhat arbitrary, but in general, it is best to start with a relatively large change in rigidity and then turn it back up if unrealistic motions are observed.

Latent centering strength

The default value for this parameter is 20. As with the rigidity, the exact choice of 1 is arbitrary, but it is faster to take large steps and find a value that is too low, then move back up.

The particles occupy a far greater proportion of the latent space (Figure 52), but still do not reach all the way to +/- 1.5, indicating that the centering strength could be reduced still further. Regardless, we can assess this job with a 3D Flex Generate job with the same settings as before (Figure 53).

This training job produced volumes with significantly greater flexibility. The TMD is far more flexible in the right series, and slight movement in the finger and thumb is visible in the left series. We are still not observing the movement we saw in 3DVA, and so can try one more job with even lower rigidity and centering strength.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Prepared particles

Prepared particles from the Flex Data Prep job

Input: 3DFlex mesh

Mesh prepared in the Flex Mesh Prep job

Rigidity (lambda)

0.05

Latent centering strength

0.25

This time, particles cover most of the latent space (Figure 54). The centering strength here may in fact be too low — some particles have clumped at +/- 1.5, since they are not allowed to extend beyond that range. We will again assess the volumes by running 3D Flex Generate with the same parameters as before (Figure 55).

The series on the left finally captures the movement of the finger and thumb that we observed in 3D Variability analysis. This is encouraging — this means that our centering strength and rigidity are nearly correct. Further fine-tuning may improve results. For instance, the movement of the TMD and nanodisc now looks somewhat exaggerated and may represent non-physical movements. It is difficult to tell if this is the case from these reconstructions, as they are very low resolution. We can repeat this job but provide a full-size map to assess the movement of secondary structure elements in more detail.

First, we create a map of FaNaC at the full pixel size, but cropped to the same physical extent as the input training data, with Volume Tools.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: Volume

One of the maps from the Align 3D Maps job. Alternately, a reconstruction of the aligned particles.

Crop to box size (pix)

180

This value should be the same as the crop size entered in 3D Flex Data Prep.

When we plug this cropped map into the 3D Flex Generate job, the tetrahedral mesh is scaled up to the full pixel size. The mesh is then used to apply the same deformations to the high-resolution consensus map, producing high-resolution versions of the modeled deformations.

Input or Parameter Name

Input source or Parameter Value

Notes and Rationale

Input: 3DFlex model

The 3DFlex model output of the 3D Flex Training job with the lowest rigidity.

Input: Full Res Volume

The cropped map created in the Volume Tools job above.

Number of frames per series

We produce more volumes here to sample the latent space more finely.

It is clear with the full-resolution map that the rigidity is too low in this 3D Flex run. Note the “jelly-like” movement of the TMD, in which the turns of the α-helices stretch and compress unrealistically. If the ultimate goal of applying 3D Flexible Refinement is the production of a consensus map using 3D Flex Reconstruction, it would be important to continue iterating this process, adjusting rigidity as necessary, until the motions looked physically realistic. Given that this particular dataset does not in fact have any particles in the transition state between apo and FMRFa-bound, we consider flexible refinement analysis complete here.

Conclusion

In this case study, we analyzed a combined dataset of FMRFa-bound and apo FaNaC1. We first separated the two states using 3D Classification. Because the poses used in this job came from a consensus refinement which largely looked like the FMRFa-bound state, the apo map was of very poor quality. Refining this map produced a high-resolution version of the two states.

Next, we tried the same analysis using Heterogeneous Refinement instead. Because the poses are updated as Heterogeneous Refinement progresses, both maps had sharp thumb domains without the need for subsequent refinement.

We next moved on to continuous analysis techniques, starting with 3D Variability Analysis (3DVA). After determining an appropriate filter resolution, we found that 3DVA with one or two components was able to effectively separate the two binding states into clear populations. Importantly, none of the three techniques discussed so far perfectly separates the particles into apo- and FMRFa-bound particles. Combining multiple different approaches to the same particle stack will always provide the most complete picture.

Finally, we demonstrated important considerations when applying 3D Flexible Refinement to samples like FaNaC, including downsampling, centering strength, and rigidity.

References

Kalienkova, V., Dandamudi, M., Paulino, C. & Lynagh, T. Structural basis for excitatory neuropeptide signaling. Nature Structural & Molecular Biology 31, 717–726 (2024).
Rosenthal, P. B. & Henderson, R. Optimal Determination of Particle Orientation, Absolute Hand, and Contrast Loss in Single-particle Electron Cryomicroscopy. Journal of Molecular Biology 333, 721–745 (2003).

PreviousCase Study: End-to-end processing of encapsulated ferritin (EMPIAR-10716)NextCase Study: Picking-induced Orientation Bias in HA Trimer (EMPIAR-10096 and -10097)

Last updated 6 months ago