> For the complete documentation index, see [llms.txt](https://guide.cryosparc.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/case-study-discrete-heterogeneity-in-a-sample-of-acetogenin-bound-complex-i-empiar-10927.md).

# Case Study: Discrete heterogeneity in a sample of Acetogenin-bound complex I (EMPIAR 10927)

Processing of particles from EMPIAR 10927 including separation of discrete targets and states on the basis of per-particle scale, Ab-Initio Reconstruction with custom settings, 3D Classification, and clustering by 3D Variability Analysis.

### **Introduction**

In this case study we will work through exploratory steps in the processing pipeline for a sample of mouse heart mitochondrial respiratory complex I prepared in the presence of a tight-binding inhibitor called Acetogenin. The results of the original processing were deposited as [EMDB:13611](https://emdb-empiar.org/EMD-13611) and [PDB:7psa](https://pdbe.org/7psa) and described in [Grba et al. (2022)](https://www.jbc.org/article/S0021-9258\(22\)00042-4/fulltext). We selected this dataset to illustrate different routes to disentangling discrete heterogeneity in a native sample because of its high contrast and rich heterogeneity.

{% hint style="info" %}
There isn’t one “right” way to disentangle the different species, so rather than prescribe the “best” way to do it, we will work through and compare four different strategies for separating discrete heterogeneity using a variety of CryoSPARC jobs, and then evaluate their performance.
{% endhint %}

The raw data are publicly available for download as [EMPIAR-10927](https://www.ebi.ac.uk/empiar/EMPIAR-10927/). Image processing was performed using CryoSPARC v5.0.

![Figure 0. Case Study overview](/files/wYd78nrZW3KwlrtUhBXH)

Mitochondrial respiratory complex I is a 1 MDa protein complex that resides in the inner mitochondrial membrane of eukaryotes and works alongside respiratory complexes II, III, IV and V, which together are crucial for cellular energy transduction. Complex I has a hydrophilic arm protruding into the mitochondrial matrix, and a membrane arm embedded in the inner mitochondrial membrane. It performs a redox (reduction-oxidation) reaction, taking electrons from NADH, passing them to ubiquinone in the membrane, and pumping protons across the membrane. The gradient of protons is ultimately used to power ATP production by complex V (F₁Fₒ ATP synthase), and the transport of important molecules across the membrane. In the mitochondria, complexes I, III and IV are known to associate together as supercomplexes, but for the purpose of studying complex I in isolation, this sample was solubilised using a detergent (Dodecyl Maltoside) that breaks up the supercomplexes. Mammalian complex I structures have been observed in two distinct states known as the “active” and “deactive” states (alternatively called “closed” and “open”, respectively) that we will keep in mind during the processing of this dataset (see Introductory figure).

![Introductory figure. Simulated maps from PDB 7ak5 and 8om1 are shown along with gaussian smoothed density for the detergent belt. Curved arrows indicate the relative opening and closing of the two arms between the states.](/files/nZ9OdTJl9q4PrbmOQM1N)

### **Setting up**

This is a fairly small dataset with 1283 movies recorded in mrc format with gain correction, and it requires 3.3 TB of disk space.

Before beginning this tutorial, you should [create a new project and a workspace](https://guide.cryosparc.com/application-guide-v4.0+/using-the-cryosparc-interface/projects-workspaces-and-live-sessions#creating-your-first-project) within that project. Download the 1283 movies to a location of your choosing. For example, our data is downloaded to a directory called `rawdata` using the commands:

```
cd /path/to/rawdata
wget -m ftp://ftp.ebi.ac.uk/empiar/world_availability/10927
```

For particle picking, we will be using TOPAZ ([Bepler et al., 2019](https://doi.org/10.1038/s41592-019-0575-8)). TOPAZ will need to be installed separately to use via the CryoSPARC wrapper, and instructions can be found [here](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/deep-picking/topaz).

### **1. Movie import and pre-processing**

* Import the data using an [Import Movies](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/import/job-import-movies) job. The `Movies data path` should match the location of the directory containing the downloaded `.mrc` files, for example: `ftp.ebi.ac.uk/empiar/world_availability/10927/data/FoilHole_*.mrc`

These movies are already gain corrected so you do not need to add any gain reference.

* Add in the experimental information below that we obtained from the [EMDB:13611](https://emdb-empiar.org/EMD-13611) entry.

| Parameter                     | Setting |
| ----------------------------- | ------- |
| `Raw pixel size (A)`          | 1.043   |
| `Accelerating Voltage (kV)`   | 300     |
| `Spherical Aberration (mm)`   | 2.7     |
| `Total exposure dose (e/A^2)` | 50      |

* Run a [Patch Motion Correction](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/motion-correction/job-patch-motion-correction) job with `Save results in 16-bit floating point`: true, so that the output images take up half the disk space of the default 32-bit floating point files (learn more about the float16 format [here](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/tutorial-float16-support)). The `Number of GPUs to parallelize` can be set depending on GPU availability. Parallelized across six GPUs, our Patch Motion Correction job took 1 hour 44 mins.
* Using the Quick action menu, run a [Patch CTF Estimation](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/ctf-estimation/job-patch-ctf-estimation) job with the default settings.

### **2. Excluding poor micrographs from downstream processing**

Movies collected for single-particle cryo-EM can have a variety of different characteristics. Some of these, e.g., a range of defocus values or a range of ice thickness to obtain a range of particle orientations, can be beneficial for image reconstruction. However, movies often also contain unwanted junk and outlier attributes that compromise the quality of their particles, such as excessive in-movie motion and ice that is too thick or too thin for your sample. We often observe junk in the form of non-vitreous ice and contamination with ice crystals, but other features such as the edge of the holey support are also frequently observed. Ideally we want to avoid extracting particles from regions of junk, or from poor quality micrographs, as these images can sometimes be challenging to remove later on in the processing pipeline. The strategy we will use here is to automatically detect regions of junk so that we can remove micrographs with large quantities of junk and poor statistics.

* Using the Quick actions menu, run a [Micrograph Junk Detector](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/exposure-curation/job-micrograph-junk-detector-beta) job (1).

We show example outputs from the mic. Junk Detector job in Figure 1. We can see that most of the micrographs contain the edge of the gold holes (green), and many contain extrinsic ice defects (magenta) such as ethane contaminants or small ice crystals, but these make up only a very small percentage of the imaged area. On the other hand, relatively few images contain intrinsic ice defects (often non-vitreous ice), but this type of junk makes up a higher percentage of the total imaged area in the dataset.

![Figure 1. Results from the Micrograph Junk Detector job. An example micrograph with identified junk regions, number of micrographs containing each type of junk, and total micrograph area taken up by each junk type are shown.](/files/AOgUdgHhXYpxaqfmSCI8)

We will now inspect the CTF-estimated and junk-detected micrograph statistics so that we can exclude images of poor quality from downstream processing.

* Run a [Manually Curate Exposures](https://guide.cryosparc.com/application-guide-v4.0+/interactive-jobs#interactive-job-manually-curate-exposures) job via the Quick actions menu, or via the job builder inputting the “Labelled Micrographs” from Mic. Junk Detector.
* Navigate to the pink Interactive tab
* Set thresholds for outliers on undesirable characteristics
* We chose to select the following thresholds:

| Parameter                | Threshold | Reason                                           |
| ------------------------ | --------- | ------------------------------------------------ |
| `CTF fit resolution`     | max 6 Å   | Exclude poorest resolution micrographs           |
| `Relative Ice Thickness` | max 1.119 | Exclude very thick ice with poor signal-to-noise |
| `Junk Area (%)`          | max 35    | Exclude micrographs with large amounts of junk   |

We found that if we set the Junk Area maximum threshold to 35%, this also excluded the majority of micrographs that contained relatively thick ice and intrinsic ice defects. After curation we were left with 1134 micrographs.
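The thresholds above amount to a simple "keep if every statistic is at or below its maximum" rule. The following is a minimal sketch of that logic in plain Python, not the Manually Curate Exposures implementation. The dictionary keys and the three example micrographs are hypothetical values made up for illustration; only the threshold values come from the table.

```python
# Maximum thresholds from the table above. The key names are hypothetical
# labels chosen for this sketch, not CryoSPARC field names.
THRESHOLDS = {
    "ctf_fit_resolution_A": 6.0,
    "relative_ice_thickness": 1.119,
    "junk_area_percent": 35.0,
}


def passes_curation(mic, thresholds=THRESHOLDS):
    """Return True if every statistic is at or below its maximum threshold."""
    return all(mic[key] <= limit for key, limit in thresholds.items())


# Made-up example micrographs (not values from EMPIAR-10927).
micrographs = [
    {"name": "mic_a", "ctf_fit_resolution_A": 3.8,
     "relative_ice_thickness": 1.02, "junk_area_percent": 4.0},
    {"name": "mic_b", "ctf_fit_resolution_A": 7.5,
     "relative_ice_thickness": 1.05, "junk_area_percent": 10.0},
    {"name": "mic_c", "ctf_fit_resolution_A": 4.1,
     "relative_ice_thickness": 1.10, "junk_area_percent": 52.0},
]

accepted = [m["name"] for m in micrographs if passes_curation(m)]
rejected = [m["name"] for m in micrographs if not passes_curation(m)]
print("accepted:", accepted)
print("rejected:", rejected)
```

A micrograph is rejected as soon as any one statistic exceeds its limit, which is why a single generous threshold (such as Junk Area) can also remove micrographs that are outliers in correlated statistics.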

### 3. Particle picking and extraction

During the original processing pipeline in [Grba et al. (2022)](https://www.jbc.org/article/S0021-9258\(22\)00042-4/fulltext), particles were picked by manual picking followed by template picking, so picks were focussed from the start on those that resembled complex I. Here, we plan to investigate whether any other targets are present in this sample, and to do so we are going to use a TOPAZ pretrained model to pick particles.

{% hint style="info" %}
For TOPAZ picking, you don’t strictly need to already know the diameter(s) of your particles - although usually TOPAZ downsamples micrographs according to a supplied diameter, you can instead manually downsample the micrographs as we do here. We downsample to a pixel size where we still expect particle features to be visible.

You can additionally calculate a minimum distance between picks. For example, if you want to allow your particles to be at least 100 Å away from each other in the downsampled micrographs then you can use this equation:

$$\dfrac{\text{interparticle distance (\AA)}}{\text{downsample factor} \times \text{pixel size (\AA/pixel)}} = \text{radius of extracted regions (pixels)}$$

When we add in our pixel size (1.043 Å/pixel) and downsample factor (8), this gives us 100 / (8 × 1.043) ≈ 12, which is the value that we enter into the job settings.

The TOPAZ job does not output example images of downsampled micrographs, but in Figure 2 we show what a micrograph looks like when it is downsampled by a factor of 2, 4, 8, 16 and 32 compared to the original micrograph. We also show an inset of an individual particle. At downsampling factors of 16 and 32 there is significant loss of information about the particle shape, whereas the particle appearance changes little with downsampling of 2 or 4. A higher downsampling factor shortens the job runtime, so we compromised by selecting a downsampling factor of 8 (a pixel size of 8.34 Å): in our hands the run takes only \~13 mins, but the main particle features are still present. Note that for smaller particles or larger pixel sizes you may wish to use less downsampling.
{% endhint %}
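The arithmetic in the hint above can be reproduced in a few lines. This is a small sketch using only the values quoted in this case study (raw pixel size, downsampling factor and the desired 100 Å spacing); the function and variable names are our own.

```python
RAW_PIXEL_SIZE_A = 1.043   # raw pixel size in Å/pixel, from the import table
DOWNSAMPLE_FACTOR = 8      # manual downsampling factor used for TOPAZ
MIN_DISTANCE_A = 100.0     # desired minimum distance between picks in Å


def radius_in_pixels(distance_A, downsample_factor, pixel_size_A):
    """Convert a distance in Å to pixels of the downsampled micrograph."""
    return distance_A / (downsample_factor * pixel_size_A)


downsampled_pixel_size_A = DOWNSAMPLE_FACTOR * RAW_PIXEL_SIZE_A
radius = radius_in_pixels(MIN_DISTANCE_A, DOWNSAMPLE_FACTOR, RAW_PIXEL_SIZE_A)

print(f"Downsampled pixel size: {downsampled_pixel_size_A:.2f} Å")
print(f"Radius of extracted regions: {radius:.2f} px, entered as {round(radius)}")
```

Changing `DOWNSAMPLE_FACTOR` shows how the pixel radius needs to be adjusted if you choose more or less downsampling for your own data.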

<figure><img src="/files/EOTJvheZWDMRVqe5OHer" alt=""><figcaption><p><strong>Figure 2. Micrograph downsampling with factors of 2 to 32.</strong> An individual particle image inset is shown for each example, along with a pink bar marking a distance of 100 Å.</p></figcaption></figure>

* Create a new TOPAZ Extract job and input the accepted exposures from Curate Exposures. Use the following settings:

| Parameter                     | Setting           | Reason                                         |
| ----------------------------- | ----------------- | ---------------------------------------------- |
| `Path to TOPAZ executable`    | your\_TOPAZ\_path |                                                |
| `Select pretrained`           | ResNet16          |                                                |
| `Downsampling mode`           | manual            |                                                |
| `Downsampling factor`         | 8                 | Downsampling the micrographs speeds up the job |
| `Radius of extracted regions` | 12                | Definition of inter-particle distance (pixels) |

* Run an Inspect Picks job inputting the micrographs to examine how well TOPAZ picked particles. See if you note any correlations between particle number and micrograph statistics in the scatter plot on the left.

We found that the particles in this dataset were picked pretty well by TOPAZ with picks centred on visible particles (examples shown in Figure 3), and very few on empty ice regions; on this basis we accepted all of the picks by clicking “Done picking”.

![Figure 3. Example TOPAZ picks. Micrographs are representative of thick, and thin ice, showing the pick locations found by TOPAZ in green.](/files/StfpcbeM4K2OogYmGPzx)

{% hint style="info" %}
As defocus is increased, higher frequency components from particles become delocalised further out in real space due to the point spread function of the microscope. If too small a box is selected for extraction, some information about your particle is lost, and this may limit the obtainable resolution. Conversely, using an excessively large box can include a lot of noise in the images, and this can also have a negative effect on the resolution of your reconstruction. As a very rough rule, a box of \~1.5-2.5 x the diameter of your particle is often appropriate; however, very high resolution data or data collected with high defocus may require a larger box. The box size must be an even number of pixels, and it is best if you choose or downsample to a [box size that is computationally efficient.](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/extraction/job-extract-from-micrographs#box-sizes-that-allow-for-efficient-processing)
{% endhint %}

We know from existing PDB models that the diameter of complex I is around 300 Å, so we will use a box of 450 pixels downsampled to a box of 320 pixels. The initial box size is \~1.6 x the particle diameter, and the downsampled box is smaller and more efficient to handle in the software, but is not expected to limit the resolution by its Nyquist limit (in this case 2.93 Å).

* Extract the particles using the [Extract from Micrographs](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/extraction/job-extract-from-micrographs) job (1) with a box of 450 pixels (469 Å) and Fourier crop them down to 320 pixels. Fourier cropping makes the images smaller, so that they use up less disk space and some jobs will run faster. This cropping in Fourier space downsamples images in the same way as the [Downsample Particles](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/extraction/job-downsample-particles) job, so that while the box extent in Å is kept the same, the pixel size is larger and the Nyquist limit (the highest achievable resolution) is at lower resolution. Select `Save results in 16-bit floating point` to save on the disk space required. We extracted 93,230 particles.
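The box-size numbers used above follow from the raw pixel size and the two box sizes. As a quick check, here is a short sketch of that arithmetic; the variable names are our own, and the 300 Å diameter is the approximate value quoted in the text.

```python
RAW_PIXEL_SIZE_A = 1.043      # Å/pixel, from the import settings
EXTRACT_BOX_PX = 450          # extraction box in raw pixels
CROPPED_BOX_PX = 320          # box size after Fourier cropping
PARTICLE_DIAMETER_A = 300.0   # approximate diameter of complex I

# The physical extent of the box is unchanged by Fourier cropping.
box_extent_A = EXTRACT_BOX_PX * RAW_PIXEL_SIZE_A

# The same extent spread over fewer pixels gives a larger pixel size.
cropped_pixel_size_A = box_extent_A / CROPPED_BOX_PX

# The Nyquist limit is two pixels per period.
nyquist_A = 2.0 * cropped_pixel_size_A

box_to_diameter = box_extent_A / PARTICLE_DIAMETER_A

print(f"Box extent: {box_extent_A:.2f} Å")
print(f"Pixel size after cropping: {cropped_pixel_size_A:.4f} Å")
print(f"Nyquist limit after cropping: {nyquist_A:.2f} Å")
print(f"Box / particle diameter: {box_to_diameter:.2f}")
```

If your target resolution approaches the Nyquist limit computed here, the particles can be re-extracted with less (or no) Fourier cropping later in the pipeline.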

### 4. Initial Refinement and diagnostic 3D Variability Analysis

{% hint style="info" %}
When processing a repeat target, or re-processing a dataset, it can be faster to use existing 3D volumes to initialise refinements, rather than generating Ab-Initio volumes for each dataset. It can also be tempting to assume that a purified protein contains a homogeneous population. Here, we will see what happens when we do exactly that! As there is already a deposited map for this dataset we will use that, and refine all of the extracted particles.
{% endhint %}

* Run an [Import 3D Volumes](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/import/job-import-3d-volumes) job and use EMDB ID 13611
* Run a [Non-Uniform Refinement](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-refinement/job-non-uniform-refinement-new) job (NU Refinement (1)) using the extracted particles, and the volume from Import 3D Volumes, and set `use dynamic refinement mask` false. For membrane proteins during Non-Uniform Refinement, masking is often not necessary because the regularisation dampens the micelle signal sufficiently. Examine the map and statistics from the refinement.

  ![Figure 4. The results from Non-Uniform Refinement 1. Plots shown are global FSC, Conical FSCs, orientation distribution plot, an image of the unsharpened map, and the per-particle scale factors.](/files/qOFjXMnI6MSwkfhNflur)

{% hint style="info" %}
Per-particle scale is a function that compares the contrast of each aligned particle image with a projection of the refined volume and gives it a score. Effectively this gives a high score to particles with higher contrast that match the reference well, and a low score to those that have lower contrast and do not match the reference well. Minimising over per-particle scale means that the best particles get up-weighted, and the worst ones down-weighted. Often this is beneficial and can slightly improve the map quality and metrics. In CryoSPARC v5, particles with negative scale (i.e. the inverted image matches the reference better) are automatically rejected.
{% endhint %}
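To make the idea of a per-particle scale concrete, the sketch below computes a least-squares scale factor between an image and a reference projection, i.e. the value α that minimises ‖image − α · projection‖². This is a deliberately simplified illustration and an assumption on our part: it ignores the CTF, the noise model and everything else that a real refinement takes into account, and it is not CryoSPARC's implementation. The pixel values are toy numbers.

```python
def least_squares_scale(image, projection):
    """Scale alpha minimising sum((image - alpha * projection) ** 2)."""
    numerator = sum(x * p for x, p in zip(image, projection))
    denominator = sum(p * p for p in projection)
    return numerator / denominator


# Toy one-dimensional "projection" of a reference (made-up values).
projection = [0.0, 1.0, 2.0, 1.0, 0.0]

# Toy "particle images": the same signal at different contrast levels.
high_contrast = [1.2 * p for p in projection]
low_contrast = [0.4 * p for p in projection]
inverted = [-0.8 * p for p in projection]

scale_high = least_squares_scale(high_contrast, projection)
scale_low = least_squares_scale(low_contrast, projection)
scale_inverted = least_squares_scale(inverted, projection)

for name, alpha in [("high contrast", scale_high),
                    ("low contrast", scale_low),
                    ("inverted", scale_inverted)]:
    status = "rejected (negative scale)" if alpha < 0 else "kept"
    print(f"{name}: scale = {alpha:.2f}, {status}")
```

In this simplified picture, an image whose contrast is inverted relative to the reference receives a negative scale, which corresponds to the case described above where the particle is rejected.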

In Figure 4 we show the results of our NU Refinement (1). The resolution of 3.4 Å is similar to that found in the published processing, and the cFAR value of 0.79 and orientation distribution plot indicate that there is a good range of different views of complex I present in this dataset. The map looks isotropic and does not show smearing or other visible artefacts.

The per-particle scale plot is intriguing though! We observed that \~3k particles were rejected due to having negative scale values, and the accepted particles formed a bimodal distribution where the high scale peak (expected to be better quality matches to the reference) contains fewer particles than the low scale peak.

{% hint style="info" %}
Multimodal particle scales often indicate some sort of heterogeneity, and one way to investigate this is to use 3D Variability Analysis. 3DVA can often help to visualise density changes that originate from compositional or conformational heterogeneity, or can show smearing artefacts that indicate the presence of particles with poor alignment, or that do not match to the refined volume.
{% endhint %}

* Run 3D Variability Analysis (3DVA (1)) using the particles from NU Refinement (1), and set the Filter resolution to 8 Å.
* From your 3DVA (1) job, use the Quick actions menu to build a [3D Variability Display](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/variability/job-3d-variability-display) job (1). Set `Downsample to box`: 128 and `Crop to size after downsample:` 100.
* Download and examine the three volume series in ChimeraX.

<figure><img src="/files/MQ9fkJ3MmVZhje0qqTFV" alt=""><figcaption><p><strong>Movie 1. Volume series from 3DV Display (1).</strong> Pink: component 0; blue: component 1; green: component 2.</p></figcaption></figure>

We show examples of the density changes seen in the three components of 3DVA (1) in Movie 1. In 3DVA, the particle poses are fixed according to alignment to the Refinement volume, and the output components are ordered according to the magnitude of density changes. In component 0 (the largest density change), we see density disappearing and reappearing for the hydrophilic domain of complex I, and in components 1 and 2, we see opening and closing, and twisting motions of the hydrophilic and hydrophobic domains. The latter components look similar to the expected motions between the active (closed) and deactive (open) states of the complex, but the first component points to some sort of compositional heterogeneity, *appearing* to show the loss of the hydrophilic domain in some particles.

### 5. Strategy 1: Using Per-particle scales to select good particles

<img src="/files/zDO9UL7PvYm659stvXzJ" alt="" width="375">

To investigate further, we can split the particle set from NU Refinement (1) on the basis of per-particle scale to see what is in both of these peaks.

* Run a [Subset Particles by Statistic](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-curation/job-subset-particles-by-statistic) job, selecting `Subset by`: Per particle scale. Use the default subsetting mode that uses gaussian fitting to two clusters.
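To illustrate what fitting two Gaussian clusters to a bimodal scale distribution means, here is a minimal expectation-maximisation (EM) sketch for a two-component, one-dimensional Gaussian mixture in plain Python. It is a generic textbook procedure shown under our own assumptions, not the code used by the Subset Particles by Statistic job. The scale values are synthetic, drawn from two arbitrary Gaussians, and are not taken from this dataset.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    stds = [(hi - lo) / 4.0, (hi - lo) / 4.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each value belongs to each component.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: update each component from its weighted members.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / len(values)
    return means, stds, weights, resp


# Synthetic per-particle scales: a larger low-scale peak and a smaller
# high-scale peak (arbitrary illustrative parameters).
random.seed(0)
scales = ([random.gauss(0.6, 0.08) for _ in range(700)]
          + [random.gauss(1.1, 0.08) for _ in range(300)])

means, stds, weights, resp = fit_two_gaussians(scales)
low_k = 0 if means[0] < means[1] else 1
high_k = 1 - low_k

# Assign each particle to the component with the higher responsibility.
high_set = [x for x, r in zip(scales, resp) if r[high_k] > r[low_k]]
low_set = [x for x, r in zip(scales, resp) if not r[high_k] > r[low_k]]

print(f"low-scale peak:  mean {means[low_k]:.2f}, {len(low_set)} particles")
print(f"high-scale peak: mean {means[high_k]:.2f}, {len(high_set)} particles")
```

When the two peaks overlap more strongly than in this synthetic example, the assignment near the crossover becomes ambiguous, which is one reason to check each subset by 2D classification as described next.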

To help us to understand what each of these particle sets contains (rejected particles, low scale particles and high scale particles) we can run separate 2D classifications.

<mark style="color:blue;">OPTIONAL:</mark>

* <mark style="color:blue;">Create a 2D Classification (1), setting</mark> <mark style="color:blue;"></mark><mark style="color:blue;">`Number of 2D classes`</mark><mark style="color:blue;">: 25 and drag over the rejected particles from NU Refinement (1).</mark>
* <mark style="color:blue;">Clone 2D Classification (1) and swap the particles for Particles set 0 from Subset particles, this job will be 2D Classification (2). We had 63,943 particles in set 0.</mark>
* <mark style="color:blue;">Clone 2D Classification (1) and swap the particles for Particles set 1 from Subset particles, this job will be 2D Classification (3). We had 26,123 particles in set 1.</mark>

![Figure 5. 2D class averages from 2D Classifications 1, 2 and 3.](/files/qUOAxdwjOnpSQzDfIEpr)

We found that the particles that were rejected during NU Refinement did not form any coherent 2D class averages that resemble protein targets in 2D Classification (1), supporting their rejection from the pipeline. The class averages of low scale particles in 2D Classification (2) looked predominantly like some unidentified proteins embedded within a detergent micelle. Class averages of high scale particles in 2D Classification (3) resembled complex I. This sample therefore appears to contain something else in addition to intact complex I.

Now that we have good evidence that the particles in the high-scale peak look like complex I, we can run another NU Refinement job with just those particles to see how the map quality and metrics change.

{% hint style="info" %}
Initial CTF parameter estimation by Patch CTF may not be perfect; there could be variation in the particle depth in the ice affecting its defocus relative to the camera, and there may be uncorrected electron beam artefacts such as small degrees of beam tilt. Typically, when a refined volume reaches around 3.5 Å, it can be worth testing if refining the individual particle defocus values (Local CTF), and / or refining aberrations (Global CTF) improves the resolution and map quality.
{% endhint %}

* Clone NU Refinement (1), change the input particles to Particles set 1 from Subset particles and set `Optimize per-particle defocus:` True, `Optimize per-group CTF parameters:` True and `Fit anisotropic Mag`: True. This will be NU Refinement (2).
* Compare the map quality and statistics to NU Refinement (1).

![Figure 6. The results from Non-Uniform Refinement 2. Plots shown are global FSC, Conical FSCs, orientation distribution plot, an image of the unsharpened map, and the per-particle scale factors.](/files/ShrhjQ3BOJfg4qm27a4r)

We found that the FSC resolution and cFAR improved somewhat after removing the low-scale particles, and noted that the orientation distribution plot looked different. When junk or particles that do not match the reference volume are present during refinement, their alignment orientation can obscure the real orientation sampling of the target molecule. In this case the non-complex I particles made the orientation distribution *appear* more uniform, but the directional FSCs and cFAR plot indicate that those particles were not meaningfully contributing to the refined map. As the map is not drastically better than NU Refinement (1) you might wonder what was really gained by taking the time to remove the non-matching particles, but doing so is important to avoid artefacts in downstream processing that rely on poses assigned during refinement (such as Local Refinement, 3D classification and 3D Variability Analysis).

In NU Refinement (2) we have achieved a map of similar resolution to the deposited map [EMDB:13611](https://emdb-empiar.org/EMD-13611) without the requirement of any manual curation steps. However, to do so, we made an assumption about what was present in the sample by using an existing map as an initial model, rather than running [Ab-Initio Reconstruction](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-reconstruction/job-ab-initio-reconstruction), and so we only found what we were looking for.

{% hint style="info" %}
For projects investigating a specific biological question, especially for repeat targets, making assumptions about what is in the mix (intact known target and junk) can accelerate progress towards an answer. For some projects it might be sufficient to stop at this sort of stage, if the region of interest in the map shows an adequate quality of density. Something that is sometimes overlooked is that cryo-EM is a technique that is not limited to hypothesis-driven results, and datasets designed to look at one aspect, might contain information about something else entirely! Although we have already achieved a map that looks reasonable, we are going to open the processing now to explore the dataset without constraining the analysis to the expected target.
{% endhint %}

### 6. Strategy 2: Discovering and sorting discrete target heterogeneity by Ab-Initio Reconstruction

<img src="/files/JZhmXV6ALFxWhG9agwd6" alt="" width="375">

At the beginning of this processing pipeline, we chose to skip the generation of Ab-Initio volumes and tested out the assumption that the particles were all complex I. The refined per-particle scale distribution and subsequent 2D classifications suggest that the picked particles are heterogeneous, so let’s now take a step back and see if we can identify what other particle types are present. We don’t need to include obvious junk in the Ab-Initio Reconstruction job, so we will identify the poorest classes from 2D Classification (2) and exclude them from the next step.

* From 2D Classification (2) use the Quick actions menu to create a Select 2D Classes job and queue it. In the interactive tab select any classes that look like junk and not at all like protein. We selected 5 of the 25 classes at this stage. Because we selected the junk classes, the particles we want to keep are in the rejected output, which for us contained 55,827 particles.

{% hint style="info" %}
To speed up Ab-Initio Reconstruction, the images can be downsampled first so that the Nyquist frequency is the maximum resolution that you want to use.
{% endhint %}

* Create a Downsample Particles job and input the rejected particles from Select 2D, and set the `Desired approx. pixel size (A)` to 6 and `Save results in 16-bit floating point:` true.

This gave us a box size of 80 pixels.
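The relationship between the requested pixel size and the resulting box can be sketched as follows. Note the assumption: we round the ideal box size up to the next even number of pixels, which reproduces the 80-pixel box reported above, but we have not confirmed that this is the exact rule the Downsample Particles job applies.

```python
import math

BOX_EXTENT_A = 450 * 1.043    # extraction box extent in Å (450 px at 1.043 Å/px)
TARGET_PIXEL_SIZE_A = 6.0     # value entered for the desired approx. pixel size


def downsampled_box(box_extent_A, target_pixel_size_A):
    """Box size in pixels, rounded up to an even number (assumed rule)."""
    box = math.ceil(box_extent_A / target_pixel_size_A)
    if box % 2:
        box += 1
    return box


box_px = downsampled_box(BOX_EXTENT_A, TARGET_PIXEL_SIZE_A)
actual_pixel_size_A = BOX_EXTENT_A / box_px
nyquist_A = 2.0 * actual_pixel_size_A

print(f"Downsampled box: {box_px} px")
print(f"Actual pixel size: {actual_pixel_size_A:.2f} Å")
print(f"Nyquist limit: {nyquist_A:.2f} Å")
```

Under this assumption the actual pixel size comes out slightly below the requested 6 Å, which is why the parameter is described as an approximate pixel size.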

* Create an Ab-Initio Reconstruction job (Ab-Initio (1)), inputting the downsampled particles and use the following settings:

<table><thead><tr><th width="215.2451171875">Parameter</th><th width="76.67578125">Setting</th><th>Reason</th></tr></thead><tbody><tr><td><code>Number of Ab-Initio classes</code></td><td>14</td><td>Using a lot of classes improves the chances of finding a greater number of heterogeneous structures</td></tr><tr><td><code>Num particles to use</code></td><td>your total</td><td>Specifying this value ensures that the entire particle stack is used</td></tr><tr><td><code>Initial Resolution (Angstrom)</code></td><td>18</td><td>Some small or membrane targets benefit from starting Ab-Initio at a higher resolution than the default of 35 Å</td></tr><tr><td><code>Fourier radius step</code></td><td>0.005</td><td>Reducing this value causes the job to take finer steps as the resolution ramps up</td></tr><tr><td><code>Initial minibatch size</code></td><td>300</td><td></td></tr><tr><td><code>Final minibatch size</code></td><td>1000</td><td></td></tr><tr><td><code>Enforce non-negativity</code></td><td>False</td><td></td></tr></tbody></table>

The number of classes, turning off enforcing non-negativity, and initial resolution values used above were determined empirically from repeat runs with different settings. We took inspiration from [Kim et al. (2025)](https://www.biorxiv.org/content/10.1101/2025.09.08.674935v1) for the minibatch size and Fourier radius step.

<mark style="color:blue;">OPTIONAL: As a control experiment, run a second Ab-Initio job (2) with 14 classes but otherwise default settings.</mark>

We found that Ab-Initio (1) took 8 hrs 10 mins, and Ab-Initio (2) took 1 hr 20 mins.

Examine the maps from your Ab-Initio job(s). We show example volumes in Figure 7. Although Ab-Initio (1) took significantly longer to run than Ab-Initio (2), this paid off, because it was able to uncover a wider range of useful volumes. We know that this sample came from a native source of mitochondrial membranes, and was purified without over-expression or affinity tags, so we might expect to see minor contamination of the preparation with other abundant mitochondrial membrane complexes in the particle population.

![Figure 7. Volumes from Ab-Initio jobs 1 and 2. Ab-Initio Volumes are shown beside PDB models with a semitransparent overlay of a molmap at 20 Å generated using ChimeraX. The PDBs used were complex I; 7psa, complex IV; 2occ (one monomer shown), complex III; 8ibg and complex V; 5ara.](/files/QKLrwmjGfumq0xmdpRj2)

A friendly neighbourhood mitochondrial biologist could probably identify the shape of some of these volumes in this case, but as a general rule, having knowledge about how the sample was prepared, and what impurities were identified by orthogonal techniques such as mass spectrometry can help solve the mystery of unexpected cryo-EM density maps that appear during processing.

* Compare your Ab-Initio volumes to the shape of models for mammalian mitochondrial respiratory complexes II, III, IV and V that are available in the Protein Data Bank. Download the PDB models for complex I ([7psa](https://www.rcsb.org/structure/7PSA)), complex IV ([2occ](https://www.rcsb.org/structure/2OCC)), complex III ([8ibg](https://www.rcsb.org/structure/8IBG)) and complex V ([5ara](https://www.rcsb.org/structure/5ARA)).

In Ab-Initio job (1), which used custom settings, we were able to identify complex I (with strong or weak density for part of the hydrophilic domain), complex IV monomer, complex IV dimer, complex IV tetramer, complex III dimer and complex V. We were unable to identify any classes that resembled complex II. When we examined the volumes from Ab-Initio job (2), we were only able to identify volumes that looked like complex I, and a monomer and a dimer of complex IV.

You may observe multiple classes for complex I and IV. Due to stochasticity in the Ab-Initio job and in particle selections, you might not observe one or more of the volume types shown above. Repeating Ab-Initio Reconstruction will sometimes produce a different selection of volume types; for example, you might see a volume for complex V in Ab-Initio (2). In contrast, we consistently found the same volume types on repeat runs of Ab-Initio (1) with our particle set.

### 7. Strategy 2: Separation and Refinement of different discrete targets by Hetero Refinement

We now want to use Heterogeneous Refinement to better separate the different target complexes that we have identified. We can speed up and simplify analyses by selecting only 1 volume for each target type.

* Create a Heterogeneous Refinement job (Hetero Refine (1)). Add one volume each for complex I, complex III, complex IV monomer, complex IV dimer, complex IV tetramer, and complex V along with the remaining junk volumes and connect up the particles from NU Refinement (1). As we already ascertained in 2D Classification (1) that the rejected particles are not likely to be useful, we will not consider them further.

We show example output volumes from Hetero Refine (1) in Figure 8.

![Figure 8. Volumes from Heterogeneous Refinement. PDB models fitted into each of the protein complex volumes. The PDBs used were complex I; 7psa, complex IV; 2occ (one monomer shown), complex III; 8ibg and complex V; 5ara.](/files/7Ok8XNYgK3l7EPKk6oq7)

We found that among the non-rejected particles from NU Refinement (1), 17% were assigned to junk classes and 83% were assigned to mitochondrial complex volumes. Recall that we did not curate the particles picked by TOPAZ, and relatively few were rejected at the NU Refinement stage, and so this result indicates a good quality of initial particle picking. This sample was prepared in order to study the structure of inhibited complex I, and the original analyses used template picking from the start, which may have missed the other complexes.

### 8. NU Refinement of the individual complexes

{% hint style="info" %}
It is a good idea to consider if your target particle displays rotational symmetry. You can rotate the volume around and consider if it looks the same after rotation in one or more directions. When a target is symmetric, we can make use of this by applying symmetry during refinement, effectively increasing the signal-to-noise ratio, and often improving the map quality. Care should be taken when selecting symmetry for refinement: applying incorrect symmetry, or applying symmetry to a pseudo-symmetric target can lead to poor map quality, artifactual density or loss of interesting asymmetric map features.
{% endhint %}

* Rotate the volumes around and consider if there is symmetry present - we identified what looked like C2 symmetry for complex III dimer, and complex IV tetramer. This means that the maps looked very similar after rotating by 180 degrees. See Figure 9 for examples.
* Examine the output volumes and compare them in ChimeraX to the PDB files that we used above to check the hand of the volumes.

This strategy is useful when the map resolution is not high enough to identify the handedness of alpha helices. If you are unsure which hand fits better, you can look at the “average map value” reported in the log.

* For volumes that require a hand flip to match the model: Run a Volume Tools job with that volume as an input and set `Flip hand`: True, to give a volume with the correct hand.

![Figure 9. Manual checks of maps for symmetry and handedness.](/files/OpCvK5nMldz1LW0SfVwJ)
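
As a rough numerical counterpart to the visual symmetry check above, the sketch below rotates a small toy volume by 180 degrees about one axis and measures how well it correlates with the original. Everything here is illustrative: the nested-list volume, the choice of the z axis as the symmetry axis, and the helper names `rotate_180_about_z`, `correlation` and `blob_volume` are assumptions made for this example and are not part of CryoSPARC or ChimeraX. A real map would first need to be centred and aligned on the candidate symmetry axis.

```python
import math

def rotate_180_about_z(vol):
    """Rotate a cubic volume (nested lists indexed [z][y][x]) by 180 degrees about z."""
    # A 180 degree rotation about the central z axis reverses both the x and y axes.
    return [[row[::-1] for row in plane[::-1]] for plane in vol]

def correlation(a, b):
    """Pearson correlation between two volumes of identical shape."""
    fa = [v for plane in a for row in plane for v in row]
    fb = [v for plane in b for row in plane for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    var_a = sum((x - ma) ** 2 for x in fa)
    var_b = sum((y - mb) ** 2 for y in fb)
    return cov / math.sqrt(var_a * var_b)

def blob_volume(n, centres):
    """Toy density: a sum of Gaussian blobs at the given (x, y, z) centres."""
    return [[[sum(math.exp(-((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) / 4.0)
                  for cx, cy, cz in centres)
              for x in range(n)] for y in range(n)] for z in range(n)]

n = 9  # hypothetical box size in voxels; the central axis passes through index 4
# Two blobs related by a 180 degree rotation about the central z axis (a toy "C2 dimer").
c2_map = blob_volume(n, [(2, 3, 4), (6, 5, 4)])
# A single off-centre blob has no such symmetry.
asym_map = blob_volume(n, [(2, 3, 4)])

cc_c2 = correlation(c2_map, rotate_180_about_z(c2_map))        # close to 1 for a C2 map
cc_asym = correlation(asym_map, rotate_180_about_z(asym_map))  # much lower without C2
```

A correlation near 1 after rotation is consistent with C2 symmetry at the resolution examined, but it does not rule out pseudo-symmetry, so the cautions in the hint above still apply.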

* Run a Non-Uniform Refinement (3) inputting the correct hand volume and particles for complex I and set `Use dynamic refinement mask`: false
* Run a Non-Uniform Refinement (4) inputting the correct hand volume and particles for complex IV monomer and set `Use dynamic refinement mask`: false
* Run a Non-Uniform Refinement (5) inputting the correct hand volume and particles for complex IV dimer and set `Use dynamic refinement mask`: false
* Run a Non-Uniform Refinement (6) inputting the correct hand volume and particles for complex IV tetramer and set `Use dynamic refinement mask`: false and `Symmetry`: C2
* Run a Non-Uniform Refinement (7) inputting the correct hand volume and particles for complex V and set `Use dynamic refinement mask`: false
* Run a Non-Uniform Refinement (8) inputting the correct hand volume and particles for complex III and set `Use dynamic refinement mask`: false and `Symmetry`: C2

{% hint style="info" %}
In CryoSPARC v5 you can copy job parameters and paste them to other jobs, so you don’t need to set them individually each time! See the guide page on [copying and pasting parameters](https://guide.cryosparc.com/application-guide/creating-and-running-jobs#copy-and-paste-parameters) for more details.
{% endhint %}

We found that the map resolutions for complex I and complex IV dimer were sufficiently high, at \~3.5 Å, that they might benefit from CTF Refinement of per-particle defocus, and per-group beam tilt, trefoil and anisotropic magnification.

{% hint style="info" %}
When the map resolution approaches the Nyquist limit, map features can become a bit jagged because the map is under-sampled at that pixel size. If you applied Fourier cropping during particle extraction, you can re-extract the particles with less, or no, Fourier cropping to improve the map smoothness.

It can also be beneficial to re-extract particles after they have been aligned in 3D in order to better centre the extracted region for the particle.
{% endhint %}
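
To judge how close a map is to the Nyquist limit, it helps to work out the effective pixel size after Fourier cropping. The sketch below shows the arithmetic; the function names and the example pixel size and box sizes are hypothetical and are not the values used in this case study.

```python
def cropped_pixel_size(raw_pixel_size_A, extraction_box_px, output_box_px):
    """Effective pixel size (in Å) after Fourier cropping a box to a smaller box."""
    return raw_pixel_size_A * extraction_box_px / output_box_px

def nyquist_resolution(pixel_size_A):
    """Finest resolution (in Å) that can be sampled: two pixels per period."""
    return 2.0 * pixel_size_A

# Hypothetical example values, NOT the settings used in this case study.
raw_apix = 1.0      # Å per pixel in the micrographs
extract_box = 512   # extraction box size in pixels
fcrop_box = 256     # box size after Fourier cropping

apix = cropped_pixel_size(raw_apix, extract_box, fcrop_box)  # 2.0 Å per pixel
nyquist = nyquist_resolution(apix)                           # 4.0 Å
```

With these example numbers, a refinement reporting a resolution close to 4 Å would be limited by sampling, and re-extracting with a larger output box would be worth trying.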

If you have an estimated map resolution of \~3.5 Å, then:

* Clone the extraction job from Section 3 and swap the particles for those in NU Refinement (3) (Extract from Micrographs (2))
* Clone the extraction job from Section 3 and swap the particles for those in NU Refinement (5) (Extract from Micrographs (3))

Using these particles re-run NU refinement:

* Clone NU Refinement (3) and set `Optimize per-particle defocus`: True, `Optimize per-group CTF parameters`: True and `Fit anisotropic Mag`: True. This will be NU Refinement (9).
* Clone NU Refinement (5) and copy over the parameters from NU Refinement (9). This will be NU Refinement (10).
* Examine your maps and look at the unsharpened maps, auto-tightened resolution masks, estimated resolution, and cFAR values.

We found that NU Refinements (9) and (10) had better global FSC resolution than the counterpart refinements (3) and (5) that did not include CTF Refinement.

{% hint style="info" %}
In CryoSPARC v5 you can see the near and far extents of the mask relative to the map in the Dashboard or Event Log under Real Space Auto Tight Mask Slices (see examples in Figure 10). This gives a quick overview of whether the mask adequately covers the density. Regions of weak map density might exist in between the near and far extents, or even outside of the mask entirely, such as in the case of the detergent micelle. It is always a good idea to take a look at the masks used for FSC resolution estimation in ChimeraX while you inspect the map, to satisfy yourself that the mask adequately encapsulates your map density.
{% endhint %}

![Figure 10. NU Refinement of mitochondrial complexes. Sliced views of the auto tightened FSC mask relative to the refined volume, and cFSC plots for each of NU Refinements 9,4,10,6,7 and 8.](/files/G88Cg5pRG0807ku2AGQr)

Our NU Refinement (9) of complex I looks very similar to that in NU Refinement (2), with a good cFAR indicating that a wide range of orientations is sampled. The dimer of complex IV also has good orientation sampling. However, the other refinements, which contain fewer particles, have more limited orientation sampling, so caution might be prudent when considering their estimated map resolutions.

### 9. Ideas for comparing different classification strategies for discrete targets

Which classification strategy worked better for complex I? NU Refinement followed by subsetting the particles on the basis of scale, or Ab-Initio and Hetero Refine? We can check out the following to make a judgement on which one we like the best:

1. compare the unsharpened map quality, especially in the region of interest
2. compare the map statistics such as GSFSC and cFAR
3. find the uncommon particles (i.e. the ones only assigned to complex I in one or other of the strategies) and investigate if they are complex I or not

We could not discern any meaningful differences between the map quality and statistics of NU Refinements (2) and (9). To look at the uncommon particles:

<mark style="color:blue;">OPTIONAL:</mark>

* <mark style="color:blue;">Run a Particle Sets Tool job (1) with the particles from NU Refinement (2) in particles (A) and the particles from NU Refinement (9) in particles (B) with the Action set to Intersect.</mark>

In our case we found 23,362 particles common to both refinements, and 4,505 that were uncommon (2,323 in \[A minus B], and 2,182 in \[B minus A]). We expect to see some uncommon particles in any experimental datasets with different classification strategies or repeat runs. This is partly due to variations in signal-to-noise in the particle images that affect the reliability of particle alignment. This affects the particle scale values, and class assignment in classifications such as Heterogeneous Refinement. We can examine what the uncommon particles look like by running 2D Classifications.
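
The Intersect action can be pictured as simple set operations on particle identifiers. The sketch below is a minimal illustration, assuming each particle carries a unique ID that is preserved between jobs; the function name `compare_particle_sets` and the toy IDs are hypothetical, and this is not the Particle Sets Tool implementation.

```python
def compare_particle_sets(uids_a, uids_b):
    """Return the common and uncommon particle IDs between two selections."""
    a, b = set(uids_a), set(uids_b)
    return {
        "intersection": a & b,  # particles found by both strategies
        "a_minus_b": a - b,     # particles only in selection A
        "b_minus_a": b - a,     # particles only in selection B
    }

# Toy particle IDs standing in for the two complex I selections (hypothetical values).
strategy_1_uids = [101, 102, 103, 104, 105]
strategy_2_uids = [103, 104, 105, 106]

result = compare_particle_sets(strategy_1_uids, strategy_2_uids)
counts = {name: len(uids) for name, uids in result.items()}

# The uncommon total quoted in the text is the sum of the two one-sided differences.
uncommon_in_text = 2323 + 2182  # 4,505 particles
```

For the toy input, three particles are common to both selections, two are unique to A and one is unique to B.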

<mark style="color:blue;">OPTIONAL:</mark>

* <mark style="color:blue;">Create a 2D Classification (4), setting</mark> <mark style="color:blue;"></mark><mark style="color:blue;">`Number of 2D classes`</mark><mark style="color:blue;">: 25 and drag over the \[A minus B] particles from Particle Sets Tool (1)</mark>
* <mark style="color:blue;">Create a 2D Classification (5), setting</mark> <mark style="color:blue;"></mark><mark style="color:blue;">`Number of 2D classes`</mark><mark style="color:blue;">: 25 and drag over the \[B minus A] particles from Particle Sets Tool (1)</mark>

<mark style="color:blue;">We can also run a control 2D Classification to look at the class average quality of similar number of the common particles</mark>

<mark style="color:blue;">OPTIONAL:</mark>

* <mark style="color:blue;">Run a Particle Sets Tool job with the \[Intersection (keeping A data)] particles from NU Refinement (2) in particles (A) and the particles from NU Refinement (8) in particles (B) with the Action set to Split, and a value similar to the number in 2D Classifications (4) and (5).</mark>

<mark style="color:blue;">OPTIONAL:</mark>

* <mark style="color:blue;">Create a 2D Classification (6), setting</mark> <mark style="color:blue;"></mark><mark style="color:blue;">`Number of 2D classes`</mark><mark style="color:blue;">: 25 and drag over the \[A minus B] particles from Particle Sets Tool</mark>

![Figure 11. 2D class averages from 2D Classifications 4,5, and 6. One class that resembles a complex IV dimer is circled in yellow.](/files/y0OgHrFqqrMHGmaTEgzw)

In Figure 11 we show 2D class averages from our 2D Classifications (4), (5) and (6). In 2D Classification (4) we noted a class that looked like a complex IV dimer, but this was not seen in 2D Classifications (5) or (6), in which most of the classes are identifiably complex I, despite the very low particle number (\~2,200 particles).

Overall, our initial experiment of NU Refining against an existing map, and removing junk and other species on the basis of per-particle scale worked pretty well and took a relatively short time, but the resulting particle stack from the strategy using Ab-Initio and Hetero Refine looks slightly cleaner. Either Strategy 1, or Strategy 2 might give enough map detail to answer the question at hand, allowing processing to stop here, but for the curious, we have not yet exhaustively examined the heterogeneity present in the dataset.

### 10. Investigating residual heterogeneity using 3D Variability Analysis

We have so far refined 6 maps in Section 8, but how can we be sure that each of the 6 particle sets used are homogeneous? At any point during processing when you want to look for remaining substantial heterogeneity (discrete or continuous), [3D Variability Analysis](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/variability/job-3d-variability?q=3dva) can be very useful. As we have 6 targets to consider, we will run 3DVA with just 1 mode to look for the most substantial density change amongst each refined particle set. For each one, we want to select a filter resolution that is lower than the FSC resolution of the input map.

* Create a 3D Variability Analysis job (2) inputting the map and resolution mask from NU Refinement (9) (complex I), set `Number of modes to solve`: 1, and `Filter resolution`: 8
* Create a 3D Variability Analysis job (3) inputting the map and resolution mask from NU Refinement (4) (complex IV monomer), set `Number of modes to solve`: 1, and `Filter resolution`: 12
* Create a 3D Variability Analysis job (4) inputting the map and resolution mask from NU Refinement (10) (complex IV dimer), set `Number of modes to solve`: 1, and `Filter resolution`: 8
* Create a 3D Variability Analysis job (5) inputting the map and resolution mask from NU Refinement (6) (complex IV tetramer), set `Number of modes to solve`: 1, and `Filter resolution`: 12
* Create a 3D Variability Analysis job (6) inputting the map and resolution mask from NU Refinement (7) (complex V), set `Number of modes to solve`: 1, and `Filter resolution`: 12
* Create a 3D Variability Analysis job (7) inputting the map and resolution mask from NU Refinement (8) (complex III), set `Number of modes to solve`: 1, and `Filter resolution`: 12
* For each of the 3D Var jobs use the Quick actions menu to make 3D Variability Display (3DV Display) jobs (2-7), and set the `Filter resolution` to match that used in the corresponding 3D Var job. To make the files smaller for faster download, also set `Downsample to box size`: 128.
* Examine the movies in ChimeraX and take a look at the shape of the 3DVA mode histogram. We show examples of these in Movie 2 and Figure 12.

<figure><img src="/files/Hof8YushctkRmkWSsRAQ" alt=""><figcaption><p><strong>Movie 2: 3DVA volumes from 3DV Display jobs 2-7.</strong></p></figcaption></figure>

![Figure 12. Histograms of component 0 from 3DVA jobs 2-7.](/files/qG6l8QXkbA2V3hDC9eSb)

We found that the motion of complex I and the complex IV tetramer looked interesting. The motion in complex III, as well as complex IV monomer and dimer look like variations largely in the micelle region, that are expected even within a small fairly homogeneous particle stack, and the motion seen in complex V might indicate that there is a subpopulation that has damage to part of the complex.

We noticed that the particle distribution across the component was bimodal in the case of complex I, and trimodal in the case of complex IV tetramer. We find that a multimodal distribution in 3DVA often indicates that there is discrete heterogeneity in the particle stack, whereas a single peak is more often indicative of continuous heterogeneity.
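
To make the idea of splitting a bimodal component concrete, the sketch below separates synthetic one-dimensional coordinates into two groups using a basic two-cluster k-means. This is an illustration only: the synthetic coordinates, the population sizes and the function `two_means_1d` are assumptions for this example, and the clustering used by 3D Variability Display may work differently.

```python
import random

def two_means_1d(values, iterations=50):
    """Split 1D values into two clusters with a basic k-means (k = 2)."""
    lo, hi = min(values), max(values)  # start the two centres at the extremes
    if lo == hi:
        raise ValueError("all values are identical; nothing to split")
    for _ in range(iterations):
        boundary = (lo + hi) / 2.0
        left = [v for v in values if v <= boundary]
        right = [v for v in values if v > boundary]
        new_lo, new_hi = sum(left) / len(left), sum(right) / len(right)
        if (new_lo, new_hi) == (lo, hi):
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    return left, right, (lo, hi)

# Synthetic component coordinates with a minor and a major population
# (hypothetical numbers, not the values from this dataset).
rng = random.Random(0)
coords = [rng.gauss(-20.0, 3.0) for _ in range(200)]
coords += [rng.gauss(20.0, 3.0) for _ in range(800)]

minor, major, centres = two_means_1d(coords)
```

With two well-separated peaks, the two clusters recover the two populations. When the peaks overlap substantially, particles near the boundary are assigned less reliably, which is one reason to re-refine each cluster and inspect the maps.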

We will not consider further the changes in complex V due to the low resolution of the map, but we will now go on to look at separating the discrete states that might be present in the complex I and complex IV tetramer. We will do this via three different strategies, then compare the results from each:

* Section 11: Strategy 3 - 3DVA Cluster mode
* Section 12: Strategy 4 - 3D Classification
* Section 13: Strategy 2 revisited - Ab-Initio and Hetero Refine

### 11. Strategy 3: Sub-classification by 3DVA Cluster mode

<img src="/files/namB6DrItxLAO58tQ0e7" alt="" width="375">

In 3DVA (2), for complex I, we saw a bimodal distribution (see Figure 12), possibly indicating two discrete classes. We can therefore use 3D Variability Display in Cluster mode to obtain particle sets that correspond to the two peaks seen.

* Using the Quick actions menu from 3DVA (2) create a 3DV Display job (8) with the following settings:

| Parameter                         | Value   | Reason                                             |
| --------------------------------- | ------- | -------------------------------------------------- |
| `Output mode`                     | cluster | Output clustered particle sets                     |
| `Number of frames/clusters`       | 2       | The number of peaks observed                       |
| `Downsample to box size`          | 128     | Reduce box size by Fourier cropping                |
| `Crop to size (after downsample)` | 100     | Further reduce the box size by real-space cropping |

In 3DVA (5), for complex IV tetramer, we saw a trimodal distribution (see Figure 12), possibly indicating three discrete classes. We can therefore use 3DV Display in Cluster mode to obtain particle sets that correspond to the three peaks seen.

* Using the Quick actions menu from 3DVA (5) create a 3DV Display job (9) with the following settings:

| Parameter                   | Value   | Reason                              |
| --------------------------- | ------- | ----------------------------------- |
| `Output mode`               | cluster | Output clustered particle sets      |
| `Number of frames/clusters` | 3       | The number of peaks observed        |
| `Downsample to box size`    | 128     | Reduce box size by Fourier cropping |

Note that for the complex IV tetramer we do not crop the box in real space, because the tetramer volume takes up more of the box and we don’t want to cut it off by cropping too much.

* Examine the output volumes and decide if you think that they might represent different states. Sometimes it might be ambiguous, because the input particle poses came from a job with them all mixed together. We can run fresh NU Refinement jobs to estimate the poses again from scratch using each of the particle stacks and volumes to see if they produce volumes with distinct features.
* Clone NU Refinement (9) and swap in the volume and particles from 3DV Display (8) cluster 0. This is NU Refinement (11).
* Clone NU Refinement (9) and swap in the volume and particles from 3DV Display (8) cluster 1. This is NU Refinement (12).
* Clone NU Refinement (6) and swap in the volume and particles from 3DV Display (9) cluster 0. This is NU Refinement (13).
* Clone NU Refinement (6) and swap in the volume and particles from 3DV Display (9) cluster 1. This is NU Refinement (14).
* Clone NU Refinement (6) and swap in the volume and particles from 3DV Display (9) cluster 2. This is NU Refinement (15).
* Examine the output volumes and refinement statistics.

![Figure 13. Separation of complex I and complex IV tetramer conformations using 3D Variability in cluster mode.](/files/9UpXqhLROOKQX6fbhQLO)

For complex I, we found the map was poorer in NU Refinement (11) (4,588 particles, cFAR 0.72) and some regions of the map had relatively poor density. On the other hand, the map from NU Refinement (12) had similar map statistics and global quality as NU Refinement (9).

We noticed global changes in the maps between NU Refinement (11) and (12) similar to the motion seen in Movie 2 where the two arms are more open in the smaller class, and are more closed in the larger class (see Figure 13). In addition we find that there is a region of density that has good definition in NU Refinement (12) but is absent in NU Refinement (11) (indicated in the inset image, circled). Both the global open/closed conformation, and ordering/disordering of the circled region (part of subunit NDUFA9) are documented characteristics of the active and deactive states, so we can tentatively assign NU Refinement (11) to the deactive state, and NU Refinement (12) to the active state.

For the complex IV tetramer, we found that NU Refinement (15) was of very poor quality, but NU Refinements (14) and (13) produced low resolution maps showing the oxidase monomers in two different packing conformations, affecting the shape of the detergent micelle. One micelle is more of a parallelogram, and the other is more oval and we will refer to them as conformation A and conformation B, respectively. Although we only started with \~5,000 particles we were still able to separate two unique states with 3DVA by looking for very low resolution changes (12 Å). Success here might reflect the resolution range being able to capture changes in micelle shape and relative positions of entire monomers.

We have evidence now that there are at least two good classes of complex I, two good classes for the complex IV tetramer, and perhaps some poor quality or junk particles in NU Refinement (15), but we don’t know if 3DVA was the best way to separate the particle sets.

### 12. Strategy 4: Sub-classification by 3D Classification

<img src="/files/r1w40usGBVYku94Xgznc" alt="" width="375">

As we already have particle poses assigned for complex I and complex IV tetramer in NU Refinements (9) and (6) respectively, we can go ahead and try [3D Classification](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/variability/job-3d-classification-beta) to compare the results that we get with the same number of classes as we used for 3DVA clusters.

* Create a 3D Classification job and input the particles from NU Refinement (9), set `Number of classes`: 2, `Filter resolution`: 8, and `Use latent mixing coefficients`: true
* Create a 3D Classification job and input the particles from NU Refinement (6), set `Number of classes`: 3, `Filter resolution`: 12, and `Use latent mixing coefficients`: true

For both jobs, we enabled latent mixing coefficients (a new option in CryoSPARC v5), which can help reduce the likelihood of encountering a great number of similar-looking classes with equal particle counts.

For each output class, run NU Refinements (16-17 for complex I and 18-20 for complex IV tetramer), using the same settings as Section 8.

* Examine the maps and refinement statistics

{% hint style="info" %}
Both 3DVA and 3D Classification rely on good input particle poses from an upstream refinement. If one or more particle populations in the refinement don’t align to the consensus volume very well, then the results of 3DVA and 3D Classification may be suboptimal. It is therefore sometimes worth considering a classification strategy that also allows poses to be updated during particle sorting, such as Ab-Initio Reconstruction and Heterogeneous Refinement.
{% endhint %}

We will compare the classification strategies from Sections 11, 12 and 13 later on.

### 13. Strategy 2 revisited: Sub-classification by Ab-Initio and Hetero Refine

<img src="/files/Nt7kjMi74kwnDtzaD8Dv" alt="" width="375">

We have separated complex I into two classes, and complex IV tetramer into three classes, using 3DVA Cluster mode and 3D Classification, and will now separate particles using Ab-Initio and Hetero Refine. This is the same strategy that we used to initially discover and separate the 6 different targets that we have found within this dataset in Sections 6 and 7.

* Run an Ab-Initio Reconstruct job (3) inputting the particles from NU Refinement (9) (complex I), setting `Number of Ab-Initio classes`: 2
* Use the Quick actions menu to queue up a Hetero Refine (2)
* Run an Ab-Initio Reconstruct job (4) inputting the particles from NU Refinement (6) (complex IV tetramer), setting `Number of Ab-Initio classes`: 3
* Use the Quick actions menu to queue up Hetero Refine (3)
* For each output class, run NU Refinements (21-22 for complex I and 23-25 for complex IV tetramer), using the same settings as Section 8.

### 14. Comparing classification strategies for complex I

All three of the strategies used in Sections 11-13 might produce maps with sufficient information and features to answer the biological question at hand, but sometimes one method performs noticeably better than the others. It can be tempting to rely on finding the highest FSC resolution, but this does not always indicate the most accurate particle sorting.

{% hint style="info" %}
The populations assigned to each state are not always the same for different particle sorting methods. In this example, we found that with 3DVA Cluster mode, 3D Classification, and Ab-Initio Reconstruction followed by Hetero Refine, 18.6%, 16.3% and 21.2% of the curated complex I particles, respectively, were assigned to the deactive state. A range of values is expected for experimental samples, and reflects the inherent noisiness of the images being sorted, stochastic differences, and the way that they are being analysed.
{% endhint %}

* Identify which three sub-classified NU Refinements have a smaller population and match the deactive state and examine these maps for the following steps.

<mark style="color:blue;">OPTIONAL:</mark>

* <mark style="color:blue;">Run a Particle Sets Tool job (5) with the particles from NU Refinement of particles from 3DVA in input:particles (A) and the particles from NU Refinement of particles from 3D Classification in input: particles (B) with the Action set to Intersect.</mark>
* <mark style="color:blue;">Run a Particle Sets Tool job (6) with the particles from NU Refinement of particles from 3DVA in input:particles (A) and the particles from NU Refinement of particles from Hetero Refine (2) in input: particles(B) with the Action set to Intersect.</mark>
* <mark style="color:blue;">Run a Particle Sets Tool job (7) with the particles from NU Refinement of particles from 3D Classification in input: particles (A) and the particles from NU Refinement of particles from Hetero Refine (2) in input particles: (B) with the Action set to Intersect.</mark>

We show the results of the intersected (common) particles in the table below.

| Classification strategy | 3D Classification | Ab Initio & Hetero Refine |
| ----------------------- | ----------------- | ------------------------- |
| 3DVA cluster            | 3,848             | 3,784                     |
| 3D Classification       | x                 | 3,497                     |

We found that the particle class assignment using 3DVA cluster mode, and 3D Classification were very similar. Classifying by Ab-Initio Reconstruct and Hetero Refine led to different class assignment for more particles, but to see if all methods are equivalent or if there is a clear winner, you can examine more closely the map features that are specific for one or more of the states you are hoping to isolate.

In this case, we know that the deactive state has a relatively open configuration between the hydrophobic and hydrophilic arms. We can align the three maps from NU Refinement that best match the deactive state and compare their global conformations. In Movie 3 we show a volume morph between our NU Refined volumes after classification by 3D Classification, 3DVA cluster mode, and Ab-Initio Reconstruct followed by Hetero Refine. We find that the map after 3D Classification has the most “closed” conformation, the map after 3DVA cluster mode is intermediate, and the volume refined after Ab-Initio Reconstruct followed by Hetero Refine is subtly more “open”. For this specific target and conformational change, we might therefore decide that Ab-Initio Reconstruct followed by Hetero Refine is a good way to separate these particles.

<figure><img src="/files/HQiTem5rkTRDBMD2DeIl" alt="" width="563"><figcaption><p><strong>Movie 3. Volume morph between the deactive NU Refined states.</strong> Particle sets separated by 3D Classification (purple), 3DVA cluster mode (orange) and Ab-Initio Reconstruct and Hetero Refine (green).</p></figcaption></figure>

We also went on to compare the map statistics and quality between the various refinements run on the complex I active state in Figure 14, along with the unsharpened map generated from the deposited half maps in EMDB:13611. We made a new auto-tightened FSC mask using CryoSPARC to accompany the deposited map, to allow a more direct comparison of the map statistics of the deposited map and the experimental maps produced here. We also chose to compare unsharpened maps, to avoid introducing another variable per map in the form of different sharpening B-factors. For this reason, the map features shown in Figure 14 might not quite match those expected from a sharpened map at the reported resolution.

![Figure 14. Map statistics and unsharpened map density for complex I active state maps EMDB:13611, and NU Refinements 1,2,3,9,12,16 and 22. Example density is shown for a representative transmembrane helix in subunit ND3. \*Value was calculated using a new mask generated in CryoSPARC.](/files/InHmAEqQOeaNn6JaGSyy)

The best map quality was observed in NU Refinement jobs (9), (12), (16) and (22), and the FSC resolution and cFAR values were slightly improved by removing the deactive particles in classification by strategies 2, 3, or 4.

### 15. How global refinement settings can influence map appearance

The map quality of NU Refinement (1) in Figure 14 is not dramatically different to that in the later refinements of complex I, even though \~75% of the particles present weren’t even the same target molecule - how can that be?

{% hint style="info" %}
During global refinement jobs in CryoSPARC, there are a few ways that particles, or map regions can be handled differently, depending on the settings.

* Particle scale minimization - after alignment to the reference volume, particles with relatively poor matching contrast to the volume projection in that pose are down weighted, and vice-versa.
* Pose marginalisation - when a particle has poor contrast, or does not match the reference volume well, its “best” pose can be uncertain. To help with this, instead of being assigned a single pose, the contributions of particles to the reconstructed map can be spread (marginalized) across a range of the best poses for each particle.
* Masking - Applying a mask can prevent density from neighbouring molecules, or low resolution regions, from interfering with particle alignment.
* Non-Uniform regularization - Regions of the map that are relatively disordered, such as a detergent micelle, can lead to overfitting of the map, but Non-Uniform regularization allows these regions to be blurred out and focus the refinement pose assignment on the well-ordered regions of the map.
  {% endhint %}
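
To give a feel for what a per-particle scale represents, the sketch below computes the least-squares scale factor that best matches a particle image to a reference projection. This is a simplified illustration of one common definition, assuming noise-free one-dimensional "images"; the function name `optimal_scale` and the toy values are hypothetical, and the calculation in CryoSPARC (which also has to account for the CTF and noise) may differ in detail.

```python
def optimal_scale(image, projection):
    """Least-squares scale alpha that minimises sum((image - alpha * projection)**2)."""
    numerator = sum(i * p for i, p in zip(image, projection))
    denominator = sum(p * p for p in projection)
    return numerator / denominator

# Toy 1D "projection" of the reference volume (hypothetical values).
projection = [0.0, 1.0, 2.0, 1.0, 0.0]

# A particle that matches the reference at full contrast.
matching = list(projection)
# A matching particle with weaker contrast, for example from thicker ice.
weak_contrast = [0.25 * p for p in projection]
# A particle whose signal does not resemble the reference at all.
non_matching = [1.0, 0.0, 0.0, 0.0, 1.0]

scales = [optimal_scale(img, projection)
          for img in (matching, weak_contrast, non_matching)]
```

In this toy case the matching particle has a scale of 1, the weak-contrast particle a scale of 0.25, and the non-matching particle a scale of 0. This mirrors why particles of other species, or junk, tend to show low scale values when refined against a complex I reference.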

We ran a set of additional jobs on the initial extracted particles (and the active state particles from Hetero Refine (2)) to investigate the effects of the 4 settings listed above. Note that Per-particle scale minimization is enabled by default, and the Non-Uniform Refinement job is the same as Homogeneous Refinement, except that it has both Adaptive Marginalization (pose marginalization) and Non-Uniform Regularization turned on by default.

Optional: Run NU Refinement jobs, altering the four settings listed above, to see how the map appearance changes.

In Figure 15 we show the map appearance and conical FSCs for A) the initial particle stack that contains all of the complexes (93,230 particles), and B) the complex I active state particles from Hetero Refine (2) (19,090 particles).

![Figure 15. Example density and cFSCs for refinements of A) particles from the initial extraction, and B) the active complex I class from Hetero Refine 2. Base refinement refers to homogeneous refinement without per-particle scale minimisation or dynamic masking, and the other refinements have NU regularization, particle scale minimization, Adaptive Marginalization and/or Dynamic masking applied, as indicated.](/files/OsJTwchdkshNKUObyp1s)

First, we look at the refinements that were run in the Case Study: NU Refinements (1) and (21), which were run with Non-Uniform Regularization, Adaptive Marginalization, and Per-particle scale minimization. They both have similar side chain definition and cFAR values, but when we strip those settings away and run a base refinement (i.e. homogeneous refinement without particle scale minimisation or dynamic masking), the maps from the two particle sets look quite different! The base refinement before classification is poor, with a low cFAR (0.36) and very little side chain definition. On the other hand, the base refinement from the classified active complex I has a better cFAR at 0.60, and better side chain definition in the map.

Separately enabling Non-Uniform Regularization, Per-particle scaling, Dynamic masking, or Adaptive Marginalization improves the cFAR and map quality for both particle sets.

For the active complex I particles (Figure 15 B), Non-Uniform Regularisation and Dynamic masking give similar results, which are also similar to NU Refinement (21). This leads us to consider that the major issue with the base refinement may be that the micelle density, or density from adjacent particles, interferes with good alignment of particles to the target protein region. Note that combining Dynamic masking with Non-Uniform Regularization usually does not have any additive benefit, as both strategies mitigate a similar problem with different approaches, and so we did not use Dynamic Masking in NU Refinements (1) or (21).

For the initial extracted particles (Figure 15 A), we see that Non-Uniform Regularization provides the most noticeable improvement in map quality and cFAR compared to the base refinement (0.56 vs 0.36). Unlike for the active complex I particles, the map quality from Non-Uniform Regularization alone is not as good as that of NU Refinement (1), which also includes Adaptive Marginalization and Per-particle scale minimization. This result indicates that, in addition to low resolution regions interfering with particle alignment, there are particles in the stack that are challenging to align to the refinement volume, and that the refinement is better when the contribution of those particles to the reconstruction is reduced.

{% hint style="info" %}
Non-Uniform Regularization, Adaptive Marginalization and Per-particle scale minimization can powerfully improve map quality and map metrics, and are usually recommended for membrane proteins, especially where the stack contains particles with relatively poor contrast (such as from thick ice or low defocus). As we have seen in this case study, these settings can also mitigate the effects of non-matching or junk particles. This might provide a usable map, but it makes heterogeneity less obvious, and downstream jobs such as 3D Classification might produce unexpected or unreliable results due to the presence of a large number of poorly aligned and non-matching particles. For this reason it is important to inspect the particle scale plot from global refinement and, even if you are working with a membrane protein, it can sometimes be a good idea to run a test homogeneous refinement for comparison.
{% endhint %}

### 16. Comparing classification strategies for complex IV tetramer

* Identify which of the sub-classified NU Refinements match complex IV conformation A, conformation B and the poor quality map shown in Figure 13, then examine and compare the map quality for each.

We found the map quality, FSC resolution and cFAR values to be similar for the conformation A and B maps from all three classification strategies, and we were not able to choose one strategy that looked better than the others. We then looked at the proportion of the complex IV tetramer particles that each of the three strategies assigned to each of the three classes.

| Classification strategy                 | Particles in conf A | Particles in conf B | Particles in poor quality class |
| --------------------------------------- | ------------------- | ------------------- | ------------------------------- |
| 3DVA cluster                            | 1,283 (27%)         | 1,131 (24%)         | 2,306 (49%)                     |
| 3D Classification                       | 1,946 (41%)         | 1,702 (36%)         | 1,072 (23%)                     |
| Ab-Initio Reconstruct and Hetero Refine | 1,916 (41%)         | 2,377 (50%)         | 423 (9%)                        |

We noticed that the proportion of particles assigned to conformation A, conformation B and the poor quality class varied quite substantially, with Ab-Initio Reconstruct followed by Hetero Refine assigning a much smaller poor quality class (9%, compared with 23% for 3D Classification and 49% for 3DVA clustering). The maps from any of these three strategies could be equally useful; however, some caution may be prudent when interpreting the class populations in this case.
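The percentages in the table can be reproduced with a few lines of Python. This is only a minimal sketch using the counts copied from the table above; the dictionary layout and the `class_fractions` helper are illustrative names, not part of CryoSPARC.

```python
# Particle counts per class, copied from the table above.
counts = {
    "3DVA cluster": {"conf A": 1283, "conf B": 1131, "poor quality": 2306},
    "3D Classification": {"conf A": 1946, "conf B": 1702, "poor quality": 1072},
    "Ab-Initio Reconstruct and Hetero Refine": {
        "conf A": 1916,
        "conf B": 2377,
        "poor quality": 423,
    },
}


def class_fractions(class_counts):
    """Convert per-class particle counts into fractions of the strategy total."""
    total = sum(class_counts.values())
    return {name: n / total for name, n in class_counts.items()}


fractions = {strategy: class_fractions(c) for strategy, c in counts.items()}

for strategy, frac in fractions.items():
    total = sum(counts[strategy].values())
    summary = ", ".join(f"{name}: {100 * f:.0f}%" for name, f in frac.items())
    print(f"{strategy} ({total} particles): {summary}")

# Range (max minus min) of each class fraction across the three strategies,
# as a rough indication of how strategy-dependent the populations are.
class_names = ["conf A", "conf B", "poor quality"]
spread = {
    cls: max(f[cls] for f in fractions.values())
    - min(f[cls] for f in fractions.values())
    for cls in class_names
}
for cls, value in spread.items():
    print(f"{cls}: fraction varies by {100 * value:.0f} percentage points")
```

Note that the per-strategy totals are computed from the table rows themselves, so each row is normalised against its own particle count.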

### Interpretations and Conclusions

EMPIAR 10927 contains a high degree of discrete heterogeneity and at least 8 distinct targets/conformations. Among these were complex IV in dimer and tetramer conformations. We noted that the monomers of complex IV in these densities were arranged antiparallel, which is unexpected from a functional perspective, as it would place active sites on both sides of the mitochondrial inner membrane. It seems likely that these dimer and tetramer conformations arise during the necessary detergent solubilisation and sample concentration steps of purification. Co-purification of other proteins is common in native, and even tag-assisted, purifications, and understanding the origin and purification process of a sample can accelerate interpretation of the cryo-EM data.

**A rapid strategy of refining against an existing map and selecting the high-scale peak worked for a hypothesis-driven classification of particles**

{% hint style="info" %}
In NU Refinement (1), despite no manual particle curation steps, we already achieved a similar resolution and map quality, and better cFAR than the deposited map, with NU Refinement (2) showing slightly better reported resolution and map features. These maps would be adequate for identifying the inhibitor binding location and took less than 3 hours of processing time to achieve for this small dataset. It is possible that the improved cFAR reflects the reduced influence of human choice during our processing, which we find can be a source of unintentional bias, especially at the 2D classification stage. An automated pipeline using the strategy of refining all picks with an existing high quality map, followed by splitting the particles on the basis of per-particle scale could easily be built for other targets with pre-set thresholds for exposure curation. Importantly, this simple approach did not uncover the rich heterogeneity present in the dataset.
{% endhint %}
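The scale-based split described above can be sketched as a simple threshold on the per-particle scale values. This is an illustrative sketch only: the scale values are synthetic stand-ins for those reported by a refinement, the `split_by_scale` helper is a hypothetical name, and the threshold of 0.7 is an assumed value that would in practice be chosen by inspecting the particle scale histogram.

```python
import numpy as np


def split_by_scale(scales, threshold):
    """Return indices of particles at or above, and below, a scale threshold."""
    scales = np.asarray(scales, dtype=float)
    high = np.flatnonzero(scales >= threshold)
    low = np.flatnonzero(scales < threshold)
    return high, low


# Synthetic bimodal scale distribution (assumption, not real data):
# a low-scale peak of non-matching particles and a high-scale peak of
# particles that match the refinement volume.
rng = np.random.default_rng(0)
scales = np.concatenate(
    [
        rng.normal(0.4, 0.08, size=6000),  # low-scale peak
        rng.normal(1.0, 0.08, size=4000),  # high-scale peak
    ]
)

threshold = 0.7  # hypothetical value; choose from the histogram in practice
high_idx, low_idx = split_by_scale(scales, threshold)

print(f"high-scale particles: {high_idx.size}")
print(f"low-scale particles:  {low_idx.size}")
```

A fixed threshold is the simplest choice for an automated pipeline, but it is only appropriate when the two peaks are as well separated as in this synthetic example.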

**A longer strategy of many-class Ab-Initio and Hetero Refine uncovered unexpected target heterogeneity**

{% hint style="info" %}
In Section 7 we found that the particle stack used in NU Refinement (3) (after Ab-Initio Reconstruct and Hetero Refine) was slightly cleaner, and the side chain features in NU Refinement (9) were slightly improved despite apparently similar reported resolution and cFAR. Not only was the complex I map better with this strategy, but it also uncovered 5 other discrete species that were present in the sample.
{% endhint %}

**Evaluation of different classification sorting strategies**

{% hint style="info" %}
The particle sorting route that you choose for your dataset will be highly dependent on your aims and the nature of the sample. Here, we provided ideas about ways to compare different strategies. Ultimately, a good aim might be to obtain the best per-class map quality together with an idea of how reproducible the class assignment is.
{% endhint %}
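One way to put a number on the reproducibility of class assignment is to cross-tabulate the class that each particle received from two strategies. The sketch below assumes that you have already matched class labels between strategies (for example by comparing the maps) and have a mapping from a particle identifier to a class label for each strategy; the identifiers, labels and `assignment_agreement` helper are all hypothetical toy values for illustration.

```python
from collections import Counter


def assignment_agreement(assign_a, assign_b):
    """Cross-tabulate two class assignments over their shared particles.

    Returns a Counter keyed by (class in A, class in B) and the fraction of
    shared particles that received the same class from both strategies.
    """
    shared = assign_a.keys() & assign_b.keys()
    table = Counter((assign_a[uid], assign_b[uid]) for uid in shared)
    agree = sum(n for (a, b), n in table.items() if a == b)
    fraction = agree / len(shared) if shared else float("nan")
    return table, fraction


# Toy assignments (hypothetical particle identifiers and class labels).
strategy_1 = {101: "A", 102: "A", 103: "B", 104: "poor", 105: "poor", 106: "B"}
strategy_2 = {101: "A", 102: "B", 103: "B", 104: "A", 105: "poor", 106: "B", 107: "A"}

table, fraction = assignment_agreement(strategy_1, strategy_2)

for (cls_1, cls_2), n in sorted(table.items()):
    print(f"strategy 1: {cls_1:>4}  strategy 2: {cls_2:>4}  particles: {n}")
print(f"fraction with matching class: {fraction:.2f}")
```

Particles present in only one of the two assignments are ignored, so the agreement is measured only over particles that both strategies classified.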

### **References**

[Grba, D. et al. Cryo-electron microscopy reveals how acetogenins inhibit mitochondrial respiratory complex I. *J. Biol. Chem.* 298, 101602 (2022).](https://www.sciencedirect.com/science/article/pii/S0021925822000424)

[Bepler, T. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. *Nat. Methods* 16, 1153–1160 (2019).](https://pubmed.ncbi.nlm.nih.gov/31591578/)

[Kim, K. et al. High-resolution ab initio reconstruction enables cryo-EM structure determination of small particles. Preprint at *bioRxiv* (2025).](https://www.biorxiv.org/content/10.1101/2025.09.08.674935v1)