
# Case Study: Picking-induced Orientation Bias in HA Trimer (EMPIAR-10096 and -10097)

## Overview

EMPIAR-10096 (Hemagglutinin trimer) is a canonical example of a dataset exhibiting severe orientation bias. Many strategies for picking particles on this dataset produce particle stacks biased toward top-down views, which in turn yield anisotropic maps.

In this case study, we use Orientation Diagnostics to measure the level of bias, and then demonstrate a variety of techniques, including

* Non-standard blob picking
* Skipping 2D Classification
* Rebalance Orientations

to improve the orientation distribution of the set of particles used in 3D reconstruction, thereby enabling recovery of a high quality, isotropic map from the EMPIAR-10096 dataset.

## Background

![Figure 1. Hemagglutinin is a protein expressed on the surface of influenza virus. It is a homotrimer comprising three copies of a single chain (colored blue, green, and yellow in the cartoon), each of which is cleaved into HA1 (dark) and HA2 (light). Overall, the HA trimer measures 75 Å wide by 130 Å tall and is divided into the head and stem regions. Atomic model from PDB 7VDF (Fan et al.).](/files/mQsXD0I7kw4TbafhYUeS)

Hemagglutinin (HA, Figure 1) is one of two envelope glycoproteins anchored in the influenza viral membrane. HA in the viral membrane is composed of three copies of the HA monomer. Each copy is expressed as a single polypeptide, HA0, which is then cleaved into HA1 and HA2. The two chains remain linked by a disulfide bond. Functional HA is required for viral pathogenicity because HA mediates cell entry. Understanding the structure of HA and the conformational changes it undergoes during cell entry is therefore critical to understanding the flu virus.

In this case study, we analyze movies originally collected by Tan and colleagues (EMPIAR-10096 and -10097). The first of these two datasets was collected with an ordinary, untilted stage. The authors found that this dataset exhibits severe orientation bias, resulting in an unusable map. The authors therefore collected the second dataset (10097) with a tilted stage. The stage tilt rotates particles that would have been imaged as top views so that they become oblique views, which expands the imaged orientation distribution. This in turn allows for recovery of a high quality map.
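
The geometry of the stage tilt can be sketched in a few lines of NumPy. This is a simplified model for illustration, not the authors' analysis. It assumes the beam travels along z, the stage tilts about the x axis, and a top view is one in which HA's long threefold axis points along the beam. The helper names are our own.

```python
import numpy as np


def rotation_about_x(degrees: float) -> np.ndarray:
    """Rotation matrix for a rotation of `degrees` about the x axis (assumed tilt axis)."""
    t = np.deg2rad(degrees)
    return np.array([
        [1.0, 0.0, 0.0],
        [0.0, np.cos(t), -np.sin(t)],
        [0.0, np.sin(t), np.cos(t)],
    ])


def angle_from_beam(axis: np.ndarray) -> float:
    """Angle in degrees between a particle axis and the beam (assumed to travel along z)."""
    unit = axis / np.linalg.norm(axis)
    return float(np.degrees(np.arccos(abs(unit[2]))))


# In a top view, HA's long threefold axis points along the beam.
threefold_axis = np.array([0.0, 0.0, 1.0])

untilted = angle_from_beam(threefold_axis)
tilted = angle_from_beam(rotation_about_x(40.0) @ threefold_axis)
print(f"untilted stage: {untilted:.1f} deg from the beam; 40 deg stage tilt: {tilted:.1f} deg")
```

Under these assumptions, a particle lying in a top-view orientation on the grid is imaged 40° away from its symmetry axis once the stage is tilted by 40°. It therefore contributes an oblique view instead of another copy of the top view.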

We first briefly process the tilted dataset to establish a baseline 3D map of HA. We then spend the majority of this case study processing the untilted dataset in an attempt to recover a usable map from it.

## Before you begin

### Background Knowledge

This case study assumes familiarity with CryoSPARC’s UI and job workflow. If you haven’t processed data in CryoSPARC before, you may be more comfortable starting with the [T20S Proteasome tutorial](https://guide.cryosparc.com/processing-data/get-started-with-cryosparc-introductory-tutorial) or the [TRPV1 case study](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/case-study-dktx-bound-trpv1-empiar-10059). This case study also assumes familiarity with foundational cryo-EM concepts like particle pose and alignment. We provide more detail on these foundational concepts in the first video of the [2024 Image Processing Workshop](https://guide.cryosparc.com/processing-data/tutorial-videos) recordings.

Finally, you should be generally familiar with the concept of splitting particles into half-sets to reconstruct half-maps, and with the Fourier Shell Correlation (FSC) as a validation metric based on these half-maps. You should also be conceptually comfortable with the 3D Fourier transform, and with the idea that a voxel in Fourier space represents a signal with a specific frequency (“resolution”) and direction. If FSC is unfamiliar to you, [it is briefly discussed](https://youtu.be/859HWl-S1NM?feature=shared\&t=4892) in a recording of the Cryo-EM Fundamentals workshop, and more information is available in various resources online.
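
For a concrete picture of the FSC, the sketch below computes it with NumPy for two cubic half-maps. It is an illustrative implementation under simple assumptions: cubic boxes, shells one Fourier voxel wide, and no masking or noise substitution. It is not CryoSPARC's own GSFSC code, and the function names are our own. The 0.143 cutoff is the conventional gold-standard threshold. Each shell averages over all directions, so a map that is well resolved along some directions and poorly resolved along others can still produce a reasonable-looking FSC curve.

```python
import numpy as np


def shell_index(n: int) -> np.ndarray:
    """Integer Fourier-shell radius |k| for every voxel of an n x n x n FFT grid."""
    k = np.fft.fftfreq(n) * n  # integer frequencies in FFT order
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    return np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)


def fsc(half_a: np.ndarray, half_b: np.ndarray) -> np.ndarray:
    """FSC between two cubic half-maps, one value per shell from 0 to Nyquist (n // 2)."""
    n = half_a.shape[0]
    fa, fb = np.fft.fftn(half_a), np.fft.fftn(half_b)
    shells = shell_index(n).ravel()
    n_shells = n // 2 + 1

    def per_shell(values: np.ndarray) -> np.ndarray:
        # Sum the values in each shell; corner voxels beyond Nyquist are dropped.
        return np.bincount(shells, weights=values.ravel(), minlength=n_shells)[:n_shells]

    cross = per_shell(np.real(fa * np.conj(fb)))
    power_a = per_shell(np.abs(fa) ** 2)
    power_b = per_shell(np.abs(fb) ** 2)
    return cross / np.sqrt(power_a * power_b)


def shell_resolution(shell: int, n: int, apix: float) -> float:
    """Resolution in Å of Fourier shell `shell` in an n-voxel box with pixel size `apix`."""
    return n * apix / shell


def resolution_at(curve: np.ndarray, n: int, apix: float, threshold: float = 0.143) -> float:
    """Resolution of the last shell before the FSC first drops below `threshold` (no interpolation)."""
    below = np.nonzero(curve[1:] < threshold)[0]
    last_good = below[0] if below.size else len(curve) - 1
    return float("inf") if last_good == 0 else shell_resolution(int(last_good), n, apix)


# Toy example: two half-maps sharing the same signal, each with independent noise.
rng = np.random.default_rng(0)
truth = rng.standard_normal((32, 32, 32))
half_a = truth + 0.3 * rng.standard_normal(truth.shape)
half_b = truth + 0.3 * rng.standard_normal(truth.shape)
curve = fsc(half_a, half_b)
print(resolution_at(curve, n=32, apix=1.31))
```

Because the toy "truth" contains signal at every frequency, its FSC stays high out to Nyquist (2 × 1.31 Å). Real half-maps fall off as their signal-to-noise ratio drops at high frequency.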

### Viewing and Preparing 3D Volumes

This study assumes you have the ability to view 3D volumes. CryoSPARC has a built-in [Volume Viewer](https://guide.cryosparc.com/application-guide-v4.0+/inspecting-data#volumes), but we recommend downloading and installing [UCSF ChimeraX](https://www.cgl.ucsf.edu/chimerax/), as we refer to this program throughout the case study. ChimeraX is a powerful 3D visualization tool that can display and modify atomic models and cryo-EM maps (from CryoSPARC and elsewhere), prepare publication-quality images, and much more. In this tutorial, we use Chimera**X** and not Chimera (without the X), which is an older program that is no longer under active development and is [no longer recommended by the developers](https://www.cgl.ucsf.edu/chimera/).

This study also assumes passing familiarity with viewing 3D volumes in your rendering software of choice. Throughout, the terms “contour up” and “contour down” refer to viewing the volume at a higher or lower isosurface threshold, respectively. The process of making masks is also not covered in detail here. A walkthrough for mask creation using ChimeraX is available [elsewhere in the guide](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/mask-selection-and-generation-in-ucsf-chimera).

### Downloading the Data

First, download the datasets (EMPIAR [10096](https://www.ebi.ac.uk/empiar/EMPIAR-10096/) and [10097](https://www.ebi.ac.uk/empiar/EMPIAR-10097/), Tan et al. 2017). For example:

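```shell
# Create one directory per EMPIAR entry
cd /path/to/rawdata/EMPIAR
mkdir -p 10096 10097

# -m: mirror the remote directory recursively
# -nH: do not create a host-name directory
# --cut-dirs 5: drop the five leading remote path components
cd 10096
wget -m -nH --cut-dirs 5 \
    ftp://ftp.ebi.ac.uk/empiar/world_availability/10096/data/Raw-Frames
cd ../10097
wget -m -nH --cut-dirs 5 \
    ftp://ftp.ebi.ac.uk/empiar/world_availability/10097/data/Raw-Frames
```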

These commands download only the Raw-Frames directories for each dataset, which contain the unaligned, raw movies as well as the gain references. If you do not want to process the tilted dataset, you only need to download EMPIAR-10096.

### Preprocessing

Both the tilted and untilted datasets downloaded from EMPIAR are preprocessed in the same way, following the standard workflow laid out in the [TRPV1 case study](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/case-study-dktx-bound-trpv1-empiar-10059). In short, each dataset is run (separately) through the following jobs:

1. Patch Motion Correction (default parameters, except `Save results in 16-bit floating point` turned on)
2. Patch CTF Estimation (default parameters)

These steps result in a set of preprocessed micrographs for each dataset, which are then used in the subsequent steps.

## Baseline Map: Tilted Dataset

{% hint style="info" %}
This part of the case study is optional, and is presented here to establish a baseline 3D map of HA with minimal orientation bias. As such, this section does not go into detail. Each of the steps taken here is discussed in greater detail below, where we process the untilted dataset.
{% endhint %}

As part of their original research, Tan and colleagues collected datasets with an untilted stage (like a typical cryo-EM data collection) and with the stage tilted at 40 degrees. We begin by following the remainder of the standard workflow to produce a map of HA from the tilted dataset (EMPIAR-10097). Starting from the micrographs output by Patch CTF Estimation, we perform:

1. [Blob Picker](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-picking/job-blob-picker) (min/max particle diameter 70 → 100 Å)
2. [Inspect Picks](https://guide.cryosparc.com/application-guide-v4.0+/interactive-jobs#interactive-job-inspect-particle-picks)
3. [Extract From Micrographs](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/extraction/job-extract-from-micrographs) (256px box Fourier-cropped to 128px, save as 16-bit floating point)
4. [2D Classification](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-curation/job-2d-classification) (circular mask diameter inner/outer 150 → 180 Å)
5. [Select 2D Classes](https://guide.cryosparc.com/application-guide-v4.0+/interactive-jobs#interactive-job-select-2d-classes) (keeping all classes which have any resemblance to HA)
6. [Ab-Initio Reconstruction](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-reconstruction/job-ab-initio-reconstruction) (2 classes)
7. [Heterogeneous Refinement](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-refinement/job-heterogeneous-refinement) (using all particles and both volumes from Ab-Initio Reconstruction)
8. [Homogeneous Refinement](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-refinement/job-homogeneous-refinement) (of the better class, with C3 symmetry)
9. Extract From Micrographs (256px box with no Fourier cropping, save as 16-bit floating point)
10. Homogeneous Refinement (C3 symmetry)

![Figure 2.](/files/ZVIeACHjYZz8QiMz4rz7)

The resulting map has a GSFSC resolution of 2.88 Å (Figure 2). Backbone density is clearly traceable throughout the entire protein, with sidechains visible in many locations. Note that the large HA2 helix which travels from the stem to the top of the head (in the center of the protein) is well resolved, and that the strands in the β-sheets at the bottom of the stem domain are clearly separated from each other. Additionally, the helices on both the top and bottom of HA are well resolved. This map shows what HA looks like when the particle's viewing directions are well sampled.

{% hint style="info" %}
Maps (like this one) with uniform quality in all directions are called *isotropic*.
{% endhint %}

For the remainder of this study, we will process the untilted dataset (EMPIAR-10096). Initially, we will follow steps similar to the above, which produce a poor quality anisotropic map. We will then change our picking and processing strategies to recover an isotropic map from the same untilted micrographs.

## Particle Picking

From this point on, we will be processing the untilted dataset (EMPIAR-10096).

### Blob Picking

We begin by blob picking the untilted micrographs using [Blob Picker](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-picking/job-blob-picker).

{% hint style="info" %}
Throughout this case study, job inputs and parameters will be presented in tables, like below. Inputs are prefixed with `Input:`. Any other row is a parameter. Any parameters not listed are set to their default value. The “Notes and Rationale” column describes *why* a parameter value or input is used, and presents alternate values which may also work.
{% endhint %}

| Input or Parameter Name       | Input Source or Parameter Value      | Notes and Rationale                                                                                                                                                                                                                           |
| ----------------------------- | ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Micrographs            | Micrographs from Patch CTF Estimation |                                                                                                                                                                                                                                               |
| Minimum particle diameter (A) | 70                                   |                                                                                                                                                                                                                                               |
| Maximum particle diameter (A) | 100                                  | Although HA is 130 Å tall at its longest extent, only relatively rare side views will appear this long in the image. We use a slightly shorter maximum diameter here so that the three templates generated by Blob Picker are closer in size. |

This job produces 980k particle picks. We can filter these particle picks to remove empty ice and junk picks using [Inspect Particle Picks](https://guide.cryosparc.com/application-guide-v4.0+/interactive-jobs#interactive-job-inspect-particle-picks).

| Input or Parameter Name | Input Source or Parameter Value | Notes and Rationale |
| ----------------------- | ------------------------------- | ------------------- |
| Input: Particles        | Particles from Blob Picker      |                     |
| Input: Micrographs      | Micrographs from Blob Picker    |                     |

One typical workflow for evaluating particle picks is to

1. Adjust the NCC score threshold until most picks appear to be on particles
2. Adjust the high power score threshold until picks on high-contrast junk (e.g., ice crystals or carbon) disappear
3. Adjust the low power score threshold until picks on empty ice disappear

However, while working with this dataset, you may notice that some particles (which generally look like top views) have very striking contrast with their surroundings, while other views (typically longer, perhaps side views) are very hard to see. Indeed, in many micrographs it is difficult to tell whether there are any side views at all. This is apparent when looking at the individual scores of a random subset of particles (Figure 3).

![Figure 3. A random subset of this micrograph’s particles are annotated with an extraction box and the particle’s NCC and power scores. Additionally, the particles are annotated with our best guess as to the HA orientation and location in the box, if any. The micrograph has been lowpass filtered to 10 Å.](/files/WiSuGQJAKbA56LAOiiZ1)

Setting particle thresholds is a delicate and subjective process. Ultimately, we are attempting to view individual particle images and assess whether they represent the target particle or not. The HA top views are obviously visible in this micrograph. They also have high power and NCC scores, making thresholds easy to set for these views.

The same cannot be said for side views. In the example micrograph above, we annotate suspected side views, but it is difficult to tell whether these are random fluctuations in background noise or true particles. The problem is made worse by the wide variance in the scores for different suspected side views. For example, the particle pick labeled A has a high power score and a low NCC score, while the pick labeled B has a relatively high NCC score and a lower power score. This level of variance exists within a single micrograph, on top of the variance in these scores across micrographs with differing defocus and ice thickness.

For this particle picking job, we selected an NCC score threshold of 500, a low power threshold of 786, and a high power threshold of 1933. These are by no means the only correct set of thresholds. Depending on the Blob Picker job’s parameter values the scores may differ, and you may prefer to remove fewer particles at this stage and filter them out later, or vice versa. Regardless, we will move forward with the 360k particles remaining after setting the thresholds to these values.
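
The threshold logic from the workflow above can be sketched as a simple filter. This is an illustration under stated assumptions, not the Inspect Particle Picks implementation. We assume a pick is kept when its NCC score is at or above the NCC threshold and its power score lies between the low and high power thresholds. The `Pick` record and every score value below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pick:
    """Hypothetical record for one particle pick and its picking scores."""
    x: float
    y: float
    ncc: float
    power: float


def passes_thresholds(pick: Pick, ncc_min: float, power_low: float, power_high: float) -> bool:
    """Keep picks with a high enough NCC score whose power score lies in [power_low, power_high]."""
    return pick.ncc >= ncc_min and power_low <= pick.power <= power_high


# Thresholds used in this case study; the pick scores are invented for illustration.
NCC_MIN, POWER_LOW, POWER_HIGH = 500, 786, 1933
picks = [
    Pick(120, 340, ncc=620, power=1500),  # bright, well-matched pick: kept
    Pick(410, 95, ncc=380, power=1200),   # low NCC score: rejected
    Pick(800, 610, ncc=700, power=2600),  # very high power, e.g. ice crystal: rejected
    Pick(655, 220, ncc=540, power=650),   # low power, e.g. empty ice: rejected
]
kept = [p for p in picks if passes_thresholds(p, NCC_MIN, POWER_LOW, POWER_HIGH)]
print(f"kept {len(kept)} of {len(picks)} picks")
```

The same cut-offs are applied to every pick, so a faint side view with a low NCC score (like pick A) or a low power score (like pick B) can be discarded along with the junk. This is one way picking alone can skew the orientation distribution.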

### Micrograph Junk Detector

{% hint style="info" %}
The [Micrograph Junk Detector](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/exposure-curation/job-micrograph-junk-detector-beta) was added to CryoSPARC in v4.7. If you are using an older version, you can safely skip this step.
{% endhint %}

Before extracting the particles, we run them through the [Micrograph Junk Detector](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/exposure-curation/job-micrograph-junk-detector-beta). This job automatically detects contaminants and rejects particles picked on or near those contaminants.

| Input or Parameter Name | Input Source or Parameter Value         | Notes and Rationale |
| ----------------------- | --------------------------------------- | ------------------- |
| Input: Particles        | Particles from Inspect Particle Picks   |                     |
| Input: Micrographs      | Micrographs from Inspect Particle Picks |                     |

In this case the micrographs are quite clean. The Junk Detector took three minutes to run and rejected 7k of 360k particles.

### Particle Extraction

Finally, we extract the selected particles using [Extract From Micrographs](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/extraction/job-extract-from-micrographs).

| Input or Parameter Name               | Input Source or Parameter Value | Notes and Rationale                                                                                                                                                                                                                                                                                                                                                                                                      |
| ------------------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Input: Particles                      | Particles from Junk Detector    | If you skipped Junk Detector, use particles and micrographs from Inspect Particle Picks here instead.                                                                                                                                                                                                                                                                                                                    |
| Input: Micrographs                    | Micrographs from Junk Detector  |                                                                                                                                                                                                                                                                                                                                                                                                                          |
| Extraction box size (pix)             | 256                             | HA’s largest dimension is 130 Å, and the micrograph pixel size is 1.31 Å/px. Applying the common rule-of-thumb of extracting with a box twice the size of the particle, we arrive at 130 \* 2 / 1.31 = 198px. However, these movies were collected with a relatively high defocus (up to 4 µm in some cases), so we extract with a slightly larger box. Extracting with a box size of 204 would produce similar results. |
| Fourier-crop to box size (pix)        | 128                             |                                                                                                                                                                                                                                                                                                                                                                                                                          |
| Save results in 16-bit floating point | True                            | Saving particle images in 16-bit floating point reduces the file size by half [without reducing reconstruction quality](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/tutorial-float16-support). It is therefore recommended to turn this on.                                                                                                                                                   |
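
The box-size rule of thumb from the table can be written as a small helper. Rounding to an even number of pixels is our own choice for this sketch (the raw value is about 198.5 px, which the text rounds to 198), and the helper name is hypothetical.

```python
def box_size_px(largest_dim_A: float, apix: float, factor: float = 2.0) -> int:
    """Rule-of-thumb extraction box: `factor` x the largest dimension, rounded to an even pixel count."""
    return 2 * round(largest_dim_A * factor / apix / 2)


APIX = 1.31        # micrograph pixel size, Å/px
HA_LENGTH = 130.0  # HA's largest dimension, Å

rule_of_thumb = box_size_px(HA_LENGTH, APIX)  # 130 * 2 / 1.31 is about 198.5 px
box_used = 256
box_width_A = box_used * APIX
print(f"rule of thumb: {rule_of_thumb} px; box used: {box_used} px = {box_width_A:.0f} Å")
```

The 256 px box actually used spans about 335 Å, comfortably more than twice HA's 130 Å length. This is consistent with the high-defocus rationale given in the table.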

<details>

<summary>Why do we Fourier-crop the particles at this stage?</summary>

In the initial stages of particle curation, using a full-size box slows downstream jobs for little benefit, since the curation jobs do not need high-resolution information. In this case, we downsample to a pixel size of 2.62 Å (1.31 Å × 256 / 128). The highest-resolution map these images can produce is therefore about 5.2 Å, twice the pixel size. This is more than sufficient for removing junk particles. The particles will be re-extracted later so that they can be refined to higher resolution.

Note: If you set `Second (small) F-crop box size (pix)`, Extract From Micrographs will extract and save two images for each particle at the same time — one at full size, and one downsampled to a desired smaller size. We elect not to do so at this early stage, because some particle picks appear significantly off-center. Later, when we align the particles during particle curation, the particle positions will be corrected. Then, a separate second extraction job after particle curation will likely improve the particle images.

</details>
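
The pixel-size arithmetic behind Fourier cropping can be written out explicitly. Fourier cropping keeps only the central, low-frequency part of each image's Fourier transform, so the pixel size grows by the ratio of the two box sizes. The Nyquist limit is twice the pixel size. The helper names below are our own, for illustration.

```python
def cropped_pixel_size(apix: float, box: int, cropped_box: int) -> float:
    """Pixel size (Å/px) after Fourier-cropping a `box`-pixel image to `cropped_box` pixels."""
    return apix * box / cropped_box


def nyquist_A(apix: float) -> float:
    """Finest resolution (Å) representable at pixel size `apix`: twice the pixel size."""
    return 2.0 * apix


APIX = 1.31  # micrograph pixel size, Å/px
small_apix = cropped_pixel_size(APIX, box=256, cropped_box=128)

print(f"Fourier-cropped: {small_apix:.2f} Å/px, Nyquist {nyquist_A(small_apix):.2f} Å")
print(f"full size:       {APIX:.2f} Å/px, Nyquist {nyquist_A(APIX):.2f} Å")
```

The full-size value is the limit that applies once the particles are re-extracted without Fourier cropping.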

## Initial Particle Curation

### 2D Classification

We begin particle curation with [2D Classification](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-curation/job-2d-classification).

| Input or Parameter Name | Input Source or Parameter Value  | Notes and Rationale |
| ----------------------- | -------------------------------- | ------------------- |
| Input: Particles        | Particles from Extract From Micrographs |                     |

![Figure 4.](/files/v2hdWyI1awmGyI21QIMX)

Most of the 2D class averages appear to be one or more top-down views (Figure 4). Because HA is so much narrower than it is tall, two or more top views fit in a box that contains only one side view. Since top views can appear in a wide variety of numbers and relative positions, the 2D classification algorithm ends up using the majority of the classes to describe clusters of top views, leaving few classes available for side or oblique views.

![Figure 5.](/files/d6AMF8ZPejI99m2MWZft)

We can alleviate this issue by using [a mask on the 2D class averages](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-curation/job-2d-classification#circular-mask-diameter-a) (Figure 5). In theory, masking the 2D classes improves classification in two ways:

1. Masked class averages are less likely to contain multiple top views.
2. Masked class averages that contain a single top view are better centered in the box.

These improvements allow more top-view particle images to be classified into just a few top view 2D classes, leaving a greater number of classes to capture the rarer side and oblique views.

To test this idea, we re-run 2D Classification with a soft, circular mask on the class averages.

| Input or Parameter Name          | Input Source or Parameter Value  | Notes and Rationale                                                                                                                       |
| -------------------------------- | -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Particles                 | Particles from Extract From Micrographs |                                                                                                                                           |
| Circular mask diameter (A)       | 150                              | This is still slightly larger than the 130 Å length of a side view.                                                                       |
| Circular mask diameter outer (A) | 180                              | The whole box is approximately 335 Å wide, so a 180 Å diameter mask cuts out a significant amount of the space surrounding each particle. |
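
To see how much of the box the 150/180 Å mask leaves visible, the sketch below builds a soft circular mask with NumPy for the Fourier-cropped 128 px particles (2.62 Å/px). This is an illustration under an assumption. We use a raised-cosine fall-off between the inner and outer diameters, which may not match the exact edge profile that 2D Classification uses internally. The function name is hypothetical.

```python
import numpy as np


def soft_circular_mask(box: int, apix: float, inner_diam_A: float, outer_diam_A: float) -> np.ndarray:
    """Soft 2D circular mask: 1 inside the inner diameter, 0 beyond the outer diameter,
    with a raised-cosine fall-off in between (the fall-off shape is an assumption)."""
    yy, xx = np.mgrid[0:box, 0:box]
    center = box // 2
    radius_A = np.hypot(yy - center, xx - center) * apix
    r_in, r_out = inner_diam_A / 2.0, outer_diam_A / 2.0
    # 0 at the inner edge, 1 at the outer edge, clipped outside that band
    t = np.clip((radius_A - r_in) / (r_out - r_in), 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))


# Fourier-cropped particles: 128 px at 2.62 Å/px, a box about 335 Å wide
mask = soft_circular_mask(box=128, apix=2.62, inner_diam_A=150.0, outer_diam_A=180.0)
coverage = (mask > 0).mean()  # fraction of the box not fully masked out
print(f"mask keeps {coverage:.0%} of the box area")
# A class average (or particle image) would then be multiplied element-wise by `mask`.
```

Under this model, less than a quarter of the box area survives the mask. Neighboring top views near the box edge therefore contribute little to a centered class average.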

![Figure 6.](/files/lyZGHjCLpxnmxo4Prmre)

As expected, a far greater proportion of the masked 2D classes have a centered top or side view (Figure 6). Note that these classes are not simply the original classes with a mask applied. Because the masked class averages are different at each iteration, the results are fundamentally different from the job without masking. There are still a number of classes with two adjacent top views, but they now make up fewer than half of the classes. These images may have two particles too close together to separate and properly align, so we will remove them with a Select 2D Classes job (Figure 7).

| Input or Parameter Name  | Input Source or Parameter Value                  | Notes and Rationale |
| ------------------------ | ------------------------------------------------ | ------------------- |
| Input: Particles         | Particles from the masked 2D Classification      |                     |
| Input: 2D Class Averages | Class averages from the masked 2D Classification |                     |

![Figure 7.](/files/SdgcRgjqv6sGTZZYTikl)

### 3D Curation

Next, we move on to 3D particle curation, starting with [Ab-Initio Reconstruction](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-reconstruction/job-ab-initio-reconstruction).

| Input or Parameter Name     | Input Source or Parameter Value | Notes and Rationale                                                                                                                                      |
| --------------------------- | ------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Particles            | Particles from Select 2D        |                                                                                                                                                          |
| Number of Ab-Initio classes | 2                               | Although the 2D class averages look relatively clean, it is still useful to request two classes to provide to Heterogeneous Refinement in the next step. |

![Figure 8.](/files/jtDedW3mFR9XF3D0GilC)

Of the two volumes produced by Ab-Initio Reconstruction, class 1 is approximately the expected size and shape of HA, while class 0 is significantly shorter and noisier (Figure 8). Class 1 has twice as many particles as class 0, so we could proceed with particles from class 1. However, it is best to use [Heterogeneous Refinement](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-refinement/job-heterogeneous-refinement) for particle curation once we have initial maps, since the classification results from Heterogeneous Refinement (which sees higher resolution information) are generally more reliable than those of Ab-Initio Reconstruction (which focuses on low and medium resolution signal to produce initial maps). We thus set up a Heterogeneous Refinement job with all of the particles and both of the volumes from Ab-Initio Reconstruction.

| Input or Parameter Name   | Input Source or Parameter Value                   | Notes and Rationale                                                                                                                                                                                                                                          |
| ------------------------- | ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Input: Particles          | All Particles from Ab-Initio Reconstruction       | If you are using an older version of CryoSPARC, there may not be an All Particles output. In this case, you can connect Particles class 0 and Particles class 1.                                                                                             |
| Input: Initial volumes    | Class 0 and Class 1 from Ab-Initio Reconstruction |                                                                                                                                                                                                                                                              |
| Symmetry                  | C3                                                | Imposing C3 symmetry effectively improves the SNR of the particles, which may improve classification. Using C1 symmetry may produce similar results, but if a significant number of particles had one or two degraded subunits, the C1 result may be better. |
| Force hard classification | On                                                | Since 2D Classification showed that the particle images are overall quite similar, we turn on hard classification here, but keeping it off would also be reasonable.                                                                                         |

![Figure 9. Results of the first Heterogeneous Refinement of the untilted (top row) and tilted (bottom row) datasets. Class number and particle counts correspond to the untilted dataset.](/files/XWM6jGNxke7fqNg1YRxQ)

The results of this Heterogeneous Refinement follow the same trend as the Ab-Initio Reconstruction, but more particles end up in Class 1 (Figure 9). However, the results with the untilted dataset appear significantly worse than the baseline tilted dataset we processed in the first section of this case study.

Compare the density in the indicated region of the HA head. In the untilted dataset job we just ran, the map is largely composed of vertical tube-like density. The map from the same stage of the tilted dataset processing, while still low-resolution, has horizontal density in this region. At this stage it is too soon to say whether this difference is pathological (i.e., whether it will prevent us from resolving a high-quality map), but it is an important effect to notice and keep track of.

{% hint style="info" %}
In a typical dataset, you would not have access to both a tilted and untilted map to compare. We make the comparisons above to highlight the specific type of directional streaking that is often problematic when orientation bias is present. For instance, both the tilted and untilted datasets have a long vertical tube-like density corresponding to the long α-helix in HA2. This density is approximately the right size and shape to be a real tube-like density (an α-helix) and so would not be cause for concern.
{% endhint %}

## High-Resolution Refinement

### Refinement of Small Particle Images

After 2D and 3D curation of the particle stack, we move on to 3D refinement. In this step we could use either [Homogeneous Refinement](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-refinement/job-homogeneous-refinement) or [Non-Uniform Refinement](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/3d-refinement/job-non-uniform-refinement-new). Non-Uniform Refinement generally performs better than Homogeneous Refinement when the particle has flexible domains or transmembrane regions. Since this particle has neither, we will begin with Homogeneous Refinement.

| Input or Parameter Name | Input Source or Parameter Value                 | Notes and Rationale                                                                                                                                 |
| ----------------------- | ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Particles        | Particles class 1 from Heterogeneous Refinement | The better (i.e., more HA-like) of the two classes should be selected here. If you are following along, it may be class 0 for you. |
| Input: Initial volume   | Volume class 1 from Heterogeneous Refinement    | The volume from the same class as the particles should be selected here.                                                                            |
| Symmetry                | C3                                              | Similar to Heterogeneous Refinement, C3 symmetry is used to improve the particles’ SNR. C1 symmetry would likely produce similar results.           |

![Figure 10. Views from the side, top, and bottom are shown. The GSFSC plot from the final iteration is reproduced in the bottom-right.](/files/VXezfOXdnjme3zOB6k2Y)

The vertical streaking observed in the Heterogeneous Refinement map is still present in the Homogeneous Refinement result (Figure 10). Additionally, the map is relatively noisy and falls apart into disconnected blobs when viewed at higher thresholds. However, this refinement reached Nyquist resolution for these downsampled particle images, so it is possible that the high-frequency information in the full-size particles may improve the map.

### Re-extraction and Re-refinement

We Fourier-cropped the particles during the initial extraction step, which limits the finest resolution the maps can achieve. Re-extracting the particles at full size after refinement recenters them and raises the limit on the map’s resolution to the micrograph’s Nyquist resolution (i.e., twice the micrograph pixel size).
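
The resolution limit described above comes down to simple arithmetic. The sketch below uses the 256 → 128 pixel Fourier crop from the initial extraction. It assumes a hypothetical micrograph pixel size of 1.0 Å, which is not the actual EMPIAR-10096 value, so substitute your own.

```python
def cropped_pixel_size(pixel_size_A, box_px, cropped_box_px):
    """Effective pixel size (Å) after Fourier cropping a box_px box down to cropped_box_px."""
    return pixel_size_A * box_px / cropped_box_px


def nyquist_resolution(pixel_size_A):
    """Finest resolution (Å) representable at a given pixel size: twice the pixel size."""
    return 2.0 * pixel_size_A


# Hypothetical micrograph pixel size in Å (an assumption for illustration only).
PIXEL_SIZE_A = 1.0

downsampled = cropped_pixel_size(PIXEL_SIZE_A, box_px=256, cropped_box_px=128)
print(f"256 px box cropped to 128 px: Nyquist = {nyquist_resolution(downsampled):.1f} Å")
print(f"256 px box at full size:      Nyquist = {nyquist_resolution(PIXEL_SIZE_A):.1f} Å")
```

Halving the box by Fourier cropping doubles the effective pixel size, and therefore doubles the Nyquist limit. This is why a refinement of the cropped particles can stall at Nyquist while the full-size images still carry finer information.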

We first re-extract the aligned particles with Extract From Micrographs.

| Input or Parameter Name               | Input Source or Parameter Value       | Notes and Rationale                                                         |
| ------------------------------------- | ------------------------------------- | --------------------------------------------------------------------------- |
| Input: Particles                      | Particles from Homogeneous Refinement |                                                                             |
| Input: Micrographs                    | Micrographs from Junk Detector        | If you skipped Junk Detector, use the micrographs from Patch CTF Estimation |
| Extraction box size (pix)             | 256                                   |                                                                             |
| Save results in 16-bit floating point | True                                  |                                                                             |

Then, we re-refine the particles. Since these particles have not been Fourier cropped, the refinement has a much finer Nyquist resolution limit.

| Input or Parameter Name | Input Source or Parameter Value              | Notes and Rationale                                                                                                                       |
| ----------------------- | -------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Particles        | Particles from full-size Extract From Micrographs | Use the full-size particles re-extracted in the previous step.                                                                            |
| Input: Initial volume   | Volume class 1 from Heterogeneous Refinement | The volume from the same class as the particles should be selected here.                                                                  |
| Symmetry                | C3                                           | Similar to Heterogeneous Refinement, C3 symmetry is used to improve the particles’ SNR. C1 symmetry would likely produce similar results. |

![Figure 11. An overview of the full size Homogeneous Refinement map. Compare with Figure 2.](/files/3ExjH9AoPeutAYiQtwwe)

Re-extracting the particles did not improve the streaking issues observed in the earlier refinements (Figure 11). This map is entirely composed of vertical tube-like density with significant noise. There is no recognizable protein secondary structure. This map has severe streaking along one direction — it is *anisotropic*.

This map has a GSFSC resolution of 2.97 Å. For an isotropic map, this resolution would be sufficient to observe secondary structure like the pitch of α-helices or the separation between β-strands. However, none of the visible features in our map correspond to secondary structure. The α-helices are severely distorted and the horizontal loop above the stem β-sheet is not visible at all (Figure 12).

![Figure 12. Two views of the untilted (blue) and tilted (yellow) maps are compared. The GSFSC resolutions of these maps are comparable (2.97 Å and 2.88 Å, respectively). Left: the α-helix in HA2. Right: the β-sheet in the stem.](/files/ne2uzzMXpuOjIMwiN6X3)

{% hint style="info" %}
It is critical not to rely on the GSFSC resolution alone! If your map is truly 3 Å, you should see features that you can easily recognize as α-helices and β-strands, and even some bulkier side chains. If you instead see density that lacks these features, the map is likely suffering from an issue such as anisotropy.
{% endhint %}

## Missing Views and Anisotropy

In cryo-EM, anisotropic maps are produced when certain viewing directions are much more common than others (or, equivalently, when certain viewing directions are missing). This problem is often called *orientation bias* or *preferred orientation*. The relationship between particle pose distribution and map isotropy can be understood through two different lenses: real space and Fourier space. These are equivalent ways of thinking about the problem — we present them both to offer a more complete picture.

### Real space

In cryo-EM, the microscope produces a 2D projection image of a 3D object (our target structure). In real space, projection means adding up all the density values in the object along the direction of projection. This produces a projection with one fewer dimension than the object, i.e., a 3D volume produces a 2D projection (an image); a 2D image produces a 1D projection (a line of pixels, as in Figure 13). To explain the fundamentals of anisotropy, we will illustrate the 2D-to-1D case, but the concept applies in the same way to the 3D-to-2D case.

![Figure 13. An image and its projection.](/files/d1YUXgXPkGZ2w2vcsJX1)

The 1D projection in Figure 13 constrains what the original 2D object must look like. We know that if we add up all of the density of the original object along the projection direction, we should get the projection as a result. With only a single projection, our best guess of the 2D object is a simple smear — we know nothing about the object except how its overall intensity varies along a single direction (Figure 14).

![Figure 14. An image, its projection, and the backprojection.](/files/aEjQ5mlxB5Ke2fc1XwU3)
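
The projection and smearing steps in Figures 13 and 14 can be written out on a tiny pixel grid. This sketch only handles projection directions along the grid axes (0° and 90°). That restriction avoids the interpolation a real reconstruction needs for arbitrary angles. The 4 × 4 image is a made-up example.

```python
def project(image):
    """Project a 2D image (list of rows) along the vertical axis by summing each column."""
    return [sum(column) for column in zip(*image)]


def backproject(projection, n_rows):
    """Smear a 1D projection evenly back along the vertical axis."""
    return [[value / n_rows for value in projection] for _ in range(n_rows)]


def transpose(image):
    """Swap rows and columns, turning a horizontal projection into a vertical one."""
    return [list(row) for row in zip(*image)]


# Toy 2D object: one bright pixel and one fainter pixel.
image = [
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]

# One view: every row of the smear is identical, so the vertical position is lost.
vertical = backproject(project(image), len(image))
print(vertical[0])  # [0.0, 1.0, 0.0, 0.5]

# Two orthogonal views: average the vertical smear with a horizontal one.
horizontal = transpose(backproject(project(transpose(image)), len(image)))
combined = [[(a + b) / 2 for a, b in zip(rv, rh)] for rv, rh in zip(vertical, horizontal)]
brightest = max(((r, c) for r in range(4) for c in range(4)),
                key=lambda rc: combined[rc[0]][rc[1]])
print(brightest)  # (1, 1)
```

Projecting the single-view smear again reproduces the observed projection, so the smear is fully consistent with the data while saying nothing about vertical position. With two views, the combined map peaks at the true bright pixel (1, 1). Pixels such as (1, 3) still score 0.75, which is higher than the real faint pixel at (2, 3), which scores 0.5. Two views only partly constrain the object, and more views are needed to suppress such ghosts.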

This process of using an object’s projections to reconstruct the original object is called *backprojection*. The more projections we have observed, the more accurate the backprojection becomes (Figure 15).

![Figure 15. Backprojections from an increasing number of evenly-spaced projections.](/files/xRE0oSg1YwYiKDvu5Wtf)

This is because each projection constrains the object along another viewing direction — with two orthogonal views, we know the object is more-or-less centered in the box. With 8 views the overall shape starts to become clear. Once we have 32 views, the object is clear.

The example in Figure 15 uses projections with projection directions that are evenly spaced around the circle. Eventually, with enough projections, the image is constrained in all viewing directions. The only 2D image which satisfies the constraints of all of the 1D projections is the original object.

If, on the other hand, we are missing certain views, those directions are never constrained, so our estimate of the 2D object remains streaked along the missing (unconstrained) directions (Figure 16).

![Figure 16. The same number of projections are used as in Figure 15, but side views are missing.](/files/xfeQWP0NxWpOjvpSs8N6)

This is one way to understand why missing views cause streaking — certain directions of the original object remain unconstrained, so we cannot precisely know where density ought to go along those unconstrained directions.

### Fourier space

This section provides the same argument and conclusion as the previous Real Space section, but now thinking about the problem in Fourier space. A projection in real space is equivalent to taking a central slice in Fourier space (Figure 17). This fact is called the Fourier slice theorem or projection slice theorem.

![Figure 17. Projecting an object in real space is equivalent to taking a slice through the center of the object’s Fourier transform.](/files/VsBStKvMjm83oWsDkKCU)
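
The Fourier slice theorem can be checked numerically with a plain discrete Fourier transform (DFT). In this sketch, a small, arbitrary 3 × 4 image is projected along the vertical axis and the 1D DFT of that projection is taken. The result matches the zero-frequency row (k_y = 0) of the image's 2D DFT, which is the central line perpendicular to the projection direction. The DFTs are written out directly from the definition for clarity rather than speed.

```python
import cmath


def dft1(signal):
    """1D discrete Fourier transform, computed directly from the definition."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]


def dft2(image):
    """2D DFT of a list of rows; the result is indexed as [ky][kx]."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (ky * y / rows + kx * x / cols))
                 for y in range(rows) for x in range(cols))
             for kx in range(cols)]
            for ky in range(rows)]


image = [
    [1.0, 0.0, 2.0, 0.5],
    [0.0, 3.0, 1.0, 0.0],
    [0.5, 1.0, 0.0, 4.0],
]

projection = [sum(column) for column in zip(*image)]  # real-space projection along y
slice_from_projection = dft1(projection)
central_line = dft2(image)[0]                          # the ky = 0 line of the 2D DFT

# Fourier slice theorem: the two agree up to floating-point error.
assert all(abs(a - b) < 1e-9 for a, b in zip(slice_from_projection, central_line))
```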

This equivalency means that we can consider each observed projection as “filling in” information in Fourier space for the 2D object we are trying to recover, depending on the direction of the projection. For example, the Fourier transforms of the same real space images from Figure 15 are reproduced in Figure 18.

![Figure 18. The Fourier transforms of the same images as Figure 15. Fourier space plots are displayed as the log of the absolute value.](/files/UEHYxFCy4T8GyyV5foLs)

When certain projection directions are missing, the corresponding regions of the object’s Fourier transform remain empty (Figure 19). This missing information means that the original object cannot be accurately recovered, and is the source of the streaks caused by missing views.

![Figure 19. The Fourier transforms of the same images as Figure 16. Fourier space plots are displayed as the log of the absolute value.](/files/YTp0FMOwxLZVo7LYSxHX)
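
In 2D, each projection fills one central line of Fourier space, oriented perpendicular to the projection direction. The sketch below assumes ideal, noise-free projections at known angles. It converts a set of view angles into the Fourier lines they fill and reports the largest angular gap left unsampled. The two view sets loosely mirror Figures 18 and 19.

```python
def filled_line_orientations(view_angles_deg):
    """Orientations (mod 180°) of the central Fourier lines filled by each view.
    A projection along angle θ fills the line perpendicular to it, at θ + 90°."""
    return sorted({(angle + 90.0) % 180.0 for angle in view_angles_deg})


def largest_gap_deg(view_angles_deg):
    """Largest angular gap (degrees) between filled Fourier lines, including wrap-around."""
    lines = filled_line_orientations(view_angles_deg)
    gaps = [b - a for a, b in zip(lines, lines[1:])]
    gaps.append(lines[0] + 180.0 - lines[-1])
    return max(gaps)


evenly_spaced = [i * 180.0 / 8 for i in range(8)]              # 8 views around the half circle
side_views_missing = [-30.0 + i * 60.0 / 7 for i in range(8)]  # 8 views within ±30° of one direction

print(largest_gap_deg(evenly_spaced))       # 22.5
print(largest_gap_deg(side_views_missing))  # 120.0
```

With the same number of views, clustering them within ±30° of one direction leaves a 120° wedge of Fourier space with no information at all. That wedge is the 2D analogue of the empty regions in Figure 19.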

### 3D Anisotropy

The examples presented above are 2D images projected to 1D lines, but the same principles apply to 3D volumes. For example, as a larger and larger conical section of 3D Fourier space is missing, the map becomes more and more smeared along the missing viewing direction (Figure 20). Note how contiguous features in the real space map (e.g., the β-sheet in the stem domain) first fragment and then merge into vertical tubes as orientation bias becomes more severe.

![Figure 20. At the beginning of the animation there is no orientation bias. At the middle of the animation, there is severe orientation bias. Left: a cube in Fourier space. In the first frame, the cube is complete, representing the fact that Fourier space is adequately sampled. As the animation proceeds, a cone of the cube disappears. This cone is the region of Fourier space which is no longer sampled as viewing directions disappear. Right: HA with the orientation bias indicated by the cube. As views disappear, the volume becomes more and more vertically streaked due to the missing side views.](/files/Gi6fzL4YCvkzBvtIP38M)
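
The size of the missing cone can be estimated with simple geometry, under an idealized assumption: every view lies within β of the top-view axis. In that case, a Fourier direction at angle φ from the axis lies in some sampled central plane only when φ ≥ 90° − β. The missing double cone therefore has half-angle α = 90° − β. The two spherical sectors of such a double cone occupy a fraction 1 − cos α of a sphere in Fourier space. The sketch below computes both quantities and cross-checks the fraction by Monte Carlo sampling.

```python
import math
import random


def missing_cone_half_angle_deg(max_view_tilt_deg):
    """Half-angle α (degrees) of the unsampled double cone when every view lies within
    max_view_tilt_deg of the top-view axis (an idealized assumption)."""
    return 90.0 - max_view_tilt_deg


def missing_fraction(half_angle_deg):
    """Fraction of a Fourier-space sphere inside a double cone of half-angle α: 1 - cos α."""
    return 1.0 - math.cos(math.radians(half_angle_deg))


def missing_fraction_monte_carlo(half_angle_deg, n=100_000, seed=0):
    """Estimate the same fraction by sampling uniform points in the unit ball."""
    rng = random.Random(seed)
    cos_alpha = math.cos(math.radians(half_angle_deg))
    inside_ball = inside_cone = 0
    while inside_ball < n:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r > 1.0 or r == 0.0:
            continue  # rejection sampling: keep only points inside the ball
        inside_ball += 1
        if abs(z) / r > cos_alpha:  # angle to the ±z axis is smaller than α
            inside_cone += 1
    return inside_cone / n


alpha = missing_cone_half_angle_deg(30.0)  # views only within 30° of top-down
print(alpha, round(missing_fraction(alpha), 3))  # 60.0 0.5
```

With views only within 30° of top-down, half of Fourier space goes unsampled. Even allowing views up to 60° from top-down leaves a cone with α = 30°, which is about 13% of Fourier space.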

## Orientation Diagnostics

Consider again the HA map produced from the untilted data by the most recent Homogeneous Refinement (Figure 11). The GSFSC resolution of this map is 2.97 Å, but the map itself bears only the vaguest resemblance to the HA map produced from the tilted dataset. The GSFSC resolution therefore does not tell the complete story: the *useful* resolution of this anisotropic map is significantly worse, perhaps 10 Å or coarser.

This discrepancy arises because the GSFSC is a global resolution estimate. At each resolution, it averages the quality of the information over all directions rather than reporting the quality in any specific direction. In the untilted dataset there are many top views of the particle, so in the 3D reconstruction, information in the top view plane is well resolved even though information in other directions is poorly resolved.

[Orientation Diagnostics](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/utilities/job-orientation-diagnostics) runs a set of analyses on a volume (and optionally particles) to assess the resolution of the volume while taking distribution over orientations into account. The plots and metrics produced by Orientation Diagnostics are discussed at length in the [tutorial page](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/tutorial-orientation-diagnostics). For the purposes of this case study, the most important metric is the conical FSC Area Ratio (cFAR).

{% hint style="info" %}
A higher cFAR value indicates that signal in Fourier space is evenly resolved, while a lower cFAR score indicates that some regions of Fourier space are less reliable than others.
{% endhint %}
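
A global FSC can look good even when some directions are poorly resolved. To illustrate this, the sketch below builds hypothetical, smooth FSC curves for several directions and reads off the 0.143-threshold resolution of each. It then compares them with their average, which serves here as a rough stand-in for the global FSC. It also computes a simple ratio of the smallest to the largest area under the curves, in the spirit of cFAR. The curve shapes, the averaging, and the min/max normalization are all assumptions for illustration. See the Orientation Diagnostics documentation for how CryoSPARC actually computes cFSCs and cFAR.

```python
import math


def toy_fsc(freqs, f_half, width=0.02):
    """Hypothetical smooth FSC curve that falls through 0.5 at f_half (1/Å)."""
    return [1.0 / (1.0 + math.exp((f - f_half) / width)) for f in freqs]


def resolution_at(freqs, fsc, threshold=0.143):
    """Resolution (Å) where the FSC first drops below threshold, interpolating linearly."""
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            return 1.0 / (freqs[i - 1] + t * (freqs[i] - freqs[i - 1]))
    return 1.0 / freqs[-1]  # never crossed: limited by the highest frequency sampled


def area(freqs, fsc):
    """Trapezoid-rule area under an FSC curve."""
    return sum((fsc[i] + fsc[i - 1]) / 2 * (freqs[i] - freqs[i - 1]) for i in range(1, len(fsc)))


freqs = [i / 200 for i in range(1, 101)]  # 0.005 to 0.5 1/Å
# Five well-sampled directions and one poorly sampled direction (hypothetical).
directional = [toy_fsc(freqs, 0.33) for _ in range(5)] + [toy_fsc(freqs, 0.10)]
global_like = [sum(values) / len(values) for values in zip(*directional)]

per_direction = [resolution_at(freqs, c) for c in directional]
areas = [area(freqs, c) for c in directional]
print(f"best direction:  {min(per_direction):.2f} Å")
print(f"worst direction: {max(per_direction):.2f} Å")
print(f"averaged curve:  {resolution_at(freqs, global_like):.2f} Å")
print(f"min/max area ratio: {min(areas) / max(areas):.2f}")
```

In this toy setup one direction is resolved only to about 7 Å. Even so, the averaged curve reports nearly the same resolution as the best directions. The area ratio exposes the imbalance that the single averaged number hides.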

To explore the orientation bias in the untilted HA map, we can compare the results of Orientation Diagnostics run on the untilted and tilted datasets (Figure 21).

{% hint style="info" %}
In this case study, all Orientation Diagnostics jobs are run with the default parameters. The job tables are thus omitted for space.
{% endhint %}

![Figure 21. Selected plots from Orientation Diagnostics run on the untilted (top) or tilted (bottom) datasets. Left: cFAR plot. Right: one slice through Fourier space.](/files/1zoi49FEPaYcwwq04c1K)

There are multiple indications that this dataset suffers from orientation bias. We highlight two here:

1. The cFAR score of the untilted dataset (0.05) is very low, and the individual cFSCs are spread across a wide range of resolutions. Compare this with the tilted dataset, which has a high cFAR score (0.82) and individual cFSCs clustered tightly around 2.9 Å.
2. The untilted particles leave a clear missing cone that is not well sampled in Fourier space, whereas the tilted particles sample Fourier space relatively evenly.

Orientation Diagnostics makes it easier to detect anisotropy and orientation bias in any dataset you are processing. Since CryoSPARC v4.5.0, the cFAR score is computed and plotted in every refinement job.

## Re-picking the HA Dataset

Orientation Diagnostics confirms that the missing side views are significantly degrading the quality of the map, but as the name indicates, this is purely diagnostic. It is not always possible to produce a usable map when particle images are so thoroughly biased. As such, the original investigators prepared a new sample and collected data with the stage tilted to overcome the effects of the bias on the map. New data collection can sometimes be the best (or only) way to recover from orientation bias. Tilting the stage is one such approach; changing grid types or including additives in the sample before freezing may also help.

In some cases, though, orientation bias may be a result of particle picking or curation. In the case of the untilted HA map, the map only uses particles we found during the initial picking and retained through all of the curation steps. It is possible that the micrographs do contain a sufficient number of the missing side views, but they were either not picked in the first place, or discarded at some step along the processing pipeline. For the remainder of this case study, we work through a series of steps in an attempt to locate “missing” side views in the untilted dataset.

{% hint style="info" %}
For each step of the side view recovery process, we will use only data from the untilted dataset. The strategies presented here can therefore be used in cases where no tilted data (or other data without orientation bias) is available. For the purpose of the case study, however, we include results from the tilted data in plots in order to provide an isotropic baseline map for comparison.
{% endhint %}

Let us first consider particle picking. We have so far used Blob Picker in circular mode, which picks using three Gaussian blobs of varying size. This works well for most targets, but consider the shape of HA (Figure 1). Top views are well-approximated by a circular blob, especially with the lowpass filter applied during picking. Side views, on the other hand, are not remotely circular. Their correlation with a Gaussian blob will therefore never be as high as a top view, and so our original Blob Picker job may have missed them.
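
The argument about correlation scores can be illustrated with toy images. This sketch models a top view and a side view as 2D Gaussians with roughly HA-like proportions, at a hypothetical scale of 10 Å per pixel. It scores each view against a circular and an elliptical Gaussian template using normalized cross-correlation. The Gaussian shapes, the σ = diameter / 4 convention, and the axis-aligned side view are simplifying assumptions. This is not how Blob Picker constructs or scores its blobs internally.

```python
import math


def gaussian_blob(n, diam_x, diam_y):
    """n x n image of an axis-aligned 2D Gaussian centred in the box.
    Assumed convention: sigma = diameter / 4 along each axis (in pixels)."""
    c = (n - 1) / 2.0
    sx, sy = diam_x / 4.0, diam_y / 4.0
    return [[math.exp(-0.5 * (((x - c) / sx) ** 2 + ((y - c) / sy) ** 2)) for x in range(n)]
            for y in range(n)]


def ncc(a, b):
    """Normalized cross-correlation (Pearson correlation) of two equal-size images at zero shift."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den


N = 33  # box size in pixels; hypothetical scale of 10 Å per pixel
templates = {
    "circular": gaussian_blob(N, 10.0, 10.0),
    "elliptical": gaussian_blob(N, 7.5, 13.0),  # 75 Å x 130 Å
}
particles = {
    "top view": gaussian_blob(N, 7.5, 7.5),     # roughly 75 Å across
    "side view": gaussian_blob(N, 7.0, 12.0),   # deliberately not an exact template match
}

for name, particle in particles.items():
    scores = {t: round(ncc(particle, tmpl), 3) for t, tmpl in templates.items()}
    print(name, scores)
```

In this toy setup the top view correlates best with the circular template and the side view with the elliptical one. A real side view can appear at any in-plane angle, so a non-circular template also has to be compared at multiple rotations. The sketch skips that search by keeping everything axis-aligned.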

We could try to find more side views by generating side view templates from the anisotropic HA map we’ve already produced and using those with Template Picker. However, the overall topology of the untilted map is not correct due to severe anisotropic streaking; we can see this clearly by comparing the tilted and untilted maps (Figure 22).

![Figure 22. Left: HA from the tilted dataset. Right: HA from the untilted dataset. Both volumes have been lowpass filtered for comparison.](/files/MepqScjKpjHf8hOUcRSZ)

The isotropic map (from the tilted dataset) is longer than the anisotropic map produced from the untilted dataset. Moreover, the isotropic map has similar density in the head and stem, while the anisotropic map has a much weaker stem. Templates generated from the untilted map therefore may not reflect the *true* appearance of an HA side view, and picks made with them would still suffer.

Instead, we will use the ellipse picking mode of Blob Picker. This produces an elliptical blob for picking, with dimensions we set to be approximately equal to those of HA.

| Input or Parameter Name       | Input Source or Parameter Value | Notes and Rationale                                                                                                                                             |
| ----------------------------- | ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Micrographs            | Micrographs from Junk Detector  |                                                                                                                                                                 |
| Minimum particle diameter (A) | 75                              | When Blob Picker creates elliptical blobs, the minimum diameter sets the minor axis of the ellipse. We thus set it to the approximate width of an HA side view. |
| Maximum particle diameter (A) | 130                             | Similarly, the maximum diameter sets the major axis and is set to the approximate height of an HA side view.                                                    |
| Use circular blob             | Off                             | It is generally best to use only one type of blob in a single Blob Picker job.                                                                                  |
| Use elliptical blob           | On                              |                                                                                                                                                                 |

Blob Picker picked approximately 750k particles. Next, we use Inspect Picks as before. Here, however, we can form another hypothesis as to why side views might have been missed in our first pass through the dataset: side views likely have dramatically less contrast than top views.

When a top view is imaged, the electron beam travels through the entire length of HA (130 Å). When a side view is imaged, on the other hand, the beam travels through only about half as much protein (roughly the 75 Å width). While viewing individual micrographs during the Inspect Picks job, we may have set the low power threshold too high and excluded a good number of side views. To test this hypothesis, we adjust only the high power threshold to remove junk picks and leave the NCC and low power thresholds untouched. This leaves approximately 660k particles after thresholding.
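
The effect of each threshold can be sketched with a handful of made-up picks. Purely for illustration, this example assumes that a pick's power score scales with the projected thickness of protein, so a side view scores about 75/130 of a top view. The scores, labels, and threshold values are hypothetical, and Inspect Picks does not expose picks in this form.

```python
from dataclasses import dataclass


@dataclass
class Pick:
    label: str    # what the pick really is (known only in this toy example)
    ncc: float    # template correlation score
    power: float  # local power score


def apply_thresholds(picks, ncc_min=None, power_min=None, power_max=None):
    """Keep picks that pass every threshold that is set; None means the threshold is not applied."""
    kept = []
    for p in picks:
        if ncc_min is not None and p.ncc < ncc_min:
            continue
        if power_min is not None and p.power < power_min:
            continue
        if power_max is not None and p.power > power_max:
            continue
        kept.append(p)
    return kept


TOP_POWER = 100.0
SIDE_POWER = TOP_POWER * 75 / 130  # about 57.7 under the (hypothetical) thickness assumption
picks = [
    Pick("top view", 0.40, TOP_POWER),
    Pick("side view", 0.30, SIDE_POWER),
    Pick("ice contamination", 0.35, 400.0),
    Pick("empty ice", 0.20, 20.0),
]

# A low power threshold tuned on top views also throws away the side view.
print([p.label for p in apply_thresholds(picks, power_min=70.0, power_max=200.0)])
# Adjusting only the high power threshold keeps the side view (plus some junk for later curation).
print([p.label for p in apply_thresholds(picks, power_max=200.0)])
```

The trade-off is that leaving the low power threshold untouched also lets some empty-ice picks through. Those picks have to be removed by later curation steps instead.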

{% hint style="info" %}
We recommend working through all of the jobs yourself to learn how the optimal parameter values may change depending on decisions made earlier in the processing pipeline. However, if you wish to work with the same particle stack as we use in the rest of this case study, you can download a STAR file with the relevant coordinates [here](https://structura-assets.s3.us-east-1.amazonaws.com/empiar-10096-case-study/ha-trimer.star.tar.gz).
{% endhint %}

We extract the particles using Extract From Micrographs with the same parameters as before.

| Input or Parameter Name               | Input Source or Parameter Value | Notes and Rationale                                                                                   |
| ------------------------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------- |
| Input: Micrographs                    | Micrographs from Junk Detector  |                                                                                                       |
| Input: Particles                      | Particles from Inspect Picks    |                                                                                                       |
| Extraction box size (pix)             | 256                             | As mentioned in the previous extraction job, a smaller box would likely also produce similar results. |
| Fourier crop to box size (pix)        | 128                             |                                                                                                       |
| Save results in 16-bit floating point | On                              |                                                                                                       |

After extraction, we are left with a stack of approximately 560k particles since many of the input particles were too close to the edge of the micrograph.
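
Extraction drops picks whose box would extend past the micrograph edge. A minimal sketch of that check follows, using the 256-pixel extraction box from the table above and a hypothetical 4000 × 4000 pixel micrograph (not the actual EMPIAR-10096 dimensions). CryoSPARC's exact boundary handling may differ.

```python
def box_fits(x, y, box_px, width_px, height_px):
    """True if a box_px x box_px box centred on pixel (x, y) lies entirely inside the micrograph."""
    half = box_px // 2
    return half <= x <= width_px - half and half <= y <= height_px - half


BOX_PX = 256
WIDTH_PX = HEIGHT_PX = 4000  # hypothetical micrograph size

picks = [(2000, 2000), (100, 1500), (3950, 3950), (128, 128)]
kept = [p for p in picks if box_fits(*p, BOX_PX, WIDTH_PX, HEIGHT_PX)]
print(kept)  # [(2000, 2000), (128, 128)]
```

The box is defined in full-size micrograph pixels before any Fourier cropping. Any pick within 128 pixels of an edge is therefore lost, which is why a dense set of picks can shrink noticeably at this step.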

## Standard Curation of Side Views

### Particle Curation

We perform 2D Classification of the extracted particles with the same circular mask as before. However, we increase the number of 2D Classes. In some cases, requesting more classes improves the separation of rare views.

| Input or Parameter Name          | Input Source or Parameter Value  | Notes and Rationale |
| -------------------------------- | -------------------------------- | ------------------- |
| Input: Particles                 | Particles from Extract From Micrographs |                     |
| Circular mask diameter (A)       | 150                              |                     |
| Circular mask diameter outer (A) | 180                              |                     |
| Number of 2D Classes             | 100                              |                     |

When selecting 2D classes, we select primarily side views, keeping only the very best top view class. This may help reduce the effects of orientation bias by removing lower-quality top views. We select 9 classes comprising approximately 190k particles (Figure 23).

![Figure 23.](/files/FyIOuhhRF69hAH8vi3pz)

Note that many of the 2D classes still appear to be two adjacent top views. This is somewhat surprising given the shape of the blob used to pick these particles, but we will exclude these classes for now regardless.

### Refinement

Next, we curate the particles with a single round of Heterogeneous Refinement.

| Input or Parameter Name   | Input Source or Parameter Value                                                                 | Notes and Rationale                                                                                                                                                                                                                                      |
| ------------------------- | ----------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Particles          | Selected particles from Select 2D Classes                                                       |                                                                                                                                                                                                                                                          |
| Input: Initial volumes    | Both volumes from the Ab-Initio Reconstruction job run in the Initial Particle Curation section | Although we could run Ab-Initio Reconstruction first to generate volumes, the volumes will likely not be appreciably different from the results of the first Ab-Initio Reconstruction run on the untilted dataset, so we will re-use those volumes here. |
| Symmetry                  | C3                                                                                              |                                                                                                                                                                                                                                                          |
| Force hard classification | On                                                                                              |                                                                                                                                                                                                                                                          |

![Figure 24.](/files/U7w6zzK6JX8j1G8mpmcb)

Class 1 from this Heterogeneous Refinement looks significantly more isotropic than the result of the original Heterogeneous Refinement (compare Figure 24 with Figure 9).

As before, we next refine the better class (Class 1) with Homogeneous Refinement. The job is essentially the same as previous Homogeneous Refinements, with the exception of Adaptive Marginalization. Adaptive Marginalization is generally known to improve results with smaller molecules, but we turn it on here for a different reason: orientation bias may reduce certainty in particle poses. Turning on marginalization allows particles to be averaged across likely poses during backprojection, which may reduce overfitting.
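
The difference between assigning a single best pose and averaging across likely poses can be sketched with a softmax over per-pose log-likelihoods. This shows the generic idea of marginalizing over poses, using three hypothetical poses with made-up scores. It is not CryoSPARC's Adaptive Marginalization implementation, whose details are not covered here.

```python
import math


def hard_weights(log_likelihoods):
    """All weight on the single most likely pose."""
    best = max(range(len(log_likelihoods)), key=log_likelihoods.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(log_likelihoods))]


def marginal_weights(log_likelihoods):
    """Posterior weight of each pose (softmax of log-likelihoods, uniform prior assumed)."""
    m = max(log_likelihoods)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in log_likelihoods]
    total = sum(w)
    return [x / total for x in w]


# A particle whose two best poses are hard to tell apart.
log_likelihoods = [-10.0, -10.5, -15.0]
print(hard_weights(log_likelihoods))
print([round(w, 3) for w in marginal_weights(log_likelihoods)])
```

The hard assignment puts all of the particle's weight on the first pose. The marginal weights instead split roughly 62% / 38% between the two plausible poses and give the unlikely third pose well under 1%. During backprojection, an uncertain particle therefore contributes a blurred rather than a possibly wrong contribution.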

| Input or Parameter Name  | Input Source or Parameter Value                 | Notes and Rationale |
| ------------------------ | ----------------------------------------------- | ------------------- |
| Input: Particles         | Particles class 1 from Heterogeneous Refinement |                     |
| Input: Initial volume    | Volume class 1 from Heterogeneous Refinement    |                     |
| Symmetry                 | C3                                              |                     |
| Adaptive Marginalization | On                                              |                     |

![Figure 25.](/files/aK6BjookzV3LLnBjkh8Y)

As with the Heterogeneous Refinement, this initial consensus refinement appears more isotropic than the original low-resolution consensus (compare Figure 25 and Figure 10). Again, we re-extract at the full box size using another Extract From Micrographs.

| Input or Parameter Name               | Input Source or Parameter Value       | Notes and Rationale |
| ------------------------------------- | ------------------------------------- | ------------------- |
| Input: Particles                      | Particles from Homogeneous Refinement |                     |
| Input: Micrographs                    | Micrographs from Junk Detector        |                     |
| Extraction box size (pix)             | 256                                   |                     |
| Save results in 16-bit floating point | True                                  |                     |

We then refine the full-size images with a final Homogeneous Refinement.

| Input or Parameter Name          | Input Source or Parameter Value                      | Notes and Rationale                                                                                                                                                                  |
| -------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Input: Particles                 | Particles from full-size Extract From Micrographs    |                                                                                                                                                                                      |
| Input: Initial volume            | Volume class 1 from Heterogeneous Refinement         |                                                                                                                                                                                      |
| Symmetry                         | C3                                                   |                                                                                                                                                                                      |
| Minimize over per-particle scale | On                                                   | Per-particle scale accounts for varying contrast in images, including due to ice thickness. This may help account for the reduced contrast in side-view images relative to top-view. |

![Figure 26.](/files/N4NiDguzzBocgDAEGaRb)
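
The *Minimize over per-particle scale* option in the table above fits a scale factor for each particle. One standard way to estimate such a factor is a least-squares fit of the reference projection p to the image x, s = ⟨x, p⟩ / ⟨p, p⟩. The sketch below assumes that form on flattened, noise-free toy data. It ignores the CTF and the Fourier-space weighting a real refinement would apply, so it should not be read as CryoSPARC's exact procedure.

```python
def least_squares_scale(image, projection):
    """Scale s minimizing sum((image - s * projection)**2): s = <image, projection> / <projection, projection>."""
    num = sum(x * p for x, p in zip(image, projection))
    den = sum(p * p for p in projection)
    return num / den


projection = [0.0, 1.0, 3.0, 2.0, 0.5]           # hypothetical reference projection (flattened)
faint_side_view = [0.6 * p for p in projection]  # same shape at 60% of the contrast
print(round(least_squares_scale(faint_side_view, projection), 6))  # 0.6
```

Without a scale term, a low-contrast image is compared with a reference at full contrast. Fitting s allows fainter views, such as the side views discussed above, to be matched on shape rather than on intensity.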

The Homogeneous Refinement of the re-picked particles shows a modest improvement in the overall map quality, but the result is still clearly anisotropic (Figure 26). The topology of the top of the head domain has improved, but the entire structure and especially the stalk is severely streaked along the Z axis.

![Figure 27. The final Homogeneous Refinement map (top) and cFSC plot (bottom) are displayed for each of the circle-picked untilted (left), ellipse-picked untilted (middle), and tilted (right) particle stacks.](/files/CgQaloqPbY2VL12ei8j0)

Picking particles on the untilted dataset with an elliptical blob instead of a circular blob has improved the cFAR from 0.05 to 0.17, but it is still far from the 0.82 achieved with the tilted data (Figure 27). It is also clear that the ellipse-picked particles result in a map that is closer to the correct height than the circle-picked particles, but is still shorter than the tilted particles. Note also that the worst cFSC curve (the light-blue region of the cFSC plot) is much better in the ellipse-picked particles than the circular blob-picked particles.

Despite this improvement, the ellipse-picked particles still did not produce an isotropic map. We must therefore try to find additional places that side views may have been lost.

## Skipping 2D Classification

{% hint style="info" %}
From this point onward in the case study, the final three jobs (refinement, extraction, re-refinement) are combined into a single step. The parameters are the same in each case — C3 symmetry with Adaptive marginalization and 256 pix box without downsampling.
{% endhint %}

After the initial extraction of the ellipse-picked particles, we performed 2D Classification. We noted that a large number of particles were in classes that appeared to be two adjacent top-down views (Figure 23). This is not unheard of: especially in crowded micrographs, Blob Picker can pick the space between two particles like this. However, the number of particles in these classes is somewhat unexpected.

HA is approximately twice as tall as it is wide. Additionally, there is a slight constriction in the middle of the protein, in a ring between the head and the stalk. It is possible, therefore, that some side views have been improperly aligned to class averages which show two adjacent top views. One of the top views in the class average may align to the head, the other to the stalk, or they may align to two adjacent heads (Figure 28).

![Figure 28. Top row: 2D class averages rotated and translated to match the particle’s pose. Middle row: individual particle images, lowpass filtered to 25 Å. Bottom row: possible alternate interpretations of the particle image as one or two side views.](/files/1wgYVI6UEJaTjcnhfFUV)

{% hint style="info" %}
It is, at best, extremely difficult to interpret the contents of a single particle image. Figure 28 is not part of a typical data analysis pipeline. It is presented only as an explanation of why 2D classification may be performing especially poorly for this dataset.
{% endhint %}

Certainly, many (or most) of the particle images in the adjacent top-down class averages probably are truly two adjacent top-down views. If all of the images in these classes were side views, the averages would look like side views. This result highlights the main shortcoming of 2D Classification: each class can represent either (or both):

* a distinct view of the same particle as in other classes
* a different particle or object (or combination of objects) entirely

We can avoid this problem by skipping 2D classification and proceeding directly to Heterogeneous Refinement. Although this means the input stack will likely contain more junk particles, Heterogeneous Refinement treats the pose and class of a particle independently, so we do not run the risk of discarding side views in classes like those shown in Figure 28. We therefore build another Heterogeneous Refinement job, starting from all of the extracted ellipse-picked particles.

| Input or Parameter Name   | Input Source or Parameter Value                                                                 | Notes and Rationale |
| ------------------------- | ----------------------------------------------------------------------------------------------- | ------------------- |
| Input: Particles          | Selected particles from the Extract From Micrographs job in Re-picking the HA Dataset           |                     |
| Input: Initial volumes    | Both volumes from the Ab-Initio Reconstruction job run in the Initial Particle Curation section |                     |
| Symmetry                  | C3                                                                                              |                     |
| Force hard classification | On                                                                                              |                     |

![Figure 29.](/files/edWkN60X8Pdmcl7UhoSB)

The majority of the particles are in a class for which the volume is two unconnected spheres (Figure 29). This likely represents particle images which truly are two adjacent top-down views. However, 235k particles are in a class which looks like HA, approximately 45,000 more than when 2D Classification was used. We can refine, extract, and re-refine class 1 as we have with the other particle stacks to produce a full-size map (Figure 30).

![Figure 30. Particles from class 1 of the Heterogeneous Refinement were refined, re-extracted, and re-refined. This reconstruction has a cFAR of 0.12.](/files/PHlo7WzvNCvGP1ER4U8m)

The final map from this particle stack has a cFAR of 0.12, slightly worse than the map from the particles curated with 2D Classification (0.17). Note, however, that connectivity has slightly improved in the head region. Additionally, comparing the 3D-only map against the 2D classified map, we see that the 3D-only map is taller, and therefore closer to the height of the tilted map (Figure 31). This highlights an important point: while cFAR is a useful metric for assessing a map’s isotropy automatically, it is always important to visually inspect the map to determine whether it is better or worse than a previous result in terms of interpretability.

![Figure 31.](/files/yqJtkt7JgeHOMjqgja2w)

### A Second Heterogeneous Refinement

In the Heterogeneous Refinement job run for this particle stack, there was only one class which looked like HA. The Heterogeneous Refinement run on the 2D Classified particles, on the other hand, produced two HA-like maps: one a higher-quality HA and the other a severely streaked HA. It is therefore possible that this particle stack, which has not been 2D Classified, needs a second round of Heterogeneous Refinement to filter out low-quality HA particles.

| Input or Parameter Name   | Input Source or Parameter Value                                                                  | Notes and Rationale                                                                                                                                                                                          |
| ------------------------- | ------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Input: Particles          | Particles from Class 1 of the last Heterogeneous Refinement (i.e., the particles from Figure 29) |                                                                                                                                                                                                              |
| Input: Initial volumes    | Two copies of Volume 1 from Heterogeneous Refinement                                             | Ab-Initio Reconstruction could be used here to generate new volumes. However, since our aim is to separate HA particles of varying quality, it is likely that providing two copies of HA will be sufficient. |
| Symmetry                  | C3                                                                                               |                                                                                                                                                                                                              |
| Force hard classification | On                                                                                               |                                                                                                                                                                                                              |

![Figure 32. Improved isotropy in the head and stem domains is indicated with arrows.](/files/kPPiT95Kfdgk3hTTLI9O)

As expected, both classes appear to be HA particles. However, class 1 is significantly more isotropic (Figure 32). Note that the loops in the stem domain are connected and the head domain is less distorted. We can generate a full-size map using the same refine-extract-refine procedure.

![Figure 33. This map has a cFAR score of 0.28.](/files/NQrKEJIf1rsJcmCsKsh8)

Although this map has a nominally worse GSFSC resolution of 3.00 Å compared to the single-heterogeneous-refinement map (2.90 Å), it is significantly more isotropic (cFAR 0.28). Note especially that the HA2 helix (inside the central cavity of the protein) appears more helical and less like an unbroken tube of density. This reconstruction uses approximately half as many particles, but it is far more useful and interpretable than the map produced from particles which were run through a single Heterogeneous Refinement (Figure 34).

![Figure 34. The final homogeneous refined map (top) and cFSC plot (bottom) are displayed for each of the following particle stacks: 2D classified (left); no 2D classification, 2x Heterogeneous Refinement (middle); and tilted (right).](/files/jjQbESJkRPU7x8sGdFEq)

Despite this improvement, the map remains imperfect. The stalk region appears “tattered”, with density breaking up into vertical streaks around halfway down the protein. Additionally, the untilted map is still significantly shorter than the tilted map. There are a number of ways one might try to improve the untilted map, including re-picking on [denoised micrographs](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/exposure-curation/job-micrograph-denoiser-beta) or picking with a deep picker (e.g., [Topaz](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/deep-picking/topaz)). However, rather than re-pick particles, we will try to make better use of the particles we already have.

## Rebalance Orientations

[Rebalance Orientations](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-curation/job-rebalance-orientations) takes in a particle stack and removes particles from over-represented viewing directions until the viewing distribution is more even. It may seem counterintuitive that removing particles can improve the map, but some datasets do benefit from this approach.

### Important Considerations

Before building a Rebalance Orientations job, it’s important to understand the assumptions underlying the job. The first assumption is that the particles’ existing pose estimates are reliable. For HA, if the goal is to throw away the over-represented top views, we must be able to reliably identify particle images which are in fact top views. If our pose estimates are unreliable, we may essentially be discarding particles at random, which will likely not improve the map.

The second point to consider is how we decide which particles to throw away. Suppose we do have accurate pose estimates for the particle images, and we need to discard 1,000 of our 10,000 top views. There is currently no single best way to decide which 1,000 to discard. Rebalance Orientations provides five ways of doing this, each with its own benefits and drawbacks. In each viewing direction bin, Rebalance Orientations can:

* Discard particles with worse NCC or power scores (from particle picking). This may be useful if template picking was performed and templates are reliable, but particle picking information is often very low quality.
* Discard particles with highest 2D alignment error. If 2D classes looked good but the 3D map is poor, this may work well. However, 2D alignments are often low-resolution.
* Discard particles with highest 3D alignment error. If the map is only slightly anisotropic, this mode may work best, but if the map is significantly smeared or streaked it may not be best to keep particles that match that map.
* Discard particles with lowest per-particle scale (also known as *alpha*). Alpha is another measure of how well the particle matches the map. It has the same potential benefits and drawbacks as the 3D alignment error, but in practice it often works slightly better because it also takes into account ice thickness and noise.
* Discard particles at random. Discarding randomly removes any dependence on the map quality, and so is usually most useful when the map is poor.

In practice, it is often best to try several of these modes and see which works best.
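To make the intra-bin choice concrete, here is a minimal Python sketch of how a single over-populated viewing direction bin could be trimmed to a target count under three of the criteria above. It is an illustration under stated assumptions, not CryoSPARC's implementation: the `Particle` record, its field names, and the `trim_bin` helper are hypothetical, and the NCC/power and 2D-error modes are omitted because they would follow the same sort-and-truncate pattern.

```python
import random
from dataclasses import dataclass


@dataclass
class Particle:
    """Hypothetical per-particle record; field names are illustrative only."""
    uid: int
    alpha: float     # per-particle scale (higher = better match to the map)
    error_3d: float  # 3D alignment error (lower = better match to the map)


def trim_bin(particles, n_keep, criterion, seed=0):
    """Return the particles kept in one over-populated viewing direction bin.

    criterion: "alpha"    -> discard the lowest per-particle scales
               "error_3d" -> discard the highest 3D alignment errors
               "random"   -> discard uniformly at random
    """
    if len(particles) <= n_keep:
        return list(particles)  # bin is not over-populated; keep everything
    if criterion == "alpha":
        ranked = sorted(particles, key=lambda p: p.alpha, reverse=True)
    elif criterion == "error_3d":
        ranked = sorted(particles, key=lambda p: p.error_3d)
    elif criterion == "random":
        ranked = random.Random(seed).sample(list(particles), len(particles))
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return ranked[:n_keep]


# Toy bin of five particles that must be trimmed to two.
bin_particles = [Particle(uid=i, alpha=0.5 + 0.1 * i, error_3d=10.0 - i) for i in range(5)]
for criterion in ("alpha", "error_3d", "random"):
    kept = trim_bin(bin_particles, n_keep=2, criterion=criterion)
    print(criterion, [p.uid for p in kept])
```

The score-based modes keep whichever particles agree best with the current map, whereas the random mode ignores the map entirely, which is why it can be the safer choice when the map itself is poor.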

Finally, there is the question of how many particles to discard. Generally, removing more particles improves orientation distribution, but limits the overall achievable map quality and resolution. If the existing map is highly anisotropic, this may be a net gain. Again, it is best to try a range of rebalancing percentiles and inspect the results manually.

### Rebalance Orientations - Alpha

{% hint style="info" %}
Ensure that the Homogeneous Refinement performed with the final particle stack had `Minimize over per-particle scale` turned on. If this parameter is off, the alpha value will not be accurate to the particle images and poses.
{% endhint %}

We will start with a Rebalance Orientations job which retains particles based on their alpha value (also known as per-particle scale).

| Input or Parameter Name       | Input Source or Parameter Value                  | Notes and Rationale                                                                                                                                                                                                                              |
| ----------------------------- | ------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Input: Particles              | Particles from the latest Homogeneous Refinement |                                                                                                                                                                                                                                                  |
| Rebalance percentile          | 80 (default)                                     | For more explanation of the Rebalance percentile parameter, see the Rebalance Orientations [job page](https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/particle-curation/job-rebalance-orientations#rebalance-percentile). |
| Intra-bin exclusion criterion | alignments3D/alpha                               | A particle’s alpha value is the same as its per-particle scale.                                                                                                                                                                                  |

In this case, there were approximately 122k particles in the input stack. Rebalancing to the 80th percentile viewing direction bin retained 75k particles and excluded 47k particles. As expected, the particle viewing direction distribution is more even after rebalancing (Figure 35). Note that although we set the rebalance percentile to 80, we retained only approximately 60% of the input particles. This is because Rebalance Orientations rebalances viewing direction bins, not individual particles: bins containing more particles than the 80th percentile bin are trimmed down, and the heavily over-populated top-view bins account for most of the excluded particles.
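The gap between the percentile setting and the fraction of particles retained follows from capping bins rather than particles. The sketch below assumes that every bin is capped at the particle count of the bin found at the chosen percentile (linearly interpolated here); the `rebalance_counts` helper is hypothetical, and the bin counts are invented toy numbers with two dominant top-view bins, not values from this dataset.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def rebalance_counts(bin_counts, rebalance_percentile):
    """Cap every viewing direction bin at the count of the bin at the given percentile."""
    cap = math.floor(percentile(bin_counts, rebalance_percentile))
    return [min(count, cap) for count in bin_counts], cap


# Hypothetical per-bin particle counts: two heavily over-populated "top view" bins.
counts = [5, 8, 10, 12, 15, 20, 25, 40, 400, 600]

for pct in (80, 60, 40):
    kept, cap = rebalance_counts(counts, pct)
    print(f"percentile {pct}: cap {cap}, kept {sum(kept)} of {sum(counts)} "
          f"({sum(kept) / sum(counts):.0%})")
```

Because the two largest bins hold most of the particles, capping them removes far more than 20% of the stack even at the 80th percentile. How large the gap is depends entirely on how skewed the real distribution is.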

![Figure 35. Viewing direction plots for the input (left), rebalanced (top right), and excluded (bottom right) particles are shown. Note that the excluded particles are largely top views, and that the dark band at the top of the input particles is no longer present in the rebalanced particles.](/files/0U6F7hDbjScc8HK3Nbst)

We next set up a Homogeneous Refinement to re-refine these particles. We must refine the particles, not just reconstruct them, because balancing out the views may improve the alignment in the early, low-resolution iterations. This may then have a knock-on effect in later iterations, which use higher-resolution information.

| Input or Parameter Name          | Input Source or Parameter Value    | Notes and Rationale                                                                                                                                                                     |
| -------------------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Particles                 | Particles from Rebalance Orientations |                                                                                                                                                                                         |
| Symmetry                         | C3                                 |                                                                                                                                                                                         |
| Adaptive Marginalization         | On                                 |                                                                                                                                                                                         |
| Minimize over per-particle scale | On                                 | Since we expect the result of this refinement to be significantly different from the result of the previous refinement, we should re-calculate per-particle scale to match the new map. |

![Figure 36. The cFAR for this refinement is 0.41.](/files/ZuQuUEGwuEPy7cmV8S1n)

This refinement is significantly better than the unbalanced result (Figure 36). The connections in the β-sheet are visible, as are individual helices at the top of the head domain. As expected, this more isotropic map also has a significantly better cFAR score of 0.41.

The GSFSC resolution of this map is still around 3 Å, so rebalancing to the 80th percentile has not yet degraded resolution, and removing even more particles may further improve the cFAR. We will try two more settings of the Rebalance Orientations job’s `Rebalance percentile` parameter: 60 and 40. After each of these jobs, we run a Homogeneous Refinement with the same settings as were used for Figure 36.

![Figure 37. The map produced by particles rebalanced with the indicated percentile is displayed on the right. Maps were aligned to a common reference, and all images were captured in the same position. At the top is the rebalancing plot produced by Rebalance Orientations, scaled to keep 0 at the same position. At the bottom is the cFSC plot for the particle stack.](/files/asgggJmTjua8p0lnY6pg)

The resulting maps are different in subtle but important ways. To aid direct comparison, they are presented in a cycling animation (Figure 37).

Compared to the map created by rebalancing to the 80th percentile, the 60th percentile has a better cFAR score (0.51 instead of 0.41). The map looks largely similar, except it is slightly taller. Since all of these maps remain shorter than the tilted map, these two facts together indicate that the 60th percentile map is higher quality than the 80th percentile map.

On the other hand, rebalancing to the 40th percentile produces a map with more fragmented density, especially in the stalk domain. The cFAR score for the 40th percentile map is the same as that of the 60th. However, recall that the cFAR score is the ratio of the best and worst cFSC resolutions, so it remains the same if both the best and worst cFSCs change by the same factor, even when the map as a whole gets worse. Indeed, this is exactly what happened in the 40th percentile map.
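A quick arithmetic check, treating the score as this paragraph does (a ratio of the best and worst cFSC resolutions), shows why the score can stay flat while both resolutions get worse. The numbers are illustrative, and `resolution_ratio` is a hypothetical helper, not CryoSPARC's cFAR calculation.

```python
def resolution_ratio(best_res_A, worst_res_A):
    """Best / worst cFSC resolution in Å; 1.0 would mean perfectly isotropic.

    Hypothetical helper that follows the description in the text, not CryoSPARC's code.
    """
    return best_res_A / worst_res_A


baseline = resolution_ratio(3.0, 6.0)  # reference case
scaled = resolution_ratio(3.6, 7.2)    # both 20% worse: the ratio is unchanged
shifted = resolution_ratio(4.0, 7.0)   # both 1 Å worse: the ratio changes

print(baseline, scaled, round(shifted, 3))
```

A proportional loss of resolution in every direction leaves the ratio untouched, so a flat score alone cannot distinguish a better map from a uniformly worse one.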

Taken together, the 60th percentile map finds the best balance between particle count and orientation bias.

### Rebalance Orientations Workflow

Rebalance Orientations produced satisfactory results when rebalancing by discarding particles with low per-particle scales (alpha values). However, the map used to calculate the particles’ alpha values was of poor quality and very anisotropic. It is therefore possible that some high-quality particles had a low alpha score because they did not match the (poor) starting map. These particles would have been discarded, but would in fact have been the best images to keep. To assess whether this happened, we can re-run the same analysis as above, but discarding particles randomly rather than based on their alpha.

We could manually re-create all of the necessary jobs or clone them. However, the process of testing several different rebalance percentiles and refining each against a common map is relatively common, so we will make a [Workflow](https://guide.cryosparc.com/application-guide-v4.0+/workflows) to simplify this process in future projects.

First, select all three of the Rebalance Orientations and their descendant Homogeneous Refinement jobs, but *not* the parent, unbalanced Homogeneous Refinement (Figure 38).

![Figure 38. Each of the Rebalance Orientations jobs and their descendants are selected, but not the parent, unbalanced refinement (in this case J65). Note that in this example we also ran optional Orientation Diagnostics jobs after each refinement.](/files/hhDkDaCD3ihnrIVD5Bsf)

Next, either right-click one of the selected jobs and click Create Workflow, or click the Create Workflow button in the builder panel. This opens the [“Create Workflow” dialog](https://guide.cryosparc.com/application-guide-v4.0+/workflows#creating-a-workflow). At this point, we could create the workflow as is, but it is useful to enter some information to make the workflow easier to use. In this case, we:

* Set the title to `Rebalance Orientations (80, 60, 40)`
* Set the category to Orientation Bias
* Entered a description of what the workflow does and why one might want to use it
* Flagged each Rebalance Orientations job’s `Intra-bin exclusion criterion` parameter so that users will know they may want to change it.
* Set each Homogeneous Refinement’s `Symmetry` parameter to C1 and flagged them. Although in this case we want to use C3 symmetry, the workflow will be more generally useful if the default is C1.

Finally, click Create at the bottom of the dialog. The workflow is now saved to your CryoSPARC instance. If you ever encounter a dataset in the future which may benefit from Rebalance Orientations, you can now test the rebalance thresholds with only a few clicks.

### Rebalance Orientations - Random

We will build all of the jobs in our workflow by:

1. Selecting the input, unbalanced Homogeneous Refinement.
2. Clicking Workflows at the top of the builder panel.
3. Selecting the workflow we built in the previous step.
4. (optional) Turning on Queue on Apply and selecting a lane to which all jobs will be queued.
5. Changing each Rebalance Orientations job’s `Intra-bin exclusion criterion` parameter to “random”.
6. Changing each Homogeneous Refinement’s `Symmetry` parameter back to C3.
7. Clicking “Apply”.

The three Rebalance Orientations jobs and their children are created (and optionally queued) in the workspace. Once they complete, we can inspect the results of the Homogeneous Refinements (Figure 39).

![Figure 39. Panels are arranged as in Figure 37.](/files/hQXB9ws4Z04IfIuicdwG)

Each of these maps has a better cFAR score and subjectively more isotropic appearance than the corresponding map that was rebalanced based on particle alpha. Similar to the alpha-rebalanced maps, the 40th percentile map has a higher cFAR score, but this improvement is in part due to a lower overall resolution. The 40th percentile map’s density is also more fragmented than the 60th percentile map’s.

### Rebalance Orientations Conclusion

![Figure 40. The rebalanced map in the center is the Homogeneous Refinement of the 60th percentile, randomly rebalanced particle stack.](/files/7jjEagg8phj3xfFAFKv4)

All told, Rebalance Orientations improved the map with all of the settings tested. However, the best map was produced when rejecting particles randomly within viewing direction bins with more particles than the 60th percentile (Figure 40).

The fact that Rebalance Orientations performed better when rejecting particles randomly may at first be surprising, but recall that particles’ alpha values are based on how well they match the reference. Since the reference was anisotropic and fragmented, particles which match the reference very well may be poorer than particles which do not.
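The dependence of alpha on the reference can be illustrated with a toy least-squares scale factor, a = ⟨x, p⟩ / ⟨p, p⟩, fitted between a particle image x and a reference projection p. This is a simplified stand-in for CryoSPARC's per-particle scale (which also accounts for CTF and noise), and the 1D lists below are hypothetical "images", not real data.

```python
def per_particle_scale(image, projection):
    """Scale a minimizing ||image - a * projection||^2, i.e. <x, p> / <p, p>."""
    numerator = sum(x * p for x, p in zip(image, projection))
    denominator = sum(p * p for p in projection)
    return numerator / denominator


# Toy 1D "images": a genuine particle with real structure and a featureless blob.
good_particle = [0.0, 1.0, 2.0, 1.0, 0.0]
junk_particle = [1.0, 1.0, 1.0, 1.0, 1.0]

accurate_ref = [0.0, 1.0, 2.0, 1.0, 0.0]  # projection of a faithful map
smeared_ref = [1.0, 1.0, 1.0, 1.0, 1.0]   # projection of a streaked, anisotropic map

for name, ref in [("accurate", accurate_ref), ("smeared", smeared_ref)]:
    good = per_particle_scale(good_particle, ref)
    junk = per_particle_scale(junk_particle, ref)
    print(f"{name} reference: good alpha={good:.2f}, junk alpha={junk:.2f}")
```

Against the smeared reference, the featureless particle scores higher than the genuine one, which is the situation in which alpha-based rejection would discard the wrong particles.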

The 60th percentile, randomly rebalanced map has a cFAR score of 0.53. A cFAR greater than 0.5 generally indicates that the map is sufficiently isotropic to build a model. However, the stalk remains far more fragmented than the rest of the protein. When targets have regions with differing quality or resolution, Non-Uniform Refinement can perform better than Homogeneous Refinement.

## Non-Uniform Refinement

Homogeneous Refinement filters the reference according to the GSFSC during each iteration. For many targets, this technique works well. However, for membrane proteins or targets with flexible or poorly resolved domains, the poorly resolved regions can hurt alignment of the rest of the target. Non-Uniform Refinement overcomes this issue by using a spatially variable filter that is determined automatically during refinement, so that poorer regions are filtered more aggressively than high-quality regions. This variable filtering may help align the stalk domain. We will create a Non-Uniform Refinement job with similar parameters to the previous Homogeneous Refinement.

| Input or Parameter Name          | Input Source or Parameter Value                | Notes and Rationale                                                                                                                                        |
| -------------------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Input: Particles                 | Particles from the last Homogeneous Refinement |                                                                                                                                                            |
| Symmetry                         | C3                                             |                                                                                                                                                            |
| Adaptive Marginalization         | On (default)                                   | Unlike Homogeneous Refinement, Non-Uniform Refinement uses Adaptive Marginalization by default.                                                            |
| Minimize over per-particle scale | On                                             | As with the previous Homogeneous Refinement, if Non-Uniform Refinement significantly improves the map, the per-particle scales should be adjusted as well. |
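The idea of spatially variable filtering can be illustrated with a toy 1D example in which each sample is smoothed with its own Gaussian width, so that a "poorly resolved" region is low-pass filtered more strongly than a "well-resolved" one. This is only a sketch of the general concept under simplifying assumptions (a 1D signal, fixed per-sample widths, periodic boundaries); the helper names are hypothetical, and Non-Uniform Refinement determines its regularization automatically in 3D during refinement rather than in this way.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def local_filter(signal, sigmas, radius=4):
    """Smooth sample i with a Gaussian of width sigmas[i] (larger sigma = stronger low-pass)."""
    n = len(signal)
    filtered = []
    for i, sigma in enumerate(sigmas):
        kernel = gaussian_kernel(sigma, radius)
        value = 0.0
        for offset, weight in zip(range(-radius, radius + 1), kernel):
            value += weight * signal[(i + offset) % n]  # periodic boundary, like a map in a box
        filtered.append(value)
    return filtered


# Toy signal: fine alternating detail (highest-frequency content) everywhere.
signal = [1.0, -1.0] * 8
# First half treated as well resolved (light filtering), second half as poorly resolved.
sigmas = [0.5] * 8 + [2.0] * 8

filtered = local_filter(signal, sigmas)
print([round(v, 2) for v in filtered])
```

The fine detail survives largely intact in the first half but is almost completely suppressed in the second, which is the qualitative behavior the variable filter is meant to provide for a weak region such as the stalk.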

![Figure 41. This map has a cFAR score of 0.69.](/files/lBcyBVLgVDeSrwFZ5L99)

Non-Uniform Refinement significantly improved the HA map (Figure 41). The β-sheet in the stalk is clearly separated, and α-helices at both the top of the head and the bottom of the stalk are resolved well enough to build into. Although the final cFAR score of this map (0.69) is still below that of the tilted map (0.81), the quality and isotropy of the Non-Uniform Refined map approaches that of the tilted dataset map (Figure 42). We therefore consider this analysis complete.

![Figure 42. Abbreviations: Homo. Ref. = Homogeneous Refinement; NU Ref. = Non-Uniform Refinement.](/files/YPF5j2tj3oCs1dBvZAYg)

## Conclusion

In this case study, we focused on the processing of the untilted HA dataset. This dataset is a canonical example of orientation bias. The initial particle stack was severely biased and produced an unusable map, with a cFAR score of 0.05. Through careful processing, we were able to recover a 3.1 Å map with a cFAR score of 0.69. Namely:

* Re-picking with an ellipse (instead of a circular blob) improved the cFAR from 0.05 to 0.17
* Replacing 2D Classification with an all-3D workflow improved the cFAR from 0.17 to 0.28
* Rebalancing the particles based on their orientation improved the cFAR from 0.28 to 0.53
* Using Non-Uniform Refinement improved the cFAR from 0.53 to 0.69

Note, however, that these steps were only required for the untilted dataset. The second, tilted dataset collected by Tan and colleagues produced a map with an even better cFAR score of 0.82 with an entirely standard processing pipeline. This highlights the fact that the careful techniques described in this case study are only necessary in situations where the dataset exhibits strong orientation bias.

It is also worth pointing out that different samples will require different analysis pipelines. For example, skipping 2D classification was very important for HA because the side views look very similar to two adjacent top views. Other targets will likely not have this issue, and so may not benefit as much from a 3D-only workflow.

An important lesson from this case study is that decisions made very early on in a processing pipeline have an impact through the rest of the analysis.

{% hint style="info" %}
Users who frequently experience orientation bias problems may be interested in trying a range of parameter values at each stage of processing in this case study to determine how sensitive HA is to these parameters, and in what way the parameters change the result.
{% endhint %}

One final note: in the case of HA, there *were* side views present in the untilted dataset; they were simply difficult to pick and, once picked, easy to lose during particle curation. This is not always the case. Some datasets may be completely devoid of a crucial viewing direction. In these cases, no amount of processing will recover that missing information. New data must be collected to solve the structure.

## References

1. Fan, H. et al. A cryo-electron microscopy support film formed by 2D crystals of hydrophobin HFBI. Nat Commun 12, 7257 (2021).
2. Tan, Y. Z. et al. Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nature Methods 14, 793–796 (2017).