Case Study: End-to-end and exploratory processing of a motor-bound nucleosome (EMPIAR-10739)

Processing EMPIAR-10739 including handling global pseudosymmetry, using 3DVA to guide classification strategies, separating low population classes, and local refinement of a flexible region.

Introduction

In this case study we will work step-by-step through the full processing pipeline for the human Chromodomain-helicase-DNA-binding protein-like 1 (CHD1L), also known as Amplified in Liver Cancer 1 (ALC1), bound to the Xenopus laevis nucleosome, using a dataset originally collected and processed by Bacic, Gaullier et al. and deposited as EMDB:13065 and PDB:7otq. The raw data are publicly available for download as EMPIAR-10739. This case study is written so that you can replicate these results yourself.

We selected this dataset to provide an example processing pipeline for a protein-DNA complex that has sub-stoichiometric (less than 1:1) binding of a partner protein. The image processing was performed using CryoSPARC v4.7.

Nuclear DNA in most eukaryotic cells is stored within the nucleus in a stabilised structure called chromatin. Within chromatin, DNA is wound around a core of histone proteins, typically H3, H4, H2A and H2B, forming units called nucleosomes that together have a “beads on a string” appearance. When the DNA in or near a nucleosome becomes damaged by a single- or double-strand break, a post-translational modification called poly(ADP-ribose) (PAR) gets added to histone amino acids (usually serine residues) near the damaged DNA site by Poly(ADP-ribose) polymerases PARP1 or PARP2. This process depends on Histone PARylation Factor 1 (HPF1) which directs PARylation to serine residues. PARylation of the histones then facilitates recruitment and binding of ALC1. ALC1 is responsible for ATP-dependent chromatin remodelling that is required for DNA repair and has been proposed as a potential target for drug development against cancer. The overall architecture of the ALC1-bound nucleosome determined from EMPIAR-10739 (EMDB:13065) is shown below.

Overall architecture of the ALC1-bound nucleosome. Images show the map from EMDB:13065 coloured by proximity to PDB:7otq.

The nucleosomes here were assembled in vitro using a synthetic nucleosome-positioning DNA sequence called Widom 601, which in this case was engineered to have different lengths at each end (one long with 10 additional base pairs extending out of the nucleosome, and one short). The side with longer DNA is more easily accessible to PARP, so PARP binding, PARylation, and downstream ALC1 binding might preferentially occur on one side of the nucleosome due to this asymmetry in DNA length around the nucleosome. This is important prior information that we will consider during our processing. The sample was also prepared using an inhibitory ATP analogue that prevents ALC1 catalysis after binding.

The primary aim of this pipeline is to separate out a population of nucleosome particles that have good density for the bound ALC1, and to resolve the ALC1 to an adequate resolution to observe its binding interactions with the nucleosome. The case study also includes some optional exploratory processing (marked OPTIONAL), some of which informed our decision-making for the final pipeline, and we show those results here to explain the choices made.

A summary of the pipeline of CryoSPARC jobs used in the sections of this case study is shown in the flow diagram below.

Flowchart showing the CryoSPARC jobs used in each section of this case study.

Setting up

This is a fairly large dataset with 34k movies, so ensure that you have sufficient storage space for the download (~14.5 TB) and processing (~1.2 TB) before commencing.

For anyone with limited disk space who is less interested in the pre-processing, picking and initial particle cleaning steps, we are providing a particle stack of 1.9 M extracted particles (~600 GB) from Section 5 so that you can follow along with Sections 6-8 and 10-11. Note that Reference-Based Motion Correction in Section 9 requires the raw movies.

To download the particles, navigate to your raw data storage location. For example our data is downloaded to a directory called rawdata.

cd /path/to/rawdata
wget https://s3.us-east-1.wasabisys.com/cryosparc-test-data-dist/empiar_10739_particles.tar

The download may take several hours, and when it is complete, you can untar the folder with the following command:

tar -xvf empiar_10739_particles.tar

Before beginning this tutorial, you should create a new project and a workspace within that project. Download the 33,498 movies to a location of your choosing. For example, our data is downloaded to a directory called rawdata using the commands:

cd /path/to/rawdata
wget -m ftp://ftp.ebi.ac.uk/empiar/world_availability/10739

1. Movie import and pre-processing

  • Import the data using an Import Movies job. The Movies data path should match the location of the directory containing the downloaded .tiff files, for example: /path/to/rawdata/ftp.ebi.ac.uk/empiar/world_availability/10739/data/batch-1/Images-Disc1/GridSquare*/Data/FoilHole*fractions.tiff.

These movies are already gain corrected so you do not need to add any gain reference.

  • Add in the experimental information below, which we obtained from the EMDB:13065 entry.

| Parameter | Setting |
| --- | --- |
| Raw pixel size (A) | 0.84 |
| Accelerating Voltage (kV) | 300 |
| Spherical Aberration (mm) | 2.7 |
| Total exposure dose (e/A^2) | 45 |

As the data were collected in two batches, run one import job for each batch so that they are assigned to different exposure groups; later in the pipeline this allows independent refinement of Global CTF parameters that may differ between the two batches. Now we want to correct for beam-induced motion during movie collection and to estimate the defocus values for each micrograph.

If you are using the downloaded particles, instead:

  • Create an Import Results Group job and navigate to the absolute path of the file J8045_split_0.csg and queue the job to import the particles. You can run Ab Initio 1 from Section 4, then use this volume as an input for Non-Uniform Refinement 1 in Section 5, for downstream steps.

  • Run a Patch Motion Correction job for each import job, followed by Patch CTF Estimation.

In Patch Motion, if you use Save results in 16-bit floating point:true, then the output images will take up half of the disk space compared to the default 32-bit floating point files, with no reported loss in accuracy. The Number of GPUs to parallelize can be set depending on GPU availability, and on a single GPU we found each Patch Motion job took ~28-36 hours. This runtime could be reduced substantially by running Patch Motion on multiple GPUs.

  • Run Patch CTF Estimation with the default settings.

As the Patch Motion jobs could take over a day to complete (depending on how many GPUs you have available), you can speed up processing by stopping a Patch Motion job once it has processed ~5,000 movies, then running another Patch Motion job on the micrographs_incomplete output to continue processing the remaining movies. Depending on your available computational resources, this may allow you to run Patch CTF Estimation and the jobs in Sections 2-3 before Patch Motion has completed on the full dataset, then re-run the Micrograph Junk Detector, Manually Curate Exposures and Micrograph Denoising jobs on the full dataset before Template picking.

2. Excluding poor micrographs from downstream processing

Movies collected for single particle cryo-EM can have a number of different characteristics. Some of these, like a range of defocus values or a range of ice thickness, can be beneficial for image reconstruction, but movies can also contain junk and outlier attributes that reduce the quality of their particles, such as excessive in-movie motion and ice that is too thick or too thin for your sample. Common junk can come in the form of non-vitreous ice and contamination with ice crystals or other features such as the edge of the holey support.

We can use the Micrograph Junk Detector to identify regions of junk, and to give us statistics about the types and quantity of junk present in the data.

  • Run a Micrograph Junk Detector job and have a look at the example images and summary graphs.

Figure 1. Results from the Micrograph Junk Detector job. An example micrograph with identified junk regions, number of micrographs containing each type of junk, and total micrograph area taken up by each junk type are shown.

We see from the example outputs in Figure 1 that the edge of the holey carbon support (green region) is within the imaged area, and that there are some small regions of ethane contaminants or small ice crystals (magenta regions). The summary graphs indicate that while many of the images contain carbon edge and extrinsic ice defects, the area of the micrographs that these take up is relatively low at ~5-6%.

If particles from poor micrographs or regions get extracted then it can take some effort to remove the low-value particles, so it is easier to exclude the worst micrographs at the start by looking at the statistics generated by the Patch Motion, Patch CTF and Micrograph Junk Detector jobs. For micrographs containing junk, you can assess visually whether you are satisfied that rejecting particles from the junk regions will be sufficient, or whether you would prefer to reject those micrographs entirely.

We will inspect the CTF estimation and junk detection statistics for the micrographs so that we can exclude images of poor quality from downstream processing.

  • Run a Manually Curate Exposures job, inputting the “Labelled Micrographs” from Mic. Junk Detector.

  • In the job card view go to the pink Interactive tab

  • Try plotting Relative Ice Thickness against Intrinsic Ice Defect Area % to look for a correlation.

  • Set thresholds for outliers on undesirable characteristics

    • We chose to select the following thresholds:

| Parameter | Threshold | Reason |
| --- | --- | --- |
| CTF fit resolution | max 5 Å | Exclude poorest resolution micrographs |
| Relative Ice Thickness | max 1.2 | Exclude very thick ice with poor signal-to-noise |
| Total full-frame motion distance | max 20 | Exclude outliers with large motion |
| Intrinsic Ice Defect Area % | max 10 | Exclude images with substantial amounts of non-vitreous ice |

We might expect a correlation between the Intrinsic Ice Defect Area % and Relative Ice Thickness because both statistics are likely to pick up micrographs with very thick or non-vitreous ice, but they do so by different methods:

  • Relative Ice thickness is based on the amount of power in the ice resolution band compared to the background

  • Intrinsic Ice Defect is based on visual similarities to a manually segmented set of micrographs containing non-vitreous ice
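
For intuition about the first of these statistics, the sketch below computes the ratio of spectral power in an annulus around the ice ring to the power in a nearby background annulus of a micrograph's power spectrum. This is a rough conceptual illustration in Python, not CryoSPARC's implementation; the ~3.7 Å ring position, the band edges and the use of a simple mean are our own assumptions.

    import numpy as np

    def ice_band_power_ratio(mic, psize_A, ice_band=(3.9, 3.5), background=(5.5, 4.5)):
        # Ratio of mean power in an annulus around the ~3.7 A ice ring to a background annulus.
        # Resolution ranges are (low, high) in Angstroms and are illustrative only.
        ny, nx = mic.shape
        power = np.abs(np.fft.fftshift(np.fft.fft2(mic))) ** 2
        fy = np.fft.fftshift(np.fft.fftfreq(ny, d=psize_A))
        fx = np.fft.fftshift(np.fft.fftfreq(nx, d=psize_A))
        s = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)      # spatial frequency (1/A)
        def mean_power(lo_res, hi_res):
            return power[(s >= 1.0 / lo_res) & (s < 1.0 / hi_res)].mean()
        return mean_power(*ice_band) / mean_power(*background)

    # Example on synthetic noise (a real micrograph would be loaded from an MRC/TIFF file)
    rng = np.random.default_rng(0)
    print(ice_band_power_ratio(rng.standard_normal((512, 512)), psize_A=0.84))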

There are occasionally cases where one statistic identifies a problematic micrograph and the other does not. In our Manually Curate Exposures job, a threshold of 1.2 for Relative Ice Thickness excluded 4,737 micrographs, but by adding a threshold of 10% for Intrinsic Ice Defect Area we rejected a further 333 micrographs. Plots for Relative Ice Thickness and Intrinsic Ice Defect Area are shown in Figure 2A.

In Figure 2B we have an example of an accepted micrograph that is typical for this dataset. In Figure 2C we show a micrograph with high relative ice thickness, but the Micrograph Junk Detector did not identify this as having an Intrinsic Ice Defect. On the other hand, in Figure 2D we see a micrograph identified as containing a substantial amount of Intrinsic Ice Defect, but low relative ice thickness. We therefore find it beneficial to make use of both of these statistics to get the best coverage of images containing excessively thick or non-vitreous ice. This left us with ~28k micrographs from the full dataset.

Figure 2. Plots and example micrographs from Manually Curate Exposures. A) Relative Ice thickness with an upper threshold set at 1.2, and Intrinsic Ice Defect Area (%) with a threshold set at 10%. B-D) Example micrographs that were accepted or rejected.

3. Micrograph denoising and blob picking

Now that we have excluded the poor micrographs we can go ahead and denoise the micrographs using the Micrograph Denoiser. This can make downstream picking more consistent across defocus values, and the more prominent appearance of particles makes thresholds for particle picks easier to choose.

  • Create a Micrograph Denoiser job, inputting the exposures_accepted and set Number of CPUs to Parallelize to the number of CPUs available on your node.

Figure 3. Example micrographs from Micrograph Denoiser.

We will now move on to perform blob picking on these cleaner-looking images. There are many examples of nucleosome structures in the PDB, so we can look at those to estimate the diameter of the particles to pick. To us it looked like a circular diameter in the region of 150-180 Å should contain a single nucleosome particle.

  • Create a Blob Picker job, inputting the micrographs from the Micrograph Denoiser job, with the following settings:

| Parameter | Setting |
| --- | --- |
| Minimum particle diameter (A) | 150 |
| Maximum particle diameter (A) | 180 |
| Pick on denoised micrographs | true |
| Number of mics to process | 5000 |

We will use just 5,000 micrographs at this stage, as the blob picked particles are only needed to generate 2D class average templates for later template picking on the full dataset. We already detected junk regions in the micrographs, so we can automatically reject picks within and near those areas.

  • Run a Micrograph Junk Detector job, inputting the micrographs and particles from the Blob Picker job.

We found relatively few particles got rejected at this stage, only 60k out of 1.5M, likely because we already rejected the worst micrographs.

  • Run an Inspect Picks job and look at the first few micrographs, and at the ncc/power score plot.

Looking at the ncc/power score plot we can see there is one main cluster of picks (yellow on the heat map) with a centre around a power score of 100.

While we could set the thresholds for NCC and power score manually (see Figure 4A), we might be able to make a better selection by using the Auto clustering option (see Figure 4B).

Figure 4. Blob picks after Micrograph Junk Detector. A) Thresholds set as NCC score > 0.630, Local power >66 <192. B) Auto clustering with a Target power score of 100.

  • Make a new Inspect Picks job, this time setting Auto Cluster:true, and Target power score:100.

The choice between manually selecting thresholds and using auto clustering will depend on the shape and number of pick clusters observed. You can try both on your own data and see which gives you the better particle selection.

We ended up with 860k particles which were suitable for taking forward to extraction.

The longest diameter of the nucleosome is ~100 Å, but we expect the ALC1-bound particle to be larger, so we initially chose a generous box size of 350 pixels (294 Å) to be sure that we extracted the ALC1 as well as the nucleosome.

  • Create an Extract from Micrographs (CPU or GPU) job, inputting the particles and micrographs from Inspect picks with the following settings:

| Parameter | Setting | Explanation |
| --- | --- | --- |
| Extraction box size (pix) | 350 | This box should generously include the nucleosome target |
| Fourier-crop to box size (pix) | 64 | F-cropping to a smaller box saves disk space, and allows jobs to run faster |
| Save results in 16-bit floating point | true | Using float16 format saves disk space |
| Number of CPU cores | 64 (or as many as you have on your node / workstation) | Set only for the “Extract from micrographs (CPU)” job type |
| Number of GPUs to parallelize | 4 (or as many as you have on your node / workstation) | Set only for the “Extract from micrographs (GPU)” job type |

You might notice that the number of extracted particles is fewer than the number selected from Inspect Picks. This is because CryoSPARC does not extract particles where the extraction box extends outside the edges of the micrographs. In the above example out of 860k particles, 730k were extracted.
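
As a quick sanity check on these numbers (a worked example using the values above, not a CryoSPARC job), the physical box size, the effective pixel size after Fourier cropping and the corresponding resolution limit are:

    raw_psize = 0.84        # A/pixel, from the movie import
    box = 350               # extraction box size (pixels)
    crop = 64               # Fourier-cropped box size (pixels)

    box_A = box * raw_psize                   # physical box width
    cropped_psize = raw_psize * box / crop    # effective pixel size after F-cropping
    nyquist = 2 * cropped_psize               # best resolution representable at this sampling

    print(f"box width     : {box_A:.0f} A")              # ~294 A
    print(f"cropped pixel : {cropped_psize:.2f} A/pix")  # ~4.59 A/pix
    print(f"Nyquist limit : {nyquist:.1f} A")            # ~9.2 A

This ~9.2 Å Nyquist limit of the cropped particles is why the 2D classification jobs in the next section describe the image resolution as limited to ~9 Å.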

4. Cleaning particles to generate 2D templates, and template picking

We want to use our blob picked particles to generate nice 2D templates with a variety of particle views that can be used for template picking.

To begin with we will use 2D classification to throw out the worst of the junk particles.

  • Run a 2D Classification job on the extracted particles with the following settings:

| Parameter | Setting | Explanation |
| --- | --- | --- |
| Number of 2D classes | 100 | Using more classes allows us to find a greater diversity of junk types |
| Maximum resolution (A) | 10 | The resolution of these images is limited to ~9 Å due to Fourier cropping |
| Initial classification uncertainty factor | 1 | Setting this lower than 2 encourages greater diversity of classes, which is good for identifying junk |
| 2D zero pad factor | 1 | Reducing padding allows the job to run faster |
| Number of GPUs to parallelize | 4 | Use as many as are available on your node or workstation |

Now we want to take a look at the 2D class averages and select the best ones.

  • Run a Select 2D Classes job and select the class averages that contain views of the intact target like the examples in Figure 5.

Figure 5. 2D class averages after classification of extracted blob picks. Possible density for ALC1 is indicated in two selected classes by dotted green circles.

In some of our selected classes we can see density outside of the core nucleosome (shown inside dotted circles in Figure 5) and we can use the ruler to get a better estimate of the particle diameter. It looks like the ALC1-bound nucleosome has a diameter of around 120 Å. Some class averages do not have obvious density for ALC1, but at this stage we will keep all the good classes in and handle separation of ALC1-free and ALC1-bound nucleosome later on in the pipeline.

The diversity of views in these selected classes is OK, but with a little more particle clean-up we can probably do better, so we can move on to remove more junk in 3D by use of Ab-Initio and Heterogeneous Refinement, and then generate better 2D templates for picking.

As we already separated some good particles from bad ones after blob picking, we can use those particles to generate low resolution volumes in two separate jobs:

  1. Run an Ab-Initio Reconstruction job (1), inputting the “particles selected” from your Select 2D job, with Number of classes:1 and Num particles to use:20,000

  2. Run an Ab-Initio reconstruction job (2) with “particles excluded” from your Select 2D job with the following settings

| Parameter | Setting |
| --- | --- |
| Number of Ab-Init classes | 3 |
| Num particles to use | 1000 |
| Maximum resolution (Angstroms) | 30 |
| Number of initial iterations | 20 |
| Number of final iterations | 30 |

The settings used for Ab-Initio Reconstruction 2 are intended to quickly generate low resolution volumes by limiting the number of particles, resolution and number of iterations performed. The decision to do so was based on the appearance of the 2D class averages.

For datasets like this one that appear to contain a relatively pure sample where unwanted picks are genuinely junk or false picks on the ice, this strategy can markedly reduce the time required to generate volumes. On the other hand, if picked particles contain an array of unwanted contaminant proteins, or severely damaged and broken particles, then it may be beneficial to run the job with more particles, and the default iterations and maximum resolution.

Now that we have generated 4 volumes we can use these as templates for Heterogeneous Refinement:

  • Run a Heterogeneous Refinement job (1) inputting the selected particles from Select 2D and the Ab-Initio volumes that you obtained, and set Refinement box size (voxels):64, as this is the box size to which the particles were Fourier cropped.

Examine your Ab-Initio and Heterogeneous Refinement output volumes.

Figure 6. Volumes from Ab-Initio and Heterogeneous Refinement of extracted blob picked particles. A) Volumes from Ab-Initio Refinement jobs. B) Volumes from Heterogeneous Refinement.

At this stage we had ~180k particles in the best class (purple in Figure 6) from which to create some nice 2D templates.

  • Run a 2D Classification job using the particles from the best Heterogeneous Refinement class, and the following settings:

| Parameter | Setting | Explanation |
| --- | --- | --- |
| Number of 2D classes | 30 | We do not want an excessive number of selected classes, as template picking takes longer with more provided templates |
| Maximum resolution (A) | 10 | The resolution of these images is limited to ~9 Å due to Fourier cropping |
| Initial classification uncertainty factor | 1 | Setting this lower than 2 encourages greater diversity of classes, which is good for identifying junk |
| 2D zero pad factor | 1 | Reducing padding allows the job to run faster |
| Number of GPUs to parallelize | 4 | Use as many as are available on your node or workstation |

In the above example we used an initial uncertainty factor of 1 and we saw a diverse range of views. Other datasets might benefit from using a higher uncertainty factor, such as 2-5 and fewer classes. Setting the initial uncertainty factor too high for a given dataset can lead to all classes looking identical. A value of 1 or 2 tends to work robustly, with values of 0.5 to 5 working better in some cases.

  • Run a Select 2D Classes job and select the class averages that contain views of the intact target.

We show example selected classes in Figure 7A. Up to this point we just used particles from a subset of micrographs to speed up 2D template generation, but now we are ready to pick using the whole dataset.

  • Create a Template Picker job and connect the micrographs from the Manually Curate Exposures job and the selected templates from the last Select 2D Classes job. Set Pick on denoised micrographs:true and Particle diameter (A):140.

You will notice that although we deduced the particle diameter was around 120 Å, we use a slightly larger value for template picking. We have empirically found that we sometimes get better picking results with a diameter that is set to ~10-20% larger than the particle.

In our processing the Template Picker yielded ~20 million particles, but we want to remove the most obvious junk picks before extraction.

  1. Run a Micrograph Junk Detector job and input the exposures and particles from the Template Picker job.

We were left with ~18M particles.

  2. Run an Inspect Picks job and look at a few micrographs. Move the NCC and power score sliders to remove picks that look overly generous.

We noticed that there were two clusters visible in the Power Score/NCC Score plot (see Figure 7D, inset), and found that an NCC threshold of 0.48 and power scores of 57-223 looked like they kept the particles without also keeping background picks. This left us with ~9M particles. Alternatively we could use the Auto cluster mode with a Target power score of 80, but we found that this selection seemed to miss some good-looking particles (see Figure 7E).

Figure 7. 2D templates and template picking results. A) Selected class averages. B) Lowpass filtered 2D templates used for picking. C) Particles accepted (white) and rejected (red) from the Micrograph Junk Detector. D) Accepted particles and NCC/power score plot after setting manual thresholds at NCC 0.49 and power score >57 and <223. E) Accepted particles and NCC/power score plot after Auto-clustering using a target power score of 80. Green arrows indicate particles that were not selected using this method.

  • Create an Extract from Micrographs (CPU or GPU) job, inputting the particles and micrographs from Inspect Picks with the same settings as Section 3.

5. Initial cleaning of particles from the full dataset

Now that we have extracted particles from the whole dataset we need to clean out the remaining bad particles. With ~7.5M particles it will likely be faster to clean in 2D first, but with so many particles we will get better separation of junk if we use more classes (which will inevitably slow the job down).

  • Run a 2D Classification job with the following settings:

| Parameter | Setting |
| --- | --- |
| Number of 2D classes | 400 |
| Maximum resolution (A) | 12 |
| Initial classification uncertainty factor | 1 |
| 2D zero pad factor | 1 |

  • Run a Select 2D Classes job and select the class averages that contain views of the intact target.

This job took us ~3.5 hours to run on 6 GPUs. At this point we retained ~2.6M particles in the selected classes. The classes appeared to have good contrast, indicating that further rounds of 2D Classification might not allow for rejection of many more particles. Instead, we will move on to a cleaning step in 3D. During cleaning of the blob picked particles in Section 4 we generated some 3D Ab-Initio volumes that are suitable templates to re-use now.

  • Run a Heterogeneous Refinement job (2) inputting the selected particles from Select 2D, the Ab-Initio models that you obtained in Section 4 and set Refinement box size (voxels):64.

We ended up with ~2M particles in the best class. At this point it is a good idea to re-extract the good particles for the following reasons:

A. The original picks may not have been well-centered in the boxes and we now have 3D alignment of particles from Hetero Refine to improve particle centering.

B. We chose to Fourier crop quite heavily to speed up cleaning and this limited the achievable resolution.

C. We can revise the extracted box size to include more of the delocalised signal caused by defocusing of the images during collection.

We now have a better estimate of the particle diameter at ~120 Å, and we know the defocus range of the collected images extends to -3 µm, which might warrant a larger box than the commonly used 1.5-2x the particle diameter.
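
To get a feel for why a larger box can help at high defocus, a rough rule of thumb is that signal at resolution d is delocalised by approximately λ·Δf/d from the particle centre (ignoring the smaller Cs term). A minimal sketch with values for this dataset (300 kV, defocus up to ~3 µm); the formula and printed numbers are for illustration only:

    wavelength = 0.0197   # electron wavelength at 300 kV, in Angstroms
    defocus    = 30000.0  # 3 um defocus, in Angstroms
    psize      = 0.84     # A/pixel

    for d in (10.0, 5.0, 3.0):                # target resolution in Angstroms
        shift_A  = wavelength * defocus / d   # approximate delocalisation distance
        shift_px = shift_A / psize
        print(f"{d:4.1f} A signal displaced by ~{shift_A:.0f} A (~{shift_px:.0f} pixels)")

At ~3 µm defocus, 3 Å signal is displaced by roughly 200 Å, so even a 416-pixel (~350 Å) box captures only part of the most delocalised high-resolution signal from the deepest-defocus images.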

There is a trade-off between capturing the delocalised signal from defocussing during data collection, and including excessive noise or adjacent particles in the extracted box, plus a consideration for the computing time to run jobs. We tested a range of computationally efficient extraction box sizes from 320 to 440 so you don’t have to! FSC resolutions were calculated using the same mask and all jobs were run on the same GPU for relative comparison of runtime.

| Box size (pix) | FSC 0.143 resolution (Å) | cFAR | Runtime |
| --- | --- | --- | --- |
| 440 | 2.47 | 0.83 | 3:02 |
| 416 | 2.47 | 0.83 | 2:49 |
| 384 | 2.50 | 0.80 | 2:28 |
| 360 | 2.50 | 0.80 | 2:16 |
| 320 | 2.54 | 0.79 | 1:55 |

We found the optimal result using a box of 416 and suggest that you also use this, unless you are happy to accept a slightly lower resolution in exchange for running the following jobs a bit faster.

  • Create an Extract from Micrographs (CPU or GPU) job, inputting the particles from the good Heterogeneous Refinement class, and micrographs from Inspect Picks, setting Extraction box size (pix):416, save results in 16-bit floating point:true.

  • Create a Non-Uniform Refinement job (Refinement 1) using the 416 box extracted particles, and the nucleosome volume from Heterogeneous Refinement with the following settings:

| Parameter | Setting | Explanation |
| --- | --- | --- |
| Symmetry | C2 | This nucleosome is expected to have pseudo-C2 symmetry, causing alignment difficulty |
| Symmetry relaxation method | marginalization | Relaxing the symmetry allows the different symmetry-related poses to be tested for each particle |
| Minimise over per-particle scale | true | This allows weighting of the particle contribution to the reconstruction based on its contrast and similarity to the reference |
| Dynamic mask start resolution (A) | 1 | Effectively disables masking during NU refinement |

The reason why we set the dynamic masking to start at 1 Å (meaning that it will never start, since we don’t expect to achieve a 1 Å resolution) is that we want to disable masking during refinement, and rely on the Non-Uniform regularization to adaptively exclude noise. This method can sometimes yield slightly better map quality with NU refinement than using dynamic masking. If you wish, you can compare the results with and without this setting to see how it affects the map quality and statistics.

Per-particle scale is a function that compares the contrast of each aligned particle image with a projection of the refined volume and gives it a score. Effectively this gives a high score to particles with higher contrast that match the reference well, and a low score to those that have lower contrast and do not match the reference well. Minimising over per-particle scale means that the best particles get up-weighted, and the worst ones down-weighted. Often this is beneficial and can slightly improve the map quality and metrics.
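
For intuition, a least-squares scale factor of the kind described above can be sketched as follows; this is a conceptual toy example, not CryoSPARC's exact formulation (which also accounts for the CTF and the noise model):

    import numpy as np

    def per_particle_scale(image, reference_projection):
        # Least-squares scale a minimising ||image - a * reference||^2
        num = np.vdot(reference_projection, image).real
        den = np.vdot(reference_projection, reference_projection).real
        return num / den

    # Toy example: a high-contrast particle vs a low-contrast one
    rng = np.random.default_rng(1)
    ref = rng.standard_normal((64, 64))
    print(per_particle_scale(1.2 * ref + 0.5 * rng.standard_normal((64, 64)), ref))  # ~1.2
    print(per_particle_scale(0.6 * ref + 0.5 * rng.standard_normal((64, 64)), ref))  # ~0.6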

When a cryo-EM target has known or suspected pseudosymmetry, asymmetric features can get blurred out across the symmetry-related positions. In order to overcome this, we can apply symmetry relaxation, in which every symmetry-related position is explicitly tested for each particle so that symmetry-breaking features become clearer. This can be performed by either maximization (taking the best pose) or marginalization (allowing the particle’s contribution to be weighted over the best poses found). For more details see our tutorial on symmetry relaxation.
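
Conceptually, the difference between the two options can be sketched as follows (a toy illustration in our own notation, not CryoSPARC's internals): maximization puts all of a particle's weight on its single best symmetry-related pose, while marginalization spreads the weight according to the relative likelihoods.

    import numpy as np

    def relax_weights(log_likelihoods, mode="marginalization"):
        # Weight given to each symmetry-related pose of one particle (for C2 there are two)
        ll = np.asarray(log_likelihoods, dtype=float)
        if mode == "maximization":
            w = np.zeros_like(ll)
            w[np.argmax(ll)] = 1.0               # all weight on the single best pose
            return w
        e = np.exp(ll - ll.max())                # marginalization: softmax of log-likelihoods
        return e / e.sum()

    print(relax_weights([-100.0, -100.5]))                   # ambiguous particle: ~[0.62, 0.38]
    print(relax_weights([-100.0, -110.0]))                   # clear winner:       ~[1.00, 0.00]
    print(relax_weights([-100.0, -100.5], "maximization"))   # always [1.0, 0.0]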

Once the job is completed, examine the output maps, and refinement statistics.

Figure 8. Refinement statistics and map appearance from NU Refinement 1. A) Gold Standard FSC curves B) images of the auto-sharpened map from top and side views. C) Summary of conical FSCs and cFAR score. D) Histogram of Per-particle scale factors. E) Orientation distribution of particles. F) Pose differences after symmetry relaxation (degrees).

We found that the cFAR of 0.83 (Figure 8C) indicated good sampling of different orientations in the particle stack, and the orientation distribution shows a repeating pattern (see Figure 8E). Such a repeat can indicate a symmetric or pseudo-symmetric particle. The reported resolution was 2.45 Å (Figure 8A), although we noted a separation between the tight and corrected FSC curves, indicating that the mask may be too tight and the resolution estimate may be a little over-optimistic.

Looking at the per-particle scale factors, we noted a shoulder on the lower end (indicated by a green arrow in Figure 8D), indicating a bimodal distribution. When the particle scales are multimodal, we often find that the lower-scale peak contains poor-quality, low-contrast particles, or particles that do not match the refined volume and contribute little or nothing to the reconstruction, so removing these particles can sometimes improve the quality of the refinement and of downstream jobs such as 3D Classification and Local Refinement. We will come back to the per-particle scales in Section 6.

As we refined with C2 symmetry and symmetry relaxation, we can look at how many of the particles were initially assigned the wrong pose (see Figure 8F), and in this refinement it looks like less than 10% of the particles were initially assigned to a pose ~180 degrees different to the best pose. This means that if those particles contained ALC1, then before relaxation they would contribute ALC1 density at 180 degrees away from where it truly belonged.

6. Global and Local CTF refinement, and separation of particles based on per-particle scale

We have a nice initial refined map, but the CTF parameters estimated by Patch CTF Estimation are not likely to be optimal, so we can test whether refining the defocus of each particle by Local CTF Refinement, or fitting higher order aberrations with Global CTF Refinement, will help our refinement quality.

  • Run a Global CTF Refinement job, inputting the particles, volume and mask_fsc_auto from Refinement 1, setting Fit Spherical Aberration, Fit Tetrafoil, and Fit Anisotropic Mag. all to true.

  • Run a Local CTF Refinement job with the same inputs (particles, volume and mask_fsc_auto from Refinement 1).

In order to use the mask_fsc_auto you will first need to input the refinement mask, and then drag over the mask_fsc_auto into the lower-level information from the refinement job outputs tab. For more information about modifying lower-level information see our Job Builder Tutorial.

We want to examine the plots from these jobs but also compare their effect on the refined volume. For this dataset NU refinement can take a few hours so instead of refining after every step, we can speed things up by performing Homogeneous Reconstruction instead, using the poses from Refinement 1.

The reason why we also run a reconstruction from Refinement 1 is that Non-Uniform Refinement includes adaptive marginalisation, which means that instead of each particle being assigned a single pose, it can be partially assigned with different weighting to the best pose matches. During Homogeneous Reconstruct Only there is no marginalization over poses, and only the best pose is used, so to make the comparison of particles before and after CTF refinement fair, we reconstruct them in the same manner with the same mask.

Figure 9. Results from Global and Local CTF Refinement. A) Phase error and fit for odd terms. B) Phase error and fit for even terms. C) Phase error and fit for Anisotropic magnification in X. D) Phase error and fit for Anisotropic Magnification in Y. E) Example defocus profiles from Local CTF Refinement. F) GSFSC curves from Homogeneous Reconstruct Only jobs before and after CTF Refinement. Panels A-D correspond to exposure group 2.

After running Global CTF Refinement it is a good idea to look at the phase errors for each of the fits that were tried (see Figure 9A-D) to ensure that we are satisfied there is data to support the fit. We are looking for the red and blue features in the data to be appropriately captured in the fits. For the odd terms and anisotropic magnification, the fits look pretty good, and although the phase errors for the even terms are a bit lower in magnitude (paler colours), the fit looks consistent.

After running Local CTF Refinement it’s worth taking a look at the example defocus profiles. In the examples in Figure 9E we see a single deep well that reflects the optimal defocus for that particle. If the wells were broader, or there were multiple wells of similar depth for each particle then that suggests that defocus refinement is less certain about the best fit for the defocus, and might indicate that refining defocus won’t help, or might even worsen the map resolution.

Looking at our Homogeneous Reconstruction FSC curves (where the same mask was used for all three) we see that both Global and Local CTF refinement improved the map resolution. For these tests we ran Global and Local CTF refinement in parallel and not in sequence. Both helped, but the fit for Local and Global CTF will depend on each other, so we now have to choose which one to do first. As Global CTF had the greater impact we will start with these particles. We can perform both Global and Local CTF Refinement as part of Non-Uniform Refinement, so we will do that next.

Ab-Initio Reconstruction has a 50% chance of reconstructing your map with the incorrect hand. Before running the next refinement, be sure to check the hand of your map as soon as the features are sufficient to discriminate. For a nucleosome dataset, you can look at the hand of the DNA (it should be right-handed) or you could compare the map to one of the many deposited models of nucleosomes. If the hand is incorrect:

  • Use Volume Tools job, inputting your map and setting Flip hand:true

  • Create a Non-Uniform Refinement job (Refinement 2) with the same settings as Refinement 1, but using the particles from Global CTF Refinement and the nucleosome volume from Heterogeneous Refinement 1 or the hand-flipped Volume Tools job (whichever is appropriate). Set Optimize per-particle defocus, Optimize per-group CTF params, Fit Spherical Aberration and Fit Tetrafoil all to true.

Look at the map and plots compared to Refinement 1.

Figure 10. Plots from NU Refinements 2 and 3, and subsequent Subset particles by statistic jobs. A) GSFSC and cFSC curves from NU Refinement 2. B) Per-particle scales clustered by gaussian fitting (left) or by manual thresholding (right). C) cFSC curves for NU refinements 3 (right) and 3B (left).

We found that the map density and GSFSC were improved in Refinement 2. When we examined the per-particle scale plot we still saw the shoulder that was apparent in Refinement 1. We would like to separate the particles into two clusters to see if removing the low-scale particles improves the refinement!

Sometimes low-scale particles (due to their difficulty with alignment to or dissimilarity to the reference volume) can have a detrimental effect on the refinement, and downstream processes such as 3D Classification and Local Refinement. Where multimodal particle scale distributions are observed (suggesting two or more population types in particle set), it can be better to exclude the low-scale particles from downstream work.

  • Run a Subset Particles by Statistic job, selecting Subset by:Per particle scale. You could use the default subsetting mode that uses gaussian fitting, but we chose to use Subsetting mode:Split by manual thresholds, Number of thresholds:1 and Threshold 1:0.8.

Our output plot is shown in Figure 10B (right) alongside a comparison of the result of using the subsetting mode “Cluster by Gaussian fitting” in Figure 10B (left).

In this case, as it is a shoulder and not a clearly separated peak, we preferred to set the threshold manually to retain more of the particles, at the risk of including some slightly lower-scale particles. Now we want to see if removing these low-scale particles made things any better by re-refining them! We ended up with ~1.7M particles in cluster 1 and ~300k particles in cluster 0.
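
If you want to see how the two subsetting modes behave, the sketch below splits a synthetic bimodal distribution of per-particle scales either by a manual threshold of 0.8 or by a two-component Gaussian fit. The synthetic numbers and the use of scikit-learn are our own illustrative choices, not what the job does internally.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Toy bimodal "per-particle scale" distribution: a main peak plus a low-scale shoulder
    scales = np.concatenate([rng.normal(1.0, 0.10, 17000), rng.normal(0.7, 0.08, 3000)])

    # Mode 1: manual threshold (what we used above, threshold = 0.8)
    cluster1 = scales >= 0.8
    print("manual threshold keeps", cluster1.sum(), "of", scales.size)

    # Mode 2: two-component Gaussian fit, assigning each particle to its most likely component
    gmm = GaussianMixture(n_components=2, random_state=0).fit(scales.reshape(-1, 1))
    labels = gmm.predict(scales.reshape(-1, 1))
    high = labels == np.argmax(gmm.means_.ravel())   # component with the higher mean
    print("gaussian fit keeps   ", high.sum(), "of", scales.size)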

  • OPTIONAL: Create a Non-Uniform Refinement job (Refinement 3) with the same settings as Refinement 2, using the particles from cluster 1 from Subset Particles by Statistic.

  • OPTIONAL: Create a Non-Uniform Refinement job (Refinement 3B) with the same settings as Refinement 2, using the particles from cluster 0 from Subset Particles by Statistic.

We found that the GSFSC and cFAR were marginally improved in Refinement 3, but the resolution and cFAR were poorer in Refinement 3B. As a fairer comparison, we also refined a random subset of particles from cluster 1 so that we were comparing the same number of particles, and this gave us a GSFSC resolution of 2.56 Å and a cFAR of 0.74. From this we can infer that the low-scale particles on their own lead to a poorer reconstruction than an equivalent number of high-scale particles.

It is not strictly necessary to run Refinement 3; if you want a faster way to assess if removal of the low-scale particles helped or hindered your map statistics, you might prefer:

  • OPTIONAL: Run a Homogeneous Reconstruction Only (4) job using the particles from cluster 1 and the mask_fsc_auto from Refinement 2.

We now have a nice set of particles from cluster 1, and a nice map of the nucleosome, but we don’t seem to have any observable density for ALC1, so we will go on to figure out where it is!

7. 3DVA and focus mask design

If you are looking for heterogeneity (discrete or continuous), a great way to inform onward processing choices is to run 3D Variability Analysis. In our case the biochemistry tells us ALC1 should be bound to the nucleosome in this sample and we are looking for its location. The fact that we can’t see it in the maps so far indicates that it is either very flexible, bound in a low stoichiometry (less than 1:1 binding) or a combination of the two.

3DVA can be run at the lowest resolution at which you might hope to visualise your feature of interest. In cases where the feature might be low stoichiometry or highly flexible, low resolutions such as 15 Å can work well.

Before we go ahead though, we need to consider masking. As we don’t know where the ALC1 is, we need to make sure that the mask is generous enough to encompass the potential binding regions, and we don’t want to use the solvent mask from Non-Uniform Refinement (recall that we effectively disabled masking) because using a mask that encompasses the entire box can lead to the 3DVA modes being dominated by box corner artefacts.

  • Create a Volume Tools job, inputting the volume from Refinement 3/Homogeneous Reconstruct 4, and set Lowpass filter (A):15, Type of output volume:mask and Dilation radius (pix):25. For the Threshold we set 0.025, but you may need to adjust this value to suit your input map. This is Mask 1.

  • Create a 3D Variability job inputting the particles from either Refinement 3, Homogeneous Reconstruct 4 or cluster 1 (these three should yield ~the same results) and Mask 1. We will set Filter resolution:15 as we would like to focus on low resolution features, and Only use this many particles:100,000 as at such low resolution we do not need to use the whole dataset.

  • Create a 3D Variability Display job inputting the particles and components from the 3DVA job. We will assess the 3DVA using simple mode so that we can visualise the motions found in the 3 components. The pixel size of the extracted particles is smaller than necessary to visualise low resolution features, so we can use Downsample to box size:128 to make downloading of the volumes faster.

We show the 3 modes that we observed from 3DVA in Movie 1.

Movie 1. Modes from 3DVA.

We found that modes 1 (teal) and 2 (blue) were dominated by the appearance and disappearance of density at the end of the DNA, whereas mode 3 (pink) is dominated by the appearance and disappearance of density bound to the nucleosome. We were unable to identify the additional densities in modes 1 and 2, although based on the biochemistry, it is possible that this is PARP2. The additional density in mode 3 is consistent with the mass of ALC1, and is a reasonable location for a motor to bind to the nucleosome.

Now that we know where the ALC1 binds, we can try to classify the particles to separate those that do, and don’t have ALC1 present. To do this we need to make a new focus mask covering the region of interest. We will do this by creating a difference map between the first and last frames of the 3DVA mode where ALC1 appears.

  • Identify the mode from your 3DVA job that contains ALC1 density and in ChimeraX run the following commands:

    volume subtract #X.1 #X.20
    and
    volume subtract #X.20 #X.1

where X is the model number for the correct mode in your ChimeraX session.

You should find that one of the newly created volumes contains the ALC1 volume and a few other bits and bobs. You can use the Map Eraser tool in ChimeraX to remove the extra bits, save this volume and upload it to your CryoSPARC workspace. The other difference volume can be discarded.

Note that the 3DVA volumes used here were downsampled to a box of 128 meaning that it has 2.73 Å/pix, and that for mask generation, the dilation radius and soft padding width will need to be fewer pixels than if the full box of 416 at 0.84 Å/pix was used.
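
As a concrete example of that scaling (a small worked calculation; the 6 Å dilation is purely illustrative):

    raw_psize, full_box, small_box = 0.84, 416, 128

    down_psize = raw_psize * full_box / small_box   # ~2.73 A/pixel after downsampling
    dilation_A = 6.0                                # an illustrative dilation in Angstroms

    print(f"downsampled pixel size : {down_psize:.2f} A/pix")
    print(f"{dilation_A} A dilation = {dilation_A / down_psize:.1f} pix at {down_psize:.2f} A/pix "
          f"vs {dilation_A / raw_psize:.1f} pix at {raw_psize} A/pix")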

  • Create a mask using Volume Tools starting from the cleaned up difference map created above. The threshold will need to be determined for your own map, but we used Lowpass filter (A):15, Type of output volume:mask, Threshold:0.25, Dilation radius (pix):2, Soft padding width (pix):5. This is Mask 2.

See Figure 11A-C, below, for example density for the creation of Mask 2.

As we are also interested in seeing if ALC1 has bound to the opposite side of the nucleosome, and the previous refinements were all aligned to the pseudo-C2 symmetry axis, we can rotate the mask using a simple operation.

  • Run a Volume Alignment Tools job inputting the volume from Refinement 3/ Homogeneous Reconstruct 4, and Mask 2. Set 3D rotation euler angles (deg or rad) 0,0,180. The output mask is Mask 3.
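
If you would like to sanity-check this rotation outside CryoSPARC, a 180° rotation of a volume about its box centre can be sketched as below. This assumes the pseudo-C2 axis lies along the box z axis (the usual convention after a C2 or relaxed-C2 refinement) and is only a conceptual stand-in for the Volume Alignment Tools operation.

    import numpy as np
    from scipy import ndimage

    # Stand-in for the ALC1 focus mask: an off-centre blob in a 128^3 box
    mask = np.zeros((128, 128, 128), dtype=np.float32)
    mask[70:90, 40:60, 50:80] = 1.0

    # Rotate 180 degrees in the x-y plane, i.e. about the z axis through the box centre
    rotated = ndimage.rotate(mask, 180.0, axes=(0, 1), reshape=False, order=1)
    print(mask.sum(), rotated.sum())   # the rotated mask now covers the symmetry-related site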

For 3D Classification the focus mask should ideally be within the solvent masked region, so we also need to make a more generous solvent mask.

  • Create a Volume Tools job with the same settings as for Mask 1, but set Dilation radius (pix):50. This is Mask 4.

8. 3D Classification of the ALC1 binding sites

Now that we have masks made, we can go ahead and classify our particles to fish out the subpopulations of interest!

  • Run a 3D Classification job (Classification 1) using the particles from either Refinement 3, Homogeneous Reconstruct 4 or cluster 1, with Mask 4 as the solvent mask, and Mask 2 as the focus mask. Input the following settings:

| Parameter | Setting | Explanation |
| --- | --- | --- |
| Number of classes | 40 | Using more classes can help separate low population classes |
| Filter resolution (A) | 15 | A resolution at which we can see the ALC1 density |
| Initialization mode | PCA | We found PCA initial volumes gave us a more reproducible result than using simple mode |
| Class similarity | 0.1 | When looking for density presence/absence, the classes should not be very similar |

  • Run a 3D Classification job (Classification 2) using the same inputs as Classification 1, except using Mask 3 as the focus mask and setting Number of classes:80.

  • OPTIONAL: Test the effect of changing the number of classes up and down, and the quality of the ALC1 after NU refinement of the best class(es)

  • Examine the density for the output classes in ChimeraX.

We show the classes we obtained from 40-class Classification 1 in Figure 11B, and from 80-class Classification 2 in Figure 11C. After adjusting the contour threshold to ~2, we found a single class that had good density for ALC1 in each classification and we coloured these classes in pink. We noted density in other classes nearby the ALC1-binding site that might reflect different stages of ALC1-binding, but we do not go on to investigate those in this case study.

Figure 11. Masking and results from 3D Classification. A) Mask design using volumes of 3DVA frames 0 and 19 from one of the components. B) Volumes from 3D Classification with 40 classes shown at a threshold of 2. C) Volumes from 3D Classification with 80 classes shown at a threshold of 2. The classes that contain ordered ALC1 are coloured in pink.

The choice to use 40 and 80 classes was not based on any prior information about what would work best. We took an empirical approach and tested a range, from 2 up to 80 classes. To assess which classification gave the best result, the ALC1-containing class(es) for each classification were NU-Refined and the ALC1 density was compared (see Figure 12).

Figure 12. NU Refinement of the best classes after Classification with a different number of classes.

We found that for the 2-40 class classifications, only a single class had density for ALC1 (at varying strength), but the 80-class classification had good ALC1 density in two classes. After Non-Uniform Refinement the map quality was the best from the 40-class classification. Although the refined maps from the 2-class and 4-class classifications don’t initially appear to have discernible ALC1 density after Non-Uniform Refinement, ALC1 is visible if the map is low-pass or gaussian filtered.

In a refined map, a region can have weak density (i.e., appearing only at a low threshold or after low-pass filtering the map) due to partial occupancy or flexibility. The same effect can also happen when particle classification is incomplete, as particles that are missing the density and those that include the density are mixed in the final reconstruction. 3D Classification can, in some cases, produce particle classes where classification has not completely separated the states, and class sizes are uniform. In these cases, it can help to explore using large numbers of classes, much larger than the number of states you expect in the sample.
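
As a toy illustration of the partial-occupancy effect described above (our own simplified sketch, ignoring noise and alignment error), the density reconstructed in a region is roughly the occupancy-weighted average of the bound and unbound states, which is why a bound population of only a few per cent is essentially invisible in a consensus map:

    bound, unbound = 1.0, 0.0   # relative density in the ALC1 region for the two states

    for occupancy in (1.0, 0.5, 0.2, 0.05):
        mixed = occupancy * bound + (1 - occupancy) * unbound
        print(f"occupancy {occupancy:5.0%} -> relative ALC1 density ~{mixed:.2f}")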

You will notice that both classification jobs have the same particles as inputs and were effectively run in parallel. It is possible that there is a subpopulation of particles that are found in both of the selected classes and hence have ALC1 density on both sides at the same time. We can find these particles by comparing the particle stacks.

  • Run a Particle Sets Tools job inputting the good class particles from Classification 1 in Particles (A) and the good class particles from Classification 2 in Particles (B) and setting Action intersect.

We ended up with 65.4k particles with ALC1 in one site, 28.9k particles with ALC1 in the other site, and ~760 particles that have ALC1 density in both sites, comprising ~3.9% (site 1), ~1.7% (site 2) and ~0.05% (double-bound) of the input particles. We recall that there are theoretically two possible binding sites for ALC1; one might have a higher affinity for PARP binding than the other. We see a larger population of particles (3.9%) in one position, and this position is consistent with the expected preferred PARylation site described in Bacic et al.

Why didn’t we symmetry expand before classification in this case?

The core nucleosome in this sample has intrinsic asymmetry in the DNA lengths. As the population of ALC1 bound nucleosomes is relatively small, the symmetry relaxed pose-assignment of particles before classification would be based on the DNA lengths at each end. Classifying using such aligned particles allowed identification of the particles sets with ALC1 bound to the two distinct sites.

If we symmetry expanded and then classified, we might find the same total population with ALC1 bound; however, any downstream local refinements would have lost information about the populations of, and differences between, the two sites.

  • Run a Non-Uniform Refinement job with the same settings as Refinement 2 (Refinement 4A) using the particles from A minus B of the Particle Sets Tool job and the volume for the corresponding class from classification.

  • Run a Non-Uniform Refinement job with the same settings as Refinement 2 (Refinement 4B) using the particles from B minus A of the Particle Sets Tool job and the volume for the corresponding class from classification.

For the intersection of particles that might contain two copies of ALC1, we don’t have a volume that represents this, so we need to perform a reconstruction before we refine these particles.

  • Run an Ab-Initio job (3) using the particles from Intersection (keeping A data) of the Particle Sets Tools job.

We use Ab-Initio at this point, rather than homogeneous reconstruction to ensure that we have not added artificial bias. If, despite symmetry relaxation, a small population of particles during NU refine were aligned incorrectly by 180 degrees, this could result in the appearance of ALC1 density on both sides of the nucleosome in a reconstruction, and using a reference volume with density on both sides could bias the refinement. When in doubt re-running Ab-Initio is safer.

  • Run a Non-Uniform Refinement job with the same settings as Refinement 2 (Refinement 4C) using the particles and volume from Ab-Initio 3, and setting Optimize per-particle defocus:False and Optimize per-group CTF params:False.

We turn off local and global CTF refinement here because there are very few particles and any updated estimations are likely to be much poorer than the input values.

Examine the maps and compare to Refinement 2, ensuring that you have aligned the particles in the correct way so that the long end of the DNA matches. We show example maps in Figure 13. We can clearly see defined density for ALC1 in Refinements 4A and 4B, and although the map quality is poor in Refinement 4C we can clearly see density for ALC1 bound to both sites, while the long and short ends of the DNA are still discernible. We noted that the map for EMDB:13065 matches site 1 that we find here (purple map in Figure 13).

Figure 13. Processing steps and unsharpened maps produced before and after Classification and particle intersection.

We were interested in testing whether the presence of ALC1 had an impact on orientation sampling or other orientation diagnostics. To make this comparison fair we should use the same number of particles, so we needed to make a subset of the ALC1-free particles to refine. First we need to obtain a stack that does not contain well-ordered ALC1.

  • OPTIONAL: Run a Particle Sets Tool job, inputting the three particle sets from NU Refinements 4A-C in Particles (A), and into Particles (B) drag the particles from Refinement 3/Homogeneous Reconstruct 4, or the high-scale cluster from Subset Particles by Statistic (all three should contain the same particle set). Use intersect mode.

Now we want to get an equal sized particle stack for comparison

  • OPTIONAL: Run a Particle Sets Tool job with the B minus A particle set from above (i.e., the particles without well-ordered ALC1) and use split mode, setting Split batch size to the number of particles you have in Refinement 4A and Split randomize:True.

  • OPTIONAL: Run a Non-Uniform Refinement job (4D) with the same settings as Refinement 4A, using the particles from split 0.

  • OPTIONAL: Run Orientation Diagnostics on Refinements 4A and 4D.

Figure 14. Orientation diagnostics for Refinements 4A and 4D. A and B) GSFSC and cFSC curves. C and D) Orientation sampling shown in plot and bild format. E and F) Fourier sampling plots.

From Figure 14A and 14B we can see that the cFAR is a bit lower in Refinement 4A, and digging deeper we see in Figure 14C and 14D that there is apparently a stronger orientation bias when ALC1 is bound. This also negatively impacts the Fourier sampling (Figure 14E and 14F); however, the cFAR is still above 0.5, there are no Fourier bins that are empty, and our map did not show obvious signs of streaking or distortion that can indicate pathological orientation bias.

If a different orientation distribution is found after 3D classification, this can indicate a genuine bias difference on the grid, such as from interaction with the air-water interface or support-water interface. As 3D classification relies on being able to distinguish between states from 2D views, if the density difference is relatively small, there is also the possibility that in some views, the region of density difference is obscured by larger density regions, and cannot easily be separated into different states. This could potentially lead to an apparent difference in orientation bias that might not recapitulate the sample behaviour on the grid.

As the entry for EMDB:13065 also contains the half maps and mask, we were able to also run Orientation Diagnostics on that map for comparison, and found a cFAR of 0.46.

Considering the noisy nature of individual particle images, are these classified population numbers reliably related to solution populations of states? Maybe and maybe not! There are a few aspects that we can consider:

  1. Are some classes more prone to damage during vitrification and being thrown away during particle curation?

  2. Is particle quality sufficient for every particle to be perfectly aligned and classified?

Population differences might be a useful lead to build hypotheses, but it is a good idea to validate these observations by another technique wherever possible.

At this point we need to decide whether to keep the two particle sets separate, or if we want to combine them to potentially allow for a higher resolution for ALC1 at a cost to any unique information about the two sites. This sort of choice will need to be taken on a case-by-case basis as there is the possibility of losing information about symmetry-breaking features when symmetry-related sites are combined. We could not see any meaningful differences in the density for ALC1 in the two maps, and so we chose to combine them in a single refinement.

  • Run a Non-Uniform Refinement job with the same settings as Refinement 2 (Refinement 4E), using the particles from Refinements 4A and 4B and the volume from Refinement 4A. Drag the lower-level slot for ctf from Refinement 3, Homogeneous Reconstruct 4 or cluster 1 into both particle inputs for the refinement.

As we performed CTF refinement separately during NU Refinement of the two particle sets, the values will not be consistent when we join them and CryoSPARC will not allow the job to proceed. To overcome this sort of situation we can simply swap in the CTF values from a refinement before classification, so that the values are consistent again and the particles can be refined together.

9. Reference-based motion correction

Now that we have a map that has density for ALC1, it looks like it will be good enough to fit and refine a model into, so we can push it further with Reference-Based Motion Correction to enhance the map quality and resolution and hopefully make the modelling more reliable. Note that if you are using the downloaded particles, you will need to skip this step and move on to Section 10.

  • OPTIONAL: Run a Reference-Based Motion Correction job using the Non-Uniform Refinement inputs (particles, map and mask) from Refinement 4E, and curated micrographs from Section 2. Change the lower-level information slot for the input mask by dragging over the volume.mask_fsc. Set Save results in 16-bit floating point:true. Use more GPUs to accelerate the job if you have the resources.

  • OPTIONAL: Create a Non-Uniform Refinement job with the same settings as Refinement 2 (Refinement 5) using the particles from RBMC.

    Figure 15. Empirical Dose Weights and Example Trajectories from RBMC.

The RBMC job took around 16 hours to run on 2 GPUs, and example outputs for Empirical Dose Weights and Trajectories are shown in Figure 15. We found that the map resolution after RBMC improved from 2.78 Å to 2.54 Å, however the map quality of the NU refinement was not substantially improved. We did find that the downstream Local Refinement map was better quality when RBMC was included in the pipeline. Considering the length of time required for RBMC, the choice to include/exclude it may depend on the balance between result quality and time/resource constraints.

Looking at the ALC1 density from Refinements 4E and/or 5, you might notice that it is still not as good quality as that of the histone proteins at the core of the nucleosome. This is likely due to a degree of flexibility or heterogeneity in the binding of ALC1, and we can try to further improve the ALC1 density by locally refining this region.

10. Local refinement of ALC1

It appears in our map that the core nucleosome is relatively large and rigid compared to ALC1, and so it dominates the particle alignment. We can use Local Refinement to align ALC1 a bit better, and for this we need to think about masking strategies.

  • Use the Map Eraser tool in ChimeraX on Refinement 4E or 5, and remove all but the ALC1 density and a little of the DNA to which it is bound. We show an example of the erased volume that we used in Figure 14.

  • Upload this volume to your CryoSPARC workspace and run an Import 3D Volumes job to import it.

  • Create a Volume Tools job, input the above volume and set Lowpass filter (A): 15, Type of output volume: mask, Threshold: [whatever is appropriate for your volume; we used 0.019], Dilation radius (pix): 4, and Soft padding width (pix): 15. This is Mask 3 (a conceptual sketch of this mask-generation step is shown after the parameter table below).

  • Run a Local Refinement job, inputting the particles, mask and volume from Volume Alignment Tools, and with the following settings:

| Parameter | Setting | Explanation |
| --- | --- | --- |
| Use pose/shift gaussian prior during alignment | true | Using priors penalises deviation too far from the input poses |
| Standard deviation (deg) of prior over rotation | 3 | A strict range |
| Standard deviation (A) of prior over shifts | 2 | A strict range |
| Re-center rotations each iteration? | true | |
| Re-center shifts each iteration? | true | |
| Initial lowpass resolution (A) | 7 | The resolution at which the FSC is still ~1 |
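The mask-generation recipe used for Mask 3 above (low-pass filter, threshold, dilate, soften) can be pictured with a short NumPy/SciPy sketch. This is a conceptual illustration only, not CryoSPARC's exact implementation; the input file name is a placeholder for your own imported erased volume, and the parameter values simply echo the ones we used.

```python
# Conceptual sketch of what Volume Tools does for Mask 3: low-pass filter the
# erased ALC1 volume, threshold it, dilate the binary mask, then add a soft
# cosine edge. Not CryoSPARC's exact implementation; parameters are examples.
import numpy as np
import mrcfile
from scipy.ndimage import gaussian_filter, binary_dilation, distance_transform_edt

with mrcfile.open("alc1_erased.mrc") as m:           # hypothetical file name
    vol = m.data.astype(np.float32)
    apix = float(m.voxel_size.x)                     # A per pixel

# 1) Low-pass filter (Gaussian approximation of a 15 A cutoff)
sigma_px = 15.0 / (2.0 * np.pi * apix)
smooth = gaussian_filter(vol, sigma_px)

# 2) Threshold and 3) dilate by 4 pixels
binary = smooth > 0.019
binary = binary_dilation(binary, iterations=4)

# 4) Soft cosine falloff over 15 pixels outside the binary region
dist = distance_transform_edt(~binary)
soft = np.where(dist < 15, 0.5 * (1 + np.cos(np.pi * dist / 15)), 0.0)
mask = np.where(binary, 1.0, soft).astype(np.float32)

with mrcfile.new("mask3_soft.mrc", overwrite=True) as out:
    out.set_data(mask)
    out.voxel_size = apix
```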

We also compared this refinement to one that used default settings for local refinement, and found the map produced with default settings to be slightly poorer in quality after sharpening.
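To picture what the pose/shift priors in the table above are doing relative to those unconstrained defaults, the toy calculation below adds a Gaussian penalty to a made-up per-candidate alignment score, so that candidate poses far from the input pose are disfavoured. The numbers are invented purely for illustration and bear no relation to CryoSPARC's internal scoring.

```python
# Toy illustration of a Gaussian pose prior: candidate rotations far from the
# input pose are penalised, so alignment cannot wander far even when the
# masked region alone gives a noisy, nearly flat score. Numbers are invented.
import numpy as np

sigma_rot = 3.0                                   # prior std dev in degrees
rot_offsets = np.array([0.0, 2.0, 5.0, 10.0])     # candidate deviations (deg)
data_score = np.array([0.0, 0.3, 0.5, 0.6])       # made-up log-likelihood gains

prior_penalty = -0.5 * (rot_offsets / sigma_rot) ** 2
total = data_score + prior_penalty
print("best candidate:", rot_offsets[np.argmax(total)], "deg")   # stays near 0-2 deg
```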

  • OPTIONAL: Run a Local Refinement job (Local Refinement 1B) with default settings, inputting the particles and volume from your last ALC1-containing refinement (Refinement 4E or 5) and Mask 3.

Figure 16. Volumes and masks used for local refinement. Unsharpened volumes from NU and Local refinement are shown along with the mask used for local refinement.

We can see from Figure 16 that the map quality for ALC1 is improved after local alignment within the masked region, making it more interpretable and better for model-building.

11. Map sharpening, assessment of map quality, and comparison to the original deposited map

At the end of refinement it is always worth assessing if the auto-sharpening has applied an appropriate B-factor. We want the sharpening to enhance high resolution features, but without causing map fragmentation, excessive noise, or creating sharpening artefacts such as unexpected density extending from the map. As cryo-EM maps tend to contain a range of resolutions, picking a single B-factor to sharpen means taking a compromise value where the high-resolution regions are not as sharp as they could be, and the low-resolution regions are not as connected as they could be.
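For reference, applying a B-factor is just an exponential reweighting of the map's Fourier amplitudes. The sketch below shows that basic operation (it omits the FSC-based filtering that refinement outputs typically also receive); the file name is a placeholder and the box is assumed to be cubic.

```python
# Basic B-factor sharpening: scale Fourier amplitudes by exp(-B * s^2 / 4),
# where s is spatial frequency in 1/A. A negative B boosts high resolution.
# This omits any FSC-based filtering that CryoSPARC's outputs also include.
import numpy as np
import mrcfile

bfactor = -65.0                                    # in A^2 (negative sharpens)

with mrcfile.open("local_refine_unsharpened.mrc") as m:   # hypothetical name
    vol = m.data.astype(np.float32)
    apix = float(m.voxel_size.x)

freqs = np.fft.fftfreq(vol.shape[0], d=apix)       # 1/A; assumes a cubic box
s2 = freqs[:, None, None] ** 2 + freqs[None, :, None] ** 2 + freqs[None, None, :] ** 2

ft = np.fft.fftn(vol)
sharpened = np.real(np.fft.ifftn(ft * np.exp(-bfactor * s2 / 4.0)))

with mrcfile.new("local_refine_sharpened_b65.mrc", overwrite=True) as out:
    out.set_data(sharpened.astype(np.float32))
    out.voxel_size = apix
```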

  • Examine the unsharpened and sharpened maps from Local Refinement, and see if you are happy with the level of sharpening applied. We felt that the map was very slightly over-sharpened, due to the appearance of spiky noise extending from the protein that is unlikely to represent real features at this resolution. If you would like to change the map sharpening:

  • OPTIONAL: Run a Sharpening Tools job, inputting the Local Refinement map and mask, and setting B-Factor to apply to your desired value (we chose -65).

We show example maps for unsharpened, automatically-sharpened and manually sharpened maps in Figure 17A.

Figure 17. Sharpening and Local Resolution. A) The unsharpened, auto-sharpened (B-factor -74.9) and manually sharpened (B-factor -65) maps from Local Refinement. B) EMDB:13065 coloured by local resolution. C) NU Refinement 5 coloured by local resolution. D) Local Refinement coloured by local resolution.

Assessing the resolution of a local refinement can be tricky, as the masked region can be hard to delineate and there will inevitably be some density outside of the local refinement mask. As well as looking at the GSFSC resolution of the map, we can also assess and compare the local resolutions for ALC1 in the last NU Refinement and the Local Refinement. Before proceeding we want to ensure that all of the meaningful parts of the map (i.e. the whole nucleosome and ALC1) are inside the mask, to avoid the situation where voxels outside it are labelled as having a resolution of 0.
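Before running the jobs below, you can sanity-check that the dilated mask really does enclose all of the meaningful density. A quick way to do so is sketched here; the file names and the map threshold are placeholders for your own values.

```python
# Quick check that Mask 5 encloses essentially all of the thresholded map
# density, so no real features get assigned a local resolution of 0.
import mrcfile

with mrcfile.open("refine5_map.mrc") as m:         # hypothetical file names
    vol = m.data
with mrcfile.open("mask5.mrc") as m:
    mask = m.data

map_threshold = 0.1                                 # use your own display threshold
above = vol > map_threshold
fraction = (above & (mask > 0.5)).sum() / max(above.sum(), 1)
print(f"{fraction:.1%} of above-threshold voxels are inside the mask")
```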

  • Run a Volume Tools job, inputting the mask from the last NU Refinement and dragging the mask_fsc into the lower-level slot. Set Type of input volume: mask, Type of output volume: mask, Threshold: 1, and Dilation radius (pix): 10. This is Mask 5.

  • Run a Local Resolution Estimation job, inputting the map from Local Refinement and Mask 5.

  • Run a Local Resolution Estimation job inputting the map from the latest NU Refinement and Mask 5.

  • Run a Local Filtering job for each of the two jobs above.

  • Visualise the local resolution in ChimeraX by using the Surface Colour function.

We also ran a Local Resolution Estimation job using the half maps from EMDB:13065 (the map originally deposited for this dataset) for comparison. We show these three in Figure 17B-D. We observe a striking difference in resolution, with the deposited map ranging from ~5-7 Å for ALC1 and our locally refined map ranging from ~2.5-3.5 Å. This improvement in resolution comes with significantly better map clarity, allowing a model to be more confidently built and refined.
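If you want numbers rather than a visual comparison, a local-resolution map can be summarised within a region of interest. The sketch below reports the median and spread of estimated resolution inside an ALC1 region mask; the file names are placeholders, and such a region mask would need to be made separately (e.g. from the erased ALC1 volume used for Mask 3).

```python
# Summarise a local-resolution map inside a region of interest (e.g. ALC1),
# as a numeric complement to colouring the surface in ChimeraX.
import numpy as np
import mrcfile

with mrcfile.open("local_resolution_map.mrc") as m:    # hypothetical names
    locres = m.data
with mrcfile.open("alc1_region_mask.mrc") as m:
    region = m.data > 0.5

values = locres[region]
values = values[values > 0]                            # ignore unestimated voxels
print("median %.2f A, 10-90%% range %.2f-%.2f A" %
      (np.median(values), *np.percentile(values, [10, 90])))
```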

We updated the model for ALC1, starting from PDB:7enn, so that it matched our map. We found that after real space refinement in PHENIX, the Q-score for just the ALC1 protein was 0.74, compared to a value of 0.33 for the ALC1 model from 7otq against EMDB:13065 (see Figure 18), indicating a substantial improvement in the model-to-map fit and confidence.

We were also able to see density for the histone H4 N-terminus that forms contacts with ALC1, and to predict hydrogen bonding and salt bridges that are likely involved in the binding of ALC1 to DNA (see Figure 18A).

Figure 18. Map and model fits for our updated Local Refinement, and for the original map and model for EMPIAR-10739. A) Sharpened map density and our updated model at the interface between ALC1 and the DNA, and with histone H4 N-terminus. B) The sharpened (B-factor -65) Local Refinement map shown with partial transparency and coloured by proximity to our updated model. C) The EMDB:13065 map shown with partial transparency and coloured by proximity to PDB:7otq.

Conclusions

In this case study, we focussed on the processing of EMPIAR-10739, which contains compositional heterogeneity. As this nucleosome is expected to display pseudosymmetry, we chose to refine using C2 with symmetry relaxation, to avoid symmetry-breaking features being partially present at both symmetry-related positions. The particle stacks after gross cleaning achieved a good resolution for the nucleosome, but we were able to further improve this by performing:

  • Global CTF Estimation

  • Local CTF Estimation

  • Subset particles by Statistic on the basis of particle scale

This map did not have clear density for ALC1 and so we investigated the compositional heterogeneity through:

  • 3DVA with a generous solvent mask

  • Creating a difference map to aid ALC1 mask generation

  • 3D classification with 40 and 80 classes to separate out the best ALC1 classes

Refinement of the selected 3D classes gave better density for ALC1, but it was still fragmented at the periphery, so we performed Local Refinement to further improve the map clarity.

Our aim was to obtain a map containing ALC1 suitable for molecular modelling; compared to the original published processing pipeline that led to EMDB:13065, we were able to improve the resolution for ALC1 from ~5 Å to ~3 Å, and the cFAR from 0.46 to 0.61.

During this case study we focused solely on obtaining a clear map of ALC1 bound to the nucleosome; however, this dataset shows a high degree of heterogeneity and there may be many other interesting regions and classes to investigate! You could try applying the same sort of rationale to different classes or regions of the nucleosome to see what else can be found!

You can download our versions of the final maps, half maps and masks from the links below for comparison with your own processing!

References

Luka Bacic, Guillaume Gaullier et al. (2021) Structure and dynamics of the chromatin remodeler ALC1 bound to a PARylated nucleosome. eLife 10:e71420.

We thank Dr Guillaume Gaullier for providing background context to the dataset, and enthusiasm towards this reprocessing case study!
