Case Study: End-to-end and exploratory processing of a motor-bound nucleosome (EMPIAR-10739)
Processing EMPIAR-10739 including handling global pseudosymmetry, using 3DVA to guide classification strategies, separating low population classes, and local refinement of a flexible region.
Introduction
In this case study we will work step-by-step through the full processing pipeline for the human Chromodomain-helicase-DNA-binding protein-like 1 (CHD1L), also known as Amplified in Liver Cancer 1 (ALC1), bound to the Xenopus laevis nucleosome, using a dataset originally collected and processed by Bacic, Gaullier et al., deposited as EMDB:13065 and PDB:7otq. The raw data are publicly available for download as EMPIAR-10739. This case study is written so that you can replicate these results yourself.
We selected this dataset to provide an example processing pipeline for a protein-DNA complex that has sub-stoichiometric (less than 1:1) binding of a partner protein, and the image processing was performed using CryoSPARC v4.7.
Nuclear DNA in most eukaryotic cells is stored within the nucleus in a stabilised structure called chromatin. Within chromatin, DNA is wound around a core of histone proteins, typically H3, H4, H2A and H2B, forming units called nucleosomes that together have a “beads on a string” appearance. When the DNA in or near a nucleosome becomes damaged by a single- or double-strand break, a post-translational modification called poly(ADP-ribose) (PAR) gets added to histone amino acids (usually serine residues) near the damaged DNA site by Poly(ADP-ribose) polymerases PARP1 or PARP2. This process depends on Histone PARylation Factor 1 (HPF1) which directs PARylation to serine residues. PARylation of the histones then facilitates recruitment and binding of ALC1. ALC1 is responsible for ATP-dependent chromatin remodelling that is required for DNA repair and has been proposed as a potential target for drug development against cancer. We show the overall architecture of the ALC1-bound nucleosome found from EMPIAR-10739 (EMDB:13065).

The nucleosomes here were assembled in vitro using a synthetic nucleosome-positioning DNA sequence called Widom 601, which in this case was engineered to have different lengths at each end (one long, with 10 additional base pairs extending out of the nucleosome, and one short). The side with longer DNA is more easily accessible to PARP, so PARP binding, PARylation, and downstream ALC1 binding might preferentially occur on one side of the nucleosome due to this asymmetry in DNA length around the nucleosome. This is important prior information that we will consider during our processing. The sample was also prepared using an inhibitory ATP analogue that prevents ALC1 catalysis after binding.
The primary aim of this pipeline is to separate out a population of nucleosome particles that have good density for the bound ALC1, and to resolve the ALC1 to an adequate resolution to observe its binding interactions with the nucleosome. The case study also includes some optional exploratory processing in blue text, some of which informed our decision-making for the final pipeline, and we show those results here to explain the choices made.
A summary of the pipeline of CryoSPARC jobs used in the sections of this case study is shown in a flow diagram below.

Setting up
This is a fairly large dataset with 34k movies, so ensure that you have sufficient storage disk space for the download (~14.5 TB) and processing (~ 1.2 TB) before commencing.
Before beginning this tutorial, you should create a new project and a workspace within that project. Download the 33,498 movies to a location of your choosing. For example, our data is downloaded to a directory called rawdata using the commands:
cd /path/to/rawdata
wget -m ftp://ftp.ebi.ac.uk/empiar/world_availability/10739
1. Movie import and pre-processing
Import the data using an Import Movies job. The Movies data path should match the location of the directory containing the downloaded .tiff files, for example: /path/to/rawdata/ftp.ebi.ac.uk/empiar/world_availability/10739/data/batch-1/Images-Disc1/GridSquare*/Data/FoilHole*fractions.tiff.
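Before queuing the import, it can be worth confirming that the wildcard actually matches the expected number of movies (33,498 across both batches). A minimal Python check, assuming the download location used above (the path is illustrative and should be adjusted to your own rawdata directory):

from glob import glob

# Illustrative path; edit to match your own download location.
pattern = ("/path/to/rawdata/ftp.ebi.ac.uk/empiar/world_availability/10739/"
           "data/batch-*/Images-Disc1/GridSquare*/Data/FoilHole*fractions.tiff")

movies = glob(pattern)
print(f"Found {len(movies)} movies")  # expect 33,498 in total; point each Import Movies job at a single batch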
These movies are already gain corrected so you do not need to add any gain reference.
Add in the experimental information below that we obtained from the EMD:13065 entry.
Raw pixel size (A): 0.84
Accelerating Voltage (kV): 300
Spherical Aberration (mm): 2.7
Total exposure dose (e/A^2): 45
As the data were collected in two batches, run one import job for each batch, so that they are designated different exposure groups that will later on in the pipeline allow independent refinement of Global CTF parameters that may differ between the two. Now we want to correct for beam-induced motion during movie collection and to estimate the defocus values for each micrograph.
If you are using the downloaded particles, instead:
Create an Import Results Group job and navigate to the absolute path of the file J8045_split_0.csg and queue the job to import the particles. You can run Ab Initio 1 from Section 4, then use this volume as an input for Non-Uniform Refinement 1 in Section 5, for downstream steps.
Run a Patch Motion Correction job for each import job, followed by Patch CTF Estimation.
In Patch Motion, if you set Save results in 16-bit floating point: true, then the output images will take up half of the disk space compared to the default 32-bit floating point files, with no reported loss in accuracy. The Number of GPUs to parallelize can be set depending on GPU availability; on a single GPU we found each Patch Motion job took ~28-36 hours. This runtime could be reduced substantially by running Patch Motion on multiple GPUs.
Run Patch CTF Estimation with the default settings.
2. Excluding poor micrographs from downstream processing
Movies collected for single particle cryo-EM can have a number of different characteristics. Some of these, like a range of defocus values, or a range of ice thickness can be beneficial for image reconstruction, but movies can also contain junk and outlier attributes that reduce the quality of their particles such as excessive in-movie motion and ice that is too thick or too thin for your sample. Common junk can come in the form of non-vitreous ice and contamination with ice crystals or other features such as the edge of the holey support.
We can use the Micrograph Junk Detector to identify regions of junk, and to give us statistics about the types and quantity of junk present in the data.
Run a Micrograph Junk Detector job and have a look at the example images and summary graphs

We see from the example outputs in Figure 1 that the edge of the holey carbon support (green region) is within the imaged area, and that there are some small regions of ethane contaminants or small ice crystals (magenta regions). The summary graphs indicate that while many of the images contain carbon edge and extrinsic ice defects, the area of the micrographs that these take up is relatively low at ~5-6%.
We will inspect the CTF estimation and junk detection statistics for each micrograph using a Manually Curate Exposures job, so that we can exclude images of poor quality from downstream processing.
Run a Manually Curate Exposures job, inputting the “Labelled Micrographs” from Mic. Junk Detector.
In the job card view, go to the pink Interactive tab.
Try plotting Relative Ice Thickness against Intrinsic Ice Defect Area % to look for a correlation.
Set thresholds for outliers on undesirable characteristics.
We chose to select the following thresholds:
CTF fit resolution: max 5 Å (exclude the poorest resolution micrographs)
Relative Ice Thickness: max 1.2 (exclude very thick ice with poor signal-to-noise)
Total full-frame motion distance: 20 (exclude outliers with large motion)
Intrinsic Ice Defect Area %: max 10 (exclude images with substantial amounts of non-vitreous ice)
We might expect a correlation between the Intrinsic Ice Defect Area % and Relative Ice Thickness because both statistics are likely to pick up micrographs with very thick or non-vitreous ice, but they do so by different methods:
Relative Ice thickness is based on the amount of power in the ice resolution band compared to the background
Intrinsic Ice Defect is based on visual similarities to a manually segmented set of micrographs containing non-vitreous ice
There are occasionally cases where one statistic identifies a problematic micrograph and the other does not. In our Manually Curate Exposures job, a threshold of 1.2 for Relative Ice Thickness excluded 4,737 micrographs, but adding a threshold of 10% for Intrinsic Ice Defect Area rejected a further 333 micrographs. Plots for Relative Ice Thickness and Intrinsic Ice are shown in Figure 2A.
In Figure 2B we have an example of an accepted micrograph that is typical for this dataset. In Figure 2C we show a micrograph with high relative ice thickness, which the Micrograph Junk Detector did not identify as having an Intrinsic Ice Defect. On the other hand, in Figure 2D we see a micrograph identified as containing a substantial amount of Intrinsic Ice Defect, but with low relative ice thickness. We therefore find it beneficial to use both of these statistics to get the best coverage of images containing excessively thick or non-vitreous ice. This left us with ~28k micrographs from the full dataset.
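The curation itself is simply an AND of the chosen thresholds. As a sketch of what the interactive selection is doing (with made-up exposure statistics and field names, not real CryoSPARC outputs):

# Illustrative only: each exposure represented by the four curated statistics.
exposures = [
    {"ctf_fit_res": 3.4, "rel_ice_thickness": 1.05, "motion_px": 8.2, "intrinsic_ice_pct": 1.0},
    {"ctf_fit_res": 6.1, "rel_ice_thickness": 1.10, "motion_px": 9.0, "intrinsic_ice_pct": 0.5},
    {"ctf_fit_res": 3.9, "rel_ice_thickness": 1.35, "motion_px": 7.5, "intrinsic_ice_pct": 12.0},
]

def accept(e):
    return (e["ctf_fit_res"] <= 5.0            # CTF fit resolution (Å)
            and e["rel_ice_thickness"] <= 1.2  # relative ice thickness
            and e["motion_px"] <= 20           # total full-frame motion distance
            and e["intrinsic_ice_pct"] <= 10)  # intrinsic ice defect area %

kept = [e for e in exposures if accept(e)]
print(f"kept {len(kept)} of {len(exposures)} exposures")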

3. Micrograph denoising and blob picking
Now that we have excluded the poor micrographs we can go ahead and denoise the micrographs using the Micrograph Denoiser. This can make downstream picking more consistent across defocus values, and the more prominent appearance of particles makes thresholds for particle picks easier to choose.
Create a Micrograph Denoiser job, inputting the exposures_accepted, and set Number of CPUs to parallelize to the number of CPUs available on your node.

We will now move on to perform blob picking on these cleaner-looking images. There are many examples of nucleosome structures in the PDB, so we can look at those to estimate the diameter of the particles to pick. To us it looked like a circular diameter in the region of 150-180 Å should contain a single nucleosome particle.
Create a Blob Picker job with the following settings:
Minimum particle diameter (A): 150
Maximum particle diameter (A): 180
Pick on denoised micrographs: true
Number of mics to process: 5000
We will use just 5,000 micrographs at this stage, as the blob-picked particles will only be used to generate 2D class average templates for later template picking on the full dataset. We already detected junk regions in the micrographs, so we can automatically reject picks within and near those areas.
Run a second Micrograph Junk Detector job and input the exposures and particles from the blob picking job.
We found that relatively few particles were rejected at this stage, only 60k out of 1.5M, likely because we already rejected the worst micrographs.
Run an Inspect Picks job and look at the first few micrographs, and at the ncc/power score plot.
Looking at the ncc/power score plot we can see there is one main cluster of picks (yellow on the heat map) with a centre around a power score of 100.
While we could set the NCC and power score thresholds manually (see Figure 4A), we might be able to make a better selection by using the Auto clustering option (see Figure 4B).

Make a new Inspect Picks job, this time setting Auto Cluster: true and Target power score: 100.
We ended up with 860k particles which were suitable for taking forward to extraction.
The longest diameter of the nucleosome is ~100 Å, but we expect the ALC1-bound particle to be larger, so we initially chose a generous box size of 350 pixels (294 Å) to be sure that we extracted the ALC1 as well as the nucleosome.
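As a quick sanity check on these numbers, the physical box size and the best resolution retained after Fourier cropping follow directly from the pixel size. A back-of-envelope calculation (not a CryoSPARC command) might look like this:

pixel_size = 0.84   # Å/pixel (raw)
box = 350           # extraction box size, pixels
crop_box = 64       # Fourier-cropped box size, pixels

box_angstrom = box * pixel_size          # physical box edge, ~294 Å
cropped_pixel = box_angstrom / crop_box  # effective pixel size after cropping, ~4.6 Å
nyquist = 2 * cropped_pixel              # best representable resolution, ~9.2 Å

print(f"box = {box_angstrom:.0f} Å, pixel after cropping = {cropped_pixel:.2f} Å, Nyquist = {nyquist:.1f} Å")

This is why the 2D classification jobs below are limited to ~9-10 Å resolution while working with the Fourier-cropped particles.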
Create an Extract from Micrographs (CPU or GPU) job, inputting the particles and micrographs from Inspect picks with the following settings:
Extraction box size (pix): 350 (this box generously includes the nucleosome target)
Fourier-crop to box size (pix): 64 (Fourier cropping to a smaller box saves disk space and allows jobs to run faster)
Save results in 16-bit floating point: true (using float-16 format saves disk space)
Number of CPU cores: 64, or as many as you have on your node/workstation (set only for the "Extract from micrographs (CPU)" job type)
Number of GPUs to parallelize: 4, or as many as you have on your node/workstation (set only for the "Extract from micrographs (GPU)" job type)
4. Cleaning particles to generate 2D templates, and template picking
We want to use our blob picked particles to generate nice 2D templates with a variety of particle views that can be used for template picking.
To begin with we will use 2D classification to throw out the worst of the junk particles.
Run a 2D Classification job with the extracted particles and the following settings:
Number of 2D classes: 100 (using more classes allows us to find a greater diversity of junk types)
Maximum resolution (A): 10 (the resolution of these images is limited to ~9 Å due to Fourier cropping)
Initial classification uncertainty factor: 1 (setting this lower than 2 encourages greater diversity of classes, which is good for identifying junk)
2D zero pad factor: 1 (reducing padding allows the job to run faster)
Number of GPUs to parallelize: 4 (use as many as are available on your node or workstation)
Now we want to take a look at the 2D class averages and select the best ones
Run a Select 2D Classes job and select the class averages that contain views of the intact target like the examples in Figure 5.

In some of our selected classes we can see density outside of the core nucleosome (shown inside dotted circles in Figure 5) and we can use the ruler to get a better estimate of the particle diameter. It looks like the ALC1-bound nucleosome has a diameter of around 120 Å. Some class averages do not have obvious density for ALC1, but at this stage we will keep all the good classes in and handle separation of ALC1-free and ALC1-bound nucleosome later on in the pipeline.
The diversity of views in these selected classes is OK, but with a little more particle clean-up we can probably do better, so we will remove more junk in 3D using Ab-Initio Reconstruction and Heterogeneous Refinement, and then generate better 2D templates for picking.
As we already separated some good particles from bad ones after blob picking, we can use those particles to generate low resolution volumes in two separate jobs:
Run an Ab-Initio Reconstruction job (1) with Number of classes: 1, inputting the "particles selected" from your Select 2D job, and Num particles to use: 20,000.
Run an Ab-Initio Reconstruction job (2) with the "particles excluded" from your Select 2D job and the following settings:
Number of Ab-Initio classes: 3
Num particles to use: 1000
Maximum resolution (Angstroms): 30
Number of initial iterations: 20
Number of final iterations: 30
Now that we have generated 4 volumes we can use these as templates for Heterogeneous Refinement:
Run a Heterogeneous Refinement job (1), inputting the selected particles from Select 2D and the Ab-Initio volumes that you obtained, and set Refinement box size (voxels): 64, as this is the box size that the particles were Fourier-cropped to.
Examine your Ab-Initio and Heterogeneous Refinement output volumes.

At this stage we had ~180k particles in the best class (purple in Figure 6) from which to create some nice 2D templates.
Run a 2D Classification job using the particles from the best Heterogeneous Refinement class, and the following settings:
Number of 2D classes: 30 (we do not want an excessive number of selected classes, as template picking takes longer with more provided templates)
Maximum resolution (A): 10 (the resolution of these images is limited to ~9 Å due to Fourier cropping)
Initial classification uncertainty factor: 1 (setting this lower than 2 encourages greater diversity of classes, which is good for identifying junk)
2D zero pad factor: 1 (reducing padding allows the job to run faster)
Number of GPUs to parallelize: 4 (use as many as are available on your node or workstation)
Run a Select 2D Classes job and select the class averages that contain views of the intact target.
We show example selected classes in Figure 7A. Up to this point we just used particles from a subset of micrographs to speed up 2D template generation, but now we are ready to pick using the whole dataset.
Create a Template Picker job and connect the micrographs from the Manually Curate Exposures job and the selected templates from the last Select 2D Classes job. Set Pick on denoised micrographs: true and Particle diameter (A): 140.
In our processing the Template Picker yielded ~20 million particles, but we want to remove the most obvious junk picks before extraction.
Run a Micrograph Junk Detector job and input the exposures and particles from the Template Picker job.
We were left with ~18M particles.
Run an Inspect Picks job and look at a few micrographs. Move the NCC and power score sliders to remove picks that look overly generous.
We noticed that there were two clusters visible in the Power Score/NCC Score plot (see Figure 7D, inset), and found that an NCC threshold of 0.48 and a power score range of 57-223 kept the real particles without also keeping background picks. This left us with ~9M particles. Alternatively, we could use the Auto cluster mode with a Target power score of 80, but we found that this selection seemed to miss some good-looking particles (see Figure 7E).

Create an Extract from Micrographs (CPU or GPU) job, inputting the particles and micrographs from Inspect Picks with the same settings as Section 3.
5. Initial cleaning of particles from the full dataset
Now that we have extracted particles from the whole dataset, we need to clean out the remaining bad particles. With such a large quantity (~7.5M) it will likely be faster to first clean in 2D, and because there are so many particles we will get better separation of junk if we use more classes (which will inevitably slow down the job).
Run a 2D Classification job with the extracted particles and the following settings:
Number of 2D classes: 400
Maximum resolution (A): 12
Initial classification uncertainty factor: 1
2D zero pad factor: 1
Run a Select 2D Classes job and select the class averages that contain views of the intact target.
This job took us ~3.5 hours to run on 6 GPUs. At this point we retained ~2.6M particles in the selected classes. The classes appeared to have good contrast, indicating that further rounds of 2D Classification might not allow for rejection of many more particles. Instead, we will move on to a cleaning step in 3D. During cleaning of the blob-picked particles in Section 4 we generated some 3D Ab-Initio volumes that are suitable templates to re-use now.
Run a Heterogeneous Refinement job (2), inputting the selected particles from Select 2D and the Ab-Initio volumes obtained in Section 4, and set Refinement box size (voxels): 64.
We ended up with ~2M particles in the best class. At this point it is a good idea to re-extract the good particles due to the following reasons:
A. The original picks may not have been well-centered in the boxes and we now have 3D alignment of particles from Hetero Refine to improve particle centering.
B. We chose to Fourier crop quite heavily to speed up cleaning and this limited the achievable resolution.
C. We can revise the extracted box size to include more of the delocalised signal caused by defocusing of the images during collection.
We now have a better estimate of the particle diameter at ~120 Å, and we know the defocus range of the collected images extends to -3 µm, which might warrant a larger box than the commonly used 1.5-2x of the particle diameter. We tested refinements of particles extracted with a range of box sizes, summarised in the table below.
Box size (pix)    GSFSC resolution (Å)    cFAR    Run time (h:mm)
440               2.47                    0.83    3:02
416               2.47                    0.83    2:49
384               2.50                    0.80    2:28
360               2.50                    0.80    2:16
320               2.54                    0.79    1:55
We found the optimal result using a box of 416 and suggest that you also use this, unless you are happy to trade a slightly lower resolution for faster downstream jobs.
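To get a feel for why a box larger than 1.5-2x the particle diameter can help here, we can estimate how far high-resolution signal is delocalised by the CTF at the largest defocus. This is a rough first-order estimate (delocalisation ≈ wavelength × defocus × spatial frequency), included only for intuition:

# Back-of-envelope CTF delocalisation estimate at 300 kV.
wavelength = 0.0197      # Å, electron wavelength at 300 kV
defocus = 30000          # Å (3 µm, the high end of this dataset's defocus range)
resolution = 2.5         # Å, roughly the target resolution

delocalisation = wavelength * defocus / resolution   # signal at 2.5 Å is displaced ~this far from its origin

particle_diameter = 120  # Å
pixel_size = 0.84        # Å/pixel
full_box = (particle_diameter + 2 * delocalisation) / pixel_size
print(f"delocalisation ≈ {delocalisation:.0f} Å; box to capture it fully ≈ {full_box:.0f} px")

# A 416 px (≈349 Å) box does not capture all of this delocalised signal,
# but is a practical compromise between signal recovery and refinement run time.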
Create an Extract from Micrographs (CPU or GPU) job, inputting the particles from the good Heterogeneous Refinement class and the micrographs from Inspect Picks, setting Extraction box size (pix): 416 and Save results in 16-bit floating point: true.
Create a Non-Uniform Refinement job (Refinement 1) using the 416-box extracted particles and the nucleosome volume from Heterogeneous Refinement, with the following settings:
Symmetry: C2 (this nucleosome is expected to have pseudo-C2 symmetry, causing alignment difficulty)
Symmetry relaxation method: marginalization (relaxing the symmetry allows the different symmetry-related poses to be tested for each particle)
Minimise over per-particle scale: true (this allows weighting of each particle's contribution to the reconstruction based on its contrast and similarity to the reference)
Dynamic mask start resolution (A): 1 (effectively disabling masking during NU refinement)
Once the job is completed, examine the output maps, and refinement statistics.

We found that the cFAR of 0.83 (Figure 8) indicated good sampling of different orientations in the particle stack, and the orientation distribution shows a repeating pattern (see Figure 8E). Such a repeat can indicate a symmetric or pseudo-symmetric particle. The reported resolution was 2.45 Å (Figure 8A), although we noted a separation between the tight and corrected FSC curves, indicating that in this job the mask may be too tight and the resolution estimate may be a little over-optimistic.
Looking at the per-particle scale factors, we noted a shoulder on the lower end (indicated by a green arrow in Figure 8D), suggesting a bimodal distribution. When the particle scales are multimodal, we tend to find that the lower-scale peak contains poor-quality, low-contrast particles, or particles that do not match the refined volume and so contribute little or nothing to the reconstruction; removing these particles sometimes improves the quality of the refinement and of downstream jobs such as 3D Classification and Local Refinement. We will come back to the per-particle scales in Section 6.
As we refined with C2 symmetry and symmetry relaxation, we can look at how many of the particles were initially assigned the wrong pose (see Figure 8F), and in this refinement it looks like less than 10% of the particles were initially assigned to a pose ~180 degrees different to the best pose. This means that if those particles contained ALC1, then before relaxation they would contribute ALC1 density at 180 degrees away from where it truly belonged.
6. Global and Local CTF refinement, and separation of particles based on per-particle scale
We have a nice initial refined map, but the CTF parameters estimated by Patch CTF Estimation are not likely to be optimal, so we can test whether refining the defocus of each particle with Local CTF Refinement, or fitting higher-order aberrations with Global CTF Refinement, will improve our refinement quality.
Run a Global CTF Refinement job, inputting the particles, volume and mask_fsc_auto from Refinement 1, setting Fit Spherical Aberration, Fit Tetrafoil, and Fit Anisotropic Mag. all to true.
In order to use the mask_fsc_auto you will first need to input the refinement mask, and then drag over the mask_fsc_auto into the lower-level information from the refinement job outputs tab. For more information about modifying lower-level information see our Job Builder Tutorial.
Run a Local CTF Refinement, inputting the particles, volume and mask_fsc_auto from Refinement 1.
We want to examine the plots from these jobs but also compare their effect on the refined volume. For this dataset NU refinement can take a few hours so instead of refining after every step, we can speed things up by performing Homogeneous Reconstruction instead, using the poses from Refinement 1.
Run a Homogeneous Reconstruction Only (1) from the Global CTF job using the quick action menu
Run a Homogeneous Reconstruction Only (2) from the Local CTF job using the quick action menu
Run a Homogeneous Reconstruction Only (3) using the particles and mask_fsc_auto from Refinement 1

After running Global CTF Refinement it is a good idea to look at the phase errors for each of the fits that were tried (see Figure 9A-D) to ensure that we are satisfied there is data to support the fit. We are looking for the red and blue features in the data to be appropriately captured in the fits. For the odd terms and anisotropic magnification, the fits look pretty good, and although the phase errors for the even terms are a bit lower in magnitude (paler colours), the fit looks consistent.
After running Local CTF Refinement it’s worth taking a look at the example defocus profiles. In the examples in Figure 9E we see a single deep well that reflects the optimal defocus for that particle. If the wells were broader, or there were multiple wells of similar depth for each particle then that suggests that defocus refinement is less certain about the best fit for the defocus, and might indicate that refining defocus won’t help, or might even worsen the map resolution.
Looking at our Homogeneous Reconstruction FSC curves (where the same mask was used for all three) we see that both Global and Local CTF refinement improved the map resolution. For these tests we ran Global and Local CTF refinement in parallel and not in sequence. Both helped, but the fit for Local and Global CTF will depend on each other, so we now have to choose which one to do first. As Global CTF had the greater impact we will start with these particles. We can perform both Global and Local CTF Refinement as part of Non-Uniform Refinement, so we will do that next.
Ab-Initio Reconstruction has only a 50% chance of reconstructing your map with the correct hand. Before running the next refinement, be sure to check the hand of your map as soon as the features are sufficient to discriminate. For a nucleosome dataset, you can look at the hand of the DNA helix (it should be right-handed) or you could compare the map to one of the many deposited models of nucleosomes. If the hand is incorrect:
Use a Volume Tools job, inputting your map and setting Flip hand: true.
Create a Non-Uniform Refinement job (Refinement 2) with the same settings as Refinement 1, but using the particles from Global CTF Refinement and the nucleosome volume from Heterogeneous Refinement 1 or the hand-flipped Volume Tools output (whichever is appropriate). Set Optimize per-particle defocus, Optimize per-group CTF params, Fit Spherical Aberration and Fit Tetrafoil all to true.
Look at the map and plots compared to Refinement 1.

We found that the map density and GSFSC were improved in Refinement 2. When we examined the per-particle scale plot we still saw the shoulder that was apparent in Refinement 1. We would like to separate the particles into two clusters to see if removing the low-scale particles improves the refinement!
Run a Subset Particles by Statistic job, selecting Subset by: Per-particle scale. You could use the default subsetting mode, which uses Gaussian fitting, but we chose to use Subsetting mode: Split by manual thresholds, Number of thresholds: 1 and Threshold 1: 0.8.
Our output plot is shown in Figure 3B (right) alongside a comparison of the result of using the subsetting mode “Cluster by Gaussian fitting” in Figure 3B (left).
In this case, as it is a shoulder and not a clear second peak, we preferred to set the threshold manually to retain more of the particles, at the risk of including some slightly lower-scale particles. Now we want to see if removing these low-scale particles made things any better by re-refining them! We ended up with ~1.7M particles in cluster 1 and ~300k particles in cluster 0.
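To illustrate the difference between the two subsetting modes, here is a minimal sketch (not part of the CryoSPARC pipeline, using synthetic scale values) that fits a two-component Gaussian mixture to per-particle scales and compares it against a simple manual cut at 0.8. With a shoulder rather than a well-separated second peak, the mixture fit will often carve off more particles than a manual threshold:

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic per-particle scales: a main peak near 1.0 with a low-scale shoulder.
rng = np.random.default_rng(0)
scales = np.concatenate([rng.normal(1.0, 0.08, 17_000),
                         rng.normal(0.75, 0.10, 3_000)])

# Manual threshold, as used in the Subset Particles by Statistic job above.
manual_keep = scales > 0.8

# Two-component Gaussian mixture, analogous to the "Cluster by Gaussian fitting" mode.
gmm = GaussianMixture(n_components=2, random_state=0).fit(scales.reshape(-1, 1))
labels = gmm.predict(scales.reshape(-1, 1))
high_component = int(np.argmax(gmm.means_.ravel()))
gmm_keep = labels == high_component

print(f"manual threshold keeps {manual_keep.mean():.1%} of particles")
print(f"Gaussian mixture keeps {gmm_keep.mean():.1%} of particles")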
OPTIONAL: Create a Non-Uniform Refinement job (Refinement 3) with the same settings as Refinement 2, using the particles from cluster 1 of Subset Particles by Statistic.
OPTIONAL: Create a Non-Uniform Refinement job (Refinement 3B) with the same settings as Refinement 2, using the particles from cluster 0 of Subset Particles by Statistic.
We found that the GSFSC and cFAR were marginally improved in Refinement 3, but the resolution and cFAR were poorer in Refinement 3B. As a fairer comparison, we also refined a random subset of particles from cluster 1 so that we were comparing the same number of particles; this gave us a GSFSC resolution of 2.56 Å and a cFAR of 0.74. From this we can infer that the low-scale particles on their own lead to a poorer reconstruction than an equivalent number of high-scale particles.
It is not strictly necessary to run Refinement 3; if you want a faster way to assess if removal of the low-scale particles helped or hindered your map statistics, you might prefer:
OPTIONAL: Run a Homogeneous Reconstruction Only (4) job using the particles from cluster 1 and the mask_fsc_auto from Refinement 2.
We now have a nice set of particles from cluster 1, and a nice map of the nucleosome, but we don’t seem to have any observable density for ALC1, so we will go on to figure out where it is!
7. 3DVA and focus mask design
If you are looking for heterogeneity (discrete or continuous), a great way to inform onward processing choices is to run 3D Variability Analysis. In our case the biochemistry tells us ALC1 should be bound to the nucleosome in this sample and we are looking for its location. The fact that we can’t see it in the maps so far indicates that it is either very flexible, bound in a low stoichiometry (less than 1:1 binding) or a combination of the two.
3DVA can be run at the lowest resolution at which you might hope to visualise your feature of interest. In cases where the feature might be low stoichiometry or highly flexible, low resolutions such as 15 Å can work well.
Before we go ahead though, we need to consider masking. As we don't know where the ALC1 is, we need to make sure that the mask is generous enough to encompass the potential binding regions, and we don't want to use the solvent mask from Non-Uniform Refinement (recall that we effectively disabled masking) because using a mask that encompasses the entire box can lead to the 3DVA modes being dominated by box corner artefacts.
Create a Volume Tools job, inputting the volume from Refinement 3/Homogeneous Reconstruction 4, and set Lowpass filter (A): 15, Type of output volume: mask and Dilation radius (pix): 25. For the Threshold we set 0.025, but you may need to adjust this value to suit your input map. This is Mask 1.
Create a 3D Variability job, inputting the particles from either Refinement 3, Homogeneous Reconstruction 4 or cluster 1 (these three should yield approximately the same results) and Mask 1. Set Filter resolution: 15, as we would like to focus on low-resolution features, and Only use this many particles: 100,000, as at such low resolution we do not need to use the whole dataset.
Create a 3D Variability Display job, inputting the particles and components from the 3DVA job. We will assess the 3DVA using simple mode so that we can visualise the motions found in the 3 components. The pixel size of the extracted particles is smaller than necessary to visualise low-resolution features, so we can use Downsample to box size: 128 to make downloading of the volumes faster.
We show the 3 modes that we observed from 3DVA in Movie 1.

We found that modes 1 (teal) and 2 (blue) were dominated by the appearance and disappearance of density at the end of the DNA, whereas mode 3 (pink) was dominated by the appearance and disappearance of density bound to the nucleosome. We were unable to identify the additional densities in modes 1 and 2, although based on the biochemistry it is possible that this is PARP2. The additional density in mode 3 is consistent with the mass of ALC1, and is a reasonable location for a motor to bind the nucleosome.
Now that we know where the ALC1 binds, we can try to classify the particles to separate those that do, and don’t have ALC1 present. To do this we need to make a new focus mask covering the region of interest. We will do this by creating a difference map between the first and last frames of the 3DVA mode where ALC1 appears.
Identify the mode from your 3DVA job that contains ALC1 density and in ChimeraX run the following commands:
volume subtract #X.1 #X.20 and volume subtract #X.20 #X.1
where X is the model number for the correct mode in your ChimeraX session.
You should find that one of the newly created volumes contains the ALC1 volume and a few other bits and bobs. You can use the Map Eraser tool in ChimeraX to remove the extra bits, save this volume and upload it to your CryoSPARC workspace. The other difference volume can be discarded.
Note that the 3DVA volumes used here were downsampled to a box of 128, meaning the pixel size is 2.73 Å/pix; for mask generation, the dilation radius and soft padding width therefore need to be fewer pixels than if the full box of 416 at 0.84 Å/pix were used.
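The downsampled pixel size, and how distances in pixels translate between the two samplings, can be checked with a quick calculation (illustrative only):

raw_pixel = 0.84   # Å/pixel at the full 416-pixel box
full_box = 416
small_box = 128    # box size used by 3D Variability Display

down_pixel = raw_pixel * full_box / small_box
print(f"downsampled pixel size ≈ {down_pixel:.2f} Å/pixel")  # ≈ 2.73 Å/pixel

# Each pixel at this sampling covers ~3.25x the distance of a raw pixel,
# hence the much smaller dilation and soft padding values used for Mask 2 below.
print(f"2 px dilation at this sampling ≈ {2 * down_pixel:.1f} Å")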
Create a mask using Volume Tools, starting from the cleaned-up difference map created above. The threshold will need to be determined for your own map, but we used Lowpass filter (A): 15, Type of output volume: mask, Threshold: 0.25, Dilation radius (pix): 2 and Soft padding width (pix): 5. This is Mask 2.
See Figure 11A-C, below, for example density for the creation of Mask 2.
As we are also interested in seeing if ALC1 has bound to the opposite side of the nucleosome, and the previous refinements were all aligned to the pseudo-C2 symmetry axis, we can rotate the mask using a simple operation.
Run a Volume Alignment Tools job, inputting the volume from Refinement 3/Homogeneous Reconstruction 4 and Mask 2. Set 3D rotation euler angles (deg or rad): 0,0,180. The output mask is Mask 3.
For 3D Classification the focus mask should ideally be within the solvent masked region, so we also need to make a more generous solvent mask.
Create a Volume Tools job with the same settings as for Mask 1, but set Dilation radius (pix): 50. This is Mask 4.
8. 3D Classification of the ALC1 binding sites
Now that we have masks made, we can go ahead and classify our particles to fish out the subpopulations of interest!
Run a 3D Classification job (Classification 1) using the particles from either Refinement 3, Homogeneous Reconstruct 4 or cluster 1, with Mask 4 as the solvent mask, and Mask 2 as the focus mask. Input the following settings:
Number of classes: 40 (using more classes can help separate low-population classes)
Filter resolution (A): 15 (a resolution at which we can see the ALC1 density)
Initialization mode: PCA (we found PCA initial volumes gave us a more reproducible result than using simple mode)
Class similarity: 0.1 (when looking for density presence/absence, the classes should not be very similar)
Run a 3D Classification job (Classification 2) using the same inputs as Classification 1, except using Mask 3 as the focus mask and setting Number of classes: 80.
OPTIONAL: Test the effect of changing the number of classes up and down, and compare the quality of the ALC1 density after NU refinement of the best class(es).
Examine the density for the output classes in ChimeraX.
We show the classes we obtained from 40-class Classification 1 in Figure 11B, and from 80-class Classification 2 in Figure 11C. After adjusting the contour threshold to ~2, we found a single class that had good density for ALC1 in each classification and we coloured these classes in pink. We noted density in other classes nearby the ALC1-binding site that might reflect different stages of ALC1-binding, but we do not go on to investigate those in this case study.

The choice to use 40 and 80 classes was not based on any prior information about what would work best. We took an empirical approach and tested a range, from 2 up to 80 classes. To assess which classification gave the best result, the ALC1-containing class(es) for each classification were NU-Refined and the ALC1 density was compared (see Figure 12).

We found that for the 2-40 class classifications, only a single class had density for ALC1 (at varying strength), but the 80-class classification had good ALC1 density in two classes. After Non-Uniform Refinement, the map quality was best from the 40-class classification. Although the refined maps from the 2-class and 4-class classifications don't initially appear to have discernible ALC1 density after Non-Uniform Refinement, ALC1 is visible if the map is low-pass or Gaussian filtered.
You will notice that both classification jobs have the same particles as inputs and were effectively run in parallel. It is possible that there is a subpopulation of particles that are found in both of the selected classes and hence have ALC1 density on both sides at the same time. We can find these particles by comparing the particle stacks.
Run a Particle Sets Tool job, inputting the good class particles from Classification 1 into Particles (A) and the good class particles from Classification 2 into Particles (B), and setting Action: intersect.
We ended up with 65.4k particles with ALC1 in one site, 28.9k particles with ALC1 in the other site, and ~760 particles that have ALC1 density in both sites, comprising ~3.9% (site 1), ~1.7% (site 2) and ~0.05% (double-bound) of the input particles. We recall that there are theoretically two possible binding sites for ALC1, and one might have a higher affinity for PARP binding than the other. We see a larger population of particles (3.9%) in one position, and this position is consistent with the expected preferred PARylation site described in Bacic et al.
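These occupancies are simply the class populations divided by the number of particles that went into classification (~1.7M in cluster 1); a quick check of the arithmetic:

input_particles = 1_700_000   # approximate size of cluster 1 used for classification
for name, n in [("site 1", 65_400), ("site 2", 28_900), ("both sites", 760)]:
    print(f"{name}: {n / input_particles:.2%} of input particles")
# ≈ 3.85%, 1.70% and 0.04%; small differences from the quoted ~3.9%, ~1.7% and ~0.05%
# come from the rounded particle counts used here.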
Run a Non-Uniform Refinement job (Refinement 4A) with the same settings as Refinement 2, using the A minus B particles from the Particle Sets Tool job and the volume for the corresponding class from classification.
Run a Non-Uniform Refinement job (Refinement 4B) with the same settings as Refinement 2, using the B minus A particles from the Particle Sets Tool job and the volume for the corresponding class from classification.
For the intersection of particles that might contain two copies of ALC1, we don’t have a volume that represents this, so we need to perform a reconstruction before we refine these particles.
Run an Ab-Initio Reconstruction job (3) using the Intersection (keeping A data) particles from the Particle Sets Tool job.
We use Ab-Initio at this point, rather than homogeneous reconstruction to ensure that we have not added artificial bias. If, despite symmetry relaxation, a small population of particles during NU refine were aligned incorrectly by 180 degrees, this could result in the appearance of ALC1 density on both sides of the nucleosome in a reconstruction, and using a reference volume with density on both sides could bias the refinement. When in doubt re-running Ab-Initio is safer.
Run a Non-Uniform Refinement job (Refinement 4C) with the same settings as Refinement 2, using the particles and volume from Ab-Initio 3, and setting Optimize per-particle defocus: False and Optimize per-group CTF params: False.
We turn off local and global CTF refinement here because there are very few particles and any updated estimations are likely to be much poorer than the input values.
Examine the maps and compare them to Refinement 2, ensuring that you have aligned the volumes in the correct way so that the long end of the DNA matches. We show example maps in Figure 13. We can clearly see defined density for ALC1 in Refinements 4A and 4B, and although the map quality is poor in Refinement 4C we can clearly see density for ALC1 bound to both sites, while the long and short ends of the DNA are still discernible. We noted that the map for EMDB:13065 matches site 1 that we find here (purple map in Figure 13).

We were interested in testing whether the presence of ALC1 had an impact on orientation sampling or other orientation diagnostics. To make this comparison fair we should use the same number of particles, so we needed to make a subset of the ALC1-free particles to refine. First we need to obtain a stack that does not contain well-ordered ALC1.
OPTIONAL: Run a Particle Sets Tool job, inputting the three particle sets from NU Refinements 4A-C into Particles (A), and into Particles (B) drag the particles from Refinement 3/Homogeneous Reconstruction 4, or the high-scale cluster from Subset Particles by Statistic (all three should contain the same particle set). Use intersect mode.
Now we want to get an equal sized particle stack for comparison
OPTIONAL: Run a Particle Sets Tool job with the ALC1-free particle set from above (the full particle set minus the Refinement 4A-C particles), using split mode, with Split batch size set to the number of particles you have in Refinement 4A and Split randomize: True.
OPTIONAL: Run a Non-Uniform Refinement job (4D) with the same settings as Refinement 4A, using the particles from split 0.
OPTIONAL: Run Orientation Diagnostics on Refinements 4A and 4D.

From Figures 14A and 14B we can see that the cFAR is a bit lower in Refinement 4A, and digging deeper we see in Figures 14C and 14D that there is apparently a stronger orientation bias when ALC1 is bound. This also negatively impacts the Fourier sampling (Figures 14E and 14F); however, the cFAR is still above 0.5, there are no empty Fourier bins, and our map did not show obvious signs of streaking or distortion that can indicate pathological orientation bias.
As the entry for EMDB:13065 also contains the half maps and mask, we were able to also run Orientation Diagnostics on that map for comparison, and found a cFAR of 0.46.
At this point we need to decide whether to keep the two particle sets separate, or if we want to combine them to potentially allow for a higher resolution for ALC1 at a cost to any unique information about the two sites. This sort of choice will need to be taken on a case-by-case basis as there is the possibility of losing information about symmetry-breaking features when symmetry-related sites are combined. We could not see any meaningful differences in the density for ALC1 in the two maps, and so we chose to combine them in a single refinement.
Run a Non-Uniform Refinement job (Refinement 4E) with the same settings as Refinement 2, using the particles from Refinements 4A and 4B and the volume from Refinement 4A. Drag the lower-level ctf slot from Refinement 3, Homogeneous Reconstruction 4 or cluster 1 into both particle inputs for the refinement.
9. Reference-based motion correction
Now that we have a map with density for ALC1 that looks good enough to fit and refine a model into, we can push it further with Reference-Based Motion Correction to enhance the map quality and resolution, and hopefully make the modelling more reliable. Note that if you are using the downloaded particles, you will need to skip this step and move on to Section 10.
OPTIONAL: Run a Reference-Based Motion Correction job using the particles, map and mask from Refinement 4E, and the curated micrographs from Section 2. Change the lower-level information slot for the input mask by dragging over the volume.mask_fsc. Set Save results in 16-bit floating point: on. Use more GPUs to accelerate the job if you have the resources.
OPTIONAL: Create a Non-Uniform Refinement job (Refinement 5) with the same settings as Refinement 2, using the particles from RBMC.

Figure 13. Empirical Dose Weights and Example Trajectories from RBMC.
The RBMC job took around 16 hours to run on 2 GPUs, and example outputs for Empirical Dose Weights and Trajectories are shown in Figure 15. We found that the map resolution after RBMC improved from 2.78 Å to 2.54 Å, however the map quality of the NU refinement was not substantially improved. We did find that the downstream Local Refinement map was better quality when RBMC was included in the pipeline. Considering the length of time required for RBMC, the choice to include/exclude it may depend on the balance between result quality and time/resource constraints.
Looking at the ALC1 density from Refinements 4E and/or 5, you might notice that it is still not as good quality as that of the histone proteins at the core of the nucleosome. This is likely due to a degree of flexibility or heterogeneity in the binding of ALC1, and we can try to further improve the density of ALC1 by locally refining this region.
10. Local refinement of ALC1
It appears in our map that the core nucleosome is relatively large and rigid compared to the ALC1, and so it will be dominating the particle alignment. We can use Local Refinement to align the ALC1 a bit better, and for this we need to think about masking strategies.
Use the Map Eraser tool in ChimeraX on Refinement 4E or 5, and remove all but the ALC1 density and a little of the DNA to which it is bound. We show an example of the erased volume that we used in Figure 14.
Upload this volume to your CryoSPARC workspace and run an Import 3D Volumes job to import it
Create a Volume Tools job, input the above volume and set Lowpass filter (A): 15, Type of output volume: mask, Threshold: [whatever is appropriate for your volume; we used 0.019], Dilation radius (pix): 4 and Soft padding width (pix): 15. This is Mask 3.
Run a Local Refinement job, inputting the particles, mask and volume from Volume Alignment Tools, with the following settings:
Use pose/shift gaussian prior during alignment: true (using priors penalises deviation too far from the input poses)
Standard deviation (deg) of prior over rotation: 3 (a strict range)
Standard deviation (A) of prior over shifts: 2 (a strict range)
Re-center rotations each iteration?: true
Re-center shifts each iteration?: true
Initial lowpass resolution (A): 7 (the resolution at which the FSC is still ~1)
We also compared this refinement to one that used default settings for Local Refinement, and found the map quality to be slightly poorer with default settings after sharpening.
OPTIONAL: Run a Local Refinement job (Local Refinement 1B) with default settings, inputting the particles, mask and volume from your last ALC1-containing refinement (Refinement 4E or 5) and Mask 3.

We can see from Figure 16 that the map quality for ALC1 is improved after local alignment within the masked region, making it more interpretable and better for model-building.
11. Map sharpening, assessment of map quality, and comparison to the original deposited map
At the end of refinement it is always worth assessing if the auto-sharpening has applied an appropriate B-factor. We want the sharpening to enhance high resolution features, but without causing map fragmentation, excessive noise, or creating sharpening artefacts such as unexpected density extending from the map. As cryo-EM maps tend to contain a range of resolutions, picking a single B-factor to sharpen means taking a compromise value where the high-resolution regions are not as sharp as they could be, and the low-resolution regions are not as connected as they could be.
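For intuition, global B-factor sharpening simply rescales the map's Fourier amplitudes by exp(-B·s²/4), where s is spatial frequency. A minimal numpy sketch of this operation (assuming the mrcfile package, using hypothetical file names, and ignoring the FSC weighting and masking that CryoSPARC's sharpening also applies):

import numpy as np
import mrcfile

def sharpen(map_path, out_path, b_factor):
    """Apply a global B-factor to a cryo-EM map (negative B sharpens)."""
    with mrcfile.open(map_path) as m:
        data = m.data.astype(np.float32)
        apix = float(m.voxel_size.x)               # Å per voxel

    n = data.shape[0]                              # assumes a cubic box
    f = np.fft.fftfreq(n, d=apix)                  # spatial frequencies, 1/Å
    s2 = f[:, None, None]**2 + f[None, :, None]**2 + f[None, None, :]**2

    ft = np.fft.fftn(data)
    ft *= np.exp(-b_factor * s2 / 4.0)             # e.g. b_factor = -65 boosts high frequencies
    sharpened = np.real(np.fft.ifftn(ft)).astype(np.float32)

    with mrcfile.new(out_path, overwrite=True) as out:
        out.set_data(sharpened)
        out.voxel_size = apix

# Hypothetical file names for an exported unsharpened map.
sharpen("local_refine_map_unsharpened.mrc", "map_sharpened_b65.mrc", b_factor=-65)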
Examine the unsharpened and sharpened maps from Local Refinement, and see if you are happy with the level of sharpening applied. We felt that the map was very slightly over-sharpened, due to the appearance of spiky noise extending from the protein that is unlikely to represent real features at this resolution. If you would like to change the map sharpening:
OPTIONAL: Run a Sharpening Tools job, inputting the Local Refinement map and mask, and setting B-Factor to apply to your desired value; we chose -65.
We show example maps for unsharpened, automatically-sharpened and manually sharpened maps in Figure 17A.

Assessing the resolution of a local refinement can be tricky, as the masked region can be hard to select and there will inevitably be some density outside of the local refinement mask. As well as looking at the GSFSC resolution of the map, we can also assess and compare the local resolutions for ALC1 in the last NU Refinement and the Local Refinement. Before proceeding we want to ensure that all of the meaningful parts of the map (i.e. the whole nucleosome and ALC1) are inside the mask, to avoid the situation where voxels outside it are labelled as having a resolution of 0.
Run a Volume Tools job, inputting the mask from the last NU Refinement, and drag the mask_fsc into the lower-level slot. Set Type of input volume: mask, Type of output volume: mask, Threshold: 1, and Dilation radius (pix): 10. This is Mask 5.
Run a Local Resolution Estimation job inputting the map from Local Refinement and Mask 5.
Run a Local Resolution Estimation job inputting the map from the latest NU Refinement and Mask 5.
Run a Local Filtering job for each of the two above jobs
Visualise the local resolution in ChimeraX by using the Surface Colour function.
We also ran a Local Resolution Estimation job using the half maps from EMDB:13065 (the map originally deposited for this dataset) for comparison. We show these three in Figure 15B-D. We observe a striking difference in resolution, with the deposited map ranging from ~5-7 Å for ALC1, and our Local Refined map ranging from ~2.5-3.5 Å for ALC1. This improvement in resolution comes along with significantly better map clarity that allows a model to be more confidently built and refined.
We updated the PDB for ALC1, starting from PDB:7enn, so that it matched our map. We found that after real space refinement in PHENIX, the Q-score for just the ALC1 protein was 0.74, compared to a value of 0.33 for the ALC1 model from 7otq against EMDB:13065 (see Figure 18) indicating a substantial improvement in the model-to-map fit and confidence.
We were also able to see density for the histone H4 N-terminus that forms contacts with ALC1, and to predict hydrogen bonding and salt bridges that are likely involved in the binding of ALC1 to DNA (see Figure 16A).

Conclusions
In this case study, we focussed on the processing of EMPIAR-10739, which contains compositional heterogeneity. As this nucleosome is expected to display pseudosymmetry, we chose to refine using C2 with symmetry relaxation, to avoid symmetry-breaking features being partially present at both symmetry-related positions. The particle stacks after gross cleaning achieved a good resolution for the nucleosome, but we were able to further improve this by performing:
Global CTF Refinement
Local CTF Refinement
Subset Particles by Statistic on the basis of per-particle scale
This map did not have clear density for ALC1 and so we investigated the compositional heterogeneity through:
3DVA with a generous solvent mask
Creating a difference map to aid ALC1 mask generation
3D classification with 40 and 80 classes to separate out the best ALC1 classes
Refinement of the selected 3D classes had better density for ALC1 but it was still fragmented at the periphery, so we performed Local Refinement to further improve the map clarity.
Our aim was to obtain a map containing ALC1 suitable for molecular modelling; compared to the original published processing pipeline that led to EMDB:13065, we were able to improve the resolution for ALC1 from ~5 Å to ~3 Å and the cFAR from 0.46 to 0.61.
During this case study we focused solely on obtaining a clear map of ALC1 bound to the nucleosome; however, this dataset shows a high degree of heterogeneity and there may be many other interesting regions and classes to investigate! You could try applying the same sort of rationale to different classes or regions of the nucleosome to see what else can be found!
You can download our versions of the final maps, half maps and masks from the links below for comparison with your own processing!
References
We thank Dr Guillaume Gaullier for providing background context to the dataset, and enthusiasm towards this reprocessing case study!