Case study: End-to-end processing of a ligand-bound GPCR (EMPIAR-10853)
Processing EMPIAR-10853 including Micrograph Denoising, Micrograph Junk Detector, Subset Particles by Statistic, Local Refinement and focussed 3D Classification to improve ligand density.
Introduction
In this tutorial we will work step-by-step through the processing pipeline for an active G-protein coupled receptor with a ligand, G-protein and antibody fragment bound, using a dataset originally processed by Cao et al. and deposited as EMDB:24896 and PDB:7s8l. The raw data are publicly available for download as EMPIAR-10853, and this case study is written so that you can replicate these results yourself.
We selected this dataset to provide a processing pipeline for a target with flexibility in the region of ligand-binding, and heterogeneity in the ligand-binding pose. In particular, this case study provides strategies and ideas about how to get a higher quality final particle stack, which we found was essential for downstream local refinement, and focussed classification steps. Image processing was performed using CryoSPARC v4.7.
The Mas-related G-protein coupled receptor member X2 (MRGPRX2) receptor is a member of the G-protein coupled receptor (GPCR) protein family, which is characterised by seven transmembrane alpha-helices. GPCRs are predominantly found on the cell surface and are important in cell signalling and as therapeutic targets. The MRGPRX2 receptor is largely found on the surface of mast cells and is activated by agonist ligands. Upon activation it binds to Gq (a heterotrimeric G-protein). This in turn activates the mast cell to degranulate, a process that plays a role in host defence, inflammatory diseases and pseudo-allergic drug hypersensitivity.
In this sample, Gq and a single-chain variable fragment (scFv16) were added to the receptor, along with the agonist peptide ligand Cortistatin-14. We can calculate from the existing PDB 7s8l that the combined complex has a mass of ~140 kDa; in cases where a molecular model is not available, the mass could also be ascertained from protein sequences (for example in UniProt) or by analysing a purified sample by mass spectrometry. From the structures obtained in Cao et al., we know that this complex is a non-symmetric membrane protein complex containing a transmembrane receptor domain (TMD) and an intracellular domain formed of the G-protein that extends outside of the membrane, and that the expected ligand binding location is on the extracellular side of the receptor domain. This protein was prepared in a detergent micelle containing lauryl maltose neopentyl glycol, glyco-diosgenin and cholesteryl hemisuccinate.
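When no model is available, a quick cross-check of the expected complex mass can be made from residue counts using the common ~110 Da per-residue rule of thumb. The chain lengths below are illustrative placeholders, not taken from 7s8l:

```python
# Rough mass estimate for a protein complex from chain lengths, using the
# widely used ~110 Da per-residue approximation. Substitute the actual
# sequence lengths of your construct; these numbers are placeholders.
AVG_RESIDUE_MASS_DA = 110.0

def complex_mass_kda(chain_lengths):
    """Approximate total mass (kDa) of a complex from residue counts."""
    return sum(chain_lengths) * AVG_RESIDUE_MASS_DA / 1000.0

# Four hypothetical chains totalling 1270 residues give ~140 kDa
print(round(complex_mass_kda([330, 350, 340, 250]), 1))
```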

The primary aim of this pipeline is to generate the highest quality map(s) to ascertain the binding pose(s) of the agonist peptide, Cortistatin-14.
Setting up
Before beginning this tutorial, you should create a new project and a workspace within that project. Download the movies and gain references to a location of your choosing. Our data is downloaded to a directory called rawdata in the project directory using the command:
cd /path/to/rawdata
wget -m ftp://ftp.ebi.ac.uk/empiar/world_availability/10853/
1. Movie import and preprocessing
Now we need to get the data into CryoSPARC. These data were recorded over two days and for each day there is a separate gain reference and a slightly different total dose.
Import the data as two separate jobs using an Import Movies job. The Movies data path needs to match the location of the directory containing the downloaded .tif files, for example rawdata/EMPIAR/10853/data/20210311*.tif. The Gain reference path for the first day also needs to be specified, for example rawdata/EMPIAR/10853/data/CountRef_20210311_109_000.mrc. Experimental information such as pixel size, total electron dose / A^2 and voltage are available in the EMDB entry 24896.
Flip gain ref & defect file in Y?: Yes
Raw pixel size (A): 0.91
Accelerating Voltage (kV): 200
Spherical Aberration (mm): 2.7
Total exposure dose (e/A^2): 50.74
Repeat the process for the second-day files (20210312), changing the Total exposure dose (e/A^2) to 47.87.
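As a quick sanity check, the per-frame dose implied by these settings can be computed, assuming the 60-frame movies described later in this tutorial:

```python
# Per-frame dose for each collection day, assuming 60 movie frames
# (the frame count given later in this tutorial).
N_FRAMES = 60

def dose_per_frame(total_dose_e_per_a2, n_frames=N_FRAMES):
    """Average electron dose per frame (e/A^2)."""
    return total_dose_e_per_a2 / n_frames

print(round(dose_per_frame(50.74), 3))  # day 1
print(round(dose_per_frame(47.87), 3))  # day 2
```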
Next we want to correct for beam-induced motion during movie collection and to estimate the defocus values for each micrograph.
Run Patch Motion Correction, inputting both imported movie sets from above. Set Save results in 16-bit floating point: true, and increase Number of GPUs to parallelize depending on GPU availability.
Using float16 format means that the output images will take up half the space of the default 32-bit floating point files. You may expect each job to take several hours on a single GPU.
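The halving in disk footprint is easy to estimate. The sensor dimensions below are an assumption (typical of a K3 camera), so substitute your own:

```python
# Disk footprint per motion-corrected micrograph in float16 vs float32.
# The 5760 x 4092 sensor size is an assumption (typical of a K3 detector);
# substitute your camera's dimensions.
NX, NY = 5760, 4092

def micrograph_mb(bytes_per_pixel):
    """Uncompressed image size in MiB for the given sample width."""
    return NX * NY * bytes_per_pixel / 1024**2

f32 = micrograph_mb(4)  # 32-bit floating point
f16 = micrograph_mb(2)  # 16-bit floating point
print(round(f32, 1), round(f16, 1), f16 / f32)
```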
Run Patch CTF Estimation with default settings.
2. Excluding poor micrographs from downstream processing
Not all of the movies that are collected will be perfect; they can contain a variety of non-vitreous ice and contamination with ice crystals or other substances, excessive in-movie motion and ice that is too thick or too thin for your sample. We would like to avoid picking particles from poor micrographs because they are often of low value, and can require extra cleaning steps to remove after they are extracted. It is easier to exclude the worst micrographs at the start, by looking at the statistics generated by the Patch Motion and Patch CTF jobs.
Use a Manually Curate Exposures job to exclude poor quality images on the basis of their CTF estimated micrograph statistics so they are not used for downstream processing.
Input the “Micrographs processed” from Patch CTF, and queue the job. In the job card view go to the pink Interactive tab.
Select the upper and lower bounds for each parameter to exclude outliers, and browse thumbnails and diagnostics of the images to check the appearance of the excluded and included micrographs. For each parameter, try moving the slider or typing values and click “set threshold” to see how many images have been excluded.
We set an upper limit for CTF fit resolution at 3.5 Å to avoid using poor quality images, an upper limit of 70 for Average Intensity to remove outlier images, and an upper bound on Relative Ice Thickness at 1.042 to remove low signal-to-noise images with thick ice. This is a nice dataset with relatively few poor micrographs, and overall we were left with 11,573 micrographs.
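The curation step amounts to simple interval tests on per-micrograph statistics. Below is a sketch of that logic using the thresholds we chose; the field names are illustrative, not CryoSPARC's internal result names:

```python
# Keep micrographs whose statistics fall inside the chosen thresholds.
# (None, x) means "upper limit only"; field names are illustrative.
THRESHOLDS = {
    "ctf_fit_res_A": (None, 3.5),        # CTF fit resolution upper limit
    "avg_intensity": (None, 70),         # remove outlier images
    "rel_ice_thickness": (None, 1.042),  # remove thick-ice images
}

def accept(mic):
    """True if every statistic lies within its (lower, upper) bounds."""
    for key, (lo, hi) in THRESHOLDS.items():
        v = mic[key]
        if lo is not None and v < lo:
            return False
        if hi is not None and v > hi:
            return False
    return True

mics = [
    {"ctf_fit_res_A": 3.1, "avg_intensity": 55, "rel_ice_thickness": 1.01},
    {"ctf_fit_res_A": 4.2, "avg_intensity": 55, "rel_ice_thickness": 1.01},
]
print([accept(m) for m in mics])
```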

The above example shows the micrograph, power spectrum, CTF fit and motion for two example movies. The CTF fit and motion for both images look alright, but the Thon rings for the red example are weaker, due to interference from the strong vertical lines in the movie. This might originate from a problem with gain correction of the detector for a few images.
In other datasets it can also be helpful to set thresholds for average defocus, astigmatism, and full-frame motion distance, but it is a good practice to examine all of the available plots to see where outlier populations arise.
3. Micrograph denoising and Junk Detection
3A) Micrograph Denoising
Particles can be hard to identify by eye due to factors such as thick ice, small particle size, carbon or graphene monolayers and low defocus, and these can make selecting thresholds for particle picks somewhat challenging. To improve the quality of template and blob picks, and to make thresholding easier, we use the Micrograph Denoiser.
Create a Micrograph Denoiser job, inputting the exposures_accepted, and set Number of CPUs to Parallelize to the number of CPUs available on your node.

We can identify the particles much more easily after denoising, and so the denoised images are suitable for the Blob picking, Template picking, and Inspect Picks job types.
3B) Junk detection
Micrographs often have different types of unwanted features in them, such as ethane contaminants, ice crystals, crystalline ice and gold or carbon support, which can be tricky to get rid of during classification steps. We can use the Micrograph Junk Detector, released in v4.7, to analyze the micrographs and automatically detect different types of junk. Subsequently, we can reject particle picks that are on or near the junk regions.
Create a Micrograph Junk Detector job, inputting the exposures_accepted from the upstream Micrograph Denoiser job.

The Micrograph Junk Detector has masked out in purple regions that contain extrinsic ice junk (ethane contaminants and ice crystals), and from the job statistics we can see that while almost every micrograph contains junk (overwhelmingly extrinsic ice junk), the total area that this junk occupies is relatively low at <4%. Dataset-level junk statistics might be helpful to diagnose patterns in junk appearance that may relate, for example, to sample prep changes.
4. Blob particle picking
This is a pretty large dataset, so to speed up generation of 2D templates, we are going to make a subset of 200 micrographs for initial blob picks.
Run an Exposure Sets Tool job, inputting your exposures from after the Micrograph Junk Detector, and set Split batch size: 200 and Split randomize: true.
Using the micrographs in split_0, run a Blob Picker job with Minimum particle diameter (A): 100 and Maximum particle diameter (A): 130.
Not all of the picks made will be of good quality, and to remove egregious junk picks we can use a combination of automatic removal of junk picks by using the junk detection that we ran earlier, as well as manual thresholding of the pick scores.
Run a second Micrograph Junk Detector job and input the exposures from the first Junk Detector job along with the blob-picked particles, setting Rejection distance from junk: 30.
Run an Inspect Picks job and look at the first few micrographs.
The overall goal here is to select as many particles as you can without picking empty ice or junk. You should see that there are very few picks on the junk regions as we already rejected those near the detected regions of junk. To remove any remaining junk picks, move the NCC and power score sliders until the lower bound removes empty ice picks, and bring the higher bound down to remove high contrast feature picks without seeing good-looking picks being discarded.
Be sure to inspect micrographs of relatively low and high defocus when adjusting the power sliders, so that particles are being picked across the defocus range.
You can expect to select ~200k particles from a 200 micrograph subset.

As we performed Inspect picks with the denoised micrographs you may notice that the contrast is more similar across the range of defocus used than you would typically see with lowpass filtered images.
5. Blob pick extraction and 2D classification
We expect the MRGPRX2 receptor protein to have a diameter of ~ 120 Å, and this dataset was recorded with a defocus range extending up to around -2 µm. We want to ensure that we capture most of the delocalised signal caused by imaging with defocus, so we need to make sure the box is large enough.
Extract the particles using the Extract from Micrographs job with Extraction box size (pix): 256 (233 Å, which is ~2x the particle diameter) and set Fourier crop to box size (pix): 64. Select Save results in 16-bit floating point: true to save on the disk space required.
We will be using these particles to generate 2D class average templates, so we do not need them to be at high resolution. Therefore we can Fourier crop to a box of 64 and this will speed up the downstream jobs. Expect the number of extracted particles to be lower than the number previously selected, because CryoSPARC rejects particles where any part of the box is outside of the micrograph. Fourier cropping for 2D classification saves on disk space and may speed up caching and early particle cleaning jobs.
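The numbers in these settings are related by simple arithmetic: the box edge in Ångströms, the effective pixel size after Fourier cropping, and the resulting Nyquist limit.

```python
# Arithmetic behind the extraction settings: box size in Angstroms,
# effective pixel size after Fourier cropping, and the resulting
# Nyquist limit (twice the effective pixel size).
apix = 0.91  # raw pixel size (A)
box = 256    # extraction box (pix)
crop = 64    # Fourier-cropped box (pix)

box_A = box * apix            # physical box edge in A
eff_apix = apix * box / crop  # pixel size after cropping
nyquist_A = 2 * eff_apix      # best resolution representable

print(round(box_A, 1), round(eff_apix, 2), round(nyquist_A, 2))
```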
Now that we have our particles extracted, we want to use 2D class averaging to crudely separate good particles from poor ones.
Run a 2D Classification job with the extracted particles and the following settings:
Number of 2D classes: 100
Maximum resolution (A): 12
Initial classification uncertainty factor: 1
Circular mask diameter (A): 160
Circular mask diameter outer (A): 180
2D zero pad factor: 1
We don’t have a large number of particles, but in order to speed up the classification we set the maximum resolution at a limit where we would expect to be able to discern visually between particles and junk, and between different viewing directions, and we can also reduce the zero padding from 2 to 1. Adding a circular mask can help to keep the class averages in the centre, which is helpful for template picking.
The initial classification uncertainty factor influences the diversity of particle views, or junk classes that are identified. A lower value (such as 1 as used here) tends to increase the number of different junk classes, and can be useful when the preparation contains largely one type of particle. In cases where the particle stack contains very little junk, a higher number (such as 5 to 10) may be helpful in separating views.

Most of the class averages shown in Figure 5 contain easily recognisable ovoid density for the detergent micelle, with an asymmetric protein domain protruding out on one side. We also observe some views of the micelle where the protruding domain is not that visible, and so the class averages look like a distorted ellipse. It is important not to exclude these "top views", as they can be relatively rare in GPCR cryo-EM datasets.
Run a Select 2D Classes job and select the class averages that contain views of the intact target like the examples above.
These class averages already look OK and can be used directly as templates for picking without further cleanup.
6. Template particle picking, extraction and duplicate removal
We can improve the picking quality by supplying 2D templates so that the picks better resemble the expected target.
Create a Template Picker job and connect the denoised micrographs and the selected templates from Select 2D Classes. Set Pick on denoised micrographs: true and Particle diameter (A): 170.
In our processing the Template Picker yielded ~14 million particles, but we want to remove the most offensive junk picks before extraction.
Run a Micrograph Junk Detector job and input the exposures from the previous Junk Detector job along with the template-picked particles, setting Rejection distance from junk: 30.
Run an Inspect Picks job and look at the first few micrographs. Move the NCC and power score sliders in the same manner as Section 4 to remove picks that look overly generous.

Extract the particles in the same manner as Section 5.
At this stage, we had 11.5M particles.
7. Separating junk from good particles with Ab-Initio and Heterogeneous refinements
Our aim through any basic single particle cryo-EM processing is to separate junk from good particles, without losing valuable rare views. Sometimes it is better to skip 2D classification and go straight to 3D, to avoid losing rare views at the stage of class average selection.
In order to separate the particles by Heterogeneous Refinement, we need a good input volume and some junk volumes. We can generate those from the particles that we 2D classified in Section 5, but if you prefer to 2D classify the Template picked particles and use those instead, they should yield approximately the same result.
As we already separated some good particles from bad ones after blob picking, we can use those particles to generate low resolution volumes in two separate jobs:
Run an Ab-Initio Reconstruction job (1) with Number of classes: 1 and input the “particles selected” from your Select 2D job.
Run an Ab-Initio Reconstruction job (2) with Number of classes: 5 and input the “particles excluded” from your Select 2D job.
Once the jobs are complete, you can assess the volumes for each class by selecting the Volumes tab within the job and inspecting each volume one by one. You should find the volume from Ab-Initio (1) resembles a GPCR in a micelle, and the 5 from Ab-Initio (2) are less clearly defined volumes that are probably junk. If you find a second nice GPCR volume in Ab-Initio (2), then discard this volume from downstream processing.
Run a Heterogeneous Refinement job (1) inputting all of the extracted particles from Section 6. Input the Ab-Initio models that you obtained above to classify into different classes, and set Refinement box size (voxels): 64, as this is the box size that the particles were Fourier cropped to, and using the default 128 would lengthen the job runtime.
As we know which class was the best input volume, we can predict that this same class will be the best output volume, so we can queue up another Ab-Initio Reconstruction job (3) inputting the particles from the best class from Hetero Refine 1, and setting Number of Ab-Initio classes: 5 and Num particles to use: 20,000.
We didn’t find that we needed more than 20,000 particles for this job, but if the volume quality is not good you can try with more particles, e.g. 50,000-100,000.
Inspect the volumes from Ab-Initio (3) to find the best volume(s).
Run three further sequential Heterogeneous refinement jobs (2-4) using the volumes from Ab-Initio (3), each time taking forward the particles from the best class(es).
We found that the percentage of particles found in the best class at each round of Heterogeneous Refinement were 48%, 63%, 92% and 95% and this left us with ~ 2.3M particles. Generally when Heterogeneous Refinement yields ~95% of particles in the best class then you are in the realms of diminishing returns and it is usually not worth running further rounds of this job.

8. Re-extraction at a larger box size and Non-Uniform Refinement
Now that we have cleared out junk particles via Heterogeneous Refinement, we are ready to re-extract with a larger box size for the following reasons:
Using a higher pixel sampling (less Fourier cropping) extends the achievable resolution limit (Nyquist frequency) on the particle images.
Extracting particles after 3D alignments for example in Heterogeneous refinement means that they will be better centred in the box.
A smaller stack of particles (2-3 million compared to the original 14 million) will be read and cached faster for downstream jobs.
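The first of these reasons is simple arithmetic: the Nyquist limit is twice the effective pixel size, which Fourier cropping inflates. A sketch comparing the two extractions:

```python
# Why re-extraction helps: Nyquist limit before and after the new
# extraction settings. Nyquist = 2 * effective pixel size, and the
# effective pixel size grows by the Fourier-crop ratio box/crop.
apix = 0.91  # raw pixel size (A)

def nyquist(box, crop):
    """Nyquist resolution (A) for a box Fourier-cropped to crop pixels."""
    return 2 * apix * box / crop

old = nyquist(256, 64)   # first extraction, heavily Fourier cropped
new = nyquist(320, 256)  # re-extraction at larger sampling
print(round(old, 2), round(new, 2))
```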
As we have now aligned the particles we should also take care to remove any duplicates based on their updated shifts before we re-extract.
Run a Remove Duplicate Particles job with Minimum separation distance (A): 50, using the particles from the good class(es) at the end of Section 7.
Re-extract the non-duplicated particles with Extraction box size (pix): 320 and Fourier crop to box size (pix): 256. We expanded the box size in Å to ~2.5x the particle diameter to capture more of the delocalised signal, but Fourier cropped back to 256 for computational efficiency.
Input the particles into a Non-Uniform Refinement job (Refinement 1) using one of the good GPCR 3D volumes from Heterogeneous Refinement, with Minimise over per-particle scale: on and Dynamic mask start resolution (A): 1.
Examine the output unsharpened and sharpened map in the volume viewer and look at the real space mask, cFSC plot and viewing direction plot. The reported resolution should be around 2.9 Å.
Run an Orientation Diagnostics job to assess the orientation sampling in more detail by connecting the volume, particles and mask from Refinement 1. Orientation Diagnostics by default uses the solvent mask from refinement; however, we disabled masking during refinement, so instead use the volume.mask_fsc_auto volume as the mask.

In Figure 8 we can see that the volume looks good before sharpening, but after sharpening some parts have fragmented density, for example at the bottom of the transmembrane domain. The orientations for this particle are fairly well distributed for a GPCR, with a cFAR score of 0.67 and an SCF score of 0.826, but they are not uniformly sampled.
9. Improving the particle stack using Rebalance Orientations and Subset Particles by Statistic
Although the refinement in Section 8 was already pretty good, we aim to investigate and classify the ligand-binding site later on in the pipeline, so we want the particle stack to be as high quality as possible to avoid classification artefacts. Two jobs that can help with this are Rebalance orientations and Subset Particles by Statistic (new in v4.7). First we will filter out some of the slightly excess views, by removing the least helpful particles in those poses.
Run a Rebalance Orientations job, setting the Intra-bin exclusion criterion to alignments3D/alpha.
We found that ~300k particles were removed.
OPTIONAL: Run a Non-Uniform Refinement (Refinement 2) with these particles, using the same settings as for Refinement 1, and assess the appearance of the per-particle scale histogram.
We found the plot showed a bimodal distribution that is overlapping, and this manifests as a shoulder on the main peak (see Figure 9).
We want to remove particles that are of low quality and will not be meaningfully contributing to the reconstruction. To do this:
Run a Subset Particles by Statistic job, selecting Subset by: Per-particle scale. You could use the default subsetting mode that uses Gaussian fitting, but we chose to use Subsetting mode: Split by manual thresholds, Number of thresholds: 1 and Threshold 1: 0.9.
Finding the optimal threshold that removes the poorest particles but does not negatively affect the map resolution may take some trial and error. Setting a threshold of 0.9, we rejected a further ~400k particles, leaving us with ~1.6M.
Run Non-Uniform Refinement (Refinement 3) using the particles in cluster 1, with the same settings as Refinement 1.
Optional: Run a Non-Uniform Refinement (Refinement 3B) using the particles in cluster 0 for comparison with Refinement 3.
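The subsetting amounts to a one-threshold split on the per-particle scale values. The snippet below is an illustrative sketch of that logic, not CryoSPARC code; the cluster naming follows the description above, with cluster 1 holding the particles kept for refinement:

```python
# Sketch of "Split by manual thresholds" on per-particle scale:
# one threshold at 0.9 yields two clusters.
THRESHOLD = 0.9

def split_by_scale(scales, threshold=THRESHOLD):
    """Partition per-particle scale values at a single threshold."""
    cluster0 = [s for s in scales if s < threshold]   # low-scale particles
    cluster1 = [s for s in scales if s >= threshold]  # kept for refinement
    return cluster0, cluster1

low, high = split_by_scale([0.72, 0.88, 0.93, 1.01, 1.10])
print(len(low), len(high))
```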
You should see that the map quality and GSFSC resolution are essentially the same between refinements 3 and 1, but the cFAR has marginally improved. We saw the refinement report an improvement in cFAR from 0.7 to 0.76 calculated without masking.

10. Global and Local CTF Refinement
Now that we have nominally improved our particle stack, it is worth checking to see if there are microscope aberrations that can be corrected for. We can also go ahead and try Global CTF refinement to fit tilt, tetrafoil, spherical aberration and anisotropic magnification. Correcting for these, where possible, can improve the CTF fit, and therefore the map quality, and as a general rule, higher resolution maps benefit more from these corrections than medium resolution maps.
Run two Global CTF Refinement jobs, inputting the particles, volume and mask_fsc_auto from Refinement 3.
In the first, set Fit Anisotropic Mag.: true.
In the second, set Fit Anisotropic Mag., Fit Tetrafoil and Fit Spherical Aberration: true.
Run a Homogeneous Reconstruction Only job for each using the mask_fsc_auto from Non-Uniform Refinement 3.

In Figure 10 you can see that both tests improved the resolution from Figure 8, but that including spherical aberration and tetrafoil is even better. The plots for the tilt, trefoil and anisotropic magnification display clear regions of red and blue to fit to, but the spherical aberration is less clear, and by eye it is difficult to know if including the fit will be useful. See the Tutorial on CTF Refinement for a detailed description of how to interpret the plots, but essentially the brighter and more consistent the red and blue areas are in the data, the more reliable the fit is likely to be.
These particles are on the smaller side, but as they were recorded on a Talos at 200 kV their signal-to-noise may be sufficient for local (per particle) defocus refinement to account for slight differences in particle height across the grid.
Run a Local CTF Refinement job inputting the particles, volume and volume.mask_fsc_auto mask from Refinement 1. This mask needs to be used because the refinement was run without a solvent mask.
Figure 11 shows example outputs from a Local CTF Refinement job; we can see that the spread of defocus change across the particles does not extend beyond 500 Å. The defocus profiles shown have a single obvious minimum in the centre, flanked by two less favourable local minima. These patterns, where there isn’t just a single dip, can be typical for a small particle.
Run a Homogeneous Reconstruction job using the mask_fsc_auto from the Non-Uniform Refinement. We can see in Figure 11 that the resolution was slightly worsened, so we should not include local CTF refinement at this stage.
Figure 11. Results from Local CTF Refinement. Example plots for change in defocus and per-particle defocus landscapes from Local CTF Refinement, and the GSFSC curves before and after Local CTF Refinement.
Using the particles from the Global CTF Refinement that included Tetrafoil and Spherical Aberration, run a Non-Uniform Refinement job (Refinement 4) with the following settings:
Minimize over per-particle scale: true
Dynamic mask start resolution (A): 1
Optimize per-group CTF params: true
Fit Tetrafoil: true
Fit Spherical Aberration: true
You will notice that although we fitted Anisotropic Magnification in the Global CTF, we will not also do so during the Non-Uniform Refinement job. This is for two reasons; first, the anisotropic magnification is not likely to change greatly over the course of processing, whereas the other parameters might change more as the refinement achieves higher resolution, and second, when Aniso. Mag. is refined iteratively during either Homo Refine or NU refine, we have found rare cases where the values become unstable, so it is safer to only correct Anisotropic Magnification during Global CTF Refinement.
The resulting map from Refinement 4 should have a higher resolution estimation than Refinement 2 and at this stage we had a map nominally at 2.64 Å with an auto tightened mask.
11. Reference Based Motion Correction
We have now improved our map resolution with CTF correction, but we can further improve the particle image quality by correcting for dose-dependent sample damage as well as minor particle motions during imaging using Reference Based Motion Correction (RBMC). RBMC requires that the total dose is identical for all exposures within a single job. Since we combined the two movie sets during Patch Motion Correction, we now need to split them again.
Create an Exposure Group Utilities job, inputting the accepted exposures from the Curate Exposures job in Section 2, and set Input Selection: exposure, Action: info_only and Split Outputs by Exposure Group: true.
This job will output two subsets of micrographs, one for each of the exposure groups (this is one exposure group per day of collection).
We will now run Reference Based Motion Correction in two stages so that we can assess the hyper-parameters and dose weighting before proceeding.
Estimate hyper-parameters for motion and dose weights
Run a Reference Based Motion Correction job (A) using the Non-Uniform refinement inputs from Refinement 4, and one of the micrographs exposure groups split above.
Change the low-level input slot for the input mask: drag over the volume.mask_fsc_auto slot, because the refinement only applied a mask at the stage of FSC auto-tightening.
Set Final processing stage: Compute empirical dose weights. Use more GPUs to accelerate the job if you have the resources.
Run a second RBMC job (B) with the same settings, but with the other micrographs exposure group.
Example hyper parameters and empirical dose weights are shown in Figure 12.
This dataset has 60 frames per movie. We would expect most of the high frequency information to be found in the first frames, before the sample has undergone radiation damage. In Figure 12 we see that the majority of the weighting (dark red colour) for high frequency information is in the early frames, but there is also an apparent reappearance of high frequency information in the last frames.
We will try to improve this situation, but if you choose to skip this step and correct the particles using the hyper-parameters output from jobs (A) and (B) as-is, you may observe over-fitting artefacts later in the processing pipeline.
Clone the two RBMC jobs that you just ran (A and B) so that you have jobs (C) and (D), and manually enter the values for Override: acceleration prior strength and Override: spatial correlation distance that were ascertained in jobs (A) and (B) respectively. For Override: spatial prior strength, set this to 1/5th of the value that was determined previously; for the example shown above we would use 1.0984e-03 instead of 5.4929e-03.
When you inspect the empirical dose weights from the new jobs you should see plots that are more in line with our expectations of radiation damage (see Figure 12C and D).
In making the spatial prior strength smaller we are applying a stronger constraint on how much the particles move between frames. Using the updated value (divided by 5) seems to work well for the empirical dose weight estimation; however, using this value for the particle correction stage may not yield the best results, as very little motion would be corrected, so we will use the improved dose weights along with the original hyper-parameters to get the best of both.

Create a new Reference-Based Motion Correction job (E) with the same settings as (C), drag over the hyper-parameters from (C) (these contain both motion priors and dose weights), then override the motion hyper-parameter values with those that you obtained from (A) in Override: spatial prior strength, Override: spatial correlation distance and Override: acceleration prior strength.
Set Save results in 16-bit floating point: true and Final processing stage: motion-correct particles.
Repeat the process to create another RBMC job (F), using the hyper-parameters from (D) and overriding with the priors from (B).
In Figure 12 we can also see the extent of the modelled particle motion for particles from each half of the dataset.
Take both sets of particles from RBMC (E) and (F) and use them to run a Non-Uniform Refinement job (Refinement 5) with the same settings as in Section 10.
The reported resolution and quality of the map should have improved. In Figure 13, we can see that the density for the TMD is poorer than that for the G-protein and scFv16. This is expected because in GPCRs the G-protein and receptor domains typically move somewhat relative to one another, and the G-protein and scFv16, being more rigid, tend to dominate the particle alignment in consensus refinement. Recall that the aim of the processing here is to define the ligand binding pose and interactions; however, the ligand binding site is the poorest part of the map! This misfortune also reminds us of Murphy’s law, but all is not lost: we shall persevere by improving this region using a Local Refinement job. At lower contour thresholds, where the receptor transmembrane domain (TMD) density is more visible, you might notice some additional blobs of density around the G-protein. This is not uncommon in single particle analyses of GPCRs. These blobs may represent sharpening artefacts that only become apparent near regions of good ordering, and are only visible at a lower contour level than you would normally use to visualize these regions. In our case the only reason we took the contour so low was to see the relatively less well-ordered TMD.

12. Local Refinement of the TMD
In order to improve the density in the TMD, especially where ligand binding is expected to occur, we need a custom mask around the area of interest. Different options for mask creation are described here, but we will make the mask using the molecular model; this mask could be re-used if you have more than one ligand-bound structure of the same protein.
Open the map from Refinement 5 in ChimeraX and also open PDB:7s8l and fit this into the density.
Create a 20 Å simulated map for chain R (the receptor) and chain A (the ligand) by using the command
molmap #X/R/A 20
where X is the model number for the PDB.
Resample the map on the grid of Refinement 5 with the command
volume resample #Y ongrid #Z
where Y is your molmap model number and Z is the Refinement 5 model number. Note the threshold at which the volume covers all of the TMD density in your Refinement 5 map.
Save the volume and use the Upload Local Files feature to upload it to your CryoSPARC project directory.
Use an Import 3D Volumes job to import this volume to CryoSPARC.
Use a Volume Tools job to create a custom mask. We need to make this somewhat tight to cover the ligand density but not include too much of the detergent micelle, so set Type of output volume: mask, set the threshold that you selected in ChimeraX, and add Dilation radius (pix): 3 and Soft padding width (pix): 16. The reason we add a dilation radius as well as soft padding is so that the mask covers the density for the ligand as well as the receptor. Inspect the mask to ensure that it is appropriate: an example is shown in Figure 14 as a guide.
Create a Local Refinement job and input the volume and particles from Refinement 5 and your newly made mask. We are now refining a small volume of around 40 kDa and, having already aligned the particles globally, we want to avoid the poses diverging too much, so we will restrict the rotations and translations at each iteration using the following settings:
Use pose/shift gaussian priors during alignment
true
Re-center rotations each iteration
true
Re-center shifts each iteration
true
Rotation search extent (deg)
10
Shift search extent (A)
5
Initial lowpass resolution (A)
4.5
We want to lowpass filter the initial map, but the default value of 15 Å in Local Refinement is probably overly harsh; we can usually select a value at which the GSFSC is still ~1. In our case that was at ~4 Å, but we set Initial lowpass resolution: 4.5 to be a bit more conservative.
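As a rough illustration of how the mask parameters above interact, the sketch below builds a soft mask from a volume the same way conceptually: binarise at a threshold, dilate by a few pixels, then add a cosine-shaped soft edge over the padding width. This is a minimal numpy/scipy sketch of the idea (the function name `make_soft_mask` is ours), not CryoSPARC's actual implementation.

```python
import numpy as np
from scipy.ndimage import binary_dilation, distance_transform_edt

def make_soft_mask(volume, threshold, dilation_pix=3, soft_pad_pix=16):
    """Binarise a map, dilate it, then add a cosine-shaped soft edge.

    A conceptual sketch of the Volume Tools mask parameters; the exact
    CryoSPARC implementation may differ.
    """
    hard = volume >= threshold                              # binarise at the chosen threshold
    hard = binary_dilation(hard, iterations=dilation_pix)   # grow by the dilation radius
    # distance (in pixels) from each outside voxel to the hard mask
    dist = distance_transform_edt(~hard)
    # linear ramp from 1 (inside) to 0 at the soft padding width ...
    ramp = np.clip(1.0 - dist / soft_pad_pix, 0.0, 1.0)
    # ... reshaped into a smooth cosine falloff
    return 0.5 * (1.0 - np.cos(np.pi * ramp))

# toy example: a small sphere in a 32^3 box
x = np.linspace(-1, 1, 32)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
vol = (np.sqrt(X**2 + Y**2 + Z**2) < 0.4).astype(np.float32)
mask = make_soft_mask(vol, threshold=0.5, dilation_pix=2, soft_pad_pix=4)
```

The dilation grows the binary mask outward before the soft edge is applied, which is why combining a dilation radius with soft padding lets the mask reach ligand density that sits just outside the receptor contour.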

Although the reported resolution of the Local Refinement is poorer than that of the global Refinement 5 (in our case 2.69 Å vs 2.41 Å), the density for the TMD is greatly improved, as is evident from comparing the map quality for the transmembrane helices shown in the insets of Figure 14 with those in Figure 13 (Refinement 5).
12A) Comparison of local refinement map quality with a simpler processing pipeline
For comparative purposes, we used the same local refinement settings and mask as above on particles that were processed through a simpler route, skipping Sections 3, 9 and 11; that is, we did not perform micrograph denoising, junk detection, orientation rebalancing, manual curation on the basis of per-particle scale, or reference-based motion correction. This pipeline ended up with ~3M particles, compared to ~1.6M in this case study.

From Figure 15 we can see that if we do not use the more advanced processing steps found in Sections 3, 9 and 11, the result of local refinement is poorer in quality. The inclusion of poorer quality particles in local refinement is associated with radial plates and streaks of density, clearly indicating over-fitting during refinement.
In our good Local Refinement (see Figure 15-4, left), the unsharpened map has sufficient clarity to identify ligand density in the entrance to the TMD on the cytosolic side; however, the ligand density is still relatively poor, and it becomes fragmented after sharpening. The poorer quality of the ligand density indicates either partial occupancy or a mixture of binding poses.
To further improve the ligand density we will go on to locally classify this region to see if we can separate empty/full binding sites or alternative binding poses.
13. 3D Classification
In order to classify the region that contains the ligand, we need to create a mask that covers just this region. To do this, we can start with chain A (the ligand chain) from 7s8l.
In ChimeraX load up your Local Refinement map and updated atomic model and fit the model into the map.
Follow the same process that we used in Section 12, except use just chain A for the molmap:
Save the volume and transfer it to your CryoSPARC directory.
Use an Import 3D Volumes job to import this volume to CryoSPARC.
Use a Volume Tools job to create the custom mask, setting an appropriate contour threshold (we used 0.0063) and the Type of output volume to mask. Our mask relative to the locally refined TMD is shown in Figure 16.
Create a 3D Classification job (see here for a detailed tutorial), which will be Classification 1, inputting the volume and particles from the Local Refinement and using the following settings:
O-EM batch size (per class)
20000
O-EM learning rate init
1
Class similarity
0
Filter resolution
3.5
Setting the class similarity lower than the default of 0.5 means that the density within the mask for each class will be more independent, and increasing the learning rate from the default of 0.4 to 1 allows the volume in each class to be totally replaced at each iteration, allowing for more evolution of the class volumes. The filter resolution is important, as it needs to be able to represent the kind of class-to-class differences you are looking for, but not include frequencies so high that classes separate on only very minor high-frequency differences. It is best to choose the lowest resolution that can still represent the feature you are interested in; in this case 3.5 Å is a good level at which to observe ligand density.

In the output you will see some plots showing statistics from the classification. The example per-particle effective sample size (ESS) plot shown in Figure 16 indicates that most of the images have probability smeared across multiple classes, with a mean ESS of 2.8, meaning that on average each particle might belong to any of 2.8 classes. The class distribution shows that all of the classes are equally populated. Together, these two observations suggest either that there may be continuous heterogeneity (for example, flexibility in the ligand binding) or that the signal inside the mask is not sufficient to classify with high confidence at the filter resolution.
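To build intuition for the ESS statistic, the sketch below computes an effective class count from per-particle class posteriors using the inverse participation ratio, 1/Σₖpₖ². This is one common definition of ESS and is our assumption here; CryoSPARC's exact formula may differ.

```python
import numpy as np

def effective_class_count(posteriors):
    """Effective number of classes per particle from class posteriors.

    Uses the inverse participation ratio 1 / sum_k p_k^2 (an assumed
    definition; CryoSPARC's exact formula may differ).
    posteriors: array of shape (n_particles, n_classes), rows sum to 1.
    """
    return 1.0 / np.sum(posteriors**2, axis=1)

# a particle certain of its class has an effective count of 1 ...
certain = np.array([[1.0, 0.0, 0.0, 0.0]])
# ... while probability smeared evenly over 4 classes gives 4
smeared = np.full((1, 4), 0.25)
```

A mean ESS of 2.8 over 10 classes thus sits well above the "confident" value of 1, consistent with probability being smeared across roughly three classes per particle.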
14. Assessment of 3D classes for manual regrouping
To inspect the class volumes from 3D classification at higher pixel sampling and resolution, next run a Heterogeneous Reconstruction Only job to obtain higher resolution volumes for each of the 10 classes. To visualise the ligand density, we produced difference maps for the 10 classes by subtracting a fitted molmap of the apo receptor PDB structure using ChimeraX.
Download the volume series (not the sharpened one)
Open the 10 unsharpened volumes (not as a series) in a new ChimeraX session
Open 7s8l, fit it into one of the maps and execute the following commands
set bgColor white; graphics silhouettes true; fit #11 inmap #1; lighting soft
molmap #11/R/B/C/D/E 3
volume subtract #1 #12; volume subtract #2 #12; volume subtract #3 #12; volume subtract #4 #12; volume subtract #5 #12; volume subtract #6 #12; volume subtract #7 #12; volume subtract #8 #12; volume subtract #9 #12; volume subtract #10 #12
surface dust #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 size 5
volume level 0.07176
These steps should generate maps in which only the unmodelled density is visible, with small specks and noise removed. You can inspect each one in turn and decide which classes look the most promising. You might find that the Side View function gives a clearer view by clipping detergent density out of view.
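The `surface dust` step above hides small disconnected specks. As a rough numpy/scipy sketch of the same idea (not what ChimeraX does internally, and with our own function name `remove_dust`), one can label connected components of above-threshold density and discard those below a minimum voxel count:

```python
import numpy as np
from scipy.ndimage import label

def remove_dust(diff_map, threshold, min_voxels=50):
    """Keep only connected blobs of above-threshold density that contain
    at least min_voxels voxels; zero out everything else.

    A conceptual sketch of ChimeraX's `surface dust` hiding rule.
    """
    binary = diff_map >= threshold
    labels, n_components = label(binary)   # label connected components
    keep = np.zeros_like(binary)
    for i in range(1, n_components + 1):
        component = labels == i
        if component.sum() >= min_voxels:  # keep only large blobs
            keep |= component
    return np.where(keep, diff_map, 0.0)
```

Applied to a difference map, this leaves only sizeable patches of unmodelled density, making it easier to judge which classes have genuine ligand density.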

After inspecting the unmodelled density in the 10-class Heterogeneous Reconstruction, we identified 4 broad categories of ligand shape that we refer to as poorly ordered (grey), partly ordered (gold), linear ordered (purple) and circular ordered (green). The manually chosen category for each class is indicated with the corresponding colour of box behind each volume in Figure 15A. Due to differences in particle stacks, masking or 3D classification initialisation, the volumes you observe may differ from those shown above, but we reproducibly found one or more classes with a circular ordered appearance.
Run 4 separate Homogeneous Reconstruction Only jobs; one for the particles that match each major ligand shape by grouping together the classes with similar-looking density.
15. Mask optimisation for resolution estimation
CryoSPARC refinements have in-built masking that automatically tightens the mask at the end of refinement, and this mask is used to calculate the global resolution displayed in the GUI. Map sharpening by applying a global B-factor is also handled automatically in refinement and reconstruction allowing for the output of automatically sharpened maps.
It is always a good idea to inspect the FSC curves and mask, as well as the sharpened map connectivity in case the mask or sharpening B-factor need to be adjusted. We are going to check those out now in Sections 15 and 16.
Open the locally refined map in ChimeraX, along with the mask_fsc_auto. This mask can be found in the Outputs tab of the job in the Refined Volume section (importantly, this is different from the mask provided in the Outputs Group section, which is the final-iteration mask before auto-tightening). Set the map to a contour threshold at which you observe the features you are interested in but no noise, and set the mask threshold to 1. This mask threshold shows the contour at which the mask was binarised; on top of this, a soft edge extends out to the contour observed at a threshold of ~0.
Examine the mask and map together and decide if the mask encompasses the volume of your particle adequately. We found that the auto-tightened mask included some density from the adjacent G-protein (Figure 18, left).
Examine the FSC for the auto-tightened map and compare the Tight and Corrected curves. As described in the mask creation tutorial, the difference between these curves can be used as an indicator of mask tightness. In our example in Figure 18 we see a sudden drop in the corrected curve, suggesting there is density outside of the mask.

Although this mask is generous enough, it also includes density outside of the region of interest, so we will design a new mask that covers only the receptor and ligand.
Open up the map from Local Refinement in ChimeraX and also open PDB:7s8l or an updated model and fit this into the density.
Create a 15 Å simulated map for chain R (the receptor) and chain A (the ligand) by using the command
molmap #X/R/A 15
where X is the model number for the PDB.
Resample the map on the grid of the local refinement map with the command
volume resample #Y ongrid #Z
where Y is your molmap model number and Z is the local refinement model number. Note the threshold at which the volume covers all of the TMD density in your local refinement map.
Save the file, then transfer and import it to CryoSPARC.
We want this mask to be more generous than the one used for Local Refinement, so we will add a dilation radius.
Create a Volume Tools job and input the newly imported receptor volume. Set the Threshold to the predetermined value (we used 0.13) and set the Dilation radius (pix) to a few pixels (we used 6).
Create Validation (FSC) jobs for each of the homogeneously reconstructed volumes, using the newly made mask, and inspect the FSC curves.
We found that the reconstructed receptor maps had reported FSC=0.143 resolutions of 2.75–2.83 Å.
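For intuition about where the FSC=0.143 number comes from, the sketch below correlates two half maps within Fourier shells and reports the resolution at the first shell where the curve drops below the threshold. This is a minimal, unmasked numpy sketch for cubic maps; CryoSPARC's Validation (FSC) job additionally applies masking and noise-substitution corrections that this omits.

```python
import numpy as np

def fsc_curve(half1, half2):
    """Fourier Shell Correlation between two half maps (cubic, same shape)."""
    F1, F2 = np.fft.fftn(half1), np.fft.fftn(half2)
    n = half1.shape[0]
    freq = np.fft.fftfreq(n)                       # cycles per voxel
    fx, fy, fz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.round(np.sqrt(fx**2 + fy**2 + fz**2) * n).astype(int)
    nshells = n // 2                               # up to Nyquist
    fsc = np.zeros(nshells)
    for s in range(nshells):
        sel = shell == s
        num = np.real(np.sum(F1[sel] * np.conj(F2[sel])))
        den = np.sqrt(np.sum(np.abs(F1[sel])**2) * np.sum(np.abs(F2[sel])**2))
        fsc[s] = num / den if den > 0 else 0.0
    return fsc

def resolution_at(fsc, pixel_size, threshold=0.143):
    """Resolution (Angstrom) at the first shell where FSC falls below threshold."""
    n_box = 2 * len(fsc)
    for s in range(1, len(fsc)):
        if fsc[s] < threshold:
            return (n_box * pixel_size) / s        # shell s -> n*apix/s Angstrom
    return 2.0 * pixel_size                        # never crossed: Nyquist
```

Identical half maps give FSC ≈ 1 in every shell, so the reported resolution bottoms out at Nyquist (twice the pixel size); real half maps decorrelate at high frequency and cross 0.143 earlier.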
16. Map sharpening and inspection for model building
At the end of the day, the resolution number is less important than the map quality, especially in the region of interest; in this case we are most interested in the ligand density and its binding interactions. Taking into account the estimated resolutions, we can now inspect the sharpened maps and look for consistency of map features. At ~2.8 Å resolution we would expect to see good definition of side chains, some density for backbone carbonyls and, if this were a soluble region, possibly density for well-resolved water molecules.

Compare the sharpened and unsharpened density from your Homogeneous Reconstruction jobs. You will find that the auto-sharpened map has good side chain density and better definition of backbone carbonyls than the unsharpened map (features that we would expect to see in this resolution range); however, it will likely also have more fragmented density for the ligand. Example densities are shown in Figure 19A. We need to strike a balance here: because we are mostly interested in the ligand and its binding interactions, we would like to adjust the sharpening B-factor manually to optimise for this region.
Create a Sharpening Tools job for each of the homogeneously reconstructed volumes, and set B-factor to apply to a value smaller in magnitude than the one used previously. In our case we found that -50 or -60 gave a good balance between the density quality of the receptor and that of the ligand (see Figure 19A).
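Conceptually, applying a global B-factor rescales the map's Fourier amplitudes by exp(-B·s²/4), where s is spatial frequency in 1/Å, so a negative B boosts high frequencies and a B of smaller magnitude boosts them less. A minimal numpy sketch of this operation (the real Sharpening Tools job also handles filtering and masking) might look like:

```python
import numpy as np

def sharpen(volume, b_factor, pixel_size):
    """Apply a global B-factor by scaling Fourier amplitudes by
    exp(-B * s^2 / 4), with s the spatial frequency in 1/Angstrom.

    Negative B sharpens (boosts high frequencies); B = 0 is a no-op.
    """
    n = volume.shape[0]
    freq = np.fft.fftfreq(n, d=pixel_size)         # 1/Angstrom
    fx, fy, fz = np.meshgrid(freq, freq, freq, indexing="ij")
    s2 = fx**2 + fy**2 + fz**2
    F = np.fft.fftn(volume) * np.exp(-b_factor * s2 / 4.0)
    return np.real(np.fft.ifftn(F))
```

Because the scaling factor is 1 at s = 0, the mean density is preserved; only the relative weight of high-frequency detail changes, which is why a B-factor of -50 instead of a larger-magnitude automatic value keeps the weaker ligand density more connected.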
Having appropriately sharpened our maps, we were able to go on to model the ligand poses for three of the maps and refine each using Phenix (ligand models shown in Figure 19B) to aid comparison to the deposited maps and model.
17. Interpretations and conclusion
Our final maps show density consistent with different binding poses of the ligand Cortistatin-14. By locally refining and classifying, the ligand density is now much clearer in at least two classes than in the deposited map (see Figure 19). Q-scores for the receptor and ligand suggest a fit that is consistent with the reported resolutions.
For the Cortistatin-14 peptide to be circularised, a disulphide bond needs to form between its two cysteine residues. The presence of both a circular class and a linear class in our workflow indicates that the Cortistatin-14 in this sample is partially reduced (i.e. some of the molecules have their cysteines disulphide-bonded and some do not), meaning that the disulphide bond that usually keeps it circular in vivo has been broken by reduction. We noticed that the reducing agent tris(2-carboxyethyl)phosphine (TCEP) was used upstream during sample preparation, and partial reduction could explain why we have both an ordered-circular (S-S bonded) and an ordered-linear (S-S broken) class.
We were unable to build a satisfactory model into the density for the poorly ordered class, and there may be residual heterogeneity in terms of ligand binding poses within the particles used for this map. Examining our three ligand models, we can see that the register in our model is different to that of the original model 7s8l (see Movie 1). We were able to confidently assign the register due to the bulky side chain residues, and in particular the tryptophan.
Cortistatin-14 binds to the cytoplasmic side of the MRGPRX2 receptor and displays a variety of ligand binding poses. The N-terminus of Cortistatin-14 is in the same location in all three cases, and this is likely the important region for binding affinity. The C-terminal portion of the ligand, however, is exposed to the cytoplasmic space and is therefore capable of forming different arrangements.
We note that when we used the overfitted local refinement from Section 12A for classification, with the same settings and mask as in Section 13, we were unable to find a class with circular ligand density, even though the ligand appeared circular in the parent map. This emphasises the importance of data quality when performing both local refinement and local classification; in this case study, using some of the features recently added to CryoSPARC allowed improved interpretation of the data.
You can download our versions of the final maps, half maps and masks from the links below for comparison with your own processing!