Case study: End-to-end processing of a ligand-bound GPCR (EMPIAR-10853)

Processing EMPIAR-10853 including Micrograph Denoising, Micrograph Junk Detector, Subset Particles by Statistic, Local Refinement and focussed 3D Classification to improve ligand density.

Introduction

In this tutorial we will work step-by-step through the processing pipeline for an active G-protein coupled receptor with a ligand, G-protein and antibody fragment bound, using a dataset originally processed by Cao et. al and deposited as EMDB:24896 and PDB:7s8l. The raw data are publicly available for download as EMPIAR-10853, and this case study is written so that you can replicate these results yourself.

We selected this dataset to provide a processing pipeline for a target with flexibility in the region of ligand-binding, and heterogeneity in the ligand-binding pose. In particular, this case study provides strategies and ideas about how to get a higher quality final particle stack, which we found was essential for downstream local refinement, and focussed classification steps. Image processing was performed using CryoSPARC v4.7.

The Mas-related G-protein coupled receptor member X2 (MRGPRX2) receptor is a member of the G-protein coupled receptor (GPCR) protein family that are characterised by their 7 alpha helical transmembrane helices. GPCRs are predominantly found on the cell surface and are important in cell signalling and as therapeutic targets. The MRGPRX2 receptor is largely found on the surface of mast cells and is activated by agonist ligands. Upon activation it binds to Gq (a heterotrimeric G-protein). This in turn activates the mast cell to degranulate and this process plays a role in host defence, inflammatory diseases and pseudo-allergic drug hypersensitivity.

In this sample, Gq and a single-chain variable fragment (scFv16) were added to the receptor, along with the agonist peptide ligand Cortistatin-14. We can calculate from the existing pdb 7s8l that the combined complex has a mass of ~140 kDa and, but for cases where a molecular model is not available, the mass could also be ascertained from protein sequences (for example in UniProt) or by analysing a purified sample by mass spectrometry. From the structures obtained in Cao et. al, we know that this complex is a non-symmetric membrane protein complex containing a transmembrane receptor domain (TMD) and an intracellular domain formed of the G-protein that extends outside of the membrane, and that the expected ligand binding location is on the cytoplasmic side of the receptor domain. This protein was prepared in a detergent micelle containing lauryl maltose neopentyl glycol, glyco-diosgenin and cholesteryl hemisuccinate.

The primary aim of this pipeline is to generate the highest quality map(s) to ascertain the binding pose(s) of the inhibitor peptide, Cortistatin-14.

Setting up

Before beginning this tutorial, you should create a new project and a workspace within that project. Download the movies and gain references to a location of your choosing. Our data is downloaded to a directory called rawdata in the project directory using the command:

cd /path/to/rawdata
wget -m <ftp://ftp.ebi.ac.uk/empiar/world_availability/10853/> .

1. Movie import and preprocessing

Now we need to get the data into CryoSPARC. These data were recorded over two days and for each day there is a separate gain reference and a slightly different total dose.

Import the data as two separate jobs using an Import movies job. The Movies data path needs to match the location of the directory containing the downloaded .tif files for example rawdata/EMPIAR/10853/data/20210311*.tif. The Gain reference path for the first day needs to be specified for example rawdata/EMPIAR/10853/data/CountRef_20210311_109_000.mrc. Experimental information such as pixel size, total electron dose / A^2 and voltage are available in the EMDB entry 24896.
Parameter
Setting
Flip gain ref & defect file in Y?
Yes
raw pixel size (A)
0.91
Accelerative Voltage (kV)
200
Spherical Aberration (mm)
2.7
Total exposure dose (e/A^2)
50.74
Repeat the process for the second day files (20210312) and changing the Total exposure dose (e/A^2) to 47.87.

In this example the difference in dose is not very substantial between the two days of collection, but we will treat the two subsets of movies separately as an example of how to do this. In cases where movies are collected as a single session, have the same number of frames, and the total doses are very similar (less than ~10% different), even if there are multiple gain references available, you might prefer to import all of the images together as a single job, using just one of the gain references, and perhaps setting the total dose to an average value, provided that all of the gain references have a similar appearance.

Next we want to correct for beam-induced motion during movie collection and to estimate the defocus values for each micrograph.

Run Patch Motion Correction, inputting both imported movie sets from above. Use save results in 16-bit floating point: true and increase Number of GPUs to parallelize depending on GPU availability.

Using float16 format means that the output images will take up half of space than the default 32-bit floating point files. You may expect each job to take several hours on a single GPU.

Run Patch CTF Estimation with default settings

2. Excluding poor micrographs from downstream processing

Not all of the movies that are collected will be perfect; they can contain a variety of non-vitreous ice and contamination with ice crystals or other substances, excessive in-movie motion and ice that is too thick or too thin for your sample. We would like to avoid picking particles from poor micrographs because they are often of low value, and can require extra cleaning steps to remove after they are extracted. It is easier to exclude the worst micrographs at the start, by looking at the statistics generated by the Patch Motion and Patch CTF jobs.

Use a Manually Curate Exposures job to exclude poor quality images on the basis of their CTF estimated micrograph statistics so they are not used for downstream processing.

Input the “Micrographs processed” from Patch CTF, and queue the job. In the job card view go to the pink Interactive tab.

You may notice that many of the images here after import have faintly darker stripes along the left and right sides of the images, (See Figure 1, left). This is likely due to the gain correction. You may prefer to generate a gain reference from the movies using a different software package rather than use the supplied gain reference files, but we found that including this sort of step did not improve the final map quality or resolution.

Select the upper and lower bounds for each parameter to exclude outliers, and browse thumbnails and diagnostics of the images to check the appearance of the excluded and included micrographs. For each parameter, try moving the slider or typing values and click “set threshold” to see how many images have been excluded.

We set an upper limit for CTF fit resolution at 3.5 Å to avoid using poor quality images, an upper limit of 70 for Average Intensity to remove outlier images, and an upper bound on Relative Ice Thickness at 1.042 to remove low signal-to-noise images with thick ice. This is a nice dataset with relatively few poor micrographs, and overall we were left with 11,573 micrographs.

The above example shows the micrograph, power spectrum, CTF fit and motion for two example movies. The CTF fit and motion for both images look alright, but the thon rings for the red example are weaker, due to interference from the strong vertical lines in the movie. This might originate from a problem with gain correction of the detector for a few images.

In other datasets it can also be helpful to set thresholds for average defocus, astigmatism, and full-frame motion distance, but it is a good practice to examine all of the available plots to see where outlier populations arise.

3. Micrograph denoising and Junk Detection

3A) Micrograph Denoising

Particles can be hard to identify by eye due to factors such as thick ice, small particle size, carbon or graphene monolayers and low defocus, and these can make selecting thresholds for particle picks somewhat challenging. To improve the quality of template and blob picks, and make thresholding easier we use the Micrograph Denoiser.

Create a Micrograph Denoiser job, inputting the exposures_accepted and set Number of CPUs to Parallelize to the number of CPUs available on your node.

We can identity the particles much more easily after denoising and so the denoised images are suitable for Blob picking, Template picking, and Inspect Picks job types.

3B) Junk detection

Micrographs often have different types of unwanted features in them, such as ethane contaminants, ice crystals, crystalline ice and gold or carbon support, which can be tricky to get rid of during classification steps. We can use the Micrograph Junk Detector, released in v4.7, to analyze the micrographs and automatically detect different types of junk. Subsequently, we can reject particle picks that are on or near the junk regions.

Create a Micrograph Junk Detector job, inputting the exposures_accepted from the upstream Micrograph Denoiser job.

The Micrograph Junk Detector has masked out in purple regions that contain extrinsic ice junk (ethane contaminants and ice crystals), and from the job statistics we can see that while almost every micrograph contains junk (overwhelmingly extrinsic ice junk), the total area that this junk occupies is relatively low at <4%. Dataset-level junk statistics might be helpful to diagnose patterns in junk appearance that may relate, for example, to sample prep changes.

4. Blob particle picking

This is a pretty large dataset, so to speed up generation of 2D templates, we are going to make a subset of 200 micrographs for initial blob picks.

Run an Exposure Sets Tool job, inputting your exposures from after the Micrograph Junk Detector, and set Split batch size: 200 and Split randomize : true

We know this receptor has a G-protein and a scFv bound and has a total mass of 140 kDa. Suitable diameters for picking can be estimated by looking at the “ruler” in the Manually Curate Exposures job, on in ChimeraX from homologous deposited structures or AlphaFold models. The diameter does not need to be precise, and setting the diameter ~10% larger than the estimated diameter of the particle can be beneficial to account for imperfect estimation, and the effects of defocus in the images. We estimate the longest diameter is around 120 Å based on PDB 7s8l.

Using the micrographs in split_0, run a Blob picker job with a minimum particle diameter (A) 100, and Maximum particle diameter (A) 130.

Not all of the picks made will be of good quality, and to remove egregious junk picks we can use a combination of automatic removal of junk picks by using the junk detection that we ran earlier, as well as manual thresholding of the pick scores.

Run a second Micrograph Junk Detector job and input the exposures from the first Junk Detector job along with the blob picked particles, setting the Rejection distance from junk:30.

The distance from detected junk where you want the junk detector to reject particles may depend on the type of junk present, the size of the particle, and the box size you want to extract. For larger particles and larger boxes, a larger value maybe more appropriate.

Run an Inspect Picks job and look at the first few micrographs.

The overall goal here is to select as many particles as you can without picking empty ice or junk. You should see that there are very few picks on the junk regions as we already rejected those near the detected regions of junk. To remove any remaining junk picks, move the NCC and power score sliders until the lower bound removes empty ice picks, and bring the higher bound down to remove high contrast feature picks without seeing good-looking picks being discarded.

Be sure to inspect micrographs of relatively low and high defocus when adjusting the power sliders, so that particles are being picked across the defocus range.

You can expect to select ~200k particles from a 200 micrograph subset.

As we performed Inspect picks with the denoised micrographs you may notice that the contrast is more similar across the range of defocus used than you would typically see with lowpass filtered images.

5. Blob pick extraction and 2D classification

We expect the MRGPRX2 receptor protein to have a diameter of ~ 120 Å, and this dataset was recorded with a defocus range extending up to around -2 µm. We want to ensure that we capture most of the delocalised signal caused by imaging with defocus, so we need to make sure the box is large enough.

As the defocus is increased, the higher frequency components from particles are delocalised further out in real space due to the point spread function of the microscope. If too small a box is selected for extraction, some information about your particle is lost, and this may limit the attainable resolution. Conversely using an excessively large a box can lead to a lot of noise in the images, and this can also have a negative effect on the resolution of your reconstruction. As a very rough rule, a box of ~1.5-2.5 x the diameter of your particle is often appropriate, however very high resolution data or data collected with high defocus may require a larger box. The box size must be an even integer of pixels, and it is best if you choose or downsample to a box size that is computationally efficient.

Extract the particles using the Extract from Micrographs job with Extraction box size (pix): 256 (233 Å which is ~2 x the particle diameter) and set Fourier crop to box size (pix): 64. Select Save results in 16-but floating point: true to save on the disk space required.

We will be using these particles to generate 2D class average templates, so we do not need them to be at high resolution. Therefore we can Fourier crop to a box of 64 and this will speed up the downstream jobs. Expect the number of extracted particles to be lower than the number previously selected, because CryoSPARC rejects particles where any part of the box is outside of the micrograph. Fourier cropping for 2D classification saves on disk space and may speed up caching and early particle cleaning jobs.

Now that we have our particles extracted, we want to use 2D class averaging to crudely separate good particles from poor ones.

Run a 2D Classification job with the extracted particles and the following settings:

Parameter

Setting

Number of 2D classes

100

Maximum resolution (A)

initial classification uncertainty factor

Circular mask diameter (A)

160

Circular mask diameter outer (A)

180

2D zero pad factor

We don’t have a large number of particles, but in order to speed up the classification we set the maximum resolution at a limit where we would expect to be able to discern visually between particles and junk, and between different viewing directions, and we can also reduce the zero padding from 2 to 1. Adding a circular mask can help to keep the class averages in the centre, which is helpful for template picking.

The initial classification uncertainty factor influences the diversity of particle views, or junk classes that are identified. A lower value (such as 1 as used here) tends to increase the number of different junk classes, and can be useful when the preparation contains largely one type of particle. In cases where the particle stack contains very little junk, a higher number (such as 5 to 10) may be helpful in separating views.

In the class averages shown in Figure 5, most of them contain easily recognisable ovoid density for the detergent micelle, with an asymmetric protein domain protruding out on one side. We also observe some views of the micelle where the protruding domain is not that visible, and so the class averages look like a distorted ellipse. It is important not to exclude these views ( "top views"), as they can be relatively rare amongst GPCR cryo-EM datasets.

While we don’t expect every view to be equally sampled (because of target interactions with the air-water interface or support, or ice thickness restriction) it is still important to have an adequately diverse set of selected particle views in the dataset to produce a reliable 3D volume.

Run a Select 2D Classes job and select the class averages that contain views of the intact target like the examples above.

These class averages already look OK and can be used directly as templates for picking without further cleanup.

6. Template particle picking, extraction and duplicate removal

We can improve the picking quality by supplying 2D templates so that the picks better resemble the expected target.

Create a Template Picker job and connect the denoised micrographs and the selected templates from Select 2D Classes. Set pick on denoised micorgraphs: true and Particle diameter (A): 170.

In our processing the Template Picker yielded ~14 million particles, but we want to remove the most offensive junk picks before extraction.

Run a Micrograph Junk Detector job and input the exposures from the previous Junk detector job along with the template picked particles, setting the Rejection distance from junk:30.
Run an Inspect Picks job and look at the first few micrographs. Move the NCC and power score sliders in the same manner as Section 4 to remove picks that look overly generous.

Extract the particles in the same manner as Section 5.

At this stage, we had 11.5M particles.

Our aim through any basic single particle cryo-EM processing is to separate junk from good particles, without losing valuable rare views. Sometimes it is better to skip 2D classification and go straight to 3D, to avoid losing rare views at the stage of class average selection.

In order to separate the particles by Heterogeneous Refinement, we need a good input volume and some junk volumes. We can generate those from the particles that we 2D classified in Section 5, but if you prefer to 2D classify the Template picked particles and use those instead, they should yield approximately the same result.

As we already separated some good particles from bad ones after blob picking, we can use those particles to generate low resolution volumes in two separate jobs:

Run an Ab-Intio reconstruction job (1) with number of classes: 1 and input the “particles selected” from your Select 2D job.
Run an Ab-Intio reconstruction job (2) with number of classes: 5 and input the “particles excluded” from your Select 2D job.

Once the jobs are complete, you can asses the volumes for each class by selecting the Volumes tab within the job, and inspect each volume one by one. You should find the volume from Ab-Initio (1) resembles a GPCR in a micelle, and the 5 from Ab-Initio (2) are less clearly defined volumes that are probably junk. If you find a second nice GPCR volume in Ab-Initio (2), then discard this volume from downstream processing.

The number of junk classes required depends on the diversity of junk that might be present in the sample. Generally we have found 1-5 junk classes sufficient in most cases.

Run a Heterogeneous Refinement job (1) inputting all of the extracted particles from Section 6. Input the Ab-Initio models that you obtained above to classify into different classes and set Refinement box size (voxels): 64 as this is the box size that the particles were Fourier cropped to, and using the default 128 would lengthen the job runtime.
As we know which class was the best input volume, we can predict that this same class will be the best output volume, so we can queue up anther Ab-Initio job (3) inputting the particles from the best class from Hetero Refine 1, and setting Number of Ab-Initio classes : 5 and Num particles to use : 20,000.

We didn’t find that we needed more than 20,000 particles for this job, but if the volume quality is not good you can try with more particles, e.g. 50,000-100,000.

Inspect the volumes from Ab-Initio (3) to find the best volume(s).
Run three further sequential Heterogeneous refinement jobs (2-4) using the volumes from Ab-Initio (3), each time taking forward the particles from the best class(es).

We found that the percentage of particles found in the best class at each round of Heterogeneous Refinement were 48%, 63%, 92% and 95% and this left us with ~ 2.3M particles. Generally when Heterogeneous Refinement yields ~95% of particles in the best class then you are in the realms of diminishing returns and it is usually not worth running further rounds of this job.

8. Re-extraction at a larger box size and Non-Uniform Refinement

Now that we have cleared out junk particles via Heterogeneous Refinement, we are ready to re-extract with a larger box size for the following reasons:

Using a higher pixel sampling (less Fourier cropping) extends the achievable resolution limit (Nyquist frequency) on the particle images.
Extracting particles after 3D alignments for example in Heterogeneous refinement means that they will be better centred in the box.
A smaller stack of particles (2-3 million compared to the original 14 million) will be read and cached faster for downstream jobs.

As we have now aligned the particles we should also take care to remove any duplicates based on their updated shifts before we re-extract.

Run a Remove Duplicate Particles job with Minimum separation distance (A): 50 and using the particles from the good class(es) at the end of Section 7.
Re-extract the non-duplicated particles with Extraction box size (pix): 320 and Fourier crop to box size (pix):256. We expanded the box size in Å to ~2.5 x the particle diameter to capture more of the delocalised signal, but Fourier cropped back to 256 for computational efficiency.
Input the particles into a Non-Uniform Refinement job (Refinement 1) using one of the good GPCR 3D volumes from Heterogeneous Refinement with Minimise over per-particle scale on, and Dynamic mask start resolution (A) :1.

The reason why we set the dynamic masking to start at 1 Å (meaning that it will never start, since we don’t expect to achieve a 1 Å resolution) is that we want to disable masking during refinement, and rely on the Non-Uniform regularization to adaptively exclude noise. This method can sometimes yield slightly better map quality with NU refinement. If you wish, you can compare the results with and without this setting to see how it affects the map quality and statistics.

Per-particle scale is a function that compares the contrast of each aligned particle image with a projection of the refined volume and gives it a score. Effectively this gives a high score to particles with higher contrast that match the reference well, and a low score to those that have lower contrast and do not match the reference well. Minimising over per-particle scale means that the best particles get up-weighted, and the worst ones down-weighted. Often this is beneficial and can slightly improve the map quality and metrics.

Examine the output unsharpened and sharpened map in the volume viewer and look at the real space mask, cFSC plot and viewing direction plot. The reported resolution should be around 2.9 Å.
Run an Orientation Diagnostics job to assess the orientation sampling in more detail by connecting the volume, particles and mask from Refinement 1. Orientation diagnostics by default uses the solvent mask from refinement, however we disabled masking during refinement, so for the mask instead use the volume.mask_fsc_auto volume.

To change the mask taken from a refinement job, you can alter the input for the mask volume slot. To do this, open the inputs information in the mask field that you already loaded into the Orientation Diagnostics job builder (by clicking the toggle arrow) then go to the Outputs tab of the Non-Uniform Refinement (Refinement 1) job and drag over the slot called .volume.mask_fsc_auto.F into the mask field for the Orientation Diagnostics job.

In Figure 8 we can see that the volume looks good before sharpening, but after sharpening some parts have fragmented density, for example, the bottom of the transmembrane domain. The orientations for this particle are pretty well distributed for a GPCR with a cFAR score of 0.67 and SCF score is 0.826 but they are not uniformly sampled.

9. Improving the particle stack using Rebalance Orientations and Subset Particles by Statistic

Although the refinement in Section 8 was already pretty good, we aim to investigate and classify the ligand-binding site later on in the pipeline, so we want the particle stack to be as high quality as possible to avoid classification artefacts. Two jobs that can help with this are Rebalance orientations and Subset Particles by Statistic (new in v4.7). First we will filter out some of the slightly excess views, by removing the least helpful particles in those poses.

Rebalancing orientations can be most beneficial on datasets that have strongly over-represented views, but do not have views that are totally absent. In extreme cases, orientation bias can lead to stretched or compressed maps in the direction of low information, as well as resolution anisotropy.

Run a Rebalance Orientations job setting the Intra-bin exclusion criterion to alignments3D/alpha.

alignments3D/alpha is the internal CryoSPARC software name for the per-particle scale metric described earlier, and so using this setting will mean that particles from over-represented views that have the lowest per-particle scale will be rejected.

We found that ~300k particles were removed.

OPTIONAL: Run a Non-Uniform Refinement (Refinement 2) with these particles with the same settings as used for Refinement 1, and assess the appearance of the per-particle scale histogram (optional)

We found the plot showed a bimodal distribution that is overlapping, and this manifests as a shoulder on the main peak (see Figure 9).

When a multimodal distribution is observed in the per-particle scale factors, it can mean that the peak at the low end contains poor quality, poor contrast or poorly matching particles compared to the refined volume. These particles often either do not benefit, or in some cases hinder the overall reconstruction.

We want to remove particles that are of low quality and will not be meaningfully contributing to the reconstruction. To do this:

Run a Subset Particles by Statistic job, selecting Subset by: Per particle scale. You could use the default subsetting mode that uses gaussian fitting, but we chose to use Subsetting mode: Split by manual thresholds, Number of thresholds: 1 and Threshold 1: 0.9.

Finding the optimal threshold that removes the poorest particles but does not negatively affect the map resolution may take some trial and error. Setting a threshold of 0.9 we rejected a further ~400k leaving us with ~1.6M particles.

Run Non-Uniform Refinement (Refinement 3) using the particles in cluster 1, with the same settings as Refinement 1.
Optional: Run a Non-Uniform Refinement (Refinement 3B) using the particles in cluster 0 for comparison with Refinement 3.

You should see that the map quality and GSFSC resolution are essentially the same between refinements 3 and 1, but the cFAR has marginally improved. We saw the refinement report an improvement in cFAR from 0.7 to 0.76 calculated without masking.

The purpose of the steps in Section 9 are to remove poor and unnecessary particles so that the final stack is of higher quality overall. At this stage the effect on the result looks underwhelming, but the quality of the particle stack can have strong effects on downstream jobs like local refinement and 3D classification. We will show a comparison in Section 12 for what happens in local refinement if Sections 3, 9 and 11 of this case study are skipped.

10. Global and Local CTF Refinement

Now that we have nominally improved our particle stack, it is worth checking to see if there are microscope aberrations that can be corrected for. We can also go ahead and try Global CTF refinement to fit tilt, tetrafoil, spherical aberration and anisotropic magnification. Correcting for these, where possible, can improve the CTF fit, and therefore the map quality, and as a general rule, higher resolution maps benefit more from these corrections than medium resolution maps.

Run two Global CTF Refinements, inputting the particles from the Local CTF Refinement particles, volume and mask_fsc_auto from Refinement 1.

Try using Fit Anisotropic Mag. : true
Try Fit Anisotropic Mag., Fit Tetrafoil andFit Spherical Aberration : true

Run a Homogeneous Reconstruction Only job for each using the mask_fsc_auto from Non-Uniform Refinement 3.

When testing the helpfulness of Local or Global CTF Refinement in improving map resolution, using homogeneous reconstruction rather than refinement can take a fraction of the time.

In Figure 10 you can see that both tests improved the resolution from Figure 8, but that including spherical aberration and tetrafoil is even better. The plots for the tilt, trefoil and anisotropic magnification display clear regions of red and blue to fit to, but the spherical aberration is less clear, and by eye it is difficult to know if including the fit will be useful. See the Tutorial on CTF Refinement for a detailed description of how to interpret the plots, but essentially the brighter and more consistent the red and blue areas are in the data, the more reliable the fit its likely to be.

These particles are on the smaller side, but as they were recorded on a Talos at 200 kV their signal-to-noise may be sufficient for local (per particle) defocus refinement to account for slight differences in particle height across the grid.

Run a Local CTF Refinement job inputting the particles, volume and volume.mask_fsc_auto mask from Refinement 1. This mask needs to be used because the refinement was run without a solvent mask.

Figure 11 shows example outputs from a Local CTF Refinement job, we can see that the spread of defocus change across the particles does not extend beyond 500. The defocus profiles shown have a single obvious minima in the centre flanked by two less favourable local minima. These patterns, where there isn’t just a single dip, can be typical for a small particle.

Run a Homogeneous Reconstruction job using the mask_fsc_auto from the Non-Uniform Refinement. We can see in Figure 11 that the resolution was slightly worsened, so we should not include local CTF refinement at this stage.
Figure 11. Results from Local CTF Refinement. Example plots for change in defocus, and per particle defocus landscapes from Local CTF Refinement, and the GSFSC curves before and after Local CTF Refinement.
Using the particles from the Global CTF Refinement including Tetrafoil and Spherical Aberration, run a Non-Uniform Refinement job (Refinement 4) on the particles from the Global CTF Refinement including Spherical Aberration, including the following settings:

Parameter

Setting

Minimize over per-particle scale

true

Dynamic mask start resolution (A)

Optimize per-goup CTF params

true

Fit Tetrafoil

true

Fit Spherical Aberration

true

You will notice that although we fitted Anisotropic Magnification in the Global CTF, we will not also do so during the Non-Uniform Refinement job. This is for two reasons; first, the anisotropic magnification is not likely to change greatly over the course of processing, whereas the other parameters might change more as the refinement achieves higher resolution, and second, when Aniso. Mag. is refined iteratively during either Homo Refine or NU refine, we have found rare cases where the values become unstable, so it is safer to only correct Anisotropic Magnification during Global CTF Refinement.

The resulting map from Refinement 4 should have a higher resolution estimation than Refinement 2 and at this stage we had a map nominally at 2.64 Å with an auto tightened mask.

11. Reference Based Motion Correction

We have now improved our map resolution with CTF correction, but we can further improve the particle image quality by correcting for dose-dependent sample damage as well as minor particle motions during imaging using Reference Based Motion Correction (RBMC). RBMC requires that the total dose for all exposures is identical within a single job. Since we combined the two movie sets together during Patch Motion Correction, so we now need to split them again.

Create an Exposure Group Utilities job, inputting the accepted exposures from the Curate Exposures job in Section 2, and set Input Selection: exposure, Action: info_only and Split Outputs by Exposure Group : true.

This job will output two subsets of micrographs, one for each of the exposure groups (this is one exposure group per day of collection).

We will now run Reference Based Motion Correction in two stages so that we can assess the hyper-parameters and dose weighting before proceeding.

Estimate hyper-parameters for motion and dose weights

Run a Reference Based Motion Correction job (A) using the Non-Uniform refinement inputs from Refinement 4, and one of the micrographs exposure groups split above.
Change the low level input slot for the input mask: drag over the volume.mask_fsc_auto.F because the refinement only applied a mask at the stage of FSC autotightening.
set Final processing stage:Compute empirical dose weights
Use more GPUs to accelerate the job if you have the resources
Run a second RBMC job (B) with the same settings, but with the other micrographs exposure group.

Example hyper parameters and empirical dose weights are shown in Figure 12.

This dataset has 60 frames per movie. We would expect most of the high frequency information to be found in the first frames, before the sample has undergone radiation damage. In Figure 12 we see that the majority of the weighting (dark red colour) for high frequency information is in the early frames, but there is also an apparent reappearance of high frequency information in the last frames.

It is important to always check and interpret plots such as the empirical dose-weights. In some cases with small particles and high noise, longer movies or many frames, RBMC’s optimization of priors can lead to a situation where the late frames appear to have strong high frequency signal. If such dose weights are applied during the particle correction stage, the results may be poorer than if only low frequency information is being used from the late frames. In tricky cases like this, we found that it can help to manually strengthen the spatial prior just for the dose-weighting calculation step, and we find that this can often improve the appearance of the empirical dose weights, as well as the reported map resolution after RBMC.

We will try to improve this situation, but if you chose to skip RBMC or continue with correction of the particles using the hyper parameters output from jobs (A) and (B) you may observe over-fitting artefacts later on in the processing pipeline.

Clone the two RBMC jobs that you just ran (A and B) so that you have jobs (C) and (D), but manually enter in the values for Override: acceleration prior strength and Override: spatial correlation distance that were ascertained in jobs (A) and (B) respectively. For the Override: spatial prior strength set this to be 1/5th of the value that was determined previously, for the example shown above we would use 1.0984e-03 instead of 5.4929e-03.

When you inspect the empirical dose weights from the new jobs you should see plots that are more in line with our expectations of radiation damage (see Figure 12C and D).

In making the acceleration prior number smaller we are applying a stronger constraint on how much the particles move between frames. Using the updated value (divided by 5) seems to work well for the empirical dose weight estimation, however using this value for the particle correction stage may not yield the best results as very little motion would be corrected, so we will use the improved dose weights, along with the original hyper parameters to get the best of both.

Create a new Reference-Based Motion Correction job (E) with the same settings as (C), drag over the hyper parameters from (C) (this will contain both motion priors and dose-weighting) then override the values of the motion hyper parameters with those that you obtained from A) into the Override: spatial prior strength, Override: spatial correlation distance and Override: acceleration prior strength.
Set Save results in 16-bit floating point on and motion-correct particles Final processing stage:motion-correct particles
Repeat the process to create another RBMC job (F) using the hyper parameters from D, and overriding with the priors from (B).

In Figure 12 we can also see the extent of the modelled particle motion for particles from each half of the dataset.

Take both sets of particles from RBMC (E) and (F) and use them to run a Non-Uniform Refinement job (Refinement 5) with the same settings as in Section 10.

The reported resolution and quality of the map should have improved. In Figure 13, we can see that the density for the TMD is poorer than that for the G-protein and scFv16. This is expected because in GPCRs it is typical that the G-protein and receptor domains move somewhat relative to one another, and the G-protein and ScFc16 being more rigid tend to dominate the particle alignment in consensus refinement. Recall that the aim of the processing here is to define the ligand binding pose and interactions, however, the ligand binding site is the poorest part of the map! This misfortune also reminds us of Murphy’s law, but all is not lost, we shall persevere by improving this region using a Local Refinement job. At lower contour thresholds where the receptor transmembrane (TMD) density is more visible, you might notice some additional blobs of density around the G-protein. This is not uncommon in single particle analyses of GPCRs. These blobs may represent sharpening artefacts that only become apparent near regions of good ordering, and only visible at a lower contour level than you would normally use to visualize these regions. In our case the only reason we took the contour so low was to see the relatively less well-ordered TMD.

12. Local Refinement of the TMD

In order to improve the density in the TMD, especially where the ligand binding is expected to occur, we need a custom mask around the area of interest. Different options for mask creation are described here, but we will make the mask using the molecular model, and this mask could be re-used if you have have more than one ligand-bound structure of the same protein.

When processing ligand-bound datasets it is quite common that a molecular model of the apo form of the protein has already been built or that a predicted model is available. Sometimes it may also be necessary to do some refinement or model building in order to generate a mask in the precise region that you are interested in for local refinement, and in difficult cases this can be an iterative process.

Open the map from Refinement 5 in ChimeraX and also open PDB:7s8l and fit this into the density.
Create a 20 Å simulated map for chain R (the receptor) and chain A (the ligand) by using the command molmap #X/R/A 20 where X is the model number for the PDB.
Resample the map on the grid of refinement 3 with the command volume resample #Y ongrid #Z Where Y is your molmap model number, and Z is the Refinement 5 model number, and note the threshold at which the volume covers all of the TMD density in your Refinement 3 map.
Save the volume and use the Upload Local Files feature to upload it to your CryoSPARC project directory.
Use an Import 3D Volumes job to import this volume to CryoSPARC
Use a Volume Tools job to create a custom mask. We need to make this somewhat tight to cover the ligand density but not include too much of the detergent micelle, so set Type of output volume: mask, set the threshold that you selected in ChimeraX, and add Dilation radius (pix): 3 and Soft padding width (pix):16. The reason we are adding a dilation radius as well as the extending the soft padding is so that the mask covers the density for the ligand as well as the receptor. Inspect the mask to ensure that it is appropriate: an example is shown in Figure 14 as a guide.
Create a Local Refinement job and input the volume and particles from Refinement 5 and your newly made mask. We are now refining a small volume at around 40 kDa, and having aligned globally, we want to avoid the poses diverging too much, so we will restrict the rotations and translations at each iteration using the following settings:

Parameter

Setting

Use pose/shift gaussian priors during alignment

true

Re-center rotations each iteration

true

Re-center shifts each iteration

true

Rotation search extent (deg)

Shift search extent (A)

Initial lowpass resolution (A)

4.5

Applying gaussian priors means that movements of particles further away from the input position is softly penalised, encouraging a solution close to the input. For small regions (<50 kDa) particle alignment can be tricky, and over-fitting occurs frequently, so restricting the search extents prevents unrealistic divergence from the input refinement. In some cases reducing the searches may also require additional extra passes, but the map should always be inspected for overfitting artefacts such as radial spikes of unexpected density.

We want to lowpass filter the map, but the default value of 15 in the Local Refinement is probably overly harsh, and we can usually select a value where the GSFSC is still ~1. In our case that was at ~ 4 Å but we set initial lowpass resolution: 4.5 to be a bit more conservative.

Although the reported resolution of the Local Refinement is poorer than for the global Refinement 5 (in our case 2.69Å vs 2.41 Å), the density for the TMD is greatly improved as evident by examining the map quality for the transmembrane helices shown in the insets of Figures 14 compared to those in Figure 13 (Refinement 5).

12A) Comparison of local refinement map quality with a simpler processing pipeline

For comparative purposes, we used the same local refinement settings and mask used above on particles that were processed through a simpler route, skipping Sections 3 and 9 and 11; that is, we did not perform micrograph denoising, junk detection, orientation rebalancing, manual curation on the basis of per-particle scale or reference based motion correction. This pipeline ended up with 3M particles compared to our ~1.6M in this case study.

From Figure 15 we can see that if we do not use the more advanced processing steps found in Sections 3, 9 and 11 then the result of local refinement is poorer in quality. The inclusion of poorer quality particles in local refinement is associated with radial plates and streaks of density clearly indicating over-fitting during refinement.

If with this, or your own datasets, even after strict curation of particles you still observe radial density artefacts, this may be exacerbated by refining a small masked volume, or by the presence of a detergent micelle inside the local refinement mask. You can try a different mask, or try further limiting the searches, setting tighter gaussian priors, and limiting the maximum resolution used for alignment. These steps may reduce the reported resolution of the map, but interpretation of an artefact-free map at lower resolution is easier and preferable to one at higher resolution that contains obvious artefacts.

In our good Local Refinement (see Figure 15-4, left), the unsharpened map has sufficient clarity to identify ligand density in the entrance to the TMD on the cytosolic side, however, the ligand density is still relatively poor and the density becomes fragmented after sharpening. A poorer quality of ligand density indicated either partial occupancy, or a mixture of binding poses.

To further improve the ligand density we will go on to locally classify this region to see if we can separate empty/full binding sites or alternative binding poses.

13. 3D Classification

When working with ligand-bound structures it can often be helpful to have an atomic model of the apo structure pre-built to aid in the identification of ligand density. This model can be used to generate difference maps which can help identify the ligand density, especially if the ligand ends up binding in multiple locations, or in a different location that expected. An Alphafold model or structure of a closely related protein or complex refined into the map might be good enough at this stage, but a model of the same protein is even better. Classification of ligand densities can be rather challenging; unless the ligand contains a heavy metal, if binding is not rigid, or is substoichiometric, then the signal from the ligand will often be relatively small compared to the protein complex, and it is important to avoid the introduction of artefacts by using too small of a mask. Most commonly this will manifest as density in the masked area that is either completely absent, or stronger than the rest of the map.

In order to classify the region that contains the ligand, we need to create a mask that covers just this region. To do this, we can start with chain A (the ligand chain) from 7s8l.

In ChimeraX load up your Local Refinement map and updated atomic model and fit the model into the map.

Follow the same process that we used in Section 11, except use just chain A for the Molmap:

Save the volume and transfer it to your CryoSPARC directory.
Use an Import 3D Volumes job to import this volume to CryoSPARC.
Use a Volume Tools job to create the custom mask, setting an appropriate contour threshold, we used 0.0063, and the Type of output volume to mask. Our mask relative to the locally refined TMD is shown in Figure 16.
Create a 3D Classification job (see here for a detailed tutorial) which will be Classification 1, inputting the volume and particles from the Local Refinement and using the following settings:

O-EM batch size (per class)

20000

O-EM learning rate init

Class similarity

Filter resolution

3.5

Setting the class similarity lower than the default of 0.5 means that the density within the mask for each class will be more independent, and increasing the learning rate from the default of 0.4 to 1 allows the volume in each class to be totally replaced at each iteration, allowing for more evolution of the class volumes. The filter resolution is important as this needs to be able to represent the sort of class-to-class differences that you are looking for, but also not to include too higher frequencies that may result in classes with only very minor high frequency differences. It is best to choose the lowest resolution that can still represent the thing that you are interested in, in this case 3.5 Å is a good level to observe ligand density.

In the output you will see some plots indicating statistics from the classification. The example per-particle class size (ESS) plot shown in Figure 16 indicates that most of the images have smeared probability across multiple classes with a mean ESS score of 2.8, which means that on average, each particle might belong to any of 2.8 classes. The class distribution shows all of the classes are equally populated and these two observations together suggest that there either may be continuous heterogeneity (for example flexibility in the ligand binding) or that the signal inside the mask is not sufficient to have high confidence in the classification at the filter resolution.

14. Assessment of 3D classes for manual regrouping

To inspect the class volumes from 3D classification at higher pixel sampling and resolution, next run a Heterogeneous Reconstruction Only job to obtain higher resolution volumes for each of the 10 classes. To visualise the ligand density, we produced difference maps for the 10 classes by subtracting a fitted molmap of the apo receptor PDB structure using ChimeraX.

Download the volume series (not the sharpened one)
Open the 10 unsharpened volumes (not as a series) in a new ChimeraX session
Open 7s8l, fit it into one of the maps and execute the following commands

set bgColor white; graphics silhouettes true; fit #11 inmap #1; lighting soft

molmap #11 /R/B/C/D/E 3

volume subtract #1 #12; volume subtract #2 #12; volume subtract #3 #12; volume subtract #4 #12; volume subtract #5 #12; volume subtract #6 #12; volume subtract #7 #12; volume subtract #8 #12; volume subtract #9 #12; volume subtract #10 #12

surface dust #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 size 5

volume level 0.07176

These steps should generate maps in which only the unmodeled density is now visible, and small specks and noise have been removed. You can inspect each one in turn and decide which classes look the most promising. You might find using the side view function allows a clearer view by removing detergent density from view.

After inspecting the unmodeled density in the 10 class Heterogeneous reconstruction, we identified 4 broad categories of ligand shape that we refer to as poorly ordered (grey), party-ordered (gold), linear ordered (purple) and circular ordered (green). The manually chosen category for each class has been indicated with the corresponding colour of box behind each volume in Figure 15A. Due to differences in particle stacks, masking or 3D classification initialisation, the volumes you observe may be different to those shown above, but we reproducibly found one or more classes with a circular ordered appearance.

Run 4 separate Homogeneous Reconstruction Only jobs; one for the particles that match each major ligand shape by grouping together the classes with similar-looking density.

15. Mask optimisation for resolution estimation

CryoSPARC refinements have in-built masking that automatically tightens the mask at the end of refinement, and this mask is used to calculate the global resolution displayed in the GUI. Map sharpening by applying a global B-factor is also handled automatically in refinement and reconstruction allowing for the output of automatically sharpened maps.

It is always a good idea to inspect the FSC curves and mask, as well as the sharpened map connectivity in case the mask or sharpening B-factor need to be adjusted. We are going to check those out now in Sections 15 and 16.

Open up the locally refined map in ChimeraX, and the mask_fsc_auto. This mask can be found in the outputs tab of the job in the Refined Volume section (Importantly, this is different from the mask provided in the Outputs Group section, which is the final iteration mask before auto-tightening). Set the map at a contour threshold where you observe the features you are interested in, but not including noise, and set the mask threshold to 1. This mask threshold shows the contour at which the mask was binarised, and on top of this there is a soft edge that extends out to the contour observed at a threshold of ~0.
Examine the mask and map together and decide if the mask encompasses the volume of your particle adequately. We found that the auto-tightened mask included some density from the adjacent G-protein (Figure 18, left).
Examine the FSC for the auto-tightened map and compare the Tight and Corrected curves. As described in the mask creation tutorial the difference between these curves can be used as an indicator of mask tightness. In our example in Figure 18 we do see a sudden drop in the corrected curve, suggesting there is density outside of the mask.

A dip in the corrected FSC curve can indicate a mask that is too tight, however this phenomena is common and expected with local refinement, because inevitably there will be some density outside of the masked region of interest that the FSC mask may exclude or cut through

Although this mask is generous enough it also includes density outside of the region of interest, so we will design a new mask that only covers only the receptor and ligand.

Open up the map from Local Refinement in ChimeraX and also open PDB:7s8l or an updated model and fit this into the density.
Create a 15 Å simulated map for chain R (the receptor) and chain A (the ligand) by using the command molmap #X/R/A 15 where X is the model number for the PDB.
Resample the map on the grid of the local refinement map with the command volume resample #Y ongrid #Z Where Y is your molmap model number, and Z is the local refinement model number, and note the threshold at which the volume covers all of the TMD density in your local refinement map.
save the file and transfer and import it to CryoSPARC.

We want this mask to be more generous than the one used for Local Refinement, so we will add a dilation radius.

Create a Volume Tools job and input the newly imported receptor volume. Set the Threshold to the predetermined value (we used 0.13) and set the `Dilation radius (pix) to a few pixels (we used 6).
Create Validation (FSC) jobs for each of the homo reconstructed volumes, using the newly made mask and inspect the FSC curves.

We found that the reconstructed receptor maps had reported FSC=0.143 resolutions of 2.75-2.83 Å .

16. Map sharpening and inspection for model building

At the end of the day, the resolution number is less important the map quality, especially in the region of interest and in this case we are most interested in the ligand density and its binding interactions. Taking into account the estimated resolutions, we can now inspect the sharpened maps and look for map feature consistency. At ~2.8 Å resolution we would expect to be able to see good definition of side chains, some density for the backbone carbonyls, and if it was a soluble region, possibly density for well-resolved water molecules.

Compare the sharpened and unsharpened density from your Homogeneous Reconstruction jobs. You will find that the auto sharpened map has good side chain density, and better definition of backbone carbonyls than the unsharpened map (features that we would expect to see in this resolution range), however it will likely also has more fragmented density for the ligand. Example densities are shown in Figure 19A. We need to strike a balance here because we are mostly interested in the ligand and its binding interactions, and so we would like to manually adjust the sharpening B-factor ourselves to optimise for this region.

Create a Sharpening Tools job for each of the homogeneous reconstructed volumes, and add B-factor to apply that is lower than the value that was previously used. In our case we found that -50 or -60 gave a good balance for the density quality of the receptor and ligand (see Figure 19A).

Having appropriately sharpened our maps, we were able to go on to model the ligand poses for three of the maps and refine each using Phenix (ligand models shown in Figure 19B) to aid comparison to the deposited maps and model.

17. Interpretations and conclusion

Our final maps show density consistent with different binding poses of the ligand Cortistatin-14. By locally refining and classifying, the ligand density is now much clearer in at least two classes than in the deposited map (see Figure 19). Q-scores for the receptor and ligand suggest a fit that is consistent with the reported resolutions.

For the Cortistatin-14 peptide to be circularised, a disulphide bond needs to be formed between the two cysteine residues. Using our workflow here, the presence of a circular class and a linear class indicate that the Cortistatin-14 in this sample is partially reduced (i.e. some of the molecules have their cysteines disulphide bonded and some do not). This means that the disulphide bond that usually keeps it circular in-vivo has been broken by reduction, and we noticed that the reducing agent tris(2-carboxyethyl)phosphine (TCEP) was used upstream during sample preparation. Partial reduction could explain why we have both an ordered-circular (S-S bonded) and ordered-linear (S-S broken) class.

It is always useful to keep in mind the biology and chemistry of the sample that was applied to the grid. You may observe something unexpected in the density and knowing more about your sample can help determine if the feature is likely to be real, or an artefact.

We were unable to build a satisfactory model into the density for the poorly ordered class, and there may be residual heterogeneity in terms of ligand binding poses within the particles used for this map. Examining our three ligand models, we can see that the register in our model is different to that of the original model 7s8l (see Movie 1). We were able to confidently assign the register due to the bulky side chain residues, and in particular the tryptophan.

Cortistatin-14 binds to the cytoplasmic side of the MRGPRX2 receptor and displays a variety of ligand binding poses. The N-terminus of Cortistain-14 is in the same location in all three cases, and this is likely the important region for binding affinity. The C-terminal portion of the ligand, however, is exposed to the cytoplasmic space and therefore has the capability of forming different arrangements.

We wanted to note that if we used the overfitted local refinement in Section 12B for classification using the same settings and mask as used in Section 13, we were unable to find a class with circular ligand density, even though the ligand appeared circular in the parent map. This emphasizes the importance of data quality when performing both local refinement and local classification, and in this case study, using some of the new features recently added in CryoSPARC allowed improved interpretation of the data.

You can download our versions of the final maps, half maps and masks from the links below for comparison with your own processing!

consensus

TMD local refinement

TMD reconstruct circular ligand