Case Study: End-to-end processing of an inactive GPCR (EMPIAR-10668)

Processing EMPIAR-10668 including Local Refinement, 3D Variability Analysis, 3D Flex and classification of continuous heterogeneity.

Introduction

In this case study we will work step-by-step through the full processing pipeline for a G-protein coupled receptor without any ligand, G-protein or antibody fragments bound, using a dataset originally processed by Josephs et al., deposited as EMDB-22962 and PDB 7knt. The raw data are publicly available for download as EMPIAR-10668, and this case study is written so that you can replicate these results yourself.

We selected this dataset to provide an example processing pipeline for a challenging small asymmetric membrane protein target that displays flexibility in a relatively small region. In particular, this case study provides strategies and ideas about how you might overcome challenges with particle crowding, improving poor density in regions of high flexibility, and separating different substates from a continuous motion. The image processing was performed using CryoSPARC v4.4 with additional notes about more recent software updates that are relevant to the processes described.

The Calcitonin receptor-like receptor (CLR) is a member of the G-protein coupled receptor (GPCR) protein family, which is characterised by seven transmembrane alpha helices. GPCRs are predominantly found on the cell surface and are important in cell signalling and as therapeutic targets. When CLR binds to Receptor activity-modifying protein 1 (RAMP1) it forms a complex with a mass of 73 kDa, known as the Calcitonin gene-related peptide (CGRP) receptor. In this case we know the mass because there is already a PDB model available for this complex, but the mass could also be ascertained from protein sequences (for example in UniProt) or by analysing a purified sample by mass spectrometry. GPCR monomers are non-symmetric membrane protein complexes, and from the structures obtained in Josephs et al. we learn that this particular receptor contains not only a transmembrane domain (TMD) but also an extracellular domain (ECD) that extends outside of the membrane. It is often useful to know a bit about the biochemistry of your sample to help interpret what you observe during data processing; for example, the protein used for EMPIAR-10668 was prepared in a detergent micelle containing lauryl maltose neopentyl glycol and cholesteryl hemisuccinate.

The primary aim of this pipeline is to generate the highest quality map for model-building (sections 1-15), and we include steps that can be taken to improve the resolution of the transmembrane domain (section 17), to visualise the motions of the GPCR (sections 16 and 18), and to separate different conformational states (section 19).

Setting up

Before beginning this tutorial, you should create a new project and a workspace within that project. Download the 5,634 movies and single gain reference to a location of your choosing. For example, our data is downloaded to a directory called rawdata using the commands:

cd /path/to/rawdata
wget -m ftp://ftp.ebi.ac.uk/empiar/world_availability/10668

1. Movie import and preprocessing

  • Next, import the data using an Import Movies job. The Movies data path should match the location of the directory containing the downloaded .tif files, for example: /path/to/rawdata/ftp.ebi.ac.uk/empiar/world_availability/10668/data/PS1112.RAMP1-CLR.1/*.tif. The Gain reference path should lead to RAMP1-CLR_gain.mrc. Experimental information such as the pixel size, total exposure dose (e/A^2) and accelerating voltage is available in the EMDB-22962 entry.

Flip gain ref & defect file in Y?       Yes
Raw pixel size (A)                      0.83
Accelerating Voltage (kV)               300
Spherical Aberration (mm)               2.7
Total exposure dose (e/A^2)             57.3

In Patch Motion Correction, if you switch on Save results in 16-bit floating point, the output images will take up half of the disk space compared to the default 32-bit floating point files, with no reported loss in accuracy. The Number of GPUs to parallelize can be set depending on GPU availability; in our hands the job took ~14 hours on a single GPU.

Patch CTF can be run with the default settings.

2. Excluding poor micrographs from downstream processing

During data collection at the electron microscope we try to optimise for good data, but it is rare that every single movie in a dataset is perfect! Movies can be afflicted by many maladies including non-vitreous ice and contamination with ice crystals or other substances, excessive in-movie motion and ice that is too thick or too thin for your sample. If you pick particles from poor micrographs it can take some effort to remove the particles that are of low value and will not contribute to the final resolution, so it is easier to exclude the most offensive micrographs at the start, by looking at the statistics generated by the Patch Motion and Patch CTF jobs.

  • We will inspect the CTF-estimated micrograph statistics using a Manually Curate Exposures job so that we can exclude images of poor quality from downstream processing. Input the “Micrographs processed” from the Patch CTF job and queue the job. In the job card view go to the pink Interactive Tab (this may take a few minutes to appear while the image statistics are being loaded).

You may notice that many of the images have a circle near the centre (See Figure 1, left); this is where the ice layer is too thin for the protein to fit inside. This is okay and is not a reason to exclude images.

Using the sliders we can select the upper and lower bounds for each parameter to exclude outliers, and we can browse thumbnails and diagnostics of the images to check that the excluded ones are indeed poor quality, and that the included ones look good. For each parameter, try moving the slider or typing values and click set threshold to see how many images have been excluded. For example, in the CTF fit resolution (A) the vast majority of the images here are < 3.5 Å, and setting the limit at 3 Å excludes 10% of the data. The other statistics look pretty consistent across this dataset, but if your data is more variable it may be worth also setting thresholds, for example, for ice thickness (to remove very thick ice images) and total full frame motion distance (to remove images with no particles or large regions containing contaminants). It is also worth looking at the other statistics available to look for outliers from the trends. Other unhelpful images may manifest outlier CTF fits that can be identified by their astigmatism or defocus values.

Although we want to exclude egregious junk images, we don’t want to exclude potentially useful ones along with them. This is a nice clean dataset so in this case rejecting 10% of the data overall seems proportionate. In a more unfortunate dataset it may be necessary to remove a considerably higher fraction so that the downstream junk-removal classifications are more streamlined. Ultimately your goal will be to pick enough particles to obtain a map at a resolution that shows your features of interest (which can be project and dataset specific). With that goal in mind, for smaller datasets you may prefer to be more generous with your curation and keep images with statistics that you consider borderline, and for larger datasets you might wish to curate more strictly.

In the above example, the CTF fit, motion and Thon rings for the blue image (left) look good, indicating a high-quality image of protein in thin ice. For the red one (right), the CTF fit is very poor (the correlation drops to zero quickly) and the motion estimates are very large and erratic, due to the presence of large hexagonal ice crystals and ethane contamination in the image.

In our processing, we were left with 5,290 accepted micrographs.

3. Blob-based particle picking

We already know the CGRPR was prepared in a detergent micelle, and that it has a mass of 73 kDa from the PDB entry for this dataset from Josephs et al. We need to estimate the approximate particle diameter (including the detergent micelle) in order to go ahead with particle picking.

Suitable diameters for picking of other datasets can be estimated from the “ruler” in the Curate Exposures job, or by measuring the dimensions of homologous or similar deposited structures or AlphaFold models using ChimeraX. The diameter does not need to be exact, but it can be beneficial to set the diameter ~10% larger than the estimated diameter of the particle to account for imperfect estimation, and the effects of defocus in the images.

  • Using your accepted micrograph images, run a Blob Picker job with Minimum particle diameter (A) set to 90 and Maximum particle diameter (A) set to 130.

Once the job is complete:

  • Run an Inspect Picks job and look at the first few micrographs. Raise the lower bound of the power score slider until picks in empty thin ice are removed (keeping any obvious particles), and bring the upper bound down until just before you see good-looking picks being discarded (see Figure 2 for an example).

Inspect micrographs of relatively low and high defocus (these can be selected from the table) to adjust the power sliders as appropriate.

Many of these micrographs are packed with protein but some have patches without particles, or with damaged protein at the edges of very thin ice. Be sure to adjust the NCC score so that these areas are not being picked (see Figure 2). You want to select as many particles as you can without picking obvious junk because classifying junk out is not always easy.

You can expect to select ~3-4 million particles by blob picking this dataset.

4. Blob pick extraction and 2D classification

We expect the CGRP receptor protein to have a diameter of ~ 100 Å, and the dataset was recorded with a defocus range extending up to around -1.5 µm.

As the defocus is increased, the higher frequency components from particles are delocalised further out in real space due to the point spread function of the microscope. If too small a box is selected for extraction, some information about your particle is lost, and this may limit the obtainable resolution. Conversely, using an excessively large box can lead to a lot of noise in the images, and this can also have a negative effect on the resolution of your reconstruction. As a very rough rule, a box of ~1.5-2.5 x the diameter of your particle is often appropriate, however very high resolution data or data collected with high defocus may require a larger box. The box size must be an even integer of pixels, and it is best if you choose or downsample to a box size that is computationally efficient.
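As a concrete illustration of this rule of thumb, the short Python sketch below (our own helper names, not a CryoSPARC tool) rounds a target of ~2x the particle diameter up to an even box size with only small prime factors, which FFT implementations handle efficiently:

def fft_friendly(n, primes=(2, 3, 5, 7)):
    # True if n factors entirely into small primes (an efficient FFT size)
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def suggest_box(diameter_A, pixel_size_A, factor=2.0):
    n = int(round(factor * diameter_A / pixel_size_A))
    n += n % 2                    # box size must be an even integer
    while not fft_friendly(n):
        n += 2                    # step through even sizes until efficient
    return n

print(suggest_box(100, 0.83))     # -> 250 for this dataset

For a ~100 Å particle at 0.83 Å/pixel this suggests 250 pixels; the 288-pixel box used in the next step (2^5 x 3^2) is a nearby, slightly more generous efficient size.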

  • Extract the particles using the Extract from Micrographs job with a box of 288 pixels (240 Å) and Fourier crop them down to 64 pixels because Fourier cropping for 2D classification saves on disk space and speeds up the job. This cropping in Fourier space downsamples images in the same way as the Downsample Particles job, so that while the box extent in Å is kept the same, the pixel size is larger and the Nyquist limit (the highest achievable resolution) is lower. However, we can re-extract at the full size later once we have removed junk. Select Save results in 16-bit floating point to save on the disk space required. Expect the number of extracted particles to be lower than the number previously selected, because CryoSPARC rejects particles where any part of the box is outside of the micrograph.
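To make the effect of the Fourier cropping in the step above concrete, here is the arithmetic as a small sketch (values from this dataset):

box_full, box_crop, apix = 288, 64, 0.83
apix_crop = apix * box_full / box_crop   # 3.735 A/pixel after cropping
extent_A = box_crop * apix_crop          # ~239 A: box extent in A is unchanged
nyquist_A = 2 * apix_crop                # ~7.5 A: the new physical resolution limit
print(apix_crop, extent_A, nyquist_A)

This ~7.5 Å Nyquist limit is why the FSC curves of the junk-cleaning refinements in section 7 stay near 1 out to ~8 Å and never reach zero.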

  • Add these particles into a 2D Classification job with the following settings:

Number of 2D classes                          200
Maximum resolution (A)                        12
Initial classification uncertainty factor     1

We increased the number of classes from the default value because we have a lot of particles and want to capture classes of different particle orientations and junk types, but the exact number of classes is not crucial. In order to speed up the classification, and reduce the risk of overfitting, we set the resolution at a limit where we would expect to be able to discern visually between particles and junk, and between different viewing directions.

The initial classification uncertainty factor influences whether a more diverse set of particle views, or a more diverse set of junk classes, is identified. A lower value (such as 1) may increase the number of different junk classes by rapidly becoming certain about class assignments. This can be particularly useful when the preparation contains largely one type of good particle, rather than a mixture of different particle types. If the particle set is cleaner or contains multiple different particle species, it can be more beneficial to use a value of ~2-10 to obtain a greater spread of good particle classes.

  • After 2D Classification, create a Select 2D Classes job and connect the outputs from the 2D Classification job. Example class averages from our Select 2D Classes job are shown in Figure 3 below.

The majority of the class averages have easily recognisable ovoid density for the detergent micelle (an example is indicated in green in Figure 3), with a small, asymmetric protein domain protruding out on one side. This is what we consider a membrane protein “side view” (see Figure 3) where we are viewing the particle from the plane of the hydrophobic phase (micelle or liposome). It is quite common for these side views to occur preferentially in micrographs due to favourable interactions with the air/water or support/water interface. On the other hand, “top views” (observing particles from above the hydrophobic phase at ~90 degrees from the side views - see Figure 3) can be challenging to find for some samples such as GPCRs. We observe and select some views where the protruding domain is not as easily visible outside of the ovoid micelle, and it is important to retain such classes for downstream processes of template picking and ultimately good sampling of diverse orientations of your particle for 3D reconstruction.

You will have noted that the protein is quite tightly packed in the micrograph images (see Figure 2), and so we might expect to see some class averages where neighbouring particles have formed loose interactions to one another such as the example highlighted in yellow, where there appears to be an ordered, or semi-ordered interaction between two adjacent particles. This is an important feature of this dataset that we need to keep in mind during the processing pipeline. In the “Examples of selected classes” in Figure 3, note that some of the class averages are also not well-centred due to additional density from adjacent particles.

5. Ab-initio Reconstruction, Heterogeneous Refinement and 2D Classification to produce clean 2D templates

Although the 2D classes from blob picks look like our target particle, these picks could be off-centred because the picker doesn’t know about the shape of our particle, and for the same reason, some good particles may have been missed. In order to improve picks we can use 2D class averages as templates for template-based picking, but we can first take a few more steps to make the 2D templates a bit better in terms of clarity and diversity as this will improve the quality of template picks.

  • Run an Ab-Initio Reconstruction job with Number of classes set to 3, using only 20,000 particles (Number of particles to use) and setting the class similarity to 0. Setting the class similarity to 0 is not crucial but may help to disentangle volumes that are very different from one another. Once the job is complete, you can assess the volumes for each class by selecting the Volumes tab within the job, and inspect each volume one by one. You should find 1-2 of the volumes that resemble a GPCR in a micelle, and possibly one that looks like a junk class.

  • Next, run a Heterogeneous Refinement job inputting all of the good particles from Select 2D in section 4. Input the best and the worst ab-initio models that you obtained above to classify into 2 classes and set Refinement box size (voxels) to 64.

  • Since we know the order of the input volumes, we can already predict which is going to be the good class from Heterogeneous Refinement, so we can queue a 2D Classification job using those particles as an input. This time we are more interested in separating different views than identifying junk, so set Initial classification uncertainty factor to 4, Number of 2D classes to 50, and Maximum resolution (A) to 12. We saw some signal from neighbouring particles in some of the initial 2D class averages (Figure 3) causing the classes to be off-centre. This is not ideal, as picking with such a template would also cause the particle to be off-centre in the box, and may cut off some important delocalised signal. To help with this we can also set a Circular mask diameter (A) of 150 in this 2D Classification job. The chosen diameter should comfortably contain the particle including the micelle from any viewing direction.

The 2D classes in Figure 4 are now of better quality for template picking than those in Figure 3, and encompass a number of different viewing directions of the protein.

6. Template particle picking, extraction and duplicate removal

  • Create a Template Picker job and connect up the good micrographs from the Curate Exposures job, and the good 2D class averages (see Figure 4). Set Particle diameter (A) to 150.

  • Once the job has finished, use an Inspect Picks job to adjust the power and NCC sliders, as we did in section 3. Again, we might expect around 3-4 million particles to be picked.

Comparing Figures 2 and 5 we can see that now fewer junk particles with high contrast are being picked.

  • Then, extract the particles in the same manner as section 4 (using Extract from Micrographs).

We want to combine the blob and template picks to ensure that we did not miss any particles, but we do not want to have duplicate images of the same particles.

  • Run a Remove Duplicate Particles job with the blob-picked and template-picked extracted particles and set Minimum separation distance (A) to 50. This value is ~30% of the longest diameter for this particle (~150 Å) and may be appropriate when the micrographs are crowded. The choice of this value will depend on the shape of your particle and how crowded the images are. Some trial and error may be needed to find a separation distance where most or all of the visible particles are picked only once. If two or more particles are found within 50 Å of each other, the ones with the poorest NCC score will be removed. You can expect the accepted particles to amount to ~4 million.

7. Creating 3D volume templates to separate junk and good particles

We now want to sort the particles into junk and good particles for our ongoing processing work, and we will do this sorting in 3D. The reason we aren’t using 2D Classification at this stage is that it is not always easy to distinguish between junk and good particles in 2D class averages, especially for small proteins and for less-populated views, which have fewer features or lower signal-to-noise and may appear blurry. If good particles are excluded in “junk” 2D classes at early stages of processing, this can introduce bias that leads to loss of certain views of your target, potentially compromising the reliability and clarity of later 3D reconstructions.

We are therefore going to separate all of the (non-duplicated) picked particles using Heterogeneous Refinement, and skip 2D classification for the initial junk cleaning of our template picks. We will use the accepted and rejected particle sets from Select 2D in section 4 to generate a good reference volume from the accepted classes (already completed in section 5) and junk volumes from the rejected particles. The junk volumes are important as they will accumulate the junk particles during the refinement, leaving a cleaner particle stack associated with the good volume.

  • We already made a nice GPCR 3D volume in the Heterogeneous Refinement job in section 5; however, we didn’t make many junk volumes. To do this, we can run an Ab-Initio Reconstruction job with 5 classes but using only the excluded classes from the initial Select 2D job. This job should produce some decent junk class volumes. If you find that one or more of these volumes also looks convincingly like the GPCR, then you can exclude those volumes from downstream processing.

  • Now we can run Heterogeneous Refinement on all of the non-duplicated extracted particles from the end of section 6, using all 5 of the new junk 3D volumes and the one good volume from Heterogeneous Refinement in section 5. This job may take a few hours because, despite the small box size, there are a lot of particles. Once the job is complete, you will hopefully see in the Volumes tab that the most populated class is the one with the best-looking GPCR density (although it is possible, due to the stochastic nature of the refinement, that you will end up with two or more classes that appear equally good). You might also find a class where the GPCR is in a much larger micelle. Here we are going to focus on the monomeric GPCR in order to achieve high resolution and good density for model-building, and so we will choose to discard the particles associated with this class by not inputting them into downstream rounds of Heterogeneous Refinement (described below Figure 6). This class may represent a membrane-associated dimer of the GPCR, or the GPCR with a different membrane-associated binding protein.

In Figure 6, we see that the FSC curves do not reach zero; you might barely see any lines at all! This is because we Fourier cropped the particle images, enforcing a physical Nyquist limit of ~8 Å, so information beyond that resolution has been excluded during the initial junk cleaning steps, and the half-map FSC stays at ~1 almost all the way to ~8 Å.

Sometimes, as here, a single round of Heterogeneous Refinement is not sufficient to remove all of the junk particles, and this can be tested by running sequential heterogeneous refinement jobs (each time inputting the particles from the good class from the previous job). You should generally see that the percentage of particles ending up in your good class increases each time. While you can be as strict or relaxed as you like with this process, we find that once a Heterogeneous Refinement job is only removing ~5% junk, additional rounds may be of limited benefit. Additionally, if the map in the “good” class shows streaking in one direction, this can sometimes be indicative of junk aligning to one viewing direction, suggesting more classification in 3D or 2D may be required.

  • Take the particles from the good volume class(es), and input these to a new Heterogeneous Refinement job with the same settings for a second round of junk removal.

  • Take the particles from the good volume class(es), and input these to a new Heterogeneous Refinement job with the same settings for a third round of junk removal.

  • If you are still not satisfied that the particle set is clean you can perform an additional round of heterogeneous refinement.

We did 3 rounds of heterogeneous refinement and were left with 746,495 particles.

8. Re-extraction at the full box size and Homogeneous Refinement

Take the particles from the good classes at the end of section 7 (which we expect to amount to ~ 700-900k particles).

  • As we have now aligned them in 3D we should remove any duplicates based on their updated shifts by using a Remove Duplicate Particles job with Minimum separation distance (A) set to 50.

  • Re-extract the non-duplicated particles with a box size of 288 using Extract from Micrographs, without applying any Fourier cropping. We re-extract at this stage for three reasons:

    1. To extend the achievable resolution limit (Nyquist frequency) of the particle images.

    2. To extract with the particles better centred in the box (according to their alignment in Heterogeneous Refinement).

    3. To make a smaller stack of particles so that the entire ~ 3-4 million original particle images do not need to be read and cached for every downstream job.

  • Input the particles into a Homogeneous Refinement job (Refinement 1) using one of the good GPCR 3D volumes from Heterogeneous refinement with default settings.

The reported resolution should be around 2.7-3 Å. Examine the output unsharpened and sharpened map in the Volumes tab and look at the real space mask and viewing direction plot.

In Figure 7, we see in the real space map and mask slices the shape of the GPCR that we expected, but there is some additional density outside of the main protein, and the auto mask has included this density.

When assessing 3D reconstructions, here are a few non-ideal situations to look out for!

  1. Unexpected map stretching or streaking (unexpected parallel features) in just one direction.

  2. Fragmented density at the edges or in specific regions of an otherwise nice map

  3. Noise or spiky features surrounding or extending from your map

  4. Poor orientation diagnostics (cFAR <0.5, SCF <0.8)

These tricky and non-ideal situations could indicate issues such as preferred orientations in the selected particles (or in the raw data), small particle size, heterogeneity in the particle conformation, difficulty aligning the particles to the reference, or interference from low resolution features such as detergent, lipids, or loosely interacting neighbour molecules. With the exception of preferred orientation bias in the raw data, these obstacles can often be overcome by further processing steps.

In this case, picking with templates and blobs appears to have captured a good range of particle views, as shown in the viewing direction distribution plot; the SCF score is high at 0.867, but the cFAR score is only 0.27 (Figure 7). This combination suggests that there is no concerning preferred orientation that might benefit from other picking methods such as TOPAZ, but that there may be some difficulty with aligning the particles. Although the global resolution is ~3 Å, the conical FSC plots display considerable resolution anisotropy: in certain viewing directions the resolution is much poorer. This situation could be caused, for example, by problems with aligning the particles in certain views. In the case of this dataset we suspect that loosely associated neighbouring particles, such as those seen in the highlighted rejected class in Figure 4, are interfering with the alignment of our central particles.

The sharpened map in this example contains more noisy density from the detergent micelle, and the protein density looks somewhat fragmented compared to the unsharpened map.

Observing fragmented density only after sharpening typically indicates that too large a sharpening B-factor was applied for the region of interest. This is more likely to happen if the mask used to automatically calculate the B-factor includes very low resolution regions such as the detergent micelle or partly ordered interaction interfaces between particles.
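To see why an overly large B-factor fragments density, recall the standard sharpening convention: each Fourier amplitude is scaled by exp(-B s^2 / 4), with B negative for sharpening. A minimal sketch of the boost this applies at 3 Å resolution:

import math

def amplitude_scale(B, resolution_A):
    s = 1.0 / resolution_A            # spatial frequency in 1/A
    return math.exp(-B * s * s / 4.0)

for B in (-40, -100):
    print(B, round(amplitude_scale(B, 3.0), 1))
# B = -40 boosts 3 A terms ~3x; B = -100 boosts them ~16x, which
# mostly amplifies noise in regions whose true resolution is poorer than 3 A.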

9. Checking and correcting handedness of the map

We initialised the volume for this project from an ab-initio model, and there is a 50% chance that the map has the wrong hand. Checking the handedness of the map at an early stage of processing is a good idea, as it can avoid frustration and repetitive map-flipping steps later on in the pipeline. As there is already a PDB entry for this protein, you can download the model 7knt and try to fit it into your map in ChimeraX. In cases where a model (whether identical or homologous) is not available, and the resolution is sufficient, you can assess the handedness of your map by checking that the density of the alpha helical regions is right-handed.

If the model only fits when you apply the ChimeraX function volume flip to your map, then perform the following step in CryoSPARC:

  • Create a Volume Tools job, inputting the volume from your Homogeneous Refinement, and select Flip hand. The output volume can now be used to initialise your next refinement, and all downstream processing will have the correct hand.

10. Non-Uniform Refinement to help with loosely interacting neighbouring particles

From Refinement 1 in section 8, it looks like the proximity and flexible association of neighbouring particles may be causing problems in homogeneous refinement and leading to some unwanted smeared density in the map. We have different options for how to potentially overcome this:

  1. Classify the particles in 2D or 3D to remove those that have formed a semi-ordered interaction in the extra-cellular domain (ECD)

Our testing on this dataset found that neither 2D nor 3D classification was able to cleanly separate classes with and without conjoined particles, so it is possible that a large proportion of the population has this sort of interaction due to the high concentration in the sample.

  2. Apply a static mask during refinement that excludes the neighbouring particle

Applying a static mask that excludes the additional density might be beneficial, however if the mask is too tight or too loose then the resulting resolution may be poorer so we can consider this as a last resort method.

  3. Use non-uniform refinement

We will reduce the extent of this interaction’s influence on the reconstruction by using non-uniform regularisation. This is particularly beneficial for membrane protein particles, where it avoids pose assignment from being dominated by alignment to the detergent micelle, and can also help minimise alignment to other low resolution features.

  4. Subtract the particles to remove density from neighbouring particles and locally refine.

Particle subtraction and local refinement can be performed later on in the processing pipeline.

  • Run a Non-Uniform Refinement job (Refinement 2) using the particles and default settings and also run an Orientation Diagnostics job.

You will notice that the resolution is much better with Non-Uniform Refinement than with Homogeneous Refinement, and as the iterations progress, the influence of the neighbouring particles becomes less apparent, as evident in the Real space mask slices (see Figure 9).

Orientation Diagnostics shows us that even though the viewing direction distribution and SCF appear largely unchanged, we now have a substantially better cFAR score at 0.56, and the conical FSCs show a far smaller range of resolutions. This indicates that particle alignment has been much improved by the addition of non-uniform regularisation and/or adaptive marginalisation, and that the pathological conical FSCs observed after Homogeneous Refinement were indeed not a reflection of the orientational sampling of the picked particles.

The density of the transmembrane helices of the receptor looks good quality in the sharpened map; however, in the ECD the density appears somewhat fragmented, indicating a lower resolution in this region. Considering the overall architecture of the protein seen in the introductory figure, it seems likely that the poorer density in the ECD is due to its connection to the TMD by a flexible linker, and we will investigate this possibility in sections 16 and 18.

11. Global CTF refinement

These particles are quite small, and will therefore have a relatively low signal-to-noise compared to a larger particle. Although the resolution is < 3Å there is no guarantee that Local CTF refinement of defocus values, or Global CTF refinement will improve the CTF fit. We are going to test the effect of refining anisotropic magnification and beam tilt to see if they improve the reconstruction resolution and map quality. We could run a single job refining both anisotropic magnification and tilt together, however if the fit to one or the other does not look promising, we will need to re-run the job without that parameter selected. We will instead run two consecutive jobs for illustrative purposes, although the results via either route should be very similar.

  • Make a Global CTF Refinement job and connect up the particles, map and mask from the Non-Uniform Refinement job in section 10. We can see from the real space slices in Figure 7 that the refinement mask includes the micelle and a little bit of the extra density. We will instead use the FSC auto-tightened mask.

To change the mask taken from a refinement job, you can alter what is called the ‘low level’ information for the mask volume slot. To do this, open the low level information in the mask field that you already loaded into the Global CTF Refinement job card (by clicking the toggle arrow), then go to the Outputs tab of the Non-Uniform Refinement job and drag the slot called volume.mask_fsc_auto.F into the mask field for the Global CTF Refinement job.

  • When running Global CTF Refinement we can choose at the start which fits we want to include. First we will just assess anisotropic magnification, so use the sliders to turn off Fit tilt and Fit Trefoil and to turn on Fit Anisotropic Mag. and run the job. The output data plots will show the data, fit and residuals (remaining data after correction), and you can assess if the fit matches the data well, and if the signal in the data is strong enough to trust the fit.

In Figure 10 we see the anisotropic magnification data contains blue and red on opposing sides of the plot. See the Tutorial on CTF refinement for a detailed description of how to interpret the plots, but essentially the brighter the red and blue sides are in the data, the more reliable the fit is likely to be. Here the residuals display less extreme blue and red after fitting, so it looks like correcting for anisotropic magnification may help a little for this dataset.

  • Take these particles and enter them into a second Global CTF Refinement job, this time turning off Fit Trefoil and Fit Anisotropic Mag. We have already estimated the anisotropic magnification in the previous job, and at a resolution of ~3 Å, adding the fit for trefoil is not always helpful. You could choose instead to re-run the first Global CTF Refinement job with Tilt and Anisotropic Magnification fit enabled, and this would give you essentially the same result. When the job is complete we can then assess the beam tilt fit based on the plots.

Similarly to the case for anisotropic magnification, we are looking for the fit to resemble the red and blue regions in the data plots, and for the residuals to have weaker red and blue areas (see Figure 8, right). In this case it looks like the tilt fitting may also be beneficial. If the red/blue areas are not easily discernible by eye, or if the residuals do not look better than the data, attempting to correct for the parameter being fitted is unlikely to be helpful.

12. Optimising masking strategy for Non-Uniform Refinement

In the last Non-Uniform Refinement, although it was an improvement on Homogeneous Refinement, we still found that dynamic masking was including the density from neighbouring particles (see Figures 9 & 10), and this may be affecting the central particle alignment. We can try to address this issue now by changing the masking strategy during refinement.

  • Generate a static mask (see the guide page on mask creation) that excludes the additional density by downloading the solvent mask from your last Non-Uniform Refinement, setting the threshold to where the extra density is clearly visible, and using the Volume eraser tool in ChimeraX to remove this density from the mask (see Figure 10).

  • Save and import this new mask volume using an Import Volumes job in your CryoSPARC project. In CryoSPARC v4.5+ you can achieve this very simply by using the Upload Local Files feature directly through the browser.

We would like to make this mask a little more generous and smoother so it is not excessively tight for early iterations where the resolution is poorer.

  • Using a Volume Tools job, input this mask, set Type of input volume to mask and Type of output volume to mask, set Lowpass filter (A) to 15 and the binarisation Threshold (in our case 0.95 was appropriate), and set Dilation radius (pix) to 6.

This manually created mask shown in Figure 11 is intentionally generous, and more generous than the final iteration mask from the refinement in section 10. This is because, as a static mask, it will be used from the start of the refinement, which begins at low resolution, and applying a too-tight mask can lead to map/mask interactions that cause artefacts to build up during refinement.

  • Next run a second Non-Uniform Refinement job (Refinement 3) using the CTF-refined particles, and use the sliders to set Optimize per-group CTF params on and Fit Trefoil off. If you find that the resolution is poorer than before CTF refinement, then use the previous refinement for the next section. This refinement will use dynamic masking with default parameters.

  • Run a Non-Uniform Refinement (Refinement 4) with the same settings as for the dynamic mask refinement, but apply the new static mask that you just created.

  • Run a third Non-Uniform Refinement (Refinement 5) without supplying a mask and setting Dynamic mask start resolution to 1 Å - this effectively turns off masking during refinement, except for a soft spherical mask.

When we look at the results from the three refinements we can see that the resolution and map quality differ.

The best resolution and best quality of map was found here when no masking was applied during the refinement. In the maps with masking applied we can see less continuous density in the ECD of the receptor (indicated in the image for Refinement 3 in Figure 12), and more noisy density around the transmembrane domain, compared to the refinement without a mask.

The density for the transmembrane regions is very nice, and although we improved the density in the ECD somewhat by removing masking during Non-Uniform Refinement, inspection of the maps so far shows that the density for the extracellular domain is still rather fragmented, especially after sharpening. This poorer density could be a result of:

a) less of the delocalised signal from the ECD being within the box (as this domain is closer to the box edge)

b) centre-of-mass for the refinement being in the transmembrane region.

c) flexible heterogeneity

Although flexible heterogeneity is possible (and likely), resolving different classes and motions is likely to be more successful if the input particle alignments and map are the highest quality possible first. For this reason, we will first address points a) and b) from above, and refine the consensus map to the best quality we can before looking at heterogeneity.

13. Improving poor density in the extra-cellular domain by re-centering and re-extracting

We next are going to reduce the risk that delocalised signal from the ECD of the particle is being cut off by the box edge, and also encourage the particle alignment during refinement to be weighted more towards the ECD:

  • Open the best Non-Uniform Refinement map in ChimeraX and use the Markers → Surface tool to place a marker in the centre of the extra-cellular domain of your map. In the ChimeraX log, the coordinates of this marker will be listed (in Å) and you can convert this to pixels by dividing each number by the map’s pixel size (in this case 0.83).
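The conversion itself is one line; the sketch below uses placeholder marker coordinates, so substitute the values from your own ChimeraX log:

apix = 0.83                                   # map pixel size (A/pixel)
marker_A = (105.0, 118.0, 160.0)              # placeholder marker position in A
center_voxels = tuple(round(c / apix) for c in marker_A)
print(center_voxels)                          # -> (127, 142, 193)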

  • Create a Volume Alignment Tools job, input your map and particles from your Non-Uniform Refinement job and set the 3D Coordinates of new center (voxels) to the values ascertained above.

  • Perform an Extract from Micrographs job using these aligned particles, and with an extraction box size of 320. We make the box size bigger at this stage to avoid cutting off delocalised signal from the transmembrane region after re-centering the particles.

  • Run a Non-Uniform Refinement (Refinement 6) using the re-extracted particles, with the same settings as your best refinement from section 12, and compare the reported resolution, map quality and connectivity in the extra-cellular domain.

You will notice that the reported resolution is poorer now than before re-extraction, however the density in the ECD is considerably better. Our aim is to generate the best map for model-building and interpretation, so a more complete map is more important than the reported resolution. If the main area of interest was in ligands, ions or waters that bind to the outside or inside of the transmembrane region, then the particles before volume alignment will yield higher resolution information in that region.

In Refinement 6, alongside improved density for the extracellular domain, we also observed some strong lines of extra density in the map, easily seen at low contour thresholds if the map is lowpass filtered, for example to 6 Å. As the quality of density in the ECD appeared better than when a mask was applied, we will accept for now that this map has some additional low resolution density features, and will remove them at a later stage using particle subtraction.

14. Reference-Based Motion Correction (Optional)

At the start of the processing pipeline we corrected for motion in patches of the micrographs, and applied dose-weighting according to the dose rate for data collection. Under illumination of the grid in the microscope, the motion can be more complex than the rigid patches can describe, and the rate of high resolution information degradation by radiation damage can vary subtly from sample to sample. In CryoSPARC we can try to correct for these aspects by using Reference-Based Motion Correction (RBMC). Benefits of using RBMC in some datasets include improved map quality and FSC curves (often increasing the resolution at the FSC=0.143 cut-off), but not all datasets benefit substantially. Your choice to use RBMC may depend on time and computing resource constraints, and on the relative importance of speed vs pushing the resolution.

Here, we tested RBMC, however this did not produce a measurable improvement in map quality or resolution for this dataset (see Figure 14) and you may choose to skip this section and move on to section 15 using the outputs from section 13.

RBMC will routinely use the solvent mask from the input refinement, but as the refinement run in section 13 did not use masking, the only available mask from there is the auto-tightened one. We will extend the auto-tightened mask slightly for RBMC.

  • To make the extended mask, create a Volume Tools job, and input the Mask from Non-Uniform Refinement. In the low level information slot for the input mask, drag over the volume.mask_fsc_auto.F from the outputs tab. Set the Type of input volume and Type of output volume to mask, Threshold to 1, Dilation radius (pix) to 1 and Soft padding width (pix) to 15.

  • Run a Reference-Based Motion Correction job using the Non-Uniform Refinement inputs (particles, map and mask) from section 13 and the curated micrographs from section 2. In the low level information slot for the input mask, drag over the extended mask that you just created. Set Save results in 16-bit floating point on. Use more GPUs to accelerate the job if you have the resources.

In the job log you may see a recommendation to use per-particle scales in your input refinement, however in cases where refining with per-particle scales worsens the resolution of the map, you can expect the results from Reference-Based Motion Correction to be poorer, and it is OK to go ahead and run Reference-Based Motion Correction without the scale information. In our tests we found including per-particle scales was detrimental to the map resolution for this data.

This dataset had 70 frames per movie, and we would expect most of the high frequency information to be found in the first frames, before the sample has undergone radiation damage. In Figure 14 below, we see that the majority of the weighting (dark red colour) for high frequency information is in the early frames, but there is also an apparent reappearance of high frequency information in the last frames, which is not expected. As the exposure progresses, we expect radiation damage to diminish the high frequency information, so the behaviour seen in Figure 14 is possibly an unfortunate overfitting of the data that can limit the usefulness of RBMC in this case. Sometimes removing the late frames (by first re-running Patch Motion Correction, Patch CTF Estimation and Reassign Particles to Micrographs) helps with this, but unfortunately it did not for this dataset.
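For intuition about why late frames carry little high-resolution signal, the widely used critical-exposure model of Grant & Grigorieff (2015) can be sketched as below; note that RBMC estimates its weights from the data rather than from this fixed curve:

import math

def critical_exposure(resolution_A):
    k = 1.0 / resolution_A                  # spatial frequency in 1/A
    return 0.245 * k ** -1.665 + 2.81       # empirical fit, in e-/A^2

for res in (3, 6, 12):
    print(res, round(critical_exposure(res), 1))
# ~3 A signal fades after only ~4 e-/A^2 (the earliest frames), while
# ~12 A contrast persists to ~18 e-/A^2, so late frames mainly add noise
# at high resolution.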

By examining the particle trajectories we see that some residual motion has been modelled for the particles. The roughly circular gap where there are no trajectories corresponds to a region of very thin ice that contained no particles.

The output particles can be NU-refined (Refinement 7) with the same settings as for Refinement 6. In our case we did not see any noticeable improvement in map quality or FSC.

15. Particle subtraction, consensus refinement and sharpening

The maps from refinements in sections 13 and 14 still contain features outside of the central particles that are likely stemming from adjacent particles, and are affecting alignment. We can mitigate this effect by making a generous mask around the GPCR so that we can subtract away most of the surrounding, unwanted density.

We need a mask without high frequency features, and we want to extend the mask to ensure we are not cutting off valuable signal required for alignment and reconstruction. Generating a custom mask can be achieved a number of ways, but here we will lowpass filter, extend and invert an existing mask.

  • Make a Volume Tools job and input the mask (drag over the volume.mask_fsc_auto.F) from your last Non-Uniform Refinement job in section 13. Set Lowpass filter (A) to 6, Type of input volume and Type of output volume to mask, Invert mask on, set the Threshold to the value you wish, along with Dilation radius (pix) 10 and Soft padding width (pix) 15.

Inspect the output mask in ChimeraX by comparing it to your refined volume, using the “Side View” function to cut into the mask to see the GPCR-shaped hole in the centre.

If you think that the mask looks too tight or too generous, adjust the Threshold and Dilation radius (pix) until you are happy. In the previous refinement, particle alignment may not be that accurate due to the influence of neighbouring particles, and so we want this mask to be generous enough to capture the whole protein even from poorly aligned images, without including the density from adjacent particles.

  • Make a Particle Subtraction job and input the particles from your latest Non-Uniform Refinement, as well as the inverted mask that you just created.

As we have now reference-motion-corrected and subtracted the particles, we expect the particle alignment to be more accurate, so even though the particles are of a small size at 73 kDa, it is worth testing whether the combination of Fit trefoil on, Fit Spherical Aberration on, and Minimize over per-particle scale on improves the resolution and map quality.

  • Run a Non-Uniform Refinement with the above settings (Fit trefoil on, Fit Spherical Aberration on, and Minimize over per-particle scale on) (Refinement 8).

  • Run Orientation Diagnostics on the maps before and after subtraction.

Compare the map quality, reported resolution, FSC curves and orientation diagnostics.

We found that the highest resolution was obtained when including tilt, trefoil and spherical aberration refinement, along with per-particle scale refinement.

Although the reported resolution is not much improved, the FSC curve shows a higher correlation at frequencies below the FSC=0.143 cutoff, and the density now appears less noisy, indicating a better quality of map irrespective of the resolution number reported. The orientation diagnostics show that the resolution anisotropy of the map is improved, along with the cFAR score after subtraction.

Refinement 8 is our final consensus map, and needs to be sharpened to the right level. While auto-sharpening often does provide a suitable map for model building, differences in local resolution mean that sometimes applying a slightly smaller manual sharpening B-factor yields more continuous density in poorer regions.

  • Check if the sharpening is okay, and if not make a Sharpening Tools job and try a few different B-factors until you have optimised the sharpening.

You may need a set of maps with different sharpening in order to manually build and adjust your molecular model. We found that overall, a B-factor of -40 gave a balance between connectivity of poorer regions and higher resolution features. There are blobs of density surrounding the TMD, some of which are elongated (see Figure 17), and others fragmented. These likely stem from partially ordered detergent and lipid molecules in the micelle. On the other hand, some of the blobs of density surrounding the extracellular domain will still arise from the neighbouring particles, but others may represent sharpening artefacts from too high a B-factor for this region of the particle, and can be used as a guide for selecting an appropriate B-factor.

In high resolution maps (better than ~2.7 Å), blobs outside of the main particle may come from ordered waters of hydration or bound ions, so it is worth re-assessing sharpening B-factors once a preliminary model has been built to be sure that such high resolution features, if present, are being captured.

16. 3D Variability Analysis

We now have a nice map that would be suitable for model-building, however earlier on in the processing pipeline in sections 9, 10 and 12, we saw that the ECD was fragmented when the transmembrane domain dominated the particle alignment, indicating flexible heterogeneity in the particle between the transmembrane and EC domains.

To investigate what motions are occurring, we can use 3D Variability Analysis (3DVA) with a mask over the entire GPCR. It is a good idea to use a mask that excludes the micelle, so that 3DVA is not dominated by differences in the micelle. To just visualise the motions, we can use a subset of 100k particles to speed up the job.

  • Create a 3D Variability job inputting the latest Non-Uniform Refinement volume and particles, and the extended mask that we made in section 14. We will set the Filter resolution to 4 as we would like to be able to see motion of all secondary structure elements. If interest was largely in whole-domain motions, it would be more appropriate, and quicker, to select a lower resolution such as 8 or 12.

  • Create a 3D Variability Display job inputting the particles and components from the 3DVA job. We will assess the 3DVA using simple mode so that we can visualise the motions found in the 3 components. As we set the high resolution limit for the analysis at 4 Å, we can also set the same limit in Filter resolution here, and as the box is quite large compared to the particle we can set Crop to size (after downsample) to 224; this will make downloading and loading the volumes faster.

Examine the output movements.

You should find a combination of open-closed rocking motion (as found in the gold volume in Movie 1) as the tether between the domains becomes more and less visible, as well as two twisting motions of the ECD relative to the TMD (observed in the green and purple volumes in Movie 1). Both the ECD and TMD are moving, so this tells us that we might have more success classifying different states if we either use a global mask, or if we first focus the refinement on one of the two domains.

17. Local Refinement on the TMD and ECD, and comparison of local resolutions

There are motions between the two domains of this GPCR, and this means that neither domain is optimally aligned in the consensus refinement. We can improve the alignment of each domain by providing a soft mask so that particle alignment shifts are better for the regions within the mask, and the fulcrum for particle alignment rotations is in the middle of the region of interest.

In order to locally refine the two domains we need to make new masks. We can do so by using our latest Non-Uniform Refinement volume and erase the unwanted density in ChimeraX.

  • Load the volume into ChimeraX, apply a Gaussian filter, and set the volume to a contour level where the protein features are visible but the micelle is not. Using the Volume eraser tool, erase the density for the TMD and save the volume.

  • Repeat the process but this time erase the density for the ECD and save the volume.

  • Transfer these volumes to the directory for your CryoSPARC processing project and import them using an Import Volumes job.

  • Create two Volume Tools jobs and input each of the two new density-erased volumes. Set the Threshold to match what looked good in ChimeraX, Type of output volume to mask. For the TMD, set Dilation radius (pix) to 10 and Soft Padding width (pix) to 15. For the ECD, set Dilation radius (pix) to 18 and Soft Padding width (pix) to 15.

Check that the masks both appropriately cover your refined protein density, and adjust the threshold and dilation radius as appropriate.

The mask for the ECD is made more generous than that for the TMD to try and reduce the risk of over-fitting artefacts from refining a small volume. This means that the mask will include a part of the TMD but this is a compromise we take so that the resulting map is a bit more reliable.

  • Create a Local Refinement job (Local Refine 1), inputting the volume and particles from Refinement 8 in section 15, and the new TMD mask we just created. The region inside the mask is small and we do not want the initial alignment of particles to be worsened in this job, so set Use pose/shift gaussian prior during alignment, Re-center rotations each iteration and Re-center shifts every iteration to true. We can set Initial lowpass resolution (A) to 5 here, but be aware that setting this value at too high a resolution can lead to overfitting and map artefacts. A good starting point is to set the lowpass filter to a resolution lower than where FSC=1 in the consensus refinement.

  • Create a second Local Refinement job (Local Refine 2) by cloning the first one, and exchange the focus mask for the ECD mask created above. As the ECD is a rather small volume, we are concerned with trying to minimise over-fitting artefacts, so set the Maximum align resolution (A) to 3 and Number of extra final passes to 0. Both of these settings are likely to slightly reduce the final reported resolution compared to the defaults, but produce more reliable density.

During the refinement, you can observe the Magnitude of alignment and shift changes per iteration.

We see fairly large changes in the first iteration, especially in the angle changes, and this makes sense considering the rotation movements that we observe. For the TMD the peak is sharper, and for the ECD it is broader, and this can be interpreted as less precise alignment of the ECD either in the input Non-Uniform Refinement, or during this Local Refinement. By the final iterations, far fewer particle images were moved substantially, indicating that the refinement has converged on a solution that is more accurately aligned (within the focus mask) than the input refinement. If the final refinement map quality is poorer than the input map, and there is a large spread of angle and shift changes, it may be a good idea to try tighter priors to prevent the alignment from diverging too far.

The local resolution and map clarity in the TMD are now better, with holes visible in some aromatic residues (see Figure 19). The map clarity in the ECD is quite similar after local refinement, but the local resolution is poorer and there are some over-fitting artefacts visible after sharpening. These features appear as radiating lines or plates of smeared density that do not resemble the shape of biological molecules such as proteins, lipids or nucleic acids.

During refinement (global or local) of small particles or sub-volumes there is a fundamental risk of overfitting producing artifactual density in the maps. This is related to a low signal-to-noise ratio in the particle images or in the masked region. The size at which this becomes problematic depends on image quality and the composition of the target. It may be prudent to check for unexpected features that cannot be explained by your known sample composition at any target size (these may be new exciting findings (!) or density artefacts), but vigilance becomes especially important at sizes ≲ 100 kDa.

While this ECD map may be useful in some areas to guide manual model building, caution should be taken with the selection of map(s) used for automated refinement in order to avoid a molecular model being placed into artifactual density (such as circled in red in Figure 19).

In the same manner as the consensus map, it is a good idea to assess the sharpening and adjust as appropriate. In the above example, the auto-sharpening B-factor of -62.3 seemed appropriate for most of the TMD, but the auto-sharpening B-factor of -111 for the ECD fragmented the density too much, so we selected a sharpening B-factor of -70 manually in a Sharpening Tools job.

18. Resolving flexible heterogeneity with 3D Flex

Now that we have a good local refinement of the TMD, we can use it as a starting point for 3D Flexible Refinement (3D Flex), looking at the relative motions of the ECD. 3D Flex can discover, and allows visualisation of, more complex motions than those observed in 3D Variability Analysis.

3D Flex works best on large particles with substantial motions, and here we have quite a small particle to contend with. We will share the final parameters that we optimised but please bear in mind that with such a difficult target, a few rounds of optimisation of the segmentation and training were required (see the Custom 3D Flex Mesh and 3D Flexible Refinement tutorials).

We saw three linear modes of motion in the 3D Variability Analysis in section 16; however, it is important to consider that the motions of biological targets are often best described by a more complicated combination of linear components, with the individual components not necessarily occurring in isolation. This is why, if you use volumes from 3DVA as inputs to 3D Classification or Heterogeneous Refinement, you may not recover volumes that entirely resemble the inputs.

  • Create a 3D Flex Data Prep job using the particles and map from the TMD local refinement with Num. particles to use 100,000 and Crop box size (pix) 256.

If you would like the Flex mesh to be finer, you can crop the box. This is helpful here because we have a rather large box and a rather small particle, and the fineness of the mesh in 3D Flex is relative to the box: the mesh spacing is defined as a number of tetra cells per box extent in pixels, so cropping the box makes each tetra cell smaller in absolute terms. See the worked example below.
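As a rough worked example (assuming an uncropped refinement box of 360 pix, and the Base num. tetra cells value of 40 that we set later in Mesh Prep):

tetra cell edge = box extent / num. tetra cells = 360 pix / 40 = 9 pix  vs.  256 pix / 40 = 6.4 pix

so cropping the box from 360 to 256 pix shrinks each tetra cell at the same pixel size, i.e. the mesh becomes finer.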

  • Check whether your consensus volume from section 15 overlays perfectly with the TMD local refinement from section 17. If there is an offset between the volumes, perform an Align 3D Maps job with the TMD as the reference volume and the consensus map as the map to align.

  • Use a Volume Tools job, inputting this map and setting Crop to box size (pix) to 256, so that the map sampling is not affected but the box dimensions match those of the Flex Data Prep particles. This cropped consensus map will be used later as the Full Res Volume input for 3D Flex Generator.

We also want to use Volume Tools jobs to create a mask from the consensus map that matches both the sampling and the box extent in Å of the Flex-prepared particles. We use the consensus map for this because the density for the ECD is more easily visible there.

Flex Data Prep performs box cropping before downsampling, but Volume Tools performs these operations in the opposite order, so we need either to run two sequential jobs (first crop, then downsample) or to find compatible values that produce the same result.
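As a concrete check that the values used in the next step are compatible (assuming the original refinement box is 360 pix and the Flex training box size is 128 pix; substitute your own numbers if they differ):

Flex Data Prep:  360 pix --crop--> 256 pix --downsample x2--> 128 pix
Volume Tools:    360 pix --resample x2--> 180 pix --crop--> 128 pix

Both routes end with a 128 pix box at twice the original pixel size, covering the same extent of 256 × (original pixel size) in Å, so the mask will match the Flex-prepared particles.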

  • Input the volume from Refinement 8 into a Volume Tools job, resample the box to 180 pix, crop the box to 128 pix, and apply a lowpass filter of 6 Å.

  • Examine the volume in the volume tab or in ChimeraX to find a reasonable threshold at which the micelle density is excluded but the ECD is still visible.

  • Input this volume to a second Volume Tools job, set the output type to mask, binarise at the threshold you just determined, and set Dilation radius (pix) to 8 and Soft padding width (pix) to 3. Check in ChimeraX that the mask covers the entire GPCR.
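If you prefer the ChimeraX command line to the sliders, a minimal sketch of these two checks might look like this (file names and levels are hypothetical):

# open the lowpass-filtered, cropped map from the first Volume Tools job
open cropped_lowpass_map.mrc
# step the contour level up or down until the micelle disappears but the ECD remains
volume #1 level 0.8
# overlay the finished mask semi-transparently to confirm it covers the whole GPCR
open gpcr_mask.mrc
volume #2 level 0.5
transparency #2 70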

In addition, we want to segment the map into sections that we think could reasonably move relative to one another; to do this we will use the lowpass-filtered, cropped and downsampled volume prepared above.

  • Open the volume in ChimeraX and segment the map so that it is cut into 4 segments: the TMD, the ECD, the linker and the tail. Save the segmentation file and transfer it to your CryoSPARC processing directory. The identifier number for each segment is shown in the model panel, so make a note of these and of how the 4 segments relate to one another.

For 3D Flex to work best, segments that move more independently of one another should have a cut between them, to minimise the behaviour where movement of one segment “pulls” another with it. See the tutorial on mesh design for more details!

  • Make a 3D Flex Mesh Prep job, inputting the 3D Flex Data Prep volume, the mask created from it, and the segmentation file. We will make the mask even more generous to encapsulate the range of motion that we expect, so set Mask dilation (A) to 4. We would like the mesh to be relatively fine, to avoid the situation where one tetra cell contains density from both the ECD and the TMD, so set Base num. tetra cells to 40. Input the Segment connections (an example is shown in Figure 21), and finally set Min. rigidity weight to 0.6. We added this last setting because, where some of the TMD helices contact the micelle, we initially observed motions in the protein that appeared too flexible to be realistic.

  • Run a 3D Flex Training job, inputting the particles from the Flex Data Prep job and the mesh from the Flex Mesh Prep job. Set Number of latent dims to 1, Rigidity (lambda) to 0.007 and Latent centering strength to 0.2.

Finding good values for rigidity and latent centering was crucial in our tests. For a small particle, the default rigidity is often too high and can lead to outputs that show rigid-body motions of the entire particle.

Reducing the rigidity (lambda) value means you can visualise more subtle motions; however, be aware that setting lambda too low can mean the motions are not sufficiently restrained, leading to a spongy or gelatinous quality that may not be realistic. It may be especially important to cross-verify or externally validate 3D Flex observations where settings have been altered considerably from the defaults. Cross-validation might be achieved by using volumes from frames of the 3D Flex movies as inputs to 3D Classification or Heterogeneous Refinement.

The Latent centering strength can be adjusted so that the extent of the data points is from ~ -1.5 to ~ 1.5. In this case we reduced the strength of the centering prior from the default because we found the points were clustered quite close to zero.

  • Create a 3D Flex Generator job, inputting the Flex Model from the Flex Train job, and the aligned, cropped (but not downsampled) consensus map in the Full Res Volume field.

We use the consensus volume here for visualisation purposes, as it has better definition of the ECD at high resolution than the TMD local refinement. The TMD local refinement had adequate ECD density in the downsampled Flex Data Prep images to ascertain the range of relative motions of the ECD at low resolution, but if we used the un-binned TMD local volume as the Full Res Volume input to a Flex Generate job, the high-resolution details of the ECD would not be as well defined.

19. 3D Classification into different conformational states and cross-verification of 3D Flex

Having now found a single motion that describes both the opening/closing and twisting of the ECD with respect to the TMD, we can use some of those volumes as inputs for classification, to find the particles that most closely match those volumes, and also to cross-verify the presence of the volumes observed in 3D Flex. As the motion of the ECD appears continuous you can select as many or as few classes as you like at this stage, considering that with more classes, the extent of motions may be captured better, but at a cost to the resolution of the final maps. We will use 4 classes.

  • Save the volumes from frames 0, 13, 27 and 40, and transfer them to your CryoSPARC directory. In CryoSPARC v4.5+ the volumes in the series can be split using the Split Volumes Group job type so that re-upload is not necessary.

  • Import these volumes using Import Volumes.

We already made a generous mask to cover the ECD in section 17, but here we can use a slightly tighter one.

  • Clone the job for generating the generous ECD mask, and set the parameters to something like Dilation radius (pix) 10 and Soft padding width (pix) 15.

  • Make a 3D Classification job inputting the particles from the TMD local refinement, the imported volumes from above, the solvent mask used for Reference-Based Motion Correction, and the focus mask generated above. Use the following settings:

Number of classes: 4
Target resolution (A): 6
Init structure lowpass resolution (A): 20
Initialization mode: input
Force hard classification: on
O-EM learning rate init: 0.2
O-EM batch size (per class): 2000
Reorder classes by size: off
RMS density change threshold: on

The number of classes and the number of input volumes need to match, but the exact target resolution may require some optimisation. Setting the initial O-EM learning rate lower than the default helps prevent the volumes from diverging from the inputs during early iterations.

At the end of the job you may find that the number of particles assigned to each class is quite similar (see Figure 23); this is a common result when the sample has continuous flexible heterogeneity.

Examine the low-resolution output classes to check that the job has indeed separated different conformational states of the ECD.

  • Run Non-Uniform Refinements for each class (Refinements 9-12) using your best settings so far, but because the particle sets are now smaller per class, it may be better to turn off fitting of trefoil and spherical aberration, as fitting these with fewer particles may worsen the final resolutions.

Inspect the maps for quality and clarity - you can also use Phenix variability refinement of a molecular model into the volumes to help assess them.

In the 4 classes here we can see states that agree with the motion observed in 3D Flex. Refinements 9 and 10 (green and teal) have good density for the ECD and for the tether between domains. In Refinements 11 and 12, the tether density is poorer, and the density for two of the ECD helices is worse, particularly in Refinement 12. Modelling of these 4 maps shows a fairly rigid motion of the ECD with high flexibility in the hinge loop.

20. Checking and updating the masks used for FSC resolution reporting

At the end of this processing pipeline we have 6 maps for which we would like to report the global resolution: the consensus map, the locally refined TMD, and the 4 classified maps that describe the ECD motion.

CryoSPARC has in-built masking that automatically tightens to the map at the end of refinement, and this mask is used to calculate the global resolution displayed in the GUI. It is always a good idea to manually check if the mask is appropriate for your volume by assessing the map and mask in ChimeraX, and checking the FSC curves for indications of over-tightness. This is explained in our tutorial on mask creation. Over-tightness may be more prevalent if your target has peripheral flexibility.

  • Open the map in ChimeraX, along with the mask_fsc_auto. This mask can be found in the outputs tab of the job, in the Refined Volume section (importantly, this is different from the mask provided in the Outputs Group section, which is the final-iteration mask before auto-tightening). Set the map to a contour threshold at which you observe the features you are interested in without including noise, and set the mask threshold to 1. This mask threshold shows the contour at which the mask was binarised; on top of this there is a soft edge that extends out to the contour observed at a threshold of ~0.
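In ChimeraX, this comparison can be set up with a few commands (a sketch with hypothetical file names; your map level will differ):

# open the refined map and the auto-tightened FSC mask
open map_sharp.mrc
open mask_fsc_auto.mrc
# show the map at a threshold where the features of interest are visible without noise
volume #1 level 0.1
# show the mask at its binarisation threshold, semi-transparent, to judge its coverage
volume #2 level 1
transparency #2 60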

  • Examine the mask and map together and decide if the mask encompasses the volume of your particle adequately. In our example in Figure 24 we can see that the mask for the consensus map just covers the map volume, and excludes the micelle.

  • Examine the FSC for the auto-tightened map and compare the Tight and Corrected curves. As described in the mask creation tutorial, the difference between these curves can be used as an indicator of mask tightness. In our example in Figure 24 we do see a sudden drop in the corrected curve, suggesting there is density outside of the mask.

It is particularly challenging to assess mask tightness using the FSC curves for this dataset because in addition to the receptor the map also contains density from the micelle and some residual density from adjacent particles (even after subtraction).

For membrane proteins where the micelle or nanodisc has been masked out, we expect to see a dip in the corrected FSC curve even with an appropriate mask, because we know there is density outside of the mask. The size of this dip will depend on factors such as the map resolution (larger dip at lower resolution) and how strong the detergent or lipid density is relative to the target (larger dip with stronger density). The same FSC phenomenon is expected for local refinements, where residual density is known and expected to be outside of the mask, or when locally masking regions of a larger map.

  • If you think the auto-tightened mask may be too tight, use Volume Tools to make a more generous mask. We achieved this by taking the auto-tightened mask and extending it: we binarised it at a threshold of 1, dilated it by 2 pixels and added a soft edge of 15 pixels (~12 Å).

  • Run a Validation (FSC) job inputting the refined volume and your new mask.

In Figure 24 we can see that our extended mask more generously covers the receptor, and the corrected curve has a smaller dip. In this example, extending the mask did not reduce the reported resolution at FSC=0.143.

The same procedure can be repeated for the other maps: check the mask and FSC, update the mask, and re-check the FSC. For the TMD local map shown in Figure 24, we see some blobs of density outside of our locally refined region of interest included in the auto-tightened mask. You can choose your own mask design strategy, but this is an example where generating a mask from a PDB model may work best. We fitted PDB 7knt into our TMD local refined map in ChimeraX, generated a 10 Å map of the TMD using the “molmap” and “volume resample” commands, uploaded the file, and generated a mask with a soft edge of 15 pixels using Volume Tools. In the Validation (FSC) job, this time the reported resolution at FSC=0.143 did change from that obtained with the auto-tightened mask.
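A sketch of the ChimeraX steps we describe (model numbers and file names are illustrative, and you may wish to restrict molmap to the TMD residues if your fitted model includes the ECD):

# open the TMD local refinement map (#1) and the model (#2)
open tmd_local_map.mrc
open 7knt
# rigid-body fit the model into the map
fitmap #2 inMap #1
# simulate a 10 A density map from the fitted atoms (creates #3)
molmap #2 10
# resample the simulated map onto the grid of the refinement map (creates #4) and save
volume resample #3 onGrid #1
save tmd_molmap_resampled.mrc models #4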

Ultimately, for map interpretation the quality of the density in the region of interest is more important than the resolution number, but if you do not see the expected features in your map at the resolution reported with auto-masking, it is worth checking and assessing the influence of the mask.

21. Comparisons and findings from the updated maps

In the updated maps, the improved resolution allowed us to see two flat, elongated densities that we have modelled as cholesteryl hemisuccinate molecules, as well as waters and glycosylation of an Asn that we have modelled as N-acetylglucosamine (Figure 25). The most troublesome part of the map was helices 1 and 2 of RAMP1. In Figure 24 we can see how the density in this region improved with the steps taken in different sections of this case study; the greatest improvements occurred after re-centering on the ECD and re-extracting, and after 3D classification.

You can download our versions of the final maps, half maps and masks from the links below for comparison with your own processing!
