Tutorial: Tips for Membrane Protein Structures
Helpful hints for processing cryo-EM data of membrane proteins.
Membrane proteins are an increasingly important class of targets for cryo-EM in academia and industry. These targets are often small (<100 kDa in molecular weight) and flexible, and are surrounded by a large micelle in the transmembrane region. Here, we list a few suggested tips for working with these targets in CryoSPARC, sorted by the different stages of processing.
One type of membrane protein: the Cannabinoid Receptor 1-G GPCR complex (Kumar et al., 2019). Data from EMPIAR-10288. Density is shown at two different thresholds to illustrate the micelle regions.
Generally, pre-processing steps remain unchanged from other standard cryo-EM pipelines: we recommend the Patch Motion Correction and Patch CTF jobs with no notable modifications to the parameters.
We find that per-particle CTF refinements (post-3D refinement) rarely improve final structures due to the low amount of signal present per-particle in the micrograph for small membrane targets. Nevertheless, per-particle CTF refinement may be useful to try once a sufficiently detailed structure is refined.
Particle picking can be one of the most challenging parts of working with membrane proteins.
For crowded micrographs, the following parameter can substantially affect picking performance:
Min. separation dist
- Reducing this for crowded datasets may help pick out more true particles.
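To illustrate why a smaller minimum separation recovers more picks in crowded micrographs, here is a minimal sketch of a greedy separation filter. It is not CryoSPARC's implementation: the function name, the pick format, and the assumption that the separation is expressed in units of particle diameters are all illustrative.

```python
import math

def filter_by_min_separation(picks, particle_diameter_a, min_sep_diameters):
    """Greedy filter (hypothetical): keep the highest-scoring picks first and
    drop any pick closer than min_sep_diameters * particle_diameter_a to a
    pick that has already been kept.

    picks: list of (x_angstrom, y_angstrom, score) tuples.
    """
    min_dist = min_sep_diameters * particle_diameter_a
    kept = []
    for x, y, score in sorted(picks, key=lambda p: p[2], reverse=True):
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept

# Three candidate picks on a line; the first two are only 60 A apart.
example_picks = [(0.0, 0.0, 0.9), (60.0, 0.0, 0.8), (200.0, 0.0, 0.7)]

loose = filter_by_min_separation(example_picks, 80.0, 1.0)  # 80 A separation
tight = filter_by_min_separation(example_picks, 80.0, 0.5)  # 40 A separation
print(len(loose), len(tight))
```

With an 80 Å particle, a separation of one diameter rejects the neighbour at 60 Å, while a separation of half a diameter keeps it. The trade-off is that a smaller separation also admits more duplicate or off-centre picks, which later classification has to remove.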
Other potentially useful picking jobs:
- Neural-network-based particle picking techniques such as Topaz or Deep Particle Picker can be useful when a large portion of particles are difficult to identify visually
- Blob Picker Tuner can also be quite useful for crowded micrographs. Be sure to choose approximately 100 manual picks, focusing on picks that are ‘clumped together’ and originating from micrographs that span a wide range of defocus values.
We suggest using an extraction box size that is approximately 2-3 times larger than the particle diameter.
To account for signal displacement caused by the CTF, Rosenthal and Henderson (2003) suggest a box size of:

$$B = D + 2R, \qquad R = \frac{\lambda \, \Delta F}{d}$$

where $D$ is the diameter of the particle, $\lambda$ is the electron wavelength, $\Delta F$ is the defocus value and $d$ is the resolution. Note that the radius of displacement, $R$, is not a function of the particle diameter, and therefore this formula may result in a box size that is 4-5 times larger than the particle diameter when $D$ is relatively small (as is typically the case for membrane proteins).
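As a worked example, the sketch below evaluates the box size B = D + 2λΔF/d, where D is the particle diameter, λ the electron wavelength, ΔF the defocus and d the target resolution. The function names and the example values (an 80 Å particle, 300 kV, 2 µm defocus, 3 Å resolution) are assumptions chosen for illustration; the wavelength uses the standard relativistic expression.

```python
import math

def electron_wavelength_angstrom(voltage_volts):
    """Relativistic electron wavelength in Angstroms for an accelerating voltage in volts."""
    return 12.2643 / math.sqrt(voltage_volts + 0.97845e-6 * voltage_volts ** 2)

def displacement_radius_angstrom(wavelength_a, defocus_a, resolution_a):
    """Radius R = lambda * defocus / resolution over which the CTF spreads signal."""
    return wavelength_a * defocus_a / resolution_a

def box_size_angstrom(diameter_a, wavelength_a, defocus_a, resolution_a):
    """Box size B = D + 2R following the formula quoted in the text."""
    return diameter_a + 2.0 * displacement_radius_angstrom(
        wavelength_a, defocus_a, resolution_a
    )

if __name__ == "__main__":
    wavelength = electron_wavelength_angstrom(300e3)  # roughly 0.0197 A at 300 kV
    diameter = 80.0       # hypothetical small membrane protein, in Angstroms
    defocus = 20000.0     # 2 um underfocus expressed in Angstroms
    resolution = 3.0      # target resolution in Angstroms
    box = box_size_angstrom(diameter, wavelength, defocus, resolution)
    print(f"R = {displacement_radius_angstrom(wavelength, defocus, resolution):.1f} A")
    print(f"box = {box:.1f} A = {box / diameter:.2f} x particle diameter")
```

For these values the displacement radius alone is larger than the particle, which is why the ratio of box size to particle diameter grows quickly for small targets.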
Many of the CryoSPARC algorithms (e.g., 2D classification, ab-initio), however, are tuned for particle images with a box size that is 2-3 times larger than the extent of the particle. Furthermore, substantial computational savings can be achieved by using a smaller box size at early stages of processing where there are potentially many particles (millions). With small membrane proteins at reasonable concentrations, it is common to have many particles (several hundred) per micrograph and therefore very large particle sets in initial classification.
To address this, a suggested pipeline is:
1. Extract particles with a smaller box size (1.5X - 2X particle extent),
2. Perform multiple rounds of 2D classification, ab-initio, 3D classification, and initial (heterogeneous) refinements, selecting the best particles to carry forward,
3. Re-extract surviving particles with a larger box size (2X - 3X particle extent) to reasonably account for all the information spread due to the CTF, and finally
4. Perform high resolution refinement(s).
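The two box sizes in this pipeline can be estimated with a small helper such as the one below. This is a sketch under stated assumptions: the function name, the 0.85 Å pixel size, the 80 Å particle extent, and the choice to round up to a multiple of 16 pixels are all hypothetical and not CryoSPARC defaults.

```python
import math

def box_size_pixels(particle_extent_a, box_factor, pixel_size_a, multiple=16):
    """Box edge in pixels: box_factor times the particle extent, rounded up
    to the next multiple of `multiple` (an illustrative rounding choice)."""
    raw_pixels = particle_extent_a * box_factor / pixel_size_a
    return int(math.ceil(raw_pixels / multiple) * multiple)

extent_a = 80.0      # hypothetical particle extent in Angstroms
pixel_size_a = 0.85  # hypothetical pixel size in Angstroms per pixel

early_box = box_size_pixels(extent_a, 1.5, pixel_size_a)  # classification stages
final_box = box_size_pixels(extent_a, 3.0, pixel_size_a)  # high-resolution refinement

# Pixel count per particle image scales with the square of the box edge.
relative_cost = (final_box / early_box) ** 2
print(early_box, final_box, relative_cost)
```

Because the number of pixels per image grows with the square of the box edge, doubling the box roughly quadruples the data handled per particle, which is the reason for deferring the larger box until the particle set has been cleaned.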
During 2D classification, a number of parameter changes can help improve performance for membrane targets:
Force Max over poses/shifts
- By turning this off, 2D classification will automatically marginalize over the poses and shifts of each particle. For small particles, the uncertainty over poses and shifts can be substantial, and accounting for this uncertainty by marginalizing over these unknowns can be beneficial. Marginalization adds computational cost, but can help improve classification results in general when SNR is low. When this option is used, 2D classes will appear more “radially blurred” with fewer streaky or noisy artefacts towards the periphery.
Number of iterations
- Increasing the default value of 20 may help improve classes.
Batchsize per class
- Empirically, users have found that doubling the initial value to 400 is sometimes beneficial.
Circular mask diameter
- This can help account for crowding by masking out any information outside of a circular region in each particle image. For small particles with a lot of crowding, this can be necessary to ensure classification is based on view/conformation rather than arrangement of neighbours.
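To make the "Force Max over poses/shifts" option above more concrete, the toy sketch below contrasts the two ways a single particle image can contribute to a class average: a hard assignment to its best-scoring pose, or a weighted contribution across all candidate poses (marginalization). The log-likelihood values are invented for illustration and the functions are not CryoSPARC code.

```python
import math

def hard_weights(log_likelihoods):
    """Force-max behaviour: all weight goes to the single best pose."""
    best = max(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])
    return [1.0 if i == best else 0.0 for i in range(len(log_likelihoods))]

def marginal_weights(log_likelihoods):
    """Marginalization: normalized likelihood weights over all poses,
    computed relative to the peak value for numerical stability."""
    peak = max(log_likelihoods)
    unnormalized = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# A low-SNR particle: two candidate poses are almost equally likely.
log_likes = [-10.0, -10.2, -14.0]
print(hard_weights(log_likes))
print([round(w, 3) for w in marginal_weights(log_likes)])
```

When two poses are nearly tied, as is common for small low-SNR particles, the hard assignment commits fully to one of them, whereas the marginalized weights spread the contribution across both. This is consistent with the smoother, more blurred class averages described above.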
First and foremost, all masks applied during 2D-to-3D processing should be smooth (i.e., contain no sudden 'cliffs' where the mask drops from a value near 1 to a value near 0) to avoid ringing effects. This is because sharp masks, when applied to half-maps during refinement jobs, can increase the likelihood of overfitting by introducing artifactual signal that is common to both half-maps. If you are generating masks using Chimera(X) (e.g., by following our tutorial), be sure to use the Volume Tools job to add a sufficient soft padding width. As noted by the mask generation tutorial, a useful rule of thumb is to keep the mask padding width proportional to the achieved resolution in Angstroms. As long as the soft padding width is sufficient, and the mask covers the desired region of structure (while "cutting" through minimal density), the threshold value and dilation radius may be set as needed in order to generate a mask of the desired size.
Furthermore, it is especially important for membrane target masks to not be overly 'tight' to the structure. For such small proteins, a tight mask can more easily lead to a situation where a refinement 'overfits' to junk/noise (cf. Common Failure Modes). In general, avoid creating a mask that is similar in shape to the secondary structure of the protein, and err on the side of loose (but nevertheless smooth) masks for all processing.
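The difference between a sharp and a smooth mask edge can be sketched in one dimension. The raised-cosine falloff used here is an assumed shape chosen for illustration; the text does not specify the exact profile that the Volume Tools job applies, and the 12 Å padding width is a hypothetical value.

```python
import math

def soft_mask_value(distance_outside_a, padding_width_a):
    """Mask value at a given distance (in Angstroms) outside the binary mask
    boundary, using an assumed raised-cosine falloff over padding_width_a."""
    if distance_outside_a <= 0.0:
        return 1.0  # inside the binary mask
    if distance_outside_a >= padding_width_a:
        return 0.0  # beyond the soft padding
    return 0.5 * (1.0 + math.cos(math.pi * distance_outside_a / padding_width_a))

def largest_step(padding_width_a, step_a=1.0, extent_a=30.0):
    """Largest change in mask value between neighbouring samples along a line
    crossing the mask edge; values near 1 indicate a 'cliff'."""
    n = int(extent_a / step_a)
    profile = [soft_mask_value(i * step_a, padding_width_a) for i in range(n + 1)]
    return max(abs(a - b) for a, b in zip(profile, profile[1:]))

print(largest_step(padding_width_a=0.0))   # sharp mask: a full 1 -> 0 cliff
print(largest_step(padding_width_a=12.0))  # soft mask: small steps only
```

A padding width of zero gives a single step of 1.0 at the boundary, while a 12 Å soft edge sampled every 1 Å never changes by more than about 0.13 between neighbouring samples.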
In general, we find that particle subtraction helps only in very specific situations, namely when your structure contains two very rigid subunits, one large and one small. In this case, subtracting the larger subunit can improve the resolution of the smaller subunit, provided that particle alignments are sufficiently well resolved for the subtraction to accurately remove the signal of the larger subunit.
We strongly recommend avoiding the subtraction of micelles -- these structures are generally disordered, and it is very difficult to subtract them from particle images without removing other useful signal. Instead, consider the use of local refinement with non-uniform refinement and marginalization turned on.
Initial / maximum resolution
- For smaller membrane proteins, it is often useful to set the initial and maximum resolutions to smaller numerical values (e.g., 9Å and 7Å). This is because smaller particles appear as featureless blobs at lower resolutions and there will not be enough information to align particles and recover the structure.
Non-uniform refinement can significantly improve refinements for targets that contain micelles and for smaller proteins. Consider the following two modifications if refinement results are poor:
- Empirically, increasing this resolution (e.g., to a lower numerical value such as 15Å) may improve results for smaller targets.
- For small, low-SNR particles, dynamic masking may perform poorly. Instead, supplying a soft, static mask may improve the final refinement.
Local refinement can also be quite useful for membrane targets. Note that local refinement masks must be softly padded, especially when cutting into density (even a micelle). A few salient parameters to consider:
Rotation/Shift search extent
- When using smaller masks, tighter orientation search extents generally produce better results.
- (Default on) Marginalization over poses and shifts can greatly improve alignments for smaller targets.
Non-uniform refine enable
- (Default on) Non-uniform refinement can help account for disordered regions (such as micelles and flexible/floppy appendages).
Rotation/Shift gaussian prior widths
- In cases of small molecules, small masks, or poor SNR, local refinement may benefit from the introduction of gaussian priors around each particle's initial orientation parameters. The utility of these priors is commensurate with the quality of the initial alignments.
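The effect of a gaussian prior on the orientation search can be sketched with a toy scoring function: each candidate pose is penalized by its squared offset from the particle's initial orientation, scaled by the prior width. This is an illustration of the general idea only, with invented numbers and hypothetical function names; it is not CryoSPARC's implementation.

```python
def pose_score(log_likelihood, offset_deg, prior_sigma_deg=None):
    """Log-likelihood plus an optional gaussian log-prior centred on the
    initial orientation (offset 0). With no prior, the score is the
    log-likelihood alone."""
    if prior_sigma_deg is None:
        return log_likelihood
    return log_likelihood - offset_deg ** 2 / (2.0 * prior_sigma_deg ** 2)

def best_pose(candidates, prior_sigma_deg=None):
    """candidates: list of (offset_deg, log_likelihood) pairs."""
    return max(candidates, key=lambda c: pose_score(c[1], c[0], prior_sigma_deg))

# A noisy particle: a pose 20 degrees away scores slightly higher by chance.
candidates = [(0.0, -10.0), (5.0, -9.5), (20.0, -9.0)]

print(best_pose(candidates))                        # without a prior
print(best_pose(candidates, prior_sigma_deg=10.0))  # with a 10 degree prior
```

Without the prior, the search jumps to the distant pose on the strength of a small likelihood gain that may be noise. With the prior, the nearby pose wins. If the initial alignments are poor, the same prior would hold particles near the wrong orientation, which is why its usefulness depends on the quality of the initial alignments.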
CryoSPARC includes a wide assortment of tools for assessing and separating heterogeneous datasets. For high-resolution refinement of any protein, it is critical to ensure that the dataset is as homogeneous as possible; this often entails both particle curation (junk removal) and pruning of heterogeneity. In addition to 2D Classification and Ab-Initio Reconstruction, several other job types for heterogeneity analysis are highlighted below, along with important parameters to consider.
Force hard classification
- Hard classification can improve results for low-SNR particles, especially when the target contains a static (well-resolved) domain connected to a flexible/heterogeneous domain (such as a micelle).
Force hard classification
- Similar to Heterogeneous Refinement, force hard classification can help isolate regions of heterogeneity.
RMS convergence criterion
- For low-SNR particles, the standard class switching criterion may lead to more F-EM iterations than necessary and cause processing to take longer. Consider turning this secondary criterion on to save computational cost.
3D Variability Analysis (3DVA) can be an especially important tool for heterogeneity analysis of small membrane targets. The 3DVA publication includes results on the Cannabinoid Receptor 1-G GPCR, which show that 3DVA can resolve two different bending motions of the 53 kDa transmembrane region of the protein.
When running 3DVA, be sure to supply a soft solvent mask to ensure that the job does not resolve variation due to the micelle. It is often advantageous to use a mask that excludes the micelle, nanodisc, or other disordered regions, in order to force the algorithm to focus only on variability within the ordered region.
- "Spiky" densities like the one shown below are often a sign that there are many junk particles in the dataset. This can be especially prevalent in membrane protein datasets where particle picking is difficult. In these cases, it is often helpful to further “purify” the dataset, by either:
  - performing additional 2D classification rounds, or
  - running ab-initio reconstruction with multiple classes, then using the resulting volumes (including junk classes) to initialize heterogeneous refinement or 3D classification jobs and processing all the particles. Particles that fall into intact classes where the protein density is strong can be used for further refinements and particles falling into other classes can be discarded. This “junk-sorting” in 3D can often separate junk particles more effectively than 2D classification.
A ‘spiky’ hyaluronan synthase (the same density shown at two different thresholds for clarity) resolved from one class of a 3D classification job. Data from EMPIAR-11030 (Maloney et al., 2022).
Kumar, Kaavya Krishna, et al. "Structure of a signaling cannabinoid receptor 1-G protein complex." Cell 176.3 (2019): 448-458.
Maloney, Finn P., et al. "Structure, substrate recognition and initiation of hyaluronan synthase." Nature 604.7904 (2022): 195-201.
Rosenthal, Peter B., and Richard Henderson. "Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy." Journal of Molecular Biology 333.4 (2003): 721-745.