# Tutorial: Tips for Membrane Protein Structures

Helpful hints for processing cryo-EM data of membrane proteins.
Membrane proteins are an increasingly important class of targets for cryo-EM in academia and industry. These targets are often small (<100 kDa in molecular weight), flexible, and surrounded by a large micelle in the transmembrane region. Here, we list a few suggested tips for working with these targets in CryoSPARC, sorted by stage of processing.
One type of membrane protein: the Cannabinoid Receptor 1-G GPCR complex (Kumar et al., 2019). Data from EMPIAR-10288. Density shown at two different thresholds to illustrate the micelle regions.

### Pre-Processing

Generally, pre-processing steps remain unchanged from standard cryo-EM pipelines: we recommend the Patch Motion Correction and Patch CTF jobs with no notable changes to their parameters.
We find that per-particle CTF refinement (run after 3D refinement) rarely improves final structures of small membrane targets, because each particle image contains too little signal to estimate its own CTF parameters reliably. Nevertheless, per-particle CTF refinement may be worth trying once a sufficiently detailed structure has been refined.

### Particle Picking

Particle picking can be one of the most challenging parts of working with membrane proteins.
For crowded micrographs, the following two parameters can substantially affect picking performance:
• Particle diameter
• Min. separation dist
  • Reducing this for crowded datasets may help pick out more true particles.
Other potentially useful picking jobs:
• Neural-network-based particle picking techniques such as Topaz or Deep Particle Picker can be useful when a large portion of particles are difficult to identify visually.
• Blob Picker Tuner can also be quite useful for crowded micrographs. Be sure to make approximately 100 manual picks, focusing on particles that are ‘clumped together’ and on micrographs that span a wide range of defocus values.
We suggest using an extraction box size that is approximately 2-3 times larger than the particle diameter.
To account for signal displacement caused by the CTF, Rosenthal and Henderson (2003) suggest a box size of:
$D + 2R = D + 2(\lambda \Delta F / d),$
where $D$ is the diameter of the particle, $\lambda$ is the electron wavelength, $\Delta F$ is the defocus value and $d$ is the resolution. Note that the radius of displacement, $R$, is not a function of the particle diameter, and therefore this formula may result in a box size that is 4-5 times larger than the particle diameter when $D$ is relatively small (as is typically the case for membrane proteins).
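As a worked example of the Rosenthal and Henderson formula, the sketch below computes the relativistic electron wavelength and the resulting box size. The function names and the example values (a 100 Å particle, 2.5 µm defocus, 3 Å target resolution, 300 kV) are illustrative assumptions, not CryoSPARC parameters.

```python
import math

# Physical constants in SI units (CODATA 2018 values)
H = 6.62607015e-34        # Planck constant, J*s
M0 = 9.1093837015e-31     # electron rest mass, kg
Q = 1.602176634e-19       # elementary charge, C
C = 299792458.0           # speed of light, m/s


def electron_wavelength_angstrom(voltage_kv: float) -> float:
    """Relativistic electron wavelength (in Å) at the given accelerating voltage."""
    energy_j = Q * voltage_kv * 1e3  # kinetic energy e*V in joules
    momentum = math.sqrt(2 * M0 * energy_j * (1 + energy_j / (2 * M0 * C**2)))
    return H / momentum * 1e10  # convert metres to Å


def delocalization_box_angstrom(diameter_a: float, defocus_a: float,
                                resolution_a: float, voltage_kv: float = 300.0) -> float:
    """Box size D + 2R with R = lambda * dF / d; all lengths in Å."""
    lam = electron_wavelength_angstrom(voltage_kv)
    radius = lam * defocus_a / resolution_a  # CTF delocalization radius R
    return diameter_a + 2 * radius


if __name__ == "__main__":
    # Hypothetical example: 100 Å particle, 2.5 µm (25,000 Å) defocus, 3 Å resolution.
    box = delocalization_box_angstrom(100.0, 25000.0, 3.0)
    print(f"wavelength at 300 kV: {electron_wavelength_angstrom(300.0):.5f} Å")
    print(f"suggested box: {box:.0f} Å ({box / 100.0:.1f}x the diameter)")
```

With these values the wavelength is about 0.0197 Å and the suggested box is roughly 428 Å, a little over four times the particle diameter; most of that width comes from the delocalization term $2R$ rather than from $D$.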
Many of the CryoSPARC algorithms (e.g., 2D classification, ab-initio reconstruction), however, are tuned for particle images with a box size that is 2-3 times larger than the extent of the particle. Furthermore, substantial computational savings can be achieved by using a smaller box size at early stages of processing, when there may be millions of particles. With small membrane proteins at reasonable concentrations, it is common to have several hundred particles per micrograph, and therefore very large particle sets in initial classification.
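To make the computational argument concrete, here is a back-of-the-envelope estimate of particle-stack size. It assumes 4-byte (float32) pixels and ignores file headers; the particle count and box sizes are hypothetical. Because storage and the number of pixels processed per image both scale with the square of the box edge, halving the box reduces them roughly fourfold.

```python
def particle_stack_gib(n_particles: int, box_px: int, bytes_per_pixel: int = 4) -> float:
    """Approximate size of a 2D particle stack in GiB (float32 by default, headers ignored)."""
    return n_particles * box_px * box_px * bytes_per_pixel / 2**30


if __name__ == "__main__":
    n = 2_000_000  # hypothetical: several hundred particles x several thousand micrographs
    for box in (128, 256):
        print(f"box {box:>3} px: {particle_stack_gib(n, box):6.1f} GiB")
```

This prints roughly 122 GiB for a 128-pixel box versus about 488 GiB for a 256-pixel box.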
To address this, a suggested pipeline is:
1. Extract particles with a smaller box size (1.5X-2X the particle extent).
2. Perform multiple rounds of 2D classification, ab-initio reconstruction, 3D classification, and initial (heterogeneous) refinements, selecting the best particles to carry forward.
3. Re-extract the surviving particles with a larger box size (2X-3X the particle extent) to reasonably account for the information spread by the CTF.
4. Finally, perform high-resolution refinement(s).
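The box sizes for steps 1 and 3 can be derived from the particle extent and pixel size. In this minimal sketch the factors 1.75 and 2.5 are assumed mid-points of the ranges above, and sizes are rounded up to an even number of pixels, a common convention for extraction boxes; the helper names and example values are hypothetical.

```python
import math


def even_box(size_px: float) -> int:
    """Round a box edge length up to the next even number of pixels."""
    n = math.ceil(size_px)
    return n if n % 2 == 0 else n + 1


def staged_box_sizes(extent_a: float, pixel_size_a: float,
                     early_factor: float = 1.75, final_factor: float = 2.5) -> tuple[int, int]:
    """Return (early, final) box sizes in pixels for a particle of the given extent in Å."""
    extent_px = extent_a / pixel_size_a
    return even_box(early_factor * extent_px), even_box(final_factor * extent_px)


if __name__ == "__main__":
    # Hypothetical particle: ~100 Å across, 0.83 Å/pixel.
    early, final = staged_box_sizes(100.0, 0.83)
    print(f"initial extraction: {early} px, re-extraction: {final} px")
```

For this example the sketch suggests 212 px for initial extraction and 302 px for re-extraction.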

### Particle Curation

During 2D classification, a number of parameter changes can help improve performance for membrane targets:
• Force Max over poses/shifts
  • By turning this off, 2D classification will marginalize over the poses and shifts of each particle. For small particles, the uncertainty in poses and shifts can be substantial, and accounting for it through marginalization can be beneficial. Marginalization adds computational cost, but can help improve classification results when SNR is low. With this option off, 2D classes appear more “radially blurred”, with fewer streaky or noisy artefacts towards the periphery.
• Number of iterations
  • Increasing the default value of 20 may help improve classes.
• Batch size
  • Empirically, users have found that doubling the initial value to 400 is sometimes beneficial.
• Circular mask diameter
  • This can help account for crowding by masking out any information outside a circular region in each particle image. For small particles with a lot of crowding, this can be necessary to ensure classification is based on view/conformation rather than on the arrangement of neighbours.
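The difference between maximization and marginalization can be illustrated with a toy example. Given log-likelihoods of one particle at a handful of candidate poses, maximization puts all weight on the single best pose, while marginalization weights every pose by its posterior probability (here assuming a uniform prior). This is a conceptual sketch with made-up numbers, not CryoSPARC's implementation.

```python
import math


def max_over_poses(log_likelihoods: list[float]) -> list[float]:
    """Hard assignment: weight 1 on the best-scoring pose, 0 elsewhere."""
    best = max(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])
    return [1.0 if i == best else 0.0 for i in range(len(log_likelihoods))]


def marginalize_over_poses(log_likelihoods: list[float]) -> list[float]:
    """Soft assignment: posterior weight per pose under a uniform prior (stable softmax)."""
    peak = max(log_likelihoods)
    unnormalized = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


if __name__ == "__main__":
    low_snr = [-100.0, -100.2, -100.4]   # poses score almost equally well
    high_snr = [-100.0, -120.0, -140.0]  # one pose clearly wins
    for name, lls in (("low SNR", low_snr), ("high SNR", high_snr)):
        print(name, "max:", max_over_poses(lls),
              "marginal:", [round(w, 3) for w in marginalize_over_poses(lls)])
```

At low SNR the weight is spread over several poses (about 0.40, 0.33 and 0.27), so a class average receives each particle at several candidate orientations and shifts, which is consistent with the “radially blurred” appearance; at high SNR the two approaches agree.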

### Reconstruction & Refinement

• Initial / maximum resolution
  • For smaller membrane proteins, it is often useful to set the initial and maximum resolutions to smaller numerical values (e.g., 9 Å and 7 Å). Smaller particles appear as featureless blobs at lower resolutions, so there is not enough information to align particles and recover the structure.
Non-uniform refinement can significantly improve refinements of targets that contain micelles and of smaller proteins. Consider the following two modifications if refinement results are poor:
• Initial lowpass
  • Empirically, increasing this resolution (e.g., to a lower numerical value such as 15 Å) may improve results for smaller targets.
• Static masking
  • For small, low-SNR particles, dynamic masking may perform poorly. Instead, supplying a soft, static mask may improve the final refinement.
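A soft static mask is commonly made by thresholding a map, dilating the result, and adding a soft edge. The sketch below does this with NumPy and SciPy using a raised-cosine falloff; the function name, default widths and toy volume are assumptions for illustration, and CryoSPARC's own mask tools may differ in detail. The same kind of soft padding matters for the local-refinement masks discussed next.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def soft_mask_from_map(volume: np.ndarray, threshold: float,
                       dilate_px: float = 3.0, soft_px: float = 6.0) -> np.ndarray:
    """Threshold a map, dilate by `dilate_px` voxels, then add a raised-cosine edge of width `soft_px`."""
    core = volume >= threshold
    # Euclidean distance (in voxels) from each voxel to the nearest voxel in `core`;
    # voxels inside `core` get distance 0.
    dist = distance_transform_edt(~core)
    mask = np.zeros(volume.shape, dtype=np.float32)
    mask[dist <= dilate_px] = 1.0
    edge = (dist > dilate_px) & (dist < dilate_px + soft_px)
    # Cosine falls smoothly from 1 at the dilated boundary to 0 at the outer edge.
    mask[edge] = 0.5 * (1.0 + np.cos(np.pi * (dist[edge] - dilate_px) / soft_px))
    return mask


if __name__ == "__main__":
    # Toy "map": a solid sphere of radius 5 voxels in a 32^3 box.
    n = 32
    z, y, x = np.indices((n, n, n)) - n // 2
    toy_map = (x**2 + y**2 + z**2 <= 25).astype(np.float32)
    mask = soft_mask_from_map(toy_map, threshold=0.5)
    print(mask[16, 16, 16], mask[0, 0, 0], np.count_nonzero((mask > 0) & (mask < 1)))
```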
Local refinement can also be quite useful for membrane targets. Local refinement masks must be softly padded, especially where they cut through density (even micelle density). Also consider the following parameter:
• Rotation search extent
  • When using smaller masks, tighter orientation priors generally produce better results.

### Heterogeneity Analysis

• Heterogeneous Refinement
  • Force hard classification
    • Hard classification can improve results for low-SNR particles, especially when the target contains a static (well-resolved) domain connected to a flexible/heterogeneous domain (such as a micelle).
• 3D Classification
  • Force hard classification
    • Similar to Heterogeneous Refinement, force hard classification can help isolate regions of heterogeneity.
  • RMS convergence criterion
    • For low-SNR particles, the standard class-switching criterion may lead to more F-EM iterations than necessary and lengthen processing. Consider turning on this secondary criterion to save computational cost.
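The two stopping rules measure different things. The sketch below uses simple, hypothetical definitions (the fraction of particles that changed class, and the RMS change of the class densities between iterations); CryoSPARC's exact formulas and thresholds may differ. At low SNR, many particles can keep switching between near-equivalent classes even after the densities have stopped changing, so a density-based rule can stop earlier.

```python
import numpy as np


def class_switch_fraction(prev_labels, curr_labels) -> float:
    """Fraction of particles whose hard class assignment changed since the last iteration."""
    return float(np.mean(np.asarray(prev_labels) != np.asarray(curr_labels)))


def rms_density_change(prev_volumes, curr_volumes) -> float:
    """Root-mean-square voxel change across all class volumes (hypothetical definition)."""
    diff = np.asarray(curr_volumes, dtype=float) - np.asarray(prev_volumes, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))


def converged(prev_labels, curr_labels, prev_volumes, curr_volumes,
              switch_tol=0.02, rms_tol=1e-3, use_rms=True) -> bool:
    """Stop when few particles switch class, or (optionally) when the densities barely move."""
    if class_switch_fraction(prev_labels, curr_labels) < switch_tol:
        return True
    return use_rms and rms_density_change(prev_volumes, curr_volumes) < rms_tol


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels_a = rng.integers(0, 4, size=10_000)
    labels_b = labels_a.copy()
    labels_b[:1_500] = rng.integers(0, 4, size=1_500)  # ~11% of particles change class
    vols = rng.normal(size=(4, 16, 16, 16))
    vols_next = vols + 1e-4 * rng.normal(size=vols.shape)  # densities nearly stable
    print(class_switch_fraction(labels_a, labels_b))
    print(converged(labels_a, labels_b, vols, vols_next, use_rms=False),
          converged(labels_a, labels_b, vols, vols_next, use_rms=True))
```

In this toy run the class-switching rule alone does not stop, while the density-based rule does.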
3D Variability Analysis (3DVA) can be an especially important tool for heterogeneity analysis of small membrane targets. The 3DVA publication includes results on the Cannabinoid Receptor 1-G GPCR complex, showing that 3DVA can resolve two different bending motions of the 53 kDa transmembrane region of the protein.
When running 3DVA, be sure to supply a soft solvent mask to ensure that the job does not resolve variation due to the micelle.

### Common Failure Modes

• “Spiky” densities like the one shown below are often a sign that there are many junk particles in the dataset; this can be especially prevalent in membrane protein datasets, where particle picking is difficult. In these cases, it is often helpful to further “purify” the dataset by either:
  • performing additional rounds of 2D classification, or
  • running ab-initio reconstruction with multiple classes, then using all of the resulting volumes (including junk classes) to initialize heterogeneous refinement or 3D classification jobs on the full particle set. Particles that fall into intact classes with strong protein density can be carried forward to further refinements, and particles falling into other classes can be discarded. This “junk-sorting” in 3D can often separate junk particles more effectively than 2D classification.
A ‘spiky’ hyaluronan synthase (the same density shown at two different thresholds for clarity) resolved from one class of a 3D classification job. Data from EMPIAR-11030 (Maloney et al., 2022).
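After a multi-class heterogeneous refinement or 3D classification, the junk-sorting decision itself is simple bookkeeping: count particles per class, keep the classes with intact protein density, and discard the rest. The plain-Python sketch below, with hypothetical particle IDs and class labels, only illustrates that logic; it is not a CryoSPARC API.

```python
from collections import Counter


def junk_sort(particle_ids, class_labels, good_classes):
    """Split particles into (keep, discard) lists based on their 3D class assignment."""
    if len(particle_ids) != len(class_labels):
        raise ValueError("particle_ids and class_labels must have the same length")
    good = set(good_classes)
    keep, discard = [], []
    for pid, label in zip(particle_ids, class_labels):
        (keep if label in good else discard).append(pid)
    return keep, discard


if __name__ == "__main__":
    # Hypothetical result: class 0 looks intact; classes 1-3 are junk volumes.
    ids = list(range(10))
    labels = [0, 2, 0, 1, 0, 3, 0, 2, 1, 0]
    print("particles per class:", dict(sorted(Counter(labels).items())))
    keep, discard = junk_sort(ids, labels, good_classes=[0])
    print("keep:", keep)
    print("discard:", discard)
```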