Job: Micrograph Denoiser (BETA)

At a Glance

Produce enhanced micrograph images to aid particle picking and visual inspection.

Description

Micrograph Denoiser takes micrographs as input and produces denoised versions of those micrographs. These denoised versions can be used downstream to help pick particles and to aid in visual inspection of the micrographs. Note that particles are extracted from the raw micrographs and not the denoised micrographs even if the latter are used for particle picking.

The denoiser works by first learning repeated patterns in the data during the training step. Then, to perform the denoising, it considers each region of the input micrograph. The denoiser is trained to match patterns it has learned and it amplifies those patterns in the denoised output micrograph. Features and noise which do not match are dimmed, thereby enhancing the visual quality of the denoised output micrograph. For a more thorough explanation of how the Micrograph Denoiser works, see the Denoiser Training section.

Denoised micrographs have significantly higher contrast than, for example, a lowpass filtered micrograph because image content is added by the denoiser. Of course, this contrast is not truly present in any single given region of the data — it is learned in aggregate by the denoiser. As such, the denoised images can be quite helpful for visual inspection or particle picking, but they do not contain increased signal for 2D or 3D reconstruction and therefore particles are not extracted from these denoised micrographs for downstream processing.

This job comprises both the training and application of the denoiser model. Input micrographs with training data are used to train the denoiser model, then all connected micrographs are denoised to produce the output.

Inputs

Exposures

Micrographs to be denoised, with background subtracted and CTF estimates available. The pixel size of these micrographs must be smaller than 3 Å.

Currently, only Patch Motion Correction performs background subtraction, so the input micrographs must come from movies which were motion corrected by Patch Motion Correction.

If a new denoiser model will be trained (which is the recommended workflow), the input micrographs must also have training data. This data is only generated by Patch Motion Correction jobs run with CryoSPARC version 4.5 or later. If you wish to perform denoising on data motion corrected in prior versions of CryoSPARC, see Denoising Data from Existing Patch Motion Jobs.

Thus, a typical preprocessing workflow for CryoSPARC v4.5 or later might be:

  1. Import Movies

  2. Patch Motion Correction

  3. Patch CTF

  4. Micrograph Denoiser

  5. Blob Picker, etc…

Denoise Model

If a Micrograph Denoiser has previously been run on this data, you may connect the Denoiser model output of the previous job to use that model instead of training a new one.

Commonly Adjusted Parameters

Renormalize input greyscale

In some situations, the denoiser must (or should) re-estimate the range of pixel values that likely correspond to particles (as opposed to empty ice or contaminants like crystalline ice or carbon). This estimation is called greyscale normalization, and Renormalize input greyscale controls whether or not this occurs. For more information on greyscale normalization, see the Greyscale section of this page.

When using the pre-trained model or when training a new model, this parameter is not displayed. This is because, in both of these cases, the model was not (or has not yet been) trained on the data and so must have a new greyscale normalization estimate.

When using a model trained by a previous Micrograph Denoiser job, this parameter becomes visible. If the input model was trained on the same data it will be denoising, it is typically not necessary to renormalize the greyscale and so this parameter can be kept off. If, however, it was trained on different data (even a different dataset of the same particle), it is likely worth renormalizing the greyscale and this parameter should be turned on.

Greyscale normalization factor

Although the greyscale normalization procedure typically finds the correct range for the normalized greyscale, it is not perfect. This parameter (1.0 by default) is a multiplicative scale applied after the greyscale normalization process. For example, if the automated normalization determines that training should take place using a greyscale ranging from 0 to 200, but the Greyscale normalization factor is set to 1.5, the final greyscale used during training and denoising will be 0 to 300.

Typically, this parameter can be left at the default value of 1.0. One notable exception is HexAuFoil grids, which often require a lower value for this parameter due to the significant fraction of the micrograph occupied by gold.

The first plot produced by Micrograph Denoiser is an example micrograph with the greyscale normalization applied. If this plot appears flat and grey or completely blown out, this parameter may need to be adjusted to a lower or higher value, respectively.

Use pretrained model

If no input model is connected, you can either train a new model based on the input data or use the pre-trained model that is packaged with CryoSPARC. Generally, we recommend that this setting is left off, since a model trained directly from the data is expected to perform better and is relatively fast.

Number of mics for training

How many micrographs are used during denoiser training. Micrographs are selected randomly from the input for training, up to this number. If fewer than this number have training data available, the job will produce a warning but proceed with the training. At least 10 micrographs are required to train the denoiser; if fewer than 10 micrographs have training data, the job will fail.

We have generally found that 100 micrographs are sufficient to train a high-quality denoiser.
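
The selection rule described above can be sketched as follows. This is an illustration of the stated behaviour only, not CryoSPARC's code: the function name `select_training_mics`, the `has_training_data` key, and the `seed` argument are all hypothetical.

```python
import random
import warnings

MIN_TRAINING_MICS = 10  # minimum stated in the text above


def select_training_mics(mics, num_requested, seed=None):
    """Randomly pick up to num_requested micrographs that have training data.

    `mics` is a list of dicts with a hypothetical boolean key
    'has_training_data'; the real job's data layout is not shown here.
    """
    candidates = [m for m in mics if m["has_training_data"]]
    if len(candidates) < MIN_TRAINING_MICS:
        # Too few micrographs with training data: the job fails.
        raise RuntimeError(
            f"only {len(candidates)} micrographs have training data; "
            f"at least {MIN_TRAINING_MICS} are required"
        )
    if len(candidates) < num_requested:
        # Fewer than requested: warn, then train on everything available.
        warnings.warn(
            f"requested {num_requested} training micrographs but only "
            f"{len(candidates)} have training data; using all of them"
        )
    rng = random.Random(seed)
    return rng.sample(candidates, min(num_requested, len(candidates)))


# 30 of these 50 toy micrographs carry training data.
mics = [{"uid": i, "has_training_data": i < 30} for i in range(50)]
print(len(select_training_mics(mics, 20, seed=0)))   # 20
print(len(select_training_mics(mics, 100, seed=0)))  # 30 (with a warning)
```

The second call shows the warning case: fewer micrographs than requested are available, so all of them are used.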

Train from scratch

If this parameter is true (default), a new denoiser will be trained starting from a random initialization. This generally produces better results. If this parameter is false, the training will instead start with an initialization using the pre-trained denoiser as the starting point.

Num training epochs

This parameter controls the number of times the denoiser is trained on the selected training subset. If the results of a denoiser job are still noisy or difficult to interpret, re-running the job with a greater number of epochs can produce better results at the cost of increased training time.

Crop micrograph edges (fraction)

In some datasets, the edges of micrographs may have artifacts due to aberrations in the microscope or significant full-frame drift. These high-contrast artifacts can degrade denoiser training performance. In these cases, it is beneficial to ignore the edges of the micrograph during training.

This parameter sets a fraction of the micrograph to ignore on each edge. For example, setting Crop micrograph edges to 0.01 will crop 1 pixel off each side of a 100 x 100 pixel image, resulting in a 98 x 98 pixel training image. For rectangular images, the largest dimension is used to calculate the number of pixels: the same factor of 0.01 would trim a 50 x 100 pixel image to 48 x 98 pixels during training.
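
The arithmetic in these examples can be reproduced with a short Python sketch. The function name `cropped_shape` is hypothetical, and rounding to the nearest pixel is an assumption, since the rounding rule is not stated here.

```python
def cropped_shape(height, width, crop_fraction):
    """Return the (height, width) of the training image after edge cropping.

    The number of pixels removed from each edge is computed from the
    largest dimension, as described above. Rounding to the nearest
    pixel is an assumption made for this sketch.
    """
    if not 0 <= crop_fraction < 0.5:
        raise ValueError("crop_fraction must be in [0, 0.5)")
    pixels_per_edge = round(crop_fraction * max(height, width))
    return height - 2 * pixels_per_edge, width - 2 * pixels_per_edge


print(cropped_shape(100, 100, 0.01))  # (98, 98)
print(cropped_shape(50, 100, 0.01))   # (48, 98)
```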

Outputs

Denoised micrographs

Denoised micrographs are output in this slot. Note that the original, non-denoised micrographs are also included in this same output. If this output is ultimately connected to an Extract Particles job, the non-denoised micrographs will be used automatically.

Diagnostic plots

Normalization

The first plot produced by the Micrograph Denoiser is an example micrograph with normalization applied. See the greyscale section of this page for more information on how and why input micrographs are normalized before training and denoising. If this micrograph has little to no contrast, the normalization factor must be reduced. If this micrograph appears blown-out, with most pixels either white or black, the normalization factor must be increased.

Common Problems

Denoised micrographs are blurry or still noisy

If the denoiser model has not yet converged, it will not be able to accurately model noise in the micrographs. This will result in blurry or still-noisy results. Increasing Num training epochs often helps in these cases.

Denoiser produces images of empty ice or large, blotchy images

Flat, grey images or blotchy, high-contrast images, like those shown in the second row of the Diagnostic Plots above, are typically due to a Greyscale normalization factor that is too high or low, respectively. Changing this parameter should improve results.

Common Next Steps

Denoised micrographs are especially helpful when performing and evaluating particle picking. Thus, a typical next step would be Blob Picking or Template Picking, followed by Inspect Picks. The Inspect Picks job allows users to toggle between viewing the raw and denoised micrographs to evaluate pick locations. In CryoSPARC v4.6+, Inspect Picks is able to automatically cluster and select particles when picking is done on denoised micrographs (see Interactive Job: Inspect Particle Picks).

After particles are picked, micrographs can be plugged directly into an Extract From Micrographs job. The raw micrographs will automatically be used during extraction even if denoised micrograph images are available or were used for picking.

At this time, TOPAZ does not perform well on micrographs denoised with the Micrograph Denoiser. If particle picking will be performed with TOPAZ, we recommend using TOPAZ Denoise instead.

Denoising Data from Existing Patch Motion Jobs

Patch Motion Correction jobs run in versions of CryoSPARC prior to 4.5 do not generate the data necessary to train a denoiser model. To denoise these micrographs, two workflows are available:

  1. (recommended) Performing Patch Motion Correction on a small subset of the movies to generate the necessary data, or

  2. Using the pre-trained model

In our testing, a denoiser trained on the input data typically outperforms the pre-trained model. We therefore recommend generating the necessary training data for a subset of micrographs and using it to create the denoising model. This model can then denoise the entire set of micrographs, including those which were not re-motion corrected.

  1. Create a Patch Motion Correction job with the same settings as the existing job, except set Only process this many movies and Num. movies for denoiser training data both to 100. Plug the micrographs from the initial Patch CTF Estimation job into the Exposures input.

  2. Run the resulting micrographs through a Micrograph Denoiser job. The CTF estimates from the initial job will be used along with the training data from the new Patch Motion Correction job.

  3. Set up a new Micrograph Denoiser job with

    1. the original, full set of motion-corrected and CTF-estimated micrographs, and

    2. the denoise model from the first Micrograph Denoiser job

This takes advantage of the benefits of training the denoiser on the data, while avoiding the need to re-motion correct the entire dataset.

Use the pretrained model

If time is at a premium, plug the existing micrographs into the Micrograph Denoiser and turn Use pretrained model on. This skips model training and uses the pretrained model instead. The results with the pretrained model are typically not as good as those from a model trained on the data, but will likely still represent an improvement over simple lowpass filtering.

Denoiser Training

The Micrograph Denoiser in CryoSPARC uses a specialized neural network architecture to produce denoised micrographs from training data. The neural network is trained using a Noise2Noise methodology (Lehtinen et al. 2018) to predict what parts of an image are noise and what parts are signal.

The basic principle behind Noise2Noise training is very similar to that of GSFSC validation. First, training data is generated by splitting each movie into odd and even frames, then creating half-micrographs from only those frames. Any signal in the movie should be the same in both half-micrographs, but the noise is treated as entirely random and independent.

Next, the neural network is trained to predict half-micrograph B from half-micrograph A. The only information that is the same between the two micrographs is the signal. Thus, as this neural network improves its ability to predict half B from half A, it is in effect learning patterns present only in the signal — modeling noise would not improve its ability to predict half B.
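
The odd/even split can be illustrated with a toy simulation in plain Python. This sketch does not train a network and is not CryoSPARC's implementation. It only shows why half B is predictable from half A when signal is present: the signal is shared between the halves and the noise is not. All names here (`split_half_micrographs`, `simulate_movie`, `pearson`) are hypothetical, and each "frame" is a flat list of pixel values rather than a real 2D image.

```python
import random


def split_half_micrographs(frames):
    """Sum even- and odd-indexed frames into two half-micrographs."""
    n_pixels = len(frames[0])
    half_a = [0.0] * n_pixels
    half_b = [0.0] * n_pixels
    for index, frame in enumerate(frames):
        target = half_a if index % 2 == 0 else half_b
        for i, value in enumerate(frame):
            target[i] += value
    return half_a, half_b


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_movie(signal, n_frames, noise_sigma, rng):
    """Toy movie: every frame is the same signal plus independent noise."""
    return [[s + rng.gauss(0.0, noise_sigma) for s in signal]
            for _ in range(n_frames)]


rng = random.Random(0)
n_pixels = 2000
# Toy "particle" signal: alternating blocks of 0 and 1.
signal = [1.0 if (i // 50) % 2 else 0.0 for i in range(n_pixels)]
empty = [0.0] * n_pixels

with_signal = split_half_micrographs(simulate_movie(signal, 20, 1.0, rng))
noise_only = split_half_micrographs(simulate_movie(empty, 20, 1.0, rng))

corr_signal = pearson(*with_signal)
corr_noise = pearson(*noise_only)
print(f"half A vs half B, signal present: {corr_signal:.2f}")
print(f"half A vs half B, noise only:     {corr_noise:.2f}")
```

The two halves correlate strongly only when signal is present. A model rewarded for predicting one half from the other therefore gains nothing by modeling the noise.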

This setup has been explored in several denoiser methods for cryo-EM data, including Warp (Tegunov and Cramer 2018), TOPAZ Denoise (Bepler et al. 2020), and SPHIRE (Wagner and Raunser 2020).

In addition to the Noise2Noise training setup, CryoSPARC’s Micrograph Denoiser pre-corrects for the CTF in training data and input data, causing the denoised micrographs to be as visually consistent as possible across a range of defocus values. Furthermore, traditional denoising metrics are used to augment the training objective function to encourage the denoiser to quickly learn to produce visually clear micrographs emphasizing repeated signal such as particles.

Greyscale

Each pixel in a cryo-EM micrograph contains a numeric value representing the electron dose received at that pixel. This value can be essentially any number. Typically, these numbers are represented by making the highest value pure black and the lowest value pure white, with intermediate values linearly scaled to an intermediate grey. This mapping from values to darkness is called the “greyscale” of the image. Each dataset will have a slightly different greyscale, depending on the electron dose, pixel size, aperture settings, etc.

Particles typically fall within a relatively narrow band of values across a dataset. What’s more, they are typically much closer in value to empty ice than to very dark objects like carbon or crystalline ice. Thus, if we trained the denoiser on the raw greyscale it may not even be able to detect true particles, since they would have essentially the same value as empty ice.

To avoid this flattening effect, the Micrograph Denoiser first estimates the greyscale band most likely to contain particles and produces a normalized greyscale covering only that range. Any values outside this range are clipped to black or white. This dedicates the greatest dynamic range to the values most likely to contain particles.

Note in the figure above that, after normalization, the values corresponding to empty ice are all white, and any values outside the expected range for a particle are all black, regardless of their true value. All the variance of the greyscale is focused on the region in which particles are expected to lie. Normalizing the greyscale in this way focuses the denoiser on detecting patterns from the particles rather than empty ice or contaminants.

Greyscale normalization factor adjusts the estimated greyscale by multiplying the limits, moving them further from the mean. For example if, in the initial greyscale, any values below -10 are pure white and any values above 10 are pure black, then setting the normalization factor to 1.5 would result in a final greyscale (used for training) in which -15 and below is white and 15 and above is black.

This means the model would be trained with a wider range of values; this may improve or degrade performance, depending on whether or not those values are useful for learning about the particles in the dataset.
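
The clipping and scaling described in this section can be sketched in Python as follows. The function names are hypothetical and this is not CryoSPARC's code. It assumes the factor multiplies each limit directly, which matches both worked examples on this page (-10 to 10 becomes -15 to 15, and 0 to 200 becomes 0 to 300).

```python
def apply_normalization_factor(low, high, factor):
    """Multiply both greyscale limits by the normalization factor."""
    return low * factor, high * factor


def normalize_pixel(value, low, high):
    """Clip a pixel value to [low, high] and map it linearly onto [0, 1].

    0.0 corresponds to the low limit and 1.0 to the high limit; which end
    is drawn white or black is a display convention.
    """
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)


low, high = apply_normalization_factor(-10, 10, 1.5)
print(low, high)                        # -15.0 15.0
print(normalize_pixel(-40, low, high))  # 0.0 (clipped to the low limit)
print(normalize_pixel(0, low, high))    # 0.5
print(normalize_pixel(40, low, high))   # 1.0 (clipped to the high limit)
```

A larger factor widens the window, so fewer pixels are clipped but the particle band occupies a smaller share of the available range.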

References

  1. Nakane, T. et al. Single-particle cryo-EM at atomic resolution. Nature 587, 152–156 (2020).

  2. Han, Y. et al. High-yield monolayer graphene grids for near-atomic resolution cryoelectron microscopy. Proceedings of the National Academy of Sciences 117, 1009–1014 (2019).

  3. Lehtinen, J. et al. Noise2Noise: Learning Image Restoration without Clean Data. arXiv (2018) doi:10.48550/arXiv.1803.04189.

  4. Tegunov, D. & Cramer, P. Real-time cryo-EM data pre-processing with Warp. bioRxiv (2018) doi:10.1038/s41592-019-0580-y.

  5. Bepler, T., Kelley, K., Noble, A. J. & Berger, B. Topaz-Denoise: general deep denoising models for cryoEM and cryoET. Nature Communications 11, 5208 (2020).

  6. Wagner, T. & Raunser, S. The evolution of SPHIRE-crYOLO particle picking and its application in automated cryo-EM processing workflows. Communications Biology 3, 61 (2020).
