Tutorial: EPU AFIS Beam Shift Import

A tutorial covering how to split exposures into groups based on beam shift values, for data collected in EPU's AFIS mode.

Introduction

Thermo Fisher’s EPU is a commonly used data acquisition package for single particle analysis (SPA). Traditionally, SPA data has been collected by moving the stage so that each target region within a hole sits at the optical center of the microscope. This avoids the strong aberrations that result from off-axis use of the objective lens. However, collecting data in this manner incurs delays from moving the stage and waiting for it to settle. Advances such as EPU’s Aberration-Free Image Shift (AFIS) collection mode allow multiple holes to be targeted without stage movement between holes. Importantly, AFIS and the associated microscope calibration service allow targeting holes that don’t lie at the optical center of the microscope without inducing severe artefacts, which significantly speeds up data collection.

For many datasets collected in AFIS mode, it is still worthwhile to estimate residual higher-order aberrations such as coma via the Global CTF Refinement job: if any residual aberrations are present, correcting them may lead to improved structures. However, doing so requires grouping movies into subsets (Exposure Groups) that share similar optical conditions, including the amount of applied beam shift. Since higher-order CTF aberrations are refined separately for each exposure group, the assignment of movies to exposure groups can significantly affect the fitted aberration values, and therefore the resolution achieved by subsequent refinements. CryoSPARC v4.4+ can import beam shift values from EPU sessions collected in AFIS mode, allowing exposure groups to be assigned based on applied beam shift. This tutorial covers:

  • how to import movies/micrographs with beam shift values

  • how to assign movies/micrographs into exposure groups based on beam shift

  • how to merge beam shift values into pre-v4.4 movie/micrograph datasets, without re-processing from scratch

  • how to continue processing data from CryoSPARC Live

Use case #1: Clustering movies into exposure groups at import time

This use case covers the situation where dataset processing starts in CryoSPARC from scratch, i.e., none of the processing steps after motion correction (including particle picking) have been done yet. For existing CryoSPARC exposure datasets, datasets processed in CryoSPARC Live, or datasets with existing particles, please refer to use case #2 below.

Import Movies or Import Micrographs

For movie or micrograph datasets collected via EPU’s AFIS mode, beam shift values can now be imported along with other metadata. As an example, here is a screenshot of an output data directory from a data collection session using EPU. Note that the movies we would like to import are in .eer format, and the associated files containing the beam shifts are in .xml format. Each EER movie has a corresponding XML file.
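To get a feel for what is being imported, the following minimal sketch reads a beam shift pair out of an XML string with Python's standard library. It rests on an assumption that is not confirmed by this tutorial: that the file stores the value in an element whose local name is BeamShift, with child elements whose local names are _x and _y. The sample XML and its namespace URL are hypothetical stand-ins, so inspect one of your own files before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a per-exposure XML file.
# Real files are much larger; the element layout here is an assumption.
SAMPLE_XML = """<MicroscopeImage>
  <microscopeData>
    <optics>
      <BeamShift xmlns:a="http://example.invalid/types">
        <a:_x>0.0123</a:_x>
        <a:_y>-0.0456</a:_y>
      </BeamShift>
    </optics>
  </microscopeData>
</MicroscopeImage>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_beam_shift(xml_text):
    """Return (x, y) from the first BeamShift element, or None if unavailable."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "BeamShift":
            continue
        values = {local_name(child.tag): child.text for child in elem}
        try:
            return float(values["_x"]), float(values["_y"])
        except (KeyError, TypeError, ValueError):
            return None  # element present but values missing or malformed
    return None  # no BeamShift element at all


print(read_beam_shift(SAMPLE_XML))
print(read_beam_shift("<MicroscopeImage/>"))
```

Returning None for absent or malformed values mirrors the idea of flagging an exposure as "missing beam shift" rather than failing the whole import.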

In CryoSPARC’s Import Movies and Import Micrographs jobs, you will now notice an “XML Import” section that allows specification of an absolute path wildcard to the directory containing the XML files. This path is in addition to the wildcard expression pointing to the raw movie files. For the above example, we have set the two wildcard expressions to the following:

  • Movies data path: /bulk9/data/EPU_apof_JLR/20230212a_T64/Images-Disc1/GridSquare_11564642/Data/*.eer

  • EPU XML metadata path: /bulk9/data/EPU_apof_JLR/20230212a_T64/Images-Disc1/GridSquare_11564642/Data/*.xml

Here, the two paths are identical apart from the file extension filter.

There are four additional parameters that help match the .eer and .xml files to each other. They specify the number of characters to trim from the beginning and end of the movie and XML filenames so that the remaining strings match one-to-one:

  • Length of movie filename prefix to cut for XML correspondence: Use this field to specify the number of characters to cut off the prefix of the imported movie filename, to match with the XML filename.

  • Length of movie filename suffix to cut for XML correspondence: Use this field to specify the number of characters to cut off the suffix of the imported movie filename, to match with the XML filename.

  • Length of XML filename prefix to cut for movie correspondence: Use this field to specify the number of characters to cut off the prefix of the XML filename, to match with the imported movie filename.

  • Length of XML filename suffix to cut for movie correspondence: Use this field to specify the number of characters to cut off the suffix of the XML filename, to match with the imported movie filename.

In this case, we need to trim the eight _EER.eer characters off of the end of the movie filename, as well as the four .xml characters off of the XML filenames. Thus, we’ll set the movie suffix parameter to 8, the XML suffix parameter to 4, and leave the rest empty.
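The trimming logic can be sketched in a few lines of Python. This is only an illustration of the idea, not CryoSPARC's implementation: the function names, the example filenames, and the choice to compare basenames (rather than full paths) are all assumptions made for this sketch.

```python
import os


def trim(filename, prefix_len=0, suffix_len=0):
    """Drop prefix_len leading and suffix_len trailing characters."""
    end = len(filename) - suffix_len
    return filename[prefix_len:end]


def match_movies_to_xml(movie_paths, xml_paths,
                        movie_prefix=0, movie_suffix=0,
                        xml_prefix=0, xml_suffix=0):
    """Map each movie path to its XML path (or None) via trimmed basenames."""
    xml_by_key = {
        trim(os.path.basename(p), xml_prefix, xml_suffix): p for p in xml_paths
    }
    return {
        m: xml_by_key.get(trim(os.path.basename(m), movie_prefix, movie_suffix))
        for m in movie_paths
    }


# Hypothetical filenames in the style of this example.
movies = ["/data/FoilHole_1_Data_2_EER.eer", "/data/FoilHole_3_Data_4_EER.eer"]
xmls = ["/data/FoilHole_1_Data_2.xml"]

# Movie suffix 8 removes "_EER.eer"; XML suffix 4 removes ".xml".
pairs = match_movies_to_xml(movies, xmls, movie_suffix=8, xml_suffix=4)
for movie, xml in pairs.items():
    print(movie, "->", xml)
```

In this toy input the second movie has no XML counterpart, so it maps to None, which corresponds to an exposure with a missing beam shift.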

After setting the parameters and running the job, a scatter plot of the beam shift values will be displayed in the event log if the XML import was successful. The event log will also print an example of the movie and XML paths after applying the prefix/suffix trim, so that you can confirm that they are aligned and match in structure. If any of the XML files are absent, corrupt, or missing beam shift values, the corresponding exposures will be flagged as having missing beam shifts; in the example image, two exposures are missing beam shift values. Check the event log to confirm that beam shift values were read successfully for the majority of exposures; if they were not, a warning is displayed with an orange highlight.

Pre-processing

Next, exposures must be pre-processed via motion correction (applicable to movies) and CTF estimation. CTF estimation is required to cluster exposures by the applied beam shift. The recommended motion correction job is Patch Motion Correction, and the recommended CTF estimation job is Patch CTF Estimation.

Clustering via Exposure Group Utilities

The next step is to cluster exposures into groups based on the applied beam shift. The main purpose of clustering is to ensure that exposures with similar beam shift values are placed into the same exposure group. This can be done by running Exposure Group Utilities in the cluster&split mode.

First, connect the output exposures from the Import Movies or Import Micrographs job above. Then, set the “Input Selection” to exposure and the “Action” to cluster&split. Finally, set the number of clusters. In this case, based on the beam shift scatter plot above, we counted 61 clusters, which correspond to the 61 unique “rings” (each ring corresponding to 8 different collection sites arranged in a circle on one hole). Note that the number of clusters does not need to match the number of holes precisely. Indeed, depending on the layout and orientation of holes on the grid, the beam shift distribution may not form neat clusters and may appear more continuous. In any case, keep the following in mind when choosing the number of clusters:

  • With too few clusters, there will be greater intra-exposure-group variability in the beam shift, possibly leading to less accuracy when fitting the higher-order aberrations

  • With too many clusters, there will be fewer exposures and particles per exposure group, possibly limiting the precision of the fitted higher-order aberration values. In extreme cases, too few particles per exposure group could impact the stability of the Global CTF Refinement aberration fitting algorithm, as there is a minimum cumulative amount of signal in each exposure group that is needed to fit the aberration parameters. This is important to keep in mind, as aberration estimation is done independently for each exposure group.
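One way to keep an eye on the "too many clusters" side of this trade-off is to count particles per exposure group and list the groups that fall below a chosen minimum. The sketch below assumes you have one exposure group ID per particle available as a plain list; the threshold of 1000 is a placeholder chosen for illustration, not a value recommended by this tutorial.

```python
from collections import Counter


def small_groups(particle_group_ids, min_particles):
    """Return {group_id: particle_count} for groups below min_particles."""
    counts = Counter(particle_group_ids)
    return {g: n for g, n in sorted(counts.items()) if n < min_particles}


# Hypothetical assignment: one exposure group ID per particle.
group_ids = [0] * 5000 + [1] * 4200 + [2] * 150

# min_particles is an illustrative placeholder, not a documented limit.
print(small_groups(group_ids, min_particles=1000))
```

If many groups are flagged, reducing the number of clusters is one option to consider.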

The “Clustering method” may also be tweaked. The most important factor when clustering exposures is that clusters are reasonably uniform in both:

  • the number of exposures they contain, and

  • the range of beam shift values they span

The default, agglomerative clustering, works well on a variety of datasets, but k-means (kmeans) is also available. K-means clustering works better when exposures’ beam shift values form isotropic clusters with most points located close to the mean, or when the spread of beam shifts is more “continuous” and doesn’t form neat discrete clusters. In these cases, k-means will ensure that clusters remain relatively uniform in the range of beam shift values that each cluster spans. Agglomerative clustering may perform better when clusters form more irregular shapes, such as the “rings” in this example.
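For readers unfamiliar with k-means, here is a minimal, dependency-free sketch of the textbook algorithm (Lloyd's iterations) applied to 2D beam shift values. It is not CryoSPARC's implementation: the deterministic farthest-point initialisation, the function name, and the synthetic data are assumptions made so that the example is reproducible. Agglomerative clustering is not shown.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means on 2D points; returns (labels, centers)."""
    # Deterministic farthest-point initialisation (a choice made for this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Synthetic beam shift values (arbitrary units) forming three tight clusters.
shifts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (0.0, 5.0), (0.1, 5.0), (0.0, 5.1)]
labels, centers = kmeans(shifts, k=3)
print(labels)
```

Because every point is assigned to its nearest center, k-means tends to produce compact, roughly round clusters, which is why it suits isotropic or continuous beam shift distributions better than ring-shaped ones.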

Once the number of clusters is chosen, queue and run the job. At the first checkpoint, the exposure group clustering result is shown. If any exposures are missing beam shift values, they will be placed into their own separate exposure group, and the number of exposure groups outputted by the job in total will be one larger than the parameter value.
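The "one extra group" behaviour can be illustrated with a short sketch. The choice of group ID for exposures without beam shifts (here, the next integer after the last cluster label) is a hypothetical convention for this example; the IDs CryoSPARC actually assigns may differ.

```python
def assign_exposure_groups(cluster_labels, n_clusters):
    """Map per-exposure cluster labels to group IDs.

    cluster_labels holds an int in [0, n_clusters) per exposure, or None
    where the beam shift is missing. Exposures with None share one extra group.
    """
    missing_group = n_clusters  # hypothetical ID for the extra group
    return [missing_group if lab is None else lab for lab in cluster_labels]


# Six hypothetical exposures, two of them without beam shift values.
labels = [0, 2, None, 1, 1, None]
groups = assign_exposure_groups(labels, n_clusters=3)
print(groups)
print("exposure groups output:", len(set(groups)))
```

By the same logic, requesting 61 clusters for a dataset in which some exposures lack beam shifts would yield 62 exposure groups.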

The output exposures are now ready for downstream processing, such as particle picking. Be sure to experiment with Global CTF Refinement to see if clustering particles into exposure groups helps obtain better resolutions. Note that only exposure groups with an adequate number of particles should have their aberrations refined, as Global CTF Refinement depends on having enough signal across the particle images in the exposure group.

Use case #2: Clustering exposures from live, or clustering exposures from pre-v4.4 CryoSPARC versions

This use case covers the following situations:

  • Exposures have been initially processed via CryoSPARC Live or via CryoSPARC versions pre-v4.4, with or without associated particle stacks, and re-clustering of exposure groups based on beam shift is desired

In this case, the following steps (outlined below) allow for re-clustering of exposure groups:

  • Running an Import Beam Shift job in order to retrieve the exposures’ beam shift values;

  • Clustering the movies/micrographs into exposure groups via Exposure Group Utilities, with input particles provided to the job

If you are importing fresh movies or micrographs into CryoSPARC v4.4+, see use case #1, which covers the basic import case and is recommended reading before this section.

Import Beam Shift

Navigate to the job builder, locate the new “Import Beam Shift” job under the imports section, and build an Import Beam Shift job. This job adds beam shift information to existing exposure datasets in CryoSPARC, without the need to re-import the movies/micrographs from scratch.

Next, connect the existing movie or micrograph dataset from CryoSPARC as input to the Import Beam Shift job. This may be a movie dataset exported from CryoSPARC Live, or a movie dataset processed in regular CryoSPARC. Ensure that the entire movie dataset is input to the job (i.e., if an exposure curation job filtered out some exposures, use the exposures from upstream of that job). Unlike the other import jobs, which generate new UIDs, the Import Beam Shift job reuses the UIDs of the connected movies. These existing UIDs are required when updating particles’ exposure group assignments in Exposure Group Utilities, to match particles to the exposures that they came from.

As in use case #1, provide the XML wildcard expression that points to the directory containing the original XML files. These parameters are identical to those in Import Movies; see the instructions in use case #1. If needed, specify the four “Length of movie/XML filename prefix/suffix…” parameters to correctly match movie filenames to XML filenames. Examples of the trimmed file paths will be printed to the event log to help determine the number of characters. The values of these parameters are most quickly determined by running the job with all defaults and observing the event log. For example, when connecting movies that were previously imported to CryoSPARC and running the Import Beam Shift job, the event log shows the following messages:

Here, the example movie/mic filename is the same as the XML filename, except for the trailing _EER.eer. Due to these extra characters, the beam shift import was not successful, and CryoSPARC warned that it did not find the beam shifts associated with any of the 2797 exposures. To fix this, we can set the “Length of movie filename suffix to cut for XML correspondence” parameter to 8 to cut off the trailing 8 characters and find proper matches between the XML and movie files. Re-running the job, we see that the XML files were found for all but two exposures, which happen to be missing from this dataset:
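When the two example filenames printed in the event log share a common leading stem, the number of trailing characters to cut can be counted programmatically instead of by eye. The sketch below is a hypothetical helper, not part of CryoSPARC, and the filenames are made up to follow the pattern described here. It only applies when no prefix needs trimming, and it counts the file extension as part of the suffix, so compare its output with what the event log actually shows for your data.

```python
import os


def suggest_suffix_cuts(movie_name, xml_name):
    """Suggest (movie_suffix, xml_suffix) that reduce both names to their common prefix."""
    common = os.path.commonprefix([movie_name, xml_name])
    return len(movie_name) - len(common), len(xml_name) - len(common)


# Hypothetical filenames following the pattern described above.
movie = "FoilHole_1_Data_2_EER.eer"
xml = "FoilHole_1_Data_2.xml"
print(suggest_suffix_cuts(movie, xml))
```

For these two names the helper suggests cutting 8 characters from the movie name ("_EER.eer") and 4 from the XML name (".xml").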

Finally, if the XML import was successful and beam shifts were present in the XML files, a beam shift scatter plot will be displayed in the event log as in use case #1. The UIDs and all input slots (e.g. motion correction or CTF estimation results) will have been pulled from the input dataset, meaning we do not have to repeat these steps if they have already been done.

Clustering via Exposure Group Utilities

If particles were already picked, we do not have to repeat particle picking either, and can instead assign particles to exposure groups based on which exposures they came from. This can also be done via the Exposure Group Utilities job. In this case, we can use Exposure Group Utilities as described above, with the following modifications:

  • Connect the output exposures from “Import Beam Shift” and the existing particle dataset to Exposure Group Utilities;

  • As in use case #1, set the “Input Selection” to exposure, specify the “Action” as cluster&split, and specify the number of clusters and clustering method;

  • Activate the “Correspond particles to exposures and enforce consistency of exposure group IDs” parameter

  • If particles were previously split into more than one exposure group, set the “Combine strategy” to take_mode

    • This ensures that when particles from different exposure groups are combined into the same group, the aberration values for the entire group will be set to the mode (most common value) amongst particles in the group. Since exposure group clustering is done with the purpose of re-running Global CTF Refinement, aberrations will be re-refined and this is not a point of concern.
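The two ideas in this list, matching particles to exposures and resolving conflicting values by taking the mode, can be sketched as follows. This is an illustration under stated assumptions rather than CryoSPARC's code: the function names, the UID values, and the use of a single number per particle to stand in for the aberration parameters are all hypothetical.

```python
from collections import Counter


def regroup_particles(particle_exposure_uids, exposure_group_by_uid):
    """Give each particle the exposure group of the exposure it came from."""
    return [exposure_group_by_uid[uid] for uid in particle_exposure_uids]


def take_mode(values):
    """Most common value; ties resolve to the value encountered first."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical data: exposure UID -> new exposure group after clustering.
exposure_group_by_uid = {101: 0, 102: 0, 103: 1}
# For each particle, the UID of its source exposure and an old aberration value.
particle_exposure_uids = [101, 101, 102, 103, 103]
old_values = [0.2, 0.2, 0.5, 0.5, 0.5]

new_groups = regroup_particles(particle_exposure_uids, exposure_group_by_uid)

# One value per new group: the mode among the particles now in that group.
merged = {}
for group in sorted(set(new_groups)):
    members = [v for v, g in zip(old_values, new_groups) if g == group]
    merged[group] = take_mode(members)

print(new_groups)
print(merged)
```

The lookup by exposure UID is why the existing UIDs must be preserved: without them, there is no key that ties a particle back to its exposure.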

In our case, particles were previously from only one exposure group, so we don’t need to change the combine strategy. Thus, we’ll run the job with the following input parameters:

Checkpoints 1 and 2 will show the exposure and particle datasets’ exposure group information prior to clustering, respectively, in a table format in the event log. In most cases, particles and exposures will initially be all pooled into one exposure group, unless they were assigned different exposure group IDs upon import. Checkpoint 1 will also show the beam shift scatter plot labelled by the assigned exposure groups. Checkpoints 3 and 4 will show the exposure and particle datasets’ exposure group information after clustering. If “Correspond particles to exposures” was activated, the particles and exposures datasets should be consistent.

Next Steps

Once particles have been picked and a relatively high-resolution structure has been obtained, use Global CTF Refinement to fit higher-order aberration values. If you followed use case #2, it’s possible to do an apples-to-apples comparison of resolutions before and after clustering particles into exposure groups. This can be done by using two Global CTF Refinement jobs and two Homogeneous Reconstruction Only jobs along with a fixed mask.

In our example dataset, the resolution improvements obtained via exposure group clustering were rather modest, indicating that the microscope was quite well calibrated. However, examples of more significant improvements have been previously documented on the forum.

References

  • Dustin Morado’s EPU_group_AFIS repository for clustering strategies, as well as his detailed forum post describing the motivation for exposure group clustering when collecting in AFIS mode.
