Job: Exposure Group Utilities

Exposure group utilities for combining or splitting exposure or particle datasets.

Description

  • Combine or split your exposure or particle datasets into exposure groups.

  • Splitting may be done based on filename or beam shift (see the EPU AFIS Beam Shift Tutorial).

Input

  • Particles

    • ctf (required)

    • location (optional)

    • blob (optional)

  • Exposures

    • ctf (required)

    • blob [movie/micrograph] (optional)

    • mscope_params (optional)

Output

  • Input Dataset [particle/exposure] Combined (default)

  • Input Dataset [particle/exposure] Split by Exposure Groups (optional)

Common Parameters

  • Input Selection: Specifies which input group to use

  • Action: Specifies which mode to use

    • info_only: Prints a table of statistics about the dataset's exposure groups

    • combine&set: Combines multiple exposure inputs and assigns them all to a single exposure group

    • split: Allows use of a path field to split the datasets into unique exposure groups

    • test_token: When splitting a dataset into exposure groups, this option allows you to test your "token creation" method

    • cluster&split: Clusters exposures (movies or micrographs) into exposure groups based on their mscope_params/beam_shift values, using K-means or agglomerative clustering. Note: movies or micrographs must have been imported in CryoSPARC v4.4.0+ with valid beam shift metadata.

  • Token Creation Strategy: The strategy used to create the tokens that will split the dataset into unique exposure groups.

    • string_slice: Uses character positions to slice the filepath into unique tokens

    • string_split: Uses a separator to split the file path into groups, one of which can be selected as the token

    • regular_expression: Uses Python's re.search() to evaluate a regular expression against each filename, creating subgroups, one of which can be selected to create unique tokens

      • As of v4.0.2, all filepaths that do not match the provided regular expression will be assigned to a separate exposure group ID.

  • Combine Strategy: Specifies how to handle conflicting CTF values found within an exposure group

    • fail: throws an exception if inconsistent CTF values are found

    • take median: overwrites the CTF values of each exposure group with the median of the CTF values across the exposure group

  • Set Exposure Group Value: Used only in mode combine&set - indicates the exposure group ID to set for this dataset

  • Split Outputs by Exposure Group: Specifies whether to output the dataset by each individual exposure group
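
The two Combine Strategy options can be pictured with a short Python sketch. This is an illustration only, not the job's implementation: the function name combine_ctf and the idea of resolving a single numeric CTF parameter at a time are assumptions, and the source does not say which CTF fields are compared.

```python
from statistics import median


def combine_ctf(values, strategy="fail"):
    """Resolve one CTF parameter across the members of one exposure group.

    Hypothetical helper: `values` holds that parameter for every member
    of the group; the returned list has the same length.
    """
    if len(set(values)) <= 1:
        # Nothing to resolve: all members already agree.
        return list(values)
    if strategy == "fail":
        raise ValueError(f"inconsistent CTF values: {sorted(set(values))}")
    if strategy == "take median":
        # Overwrite every member with the group median.
        group_median = median(values)
        return [group_median] * len(values)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With "take median", a group whose members report 2.7, 2.7 and 2.6 for some parameter ends up with 2.7 for all three; with "fail", the same input raises an error instead.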

Parameters for split mode:

  • Field to use to split Dataset: Used only in mode split - indicates which file path field to use to create unique tokens to split the dataset into different exposure groups

  • Starting Exposure Group ID: Used only in mode split - indicates the starting ID that each exposure group will increment from

  • Start Slice Index: Used only in mode split/string_slice - indicates the offset, in characters from the Index Position, at which the slice starts

  • Number of characters to Consider: Used only in mode split/string_slice - indicates how many characters, counted from the start of the slice, are taken from the file path to form the token

  • Index Position: Used only in mode split/string_slice - indicates which position of the file path to index from

  • File path separator: Used only in mode split/string_split - indicates the separator to use to split the file path into individual groups

  • Split Group Index: Used only in modes split/string_split or split/regular_expression - indicates the group to select when splitting the filename into groups

  • Regular Expression String: Used only in mode split/regular_expression - indicates the regular expression to evaluate against each file path using Python's re.search()

    • As of v4.0.2, all filepaths that do not match the provided regular expression will be assigned to a separate exposure group ID.
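
The three token creation strategies can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the job's code: all function names are hypothetical, the file paths are invented, string_slice is simplified to a plain start offset and length (the Index Position parameter is left out because its exact semantics are not described here), and the order in which IDs are handed out is assumed to follow first appearance. Paths that do not match the regular expression produce a None token and therefore share one separate group.

```python
import re


def token_string_slice(path, start, length):
    # Take `length` characters beginning at character offset `start`.
    return path[start:start + length]


def token_string_split(path, separator, group_index):
    # Split on the separator and keep one of the resulting pieces.
    return path.split(separator)[group_index]


def token_regex(path, pattern, group_index):
    # re.search() scans the whole path; None marks "no match".
    match = re.search(pattern, path)
    return match.group(group_index) if match else None


def assign_group_ids(paths, make_token, starting_id=0):
    """Map each path to an exposure group ID, one ID per unique token."""
    token_to_id = {}
    result = {}
    for path in paths:
        token = make_token(path)
        if token not in token_to_id:
            # Assumption: IDs increment from starting_id in order of first use.
            token_to_id[token] = starting_id + len(token_to_id)
        result[path] = token_to_id[token]
    return result


# Hypothetical file paths, for illustration only.
paths = [
    "data/grpA_001.mrc",
    "data/grpB_002.mrc",
    "data/grpA_003.mrc",
    "data/other.mrc",
]
groups = assign_group_ids(
    paths, lambda p: token_regex(p, r"grp([A-Z])_", 1), starting_id=5
)
```

Here the two grpA files share one ID, the grpB file gets the next one, and the non-matching file lands in its own group. Running a strategy on a handful of paths like this mirrors what the test_token action is for: checking the tokens before committing to a split.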

Parameters for cluster&split mode:

  • Number of clusters: The number of clusters to create when clustering exposures into groups based on beam shift

  • Clustering method: The clustering method used to group exposures based on beam shift. By default, agglomerative (hierarchical) clustering is used; K-means is also available.

  • Correspond particles to exposures and enforce consistency of exposure group IDs: Activate this parameter to override the exposure group IDs in the particles input with those from the connected exposure inputs. If this is activated, then the Input Selection must be set to exposures, and both particles and exposures must be connected.
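
To give a feel for what clustering on beam shift does, here is a small pure-Python sketch of agglomerative clustering on two-dimensional beam shift values. It is not the job's implementation: the function name, the choice of single linkage, and the sample beam shift values are all assumptions made for illustration.

```python
from math import dist


def agglomerative_clusters(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Distance between clusters is single linkage (closest pair of members),
    an assumption; other linkages would merge in a different order.
    Returns one integer label per input point.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(points)
    for group_id, members in enumerate(clusters):
        for i in members:
            labels[i] = group_id
    return labels


# Hypothetical (x, y) beam shift values for five exposures.
beam_shifts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
labels = agglomerative_clusters(beam_shifts, n_clusters=3)
```

Exposures with nearly identical beam shift end up with the same label, which plays the role of the exposure group ID. The Number of clusters parameter corresponds to n_clusters in this sketch.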
