
Job: Topaz Train and Job: Topaz Cross Validation

Topaz job types available via wrapper in CryoSPARC.

To perform particle picking using Topaz, a model must first be trained using either the Topaz Train job or the Topaz Cross Validation job. Both of these jobs require the same inputs and produce the same outputs as listed below:

Inputs

  • Particle Picks

  • Micrographs

Outputs

  • Topaz Model

  • Micrographs
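
For users who script their processing with cryosparc-tools, a Topaz Train job can also be built and queued programmatically with the inputs listed above. The sketch below is a minimal example, not a definitive recipe: the project, workspace and source job UIDs are placeholders, and the job type string and connection/parameter keys are assumptions — verify the exact names against the job builder in your own CryoSPARC instance.

```python
# Minimal sketch (with assumptions noted): building a Topaz Train job via cryosparc-tools.
# The job type string and the connection/parameter keys below are assumed names —
# confirm them against your own CryoSPARC instance before use.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # placeholder credentials
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)

project = cs.find_project("P1")              # placeholder project UID
job = project.create_job(
    "W1",                                    # placeholder workspace UID
    "topaz_train",                           # assumed job type name
    connections={
        "particles": ("J10", "particles"),   # particle picks (e.g. from Inspect Particle Picks)
        "micrographs": ("J8", "exposures"),  # curated micrographs
    },
    params={
        "exec_path": "/path/to/topaz",       # Path to Topaz Executable (assumed key)
        "scale": 16,                         # Downsampling Factor (assumed key)
        "num_particles": 300,                # Expected Number of Particles (assumed key)
    },
)
job.queue("default")                         # queue to a lane named "default"
```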

Parameters

Both the Topaz Train and Topaz Cross Validation jobs expose a number of parameters. The basic parameters are detailed below:

  • Path to Topaz Executable

    • The absolute path to the Topaz executable that will run the job.

  • Downsampling Factor

    • The factor by which to downsample micrographs. It is highly recommended to downsample micrographs to reduce memory load and improve model performance. For example, a recommended downsampling factor for a K2 super-resolution (7676x7420) dataset (e.g. EMPIAR 10025) is 16 (see the worked example after this parameter list).

  • Learning Rate

    • The value that determines how much model weights are updated at each step. Higher values cause training to approach an optimum faster, but may prevent the model from reaching the optimum itself, resulting in potentially worse final accuracy.

  • Minibatch Size

    • The number of examples that are used within each batch during training. Lower values will improve model accuracy at the cost of increased training time.

  • Number of Epochs

    • The number of passes through the entire dataset performed during training. A higher number of epochs naturally leads to longer training times. The number of epochs does not have to be optimized, as the Topaz Train and Topaz Cross Validation jobs automatically output the model from the epoch with the highest precision.

  • Epoch Size

    • The number of parameter updates that occur in each epoch. Increasing this value increases the amount of training performed per epoch at the cost of longer epoch times.

  • Train-Test Split

    • The fraction of the dataset to use for testing. For example, a value of 0.2 will use 80% of the input micrographs for training and the remaining 20% for testing. It is highly recommended to use a train-test split greater than 0.

  • Expected Number of Particles

    • The average expected number of particles in each micrograph. This value does not have to be exact, but a reasonably accurate estimate is necessary for Topaz to perform well. This parameter has no default value and must be supplied by the user. Note that if this value is lower than the average number of labeled picks supplied to the training job, the job will switch to the PN loss function, which was experimentally found to perform worse than the GE-binomial loss function.

  • Number of Parallel Threads

    • The number of threads over which to distribute preprocessing. This parameter decreases preprocessing time by a factor approximately equal to the input value. It is recommended to set this value to at least 4, as preprocessing is often the time bottleneck of the job. Values less than 2 default to a single thread.
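
To make the downsampling recommendation above concrete, the short sketch below (plain Python, independent of CryoSPARC) shows how a few downsampling factors change the working micrograph size for the K2 super-resolution dimensions mentioned under the Downsampling Factor parameter; substitute the dimensions of your own detector.

```python
# Worked example: effect of the Downsampling Factor on the working micrograph size.
# Dimensions are for a K2 super-resolution micrograph (e.g. EMPIAR-10025);
# substitute the dimensions of your own detector.
width, height = 7676, 7420
for factor in (4, 8, 16):
    print(f"factor {factor:>2}: {width // factor} x {height // factor} pixels")

# factor  4: 1919 x 1855 pixels
# factor  8: 959 x 927 pixels
# factor 16: 479 x 463 pixels
# A factor of 16 reduces this data to roughly 480 x 464 pixels, which keeps memory
# use low during Topaz preprocessing and training.
```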

The advanced parameters are detailed below:

  • Pixel sampling factor / Number of iterations / Score threshold

    • Parameters that affect the preprocessing of micrographs. It is recommended not to change these parameters.

  • Loss function

    • The loss function used to train the model. GE-binomial is recommended. The PU loss function is a non-negative risk estimator approach, and PN is a naive approach in which unlabeled data is treated as negative during training; both were found to perform poorly compared to the GE-binomial and GE-KL loss functions in the paper "Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs" by Bepler, T. et al. [1], the developers of Topaz. The paper also found that while GE-binomial and GE-KL performed similarly in most cases, GE-binomial outperformed GE-KL in a few cases, so GE-binomial is the recommended choice.

  • Slack

    • The weight on the loss function when GE-binomial or GE-KL is selected. It is recommended to leave the slack at -1, which uses the default value. Users who wish to change this parameter should read the paper by Bepler, T. et al. [1] before doing so.

  • Autoencoder weight

    • The weight on the reconstruction error of the autoencoder. An autoencoder weight of 0 disables the autoencoder. According to the paper by Bepler, T. et al. [1], the autoencoder improves classifier performance when fewer labeled data points are used. However, the degree of improvement diminishes with more labeled data points, until it begins to negatively affect classifier performance due to over-regularization. The paper recommends an autoencoder weight of 10 / N when N ≤ 250 and 0 otherwise, where N is the number of labeled data points (see the short sketch after this parameter list).

  • Regularization

    • The L2 regularization coefficient applied to the loss function. Values less than 1 can improve model performance, but values greater than 1 are likely to begin to impede training.

  • Model architecture

    • ResNet stands for residual network, a neural network architecture that is popular for mitigating the vanishing gradient problem. Note that average pooling cannot be used with the ResNet8 model architecture.

    • Conv stands for convolutional neural network, a popular architecture for computer vision problems. Note that max pooling cannot be used with the Conv model architectures.

    • According to the Topaz GitHub page, ResNet8 provides a balance of good performance and receptive field size. Conv31 and Conv63, which have smaller receptive fields, can be useful when less complex models are desired. Conv127 should not be used unless quite complex models are required. The receptive fields for each architecture, as listed on the Topaz GitHub page, are:

      • resnet8 [receptive field = 71]

      • conv127 [receptive field = 127]

      • conv63 [receptive field = 63]

      • conv31 [receptive field = 31]

  • Number of units in base layer

    • The number of units in the base layer. The ResNet8 model architecture will double the number of units during convolutions and pooling. For the Conv model architectures, the scaling of units can be specified using the Unit scaling parameter.

  • Dropout rate

    • The probability that a unit is disabled for one batch iteration during training. Dropout is sometimes useful for preventing overfitting. Low dropout rates greater than 0 and less than 0.5 can be used when the Topaz model begins overfitting during training.

  • Batch normalization

    • Batch normalization normalizes hidden unit activations during training, reducing internal covariate shift. This enables higher learning rates, reduces overfitting, and provides some regularization. It is recommended to use batch normalization.

  • Pooling method

    • The pooling method is the type of layer used to reduce the spatial size of feature maps within the model. Pooling improves training speed in exchange for some loss of information.

    • Max pooling uses the max of the values within the pooling kernel as the output value of the kernel. Note that max pooling cannot be used with Conv model architectures.

    • Average pooling uses the average of the values within the pooling kernel as the output value of the kernel. Note that average pooling cannot be used with the ResNet8 model architecture.

    • There is no strong recommendation regarding pooling method.

  • Unit scaling

    • The factor by which to scale the number of units during convolutions and pooling when using Conv model architectures.

  • Encoder network unit scaling

    • The factor by which to scale the number of units during convolutions and pooling within the autoencoder architecture. Only applies when an autoencoder weight greater than 0 is used.
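
The 10 / N rule of thumb for the autoencoder weight described above is simple to apply; the snippet below is a small illustration of that heuristic (the function name is ours, not part of CryoSPARC or Topaz).

```python
# Rule of thumb for the Autoencoder weight parameter from Bepler et al. [1]:
# use 10 / N when the number of labeled particles N is at most 250, otherwise 0.
def suggested_autoencoder_weight(num_labeled_particles: int) -> float:
    if num_labeled_particles <= 250:
        return 10 / num_labeled_particles
    return 0.0

for n in (100, 250, 1000):
    print(n, suggested_autoencoder_weight(n))
# 100 0.1
# 250 0.04
# 1000 0.0
```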

The Topaz Cross Validation job includes unique parameters that enable the user to select which parameter to vary and how to vary the parameter during training. These parameters are:

  • Parameter to Optimize

  • Number of Cross Validation Folds

  • Initial Value to begin with during Cross Validation

  • Value to Increment Parameter by during Cross Validation

The first parameter selects which parameter to vary. The number of cross validation folds indicates how many training runs to perform at each value during cross validation. The initial value and increment parameters specify which values to test. For example, the settings listed below will result in the Topaz Cross Validation job testing the learning rates 0.0001, 0.0002 and 0.0003, two times each. After finding the learning rate yielding the best results, the job uses that learning rate to perform the final training. A short sketch after the example list below illustrates how these values are enumerated.

Example Parameters

  • Parameter to optimize

    • Learning rate

  • Number of cross validation folds

    • 2

  • Initial value to begin with

    • 0.0001

  • Value to increment by

    • 0.0001

  • Number of times to increment parameter

    • 3
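
The snippet below illustrates how the example settings above combine into individual training runs; it is a plain Python illustration of the schedule, not CryoSPARC code.

```python
# Illustration of the example cross-validation settings above: 2 folds, an initial
# learning rate of 0.0001, an increment of 0.0001 and 3 increments give the values
# 0.0001, 0.0002 and 0.0003, each trained once per fold.
folds = 2
initial_value = 0.0001
increment = 0.0001
num_increments = 3

values = [initial_value + i * increment for i in range(num_increments)]
for value in values:
    for fold in range(1, folds + 1):
        print(f"training run: learning rate {value:.4f}, fold {fold}/{folds}")
# The best-performing value is then used for one final training run that produces
# the output model.
```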

Advanced training and model parameters such as the pooling method and encoder network can potentially improve the Topaz model, but some of them are incompatible with certain model architectures. The job will report an error if an incompatible combination is used. The following parameter combinations are forbidden (a short illustrative sketch follows the list):

  • Parameters incompatible with ResNet architecture:

    • Average Pooling

    • Autoencoder/Encoding network

  • Parameters incompatible with Convolutional Neural Network architecture:

    • Max Pooling

    • Dropout
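
As a plain illustration of these constraints (this is not CryoSPARC's actual validation logic, and the architecture and option names are informal labels), a simple check might look like this:

```python
# Illustrative sketch of the forbidden architecture/parameter combinations listed above.
# Architecture and option names are informal labels, not CryoSPARC parameter keys.
FORBIDDEN = {
    "resnet8": {"average pooling", "autoencoder"},
    "conv127": {"max pooling", "dropout"},
    "conv63": {"max pooling", "dropout"},
    "conv31": {"max pooling", "dropout"},
}

def check_combination(architecture, options):
    bad = FORBIDDEN.get(architecture, set()) & set(options)
    if bad:
        raise ValueError(f"{architecture} cannot be used with: {', '.join(sorted(bad))}")

check_combination("conv63", {"average pooling"})  # fine
try:
    check_combination("resnet8", {"average pooling"})
except ValueError as err:
    print(err)  # resnet8 cannot be used with: average pooling
```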

Similarities and Differences between Topaz Train and Topaz Cross Validation

The Topaz Train and Topaz Cross Validation jobs serve the same purpose: both use particle picks and micrographs to produce models that can then be used to automatically pick particles.

The Topaz Cross Validation job differs in that it runs multiple instances of the Topaz Train job while varying a specified parameter, allowing it to find an optimal value for that parameter. It then uses the optimal value to perform one final Topaz Train run and produces a usable model. The key disadvantage of the Topaz Cross Validation job is that it is significantly slower than the standard Topaz Train job.

It is recommended to use the Topaz Train job for training the Topaz model and to only use the Topaz Cross Validation job when attempting to find the optimal value for a particular parameter.

Interpreting Training Results from Train and Cross Validation

Once training with either the Topaz Train or Topaz Cross Validation job is complete, the job outputs a plot showing performance on the training set at each epoch. If a train-test split greater than 0 is used, a plot of performance on the test set is also output. The test plot is a better indicator of overall training quality than the training plot and should be used to interpret the results whenever it is available. The x-axis indicates the epoch and the y-axis indicates the precision, i.e. the fraction of predicted picks that correspond to true particles. A successfully trained model will have a test plot in which precision gradually increases with epoch.

If the precision begins to decrease after increasing for several epochs, the model has begun to overfit to the training set. However, the job automatically selects the model from the epoch with the highest precision; assuming the precision was improving prior to overfitting, the job will therefore output a version of the model from before it began to overfit.

Below is an example of a test plot from a well-performing Topaz model.

Topaz Training Precision Plots

The Topaz Cross Validation job also produces a plot presenting the results of the cross validation, i.e. the performance achieved at each tested value. An example of a cross validation plot generated with the example parameters shown above can be found below:

Topaz Cross Validation Precision Plots
