
Performance Benchmarks

Version 1.0 (May 10, 2021)


This Benchmark Guide accompanies the Deploying CryoSPARC on AWS Guide. It provides an overview of benchmarks performed on a sample cryo-EM data processing workflow in CryoSPARC on AWS ParallelCluster. A typical workflow in CryoSPARC involves multiple steps, and it is important to understand the computational requirements of each step in order to build a cluster on AWS that is both performant and economical.

In addition to benchmarking results, this guide also presents best practices around EC2 instance selection, CryoSPARC configuration, and file system considerations.

NOTE: This guide serves as an example of possible installation options, performance and cost, but each user’s results may vary. Performance and costs may scale differently depending on the specific compute setup, data being processed, how long AWS compute resources are being used, specific steps used in processing, etc.

Benchmark Data

The dataset used for benchmarking is EMPIAR-10288 (cannabinoid receptor 1-G protein complex). It is composed of 2756 TIFF images constituting 476 GB of data. The dataset is moderate in size compared to production cryo-EM workloads, but it was chosen because its size allowed a large number of benchmark runs across a range of architecture options. Later benchmarking efforts will build on the analysis presented here and will be applied to larger datasets.

A few files in the raw dataset need to be discarded before processing because their image dimensions do not match the rest of the dataset. These files are:

  • CB1__00004_Feb18_23.33.18.tif

  • CB1__00005_Feb18_23.34.19.tif

  • CB1__00724_Feb19_12.00.25.tif

The raw dataset comes with two gain reference files in .dm4 format, which CryoSPARC does not currently support. Use the following file, converted to .mrc from one of the .dm4 files, as the gain reference for the entire dataset: https://structura-assets.s3.amazonaws.com/files/CountRef_CB1__00826_Feb19_14.19.25.mrc
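The preparation described above can be scripted. Below is a minimal Python sketch, assuming the movies have been staged under a hypothetical /fsx/empiar/10288/data directory (adjust to your own layout); it downloads the converted gain reference and moves the three mismatched movies into a skip/ subdirectory so that the import wildcard does not pick them up.

```python
# Data preparation sketch. The staging directory is a hypothetical path;
# adjust it to wherever the EMPIAR-10288 movies were copied on /fsx.
import shutil
import urllib.request
from pathlib import Path

data_dir = Path("/fsx/empiar/10288/data")   # hypothetical staging directory
skip_dir = data_dir / "skip"
skip_dir.mkdir(exist_ok=True)

# Download the .mrc gain reference converted from one of the .dm4 files.
gain_url = ("https://structura-assets.s3.amazonaws.com/files/"
            "CountRef_CB1__00826_Feb19_14.19.25.mrc")
urllib.request.urlretrieve(gain_url, str(data_dir / "CountRef_CB1__00826_Feb19_14.19.25.mrc"))

# Move the three movies with mismatched dimensions out of the import path.
bad_movies = [
    "CB1__00004_Feb18_23.33.18.tif",
    "CB1__00005_Feb18_23.34.19.tif",
    "CB1__00724_Feb19_12.00.25.tif",
]
for name in bad_movies:
    src = data_dir / name
    if src.exists():
        shutil.move(str(src), str(skip_dir / name))
```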

Processing Steps

1. Import Movies

  • Movies data path: .../10288/data/*.tif

  • Gain reference path: .../10288/data/CountRef_CB1__00826_Feb19_14.19.25.mrc

  • Flip gain ref in Y? True

  • Raw pixel size (Å): 0.86

  • Accelerating Voltage (kV): 300

  • Spherical Aberration (mm): 2.7

  • Total exposure dose (e/Å^2): 58
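The import settings above can also be applied programmatically. The sketch below uses cryosparc-tools (which targets newer CryoSPARC releases than the v3.0.0 used for these benchmarks); the connection details, project/workspace UIDs, staging path, lane name, and parameter names are assumptions and should be checked against your own instance's job builder.

```python
# Illustrative cryosparc-tools sketch. Credentials, UIDs, paths, the lane name
# and the parameter names are assumptions; verify them against your instance.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # your CryoSPARC license ID
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)

project = cs.find_project("P1")              # assumed project UID
job = project.create_job(
    "W1",                                    # assumed workspace UID
    "import_movies",
    params={
        "blob_paths": "/fsx/empiar/10288/data/*.tif",   # hypothetical staging path
        "gainref_path": "/fsx/empiar/10288/data/CountRef_CB1__00826_Feb19_14.19.25.mrc",
        "gainref_flip_y": True,
        "psize_A": 0.86,
        "accel_kv": 300,
        "cs_mm": 2.7,
        "total_dose_e_per_A2": 58,
    },
)
job.queue(lane="g4dn-16xlarge")              # assumed cluster lane name
```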

2. Patch Motion Correction

3. Patch CTF Estimation

4. Blob Picker

  • Minimum particle diameter (Å): 100

  • Maximum particle diameter (Å): 150

  • Use circular blob: True

  • Use elliptical blob: True

  • Number of micrographs to process: 100

5. Extract from Micrographs

  • Extraction Box Size: 320

  • Fourier crop to box size: 64
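As a quick check on these settings: Fourier cropping a 320-pixel box to 64 pixels increases the effective pixel size by a factor of 320/64 = 5, from 0.86 Å to 4.3 Å, so this first round of 2D classification is limited to a Nyquist resolution of 8.6 Å. That is sufficient here, since these classes are only used to locate particles for template picking.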

6. 2D Classification

7. Select 2D (INTERACTIVE)

Select all views that look resolvable; these classes will be used to identify particle locations on the full dataset.

8. Template Picker

  • Particle Diameter: 160

9. Inspect Picks (INTERACTIVE)

Move the sliders to match the following values:

  • NCC score > 0.340

  • Local power > 936.000

  • Local power < 1493.000

10. Extract from Micrographs

  • Extraction Box Size: 360

  • Fourier crop to box size: 256
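As before, a quick check: Fourier cropping a 360-pixel box to 256 pixels gives an effective pixel size of 0.86 Å × 360 / 256 ≈ 1.21 Å, for a Nyquist limit of roughly 2.4 Å, which leaves headroom for the ~3 Å result expected from the final Non-Uniform Refinement.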

11. 2D Classification

12. Select 2D (INTERACTIVE)

13. Ab-Initio Reconstruction

14. Non-Uniform Refinement

Non-Uniform Refinement should yield a 3 Å resolution structure, at which point the data processing pipeline is complete.

Your tree view should look like this:

[Figure: The final tree view for the processing pipeline.]

Cluster Configuration

The specific cluster architecture for these benchmarks is as follows (deployed in us-east-1):

Main Node

  • EC2 instance: c5n.9xlarge

  • 36 vCPUs

  • 96 GB memory

  • Network Bandwidth: 50 Gbps

  • 100 GB local storage (EBS gp2)

Compute Nodes

Multiple queues were configured to dynamically provision the following instance types:

  • g4dn.16xlarge

    • Intel Cascade Lake CPU (64 vCPUs, 256GB memory)

    • 1 x NVIDIA T4 GPU

    • 1 x 900 GB NVMe local storage

  • g4dn.metal

    • Intel Cascade Lake CPU (96 vCPUs, 384GB memory)

    • 8 x NVIDIA T4 GPUs

    • 2 x 900 GB NVMe local storage

  • p3.2xlarge

    • Intel Broadwell CPU (8 vCPUs, 61GB memory)

    • 1 x NVIDIA Tesla V100 GPU

  • p3.8xlarge

    • Intel Broadwell CPU (32 vCPUs, 244GB memory)

    • 4 x NVIDIA Tesla V100 GPUs

  • p3.16xlarge

    • Intel Broadwell CPU (64 vCPUs, 488GB memory)

    • 8 x NVIDIA Tesla V100 GPUs

  • p3dn.24xlarge

    • Intel Skylake CPU (96 vCPUs, 768GB memory)

    • 8 x NVIDIA Tesla V100 GPUs

    • 2 x 900 GB NVMe local storage

  • p4d.24xlarge

    • Intel Cascade Lake CPU (96 vCPUs, 1152GB memory)

    • 8 x NVIDIA A100 GPUs

    • 8 x 1 TB NVMe local storage

File Systems

  • /fsx

    • 12 TB FSx for Lustre (2.4 GB/s throughput)

    • Used as the primary working directory for all CryoSPARC jobs

  • /shared

    • 100 GB EBS volume mounted on cluster head node

    • Shared with compute nodes via NFS

    • Used as application installation directory

  • /scratch

    • Local storage on compute nodes

    • Only used to test cache performance in certain CryoSPARC steps

    • Only available on specific EC2 instances (those with additional NVMe or SSD)

Software

  • CryoSPARC v3.0.0

  • AWS ParallelCluster v2.10.0

  • Slurm 20.02.4

  • CUDA 11.0

  • NVIDIA Driver 450.80.02

EMPIAR 10288 Pipeline

The pipeline used to benchmark the EMPIAR-10288 dataset is composed of the following CryoSPARC steps:

  • Import Movies

  • Patch Motion Correction

  • Patch CTF Estimation

  • Blob Picker

  • Template Picker

  • Extract From Micrographs

  • 2D Classification

  • Ab-initio Reconstruction

  • Non-Uniform Refinement

Performance Analysis

Each stage was run on 1, 2, 4 or 8 GPUs on each of the listed EC2 instance types (certain stages only make use of a single GPU and are noted in the results). Each pipeline step was run on the attached FSx for Lustre filesystem, and several of the steps were also run using local NVMe drives as a cache to compare performance.

The total runtime for the EMPIAR-10288 pipeline on each instance type is shown below:

The p4d instance provides the best overall performance, but the analysis pipeline does not make use of all 8 GPUs the entire time, and the cost of running the entire pipeline on a single p4d instance may not be ideal for some users.

The g4dn.metal instance provides the most cost-effective option if we were to use it for the entire analysis.

An ideal approach is to match EC2 instance types to CryoSPARC pipeline stages, allowing us to make efficient use of the compute resources and keep compute costs down. The benchmarks for each step are listed below and will be used to help identify which instance types to use for which step.

The movie import step does not make use of GPUs, so the performance is determined by the host CPU. The g4 and p4 instances have Intel Cascade Lake processors, the p3dn.24xlarge has a Skylake processor, and the p3 instances have Broadwell processors.

The Patch Motion and Patch CTF Estimation steps show good scaling, and so an instance with 8 GPUs is recommended for these stages.

The Blob Picker and Template Picker stages make use of a single GPU, and there is minimal difference in performance between GPU types. Here a low-cost, single-GPU EC2 instance is recommended (e.g. g4dn.16xlarge).

The Extract from Micrographs step shows scaling only up to 2 GPUs. Currently, there are no EC2 instances that offer only 2 GPUs, so a 4-GPU instance is recommended (e.g. p3.8xlarge). The performance difference is minimal across GPU architectures, so price should be the driving factor in choosing an instance for this stage.

The 2D Classification step also shows scaling only up to 2 GPUs. At present, EC2 instances are available with 1, 4, or 8 GPUs, so a 4-GPU instance is recommended (e.g. p3.8xlarge). Optionally, a lower-cost instance like the g4dn.metal (with 8 GPUs) may be used, with the expectation that scaling may be limited. The performance difference is minimal across GPU architectures, so price should be the driving factor in choosing an instance for this stage.

The Ab-initio Reconstruction step makes use of a single GPU, so the g4dn.16xlarge instance offers the best balance of price and performance for this stage. The larger p3 and p4 instances are faster but increase the overall cost.

The Non-Uniform Refinement stage makes use of 1 GPU and contributes the most to the total runtime. For this stage, a user should consider whether cost or time-to-solution is more important; a p4d.24xlarge will produce results 3.4x faster than a g4dn.16xlarge, but at a higher cost.

These results highlight several aspects of the pipeline to consider when choosing instance types:

  • The Patch Motion Correction and Patch CTF Estimation stages scale well across GPUs, so an 8-GPU instance is recommended.

  • Ab-initio Reconstruction and Non-Uniform Refinement dominate the runtime as single-GPU stages.

  • 2D Classification benefits somewhat from scaling, but the short run time for this stage means cost should be a deciding factor in choosing an instance.

Because CryoSPARC is an analysis pipeline in which each stage uses compute resources differently, care must be taken when selecting instance types.
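One way to act on these recommendations is to define one CryoSPARC cluster lane per ParallelCluster queue and route each job type to the lane whose instance type suits it. The sketch below illustrates this idea with cryosparc-tools; the lane names and job type identifiers are assumptions and should be verified against your own instance (see the Import Movies sketch earlier for how the connection is established).

```python
# Sketch: route each pipeline stage to a cluster lane backed by a suitable
# EC2 queue. Lane names and job type identifiers are assumptions; verify the
# exact job type strings against your CryoSPARC version.
STAGE_LANES = {
    "patch_motion_correction_multi": "g4dn-metal",     # scales well: 8-GPU instance
    "patch_ctf_estimation_multi":    "g4dn-metal",
    "blob_picker_gpu":               "g4dn-16xlarge",  # a single GPU is enough
    "template_picker_gpu":           "g4dn-16xlarge",
    "extract_micrographs_multi":     "p3-8xlarge",     # scales to ~2 GPUs
    "class_2D":                      "g4dn-metal",
    "homo_abinit":                   "g4dn-16xlarge",  # single-GPU stage
    "nonuniform_refine":             "g4dn-16xlarge",  # or "p4d-24xlarge" for speed
}

def queue_to_matching_lane(project, workspace_uid, job_type, params=None):
    """Create a job of the given type and queue it to its preferred lane."""
    job = project.create_job(workspace_uid, job_type, params=params or {})
    job.queue(lane=STAGE_LANES[job_type])
    return job
```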

Storage Performance

Another key component in the CryoSPARC pipeline is the file system. It is recommended to use fast local storage on the compute node (e.g. NVMe) as a cache. Some AWS EC2 instances offer local NVMe drives, but only a subset, generally the larger instances (e.g., p4d.24xlarge, p3dn.24xlarge, and g4dn.metal). To provide a cost-optimal solution and allow for smaller GPU instance types, FSx for Lustre was benchmarked to determine whether it could provide the necessary performance.

Below is benchmark data for the 2D classification step (one of the steps in CryoSPARC that can make use of a local cache). It shows the percentage decrease in runtime using FSx for Lustre.

In every case except for one, FSx for Lustre provided better performance than using local storage as a cache.

The main reason for this is how the local filesystem is created. Larger instances have multiple drives (e.g. the p4d.24xlarge has 8 NVMe drives), and AWS ParallelCluster creates a single logical volume from those drives, which affects performance. Additionally, the 2D Classification step can make use of multiple GPUs, so performance suffers when multiple CryoSPARC tasks share the same local filesystem (as opposed to FSx for Lustre, a file system designed to support I/O from a large number of processes).

[Figure: Storage architecture of a p4d.24xlarge in AWS ParallelCluster.]
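To sanity-check a candidate file system before running full jobs, a rough sequential-throughput probe such as the sketch below (plain Python, with the mount points from this guide assumed) can be used to compare /fsx against a local /scratch volume. It is only a coarse check and is not a substitute for benchmarking actual CryoSPARC jobs.

```python
# Rough sequential write/read throughput probe for the mount points used in
# this guide. Note: the read pass may be served partly from the OS page cache,
# which can inflate the read figure on nodes with large amounts of memory.
import os
import time

def probe_throughput(directory, size_gb=4, block_mb=8):
    """Write then re-read a temporary file and report MB/s for each pass."""
    path = os.path.join(directory, "throughput_probe.tmp")
    block = os.urandom(block_mb * 1024 * 1024)
    n_blocks = (size_gb * 1024) // block_mb

    start = time.time()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    write_mbps = size_gb * 1024 / (time.time() - start)

    start = time.time()
    with open(path, "rb") as f:
        while f.read(block_mb * 1024 * 1024):
            pass
    read_mbps = size_gb * 1024 / (time.time() - start)

    os.remove(path)
    return write_mbps, read_mbps

for mount in ("/fsx", "/scratch"):
    if os.path.isdir(mount):
        w, r = probe_throughput(mount)
        print(f"{mount}: write {w:.0f} MB/s, read {r:.0f} MB/s")
```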

Cost Analysis

Using a single EC2 instance type for an entire CryoSPARC workload is not optimal. Instances like the p4d.24xlarge provide the best performance, but without regard to cost. General guidelines were given above for which instances to pick for each stage. Below is an example of how a user might implement those recommendations, along with the total cost (including storage) and runtime. Prices are listed in USD for the us-east-1 region.

| Pipeline Stage | Config 1: Instance Type | Config 1: Cost (USD) | Config 1: Runtime (min) | Config 2: Instance Type | Config 2: Cost (USD) | Config 2: Runtime (min) |
| --- | --- | --- | --- | --- | --- | --- |
| Patch Motion Correction | p4d.24xlarge | $12.25 | 22.4 | g4dn.metal | $3.97 | 30.5 |
| Patch CTF Estimation | p4d.24xlarge | $7.46 | 13.7 | g4dn.metal | $1.44 | 11.1 |
| Blob Picker | g4dn.16xlarge | $0.72 | 9.9 | g4dn.16xlarge | $0.72 | 9.9 |
| Template Picker | g4dn.16xlarge | $1.77 | 24.4 | g4dn.16xlarge | $1.77 | 24.4 |
| Micrograph Extraction | p3.8xlarge | $1.65 | 8.25 | g4dn.16xlarge | $0.90 | 12.5 |
| 2D Classification | g4dn.metal | $1.76 | 13.5 | g4dn.metal | $1.76 | 13.5 |
| Ab-initio Reconstruction | g4dn.16xlarge | $2.56 | 35.3 | p3.16xlarge | $6.83 | 23.3 |
| Non-Uniform Refinement | g4dn.16xlarge | $9.43 | 130 | p4d.24xlarge | $33.66 | 82.5 |
| Compute Cost | | $37.60 | | | $51.05 | |
| Lustre Cost (12 TB scratch) | | $9.52 | | | $8.03 | |
| Total | | $47.12 | 257.55 (4.29 hrs) | | $59.08 | 206.8 (3.45 hrs) |
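As a quick arithmetic check, the per-stage figures above can be totalled directly; the short sketch below reproduces the compute-cost subtotals and overall totals from the table (small rounding differences in the runtime totals are expected).

```python
# Per-stage (cost in USD, runtime in minutes), copied from the table above.
config_1 = [(12.25, 22.4), (7.46, 13.7), (0.72, 9.9), (1.77, 24.4),
            (1.65, 8.25), (1.76, 13.5), (2.56, 35.3), (9.43, 130.0)]
config_2 = [(3.97, 30.5), (1.44, 11.1), (0.72, 9.9), (1.77, 24.4),
            (0.90, 12.5), (1.76, 13.5), (6.83, 23.3), (33.66, 82.5)]
lustre_cost = {"Configuration 1": 9.52, "Configuration 2": 8.03}

for name, stages in (("Configuration 1", config_1), ("Configuration 2", config_2)):
    compute = sum(cost for cost, _ in stages)
    minutes = sum(mins for _, mins in stages)
    total = compute + lustre_cost[name]
    print(f"{name}: compute ${compute:.2f}, total ${total:.2f}, "
          f"{minutes:.1f} min ({minutes / 60:.2f} h)")
```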

For comparison, here are costs and runtimes if we ran the entire benchmark on a single EC2 instance type:

  • p4d.24xlarge

    • 2.79 hours

    • $91.40

  • g4dn.metal

    • 4.8 hours

    • $37.60

  • p3.16xlarge

    • 3.4 hours

    • $83.00

Conclusion

The benchmark results presented demonstrate how AWS can be used to accelerate cryo-EM workloads. By making use of AWS ParallelCluster, users can easily create an HPC cluster with a range of GPU instance types, allowing them to best match compute resources with the requirements of each analysis step.

The benchmark data presented here was collected with a single user; in practice, multiple users will likely share the cluster for analysis. Further cost optimization can be achieved by running multiple jobs on a larger instance. For example, while a p4d.24xlarge (with 8 A100 GPUs) has a higher hourly cost, running multiple single-GPU stages like Non-Uniform Refinement at the same time helps amortize that cost.
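A minimal sketch of that amortization idea, again with cryosparc-tools and assumed UIDs and lane name: queue several single-GPU refinements at once so the scheduler can pack them onto the same 8-GPU node (whether they actually share a node depends on the Slurm queue configuration).

```python
# Sketch: amortize an 8-GPU p4d node by queueing several single-GPU
# refinement jobs at once. Project/job UIDs and the lane name are assumptions;
# `cs` is the CryoSPARC connection from the earlier Import Movies sketch.
refinement_jobs = ["J101", "J102", "J103", "J104"]   # hypothetical job UIDs

project = cs.find_project("P1")
for uid in refinement_jobs:
    job = project.find_job(uid)
    job.queue(lane="p4d-24xlarge")   # jobs may run concurrently on one node
```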

Summarized below are the key points a user should consider when creating a cryo-EM cluster.

  • Amazon FSx for Lustre provides a high-performance file system that can meet the requirements of a cryo-EM analysis pipeline. This also allows flexibility in choosing instance types (e.g. those that lack local, fast NVMe storage).

  • A range of GPU instances should be employed. AWS ParallelCluster can be used to easily create different queues with different EC2 instances for this purpose.

  • p3dn.24xlarge instances are not recommended. They provide excellent performance, but the p4d.24xlarge is priced very close to the p3dn.24xlarge, and its time-to-solution is fast enough that in most cases a processing stage will use less compute time and therefore cost less.

  • g4dn instances will likely make up the bulk of the compute resources. They provide performance at an excellent price point.

AWS ParallelCluster 2.10.0 was used to create the HPC cluster on which CryoSPARC was benchmarked. The main requirements for cryo-EM workloads are access to large numbers of GPUs (and CPUs for some applications) as well as a high-performance file system. AWS ParallelCluster provides a simple-to-use mechanism to create a cluster that meets those requirements.

Further details about the above EC2 instance types can be found in the AWS EC2 instance type documentation.

