CryoSPARC Guide

Case Study: Exploratory data processing by Oliver Clarke

Last updated 11 months ago

The guides presented here are reproduced courtesy of Oliver Clarke, PhD, Assistant Professor of Physiology and Cellular Biophysics at Columbia University. They are friendly, approachable introductions to cryo-EM data processing in CryoSPARC, with a focus on the exploratory, "try-it-and-see" nature of single-particle analysis.

Both guides cover similar topics, but the 2024 version includes some steps that require CryoSPARC v4.4 or later. Below we directly reproduce the general outline section of each guide; the full guides are available in the linked PDFs.


2024 Version

General principles to keep in mind

  1. Process small, clean subsets of your dataset before tackling the whole. There are many choices to make during data processing. What picking strategy should you use? What cleaning/classification strategy? What molecular species are present, and which should you focus on? In many cases, the only way to identify the best-performing strategy is by trial and error. This is much faster with a smaller subset of data, and can provide 3D volumes and strategies that can then be used to seed analysis of the entire dataset.

  2. Iterate! Often, optimal processing of a heterogeneous dataset benefits from multiple passes. A quick first pass identifies potential issues (e.g., a non-optimal orientation distribution, or particles behaving differently in different ice-thickness regimes) and helps identify the very best micrographs (those with the most particles remaining after initial picking and classification). These micrographs can then be used to train a neural-network picker such as Topaz to repick the entire dataset.

  3. Experiment/explore! There is no single valid strategy for processing a heterogeneous dataset, and this workshop is only a brief guide to some possible approaches. Mix and match, test what works best, and then apply these strategies to your own data!
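Principles 1 and 2 can be sketched in a few lines of Python. The helpers below are hypothetical and not part of CryoSPARC: `random_subset` draws a reproducible random subset of micrographs for a quick first pass, and `best_micrographs` ranks micrographs by how many particles survived picking and classification, which is one way to choose training micrographs for a picker such as Topaz. The sketch assumes you have already obtained, for each retained particle, the ID of its source micrograph (for example, from a particle .cs file); all function and argument names are illustrative.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence


def random_subset(micrographs: Sequence[str], n: int, seed: int = 0) -> List[str]:
    """Pick up to n micrographs at random (without replacement) for a first pass."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    return rng.sample(list(micrographs), k=min(n, len(micrographs)))


def best_micrographs(particle_micrograph_ids: Iterable[str], top_n: int) -> List[str]:
    """Rank micrographs by the number of particles they still contribute.

    particle_micrograph_ids has one entry per retained particle: the ID of
    the micrograph that particle was picked from (hypothetical input format).
    """
    counts = Counter(particle_micrograph_ids)
    # most_common sorts by count, highest first; ties keep first-seen order
    return [mic for mic, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Toy data: micrograph IDs of particles that survived 2D classification
    kept = ["mic_03", "mic_01", "mic_03", "mic_07", "mic_03", "mic_01"]
    print(best_micrographs(kept, top_n=2))
    print(random_subset([f"mic_{i:02d}" for i in range(20)], n=5))
```

Counting survivors per micrograph, rather than raw picks, favours micrographs whose particles hold up through classification, which matches the "very best micrographs" criterion described above.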

This workshop is intended to provide an introduction to "exploratory" data processing in CryoSPARC, that is, data processing with the goal of quickly identifying, reconstructing and refining the molecular species present in a heterogeneous sample. CryoSPARC is used here, but the same general principles and workflow apply to single-particle processing in any software package. For example, the particles and micrographs should be directly importable into RELION: just convert the particle .cs file to a STAR file using csparc2star.py from pyem, and you should be good to go. Note: some parts (e.g., symmetry relaxation) require CryoSPARC v4.4 or later.

I have included two subsets of data for the first part of the workshop (micrographs, and extracted and Fourier-cropped particles) derived from a publicly available heterogeneous dataset, EMPIAR-11043: the erythrocyte ankyrin-1 complex purified from digitonin extracts of human red blood cell membranes (PMID: 35835865). For the second part of the workshop, which addresses mixed symmetry and pseudosymmetry, I have included subsets of data from EMPIAR-10425 (the MlaBDEF complex, PMID: 34188171) and EMPIAR-10059 (the TRPV1-DkTx complex, PMID: 27281200).

These datasets are intended to provide a lightweight, portable starting point for data processing, beginning either from CTF estimation and picking (micrographs) or from ab-initio volume generation and classification (particles). They can easily be accommodated even on systems with limited storage and processing power. Both sets of data are relatively small, but large enough to identify and characterize multiple species over the course of the workshop.

2023 Version

General principles to keep in mind

  1. Process small, clean subsets of your dataset before tackling the whole. There are many choices to make during data processing. What picking strategy should you use? What cleaning/classification strategy? What molecular species are present, and which should you focus on? In many cases, the only way to identify the best-performing strategy is by trial and error. This is much faster with a smaller subset of data, and can provide 3D volumes and strategies that can then be used to seed analysis of the entire dataset.

  2. Iterate! Often, optimal processing of a heterogeneous dataset benefits from multiple passes. A quick first pass identifies potential issues (e.g., a non-optimal orientation distribution, or particles behaving differently in different ice-thickness regimes) and helps identify the very best micrographs (those with the most particles remaining after initial picking and classification). These micrographs can then be used to train a neural-network picker such as Topaz to repick the entire dataset.

  3. Experiment/explore! There is no single valid strategy for processing a heterogeneous dataset, and this workshop is only a brief guide to some possible approaches. Mix and match, test what works best, and then apply these strategies to your own data!

This workshop is intended to provide an introduction to "exploratory" data processing in CryoSPARC, that is, data processing with the goal of quickly identifying, reconstructing and refining the molecular species present in a heterogeneous sample. CryoSPARC is used here, but the same general principles and workflow apply to single-particle processing in any software package. For example, the particles and micrographs should be directly importable into RELION: just convert the particle .cs file to a STAR file using csparc2star.py from pyem, and you should be good to go.

I have included two subsets of data (micrographs, and extracted and Fourier-cropped particles) derived from a publicly available heterogeneous dataset, EMPIAR-11043: the erythrocyte ankyrin-1 complex purified from digitonin extracts of human red blood cell membranes.

These datasets are intended to provide a lightweight, portable starting point for data processing, beginning either from CTF estimation and picking (micrographs) or from ab-initio volume generation and classification (particles). They can easily be accommodated even on systems with limited storage and processing power. Both sets of data are relatively small, but large enough to identify and characterize multiple species over the course of the workshop.
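Before converting a particle .cs file with pyem's csparc2star.py, it can help to check what the file actually contains. CryoSPARC .cs files are NumPy structured arrays, so a minimal sketch like the one below can report the particle count and column names. `summarize_cs` is a hypothetical helper (not part of CryoSPARC or pyem), and the field names in the demo are illustrative only. For the conversion itself, consult `csparc2star.py --help` for the arguments your pyem version expects.

```python
import numpy as np


def summarize_cs(path: str) -> dict:
    """Load a CryoSPARC .cs file and report its row count and column names.

    Assumes the file is a NumPy structured array readable by np.load.
    """
    data = np.load(path)
    return {"n_rows": int(data.shape[0]), "fields": list(data.dtype.names)}


if __name__ == "__main__":
    import os
    import tempfile

    # Tiny stand-in for a particle .cs file; field names are illustrative only.
    fake = np.zeros(
        3, dtype=[("uid", "<u8"), ("location/micrograph_uid", "<u8"), ("ctf/df1_A", "<f4")]
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "particles.cs")
        with open(path, "wb") as fh:
            np.save(fh, fake)  # saving via a file handle keeps the .cs extension
        print(summarize_cs(path))
```

np.load recognises the NumPy file header rather than the extension, which is why a file named .cs loads directly.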
Full guides (PDF):
  • exploratory_data_processing_workshop.pdf (2 MB)
  • exploratory_data_processing_stockholm_2024.pdf (13 MB)

Figure: General workflow for the tutorial (sections 2-8).
Figure: General workflow for the tutorial.