Guide: SSD Particle Caching in CryoSPARC

Overview of how SSD particle caching works, how much SSD space you need, configuration options and troubleshooting.



Why is particle caching effective?

For classification, refinement, and reconstruction jobs that process particles, local SSDs on worker nodes can significantly speed up computation: many cryo-EM algorithms rely on random access patterns and multiple passes through the data, rather than a single sequential read. When you install CryoSPARC, you have the option of adding an ssd_path: a fast drive location on the worker node that particles are copied to and read from during processing. CryoSPARC manages the SSD cache on each worker node transparently.

When you run jobs that have the Cache particle images on SSD option turned on, particles will be automatically copied to and read from the SSD path specified. Furthermore, if multiple jobs within the same project require the same particles, the cache will be re-used and the copying step is skipped. If more space is needed, previously cached data will be automatically deleted. Setting up an SSD cache is optional on a per-worker node basis, but it is highly recommended. Nodes reserved for pre-processing (motion correction, CTF estimation, particle picking, etc.) do not need to have an SSD.

Hardware

The size of your typical cryo-EM single-particle datasets will inform the size of SSD you choose. To store the largest particle stacks, we recommend 2 TB SSDs. You can calculate the exact size of a particle dataset with the following formula, where 4 is the number of bytes per (float32) pixel value, nsymbt is the extended header size and header_length is the file header size, both in bytes:

Dataset Size = (4 × box_size² + nsymbt + header_length) × num_particles

For example, a 1,000,000-particle dataset with box size 256 has a total size of 263.3 GB:

(4 × 256² + 128 + 1024) × 1,000,000 = 263,296,000,000 bytes

A 2,000,000-particle dataset with box size 432 has a total size of 1.5 TB:

(4 × 432² + 128 + 1024) × 2,000,000 = 1,495,296,000,000 bytes
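To check the arithmetic for your own dataset, here is a minimal shell sketch (assuming bash, float32 particles, and the 128-byte/1024-byte header sizes used in the examples above):

box=256; nsymbt=128; header=1024; particles=1000000
echo $(( (4 * box * box + nsymbt + header) * particles ))   # 263296000000 bytes (~263.3 GB)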

Configuration

Installation

When installing CryoSPARC, use the --ssdpath parameter to specify the path to your SSD drive when you connect a worker to your instance. If you don't want to configure an SSD cache for a worker node, specify the --nossd option instead.

bin/cryosparcw connect 
  --worker <worker_hostname> 
  --master <master_hostname> 
  --port <port_num>   
  --ssdpath <ssd_path>             : path to directory on local ssd

By default, if you specify the SSD path then the cache will be enabled with no quota or reserve.
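For instance, a complete connect invocation might look like this (the hostnames, port, and cache path below are placeholders for your own setup):

bin/cryosparcw connect \
  --worker worker1.example.com \
  --master master.example.com \
  --port 39000 \
  --ssdpath /scratch/cryosparc_cache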

Advanced Parameters

You can specify two advanced parameters to fine-tune your SSD cache:

--ssdquota: The maximum amount of space that CryoSPARC can use on the SSD (MB)

--ssdreserve: The minimum amount of free space to leave on the SSD (MB)

These options are useful when you're setting up CryoSPARC on a shared compute node where the SSD is also used by other applications.

Updating Configuration

You can update the SSD configuration at any time by re-running the connect command with the --update flag:

bin/cryosparcw connect
  --worker <worker_hostname>
  --master <master_hostname>
  --port <port_num>
  --update                         : update an existing worker configuration
  [--nossd]                        : connect worker with no SSD
  [--ssdpath <ssd_path> ]          : path to directory on local ssd
  [--ssdquota <ssd_quota_mb> ]     : quota of how much SSD space to use (MB)
  [--ssdreserve <ssd_reserve_mb> ] : minimum free space to leave on SSD (MB)
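For example, to update an existing worker so that the cache is capped at 1 TB while 10 GB is always kept free (placeholder hostnames again; quota and reserve are given in MB):

bin/cryosparcw connect \
  --worker worker1.example.com \
  --master master.example.com \
  --port 39000 \
  --update \
  --ssdquota 1000000 \
  --ssdreserve 10000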

Use

Use the caching system when running a job

When you run jobs that process particles (for example, Ab-Initio Reconstruction, Homogeneous Refinement, 2D Classification, 3D Variability), you will find a parameter at the bottom of the job builder under Compute Settings called Cache particle images on SSD. Turn this option off to read the raw data from its original location instead.

Set a default parameter for the project

The Cache particle images on SSD parameter is on by default for every job you build. If you'd like to keep this option off across all jobs in a project, you can set a project-level default.

In v2.15+, the parameter can be adjusted from the sidebar when a project is selected.

In earlier versions of CryoSPARC, you can adjust this parameter by running the following command in a shell on the master node:

cryosparcm cli "set_project_param_default('PX', 'compute_use_ssd', False)"

where 'PX' is the Project UID you'd like to set the default for (e.g., 'P2')

You can undo this setting by running:

cryosparcm cli "unset_project_param_default('PX', 'compute_use_ssd')"

Tips and Tricks

Consolidating a Particle Stack

If a particle stack is larger than the space available on your SSD, you may optionally consolidate it. This works when the current particle stack is a subset of the original particle stack. For example, if the sizes the cache reports when copying (SSD cache : cache requires 1000000.00 MB more on the SSD for files to be downloaded. and SSD cache : cache successfully requested to check 2000000 files.) are much larger than you expected, you can consolidate the stack so that only the particle subset you care about is cached.

You might run into this situation if you ran an "Inspect Picks" job after an "Extract From Micrographs" job, and you modified the picking thresholds of your particles to include a smaller subset than the original stack.

You might also run into this situation after a round of 2D Classification. When you select classes, you create metadata that specifies which subset of the particle stack to use. When using this particle subset in further processing, the caching system will require the entire stack of particles to be cached, even though only the smaller subset is required.

To consolidate your particle stack, build a "Downsample Particles" job, connect your particles, and run the job. There is no need to change any parameters; nothing about your particle data changes except the .cs metadata file, which is recreated to reflect the smaller subset. You can use this smaller dataset to continue processing.

Dynamic SSD Cache Paths

On some systems it is not possible to know the SSD cache path ahead of time. Instead, a dynamically-generated path is available for jobs to use at run time.

To have jobs use this path, expose it via a system-defined environment variable. Open cryosparc_worker/config.sh for editing and set CRYOSPARC_SSD_PATH in the worker environment config to that variable:

# cryosparc_worker/config.sh
export CRYOSPARC_LICENSE_ID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
export CRYOSPARC_USE_GPU=true
export CRYOSPARC_CUDA_PATH="/usr/local/cuda"
export CRYOSPARC_SSD_PATH="$CUSTOM_DYNAMIC_SSD_PATH"
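For example, on a SLURM cluster where the scheduler gives each job node-local scratch via $TMPDIR (an assumption; the variable name your scheduler provides may differ), the last line above would become:

# cryosparc_worker/config.sh
export CRYOSPARC_SSD_PATH="$TMPDIR"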

Increase or Reduce Cache Files Lifetime

As of CryoSPARC v3.3, jobs that use the cache automatically remove cache files that have not been accessed in over 30 days. If projects on your instance are typically active for a longer or shorter period, you can change the cache file lifetime by adding the following line to cryosparc_master/config.sh:

export CRYOSPARC_SSD_CACHE_LIFETIME_DAYS=15

Substitute 15 with the number of days your projects typically get worked on.
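After editing cryosparc_master/config.sh, restart CryoSPARC so the new value takes effect:

cryosparcm restart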

Leveraging Multiple Threads to Copy Particles

In CryoSPARC v4.3.0+, multiple threads (default 2) are used to copy particles from the project directory to the local cache device. To modify the number of threads used, add the following line to cryosparc_worker/config.sh, where num_threads is the number of threads (e.g., 12) to spawn to copy files:

export CRYOSPARC_CACHE_NUM_THREADS=num_threads

Specify export CRYOSPARC_CACHE_NUM_THREADS=1 to turn off multithreading and copy particle files sequentially in the main process.

Troubleshooting

SSD cache : cache waiting for requested files to become unlocked.

This temporary message usually means that the files this job is trying to access are currently being cached by another job. For example, if you start two different refinement jobs (Job A and Job B) at the same time on the same node, using the same particle stack that has not yet been cached on the SSD, both jobs will try to first copy all particles onto the SSD. If Job A acquires the lock on the files first, it starts copying them while Job B shows this message. When Job A finishes copying the files, it releases the lock; Job B is then unblocked, finds the particles already on the SSD, and skips the copy step.

SSD cache : cache does not have enough space for download... but there are no files that can be deleted.

This message means that another CryoSPARC job, or another application on the workstation, is taking up space on the SSD. In the first case, the job showing this message will free up space as soon as it can and then continue processing. If there are files on the SSD that are not owned by CryoSPARC, the job cannot delete them, and it may be necessary to delete them manually.

FAQ

Is it safe to manually delete cache files for completed or unqueued/cleared jobs? Also, can I pre-cache with symlinks to skip caching?

Yes, it is safe to delete cache files at any time (the cache is read-only). And yes, the cache checks whether files exist based only on path, size, and modification date, so pre-cached symlinks should cause it to skip the copy. That said, it may be easier to simply set the Cache particle images on SSD parameter to False on each job you queue up.
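For example, to see how much space the cache occupies and to clear it manually while no jobs are running, something like the following works on a typical setup (an assumption: CryoSPARC stores cache files under a per-instance subdirectory of the SSD path, and /scratch/cryosparc_cache is a placeholder for your --ssdpath; the exact directory name varies by instance and version):

du -sh /scratch/cryosparc_cache/instance_*   # space used by each instance's cache
rm -rf /scratch/cryosparc_cache/instance_*   # safe only while no jobs are running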
