
Deploying CryoSPARC on AWS

Version 1.0 (May 10, 2021)


Last updated 1 year ago

This deployment guide is based on version 2 of AWS ParallelCluster. For an updated configuration using a newer version of AWS ParallelCluster, please see the corresponding AWS sample.

This Deployment Guide provides end-to-end sample instructions for deploying CryoSPARC™, a state-of-the-art scientific software platform for cryo-EM, on AWS using AWS ParallelCluster. CryoSPARC is developed by Structura Biotechnology Inc. Additional information about CryoSPARC, including licensing, is available at guide.cryosparc.com.

1. Introduction

Cryo-electron microscopy (cryo-EM) is a biophysical technique that allows scientists to determine the structure of biological macromolecules and assemblies. This technology, which was recognized with the 2017 Nobel Prize in Chemistry, uses advanced microscopes to reveal 3D structures of biomolecules in near-native states. Cryo-EM is rapidly becoming the go-to technique for protein structure determination in life sciences and drug discovery. Just recently, cryo-EM was used to produce the first atomic-level 3D structure of the spike protein of the virus responsible for COVID-19.

Storing the micrographs (images produced by the microscope) requires enormous data storage and the processing workflow requires massive computing power. This workload is therefore an ideal use case for High-Performance Computing in Amazon Web Services.

A typical cryo-EM workflow involves biological sample preparation, data collection, and finally computation. Images of a sample are collected with a transmission electron microscope. The raw data, which is generally 5-10 TB in size, is then processed to reconstruct a three-dimensional structure of the protein of interest from the two-dimensional images. Resolving a 3D structure often requires multiple iterations through the entire processing pipeline, or parts of the pipeline, for a near-atomic resolution result. In many cases, the entire workflow starting from data collection must be repeated to achieve state-of-the-art results.

Benchmarks and Cost Estimates

This guide is accompanied by a Performance Benchmarks document outlining various steps in running a CryoSPARC workflow, timings and cost estimates:

  • p4d.24xlarge: $37 USD

  • p3dn.24xlarge: $42 USD

  • g4dn.metal: $10 USD

NOTE: This guide serves as an example of possible installation options, performance and cost, but each user’s results may vary. Performance and costs may scale differently depending on the specific compute setup, data being processed, how long AWS compute resources are being used, specific steps used in processing, etc.

2. Pre-requisites

You will also need the following:

  • A computer with internet access running macOS or Linux. Windows users need a terminal emulator.

  • An Internet browser such as Chrome or Firefox

  • Familiarity with Linux terminal commands

  • Time to Request EC2 Service Quota increases (at least 24 hours)

  • IAM Permissions

3. AWS Management Console

Once you are logged into the AWS Management Console, spend some time becoming familiar with the interface. This page is a central place you can use to find and learn about AWS services as well as to manage and monitor your account. Some key items to note are the Search Bar and the AWS region menu. The latter allows you to select the geographical region in which you would like your AWS resources to be located. At the time of writing, AWS has 24 regions spread across the globe. Each region contains multiple Availability Zones (AZs), and each AZ contains one or more data centers where you can run your cryo-EM analysis.

Note that some newer compute instances may not be available in all AZs. Please select the AZ that has all the resources required for your workflow.

4. AWS CLI

For Linux

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

For macOS

$ curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
$ sudo installer -pkg AWSCLIV2.pkg -target /

Verify that the AWS CLI was installed correctly with the following commands (example outputs included):

$ which aws
/usr/local/bin/aws

$ aws --version
aws-cli/2.0.47 Python/3.7.4 Darwin/18.7.0 botocore/2.0.0

Configure the AWS CLI tool
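As a minimal sketch (not part of the original deployment scripts), the default region and output format can be stored non-interactively with `aws configure set`, while the Access Key ID and Secret access key are entered through the interactive `aws configure` prompt. The function name `set_cli_defaults` and the choice of `json` output are assumptions made for illustration.

```shell
# Hypothetical helper: store a default region and output format for the AWS CLI.
# Credentials are deliberately not handled here; enter them with `aws configure`.
set_cli_defaults() {
  local region="$1"
  if [ -z "$region" ]; then
    echo "usage: set_cli_defaults <region>" >&2
    return 1
  fi
  aws configure set region "$region" || return 1
  aws configure set output json
}

# Example (this guide deploys in us-east-1):
#   aws configure               # prompts for Access Key ID and Secret access key
#   set_cli_defaults us-east-1
```

Keeping the keys out of the function avoids leaving secrets in shell history or scripts.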

5. IAM Role & Permissions Required

Whether you are using a new or existing AWS account to deploy CryoSPARC in the cloud, it’s best to create a new IAM user specifically for this purpose. Doing so allows you to give the IAM user specific policies that are scoped to the resources and actions needed to complete the deployment.

Permissions Required

During testing, the following AWS managed policies were attached to the IAM user deploying CryoSPARC:

  • AmazonEC2FullAccess

  • AmazonFSxFullAccess

  • AmazonS3FullAccess

  • AmazonDynamoDBFullAccess

  • CloudWatchLogsFullAccess

  • AmazonRoute53FullAccess

  • AWSCloudFormationFullAccess

  • AWSLambda_FullAccess

as well as the following custom managed policy:

{
  "Version": "2012-10-17",
  "Statement": [
   {
    "Sid": "CustomIAMCryoSPARCPolicy",
    "Effect": "Allow",
    "Action": [
     "iam:CreateInstanceProfile",
     "iam:DeleteInstanceProfile",
     "iam:GetRole",
     "iam:RemoveRoleFromInstanceProfile",
     "iam:CreateRole",
     "iam:DeleteRole",
     "iam:AttachRolePolicy",
     "iam:PutRolePolicy",
     "iam:AddRoleToInstanceProfile",
     "iam:PassRole",
     "iam:DetachRolePolicy",
     "iam:DeleteRolePolicy",
     "iam:GetRolePolicy"
    ],
    "Resource": "*"
   }
  ]
}

6. EC2 Dashboard

The EC2 dashboard displays resources and provides the ability to launch an instance. On the left-hand side of the dashboard, there are links to EC2 limits, instances, AMIs, Security Groups and keys.

6.1. EC2 Service Quotas

All accounts initially have a lower limit to protect against fraud. Increasing these limits requires a simple request based on your region and what instances you need. Please verify that GPU-based p* and g* instances are available in your region. This deployment guide uses the us-east-1 (US East - N. Virginia) region.

6.2. EC2 key

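  1. Go to the EC2 dashboard in your AWS console. Select Key pairs, then select Create key pair. Give the key pair an appropriate name and download it.

  2. Open your terminal and type the following command:

$ aws ec2 create-key-pair \
--key-name key-cryoSPARC \
--region name-of-region \
--query "KeyMaterial" \
--output text > ~/.ssh/key-cryoSPARC
  • Substitute name-of-region with the region where you want to deploy your cluster. For this guide, enter us-east-1
  • --query "KeyMaterial" writes only the private key to the file. Without it, the text output also contains the key fingerprint and name, which makes the file unusable as an SSH key.

$ chmod 600 ~/.ssh/key-cryoSPARC

The private key is stored in the ~/.ssh directory in a file called key-cryoSPARC, and the key pair is registered in EC2 under the same name.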

7. Amazon VPCs and Subnets

A network access control list (ACL) is an optional layer of security for your VPC that acts as a firewall, controlling traffic in and out of one or more subnets. A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. Security groups operate at the instance level, not the subnet level. Therefore, each instance in a subnet in your VPC can be assigned to a different set of security groups.

8. Deployment Scripts

The folder contains 5 files:

  1. README.md

  2. cryosparc-pcluster.config.template

  3. deploy-cryosparc_v1.sh

  4. install-cryosparc_v1.sh

  5. vpc-cryosparc.template

9. Amazon S3

A single bucket is required for this guide. In the following instructions, replace the given example bucket name with your own, as bucket names must be globally unique. The name must not contain any uppercase characters. Since the bucket will store raw data, make sure your bucket is private.
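The naming constraint can be checked locally before creating the bucket. The sketch below is a partial check only: besides the no-uppercase rule stated above, it assumes the general S3 naming rules (3 to 63 characters; lowercase letters, digits, hyphens and periods; starting and ending with a letter or digit). It does not cover every S3 rule and cannot tell whether a name is already taken. The function name `is_valid_bucket_name` is hypothetical.

```shell
# Hypothetical helper: succeed if $1 looks like an acceptable S3 bucket name.
# LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
is_valid_bucket_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

# Example:
#   is_valid_bucket_name cryosparc-test-data-np && echo "name looks fine"
```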

The S3 bucket and the compute resources required to run a CryoSPARC workflow must be in the same region (see Section 12.2). Note that some newer compute instances may not be available in all regions or AZs. Please create the bucket in a region that has all the resources required for your workflow.

Create the bucket cryosparc-test-data-np for raw data. This S3 bucket will be linked to the Amazon FSx for Lustre service for high-performance read and write storage operations.

To create the bucket and upload the raw .tif and .mrc movies (multi-frame micrographs) to cryosparc-test-data-np:

$ aws s3 mb s3://cryosparc-test-data-np
$ aws s3 cp ./ s3://cryosparc-test-data-np --recursive --exclude "*" --include "*.tif"
$ aws s3 cp ./ s3://cryosparc-test-data-np --recursive --exclude "*" --include "*.mrc"

From the S3 management console, verify that all files were successfully uploaded to the bucket.

10. Amazon FSx for Lustre

You can choose between two file systems when using Amazon FSx for Lustre:

  1. Persistent File Systems - these are ideal for long-term storage.

  2. Scratch File Systems - these are ideal for temporary and short-term storage.

This guide uses the Scratch file system. This provides the most performant and cost-effective option. If you require data resilience, consider the persistent deployment guide.

11. AWS CloudFormation

The script vpc-cryosparc.template is a CloudFormation template that deploys the VPC and subnets. As is, the script creates a VPC and two subnets. The subnet for the head instance is public, and the compute instances are placed in a private subnet. From outside the VPC, you can only log into the head instance in the public subnet (via SSH).

12. AWS ParallelCluster

This deployment will use ParallelCluster 2.10.1. CryoSPARC requires the multiple queues feature in ParallelCluster, which is only supported by versions 2.9.0 and later.

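To install ParallelCluster 2.10.1, run the following command on your local machine. Pinning the version matters because an unpinned install fetches the latest release instead:

$ pip3 install "aws-parallelcluster==2.10.1" --user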

This enables access to the terminal command-line tool pcluster. First, confirm that you have the correct version installed.

$ pcluster version
2.10.1
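The version requirement can also be checked in a script. This is a sketch under stated assumptions: it relies on `sort -V` (available in GNU coreutils and recent macOS), and the function names `version_ge` and `check_pcluster_version` are hypothetical. It accepts only 2.x releases at or above 2.9.0, since this guide is based on version 2 of ParallelCluster.

```shell
# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check: a 2.x release with multiple-queue support (2.9.0 or later).
check_pcluster_version() {
  case "$1" in
    2.*) version_ge "$1" "2.9.0" ;;
    *)   return 1 ;;
  esac
}

# Example:
#   check_pcluster_version "$(pcluster version)" || echo "unsupported version" >&2
```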

12.1. Configure the cluster

Open cryosparc-pcluster.config.template in a text editor; here, provide the details of the cluster required to deploy and run the CryoSPARC workflow.

Look through the values that are not explicitly defined in cryosparc-pcluster.config.template.

  • aws_region_name - the name of the region

  • key_name - key pair name

  • post_install - the path to the install script that runs after the cluster is created

  • s3_read_resource - the S3 bucket with raw data

  • vpc_id - the VPC id

  • master_subnet_id - the subnet id where the head node resides

  • compute_subnet_id - the subnet id where the compute node resides

You will provide the region, key pair name, S3 bucket names, and CryoSPARC license ID when you deploy the cluster. The deploy-cryosparc_v1.sh script fetches the remaining values about the networking infrastructure created by vpc-cryosparc.template.

This config file uses an Amazon EC2 c5n.9xlarge instance as the head node. The c5n.9xlarge instance uses an Intel Xeon Platinum (Skylake) processor with 36 vCPUs. All instances run Amazon Linux 2, a Linux server operating system provided by AWS. None of the compute instances are running yet; they will start as needed. Three computing queues are also defined: gpu-large, gpu-med, and gpu-small. Each queue can host multiple EC2 instance types. For example, the gpu-large queue is made up of p4d.24xlarge, p3.16xlarge, and g4dn.metal instances. Multiple queues are useful because different steps of the CryoSPARC workflow run better on different instances. See the accompanying Performance Benchmarks document for instance recommendations.

12.2. How to deploy

In your local directory, verify that the three provided scripts exist:

  • cryosparc-pcluster.config.template

  • deploy-cryosparc_v1.sh

  • vpc-cryosparc.template

Before launching the cluster, choose the Availability Zone (AZ) where the cluster will run. All EC2 instance types you choose to instantiate should be available within the same AZ in the region your S3 bucket is in. If a required EC2 instance type is not available in any of the AZs in your region, re-create your S3 bucket in a region that has them available.

Run the command below to see where a given instance type is available.

$ aws ec2 describe-instance-type-offerings \
--location-type availability-zone \
--region your-aws-region \
--filters Name=instance-type,Values=your-instance-type \
--query "InstanceTypeOfferings[*].Location" \
--output text
  • --region Region you want to deploy the cluster in

  • Values= Instance type to look up (replace your-instance-type)

For example, this command outputs the AZs in the us-east-1 region where g4dn.12xlarge instances are available.

$ aws ec2 describe-instance-type-offerings \
--location-type availability-zone \
--region us-east-1 \
--filters Name=instance-type,Values=g4dn.12xlarge \
--query "InstanceTypeOfferings[*].Location" \
--output text
us-east-1b us-east-1f us-east-1d us-east-1c us-east-1a
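Because every instance type in the cluster must be available in the same AZ, it can help to intersect the outputs of several such queries. The following is a sketch, not part of the deployment scripts; the function name `common_azs` is hypothetical, and it only processes text that has already been collected from the command above.

```shell
# Print the AZs that appear in every argument.
# Each argument is one whitespace-separated list, e.g. the text output of
# `aws ec2 describe-instance-type-offerings` for a single instance type.
common_azs() {
  local n="$#" list
  for list in "$@"; do
    # One AZ per line, de-duplicated within each list
    printf '%s\n' "$list" | tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u
  done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
}

# Example (variables hold the output of two separate queries):
#   common_azs "$g4dn_azs" "$p3_azs"
```

An AZ is printed only if it was seen once in each list, so an empty result means no single AZ offers all the requested instance types.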

Finally, deploy the cluster:

$ ./deploy-cryosparc_v1.sh --region your-aws-region \
--cluster-name cryosparc \
--az <your-availability-zone> \
--config-bucket cryosparc-demo-np \
--data-bucket cryosparc-test-data-np \
--aws-key key-cryoSPARC \
--cryosparc-license-id <your-cryosparc-license>
  • --region Region in which to deploy the cluster

  • --cluster-name Name of the cluster for identification purposes

  • --az AZ in which to deploy the cluster. Make sure that the required instance types are available in the specified AZ.

  • --data-bucket Existing S3 bucket created earlier. This will be linked to the EC2 instance with Amazon FSx for Lustre. All movies will be uploaded here.

  • --aws-key Name (not the path) of the SSH key you created earlier

  • --cryosparc-license-id The license ID provided by Structura Bio for your cryoSPARC instance

During deployment, a cryosparc.config file is created from the cryosparc-pcluster.config.template file, with values filled in for the specified variables. This cryosparc.config file is specific to the cluster you just deployed. Check it and ensure all the information is correct. Pay particular attention to the [cluster cryosparc], [vpc cryosparc-vpc] and [fsx cryosparc-fsx] sections.
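A quick way to confirm that the generated file contains the sections named above is to search for their headers. This sketch assumes the section headers appear on their own lines in square brackets, as in ParallelCluster version 2 config files; the function name `check_config_sections` is hypothetical, and the check says nothing about whether the values inside each section are correct.

```shell
# Hypothetical helper: report any expected section header missing from the config file.
check_config_sections() {
  local config="$1" missing=0 section
  for section in "cluster cryosparc" "vpc cryosparc-vpc" "fsx cryosparc-fsx"; do
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fqx "[$section]" "$config"; then
      echo "missing section: [$section]" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_config_sections cryosparc.config && echo "all sections present"
```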

Retain the vpc-cryosparc.template for later use. The deployment will take about 30 minutes.

12.3. Deployed Architecture

The script deploys a VPC in the region you selected with two subnets. A cluster called cryosparc with c5n.9xlarge as the head node is also deployed. The head node resides in the public subnet and the compute nodes (launched as needed) reside in the private subnet. The head node hosts both the CryoSPARC web interface and the database. Additionally, the script creates and mounts an EBS volume of type gp2 as a shared file system and an Amazon FSx for Lustre file system.

12.4. Launch CryoSPARC web interface

As the process completes, navigate to the AWS Management Console and take a look at which resources were deployed. In your terminal, look for instructions to connect to CryoSPARC’s web interface. Following the prompts, log into the head node of the cluster over SSH. The prompt will look like:

$ ssh -i /path/to/key/key-cryoSPARC ec2-user@publicIPofyourinstance

Once logged in, create a new CryoSPARC user.

$ cryosparcm createuser \
--email "<youremail@gmail.com>" \
--password "<yourpassword>" \
--username "<yourusername>" \
--firstname "<yourname>" \
--lastname "<yourlastname>"

The password should not contain special characters. When finished, log out of the head node. Set up an SSH tunnel to the CryoSPARC head node to connect to the CryoSPARC web interface:

$ ssh -i /path/to/key/key-cryoSPARC -N -f -L localhost:45000:localhost:45000 ec2-user@publicIPofyourinstance

12.5. Export output data to Amazon S3

Use the create-data-repository-task command to export data from your Amazon FSx for Lustre file system back to the data bucket:

$ aws fsx create-data-repository-task \
--file-system-id fs-xxxxxxx \
--type EXPORT_TO_REPOSITORY \
--paths path1 \
--report Enabled=true,Scope=FAILED_FILES_ONLY,Format=REPORT_CSV_20191124,Path=s3://cryosparc-test-data-np/data-repo-report
  • --file-system-id The id for the Lustre file system created. Find this in the AWS console, under Amazon FSx.

  • --paths The paths of the directory or file you want to export relative to the mount point of the file system. If the mount point is /mnt/fsx and /mnt/fsx/path1 is a directory or file on the file system you want to export, then provide path1.
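The relative path expected by --paths can be derived from an absolute path with shell parameter expansion. This is a small sketch; the function name `fsx_relative_path` is hypothetical, and /mnt/fsx is simply the example mount point used above.

```shell
# Print the path of $2 relative to the mount point $1, as expected by --paths.
fsx_relative_path() {
  local mount="${1%/}" target="$2"   # drop a trailing slash from the mount point
  case "$target" in
    "$mount"/*) printf '%s\n' "${target#"$mount"/}" ;;
    *) echo "error: $target is not under $mount" >&2; return 1 ;;
  esac
}

# Example:
#   fsx_relative_path /mnt/fsx /mnt/fsx/path1    # prints: path1
```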

13. Tearing the cluster down

Unless you plan to run more analysis immediately, we recommend tearing down the cluster to avoid incurring costs. The cluster scales down based on your config file. The current config file ensures the head node and the Amazon FSx for Lustre file system continue running. You can also completely destroy the cluster and spin up a new cluster as needed.

From the AWS command line, execute the following:

$ aws --region your-aws-region cloudformation delete-stack --stack-name parallelcluster-cryosparc 

to delete the cluster. This includes instances, attached volumes, FSx file system, etc.

$ aws --region your-aws-region cloudformation delete-stack --stack-name cryosparc-VPC

to delete all networking-related infrastructure. This deletes the VPC and subnets.

After 15 minutes, log into your AWS console. Check for resources that are no longer required and delete them.
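The two delete-stack commands can be chained so that the VPC stack is deleted only after the cluster stack has finished deleting. This is a sketch under the assumption that the stack names match those used above. The function name `teardown_cryosparc` is hypothetical. Waiting in between seems safer because the cluster's instances live in the VPC's subnets.

```shell
# Hypothetical helper: delete the cluster stack, wait, then delete the VPC stack.
teardown_cryosparc() {
  local region="$1"
  aws --region "$region" cloudformation delete-stack \
    --stack-name parallelcluster-cryosparc || return 1
  # Blocks until the stack is gone; non-zero exit if deletion fails or times out
  aws --region "$region" cloudformation wait stack-delete-complete \
    --stack-name parallelcluster-cryosparc || return 1
  aws --region "$region" cloudformation delete-stack --stack-name cryosparc-VPC
}

# Example:
#   teardown_cryosparc us-east-1
```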

Release Notes

Version 1.0 (May 10, 2021)

The benchmark was performed on the EMPIAR-10288 dataset of approximately 2800 micrographs totalling 476 GB. For the deployment guide, we recommend running the smaller T20S tutorial that ships with CryoSPARC. Approximate costs for the T20S example on different EC2 instance types are listed under Benchmarks and Cost Estimates.

This deployment guide uses Amazon EC2, Amazon Simple Storage Service (S3), Amazon FSx for Lustre, and AWS CloudFormation. See the architecture diagram in section 12.3 for details.

This deployment guide assumes minimal AWS knowledge; however, there are a few prerequisites. The first step is to create an AWS account, as described here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/get-set-up-for-amazon-ec2.html#sign-up-for-aws

In addition to the items listed in section 2, you will need a CryoSPARC license ID.

To log into the AWS Management Console, navigate to https://aws.amazon.com/console/ and click on the link labelled “Already have an account? Sign in.” You will be prompted for an Account ID or alias, IAM user name, and password. If you have just created a new account, you will need to sign in with your root user email and password.

The AWS Command Line Interface (AWS CLI) is an open-source tool that enables you to interact with AWS services from your command-line shell. Download the AWS CLI using the commands in section 4.

If you do not have superuser (sudo) permissions, the AWS CLI can also be installed using pip with regular user permissions. More instructions are available in the AWS CLI installation documentation.

In order to access services associated with your AWS account, you must first configure the AWS CLI. You will need to provide an Access Key ID and a Secret access key. For more information on configuring the AWS CLI, see the AWS CLI documentation.

The AWS IAM documentation explains how to create a new IAM user in your AWS account and how to create custom IAM policies. The type of access required is “Programmatic Access”, as this IAM user will only be used for the deployment script via the CLI.

Amazon EC2 is a service that provides scalable and flexible computing capacity in the AWS cloud. Using Amazon EC2 eliminates the need to invest in computing hardware and enables quick development and deployment of applications. It allows many virtual servers to be configured with security, networking and storage management. EC2 virtual servers, also known as instances, are the building blocks for supercomputing on AWS.

From the AWS console, search for Service Quotas (AWS Services – Service Quotas) to go to the Elastic Compute Cloud section. Type in ‘on-demand’ to filter. Select the type of instance for which you wish to increase the vCPU limits and click ‘Request Limit Increase’. For this workload, request a limit increase for p*, g*, and c* instance types. Select the desired region. Enter a case description explaining why you want the limits to be increased and click Submit. You should receive a response from AWS Support within 24 hours confirming your limits have been increased. You can read more about service limits in the AWS documentation.

A key pair, consisting of a private key and a public key, is a set of security credentials required to prove your identity when connecting to an instance. Create an EC2 key pair in the region where you plan to deploy the CryoSPARC cluster. This can be done in either of the two ways listed in section 6.2.

Amazon Virtual Private Cloud (Amazon VPC) lets you launch AWS resources in a virtual network that you have defined. A VPC is a virtual network dedicated to your AWS account. It is logically isolated from other virtual networks. A subnet is a range of IP addresses in your VPC in which you can launch AWS resources. This virtual network closely resembles a traditional network operating in an on-premises data center, with the benefits of using the scalable infrastructure of AWS.

Before proceeding further, download the zip folder containing all the files required for this deployment here: https://github.com/cryoem-uoft/aws-deployment-guide/archive/refs/tags/v1.0.zip

Optionally, browse the code on GitHub: https://github.com/cryoem-uoft/aws-deployment-guide

Amazon Simple Storage Service (Amazon S3) is a global object storage service for storing data and securing it from unauthorized access. Use Amazon S3 to store raw images for CryoSPARC to analyze.

Amazon FSx for Lustre is a fully managed service that provides cost-effective, high-performance storage for compute workloads. Many workloads such as machine learning, high-performance computing (HPC), video rendering and financial simulations depend on compute instances accessing the same set of data through high-performance shared storage. Amazon FSx for Lustre file systems can also be linked to Amazon S3 buckets, enabling you to access and process data concurrently from a high-performance file system. Amazon FSx for Lustre can also be configured to back up data to Amazon S3, and further to Amazon S3 Glacier to optimize costs for data backup.

AWS CloudFormation provides a way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code. A CloudFormation template describes your desired resources and their dependencies so you can launch and configure them together as a stack. You can use a template to create, update and delete an entire stack as a single unit, as often as you need to, instead of managing resources individually. You can manage and provision stacks across multiple AWS accounts and AWS Regions.

AWS ParallelCluster enables you to quickly build an HPC environment on AWS. It automatically sets up the required compute resources and a shared filesystem, and offers a variety of batch schedulers. You define all the resources you need in a config file.

An AWS ParallelCluster configuration is defined in multiple sections. A section starts with the section name in square brackets, followed by parameters and configuration. See the AWS ParallelCluster documentation for more information about the config file. At the time of deployment, a cryosparc.config file is created from the cryosparc-pcluster.config.template file (see section 12.2).

Once the SSH tunnel from section 12.4 is in place, open a web browser to http://localhost:45000 and log into CryoSPARC using the username and password created.

Before you start your production workload, familiarize yourself with the CryoSPARC environment. Follow the instructions in the CryoSPARC guide (guide.cryosparc.com) and run your first CryoSPARC workload. More information about CryoSPARC configuration and management is available in the same guide.

After the workload is completed, export the data from the Amazon FSx for Lustre file system to Amazon S3. This uses a data repository task (see section 12.5).

The two delete-stack commands in section 13 should delete the infrastructure initialized in this guide. However, they retain the S3 buckets with data and installation scripts. Delete the buckets if no longer needed, or archive them for a minimal price.