Prerequisites and Compute Resources Setup
CryoSPARC Live is an independent application hosted alongside your main CryoSPARC instance. It works best when it has direct access to the files being written by the microscope that is actively recording images. Users can access the CryoSPARC Live interface directly from within the network via a web browser, in the same way as they interact with the main CryoSPARC instance. See: Access cryoSPARC Live.
CryoSPARC Live automatically manages the available hardware resources (GPUs), deploying multiple preprocessing workers concurrently and dispatching reconstruction jobs to the main CryoSPARC instance transparently via the job scheduler.
The compute workflow for CryoSPARC Live is detailed below. While the overall hardware and system requirements for CryoSPARC Live are similar to the prerequisites of the main CryoSPARC system (see: Hardware and System Requirements), read on below for detailed information on GPU requirements.
We recommend a minimum of 4 GPUs for a seamless experience and the ability to keep up with data collection.
There are three "task types" in CryoSPARC Live, corresponding to the three "lane types" described below. When configuring a particular Live session, you will need to tell Live which node(s) or "lanes" to use for each of these three task types, as well as how many GPUs to use in parallel for preprocessing. The compute resources allocated to these task types can be configured on a per-session basis, adjusted during a session, and saved for future use via Configuration Profiles.
Preprocessing jobs: CryoSPARC Live Worker (motion correction, CTF estimation, thumbnail generation, particle picking and particle extraction all in one)
Reconstruction jobs: Streaming 2D Classification and Streaming 3D Refinement
Auxiliary jobs: optional or transient jobs (e.g., 2D Classification used to create templates for the template picker, and Ab-Initio Reconstruction)
Minimum requirement: 1 GPU
The Number of Preprocessing GPU Workers (minimum one) selected here determines how many CryoSPARC Live workers will be spawned. Live workers perform preprocessing of incoming movies, including motion correction, CTF estimation, thumbnail generation, particle picking, and particle extraction. Each CryoSPARC Live worker allocates one GPU to its main process and uses it continuously unless the worker is killed (by pausing the Session). The number of preprocessing workers can be changed by modifying this value; the new number of workers will be spawned once the Session is paused and then started again.
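As a rough guide for choosing this value, you can estimate how many workers are needed to keep pace with your collection rate. The sketch below is illustrative only and not part of CryoSPARC; the collection rate and per-movie preprocessing time are assumptions you should replace with measurements from your own hardware.

```python
# Rough estimate for choosing the number of preprocessing GPU workers.
# Both input numbers below are illustrative assumptions.
import math

movies_per_hour = 300        # assumed microscope collection rate
seconds_per_movie = 30.0     # assumed time for one worker to fully
                             # preprocess one movie (motion correction,
                             # CTF, picking, extraction)

# Movies a single preprocessing GPU worker can handle per hour.
worker_throughput = 3600.0 / seconds_per_movie

# Workers needed so preprocessing keeps pace with collection
# (throughput scales roughly linearly with the number of workers).
workers_needed = math.ceil(movies_per_hour / worker_throughput)

print(f"One worker handles ~{worker_throughput:.0f} movies/hour")
print(f"Workers needed to keep up: {workers_needed}")
```

With these assumed numbers, one worker processes about 120 movies/hour, so three workers are needed to keep up with 300 movies/hour.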
CPU Memory Bandwidth
CryoSPARC Live's preprocessing step requires fast copies of movie data from disk to the GPU. To achieve this, configure your system with a high memory bandwidth CPU (>100 GB/s). Slower CPUs will still work but may not achieve optimal throughput.
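If you are unsure of your system's memory bandwidth, a large in-memory array copy gives a crude lower bound to compare against the >100 GB/s guideline. This is an illustrative sketch, not a CryoSPARC tool; it is single-threaded, so a proper multi-threaded benchmark such as STREAM will report higher, more representative numbers.

```python
# Crude single-threaded host memory bandwidth check using a large
# NumPy array copy. Treat the result as a rough lower bound on the
# system's achievable memory bandwidth.
import time
import numpy as np

n_bytes = 2 * 1024**3                 # 2 GiB buffer
src = np.ones(n_bytes, dtype=np.uint8)
dst = np.empty_like(src)

start = time.perf_counter()
np.copyto(dst, src)                   # read 2 GiB + write 2 GiB
elapsed = time.perf_counter() - start

# The copy moves the buffer twice: one read plus one write.
gb_moved = 2 * n_bytes / 1e9
print(f"Effective copy bandwidth: {gb_moved / elapsed:.1f} GB/s")
```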
Minimum requirement: 2 GPUs
CryoSPARC Live will automatically schedule Streaming 2D Classification and Streaming 3D Refinement jobs on the lane selected here when requested. Each reconstruction job (Streaming 2D Classification, Streaming 3D Refinement) allocates one GPU to its main process and uses it continuously as new particles arrive, so the minimum number of GPUs required for this lane is two. If the lane selected here does not have more than one GPU available when a job is queued, the CryoSPARC scheduler will still automatically schedule reconstruction jobs to the lane as resources become available. Since these jobs hold resources indefinitely, any other queued jobs waiting for resources in this lane may remain in the queue indefinitely until the Session is paused or the reconstruction job is manually stopped.
The GPUs in this lane are only used once the streaming 2D/3D jobs are started. During a Live collection session, you can opt not to start these streaming jobs until some time into data collection. During this time, the GPUs are free and can be used for other phases (e.g., Preprocessing or Auxiliary jobs; see below).
Minimum: 1 GPU
Auxiliary jobs in CryoSPARC Live include 2D Classification jobs (for generating templates for the template picker), Create Template jobs (for generating templates from volumes for the template picker), and Ab-Initio jobs (for initial model creation). Each job allocates one GPU to its main process, so the minimum number of GPUs required for this lane is one. You can use the same lane for auxiliary jobs as for reconstruction, even if there are only 2 GPUs available in the lane, as long as the streaming reconstruction jobs are not already running when auxiliary jobs need to run (otherwise the auxiliary jobs will queue indefinitely). Most of the time this is not an issue, as template generation generally precedes streaming 2D classification and ab-initio reconstruction precedes streaming 3D refinement. The sketch below illustrates this constraint.
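To make the constraint concrete, here is a minimal illustrative model (not CryoSPARC code; the function is hypothetical) of when an auxiliary job can run in a lane shared with the streaming reconstruction jobs:

```python
# Illustrative model of the lane-sharing constraint described above:
# streaming reconstruction jobs hold their GPUs indefinitely, so an
# auxiliary job only runs if the lane still has a free GPU.

def auxiliary_can_run(lane_gpus: int,
                      streaming_2d_running: bool,
                      streaming_3d_running: bool) -> bool:
    """Return True if at least one GPU in the lane is free for an
    auxiliary job (2D Classification, Create Template, Ab-Initio)."""
    held = int(streaming_2d_running) + int(streaming_3d_running)
    return lane_gpus - held >= 1

# A 2-GPU lane shared between reconstruction and auxiliary jobs:
print(auxiliary_can_run(2, False, False))  # True: run templates/ab-initio first
print(auxiliary_can_run(2, True, True))    # False: auxiliary job would queue
```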
We recommend a minimum of 11GB of memory per GPU to be able to process most types of data successfully in CryoSPARC. For minimum GPU memory requirements per job type and data type, see the table below.
| Preprocessing | Reconstruction | Auxiliary |
| --- | --- | --- |
| Gatan K2 Images, TFS Falcon 3 Images: 8GB+ | 2D Classification: 4GB+ | 2D Classification: 4GB+ |
| Gatan K2 Super Resolution, Gatan K3, Gatan K3 Super Resolution, TFS Falcon 4 Images, TFS Falcon 4 EER Images: 11GB+ | 3D Refinement (heavily dependent on box size): 11GB+ | Ab-Initio Reconstruction: 8GB+ |
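To check your GPUs against these minimums, you can query each card's total memory with nvidia-smi. The script below is an illustrative convenience, not part of CryoSPARC; the task labels and thresholds are transcribed from the table above.

```python
# Compare each GPU's total memory against the per-task minimums from
# the table above, using the standard nvidia-smi query interface.
import subprocess

MIN_GB = {
    "preprocessing (K3 / super-resolution / EER)": 11,
    "streaming 3D refinement": 11,
    "preprocessing (K2 / Falcon 3)": 8,
    "ab-initio reconstruction": 8,
    "2D classification": 4,
}

out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index,memory.total",
     "--format=csv,noheader,nounits"],
    text=True,
)

for line in out.strip().splitlines():
    index, mib = (field.strip() for field in line.split(","))
    gb = int(mib) / 1024  # memory.total is reported in MiB
    ok = [task for task, need in MIN_GB.items() if gb >= need]
    print(f"GPU {index}: {gb:.0f} GB -> suitable for: {', '.join(ok)}")
```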
We recommend a minimum of 4 GPUs so that you can allocate two to preprocessing and two to reconstruction or auxiliary tasks, for a seamless experience and the ability to truly keep up with a data collection session, i.e., to generate 3D structures while still collecting data.
If 4 GPUs are not available or you are using CryoSPARC Live on previously collected data (i.e., not "live" during a data collection session), it is possible to reduce the compute requirement somewhat.
Preprocessing can use one or more GPUs, with throughput scaling linearly. Reconstruction uses one GPU for 2D classification and one GPU for 3D refinement; these run indefinitely in streaming mode, continuously taking in new particles and updating the 2D/3D results. Both streaming 2D classification and streaming 3D refinement can be started part-way into the session (e.g., an Ab-Initio model must first be created before refinement can begin).
Three GPUs will also be able to perform all the functions, but depending on your data collection speed, having only one GPU for preprocessing may not be sufficient to keep up with your camera. With more GPUs, preprocessing becomes correspondingly faster.
With two GPUs, you can still achieve many of the benefits of Live processing, but you will need to manually switch between using 2 GPUs for preprocessing, 1 GPU for preprocessing + 1 GPU for 2D classification, and 1 GPU for preprocessing + 1 GPU for 3D refinement. The results will therefore not update in a live streaming fashion, but only periodically as these switches are made.
Users can select a cluster lane for any of the three processing lanes in CryoSPARC Live. Some considerations to keep in mind when using a cluster include resource availability and quality of service limitations.
When you start a Live session, preprocessing workers are immediately dispatched to the queuing system to be launched. If you are using a cluster, the scheduling and launching of jobs is handled by the cluster scheduler, whose timing may be unpredictable depending on resource availability at queue time. This increases the latency of starting your Live session and can negatively impact the CryoSPARC Live experience.
Some cluster management systems, especially those serving multiple users (for example, at a university), also have quality of service (QoS) policies in place that limit how long a single process can use resources. Due to the nature of CryoSPARC Live, the preprocessing, streaming 2D classification, and streaming 3D refinement workers run indefinitely until the session is "paused" (see: Session Functions). If your cluster has these policies, keep track of your running jobs using either the built-in resource manager in CryoSPARC or the cluster scheduler's CLI, as in the sketch below.
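For example, assuming a SLURM scheduler (adapt the command for your scheduler's equivalent), a small script can list your running jobs with elapsed and remaining wall time, so workers approaching a QoS limit are easy to spot:

```python
# Track long-running Live jobs from the cluster side, assuming SLURM.
# Job names will depend on your cluster_script.sh template.
import getpass
import subprocess

user = getpass.getuser()

# List this user's running jobs with elapsed time and time remaining.
# Format fields: %i job id, %j name, %M time used, %L time left.
subprocess.run(
    ["squeue", "-u", user, "-t", "RUNNING",
     "-o", "%.10i %.30j %.12M %.12L"],
)
```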
You may also choose to circumvent these policies if your cluster supports dedicating entire worker nodes. In that case, you may be able to use a dedicated node to run CryoSPARC Live jobs by configuring it as a normal worker node in CryoSPARC. More details on how to add worker nodes can be found in the general installation instructions ('Worker Node'): Downloading and Installing cryoSPARC