Guide: Cluster Integration Validation
How to validate the configuration of a cluster lane and monitor job status.
The information in this section applies to CryoSPARC v4.0+.
CryoSPARC v4.0 includes two improvements to cluster integration:
Cluster integrations can be tested and validated after being connected to CryoSPARC by running cryosparcm cluster validate.
For jobs submitted to cluster lanes, the cluster scheduler is polled at a regular interval until the job enters running status. This prevents jobs that failed to submit or launch in the cluster scheduler from appearing in CryoSPARC in launched status indefinitely.
For more information on how to connect a cluster scheduler to CryoSPARC, see Connecting a Cluster to CryoSPARC.
Cluster Setup Testing
The cryosparcm cluster validate command validates the configuration of a cluster lane by running the configured submit, status, and delete commands (qsub_cmd_tpl, qstat_cmd_tpl, qdel_cmd_tpl) to submit test jobs to the cluster lane.
Usage: cryosparcm cluster validate <cluster_lane_name> --projects_dir <abs_path_to_projects_dir>
where
<cluster_lane_name> is the name of the cluster lane configured in CryoSPARC that will be tested
<abs_path_to_projects_dir> is the absolute path to the directory where the validation tests will be run. It is useful to provide the path to a directory where CryoSPARC projects will be stored on disk, as this directory is tested for readability/writability from cluster nodes and will be the output location for files created by the validation tests. The directory does not need to be empty.
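As an illustration of the usage above, the following sketch wraps the command in a small helper that first checks that the projects directory is an existing absolute path. The helper names validate_lane and is_absolute_dir, the lane name slurm-gpu, and the path /data/cryosparc_projects are hypothetical; only the cryosparcm cluster validate invocation itself comes from this guide.

```shell
# Succeeds only if the argument is an absolute path to an existing directory.
is_absolute_dir() {
  case "$1" in
    /*) [ -d "$1" ] ;;
    *)  return 1 ;;
  esac
}

# validate_lane LANE_NAME PROJECTS_DIR
# Hypothetical wrapper: runs the validation only after the path check passes.
validate_lane() {
  lane="$1"
  dir="$2"
  if ! is_absolute_dir "$dir"; then
    echo "projects dir must be an existing absolute path: $dir" >&2
    return 2
  fi
  cryosparcm cluster validate "$lane" --projects_dir "$dir"
}

# Example call (lane name and path are placeholders):
#   validate_lane slurm-gpu /data/cryosparc_projects
```

The local check only catches an obviously wrong path early; readability and writability from the cluster nodes are still tested by the validation itself.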
The output of the jobs submitted during the test is written to test_cluster_qsub.txt and test_cluster_qdel.txt in the projects_dir directory. Check these files manually for correctness once cryosparcm cluster validate finishes.
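One way to review both output files in a single step is sketched below. The function name check_validation_output is hypothetical, and treating a missing or empty file as a problem is an assumption rather than documented behavior; the file contents still need to be read by a person.

```shell
# check_validation_output PROJECTS_DIR
# Prints each validation output file, or reports it as missing or empty.
check_validation_output() {
  dir="$1"
  rc=0
  for name in test_cluster_qsub.txt test_cluster_qdel.txt; do
    file="$dir/$name"
    if [ -s "$file" ]; then
      echo "== $file =="
      cat "$file"
    else
      echo "missing or empty: $file" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example call (path is a placeholder):
#   check_validation_output /data/cryosparc_projects
```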
Example test_cluster_qsub.txt test output:
Example test_cluster_qdel.txt test output for a Slurm cluster:
Cluster Job Status Monitoring
CryoSPARC checks the status of jobs submitted to a cluster lane at a regular interval, using the qstat_cmd_tpl configured for the cluster, until the CryoSPARC job status is updated to running or the job fails to launch with an error. The current status can be found in the CryoSPARC job’s event log as the job is launched.
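The monitoring behavior described above follows a common poll-until-ready pattern, sketched below. This is not CryoSPARC's implementation: the function poll_until_running, its return codes, and the assumption that the status command prints a state containing the word RUNNING are all made up for illustration.

```shell
# poll_until_running STATUS_CMD INTERVAL MAX_RETRIES
# Runs STATUS_CMD every INTERVAL seconds, at most MAX_RETRIES times.
# Returns 0 once the output contains RUNNING, 1 if the status command
# itself fails, and 2 if the retries run out first.
poll_until_running() {
  status_cmd="$1"
  interval="$2"
  max_retries="$3"
  attempt=0
  while [ "$attempt" -lt "$max_retries" ]; do
    attempt=$((attempt + 1))
    state="$($status_cmd 2>/dev/null)" || return 1
    echo "attempt $attempt: $state"
    case "$state" in
      *RUNNING*) return 0 ;;
    esac
    sleep "$interval"
  done
  return 2
}

# Example call with a placeholder Slurm job ID:
#   poll_until_running "squeue -h -o %T -j 12345" 10 100
```

Capping the number of retries is what keeps a job that never starts from being polled forever.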
Example job event log entry for a Slurm cluster job status update:
The interval between status updates and the maximum number of retries can be configured using the CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL and CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES variables in config.sh. By default, cluster jobs are checked every 10 seconds with 1000000 retries.
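With the defaults, monitoring would continue for up to 10 × 1000000 = 10000000 seconds, or roughly 115 days. The sketch below shows how the two variables might be set to a shorter window. The values are hypothetical examples, and the interval is assumed to be in seconds because the default is described as 10 seconds.

```shell
# Hypothetical example values; the variable names come from the text above.
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=30       # assumed to be in seconds
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=2880  # 30 s x 2880 = 24 hours

# Sanity check only, not meant for config.sh:
# upper bound on how long a job is monitored, in whole hours.
max_wait_hours=$(( CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL * CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES / 3600 ))
echo "monitoring stops after about ${max_wait_hours} hours"
```

Only the two export lines would go into config.sh; the arithmetic is there to make the resulting time limit explicit.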