Guide: Cluster Integration Validation
How to validate the configuration of a cluster lane and monitor job status.
The information in this section applies to CryoSPARC v4.0+.
CryoSPARC v4.0 includes two improvements to cluster integration:
- Cluster integrations can be tested and validated after being connected to CryoSPARC by running cryosparcm cluster validate.
- For jobs submitted to cluster lanes, the cluster scheduler is polled at a regular interval until the job enters running status. This prevents jobs that failed to submit or launch in the cluster scheduler from remaining in CryoSPARC indefinitely without ever reaching running status.
For more information on how to connect a cluster scheduler to CryoSPARC, see Connecting a Cluster to CryoSPARC.
The cryosparcm cluster validate command validates the configuration of a cluster lane by submitting test jobs to the lane and running the configured submit, status (qstat_cmd_tpl), and delete (qdel_cmd_tpl) commands against them.
cryosparcm cluster validate <cluster_lane_name> --projects_dir <abs_path_to_projects_dir>
- <cluster_lane_name> is the name of the cluster lane configured in CryoSPARC that will be tested.
- <abs_path_to_projects_dir> is the absolute path to the directory where the validation tests will be run. It is useful to provide the path to a directory where CryoSPARC projects will be stored on disk, because this directory is tested for readability and writability from cluster nodes and is the output location for files created by the validation tests. The directory does not need to be empty.
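The invocation above can be assembled programmatically, which makes it easy to reject a relative path before anything is submitted. The following is a minimal sketch, not part of CryoSPARC: the helper name build_validate_command, the lane name slurm-lane, and the path /data/cryosparc_projects are all hypothetical and chosen only for illustration.

```python
import os
import shlex


def build_validate_command(lane_name, projects_dir):
    """Return the argument list for `cryosparcm cluster validate`.

    Hypothetical helper: raises ValueError if projects_dir is not an
    absolute path, since the command expects an absolute path.
    """
    if not os.path.isabs(projects_dir):
        raise ValueError(f"projects_dir must be an absolute path: {projects_dir!r}")
    return [
        "cryosparcm", "cluster", "validate", lane_name,
        "--projects_dir", projects_dir,
    ]


# Hypothetical lane name and directory, for illustration only.
cmd = build_validate_command("slurm-lane", "/data/cryosparc_projects")

# Print the shell-quoted command instead of running it.
print(shlex.join(cmd))
```

On a machine where cryosparcm is on the PATH, the returned list could be passed to subprocess.run(cmd, check=True) to start the validation.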
The output of the jobs submitted during the test is written to files in the directory given by --projects_dir, including test_cluster_qdel.txt, and should be manually checked for correctness once cryosparcm cluster validate finishes.
Process started 2022-08-25 21:08:58.999810
NODE : slurmnode1
PID : 536
Executing for 5 secs
Process finished 2022-08-25 21:09:04.005305
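One possible sanity check on output like the sample above is to compare the elapsed time between the "Process started" and "Process finished" lines with the duration the job reports. The sketch below assumes the timestamp layout shown in the sample. The helper elapsed_seconds is hypothetical and is not something CryoSPARC provides.

```python
from datetime import datetime

# Sample test-job output, copied from the example above.
SAMPLE = """\
Process started 2022-08-25 21:08:58.999810
NODE : slurmnode1
PID : 536
Executing for 5 secs
Process finished 2022-08-25 21:09:04.005305
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(text):
    """Return seconds between the 'Process started' and 'Process finished' lines."""
    stamps = {}
    for line in text.splitlines():
        for prefix in ("Process started ", "Process finished "):
            if line.startswith(prefix):
                raw = line[len(prefix):].strip()
                stamps[prefix.strip()] = datetime.strptime(raw, TIMESTAMP_FORMAT)
    delta = stamps["Process finished"] - stamps["Process started"]
    return delta.total_seconds()


# The job reports "Executing for 5 secs", so the result should be close to 5.
print(round(elapsed_seconds(SAMPLE), 3))
```

A result far from the reported duration, or a missing "Process finished" line, would suggest that the test job did not run to completion on the node.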
Example test_cluster_qdel.txt test output for a Slurm cluster:
slurmstepd-slurmnode1: error: *** JOB 4 ON slurmnode1 CANCELLED AT 2022-08-25T21:09:03 ***
CryoSPARC checks the status of jobs submitted to a cluster lane at a regular interval, using the qstat_cmd_tpl set in the cluster configuration, until the CryoSPARC job status is updated to running or the job fails to launch with an error. The current status can be found inside a CryoSPARC job’s event log as the job is launched.
Example job event log entry for a Slurm cluster job status update:
The interval between status updates and the maximum number of retries can be configured in config.sh. By default, cluster jobs are checked every 10 seconds, with up to 1000000 retries.
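The polling behaviour described in this section can be pictured as a simple retry loop. The sketch below is an illustration only and is not CryoSPARC's implementation: get_status is a hypothetical callable standing in for a scheduler status query, and the status strings are assumptions. The defaults mirror the documented 10-second interval and 1000000 retries.

```python
import time


def wait_until_running(get_status, interval=10, max_retries=1000000, sleep=time.sleep):
    """Poll get_status() until it returns "running".

    get_status is a hypothetical callable that returns a status string
    and may raise if the job failed to launch; that error propagates.
    Returns the number of polls made. Raises TimeoutError when
    max_retries polls pass without the job reaching "running".
    """
    for attempt in range(1, max_retries + 1):
        if get_status() == "running":
            return attempt
        sleep(interval)
    raise TimeoutError(f"job not running after {max_retries} status checks")


# Demonstration with canned statuses and no real waiting (interval=0).
statuses = iter(["pending", "pending", "running"])
polls = wait_until_running(lambda: next(statuses), interval=0)
print(f"job reported running after {polls} status checks")
```

Passing the sleep function as a parameter keeps the loop easy to exercise without waiting for real intervals.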