Guide: Installation Testing with cryosparcm test
This guide covers how to use cryosparcm test to verify your CryoSPARC installation is working properly.
The information in this section applies to CryoSPARC v4.0+.
Overview
After installing CryoSPARC using the installation instructions, you can verify that your instance is correctly installed by running cryosparcm test install and cryosparcm test workers from the command line. These commands perform several tests to ensure that users can launch jobs and process data in CryoSPARC.
cryosparcm test install
This function tests the core components of CryoSPARC (HTTP connections, licensing, workers, etc.) that are required to start running jobs and provides information on the status of the CryoSPARC instance (e.g., which version is running, whether a patch is available, etc.).
To run this function, log into a shell on the master node as the user that owns the CryoSPARC instance.
Run cryosparcm test -h for full usage instructions.
Test Checklist
Running cryosparcm test install will test or check the following components:
Test if the cryosparcm test install command is running as the user who owns the CryoSPARC instance.
Test if the cryosparcm test install command is running on the machine that runs the CryoSPARC master instance.
Check if the CryoSPARC instance is turned on.
If this test fails, turn on CryoSPARC by running cryosparcm start and run the command again.
Test if an HTTP connection can be successfully created to the command_core (CRYOSPARC_BASE_PORT + 2) server.
If this test fails, ensure a firewall isn’t blocking access to the ten consecutive ports from CRYOSPARC_BASE_PORT (default 39000, e.g., 39000-39010). For more information, see Open TCP Ports in the Guide.
Check if the environment variable CRYOSPARC_LICENSE_ID is set.
Test if the CryoSPARC License ID is in the correct format.
If this test fails, ensure the CryoSPARC License ID found in cryosparc_master/config.sh is set to the correct license ID.
Check if insecure request mode is enabled or disabled.
This option is controlled by the CRYOSPARC_INSECURE environment variable found in cryosparc_master/config.sh. Enabling this option ignores SSL certificate errors when connecting to HTTPS endpoints. This is useful if you are behind an enterprise network using SSL injection.
Check if the URL to the license server is valid.
The URL can be overridden by the CRYOSPARC_LICENSE_SERVER_ADDR environment variable found in cryosparc_master/config.sh. The default URL is https://get.cryosparc.com.
Check if the CryoSPARC License ID being used is active.
If this test fails, either the CryoSPARC instance wasn’t able to connect to the licensing server, the license isn’t active, or there was a network partition causing data corruption (in which case, trying the command again in a few minutes may fix the issue).
If the instance is having trouble connecting to the licensing server, see License Server Troubleshooting in the Guide. Additionally, if your network is behind an HTTP proxy, see Custom SSL Certificate Authority Bundle in the Guide.
If the license being used is no longer active, request a new CryoSPARC License ID. See Obtaining A License ID in the Guide.
Check the current running version of the CryoSPARC instance.
See the CryoSPARC Changelog.
Check if there is an update available for CryoSPARC.
To update CryoSPARC, run cryosparcm update. For more information, see Software Updates and Patches in the Guide.
Check if there is a patch update available for CryoSPARC.
To patch CryoSPARC, run cryosparcm patch. For more information, see Apply Patches in the Guide.
Check if a worker is connected with at least one GPU.
To add a GPU worker to CryoSPARC, see Connecting A Worker Node in the Guide.
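Some of the checks above can be approximated by hand before running cryosparcm test install. The following Python sketch is not CryoSPARC's own implementation. It assumes that cryosparc_master/config.sh sets its variables with simple export NAME=value lines, that CRYOSPARC_BASE_PORT is among them, and that CRYOSPARC_INSECURE is enabled with the literal value true. The helper names read_exports, summarize, and can_connect are hypothetical. The connection check is a plain TCP connect, which is weaker than the HTTP test that cryosparcm performs.

```python
import re
import socket
from pathlib import Path

DEFAULT_BASE_PORT = 39000  # default CRYOSPARC_BASE_PORT per this guide
DEFAULT_LICENSE_SERVER = "https://get.cryosparc.com"  # default per this guide

# Assumption: config.sh sets variables with simple `export NAME=value` lines.
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_exports(text):
    """Collect NAME -> value pairs from simple export lines."""
    values = {}
    for line in text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            name, raw = match.groups()
            values[name] = raw.strip().strip('"').strip("'")
    return values


def summarize(values):
    """Report the settings that the install test inspects."""
    base_port = int(values.get("CRYOSPARC_BASE_PORT", DEFAULT_BASE_PORT))
    return {
        # Only report whether the ID is set; never print the ID itself.
        "license_id_set": bool(values.get("CRYOSPARC_LICENSE_ID")),
        # Assumption: the literal value "true" enables insecure mode.
        "insecure": values.get("CRYOSPARC_INSECURE", "false").lower() == "true",
        "license_server": values.get(
            "CRYOSPARC_LICENSE_SERVER_ADDR", DEFAULT_LICENSE_SERVER
        ),
        # command_core listens on the base port plus 2.
        "command_core_port": base_port + 2,
    }


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    config = Path("cryosparc_master/config.sh")  # adjust to your install path
    if config.exists():
        info = summarize(read_exports(config.read_text()))
        info["command_core_reachable"] = can_connect(
            "localhost", info["command_core_port"]
        )
        print(info)
```

Run on the master node, this prints a small dictionary of the settings in use. If command_core_reachable is False while the instance is running, a firewall rule is one thing to check.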
cryosparcm test workers
This function tests the workers connected to CryoSPARC to ensure they can correctly run CryoSPARC jobs. It checks whether each worker can launch jobs, cache particles to an SSD (if an SSD is configured), and use the GPU correctly. The test can be run from the command line or directly in the CryoSPARC user interface. Three new jobs have been added to CryoSPARC for this purpose, and they can be run at any time on the lane you’d like to test.
Usage
Run cryosparcm test -h for full usage instructions.
The tests must be run inside a project. If there are no projects in the instance, create one before running this function.
To run all tests on all workers:
run cryosparcm test workers <project_uid> (e.g., cryosparcm test workers P1).
To run only the GPU test on all workers:
run cryosparcm test workers <project_uid> --test gpu (e.g., cryosparcm test workers P1 --test gpu).
To run only the GPU test on a single worker:
run cryosparcm test workers <project_uid> --test gpu --target <workstation_hostname> (e.g., cryosparcm test workers P1 --test gpu --target cryoem1.uoft.ca).
To run only the GPU test (with TensorFlow and PyTorch) on a single worker:
run cryosparcm test workers <project_uid> --test gpu --test-tensorflow --test-pytorch --target <workstation_hostname> (e.g., cryosparcm test workers P1 --test gpu --test-tensorflow --test-pytorch --target cryoem1.uoft.ca).
To run only the GPU test on two workers:
run cryosparcm test workers <project_uid> --test gpu --target <workstation1_hostname> --target <workstation2_hostname> (e.g., cryosparcm test workers P1 --test gpu --target cryoem1.uoft.ca --target cryoem2.uoft.ca).
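When the worker test is started from a script, for example after a driver update, the command variants above can be assembled programmatically. The sketch below is a hedged illustration that uses only the flags shown in this section. The function names build_worker_test_cmd and run_worker_test are hypothetical, and the meaning of the command's exit status is not documented here, so it is returned without interpretation.

```python
import shlex
import subprocess


def build_worker_test_cmd(project_uid, test=None, targets=(),
                          tensorflow=False, pytorch=False):
    """Assemble the argument list for `cryosparcm test workers`."""
    cmd = ["cryosparcm", "test", "workers", project_uid]
    if test:
        cmd += ["--test", test]
    if tensorflow:
        cmd.append("--test-tensorflow")
    if pytorch:
        cmd.append("--test-pytorch")
    for host in targets:
        # One --target flag per worker, as in the two-worker example.
        cmd += ["--target", host]
    return cmd


def run_worker_test(cmd):
    """Run the command on the master node and return its exit status."""
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For example, build_worker_test_cmd("P1", test="gpu", targets=["cryoem1.uoft.ca"]) yields the same arguments as the single-worker GPU example above. As with the command itself, run_worker_test should be called in a shell on the master node as the user that owns the CryoSPARC instance.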
When the worker test is run, a new workspace inside the specified project will be created to contain all test jobs. The workspace will be named with the date and time (UTC) of execution.
If a test job fails, check the job's Event Log and stdout log (joblog) for more details.
Launch Test
The ability to launch jobs will be tested first. This will indicate if the worker is accessible and can correctly run CryoSPARC jobs. If this test fails, it most likely indicates a connection issue between the master and the worker. For more information, see Cannot Queue or Run Job in the Guide.
Note that if a launch test fails on a worker, the SSD and GPU tests will not run on that worker.
SSD Test
If an SSD is configured for a worker, the SSD test will confirm that particle caching is working properly. The test creates five different particle stacks of shape (500, 512, 512) in the project directory and tries to cache them to the SSD.
If the SSD test fails, the reason will be summarized in the test results.
For more information on configuring and troubleshooting an SSD cache for a worker, see SSD Particle Caching in CryoSPARC in the Guide.
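To get a rough idea of how much space the SSD test needs, the stack sizes can be worked out from the shape given above. This sketch assumes 32-bit floating-point pixel values and ignores file headers. Neither detail is stated in this guide, so treat the result as an estimate. The function names are hypothetical.

```python
import math
import shutil

STACK_SHAPE = (500, 512, 512)  # particle stack shape used by the SSD test
NUM_STACKS = 5                 # number of stacks the SSD test creates
BYTES_PER_VALUE = 4            # assumption: 32-bit floating-point pixels


def stack_bytes(shape=STACK_SHAPE, bytes_per_value=BYTES_PER_VALUE):
    """Size in bytes of one stack with the given shape."""
    return math.prod(shape) * bytes_per_value


def total_test_bytes(num_stacks=NUM_STACKS):
    """Approximate size of all test stacks, ignoring file headers."""
    return num_stacks * stack_bytes()


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    needed = total_test_bytes()
    print(f"Test stacks need about {needed / 2**20:.0f} MiB")
    print("Room in current directory:", has_room(".", needed))
```

Under these assumptions each stack is 500 MiB, so the five stacks need about 2500 MiB in the project directory and again on the SSD cache. Pass the cache path to has_room to check the SSD itself.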
GPU Test
The GPU test will collect information about all the GPUs on the worker and test if the worker can compile and run GPU code.
The following information is collected about each GPU via nvidia-smi:
driver_version: GPU driver version
keeping the driver up to date ensures stability
persistence_mode: GPU driver persistence
enabling this reduces GPU driver load times
enable this by running nvidia-smi -pm 1 as root
power_limit: GPU power limit (TDP)
information only
sw_power_limit: software power limiter
if “Active”, this might indicate that the power supply unit (PSU) on the workstation isn’t able to support the power draw from the GPU, or that a power supply cable is faulty or not properly connected to the GPU
if “Active”, this might indicate the GPU temperature is too high
hw_power_limit: hardware power limiter
if “Active”, this might indicate that the power supply unit (PSU) on the workstation isn’t able to support the power draw from the GPU
if “Active”, this might indicate the GPU temperature is too high
compute_mode: current compute mode (Default, Exclusive Process, etc.)
the “default” compute mode allows users to launch multiple GPU jobs onto the same GPU via the Queue modal in the UI. See Queuing Directly To A GPU in the Guide.
the “exclusive process” compute mode prevents a process from obtaining a context from a GPU while another process already has one, useful in anonymous multi-user scenarios
set the compute mode of the GPU by running nvidia-smi -c compute_mode -i target_gpu_id where compute_mode is one of: 0/Default, 1/Exclusive Thread, 2/Prohibited, 3/Exclusive Process
max_pcie_link_gen: maximum PCIe link generation (e.g., PCIe 3 or PCIe 4)
information only
current_pcie_link_gen: current PCIe link generation
information only
this may be equal to or lower than the max_pcie_link_gen, as the GPU automatically switches to a higher link generation under load
temperature: current temperature
information only
gpu_utilization: current utilization
information only
memory_utilization: current memory utilization
information only
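Much of the same information can be collected by hand on a worker with nvidia-smi's query interface. The sketch below is an illustration, not the code CryoSPARC runs. The mapping from the names above to nvidia-smi query properties is an assumption, the two power-limiter flags are left out, and the helper names are hypothetical.

```python
import csv
import io
import subprocess

# nvidia-smi query properties assumed to correspond to the items above.
# The two power-limiter flags are omitted from this sketch.
FIELDS = [
    "driver_version",
    "persistence_mode",
    "power.limit",
    "compute_mode",
    "pcie.link.gen.max",
    "pcie.link.gen.current",
    "temperature.gpu",
    "utilization.gpu",
    "utilization.memory",
]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (v.strip() for v in row))) for row in rows if row]


def query_gpus():
    """Run nvidia-smi on the worker and return the parsed rows."""
    cmd = ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
           "--format=csv,noheader"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(out.stdout)


def hints(gpu):
    """Return advisory notes based on the guidance in this section."""
    notes = []
    if gpu.get("persistence_mode") != "Enabled":
        notes.append("persistence mode is off; enabling it reduces driver load times")
    if gpu.get("pcie.link.gen.current") != gpu.get("pcie.link.gen.max"):
        notes.append("PCIe link is below its maximum, which is normal when idle")
    return notes
```

Calling query_gpus() on a worker returns one dictionary per GPU, and hints() turns each into short notes that mirror the advice in the list above.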
Finally, PyCUDA (and optionally TensorFlow and PyTorch) will be tested to ensure they are working properly. If any of these tests fails, the error will be summarized in the test results. For more information, check the failed job’s Event Log and stdout log (joblog).
Testing TensorFlow and PyTorch
By default, TensorFlow and PyTorch capabilities are not tested during the GPU test. To enable these tests, specify --test-tensorflow and/or --test-pytorch when starting the worker test. For example:
cryosparcm test workers P12 --test-tensorflow --target cryoem9.structura.dev
The PyTorch test will fail if the 3D Flex Refine dependencies were not installed using cryosparcw install-3dflex, which was introduced in CryoSPARC v4.1.0. For more information, see 3D Flex Refine: Installing Dependencies in the Guide.
If TensorFlow or PyTorch is not able to detect all GPUs on your system, the job will fail, and the error message will appear in the job's stdout log (found in the 'Metadata' tab of the Job Dialog).
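When investigating a detection failure, it can help to ask each framework directly how many GPUs it sees. The following is a minimal sketch and not the test job's own code. It assumes it is run with the Python interpreter of the worker's CryoSPARC environment, and the function names are hypothetical.

```python
def detected_gpu_counts():
    """Count the GPUs each framework can see; None if it is not installed."""
    counts = {"pytorch": None, "tensorflow": None}
    try:
        import torch
        counts["pytorch"] = torch.cuda.device_count()
    except ImportError:
        pass
    try:
        import tensorflow as tf
        counts["tensorflow"] = len(tf.config.list_physical_devices("GPU"))
    except ImportError:
        pass
    return counts


def frameworks_missing_gpus(expected, counts):
    """Names of installed frameworks that see fewer GPUs than expected."""
    return sorted(name for name, n in counts.items()
                  if n is not None and n < expected)
```

Compare the result of detected_gpu_counts() with the number of GPUs that nvidia-smi lists on the worker. A framework reported by frameworks_missing_gpus() is the one to look for in the job's stdout log.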