# Guide: Installation Testing with cryosparcm test

{% hint style="warning" %}
The information in this section applies to CryoSPARC v4.0+.
{% endhint %}

## Overview

After installing CryoSPARC [using the instructions here](https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure), you can verify that your instance is correctly installed by running `cryosparcm test install` and `cryosparcm test workers` from the command line. These commands perform several tests that ensure users can seamlessly launch jobs and process data in CryoSPARC.

## `cryosparcm test install`

This function tests the core components of CryoSPARC (HTTP connections, licensing, workers, etc.) that are required to start running jobs and provides information on the status of the CryoSPARC instance (e.g., which version is running, whether a patch is available, etc.).

To run this function, log into a shell on the master node as the user that owns the CryoSPARC instance.

Run `cryosparcm test -h` for full usage instructions.

### Example Output

```
cryosparcuser@uoft ~/ $ cryosparcm test install
✓ Running as CryoSPARC owner cryosparcuser
✓ Running on master node uoft
✓ CryoSPARC is running
✓ CRYOSPARC_LICENSE_ID environment variable is set
✓ Insecure mode is disabled
✓ License server set to "https://get.cryosparc.com"
✓ Connection to license server succeeded
✓ License server returned success status code 200
✓ License server returned valid JSON response
✓ License exists and is valid
✓ CryoSPARC is running v5.0.0
✓ Develop version - no updates available
✓ Admin user exists
✓ GPU worker connected
```

### Test Checklist

Running `cryosparcm test install` will test or check the following components:

1. Test if the `cryosparcm test install` command is running as the user who owns the CryoSPARC instance.
2. Test if the `cryosparcm test install` command is running on the machine that runs the CryoSPARC master instance.
3. Check if the CryoSPARC instance is turned on.
   * If this test fails, turn on CryoSPARC by running `cryosparcm start` and run the command again.
4. Test if an HTTP connection can be successfully created to the `api` (`CRYOSPARC_BASE_PORT`+2) server.
   * If this test fails, ensure a firewall isn’t blocking access to the ten consecutive ports starting from `CRYOSPARC_BASE_PORT` (default 61000, i.e., 61000–61009). For more information, see [Open TCP Ports](https://guide.cryosparc.com/setup-configuration-and-management/cryosparc-installation-prerequisites#4.-open-tcp-ports) in the Guide.
5. Check if the environment variable `CRYOSPARC_LICENSE_ID` is set.
6. Test if the CryoSPARC License ID is in the correct format.
   1. If this test fails, ensure the CryoSPARC License ID found in `cryosparc_master/config.sh` is set to the correct license ID.
7. Check if insecure request mode is enabled or disabled.
   1. This option is controlled by the `CRYOSPARC_INSECURE` environment variable found in `cryosparc_master/config.sh`.
   2. Enabling this option ignores SSL certificate errors when connecting to HTTPS endpoints. This is useful if you are behind an enterprise network using SSL injection.
8. Check if the URL to the license server is valid.
   1. The URL can be overridden by the `CRYOSPARC_LICENSE_SERVER_ADDR` environment variable found in `cryosparc_master/config.sh`.
   2. The default URL is <https://get.cryosparc.com>
9. Check if the CryoSPARC License ID being used is active.
   1. If this test fails, either the CryoSPARC instance couldn’t connect to the licensing server, the license is no longer active, or a transient network error corrupted the response (in which case, running the command again in a few minutes may fix the issue).
   2. If the instance is having trouble connecting to the licensing server, see [License Server Troubleshooting](https://guide.cryosparc.com/setup-configuration-and-management/troubleshooting#license-error-or-license-not-found) in the Guide. Additionally, if your network is behind an HTTP proxy, see [Custom SSL Certificate Authority Bundle](https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/accessing-cryosparc#appendix-d-custom-ssl-certificate-authority-bundle) in the Guide.
   3. If the license being used is no longer active, request a new CryoSPARC License ID. See [Obtaining A License ID](https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/obtaining-a-license-id) in the Guide.
10. Check the current running version of the CryoSPARC instance.
    1. See the [CryoSPARC Changelog](https://cryosparc.com/updates).
11. Check if there is an update available for CryoSPARC.
    1. To update CryoSPARC, run `cryosparcm update`. For more information, see [Software Updates and Patches](https://guide.cryosparc.com/setup-configuration-and-management/software-updates) in the Guide.
12. Check if there is a patch update available for CryoSPARC.
    1. To patch CryoSPARC, run `cryosparcm patch`. For more information, see [Apply Patches](https://guide.cryosparc.com/setup-configuration-and-management/software-updates#apply-patches) in the Guide.
13. Check if a worker is connected with at least one GPU.
    1. To add a GPU worker to CryoSPARC, see [Connecting A Worker Node](https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc#connecting-a-worker-node) in the Guide.
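
Several of these checks come down to network reachability. As a rough manual counterpart to check 4, the port range can be probed from the master node with a short bash snippet (a hedged sketch: it assumes the default `CRYOSPARC_BASE_PORT` of 61000 — read the real value from `cryosparc_master/config.sh` — and uses bash's `/dev/tcp` feature):

```shell
# Probe the ten consecutive ports CryoSPARC expects to be reachable.
probe_ports() {
  local base=${1:-61000}
  for port in $(seq "$base" $(( base + 9 ))); do
    if (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null; then
      echo "port $port: open"
    else
      echo "port $port: closed or blocked"
    fi
  done
}

probe_ports "${CRYOSPARC_BASE_PORT:-61000}"
```

A port reported as "closed or blocked" here is not necessarily a problem on the master itself, but every port in the range must be reachable from client machines and workers.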

## `cryosparcm test workers`

This function tests the workers connected to CryoSPARC to ensure they can correctly run jobs: it checks that each worker can launch jobs, cache particles to an SSD (if one is configured), and use its GPUs correctly. The test can be run via the command line or directly in the CryoSPARC user interface; three jobs have been added to CryoSPARC for this purpose and can be run at any time on the lane you’d like to test.

<figure><img src="https://1916621962-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M7DGv3GkRvGGpbVPCgg%2Fuploads%2FhyXVEHcIEbHZufQ1X7jP%2Fv4-0-0-installation-testing-joblist-0.png?alt=media&#x26;token=80676764-5691-4b91-a6ea-75ee8d4c5534" alt=""><figcaption><p>You can find the jobs used for worker tests in the "Instance Testing Utilities" section of the job builder.</p></figcaption></figure>

### Usage

Run `cryosparcm test --help` for full usage instructions.

The tests must be run inside a project. If there are no projects in the instance, create one before running this function.

To run all tests on all workers:

* run `cryosparcm test workers <project_uid> --test all`
* (e.g., `cryosparcm test workers P1 --test all`)

To run only the GPU test on all workers:

* run `cryosparcm test workers <project_uid> --test gpu`
* (e.g., `cryosparcm test workers P1 --test gpu`).

To run only the GPU test on a single worker:

* run `cryosparcm test workers <project_uid> --test gpu --target <workstation_hostname>`
* (e.g., `cryosparcm test workers P1 --test gpu --target cryoem1.uoft.ca`)

To run only the GPU test (with Tensorflow and PyTorch) on a single worker:

* run `cryosparcm test workers <project_uid> --test gpu --test-tensorflow --test-pytorch --target <workstation_hostname>`
* (e.g., `cryosparcm test workers P1 --test gpu --test-tensorflow --test-pytorch --target cryoem1.uoft.ca`)

To run only the GPU test on two workers:

* run `cryosparcm test workers <project_uid> --test gpu --target <workstation1_hostname> --target <workstation2_hostname>`
* (e.g., `cryosparcm test workers P1 --test gpu --target cryoem1.uoft.ca --target cryoem2.uoft.ca`)
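
For routine health checks, the two tests can be chained in a small script (a sketch; `P1` is an example project UID):

```shell
# Run the install check first, then all worker tests in project P1;
# `set -e` stops the script as soon as either command fails.
set -e
cryosparcm test install
cryosparcm test workers P1 --test all
```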

### Example Output

*Some text removed for readability.*

```
cryosparcuser@uoft ~/ $ cryosparcm test workers P1
Using project P1
Running worker tests...
Worker test results
cryoem3
  ✓ LAUNCH
  ✓ SSD
  ✓ GPU
cryoem2
  ✓ LAUNCH
  ✓ SSD
  ✓ GPU
cryoem5
  ✓ LAUNCH
  ✕ SSD
    Error: [Errno 13] Permission denied: '/scratch'
    See P1 J1211 for more information
  ⚠ GPU
    No GPU available
cryoem6
  ✕ LAUNCH
    Error: 
    See P1 J1203 for more information
  ⚠ SSD
    Did not run: Launch test failed
  ⚠ GPU
    Did not run: Launch test failed
cryoem1
  ✓ LAUNCH
  ✓ SSD
  ✓ GPU
    ⚠ RTX A6000 @ 00000000:03:00.0: Persistence Mode is Disabled. 
      Enable Persistence mode by running `nvidia-smi -pm 1` as root to persist 
      the NVIDIA driver, reducing GPU load times.
    ⚠ RTX A6000 @ 00000000:03:00.0: GPU Software Power Cap is Active
    ⚠ RTX A6000 @ 00000000:21:00.0: Persistence Mode is Disabled. 
      Enable Persistence mode by running `nvidia-smi -pm 1` as root to persist 
      the NVIDIA driver, reducing GPU load times.
    ⚠ RTX A6000 @ 00000000:21:00.0: GPU Software Power Cap is Active
    ⚠ GeForce RTX 3090 @ 00000000:4C:00.0: Persistence Mode is Disabled. 
      Enable Persistence mode by running `nvidia-smi -pm 1` as root to persist 
      the NVIDIA driver, reducing GPU load times.
cryoem7
  ✓ LAUNCH
  ✓ SSD
  ✓ GPU
cryoem9
  ✓ LAUNCH
  ✓ SSD
  ✕ GPU
    Error: Tensorflow detected 0 of 7 GPUs.
    See P1 J1222 for more information
cryoem10
  ✓ LAUNCH
  ✓ SSD
  ✓ GPU
```

When the worker test is run, a new workspace inside the specified project will be created to contain all test jobs. The workspace will be named with the date and time (UTC) of execution.

<figure><img src="https://1916621962-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M7DGv3GkRvGGpbVPCgg%2Fuploads%2Fnz70SblfQ6EPxuHudym0%2Fv4-0-0-installation-testing-workspace-1.png?alt=media&#x26;token=c1d0d9b5-6c12-4ba3-8c81-7712f0251eba" alt=""><figcaption><p>Workspace card of an instance testing run.</p></figcaption></figure>

{% hint style="info" %}
If a test job fails, check the job's Event Log and [stdout log (joblog)](https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring/cryosparcm?q=queuing+#cryosparcm-joblog-px-jxx) for more details.
{% endhint %}

### Launch Test

The ability to launch jobs will be tested first. This will indicate if the worker is accessible and can correctly run CryoSPARC jobs. If this test fails, it most likely indicates a connection issue between the master and the worker. For more information, see [Cannot Queue or Run Job](https://guide.cryosparc.com/setup-configuration-and-management/troubleshooting#cannot-queue-or-run-job) in the Guide.
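
In the standard setup the master launches worker jobs over SSH, so a quick manual check of non-interactive SSH from the master to the worker can help narrow down a launch failure (`cryoem6` is an example hostname; substitute your worker's):

```shell
# Non-interactive SSH check from the master node to a worker.
# BatchMode fails immediately instead of prompting for a password,
# mirroring what automated job launching requires.
ssh -o BatchMode=yes -o ConnectTimeout=5 cryoem6 'echo SSH OK'
```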

Note that if a launch test fails on a worker, the SSD and GPU tests will not run:

```
cryoem6
  ✕ LAUNCH
    Error: ssh: connect to host cryoem6 port 22: No route to host
    See P1 J1203 for more information
  ⚠ SSD
    Did not run: Launch test failed
  ⚠ GPU
    Did not run: Launch test failed
```

### SSD Test

If an SSD is configured for a worker, the SSD test will confirm that particle caching is working properly. The test creates five different particle stacks of shape `(500, 512, 512)` in the project directory, and tries to cache them to the SSD.
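
The sizes in the log below follow directly from that shape: 500 particles × 512 × 512 pixels × 4 bytes per pixel (assuming float32 data, which matches the ~500 MB per file reported):

```shell
# Back-of-envelope size of the SSD test data (assumes float32 pixels).
n_stacks=5; n_particles=500; box=512; bytes_per_px=4
per_stack_mb=$(( n_particles * box * box * bytes_per_px / 1024 / 1024 ))
total_mb=$(( n_stacks * per_stack_mb ))
echo "per stack: ${per_stack_mb} MB, total: ${total_mb} MB"
```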

```
Testing SSD

Generating a 500 particle stack with shape (512, 512).

Writing particle stack 1/5... Done in 1.517s
Writing particle stack 2/5... Done in 1.329s.
Writing particle stack 3/5... Done in 1.290s.
Writing particle stack 4/5... Done in 1.221s.
Writing particle stack 5/5... Done in 1.219s.

Loading a ParticleStack with 5 items...
 SSD cache : cache successfully synced in_use
 SSD cache : cache successfully synced, found 233127.92MB of files on SSD.
 SSD cache : cache successfully requested to check 5 files.
 SSD cache : cache requires 2500.00MB more on the SSD for files to be downloaded.
 SSD cache : cache has enough available space.

 Transferring J33/data/simulated_particles_4.mrc (500 MB) (5/5)
  Complete      :         2500 MB (100.00%)
  Total         :         2500 MB
  Current Speed :    1133.22 MB/s
  Average Speed :    1089.09 MB/s
  ETA           :      0h  0m  0s

 SSD cache : complete, all requested files are available on SSD.
Done.

Cleaning up testing data...
SSD Test completed successfully.
```

If an SSD Test fails for any reason, the reason will be summarized in the test results:

```
cryoem5
  ✓ LAUNCH
  ✕ SSD
    Error: [Errno 13] Permission denied: '/scratch'
    See P1 J1211 for more information
  ⚠ GPU
    No GPU available
```

For more information on configuring and troubleshooting an SSD cache for a worker, see [SSD Particle Caching in CryoSPARC](https://guide.cryosparc.com/setup-configuration-and-management/software-system-guides/tutorial-ssd-particle-caching-in-cryosparc) in the Guide.
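
A permission error like the one above can be reproduced outside CryoSPARC with a minimal writability check, run as the CryoSPARC user on the worker (`/scratch` is just the path from the example; substitute your worker's configured cache directory):

```shell
# Report whether a directory is writable by the current user.
check_cache_writable() {
  local dir="$1"
  if touch "$dir/.cs_write_test" 2>/dev/null; then
    rm -f "$dir/.cs_write_test"
    echo "$dir: writable"
  else
    echo "$dir: not writable"
  fi
}

check_cache_writable /scratch
```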

### GPU Test

The GPU test will collect information about all the GPUs on the worker and test if the worker can compile and run GPU code.

The following information is collected about each GPU via `nvidia-smi`:

* `driver_version`: GPU driver version
  * keeping the driver up to date ensures stability
  * [NVIDIA Driver Downloads](https://www.nvidia.com/Download/index.aspx)
* `persistence_mode`: GPU driver persistence
  * [NVIDIA Docs: Driver Persistence](https://docs.nvidia.com/deploy/driver-persistence/index.html)
  * enabling this reduces GPU driver load times
  * enable this by running `nvidia-smi -pm 1` as root
* `power_limit`: GPU power limit (TDP)
  * information only
* `sw_power_limit`: software power limiter
  * if “Active”, this might indicate the power supply unit (PSU) on the workstation isn’t able to support the power draw from the GPU, or if a power supply cable is faulty or not properly connected to the GPU
  * if “Active”, this might indicate the GPU temperature is too high
* `hw_power_limit`: hardware power limiter
  * if “Active”, this might indicate the power supply unit (PSU) on the workstation isn’t able to support the power draw from the GPU
  * if “Active”, this might indicate the GPU temperature is too high
* `compute_mode`: current compute mode (Default, Exclusive Process, etc.)
  * the “default” compute mode allows users to launch multiple GPU jobs onto the same GPU via the Queue modal in the UI. See [Queuing Directly To A GPU](https://guide.cryosparc.com/setup-configuration-and-management/software-system-guides/tutorial-queuing-directly-to-a-gpu?q=queuing+) in the Guide.
  * the “exclusive process” compute mode prevents a process from obtaining a context from a GPU while another process already has one, useful in anonymous multi-user scenarios
  * set the compute mode of the GPU by running `nvidia-smi -c compute_mode -i target_gpu_id` where `compute_mode` is one of:
    * 0/Default, 1/Exclusive Thread, 2/Prohibited, 3/Exclusive Process
* `max_pcie_link_gen`: maximum PCIe link generation (e.g., PCIe 3 or PCIe 4)
  * information only
* `current_pcie_link_gen`: current PCIe link generation
  * information only
  * this may be lower than `max_pcie_link_gen` while the GPU is idle, as the GPU automatically switches to a higher link generation under load
* `temperature`: current temperature
  * information only
* `gpu_utilization`: current utilization
  * information only
* `memory_utilization`: current memory utilization
  * information only

Example data:

```
Obtaining GPU info via `nvidia-smi`...

NVIDIA GeForce RTX 3090 @ 00000000:01:00.0
    driver_version                :510.68.02
    persistence_mode              :Enabled
    power_limit                   :350.00
    sw_power_limit                :Not Active
    hw_power_limit                :Not Active
    compute_mode                  :Default
    max_pcie_link_gen             :4
    current_pcie_link_gen         :1
    temperature                   :25
    gpu_utilization               :0
    memory_utilization            :0

NVIDIA A100-PCIE-40GB @ 00000000:61:00.0
    driver_version                :510.68.02
    persistence_mode              :Enabled
    power_limit                   :250.00
    sw_power_limit                :Not Active
    hw_power_limit                :Not Active
    compute_mode                  :Default
    max_pcie_link_gen             :4
    current_pcie_link_gen         :4
    temperature                   :33
    gpu_utilization               :0
    memory_utilization            :0

Starting PyCuda GPU test on: NVIDIA A100-PCIE-40GB @ 0000:61:00.0
    PyCuda was compiled with CUDA: (11, 2, 0)
Finished PyCuda GPU test in 0.026s

Testing Tensorflow...
    Tensorflow found 4 GPUs.
Tensorflow test completed in 3.385s
```
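
A comparable per-GPU report can be produced manually with `nvidia-smi`'s query interface; the field names below come from NVIDIA's `--query-gpu` documentation and cover most of the list above (run `nvidia-smi --help-query-gpu` on the worker for the full set):

```shell
# One CSV row per GPU with a subset of the fields the GPU test collects.
nvidia-smi --format=csv --query-gpu=name,driver_version,persistence_mode,\
power.limit,compute_mode,pcie.link.gen.max,pcie.link.gen.current,\
temperature.gpu,utilization.gpu,utilization.memory
```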

Finally, PyCUDA (and optionally Tensorflow and PyTorch) will be tested to ensure they are working properly. If any of these tests fail, the error will be summarized in the test results. For more information, check the failed job’s Event Log and [stdout log (joblog)](https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring/cryosparcm#cryosparcm-joblog-px-jxx).

```
cryoem9
  ✓ LAUNCH
  ✓ SSD
  ✕ GPU
    Error: Tensorflow detected 0 of 7 GPUs.
    See P1 J1222 for more information
```

#### Testing Tensorflow and PyTorch

By default, Tensorflow and PyTorch capabilities are not tested during the GPU test. To enable these tests, specify `--test-tensorflow` and/or `--test-pytorch` when starting the worker test. For example:

`cryosparcm test workers P12 --test-tensorflow --target cryoem9.structura.dev`

{% hint style="info" %}
The PyTorch test will fail if the 3D Flex Refine dependencies were not installed using `cryosparcw install-3dflex`, introduced in CryoSPARC v4.1.0. For more information, see \<Link to 3D Flex Refine: Installing Dependencies>
{% endhint %}

If Tensorflow or PyTorch was not able to detect all GPUs on your system, the job will fail, and the error message will appear in the job's stdout log (found in the 'Metadata' tab of the Job Dialog).
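
When chasing a detection failure like the one above, it can help to ask Tensorflow directly which GPUs it sees from inside the worker environment. One way to do this (a sketch assuming the worker's `cryosparcw call` wrapper, which runs a command inside the CryoSPARC worker environment; the one-liner itself is standard Tensorflow API) is:

```shell
# Print the GPUs visible to Tensorflow inside the worker environment.
cryosparcw call python -c \
  'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
```

An empty list here, on a machine where `nvidia-smi` sees GPUs, points at an environment problem (e.g., driver/CUDA mismatch) rather than a hardware one.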
