Troubleshooting
Overview of common issues and advice on how to resolve them.
Unless otherwise noted:
Log in to the workstation or remote node where cryosparc_master is installed.
Use the same non-root UNIX user account that runs the CryoSPARC process and was used to install CryoSPARC.
Run all commands on this page in a terminal running bash.
In v4.0+, you can download error reporting information from within the application. For more details, see: Guide: Download Error Reports
Common Issues
Cannot download or install CryoSPARC
Problems with the installation steps are indicated by some of the following error messages:
"Couldn't connect to host"
"Could not resolve host"
{"success": false}
"tar: This does not look like a tar archive"
"Version mismatch! Worker and master versions are not the same. Please update."
"An unexpected error has occurred."
Steps
If you have Conda installed, deactivate any active environments.
Check that your LICENSE_ID environment variable is set correctly by printing it with echo $LICENSE_ID. Ensure the output exactly matches the CryoSPARC License ID issued to you over email.
Check your machine's connection to CryoSPARC's license servers at get.cryosparc.com with this curl command:
You should see the message {"success": true}. If instead you see {"success": false}, your license is not valid, so check that it has been entered correctly. If you see an error message like "Couldn't connect to host" or "Could not resolve host", check your Internet connection and firewall, or ensure your IT department has the get.cryosparc.com license server domain whitelisted.
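The outcomes described in these steps can be told apart mechanically. The following is a minimal sketch, not part of CryoSPARC: license_id_looks_set and classify_license_response are hypothetical helper names, and the request address is deliberately left as a placeholder variable (CHECK_URL) that you would set to the license-check address given in the installation guide.

```shell
# Hypothetical helpers (not part of CryoSPARC) for the checks described above.

# Succeeds only if LICENSE_ID is set, non-empty and free of whitespace.
license_id_looks_set() {
  case "${LICENSE_ID-}" in
    ''|*[[:space:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Maps the text returned by the license-check request to a short diagnosis.
classify_license_response() {
  case "$1" in
    *'"success": true'*|*'"success":true'*)   echo "license-valid" ;;
    *'"success": false'*|*'"success":false'*) echo "license-invalid" ;;
    *"Couldn't connect to host"*|*"Could not resolve host"*) echo "network-problem" ;;
    *) echo "unknown" ;;
  esac
}

# Example use; CHECK_URL is a placeholder you must set yourself:
#   license_id_looks_set || echo "LICENSE_ID is missing or contains whitespace"
#   classify_license_response "$(curl -sS "$CHECK_URL" 2>&1)"
```

A "network-problem" result points at the connection, firewall or allow-list rather than at the license itself.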
Cannot update CryoSPARC
Steps
Reinstall CryoSPARC, following the steps from the "Cannot download or install CryoSPARC" section.
CryoSPARC does not start or encounters error on startup
This can happen following a fresh install or recent update.
Steps
In a command line, run
cryosparcm status
Check that the output looks like this
Check that all items under "CryoSPARC process status" that do not end in _dev or _legacy are RUNNING. If any are not, run cryosparcm restart
If any non-_dev/non-_legacy components have a status other than RUNNING (such as STOPPED or EXITED), check their log for errors. For example, this command checks for errors on the database process:
(Press Control + C, then q to stop logging)
If the web interface is inaccessible, check firewall settings to ensure CryoSPARC's base port number (default 39000) is exposed for network access
Any error messages here could indicate specific configuration issues and may require re-installing CryoSPARC.
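The status check above can be sketched as a small filter. This is an illustration only: list_unhealthy is a hypothetical helper, and it assumes that each process line in the status output has the form "name STATE details", with STATE being a supervisord-style state such as RUNNING, STOPPED or EXITED. Adjust it if your output differs.

```shell
# Hypothetical helper: reads status text on stdin and prints "<name> <state>"
# for every process that is not RUNNING, skipping names that end in
# _dev or _legacy. The line format is an assumption (see the lead-in).
list_unhealthy() {
  awk '
    $2 ~ /^(RUNNING|STOPPED|EXITED|STARTING|BACKOFF|FATAL|STOPPING|UNKNOWN)$/ &&
    $1 !~ /(_dev|_legacy)$/ &&
    $2 != "RUNNING" { print $1, $2 }
  '
}

# Example use:
#   cryosparcm status | list_unhealthy
```

An empty result suggests that all relevant processes are running; any printed name is a candidate for a log check.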
If at any point you see No command 'cryosparcm' found or command not found: cryosparcm:
Check that you are on the master node or workstation where cryosparc_master is installed
Run echo $PATH and check that it contains <installation directory>/cryosparc_master/bin
Reinstall CryoSPARC if the above did not restore the cryosparcm command
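Reading a long PATH by eye is error-prone, so the PATH check can be sketched as follows. path_has_dir is a hypothetical helper name, and the directory in the usage line is a placeholder for your own installation directory.

```shell
# Hypothetical helper: succeeds if the given directory is one of the
# colon-separated entries of PATH (exact match, no trailing slash handling).
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (replace the placeholder with your installation directory):
#   path_has_dir "/path/to/cryosparc_master/bin" || echo "cryosparc_master/bin is not on PATH"
```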
'User not found' error when attempting to log in
This error message occurs if the email address field does not match any existing users in your CryoSPARC instance. Use the CryoSPARC command-line to verify the details of your user account and change the email address or password if needed.
Run the following command in your terminal:
cryosparcm listusers
If an email address is incorrect (e.g., misspelled or with an extra space at the beginning or end), modify it in the database. Run the following commands:
Log into the MongoDB shell:
cryosparcm mongo
Once in the MongoDB shell, enter the following (replace the incorrect/correct email):
Exit the MongoDB shell with exit
If you don't remember your password, reset it with the following command (replace with your email address and new password):
Incomplete CryoSPARC shutdown
An incomplete shutdown of CryoSPARC is likely to interfere with subsequent attempts to start CryoSPARC and/or CryoSPARC software updates. Incomplete shutdowns can occur for various reasons, including, but not limited to:
unclean shutdown of the computer that runs cryosparc_master processes
failed coordination of services by cryosparc_master's supervisord process
Follow this sequence to ensure a complete shutdown of CryoSPARC
1. Basic shutdown
For CryoSPARC instances that were not configured as a systemd service, run the command cryosparcm stop
Do not use cryosparcm stop for CryoSPARC instances that are controlled by systemd. For such instances, use the appropriate systemctl stop command.
2. Find and, if necessary, terminate "zombie" processes
Confirm that the basic shutdown did not "leave behind" any CryoSPARC-related processes. If the basic shutdown was successful, a suitable ps command should not show any processes for the CryoSPARC instance in question, but processes may be shown if
a glitch occurred during the basic shutdown or
the computer hosts multiple CryoSPARC instances.
To illustrate what kind of processes one might encounter, here is an example command and its output for a running CryoSPARC v4.4 instance:
This is a simple example. More complex configurations, such as a host with multiple active CryoSPARC instances, may require different ps options and/or grep patterns.
The ps output may include processes that belong to non-CryoSPARC applications or to CryoSPARC instances other than the CryoSPARC instance that you wish to shut down. Parent process identifiers and port numbers in the listed commands can help in attributing processes to a common parent supervisord process. Carefully confirm the purpose and identity of any process before termination.
For the example above, it should be sufficient to kill the supervisord process using the process identifier shown by the ps command and wait a few seconds for the supervisord process' children to be terminated automatically.
Never use the kill -9 option for mongod processes.
Finally, using another ps command with suitable options, re-confirm that all relevant processes have in fact been terminated.
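The process search in this step can be sketched with two small filters over ps output. Both function names are hypothetical, the grep patterns are assumptions that may need adjusting for your host, and the sketch only lists processes; it never terminates anything.

```shell
# Hypothetical helpers for inspecting `ps -eo pid,ppid,args` output.

# Keeps lines that mention cryosparc, mongod or supervisord and drops the
# grep processes themselves. The patterns are assumptions, not a complete list.
filter_instance_procs() {
  grep -E 'cryosparc|mongod|supervisord' | grep -v 'grep'
}

# Keeps lines whose parent process ID (second column) equals the argument,
# which helps attribute children to one supervisord process.
children_of() {
  awk -v parent="$1" '$2 == parent'
}

# Example use:
#   ps -eo pid,ppid,args | filter_instance_procs
#   ps -eo pid,ppid,args | children_of <supervisord PID>
```

Running the first command again after termination should print nothing for the instance in question.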
3. Only under certain circumstances, delete "orphaned" socket files
An intact CryoSPARC instance manages the creation and deletion of socket files for mongod and supervisord. Filenames differ between CryoSPARC instances, for example based on $CRYOSPARC_DB_PORT.
Socket files should be deleted only under specific circumstances, subject to precautions given below.
Never delete socket files before confirming that associated processes have been terminated as described in the previous step.
The computer may store socket files that belong to non-CryoSPARC applications or to CryoSPARC instances other than the CryoSPARC instance that you wish to shut down. Such socket files may have names similar to the files you wish to delete. Carefully confirm the purpose and identity of each file before any deletion.
License error or license not found
Follow the steps in this section when you see error messages that look like this:
"License is invalid."
"License signature invalid."
"Could not find local license file. Please re-establish your connection to the license servers."
"Local license file is expired. Please re-establish your connection to the license servers."
"Token is invalid. Another CryoSPARC instance is running with the same license ID."
Steps
Run cryosparcm licensestatus. This should result in "License is valid". If you see this error:
Your license ID is not configured correctly. Check the CRYOSPARC_LICENSE_ID entry in cryosparc_master/config.sh.
If you see this error:
This includes a list of checks, the last of which will indicate a failure. Depending on which check failed, do one of the following:
Ensure you entered the license correctly during the installation step.
Check your Internet connection
Check your machine's connection to CryoSPARC's license servers at get.cryosparc.com with this curl command (substitute <license> with your unique license ID):
Look for the message {"success": true}
If instead you see {"success": false}, your license is not valid, so check that it has been entered correctly.
If you see an error message like "Couldn't connect to host" or "Could not resolve host", check your Internet connection and firewall, or ensure your IT department has the get.cryosparc.com license server domain whitelisted.
If you see a license ID conflict such as "Another cryoSPARC instance is running with the same license ID.", follow the complete shutdown procedure (see Incomplete CryoSPARC shutdown) before starting CryoSPARC again.
Cannot queue or run job
Follow these steps when the CryoSPARC web interface is up and running normally and jobs may be created but do not run. These error messages may indicate this issue:
"list index out of range"
"Could not resolve hostname ... Name or service not known"
A job that never changes from Queued or Launched status may also indicate this.
Steps
Ensure at least one worker is connected to the master. See the Installation page for details. Visit Manage > Resources to see what lanes are available
Check that all non-development CryoSPARC processes are running with the cryosparcm status command
(For master/worker setups) check that SSH is configured between the master and worker machines.
Check the command_core log to find any application error messages
(press Control + C on the keyboard followed by q to exit when finished)
If applicable, check that the cluster submission script is correct
Refresh job types and reload CryoSPARC:
Restart CryoSPARC:
Clear the job and re-run it.
Job stuck in launched status
This indicates that CryoSPARC started the job process but the job encountered an internal error immediately after.
Steps
Check the job log for errors, either in the interface or from the command line with cryosparcm joblog PX JY, substituting PX and JY for the project and job IDs, respectively.
Typically the errors here occur when the worker process cannot connect back to the master. Ensure there is a stable network connection between all machines involved. Ensure CryoSPARC was installed correctly and re-install if necessary.
Job runs but ends unexpectedly with status "Failed"
When a job fails, its job card in the interface turns red and the bottom of the job log includes an error message with the text Traceback (most recent call last)
Common failure reasons include:
Invalid or unspecified input slots
Invalid or unspecified required parameters, including file/folder paths
Incorrectly set up GPU (e.g., running a job on a node without enough GPUs or CUDA drivers not installed)
Another process taking up memory on a GPU
Cache not set up correctly for a worker
Lost connection to cryosparc_master
Common job failure error messages:
"AssertionError: Child process with PID ... has terminated unexpectedly!"
"Job is unresponsive - no heartbeat received in 30 seconds."
Common error messages that indicate incorrectly configured GPU drivers:
"cuInit failed: unknown error"
"no CUDA-capable device is detected"
"cuMemHostAlloc failed: OS call failed or operation not supported on this OS"
"cuCtxCreate failed: invalid device ordinal"
kernel.cu ... error: identifier "__shfl_down_sync" is undefined
Common error messages that indicate not enough GPU memory:
"cuMemAlloc failed: out of memory"
"cuArrayCreate failed: out of memory"
"cufftAllocFailed"
Steps
Ensure a supported version of the CUDA toolkit is installed and running on the workstation or each worker node
Check the GPU configuration on the workstation or node where the job runs. Log into that machine and navigate to the CryoSPARC installation directory. Run the cryosparcw gpulist command
Run nvidia-smi to check that no other processes are using GPU memory. CryoSPARC-related processes appear with process name "python"
If you don't recognize the processes using GPU memory, run kill <PID>, substituting <PID> with the value under the Processes PID column
Clear the job and select the "Build" status badge on the job card to enter the Job Builder
Check the job parameters: To learn about setting specific parameters, hover over or touch the parameter names in the Job Builder to see a description of what they do
Find the target job type in this guide's Job Reference for more detailed descriptions of expected input slots and parameters: All Job Types in CryoSPARC.
Reduce the box-size of extracted particles. Some jobs need to fit thousands of particles in GPU memory at a time, and larger box sizes exceed GPU memory limits. Either extract with a smaller box size or with the Downsample Particles job.
Refresh job types and reload CryoSPARC:
Look for extended error information with the cryosparcm joblog command (press Ctrl + C on the keyboard to exit when finished)
Check the network connection from the worker machine to the master
On occasion, a job fails due to an error in the CryoSPARC code (bug). The CryoSPARC team regularly releases updates and patches with bug fixes. Check for and install the latest update or patch. If you find a new bug, see the Additional Help section for advice.
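Two of the checks in this section can be sketched in shell: sorting an error message into the driver/configuration or memory group listed earlier in this section, and picking out GPU processes that are not named "python". Both function names are hypothetical, the match patterns cover only the messages quoted on this page, and the nvidia-smi line in the usage comment relies on its CSV query output of the form "pid, name, memory".

```shell
# Hypothetical helpers for triaging GPU-related job failures.

# Sorts an error message into one of the groups listed in this section.
classify_gpu_error() {
  case "$1" in
    *"out of memory"*|*"cufftAllocFailed"*) echo "gpu-memory" ;;
    *"cuInit failed"*|*"no CUDA-capable device"*|*"cuMemHostAlloc failed"*|*"invalid device ordinal"*|*"__shfl_down_sync"*)
      echo "gpu-driver-or-config" ;;
    *) echo "other" ;;
  esac
}

# Reads "pid, process_name, used_memory" CSV lines on stdin and prints those
# whose executable name does not start with "python".
gpu_procs_not_python() {
  awk -F', *' 'NF >= 2 {
    n = split($2, parts, "/")          # keep only the executable name
    if (parts[n] !~ /^python/) print $1, $2, $3
  }'
}

# Example use:
#   classify_gpu_error "cuMemAlloc failed: out of memory"
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#              --format=csv,noheader | gpu_procs_not_python
```

As with the manual steps, confirm what a listed process is before deciding whether to stop it.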
Job stuck or taking a very long time
Due to their large sizes, cryo-EM datasets can take a long time to process with sub-optimal hardware or parameters. Here are some facilities that CryoSPARC provides for increasing speed/performance.
Connect workers with SSD cache enabled. This speeds up processing for extracted particles during 2D Classification, ab-initio reconstruction, refinement and more. Ensure the "Cache particle images on SSD" parameter is enabled under "Compute settings" for the target particle-processing job
Some jobs (motion correction, CTF estimation, 2D classification) can run on multiple GPUs. If your hardware supports it, increase the number of GPUs to parallelize over
Extracted particles with large box sizes (relative to their pixel size) take a long time to process. Consider Fourier-cropping (or "binning") the extracted particle blobs with the Downsample Particles job
Minimize the number of processes using system resources on the workstation or worker nodes
Check for zombie processes on worker machines. The process is similar to the steps under "Another CryoSPARC instance is running with the same license ID" under the License error or license not found section
Cancel the job, clear and re-queue
GPU Issues
cudaErrorInsufficientDriver or CUDA_ERROR_UNSUPPORTED_PTX_VERSION
A job fails with errors similar to the following when the Nvidia driver is out-of-date or incompatible with the target GPU or CUDA Toolkit Version that ships with CryoSPARC:
To fix this, update the Nvidia driver to the minimum driver version noted in Installation Prerequisites. Please follow instructions specific to the worker's Linux distribution to install the Nvidia driver. The latest Nvidia driver is available to download on Nvidia's website.
Related Discussion Forum post where a user encountered this error.
undefined symbol: _ZSt28__throw_bad_array_new_lengthv
A job may fail with the following error when running GPU jobs (Patch Motion Correction, 2D Classification) with CryoSPARC v3.3.2 or v3.4.0 on Ubuntu 22+.
To fix, set the environment variables CFLAGS="-static-libstdc++" and CXXFLAGS="-static-libstdc++" before recompiling the PyCUDA module with cryosparcw newcuda:
SSD Cache Issues
In some circumstances, jobs with "Cache particle images on SSD" enabled will not complete with one of the following errors in the job log:
SSD cache needs additional B but drive can only be filled up to B
SSD cache needs additional B but drive has B free. CryoSPARC can only free a maximum of B. This may indicate that programs other than CryoSPARC are using the SSD.
Cannot allocate space on the SSD to cache /path/to/particles.mrc; other non-CryoSPARC processes are using the cache.
Cannot finish transfer; other programs may be accessing the SSD.
FileNotFoundError: [Errno 2] No such file or directory: '…/…/store-v2/...' → '/scratch/instance_cryosparc:39001/links/...'
You may also observe some of the following symptoms:
Slower than expected SSD copy speeds
Jobs that use the SSD cache take much longer than they should
Jobs spend a very long time waiting on cache files locked by other jobs or waiting for additional space to free up
These issues may occur in long-active CryoSPARC instances with many projects and files on the SSD, and many jobs or other non-CryoSPARC processes running simultaneously. There are several strategies to address these or reduce their likelihood.
Option 1: Check SSD health
SSDs naturally degrade over time, with the likelihood of failure increasing with heavy usage. Use a tool such as smartctl to check the SSD. If enough errors have accumulated, the SSD may have to be replaced.
CryoSPARC automatically removes files from the SSD that have not been accessed in a while (more than 30 days by default) each time the SSD cache system runs. If the SSD is very heavily used in particle processing jobs or by other external tools, leaving more free space available may extend its lifetime. This is possible with one or both of these strategies:
Reconnect the worker with the --ssdreserve flag set (default 10GB or 10000MB) to ensure CryoSPARC always leaves the given amount of free space on the SSD (it will clean out old files to stay above the threshold)
Set the CRYOSPARC_SSD_CACHE_LIFETIME_DAYS environment variable in cryosparc_master/config.sh to clean up unused files on the SSD more frequently. The default value is 30 days
Option 2: Ensure no other programs are using the CryoSPARC SSD cache path
CryoSPARC assumes that it has exclusive access to the SSD cache path. If other programs are accessing the cache path at the same time as CryoSPARC, the cache step may fail with a file system-related or OS-level error.
Sharing a cache device with other applications, including other CryoSPARC instances, is not recommended:
Performance may suffer due to shared bandwidth usage.
The given CryoSPARC instance's caching may be disrupted because the CryoSPARC instance does not prevent use of free space within its own quota by other applications or CryoSPARC instances.
Option 3: Increase or Reduce the SSD Quota
The SSD Quota is the maximum amount of SSD space that CryoSPARC jobs use on a connected worker. CryoSPARC will never use more than this amount of SSD space. A job that requires caching will delete enough older cache files to ensure the quota is not exceeded.
If the quota is significantly lower than the total size of the SSD and jobs frequently wait for cache space to free up, consider increasing the quota.
If other programs regularly share the SSD with CryoSPARC and jobs fail because the cache drive runs out of space, consider decreasing the quota to reduce the likelihood of this.
Change the quota by re-running the cryosparcw connect command with --ssdquota <amount in MB> and --update arguments.
Option 4: Increase or Reduce the SSD Reserve
The SSD Reserve is the minimum amount of free space that CryoSPARC leaves on the SSD, also considering potential use of the cache device by other applications even if CryoSPARC has not reached its SSD quota. CryoSPARC jobs will delete, if possible, unused files from the cache until this much space is free again. CryoSPARC jobs will not copy files to the SSD until at least this amount of space is free. This setting is intended to preserve SSD health.
If CryoSPARC reports slower cache write speeds than expected, or jobs that use the SSD cache take longer than they should, consider increasing the reserve.
If jobs frequently wait for cache space to free up, consider decreasing the reserve.
Change the reserve by re-running the cryosparcw connect command with --ssdreserve <amount in MB> (default 10000) and --update arguments.
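Before changing the reserve, it can help to see how the current free space on the cache device compares with it. The following is a minimal sketch using standard POSIX df output; free_mb and reserve_met are hypothetical helper names, /scratch is only the example path used on this page, and 10000 is the default reserve mentioned above.

```shell
# Hypothetical helpers for comparing free SSD space with the reserve.

# Prints the free space, in whole MB, of the file system that holds the
# given path (fourth column of POSIX `df -Pk` output, in 1024-byte blocks).
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds when the free space (first argument, MB) is at or above the
# reserve (second argument, MB).
reserve_met() {
  [ "$1" -ge "$2" ]
}

# Example use:
#   reserve_met "$(free_mb /scratch)" 10000 || echo "free space is below the SSD reserve"
```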
Option 5: Change the SSD cache locking strategy
Applies to SSD caches that run on network file systems such as NFS, GPFS, BeeGFS or Lustre. Note that we strongly recommend local SSD caches for best performance.
By default, CryoSPARC jobs use a file-system level POSIX lock to ensure mutual exclusion between multiple jobs that access the SSD cache simultaneously. These locks can be unreliable on network file systems such as NFS, GPFS, BeeGFS or Lustre. This may lead to errors during the SSD caching step, particularly during copy or symbolic link operations. Error messages with the following format are key symptoms of this:
FileNotFoundError: [Errno 2] No such file or directory: '…/…/store-v2/...' → '/scratch/instance_cryosparc:39001/links/...'
Add the following line to cryosparc_worker/config.sh to use the CryoSPARC master to broker cache access instead:
Option 6: Fix database inconsistencies (CryoSPARC ≤4.4)
These instructions apply to CryoSPARC versions v4.4 or older. v4.5+ uses a new cache system that does not require the database.
cryosparc_master uses its MongoDB database to coordinate SSD caching between multiple workers running in parallel. If a job fails unexpectedly during the SSD caching step, this could lead to database inconsistencies which prevent other jobs from proceeding.
To address these, try the following steps:
Ensure no jobs are running in CryoSPARC
In a terminal, enter cryosparcm mongo to enter the interactive database prompt
Enter the following command to check how many records are in an inconsistent state:
If the result is not 0 (zero), enter the following command to fix them
Exit from the database prompt with Ctrl + D
Try re-running the problematic jobs
Option 7: Fully reset the SSD cache system
Fully reset the caching system with the following steps:
Ensure no jobs are running in CryoSPARC
For each connected worker machine:
Navigate to the SSD cache directory containing CryoSPARC's cache files (e.g., /scratch/). This path was configured at installation time
Look for a directory named instance_<master hostname>:<master port + 1>, e.g., instance_localhost:39001
Delete this directory and all its contents
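The directory name follows directly from the naming rule above, so it can be computed rather than guessed. This is a small sketch with a hypothetical helper name, cache_instance_dir; it only builds the name and does not delete anything.

```shell
# Hypothetical helper: builds the cache directory name from the master
# hostname (first argument) and the master base port (second argument),
# following the instance_<master hostname>:<master port + 1> rule.
cache_instance_dir() {
  echo "instance_$1:$(( $2 + 1 ))"
}

# Example use, with /scratch as the example cache path from this page:
#   ls -d "/scratch/$(cache_instance_dir localhost 39000)"
```

For a master reachable as localhost with base port 39000, this yields instance_localhost:39001, matching the example above.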
The following additional reset instructions are required for CryoSPARC v4.4 or older.
In a terminal, enter cryosparcm mongo to enter the interactive database prompt
Enter the following command to clear out the cache records
Exit from the database prompt with Ctrl + D
Try re-running the problematic jobs
Option 8: Disable SSD cache for the job
If the issue persists after trying any of the above options, consider disabling the "Cache particle images on SSD" parameter for the affected job.
User Interface Error Logging
If you encounter a problem with the user interface in your web browser, e.g., one or more elements of a page are not loading, etc., you can use the following steps to obtain debugging information.
Open the browser console
In Chrome, Firefox, Edge, and Safari this can be done by right-clicking the page to open the browser context menu and selecting the Inspect option (Inspect Element in Safari).
This will open a “DevTools” panel used for inspecting and debugging in the browser. This panel includes a number of tabs at the top used to display different views. When opened using the context menu, the current view will be the Elements tab. Click on the Console tab directly beside the Elements tab to view the web console. This is where errors, warnings, and general information about the page’s JavaScript code can be observed.
In order to keep the console clean in production we disable our development logs. Enable these logs by pasting the command window.localStorage.setItem('cryosparc_debug', true); into the browser console and then pressing the Enter key on your keyboard. undefined will be logged below this command if it was submitted correctly.
Now reload the page and all of the development console logs and errors will be visible.
Save Console Output
Please save console output as a .log file, including the type of browser (Chrome, Firefox, Edge, Safari, etc.) in the file name. The filename should be formatted as console_{browser}.log, e.g., console_chrome.log. Before saving the file, try to reproduce the issue you encountered.
Chrome or Edge: Right-click anywhere in the console panel to open the context menu and select the Save As... option. This will allow you to save the entire output as a .log file.
Firefox: Right-click on a console message to open the context menu and select the Save all Messages to File option. This will allow you to save the entire output as a .txt file by default (or optionally as a .log file).
Safari: Click and drag the cursor over all items in the console output so that they are all highlighted blue. Then right-click on any of the highlighted items to open the context menu and select the Save Selected option to save the entire output as a .txt file by default (or optionally as a .log file).
Save Network Output
Navigate to the Network tab in the DevTools by selecting it from the tabs in the top bar of the panel. If the Network tab is not shown, it is likely hidden in the overflow menu (this appears when there is not enough space to display all of the tab options in the DevTools). Click the overflow menu button represented by two right chevrons (>>) and select the Network option.
Please include the type of web browser (Chrome, Firefox, Edge, or Safari) in the name of the .har file you are saving. The filename should be formatted as network_{browser}.har, e.g., network_chrome.har.
Chrome or Edge: Right-click on any of the items in the network request table and then select Save all as HAR with context from the context menu.
Make sure that the red record button at the top left of the panel has been activated. It will appear as a grey circle if it has not been activated, and a red circle if it has. The Preserve log checkbox must also be selected. Now reload the page and the network panel will be populated with all requests made and received by the browser. Before saving the file, try to reproduce the issue you encountered.
Firefox: Right-click on any of the items in the network request table and then select Save All As HAR from the context menu.
Safari: Right-click on any of the items in the network request table and then select Export HAR from the context menu.
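The two naming conventions above (console_{browser}.log and network_{browser}.har) can be sketched as one small helper, which may be convenient when collecting files from several browsers. debug_file_name is a hypothetical name and is not part of CryoSPARC.

```shell
# Hypothetical helper: prints the expected file name for a saved console
# log or network capture. First argument: "console" or "network";
# second argument: the browser name in any letter case.
debug_file_name() {
  browser="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  case "$1" in
    console) echo "console_${browser}.log" ;;
    network) echo "network_${browser}.har" ;;
    *) return 1 ;;
  esac
}

# Example use:
#   debug_file_name console Chrome
#   debug_file_name network Firefox
```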
Additional Help
For topics not covered above, get additional help through the CryoSPARC Discussion Forum:
If no related discussions exist, please create a new post. Review our Troubleshooting Guidelines for items to include in your post: