Tutorial: Manipulating .cs Files Created By cryoSPARC
A short guide on reading and modifying cryoSPARC dataset (.cs) files
You may sometimes need to open a .cs (cryoSPARC dataset) file manually to change your data's attributes for a specific processing workflow or question. In this tutorial, you will learn how to open, view, modify, and save .cs files using Python, which lets you build complex workflows within cryoSPARC (e.g., reversing symmetry expansion for a particle set, or selecting particles above a certain angle and shift threshold).

Extracting recentered particles

In the following example, we edit a particle dataset in preparation for re-extraction after centering particle locations based on the shifts calculated by a 2D Classification job.

Setup

Use a subset of raw movies from EMPIAR-10025 and picked particle locations from a Template Picker job with templates resembling the top and side views of the protein. Then extract the particles and run them through a 2D Classification job, which creates templates. Filter the classified particles with a Select 2D job.
Tree-view diagram of the workspace
Press "Export" on the particles_selected output result group in the "Outputs" tab of the Select 2D job. This consolidates the newly created dataset with the passthrough outputs (datasets created by ancestor jobs that were not modified during processing, but are passed along the workflow). For more information on exporting data in cryoSPARC, read the Data Management Guide.
Press the "Export" button under "Actions" to export an output result group.
Navigate back to the "Overview" tab to find the location of the exported .cs (cryoSPARC file) and .csg (cryoSPARC group file) at the end of the streamlog. Keep this path; we use it to open the dataset file on disk to modify it. For more information on .cs and .csg files, see our FAQ here.
This is the path to the .cs file you will manipulate, then import back into cryoSPARC.
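Before editing anything, it can help to confirm that the .cs and .csg files are where the streamlog says they are. The following is a minimal sketch using only the Python 3 standard library; find_exported_files is a hypothetical helper (not part of cryoSPARC), and the demonstration runs on a throwaway directory with empty placeholder files rather than a real export.

```python
import tempfile
from pathlib import Path


def find_exported_files(export_dir):
    """Return sorted lists of the .cs and .csg files directly inside export_dir.

    Hypothetical helper for illustration; it only looks at file extensions.
    """
    export_dir = Path(export_dir)
    cs_files = sorted(export_dir.glob("*.cs"))
    csg_files = sorted(export_dir.glob("*.csg"))
    return cs_files, csg_files


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example_exported.cs").touch()
    (Path(tmp) / "example_exported.csg").touch()
    cs_files, csg_files = find_exported_files(tmp)
    found_names = [p.name for p in cs_files + csg_files]

print(found_names)
```

In practice, pass the export directory reported in your own streamlog instead of the temporary directory.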

Manipulating a .cs File with Python

You need a Python environment with the modules required to open and edit .cs files. You may use cryoSPARC's own built-in interactive Python shell or a Jupyter Notebook running a Python 2.7 environment.

Option 1: Use the built-in interactive shell

To use cryoSPARC's built-in interactive Python shell, run the following command:
cryosparcm icli
This starts an interactive Python command shell where you can import the required modules and copy+paste the commands below.

Option 2: Write your own Python script

You can also create a .py Python script and execute it non-interactively. First load cryoSPARC's environment variables into your current shell session with this command:
eval $(cryosparcm env)
Then use cryoSPARC's provided Python interpreter to execute your script.
To import cryosparc_compute.dataset in your Python script, set the PYTHONPATH environment variable to cryoSPARC's root execution directory:
# Enter these in the command line
eval $(cryosparcm env)
export PYTHONPATH="${CRYOSPARC_ROOT_DIR}"
python recenter_particles.py
The full script is provided at the end of this guide.
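If the script fails at its import line, the usual cause is that the environment was not set up as described above. The sketch below shows one way to check this up front and print a hint; module_available is a hypothetical helper, and the sketch assumes Python 3.4 or newer, because importlib.util.find_spec is not available on Python 2.7.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module called `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if not module_available("cryosparc_compute"):
    # Print a hint instead of failing later with a bare ImportError.
    print(
        "cryosparc_compute was not found; run `eval $(cryosparcm env)` and "
        "set PYTHONPATH before running this script.",
        file=sys.stderr,
    )
```

Placed at the top of a script, a check like this only reports the problem; the script would still need to stop or handle the missing module itself.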

Import the Required Modules

First, import the two modules required for this example:
import numpy as n

# "dataset" is the main module required to interact with cryoSPARC .cs files
from cryosparc_compute import dataset

Loading the Dataset File

Use the .cs file path from exporting the particle result group:
# instantiate a new Dataset object
particle_dataset = dataset.Dataset()
# load the dataset into memory from file
dataset_path = '/bulk/data/cryosparc_projects/P28/exports/groups/P28_J283_particles_selected/P28_J283_particles_selected_exported.cs'
particle_dataset.from_file(dataset_path)

Manipulate the data

You can access any column in a dataset as you would a key in a Python dictionary: index the accessor attribute with the column name, .data[column]. This returns a NumPy array of all values in that column. In this example, these are the columns (also known as "fields") available:
Dataset with 12994 items and 51 fields:
uid <u8
blob/path |O
blob/idx <u4
blob/shape <u4 (2,)
blob/psize_A <f4
blob/sign <f4
alignments2D/split <u4
alignments2D/shift <f4 (2,)
alignments2D/pose <f4
alignments2D/psize_A <f4
alignments2D/error <f4
alignments2D/error_min <f4
alignments2D/resid_pow <f4
alignments2D/slice_pow <f4
alignments2D/image_pow <f4
alignments2D/cross_cor <f4
alignments2D/alpha <f4
alignments2D/alpha_min <f4
alignments2D/weight <f4
alignments2D/pose_ess <f4
alignments2D/shift_ess <f4
alignments2D/class_posterior <f4
alignments2D/class <u4
alignments2D/class_ess <f4
ctf/type |O
ctf/exp_group_id <u4
ctf/accel_kv <f4
ctf/cs_mm <f4
ctf/amp_contrast <f4
ctf/df1_A <f4
ctf/df2_A <f4
ctf/df_angle_rad <f4
ctf/phase_shift_rad <f4
ctf/scale <f4
ctf/scale_const <f4
ctf/shift_A <f4 (2,)
ctf/tilt_A <f4 (2,)
ctf/trefoil_A <f4 (2,)
ctf/tetra_A <f4 (4,)
ctf/anisomag <f4 (4,)
ctf/bfactor <f4
location/micrograph_uid <u8
location/exp_group_id <u4
location/micrograph_path |O
location/micrograph_shape <u4 (2,)
location/center_x_frac <f4
location/center_y_frac <f4
pick_stats/ncc_score <f4
pick_stats/power <f4
pick_stats/template_idx <u4
pick_stats/angle_rad <f4
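To make the dictionary-style access concrete, the sketch below builds a small synthetic NumPy structured array with three of the fields listed above and reads a column from it. The array only stands in for particle_dataset.data; that the real object behaves like a structured array is an assumption, and the values and the threshold are made up for illustration.

```python
import numpy as n  # same alias the tutorial uses

# Synthetic stand-in for a particle dataset: three particles and three of the
# fields listed above, with the same dtypes (<u8, <f4 (2,), <f4).
fields = [
    ("uid", "<u8"),
    ("alignments2D/shift", "<f4", (2,)),
    ("location/center_x_frac", "<f4"),
]
data = n.zeros(3, dtype=fields)
data["uid"] = [101, 102, 103]
data["alignments2D/shift"] = [[0.5, -1.0], [6.0, 8.0], [0.0, 2.0]]
data["location/center_x_frac"] = [0.25, 0.5, 0.75]

# Column access by field name returns a NumPy array with one row per particle.
shifts = data["alignments2D/shift"]
print(shifts.shape)  # (3, 2)

# Example filter: keep particles whose 2D shift magnitude is at most 5, in the
# stored units (the threshold is arbitrary, chosen only for this illustration).
shift_magnitude = n.sqrt((shifts ** 2).sum(axis=1))
kept = data[shift_magnitude <= 5.0]
print(kept["uid"])  # [101 103]
```

The same boolean-mask pattern is one way to select particles by a shift threshold, as mentioned in the introduction.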
Calculate new particle locations for our dataset by applying the shift values from the 2D Classification job. These shifts, when applied to the existing particle (x, y) coordinates (the code below subtracts them), yield a "recentered" position based on where the 2D Classification algorithm calculated the particle's "centre of mass" to be.
First convert the raw particle locations into pixels, then into angstroms (Å). CryoSPARC stores particle coordinates as fractions: the position of the particle's centre relative to the original micrograph's shape. Reshape each coordinate array into a column, multiply it by the length of the micrograph in that dimension to get pixels, then multiply it by the original micrograph's pixel size (set manually as micrograph_pixel_size) to get angstroms. The shifts are converted to angstroms in the same way, by multiplying them by the particle image pixel size stored in alignments2D/psize_A.
# organize information about the original micrographs these particles were extracted from
micrograph_pixel_size = 0.6575 # also can be obtained from mscope_params/psize_A
# note: this assumes every micrograph has the same shape as the first particle's micrograph
micrograph_ny = particle_dataset.data['location/micrograph_shape'][0][0]
micrograph_nx = particle_dataset.data['location/micrograph_shape'][0][1]

# convert the shift values obtained from 2D classification to angstroms (Å)
# reshape the pixel sizes to a column of shape (N, 1) so they broadcast against the (N, 2) shifts
pixel_size = particle_dataset.data['alignments2D/psize_A'].reshape(-1,1)
shifts = particle_dataset.data['alignments2D/shift'] * pixel_size

# calculate the current locations of the particles in angstroms (Å)
location_xs = (particle_dataset.data['location/center_x_frac'].reshape(-1,1) * # fraction relative to micrograph
               micrograph_nx *
               micrograph_pixel_size)

location_ys = (particle_dataset.data['location/center_y_frac'].reshape(-1,1) * # fraction relative to micrograph
               micrograph_ny *
               micrograph_pixel_size)

# calculate the new locations of the particles, after applying the shifts
recentered_locations = n.concatenate((location_xs, location_ys), 1) - shifts

# convert the new locations back to fractions relative to the micrograph
recentered_fraction_xs = (recentered_locations[:,0] /
                          micrograph_nx /
                          micrograph_pixel_size)

recentered_fraction_ys = (recentered_locations[:,1] /
                          micrograph_ny /
                          micrograph_pixel_size)
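The unit conversions above are easier to follow for a single particle and a single axis. The sketch below repeats the same arithmetic with plain Python floats; recenter_fraction and its parameter names are hypothetical, the numbers are made up, and the sign of the shift simply follows the tutorial code.

```python
def recenter_fraction(center_frac, n_pixels, micrograph_psize, shift_px, alignment_psize):
    """Recenter one coordinate of one particle.

    center_frac      -- particle centre as a fraction of the micrograph length
    n_pixels         -- micrograph length along this axis, in pixels
    micrograph_psize -- micrograph pixel size, in angstroms per pixel
    shift_px         -- 2D classification shift along this axis, in pixels
    alignment_psize  -- pixel size of the classified images, in angstroms per pixel
    """
    location_A = center_frac * n_pixels * micrograph_psize  # fraction -> pixels -> angstroms
    shift_A = shift_px * alignment_psize                    # pixels -> angstroms
    recentered_A = location_A - shift_A                     # same sign as the tutorial code
    return recentered_A / n_pixels / micrograph_psize       # angstroms -> pixels -> fraction


# Made-up numbers: a 4000-pixel-wide micrograph at 1.0 angstrom per pixel, a
# particle at the centre, and a 10-pixel shift on images at 2.0 angstroms per pixel.
new_frac = recenter_fraction(0.5, 4000, 1.0, 10.0, 2.0)
print(new_frac)
```

Here the particle starts at 2000 Å, the shift is 20 Å, and the recentered position of 1980 Å corresponds to a fraction of about 0.495. A zero shift returns the original fraction, up to floating-point rounding.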

Storing the Data in the Dataset

After applying the shifts and converting the recentered locations back to fractions, save the fractions back to the dataset. Then reset the stored shifts to zero so that they are not applied a second time.
# save the new fractions back to the dataset
particle_dataset.data['location/center_x_frac'] = recentered_fraction_xs
particle_dataset.data['location/center_y_frac'] = recentered_fraction_ys

# reset the shifts now that they've already been applied to the particle locations
particle_dataset.data['alignments2D/shift'] = 0
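Recentering can push particles that were picked near a micrograph edge to a fraction below 0 or above 1. The tutorial does not say how such particles are treated downstream, so the following is only an optional sanity check; out_of_bounds_mask is a hypothetical helper and the example fractions are made up.

```python
import numpy as n  # same alias the tutorial uses


def out_of_bounds_mask(fraction_xs, fraction_ys):
    """Return a boolean array that is True where a centre lies outside [0, 1]."""
    xs = n.asarray(fraction_xs)
    ys = n.asarray(fraction_ys)
    return (xs < 0) | (xs > 1) | (ys < 0) | (ys > 1)


# Made-up fractions: the second particle has a negative x fraction and the
# third has a y fraction greater than 1.
example_xs = n.array([0.50, -0.01, 0.30], dtype="<f4")
example_ys = n.array([0.50, 0.40, 1.02], dtype="<f4")
mask = out_of_bounds_mask(example_xs, example_ys)
print(mask.sum(), "of", mask.size, "particles fall outside the micrograph")
```

With the real data, the same function could be called on recentered_fraction_xs and recentered_fraction_ys before they are saved.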

Save the Dataset To Disk

Save the dataset back to disk with the to_file(string) function.
# save the dataset back to disk
# Note: this will overwrite the currently exported dataset
particle_dataset.to_file(dataset_path)
If using the interactive Python shell, exit with Ctrl+D. If using a script, save it and run it with python recenter_particles.py.
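Because saving to the same path overwrites the exported dataset, you may want to keep a copy of the original file first. The sketch below is one way to do that with the Python 3 standard library; backup_file is a hypothetical helper, and the demonstration uses a throwaway placeholder file rather than a real dataset.

```python
import shutil
import tempfile
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy `path` to `path + suffix`, refusing to overwrite an existing backup."""
    path = Path(path)
    backup_path = path.with_name(path.name + suffix)
    if backup_path.exists():
        raise FileExistsError(f"backup already exists: {backup_path}")
    shutil.copy2(path, backup_path)
    return backup_path


# Demonstration on a throwaway file; in practice pass dataset_path instead.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "example_exported.cs"
    original.write_bytes(b"placeholder bytes, not a real dataset")
    backup = backup_file(original)
    backup_name = backup.name
    contents_match = backup.read_bytes() == original.read_bytes()

print(backup_name, contents_match)
```

Calling the helper before the dataset is written back leaves the untouched export next to the modified one.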

Import the Dataset into cryoSPARC

Re-import the dataset into the workspace via the Import Result Group job. Then use the Extract From Micrographs job to extract particles at the recentered locations.
In the Import Result Group job, specify the path to the .csg file that was next to the modified .cs file on disk to re-import the dataset.
Connect the particles_selected group (or similar) to an Extract From Micrographs job to re-extract the recentered particles.

Full Example Python Script

recenter_particles.py
import numpy as n

# dataset is the main module required to interact with cryoSPARC .cs files
from cryosparc_compute import dataset

# instantiate a new Dataset object
particle_dataset = dataset.Dataset()
# load the dataset into memory from file
dataset_path = '/bulk/data/cryosparc_projects/P28/exports/groups/P28_J283_particles_selected/P28_J283_particles_selected_exported.cs'
particle_dataset.from_file(dataset_path)

# organize information about the original micrographs these particles were extracted from
micrograph_pixel_size = 0.6575 # also can be obtained from mscope_params/psize_A
# note: this assumes every micrograph has the same shape as the first particle's micrograph
micrograph_ny = particle_dataset.data['location/micrograph_shape'][0][0]
micrograph_nx = particle_dataset.data['location/micrograph_shape'][0][1]

# convert the shift values obtained from 2D classification to angstroms (Å)
# reshape the pixel sizes to a column of shape (N, 1) so they broadcast against the (N, 2) shifts
pixel_size = particle_dataset.data['alignments2D/psize_A'].reshape(-1,1)
shifts = particle_dataset.data['alignments2D/shift'] * pixel_size

# calculate the current locations of the particles in angstroms (Å)
location_xs = (particle_dataset.data['location/center_x_frac'].reshape(-1,1) * # fraction relative to micrograph
               micrograph_nx *
               micrograph_pixel_size)

location_ys = (particle_dataset.data['location/center_y_frac'].reshape(-1,1) * # fraction relative to micrograph
               micrograph_ny *
               micrograph_pixel_size)

# calculate the new locations of the particles, after applying the shifts
recentered_locations = n.concatenate((location_xs, location_ys), 1) - shifts

# convert the new locations back to fractions relative to the micrograph
recentered_fraction_xs = (recentered_locations[:,0] /
                          micrograph_nx /
                          micrograph_pixel_size)

recentered_fraction_ys = (recentered_locations[:,1] /
                          micrograph_ny /
                          micrograph_pixel_size)

# save the new fractions back to the dataset
particle_dataset.data['location/center_x_frac'] = recentered_fraction_xs
particle_dataset.data['location/center_y_frac'] = recentered_fraction_ys

# reset the shifts now that they've already been applied to the particle locations
particle_dataset.data['alignments2D/shift'] = 0

# save the dataset back to disk
# Note: this will overwrite the currently exported dataset
particle_dataset.to_file(dataset_path)