`item`. Every `item` has a type, for example an exposure, particle, volume, mask, etc. Every `item` also has properties, with each property having a name (e.g. `ctf`) and sub-properties containing actual metadata values (e.g. `ctf/defocus`, `ctf/astigmatism`, etc.). Collections of `item`s with the same type and the same properties constitute a `dataset`. A `dataset` is essentially a table, where each row is a single item, and columns are properties/sub-properties. Every job can only input and output `dataset`s; therefore every type of data/metadata is stored in a `dataset`. On disk, `dataset`s are stored in the `.cs` file format, which is a binary `numpy` format described in a later section.
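The dataset-as-table idea can be sketched with a `numpy` structured array, the same machinery used later in this document for `.cs` files. The field names below are illustrative, not cryoSPARC's exact schema:

```python
import numpy as np

# One row per item; one column (field) per property/sub-property.
# Field names here are made-up examples of property/sub-property naming.
particles = np.array(
    [
        (1, 1.2e4, 150.0),
        (2, 1.3e4, 210.0),
        (3, 1.1e4, 180.0),
    ],
    dtype=[("uid", "<u8"), ("ctf/defocus", "<f4"), ("ctf/astigmatism", "<f4")],
)

print(particles["ctf/defocus"])  # column access, like a table
print(particles[0])              # row access: a single item
```

Note that `numpy` allows field names like `"ctf/defocus"` that are not valid Python identifiers, which is what makes the property/sub-property naming scheme possible in a single flat table.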
Jobs output `result`s - each `result` is a single property of a type of `item`. For example, a CTF estimation job would output a `ctf` result, containing sub-properties like defocus, astigmatism, spherical aberration, etc. The `result`s that a job outputs are the basic component of what gets connected to other jobs.
Jobs group their outputs into `result-group`s - each one is a set of `result`s that describe the same type of `item`. Thus a job can output a `result-group` defining particles, with two `result`s, one `ctf` and another `alignments`.
Jobs take their inputs as `input-group`s, each allowing a certain type of `item`, like particles, volumes, etc. Each `input-group` has certain `slot`s, each taking in a particular kind of `result`. For example, an `input-group` taking in particles may have a `slot` for `ctf` and another for `alignments`. Each `input-group` also defines the number of different `result-group`s that can be connected to it. In general, all the items from all the `result-group`s that are connected are appended together to make one larger `dataset` that forms the input to the job. So, for example, connecting two particle stacks to a single `input-group` will cause those stacks to be appended together.
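The append behaviour can be sketched with `numpy` structured arrays (a conceptual sketch with made-up field names and paths, not cryoSPARC's internals):

```python
import numpy as np

# Illustrative fields: every dataset carries a uid column plus properties.
dtype = [("uid", "<u8"), ("blob/path", "U32")]

# Two particle stacks, as if output by two different jobs.
stack_a = np.array([(101, "J1/stack_a.mrc"), (102, "J1/stack_a.mrc")], dtype=dtype)
stack_b = np.array([(201, "J2/stack_b.mrc")], dtype=dtype)

# Connecting both result-groups to one input-group appends their items
# into a single larger dataset that forms the job's input.
combined = np.concatenate([stack_a, stack_b])
print(combined["uid"])  # [101 102 201]
```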
The reason for this structure of `result`s, `result-group`s, etc. is that in cryoSPARC, most connections between jobs can be made simply at the group level, without having to specify particular files, paths, columns, or rows in tables or text files. Subsets of `dataset`s can be easily defined and passed around, and different subsets can be joined together as inputs to a further job. For advanced uses, however, the lower-level `result`s allow a user to connect only certain metadata about an item from one job to another, or to override the metadata for certain properties in a `result-group`. Examples of how and when to use this capability follow.
Jobs can take in `result`s that they don't actually need, and then output those `result`s as "passthrough" metadata that is not read or modified by the job, but just passed along in its output, so that subsequent jobs can use it without needing to be manually connected to an earlier output in the chain.
Every item has a `uid` (a 64-bit integer) that is used to maintain correspondences across chains of processing jobs and to ensure that, regardless of the order in which a job outputs items, the properties of each item are always correctly assigned to the correct item.
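How a `uid` column keeps properties attached to the right items even when a job emits them out of order can be sketched as follows (illustrative code, not cryoSPARC's implementation):

```python
import numpy as np

# uids of the items as the downstream job expects them.
uids_in = np.array([11, 12, 13], dtype=np.uint64)

# The upstream job emitted one defocus per item, in a shuffled order.
uids_out = np.array([13, 11, 12], dtype=np.uint64)
defocus_out = np.array([3.0, 1.0, 2.0], dtype=np.float32)

# Use the uid column to realign: row i of `aligned` belongs to uids_in[i].
order = np.argsort(uids_out)
idx = order[np.searchsorted(uids_out, uids_in, sorter=order)]
aligned = defocus_out[idx]
print(aligned)  # [1. 2. 3.]
```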
A `dataset` is an array of C structures, implemented using `numpy` structured arrays. These arrays are stored in memory and on disk in the same format. On disk, we store these arrays in binary format in `.cs` files. Each `.cs` file in cryoSPARC contains a single table, where each row corresponds to a single item. A `.cs` file must contain a column for the `uid` of each item, and further columns define properties/sub-properties of that item. Multiple `.cs` files can therefore be used in aggregate to define all the properties of a set of items, since the rows in every table have a `uid` that can be used to join the tables. In general, when multiple tables are used to specify a dataset, the dataset contains only the intersection of the items included in each table.
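Joining two such tables on `uid`, keeping only the intersection of items, can be sketched as follows (in-memory arrays with made-up fields; real `.cs` files hold the same kind of structured array on disk):

```python
import numpy as np

# Two tables describing overlapping sets of items (illustrative fields).
ctf = np.array(
    [(1, 1.2e4), (2, 1.3e4), (3, 1.1e4)],
    dtype=[("uid", "<u8"), ("ctf/defocus", "<f4")],
)
alignments = np.array(
    [(2, 0.5), (3, 0.7), (4, 0.9)],
    dtype=[("uid", "<u8"), ("alignments/shift", "<f4")],
)

# The joined dataset contains only the intersection of uids: {2, 3}.
common = np.intersect1d(ctf["uid"], alignments["uid"])
ctf_rows = ctf[np.isin(ctf["uid"], common)]
ali_rows = alignments[np.isin(alignments["uid"], common)]
print(common)  # [2 3]
print(ctf_rows["ctf/defocus"], ali_rows["alignments/shift"])
```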
For example, the local resolution estimation job can take its `half_map_A` and `half_map_B` inputs from different jobs. The example below outlines the three-step process of using one input group in the local resolution estimation job builder to populate volume data from three separate jobs.
For example, you can override the low-level `particles.blob` input slot with the non-downsampled data that you previously connected as an input group.