# Job: Helical Refinement (BETA)

Description: Reconstruct and refine a homogeneous helical assembly, with or without imposition and refinement of symmetry parameters. Helical Refinement (BETA) uses an algorithm that is conceptually similar to Egelman's Iterative Helical Real Space Reconstruction (IHRSR) algorithm [1], while incorporating the same maximum likelihood framework, accelerated branch-and-bound alignment algorithm, and optional Non-Uniform regularization as used in other cryoSPARC refinement jobs. For an example of a full workflow that incorporates the helical refinement job, please refer to the EMPIAR-10031 case study.

Input:

• Particles

• Initial model (optional)

Output:

• Refined 3D map

• Symmetrized and sharpened maps

• Half-maps

• Mask used in FSC calculation

• Gold-standard FSC curve

• Plots, including orientation distributions

• Estimated symmetry parameters (helical twist and rise)

# Parameters

## Helical Symmetry

These parameters inform the application and treatment of helical symmetry within the refinement iterations.

• Helical twist estimate and Helical rise estimate

• These are the initial symmetry parameters that the job will use to apply helical symmetry. The helical twist (°) specifies the rotation angle between successive asymmetric units in the helical lattice; the helical rise (Å) specifies the translation along the helical axis between successive asymmetric units. For a more detailed description of how helical symmetry is treated in cryoSPARC, please refer to this page.

• Note: The refinement can also run without symmetry estimates, in which case no helical symmetry will be applied. These are referred to as asymmetric refinements. For the majority of helical assemblies, especially those with small asymmetric units, applying correct helical symmetry is necessary to produce a high resolution map. However, some helical assemblies with larger asymmetric units relative to the overall size of the assembly may refine to an output map with clear helical symmetry without the enforcement of helical symmetry.

• In any workflow, once the symmetry parameters are known with moderately high confidence, it is always recommended to run a helical refinement with symmetry enforced to achieve the highest map quality and resolution.

• Maximum symmetry order to apply during reconstruction

• This corresponds to the number of times each particle image is used for the application of helical symmetry in reconstruction. If particles were picked using the filament tracer or template picker, an appropriate value will automatically be calculated (using the distance between extracted boxes and the helical rise) and printed out in the streamlog as "Refining with helical symmetrization degree of x enforced".

• If particles were picked using a different picker (e.g. Topaz or the Deep Picker) or were imported, this value must be input manually and should be set to $\left\lfloor \frac{d}{\Delta z} \right\rfloor$, where $d$ is the distance between extracted boxes and $\Delta z$ is the helical rise, both in Angstroms.

• If the twist and rise estimates are not expected to be highly accurate, manually setting this to a low value (2 or 3) may allow the symmetry parameters more room to vary over the course of the refinement.

• Limit shifts along the helical axis

• This controls whether or not particles should be aligned and backprojected with a reduced shift search space, covering only the central asymmetric unit in the helical filament. If enabled, this is done only in the last few iterations once the alignments have already converged.

• Generally, limiting the search space can improve resolutions slightly as it ensures that each individual asymmetric unit in the particle images is accounted for in the reconstruction.

• This option must be enabled if the particles will later be Symmetry Expanded.

• Resolution to begin real-space symmetrization

• When the GSFSC reaches this resolution in the refinement, the volumes used for alignment will have helical symmetry enforced in real space.

• This is usually important for overcoming signal distortion near the box edges associated with applying helical symmetry in reconstruction [2]. However, once this process begins, it tends to "lock in" the helical symmetry, and further optimization of the symmetry parameters may produce no change.

• If the twist and rise estimates are believed to be accurate, then the default of 8 Å works well; alternatively, turning this down to 4-5 Å may allow the symmetry parameters more room to vary over the course of the refinement.

• Point group symmetry

• If there is a known point group symmetry in addition to the helical symmetry, applying it often boosts map quality even more. Note that cyclic symmetries are assumed to be along the same axis as the helical axis (z-axis), and the dyad axis for dihedral symmetries is assumed to be perpendicular to the helical axis. Only cyclic and dihedral (Cn and Dn) symmetries are supported.
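As a rough illustration of the twist, rise, and symmetry-order definitions in this section, the following Python sketch computes $\lfloor d / \Delta z \rfloor$ and lists the positions generated by repeatedly applying one helical step (rotate by the twist about the z-axis, translate by the rise along it). The function names, the example numbers, and the sign convention for the twist are assumptions made for illustration; this is not cryoSPARC code.

```python
import math


def symmetry_order(box_spacing_A, rise_A):
    """Return floor(d / delta_z), where d is the distance between
    extracted boxes and delta_z is the helical rise (both in Angstroms)."""
    if box_spacing_A <= 0 or rise_A <= 0:
        raise ValueError("box spacing and rise must be positive")
    return math.floor(box_spacing_A / rise_A)


def subunit_positions(twist_deg, rise_A, radius_A, n_units):
    """Return (x, y, z) for n_units successive asymmetric units.

    Unit k is unit 0 rotated by k * twist about the z-axis and shifted
    by k * rise along it. Counter-clockwise rotation for positive twist
    is an assumed convention.
    """
    positions = []
    for k in range(n_units):
        phi = math.radians(k * twist_deg)
        positions.append(
            (radius_A * math.cos(phi), radius_A * math.sin(phi), k * rise_A)
        )
    return positions
```

For example, with boxes extracted every 50 Å along a filament with a 4.75 Å rise, `symmetry_order(50.0, 4.75)` gives 10.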

## Helical Symmetry Search (BETA)

These parameters inform the local searching / optimization of helical symmetry parameters during the refinement iterations.

• Resolution to begin local searches of helical symmetry

• When the GSFSC reaches this resolution, local searches over the predefined (twist, rise) search range will begin and the helical symmetry parameters will be updated each iteration with the optimal values from the previous reconstruction.

• If the twist and rise estimates are believed to be accurate, then the default of 5 Å may be appropriate; alternatively, turning this up to ~7 Å may allow the symmetry parameters more room to vary over the course of the refinement.

• Note that at resolutions coarser than ~7-8 Å, the model may not have enough features to indicate what the optimal symmetry is, and the search may become unstable. Thus, if setting this resolution to a value coarser than 7-8 Å, it is recommended to tighten the (twist, rise) search extents from their defaults to avoid converging to an incorrect local minimum.

• Twist and rise search extents

• The Minimum (maximum) helical twist (rise) to search over parameters specify the ranges over which local searches are performed. By default, twist ranges are set to the initial twist estimate $\pm 3^\circ$, and rise ranges are set to the initial rise estimate $\pm 10\%$.

• Twist and rise grid sizes

• The Twist (rise) grid size parameters specify the granularity of the local searches; larger grid sizes mean finer sampling of the helical symmetry parameters, but come at the cost of longer computation time. These likely do not need to be changed from the default of 128.

• Fix search grid

• This controls whether the search extents are fixed throughout the refinement, or are re-centered at the optimal helical symmetry parameters from the last iteration.

• Override outer (inner) filament diameter for search

• Local searches of the symmetry parameters will only be done over these ranges of filament diameters. By default, these will be computed using the maximum (minimum) radial extent of the refinement mask.
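To make the default search ranges concrete, the sketch below builds a (twist, rise) grid from the defaults described above (twist estimate ± 3°, rise estimate ± 10 %, grid size 128) and shows one possible way to re-center it on a new optimum when the grid is not fixed. All function names are hypothetical, and both the inclusion of the range endpoints and the choice to keep the grid width when re-centering are assumptions; the actual job may sample differently.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi, endpoints included."""
    if n < 2:
        raise ValueError("need at least two grid points")
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def default_search_grid(twist_deg, rise_A, twist_extent_deg=3.0,
                        rise_extent_frac=0.10, grid_size=128):
    """Build twist and rise sample points around the initial estimates."""
    twists = linspace(twist_deg - twist_extent_deg,
                      twist_deg + twist_extent_deg, grid_size)
    rises = linspace(rise_A * (1.0 - rise_extent_frac),
                     rise_A * (1.0 + rise_extent_frac), grid_size)
    return twists, rises


def recenter(twists, rises, best_twist_deg, best_rise_A):
    """Shift both grids so they are centered on the latest optimum,
    keeping their original widths (an assumed re-centering rule)."""
    twist_half = (twists[-1] - twists[0]) / 2.0
    rise_half = (rises[-1] - rises[0]) / 2.0
    return (
        linspace(best_twist_deg - twist_half, best_twist_deg + twist_half,
                 len(twists)),
        linspace(best_rise_A - rise_half, best_rise_A + rise_half,
                 len(rises)),
    )
```

With an initial estimate of 22° and 1.4 Å, the default grid spans 19° to 25° in twist and 1.26 Å to 1.54 Å in rise.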

## Non-Uniform Refinement (NEW!)

• Use Non-Uniform Refinement

• Set this to true to use non-uniform adaptive regularization during refinement. This often significantly improves map quality over a standard homogeneous refinement, with the added cost of increased computation time.

• Local processing start resolution (A)

• When the GSFSC reaches this resolution, local processing and adaptive regularization will begin. By default this is set to 6 Å, but for helical assemblies with larger asymmetric units, this could be turned up to ~8 Å.

## Initial Model

These parameters concern the generation of an initial model within the helical refinement job. For datasets with a relatively constant filament diameter, and particles with known in-plane rotations (either picked from the filament tracer, or estimated during 2D classification), it is advised to allow the helical refinement to generate its own initial density using the particle stack, rather than use a cylindrical initial model. Note also that any starting model can be passed into the job through the volume slot.

• Initial lowpass resolution (A)

• This is the resolution to which the initial model is lowpass filtered.

• For refinements with symmetry enforced, the results are rather insensitive to this parameter and resolutions between 15 - 45 Å may work well.

• In contrast, for asymmetric refinements, this parameter may strongly influence results, as the additional structural information from the symmetry parameters is not there to constrain the reconstruction. Higher resolutions (< 20 Å) may give the initial density more detail to encourage convergence, but with the caveat that the refinement may converge to ambiguous or wrong symmetry. Lower resolutions (> 25 Å) may provide a less biased starting model, but one that may not have enough features to converge to a moderate or high resolution structure. For asymmetric refinements, the optimal value of this parameter is likely to depend strongly on the dataset.

• Number of images for initial density generation

• This controls the number of particle images that are used to construct the initial density; this is done by reconstructing a density where the azimuthal orientation of each particle is randomized.

• Like the initial lowpass resolution, this parameter also influences the characteristics of the initial model, and asymmetric refinements may be sensitive to its value.

• For asymmetric refinements, using fewer images (< 5000) may have a similar effect to using a higher initial lowpass resolution (< 20 Å); using more images (> 5000) may have a similar effect to using a lower initial lowpass resolution (> 25 Å).

• Use cylindrical model

• Set this to true in order to generate an initial cylindrical model for the refinement, with the specified inner/outer diameters. By default, the initial density is generated using the known in-plane rotation estimates; however, if these do not exist, a cylindrical model with the specified diameters can be generated.

## Refinement

• Number of extra final passes

• This controls the number of extra refinement iterations to do once the GSFSC has stopped improving.

• Maximum number of iterations below 6A

• This controls the maximum number of refinement iterations to do when the structure remains at a resolution below 6 Å (i.e. GSFSC > 6 Å).

• Since asymmetric refinements can often have slow convergence, GSFSC is not used in the algorithm's convergence criteria until the structure exceeds a resolution of 6 Å; this parameter will limit the number of iterations that the refinement will allow while the structure remains at a low resolution. By default, a maximum of 12 such iterations will be allowed.

• Mask (dynamic, static)

• Whether to use a dynamic or a static mask. If a static mask is chosen, it must be input into the refinement.

• Note that even if a static mask is connected, this parameter must be set to "static" in order to maintain this mask.

• Dynamic mask z-clip fraction

• If a dynamic mask is used, this controls the fraction of the box along the z-axis over which the mask is allowed to have non-zero values. This can prevent density near the edges of the box from negatively impacting alignment. For example, for a z-clip fraction of 0.8, the central 80% of the mask along the z-axis will be retained and the rest will be softly set to zero.

• GSFSC split resolution (A)

• The two half-maps are considered independent only at resolutions beyond this value in Angstroms. This assumption is made in order to prevent the two half-maps from diverging from each other in early iterations (in terms of their relative orientations, for example). By default, this is set to 20 Å. However, for smaller helical filaments or filaments with very small asymmetric units, you may observe higher quality reconstructions and better FSC curves when this is set to around 13 - 15 Å.

• Do high-resolution noise substitution in FSC computation

• This parameter controls whether or not to do high resolution noise substitution [3] once the entire dataset has been processed. This is useful for preventing overfitting, but in some cases it may limit alignment resolution for asymmetric refinements.
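The dynamic mask z-clip fraction described above can be pictured as a one-dimensional weight profile along the z-axis that multiplies the mask slice by slice. The sketch below is a minimal illustration, assuming a raised-cosine falloff and a hypothetical `soft_edge_vox` width; the source only states that values outside the retained fraction are "softly set to zero", so the exact edge shape used by the job is not known here.

```python
import math


def z_clip_profile(n_z, clip_fraction, soft_edge_vox=4):
    """Return one weight per z-slice: 1.0 inside the central
    clip_fraction of the box, falling smoothly to 0.0 outside it.

    The raised-cosine edge and its width are assumptions.
    """
    if not 0.0 < clip_fraction <= 1.0:
        raise ValueError("clip_fraction must be in (0, 1]")
    center = (n_z - 1) / 2.0
    half_keep = clip_fraction * n_z / 2.0
    weights = []
    for z in range(n_z):
        # Distance (in voxels) past the edge of the retained region.
        overshoot = abs(z - center) - half_keep
        if overshoot <= 0:
            weights.append(1.0)
        elif overshoot >= soft_edge_vox:
            weights.append(0.0)
        else:
            weights.append(
                0.5 * (1.0 + math.cos(math.pi * overshoot / soft_edge_vox))
            )
    return weights
```

For a box of 100 slices and a z-clip fraction of 0.8, the 80 central slices keep a weight of 1.0, and the weights decay to zero over the next few slices on either side.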

# Notes and Limitations

## Challenges with unknown symmetry

In general, the most critical step of the helical processing workflow is enforcement of the correct symmetry parameters. Not only does enforcing the correct symmetry allow for the averaging over many asymmetric units (hence boosting the signal contribution of every particle image), but it also imposes strong structural constraints on the refined map, allowing the refinement to converge even from a very unbiased and/or low resolution model. Even in cases with pseudo-helical symmetry (e.g. microtubules with a seam), the application of the pseudo-helical symmetry parameters may be crucial to obtain a starting model for a further asymmetric refinement. For a discussion of challenges with asymmetric refinements, and pitfalls of applying incorrect helical symmetry, please refer to this page.

## Optimal box sizes

Since helical assemblies are not limited in their spatial extent, the choice of box size used to extract particles can affect the quality of the refinement.

• As a general rule of thumb, the optimal box size may lie between 2 and 4 times the filament diameter, covering anywhere from 5 to hundreds of asymmetric units.

• At minimum, cryoSPARC expects the box size to be greater than 1.5 times the filament diameter, ideally more than 2 times the filament diameter.

• In addition, for refinements without symmetry enforced, the box size should contain at least 3 asymmetric units (i.e. helical rises); for refinements with symmetry enforced, this should be increased to a minimum of 5-6 asymmetric units.
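The rules of thumb above can be collected into a quick sanity check before extraction. The following sketch is only an illustration: the function name and message wording are made up, and it uses 5 as the lower end of the "5-6 asymmetric units" guideline for symmetric refinements.

```python
def check_box_size(box_A, filament_diameter_A, rise_A, symmetry_enforced):
    """Return a list of warnings for a proposed box size (in Angstroms).

    An empty list means none of the rules of thumb are violated.
    """
    warnings = []

    ratio = box_A / filament_diameter_A
    if ratio <= 1.5:
        warnings.append("box should be more than 1.5x the filament diameter")
    elif ratio <= 2.0:
        warnings.append("box is ideally more than 2x the filament diameter")

    # Number of helical rises (asymmetric units) spanned by the box.
    n_rises = box_A / rise_A
    min_rises = 5 if symmetry_enforced else 3
    if n_rises < min_rises:
        warnings.append(
            "box should span at least %d helical rises" % min_rises
        )
    return warnings
```

A 400 Å box around a 180 Å wide filament with a 1.4 Å rise passes every check, while a 250 Å box around the same filament with a 60 Å rise triggers both the diameter warning and, if symmetry is enforced, the rise-count warning.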

Outside of these constraints, the optimal box size likely depends most on the degree of flexibility present in the filament dataset [1]. Highly rigid datasets, such as Tobacco Mosaic Virus (e.g. EMPIAR-10022, EMDB-2835), may work well with larger box sizes, since flexibility is not a concern and the larger box size only serves to increase the amount of signal available for alignment. On the other hand, flexible filaments may have a more moderate optimal box size [1]. In the case of flexibility, box sizes that are too large may cause blurring of density near the box edges, hindering alignment, while box sizes that are too small may not provide enough signal for optimal alignment. In the case of extreme flexibility (e.g. EMPIAR-10213, EMD-4340), a box size covering close to the minimum of 5-6 helical rises may be optimal, and subsequent processing could be done by using local refinement and masking out only a single asymmetric unit.

## Challenges with amyloid fibrils

Amyloid fibrils often present a challenge for cryo-EM reconstruction. Along with difficulty in particle picking, they exhibit signal characteristics that differ starkly from globular proteins, helical viruses, actin filaments, microtubules, etc., which complicates refinement. Currently, there are no specialized tools in cryoSPARC for the particle picking or refinement of amyloid fibrils.

## Common next steps

• Sharpening Tools

• Create Templates