Make about page more extensive, and other minor fixes

rs-station · Jan 29, 2024 · acd036b
1 parent d59fa78

Showing 4 changed files with 55 additions and 17 deletions.
diff --git a/docs/about.md b/docs/about.md
@@ -1,31 +1,69 @@
 # About the `matchmaps` algorithm
 
 If you want to learn more about the idea behind `matchmaps`, along with some examples, please check out our new pre-print!  
-
-> [MatchMaps: Non-isomorphous difference maps for X-ray crystallography](https://www.biorxiv.org/content/10.1101/2023.09.01.555333v1) 
 
-## Excerpts from the pre-print
+> [MatchMaps: Non-isomorphous difference maps for X-ray crystallography](https://www.biorxiv.org/content/10.1101/2023.09.01.555333v2) 
 
-### Abstract
-Conformational change mediates the biological functions of proteins. Crystallographic measurements can map these changes with extraordinary sensitivity as a function of mutations, ligands, and time. The isomorphous difference map remains the gold standard for detecting structural differences between datasets. Isomorphous difference maps combine the phases of a chosen reference state with the observed changes in structure factor amplitudes to yield a map of changes in electron density. Such maps are much more sensitive to conformational change than structure refinement is, and are unbiased in the sense that observed differences do not depend on refinement of the perturbed state. However, even minute changes in unit cell dimensions can render isomorphous difference maps useless. This is unnecessary. Here we describe a generalized procedure for calculating observed difference maps that retains the high sensitivity to conformational change and avoids structure refinement of the perturbed state. We have implemented this procedure in an open-source python package, MatchMaps, that can be run in any software environment supporting PHENIX and CCP4. Through examples, we show that MatchMaps “rescues” observed difference electron density maps for near-isomorphous crystals, corrects artifacts in nominally isomorphous difference maps, and extends to detecting differences across copies within the asymmetric unit, or across altogether different crystal forms.
+If what you're looking for is a user's guide, you can find that [here](quickstart.md). But if you're looking for more details about how `matchmaps` works, read on!
 
-### Algorithm description
+## Abstract
+Conformational change mediates the biological functions of macromolecules. Crystallographic measurements can map these changes with extraordinary sensitivity as a function of mutations, ligands, and time. The isomorphous difference map remains the gold standard for detecting structural differences between datasets. Isomorphous difference maps combine the phases of a chosen reference state with the observed changes in structure factor amplitudes to yield a map of changes in electron density. Such maps are much more sensitive to conformational change than structure refinement is, and are unbiased in the sense that observed differences do not depend on refinement of the perturbed state. However, even minute changes in unit cell properties can render isomorphous difference maps useless. This is unnecessary. Here we describe a generalized procedure for calculating observed difference maps that retains the high sensitivity to conformational change and avoids structure refinement of the perturbed state. We have implemented this procedure in an open-source python package, MatchMaps, that can be run in any software environment supporting PHENIX and CCP4. Through examples, we show that MatchMaps “rescues” observed difference electron density maps for poorly-isomorphous crystals, corrects artifacts in nominally isomorphous difference maps, and extends to detecting differences across copies within the asymmetric unit, or across altogether different crystal forms.
+
+## Algorithm overview
 
   1. Place both sets of structure factor amplitudes on a common scale using CCP4’s `SCALEIT` utility and truncate the data to the same resolution range.
-  2. Generate phases for each dataset via the `phenix.refine` program. For each dataset, the OFF starting model is used, and only rigid-body refinement is permitted to prevent the introduction of model bias.
+  2. Generate phases for each dataset via the `phenix.refine` program. For both datasets, the OFF starting model is used, and only rigid-body refinement is permitted to prevent the introduction of model bias.
   3. Fourier-transform each set of complex structure factors into a real-space electron density map using the python packages `reciprocalspaceship` and `gemmi`.
   4. Compute the translation and rotation necessary to overlay the two rigid-body refined models. Apply this translation-rotation to the ON real-space map such that it overlays with the OFF map. These computations are carried out using `gemmi`.
-  5. Place both real-space maps on a common scale.
-  6. Subtract real-space maps voxel-wise.
-  7. Apply a solvent mask to the final difference map.
+  5. Subtract real-space maps voxel-wise.
+
+## Algorithmic details
+
+### Scaling
+
+Scaling includes fitting both an overall scale factor and an anisotropic B-factor. `matchmaps` performs scaling via the [CCP4 `SCALEIT` utility](https://www.ccp4.ac.uk/html/scaleit.html), assisted by the [`rs-booster` `rs.scaleit` utility](https://rs-station.github.io/rs-booster/misc.html#rs-scaleit).  
+
+### Refinement
+
+Refinement is performed via `phenix.refine`. `matchmaps` makes use of a custom `.eff` parameter template which can be found in full in the [source code](https://github.com/rs-station/matchmaps/blob/d59fa78c2f549904d0042e637262ef6c5171d355/src/matchmaps/_utils.py#L216).
+
+By default, refinement includes bulk-solvent scaling, as this produces the best refinement results. In some cases, however, you may expect your ON data to contain interesting signal far from the OFF model, in regions outside the solvent mask; a common example is a ligand that is bound only in the ON data. In such situations, we recommend deactivating bulk-solvent scaling. Instructions for turning off bulk-solvent scaling (and for changing the solvent mask; see [below](#solvent-masking)) can be found [here](quickstart.md#other-useful-options).
+
+You also have the option to perform rigid-body refinement on several separate selections. For example, if your protein model contains multiple chains, those chains may move slightly relative to each other, and you may not be interested in visualizing this shift in your difference map. In this case, you can specify the model selections that should be refined separately. Instructions for doing so can be found [here](quickstart.md#other-useful-options). Note that any non-macromolecule atoms in your model will be renumbered to belong to the nearest macromolecule chain using the `phenix.sort_hetatms` utility.
+
+More generally, this refinement can be fully customized by providing a `.eff` file. If you're interested in doing this, I recommend using the [template found in the source code](https://github.com/rs-station/matchmaps/blob/d59fa78c2f549904d0042e637262ef6c5171d355/src/matchmaps/_utils.py#L216) as a starting point. Don't hesitate to [file an issue on GitHub](https://github.com/dennisbrookner/matchmaps/issues) if you run into any problems.
+
+### The Fourier transform
+
+Conversion of structure factors into a real-space electron density grid is handled by `matchmaps` in Python, with the help of the `reciprocalspaceship` and `gemmi` packages. This approach allows for maximum flexibility and minimal "black-box code."
+
+`matchmaps` uses the "F-obs-filtered" column for structure factor amplitudes and the "PH2FOFCWT" column for structure factor phases. Optionally, structure factor amplitudes can be error-weighted (using the "SIGF-obs-filtered" uncertainties) via Equation 7 of [*reciprocalspaceship*: a Python library for crystallographic data analysis](https://scripts.iucr.org/cgi-bin/paper?S160057672100755X). If you are interested in using different MTZ columns, please let me know by [filing an issue on GitHub](https://github.com/dennisbrookner/matchmaps/issues), and this feature could be added.
+
+By default, both input datasets are truncated to a matching resolution. Note that this is a "refine, then truncate" approach: both refinements use the full resolution of their respective datasets, and the data are truncated only afterwards. Optionally, you may provide an explicit resolution cut to be applied to both datasets; this (or error-weighting) may be useful if you believe your high-resolution reflections are noisy.
+
+By default, the real-space voxels in the output maps are approximately 0.5-Angstrom cubes. If you plan to visualize your maps in Coot, I recommend keeping this default; if you plan to visualize them in PyMOL, I recommend 0.25-Angstrom spacing. Find more details [here](quickstart.md#other-useful-options).
+
+### Real-space alignment
+
+Conveniently, real-space alignment of the two rigid-body-refined models is very easy, because both come from the same OFF starting model and differ only by rigid-body refinement! `matchmaps` uses C-alpha atoms for this alignment, but any atom selection would do. The transformation defining this alignment is then applied to the ON real-space grid, such that it aligns with the OFF real-space grid. (You can reverse this and align the OFF grid to the ON grid if you would like, as described [here](quickstart.md#other-useful-options).) Grid alignment is handled via the `gemmi` method `interpolate_grid_of_aligned_model2`.
+
+Note that if you have specified multiple rigid-body selections (as described [above](#refinement)), then this real-space alignment will be performed separately for each selection.
+
+### Solvent masking
+
+The real-space alignment performed by `matchmaps` will align the molecule chain(s) of interest, but will likely **mis-align** any symmetry-related molecules. For this reason, it is essential to mask the symmetry-related chains out of your final maps. The main difference map output by `matchmaps` is solvent-masked with a 2-Angstrom radius, which is fairly tight.
+
+However, as discussed [above](#refinement), you may expect your ON data to include signal far away from your OFF model. To account for this possibility, `matchmaps` produces a second difference map with a more generous solvent-masking radius. This more generous radius defaults to 5 Angstroms and [can be changed](quickstart.md#other-useful-options). The file containing this generously-masked difference map will have `_unmasked` at the end of its name; `matchmaps` output files are described [here](quickstart.md#important-map-outputs).
+
+Note that real-space normalization of the two maps is critical for producing a "balanced" difference map. This normalization is performed not across the unit cell or the asymmetric unit, but rather on the non-zero voxels that remain after applying the more generous solvent mask.
 
-## `matchmaps` variants
+## Variants of `matchmaps`
 
 In addition to the core `matchmaps` utility, two additional command-line functions are available: `matchmaps.mr` and `matchmaps.ncs`. The algorithm is modified slightly for each utility.
 
 ### `matchmaps.mr`
 
-`matchmaps.mr` can support two input reflection files that are in different crystal packings or spacegroups. Accordingly, step 1 above is replaced with a round of molecular replacement (using `phenix.phaser`) whereing the OFF starting model is used as a molecular replacement solution for the ON reflections. The algorithm proceeds essentially identically from this point.
+`matchmaps.mr` can support two input reflection files that are in different crystal packings or spacegroups. Accordingly, step 1 above (scaling) is replaced with a round of molecular replacement (using `phenix.phaser`) wherein the OFF starting model is used as a molecular replacement solution for the ON reflections. The algorithm proceeds essentially identically from this point.
 
 One small but important change in `matchmaps.mr` is that all ordered water molecules in the OFF starting model are discarded, because the ordered water molecules in one spacegroup/crystal packing are often not appropriate in a different spacegroup/crystal packing.
 
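The "Real-space alignment" section of `docs/about.md` above superposes the two rigid-body-refined models on their C-alpha atoms and applies the resulting transform to the ON grid with `gemmi`. To make that transform concrete, here is a minimal NumPy sketch of the textbook Kabsch least-squares superposition. It is a swapped-in standard technique, not the code inside `matchmaps` or `gemmi`; the function `kabsch_superpose` and the synthetic coordinates are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Return (R, t) minimizing ||mobile @ R.T + t - target|| over proper rotations.

    mobile, target: (N, 3) arrays of matched atom coordinates, e.g. C-alphas
    from the ON and OFF rigid-body-refined models (hypothetical inputs).
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_m = mobile.mean(axis=0)
    mu_t = target.mean(axis=0)
    # Cross-covariance matrix of the centered coordinate sets
    H = (mobile - mu_m).T @ (target - mu_t)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_m
    return R, t


# Synthetic check: rotate and shift some fake "OFF" C-alphas to make "ON" ones
rng = np.random.default_rng(0)
ca_off = rng.normal(size=(50, 3)) * 10.0
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
ca_on = ca_off @ R_true.T + np.array([1.0, -2.0, 0.5])

# Transform that carries ON coordinates onto OFF coordinates
R, t = kabsch_superpose(ca_on, ca_off)
aligned = ca_on @ R.T + t
rmsd = np.sqrt(((aligned - ca_off) ** 2).sum(axis=1).mean())
print(f"RMSD after superposition: {rmsd:.2e} Angstrom")
```

In `matchmaps` itself, the text above says the resampling of the ON grid with this kind of transform is delegated to `gemmi`'s `interpolate_grid_of_aligned_model2`.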

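Step 5 of the overview in `docs/about.md`, together with the normalization note at the end of "Solvent masking", can be summarized in three operations:

1. Normalize each aligned map over the voxels kept by the generous mask.
2. Subtract the maps voxel-wise.
3. Apply the tight mask for the main output.

The NumPy sketch below makes several assumptions that are ours, not the package's:

- Both maps already sit on the same aligned grid.
- Boolean masks have been precomputed.
- "Normalization" means zero mean and unit standard deviation.
- The helper names are hypothetical.

```python
import numpy as np


def normalize_in_mask(grid, mask):
    """Standardize voxels inside `mask` (zero mean, unit std); zero elsewhere.

    Assumption: normalization = standardization over the generously masked voxels.
    """
    values = grid[mask]
    out = np.zeros_like(grid, dtype=float)
    out[mask] = (values - values.mean()) / values.std()
    return out


def difference_map(on_grid, off_grid, generous_mask, tight_mask):
    """Return (tightly masked, generously masked) ON-minus-OFF maps."""
    on_n = normalize_in_mask(on_grid, generous_mask)
    off_n = normalize_in_mask(off_grid, generous_mask)
    diff = on_n - off_n  # voxel-wise subtraction
    tight = np.where(tight_mask, diff, 0.0)
    return tight, diff


# Synthetic example: ON is the OFF density on a different overall scale/offset
rng = np.random.default_rng(1)
off = rng.normal(size=(16, 16, 16))
on = 3.0 * off + 5.0
generous = np.zeros(off.shape, dtype=bool)
generous[2:14, 2:14, 2:14] = True
tight = np.zeros_like(generous)
tight[4:12, 4:12, 4:12] = True

tight_diff, generous_diff = difference_map(on, off, generous, tight)
# Scale and offset differences cancel, leaving only floating-point noise
print(np.abs(generous_diff).max())
```

The synthetic case shows why normalizing before subtracting matters. Without it, a pure change of overall scale would appear as a large, spurious difference signal.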
diff --git a/docs/index.md b/docs/index.md
@@ -4,8 +4,8 @@ Welcome to the docs for `matchmaps`, a python package for aligning and subtracti
 
 The [quickstart guide](quickstart.md) contains installation instructions and sample usage of the basic `matchmaps` utility, along with other useful tips.
 
-If you'd like to learn more about `matchmaps` and see the package in action, please check out our pre-print!
-> [MatchMaps: Non-isomorphous difference maps for X-ray crystallography](https://www.biorxiv.org/content/10.1101/2023.09.01.555333v1.full.pdf+html)
+If you'd like to learn more about `matchmaps`, you can find our exploration of the package [here](about.md). For even more information, and to see the package in action, please check out our pre-print!
+> [MatchMaps: Non-isomorphous difference maps for X-ray crystallography](https://www.biorxiv.org/content/10.1101/2023.09.01.555333v2.full.pdf+html)
 
 This software is part of the [Reciprocal Space Station](https://rs-station.github.io/) family of open-source crystallography software and was conceived in [Doeke Hekstra's Lab](https://hekstralab.fas.harvard.edu/) at Harvard by [Dennis Brookner](https://dennisbrookner.github.io/)
 

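The "refine, then truncate" behaviour described in "The Fourier transform" section of `docs/about.md` can be pictured as a post-refinement filter on reflection d-spacings. The minimal NumPy illustration below assumes:

- An orthogonal cell (all angles 90 degrees).
- Hypothetical helper names.
- That an explicit cut is combined with the common limit by taking the larger value. This is also our assumption, not documented `matchmaps` behaviour.

```python
import numpy as np


def d_spacing_orthorhombic(hkl, a, b, c):
    """d-spacing (Angstrom) assuming alpha = beta = gamma = 90 degrees.

    Uses 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2.
    """
    h, k, l = np.asarray(hkl, dtype=float).T
    return 1.0 / np.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def truncate_to_common_resolution(d_on, d_off, dmin=None):
    """Boolean keep-masks for ON and OFF reflections, applied after refinement.

    The common limit is the worse (larger) of the two high-resolution limits;
    an optional explicit cut `dmin` can only make the cutoff stricter.
    """
    d_on = np.asarray(d_on)
    d_off = np.asarray(d_off)
    cutoff = max(d_on.min(), d_off.min())
    if dmin is not None:
        cutoff = max(cutoff, dmin)
    return d_on >= cutoff, d_off >= cutoff


# Hypothetical reflections; ON diffracts further than OFF
hkl_on = [(1, 0, 0), (2, 3, 1), (10, 12, 8)]
hkl_off = [(1, 0, 0), (2, 3, 1), (6, 5, 4)]
d_on = d_spacing_orthorhombic(hkl_on, 50.0, 60.0, 70.0)
d_off = d_spacing_orthorhombic(hkl_off, 50.5, 60.2, 70.1)
keep_on, keep_off = truncate_to_common_resolution(d_on, d_off)
print(keep_on, keep_off)
```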
diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -1,6 +1,6 @@
 # Getting started with `matchmaps`
 
-On this page, we'll explore how to use the basic `matchmaps` utility and examine its outputs.  Full documation of all options for all three command-line utilities can be found [here](cli.md) or by typing the command plus `--help` into the command line.
+On this page, we'll explore how to use the basic `matchmaps` utility and examine its outputs. Full documentation of all options for all three command-line utilities can be found [here](cli.md) or by typing the command plus `--help` into the command line. A more detailed exploration of the `matchmaps` algorithm can be found [here](about.md).
 
 ## Installation
 
@@ -70,7 +70,7 @@ matchmaps --mtzoff apo_data.mtz Fobs SIGFobs \
     --pdboff apo.pdb \
     --ligands weird_solvent_1.cif weird_solvent_2.cif
 ```
- 
+
 If you'd like to read or write files from somewhere other than your current directory, you can! There are three ways to specify input files:
  - Provide relative paths directly for all input files
  - Provide only file names, and add the `--input-dir` option to specify where those files live. If you do this, the same `--input-dir` will be prepended to all filenames, so your files should all live in the same place.

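The voxel-spacing advice in "The Fourier transform" section of `docs/about.md` (about 0.5 Angstrom by default, 0.25 Angstrom for PyMOL) translates into a grid shape for a given unit cell. The minimal sketch below:

- Uses edge lengths only and ignores cell angles.
- Rounds each dimension up to a 2/3/5-smooth size, a common FFT-friendly choice.

The helper names and cell are hypothetical, and we are not claiming this is exactly how `matchmaps` or `gemmi` pick grid sizes.

```python
import math


def is_smooth(n, primes=(2, 3, 5)):
    """True if positive integer n has no prime factors outside `primes`."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def grid_shape(cell_lengths, spacing):
    """Voxel counts per axis so that each voxel edge is at most `spacing`.

    Assumption: counts are rounded up to 2/3/5-smooth integers for FFT speed.
    """
    shape = []
    for length in cell_lengths:
        n = math.ceil(length / spacing)
        while not is_smooth(n):
            n += 1
        shape.append(n)
    return tuple(shape)


cell = (50.0, 60.0, 70.0)  # hypothetical unit-cell edge lengths (Angstrom)
coot_shape = grid_shape(cell, 0.5)    # default ~0.5-Angstrom voxels
pymol_shape = grid_shape(cell, 0.25)  # finer grid suggested for PyMOL
print(coot_shape, pymol_shape)
```

Halving the spacing roughly multiplies the voxel count by eight, which is worth keeping in mind for file size when choosing the finer PyMOL grid.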
diff --git a/docs/visualization.md b/docs/visualization.md
@@ -36,7 +36,7 @@ isomesh differencemesh_positive, difference_map, 2, pdb and resi 20
 
 `matchmaps` outputs are **already normalized**. This means that when looking at `matchmaps` outputs, the `normalize_ccp4_maps` option should always be set to `off`. If this option is turned on, then the map's contour levels are likely to be very different from, say, how they look in Coot.
 
-### Symmetry expansion.
+### Symmetry expansion
 As mentioned above, `matchmaps` outputs are always in spacegroup P1. This means that the only relevant symmetry operation is the periodic boundary condition. But there is a catch: even though your map is always in P1, your structural model is unlikely to be! PyMOL looks for symmetry operations wherever it can find them. This means that if `map_auto_expand_sym` is on, and you use the `isomesh` command as follows:
 
 ```