diff --git a/docs/analysis.rst b/docs/analysis.rst index ceb7f1e6..f2db2c57 100644 --- a/docs/analysis.rst +++ b/docs/analysis.rst @@ -424,6 +424,69 @@ contacts to link definitions from :ref:`monomer library ` and to connections (LINK, SSBOND) from the structure. If you find it useful, please contact the author. +Matthews coefficient +==================== + +Matthews coefficient V\ :sub:`M` is defined as the crystal volume +per unit of protein molecular weight. Typically, the molecular weight +for V\ :sub:`M` is calculated from a sequence, +and that's what this section is mostly about. + +First, let's read a structure and get a protein sequence: + +.. doctest:: + + >>> st = gemmi.read_structure('../tests/5cvz_final.pdb') + >>> st.setup_entities() # it should sort out chain parts + >>> list(st[0]) + [] + >>> # we have just a single chain, which makes this example simpler + >>> chain = st[0]['A'] + >>> chain.get_polymer() + + >>> st.get_entity_of(_) # doctest: +ELLIPSIS + + >>> sequence = _.full_sequence + +Gemmi provides a simple function to calculate molecular weight +from the sequence using the built-in table of popular residues: + +.. doctest:: + + >>> weight = gemmi.calculate_sequence_weight(_.full_sequence) + >>> # Now we can calculate Matthews coefficient + >>> st.cell.volume_per_image() / weight + 3.1983428753317003 + +We can continue and calculate the solvent content, assuming the protein +density of 1.35 g/cm\ :sup:`3` (the other constants below are the Avogadro +number and Å\ :sup:`3`/cm\ :sup:`3` = 10\ :sup:`-24`): + +.. doctest:: + + >>> protein_fraction = 1. / (6.02214e23 * 1e-24 * 1.35 * _) + >>> print('Solvent content: {:.1f}%'.format(100 * (1 - protein_fraction))) + Solvent content: 61.5% + +If the sequence includes rare chemical components +(outside of the top 300+ most popular components in the PDB), you may +specify the average weight of the components that are not tabulated: + +.. 
doctest:: + + >>> sequence = ['DSN', 'ALA', 'N2C', 'MVA', 'DSN', 'ALA', 'NCY', 'MVA'] + >>> gemmi.calculate_sequence_weight(sequence, unknown=130.0) + 784.6114543066407 + +The weights are assumed to be of unbonded residues. Therefore, the chain weight +is calculated as a sum of all components minus +(*N*--1) × weight of H\ :sub:`2`\ O. + +.. note:: + + Gemmi includes a program that calculates the Matthews coefficient + and the solvent content: :ref:`gemmi-contents `. + Superposition ============= @@ -1131,89 +1194,3 @@ where TBC - -.. _pdb_dir: - -Local copy of the PDB archive -============================= - -Some of the examples in this documentation work with a local copy -of the Protein Data Bank archive. This subsection describes -the assumed setup. - -Like in BioJava, we assume that the `$PDB_DIR` environment variable -points to a directory that contains `structures/divided/mmCIF` -- the same -arrangement as on the -`PDB's FTP `_ server. - -.. code-block:: console - - $ cd $PDB_DIR - $ du -sh structures/*/* # as of Jun 2017 - 34G structures/divided/mmCIF - 25G structures/divided/pdb - 101G structures/divided/structure_factors - 2.6G structures/obsolete/mmCIF - -A traditional way to keep an up-to-date local archive is to rsync it -once a week: - -.. code-block:: shell - - #!/bin/sh -x - set -u # PDB_DIR must be defined - rsync_subdir() { - mkdir -p "$PDB_DIR/$1" - # Using PDBe (UK) here, can be replaced with RCSB (USA) or PDBj (Japan), - # see https://www.wwpdb.org/download/downloads - rsync -rlpt -v -z --delete \ - rsync.ebi.ac.uk::pub/databases/pdb/data/$1/ "$PDB_DIR/$1/" - } - rsync_subdir structures/divided/mmCIF - #rsync_subdir structures/obsolete/mmCIF - #rsync_subdir structures/divided/pdb - #rsync_subdir structures/divided/structure_factors - -Gemmi has a helper function for using the local archive copy. -It takes a PDB code (case insensitive) and a symbol denoting what file -is requested: P for PDB, M for mmCIF, S for SF-mmCIF. - -.. 
doctest:: - - >>> os.environ['PDB_DIR'] = '/copy' - >>> gemmi.expand_if_pdb_code('1ABC', 'P') # PDB file - '/copy/structures/divided/pdb/ab/pdb1abc.ent.gz' - >>> gemmi.expand_if_pdb_code('1abc', 'M') # mmCIF file - '/copy/structures/divided/mmCIF/ab/1abc.cif.gz' - >>> gemmi.expand_if_pdb_code('1abc', 'S') # SF-mmCIF file - '/copy/structures/divided/structure_factors/ab/r1abcsf.ent.gz' - -If the first argument is not in the PDB code format (4 characters for now) -the function returns the first argument. - -.. doctest:: - - >>> arg = 'file.cif' - >>> gemmi.is_pdb_code(arg) - False - >>> gemmi.expand_if_pdb_code(arg, 'M') - 'file.cif' - -Multiprocessing -=============== - -(Python-specific) - -Most of the gemmi objects cannot be pickled. Therefore, they cannot be -passed between processes when using the multiprocessing module. -Currently, the only picklable classes (with protocol >= 2) are: -UnitCell and SpaceGroup. - -Usually, it is possible to organize multiprocessing in such a way that -gemmi objects are not passed between processes. The example script below -traverses subdirectories and asynchronously analyzes coordinate files, -using 4 worker processes in parallel. - -.. literalinclude:: ../examples/multiproc.py - :language: python - :lines: 4- diff --git a/docs/chemistry.rst b/docs/chemistry.rst index 353974fc..5c513061 100644 --- a/docs/chemistry.rst +++ b/docs/chemistry.rst @@ -476,49 +476,3 @@ The `logging` argument above is described in the next section. TBC - -.. _logger: - -Logger -====== - -Gemmi Logger is a tiny helper class for passing messages from a gemmi function -to the calling function. It doesn't belong in this section, but it's -documented here because it's used in the previous subsection and I haven't found -a better spot for it. - -The messages being passed are usually info or warnings that a command-line -program would print to stdout or stderr. - -The Logger has two member variables: - -.. 
literalinclude:: ../include/gemmi/logger.hpp - :language: cpp - :start-at: /// - :end-at: int threshold - -and a few member functions for sending messages. - -When a function takes a Logger argument, we can pass: - -**C++** - -* `{&Logger::to_stderr}` to redirect messages to stderr - (to_stderr() calls fprintf), -* `{&Logger::to_stdout}` to redirect messages to stdout, -* `{&Logger::to_stdout, 3}` to print only warnings (threshold=3), -* `{nullptr, 0}` to disable all messages, -* `{}` to throw errors and ignore other messages (the default, see Quirk above), -* `{[](const std::string& s) { do_anything(s);}}` to do anything else. - -**Python** - -* `sys.stderr` or `sys.stdout` or any other stream (an object with `write` - and `flush` methods), to redirect messages to that stream, -* `(sys.stdout, 3)` to print only warnings (threshold=3), -* `(None, 0)` to disable all messages, -* `None` to throw errors and ignore other messages (the default, see Quirk above), -* a function that takes a message string as its only argument - (e.g. `lambda s: print(s.upper())`). - - diff --git a/docs/conf.py b/docs/conf.py index a785fcbf..ecdf8008 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -21,7 +21,8 @@ version = _line.split()[2].strip('"') release = version -exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] +# not sure if we'll use headers.rst again, disable it for now +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'headers.rst' ] pygments_style = 'sphinx' todo_include_todos = False highlight_language = 'cpp' @@ -43,6 +44,30 @@ html_show_sourcelink = False html_copy_source = False +def setup(app): + app.connect("builder-inited", monkey_patching_furo) + +def monkey_patching_furo(app): + if app.builder.name != 'html': + return + + import furo + def _compute_navigation_tree(context: Dict[str, Any]) -> str: + # The navigation tree, generated from the sphinx-provided ToC tree. 
+ if "toctree" in context: + toctree = context["toctree"] + toctree_html = toctree( + collapse=False, + titles_only=False, + maxdepth=2, + includehidden=True, + ) + else: + toctree_html = "" + return furo.get_navigation_tree(toctree_html) + + furo._compute_navigation_tree = _compute_navigation_tree + # -- Options for LaTeX output --------------------------------------------- latex_elements = { diff --git a/docs/hkl.rst b/docs/hkl.rst index 62475caf..2eb9bd32 100644 --- a/docs/hkl.rst +++ b/docs/hkl.rst @@ -1001,6 +1001,11 @@ program documentation for details. >>> # and convert it back >>> cif_string = gemmi.MtzToCif().write_cif_to_string(_) +XDS_ASCII +========= + +TODO: document functions from `xds_ascii.hpp` + SX hkl CIF ========== diff --git a/docs/index.rst b/docs/index.rst index c9617963..ae9733e0 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,11 +1,15 @@ .. meta:: :google-site-verification: LsEfb1rjo2RL8WOSZGigV11Kgyhtk9v1Vb-6GZFnHKo -GEMMI - library for structural biology -====================================== +Overview +######## -Gemmi is a library, accompanied by a set of programs, -developed primarily for use in **macromolecular crystallography** (MX). +What is it for? +=============== + +Gemmi is a library, accompanied by a :ref:`set of programs `, +developed primarily for use in **structural biology**, +and in particular in **macromolecular crystallography** (MX). For working with: * macromolecular models (content of PDB, PDBx/mmCIF and mmJSON files), @@ -53,22 +57,109 @@ Source code repository: https://github.com/project-gemmi/gemmi .. _me: wojdyr+gemmi@gmail.com Contents --------- +======== .. toctree:: - :maxdepth: 2 + :maxdepth: 1 - Introduction + Overview install + program + +.. toctree:: + :caption: Prerequisites + :maxdepth: 2 + cif symmetry cell + misc + +.. toctree:: + :caption: Working with Molecules + :maxdepth: 2 + chemistry mol analysis + +.. 
toctree:: + :caption: Working with Data + :maxdepth: 2 + grid hkl scattering - program + +.. toctree:: + :caption: Other Docs + + ChangeLog Python API reference C++ API reference + +Credits +======= + +This project is using code from a number of third-party open-source projects. + +Projects used in the C++ library, included under +`include/gemmi/third_party/` (if used in headers) or `third_party/`: + +* `PEGTL `_ -- library for creating PEG + parsers. License: MIT. +* `sajson `_ -- high-performance + JSON parser. License: MIT. +* `PocketFFT `_ -- FFT library. + License: 3-clause BSD. +* `stb_sprintf `_ -- locale-independent + snprintf() implementation. License: Public Domain. +* `fast_float `_ -- locale-independent + number parsing. License: Apache 2.0. +* `tinydir `_ -- directory (filesystem) + reader. License: 2-clause BSD. + +Code derived from the following projects is used in the library: + +* `ksw2 `_ -- sequence alignment in + `seqalign.hpp` is based on the ksw_gg function from ksw2. License: MIT. +* `QCProt `_ -- superposition method + in `qcp.hpp` is taken from QCProt and adapted to our project. License: BSD. +* `Larch `_ -- calculation of f' and f" + in `fprime.cpp` is based on CromerLiberman code from Larch. + License: 2-clause BSD. + +Projects included under `third_party/` that are not used in the library +itself, but are used in command-line utilities, python bindings or tests: + +* `zpp serializer `_ -- + serialization framework. License: MIT. +* `The Lean Mean C++ Option Parser `_ -- + command-line option parser. License: MIT. +* `doctest `_ -- testing framework. + License: MIT. +* `linalg.h `_ -- linear algebra library. + License: Public Domain. +* `zlib `_ -- a subset of the zlib library + for decompressing gz files, used as a fallback when the zlib library + is not found in the system. License: zlib. + +Not distributed with Gemmi: + +* `nanobind `_ -- used for creating + Python bindings. License: 3-clause BSD. 
+* `zlib-ng `_ -- optional, can be used + instead of zlib for faster reading of gzipped files. +* `cctbx `_ -- used in tests + (if cctbx is not present, these tests are skipped) and + in scripts that generated space group data and 2-fold twinning operations. + License: 3-clause BSD. + +Mentions: + +* `NLOpt `_ + was used to try out various optimization methods for class Scaling. + License: MIT. + +Email me if I forgot about something. + diff --git a/docs/install.rst b/docs/install.rst index f2495264..cf44e921 100644 --- a/docs/install.rst +++ b/docs/install.rst @@ -225,76 +225,3 @@ We also have *Python doctest* tests in the documentation, and a few other test routines. All the commands used for testing are listed in the `run-tests.sh` script in the repository. - -Credits -------- - -This project is using code from a number of third-party open-source projects. - -Projects used in the C++ library, included under -`include/gemmi/third_party/` (if used in headers) or `third_party/`: - -* `PEGTL `_ -- library for creating PEG - parsers. License: MIT. -* `sajson `_ -- high-performance - JSON parser. License: MIT. -* `PocketFFT `_ -- FFT library. - License: 3-clause BSD. -* `stb_sprintf `_ -- locale-independent - snprintf() implementation. License: Public Domain. -* `fast_float `_ -- locale-independent - number parsing. License: Apache 2.0. -* `tinydir `_ -- directory (filesystem) - reader. License: 2-clause BSD. - -Code derived from the following projects is used in the library: - -* `ksw2 `_ -- sequence alignment in - `seqalign.hpp` is based on the ksw_gg function from ksw2. License: MIT. -* `QCProt `_ -- superposition method - in `qcp.hpp` is taken from QCProt and adapted to our project. License: BSD. -* `Larch `_ -- calculation of f' and f" - in `fprime.cpp` is based on CromerLiberman code from Larch. - License: 2-clause BSD. 
- -Projects included under `third_party/` that are not used in the library -itself, but are used in command-line utilities, python bindings or tests: - -* `zpp serializer `_ -- - serialization framework. License: MIT. -* `The Lean Mean C++ Option Parser `_ -- - command-line option parser. License: MIT. -* `doctest `_ -- testing framework. - License: MIT. -* `linalg.h `_ -- linear algebra library. - License: Public Domain. -* `zlib `_ -- a subset of the zlib library - for decompressing gz files, used as a fallback when the zlib library - is not found in the system. License: zlib. - -Not distributed with Gemmi: - -* `nanobind `_ -- used for creating - Python bindings. License: 3-clause BSD. -* `zlib-ng `_ -- optional, can be used - instead of zlib for faster reading of gzipped files. -* `cctbx `_ -- used in tests - (if cctbx is not present, these tests are skipped) and - in scripts that generated space group data and 2-fold twinning operations. - License: 3-clause BSD. - -Mentions: - -* `NLOpt `_ - was used to try out various optimization methods for class Scaling. - License: MIT. - -Email me if I forgot about something. - -List of C++ headers -------------------- - -Here is a list of C++ headers in `gemmi/include/`. -This list also provides an overview of the library. - -.. include:: headers.rst diff --git a/docs/misc.rst b/docs/misc.rst new file mode 100644 index 00000000..fba497b8 --- /dev/null +++ b/docs/misc.rst @@ -0,0 +1,149 @@ +Miscellaneous utils +################### + +FASTA and PIR reader +-------------------- + +Gemmi provides a function to parse two sequence file formats, FASTA and PIR. +The function takes a string containing the file's content as an argument: + +.. doctest:: + + >>> with open('P0C805.fasta') as f: + ... fasta_str = f.read() + >>> gemmi.read_pir_or_fasta(fasta_str) #doctest: +ELLIPSIS + [] + +The string must start with a header line that begins with `>`. 
+In the case of the PIR format, which starts with `>P1;` (or F1, DL, DC, RL, RC, +or XX instead of P1), the next line is also part of the header. +The sequence file may contain multiple sequences, each preceded by a header. +Whitespace in a sequence is ignored, except for blank lines, +which are only allowed between sequences. +A sequence can contain letters, dashes, and residue names in parentheses. +The latter is an extension inspired by the format used in mmCIF files, +in which non-standard residues are given in parentheses, e.g., `MA(MSE)GVN`. +The sequence may end with `*`. + +`FastaSeq` objects, returned from `read_pir_or_fasta()`, +contain only two strings: + +.. doctest:: + + >>> (fasta_seq,) = _ + >>> fasta_seq.header + 'sp|P0C805|PSMA3_STAA8 Phenol-soluble modulin alpha 3 peptide OS=Staphylococcus aureus (strain NCTC 8325 / PS 47) OX=93061 GN=psmA3 PE=1 SV=1' + >>> fasta_seq.seq + 'MEFVAKLFKFFKDLLGKFLGNN' + +.. _logger: + +Logger +====== + +Gemmi Logger is a tiny helper class for passing messages from a gemmi function +to the calling function. It doesn't belong in this section, but it's +documented here because it's used in the previous subsection and I haven't found +a better spot for it. + +The messages being passed are usually info or warnings that a command-line +program would print to stdout or stderr. + +The Logger has two member variables: + +.. literalinclude:: ../include/gemmi/logger.hpp + :language: cpp + :start-at: /// + :end-at: int threshold + +and a few member functions for sending messages. 
+ +When a function takes a Logger argument, we can pass: + +**C++** + +* `{&Logger::to_stderr}` to redirect messages to stderr + (to_stderr() calls fprintf), +* `{&Logger::to_stdout}` to redirect messages to stdout, +* `{&Logger::to_stdout, 3}` to print only warnings (threshold=3), +* `{nullptr, 0}` to disable all messages, +* `{}` to throw errors and ignore other messages (the default, see Quirk above), +* `{[](const std::string& s) { do_anything(s);}}` to do anything else. + +**Python** + +* `sys.stderr` or `sys.stdout` or any other stream (an object with `write` + and `flush` methods), to redirect messages to that stream, +* `(sys.stdout, 3)` to print only warnings (threshold=3), +* `(None, 0)` to disable all messages, +* `None` to throw errors and ignore other messages (the default, see Quirk above), +* a function that takes a message string as its only argument + (e.g. `lambda s: print(s.upper())`). + + +.. _pdb_dir: + +Copy of the PDB archive +======================= + +Some of the examples in this documentation work with a local copy +of the Protein Data Bank archive. This subsection describes +the assumed setup and functions for working with this setup. + +Like in BioJava, we assume that the `$PDB_DIR` environment variable +points to a directory that contains `structures/divided/mmCIF` -- the same +arrangement as on the +`PDB's FTP `_ server. + +.. code-block:: console + + $ cd $PDB_DIR + $ du -sh structures/*/* # as of Jun 2017 + 34G structures/divided/mmCIF + 25G structures/divided/pdb + 101G structures/divided/structure_factors + 2.6G structures/obsolete/mmCIF + +A traditional way to keep an up-to-date local archive is to rsync it +once a week: + +.. 
code-block:: shell + + #!/bin/sh -x + set -u # PDB_DIR must be defined + rsync_subdir() { + mkdir -p "$PDB_DIR/$1" + # Using PDBe (UK) here, can be replaced with RCSB (USA) or PDBj (Japan), + # see https://www.wwpdb.org/download/downloads + rsync -rlpt -v -z --delete \ + rsync.ebi.ac.uk::pub/databases/pdb/data/$1/ "$PDB_DIR/$1/" + } + rsync_subdir structures/divided/mmCIF + #rsync_subdir structures/obsolete/mmCIF + #rsync_subdir structures/divided/pdb + #rsync_subdir structures/divided/structure_factors + +Gemmi has a helper function for using the local archive copy. +It takes a PDB code (case insensitive) and a symbol denoting what file +is requested: P for PDB, M for mmCIF, S for SF-mmCIF. + +.. doctest:: + + >>> os.environ['PDB_DIR'] = '/copy' + >>> gemmi.expand_if_pdb_code('1ABC', 'P') # PDB file + '/copy/structures/divided/pdb/ab/pdb1abc.ent.gz' + >>> gemmi.expand_if_pdb_code('1abc', 'M') # mmCIF file + '/copy/structures/divided/mmCIF/ab/1abc.cif.gz' + >>> gemmi.expand_if_pdb_code('1abc', 'S') # SF-mmCIF file + '/copy/structures/divided/structure_factors/ab/r1abcsf.ent.gz' + +If the first argument is not in the PDB code format (4 characters for now) +the function returns the first argument. + +.. doctest:: + + >>> arg = 'file.cif' + >>> gemmi.is_pdb_code(arg) + False + >>> gemmi.expand_if_pdb_code(arg, 'M') + 'file.cif' diff --git a/docs/mol.rst b/docs/mol.rst index 04268be8..446bc436 100644 --- a/docs/mol.rst +++ b/docs/mol.rst @@ -24,9 +24,9 @@ Reading coordinate files Gemmi support the following coordinate file formats: - * mmCIF (PDBx/mmCIF), - * PDB (with popular extensions), - * mmJSON. +* mmCIF (PDBx/mmCIF), +* PDB (with popular extensions), +* mmJSON. 
It can also read coordinates from the chemical components dictionary (CCD) and from Refmac monomer library -- these are not really coordinate @@ -1885,95 +1885,6 @@ way around, if we know the kind of residues encoded with single letters: ['DSN', 'ALA', 'N2C', 'MVA', 'DSN', 'ALA', 'NCY', 'MVA'] -Molecular weight ----------------- - -Gemmi provides a simple function to calculate molecular weight -from the sequence. It uses the same built-in table of popular residues. -Since in this example we have two rare components that are not tabulated, -we must specify the average weight of unknown residue: - -.. doctest:: - - >>> gemmi.calculate_sequence_weight(seq, unknown=130.0) - 784.6114543066407 - -In such case the result is not accurate, but this is not a typical case. - -Now we will take a PDB file with standard residues -and calculate the Matthews coefficient: - -.. doctest:: - - >>> st = gemmi.read_structure('../tests/5cvz_final.pdb') - >>> list(st[0]) - [] - >>> # we have just a single chain, which makes this example simpler - >>> chain = st[0]['A'] - >>> chain.get_polymer() - - >>> # Not good. The chain parts where not assigned automatically, - >>> # because of the missing TER record in this file. We need to call: - >>> st.setup_entities() # it should sort out chain parts - >>> chain.get_polymer() - - >>> st.get_entity_of(_) # doctest: +ELLIPSIS - - >>> weight = gemmi.calculate_sequence_weight(_.full_sequence) - >>> # Now we can calculate Matthews coefficient - >>> st.cell.volume_per_image() / weight - 3.1983428753317003 - -We could continue and calculate the solvent content, assuming the protein -density of 1.35 g/cm\ :sup:`3` (the other constants below are the Avogadro -number and Å\ :sup:`3`/cm\ :sup:`3` = 10\ :sup:`-24`): - -.. doctest:: - - >>> protein_fraction = 1. 
/ (6.02214e23 * 1e-24 * 1.35 * _) - >>> print('Solvent content: {:.1f}%'.format(100 * (1 - protein_fraction))) - Solvent content: 61.5% - -Gemmi also includes a program that calculates the solvent content: -:ref:`gemmi-contents `. - -FASTA and PIR -------------- - -The coordinate files can contain sequences internally. -Nevertheless, we may need to use a sequence from UniProt or another source. -Gemmi provides a function to parse two sequence file formats, FASTA and PIR. -The function takes a string containing the file's content as an argument: - -.. doctest:: - - >>> with open('P0C805.fasta') as f: - ... fasta_str = f.read() - >>> gemmi.read_pir_or_fasta(fasta_str) #doctest: +ELLIPSIS - [] - -The string must start with a header line that begins with `>`. -In the case of PIR format, which starts with `>P1;` (or F1, DL, DC, RL, RC, -or XX instead of P1), the next line is also part of the header. -The sequence file may contain multiple sequences, each preceded by a header. -Whitespace in a sequence is ignored, except for blank lines, -which are only allowed between sequences. -A sequence can contain letters, dashes, and residue names in parentheses. -The latter is an extension inspired by the format used in mmCIF files, -in which non-standard residues are given in parentheses, e.g., `MA(MSE)GVN`. -The sequence may end with `*`. - -FastaSeq objects, returned from `read_pir_or_fasta()`, -contain only two strings: - -.. doctest:: - - >>> (fasta_seq,) = _ - >>> fasta_seq.header - 'sp|P0C805|PSMA3_STAA8 Phenol-soluble modulin alpha 3 peptide OS=Staphylococcus aureus (strain NCTC 8325 / PS 47) OX=93061 GN=psmA3 PE=1 SV=1' - >>> fasta_seq.seq - 'MEFVAKLFKFFKDLLGKFLGNN' - .. 
_sequence-alignment: Sequence alignment @@ -3072,3 +2983,15 @@ rainbow-colored chain: :scale: 100 :target: https://www.rcsb.org/3d-view/5XG2/ + +Multiprocessing +--------------- + +(Python-specific) + +The example script below traverses subdirectories and asynchronously +analyzes coordinate files, using 4 worker processes in parallel. + +.. literalinclude:: ../examples/multiproc.py + :language: python + :lines: 4- diff --git a/docs/program.rst b/docs/program.rst index 77735613..f647fc72 100644 --- a/docs/program.rst +++ b/docs/program.rst @@ -1,5 +1,7 @@ .. highlight:: console +.. _program: + Gemmi program ############# diff --git a/include/gemmi/seqtools.hpp b/include/gemmi/seqtools.hpp index 64f5602e..5ae031de 100644 --- a/include/gemmi/seqtools.hpp +++ b/include/gemmi/seqtools.hpp @@ -13,7 +13,7 @@ namespace gemmi { constexpr double h2o_weight() { return 2 * 1.00794 + 15.9994; } inline double calculate_sequence_weight(const std::vector& seq, - double unknown=0.) { + double unknown=100.) { double weight = 0.; for (const std::string& item : seq) { ResidueInfo res_info = find_tabulated_residue(Entity::first_mon(item)); diff --git a/tests/disulf.cpp b/tests/disulf.cpp index 899e6dec..34c2e84b 100644 --- a/tests/disulf.cpp +++ b/tests/disulf.cpp @@ -8,8 +8,7 @@ #include #include #include -#include -#include +#include #include #include // for runtime_error #include @@ -115,7 +114,7 @@ static std::vector find_disulfide_bonds2(Model& model, static void check_disulf(const std::string& path) { if (verbose) printf("path: %s\n", path.c_str()); - Structure st = read_structure(MaybeGzipped(path)); + Structure st = read_structure_gz(path); Model& model = st.first_model(); using Clock = std::chrono::steady_clock; auto start = Clock::now();