Skip to content

Files

68 lines (53 loc) · 4.02 KB

STYLE.md

File metadata and controls

68 lines (53 loc) · 4.02 KB

AllenNLP Style Guide

Our highest priority for code style is that our code be easily readable to someone new to the codebase. Deep learning is easy to get wrong, and we want our code to be easy enough to read that someone looking at it can be thinking about our modeling decisions, not trying to understand what is going on.

To that end, we use descriptive names, we use type annotations, and we write coherent docstrings. In code that manipulates tensors, most lines that compute a tensor have a comment describing the tensor's shape. When there's an interesting or important modeling decision in the code, we write a comment about it (either in-line or in an appropriate docstring).

Docstrings

All reasonably complex public methods should have docstrings describing their basic function, their inputs and their outputs. Private methods should also most often have docstrings, so that people who read your code know what the method is supposed to do. The basic outline we use for docstrings is: (1) a brief description of what the method does, sometimes also including how or why the method does it, (2) the parameters / arguments to the method, (3) the return value of the method, if any. If the method is particularly simple or the arguments are obvious, (2) and (3) can be omitted. Our docs use Markdown formatting, so function arguments and return values should be formatted as Markdown headers (e.g # Parameters), seen in just about any model or module in the codebase. We treat the class docstring as the documentation for __init__ methods, giving parameters there and omitting any docstring on the constructor itself. For model / module constructors and methods like forward, always include the parameters and return values (when there is one) in the docstring.

Code format

We use flake8, black and mypy to enforce some basic consistency in formatting. Those formatting guidelines roughly follow Google's python style guide, with a few notable exceptions. In particular, because we use type annotations and descriptive variable names, we use 100-character lines instead of 80-character lines, and it's ok to go over sometimes in code. Additionally, we use mkdocs for building our docs, so Google's docstring formats don't apply.

Naming

We follow Google's general naming rules, and their definition of camel case.

Module layout and imports

To keep files from getting too big, we typically have one class per file, though small classes that are inseparable from a companion class can also go in the same file (often these will be private classes).

To avoid verbosity when importing classes structured this way, classes should be imported from their module's __init__.py. For example, the Batch class is in allennlp/data/batch.py, but allennlp/data/__init__.py imports the class, so that you can just do from allennlp.data import Batch.

Abstract classes typically go in a module containing the abstract class and all built-in implementations. This includes things like Field (in allennlp.data.fields), Seq2SeqEncoder (in allennlp.modules.seq2seq_encoders), and many others. In these cases, the abstract class should be imported into the module above, so that you can do, e.g., from allennlp.data import Field. Concrete implementations follow the same layout as above: from allennlp.data.fields import TextField.

Imports should be formatted at the top of the file, following PEP 8's recommendations: three sections (standard library, third-party libraries, internal imports), each sorted and separated by a blank line.

Conclusion

Some of the conventions we've adopted are arbitrary (e.g., other definitions of camel case are also valid), but we stick to them to keep a consistent style throughout the codebase, which makes it easier to read and maintain.