Skip to content
This repository has been archived by the owner on Nov 26, 2023. It is now read-only.

Code Specs and Developer Workflow

Alex C. Viana edited this page May 29, 2014 · 5 revisions

Because the Moving Target Pipeline is a large multideveloper project with both data and code deliverable a consistent approach to programming style and procedures is important. When in doubt we use the astropy project as a guide: http://www.astropy.org/index.html

Coding Style

The project coding style is generally defined by the Python Enhancement Proposal 8 (PEP8): http://legacy.python.org/dev/peps/pep-0008/. Further clarification and specification can be found in the Google Python Style Guide: http://google-styleguide.googlecode.com/svn/trunk/pyguide.html. In the case of a conflict or uncertainty consult @acviana. In both cases these are only conventions and can be broken anywhere it's justified - provided it's justified.

PEP 8 compliance can be automatically assessed using the pylint linter, either through the command line tool with the Sublime Text plugin (I'm sure similar plugins exist for other editors as well). Note that pylint is more stringent with the PEP 8 conventions that we need to be, a score of 8/10 is usually sufficient.

Docstrings

A multiline docstring is required for every module, function, and class in the codebase. Simple methods may have a one-line docstring but more complicated methods should also use a multiline docstring format. Docstring should be in encased in double-quotes """like this""". Comments (#) should be used sparingly in favor of docstrings and clear variable names. Per PEP8 they should only be 72 characters wide.

The docstrings should be written in reStructuredSext markup and will be converted to HTML using sphinx. The required sections for each type of docstring are described here [XXX].

Docstring should be written with the assumption that the reader can understand the Python it's describing. Thus, docstrings should primarily explain what the code is doing, e.g. _"applying a median correction" and not how the code is doing it "e.g. looping over the columns". To this end extra attention should be paid to the readability of the code itself.

Not all legacy code in the project will follow this convention but dostrings should be updated to the correct format anytime the corresponding code is updated.

Repository Organization

We are in the process of reorganizing the repository into the following structure:

.
|-- .gitignore
|-- docs/
|-- mtpipeline/
|  |-- __init__.py
|  |-- imaging/
|  |  |-- __init__.py
|  |  |-- foo.py
|  |-- database/
|  |  |-- __init__.py
|  |  |-- foo.py
|  |-- ephem/
|  |  |-- __init__.py
|  |  |-- foo.py
|  |-- tests/
|  |  |-- __init__.py
|  |  |-- imaging/
|  |  |  |-- test_foo.py
|  |  |-- database/
|  |  |  |-- test_foo.py
|  |  |-- ephem/
|  |  |  |-- test_foo.py
|-- setup.py
|-- scripts/
|  |-- run_foo.py

A couple of notes, the docs folder will contain the HTML output from sphinx. The tests module will contain a duplicate of the package tree structure (excluding the tree module itself) which the test for each module foo/bar.py in a module called tests/foo/test_bar.py. All tests will be unit tests run by the nosetests package. End-to-end testing will be performed by hand in development environments. Data completeness will be validated by other scripts.

The code in the main mtpipeline package should library quality code. In particular this means a couple of things. No user names, absolute paths, server names, or passwords should be anywhere in the codebase. These values should instead be stored in a file at the top of the repository called settings.yaml, this file is excluded by the .gitignore settings. * The main package also shouldn't contain any command line argument parsing, logging, or monitoring code (email on crash or complete). These should instead be included in the thin wrappers found in the scripts folder.

*In general, you should never store passwords as plain text. However, because we are only accessing localhost MySQL instances in our development environment this isn't an issue. When we are ready to move the system into production we can ask ITSD how the would like us to implement and required security features.

Workflow

In general the workflow for development is:

  1. Create an issue
  2. Create a new branch
  3. Write some code
  4. Write some tests (3 and 4 can be reversed if you're doing TDD)
  5. Repeat 3 and 4 until the tests pass, noting anything important in the issue.
  6. Repeat 3-5 until test and an end-to-end test pass in a production-like environment (if applicable)
  7. Write the docstrings
  8. Submit a Pull Request (PR)
  9. Iterate on 3-8 until @acviana accepts the PR.

These might change if/when we set up a Travis instance.

Pull Requests

Pull requests are a way to initiate a code review before new code is accepted in to the repository. It is similar to an issue except that it is driven by code submission and end in the code is either rejected or accepted (iterations are possible and common prior to an accept/reject decision). A detailed description of Pull Requests can be found in the GitHub help pages: https://help.github.com/articles/using-pull-requests. There are two main models of code contribution that GitHub supports. Quoting directly from the help page above:

Fork & pull

The fork & pull model lets anyone fork an existing repository and push changes to their personal fork without requiring access be granted to the source repository. The changes must then be pulled into the source repository by the project maintainer. This model reduces the amount of friction for new contributors and is popular with open source projects because it allows people to work independently without upfront coordination. Shared repository model

The shared repository model is more prevalent with small teams and organizations collaborating on private projects. Everyone is granted push access to a single shared repository and topic branches are used to isolate changes.

Pull requests are especially useful in the fork & pull model because they provide a way to notify project maintainers about changes in your fork. However, they're also useful in the shared repository model where they're used to initiate code review and general discussion about a set of changes before being merged into a mainline branch.

After thinking about this of a while I think we're going to this summer with a "shared repository" model. If that become problematic we can always switch to a "fork & pull" model. So, what does that mean? It means that you will have unrestricted write permission to the code repository. This comes with one major responsibility: never push to the master branch. Specifically, you should never run this command:

$ git push origin master

This will push your chances directly to the master branch without any review or approval. It is reversible (usually, depending on what changes you pushed) but not recommended. Instead if you are working a branch called my-feature you should run this command when you are ready to share you changes:

$ git push origin my-feature

This will create a new branch in the remote repository (reference to as "origin") called my-feature. When you think the code in my-feature is ready to be merged into the master branch you go to the GitHub interface and issue a Pull Request (see the help page above).

Clone this wiki locally