agamdua/madhatter

Madhatter

Experimental data processing pipeline with job parallelization on those time-consuming for loops.

Problem statement

Python does not "magically" handle parallel execution. One has to muck about with multiprocessing or lean on other libraries, and that can be a chore, especially when what one really needs is a cluster and not different processes on the same machine.
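To make the chore concrete, here is a minimal sketch of taking one loop parallel using only the standard library. The names `slow_square`, `run_serial` and `run_parallel` are made up for illustration and are not part of madhatter.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List


def slow_square(n: int) -> int:
    """Hypothetical stand-in for an expensive, independent loop body."""
    return n * n


def run_serial(items: Iterable[int]) -> List[int]:
    # The time-consuming for loop: one item after another.
    results = []
    for item in items:
        results.append(slow_square(item))
    return results


def run_parallel(
    items: Iterable[int],
    workers: int = 4,
    executor_cls: Callable[..., Executor] = ProcessPoolExecutor,
) -> List[int]:
    # Executor.map returns results in input order, so the output
    # matches run_serial even though the work is spread over workers.
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(slow_square, items))
```

Calling `run_parallel(range(10))` should give the same list as `run_serial(range(10))`. With the default process pool, the call belongs under an `if __name__ == "__main__":` guard on platforms that spawn workers, and the function has to be picklable. Passing `executor_cls=ThreadPoolExecutor` swaps in threads. None of this helps once the work has to leave the machine, which is the gap described above.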

Objective

madhatter attempts to take the work out of a data processing problem (wild unicorn appears!) so that the programmer can focus on the task at hand.

The aim is to have minimal dependencies in the actual implementation, and to leverage the standard lib as much as possible.

The programmer should be able to construct their data execution chain from simple Python functions (with a madhatter decorator), each of which serves as a filter in the chain (see the Architecture & Design section below for details).
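As a rough illustration of the idea, the sketch below registers plain functions as steps with a decorator and runs items through them in order. `Chain`, `step`, `run_one` and `run` are hypothetical names invented for this example; they are an assumption about what such an interface could look like, not madhatter's actual decorator or API.

```python
from typing import Any, Callable, Iterable, List


class Chain:
    """Hypothetical container for an ordered list of filter functions."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Any], Any]] = []

    def step(self, func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        # Decorator: record the function as the next filter and
        # return it unchanged so it stays callable on its own.
        self.steps.append(func)
        return func

    def run_one(self, item: Any) -> Any:
        # Feed the output of each step into the next one.
        for step in self.steps:
            item = step(item)
        return item

    def run(self, items: Iterable[Any]) -> List[Any]:
        return [self.run_one(item) for item in items]


chain = Chain()


@chain.step
def strip(text: str) -> str:
    return text.strip()


@chain.step
def lower(text: str) -> str:
    return text.lower()


@chain.step
def word_count(text: str) -> int:
    return len(text.split())


counts = chain.run(["  Hello World ", "ONE two THREE  "])
print(counts)  # [2, 3]
```

Because each item passes through the chain independently, `run` is the natural place to swap the list comprehension for a parallel map.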

This will also attempt to integrate with containers (Docker to start with) so that each task can be spun off into its own execution environment, from where a naive scheduler in madhatter can hand it off to the host OS for scheduling.

There is a web interface planned which will integrate with a message queue for async operations. That interface will most likely have madhatter as a dependency.

Architecture & Design

This follows the pipes and filters design pattern.

There's a great article on MSDN about such a pipeline where they describe the model pretty well:

Decompose the processing required for each stream into a set of discrete components (or filters), each of which performs a single task. By standardizing the format of the data that each component receives and emits, these filters can be combined together into a pipeline.
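The pattern in the quote can be sketched with plain generators. This is a generic illustration of pipes and filters, not code from madhatter: the record format (a dict with a `value` key) and every function name here are assumptions made for the example.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator

# The standardized format every filter receives and emits.
Record = Dict[str, float]
Filter = Callable[[Iterable[Record]], Iterator[Record]]


def drop_negative(records: Iterable[Record]) -> Iterator[Record]:
    # Filter 1: discard records with a negative value.
    for record in records:
        if record["value"] >= 0:
            yield record


def to_celsius(records: Iterable[Record]) -> Iterator[Record]:
    # Filter 2: convert Fahrenheit values to Celsius.
    for record in records:
        yield {**record, "value": (record["value"] - 32) * 5 / 9}


def round_values(records: Iterable[Record]) -> Iterator[Record]:
    # Filter 3: round values to one decimal place.
    for record in records:
        yield {**record, "value": round(record["value"], 1)}


def pipeline(*filters: Filter) -> Callable[[Iterable[Record]], Iterable[Record]]:
    """Combine filters left to right into a single callable."""

    def run(records: Iterable[Record]) -> Iterable[Record]:
        # Each filter wraps the stream produced by the previous one.
        return reduce(lambda stream, f: f(stream), filters, records)

    return run


readings = [{"value": 32.0}, {"value": -5.0}, {"value": 212.0}]
process = pipeline(drop_negative, to_celsius, round_values)
result = list(process(readings))
print(result)  # [{'value': 0.0}, {'value': 100.0}]
```

Since every filter shares the same input and output shape, filters can be reordered, dropped or reused without touching the others, which is the property the quoted passage relies on.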

Future Direction

Please refer to this conversation on the first commit for the design plans.

Thanks, @miki725, for forcing me to write them out.

Examples

Please check out the contrived_examples module in the repo.
