doc: Use inner/outer splitter, analogize to zip/product #773

14 changes: 7 additions & 7 deletions docs/source/explanation/design-approach.rst
@@ -7,16 +7,16 @@ Rationale

Scientific workflows often require sophisticated analyses that encompass a large collection
of algorithms.
The algorithms, that were originally not necessarily designed to work together,
and were written by different authors.
Some may be written in Python, while others might require calling external programs.
These algorithms are frequently written by different authors, and rarely designed to work together.
Some may be written in Python, our language of choice,
while others might require calling external programs.
It is a common practice to create semi-manual workflows that require the scientists
to handle the files and interact with partial results from algorithms and external tools.
This approach is conceptually simple and easy to implement, but the resulting workflow
is often time consuming, error-prone and difficult to share with others.
Consistency, reproducibility and scalability demand scientific workflows
to be organized into fully automated pipelines.
This was the motivation behind Pydra - a new dataflow engine written in Python.
to be organized into fully-automated pipelines.
This was the motivation behind Pydra - a dataflow engine written in Python.

History
-------
@@ -39,8 +39,8 @@ Goals

The goal of Pydra is to provide a lightweight dataflow engine for computational graph construction,
manipulation, and distributed execution, as well as ensuring reproducibility of scientific pipelines.
In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents a Python
function, execution of an external tool, or another reusable dataflow.
In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents
the invocation of a Python function, an external tool, or another reusable dataflow.
The combination of several key features makes Pydra a customizable and powerful dataflow engine:

- Composable dataflows: Any node of a dataflow graph can be another dataflow, allowing for nested
17 changes: 9 additions & 8 deletions docs/source/explanation/splitting-combining.rst
@@ -31,30 +31,30 @@ nodes represent stateless copies of the original Task after splitting the input,

Types of Splitter
-----------------
Whenever a *Task* has more complicated inputs,
i.e. multiple fields, there are two ways of creating the mapping,
each one is used for different application.
These *splitters* are called *scalar splitter* and *outer splitter*.
Whenever a *Task* has more complicated inputs, for example, multiple fields,
there are two ways of creating the mapping, each suited to a different application.
These *splitters* are called the *inner splitter* and the *outer splitter*.
They use a special, but Python-based syntax as described next.

Scalar Splitter
Inner Splitter
---------------
A *scalar splitter* performs element-wise mapping and requires that the lists of
values for two or more fields to have the same length. The *scalar splitter* uses
An *inner splitter* performs element-wise mapping and requires that the lists of
values for two or more fields have the same length. The *inner splitter* uses
Python tuples and its operation is therefore represented by parentheses, ``()``:

.. math::

   S = (x, y) : x=[x_1, x_2, \ldots, x_n],~y=[y_1, y_2, \ldots, y_n] \mapsto (x, y)=(x_1, y_1), \ldots, (x, y)=(x_n, y_n),


where `S` represents the *splitter*, and `x` and `y` are the input fields.
This is also represented as a diagram:

.. figure:: ../_static/images/nd_spl_4.png
   :figclass: h!
   :scale: 80%

Inner splitters can be analogized to the Python builtin function :func:`zip`.
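
The ``zip`` analogy can be sketched in plain Python. This is only an illustration of the element-wise pairing and does not use Pydra's API; the function ``add`` and the input values are hypothetical. One difference to keep in mind: ``zip`` silently truncates to the shorter list, whereas an inner splitter requires equal lengths, so the sketch checks the lengths explicitly.

```python
# Hypothetical task function and inputs, for illustration only (not Pydra API).
def add(x, y):
    return x + y

x = [1, 2, 3]
y = [10, 20, 30]

# An inner splitter requires lists of equal length; plain zip() would
# silently stop at the shorter list, so check the lengths up front.
if len(x) != len(y):
    raise ValueError("inner splitting needs lists of equal length")

# Element-wise pairing: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
inner_inputs = list(zip(x, y))

# One independent invocation of the function per pair.
inner_results = [add(a, b) for a, b in inner_inputs]

print(inner_inputs)
print(inner_results)
```

With ``n`` elements per field, this pairing yields exactly ``n`` invocations.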


Outer Splitter
--------------
@@ -85,5 +85,6 @@ and `inp3`. This can be extended to arbitrary complexity.
In addition, the output can be merged at the end if needed.
This will be explained in the next section.

Outer splitters can be analogized to the Python function :func:`itertools.product`.
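
The ``itertools.product`` analogy can be sketched the same way. Again, this is plain Python rather than Pydra's API, and the function ``multiply`` and the input values are hypothetical. Unlike the element-wise case, the two lists may have different lengths, because every value of one field is combined with every value of the other.

```python
from itertools import product

# Hypothetical task function and inputs, for illustration only (not Pydra API).
def multiply(x, y):
    return x * y

x = [1, 2]
y = [10, 20, 30]  # lengths may differ for an outer combination

# All combinations: (x_1, y_1), (x_1, y_2), ..., (x_n, y_m)
outer_inputs = list(product(x, y))

# One independent invocation of the function per combination.
outer_results = [multiply(a, b) for a, b in outer_inputs]

print(outer_inputs)
print(outer_results)
```

With ``n`` values for one field and ``m`` for the other, this yields ``n * m`` invocations.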

.. _Map-Reduce: https://en.wikipedia.org/wiki/MapReduce