diff --git a/docs/source/explanation/design-approach.rst b/docs/source/explanation/design-approach.rst index 07c94226f1..a7a68d7a85 100644 --- a/docs/source/explanation/design-approach.rst +++ b/docs/source/explanation/design-approach.rst @@ -7,16 +7,16 @@ Rationale Scientific workflows often require sophisticated analyses that encompass a large collection of algorithms. -The algorithms, that were originally not necessarily designed to work together, -and were written by different authors. -Some may be written in Python, while others might require calling external programs. +These algorithms are frequently written by different authors, and rarely designed to work together. +Some may be written in Python, our language of choice, +while others might require calling external programs. It is a common practice to create semi-manual workflows that require the scientists to handle the files and interact with partial results from algorithms and external tools. This approach is conceptually simple and easy to implement, but the resulting workflow is often time consuming, error-prone and difficult to share with others. Consistency, reproducibility and scalability demand scientific workflows -to be organized into fully automated pipelines. -This was the motivation behind Pydra - a new dataflow engine written in Python. +to be organized into fully-automated pipelines. +This was the motivation behind Pydra - a dataflow engine written in Python. History ------- @@ -39,8 +39,8 @@ Goals The goal of Pydra is to provide a lightweight dataflow engine for computational graph construction, manipulation, and distributed execution, as well as ensuring reproducibility of scientific pipelines. -In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents a Python -function, execution of an external tool, or another reusable dataflow. 
+In Pydra, a dataflow is represented as a directed acyclic graph, where each node represents +the invocation of a Python function, an external tool, or another reusable dataflow. The combination of several key features makes Pydra a customizable and powerful dataflow engine: - Composable dataflows: Any node of a dataflow graph can be another dataflow, allowing for nested diff --git a/docs/source/explanation/splitting-combining.rst b/docs/source/explanation/splitting-combining.rst index 906a51443c..5c5a698253 100644 --- a/docs/source/explanation/splitting-combining.rst +++ b/docs/source/explanation/splitting-combining.rst @@ -31,23 +31,21 @@ nodes represent stateless copies of the original Task after splitting the input, Types of Splitter ----------------- -Whenever a *Task* has more complicated inputs, -i.e. multiple fields, there are two ways of creating the mapping, -each one is used for different application. -These *splitters* are called *scalar splitter* and *outer splitter*. +Whenever a *Task* has more complicated inputs, for example, multiple fields, +there are two ways of creating the mapping, each used for a different application. +These *splitters* are called *inner splitter* and *outer splitter*. They use a special, but Python-based syntax as described next. -Scalar Splitter +Inner Splitter --------------- -A *scalar splitter* performs element-wise mapping and requires that the lists of -values for two or more fields to have the same length. The *scalar splitter* uses +An *inner splitter* performs element-wise mapping and requires the lists of +values for two or more fields to have the same length. The *inner splitter* uses Python tuples and its operation is therefore represented by a parenthesis, ``()``: .. math:: S = (x, y) : x=[x_1, x_2, .., x_n],~y=[y_1, y_2, .., y_n] \mapsto (x, y)=(x_1, y_1),..., (x, y)=(x_n, y_n), - where `S` represents the *splitter*, `x` and `y` are the input fields. 
This is also represented as a diagram: @@ -55,6 +53,8 @@ This is also represented as a diagram: :figclass: h! :scale: 80% +Inner splitters can be analogized to the Python builtin function :func:`zip`. + Outer Splitter -------------- @@ -85,5 +85,6 @@ and `inp3`. This can be extended to arbitrary complexity. In additional, the output can be merge at the end if needed. This will be explained in the next section. +Outer splitters can be analogized to the Python function :func:`itertools.product`. .. _Map-Reduce: https://en.wikipedia.org/wiki/MapReduce
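The two analogies this patch adds to the splitter docs can be checked with plain Python. The sketch below is illustrative only: ``inner_split`` and ``outer_split`` are hypothetical helper names, not part of the Pydra API, and the sample lists are made up. It assumes an inner splitter pairs values element-wise (like ``zip``, with the equal-length requirement made explicit) and an outer splitter pairs every value with every other (like ``itertools.product``).

```python
from itertools import product


def inner_split(*fields):
    """Element-wise pairing, analogous to zip (hypothetical helper, not Pydra API)."""
    # The docs state that all field lists must have the same length;
    # plain zip() would silently truncate, so check explicitly.
    if len({len(f) for f in fields}) > 1:
        raise ValueError("inner splitting requires lists of equal length")
    return list(zip(*fields))


def outer_split(*fields):
    """All combinations, analogous to itertools.product (hypothetical helper)."""
    return list(product(*fields))


x = [1, 2, 3]
y = [10, 20, 30]

inner = inner_split(x, y)         # n pairs: (x_i, y_i)
outer = outer_split(x, [10, 20])  # n * m pairs: every x with every y

print(inner)
print(outer)
```

Note that the inner form yields as many input sets as there are elements in each list, while the outer form yields the product of the list lengths, so the lists need not match in length.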