Commit 11d18f2

Allow to pass dependencies and products as dictionaries. (#38)
1 parent 1d7e1ac commit 11d18f2

17 files changed, +414 -112 lines changed

docs/changes.rst

+3 -1
@@ -15,7 +15,9 @@ all releases are available on `Anaconda.org <https://anaconda.org/pytask/pytask>
 - :gh:`34` skips ``pytask_collect_task_teardown`` if task is None.
 - :gh:`35` adds the ability to capture stdout and stderr with the CaptureManager.
 - :gh:`36` reworks the debugger to make it work with the CaptureManager.
-- :gh:`37` removes reports argument from hooks related to task collection.
+- :gh:`37` removes ``reports`` argument from hooks related to task collection.
+- :gh:`38` allows to pass dictionaries as dependencies and products and inside the
+  function ``depends_on`` and ``produces`` become dictionaries.


 0.0.8 - 2020-10-04

docs/tutorials/how_to_define_dependencies_products.rst

+63 -21
@@ -40,14 +40,6 @@ The ``@pytask.mark.produces`` decorator attaches a product to a task. The string
 task is defined.


-Optional usage in signature
----------------------------
-
-As seen before, if you have a task with products (or dependencies), you can use
-``produces`` (``depends_on``) as a function argument and receive the path or list of
-paths inside the functions. It helps to avoid repetition.
-
-
 Dependencies
 ------------

@@ -64,27 +56,77 @@ Most tasks have dependencies. Similar to products, you can use the
         produces.write_text(bold_text)


+Optional usage in signature
+---------------------------
+
+As seen before, if you have a task with products (or dependencies), you can use
+``produces`` (``depends_on``) as a function argument and receive the path or a
+dictionary of paths inside the functions. It helps to avoid repetition.
+
+
 Multiple dependencies and products
 ----------------------------------

-If you have multiple dependencies or products, pass a list to the decorator. Inside the
-function you receive a list of :class:`pathlib.Path` as well.
+Most tasks have multiple dependencies or products. The easiest way to attach multiple
+dependencies or products to a task is to pass a :class:`list`, :class:`tuple` or other
+iterator to the decorator which contains :class:`str` or :class:`pathlib.Path`.

 .. code-block:: python

-    @pytask.mark.depends_on(["text_a.txt", "text_b.txt"])
-    @pytask.mark.produces(["bold_text_a.txt", "bold_text_b.txt"])
-    def task_make_text_bold(depends_on, produces):
-        for dependency, product in zip(depends_on, produces):
-            text = dependency.read_text()
-            bold_text = f"**{text}**"
-            product.write_text(bold_text)
+    @pytask.mark.depends_on(["text_1.txt", "text_2.txt"])
+    def task_example(depends_on):
+        pass
+
+The function argument ``depends_on`` or ``produces`` becomes a dictionary where keys are
+the positions in the list and values are :class:`pathlib.Path`.
+
+.. code-block:: python
+
+    depends_on = {0: Path("text_1.txt"), 1: Path("text_2.txt")}
+
+Why dictionaries and not lists? First, dictionaries with positions as keys behave very
+similar to lists and conversion between both is easy.
+
+Secondly, dictionaries allow to access paths to dependencies and products via labels
+which is preferred over positional access when tasks become more complex and the order
+changes.
+
+To assign labels to dependencies or products, pass a dictionary or a list of tuples with
+the name in the first and the path in the second position to the decorator. For example,
+
+.. code-block:: python
+
+    @pytask.mark.depends_on({"first": "text_1.txt", "second": "text_2.txt"})
+    @pytask.mark.produces("out.txt")
+    def task_example(depends_on, produces):
+        text = depends_on["first"].read_text() + " " + depends_on["second"].read_text()
+        produces.write_text(text)
+
+or with tuples
+
+.. code-block:: python
+
+    @pytask.mark.depends_on([("first", "text_1.txt"), ("second", "text_2.txt")])
+    def task_example():
+        ...
+
+
+Multiple decorators
+-------------------
+
+You can also attach multiple decorators to a function which will be merged into a single
+dictionary. This might help you to group certain dependencies and apply them to multiple
+tasks.
+
+.. code-block:: python
+
+    common_dependencies = ["text_1.txt", "text_2.txt"]

-The last task is overly complex since it is the same operation performed for two
-independent dependencies and products. There must be a better way |tm|, right? Check out
-the :doc:`tutorial on parametrization <how_to_parametrize_a_task>`.

-.. |tm| unicode:: U+2122
+    @pytask.mark.depends_on(common_dependencies)
+    @pytask.mark.depends_on("text_3.txt")
+    def task_example():
+        ...


.. rubric:: References
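
Review note: the tutorial text in this diff states that dictionaries keyed by position convert easily to and from lists. A minimal sketch of that round trip follows, using only the standard library and the example value from the tutorial. The names `paths` and `rebuilt` are illustrative and not part of pytask.

```python
from pathlib import Path

# The value a task would receive for two unlabeled dependencies,
# copied from the tutorial text in this diff.
depends_on = {0: Path("text_1.txt"), 1: Path("text_2.txt")}

# Dictionary -> list: order by the integer keys to recover the positions.
paths = [depends_on[position] for position in sorted(depends_on)]

# List -> dictionary: enumerate() restores the positional keys.
rebuilt = dict(enumerate(paths))
```

Sorting the keys only works while every key is an integer; once string labels are mixed in, access by label is the safer route.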

src/_pytask/clean.py

+1 -2
@@ -55,7 +55,6 @@ def pytask_post_parse(config):

 @click.command()
 @click.option(
-    "-m",
     "--mode",
     type=click.Choice(["dry-run", "interactive", "force"]),
     help=_HELP_TEXT_MODE,
@@ -166,7 +165,7 @@ def _yield_paths_from_task(task):
     """Yield all paths attached to a task."""
     yield task.path
     for attribute in ["depends_on", "produces"]:
-        for node in getattr(task, attribute):
+        for node in getattr(task, attribute).values():
             if isinstance(node.value, Path):
                 yield node.value

src/_pytask/collect.py

+28 -8
@@ -3,6 +3,7 @@
 import importlib
 import inspect
 import sys
+import time
 import traceback
 from pathlib import Path

@@ -15,11 +16,14 @@
 from _pytask.report import CollectionReport
 from _pytask.report import CollectionReportFile
 from _pytask.report import CollectionReportTask
+from _pytask.report import format_collect_footer


 @hookimpl
 def pytask_collect(session):
     """Collect tasks."""
+    session.collection_start = time.time()
+
     reports = _collect_from_paths(session)
     tasks = _extract_successful_tasks_from_reports(reports)

@@ -31,13 +35,12 @@ def pytask_collect(session):
         )
         reports.append(report)

-    session.hook.pytask_collect_log(session=session, reports=reports, tasks=tasks)
-
     session.collection_reports = reports
     session.tasks = tasks

-    if any(i for i in reports if not i.successful):
-        raise CollectionError
+    session.hook.pytask_collect_log(
+        session=session, reports=session.collection_reports, tasks=session.tasks
+    )

     return True

@@ -214,19 +217,36 @@ def _extract_successful_tasks_from_reports(reports):
 @hookimpl
 def pytask_collect_log(session, reports, tasks):
     """Log collection."""
+    session.collection_end = time.time()
     tm_width = session.config["terminal_width"]

     message = f"Collected {len(tasks)} task(s)."
-    if session.deselected:
-        message += f" Deselected {len(session.deselected)} task(s)."
+
+    n_deselected = len(session.deselected)
+    if n_deselected:
+        message += f" Deselected {n_deselected} task(s)."
     click.echo(message)

     failed_reports = [i for i in reports if not i.successful]
     if failed_reports:
-        click.echo(f"{{:=^{tm_width}}}".format(" Errors during collection "))
+        click.echo("")
+        click.echo(f"{{:=^{tm_width}}}".format(" Failures during collection "))

         for report in failed_reports:
             click.echo(f"{{:_^{tm_width}}}".format(report.format_title()))
+
+            click.echo("")
+
             traceback.print_exception(*report.exc_info)
+
             click.echo("")
-        click.echo("=" * tm_width)
+
+        duration = round(session.collection_end - session.collection_start, 2)
+        click.echo(
+            format_collect_footer(
+                len(tasks), len(failed_reports), n_deselected, duration, tm_width
+            ),
+            nl=True,
+        )
+
+        raise CollectionError

src/_pytask/collect_command.py

+2 -2
@@ -85,8 +85,8 @@ def _organize_tasks(tasks):

         task_dict = {
             task_name: {
-                "depends_on": [node.name for node in task.depends_on],
-                "produces": [node.name for node in task.produces],
+                "depends_on": [node.name for node in task.depends_on.values()],
+                "produces": [node.name for node in task.produces.values()],
             }
         }

src/_pytask/nodes.py

+97 -13
@@ -1,6 +1,7 @@
 """Deals with nodes which are dependencies or products of a task."""
 import functools
 import inspect
+import itertools
 import pathlib
 from abc import ABCMeta
 from abc import abstractmethod
@@ -13,7 +14,7 @@
 from _pytask.exceptions import NodeNotCollectedError
 from _pytask.exceptions import NodeNotFoundError
 from _pytask.mark import get_marks_from_obj
-from _pytask.shared import to_list
+from _pytask.shared import find_duplicates


 def depends_on(objects: Union[Any, Iterable[Any]]) -> Union[Any, Iterable[Any]]:
@@ -68,22 +69,24 @@ class PythonFunctionTask(MetaTask):
     """pathlib.Path: Path to the file where the task was defined."""
     function = attr.ib(type=callable)
     """callable: The task function."""
-    depends_on = attr.ib(converter=to_list)
+    depends_on = attr.ib(factory=dict)
     """Optional[List[MetaNode]]: A list of dependencies of task."""
-    produces = attr.ib(converter=to_list)
+    produces = attr.ib(factory=dict)
     """List[MetaNode]: A list of products of task."""
-    markers = attr.ib()
+    markers = attr.ib(factory=list)
     """Optional[List[Mark]]: A list of markers attached to the task function."""
     _report_sections = attr.ib(factory=list)

     @classmethod
     def from_path_name_function_session(cls, path, name, function, session):
         """Create a task from a path, name, function, and session."""
         objects = _extract_nodes_from_function_markers(function, depends_on)
-        dependencies = _collect_nodes(session, path, name, objects)
+        nodes = _convert_objects_to_node_dictionary(objects, "depends_on")
+        dependencies = _collect_nodes(session, path, name, nodes)

         objects = _extract_nodes_from_function_markers(function, produces)
-        products = _collect_nodes(session, path, name, objects)
+        nodes = _convert_objects_to_node_dictionary(objects, "produces")
+        products = _collect_nodes(session, path, name, nodes)

         markers = [
             marker
@@ -118,8 +121,10 @@ def _get_kwargs_from_task_for_function(self):
                 attribute = getattr(self, name)
                 kwargs[name] = (
                     attribute[0].value
-                    if len(attribute) == 1
-                    else [node.value for node in attribute]
+                    if len(attribute) == 1 and 0 in attribute
+                    else {
+                        node_name: node.value for node_name, node in attribute.items()
+                    }
                 )

         return kwargs
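
Review note: the conditional in this hunk decides what a task function receives. A single node stored under the positional key `0` is passed as a bare value, and everything else is passed as a dictionary. The sketch below illustrates that rule in isolation. `resolve_argument` is a hypothetical helper, and it works on plain paths rather than the node objects (with a `.value` attribute) that the real method handles.

```python
from pathlib import Path


def resolve_argument(paths):
    """Hypothetical stand-in for the conditional in the hunk above.

    ``paths`` maps node names to paths; the real method reads ``node.value``
    from node objects instead.
    """
    # Exactly one entry, stored under the positional key 0: hand back the value.
    if len(paths) == 1 and 0 in paths:
        return paths[0]
    # Labeled or multiple entries: hand back a dictionary.
    return dict(paths)


single = resolve_argument({0: Path("out.txt")})
labeled = resolve_argument({"first": Path("text_1.txt")})
several = resolve_argument({0: Path("text_1.txt"), 1: Path("text_2.txt")})
```

Under this rule, `single` is a bare `Path`, while `labeled` stays a dictionary even though it holds only one entry, because its key is a label and not the position `0`.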
@@ -169,8 +174,9 @@ def state(self):

 def _collect_nodes(session, path, name, nodes):
     """Collect nodes for a task."""
-    collect_nodes = []
-    for node in nodes:
+    collected_nodes = {}
+
+    for node_name, node in nodes.items():
         collected_node = session.hook.pytask_collect_node(
             session=session, path=path, node=node
         )
@@ -180,9 +186,9 @@ def _collect_nodes(session, path, name, nodes):
                 f"'{name}' in '{path}'."
             )
         else:
-            collect_nodes.append(collected_node)
+            collected_nodes[node_name] = collected_node

-    return collect_nodes
+    return collected_nodes


 def _extract_nodes_from_function_markers(function, parser):
@@ -195,4 +201,82 @@ def _extract_nodes_from_function_markers(function, parser):
     """
     marker_name = parser.__name__
     for marker in get_marks_from_obj(function, marker_name):
-        yield from to_list(parser(*marker.args, **marker.kwargs))
+        parsed = parser(*marker.args, **marker.kwargs)
+        yield parsed
+
+
+def _convert_objects_to_node_dictionary(objects, when):
+    list_of_tuples = _convert_objects_to_list_of_tuples(objects)
+    _check_that_names_are_not_used_multiple_times(list_of_tuples, when)
+    nodes = _convert_nodes_to_dictionary(list_of_tuples)
+    return nodes
+
+
+def _convert_objects_to_list_of_tuples(objects):
+    out = []
+    for obj in objects:
+        if isinstance(obj, dict):
+            obj = obj.items()
+
+        if isinstance(obj, Iterable) and not isinstance(obj, str):
+            for x in obj:
+                if isinstance(x, Iterable) and not isinstance(x, str):
+                    tuple_x = tuple(x)
+                    if len(tuple_x) in [1, 2]:
+                        out.append(tuple_x)
+                    else:
+                        raise ValueError("ERROR")
+                else:
+                    out.append((x,))
+        else:
+            out.append((obj,))
+
+    return out
+
+
+def _check_that_names_are_not_used_multiple_times(list_of_tuples, when):
+    """Check that names of nodes are not assigned multiple times.
+
+    Tuples in the list have either one or two elements. The first element in the two
+    element tuples is the name and cannot occur twice.
+
+    Examples
+    --------
+    >>> _check_that_names_are_not_used_multiple_times(
+    ... [("a",), ("a", 1)], "depends_on"
+    ... )
+    >>> _check_that_names_are_not_used_multiple_times(
+    ... [("a", 0), ("a", 1)], "produces"
+    ... )
+    Traceback (most recent call last):
+    ValueError: '@pytask.mark.produces' has nodes with the same name: {'a'}
+
+    """
+    names = [x[0] for x in list_of_tuples if len(x) == 2]
+    duplicated = find_duplicates(names)
+
+    if duplicated:
+        raise ValueError(
+            f"'@pytask.mark.{when}' has nodes with the same name: {duplicated}"
+        )
+
+
+def _convert_nodes_to_dictionary(list_of_tuples):
+    nodes = {}
+    counter = itertools.count()
+    names = [x[0] for x in list_of_tuples if len(x) == 2]
+
+    for tuple_ in list_of_tuples:
+        if len(tuple_) == 2:
+            node_name, node = tuple_
+            nodes[node_name] = node
+
+        else:
+            while True:
+                node_name = next(counter)
+                if node_name not in names:
+                    break
+
+            nodes[node_name] = tuple_[0]
+
+    return nodes
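
Review note: `_convert_nodes_to_dictionary` assigns integer keys to unlabeled nodes and skips any integer that is already used as an explicit label. The sketch below is a standalone copy of that logic so it can be run without pytask. The function name without the leading underscore and the sample input are illustrative assumptions, not part of the commit.

```python
import itertools


def convert_nodes_to_dictionary(list_of_tuples):
    """Standalone copy of the naming logic shown in the diff."""
    nodes = {}
    counter = itertools.count()
    # Explicit labels come from the two-element tuples.
    names = [x[0] for x in list_of_tuples if len(x) == 2]

    for tuple_ in list_of_tuples:
        if len(tuple_) == 2:
            node_name, node = tuple_
            nodes[node_name] = node
        else:
            # Take the next integer that is not already used as a label.
            while True:
                node_name = next(counter)
                if node_name not in names:
                    break
            nodes[node_name] = tuple_[0]

    return nodes


# Hypothetical input: two unlabeled nodes and one explicitly labeled 0.
result = convert_nodes_to_dictionary([("a.txt",), (0, "b.txt"), ("c.txt",)])
```

Here the explicit label `0` is reserved for `"b.txt"`, so the unlabeled entries receive `1` and `2`. Positional keys therefore match list positions only when no integer labels are given explicitly.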
