Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deep_map #138

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Fix deep_map #138

wants to merge 1 commit into from

Conversation

nicolasmueller
Copy link
Contributor

Currently deep_map is not aware of dict_values. This resulted in input_tables not being populated in MaterializationWrapper.__call__().

Checklist

  • Added a docs/source/changelog.md entry

@nicolasmueller nicolasmueller self-assigned this Feb 24, 2024
@NMAC427
Copy link
Member

NMAC427 commented Feb 24, 2024

I'm not sure if this is the correct approach to solve this issue. Either convert dict_values to a list / tuple before passing it to deep_map, or we devise a more general solution. Specifically, this issue applies to a variety of other iterable objects as well:

  • dict.keys(), dict.items(), dict.values()
  • map, filter
  • collections.deque, collections.defaultdict

I don't know what the optimal solution for this problem would be.


Some thing I would like to note:

  • Because dict_values, dict_keys, dict_items, map and filter aren't json serializable, they aren't particularily relevant when it comes to user code. In other words: if a user tried passing any of those types as an argument to a task, the task will fail during input serialization.
  • Some collections like deque, defaultdict and namedtuple are json serializable. This means that deep_map should also support them properly.
  • If we were to add support for iterators like dict_values or map, I would suggest that we convert those iterators to tuples first, and then handle them like regular tuples.

@windiana42
Copy link
Member

windiana42 commented Feb 28, 2024

How about converting Iterables != tuple to list and Mapping to dict?
Of course we need to be careful about the order in which we check for isinstance of those general types.

    if cls == tuple:
        y = _deep_map_tuple(x, fn, memo)
    elif isinstance(cls, Mapping):
        y = _deep_map_dict(x, fn, memo)
    elif isinstance(cls, Iterable):
        y = _deep_map_list(x, fn, memo)
    else:
        y = fn(x)

@windiana42
Copy link
Member

@NMAC427 what do you think about my proposal to make any Mapping a dictionary and any Iterable != tuple a list?

@NMAC427
Copy link
Member

NMAC427 commented Mar 9, 2024

@windiana42 it depends on how exactly you identify mappings and iterables. For example, pd.DataFrame inherits from the Iterable ABC, but converting it to a list would be fatal, because we use deep_map for the auto table feature.
And even if we decide to convert iterables, I think it would be preferential to convert them to tuples (instead of lists) to keep them immutable.

@NMAC427
Copy link
Member

NMAC427 commented Mar 9, 2024

Theoretically any user code could quite easily extend deep_map for any arbitrary iterable by providing the appropriate mapping function. For example:

def f(x):
    if isinstance(x, SomeContainer):
        return deep_map(f, list(x))
    return str(x)

c = SomeContainer(...)
deep_map(f, c)

However, the inverse (preventing deep_map from deep mapping an iterable) is impossible. Therefore we should be quite careful to ensure we don't accidentally make the problem worse.

@windiana42
Copy link
Member

I think dict().values() should be converted to list or tuple. I think list, tuple, and dict deserve extra status in the sense that we want to keep them in-tact. We should make sure nothing bad happens to pipedag.Table() objects and sources of auto-table conversion should also be kept as is until they are converted to pipedag.Table().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants