Feature request: Itertools helpers #318
Sure, I can give it a try, but no guarantees. The real issue in the Stack Overflow post is that the user runs out of memory, because this operation is very expensive. We can get rid of the cross join, which might save some memory, but it will still be expensive because n choose k can get very large. I also don't think plugins can support streaming yet.
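To put rough numbers on "n choose k can get quite large": the number of k-combinations of n values is C(n, k). A strategy that builds k-tuples through repeated self cross joins would, assuming it materializes every ordered k-tuple before filtering, produce n^k rows. The helper names below are hypothetical and only do the arithmetic.

```python
from math import comb


def combination_count(n: int, k: int) -> int:
    # Number of k-combinations of n distinct values: n! / (k! * (n - k)!)
    return comb(n, k)


def cross_join_rows(n: int, k: int) -> int:
    # Rows produced by k - 1 successive self cross joins of an n-row column,
    # before any filtering (assumes every ordered k-tuple is materialized)
    return n**k


if __name__ == "__main__":
    k = 3
    for n in (5, 100, 1_000, 10_000):
        print(
            f"n={n:>6}, k={k}: combinations={combination_count(n, k):,}, "
            f"cross join rows={cross_join_rows(n, k):,}"
        )
```

Even without the cross join, 10,000 values with k = 3 already yield about 1.67e11 combinations, so the output itself, not only the intermediate result, can exhaust memory.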
I have a PR here, #322, which adds the function below. You can call it via `pds.combinations(...)`:

```python
import polars as pl


def combinations(source: str | pl.Expr, k: int, unique: bool = False) -> pl.Expr:
    """
    Get all k-combinations of non-null values in source. This is an expensive operation, as
    n choose k can grow very fast.

    Parameters
    ----------
    source
        Input source column, must have numeric or string type
    k
        The k in n choose k
    unique
        Whether to run .unique() on the source column

    Examples
    --------
    >>> df = pl.DataFrame({
    ...     "category": ["a", "a", "a", "b", "b"],
    ...     "values": [1, 2, 3, 4, 5]
    ... })
    >>> df.select(
    ...     pds.combinations("values", 3)
    ... )
    shape: (10, 1)
    ┌───────────┐
    │ values    │
    │ ---       │
    │ list[i64] │
    ╞═══════════╡
    │ [1, 2, 3] │
    │ [1, 2, 4] │
    │ [1, 2, 5] │
    │ [1, 3, 4] │
    │ [1, 3, 5] │
    │ [1, 4, 5] │
    │ [2, 3, 4] │
    │ [2, 3, 5] │
    │ [2, 4, 5] │
    │ [3, 4, 5] │
    └───────────┘
    >>> df.group_by("category", maintain_order=True).agg(
    ...     pds.combinations("values", 2)
    ... )
    shape: (2, 2)
    ┌──────────┬──────────────────────────┐
    │ category ┆ values                   │
    │ ---      ┆ ---                      │
    │ str      ┆ list[list[i64]]          │
    ╞══════════╪══════════════════════════╡
    │ a        ┆ [[1, 2], [1, 3], [2, 3]] │
    │ b        ┆ [[4, 5]]                 │
    └──────────┴──────────────────────────┘
    """
```
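For readers who cannot build the branch, the semantics shown in the docstring examples can be reproduced in plain Python with `itertools.combinations`. This is a reference sketch, not the plugin's implementation: the helper names are hypothetical, `None` stands in for null, and `unique=True` here keeps first-seen order, which may differ from how `.unique()` orders values in Polars.

```python
from collections.abc import Hashable, Iterable
from itertools import combinations


def reference_combinations(values: Iterable, k: int, unique: bool = False) -> list[list]:
    # Drop nulls, mirroring "k-combinations of non-null values"
    vals = [v for v in values if v is not None]
    if unique:
        # Deduplicate while keeping first-seen order (an assumption; Polars'
        # .unique() does not guarantee order by default)
        vals = list(dict.fromkeys(vals))
    # itertools.combinations emits tuples in lexicographic order of input positions
    return [list(c) for c in combinations(vals, k)]


def grouped_combinations(keys: Iterable[Hashable], values: Iterable, k: int) -> dict:
    # Collect values per key in first-seen key order, like group_by(..., maintain_order=True)
    groups: dict = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return {key: reference_combinations(vs, k) for key, vs in groups.items()}


if __name__ == "__main__":
    category = ["a", "a", "a", "b", "b"]
    values = [1, 2, 3, 4, 5]
    print(reference_combinations(values, 3))
    print(grouped_combinations(category, values, 2))
```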
This is amazing! However, I can't install from the branch: that requires Rust, and I have restrictions on installing it on my machine. Out of curiosity, is an `itertools.product` equivalent expected in the near future too?
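For comparison, a row-wise `itertools.product` helper would presumably pair every element of one list with every element of another. The sketch below shows those semantics in plain Python; `list_product` is a hypothetical name, not an existing pds function. In Polars itself, one native workaround is to explode both list columns one after the other, since exploding the first column repeats each row's list in the second.

```python
from itertools import product


def list_product(left: list, right: list) -> list[list]:
    # Every [a, b] pair with a from left and b from right; left varies slowest
    return [[a, b] for a, b in product(left, right)]


if __name__ == "__main__":
    # Two list "columns", processed row by row
    lefts = [[1, 2], [3]]
    rights = [["x", "y"], ["z"]]
    for left, right in zip(lefts, rights):
        print(list_product(left, right))
```

Each row's output has `len(left) * len(right)` entries.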
It is plausible. These are very similar operations, so it really depends on how much demand there is. I expect to publish v0.8.1, which will include the combinations function, in the middle of the month.
While looking for a way to compute combinations of two lists, I noticed these two questions on Stack Overflow:
https://stackoverflow.com/questions/77354114/how-would-i-generate-combinations-of-items-within-polars-using-the-native-expres
https://stackoverflow.com/questions/79340441/python-polars-expression-list-product
and a related enhancement request: pola-rs/polars#11999.
I think this would be a great addition to the set of data science tools provided by this extension, making it possible to apply cross joins or list combinations without the hassle of resorting to `map_elements`.