Add Series.drop_nil #864

josevalim · 2024-02-20T07:49:18Z

There are two possible versions of this function:

Operates on every series and removes nils from the series
Operates on list series and only drops nils inside each list (it is unclear whether this applies at any nesting level)

We need to decide the API. If we call it drop_nil/1 for series, what would we call the list version? We could use a list_ prefix, but we have generally avoided those.

Another option is to pass an atom to drop_nil/2, such as Series.drop_nil(series, :list), but then I am not sure what to call the regular version. Ideas are welcome.
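To make the two candidate semantics concrete, here is a minimal sketch in plain Python, where an ordinary list stands in for a series and None stands in for nil. The names drop_nil and list_drop_nil are placeholders for illustration only, not a proposed Explorer API, and the list version here only goes one level deep.

```python
def drop_nil(series):
    """Root-level version: remove None entries from the series itself."""
    return [value for value in series if value is not None]


def list_drop_nil(series):
    """List version: keep every row, drop None inside each row (one level deep)."""
    return [
        None if row is None else [value for value in row if value is not None]
        for row in series
    ]


flat = [1, None, 2]
nested = [[None, 1, None, 2], [None], [3, 4]]

print(drop_nil(flat))         # [1, 2]
print(list_drop_nil(nested))  # [[1, 2], [], [3, 4]]
print(drop_nil(nested))       # unchanged: no row is itself None
```

The last call shows why the two versions need distinct names or arguments: on a list series, the root-level version is a no-op unless an entire row is nil.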

cigrainger · 2024-02-20T08:54:44Z

Do we need the regular version of this? It feels like clutter. The idea of minimal verbs is that you can do everything you need with them. Why not just Series.filter(not is_nil(_))?

cigrainger · 2024-02-20T09:04:23Z

Just thinking out loud here: I think there's got to be a more elegant way of dealing with lists than cleaving to Polars's API too closely. I'm not sure I like everything about purrr but I wonder if something like map_depth might help us out here?

There's also modify_tree. Something that indicates it's recursive might be useful?

josevalim · 2024-02-20T09:27:50Z

@cigrainger you are right, we don't need drop_nil at the root. However, the issue presented here also applies to many of the aggregate functions. How do we distinguish between the max of a list and the max of the series? Perhaps some sort of prefix or suffix for all list operations?

map_depth/modify_tree is definitely interesting. Implementing it is a bit less trivial. We would need to introduce some sort of LazyDepthSeries that collects operations on a series as they relate to a struct or list field, and then translate those to Polars. Do we want to go down this route?
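As a rough illustration of the map_depth idea, here is a plain-Python sketch loosely modeled on purrr's map_depth: depth 0 applies the function to the value itself, and each extra level descends one list deeper. It works eagerly on nested Python lists; it does not model the lazy, Polars-backed LazyDepthSeries mentioned above, and drop_nils is a hypothetical helper.

```python
def map_depth(value, depth, fun):
    """Apply fun to every element found `depth` levels below value."""
    if depth == 0:
        return fun(value)
    if value is None:
        # A nil at an intermediate level is passed through untouched.
        return None
    return [map_depth(item, depth - 1, fun) for item in value]


def drop_nils(items):
    """Hypothetical helper: drop None entries from a single list."""
    if items is None:
        return None
    return [item for item in items if item is not None]


series_of_lists = [[None, 1], [2, None]]
series_of_nested = [[[None, 1], [2]], [[None]]]

print(map_depth(series_of_lists, 1, drop_nils))   # [[1], [2]]
print(map_depth(series_of_nested, 2, drop_nils))  # [[[1], [2]], [[]]]
```

With an explicit depth, the same verb covers the root (depth 0), the list version (depth 1), and deeper nesting, which is one way to avoid a list_ prefix on every function.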

cigrainger · 2024-02-20T09:33:06Z

Maybe. I need to explore how Polars handles this itself and in py polars.

cigrainger · 2024-02-26T13:20:39Z

So putting this a bit to the test with Python Polars:

It seems like nested lists may not be supported? That would fix the recursion problem pretty cleanly.

In [16]: df = pl.DataFrame({"values": [[None, 1, None, 2], [None], [3, 4], [[None, 1], [2], [None]]]})

In [17]: df
Out[17]:
shape: (4, 1)
┌────────────────────┐
│ values             │
│ ---                │
│ list[i64]          │
╞════════════════════╡
│ [null, 1, … 2]     │
│ [null]             │
│ [3, 4]             │
│ [null, null, null] │
└────────────────────┘

josevalim · 2024-02-26T14:25:12Z

@cigrainger all lists need to be nested equally. When it fails to cast to a certain type, it returns null instead of raising.
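A toy model of that behavior in plain Python, assuming the column has already been inferred as list[i64]: each inner value is kept if it is an integer and replaced with None otherwise. This is only a sketch of the effect described here, not how Polars implements inference or casting, and the function names are made up for illustration.

```python
def cast_int_or_null(value):
    """Keep integers; anything else (including a nested list) becomes None."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None


def build_list_i64(rows):
    """Non-strict cast of every row to a list of integers-or-None."""
    return [[cast_int_or_null(value) for value in row] for row in rows]


rows = [[None, 1, None, 2], [None], [3, 4], [[None, 1], [2], [None]]]
result = build_list_i64(rows)

print(result[3])  # [None, None, None]
```

The last row mirrors the [null, null, null] in the output above: its three elements are lists, so none of them can be cast to an integer.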

billylanchantin · 2024-02-26T14:48:49Z

@cigrainger Also, in case you missed it, there was an interesting saga of us discovering what Polars was doing WRT nested lists here:

Nested lists of lists can panic #857

Some take-aways:

The first element of a nested list is the tie-breaker when the dtype is ambiguous.
We decided to be more strict with our inference code than py-polars in certain situations.

cigrainger · 2024-02-26T15:18:02Z

I did! Thanks @billylanchantin

josevalim mentioned this issue Feb 20, 2024

Support Series.map/2 with lists #835

Closed
