Add include_empty keyword argument #250

Open · wants to merge 2 commits into base: master

Conversation

HDictus (Contributor) commented Nov 29, 2023

When using get-like functions (edges.pathway_edges, nodes.get) it is common to concatenate the outputs of the functions to mimic the old bluepysnap interface. However, if no nodes/edges satisfy the query, instead of resulting in an empty dataframe, this approach will result in an error, because no dataframes at all are yielded by the generator.

The include_empty keyword argument allows the user to specify that they still want to see a dataframe if the query turns up nothing.

One example where this is useful for me is when I use the length of a connection dataframe to determine connection probability.
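
For illustration, a minimal self-contained sketch of the failure mode described above; fake_get is a hypothetical stand-in for a get-like generator, not the real SNAP API:

import pandas as pd

def fake_get(matches):
    # hypothetical stand-in for a get-like generator:
    # yields one dataframe per matching population, nothing otherwise
    if matches:
        yield pd.DataFrame({"layer": [1, 2]})

print(pd.concat(fake_get(True)))   # fine: a 2-row dataframe
try:
    pd.concat(fake_get(False))     # nothing is yielded by the generator
except ValueError as err:
    print(err)                     # "No objects to concatenate"

With include_empty=True, the generator would also yield empty dataframes for non-matching populations, so the concatenation would produce an empty dataframe instead of raising.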

codecov-commenter commented:

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c094e4f) 100.00% compared to head (1d96b29) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #250   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           32        32           
  Lines         2449      2449           
=========================================
  Hits          2449      2449           
Flag      Coverage Δ
pytest    100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

mgeplf (Contributor) commented Nov 30, 2023

Thanks for the PR.

it is common to concatenate the outputs of the functions to mimic the old bluepysnap interface

I'm generally a bit leery of adding more arguments to functions, since it's a bit of a slippery slope (but I'm not ruling this out).
Out of curiosity, is it not possible to do something like this to filter out the empty ones:

dfs = (df for df in foo.get({"layer": 10}) if not df.empty)

HDictus (Contributor, Author) commented Dec 5, 2023

dfs = (df for df in foo.get({"layer": 10}) if not df.empty)

The current behavior does not yield anything if the query comes up empty, so foo.get({"layer": 10}) would give an empty iterator. If .get instead yielded empty dataframes, I would not need the keyword argument. However, that would be a potentially breaking change, which is why I've added a kwarg with a default of False. Having the iterator include empty dataframes without a kwarg would be preferable for me as well; if you think it would be OK to change the behavior that way, I would be happy to change the PR to that.

joni-herttuainen (Contributor) commented Dec 5, 2023

If I understood correctly, the use case is to pd.concat the dataframes. If so, and you have a wrapper that does the concatenation, you can do something like:

def _concat(iterator):
    dfs = [df for _, df in iterator]
    return pd.concat(dfs) if dfs else pd.DataFrame([])  # or if you just want to check the length, you might just have "else []"

df = _concat(c.nodes.get(query))

or, you can use dict(...).values():

def _concat(iterator):
    dfs = dict(iterator).values() 
    return pd.concat(dfs) if dfs else pd.DataFrame([])  # or "else []" if just for length

You can even one-line it, albeit readability takes a hit:

def _concat(iterator):
    return pd.concat(d.values()) if (d := dict(iterator)) else pd.DataFrame([])
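
A quick check of the empty case with any of the variants above (iter(()) is a stand-in for a get call that matches nothing):

empty = _concat(iter(()))
empty.empty  # True
len(empty)   # 0, no ValueError raised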

HDictus (Contributor, Author) commented Dec 5, 2023

The issue with this is that downstream code which expects the dataframe to have columns will break because it has none, so I would need to add an "if df.empty, do something else" clause to everything.

joni-herttuainen (Contributor) commented:

No worries, you can set columns on empty dataframes, too:

columns = list(circuit.nodes.property_names)
pd.DataFrame(columns=columns)
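
For instance, downstream code that only touches columns keeps working on such a frame (the column names below are placeholders for circuit.nodes.property_names):

import pandas as pd

df = pd.DataFrame(columns=["mtype", "layer"])  # placeholder property names
df["layer"]                 # empty Series, no KeyError
len(df)                     # 0
df.groupby("mtype").size()  # empty Series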

I'm afraid I don't understand why you need to process empty dataframes. If you can point me to your code, especially the parts where an empty dataframe would cause issues, I can take a look and see what we can do so they needn't be processed.

HDictus (Contributor, Author) commented Dec 6, 2023

I often only retrieve a subset of the columns, so I would need to pass this subset to the function as well. In this case I may be best off wrapping the .get method call as a whole:

def get_dataframe(networkobj, *args, properties=None, attr='get'):
    dfs = [df for _, df in getattr(networkobj, attr)(*args, properties=properties)]
    if len(dfs) == 0:
        return pd.DataFrame(columns=properties or networkobj.property_names)
    return pd.concat(dfs)
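
Hypothetical usage of this wrapper, assuming circuit is a bluepysnap Circuit and the query and property names exist in its node populations:

df = get_dataframe(circuit.nodes, {"layer": 2}, properties=["mtype", "layer"])
len(df)  # 0, with the expected columns, when the query matches nothing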

So strictly speaking it isn't necessary to have the empty dataframes, just simpler and more concise.

joni-herttuainen (Contributor) commented:

If wrapping the .get (or other functionality) solves your issue, nice.

strictly speaking it isn't necessary to have the empty dataframes, just simpler and more concise

Yeah, I get it. But, to be fair, it depends on the point of view: adding, passing, and handling extra keyword arguments to cover one particular use case is not simpler or more concise for us. If it were absolutely necessary, I'd rather always return empty dataframes than add extra behaviour controlled by an extra argument. If it isn't, I wouldn't break the current behavior.

Anyway, I still don't understand why you need to keep processing the empty dataframes and, again, if you'd like me to take a look at the code to see whether that can be avoided, let me know; I'm happy to help.
