Improve support for reduce returning nested information #132

Open
3 tasks
dougbrn opened this issue Aug 7, 2024 · 1 comment
Labels
enhancement New feature or request


@dougbrn
Collaborator

dougbrn commented Aug 7, 2024

Feature request
We expect users applying their functions via reduce to sometimes want more complex outputs. For example, if I'm applying Lomb-Scargle to all light curves in my nested dataset, I may want a result with a max power column, a frequency-of-max-power column, and a nested column holding the full periodogram. Presently this isn't well supported: we primarily expect the user to return a dictionary-style output, which limits them to producing list columns rather than nested columns. For example:

```python
from nested_pandas.datasets import generate_data
import numpy as np

# Toy NestedFrame: 3 base rows, 20 nested rows each
ndf = generate_data(3, 20)

def complex_output(flux):
    # One scalar per row, plus an array that ends up as a list column
    return {"max_flux": np.max(flux), "flux_quantiles": np.quantile(flux, [0.1, 0.2, 0.3, 0.4, 0.5])}

ndf.reduce(complex_output, "nested.flux")
```

```
	max_flux	flux_quantiles
0	98.744076	[15.293187217097268, 21.834338973710633, 25.02...
1	98.502034	[6.337989346945357, 8.019180689729948, 9.69707...
2	99.269021	[12.42551556001139, 15.901779148332189, 26.199...
```

With #131 it would be easier to convert this to a nested output, but ideally reduce would have a more native ability to produce nested structures for these types of functions. One way to facilitate this would be to read more into the structure of the user-defined dictionary output when building the resulting NestedFrame. For example, a user could instead specify this reduce function:

```python
def complex_output(flux):
    return {"max_flux": np.max(flux), "quantiles": {"flux_quantiles": np.quantile(flux, [0.1, 0.2, 0.3, 0.4, 0.5])}}
```

The JSON-like nesting of dictionaries would signal that the user wants a "quantiles" nested column with a "flux_quantiles" field. I'm not entirely sure of the full implementation plan, but this seems the most intuitive from the user's perspective.
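As a rough illustration of the dispatch rule, here is a minimal sketch in plain Python (no nested-pandas internals) of how reduce might split per-row outputs: non-dict values become base columns, dict values become nested columns whose inner keys are fields. The helper name `split_reduce_output`, the `_base_index` key, and the long-format record layout are all hypothetical choices for this sketch, not existing nested-pandas API. Packing the records into an actual nested column (for instance by building a DataFrame indexed by the base row and using something like `NestedFrame.add_nested`) is left out.

```python
from typing import Any


def split_reduce_output(rows: list[dict[str, Any]]):
    """Hypothetical helper: split per-row reduce outputs into base and nested parts.

    Assumed convention (as proposed in this issue):
      * a top-level key with a non-dict value becomes a base (flat) column;
      * a top-level key with a dict value names a nested column, whose inner
        keys are fields; each field holds a sequence with one entry per nested row.
    Assumes every row returns the same keys. Nested columns come back in long
    format: one record per nested row, tagged with its base row's position.
    """
    base: dict[str, list[Any]] = {}
    nested: dict[str, list[dict[str, Any]]] = {}
    for idx, out in enumerate(rows):
        for key, value in out.items():
            if isinstance(value, dict):
                # Nested column: every field must have the same number of entries.
                fields = {name: list(vals) for name, vals in value.items()}
                lengths = {len(vals) for vals in fields.values()}
                if len(lengths) > 1:
                    raise ValueError(f"fields of {key!r} differ in length at row {idx}")
                n_rows = lengths.pop() if lengths else 0
                records = nested.setdefault(key, [])
                for i in range(n_rows):
                    record = {"_base_index": idx}
                    record.update({name: vals[i] for name, vals in fields.items()})
                    records.append(record)
            else:
                # Scalar value: one entry per base row.
                base.setdefault(key, []).append(value)
    return base, nested


# Toy outputs shaped like the proposed complex_output (values are made up).
outputs = [
    {"max_flux": 10.0, "quantiles": {"flux_quantiles": [1.0, 2.0]}},
    {"max_flux": 20.0, "quantiles": {"flux_quantiles": [3.0, 4.0]}},
]
base, nested = split_reduce_output(outputs)
print(base)                    # {'max_flux': [10.0, 20.0]}
print(nested["quantiles"][0])  # {'_base_index': 0, 'flux_quantiles': 1.0}
```

Keeping the nested part in long format mirrors how a nested column can be viewed flat (one row per nested element, repeating the base index), so the sketch stays agnostic about how the final packing is done.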

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.
@dougbrn dougbrn added the enhancement New feature or request label Aug 7, 2024
@dougbrn
Collaborator Author

dougbrn commented Aug 9, 2024

Noticing this is partly a resubmission of #101, though each ticket proposes a different solution. We should talk about this at an upcoming meeting.
