Do something more sensible with data from pandas #90

Open
ngoldbaum opened this issue Jun 20, 2019 · 2 comments

@ngoldbaum
Member

  • unyt version: v2.2.0+7.g5d3ace5
  • Python version: 3.6.8
  • Operating System: Ubuntu 18.04

Description

If you apply units to a pandas dataframe you get back something that doesn't actually have any units:

In [1]: import unyt as u

In [2]: import pandas as pd

In [3]: data = pd.read_csv('/home/goldbaum/Documents/rc-co2monitor/co2data.csv')

In [4]: t = data['Temperature']*u.degC

In [5]: t.units
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-7e2982815421> in <module>
----> 1 t.units

~/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'units'

In [6]: type(t)
Out[6]: pandas.core.series.Series

Adding full support for pandas data types may be a lot to ask for. In that case, we should detect when we're handed a pandas Series or DataFrame (preferably without actually importing pandas) and raise an error telling the user to convert the data to NumPy arrays first.

@l-johnston
Contributor

Another option, and a very light touch to unyt, is to register an accessor with pandas. I have prototyped this and usage looks like:

>>> import pandas as pd
>>> import unyt
>>> data = pd.DataFrame({"Temperature":[0.0, 23.0, 55.0]})
>>> data.Temperature.unyt.set_units("degC")
unyt_array([ 0., 23., 55.], 'degC')

Is this approach of interest?

@ngoldbaum
Member Author

I’d probably need to see more details on how this would work inside a pandas workflow. Feel free to open a PR but please do include some usage examples that demonstrate how this would be useful.

I’d also like it if we could avoid importing pandas (or at least delay importing pandas until it’s needed) as that would increase the import time cost for the whole library.
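
As a rough illustration of avoiding the import cost: if pandas has never been imported in the running process, nothing can be a pandas object, so a lookup in `sys.modules` is enough to decide whether an `isinstance` check is needed at all. The function name `is_pandas_container` is hypothetical and this is a sketch, not unyt code.

```python
import sys


def is_pandas_container(obj):
    """Check for a pandas Series/DataFrame without ever triggering the import.

    Hypothetical helper: pandas is only consulted if user code has already
    imported it, so importing the library that defines this function stays cheap.
    """
    pd = sys.modules.get("pandas")
    if pd is None:
        # pandas was never imported, so obj cannot be a pandas object.
        return False
    return isinstance(obj, (pd.Series, pd.DataFrame))
```

The trade-off compared with a function-level `import pandas` is that this never pays the import cost, even on the first call.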
