WIP: Add return_table helper function #1336

Closed
wants to merge 15 commits into from

Conversation

willschlitzer
Contributor

This is a proof-of-concept pull request for the return_table helper function in utils.py. It accepts the default string output of a table from the GMT C API and can return a string, a numpy array, or a pandas DataFrame. This is a possible answer for #1318.
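
As a rough illustration of the idea described above (not the code in this PR), a helper of this kind might look like the sketch below. The names `table_from_string`, `data_format`, and `columns` are hypothetical, and the sketch assumes the GMT output is whitespace-delimited numeric text with one record per line.

```python
import numpy as np
import pandas as pd


def table_from_string(result, data_format="numpy", columns=None):
    """Convert whitespace-delimited numeric text to the requested type.

    Hypothetical sketch; not the return_table implementation in this PR.
    """
    if data_format == "str":
        return result
    # Split each non-empty line into a row of float fields.
    rows = [
        [float(field) for field in line.split()]
        for line in result.splitlines()
        if line.strip()
    ]
    array = np.array(rows)
    if data_format == "numpy":
        return array
    if data_format == "pandas":
        return pd.DataFrame(array, columns=columns)
    raise ValueError(f"Unrecognized data_format: {data_format!r}")


text = "1.0 2.0 3.0\n4.0 5.0 6.0\n"
print(table_from_string(text, "numpy").shape)
print(table_from_string(text, "pandas", columns=["x", "y", "z"]))
```

A header or title line in the text would raise a ValueError here; how to handle that case is discussed further down in the conversation.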

Fixes #

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

@willschlitzer willschlitzer added the feature Brand new feature label Jun 16, 2021
@willschlitzer willschlitzer added this to the 0.5.0 milestone Jun 16, 2021
@willschlitzer willschlitzer self-assigned this Jun 16, 2021
@@ -267,3 +269,48 @@ def args_in_kwargs(args, kwargs):
        If one of the required arguments is in ``kwargs``.
    """
    return any(arg in kwargs for arg in args)


def return_table(result, data_format, format_parameter, df_columns):
Member

Looks promising! A few comments:

  1. I think you will need to use this with one of the existing functions so that we can test it out in this PR. In my opinion, grdtrack would be a good option.
  2. My preference would be to add a format_options parameter for return_table. This could take a list, with default values including numpy, pandas, and str. That way, if it doesn't make sense for a particular function to return string values, for example, that option could be left out of the function's documentation/implementation.
  3. table-like options for input are now str or numpy.ndarray or pandas.DataFrame or xarray.Dataset or geopandas.GeoDataFrame. It would be nice if all of these were output options too.
  4. I would prefer a more descriptive argument for the requested data format than a|d|s. Something like numpy|pandas|str seems more readable.
  5. I think df_columns needs to be optional.
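
One way the `format_options` suggestion could be sketched is shown below. `check_data_format` is a hypothetical helper, not something in this PR or in PyGMT; it only illustrates validating a descriptive `data_format` string against a per-function list of allowed options.

```python
def check_data_format(data_format, format_options=("numpy", "pandas", "str")):
    """Return data_format if it is one of format_options, else raise.

    Hypothetical helper: each wrapper passes the formats it supports.
    """
    if data_format not in format_options:
        raise ValueError(
            f"data_format must be one of {list(format_options)}, "
            f"got {data_format!r}."
        )
    return data_format


# A wrapper for which string output makes no sense could narrow the options.
check_data_format("pandas", format_options=("numpy", "pandas"))
```

With this shape, a function that should not return strings simply omits "str" from its options, and a request for it fails with a readable error.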

Contributor Author

3. table-like options for input are now `str or numpy.ndarray or 
pandas.DataFrame or xarray.Dataset or geopandas.GeoDataFrame`. 
It would be nice if all of these were output options too.

Sounds good; I'll have to get smarter on the last two but I don't see why it should be a problem.

4. I would prefer a more descriptive argument for the requested data format than **a**|**d**|**s**. 
Something like `numpy`|`pandas`|`str` seems more readable.

I like the idea of keeping it short, especially when there is a default option (I anticipate it being a numpy array) and the strings are not also the same word as Python modules or variable types. But I understand how the single letters could be confusing.

5. I think `df_columns` needs to be optional.

Since this is a helper function, I envisioned that the argument for df_columns would be set up in the GMT function that is using it, such as using ["x", "y", "z"] when calling this function inside of grd2xyz.

Contributor Author

@willschlitzer willschlitzer Jun 25, 2021

3. table-like options for input are now `str or numpy.ndarray or pandas.DataFrame 
or xarray.Dataset or geopandas.GeoDataFrame`. 
It would be nice if all of these were output options too.

Added in c8cef5e

Member

  4. I would prefer a more descriptive argument for the requested data format than a|d|s.
    Something like numpy|pandas|str seems more readable.

I like the idea of keeping it short, especially when there is a default option (I anticipate it being a numpy array) and the strings are not also the same word as Python modules or variable types. But I understand how the single letters could be confusing.

I'll have to agree with Meghan that long descriptive names like numpy|pandas|str are preferable 🙂

@willschlitzer
Contributor Author

Added return_table() to grdtrack. I have a few issues:

  1. I'm not sure how to inform the user of the ValueError (lines 307-308) if it can't convert a value in the table to a float (such as a title or section header). My initial reaction was to print that it cannot convert the value to float, but that is a TON of prints, as every line of text is split up by word.
  2. Specifically for grdtrack, I removed the use of data_kind since column names don't need to be specified by the user (but can be with the df_columns= parameter), but should there still be a check of the information format of the table?
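
For the first issue, one possible approach (a sketch only, not what this PR does) is to skip any line that cannot be parsed and emit a single summary warning instead of printing for every word. `parse_numeric_rows` is a hypothetical name, and the sketch assumes whitespace-delimited text.

```python
import warnings


def parse_numeric_rows(text):
    """Parse whitespace-delimited text into rows of floats.

    Lines with any field that cannot be converted to float (for example
    a title or section header) are skipped, and one warning reports how
    many lines were dropped.
    """
    rows, skipped = [], 0
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines without counting them
        try:
            rows.append([float(field) for field in line.split()])
        except ValueError:
            skipped += 1
    if skipped:
        warnings.warn(
            f"Skipped {skipped} line(s) that could not be converted to float.",
            RuntimeWarning,
            stacklevel=2,
        )
    return rows
```

Whether silently dropping header lines is acceptable is a design question; raising a single exception that names the first offending line would be a stricter alternative.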

@willschlitzer
Contributor Author

I can't see why this is failing deployment, but it is running into a ModuleNotFoundError: No module named 'geopandas' for the Python 3.7 CI job. I asked the question in #1354 about making geopandas a dependency, and assume the fix for both pull requests will be the same.

@willschlitzer
Contributor Author

@weiji14 I tried adding gpd = pytest.importorskip("geopandas") but am still running into a ModuleNotFoundError for geopandas on Python 3.7/NumPy 1.17. Any idea how to fix this to make the tests pass?

@willschlitzer
Contributor Author

I'm unable to figure out why this is causing the deployments to fail. My guess is it has something to do with trying to incorporate it into grdtrack (as deployment and tests started to fail after that change) but I'm not sure why that is causing a problem.

@willschlitzer willschlitzer mentioned this pull request Jun 29, 2021
5 tasks
@@ -10,6 +10,9 @@
from collections.abc import Iterable
from contextlib import contextmanager

import geopandas as gpd
Member

This import geopandas line shouldn't be here at the top-level. I'd suggest importing geopandas in the return_table function itself if you need it, and only under elif data_format=="geopandas".

Suggested change
import geopandas as gpd
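
A minimal sketch of the deferred import suggested in this comment is given below. `convert_table` is a hypothetical stand-in for return_table, and the assumption that the first two columns are x and y coordinates is made only for illustration. Because the optional packages are imported inside their branches, the module can be imported (and the other formats used) when geopandas is not installed.

```python
def convert_table(rows, data_format="numpy"):
    """Convert a list of numeric rows to the requested format.

    Hypothetical sketch: optional dependencies are imported lazily.
    """
    if data_format == "str":
        return "\n".join(" ".join(str(value) for value in row) for row in rows)
    if data_format == "numpy":
        import numpy as np

        return np.array(rows)
    if data_format == "geopandas":
        # Imported only when requested, so a missing geopandas does not
        # break importing this module or using the other formats.
        import geopandas as gpd

        xs = [row[0] for row in rows]  # assumed x (longitude) column
        ys = [row[1] for row in rows]  # assumed y (latitude) column
        return gpd.GeoDataFrame(
            {"z": [row[2] for row in rows]},
            geometry=gpd.points_from_xy(xs, ys),
        )
    raise ValueError(f"Unrecognized data_format: {data_format!r}")
```

With the import at module top level instead, every CI job that imports the module needs geopandas installed, which would be consistent with the ModuleNotFoundError reported earlier in this thread.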

points,
grid,
data_format="d",
df_columns=["longitude", "latitude", "z-value"],
Member

Not sure if it's a good idea to have default column names, especially for the x (longitude) and y (latitude) columns since someone passing in a pandas.DataFrame table with existing column names would have their column names overridden by this default.
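
To make the concern concrete, here is a small sketch of how column names might be chosen when `df_columns` defaults to `None` rather than to a fixed list. `resolve_columns` and `new_column` are hypothetical names, and the policy shown (keep the input's names and append one for the sampled values) is an assumption, not something decided in this PR.

```python
def resolve_columns(existing_columns, new_column, df_columns=None):
    """Choose output column names without clobbering the user's names.

    Hypothetical policy:
    - explicit df_columns wins;
    - otherwise keep the input table's names and append new_column;
    - otherwise return None and let pandas assign default labels.
    """
    if df_columns is not None:
        return list(df_columns)
    if existing_columns is not None:
        return list(existing_columns) + [new_column]
    return None


# Input DataFrame had columns "lon" and "lat"; the sampled values get "z".
print(resolve_columns(["lon", "lat"], "z"))
```

Using `None` as the default also avoids a mutable list as a default argument value, which is a common Python pitfall.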

@willschlitzer willschlitzer changed the title Add return_table helper function WIP: Add return_table helper function Jul 12, 2021
@willschlitzer
Contributor Author

Closing this PR. I think there may be better ways to return table-like data, and the best move is to wrap more of the functions that return table-like data before working out how to refactor them to use a helper function.

@seisman seisman modified the milestones: 0.5.0, 0.4.1 Jul 25, 2021
@weiji14 weiji14 removed this from the 0.4.1 milestone Aug 10, 2021
@weiji14 weiji14 deleted the table-output-function branch August 10, 2021 21:14
Labels
feature Brand new feature
4 participants