I am trying to use the read_parquet function in my code, and I've encountered some trouble where the documentation says one thing and the code seems to be doing something different.
These are the two discrepancies I have found, but there could be others:
The engine parameter cannot really be set. dask-geopandas uses a custom GeoArrowEngine engine, and trying to specify anything else raises an exception.
The documentation says that for split_row_groups, "Default is True if a _metadata file is available or if the dataset is composed of a single file (otherwise defult is False)." However, the code does not appear to set a default value at all: it passes the argument straight through to dask.dataframe.read_parquet, which defaults to False in all cases.
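To make the two discrepancies concrete, here is a minimal sketch of the pass-through pattern as I understand it. It does not use the real libraries: upstream_read_parquet is a hypothetical stand-in for dask.dataframe.read_parquet (with its split_row_groups default set to False only to mirror the behaviour described above), and the GeoArrowEngine class here is an empty placeholder, not the actual dask-geopandas engine.

```python
import inspect


def upstream_read_parquet(path, engine="auto", split_row_groups=False, **kwargs):
    """Hypothetical stand-in for dask.dataframe.read_parquet."""
    # Return the resolved arguments so the effective defaults are visible.
    return {"path": path, "engine": engine, "split_row_groups": split_row_groups}


class GeoArrowEngine:
    """Empty placeholder for the custom engine (assumption, not the real class)."""


def read_parquet(*args, **kwargs):
    """Wrapper that forwards everything upstream but only accepts one engine."""
    engine = kwargs.pop("engine", GeoArrowEngine)
    if engine is not GeoArrowEngine:
        raise ValueError("only GeoArrowEngine is supported")
    # No default is supplied for split_row_groups here, so whatever the
    # upstream function defines is what the caller actually gets.
    return upstream_read_parquet(*args, engine=engine, **kwargs)


result = read_parquet("data.parquet")
print(result["split_row_groups"])       # the stand-in's upstream default
print(inspect.signature(read_parquet))  # the wrapper hides the real parameters
```

With a wrapper shaped like this, any default described in the wrapper's own docstring is only accurate as long as it happens to match the upstream signature.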
Your first point is correct, only the custom engine is allowed in dask-geopandas I think.
For your second point, maybe raise an issue in dask/dask if you think that documentation isn't accurate there too. I know there's been some churn around that parameter recently and it may be out of date.
As for what to do here, I'm not sure. dask-geopandas mostly aligns with dask.dataframe.read_parquet, so it's nice to pick up the changes from there automatically. Dask does include a derived_from decorator that can be used to copy over docstrings, with some control over things like "unused arguments": https://github.com/dask/dask/blob/34a1e88bb3f6196361f398ddab55e59d315d8d40/dask/utils.py#L818. Perhaps that could be used to add a caveat about the engine.
I see, thank you. What confused me is the fact that the dask documentation is different than the dask-geopandas one for split_row_groups. But maybe as you said it could just be out of date.
I've noticed the effects of the derived_from decorator, where some pages (example) have a disclaimer saying that the docstring was copied from somewhere else. Using the decorator for read_parquet and to_parquet should make the same disclaimer appear there, but I believe that would involve rewriting the function definitions so that the arguments used are listed explicitly and not just packed into *args and **kwargs.
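Roughly what I have in mind, as a sketch only. This is not dask's derived_from: copy_doc_with_caveat is a hypothetical helper written with the standard library, upstream_read_parquet is a stand-in for the real function, and the parameter list is invented for illustration.

```python
import inspect


def upstream_read_parquet(path, columns=None, split_row_groups=False, engine="auto"):
    """Read a Parquet dataset (hypothetical stand-in docstring)."""
    return path


def copy_doc_with_caveat(original, caveat):
    """Hypothetical helper: reuse a docstring and prepend a disclaimer."""
    def decorator(func):
        doc = inspect.getdoc(original) or ""
        func.__doc__ = f"{caveat}\n\n{doc}"
        return func
    return decorator


@copy_doc_with_caveat(
    upstream_read_parquet,
    "This docstring was copied from upstream_read_parquet. "
    "The engine argument is not supported here.",
)
def read_parquet(path, columns=None, split_row_groups=False):
    # Arguments are listed explicitly, so the signature documents itself
    # and the unsupported engine parameter is simply not exposed.
    return upstream_read_parquet(
        path, columns=columns, split_row_groups=split_row_groups
    )


print(inspect.signature(read_parquet))
print(read_parquet.__doc__)
```

The trade-off is that an explicit signature no longer picks up upstream changes automatically, so the parameter list would have to be kept in sync by hand.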