Datashader and polygons: Cannot interpret MultiPolygonDtype(float64) as data type #1134

Closed
johannesnauta opened this issue Oct 15, 2022 · 3 comments

johannesnauta commented Oct 15, 2022

Description of expected behavior and the observed behavior

I get errors when following the exact code outlined in the tutorial on working with polygons in Datashader.
More specifically, the error boils down to a specific polygon data type:

TypeError: Cannot interpret 'MultiPolygonDtype(float64)' as a data type

The only related issue I found was a spatialpandas GitHub issue, which suggested downgrading to older versions of geopandas. However, this did nothing on my systems, and since that issue was raised more than two years ago, I do not think downgrading by multiple versions would be useful.

I have also tried converting the column to other data types that Datashader might understand, but have not yet managed to produce any output.

Why does the example given by the Datashader developers fail on my end with a type error? Is this caused by a mismatch between the versions of the libraries involved?

Complete, minimal, self-contained example code that reproduces the issue

import pandas as pd
import numpy as np
import dask.dataframe as dd
import colorcet as cc
import datashader as ds
import datashader.transfer_functions as tf
import spatialpandas as spd
import spatialpandas.geometry
import geopandas as gpd

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world.to_crs(epsg=4087) # simple cylindrical projection
world['boundary'] = world.geometry.boundary
world['centroid'] = world.geometry.centroid

# Convert the geopandas GeoDataFrame to spatialpandas GeoDataFrame for Datashader to use
df_world = spd.GeoDataFrame(world)

cvs = ds.Canvas(plot_width=650, plot_height=400)
agg = cvs.polygons(df_world, geometry='geometry', agg=ds.mean('pop_est'))
tf.shade(agg)
Output with error message

Output

TypeError                                 Traceback (most recent call last)
Cell In [60], line 19
   16 # Convert the geopandas GeoDataFrame to spatialpandas GeoDataFrame for Datashader to use
   17 df_world = spd.GeoDataFrame(world)
---> 19 tf.shade(cvs.polygons(df_world, geometry='geometry', agg=ds.mean('pop_est')))
   20 cvs = ds.Canvas(plot_width=650, plot_height=400)
   21 agg = cvs.polygons(df_world, geometry='geometry', agg=ds.mean('pop_est'))

File ~/.local/lib/python3.10/site-packages/datashader/core.py:752, in Canvas.polygons(self, source, geometry, agg)
  750     agg = any_rdn()
  751 glyph = PolygonGeom(geometry)
--> 752 return bypixel(source, self, glyph, agg)

File ~/.local/lib/python3.10/site-packages/datashader/core.py:1265, in bypixel(source, canvas, glyph, agg)
 1263     if len(cols_to_keep) < len(source.columns):
 1264         source = source[cols_to_keep]
-> 1265     dshape = dshape_from_pandas(source)
 1266 elif isinstance(source, dd.DataFrame):
 1267     dshape = dshape_from_dask(source)

File ~/.local/lib/python3.10/site-packages/datashader/utils.py:442, in dshape_from_pandas(df)
  440 def dshape_from_pandas(df):
  441     """Return a datashape.DataShape object given a pandas dataframe."""
--> 442     return len(df) * datashape.Record([(k, dshape_from_pandas_helper(df[k]))
  443                                        for k in df.columns])

File ~/.local/lib/python3.10/site-packages/datashader/utils.py:442, in <listcomp>(.0)
  440 def dshape_from_pandas(df):
  441     """Return a datashape.DataShape object given a pandas dataframe."""
--> 442     return len(df) * datashape.Record([(k, dshape_from_pandas_helper(df[k]))
  443                                        for k in df.columns])

File ~/.local/lib/python3.10/site-packages/datashader/utils.py:433, in dshape_from_pandas_helper(col)
  431 elif isinstance(col.dtype, (RaggedDtype, GeometryDtype)):
  432     return col.dtype
--> 433 dshape = datashape.CType.from_numpy_dtype(col.dtype)
  434 dshape = datashape.string if dshape == datashape.object_ else dshape
  435 if dshape in (datashape.string, datashape.datetime_):

File ~/.local/lib/python3.10/site-packages/datashape/coretypes.py:779, in CType.from_numpy_dtype(self, dt)
  777 except KeyError:
  778     pass
--> 779 if np.issubdtype(dt, np.datetime64):
  780     unit, _ = np.datetime_data(dt)
  781     defaults = {'D': date_, 'Y': date_, 'M': date_, 'W': date_}

File /usr/lib/python3/dist-packages/numpy/core/numerictypes.py:418, in issubdtype(arg1, arg2)
  360 r"""
  361 Returns True if first argument is a typecode lower/equal in type hierarchy.
  362 
 (...)
  415 
  416 """
  417 if not issubclass_(arg1, generic):
--> 418     arg1 = dtype(arg1).type
  419 if not issubclass_(arg2, generic):
  420     arg2 = dtype(arg2).type

TypeError: Cannot interpret 'MultiPolygonDtype(float64)' as a data type

ALL software version info

pandas=1.4.4
numpy=1.21.5
colorcet=3.0.1
datashader=0.14.2
spatialpandas=0.4.4
geopandas=0.11.1
@ianthomas23
Member

I cannot reproduce this in a new conda environment using the same versions of the libraries that you have (excluding colorcet, which isn't used in the reproducer):

$ conda create -n temp 
$ conda activate temp
$ conda install -c pyviz -c conda-forge pandas==1.4.4 numpy==1.21.5 datashader==0.14.2 spatialpandas==0.4.4 geopandas==0.11.1
$ conda list | grep "pandas\|datashader\|numpy\|datashape"
datashader                0.14.2                     py_0    pyviz
datashape                 0.5.4                      py_1    conda-forge
geopandas                 0.11.1             pyhd8ed1ab_0    conda-forge
geopandas-base            0.11.1             pyha770c72_0    conda-forge
numpy                     1.21.5           py39h42add53_3  
numpy-base                1.21.5           py39hadd41eb_3  
pandas                    1.4.4            py39he7125aa_0    conda-forge
spatialpandas             0.4.4                      py_0    pyviz

Can you check which version of datashape you have installed? It hasn't changed for many years and should be 0.5.4.

I see that you are using your system's python and numpy, and are using pip install --user to install other packages into ~/.local. I would be happier if you were using an isolated environment, either with conda or with pip in a virtual environment, as then we could be more sure that the packages are consistent.
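
One way to gather the information requested above is a short script that reports the running interpreter together with the installed version and on-disk location of each package. This is only a sketch using the standard library; the helper name describe_packages is hypothetical and not part of Datashader or any of the libraries discussed here. It assumes each distribution name matches its import name, which holds for the packages listed in this issue.

```python
import importlib.metadata
import importlib.util
import sys


def describe_packages(names):
    """Map each package name to (version, location), or None if it is not installed.

    Hypothetical helper for illustration; assumes the distribution name
    and the import name are the same.
    """
    report = {}
    for name in names:
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None
            continue
        # find_spec locates a top-level module without importing it.
        spec = importlib.util.find_spec(name)
        location = spec.origin if spec is not None else None
        report[name] = (version, location)
    return report


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    packages = ["pandas", "numpy", "datashader", "datashape",
                "spatialpandas", "geopandas"]
    for name, info in describe_packages(packages).items():
        print(name, info)
```

If the reported locations are spread across ~/.local, /usr/lib and a virtual environment, as the file paths in the traceback suggest, the packages are not coming from a single consistent environment.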

@johannesnauta
Author

johannesnauta commented Oct 17, 2022

I reinstalled everything in a new virtual environment and it now works. As you suspected, my Jupyter notebook was probably not using all the libraries from the virtual environment I had created earlier (with python -m venv). I always aim to run my code in virtual environments, but the notebook did not default to the correct kernel when I restarted it at some point, so the environment got mixed up. I apologize for this.
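
A quick check that can be run in a notebook cell to see which interpreter the kernel is actually using is sketched below. It relies only on the standard library; the function name environment_summary is hypothetical. The prefix comparison applies to environments created with python -m venv, as used here; a conda environment would report False for in_virtualenv even though it is isolated.

```python
import site
import sys


def environment_summary():
    """Summarise which interpreter is running and whether it is a venv.

    Hypothetical helper for illustration only.
    """
    return {
        "executable": sys.executable,
        # In a venv, sys.prefix points at the environment and differs
        # from sys.base_prefix, which points at the base installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # True means packages installed with `pip install --user`
        # (for example under ~/.local) are importable as well.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

If executable does not point inside the intended virtual environment, the notebook is running on a different kernel than expected.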

Interestingly, when I reverted to the versions mentioned in the spatialpandas GitHub issue, the example did work, even while my virtual environment was all jumbled up.

Perhaps still relevant: the version of datashape installed in my virtual environment is 0.5.2. Is it worth upgrading?

@ianthomas23
Member

Thanks for trying out a new virtual environment and reporting back.

If you have datashape 0.5.2 then another dependency must have requested that particular version. If everything looks like it is working OK then I would be inclined to leave it as it is.
