Environment #3
This is the result from my environment, which includes pygeos, with no changes to the code (some code changes would also make a significant difference). |
Thanks for the comment and your results! Generally, I don't expect top performance from Python and R, as that is the domain of low-level languages. My idea was a simple comparison of packages for vector data processing without code optimization, i.e. I used the simple functions available in the packages. I used Pop!_OS 20.04 LTS (based on Ubuntu 20.04 Focal Fossa) and the software available in its repository by default. I downloaded the Python packages from PIP and the R packages from CRAN. I didn't use I'm surprised how much the distance calculation performance in particular has improved, nice. |
Here more information about the environment used. Let me know if anything more is needed.
terra::gdal(lib = "all")
#> gdal proj geos
#> "3.0.4" "6.3.1" "3.8.0"
sf::sf_extSoftVersion()
#> GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ
#> "3.8.0" "3.0.4" "6.3.1" "true" "true" "6.3.1"
geos::geos_version()
#> [1] ‘3.10.0’
Python packages
```
> pip list
Package                 Version
----------------------- ----------------------------------
affine                  2.3.0
appdirs                 1.4.4
attrs                   19.3.0
beautifulsoup4          4.8.2
blinker                 1.4
Brlapi                  0.7.0
cachetools              4.2.2
certifi                 2019.11.28
cftime                  1.4.1
chardet                 3.0.4
chrome-gnome-shell      0.0.0
click                   8.0.3
click-plugins           1.1.1
cligj                   0.7.2
cloudpickle             1.6.0
colorama                0.4.3
command-not-found       0.3
cryptography            2.8
cupshelpers             1.0
cycler                  0.10.0
dask                    2021.4.1
datacube                1.8.3
dbus-python             1.2.16
decorator               4.4.2
defer                   1.0.6
distributed             2021.4.1
distro                  1.4.0
entrypoints             0.3
Fiona                   1.8.21
fsspec                  2021.4.0
future                  0.18.2
GDAL                    3.0.4
geocube                 0.0.16
geopandas               0.9.0
gpg                     1.13.1-unknown
greenlet                1.0.0
HeapDict                1.0.1
hidpidaemon             18.4.6
html5lib                1.0.1
httplib2                0.14.0
idna                    2.8
importlib-metadata      1.5.0
ipython-genutils        0.2.0
Jinja2                  2.10.1
jsonschema              3.2.0
jupyter-core            4.6.3
keyring                 18.0.1
kiwisolver              1.0.1
language-selector       0.1
lark-parser             0.11.2
launchpadlib            1.10.13
lazr.restfulclient      0.14.2
lazr.uri                1.0.3
locket                  0.2.1
louis                   3.12.0
lxml                    4.5.0
macaroonbakery          1.3.1
MarkupSafe              1.1.0
matplotlib              3.1.2
more-itertools          4.2.0
msgpack                 1.0.2
munch                   2.5.0
nbformat                5.0.4
netCDF4                 1.5.6
netifaces               0.10.4
numpy                   1.17.4
oauthlib                3.1.0
olefile                 0.46
OWSLib                  0.19.1
packaging               21.2
pandas                  1.2.2
partd                   1.2.0
Pillow                  7.0.0
pip                     20.0.2
plotly                  4.4.1
pop-transition          1.1.2
protobuf                3.6.1
psutil                  5.8.0
psycopg2                2.8.4
pycairo                 1.16.2
pycups                  1.9.73
pydbus                  0.6.0
Pygments                2.3.1
PyGObject               3.36.0
PyJWT                   1.7.1
pymacaroons             0.13.0
PyNaCl                  1.3.0
PyOpenGL                3.1.0
pyparsing               2.4.6
pyproj                  2.5.0
PyQt5                   5.14.1
pyRFC3339               1.1
pyrsistent              0.15.5
python-apt              2.1.2pop0-1587756471-20.04-cd2988e
python-dateutil         2.7.3
python-debian           0.1.36ubuntu1
python-xlib             0.23
pytz                    2019.3
pyxdg                   0.26
PyYAML                  5.3.1
rasterio                1.2.10
rasterstats             0.16.0
repoman                 1.2.2
requests                2.22.0
requests-unixsocket     0.2.0
retrying                1.3.3
rioxarray               0.10.0
scipy                   1.6.3
screen-resolution-extra 0.0.0
SecretStorage           2.3.1
sessioninstaller        0.0.0
setuptools              45.2.0
Shapely                 1.7.1
simplejson              3.16.0
sip                     4.19.21
six                     1.14.0
snuggs                  1.4.7
sortedcontainers        2.3.0
soupsieve               1.9.5
SQLAlchemy              1.4.12
ssh-import-id           5.10
systemd-python          234
tblib                   1.7.0
toolz                   0.11.1
tornado                 6.1
traitlets               4.3.3
ubuntu-advantage-tools  27.6
ubuntu-drivers-common   0.0.0
ufw                     0.36
urllib3                 1.25.8
wadllib                 1.3.3
webencodings            0.5.1
wheel                   0.34.2
wxPython                4.0.7
xarray                  0.17.0
xkit                    0.0.0
zict                    2.0.0
zipp                    1.0.0
```
|
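For anyone who wants to report only the packages relevant to this benchmark rather than the full `pip list`, a minimal sketch using the standard library follows. The helper name `report_versions` and the choice of packages are my own assumptions for illustration; they are not part of the benchmark code.

```python
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper for this thread, not part of the benchmark repo.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names picked from the pip listing above, plus pygeos.
    relevant = ["geopandas", "shapely", "pygeos", "pyproj", "Fiona", "rasterio"]
    for name, version in report_versions(relevant).items():
        print(f"{name:<10} {version or 'not installed'}")
```

This only covers Python distributions. The versions of the underlying C libraries (GEOS, GDAL, PROJ) are not visible this way; as far as I know, `geopandas.show_versions()` prints those as well.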
No, it doesn't. Dask-geopandas would, but pygeos is single-threaded, though vectorized. It is going to become shapely 2.0 and, once released as such, the default geometry engine in geopandas. At the moment it is treated as experimental (though stable). |
Thanks for the clarification! Honestly, I've never used |
Yes and no :D. GeoPandas' default geometry engine is shapely, and pygeos has been integrated into shapely. So while we will never require pygeos to be installed explicitly, it will effectively be installed when you install shapely 2.0 (to be released soon-ish, 95% of the work is done). It is a long process aimed at consolidating the ecosystem. Users of geopandas will essentially get the speedup you see in my results for free, without needing to change anything in their code, just as you get it now if pygeos is installed. |
Great, so the best solution is if I install |
Ideally with the changes proposed in #5, as some of the code does not follow the ideal pattern right now. |
@martinfleis, could you check if the results are reproducible for The only problem I haven't noticed before is:
when I want to plot the points from sample.py. I see this is related to GEOS 3.3 (shapely/shapely#1001) but I have GEOS 3.8.
|
Yes, if you make sure you have shapely >= 2.0, then it's best to remove pygeos (otherwise geopandas will still use pygeos for now, which adds some overhead from converting between pygeos and shapely). |
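The rule of thumb above can be sketched as a small check. This is only an illustration of the advice in this comment, under the assumption that comparing the major version of the installed shapely is enough; it does not reproduce geopandas' own engine selection logic, and the function names are hypothetical.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version string of a package, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def pygeos_advice(shapely_version, pygeos_version):
    """Encode the advice from this thread (hypothetical helper).

    Not geopandas' actual logic: only the major version of shapely
    is inspected, which is an assumption made for this sketch.
    """
    if pygeos_version is None:
        return "no-pygeos"      # nothing to remove
    if shapely_version is None:
        return "no-shapely"     # cannot compare
    major = int(shapely_version.split(".")[0])
    if major >= 2:
        return "remove-pygeos"  # avoids the pygeos <-> shapely conversion overhead
    return "keep-pygeos"        # shapely < 2.0: pygeos provides the speedup


if __name__ == "__main__":
    print(pygeos_advice(installed_version("shapely"), installed_version("pygeos")))
```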
What code are you using to plot? |
|
But that result is supposed to only contain points, right? Not sure how that can trigger that warning .. |
Yes, points only. Anyway, the figure looks correct. |
Hi, this is a great initiative.
As geopandas is currently in something of a performance migration, the out-of-the-box performance is not necessarily the best (I'll open another issue on that). I wanted to check the environment to see whether you have the pygeos engine installed and which versions of GEOS and the other libraries you use, but it doesn't seem to be listed. How do you create an environment for these tests?