WIP: Add osmium support for handling different kind of files #15

Open
wants to merge 8 commits into
base: master

@vesavlad vesavlad commented Aug 19, 2020

This is still a work in progress, and I would appreciate some review and feedback, since some of the naming is not yet clear.
Closes: #10

@patrickbr
Member

patrickbr commented Aug 19, 2020

Thank you very much for your work! I will look over the code in the next few days. How did you change the general workflow in OsmBuilder? Have you run any tests regarding memory consumption and parsing times? Is XML parsing now faster or slower than before?

In general, I am still a bit hesitant to use libosmium here. It's a huge additional dependency. In particular, it introduces Boost as a dependency, which I would like to avoid. If the main goal is to support .pbf files, I still think it would be a better approach to just parse the .pbf files directly. But maybe I am wrong :)
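
For context on what parsing .pbf directly involves, here is a minimal sketch of the outermost layer of the format. According to the OSM PBF format description, a file is a sequence of blocks, each introduced by a 4-byte length in network byte order, followed by a serialized BlobHeader and then a Blob. The helper names below (readBlobHeaderLength, blobHeaderLengthFromBytes, exampleUsage) are hypothetical and are not part of pfaedle or libosmium. Decoding the BlobHeader and the Blob themselves needs a protobuf decoder and usually zlib, which this sketch does not cover.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Reads the 4-byte, big-endian (network byte order) length prefix that
// precedes every BlobHeader in an OSM PBF file.
// Returns std::nullopt at end of file or if the prefix is truncated.
// NOTE: hypothetical helper for illustration, not pfaedle code.
std::optional<uint32_t> readBlobHeaderLength(std::istream& in) {
  unsigned char b[4];
  in.read(reinterpret_cast<char*>(b), 4);
  if (in.gcount() != 4) return std::nullopt;
  return (static_cast<uint32_t>(b[0]) << 24) |
         (static_cast<uint32_t>(b[1]) << 16) |
         (static_cast<uint32_t>(b[2]) << 8) |
         static_cast<uint32_t>(b[3]);
}

// Convenience wrapper for in-memory data (also hypothetical).
std::optional<uint32_t> blobHeaderLengthFromBytes(const std::string& bytes) {
  std::istringstream in(bytes);
  return readBlobHeaderLength(in);
}

// Usage: a prefix of 00 00 00 0d announces a 13-byte BlobHeader.
void exampleUsage() {
  auto len = blobHeaderLengthFromBytes(std::string("\x00\x00\x00\x0d", 4));
  assert(len.has_value() && *len == 13);
}
```

A real reader would loop: read the length, read that many bytes as the BlobHeader, take the blob size from it, read the Blob, and repeat until the length prefix can no longer be read.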

@vesavlad
Author

vesavlad commented Aug 19, 2020

This is still a WIP, so at this point the only change is that the data is read through libosmium.

How did you change the general workflow in OsmBuilder?

  • I didn't change anything regarding the processing flow.

Have you run any tests regarding memory consumption and parsing times?

  • not yet

Is XML parsing now faster or slower than before?

  • don't know yet

The idea was to keep the application "logic" as you have written it, since I still need time to understand in detail what is done there.

Also please don't hesitate to:

  • ask questions regarding why something is done in one way or another
  • suggest improvements to the code structure (I will try to move some things around and make some parts more generic, but first I wanted to make things work)

Honestly, it might be good to:

  • better document the classes, so that anyone who wants to contribute can find their way around quickly. For this I would suggest using Doxygen or something similar; maybe this can be turned into an issue
  • better document the workflow used

This is a very practical application that adds a huge benefit for processing GTFS data for agencies that do not generate shapes for their GTFS feeds.
Thanks for developing it; I will try to contribute as much as I can.

@vesavlad
Author

One more note: the pull request also contains some clang code improvement suggestions.

@derhuerst
Contributor

@patrickbr I'm curious what's holding this PR back? Is it that you didn't have time/energy/motivation to review this yet, or is it the general direction (e.g. the Boost dependency) that you're unhappy with?

I'm currently map-matching many GTFS feeds using pfaedle (thanks for this tool btw!), and it has to re-read a 12 GB OSM XML file for every GTFS feed. I hope that reading ~700 MB of .pbf would be faster.

@derhuerst
Contributor

I also noticed that pfaedle seems to read this file multiple times, once per matching iteration. In my case, it reads & parses the 12 GB de-bw-buffered.osm file three times. Within Docker for macOS on my old laptop, each read takes ~15 min.

@laem

laem commented Feb 29, 2024

I must admit that having to handle a > 15 GB bz2 file instead of a 4.5 GB pbf file makes this lib harder to try. Thanks for the work on this PR!

@patrickbr
Member

patrickbr commented Mar 1, 2024

Thank you again for all your efforts here. I have been hesitant to merge this PR because it would add major dependencies (libosmium and Boost). I am not happy with that. Also, it was opened before a major refactoring and rewrite of large parts of the tool in 2021. The more sophisticated OSM formats (o5m, protobuf) are not that hard to parse, and I would still prefer a simple solution which just reads these formats directly, without going through libosmium. The main benefit that libosmium adds besides format parsing is reference resolution and the construction of ready-to-use geometrical objects. The techniques to do that are already there in the pfaedle code; all that is missing is a drop-in replacement of the XML parser with an o5m or protobuf parser.

I have been working on that for a few months now.
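
To give a rough idea of the low-level decoding a direct protobuf reader needs, here is a minimal sketch of the two number encodings from the Protocol Buffers wire format: base-128 varints and ZigZag-encoded signed integers (the PBF schema uses the latter, for example, for the delta-coded IDs in DenseNodes). The function names decodeVarint, zigzagDecode and exampleUsage are hypothetical; this is not the parser being worked on here, and o5m has its own variable-length number encoding that is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Decodes one base-128 varint starting at buf[pos] and advances pos past
// the bytes it consumed. Each byte carries 7 payload bits, least
// significant group first; a set high bit means another byte follows.
// Returns std::nullopt if the buffer ends mid-number or the number is
// longer than 10 bytes (the maximum for a 64-bit value).
std::optional<uint64_t> decodeVarint(const std::vector<uint8_t>& buf,
                                     size_t& pos) {
  uint64_t result = 0;
  for (int shift = 0; shift < 70; shift += 7) {
    if (pos >= buf.size()) return std::nullopt;
    uint8_t byte = buf[pos++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;
}

// Maps a ZigZag-encoded value back to a signed integer:
// 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
int64_t zigzagDecode(uint64_t n) {
  return static_cast<int64_t>(n >> 1) ^ -static_cast<int64_t>(n & 1);
}

// Usage: the two bytes AC 02 encode the unsigned value 300.
void exampleUsage() {
  std::vector<uint8_t> buf = {0xAC, 0x02};
  size_t pos = 0;
  auto v = decodeVarint(buf, pos);
  assert(v.has_value() && *v == 300 && pos == 2);
}
```

Delta-coded fields are then recovered by keeping a running sum of the decoded signed values.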

Development

Successfully merging this pull request may close these issues.

Use .pbf files instead of .osm files
4 participants