Java+MapDB implementation may be feasible #8

Open
abyrd opened this issue Feb 5, 2015 · 3 comments

abyrd (Member) commented Feb 5, 2015

Contrary to initial impressions, it turns out that with TreeMaps and compression enabled, a disk-backed MapDB database gives quite reasonable sizes and speeds when storing large amounts of OSM data.

Once we get into applying updates and other complicated things, maintaining a C storage backend is not worth the effort if a good Java library comes within, say, 70% of its performance.

For comparison:

New York PBF (130MB) loaded into MapDB:

  • With TreeMap, the DB is 306MB.
  • With HashMap, it is much bigger at 626MB.

Netherlands PBF (941MB):

  • Takes 7 minutes to load into a TempFileDB; the resulting files total 2.2GB.
  • That is 2.3x the PBF size, at 7.4 minutes per GB of PBF.
  • Using a MemoryDirectDB, the NL PBF takes about 4 minutes to load.
  • Writing out VEX format takes 3 min 22 s.

On an SSD, loading Great Britain (665MB PBF) takes 3 minutes.

  • Writing VEX format takes 1:40.
  • The MapDB file is 1.5GB, so again about 2.2x the PBF size, at 4.5 minutes per GB of PBF.
  • Extrapolating to the whole planet (25.2GB PBF): about 60GB and 113 minutes to load.
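
The planet extrapolation scales the Great Britain ratios linearly with PBF size (a back-of-the-envelope check, assuming size and load time grow proportionally):

$$
\frac{1.5\ \text{GB}}{0.665\ \text{GB}} \approx 2.26, \qquad \frac{3\ \text{min}}{0.665\ \text{GB}} \approx 4.5\ \text{min/GB}
$$

$$
25.2\ \text{GB} \times 2.26 \approx 57\ \text{GB}, \qquad 25.2\ \text{GB} \times 4.5\ \text{min/GB} \approx 113\ \text{min}
$$

The actual planet load reported below (58GB, just under 2 hours) lands close to this estimate.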

Loading the entire planet.pbf to an SSD on dev.opentripplanner.org:

  • The 26GB PBF loaded into MapDB in just under 2 hours.
  • The combined MapDB file size was 58GB.
  • Writing out VEX took 1 hour 17 minutes; the result was 17GB (but metadata was stripped).
  • We still need the spatial index: a navigable map from slippy map tile coordinates (x, y) to a list of every way in that tile.
  • There are only 272 million ways in OSM, so these will fit neatly in a hashmap as a spatial index.
  • Index scanning can be done with subMap(Fun.t2(x0, y0), Fun.t2(x1, y1)).
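
A minimal in-memory sketch of the tile index described above, assuming a fixed tile zoom level (ZOOM = 12 is a hypothetical choice) and using java.util.TreeMap in place of a disk-backed MapDB tree map, since both are NavigableMaps. The class TileIndex, its methods, and the packed long key (standing in for Fun.t2(x, y) tuples) are illustrative only, not taken from the OTP or osm-lib code. One caveat the sketch makes explicit: with keys ordered by x and then y, a single subMap from (x0, y0) to (x1, y1) covers every key lexicographically between the two corners, so it also returns tiles above and below the box in the intermediate columns. The query below therefore scans one x column at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Sketch of a tile-based spatial index: slippy-map tile (x, y) -> IDs of ways in that tile.
// java.util.TreeMap stands in for a disk-backed MapDB tree map; both are NavigableMaps.
class TileIndex {
    // Hypothetical zoom level: the issue does not say which one would be used.
    static final int ZOOM = 12;

    private final NavigableMap<Long, List<Long>> tiles = new TreeMap<>();

    // Standard OSM slippy-map tile formulas (Web Mercator), clamped to [0, 2^zoom - 1].
    static int lonToTileX(double lon, int zoom) {
        int n = 1 << zoom;
        int x = (int) Math.floor((lon + 180.0) / 360.0 * n);
        return Math.max(0, Math.min(n - 1, x));
    }

    static int latToTileY(double lat, int zoom) {
        int n = 1 << zoom;
        double r = Math.toRadians(lat);
        double y = (1.0 - Math.log(Math.tan(r) + 1.0 / Math.cos(r)) / Math.PI) / 2.0 * n;
        return Math.max(0, Math.min(n - 1, (int) Math.floor(y)));
    }

    // Packs (x, y) into one long ordered by x, then y, like a 2-tuple key.
    // Safe because tile coordinates are non-negative and fit in 31 bits.
    static long key(int x, int y) {
        return ((long) x << 32) | y;
    }

    // Appends a way ID to the list stored for one tile.
    void addWayToTile(long wayId, int x, int y) {
        tiles.computeIfAbsent(key(x, y), k -> new ArrayList<>()).add(wayId);
    }

    // Registers a way in the tile of each of its nodes (each tile at most once).
    // Simplification: a long segment crossing a tile with no node inside it is missed.
    void addWay(long wayId, double[] lats, double[] lons) {
        Set<Long> seen = new LinkedHashSet<>();
        for (int i = 0; i < lats.length; i++) {
            int x = lonToTileX(lons[i], ZOOM);
            int y = latToTileY(lats[i], ZOOM);
            if (seen.add(key(x, y))) {
                addWayToTile(wayId, x, y);
            }
        }
    }

    // IDs of ways in any tile of the box [x0, x1] x [y0, y1], each reported once.
    // One subMap per x column keeps out tiles whose y lies outside [y0, y1].
    Set<Long> query(int x0, int y0, int x1, int y1) {
        Set<Long> result = new LinkedHashSet<>();
        for (int x = x0; x <= x1; x++) {
            for (List<Long> ways : tiles.subMap(key(x, y0), true, key(x, y1), true).values()) {
                result.addAll(ways);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TileIndex index = new TileIndex();
        index.addWayToTile(1L, 10, 10);
        index.addWayToTile(2L, 11, 50); // between the corner keys, but outside the box
        index.addWayToTile(3L, 12, 11);
        index.addWayToTile(3L, 12, 12); // way 3 spans two tiles
        System.out.println(index.query(10, 10, 12, 12)); // prints [1, 3]
    }
}
```

Way 2 sits between the corner keys (10, 10) and (12, 12) in key order, which is exactly the case a single corner-to-corner subMap would wrongly include.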

abyrd (Member, issue author) commented Feb 6, 2015

Twitter exchange in which MapDB author Jan Kotek reports on loading OSM:
https://twitter.com/n_colomer/status/433343722625826817

Recent discussion on loading OSM into MapDB:
https://groups.google.com/forum/#!topic/mapdb/EaU4vV7Gyhk

And a mysterious empty GitHub repository, intended to use MapDB as an Osmosis data store:
https://github.com/komoot/osmosis-mapdb

I have shared my observations about MapDB+OSM with the MapDB mailing list:
https://groups.google.com/d/msg/mapdb/EaU4vV7Gyhk/tmjYVvZa3GEJ

abyrd (Member, issue author) commented Feb 28, 2015

A MapDB-based Java implementation now exists in the OpenTripPlanner repo. It is intended to become a standalone OSM loading library, but will remain within OTP until it stabilizes, to avoid constantly updating dependencies in both directions.

https://github.com/opentripplanner/OpenTripPlanner/blob/master/src/main/java/org/opentripplanner/osm/VanillaExtract.java

abyrd (Member, issue author) commented May 1, 2015

This implementation has been moved to https://github.com/conveyal/osm-lib
