Hey karussell, I really appreciate all the hard work you’ve put into GraphHopper. I wouldn't be able to create this project without GH. I have a question about memory usage during the import stage (specifically in the OSM Reader's preprocessRelations function). I'm using a HashMap<Long, List<Long>> to map way IDs to OSM bike route relation IDs, which means allocating lots of lists and boxed Longs. Could this be causing me to run out of heap memory faster, or am I off base here?
I thought I would be able to compute the graph with 64GB of RAM, but it kept crashing before the CH and LM stage. After switching to a 128GB instance, it finally worked, peaking at around 90GB of memory. For context, I was using 3 profiles (one with CH and two with LM) plus elevation data, and I applied all of the tips from deploy.md.
Love your project!
Maybe you already considered this, but there are a number of collection libraries out there that are optimized for holding Java primitives and/or for very large sets of data, which could help you save significant memory. Eclipse Collections [0] and Fastutil [1] come to mind first, but there are many more out there [2].
[0] https://github.com/eclipse-collections/eclipse-collections
[1] https://fastutil.di.unimi.it/
[2] https://github.com/carrotsearch/hppc/blob/master/ALTERNATIVE...
Thank you! I'm a total Java noob. Actually, this is the first project where I've written any Java code (I had to slightly modify the GraphHopper source code to suit my needs). Those libraries look very interesting. I'm saving this post for another battle with processing maany GBs of OSM data :D
We already use carrotsearch (HPPC) internally, so you could replace java.util classes like HashMap and ArrayList with it to reduce memory usage a bit. But it won't help much. E.g. array-backed data structures (in any standard library, btw) usually double their capacity at some point as they grow and then copy from the old internal array to the new internal array, which means that during the copy you need roughly 3x the current size. If that happens near the end of the import process, you have a problem. For that reason we developed DataAccess (in-memory or MMAP possible), which is basically a large List but 1. grows only segment by segment and 2. allows more than 2 billion items (the signed int limit).
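To make the segment-by-segment idea concrete, here is a minimal sketch in plain Java. It is not GraphHopper's DataAccess API; the class name SegmentedLongList and all of its methods are made up for illustration. It only shows the principle: growing allocates one new fixed-size segment and never copies existing values, and a long index allows more than 2^31 items.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of segment-wise growth.
 * NOT GraphHopper's DataAccess; names and layout are assumptions.
 */
final class SegmentedLongList {
    private final int segmentSize;
    // Only the small list of segment references may be resized by ArrayList;
    // the stored values themselves are never copied when the list grows.
    private final List<long[]> segments = new ArrayList<>();
    private long size = 0;

    SegmentedLongList(int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.segmentSize = segmentSize;
    }

    /** Appends a value, allocating exactly one new segment when the last one is full. */
    void add(long value) {
        int segment = Math.toIntExact(size / segmentSize);
        if (segment == segments.size()) {
            segments.add(new long[segmentSize]);
        }
        segments.get(segment)[(int) (size % segmentSize)] = value;
        size++;
    }

    /** Reads the value at a long index, so more than 2^31 items are addressable. */
    long get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return segments.get((int) (index / segmentSize))[(int) (index % segmentSize)];
    }

    long size() {
        return size;
    }

    int segmentCount() {
        return segments.size();
    }
}
```

With a segment size of, say, 1 << 20, the extra memory needed at any growth step is one segment (8 MB of longs), no matter how large the list already is.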
Another trick for planet-sized data structures could be to use a List instead of the Map, with the OSM ID as the index. The memory overhead of a Map compared to a List is huge (and you could use DataAccess), and the OSM IDs for planet are nearly contiguous, or at least don't have that many gaps (as those gaps get refilled, I think).
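A rough sketch of the ID-as-index idea, again in plain Java with made-up names (WayRelationIndex is not a GraphHopper class). It simplifies the original use case to a single relation ID per way, and it assumes that valid OSM IDs are positive so that 0 can mean "no relation". Segments are allocated lazily, so a gap in the ID range costs nothing until an ID inside it is written.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: way ID -> relation ID, using the way ID as the index.
 * Simplified to one relation per way; assumes valid OSM IDs are positive.
 */
final class WayRelationIndex {
    static final long NO_RELATION = 0L; // sentinel, relies on IDs being > 0

    private final int segmentSize;
    // A null entry marks a segment that has not been needed yet.
    private final List<long[]> segments = new ArrayList<>();

    WayRelationIndex(int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.segmentSize = segmentSize;
    }

    void put(long wayId, long relationId) {
        if (wayId < 0) {
            throw new IllegalArgumentException("negative way ID: " + wayId);
        }
        int segment = Math.toIntExact(wayId / segmentSize);
        while (segments.size() <= segment) {
            segments.add(null); // placeholder only, no value array allocated
        }
        long[] values = segments.get(segment);
        if (values == null) {
            values = new long[segmentSize];
            segments.set(segment, values);
        }
        values[(int) (wayId % segmentSize)] = relationId;
    }

    long get(long wayId) {
        if (wayId < 0) {
            return NO_RELATION;
        }
        long segment = wayId / segmentSize;
        if (segment >= segments.size()) {
            return NO_RELATION;
        }
        long[] values = segments.get((int) segment);
        return values == null ? NO_RELATION : values[(int) (wayId % segmentSize)];
    }
}
```

Each slot costs 8 bytes whether or not it is used, so this only pays off when the IDs are dense, which is the point made above about planet data. A HashMap<Long, List<Long>> instead pays for a boxed key, a map entry and a list object per way.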
All these tricks (there are more!) are rather tricky and low-level but necessary for memory efficiency. A simpler way for your use case could be to just use a database for that, like MapDB or SQLite. But this might be (a lot) slower compared to in-memory stuff.
> Could this be causing me to run out of heap memory faster
Yes, definitely.
> I thought I would be able to compute the graph with 64GB of RAM, but it kept crashing before the CH and LM stage.
For normal GraphHopper and just the EU, 64GB should be more than sufficient.