-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with volume breaking algorithm and solutions #27
Labels
Comments
bertfrees
referenced
this issue
in bertfrees/dotify.formatter.impl
Dec 10, 2020
The reason is explained in issue https://github.com/mtmse/dotify.formatter.impl/issues/5. While an algorithm that favours solutions with more even volumes would be useful to have, EvenSizeVolumeSplitter (despite its name) does not actually accomplish this. Moreover, it has many problems, deteriorating the volume breaking more than improving it. Until we find a way to implement it properly, the whole feature is dropped.
bertfrees
referenced
this issue
in bertfrees/dotify.formatter.impl
Dec 10, 2020
The reason is explained in issue https://github.com/mtmse/dotify.formatter.impl/issues/5. While an algorithm that favours solutions with more even volumes would be useful to have, EvenSizeVolumeSplitter (despite its name) does not actually accomplish this. Moreover, it has many problems, deteriorating the volume breaking more than improving it. Until we find a way to implement it properly, the whole feature is dropped.
bertfrees
referenced
this issue
in bertfrees/dotify.formatter.impl
Jan 13, 2021
The reason is explained in issue https://github.com/mtmse/dotify.formatter.impl/issues/5. While an algorithm that favours solutions with more even volumes would be useful to have, EvenSizeVolumeSplitter (despite its name) does not actually accomplish this. Moreover, it has many problems, deteriorating the volume breaking more than improving it. Until we find a way to implement it properly, the whole feature is dropped.
kalaspuffar
referenced
this issue
in mtmse/dotify.formatter.impl
Feb 22, 2021
The reason is explained in issue https://github.com/mtmse/dotify.formatter.impl/issues/5. While an algorithm that favours solutions with more even volumes would be useful to have, EvenSizeVolumeSplitter (despite its name) does not actually accomplish this. Moreover, it has many problems, deteriorating the volume breaking more than improving it. Until we find a way to implement it properly, the whole feature is dropped.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The volume breaking algorithm has a number of issues that result in a slow convergence and no guarantee that the final solution is actually a good one:
EvenSizeVolumeSplitter does not compare solutions. All it does is step into a certain direction at every iteration, increasing the number of volumes when content does not fit, or decreasing the number of volumes when a good solution has been found, and keep doing this until it ends up at a solution where it was before. Because of this, and because there are two variables that determine the next solution, namely the volume offset and the sheet count, the final solution depends a lot on the (two-dimensional) path taken by the algorithm. Experience shows that small changes in the algorithm (how the two variables are computed) can influence the path and the final solution a lot.
A second problem is that EvenSizeVolumeSplitter doesn't even promote even volumes. The only thing it does is push the average number of sheets per volume down in order to push the size of the last volume up. At the moment it doesn't control the relation between the other volumes (these are kept equal by the distance penalty in VolumeProvider). Moreover, it probably isn't even very effective at getting the size of the last volume in balance with with the other volumes because it doesn't actually check the individual sizes, it's guesswork. In other words, the EvenSizeVolumeSplitter is basically useless.
These are the solutions that come to mind:
If there would be only one variable, namely the volume count (and if we would keep the target size of each volume constant at the maximum value), there would be no "randomness" because the path would be one-dimensional. EvenSizeVolumeSplitter could be eliminated. This would make the algorithm much faster, but it would still not favour solutions with more even volumes.
We could also do a more systematic search to make sure we find the optimum. For instance, whenever a volume is added we could start with the maximum amount of sheets and decrease the number of sheets (step by step, or with a binary search) until an optimum is found. Note that we would actually need to check the results (rather than descreasing the number of sheets until the content does not fit anymore).
The solution that I prefer would be to eliminate EvenSizeVolumeSplitter and do the "even sizes" optimization in VolumeProvider's cost function. The distance penalty should already keep the volume sizes close to some average value, which indirectly leads to even volumes. That is, if the optimization is done well. The problem is that the optimization is currently done in a greedy manner which does not lead to a global optimum, and especially for the last volume it can lead to big a deviation from the average volume size (and this is what EvenSizeVolumeSplitter tries to fix). A proper optimization could be achieved with something like the a* (a-star) algorithm. Of course the big question is: how much more processing will this require? In order to find a global optimum we need to try a lot more solutions. On the other hand, if instead of 50 we only need a handful of iterations (to let the cross-references settle down) that brings the total processing down. To get an idea of the net effect on the processing and the memory consumption, I have done some simulations.
In the following we assume a limit of 40 sheets per volume, and a worst-case scenario of 50 iterations for the greedy method:
Another difference with the greedy method is the number of unfinished solutions that need to be kept in memory: with the greedy method there is at most one solution that we need to save, while with the a* method we may need to save as many unfinished solutions as there are sheets in the whole document:
The text was updated successfully, but these errors were encountered: