Issues with volume breaking algorithm and solutions #27

bertfrees opened this issue Nov 18, 2019 · 0 comments

The volume breaking algorithm has a number of issues that result in slow convergence and no guarantee that the final solution is actually a good one:

  1. EvenSizeVolumeSplitter does not compare solutions. All it does is step in a certain direction at every iteration, increasing the number of volumes when content does not fit, or decreasing the number of volumes when a good solution has been found, and it keeps doing this until it ends up at a solution where it was before. Because of this, and because there are two variables that determine the next solution, namely the volume offset and the sheet count, the final solution depends a lot on the (two-dimensional) path taken by the algorithm (a sketch of this stepping behaviour follows this list). Experience shows that small changes in the algorithm (how the two variables are computed) can influence the path and the final solution a lot.

  2. A second problem is that EvenSizeVolumeSplitter doesn't even promote even volumes. The only thing it does is push the average number of sheets per volume down in order to push the size of the last volume up. At the moment it doesn't control the relation between the other volumes (these are kept equal by the distance penalty in VolumeProvider). Moreover, it probably isn't even very effective at getting the size of the last volume in balance with the other volumes, because it doesn't actually check the individual sizes; it's guesswork. In other words, EvenSizeVolumeSplitter is basically useless.
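
To make the stepping behaviour in point 1 concrete, here is a minimal, hypothetical sketch (the class, method names and update rules are illustrative, not the actual EvenSizeVolumeSplitter code). The search only ever moves one step per iteration, never compares candidate solutions, and accepts whatever state it holds when the walk revisits a state, so any change to how the steps are computed changes the path and therefore the result:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntPredicate;

class SteppingSplitterSketch {

    /**
     * Walks through candidate solutions one step at a time: when the content
     * does not fit, one volume is added; when it does fit, one volume is taken
     * away again. Candidate solutions are never compared with each other; the
     * walk simply stops as soon as it revisits a state, and whatever state it
     * holds at that point is accepted.
     *
     * @param contentFits stand-in for the real "does everything fit" test,
     *                    which in practice also depends on cross-references
     * @param totalSheets total number of sheets in the document
     * @param maxSheets   maximum number of sheets per volume
     * @return the accepted volume offset (extra volumes beyond the minimum)
     */
    static int split(IntPredicate contentFits, int totalSheets, int maxSheets) {
        int minVolumes = (totalSheets + maxSheets - 1) / maxSheets;
        int offset = 0;
        Set<Integer> visited = new HashSet<>();
        while (visited.add(offset)) {
            int volumes = minVolumes + offset;
            // the target number of sheets per volume is derived from the offset
            int sheetsPerVolume = (totalSheets + volumes - 1) / volumes;
            if (!contentFits.test(sheetsPerVolume)) {
                offset++;      // content does not fit: add a volume
            } else if (offset > 0) {
                offset--;      // it fits: try again with one volume less
            } else {
                break;         // already at the minimum volume count
            }
        }
        return offset;
    }
}
```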

These are the solutions that come to mind:

  1. If there were only one variable, namely the volume count (and if we kept the target size of each volume constant at the maximum value), there would be no "randomness" because the path would be one-dimensional. EvenSizeVolumeSplitter could be eliminated. This would make the algorithm much faster, but it would still not favour solutions with more even volumes.

  2. We could also do a more systematic search to make sure we find the optimum. For instance, whenever a volume is added we could start with the maximum number of sheets and decrease the number of sheets (step by step, or with a binary search) until an optimum is found. Note that we would actually need to check the results (rather than decreasing the number of sheets until the content does not fit anymore).

  3. The solution that I prefer would be to eliminate EvenSizeVolumeSplitter and do the "even sizes" optimization in VolumeProvider's cost function. The distance penalty should already keep the volume sizes close to some average value, which indirectly leads to even volumes, provided the optimization is done well. The problem is that the optimization is currently done in a greedy manner that does not lead to a global optimum, and especially for the last volume it can lead to a big deviation from the average volume size (which is what EvenSizeVolumeSplitter tries to fix). A proper optimization could be achieved with something like the A* (a-star) algorithm (a sketch of such a search is included after the tables below). Of course the big question is: how much more processing will this require? In order to find a global optimum we need to try many more solutions. On the other hand, if instead of 50 we only need a handful of iterations (to let the cross-references settle down), that brings the total processing down. To get an idea of the net effect on processing time and memory consumption, I have done some simulations.

    In the following we assume a limit of 40 sheets per volume, and a worst-case scenario of 50 iterations for the greedy method:

    | volume count | greedy worst-case | brute force | a* (min) | a* (max) |
    |---|---|---|---|---|
    | 5  | 10000  | 8.0E59  | 200   | 6832  |
    | 10 | 20000  | 1.3E120 | 1792  | 14957 |
    | 15 | 30000  | 2.1E180 | 6933  | 22442 |
    | 20 | 40000  | 3.3E240 | 13320 | 29973 |
    | 25 | 50000  | 5.4E300 | 18680 | 38224 |
    | 30 | 60000  | Inf     | 27000 | 46792 |
    | 35 | 70000  | Inf     | 32273 | 54498 |
    | 40 | 80000  | Inf     | 38000 | 62179 |
    | 45 | 90000  | Inf     | 43638 | 70703 |
    | 50 | 100000 | Inf     | 56755 | 78030 |

    Another difference from the greedy method is the number of unfinished solutions that need to be kept in memory: with the greedy method there is at most one solution that we need to save, while with the A* method we may need to save as many unfinished solutions as there are sheets in the whole document:

    | volume count | dijkstra worst-case | min | max |
    |---|---|---|---|
    | 5  | 200  | 157 | 196 |
    | 10 | 400  | 256 | 375 |
    | 15 | 600  | 309 | 535 |
    | 20 | 800  | 331 | 540 |
    | 25 | 1000 | 404 | 635 |
    | 30 | 1200 | 461 | 622 |
    | 35 | 1400 | 478 | 713 |
    | 40 | 1600 | 528 | 737 |
    | 45 | 1800 | 530 | 767 |
    | 50 | 2000 | 507 | 761 |
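
To make the shape of the proposed optimization concrete, here is a minimal, self-contained sketch of a best-first search over break positions (not dotify code; the quadratic penalty is only a stand-in for VolumeProvider's real cost function, and the numbers in the tables above come from separate simulations, not from this code). Sheet indices are the search nodes, every edge represents one volume, and the cheapest path from sheet 0 to the last sheet yields the break points:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.PriorityQueue;

class AStarVolumeBreakerSketch {

    /**
     * Break points (in sheets) minimising the summed deviation of every volume
     * from a target size. Sheet indices 0..totalSheets are the search nodes; an
     * edge i -> j (0 < j - i <= maxSheets) stands for one volume and costs
     * (j - i - target)^2, a stand-in for the real cost function.
     */
    static int[] breakPoints(int totalSheets, int maxSheets, int target) {
        double[] best = new double[totalSheets + 1];
        int[] prev = new int[totalSheets + 1];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        Arrays.fill(prev, -1);
        best[0] = 0;
        // entries are {estimated total cost, node}; stale entries are skipped
        PriorityQueue<double[]> open =
                new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        open.add(new double[] {heuristic(0, totalSheets, maxSheets), 0});
        while (!open.isEmpty()) {
            double[] entry = open.poll();
            int i = (int) entry[1];
            if (i == totalSheets) {
                break; // goal reached with minimal cost
            }
            if (entry[0] > best[i] + heuristic(i, totalSheets, maxSheets)) {
                continue; // stale queue entry
            }
            for (int size = 1; size <= maxSheets && i + size <= totalSheets; size++) {
                int j = i + size;
                double cost = best[i] + Math.pow(size - target, 2);
                if (cost < best[j]) {
                    best[j] = cost;
                    prev[j] = i;
                    open.add(new double[] {cost + heuristic(j, totalSheets, maxSheets), j});
                }
            }
        }
        // reconstruct the break points by walking back from the goal
        Deque<Integer> breaks = new ArrayDeque<>();
        for (int i = totalSheets; i > 0; i = prev[i]) {
            breaks.addFirst(i);
        }
        return breaks.stream().mapToInt(Integer::intValue).toArray();
    }

    // admissible lower bound on the cost of breaking the remaining sheets;
    // 0 is always safe (plain Dijkstra), a tighter bound turns this into A*
    static double heuristic(int node, int totalSheets, int maxSheets) {
        return 0;
    }
}
```

The memory concern from the second table shows up here as the `best`/`prev` arrays plus the open queue, which can hold up to roughly one unfinished entry per sheet in the document.
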
bertfrees referenced this issue in bertfrees/dotify.formatter.impl Dec 10, 2020
The reason is explained in issue
https://github.com/mtmse/dotify.formatter.impl/issues/5. While an
algorithm that favours solutions with more even volumes would be
useful to have, EvenSizeVolumeSplitter (despite its name) does not
actually accomplish this. Moreover, it has many problems,
deteriorating the volume breaking more than improving it. Until we
find a way to implement it properly, the whole feature is dropped.
bertfrees referenced this issue in bertfrees/dotify.formatter.impl Jan 13, 2021 (same commit message as above)
kalaspuffar referenced this issue in mtmse/dotify.formatter.impl Feb 22, 2021 (same commit message as above)
kalaspuffar transferred this issue from mtmse/dotify.formatter.impl Aug 20, 2021