
New near-linear time algorithm to solve the minimum cut problem for undirected graphs #372

Open
jeremy-murphy opened this issue Apr 30, 2024 · 8 comments

@jeremy-murphy (Contributor) commented Apr 30, 2024

Google Research blog post:
https://research.google/blog/solving-the-minimum-cut-problem-for-undirected-graphs/

The actual paper:
https://epubs.siam.org/doi/10.1137/1.9781611977912.111

This would presumably be a considerable improvement over our existing Stoer-Wagner algorithm.

@fringewidth commented Jun 28, 2024

Hello, I would love to work on this. I'm not sure what the contributing protocol is here: do I get assigned the issue, or should I just open a pull request when I'm ready?

@jeremy-murphy (Contributor, Author) commented

Hi @fringewidth! Welcome to Boost.Graph. If you're new to the library, it's worth looking at some recently merged pull requests to get an idea of what is expected, since we don't yet have a FAQ or contributing guide.

I'll assign the ticket to you now; if you change your mind about working on it, just let me know. Assignment isn't strictly required or strictly enforced, and anyone is welcome to work on anything.

@fringewidth commented Jul 7, 2024

Hey @jeremy-murphy,

I've been going through the paper since my previous comment and noticed that the method doesn't always give the exact answer. It transforms the graph into a smaller variant by removing certain edges (sparsification) while aiming to preserve the minimum cut, similar to Karger's original algorithm. However, each sparsification introduces a small error in the minimum cut. To quote the authors:

we provide a structural theorem that there exists a sparse clustering, i.e., a partition of the vertices into set[s] such that there are few edges between the sets, that preserves minimum cuts in weighted graphs with 1/polylog(n) multiplicative error.

And elsewhere,

We address these problems by using a modified version of the techniques from [KT19, HRW20] which runs faster and is better in that the uncrossing of minimum cuts only introduces an error of (1+ϵ) at each level where ϵ = 1/ polylog(n). Thus, a subset of the edges (sampled deterministically as sketched above) that approximates the cuts in the resulting decomposition yields an accurate representation of all minimum cuts.

In other words, it seems the algorithm is deterministic but not exact. I'm unsure if such an algorithm has a place in Boost.Graph, and if it does, I'm not sure how to start writing unit tests for it. Could you provide some guidance?

@jeremy-murphy (Contributor, Author) commented

Ah, interesting, I didn't realize that it was an approximation algorithm. That's fine, as getting a 99% minimum cut quickly is often perfectly good.
Testing the correctness does sound a little tricky, but fundamentally the test would be that the result is within the stated error of an exact answer, which I presume our existing algorithm provides?
For unit testing, you could decompose the algorithm into parts that are amenable to unit tests, such as (I'm guessing) the sparsification logic.
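To illustrate that kind of test, here is a minimal sketch in plain C++. It uses a brute-force exact minimum cut on tiny graphs as the ground truth (a stand-in for Stoer-Wagner, so it has no library dependency), and a `within_stated_error` check for the (1+ε) bound; `Edge` and both function names are illustrative, not part of any Boost.Graph interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Edge in a small undirected weighted graph, given as an edge list.
struct Edge { int u, v; double w; };

// Exact minimum cut by enumerating every bipartition of the vertices.
// Exponential, so only usable on tiny graphs, but it gives a ground
// truth that is independent of the code under test.
inline double brute_force_min_cut(int n, const std::vector<Edge>& edges) {
    double best = std::numeric_limits<double>::infinity();
    // Pin vertex 0 to one side so each cut is counted exactly once.
    for (std::uint32_t mask = 1; mask < (1u << (n - 1)); ++mask) {
        double cut = 0.0;
        for (const Edge& e : edges) {
            bool su = e.u != 0 && ((mask >> (e.u - 1)) & 1u);
            bool sv = e.v != 0 && ((mask >> (e.v - 1)) & 1u);
            if (su != sv) cut += e.w;  // edge crosses the bipartition
        }
        best = std::min(best, cut);
    }
    return best;
}

// Acceptance test for an approximate cut weight: it must lie in
// [exact, (1 + eps) * exact] for the paper's multiplicative error eps.
inline bool within_stated_error(double approx, double exact, double eps) {
    return exact <= approx && approx <= (1.0 + eps) * exact;
}
```

A test would then run the new algorithm on a small graph and assert `within_stated_error(approx, brute_force_min_cut(n, g), eps)`.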

@jeremy-murphy (Contributor, Author) commented

I just realized that the sparsification step is probabilistic, so unit tests will only be able to test that the result has certain properties, etc.

@jeremy-murphy (Contributor, Author) commented

Ok, now things are coming back to me about testing algorithms that use randomization.
Basically, the RNG will be a parameter to the algorithm for two reasons: (i) RNG implementations are always improving, and (ii) we can pass a deterministic generator for testing.
So tests will be deterministic.
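The interface could follow the standard library's convention of taking the engine as a template parameter by reference. A sketch, with `sample_edges` as a purely hypothetical shape for the sparsification step (not the paper's actual sampling rule or a Boost.Graph API):

```cpp
#include <random>
#include <vector>

// Edge in an undirected weighted graph.
struct Edge { int u, v; double w; };

// Hypothetical sparsification step: keep each edge independently with
// probability p. The random engine is injected by the caller, mirroring
// the standard library's URNG convention, so a test can pass a
// fixed-seed engine and get reproducible output on every run.
template <class URNG>
std::vector<Edge> sample_edges(const std::vector<Edge>& edges,
                               double p, URNG& gen) {
    std::bernoulli_distribution keep(p);
    std::vector<Edge> kept;
    for (const Edge& e : edges)
        if (keep(gen)) kept.push_back(e);
    return kept;
}
```

A unit test can construct `std::mt19937 gen(42)` and assert on the exact output, while production callers seed from `std::random_device`; a degenerate engine that always keeps every edge also works for exercising the surrounding logic.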

@fringewidth commented

That works! I think I can start working on it. We don't have to worry about non-determinism, since the algorithm itself is deterministic. (Ironically, that poses a different issue: we can't just rerun the algorithm on the same input to reduce the error probability.) Regardless, I believe I'll gain a better understanding by digging deeper into the method.

There's a lot of reading I need to do before writing code, so I'll keep you updated here with any new insights. Continuous feedback will maximize the chances of this PR being merged (I hope).

@jeremy-murphy (Contributor, Author) commented

You're absolutely right: teamwork is key. Don't go up the mountain for forty nights and come back with an algorithm on stone tablets. :) I recommend some amount of test-driven development: writing some failing tests early will help elucidate the top-down design and set you on the right path.

For example, this sparsification logic sounds like a legitimate algorithm in its own right that could be reused by other algorithms.

We can still fuzz the algorithm in its natural, probabilistic mode, but that wouldn't form part of the automated unit tests.
