Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpectedly endless pathfinding from the nth size. #72

Open
centuriononon opened this issue Nov 15, 2023 · 2 comments
Open

Unexpectedly endless pathfinding from the nth size. #72

centuriononon opened this issue Nov 15, 2023 · 2 comments

Comments

@centuriononon
Copy link

centuriononon commented Nov 15, 2023

Machine: Intel Core i5-8250U (1.60GHz), 16gb RAM
OS: Ubuntu 22.04
Elixir version: Elixir 1.15.0-rc.0
OTP version: 25.0
libgraph version: 0.16.0

I use graph to accumulate pages while I crawl. The crawler is a stream of pages: above page (previous) and sub page (current). Such relationship I have to record to find paths between pages later.

I noticed that from nth node it becomes impossible to compute paths. Literally, when my crawler reached a goal and needed to collect paths from A to B pages, it took all night and there was no calculation result (the calculation was definitely still going on, judging by the load).

I decided to log every thousandth relationship recorded and try to find all paths from the starting page to the current page (I record page to subpage relationship).

The results are as follows (relation a sub page to the above page -> time to find all paths from the first page to the sub page):

  • 1000 -> 99µ
  • 2000 -> 179µ
  • 3000 -> 306µ
  • 4000 -> 302µ
  • 5000 -> 745µ
  • 6000 -> 834µ
  • 7000 -> 3018µ
  • 8000 -> 3059µ
  • 9000 -> 6092µ
  • 10000 -> 66057µ
  • 11000 -> infinity

I used this to record relationship:

graph
|> Graph.add_vertex(abv_ref, abv)
|> Graph.add_vertex(sub_ref, sub)
|> Graph.add_edge(abv_ref, sub_ref)

Where abv and sub are structs, abv_ref and sub_ref are strings.

Tested like this:

IO.puts("Task now: #{x}")
if rem(x, 1000) === 0 do
  IO.puts("Trying to get all paths from the start to current urls...")

  {d, result} = :timer.tc(fn ->
    Graph.get_paths(graph, start_ref, sub_ref)
  end)

  IO.puts("Got result in #{d}µ (#{div(d, 1_000_000)}s): #{inspect(result)}")
end

Where result is a list of strings, start_ref is string as well.


I don't know if my problem is unique or if it's a general property of the library graph. Is there a benchmark that tests such a large amount of relations? Send links if available. If not, my suggestion is to make one.

@centuriononon centuriononon changed the title Infinity complexity Unexpectedly endless pathfinding from the nth size. Nov 15, 2023
@bitwalker
Copy link
Owner

The get_paths function performs a depth-first traversal of the entire graph in order to discover all the paths, so it is expected that it would get increasingly expensive as the size of the graph increases, but 11k nodes should not be a problem. I've certainly worked with larger graphs with this library.

That said, it isn't clear to me what the structure of your graph is: How dense is the graph? What is the depth vs breadth? How long are the paths you expect to observe on average? All of this will help discover where the bug here is.

@centuriononon
Copy link
Author

The get_paths function performs a depth-first traversal of the entire graph in order to discover all the paths, so it is expected that it would get increasingly expensive as the size of the graph increases, but 11k nodes should not be a problem. I've certainly worked with larger graphs with this library.

Okay, I see.

That said, it isn't clear to me what the structure of your graph is: How dense is the graph? What is the depth vs breadth? How long are the paths you expect to observe on average? All of this will help discover where the bug here is.

I am going to reproduce the same conditions as I had in that case. It was about multiple layers in depth and large amount of nodes in breadth (amount 200-1000 per node). So, I would say we can try to create similar structure manually with control within a test.

It may take a while, but once I get started and there are little results, I'll let you know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants