Skip to content

Unexpectedly endless pathfinding from the nth size.  #72

Open
@centuriononon

Description

@centuriononon

Machine: Intel Core i5-8250U (1.60GHz), 16gb RAM
OS: Ubuntu 22.04
Elixir version: Elixir 1.15.0-rc.0
OTP version: 25.0
libgraph version: 0.16.0

I use graph to accumulate pages while I crawl. The crawler is a stream of pages: above page (previous) and sub page (current). Such relationship I have to record to find paths between pages later.

I noticed that from nth node it becomes impossible to compute paths. Literally, when my crawler reached a goal and needed to collect paths from A to B pages, it took all night and there was no calculation result (the calculation was definitely still going on, judging by the load).

I decided to log every thousandth relationship recorded and try to find all paths from the starting page to the current page (I record page to subpage relationship).

The results are as follows (relation a sub page to the above page -> time to find all paths from the first page to the sub page):

  • 1000 -> 99µ
  • 2000 -> 179µ
  • 3000 -> 306µ
  • 4000 -> 302µ
  • 5000 -> 745µ
  • 6000 -> 834µ
  • 7000 -> 3018µ
  • 8000 -> 3059µ
  • 9000 -> 6092µ
  • 10000 -> 66057µ
  • 11000 -> infinity

I used this to record relationship:

graph
|> Graph.add_vertex(abv_ref, abv)
|> Graph.add_vertex(sub_ref, sub)
|> Graph.add_edge(abv_ref, sub_ref)

Where abv and sub are structs, abv_ref and sub_ref are strings.

Tested like this:

IO.puts("Task now: #{x}")
if rem(x, 1000) === 0 do
  IO.puts("Trying to get all paths from the start to current urls...")

  {d, result} = :timer.tc(fn ->
    Graph.get_paths(graph, start_ref, sub_ref)
  end)

  IO.puts("Got result in #{d}µ (#{div(d, 1_000_000)}s): #{inspect(result)}")
end

Where result is a list of strings, start_ref is string as well.


I don't know if my problem is unique or if it's a general property of the library graph. Is there a benchmark that tests such a large amount of relations? Send links if available. If not, my suggestion is to make one.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions