
Incorrectly waiting for the entire timeout duration #595

Open
thenbe opened this issue Sep 17, 2023 · 1 comment
Labels: Investigation, Type: Bug

Comments

thenbe commented Sep 17, 2023

Repro

  1. Serve a simple HTML page:
mkdir repro-katana-timeout
echo '<html> <body> <a href="http://localhost:3000?header">Header</a> <a href="http://localhost:3000?footer">Footer</a> </body> </html>' > repro-katana-timeout/index.html
npx serve -p 3000 repro-katana-timeout
  2. Crawl it with katana:
# open new terminal
katana -u http://localhost:3000 -duc -timeout 1
katana -u http://localhost:3000 -duc

Expected

Both commands should complete in a similar amount of time.

Actual

If we specify the -timeout 1 flag, the command takes 2 seconds. Otherwise it takes 11(!) seconds. The requests, responses, and output of both commands are identical. The difference between the two runs (~9 seconds) matches the difference between the default timeout of 10 seconds and the 1-second timeout; in other words, each run takes roughly 1 second plus the configured timeout. So I'm guessing that katana waits out the full timeout duration when it shouldn't, since there is nothing left to wait for.

More info

Logs

$ katana -u http://localhost:3000 -duc --timeout 1 --verbose --debug

   __        __
  / /_____ _/ /____ ____  ___ _
 /  '_/ _  / __/ _  / _ \/ _  /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/							

		projectdiscovery.io

[INF] Started standard crawling for => http://localhost:3000
[GET] http://localhost:3000
[a] [GET] http://localhost:3000?header
[a] [GET] http://localhost:3000?footer
$ katana -u http://localhost:3000 -duc --verbose --debug

   __        __
  / /_____ _/ /____ ____  ___ _
 /  '_/ _  / __/ _  / _ \/ _  /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/							

		projectdiscovery.io

[INF] Started standard crawling for => http://localhost:3000
[GET] http://localhost:3000
[a] [GET] http://localhost:3000?header
[a] [GET] http://localhost:3000?footer

Time comparison

$ hyperfine --max-runs 10 --warmup 2 'katana -u http://localhost:3000 -duc -timeout 1' 'katana -u http://localhost:3000 -duc'
Benchmark 1: katana -u http://localhost:3000 -duc -timeout 1
  Time (mean ± σ):      2.120 s ±  0.016 s    [User: 0.086 s, System: 0.037 s]
  Range (min … max):    2.097 s …  2.142 s    10 runs

Benchmark 2: katana -u http://localhost:3000 -duc
  Time (mean ± σ):     11.127 s ±  0.017 s    [User: 0.102 s, System: 0.043 s]
  Range (min … max):   11.106 s … 11.155 s    10 runs

Summary
  katana -u http://localhost:3000 -duc -timeout 1 ran
    5.25 ± 0.04 times faster than katana -u http://localhost:3000 -duc
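As a rough sanity check on the hyperfine numbers, suppose each run costs a fixed baseline plus one full timeout wait. This is an assumption drawn from the timings, not something verified in katana's code. If it holds, subtracting the timeout from each mean time should leave about the same baseline. The helper name `implied_baseline` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print total - timeout (seconds, 3 decimals), i.e. the
# time left once one full timeout wait is subtracted.
# Assumes total ~= baseline + timeout; not confirmed against katana's source.
implied_baseline() {
  awk -v total="$1" -v timeout="$2" 'BEGIN { printf "%.3f\n", total - timeout }'
}

# Mean times from the hyperfine run above; 10 s is the default timeout per the report.
printf 'with -timeout 1:     %s s\n' "$(implied_baseline 2.120 1)"
printf 'with default (10 s): %s s\n' "$(implied_baseline 11.127 10)"
```

Both runs leave roughly 1.1 s once the timeout is subtracted, which fits the guess that the whole timeout is being waited out in both cases.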

Versions

katana: v1.0.4

thenbe added the Type: Bug label Sep 17, 2023
ehsandeep (Member) commented

@thenbe Thanks for reporting this. According to the JSONL output, katana is for some reason unable to get a response for the two links found in the page. This also explains the additional time: those requests are actually timing out.

{
  "timestamp": "2023-09-17T22:37:10.05036+05:30",
  "request": {
    "method": "GET",
    "endpoint": "http://localhost:3000",
    "raw": "GET / HTTP/1.1\r\nHost: localhost:3000\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36\r\nAccept-Encoding: gzip\r\n\r\n"
  },
  "response": {
    "status_code": 200,
    "headers": {
      "etag": "\"0b18c368f1577ce848a1fdcfc8834b858b4ad0ce\"",
      "content_disposition": "inline; filename=\"index.html\"",
      "content_type": "text/html; charset=utf-8",
      "connection": "keep-alive",
      "vary": "Accept-Encoding",
      "accept_ranges": "bytes",
      "date": "Sun, 17 Sep 2023 17:07:10 GMT",
      "keep_alive": "timeout=5",
      "content_length": "132"
    },
    "body": "<html> <body> <a href=\"http://localhost:3000/?header\">Header</a> <a href=\"http://localhost:3000/?footer\">Footer</a> </body> </html>\n",
    "raw": "HTTP/1.1 200 OK\r\nContent-Length: 132\r\nAccept-Ranges: bytes\r\nConnection: keep-alive\r\nContent-Disposition: inline; filename=\"index.html\"\r\nContent-Type: text/html; charset=utf-8\r\nDate: Sun, 17 Sep 2023 17:07:10 GMT\r\nEtag: \"0b18c368f1577ce848a1fdcfc8834b858b4ad0ce\"\r\nKeep-Alive: timeout=5\r\nVary: Accept-Encoding\r\n\r\n<html> <body> <a href=\"http://localhost:3000/?header\">Header</a> <a href=\"http://localhost:3000/?footer\">Footer</a> </body> </html>\n"
  }
}
{
  "timestamp": "2023-09-17T22:37:11.047186+05:30",
  "request": {
    "method": "GET",
    "endpoint": "http://localhost:3000/?footer",
    "tag": "a",
    "attribute": "href",
    "source": "http://localhost:3000",
    "raw": "GET /?footer HTTP/1.1\r\nHost: localhost:3000\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36\r\nAccept-Encoding: gzip\r\n\r\n"
  },
  "response": {
    "headers": {}
  }
}
{
  "timestamp": "2023-09-17T22:37:11.047368+05:30",
  "request": {
    "method": "GET",
    "endpoint": "http://localhost:3000/?header",
    "tag": "a",
    "attribute": "href",
    "source": "http://localhost:3000",
    "raw": "GET /?header HTTP/1.1\r\nHost: localhost:3000\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36\r\nAccept-Encoding: gzip\r\n\r\n"
  },
  "response": {
    "headers": {}
  }
}

We will investigate further why katana fails to fetch those endpoints. They point to the same host as the input URL, so they should be accessible.
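When the output is larger than this, the failing requests can be pulled out mechanically. The sketch below lists the endpoint of every record whose response has no `status_code`, like the `?header` and `?footer` records above. It assumes `jq` is installed and the JSONL records have been saved to a file; the function name `failed_endpoints` and the file name `katana-output.jsonl` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the endpoint of every katana JSONL record whose
# response carries no status_code (i.e. the request got no response).
# Requires jq, which reads a stream of JSON values whether they are
# pretty-printed or one per line. With no file arguments it reads stdin.
failed_endpoints() {
  jq -r 'select(.response.status_code == null) | .request.endpoint' "$@"
}

# Usage (hypothetical file name):
# failed_endpoints katana-output.jsonl
```

For the three records above, this would print only the `?footer` and `?header` endpoints.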
