ZGrab performance issue with http? #452
Comments
First, this prevents the DNS lookup that used to happen whenever we encountered a redirect, *even if we didn't intend to follow it*. This likely addresses some part of zmap#452. Second, if we aren't following redirects, the scan no longer fails with an 'application-error'. We are succeeding in what we intended to do, which is to scan without following redirects.
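For readers unfamiliar with how "don't follow redirects, and don't treat that as a failure" looks in Go, here is a minimal sketch using the standard library's `net/http`. It is an illustration only: zgrab2 ships its own HTTP code, so this is not the change in the PR, and `newNoFollowClient` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/http"
)

// newNoFollowClient (hypothetical name) returns a client that never follows
// redirects. Returning http.ErrUseLastResponse from CheckRedirect makes the
// client hand back the 3xx response itself with a nil error, so no request
// is sent to the redirect target and its hostname is never resolved.
func newNoFollowClient() *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
}

func main() {
	client := newNoFollowClient()
	// Call the hook directly to show what it returns; no network is used.
	err := client.CheckRedirect(nil, nil)
	fmt.Println(err == http.ErrUseLastResponse)
}
```

With a client built this way, a `301` or `302` comes back as an ordinary response whose `Location` header can still be recorded in the scan output.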
I recently noticed that we do DNS lookups whenever we get a redirect, regardless of whether we are going to follow it, which is probably at least part of the problem.
I've tracked this to this line:
Indeed, that's the residual bit that remains, even if you do provide all of the initial addresses, and I'm not sure of any way around it. Though I haven't actually measured it, I would assume it's a minor blip for most target lists. Or maybe not; I haven't properly thought about it.
One question: do you know offhand what happens for a case like http://site.com redirecting to https://site.com? Specifically, I'm wondering whether it knows that it can reuse the same address or whether it does another lookup. I imagine that is a good portion of the redirects for most target lists.
One more note: if you do tackle this, keep in mind the potential pitfall mentioned in #307, which is really a general pattern that has manifested in different ways and places: TLS handshake failures, redirects exceeded, etc. I think your recent PR may have fixed this sort of behavior; I just wanted to mention it. I'm not sure I ever sent a PR for the TLS handshake case 😞
My use case is a bit simpler, as a lot of my scans are just IPs and not domains
I would expect it to just be cached locally, rather than zgrab2 trying to re-resolve everything. http://site.com/ -> http://www.site.com/ would trigger another lookup, though. However, I don't think the Go code will cache those internally, so there may be a small improvement if we cached them ourselves for the http -> https redirect we expect when we start at http, avoiding a round trip to the system resolver even when the value is cached there.
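To make the two redirect cases above concrete, here is a small sketch of the check that decides whether a redirect introduces a new hostname. `redirectNeedsLookup` is a hypothetical helper, not something in zgrab2, and it only compares hostnames; it says nothing about what the resolver or HTTP client actually caches.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// redirectNeedsLookup (hypothetical helper) reports whether a redirect from
// one URL to another changes the hostname. If the hostname is unchanged,
// the address resolved for the first request could in principle be reused.
func redirectNeedsLookup(from, to string) (bool, error) {
	src, err := url.Parse(from)
	if err != nil {
		return false, err
	}
	dst, err := url.Parse(to)
	if err != nil {
		return false, err
	}
	// Hostname() strips any port; hostnames compare case-insensitively.
	return !strings.EqualFold(src.Hostname(), dst.Hostname()), nil
}

func main() {
	upgrade, _ := redirectNeedsLookup("http://site.com", "https://site.com")
	www, _ := redirectNeedsLookup("http://site.com/", "http://www.site.com/")
	fmt.Println("http -> https, same host needs lookup:", upgrade)
	fmt.Println("site.com -> www.site.com needs lookup:", www)
}
```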
Yeah. I am not sure how to solve that one. An iterative process of doing one level at a time involves a lot of wrapping and stitching together. I guess you could try to embed something like ZDNS inside (and maybe this is why there has been work to make it usable as a library) to do faster lookups than what we are doing today. I wonder if part of it is also redirects like domain.com -> www.domain.com, which don't take advantage of multiplexing. I wonder if you could be a little 'clever' and try to get the roundtripper to maintain a connection per IP address, rather than per hostname.
I see. So the initial request for our cases is similar (profile-wise) because no DNS lookup is required, but your case won’t necessarily benefit at all from caching DNS from that first request (since there isn’t one). Actually, let me correct myself. I just realized that my target list could actually prime a cache, since I provide, for example:
… it should be possible to first reference the name-to-IP mappings in the target list before making a DNS request. That wouldn’t help you, though. I wonder how much of an impact this would have for me. As a mapping with writes occurring only at startup, it wouldn’t need any locking. The lookups should be cheap. It’s just a matter of how many cases would actually get a hit, which would vary for each user’s target set.
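The target-list-primed cache described above could be sketched roughly as follows. This is an illustration under assumptions, not zgrab2 code: the input is assumed to be lines of the form `ip,name`, and `buildPrimedCache`, `lookup`, `resolver`, and `failResolver` are hypothetical names. Because the map is fully built before any sender starts and is never written afterwards, concurrent reads need no lock.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// resolver is the fallback used on a cache miss (hypothetical type).
type resolver func(name string) (net.IP, error)

// buildPrimedCache builds a name -> IP map from "ip,name" lines
// (assumed input format). Malformed lines are skipped.
func buildPrimedCache(lines []string) map[string]net.IP {
	cache := make(map[string]net.IP)
	for _, line := range lines {
		fields := strings.Split(line, ",")
		if len(fields) < 2 {
			continue
		}
		ip := net.ParseIP(strings.TrimSpace(fields[0]))
		name := strings.ToLower(strings.TrimSpace(fields[1]))
		if ip == nil || name == "" {
			continue
		}
		cache[name] = ip
	}
	return cache
}

// lookup consults the primed map first and only falls back to a real
// resolver on a miss. The map is read-only here, so no locking is needed.
func lookup(cache map[string]net.IP, name string, fallback resolver) (net.IP, error) {
	if ip, ok := cache[strings.ToLower(name)]; ok {
		return ip, nil
	}
	return fallback(name)
}

// failResolver stands in for a real DNS lookup so the example stays offline.
func failResolver(name string) (net.IP, error) {
	return nil, fmt.Errorf("no cached address for %q", name)
}

func main() {
	cache := buildPrimedCache([]string{"192.0.2.10,site.com"})
	ip, err := lookup(cache, "site.com", failResolver)
	fmt.Println(ip, err)
}
```

How much this helps depends entirely on how often redirect targets appear in the target list, which is the open question raised above.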
I would like to confirm that if I get time
Interesting idea. Could be a quick win
In this case, are you describing making the same number of DNS requests, but with more optimized code? If so, I would be skeptical of that, mainly because the delay is probably I/O-bound. For my dataset, I’m aware of a large number of domains that have very high-latency DNS servers. I may be misunderstanding what you’re describing, though.
That could work, as long as it’s HTTP and not HTTPS. It will break on some (possibly many) of the TLS >= 1.2 cases, because of SNI, which will require a new TLS negotiation with the appropriate/new name. Have you considered globally caching DNS across all of the senders? Maybe that’s what you were describing with your idea about zdns “embedding”? A global cache was one of my first thoughts, but I immediately started arguing points against myself. First, it may help me, but it wouldn’t help all target sets, only those that have a lot of repeated names over the duration of the session. In practice, my data tends to have a lot of those, because all of my targets are ultimately assets under the same organization/entity. So I see a lot of cases like:
These “feel” very common for my target list, but I have to look at the data to gauge the magnitude, maybe by adding simple logging of all DNS requests and doing a uniq -c to see how many are repeated. I could also do a wire capture, I guess. Caching globally would also have sharply diminishing returns for those whose target sets don’t consist primarily of targets with the same organizational “owner”, though, which may be most users. Caching globally might also have potential locking issues impacting performance if the requests were performed by each sender, but there must be some way to avoid a lock for every new lookup, or to reduce the contention. It wouldn’t be an issue if DNS ran in a dedicated thread, though. Just thinking out loud here; I haven’t really given any of this proper thought. And it’s early, so I probably misunderstood a few of your thoughts, hopefully not all of them 😊 Very happy to have you thinking about this, even though our use patterns may differ quite a bit. I’ve contributed a lot of bug fixes and minor features, but my golang experience and knowledge is sorely lacking. I have to defer to others on any non-trivial implementations, or anything too deep into the core, which is frustrating. I’m used to being more “hands-on” helpful. Anyway, thanks for the discussion. Let me know if there’s anything I can do to help in the way of profiling or more primitive data collection.
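One possible shape for a cache shared by all senders, with the lock contention concern above in mind, is a map guarded by a `sync.RWMutex`: hits take only a read lock, and the slow resolver call happens outside any lock. This is a sketch under assumptions, not a proposal for zgrab2's actual design. `dnsCache` and `newDNSCache` are hypothetical names, TTLs are ignored, and two senders that miss on the same name at the same moment may both resolve it.

```go
package main

import (
	"fmt"
	"sync"
)

// dnsCache (hypothetical type) is a process-wide name -> addresses cache.
type dnsCache struct {
	mu      sync.RWMutex
	entries map[string][]string
	resolve func(name string) ([]string, error)
}

func newDNSCache(resolve func(name string) ([]string, error)) *dnsCache {
	return &dnsCache{entries: make(map[string][]string), resolve: resolve}
}

// lookup returns cached addresses when present. On a miss it resolves
// without holding any lock, then stores the result under the write lock.
func (c *dnsCache) lookup(name string) ([]string, error) {
	c.mu.RLock()
	addrs, ok := c.entries[name]
	c.mu.RUnlock()
	if ok {
		return addrs, nil
	}

	addrs, err := c.resolve(name) // slow path: no lock held during I/O
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.entries[name] = addrs
	c.mu.Unlock()
	return addrs, nil
}

func main() {
	calls := 0
	// A stub resolver keeps the example offline; it counts invocations.
	cache := newDNSCache(func(name string) ([]string, error) {
		calls++
		return []string{"192.0.2.1"}, nil
	})
	cache.lookup("site.com")
	cache.lookup("site.com")
	fmt.Println("resolver calls:", calls)
}
```

In a real scanner the `resolve` function could be `net.LookupHost`, which has the same signature. Counting how often the stub is called versus how often `lookup` is called gives a rough hit rate, similar in spirit to the logging plus `uniq -c` measurement mentioned above.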
So, I was doing unrelated work with DNS, and I noticed that the TTLs for a lot of sites are really low
Not a bug per se, but I was investigating a slow zgrab scan and noticed that it maxes out at sending 6 MB/s. While there might be a lot of CPU overhead, this just seems egregiously slow, and surely there are some performance gains we can achieve. This issue is to track that.
CLI command -
cat ~/100k-domains.txt | ./zgrab2 http --max-redirects=3
This was run on a VM with a 1 Gb/s+ link and plenty of cores (22) and RAM (56 GB).