Cache mx_server_is_in? to speed up disposable_mx_server? #1
base: main
Conversation
Force-pushed from c71c7a4 to bda32cc.
```ruby
  end
end
# …
MX_SERVERS_CACHE[cache_key] = result unless cache_key.nil?
```
Reviewer: This is not thread-safe, which might lead to multiple threads trying to calculate the value for the same key. Probably not a big deal, though.
Author: If the inputs are the same (i.e., no edge case where a DNS record changes during execution), the result will be the same, so it doesn't matter if two threads write at the same time... I think 🤔
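For reference, a minimal sketch of what a synchronized variant could look like, assuming MX_SERVERS_CACHE is a plain Hash; the mutex and method names here are illustrative, not the PR's actual code:

```ruby
MX_SERVERS_CACHE = {}
MX_SERVERS_MUTEX = Mutex.new # assumed name, not part of the PR

# Thread-safe fetch-or-compute: at most one thread computes the value
# for a given key; concurrent callers block and reuse the result.
def fetch_mx_result(cache_key)
  return yield if cache_key.nil?

  MX_SERVERS_MUTEX.synchronize do
    if MX_SERVERS_CACHE.key?(cache_key)
      MX_SERVERS_CACHE[cache_key]
    else
      MX_SERVERS_CACHE[cache_key] = yield
    end
  end
end
```

The trade-off is that holding the lock across the DNS lookup serializes cache misses, which is partly why the unsynchronized last-write-wins approach above is a defensible compromise.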
```ruby
mx_servers_str = mx_servers.map(&:exchange).map(&:to_s).sort.join
return domain if mx_servers_str == ""
# …
"#{domain_list.object_id}_#{domain_list.length}_#{mx_servers_str.downcase}"
```
Reviewer: Can we use a hashing function here instead? I might be missing the idea of this cache key function, though.

```ruby
Zlib.crc32("#{domain_list.length}_#{mx_servers_str.downcase}")
```
Author: In the case of checking against disposable MX servers, domain_list is a set of 160k disposable domains. It could be something else, though, so I thought we'd capture the object id to avoid misfiring the cache between similar requests, while still not having to iterate over the whole set. I also thought about some kind of consistent sampling of the domain list, but maybe we don't need to go there?
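Putting the two diff fragments above together, the cache-key construction under discussion looks roughly like this; the method name and signature are assumed for illustration, not taken from the PR:

```ruby
# Sketch, not the PR's exact code: build a cache key without iterating
# over the (potentially 160k-entry) domain_list.
def mx_servers_cache_key(domain, domain_list, mx_servers)
  mx_servers_str = mx_servers.map(&:exchange).map(&:to_s).sort.join
  return domain if mx_servers_str == ""

  # object_id is an O(1) identity stand-in for the list's contents;
  # length is a cheap guard against in-place mutation of the same list.
  "#{domain_list.object_id}_#{domain_list.length}_#{mx_servers_str.downcase}"
end
```

The design choice: keying on the MX servers alone (as in the CRC32 suggestion) would let two different lists of the same length share cache entries, while hashing the full list contents would cost a pass over 160k entries per call.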
Reviewer: I think the current approach seems reasonable, and we can see what the maintainers think when you open a PR against the open source project.
Reviewer: Btw, object_id is unique only while the object exists. Whenever an object is garbage collected, its id might be assigned to a new object. It doesn't refer to the content of the object.
Author: Yeah, I'm aware of that, and... that's the compromise. In my book, as long as the cache is either valid or invalidated, it's fine. In this case, if the object gets garbage collected, the cache entry implicitly gets invalidated and the service will recalculate the query the next time it arrives (with a new domain_list and its object_id). Yes, it might slow things down a bit every now and then, but it's good enough for our use case and probably most others (bulk import and similar).
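If object_id reuse ever became a real concern, one way to keep the same "valid until GC" semantics without raw ids would be to key a cache by the list object itself through a weak reference. A sketch assuming Ruby 3.3+ (ObjectSpace::WeakKeyMap) with illustrative names, not the PR's code:

```ruby
require "set"

# Assumed names, Ruby 3.3+ sketch. A collected domain_list simply
# drops its cache entry, so a recycled object_id can never alias a
# stale result.
PER_LIST_CACHES = ObjectSpace::WeakKeyMap.new

def mx_cache_for(domain_list)
  PER_LIST_CACHES[domain_list] ||= {}
end

disposable_domains = Set.new(%w[mailinator.com example-disposable.test])
mx_cache_for(disposable_domains)["mx.example.test"] = true
```

The inner Hash write still has the same benign race discussed above.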
…ookup for the same domain
Co-authored-by: Sasha Yelkhovenka <[email protected]>
Force-pushed from 88cbdd7 to 74017a0.
PR to upstream: micke#283