
Cache mx_server_is_in? to speed up disposable_mx_server? #1

Open · wants to merge 6 commits into main
Conversation

@jurajjuricic-st commented Feb 20, 2025

PR to upstream: micke#283

end
end

MX_SERVERS_CACHE[cache_key] = result unless cache_key.nil?

Review comment:

This is not thread-safe, which might lead to multiple threads trying to calculate the value for the same key. Probably not a big deal, though.
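As a rough sketch only (illustrative names, assuming MX_SERVERS_CACHE is a plain Hash constant), the write could be wrapped in a Mutex:

MX_SERVERS_CACHE = {}
MX_SERVERS_CACHE_MUTEX = Mutex.new

def store_mx_result(cache_key, result)
  return result if cache_key.nil?

  # Serialize writes so two threads never race on the same key;
  # the first value stored for a key wins and is returned to both callers.
  MX_SERVERS_CACHE_MUTEX.synchronize do
    MX_SERVERS_CACHE[cache_key] = result unless MX_SERVERS_CACHE.key?(cache_key)
    MX_SERVERS_CACHE[cache_key]
  end
end

Given the point below that identical inputs produce identical results, the lock would mainly avoid redundant work rather than fix a correctness bug.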

Author:

If the inputs are the same (i.e., no edge case where a DNS record changes during execution), the result will be the same, so it doesn't matter if two threads write at the same time... I think 🤔

mx_servers_str = mx_servers.map(&:exchange).map(&:to_s).sort.join
return domain if mx_servers_str == ""

"#{domain_list.object_id}_#{domain_list.length}_#{mx_servers_str.downcase}"

Review comment:

Can we use a hashing function here instead? I might be missing the idea of this cache key function, though.

Zlib.crc32("#{domain_list.length}_#{mx_servers_str.downcase}")
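(Sketch only: Zlib is in the standard library but needs require "zlib", and the helper name here is made up rather than taken from the diff. The trade-off is that CRC32 can collide, so two different MX sets could occasionally share a key.)

require "zlib"

def mx_cache_key(domain_list, mx_servers_str)
  # CRC32 turns the potentially long concatenated MX string into a small integer key.
  Zlib.crc32("#{domain_list.length}_#{mx_servers_str.downcase}")
end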

Author:

In the case of checking against disposable MX servers, domain_list is a set of 160k disposable domains.
It could be something else, though, so I thought we'd capture the object id to avoid misfiring the cache between similar requests, while still not having to iterate over the whole set.

I also thought about some kind of consistent sampling of the domain list, but maybe we don't need to go there?

Review comment:

I think the current approach seems reasonable, and we can see what the maintainers think when you open a PR against the open source project.

Review comment:

Btw, object_id is unique only while the object exists. So whenever an object is garbage collected, its id might be assigned to a new object. It doesn't reflect the content of the object.
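If that ever becomes a problem, one alternative (just a sketch, not part of this PR, assuming domain_list is an enumerable of strings) would be to fingerprint the list contents once when it is loaded and use that instead of object_id:

require "digest"

# Iterates the ~160k entries a single time at load; the digest is then reused
# for every cache key, so per-lookup cost stays low and the key survives GC.
def domain_list_fingerprint(domain_list)
  Digest::SHA256.hexdigest(domain_list.sort.join("\n"))
end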

Author:

Yeah, I'm aware of that, and... that's the compromise. In my book, as long as the cache is either valid or invalidated, it's fine.
In this case, if the object gets garbage collected, the cache entry implicitly gets invalidated and the service will recalculate the query the next time it arrives (with a new domain_list and its object_id). Yes, it might slow things down a bit every now and then, but it's good enough for our use case and probably most others (bulk import and similar).
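For reference, the overall shape of what is being cached looks roughly like this (simplified sketch using stdlib Resolv and a naive suffix match; the gem's actual method names and matching logic differ):

require "resolv"

MX_SERVERS_CACHE = {}

def cached_mx_server_is_in?(domain, domain_list)
  # DNS lookup for the domain's MX records.
  mx_servers = Resolv::DNS.open { |dns| dns.getresources(domain, Resolv::DNS::Resource::IN::MX) }
  mx_servers_str = mx_servers.map(&:exchange).map(&:to_s).sort.join
  return false if mx_servers_str.empty?

  cache_key = "#{domain_list.object_id}_#{domain_list.length}_#{mx_servers_str.downcase}"
  return MX_SERVERS_CACHE[cache_key] if MX_SERVERS_CACHE.key?(cache_key)

  # The expensive part being cached: scanning the disposable-domain set per MX host.
  result = mx_servers.any? do |mx|
    host = mx.exchange.to_s.downcase
    domain_list.any? { |candidate| host == candidate || host.end_with?(".#{candidate}") }
  end

  MX_SERVERS_CACHE[cache_key] = result
end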
