Merge pull request #93 from maxmind/horgh/email-normalize
Add additional email normalization
ugexe authored Feb 5, 2024
2 parents e9d72f1 + 86cc6bf commit 308d8fe
Showing 3 changed files with 259 additions and 6 deletions.
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,33 @@
# Changelog

## v2.5.0

* Equivalent domain names are now normalized when `hash_address` is used.
For example, `googlemail.com` will become `gmail.com`.
* Periods are now removed from `gmail.com` email address local parts when
`hash_address` is used. For example, `[email protected]` will become
`[email protected]`.
* Fastmail alias subdomain email addresses are now normalized when
`hash_address` is used. For example, `[email protected]` will
become `[email protected]`.
* Aliases are now removed from the local part of email addresses at
additional Yahoo domains, such as `ymail.com`, when `hash_address` is
used. For example, `[email protected]` will become `[email protected]`.
* Duplicate `.com`s are now removed from email domain names when
`hash_address` is used. For example, `example.com.com` will become
`example.com`.
* Extraneous characters after `.com` are now removed from email domain
names when `hash_address` is used. For example, `example.comfoo` will
become `example.com`.
* Certain `.com` typos are now normalized to `.com` when `hash_address` is
used. For example, `example.cam` will become `example.com`.
* Additional `gmail.com` domain names with leading digits are now
normalized when `hash_address` is used. For example, `100gmail.com` will
become `gmail.com`.
* Additional `gmail.com` typos are now normalized when `hash_address` is
used. For example, `gmali.com` will become `gmail.com`.

## v2.4.0 (2024-01-12)

* Ruby 2.7+ is now required. If you're using Ruby 2.5 or 2.6, please use
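The local-part rules in this release (alias stripping and Gmail period removal) can be illustrated in isolation. This is a minimal sketch, not the gem's code: `normalize_local_part` and `YAHOO_LIKE` are hypothetical names, and `YAHOO_LIKE` is a three-entry stand-in for the much longer `YAHOO_DOMAINS` table. The two regular expressions are the ones used in `clean_email_address`.

```ruby
require 'set'

# Hypothetical three-entry stand-in for YAHOO_DOMAINS (the real table is much longer).
YAHOO_LIKE = Set['yahoo.com', 'ymail.com', 'yahoo.co.uk'].freeze

# Sketch of the local-part normalization applied before hashing.
def normalize_local_part(local_part, domain)
  local = local_part.dup
  if YAHOO_LIKE.include?(domain)
    # Yahoo-style aliases use '-': keep only the text before the first hyphen.
    local.sub!(/\A([^-]+)-.*\z/, '\1')
  else
    # '+' sub-addressing: keep only the text before the first plus sign.
    local.sub!(/\A([^+]+)\+.*\z/, '\1')
  end
  # Gmail ignores periods in the local part, so remove them all.
  local.gsub!('.', '') if domain == 'gmail.com'
  local
end

puts normalize_local_part('foo-bar', 'ymail.com')   # foo
puts normalize_local_part('foo+bar', 'example.com') # foo
puts normalize_local_part('f.o.o+x', 'gmail.com')   # foo
```

Alias stripping runs before period removal, so a Gmail address carrying both a `+` tag and periods is fully normalized. A hyphen is only treated as an alias separator on Yahoo domains; elsewhere it is kept.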
225 changes: 221 additions & 4 deletions lib/minfraud/components/email.rb
@@ -90,29 +90,237 @@ def clean_email_address(address)

domain = clean_domain(domain)

if YAHOO_DOMAINS.key?(domain)
  local_part.sub!(/\A([^-]+)-.*\z/, '\1')
else
  local_part.sub!(/\A([^+]+)\+.*\z/, '\1')
end

if domain == 'gmail.com'
  local_part.gsub!('.', '')
end

domain_parts = domain.split('.')
if domain_parts.length > 2
  possible_domain = domain_parts[1..].join('.')
  if FASTMAIL_DOMAINS.key?(possible_domain)
    domain = possible_domain
    if local_part != ''
      local_part = domain_parts[0]
    end
  end
end

"#{local_part}@#{domain}"
end
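The Fastmail subdomain rule in `clean_email_address` only inspects the domain with its first label removed. A standalone sketch of that step, assuming a hypothetical `collapse_fastmail_subdomain` helper and a three-entry `FASTMAIL_LIKE` subset in place of `FASTMAIL_DOMAINS`:

```ruby
require 'set'

# Hypothetical subset of FASTMAIL_DOMAINS, for illustration only.
FASTMAIL_LIKE = Set['fastmail.com', 'fastmail.fm', 'sent.com'].freeze

# Sketch: collapse "anything@user.fastmail.com" to "user@fastmail.com".
# Returns the (possibly rewritten) local part and domain as a pair.
def collapse_fastmail_subdomain(local_part, domain)
  labels = domain.split('.')
  return [local_part, domain] unless labels.length > 2

  # Drop only the first label and test what remains.
  candidate = labels[1..].join('.')
  return [local_part, domain] unless FASTMAIL_LIKE.include?(candidate)

  # The subdomain label names the mailbox; an empty local part stays empty.
  new_local = local_part.empty? ? local_part : labels[0]
  [new_local, candidate]
end

p collapse_fastmail_subdomain('alias', 'user.fastmail.com') # ["user", "fastmail.com"]
p collapse_fastmail_subdomain('alias', 'a.b.fastmail.com')  # ["alias", "a.b.fastmail.com"]
```

Because only the first label is dropped, deeper subdomains such as `a.b.fastmail.com` are left unchanged, mirroring the single `domain_parts[1..]` check in the method.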

TYPO_DOMAINS = {
# gmail.com
'35gmai.com' => 'gmail.com',
'636gmail.com' => 'gmail.com',
'gmai.com' => 'gmail.com',
'gamil.com' => 'gmail.com',
'gmail.comu' => 'gmail.com',
'gmali.com' => 'gmail.com',
'gmial.com' => 'gmail.com',
'gmil.com' => 'gmail.com',
'gmaill.com' => 'gmail.com',
'gmailm.com' => 'gmail.com',
'gmailo.com' => 'gmail.com',
'gmailyhoo.com' => 'gmail.com',
'yahoogmail.com' => 'gmail.com',
# outlook.com
'putlook.com' => 'outlook.com',
}.freeze
private_constant :TYPO_DOMAINS

EQUIVALENT_DOMAINS = {
'googlemail.com' => 'gmail.com',
'pm.me' => 'protonmail.com',
'proton.me' => 'protonmail.com',
'yandex.by' => 'yandex.ru',
'yandex.com' => 'yandex.ru',
'yandex.kz' => 'yandex.ru',
'yandex.ua' => 'yandex.ru',
'ya.ru' => 'yandex.ru',
}.freeze
private_constant :EQUIVALENT_DOMAINS

FASTMAIL_DOMAINS = {
'123mail.org' => true,
'150mail.com' => true,
'150ml.com' => true,
'16mail.com' => true,
'2-mail.com' => true,
'4email.net' => true,
'50mail.com' => true,
'airpost.net' => true,
'allmail.net' => true,
'bestmail.us' => true,
'cluemail.com' => true,
'elitemail.org' => true,
'emailcorner.net' => true,
'emailengine.net' => true,
'emailengine.org' => true,
'emailgroups.net' => true,
'emailplus.org' => true,
'emailuser.net' => true,
'eml.cc' => true,
'f-m.fm' => true,
'fast-email.com' => true,
'fast-mail.org' => true,
'fastem.com' => true,
'fastemail.us' => true,
'fastemailer.com' => true,
'fastest.cc' => true,
'fastimap.com' => true,
'fastmail.cn' => true,
'fastmail.co.uk' => true,
'fastmail.com' => true,
'fastmail.com.au' => true,
'fastmail.de' => true,
'fastmail.es' => true,
'fastmail.fm' => true,
'fastmail.fr' => true,
'fastmail.im' => true,
'fastmail.in' => true,
'fastmail.jp' => true,
'fastmail.mx' => true,
'fastmail.net' => true,
'fastmail.nl' => true,
'fastmail.org' => true,
'fastmail.se' => true,
'fastmail.to' => true,
'fastmail.tw' => true,
'fastmail.uk' => true,
'fastmail.us' => true,
'fastmailbox.net' => true,
'fastmessaging.com' => true,
'fea.st' => true,
'fmail.co.uk' => true,
'fmailbox.com' => true,
'fmgirl.com' => true,
'fmguy.com' => true,
'ftml.net' => true,
'h-mail.us' => true,
'hailmail.net' => true,
'imap-mail.com' => true,
'imap.cc' => true,
'imapmail.org' => true,
'inoutbox.com' => true,
'internet-e-mail.com' => true,
'internet-mail.org' => true,
'internetemails.net' => true,
'internetmailing.net' => true,
'jetemail.net' => true,
'justemail.net' => true,
'letterboxes.org' => true,
'mail-central.com' => true,
'mail-page.com' => true,
'mailandftp.com' => true,
'mailas.com' => true,
'mailbolt.com' => true,
'mailc.net' => true,
'mailcan.com' => true,
'mailforce.net' => true,
'mailftp.com' => true,
'mailhaven.com' => true,
'mailingaddress.org' => true,
'mailite.com' => true,
'mailmight.com' => true,
'mailnew.com' => true,
'mailsent.net' => true,
'mailservice.ms' => true,
'mailup.net' => true,
'mailworks.org' => true,
'ml1.net' => true,
'mm.st' => true,
'myfastmail.com' => true,
'mymacmail.com' => true,
'nospammail.net' => true,
'ownmail.net' => true,
'petml.com' => true,
'postinbox.com' => true,
'postpro.net' => true,
'proinbox.com' => true,
'promessage.com' => true,
'realemail.net' => true,
'reallyfast.biz' => true,
'reallyfast.info' => true,
'rushpost.com' => true,
'sent.as' => true,
'sent.at' => true,
'sent.com' => true,
'speedpost.net' => true,
'speedymail.org' => true,
'ssl-mail.com' => true,
'swift-mail.com' => true,
'the-fastest.net' => true,
'the-quickest.com' => true,
'theinternetemail.com' => true,
'veryfast.biz' => true,
'veryspeedy.net' => true,
'warpmail.net' => true,
'xsmail.com' => true,
'yepmail.net' => true,
'your-mail.com' => true,
}.freeze
private_constant :FASTMAIL_DOMAINS

YAHOO_DOMAINS = {
'y7mail.com' => true,
'yahoo.at' => true,
'yahoo.be' => true,
'yahoo.bg' => true,
'yahoo.ca' => true,
'yahoo.cl' => true,
'yahoo.co.id' => true,
'yahoo.co.il' => true,
'yahoo.co.in' => true,
'yahoo.co.kr' => true,
'yahoo.co.nz' => true,
'yahoo.co.th' => true,
'yahoo.co.uk' => true,
'yahoo.co.za' => true,
'yahoo.com' => true,
'yahoo.com.ar' => true,
'yahoo.com.au' => true,
'yahoo.com.br' => true,
'yahoo.com.co' => true,
'yahoo.com.hk' => true,
'yahoo.com.hr' => true,
'yahoo.com.mx' => true,
'yahoo.com.my' => true,
'yahoo.com.pe' => true,
'yahoo.com.ph' => true,
'yahoo.com.sg' => true,
'yahoo.com.tr' => true,
'yahoo.com.tw' => true,
'yahoo.com.ua' => true,
'yahoo.com.ve' => true,
'yahoo.com.vn' => true,
'yahoo.cz' => true,
'yahoo.de' => true,
'yahoo.dk' => true,
'yahoo.ee' => true,
'yahoo.es' => true,
'yahoo.fi' => true,
'yahoo.fr' => true,
'yahoo.gr' => true,
'yahoo.hu' => true,
'yahoo.ie' => true,
'yahoo.in' => true,
'yahoo.it' => true,
'yahoo.lt' => true,
'yahoo.lv' => true,
'yahoo.nl' => true,
'yahoo.no' => true,
'yahoo.pl' => true,
'yahoo.pt' => true,
'yahoo.ro' => true,
'yahoo.se' => true,
'yahoo.sk' => true,
'ymail.com' => true,
}.freeze
private_constant :YAHOO_DOMAINS

def clean_domain(domain)
domain = domain.strip

@@ -121,10 +329,19 @@ def clean_domain(domain)

domain = SimpleIDN.to_ascii(domain)

domain.sub!(/(?:\.com){2,}$/, '.com')
domain.sub!(/\.com[^.]+$/, '.com')
domain.sub!(/(?:\.(?:com|c[a-z]{1,2}m|co[ln]|[dsvx]o[mn]|))$/, '.com')
domain.sub!(/^\d+(?:gmail?\.com)$/, 'gmail.com')

if TYPO_DOMAINS.key?(domain)
  domain = TYPO_DOMAINS[domain]
end

if EQUIVALENT_DOMAINS.key?(domain)
  domain = EQUIVALENT_DOMAINS[domain]
end

domain
end
end
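The four new substitutions in `clean_domain` run in a fixed order before the table lookups. The sketch below replays them on the domain examples from the CHANGELOG. `fix_domain`, `TYPOS`, and `EQUIVALENTS` are hypothetical names and the tables are small subsets; the unchanged lines of `clean_domain` that this diff does not show, and the `SimpleIDN.to_ascii` conversion, are omitted, so it only handles ASCII input. It uses non-bang `sub` so every step can be assigned even when nothing matches (`sub!` returns `nil` in that case).

```ruby
# Hypothetical subsets of TYPO_DOMAINS and EQUIVALENT_DOMAINS.
TYPOS = { 'gmali.com' => 'gmail.com', 'gamil.com' => 'gmail.com' }.freeze
EQUIVALENTS = { 'googlemail.com' => 'gmail.com', 'ya.ru' => 'yandex.ru' }.freeze

# Sketch of the new domain clean-up steps (ASCII input only; no IDN conversion).
def fix_domain(domain)
  d = domain.strip
  # Collapse repeated ".com" suffixes: example.com.com -> example.com
  d = d.sub(/(?:\.com){2,}$/, '.com')
  # Drop characters glued onto ".com": example.comfoo -> example.com
  d = d.sub(/\.com[^.]+$/, '.com')
  # Map ".com" typos such as ".cam" or ".con" to ".com"
  # (the empty alternative also turns a trailing "." into ".com")
  d = d.sub(/(?:\.(?:com|c[a-z]{1,2}m|co[ln]|[dsvx]o[mn]|))$/, '.com')
  # Strip leading digits from gmail/gmai domains: 100gmail.com -> gmail.com
  d = d.sub(/^\d+(?:gmail?\.com)$/, 'gmail.com')
  # Table lookups: typos first, then equivalent domains
  d = TYPOS.fetch(d, d)
  EQUIVALENTS.fetch(d, d)
end

%w[example.com.com example.comfoo example.cam 100gmail.com gmali.com googlemail.com].each do |input|
  puts "#{input} -> #{fix_domain(input)}"
end
```

One consequence of the `\.com[^.]+$` step worth keeping in mind: it also rewrites domains whose last label merely starts with `com`, so `example.community` becomes `example.com`.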
12 changes: 10 additions & 2 deletions spec/components/email_spec.rb
@@ -63,14 +63,22 @@
{ input: ' [email protected]', output: '[email protected]' },
{
  input: '[email protected]|abc124472372',
  output: '[email protected]',
},
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: 'gamilcom@gmail.com' },
{ input: 'Test+alias@bücher.com', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
{ input: '[email protected]', output: '[email protected]' },
]

tests.each do |i|
