Allow parsing with HTML5
Nokogiri is on the path to parsing with HTML5 by default:
sparklemotion/nokogiri#2331

But there are some things they still need to do. For those of us who
want to opt in to HTML5 parsing, I've added an option for it. This
prevents the gem from altering the structure of the HTML (specifically,
prematurely closing <a> tags that wrap table elements).
jesseduffield committed Jul 15, 2024
1 parent 83ba8a1 commit 065bd0e
Showing 6 changed files with 49 additions and 2 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -324,6 +324,16 @@ Get stats for a campaign
AhoyEmail.stats("my-campaign")
```

## HTML5 Parsing

By default, this gem uses Nokogiri's HTML 4 parser to rewrite href attributes for the `utm_params` and `track_clicks` features. This can cause link tags to be prematurely closed if they were wrapping table elements, because doing so violates the HTML 4 spec.
To use HTML5 parsing instead, set this in an initializer:
```ruby
AhoyEmail.html5 = true
```
## History
View the [changelog](https://github.com/ankane/ahoy_email/blob/master/CHANGELOG.md)
4 changes: 3 additions & 1 deletion lib/ahoy_email.rb
@@ -24,7 +24,7 @@
require_relative "ahoy_email/engine" if defined?(Rails)

module AhoyEmail
- mattr_accessor :secret_token, :default_options, :subscribers, :invalid_redirect_url, :track_method, :api, :preserve_callbacks, :save_token
+ mattr_accessor :secret_token, :default_options, :subscribers, :invalid_redirect_url, :track_method, :api, :preserve_callbacks, :save_token, :html5
mattr_writer :message_model

self.api = false
@@ -79,6 +79,8 @@ module AhoyEmail

self.save_token = false

self.html5 = false

self.subscribers = []

self.preserve_callbacks = []
10 changes: 9 additions & 1 deletion lib/ahoy_email/processor.rb
@@ -53,7 +53,7 @@ def track_links
if html_part?
part = message.html_part || message

- doc = Nokogiri::HTML::Document.parse(part.body.raw_source)
+ doc = parse_message(part.body.raw_source)
doc.css("a[href]").each do |link|
uri = parse_uri(link["href"])
next unless trackable?(uri)
@@ -92,6 +92,14 @@ def track_links
end
end

def parse_message(raw_source)
  if AhoyEmail.html5
    Nokogiri::HTML5.parse(raw_source)
  else
    Nokogiri::HTML::Document.parse(raw_source)
  end
end

def html_part?
(message.html_part || message).content_type =~ /html/
end
4 changes: 4 additions & 0 deletions test/internal/app/mailers/utm_params_mailer.rb
@@ -20,6 +20,10 @@ def nested
mail_html('<a href="https://example.org"><img src="image.png"></a>')
end

def nested_table
mail_html('<a href="https://example.org"><table></table></a>')
end

def multiple
mail_html('<a href="https://example.org">Test</a>')
end
6 changes: 6 additions & 0 deletions test/test_helper.rb
@@ -85,6 +85,12 @@ def with_save_token
yield
end
end

def with_html5
  AhoyEmail.stub(:html5, true) do
    yield
  end
end
end

class ActionDispatch::IntegrationTest
17 changes: 17 additions & 0 deletions test/utm_params_test.rb
@@ -30,6 +30,23 @@ def test_nested
assert_body '<img src="image.png"></a>', message
end

# When nokogiri parses with html5, it allows an <a> tag to wrap a <table> tag
def test_nested_table_html5
  with_html5 do
    message = UtmParamsMailer.nested_table.deliver_now
    assert_body "utm_medium=email", message
    assert_body '<table></table></a>', message
  end
end

# When nokogiri parses with html4, it disallows an <a> tag to wrap a <table> tag,
# and closes the <a> tag before the <table> tag
def test_nested_table_html4
  message = UtmParamsMailer.nested_table.deliver_now
  assert_body "utm_medium=email", message
  assert_body '</a><table></table>', message
end

def test_multiple
message = UtmParamsMailer.multiple.deliver_now
assert_body "utm_campaign=second", message