epub: fix fatal errors while parsing EPUB files #1854

Open
wants to merge 3 commits into
base: main
29 changes: 27 additions & 2 deletions lib/ex_doc/formatter/epub.ex
@@ -50,6 +50,23 @@ defmodule ExDoc.Formatter.EPUB do
Path.relative_to_cwd(epub)
end

@doc """
Helper that replaces anchor names and links that could cause problems in EPUB documents.

This helper replaces every `&` with `&amp;` in anchors like
`Kernel.xhtml#&&/2` or `<h2 id="&&&/2-examples" class="section-heading">...</h2>`

These anchor names cause a fatal error while EPUB readers parse the files,
resulting in truncated content.

For more details, see: https://github.com/elixir-lang/ex_doc/issues/1851
"""
def fix_anchors(content) do
content
|> String.replace(~r{id="&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
|> String.replace(~r{href="[^#"]*#&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
Comment on lines +66 to +67
Member Author

I frowned a little at these nested String.replace calls, so please let me know if you have any advice on how to improve this function.

Member

@wojtekmach I thought we had already escaped those when generating the links. Maybe this is something (or an option) we can pass when autolinking? The id we can fix by escaping in the document itself.

Member Author

@wojtekmach @josevalim I know it has been a while, but is there any action that's expected on my end? How do we move on/resume this discussion?
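
For reference, one possible way to avoid the nested String.replace calls raised earlier in this thread is a single pass that captures the run of `&` characters and rebuilds only that run. This is a minimal sketch, not the PR's implementation: the module name `AnchorSketch` is hypothetical, and the behavior differs slightly from the pipeline in the diff, since it only escapes the `&` run immediately before `/<arity>` and leaves any other `&` in the attribute value untouched.

```elixir
defmodule AnchorSketch do
  # Hypothetical module, not part of ExDoc.
  #
  # Group 1: the attribute prefix, either `id="` or `href="...#`.
  # Group 2: the run of raw `&` characters.
  # The lookahead requires `/` plus a digit, so already-escaped anchors such
  # as `#&amp;&amp;/2` do not match and are left as they are.
  def fix_anchors(content) do
    regex = ~r{(id="|href="[^#"]*#)(&+)(?=/\d)}

    Regex.replace(regex, content, fn _match, prefix, amps ->
      # Emit one `&amp;` per raw `&` in the captured run.
      prefix <> String.duplicate("&amp;", byte_size(amps))
    end)
  end
end

IO.puts(AnchorSketch.fix_anchors(~S|<h2 id="&&&/2-examples" class="section-heading">title</h2>|))
# prints: <h2 id="&amp;&amp;&amp;/2-examples" class="section-heading">title</h2>
```

Whether this is preferable to escaping at autolink time, as suggested above, is a separate question; the sketch only addresses the shape of the post-processing step.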

end

defp normalize_config(config) do
output =
config.output
@@ -63,7 +80,11 @@ defmodule ExDoc.Formatter.EPUB do
for {_title, extras} <- config.extras do
Enum.each(extras, fn %{id: id, title: title, title_content: title_content, content: content} ->
output = "#{config.output}/OEBPS/#{id}.xhtml"
html = Templates.extra_template(config, title, title_content, content)

html =
config
|> Templates.extra_template(title, title_content, content)
|> fix_anchors()

if File.regular?(output) do
ExDoc.Utils.warn("file #{Path.relative_to_cwd(output)} already exists", [])
@@ -157,7 +178,11 @@ defmodule ExDoc.Formatter.EPUB do
end

defp generate_module_page(module_node, config) do
content = Templates.module_page(config, module_node)
content =
config
|> Templates.module_page(module_node)
|> fix_anchors()

File.write("#{config.output}/OEBPS/#{module_node.id}.xhtml", content)
end

18 changes: 18 additions & 0 deletions test/ex_doc/formatter/epub_test.exs
@@ -151,6 +151,9 @@ defmodule ExDoc.Formatter.EPUBTest do
assert content =~
~r{<a href="TypesAndSpecs.Sub.xhtml"><code(\sclass="inline")?>TypesAndSpecs.Sub</code></a>}

assert content =~
~r{<a href="https://hexdocs.pm/elixir/Kernel.html#&amp;&amp;/2"><code(\sclass="inline")?>&amp;&amp;/2</code></a>}

content = File.read!(tmp_dir <> "/epub/OEBPS/nav.xhtml")
assert content =~ ~r{<li><a href="readme.xhtml">README</a></li>}
end
@@ -248,4 +251,19 @@ defmodule ExDoc.Formatter.EPUBTest do
after
File.rm_rf!("test/tmp/epub_assets")
end

describe "fix_anchors/1" do
test "adapts anchor names to avoid parsing errors from EPUB readers" do
for {source, expected} <- [
{~S|<a href="Kernel.SpecialForms.xhtml#&/1">its documentation</a>|,
~S|<a href="Kernel.SpecialForms.xhtml#&amp;/1">its documentation</a>|},
{~S|<a href="Kernel.xhtml#&&/2"><code class="inline">&amp;&amp;/2</code></a>|,
~S|<a href="Kernel.xhtml#&amp;&amp;/2"><code class="inline">&amp;&amp;/2</code></a>|},
{~S|<h2 id="&&&/2-examples" class="section-heading">title</h2>|,
~S|<h2 id="&amp;&amp;&amp;/2-examples" class="section-heading">title</h2>|}
] do
assert ExDoc.Formatter.EPUB.fix_anchors(source) == expected
end
end
end
end
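
One property the cases above do not exercise is idempotence: content that has already been escaped should not be escaped a second time. The sketch below is only an illustration of how that could be checked. It copies the two replacements from this PR into a hypothetical `FixAnchorsSketch` module so that it runs without ExDoc; the `idempotent?/1` helper is likewise an assumption and not part of the proposed change.

```elixir
defmodule FixAnchorsSketch do
  # Copy of the two replacements proposed in this PR, reproduced under a
  # hypothetical module name so the sketch runs on its own.
  def fix_anchors(content) do
    content
    |> String.replace(~r{id="&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
    |> String.replace(~r{href="[^#"]*#&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
  end

  # Returns true when a second pass leaves the output of the first pass unchanged.
  def idempotent?(source) do
    once = fix_anchors(source)
    fix_anchors(once) == once
  end
end

sources = [
  ~S|<a href="Kernel.SpecialForms.xhtml#&/1">its documentation</a>|,
  ~S|<a href="Kernel.xhtml#&&/2"><code class="inline">&amp;&amp;/2</code></a>|,
  ~S|<h2 id="&&&/2-examples" class="section-heading">title</h2>|
]

# Raise if any of the inputs from the test above gets escaped twice.
for source <- sources do
  if not FixAnchorsSketch.idempotent?(source) do
    raise("double-escaped: #{source}")
  end
end
```

The second pass is a no-op for these inputs because both regexes require the `&` run to be followed directly by `/` and a digit, which `&amp;` is not.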
4 changes: 4 additions & 0 deletions test/fixtures/README.md
@@ -15,3 +15,7 @@ hello
## more > than

<p><strong>raw content</strong></p>

The following text includes a reference to an anchor that causes problems in EPUB documents.

To remove this anti-pattern, we can replace `&&/2`, `||/2`, and `!/1` by `and/2`, `or/2`, and `not/1` respectively.
Member Author

Added this line to demonstrate that we're transforming the links to problematic anchors in EPUB files.
