HTML image handling #27

Closed
eneroth opened this issue Aug 15, 2024 · 1 comment

eneroth commented Aug 15, 2024

Possibly related to #7, or maybe not.

I'm trying to deal with the fact that images are sometimes inlined as <img … HTML tags rather than e.g. ![some alt text](some.png "Some title").

So I would like to try converting any <img … tags I encounter into proper AST nodes.

The first thing I'm coming up against is getting HTML emitted at all. When I run the following example, the two imgs are lost along the way.

(def image-test
  "### Images
   <img src=\"https://www.example.com/image1.jpg\" alt=\"High-Efficiency Antenna\">
   <img src=\"https://www.example.com/image2.jpg\" alt=\"5G Network Coverage Map\">")

(md.parser/parse md.parser/empty-doc
  (md/tokenize image-test))

;; =>
{:footnotes []
 :type      :doc
 :title     "Images"
 :content   [{:type          :heading
              :heading-level 3
              :attrs         {:id "images"}
              :content       [{:type :text
                               :text "Images"}]}]
 :toc       {:type     :toc
             :children [{:type     :toc
                         :children [{:type     :toc
                                     :children [{:type          :toc
                                                 :content       [{:type :text :text "Images"}]
                                                 :heading-level 3
                                                 :attrs         {:id "images"}
                                                 :path          [:content 0]}]}]}]}}

If I look at the tokenization, they are registered as an HTML block:

{:children nil
 :block    true
 :meta     nil
 :content  "   <img src=\"https://www.example.com/image1.jpg\" alt=\"High-Efficiency Antenna\">
            <img src=\"https://www.example.com/image2.jpg\" alt=\"5G Network Coverage Map\">"
 :type     "html_block"
 :markup   ""
 :level    0
 :hidden   false
 :info     ""
 :attrs    nil
 :tag      ""
 :nesting  0
 :map      [1 3]}

So it looks like the html_block token is dropped somewhere between tokenization and parsing. Any advice would be great!

But my scope for now remains to convert HTML images -> AST. If there's a better way to do this than the one I'm heading down, I'd greatly appreciate being steered in another direction as well.
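
To illustrate the HTML images -> AST conversion described above, here is a minimal sketch in plain Clojure. It assumes the :content string of an html_block token is available, and it assumes an image node shaped like {:type :image :attrs {:src …} :content [{:type :text …}]}; that shape is a guess and should be checked against what the parser emits for regular Markdown images. The helper names (parse-attrs, img->node, html-block->images) are hypothetical and not part of the library. The regexes only handle double-quoted attributes and will not cope with a > inside an attribute value.

```clojure
(defn parse-attrs
  "Extracts double-quoted attributes from a single HTML tag into a keyword map.
   Hypothetical helper: regex-based, not a real HTML parser."
  [tag]
  (into {}
        (map (fn [[_ k v]] [(keyword k) v]))
        (re-seq #"([\w-]+)=\"([^\"]*)\"" tag)))

(defn img->node
  "Builds an image-like AST node from one <img …> tag.
   The node shape is an assumption, not confirmed by the library."
  [tag]
  (let [{:keys [src alt title]} (parse-attrs tag)]
    {:type    :image
     :attrs   (cond-> {:src src} title (assoc :title title))
     :content (if alt [{:type :text :text alt}] [])}))

(defn html-block->images
  "Returns one node per <img …> tag found in an html_block's :content string."
  [html]
  (mapv img->node (re-seq #"<img\b[^>]*>" html)))
```

Calling (html-block->images (:content token)) on the html_block token shown above would yield two such nodes, one per <img … tag, which could then be spliced into the document's :content.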

Collaborator

zampino commented Aug 15, 2024

Yes, your problem is exactly #7. Please try the approach described in #7 (comment), and reopen this issue if that doesn't work (or add comments to #7).

@zampino zampino closed this as completed Aug 15, 2024