Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] atom.ui.markdown Refactor #893

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

confused-Techie
Copy link
Member

@confused-Techie confused-Techie commented Jan 26, 2024

After some time has passed with our new Markdown parsing internal API, there's been one major issue and a few smaller ones, which have prompted me to give this some love before it becomes harder to change (due to wide usage)

  • The Syntax Highlighting was unclear, and difficult to work with. Previously to syntax highlight, you had to pass objects around three different functions, with special options to ensure it worked, and was otherwise unclear and painful to work with.
  • The arguments that can be passed didn't make much sense to end users, while they made sense during implementation defaultEmoji doesn't make sense when there's no way to implement a custom emoji handler.
  • The code wasn't as clean as I'd like it to be, especially if we consider we might expand more on it later
  • There is no way to avoid having to do the entire MarkdownIT setup over and over again.

This PR aims to solve all of the above issues.

Converting our previous few functions available for use, to the following:

  • render()
  • buildRenderer()

Now these should be obvious and clear on what they do. The render function obviously renders Markdown content, meanwhile the buildRenderer function is only used to build a renderer instance.

This time around I've put an emphasis on optionality, where ideally nearly all options can be left out and things will still work perfectly fine.

buildRenderer

A completely optional function. This will return a MarkdownIT instance that a package can then store itself, this way it can avoid having to rebuild the markdown instance on every call, if that setup never changes. Such as the notifications package.

render

This function will obviously be in charge of rendering content. It takes three parameters:

  • content: This is the text you want to parse
  • givenOpts: The arguments you want to pass
  • mdInstance: A MarkdownIT instance to use for rendering.

But of course, the mdInstance is optional, if one isn't provided your full givenOpts will be passed to the buildRenderer function to build your renderer, then render it.

As you may have guessed, syntax highlighting is much simpler. To enable it all that you need to do is pass { highlight: true } as one of your arguments, and it will be enabled. Although, it likely won't detect the language properly for syntax highlighting (yet), as the highlight: true will enable syntax highlighting, the Markdown API assumes you are providing a function that can resolve a Fenced Code Block Scope to a Pulsar syntax scope, such as js => source.js, so you could also do this { highlight: scopeResolvingFunc } and syntax highlighting will now work, with your custom resolver.

Of course I would love to include a default resolver into this API, but considering it's a bit beefy (look at the one I made for markdown-preview I'm not sure it should be included unless it's majorly simplified, or otherwise markdown-preview relies on this API in full.)

So ideally this can resolve the above issues mentioned, and we can make sure this API is a bit more long lasting.

Parameters

On another note, you may have noticed some option names have changed, such as syntaxScopeNameFunc => highlight these changes ensure to fallback to the old names in every single case, so that if everything works correctly, a consumer wouldn't have to update anything to continue using this API after this PR is merged (unless of course they had previously used the Syntax Highlighting)

Options

Last thing of important note, I've changed some defaults to the options. Notably I've changed the following options: (Where the first value is what it was previously, and the second value is what it is now)

  • transformImageLinks: true => false
  • transformNonFqdnLinks: true => false

The reason I've changed these is that manipulation of the content like this feels like something that should really be opt in behavior when it comes to users local files, as it may not always be needed, especially for a community package implementing this API, I'd rather than changes be obvious.

But you'll notice I've left transformAtomLinks as the default true since that feels more helpful to ensure we are always transforming links that will only link to the GitHub blog about Atom's sunset, this transformation only aims to preserve existing functionality.

@confused-Techie confused-Techie marked this pull request as ready for review January 30, 2024 15:46
@confused-Techie
Copy link
Member Author

This one is now ready for review!

* a Pulsar language grammar.
* @param {object} mdInstance - An optional instance of MarkdownIT. Retreived from
* `atom.ui.markdown.buildRenderer()`.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc doesn't explain the return type, can we add it?

Comment on lines +228 to +231
return async () => {
await Promise.all(promises);
return domHTMLFragment;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really happy with this code - first, because it returns a promise in this case, and not-a-promise on the else branch; but mainly because rendering markdown is now async.

A suggestion - return domHTMLFragment instead, in all cases, and if we're doing highlighting just put a property on domHTMLFragment - like, domHTMLFragment.highlightPromise - so users can "await" for it if they feel it's worth waiting before rendering the code.

} else {
// We aren't preforming any syntax highlighting, so lets return our rendered
// text.
return rendered;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return code is also weird, because it's a text in this case, but a HTML Fragment on the other.

Maybe if we want, we can add a different API to render pure text, but then we can't use the default syntax highlighting, so I don't think it's worth it. Plus, it should be easy to just return pure HTML from the fragment (I guess? :D)

src/ui.js Outdated Show resolved Hide resolved
src/ui.js Outdated Show resolved Hide resolved
src/ui.js Outdated Show resolved Hide resolved
Copy link
Contributor

@mauricioszabo mauricioszabo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes, but mostly the ones addressed to the return type of render I feel quite strongly about it.

On old Atom days, some APIs had this situation where they would return a type in some situation, and a different type in another; I remember it wasn't fun to discover these kinds of things when writing a package, so that's why I feel we should not go to this route :)

confused-Techie and others added 3 commits February 15, 2024 08:11
Co-authored-by: Maurício Szabo <[email protected]>
Co-authored-by: Maurício Szabo <[email protected]>
Co-authored-by: Maurício Szabo <[email protected]>
@confused-Techie
Copy link
Member Author

@mauricioszabo and I were discussing the issues of this PR on Discord, but to open up the discussion to more people I'll post it here in hopes to find a resolution everyone's happy with.

Essentially, Mauricio isn't a fan of the changing return value here, which I can totally understand.
Since as written the return value of this function is html text most of the time, but if syntax highlighting is enabled, then we get a promise returned, which once resolved is an HTML Fragment.

Mauricio last suggested that we always return an HTML fragment, with an attribute that is a promise if syntax highlighting is applied, which could then be awaited on.

Although personally, I feel that returning a fragment all the time is problematic, since the vast majority of use cases for this API utilize the html text already, and I'd hate to rework more code in other packages here.

Myself, I'd vote to returning a document with the following values:

  • text: The raw HTML string
  • html: A document fragment
  • syntax: A promise that could return the html element once resolved

This way all three possibilities are always available, and in the event someone didn't enable syntax highlighting then the syntax promise would just never resolve.

But please, add any other opinions here if available

@mauricioszabo
Copy link
Contributor

@confused-Techie I think it's a good compromise - I don't believe returning the text and the fragment would be too costly :)

One change I suggest is that if syntax highlighting is not enabled, the promise is just resolved immediately - so that one doesn't need to check for "if syntaxEnabled, wait for promise" or something like that.

@confused-Techie
Copy link
Member Author

@mauricioszabo Great idea on resolving the promise right away, should the promise still return the html fragment without syntax highlighting?

@savetheclocktower
Copy link
Contributor

Given the following assumptions:

  • rendering markdown can always be done synchronously
  • performing syntax highlighting must always be done asynchronously

I can't help but think they should be separate methods.

That said, if we're OK with the method always being async, then that's an easy way of solving it. If we want to cater to a simpler use case for people who just want to render markdown synchronously, then we could expose a separate method that only does the markdown parsing.

@confused-Techie
Copy link
Member Author

@savetheclocktower Your assumptions are absolutely correct, and that's totally valid that exposing a separate function may be the best solution.

I was hoping to move away from multiple functions as the initial implementation was awkward, but I suppose having just two of render and highlight wouldn't be too unwieldy, and resolves the conversation of proper returns completely.

@mauricioszabo What do you think? Seem like a viable path forward? Since this may be the simplest answer

@savetheclocktower
Copy link
Contributor

To be clear, I'm fine with one method doing everything as long as it's always async. I don't have a problem with consuming a method that goes async even when it could technically perform the task I want in synchronous fashion. If people wanted a sync version, then we could add another method, but I'm not specifically asking for that.

@mauricioszabo
Copy link
Contributor

mauricioszabo commented Mar 1, 2024

So my also 2 cents - while syntax highlighting is always async, the actual HTML fragment return is sync - it's just that after the element is present, the UI is updated with the highlighting (I know that because that's what I'm doing with the old API).

So, I suggest the method to always be sync, actually, and the syntax is just a promise that resolves when the highlighting ends. Look at the code:

    return async () => {
      await Promise.all(promises);
      return domHTMLFragment;
    };

We actually don't do anything with the promises, so my suggestion is to return something like:

return {
  syntax: Promise.all(promises).then(_ => true),
  html: domHTMLFragment
}

And that's it - if promises is an empty array, it'll "resolve" immediately so things still work.

EDIT: explaining my reasoning, a consumer of this API could decide to render the element right away or choose to wait for the promises before rendering the element, and both will work just fine.

@savetheclocktower
Copy link
Contributor

I'm revisiting this PR because I'm working on a blog post with lots of code examples in it, and markdown-preview is getting slower and slower to re-render at a rate that increases with each new code block I add. This issue is present no matter which parser markdown-preview uses, but it's painful and it needs to be fixed in a way that I don't think the atom.ui implementation can deliver.

Whenever we call render on some Markdown with the appropriate options, we are asking Pulsar to highlight any code blocks it finds. For each code block, that process involves

  • creating an instance of TextEditor,
  • making it read-only and doing some other funky DOM stuff to it, and
  • adding it in place of the pre element that was ordinarily there.

In the case of markdown-preview, this happens whenever the buffer stops changing. We're wise enough not to try to re-render with each keystroke, choosing instead to wait until something like 200ms have elapsed with no new changes, but that doesn't help a lot. In my case, once I got to around 30 code blocks, the editor started freezing for about a second with each re-render.

I dug into it. Not surprisingly, the bulk of the cost revolves around the TextEditor stuff. Markdown is fast to re-render — and even if it weren't, it could be moved off the main thread into a web worker — but creating TextEditors must be done on the main thread.

We are very cavalier about throwing the old results away and creating new ones. Amazingly, though, the bulk of the cost is paid in style recalculations; this call to isVisible can freeze the main thread for as much as half a second at a time because reading offsetWidth or offsetHeight forces the browser to immediately apply a bunch of DOM operations it was saving for later.

There's probably an optimization to be made there. But even within the status quo, this cost wouldn't have to be paid if we could re-use TextEditor instances, and especially if we didn't have to move them around, since that's what triggers their connectedCallbacks and their costly are-we-visible-or-not logic. To make this possible, we should stop doing the wasteful thing we're doing — replacing the entire contents of the element each time the Markdown source changes. We seriously set textContent to an empty string and then dump in a new document fragment every time. We need to minimize the cost we pay to create TextEditor instances, and the first step is to keep content on the page longer.

I did a proof-of-concept with a library called morphdom that was quite successful. It does a virtual-DOM-style operation without the virtual DOM; it compares the existing content to the desired content (in either string or node form) and applies the minimum number of changes to the existing content in order to get them to match. It wasn't hard to configure it to ignore atom-text-editor elements during this process, leaving me with a pretty good flow for re-renders:

  • Call renderMarkdown and get a document fragment.
  • Clone the existing container node (without cloning its children) and append the fragment as a child of that node. We do not try to apply syntax highlighting yet; this means we keep the pre elements around for now.
  • Call morphdom(existingContainer, newContainer, options); this changes the existing container in increments so that its contents are identical to those of newContainer. (The options argument skips consideration of atom-text-editor and does some other stuff.)
  • After this is done, we can do a highlighting sweep. Any pre element that doesn't have a corresponding atom-text-editor gets one created. We pay the startup cost there, but we then cache the TextEditor instance by using the pre element as the key; since the pre nodes will be preserved wherever possible between renders, this works great for preventing excessive TextEditor creation. If the contents of a pre changed, we change the text in the TextEditor; same for if the language changes. But if a given change to the source didn't affect any code blocks at all, then nothing needs to change about the editors, either.
  • Instead of removing the pre elements — which would thwart my plan to use them as cache keys — I simply have a style rule that hides them: atom-text-editor + pre { display: none; }. Since the atom-text-editor is inserted just above the pre, this works as if by magic.
  • Any “orphaned” editors (from, say, a code block getting destroyed) get cleaned up and removed from the DOM.

This is what I spent a couple hours on today just because it bugged the hell out of me, and it got markdown-preview back to a usable state for this one particular Markdown file. This approach works great with the original markdown-preview parser, but I can't port it over to the atom.ui approach because it doesn't understand a lot of markdown-preview's concerns here, and I'm not even sure it ought to. A better idea would be for markdown-preview to handle syntax highlighting itself and not ask atom.ui to do it, since we have no good way to make atom.ui prevent the wasteful creation and destruction of TextEditor instances.

What I'll do instead is opt out of syntax highlighting from atom.ui and manage it within markdown-preview the same way for both Markdown parsers.


This is my extremely long way of saying that code highlighting should not be part of the atom.ui library — particularly if our approach to syntax highlighting is to delegate the task to a TextEditor. It's a footgun. I know we thought that the problem was the weird API, but I think the problem is that we're trying to do something really weird in the first place.

If we offer syntax highlighting as a feature, we need to do it in a much less costly way than this, and within an API that doesn't create and then dispose of so many different TextEditor instances. Let's take it out of the API sooner rather than later — before anyone is really relying on it. The simple API can go in atom.ui and then the advanced version with syntax highlighting can be provided by markdown-preview as a service once we figure out how to do it better.

@mauricioszabo
Copy link
Contributor

Well, too late - I'm already relying on it 😱

But that's fine, I can revert-ish my work.

@mauricioszabo
Copy link
Contributor

So, my 2~ish cents on this:

I think a Markdown API with Syntax Highlighting is a good idea - maybe even necessary - for our ui API. Usually, people won't do the same thing as Markdown Preview - they will highlight a single thing, and that's it.

BUT - I think it's also a good idea for us to not reinvent the wheel - like, if we do markdown + Syntax Highlight in a core plug-in, and offer the same thing as an API, that API should be sufficient for the core plug-in. So I vote on reverting this for now, and then thinking about a way to make an API that allows for incremental rendering, if that's possible, and use it on Markdown Preview.

So, @savetheclocktower, if you can share the Markdown you were writing, or at least make an example, maybe we can discuss a way to make it work? What do you think?

Also, thanks for finding this out! Better sooner than later :)

@savetheclocktower
Copy link
Contributor

So, @savetheclocktower, if you can share the Markdown you were writing, or at least make an example, maybe we can discuss a way to make it work? What do you think?

Funny you should mention! We can discuss it in #984.

Copy link
Contributor

@savetheclocktower savetheclocktower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm giving this the review that I should've given it back in February, and which I should've given the original implementation much earlier.

This is probably going to come across as Scrooge-like, and for that I apologize. I assure you I'm only trying to be tough, not mean. Toughness is important to me because these APIs will be present for a long time and will be very hard to change once we've properly documented them.

We are asking this API to do a lot. I wonder if we're letting our own internal use cases drive the design of this public API.

Here are the places where we're using atom.ui.markdown so far:

  • In autocomplete-plus to render API descriptions
  • In deprecation-cop to render deprecation messages
  • In settings-view to render package READMEs and rich setting descriptions
  • in markdown-preview to do… Markdown previewing

I think the first two usages (and the rich setting descriptions of settings-view) can use an extremely simple Markdown API. We're letting the other use cases — the ones for rendering entire documents, along with image resolution and syntax highlighting — complicate that simple API.


My proposal is to build an API that satisfies the simple cases and make that available at atom.ui.markdown.render. If a community package needs things that can't be satisfied by that API, then it'd need to import its own Markdown renderer, and I can see how we'd want to avoid that; but I think it's also likely that we'd fail to envision that hypothetical package's needs even if we shipped a kitchen-sink implementation.

If we need a thorough API for our internal use cases – or even if we just want to expose a reference to markdown-it and cut down on duplication — we could deliver that as a service of a new builtin package. The service route is how atom-ide solved this — and though that's not usually a phrase that inspires confidence, I think they got it right in this case. With a service you get versioning, decoupling, and modularity, whereas anything we ship as part of atom.ui is something we'll have to support for the long haul.


The syntax highlighting thing is its own headache, but I think that should also be a service exposed by a new markdown-renderer package. (Or we could expose it as part of markdown-preview; it's the same to me.)

The pain in the ass that syntax highlighting currently is is largely an artifact of us trying to put a heavy text editor component on the page when all we actually want is tokenization and styling. We should experiment with the other highlighting strategy, the one we use in the fragment use case: render some code in a TextEditor, transplant the insides into an ordinary pre, then hijack all the CSS that styles code and make sure it applies to the pre as well. I think we could probably keep the TextEditors offscreen in that case, and maybe even retain a reference to one (or a pool with one TextEditor per language).

I know I've just written something much more elaborate than that for markdown-preview, but I imagine the median use case is much simpler here — akin to syntax highlighting for data tips and method signatures. Most use cases won't have to deal with markdown-preview's challenge — the constant re-rendering of content as the input changes incrementally. It's better to expose a simpler API for those cases that doesn't require the consumers to care about the things that markdown-preview has to care about.


@confused-Techie, I wouldn't blame you for feeling like you're being jerked around here. I've had plenty of opportunity to scrutinize these APIs before now and you wouldn't necessarily have guessed that my opinions would be this strong. At the time this was first introduced, I can only guess that I was working on too many other things and had to do brain triage.

Anyway, here's a summary of my argument:

  • atom.ui should contain simple things with broad use cases; we can ship a version of atom.ui.markdown.render that fits that description
  • If/when we figure out how to do syntax highlighting simply and quickly, we can ship that in atom.ui (though it probably will have to be async and thus wouldn't be part of atom.ui.markdown.render, but we'll figure that out when we get there)
  • Complicated things, or things that give raw access to things, should be exposed via services so that we can more easily iterate, evolve, and deprecate APIs
  • I'm happy to put together a proof-of-concept for what I'm talking about, though it would be lower on my priority list than some of our other projects (like documentation)

Comment on lines +41 to +44
* @param {boolean} givenOpts.sanitizeAllowUnknownProtocols - Whether DOMPurify's
* `ALLOW_UNKNOWN_PROTOCOLS` should be enabled.
* @param {boolean} givenOpts.sanitizeAllowSelfClose - Whether DOMPurify's
* `ALLOW_SELF_CLOSE_IN_ATTR` should be enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These both feel very obscure. What's the use case for allowing these to be customized?

Comment on lines +45 to +46
* @param {string} givenOpts.renderMode - Determines how the page is returned.
* `full` or `fragment` applies only when Syntax Highlighting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of this option. I understand the difference in behavior it produces, but not how that's at all related to “full” versus “fragment.”

} else {
// No instance was provided, lets make our own
// We will pass all values that we were given onto the `buildRenderer` func
md = buildRenderer(givenOpts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird to me because

  • buildRenderer accepts more options than render does, meaning that you can pass some options into render that will have undocumented effects;
  • also it seems like providing an mdInstance option will override some other options you might specify. Would be great to avoid that confusion, but if we can't, we should at least make that clearer to the user.

// We now could return this text as ready to go, but lets check if we can
// apply any syntax highlighting
if (opts.highlight) {
// Checking above for truthy should match for if it's a function or true boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on record as saying that I think all of this should be extracted and moved to a service that markdown-preview provides; it would solve about half of what makes this function so complex.

Comment on lines +257 to +262
* @param {boolean} givenOpts.transformImageLinks - If it should attempt to
* resolve links to images.
* @param {boolean} givenOpts.transformNonFqdnLinks - If non fully qualified
* domain name links should be resolved.
* @param {boolean} givenOpts.transformAtomLinks - If links to Atom pages should
* resolved to the Pulsar equivalent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you envision any plausible scenario in which someone might want to specify a value for any of these three options that's different from the other two?

I think we could probably get away with not making this configurable at all… but failing that, I think we could turn them into one single transformUrls option.

Comment on lines 263 to +266
* @param {string} givenOpts.rootDomain - The root URL of the online resource.
* Useful when attempting to resolve any links on the page. Only works for online
* resources.
* @param {string} givenOpts.filePath - The local alternative to `rootDomain`.
* Used to resolve incomplete paths, but locally on the file system.
* @param {string} givenOpts.disabledMode - The level of disabling of markdown features.
* `none` by default. But supports: "none", "strict"
* @returns {string} Parsed HTML content.
* Used when resolving links.
* @param {string} givenOpts.filePath - The path to the local resource.
* Used when resolving links.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just have rootDomain be a string. If people really need to specify both rootDomain and filePath, we can make rootDomain optionally be a function that provides a string and expects a string in return.

Right now we do this clever thing where we try to resolve something to disk, then rewrite it to a remote URL if we think the location on disk doesn't exist. If that's something a consumer needs, they can do that fs.lstatSync call in their rootDomain handler.

@confused-Techie
Copy link
Member Author

@savetheclocktower I appreciate such a thorough review on this one.

I do want to quickly shift some of the blame of the syntax highlighting to say that the issues experienced do likely already exist in markdown-preview since we have copied their methodology for syntax highlighting to this new API with hardly any change. But due to this API not being very optimized it doesn't surprise me to see it exacerbate the weak points and make usage of the feature even worse.

But you do make a great case towards prioritizing services, rather than core API's for big important features like this, and that's something I can totally get behind if we do think it's the best way forward, so if we think so it's something I'd be happy to take a look at if you haven't already beat me too it.

In the meantime, as for the work on this PR, it sounds like it may be best to go ahead and close this one out, in favour of a different PR that simplifies or even removes it, in favour of a service. I know removal might be worrisome, but considering I think Mauricio is the only one to actually rely on it, I can't imagine it'd have much impact.

Thanks again for such a thorough review

@savetheclocktower
Copy link
Contributor

I do want to quickly shift some of the blame of the syntax highlighting to say that the issues experienced do likely already exist in markdown-preview since we have copied their methodology for syntax highlighting to this new API with hardly any change.

Yeah, I certainly do wonder why they made some of the choices they did. I changed some of them in #984 and I think it's an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants