Proposal: explicit syntax for custom tags #240
Djot was beyond Markdown, keeping its legacy:
This proposal opens Pandora's box:
And I wonder what a LaTeX renderer (say, or SILE) would then do. Have to support I am afraid the problems supposedly solved might be worse. Or did I miss something? |
There's no special handling of tags. For example, the SILE renderer would do exactly what SILE XML Flavor would do, namely, interpret the document as
This might or might not produce a valid SILE document, depending on which custom SILE commands the user has defined. Stated positively, the user gains access to all their pre-existing custom SILE commands without having to define custom Djot renderers or filters. So, if the user has
defined, they can use
in their djot |
Well I am afraid I have to disagree on everything then...
(I don't think it's the place to discuss the SILE examples, but the SIL language should be completely avoidable, and the user shouldn't need custom commands to do this kind of thing. Styles are a better paradigm with a nicer separation of concerns) |
Still, an additional comment though:
Should the user really want to do this, with markdown.sile they don't need to define custom Djot renderers or filters, indeed. The following works:
And all other things being equal, it works identically whether the input is a Markdown file, a Djot file, or a Pandoc JSON AST.
|
This is an interesting and well thought-out proposal. It does go in a somewhat different direction than I'd originally had in mind, but I see its good points. The original conception was that if you wanted to do something like
and then make use of a filter that replaces this with AST nodes including the raw HTML. This proposal would allow you to do
which is a bit more verbose and relies more on English keywords, but it would work out of the box without filters. The proposed change would be breaking for existing djot documents that used
but maybe that is okay as the language is still in an experimental phase. The proposed change would make the djot AST less compatible with the pandoc AST (which doesn't have a notion of "tag name"), and this would make pandoc interoperability less smooth. In general I don't like to rely on English language keywords. Perhaps one could work around that, though, by introducing the concept of a "tag dictionary" that allows you to define your own aliases for tag names? If we did implement the prefix You are right that allowing a special name for spans restores symmetry with what we now have for divs. However, there's also a question of symmetry with verbatim containers (code spans and code blocks). For example, in LaTeX you might want
to produce a As for syntax, I fear that the tag name in |
Yeah, that's the big thing here! One can view Djot as either:
This proposal pushes us more towards the second interpretation (but note that they are not mutually exclusive: some people may use djot as 1, and some might use it as 2). As you've rightly noticed, everything expressible with this proposal is already possible with custom attributes and classes; the "custom tags" thing basically just formalizes this pattern. And that nicely segues into @jgm's first point! Even under this proposal I would expect people to write
and handle this as a filter by default. The "raw html" mode I think is needed solely as an escape hatch. However, under the new proposal it's syntactically apparent that That's probably what I like aesthetically most here: that we clearly separate the "semantics" attribute from the style ones (including adding the invariant that there's at most one custom tag, but many classes).
I was under the impression that we already don't restrict class names and such to be English, but apparently that's not the case. It feels a bit strange that the following is parsed differently
I would say if we are fine with class names being English, we should be fine with tag-names being English also (but it might be a good idea to include some quoted syntax then just in case, eg
FWIW, this is something that worries me quite a bit. The page https://djot.net doesn't say that Djot is in an experimental phase, and makes it look like it's quite finished. Ideally, we'd be clearer in communicating our stability promise.
Yeah, I think syntactically the salient bits are that:
As for particular syntax, |
I don't think there was any intention to exclude non-English class names! If we do it seems like a bug. The attribute grammar in attributes.ts does say that keywords need to be ascii, but not classes or identifiers. |
See also #197 and #192 where I proposed another use for I'm thus all for storing these "tags" specially in the AST. What worries me is that this proposal seems very HTML-centric for such a "central" syntax feature. I think it is important that djot is output-format agnostic, not favoring any one output format. While I do not yet use djot for real (the lack of a metadata — and other data in the spirit of #192 — syntax which is interoperable with Pandoc is the main show stopper for me) I really like most of the syntax features where djot differs/adds to Markdown, but my typical target format is PDF via LaTeX. If this means "tags" are stored separately in the ast and can be used for anything by parsers, filters and renderers I'm all for. If this means that "tags" become unusable unless you target HTML/XML, or even djot gets tied to those formats I'm actually worried! |
As a data point, someone laments the inability to create HTML/djot sandwiches without writing custom filters: |
I find this very useful as well. Most notably, the |
I would like to mention an additional use case for this feature and strongly support it. There is a constellation of technical documentation tools, the most common of which is Sphinx. One of the reasons it's been so successful is that reStructuredText is an arbitrarily extensible markup language. The community finally found a way to bolt Markdown onto it, but the whole set of tools is very tied up in the Python + docutils ecosystem and feels idiosyncratic. I am working on an alternative and I'd like to use Djot for it, but the lack of a reStructuredText-like extensibility strategy is forcing me to make weird choices. I don't think cracking the door open to "arbitrary XML" will cause people to do bad things unnecessarily, and it could open up a lot of interesting opportunities for improving these types of sophisticated documentation systems.
For example, one thing you can do in Sphinx is define a heading in one document like this:
.. _glossary:
===
Glossary
===
And then refer to it elsewhere without needing to include the path to the document, because the heading itself is captured as a global reference:
In Djot, you could define heading refs the same way using attribute syntax:
And then link to it with What would be needed to get this over the line? I think it would be very valuable for the community to have access to an ergonomic, cross-platform, well-defined, cleanly-implemented, arbitrarily-extensible markup language, and this closes the only gap I can think of. |
The equivalent in djot would be:
Is that worse than your Sphinx example? |
Indeed, it's easier than that. See
|
Not sure this is what @irskep is asking for, reading the linked Sphinx specification -- I read the request as the need for a cross-reference via an identifier while the heading title might be changed independently:
And then one would want some way to use the |
I think I didn't make my point well enough and over-explained one possible use case. I'm fully aware of the ability to define IDs within a document and link to them. I made a mistake by going into so much detail about reference syntax as an example. The goal is not a cross-referencing system, which I think is too specific for a markup language. I'm talking about a syntax that supports sophisticated uses such as Sphinx-style references. Djot shouldn't need to implement all possible use cases for all document systems for all time, but it would be great to allow more complex systems to use it as a component without requiring any new syntax. reStructuredText has a near-monopoly on infinite extensibility in a markup language without resorting to regex-based token transforms, so I'm excited by the prospect of it having some cross-platform competition. |
Another example of how flexible syntax helps is how Sphinx lets you specify HTML metadata. One way to accomplish this in Djot today would be to write a filter that finds
But I think it makes more sense to write it the way this proposal suggests:
And if you wanted to go a step further toward enabling this type of use case, you could allow omitting the square brackets:
Does this make it clearer what I'm talking about? Again, this is just one example of what you can do with flexible syntax. My point is not that Djot should have HTML metadata support. |
I don't think it is. It's something one needs, eventually ;) |
Again, it is an example of a possible use of general extensibility. It is a kind of thing you could do as a user of Djot if this proposal were implemented. There are more kinds of things you could do that are not references or HTML metadata. I am not trying to argue that Djot should support HTML metadata. I'm trying to make the argument that extensibility is valuable. People use markup languages for all sorts of things. I assume that's why the filters feature exists. :-) |
I do rather complicated customizations with classes and other attributes and pandoc filters with both Pandoc's Markdown and djot as input formats and both LaTeX and HTML as target formats. The extensibility is there already even though the elements used are called spans and divs, or code and codeblock in some cases. The important thing is that you can attach attributes to them which filters can "pick up." For example I have a filter which implements list tables, converting lists of lists inside divs with a certain class into tables. |
I agree that extensibility already exists in the form of classes, IDs, and attributes, and it's one of the reasons I find this language so neat. My HTML metadata example above shows how you can already accomplish the use cases I have in mind. My support here is mostly about semantics and ergonomics. You can use classes and attributes as hacks when what you are really trying to do is make custom elements. It works, but it feels better and it's easier to explain if the custom-tag-ness is at the forefront rather than saying "add this thing using a CSS class even though it's not really a CSS class." Maybe a simpler version of this proposal would be to allow tag names to be specified just like CSS classes without the
This would introduce a new error case where the user specifies multiple tags ( I still think it's better to put the custom tag first, because it "feels" better and avoids this error case being possible, but maybe a less invasive alternative is easier for the community to accept. I realize I'm creating a lot of noise today, so I'll try to back off for a while. Overall I don't have an opinion on the fine details of the syntax, just that Djot learns the distinction between a list of CSS classes vs tag identity. I would love to support this effort with implementation and/or documentation help if you decide to accept some form of this proposal. |
I agree, it's worth thinking about this variant of the original proposal as well (though it would preclude #257, so we'd have to be sure we don't want bare words to mean "flag attributes"). This variant has the advantage of being more uniform (it's just a tweak to what attributes can look like). |
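A minimal sketch of the bare-word variant discussed above, assuming (hypothetically) that a token inside the braces without a leading `.` or `#` and without `=` names the tag. The function name `parse_attributes` and the returned dictionary shape are illustrative only and do not come from any djot implementation; quoted values containing spaces are not handled.

```python
from typing import Dict, List, Optional


def parse_attributes(source: str) -> Dict[str, object]:
    """Parse `{tag .class #id key=value}`; a bare word is read as the tag name."""
    inner = source.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attributes must be wrapped in braces")

    tag_name: Optional[str] = None
    classes: List[str] = []
    identifier: Optional[str] = None
    pairs: Dict[str, str] = {}

    for token in inner[1:-1].split():
        if token.startswith("."):
            classes.append(token[1:])
        elif token.startswith("#"):
            identifier = token[1:]
        elif "=" in token:
            key, _, value = token.partition("=")
            pairs[key] = value
        elif tag_name is None:
            tag_name = token
        else:
            # The new error case: at most one tag per element.
            raise ValueError(f"multiple tags: {tag_name!r} and {token!r}")

    return {"tag_name": tag_name, "classes": classes, "id": identifier, "attrs": pairs}
```

With this reading, `{kbd .shortcut}` yields one tag and one class, while `{kbd cite}` is rejected. It also shows the cost mentioned above: bare words can no longer be used for anything else, such as flag attributes.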
Why not |
As for the concern how djot tags (possibly non-English) should map to HTML tags or LaTeX command names I think some kind of tabular mapping would be required. I already have pandoc filters which do this for classes, with the mapping in the metadata. |
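One way to picture such a tabular mapping, as a rough sketch: a table keyed by djot tag name, with one output name per target format. The table contents (`nota`, `keys`, `marginnote`) and the helper `resolve_tag` are made up for illustration and are not taken from any existing filter.

```python
from typing import Dict, Optional

# Hypothetical mapping table: djot tag name -> output name per target format.
# In practice this could be loaded from document metadata or a config file.
TAG_MAP: Dict[str, Dict[str, str]] = {
    "kbd": {"html": "kbd", "latex": "keys"},
    "nota": {"html": "aside", "latex": "marginnote"},  # a non-English tag name
}


def resolve_tag(tag_name: str, target: str, default: Optional[str] = None) -> Optional[str]:
    """Return the output name for `tag_name` in `target`, or `default` if unmapped."""
    return TAG_MAP.get(tag_name, {}).get(target, default)
```

A renderer could then call `resolve_tag("nota", "html", default="span")` and fall back to a plain span when a tag has no entry for its format.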
After pondering this overnight, I think I've gone from "prefer tag first" to "no preference." I can see the value of tag-after-content for spans. For blocks, the attributes are already before the block, so you could just choose to put the tag first.
|
I’d say in this particular case the equivalent djot would use a class, rather than a tag name. This is precisely the intended usage difference:
For TeX, if we pretend we don’t have first class syntax for this feature already, you’d say
but
The difference between the examples is that in the latter case you need to know that it is an equation to correctly infer the meaning of the attributed snippet, while in the former case you can mentally parse it as simple text, and then note that it is in italics. |
To take @matklad's last comment a step further, I assume this would work with inline verbatim as well, and could potentially reduce the need for special-case micro-language syntax such as for math ( If you had never implemented math, then you could use this proposal to do something like this:
I guess yet another option for this idea is to tweak the math syntax and use the dollar signs as markers instead of a colon...
Personally I'd like to use per-span language-specific syntax highlighting this way, but I recognize people probably already think CSS classes are sufficient for that.
vs what you can already do:
|
@matklad I must say that I don’t follow your reasoning why You see for me, and I’m hardly alone, the main reason for using an LML rather than HTML and LaTeX directly is this problem that the many textual tags/commands break the flow of reading so that it becomes basically impossible (for me at least!) to just scan the text and get an idea of what it’s about. Ideally an LML shouldn’t use any textual markup at all, though I readily admit that rigidly following that principle has proven impractical.1 A possible route to alleviate this is by allowing non-ASCII punctuation and symbol characters, and/or combinations of characters — djot markup like I firmly believe that any textual markup in an LML should be as inobtrusive as possible. Placing the attributes after the element has proven to be thus inobtrusive. It becomes like a parenthetical remark, although unfortunately attributes are often more frequent than parenthetical remarks should be! I don’t remember having ever seen it stated by a developer that this was the intention when the attributes were put after the code run in Pandoc Markdown (maybe @jgm remembers) but that has become the IMO welcome effect. Attributes after the opening fence in Pandoc Markdown is likewise nicely inobtrusive. I’m actually somewhat troubled by the superposed block attributes of djot, but I realize that short of requiring a “dummy” fence around paragraphs with attributes this is the best way to distinguish block attributes from inline attributes and my Perl script described in the first note also uses superposed attributes for blocks. 
I also try to use a single (short) class or attribute whenever possible, offloading the actual customization to filters and the configuration to metadata, although I agree with @jgm that metadata should not really be used for filter configuration: in fact I think filter data and filter configuration should be separate namespaces from both metadata and template variables, as well as from each other ( That said I don’t really think a single tag always suffices. My above-mentioned Perl script has a concept of “private attributes” with a
|
I am firmly in favor of this proposal in some form or another. I regularly make use of semantic tags like |
This proposal is a synthesis of #239 and #146 and organized in TL;DR, What? and Why? sections, where the Why? is the most important.
TL;DR
Change djot such that the following input:
produces the following HTML:
What?
Specifically:
- Change the parsing rule for `::: spam` to use `"spam"` for `tag_name`, rather than a class.
- Change the parsing rules for bare `:::` and `[]` to set `tag_name` to `"div"` and `"span"`, respectively.
- Add new concrete syntax `:tag-name[]`, that is, `:(\S+)\[` where `$1`, an arbitrary sequence of non-whitespace symbols, is a `tag_name`, and the rest is the usual span syntax. This concrete syntax produces a `Span` AST with the corresponding `tag_name` set.
- Change the default HTML renderer to use `tag_name` when rendering `span` and `div` elements.
The most invasive change here is the new `:tag-name[]` syntax, as it adds a bit of new syntax to djot and directly enlarges the surface area.
Why?
This single solution fixes several "problems" in the current version of djot, some big and some small. I list them roughly in order of priority:
Problem: users need a lightweight approach for producing custom HTML interspersed with normal djot.
Today, djot provides a `` ``` =HTML `` syntax to embed raw HTML (or any other format). The problem here is that it's all-or-nothing: everything inside `=HTML` needs to be HTML. You can't use that to wrap a part of a djot document into a custom tag:
This is solvable by using a custom filter/renderer, but that's a significant step up in complexity, and might not be available to the user (e.g., forum software using Djot for comments could allow raw HTML (with sanitization), but won't allow custom filters). In a more ad-hoc way, it's possible to split the raw block in two
but that's not quite as pretty as some might want!
With the proposed solution, the above can be written simply as
Naturally, `=HTML` doesn't go away: that's still the right tool for raw HTML, but we now gain a way to add HTML-Djot sandwiches.
Note that while I say `HTML`, this feature applies to any roughly XML-shaped output format. For example, a docbook renderer could use that to emit arbitrary docbook elements, and a LaTeX renderer could emit a pair.
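To make the intended behaviour concrete, here is a small sketch of how a `:tag-name[...]` span could be recognised and rendered to HTML. It is not the djot parser: the pattern below narrows the proposal's `:(\S+)\[` to ASCII-initial names and a non-nested body so that the output stays well-formed, and the function name `render_html` is hypothetical.

```python
import html
import re

# Narrowed version of the proposal's introducer `:(\S+)\[`:
# the tag must start with an ASCII letter, and the body may not contain brackets.
# Both restrictions are assumptions made for this sketch only.
TAGGED_SPAN = re.compile(r":([A-Za-z][\w-]*)\[([^\[\]]*)\]")


def render_html(text: str) -> str:
    """Render `:tag[body]` spans as `<tag>body</tag>`, escaping everything else."""
    out = []
    pos = 0
    for match in TAGGED_SPAN.finditer(text):
        out.append(html.escape(text[pos:match.start()]))  # plain text before the span
        tag, body = match.group(1), match.group(2)
        out.append(f"<{tag}>{html.escape(body)}</{tag}>")
        pos = match.end()
    out.append(html.escape(text[pos:]))  # trailing plain text
    return "".join(out)
```

For example, `render_html("Press :kbd[Ctrl+C] to copy")` produces `Press <kbd>Ctrl+C</kbd> to copy`. A real renderer in a setting such as forum comments would additionally check the tag against an allow-list, which this sketch leaves out.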
Problem: extensibility properties of Djot are not obvious and need better explanation.
The core feature of Djot is that its syntax is fixed, but it is still extensible because the syntax is flexible enough to encode arbitrary attributed trees which can be interpreted specially by renderers. This is a somewhat subtle and non-obvious point, and may not be immediately clear to new users.
With this proposal, Djot gains an explicit first-class syntax for custom elements. We can clearly document that `::: plugin` and `:plugin[]` is how one extends Djot. In terms of expressive power, this is exactly equivalent to `[]{.plugin}` of course, but is easier to explain and search for. Overloading `.class` syntax to mean custom tags/elements is harder to teach.
Problem: it's impossible to express arbitrary HTML in a Djot filter.
Djot has two programmatic extensibility mechanisms:
Filters are generally nicer, they are target-format-independent and composable (you can chain several filters together, because input and output have the same type). However, you can't use a filter to emit an HTML node not already used by a renderer, unless you resort to raw half-nodes, which is ugly, and output-format specific.
With this proposal, filters gain the full power of HTML, while keeping a nice, well-typed tree structure. Fewer things need to be custom renderers, more things can be filters.
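As an illustration of what this could look like, the sketch below models the AST as plain dictionaries with a `tag_name` field, a filter as a tree-to-tree function, and a renderer that simply uses `tag_name`. The node shape and the names `element`, `promote_class`, and `render` are assumptions for this example and do not mirror the real djot AST.

```python
import html
from typing import Any, Dict, Iterable, List

Node = Dict[str, Any]


def text(value: str) -> Node:
    return {"type": "str", "text": value}


def element(tag_name: str, children: Iterable[Node], classes: Iterable[str] = ()) -> Node:
    return {"type": "element", "tag_name": tag_name,
            "classes": list(classes), "children": list(children)}


def promote_class(node: Node, cls: str) -> Node:
    """Filter (tree in, tree out): elements carrying class `cls` get it as tag_name."""
    if node["type"] != "element":
        return node
    children = [promote_class(child, cls) for child in node["children"]]
    classes: List[str] = node["classes"]
    tag_name = node["tag_name"]
    if cls in classes:
        tag_name = cls
        classes = [c for c in classes if c != cls]
    return {**node, "tag_name": tag_name, "classes": classes, "children": children}


def render(node: Node) -> str:
    """Renderer: emits whatever tag_name the tree carries, with no raw half-nodes."""
    if node["type"] == "str":
        return html.escape(node["text"])
    tag = node["tag_name"]
    joined = " ".join(node["classes"])
    attrs = f' class="{joined}"' if joined else ""
    inner = "".join(render(child) for child in node["children"])
    return f"<{tag}{attrs}>{inner}</{tag}>"
```

Because the filter returns the same kind of tree it receives, several of them can be chained, for example `render(promote_class(promote_class(doc, "kbd"), "aside"))`.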
Problem: the `::: spam` syntax is not orthogonal
In today's Djot, the following two are equivalent:
In the following example, both classes are on equal footing semantically, although syntactically one feels like it should be the primary:
The proposal makes the syntaxes orthogonal by adding a new dimension. `::: spam` is no longer a class, it is a tag name.
Problem: when reading custom elements, the existing "introducer last" syntax requires the reader to backtrack.
Consider a custom element in today's djot: `[Ctrl+C]{.kbd}`. Here, the `+` would be interpreted specially by the renderer as a notation for shortcuts. However, if you read this left-to-right, you need to look ahead to `{.kbd}` to get the context for interpreting the `+`.
In the proposal, this looks like `:kbd[Ctrl+C]`: the introducer keyword, `kbd`, is leading, so a one-pass left-to-right visual scan tells you everything.
Problem: smarter editors and IDEs need to know context to provide helpful suggestions.
Let's say you added a custom citation element to Djot, which looks like `[foo, p. 15]{.cite}`. A smart editor should be able to auto-complete `foo` from your references library, but, if you are typing this left-to-right, by the time you get to `[foo]` the IDE doesn't yet know that it's going to be a cite.
With the proposal, as soon as you've typed `:c`, the IDE can suggest auto-completing that to `:cite[]` and then show a completion list for actual citations.
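A toy sketch of that completion flow, looking only at the text to the left of the cursor. The tag list, the citation keys, and the function `complete` are all invented for illustration; a real editor integration would be considerably more involved.

```python
import re
from typing import List

KNOWN_TAGS = ["cite", "kbd", "code"]               # hypothetical registered tags
CITATION_KEYS = ["foo", "foobar2020", "knuth84"]   # hypothetical references library


def complete(before_cursor: str) -> List[str]:
    """Suggest completions using only what has been typed so far."""
    # Inside `:cite[...`, the context is already known: offer citation keys.
    inside_cite = re.search(r":cite\[([^\]\s]*)$", before_cursor)
    if inside_cite:
        prefix = inside_cite.group(1)
        return [key for key in CITATION_KEYS if key.startswith(prefix)]
    # After a bare `:prefix`, offer matching tag introducers.
    introducer = re.search(r":(\S*)$", before_cursor)
    if introducer:
        prefix = introducer.group(1)
        return [f":{tag}[]" for tag in KNOWN_TAGS if tag.startswith(prefix)]
    return []
```

With the introducer-last form `[foo]{.cite}`, the equivalent function would have no way to tell that `foo` is a citation key until the attribute has been typed.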