Introduction

hxml is a simple, convenient way to write HTML and XML. Mainly, it reduces the clutter and burden of closing tags.

Writing HTML directly can be unpleasant, and tools exist to generate it at different levels of abstraction. Ultimately, however, most developers will need to work with HTML source directly to get what they want done.

hxml source compiles to regular HTML or XML, and is compatible with it, but with a few added features.

This syntax was developed as part of a larger ecosystem written in Haskell (called gxml). However, this part was useful on its own, and so I extracted it into one standalone file of Haskell source.

You do not need to know any Haskell to use it. See Installation and use below.

Features

Error checking

hxml checks your HTML for unclosed or wrongly-closed tags. All tags must be closed, or self-closed such as <img src="picture.png"/>. This follows XML/XHTML style.

Generalized tag closing

The design goals for XML regarded verbosity as a non-issue. One example is that tags must be closed with the full name of the initial tag. This can sometimes be an aid in reading source, but often it is not helpful at all. For example, in HTML documents with numerous levels of <div>s, seeing </div> tells you very little. And for short snippets of enclosed text, it just adds work and noise.

hxml allows the name of the closing tag to be omitted, so that

This is a <span class="purple">purple chunk</> of text.

compiles to

This is a <span class="purple">purple chunk</span> of text.

Going the other way, you can supply extra information in the closing tag:

<div id="menubar" class="offset">
   <div>Home</><div>Dashboard</>
</div id="menubar">

hxml checks that any attributes on the closing tag match those on the opening tag, and then removes them from the output. This can catch mismatches that an HTML checker would miss.
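To make the matching rule concrete, here is a minimal sketch of resolving `</>` against a stack of open tags. It is not hxml's actual parser: the `Token` type, `resolve`, and `render` are hypothetical names, and attributes are left out for brevity.

```haskell
-- Hypothetical token type; hxml's real parser is not shown in this README.
data Token
  = Open String   -- <name>
  | Close String  -- </name>, or </> when the name is empty
  | Text String
  deriving (Eq, Show)

-- Resolve anonymous closing tags against a stack of open tag names.
-- Returns Left with a message on a mismatched or unclosed tag.
resolve :: [Token] -> Either String [Token]
resolve = go []
  where
    go [] [] = Right []
    go (open : _) [] = Left ("unclosed tag <" ++ open ++ ">")
    go stack (Open n : ts) = (Open n :) <$> go (n : stack) ts
    go stack (Text s : ts) = (Text s :) <$> go stack ts
    go [] (Close n : _) = Left ("unexpected closing tag </" ++ n ++ ">")
    go (open : rest) (Close n : ts)
      | null n || n == open = (Close open :) <$> go rest ts
      | otherwise = Left ("expected </" ++ open ++ "> but found </" ++ n ++ ">")

-- Turn a resolved token back into markup.
render :: Token -> String
render (Open n) = "<" ++ n ++ ">"
render (Close n) = "</" ++ n ++ ">"
render (Text s) = s
```

In GHCi, `concatMap render <$> resolve [Open "span", Text "purple chunk", Close ""]` gives `Right "<span>purple chunk</span>"`.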

Block tag scope

The syntax <tag:> with a colon will scope the tag over an indented block. For example,

<div:>
   <h1:>This is a header
   <p class="big xyz":>This is a test paragraph with
      multiple lines of
      content.

will become

<div>
<h1>This is a header</h1>
<p class="big xyz">This is a test paragraph with
multiple lines of
content.</p></div>

You can include blank lines within the indented block.
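One way to picture how a block's extent could be found from indentation is the sketch below. It assumes space-only indentation, and `indentOf` and `blockBody` are hypothetical names, not hxml functions. How trailing blank lines are assigned is also my assumption.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Number of leading spaces on a line (tabs are ignored in this sketch).
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

-- Split the lines following a block opener into the block body and the rest.
-- A line belongs to the block if it is blank or indented deeper than the opener.
blockBody :: Int -> [String] -> ([String], [String])
blockBody openerIndent ls = (body, trailing ++ rest)
  where
    (candidate, rest) = span inBlock ls
    inBlock l = all isSpace l || indentOf l > openerIndent
    -- Assumption: blank lines at the very end belong to whatever follows.
    body = dropWhileEnd (all isSpace) candidate
    trailing = drop (length body) candidate
```

The closing tag would then be attached to the end of the last line of `body`.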

Attributes

XML requires that all attributes be quoted. This can be inconvenient and pointless for simple values, so hxml will quote these for you:

<table cellpadding=0 cellspacing=1>

turns into

<table cellpadding="0" cellspacing="1">

Of course, if there are special characters such as spaces or slashes in the attribute value, you'll still need to quote them.

An attribute with no value is expanded to having itself as a value, so that <input disabled> becomes <input disabled="disabled">.

Attributes starting with an underscore (_) are removed. These can be used for documentation and/or matching:

<div _pricelist>
   content
</div _pricelist>

To my knowledge, no application uses attributes that start with an underscore. If one turns up, I may revisit this syntax.
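The three attribute rules above can be sketched as a single normalization step. This is an illustration under assumptions, not hxml's code: the `Attr` type, `normalizeAttr`, and `renderTag` are hypothetical names.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

-- Hypothetical attribute representation: a name and an optional raw value,
-- exactly as written in the source (quotes included if present).
type Attr = (String, Maybe String)

-- Normalize one attribute:
--   * names starting with '_' are dropped,
--   * a bare name becomes name="name",
--   * an unquoted value gets double quotes,
--   * an already-quoted value is left alone.
normalizeAttr :: Attr -> Maybe String
normalizeAttr (name, value)
  | "_" `isPrefixOf` name = Nothing
  | otherwise = Just (name ++ "=" ++ quoted)
  where
    quoted = case value of
      Nothing -> quote name
      Just v
        | isQuoted v -> v
        | otherwise -> quote v
    quote s = "\"" ++ s ++ "\""
    isQuoted s@(c : _ : _) = c `elem` "\"'" && last s == c
    isQuoted _ = False

-- Render an opening tag with single spaces between its parts.
renderTag :: String -> [Attr] -> String
renderTag name attrs = "<" ++ unwords (name : mapMaybe normalizeAttr attrs) ++ ">"
```

For example, `renderTag "input" [("disabled", Nothing)]` produces `<input disabled="disabled">`.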

Short-content tags

A self-closing tag such as <hr/> has no content. We generalize this by allowing short content after the closing slash. That is,

This is a <b/bold> word.

expands to

This is a <b>bold</b> word.

This should only be used for short text with no potential syntactic ambiguity. For anything more, just use the closing tag syntax above:

This is a <b>bold/strong/emphasized</> phrase.
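As a rough illustration of the expansion, the sketch below works on the text between `<` and `>` and assumes a tag with no attributes. `expandShort` is a hypothetical name, and the real parser presumably handles more cases.

```haskell
-- Expand the text between '<' and '>' of a short-content tag.
-- Anything that is not of the form name/content is passed through unchanged.
expandShort :: String -> String
expandShort inner =
  case break (== '/') inner of
    (name, '/' : content)
      | not (null name), not (null content) ->
          "<" ++ name ++ ">" ++ content ++ "</" ++ name ++ ">"
    _ -> "<" ++ inner ++ ">"
```

So `expandShort "b/bold"` gives `<b>bold</b>`, while `expandShort "hr/"` stays `<hr/>`.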

Chomments

Content enclosed with the <#> tag is considered an hxml comment and is removed. I dub these chomments. They pattern like any other tag:

<#>This will be removed.</#>
<#>This uses the simpler closing tag.</>
<#:>
   A potentially longer block
   chomment.
<#/a short chomment>

These are distinct from <!-- comments -->. The latter are preserved by hxml, and should be used for comments intended to be in the final HTML.

Inside a chommented block, tags are not parsed or matched, since the indentation is sufficient to delimit the block.

Encoding

hxml is fairly encoding-agnostic. It should work with UTF-8, ISO-8859-1, and any other 8-bit extension of ASCII. It will not work with UTF-16 and the like.

Fine points

I have tried to address various side and corner cases reasonably, although I may try different approaches in the future.

Empty content

In XML a self-closed tag is equivalent to a tag with no content, but in HTML5 tags are never self-closed, and instead are either inherently contentful or contentless (“void”). Closing a void tag is an error, and a contentful one must have a separate closing tag. Final slashes as in <hr/> may be present but are ignored.

hxml has a list of non-void HTML5 tags, and always expands them to include a distinct closing tag. So, <div/> and <i class=fa/> compile to <div></div> and <i class="fa"></i>.

For other tags, hxml simply preserves the input. So, <br/> stays the same, and <br></br> would not be compressed (but no one should write that). If a new tag is not on hxml's list and needs to be closed, write out <new></> to ensure a closing tag.
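The rule for self-closed tags could be sketched like this. The tag list here is a small made-up subset, since hxml's real list is not reproduced in this README, and `renderSelfClosed` is a hypothetical name.

```haskell
-- A small, hypothetical subset of the non-void tag list.
nonVoidTags :: [String]
nonVoidTags = ["div", "span", "i", "p", "script", "textarea"]

-- Render a self-closed tag from its name and its already-normalized
-- attribute text (empty if there are no attributes).
renderSelfClosed :: String -> String -> String
renderSelfClosed name attrs
  | name `elem` nonVoidTags = open ++ ">" ++ "</" ++ name ++ ">"
  | otherwise = open ++ "/>"
  where
    open = "<" ++ name ++ (if null attrs then "" else ' ' : attrs)
```

A tag missing from the list, such as `br` here, keeps its self-closed form.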

Whitespace

HTML has an odd relationship to whitespace. Sometimes it matters, sometimes it's ignored. hxml has careful rules for handling it.

As illustrated, blocked tags attach the closing tag to the end of the last line of the block. If space is desired before the closing tag, you'll have to fit it in somehow, say with a space at the end of the last line. Chomments can help visually here, such as the empty <#/>. If you require a newline, a line with only the indent can be added. It might be simpler to use </> in some cases.

Controlling space after the opening tag is easier. hxml also provides a mechanism for suppressing the newline immediately after the tag, in case it looks tidier to have the whole block indented the same: simply put a space before the colon. Thus,

<span :>
   One line

becomes <span>One line</span>. This is perhaps a strange choice of syntax, but it has served well enough.

Chomments pattern just like block tags in that a chommented block will turn into a newline. However, there is one exception: a one-line chommented block disappears completely, so that

Foo<#:>chomment
bar.

becomes Foobar.

This is halfway between a bug and a feature. I set out to fix this defect in the parser, but decided this exception may be useful in practice, and so for now I am leaving it.

Error reporting

Error reporting should be pretty good. I initially wrote hxml using Megaparsec, but rewrote it to use Parsec, since it was more standard. I then rewrote it back into Megaparsec. As such, the code may be a bit rough around the edges as of this writing.

Other

Attributes on a close tag must be quoted (or not) the same as on the opening tag. E.g., <i class="foo">text</i class=foo> yields an error.
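A minimal sketch of such a check, assuming attributes are compared by their raw source text and that the closing tag may repeat only some of the opening attributes (as in the menubar example earlier). `RawAttr` and `closeMatches` are hypothetical names.

```haskell
-- Hypothetical representation: attribute name plus its raw value as written,
-- quotes included, so class="foo" and class=foo compare as different.
type RawAttr = (String, Maybe String)

-- Every attribute on the closing tag must appear verbatim on the opening tag.
closeMatches :: [RawAttr] -> [RawAttr] -> Bool
closeMatches openAttrs closeAttrs = all (`elem` openAttrs) closeAttrs
```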

Tags are compacted to minimal spacing. For instance, a tag split across several lines will be compressed to one line, so

<div
   _updated=2017-07-19
   id=main_menu
   class=myclass
   ng-if=shown
   _todo=replace
   >

becomes <div id="main_menu" class="myclass" ng-if="shown">.

Tabs in the source are preserved and are treated as 4 spaces wide when comparing indentation. This value can be set in code as tabWidth. As usual, beware when mixing tabs and spaces.
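For illustration, indentation could be measured along these lines. `tabWidth` is the name mentioned above, but its type and this use of it are assumptions, and `indentWidth` is a hypothetical helper.

```haskell
-- Assumed to be a plain Int constant; the real definition may differ.
tabWidth :: Int
tabWidth = 4

-- Width of a line's leading whitespace, counting each tab as tabWidth spaces.
indentWidth :: String -> Int
indentWidth = sum . map width . takeWhile (`elem` " \t")
  where
    width '\t' = tabWidth
    width _ = 1
```

Under this sketch a line starting with one tab and a line starting with four spaces compare as equally indented, which is where mixing the two gets confusing.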

Installation and use

You will need to install GHC or Stack, and use cabal install or stack install to build a library and executable.

The executable is simple:

hxml < main.hxml > main.html

If your application is written in Haskell, you can also use the parseHxml function directly. As an example, I write Heist templates in hxml.

Feedback is welcome!