Literate files #163

Open
dbaynard opened this issue Dec 18, 2017 · 4 comments

@dbaynard

Hello,

I've been through the language addition guide, and would like some advice on implementing support for literate files (e.g. literate haskell, *.lhs).

In a literate file, the code is demarcated, whereas the comments are not. Traditionally, literate Haskell files used bird tracks: each line of code is prefixed with `>`.

This is a text description, that is stripped by the _literate preprocessor_.

> -- This is a normal comment
> example :: IO ()
> example = print "This is code"

More common now is the markdown style.

This is a text description, that is stripped by the _literate preprocessor_.

```haskell
-- This is a normal comment
example :: IO ()
example = print "This is code"
```

In either case it is the code that is demarcated, not the comments, so it does not appear that tokei can support such files simply by extending the languages.json file.
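
To make the inversion concrete, here is a minimal Rust sketch of how bird-track lines might be classified. It is not tokei's code: the `Counts` struct and `count_bird_tracks` function are hypothetical names, and counting prose as comments is an assumption about how the result would be reported.

```rust
/// Line counts for a literate source file (hypothetical type, not tokei's).
#[derive(Debug, Default, PartialEq)]
struct Counts {
    code: usize,
    comments: usize,
    blanks: usize,
}

/// Classify a bird-track literate Haskell source line by line.
/// Lines starting with `>` are program text; everything else is prose,
/// which this sketch assumes should be counted as a comment.
fn count_bird_tracks(source: &str) -> Counts {
    let mut counts = Counts::default();
    for line in source.lines() {
        match line.strip_prefix('>') {
            Some(rest) => {
                let rest = rest.trim();
                if rest.is_empty() {
                    counts.blanks += 1;
                } else if rest.starts_with("--") {
                    // Simplification: only whole-line `--` comments are
                    // recognised; block comments `{- -}` are ignored.
                    counts.comments += 1;
                } else {
                    counts.code += 1;
                }
            }
            None if line.trim().is_empty() => counts.blanks += 1,
            None => counts.comments += 1,
        }
    }
    counts
}
```

Calling `count_bird_tracks` on the bird-track example above would classify the description and the `-- This is a normal comment` line as comments and the two `example` lines as code.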

How can tokei support such a syntax?

@XAMPPRocky
Owner

@dbaynard Hello, and thank you for this issue. This will need to be added as a separate code path. I think tokei can support this syntax, though I have a couple of questions about literate files. How many other styles are common? Can a *.lhs file have non-Haskell code fences and still be valid to the compiler?

@dbaynard
Author

dbaynard commented Jan 3, 2018

The best resource on literate haskell files is the readme at https://github.com/wenkokke/unlit.

However, this applies beyond haskell.
For example, it may be useful to know how many lines of code are in tutorials written in markdown (or latex).

> How many other types of styles are common?

unlit lists 6 types. In addition to the 'bird' style, there are two markdown styles (triple tildes or triple backticks) and three others:

- LaTeX: `\begin{code}` and `\end{code}`
- Org mode: `#+BEGIN_SRC haskell` and `#+END_SRC`
- Jekyll: `{% highlight haskell %}` and `{% endhighlight %}`

Of these, I've only encountered the bird track, markdown and latex forms in the wild (and the latex form seems to be falling out of favour).

> Can a *.lhs file have non haskell code fences and still be valid to the compiler?

With fences, haskell must be present.
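
As a rough illustration of how the fenced styles listed above could be handled, the following Rust sketch keeps only the lines inside a Haskell fence and treats everything else, including fences for other languages, as prose. The `FENCES` table and `fenced_code_lines` function are hypothetical names, not part of tokei or unlit, and matching delimiters by prefix is a simplifying assumption.

```rust
/// Opening and closing delimiters for the fenced styles (assumed table).
/// A line matches when, after trimming whitespace, it starts with the
/// delimiter.
const FENCES: &[(&str, &str)] = &[
    ("```haskell", "```"),
    ("~~~haskell", "~~~"),
    ("\\begin{code}", "\\end{code}"),
    ("#+BEGIN_SRC haskell", "#+END_SRC"),
    ("{% highlight haskell %}", "{% endhighlight %}"),
];

/// Return the lines that sit inside a Haskell fence, in order.
/// Fences without the `haskell` tag never open a block, so their
/// contents are skipped like any other prose.
fn fenced_code_lines(source: &str) -> Vec<&str> {
    // The closing delimiter we are waiting for, if inside a fence.
    let mut closing: Option<&str> = None;
    let mut code = Vec::new();
    for line in source.lines() {
        let trimmed = line.trim();
        if let Some(close) = closing {
            if trimmed.starts_with(close) {
                closing = None;
            } else {
                code.push(line);
            }
        } else {
            closing = FENCES
                .iter()
                .find(|(open, _)| trimmed.starts_with(*open))
                .map(|&(_, close)| close);
        }
    }
    code
}
```

The extracted lines could then be passed to the ordinary Haskell counting rules, while the remaining lines are counted as prose.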

XAMPPRocky self-assigned this Jun 18, 2018

@benjaminselfridge

@XAMPPRocky Any progress on this? I'd love this feature, and might be up for implementing a patch for it.

@foxyseta

> However, this applies beyond haskell. For example, it may be useful to know how many lines of code are in tutorials written in markdown (or latex).

True! Linguist and go-enry detect "Literate Haskell/Agda/.." as their own language, though:

Non-literate code blocks and listings in typesetting systems, on the other hand, are just detected as code written in the typesetting system.
