-
See commit a04c15a for some interesting background. Long ago, we used to have a separate |
-
I've never consulted the manual in manpage form. I regularly use the PDF or the website because I can use the table of contents to jump to the relevant section. I find it convenient to have most of the documentation I regularly consult downloadable as a single PDF and would not like to see it split up. The Lua filters documentation is the only thing I regularly consult that isn't in that PDF.
-
Agreed, so I think that it would be good to retain a single HTML file as one of the generated artifacts.
-
I would very much welcome a split; the current MANUAL is a bit intimidating and overwhelming. I've also considered re-homing the Lua API docs in a separate file, which might make the Lua filter documentation easier to navigate.
-
I concur in principle for reasons of authoring and actually reading the thing, but when splitting things it would be good to consider ease of search. I frequently word search the man page when I forget a syntax. An option to output it all in one go might be useful. Maybe a shorthand for |
-
Getting down to a more detailed level, the original proposal above was:
I think that putting pandoc-templates in a separate document would make sense.
I'm less sure about pandoc-defaults. There is a very close coupling between pandoc-defaults and the command-line options; when one is changed, the other needs to be changed too, so it might be more practical to keep them together. This also makes it clearer to the user how this coupling works. On the other hand, it's the sort of thing that often gets its own man page.
pandoc-markdown, you might think, could go in its own document. One problem, though, is that this is where many of the extensions are documented. Documentation of the extensions, one might think, ought to be included in the main man page. To make things even more confusing, some of these extensions also affect other formats. For example, many pandoc-markdown extensions also work for commonmark/gfm.
pandoc-latex-support: Much of this is in a section documenting what different settings of variables do in various formats. So, would we remove the material about LaTeX from this but keep the other formats? That seems odd.
pandoc-slides, pandoc-epub: These could be separate pages. (Along with pandoc-org, pandoc-jats, and pandoc-typst, which are already in separate files but not made into man pages.) The documentation for filters, lua-filters, and custom readers and writers could also be made into man pages.
Still unsure on the whole. The biggest gains would come from pulling out pandoc-markdown, but for reasons given above this isn't so straightforward.
So, on this conception we'd have man pages for pandoc-FORMAT for every supported format? These would include documentation for the variable settings affecting the format, the extensions affecting it, and anything else (e.g. the current section on EPUBs). Not a bad idea perhaps, but it would mean that pandoc would have a huge number of man pages. I'm not sure that would be appreciated.
-
Summary
I propose to split Pandoc's `MANUAL.txt` into multiple documents, separating the documentation of the `pandoc` command and its options from the documentation of Pandoc-flavored Markdown and other material related to file formats. This would allow the generation of more concise and topic-specific man pages from the documentation, would establish a pattern for documenting currently-undocumented features of certain readers and writers, and could make documentation maintenance more accessible to contributors.
Current state issues
MANUAL.txt runs to 7500+ lines of Markdown and constitutes a large fraction of Pandoc's documentation all by itself. It covers a great many topics, such as:
`pandoc` CLI
The manual's focus on Pandoc-flavored Markdown as the first-among-equals input format for Pandoc has come to seem out of place to me. Pandoc now parses a great many formats, and a user might well use it primarily with Djot, rST, AsciiDoc, or CommonMark input. Such users are not served by assumptions in the manual that treat Pandoc's Markdown as the default format. Pandoc's Markdown, in turn, is quite featureful, and the "Pandoc's Markdown" section of MANUAL.txt is about a third of the file.
Conversely, for formats other than Pandoc's Markdown, there are format-specific features and missing features that aren't documented anywhere. For a simple example, the HTML reader preserves some semantics by reading the `<var>` tag into a Pandoc `Code` with class `variable`, and the HTML and Texinfo writers understand this convention. (My WIP mdoc reader does this too.) Divs with class `"section"` get treated specially by the HTML writer, and this has caught users out; see #8757. reStructuredText mavens are a good source of evidence here; see @jgm in #10318:
As mentioned, Pandoc already has format-specific documentation contributed by @tarleb for org and JATS. The work I propose to do will endorse and extend this pattern.
Proposed work
I propose we start splitting MANUAL.txt into multiple documents, each suitable for conversion to a manual page and for rendering as an HTML document on the website. The organizational principle would be that MANUAL.txt continue to serve as the `pandoc(1)` man page, documenting the function of each command-line option and argument, but excluding the detailed material on specific input and output formats.
The extracted material would be organized and adapted into new documents under `doc/`, perhaps initially along these lines (not comprehensive):
pandoc-template(5)
pandoc-defaults(5)
pandoc-markdown(7)
pandoc-latex(7)
pandoc-slides(7)
pandoc-epub(7)
Beyond simple copy-pasting, work would be required to make the extracted material function well as standalone documents and integrate them with the website and manpage builds.
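
To make the mechanical part of the extraction concrete, here is a minimal Python sketch of a first pass that groups top-level sections by target document. It is only an illustration under stated assumptions: it assumes the sections to move start at level-1 ATX headings (`# Title`), and the heading names and target file names in `TARGETS` are hypothetical, not an agreed layout. It does not do any of the adaptation work mentioned above.

```python
import re

# Hypothetical mapping from top-level headings to target documents.
# Both the heading names and the file names are illustrative assumptions.
TARGETS = {
    "Pandoc's Markdown": "doc/pandoc-markdown.md",
    "Slide shows": "doc/pandoc-slides.md",
    "EPUBs": "doc/pandoc-epub.md",
}
DEFAULT_TARGET = "MANUAL.txt"  # everything else stays in the main manual

HEADING = re.compile(r"^# (?P<title>.+?)\s*$")
FENCE = re.compile(r"^(```|~~~)")


def split_by_top_level_heading(text):
    """Return a list of (title, body) pairs, one per level-1 ATX heading.

    Text before the first heading is returned under the title None.
    Lines inside fenced code blocks are never treated as headings.
    """
    sections = []
    title, lines, in_fence = None, [], False
    for line in text.splitlines(keepends=True):
        if FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the section collected so far and start a new one.
            if title is not None or lines:
                sections.append((title, "".join(lines)))
            title, lines = match.group("title"), [line]
        else:
            lines.append(line)
    if title is not None or lines:
        sections.append((title, "".join(lines)))
    return sections


def plan_split(text):
    """Group section bodies by the document they would be moved to."""
    plan = {}
    for title, body in split_by_top_level_heading(text):
        target = TARGETS.get(title, DEFAULT_TARGET)
        plan[target] = plan.get(target, "") + body
    return plan
```

Calling `plan_split(open("MANUAL.txt", encoding="utf-8").read())` would return a dictionary from target path to text that can be reviewed before anything is written to disk. Headings that carry attributes or trailing `#` characters would need extra handling, and cross-references between the resulting files are left untouched.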
Possible followups
Plenty of reader and writer formats have no Pandoc-specific documentation. With a better-established pattern for writing and publishing these documents, we can start filling in the gaps.
All the documentation of the Pandoc AST as such is aimed at Haskell developers (through the `pandoc-types` haddock) and Lua filter writers. The documentation of Pandoc's Markdown sort of serves as the documentation of "what's a Pandoc document?" We could write an author-focused document that describes the semantics of Block and Inline AST elements without the coding details. This documentation could also guide developers of new native or custom readers and writers, by providing a format-neutral description of the semantics of each element.
Most extensions only affect a subset of readers and writers. Each extension could be documented in its own individual file, with metadata indicating the supported readers and writers. The `man` and `html` versions of the relevant format documentation could then be scripted to include all relevant extensions. This would let each format document all available extensions without duplication and drift in the source files.
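
The per-extension idea could be scripted roughly as follows. This is a sketch only: the `ExtensionDoc` record, the extension names, and the format lists are all hypothetical placeholders, and in a real build the metadata would presumably be read from each extension's source file rather than written inline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionDoc:
    """One extension's documentation plus the formats it applies to."""
    name: str
    formats: frozenset
    body: str


# Placeholder entries: these extension names and format lists are made up
# for illustration and say nothing about what pandoc actually supports.
EXTENSIONS = [
    ExtensionDoc("ext_beta", frozenset({"markdown"}), "Text describing ext_beta."),
    ExtensionDoc("ext_alpha", frozenset({"markdown", "commonmark"}), "Text describing ext_alpha."),
]


def extensions_for(fmt, docs):
    """Return the extension docs that list `fmt`, sorted by name."""
    return sorted((d for d in docs if fmt in d.formats), key=lambda d: d.name)


def render_extension_section(fmt, docs):
    """Build the Markdown to include in the documentation for `fmt`."""
    return "\n".join(
        f"## Extension: {d.name}\n\n{d.body}\n" for d in extensions_for(fmt, docs)
    )
```

With this shape, each format document is generated from the same source files, so an extension's text is written once and appears in every format that lists it.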