Description
See #544 for the publisher epic that includes this story.
Context
After a call with Jonathan from Manning, it turns out a publisher is more interested in book page count than in word count.
This is odd, in that the expectation is that publishers reformat to fit their own page dimensions, so word count would capture the content better.
Acceptance criteria
- Count words in relevant files in the code and wiki repositories.
- Summarize word counts across the code and wiki repositories combined.
- Skip counting words in files that Git ignores.
- Skip counting words in "boilerplate" files (like the Gradle/Maven wrapper scripts) -- see the sketch after this list for one way these criteria fit together.
- Count book pages (not words) by converting each Markdown page to PDF (see "Tech context", below).
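For illustration, here is a minimal shell sketch of the word-count criteria; it is not the actual etc/count-words-or-pages.sh. `git ls-files` lists only tracked files, so Git-ignored files are skipped for free, and the grep drops boilerplate such as the wrapper scripts. The repository names and the boilerplate pattern are assumptions, and the selection of "relevant" files is simplified.

```sh
# Sketch only -- not the real etc/count-words-or-pages.sh.
# git ls-files lists tracked files only, so Git-ignored files never appear;
# the grep drops "boilerplate" such as the Gradle/Maven wrapper scripts.
total=0
for repo in modern-java-practices modern-java-practices.wiki; do
    words=$(
        cd "$repo" &&
            git ls-files -z \
            | grep -zEv '(^|/)(gradlew|gradlew\.bat|mvnw|mvnw\.cmd)$' \
            | xargs -0 wc -w \
            | awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
    )
    echo "$repo: $words words"
    total=$((total + words))
done
echo "Combined: $total words"
```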
Out of scope
- Do not try to anticipate publisher page dimensions; rely on the default PDF page size from the available tooling.
- Fixing that some Unicode characters do not convert to PDF (see the comments below).
- See additional improvements in Better conversion of Markdown to PDF #585.
Tech context
Validate using the etc/count-words-or-pages.sh script in the wiki repo with the -P flag. This does two things (a rough sketch follows the list below):
- Converts Markdown files to default PDFs (sans images and internal links) -- the files are left in place so you can open and view them
- Shows a page count for each Markdown (page) file -- output on the command line
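For orientation, this is roughly what the -P flow amounts to for each page. The real script may differ; in particular, using pdfinfo (from poppler-utils) to read the page count is my assumption, not necessarily what the script does.

```sh
# Rough sketch of the -P flow, per wiki page; not the real script.
for md in *.md; do
    pdf="${md%.md}.pdf"
    # Default PDF output (the script's output is sans images and internal links);
    # the generated PDF is left in place so you can open it.
    pandoc "$md" -o "$pdf"
    # pdfinfo prints a "Pages: N" line for the generated PDF.
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ { print $2 }')
    echo "$md: $pages pages"
done
```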
Dev setup
My experience was:
- Clone the code and wiki repos under a common parent directory.
- Symlink modern-java-practices.wiki/etc/count-words-or-pages.sh into the parent directory so I don't need to type so much -- roughly the commands sketched below.
- I put more effort into -h (help) than really needed. I hate doing things halfway.
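Concretely, my setup was roughly the following. The clone URLs are assumptions for illustration, so adjust them to wherever the repos actually live.

```sh
# From a common parent directory; URLs are illustrative.
git clone https://github.com/binkley/modern-java-practices.git
git clone https://github.com/binkley/modern-java-practices.wiki.git
# Symlink the script into the parent directory to save typing.
ln -s modern-java-practices.wiki/etc/count-words-or-pages.sh .
./count-words-or-pages.sh -h   # show the help
```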
Converting Markdown to PDF
The pandoc tool is excellent for converting among formats for documentation.
Steps taken (Homebrew should be similar):
$ sudo apt install pandoc
$ sudo apt install pdflatex # The tool that `pandoc` calls to convert to PDF
$ sudo apt install texlive-latex-base
$ sudo apt install texlive-fonts-recommended
$ sudo apt install texlive-fonts-extra
$ sudo apt install texlive-xetex # A more forgiving frontend for Unicode errors
Note that some Unicode characters (such as 🟢) do not convert, but the Markdown still converts to PDF.
This affects 6 files.
The underlying issue is with the toolchain used for conversion: both PDF and Markdown support these characters just fine.
For PDF, pandoc relies on the LaTeX toolchain, and we use XeLaTeX so that Unicode issues do not break the conversion.
We need to play with the fonts that LaTeX uses, and find options that both look good and include the missing characters (the default font is "lmroman10-regular").
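As a starting point for that font experimentation, something like the following works with pandoc and XeLaTeX. The page name is a placeholder, and "DejaVu Sans" is only an example font that may still lack some of the emoji.

```sh
# Override the main font via XeLaTeX; "DejaVu Sans" and SomePage.md are examples only.
pandoc --pdf-engine=xelatex -V mainfont="DejaVu Sans" SomePage.md -o SomePage.pdf
```

Note that the mainfont variable is only honored by the XeLaTeX and LuaLaTeX engines (pdflatex ignores it), which is another reason to prefer --pdf-engine=xelatex here.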