Count default PDF book pages #573

Closed
5 tasks done
binkley opened this issue Jul 11, 2024 · 3 comments

Comments

@binkley
Owner

binkley commented Jul 11, 2024

See #544 for the publisher epic that includes this story.

Context

After a call with Jonathan from Manning, it turns out that a publisher is more interested in book page count than in word count.
This is odd: the expectation is that publishers reformat to fit specific page dimensions, and word count captures that better.

Acceptance criteria

  • Count words in relevant files in the code and wiki repositories.
  • Summarize word counts across the code and wiki repositories combined.
  • Skip counting words in files that Git ignores.
  • Skip counting words in "boilerplate" files (like the Gradle/Maven wrapper scripts).
  • Count book pages (not words) by converting each Markdown page to PDF (see "Tech context", below).
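
The word-count criteria above can be sketched roughly as follows. This is not the actual etc/count-words-or-pages.sh script: the function names `is_boilerplate` and `sum_words` are hypothetical, and the list of boilerplate files is an assumption based on the wrapper scripts mentioned above.

```shell
# Hypothetical helper: succeed when a path looks like build-tool boilerplate.
# The pattern list is an assumption; the real script may skip other files.
is_boilerplate() {
    case "$1" in
        gradlew|gradlew.bat|mvnw|mvnw.cmd) return 0 ;;
        */gradlew|*/gradlew.bat|*/mvnw|*/mvnw.cmd) return 0 ;;
        *) return 1 ;;
    esac
}

# Read file paths from stdin (one per line) and print their combined word count.
# Boilerplate paths and anything that is not a regular file are skipped.
sum_words() {
    total=0
    while IFS= read -r path; do
        is_boilerplate "$path" && continue
        [ -f "$path" ] || continue
        words=$(wc -w < "$path")
        total=$((total + words))
    done
    echo "$total"
}
```

Run inside a repository, `git ls-files | sum_words` would feed it only tracked files, which leaves out whatever Git ignores. Adding the totals from the two repositories gives the combined summary.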

Out of scope

  • Do not try to anticipate publisher page dimensions; rely on default PDF pages from available tooling.
  • Fixing the UNICODE characters that do not convert to PDF (see comments below).
  • See additional improvements in Better conversion of Markdown to PDF #585.

Tech context

Validate using the etc/count-words-or-pages.sh script in the wiki repo with the -P flag. This does two things:

  • Converts Markdown files to default PDF (sans images and internal links). The PDF files are left in place so you can open and view them.
  • Shows a page count for each Markdown (page) file, printed on the command line.
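
As a rough illustration of those two steps (not the script itself), the sketch below converts one Markdown file with pandoc and reads the page count with `pdfinfo`. The function names are hypothetical, and using `pdfinfo` to count pages is an assumption; the issue does not say which tool the script calls.

```shell
# Extract the page count from `pdfinfo` output read on stdin.
pages_from_pdfinfo() {
    awk '/^Pages:/ { print $2 }'
}

# Convert one Markdown file to a PDF next to it, then print "<pages> <file>".
# Assumes pandoc, XeLaTeX, and pdfinfo are installed and on PATH.
count_pages() {
    md=$1
    pdf="${md%.md}.pdf"
    pandoc "$md" -o "$pdf" --pdf-engine=xelatex || return 1
    printf '%s %s\n' "$(pdfinfo "$pdf" | pages_from_pdfinfo)" "$md"
}
```

Calling `count_pages Home.md` (a made-up file name) would leave `Home.pdf` beside the source and print one line with its page count.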

Dev setup

My experience was:

  1. Clone the code and wiki repos under a common parent directory.
  2. Symlink modern-java-practices.wiki/etc/count-words-or-pages.sh into the parent directory so I don't need to type so much.
  3. I put more effort into -h (help) than really needed. I hate doing things half way.
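
The symlink step might look like the sketch below. The function name `link_count_script` is hypothetical; it assumes the wiki repo is already cloned under the parent directory with the name shown.

```shell
# Link the wiki repo's script into the parent directory given as $1.
# The link target is relative, so it resolves from inside the parent directory.
link_count_script() {
    parent=$1
    ln -sf "modern-java-practices.wiki/etc/count-words-or-pages.sh" \
        "$parent/count-words-or-pages.sh"
}
```

After `link_count_script .` in the parent directory, the script can be run there as `./count-words-or-pages.sh -P`.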

Converting Markdown to PDF

The pandoc tool is excellent for converting among formats for documentation.

Steps taken (Homebrew should be similar):

$ sudo apt install pandoc
$ sudo apt install texlive-latex-base  # Provides pdflatex, the tool that `pandoc` calls for converting
$ sudo apt install texlive-fonts-recommended
$ sudo apt install texlive-fonts-extra
$ sudo apt install texlive-xetex  # More forgiving frontend for errors with UNICODE
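
After installing, a quick check that the tools are on the PATH can save a confusing failure later. This is a small sketch with a hypothetical function name, not part of the script.

```shell
# Print each named command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools pandoc pdflatex xelatex` prints nothing when all three are available.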

Note that some UNICODE characters (such as 🟢) do not convert, but the Markdown still converts to PDF.
This affects 6 files.
The underlying issue is with the toolchain used to convert: PDF and Markdown both support these characters just fine.
For PDF output, pandoc relies on the LaTeX toolchain, and we use XeLaTeX to get past issues with UNICODE.
We need to experiment with the fonts that LaTeX uses and find options that both look good and have the missing characters (the default font is "lmroman10-regular").
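
One way to start on the font question is to list which files contain non-ASCII characters and then try a conversion with an explicit font. The sketch below is an assumption about how that could be done, with hypothetical function names; the font passed to `convert_with_font` is up to the caller and may still lack some glyphs.

```shell
# Print the names of the given files that contain a non-ASCII byte
# (or a control character other than whitespace).
# LC_ALL=C makes grep treat every byte above 0x7F as non-printable.
files_with_non_ascii() {
    LC_ALL=C grep -l '[^[:print:][:space:]]' "$@"
}

# Convert $1 (a .md file) with XeLaTeX, using $2 as the main font.
# Assumes pandoc and XeLaTeX are installed and the font is available.
convert_with_font() {
    pandoc "$1" -o "${1%.md}.pdf" --pdf-engine=xelatex -V mainfont="$2"
}
```

A call such as `convert_with_font Home.md "DejaVu Sans"` is only an example. Where fontconfig is installed, `fc-list ':charset=1f7e2' family` should list fonts that cover 🟢.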

@binkley binkley self-assigned this Jul 11, 2024
@binkley binkley moved this to In progress in @binkley's Modern Build Jul 11, 2024
@binkley binkley moved this from In progress to In review in @binkley's Modern Build Jul 11, 2024
@binkley binkley assigned jwlibby and unassigned binkley Jul 11, 2024
@jwlibby jwlibby moved this from In review to In progress in @binkley's Modern Build Jul 15, 2024
@binkley binkley moved this from In progress to In review in @binkley's Modern Build Jul 16, 2024
@binkley binkley moved this from In review to In progress in @binkley's Modern Build Jul 17, 2024
@binkley binkley assigned binkley and unassigned jwlibby Jul 17, 2024
@binkley binkley moved this from In progress to In review in @binkley's Modern Build Jul 17, 2024
@binkley binkley assigned jwlibby and unassigned binkley Jul 17, 2024
@jwlibby
Collaborator

jwlibby commented Jul 18, 2024

Notes on using the script:

For macOS, install:

brew install pandoc
brew install texlive
brew install --cask basictex
sudo tlmgr update --self
sudo tlmgr install comment
brew install xpdf

Also, the path /usr/bin/bash is invalid on Mac. I had to change it to /bin/bash, but did not commit this to the codebase.
See the attached file for the complete log of running with the -P option. There are lots of warnings, and I am not sure how critical they are to the final count:
count.log

@binkley
Owner Author

binkley commented Jul 19, 2024

@jwlibby /bin/bash works on both Mac and Linux. Better might be /usr/bin/env bash to pick up the Bash version in the user's path. I'll update ... pushed.

Drawback: this has security implications if a user's PATH is compromised. However, that concern is out of scope for this project, and the shell community has mixed views on this.
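
To illustrate the shebang choice, here is a minimal sketch of a script header that uses `env`. The function `resolved_bash` is hypothetical and only shows which Bash would be picked up.

```shell
#!/usr/bin/env bash
# With this shebang, `env` runs the first bash found on the caller's PATH,
# rather than a fixed location such as /bin/bash or /usr/bin/bash.

# Print the path of the bash that a PATH lookup resolves to.
resolved_bash() {
    command -v bash
}

echo "env would run: $(resolved_bash)"
```

The same PATH lookup is what makes the security drawback possible: whichever `bash` comes first on the PATH is the one that runs.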

@jwlibby jwlibby moved this from In review to Done in @binkley's Modern Build Jul 19, 2024
@jwlibby jwlibby removed their assignment Jul 19, 2024
@jwlibby jwlibby closed this as completed Jul 19, 2024
@binkley
Owner Author

binkley commented Jul 22, 2024

@binkley Put PDF files in out folder instead of side-by-side with sources.
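
A possible shape for that change is sketched below. The function name `pdf_path_in_out` is hypothetical, and flattening every PDF into a single folder by base name is an assumption, not a decision recorded in this issue.

```shell
# Map a Markdown source path ($2) to a PDF path under an output directory ($1).
# Only the file's base name is kept; source subdirectories are not mirrored.
pdf_path_in_out() {
    out_dir=$1
    md=$2
    base=$(basename "$md" .md)
    printf '%s/%s.pdf\n' "$out_dir" "$base"
}
```

The result could then be handed to pandoc's `-o` option after creating the folder with `mkdir -p`.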

Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants