READ-BDF create FONTDESCRIPTOR and write DISPLAYFONT files #2015

Draft. Wants to merge 6 commits into master.

MattHeffron (Contributor)

Now can create the FONTDESCRIPTOR with all non-empty charsets.
Can write DISPLAYFONTFILE format ("STRIKE") files for the charsets.

Correct handling of ASCENT and DESCENT: they are now kept per FONTDESCRIPTOR, not per CHARSETINFO.
Add ability to use mapping of Unicode charcode to unknown XCCS charcode in the private space.

Create a second FONTDESCRIPTOR for Unicode charcodes that have no XCCS mapping, organized into charset-like groups (8-bit splitting of the charcode) by Unicode encoding value.

Needs more testing.

It still needs a bit more fine-tuning, and it is waiting for prerequisite PRs to be merged.
MattHeffron added the enhancement (New feature or request) label and self-assigned this on Feb 5, 2025.
MattHeffron (Contributor, Author) commented Feb 5, 2025

This seems to be working well. It is doing some extra work to allow checking of mapping of Unicode to XCCS.
Anyone want to "play" with it?

Here is the file I'm testing with (renamed from .BDF to .TXT for GitHub): CU12.TXT
I ran it by first loading: (FILESLOAD (FROM LISPUSERS) READ-BDF) in an Interlisp Exec.
Then in an XCL Exec:

(IN-PACKAGE "BDF")
; Argument to READ-BDF below is path to CU12.TXT (or .BDF if you renamed it), probably as a string
(TIME (PROG1 T (SETQ CU12 (READ-BDF path-to-CU12))))
; The (PROG1 T ...) wrapper is there because the BDF-FONT structure prints a LOT!
; Next generate FONTDESCRIPTORs and the .DISPLAYFONT files for 2 fonts
; Final "path-to-put-files" is the directory in which to put the .DISPLAYFONT files. Each charset gets its own subdirectory (e.g., c0)
(TIME (SETQ CU12-FONTINFO (MULTIPLE-VALUE-LIST (WRITE-BDF-TO-DISPLAYFONT-FILES CU12 path-to-put-files))))

Here's the TIME output for the 2 expressions above:
[image: TIME output]
Note that this actually creates 2 FONTDESCRIPTORs and 2 sets of files.
CU12-FONTINFO is a list of 5 items.

  1. The FONTDESCRIPTOR for the charsets that contain glyphs of successfully mapped codes from Unicode to XCCS. For this BDF test file, the resulting font files are "ClearlyU17-MRR-C#.DISPLAYFONT" in each "c#" folder.
  2. The list of charsets that contain any mapped character codes (the numbers)
  3. The FONTDESCRIPTOR for the glyphs of codes which failed to map from Unicode to XCCS. They are grouped in pseudo-charsets by the high byte of the Unicode encoding, i.e. (LOGAND 255 (LRSH encoding 8)). For this BDF test file, the resulting font files are "CLEARLYU-UNMAPPED17-MRR-C#.DISPLAYFONT" in each "c#" folder.
  4. The list of pseudo-charsets that contain any unmapped Unicode character codes (the numbers)
  5. The list of (Unicode-encoding . GLYPH-structure-for-char) for encodings > FFFF, if any.
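
To make the pseudo-charset grouping in items 3 to 5 concrete, here is a minimal sketch in portable Common Lisp. It is not the READ-BDF code: PSEUDO-CHARSET and GROUP-BY-PSEUDO-CHARSET are hypothetical names, and it works on bare encoding numbers rather than GLYPH structures. The (LDB (BYTE 8 8) ...) form computes the same value as the Interlisp expression (LOGAND 255 (LRSH encoding 8)) quoted above.

```lisp
;; Sketch only: portable Common Lisp, not the READ-BDF implementation.
;; PSEUDO-CHARSET and GROUP-BY-PSEUDO-CHARSET are hypothetical names.

(defun pseudo-charset (encoding)
  "Return the high byte (bits 8-15) of a 16-bit ENCODING."
  (ldb (byte 8 8) encoding))

(defun group-by-pseudo-charset (encodings)
  "Split ENCODINGS into two values:
1. an alist of (pseudo-charset . encodings), sorted by pseudo-charset;
2. the list of encodings above #xFFFF, which fit no 8-bit pseudo-charset."
  (let ((table (make-hash-table))
        (overflow '()))
    (dolist (enc encodings)
      (if (> enc #xFFFF)
          (push enc overflow)
          ;; A missing hash entry reads as NIL, so PUSH starts a new list.
          (push enc (gethash (pseudo-charset enc) table))))
    (let ((groups '()))
      (maphash (lambda (cs encs)
                 ;; REVERSE restores the original input order per group.
                 (push (cons cs (reverse encs)) groups))
               table)
      (values (sort groups #'< :key #'car)
              (nreverse overflow)))))
```

For example, (group-by-pseudo-charset '(#x0041 #x2603 #x2604 #x1F600)) puts #x0041 in pseudo-charset 0, puts #x2603 and #x2604 in pseudo-charset #x26, and returns #x1F600 in the second value because it exceeds #xFFFF.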

I used FONTSAMPLER to generate PDF files of the two "fonts". In the 'BDF' XCL Exec:

(IL:FILESLOAD IL:FONTSAMPLER)
(SETQ CU12-FD (FIRST CU12-FONTINFO))
(SETQ CU12-CSETS (SECOND CU12-FONTINFO))
(SETQ CU12-UM-FD (THIRD CU12-FONTINFO))
(SETQ CU12-UM-CSETS (FOURTH CU12-FONTINFO))
(IL:|FontSample| CU12-FD CU12-CSETS "ClearlyU.pdf")
(IL:|FontSample| CU12-UM-FD CU12-UM-CSETS "ClearlyU-Unmapped.pdf")

Here are the 2 pdf files I got from this:
ClearlyU.pdf
ClearlyU-Unmapped.pdf

It does appear that there are some strange things in the mappings.
E.g., in ClearlyU.pdf charset 164(8) there are many glyphs that aren't in the "MEDLEYDIR/unicode/xerox/Xerox Character Code Standard Version 2.0 1990.pdf" file in charset 164(8).

rmkaplan (Contributor) commented Feb 5, 2025

With respect to
"E.g., in ClearlyU.pdf charset 164(8) there are many glyphs that aren't in the "MEDLEYDIR/unicode/xerox/Xerox Character Code Standard Version 2.0 1990.pdf" file in charset 164(8)."

There were some comments in the Unicode PR that the tables we have for 164 and 165 are goofy. 164, for example, had a bunch of 165 codes in it, which I removed. Some of the remaining entries didn't match what I see in the XCCS document; I left those for another day.

rmkaplan (Contributor) commented Feb 5, 2025

Can you say more about the unmapped codes? These are Unicode codes with glyphs that our tables don't map to XCCS codes?

What actually happens when you encounter one of those? UTOXCODE (without the ?) will fake up an XCCS code, but that will only persist in the current system. If you are making up a permanent XCCS mapping for such unicodes, should we write them out into a mapping file that essentially acts as an extension to XCCS (or MCCS)?

Or is something else going on?

MattHeffron (Contributor, Author) commented

"Can you say more about the unmapped codes? These are Unicode codes with glyphs that our tables don't map to XCCS codes?"

@rmkaplan Correct. These are the codes that don't map to XCCS codes (i.e., UTOXCODE? returned NIL).
In the -UNMAPPED FONTDESCRIPTOR the glyphs are stored at their Unicode codes (from the BDF font). They are not given made-up XCCS codes the way UTOXCODE (without the ?) would do. I understand that this isn't particularly useful, but it is handy for seeing which Unicode codes/glyphs are not mapped to XCCS codes.
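
The mapped/unmapped split described here can be sketched as follows, again in portable Common Lisp. This is an illustration under assumptions, not the PR's code: *DEMO-TABLE*, DEMO-UTOXCODE? and SPLIT-MAPPED-UNMAPPED are hypothetical names, and the table entries are made up. The only behaviour taken from the discussion is that the lookup returns NIL for a code with no mapping.

```lisp
;; Sketch only. *DEMO-TABLE* and DEMO-UTOXCODE? are hypothetical stand-ins;
;; the real UTOXCODE? consults Medley's Unicode mapping tables.

(defparameter *demo-table*
  (let ((table (make-hash-table)))
    ;; Made-up entries for illustration, not real Unicode-to-XCCS mappings.
    (setf (gethash #x0041 table) #x0041
          (gethash #x00E9 table) #x1234)
    table))

(defun demo-utoxcode? (unicode)
  "Return the mapped code for UNICODE, or NIL if the table has no entry."
  (values (gethash unicode *demo-table*)))

(defun split-mapped-unmapped (unicodes)
  "Return two values: an alist of (unicode . mapped-code) for mapped codes,
and the list of Unicode codes with no mapping, left at their own values."
  (let ((mapped '())
        (unmapped '()))
    (dolist (u unicodes)
      (let ((x (demo-utoxcode? u)))
        (if x
            (push (cons u x) mapped)
            (push u unmapped))))
    (values (nreverse mapped) (nreverse unmapped))))
```

With the made-up table, (split-mapped-unmapped '(#x41 #x2603 #xE9)) returns the pairs for #x41 and #xE9 as the first value and the list containing only #x2603 as the second.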

MattHeffron (Contributor, Author) commented

Added an option to create and write files for a RAW FONTDESCRIPTOR, which does NO mapping from Unicode to XCCS.

All glyphs are at the Unicode encoding positions.
Any glyphs with Unicode encoding > xFFFF are not included in the FONTDESCRIPTOR or DISPLAYFONT files.

This should be useful for seeing all glyphs in a BDF font (using FONTSAMPLER; see also PR #2018), and for analyzing and extending the UNICODE mapping tables.

(Other than needing documentation, this seems ready to be made not DRAFT. Other people testing it would be nice!)
