Finalized doc does not conform to PDF 1.7 standards #1548

Open
fuzailgilani opened this issue Aug 28, 2024 · 5 comments

Comments

@fuzailgilani

Bug Report

Description of the problem

The PDF that is generated using PDFKit can be opened in most PDF readers (e.g. Preview, browsers, etc.), but when we try to open it in Adobe Acrobat, it complains that the file is corrupted, giving the error code 135, which indicates that the file does not conform to PDF 1.7 standards. I ran one of the files through an online tool to validate the standard and it gave the following results:

Compliance: pdf1.7
Result: Document does not conform to PDF/A.
Details:
Validating file "WithPDFKitv15NoCompression.pdf" for conformance level pdf1.7
    The "endobj" keyword is missing.
    The key OutputConditionIdentifier is required but missing.
    The value of the key Info must not be of type name.
    The key Info is required but missing.
    The key DestOutputProfile is required but missing.
    The embedded ICC profile couldn't be read.
    The document does not conform to the requested standard.
    The file format (header, trailer, objects, xref, streams) is corrupted.
    The document doesn't conform to the PDF reference (missing required entries, wrong value types, etc.).
    The document does not conform to the PDF 1.7 standard.
Done.

We first noticed the issue with PDFKit version 0.13.0, and thought maybe upgrading to the latest version 0.15.0 would fix it, but no luck.

Code sample

We have a couple thousand lines of code for PDF generation as it's pretty central to our application and there's a lot of branching logic, but for now I'll just include how we initialize the document:

    const pdfDoc = new PDFDocument({
      size: 'A4',
      pdfVersion: '1.7',
      bufferPages: true,
      margins: {
        top: MARGIN,
        bottom: MARGIN,
        left: MARGIN,
        right: MARGIN,
      },
      compress: false,
    });

    const bufferPromise = new Promise<Buffer>((resolve) => {
      const buffers = [];

      pdfDoc.on('data', buffers.push.bind(buffers));
      pdfDoc.on('end', () => {
        const pdfData = Buffer.concat(buffers);
        resolve(pdfData);
      });
    });

Your environment

  • pdfkit version: 0.15.0
  • Node version: 20.5.0
  • Operating System: macOS Monterey 12.4
@fuzailgilani
Author

Okay, we've figured out what the issue was. A few weeks back, we went through the entire project and fixed all the linter errors and warnings that the project had. Further down from the code I posted in the snippet above, we set up the ICC profile for the PDF like this:

    // PDF/A standard requires embedded color profile.
    const colorProfile = Buffer.from(SRGB_IEC61966_ICC_PROFILE, 'base64');
    const refColorProfile = doc.ref({
      Length: colorProfile.length,
      N: 3,
    });
    refColorProfile.write(colorProfile);
    refColorProfile.end('');

    const rgbString = 'sRGB IEC61966-2.1';
    const refOutputIntent = doc.ref({
      Type: 'OutputIntent',
      S: 'GTS_PDFA1',
      Info: rgbString,
      OutputConditionIdentifier: rgbString,
      DestOutputProfile: refColorProfile,
    });
    refOutputIntent.end('');

The problem was with the const rgbString. Before we did our linter fixes, that line was originally:

    const rgbString = new String('sRGB IEC61966-2.1');

Our linter flagged it under the no-new-wrappers rule, but apparently the String object was necessary for PDFKit to produce a valid file. That is very odd and should probably be handled better by the library. For now, we've added an eslint-disable comment for that line and reverted it to the new String constructor.
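
A minimal sketch of what the reverted line with the suppression might look like. The helper name buildOutputIntentDict is hypothetical and exists only so the snippet stands alone; the dictionary keys are the ones from the snippet above, and eslint-disable-next-line is ESLint's standard per-line suppression comment.

```javascript
// Hypothetical helper (not a PDFKit API): builds the dictionary that is
// passed to doc.ref() for the output intent.
function buildOutputIntentDict(colorProfileRef) {
  // Deliberately a String object rather than a primitive string,
  // matching the code as it was before the linter cleanup.
  // eslint-disable-next-line no-new-wrappers
  const rgbString = new String('sRGB IEC61966-2.1');

  return {
    Type: 'OutputIntent',
    S: 'GTS_PDFA1',
    Info: rgbString,
    OutputConditionIdentifier: rgbString,
    DestOutputProfile: colorProfileRef,
  };
}
```

The result would then be passed along as doc.ref(buildOutputIntentDict(refColorProfile)), in place of the inline object literal.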

@Deku-nattsu

@fuzailgilani that's the intended behavior by design, see https://github.com/foliojs/pdfkit/blob/master/lib/object.js#L46

@fuzailgilani
Author

@Deku-nattsu The code you linked doesn't indicate to me that this is intended behavior, it just looks like the spot where the edge-case of being passed a primitive string isn't covered.

And if it is intended behavior that it expects a String object instead of a primitive string, then it should throw a type error when you pass in a primitive-type string. It can't possibly be intended behavior to simply create a broken PDF if one of the inputs is the wrong type.
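
Until the library does something like that, one way to fail fast on the caller's side is to check any value that would end up written as a PDF name. A minimal sketch, assuming a hypothetical helper assertPdfNameSafe (not part of PDFKit); the rejected character set reflects the PDF name syntax, where whitespace, delimiters, and '#' cannot appear unescaped.

```javascript
// Hypothetical caller-side guard; not part of PDFKit.
// Whitespace, delimiter characters, and '#' cannot appear unescaped in a PDF name.
const NAME_UNSAFE = /[\s\0()<>\[\]{}\/%#]/;

function assertPdfNameSafe(key, value) {
  const text = String(value);
  if (NAME_UNSAFE.test(text)) {
    throw new TypeError(`${key}: "${text}" cannot be written as a bare PDF name`);
  }
  return value;
}

assertPdfNameSafe('S', 'GTS_PDFA1'); // passes, returns the value unchanged

try {
  assertPdfNameSafe('Info', 'sRGB IEC61966-2.1'); // contains a space
} catch (err) {
  console.log(err.message);
}
```

A check like this would have turned the silent corruption into an error at the point where the dictionary was built.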

@Deku-nattsu

@fuzailgilani The condition for string literals is right above that line lol

In the PDF spec there are two kinds of string-like values that can be used in dictionaries: names and text strings. This library treats string literals as names and String objects as text strings, and the error you got was a result of converting the string literal value to a PDF name instead of a text string.

I understand your frustration, but making it type safe would require the team to define almost every object in the PDF spec, since some keys accept a name while others accept a text string.

In the meantime I suggest keeping the spec around as a reference, checking the type each key expects, and using string literals for names and String objects for text strings.
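
Going by the validator output at the top of the thread ("The value of the key Info must not be of type name") and the comments in the linked lib/object.js, a primitive string ends up as a PDF name and a String object as a PDF text string. A simplified sketch of that branching follows; convertSketch is a hypothetical stand-in, not PDFKit's actual code, and it omits the escaping and encoding the library performs for text strings.

```javascript
// Simplified stand-in for the conversion step; not PDFKit's actual code.
function convertSketch(value) {
  if (typeof value === 'string') {
    // Primitive string: emitted as a PDF name, prefixed with a slash.
    return `/${value}`;
  }
  if (value instanceof String) {
    // String object: emitted as a parenthesized PDF text string.
    // Escaping of parentheses/backslashes and non-ASCII handling are omitted.
    return `(${value})`;
  }
  throw new TypeError('value type not covered by this sketch');
}

console.log(convertSketch('sRGB IEC61966-2.1'));             // /sRGB IEC61966-2.1
console.log(convertSketch(new String('sRGB IEC61966-2.1'))); // (sRGB IEC61966-2.1)
```

The first output shows why the file breaks: a name cannot contain an unescaped space, so a parser reads /sRGB as the value and then hits stray tokens inside the dictionary.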

@liborm85
Collaborator

    // PDF/A standard requires embedded color profile.
    const colorProfile = Buffer.from(SRGB_IEC61966_ICC_PROFILE, 'base64');
    const refColorProfile = doc.ref({
      Length: colorProfile.length,
      N: 3,
    });
    refColorProfile.write(colorProfile);
    refColorProfile.end('');

    const rgbString = 'sRGB IEC61966-2.1';
    const refOutputIntent = doc.ref({
      Type: 'OutputIntent',
      S: 'GTS_PDFA1',
      Info: rgbString,
      OutputConditionIdentifier: rgbString,
      DestOutputProfile: refColorProfile,
    });
    refOutputIntent.end('');

@fuzailgilani I don't really understand what you added there? Because in pdfkit this is:

    const iccProfile = fs.readFileSync(`${__dirname}/data/sRGB_IEC61966_2_1.icc`);
    const colorProfileRef = this.ref({
      Length: iccProfile.length,
      N: 3
    });
    colorProfileRef.write(iccProfile);
    colorProfileRef.end();
    const intentRef = this.ref({
      Type: 'OutputIntent',
      S: 'GTS_PDFA1',
      Info: new String('sRGB IEC61966-2.1'),
      OutputConditionIdentifier: new String('sRGB IEC61966-2.1'),
      DestOutputProfile: colorProfileRef
    });
    intentRef.end();
    this._root.data.OutputIntents = [intentRef];
