Cannot convert more than 200 docx files to pdf on a warm lambda instance #47

Open
dallan-kainos opened this issue Feb 7, 2025 · 0 comments


dallan-kainos commented Feb 7, 2025

I'm seeing this behaviour in my Java project (which runs LibreOffice via ProcessBuilder), but as shown below it is also reproducible in Node.js.

The issue is that on the 201st invocation on a warm Lambda instance, the libreoffice command stops converting docx files to PDF.

The error is: javaldx failed! osl::Thread::create failed
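
To see exactly what LibreOffice prints when the conversion stops working, the execSync call can be wrapped so that the exit status and stderr are captured rather than thrown. This is a minimal sketch and not part of the original reproduction: runCommand is a hypothetical helper name, and it assumes the failing libreoffice run exits with a non-zero status. If it exits 0 without writing a PDF, the missing output file would have to be checked instead.

```javascript
const { execSync } = require('child_process');

// Hypothetical helper (an assumption, not from the original reproduction):
// runs a shell command and returns its exit status and captured output
// instead of throwing on failure.
function runCommand(command) {
    try {
        const stdout = execSync(command, {
            encoding: 'utf8',
            // Capture stderr in the error object rather than inheriting the parent's stderr.
            stdio: ['ignore', 'pipe', 'pipe'],
        });
        // execSync only returns stdout on success, so stderr is left empty here.
        return { ok: true, status: 0, stdout, stderr: '' };
    } catch (err) {
        // execSync throws when the command exits non-zero; the error carries
        // the exit status plus whatever was written to stdout and stderr.
        return {
            ok: false,
            status: err.status,
            stdout: String(err.stdout ?? ''),
            stderr: String(err.stderr ?? ''),
        };
    }
}
```

Inside the loop, the conversion command could be passed to runCommand and the iteration number logged together with result.status and result.stderr, which would show whether iteration 200 is where the javaldx message first appears.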

Here is the minimal code needed to reproduce the issue.

You'll need a docx file named Document.docx at the root of your project (any docx file will do).
Dockerfile:

FROM public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64
COPY index.js package.json Document.docx ./
RUN npm install
CMD [ "index.handler" ]

Here is the index.js handler:

const {execSync} = require('child_process');
const {writeFileSync, readdirSync, readFileSync, unlinkSync} = require('fs');

// to run this locally shell into the docker image and go to /var/task and run the following
// export PDF_ITERATIONS=250
// node -p "require('./index.js').handler()"

module.exports.handler = () => {

    console.log("process started");

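    // Environment variables are strings, so parse the count explicitly.
    const iterations = Number.parseInt(process.env.PDF_ITERATIONS, 10);
    if (!Number.isInteger(iterations) || iterations < 1) {
        throw new Error("PDF_ITERATIONS environment variable must be set to a positive integer");
    }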

    for (let i = 0; i < iterations; i++) {

        const fileName = `document${i}`;
        const docxFileName = fileName + ".docx"
        const pdfFileName = fileName + ".pdf"

        console.log(`converting docx file ${fileName} to pdf file and saving to disk`);
        writeFileSync(`/tmp/${docxFileName}`, readFileSync("Document.docx"));

        let execSyncResult = execSync(`
        cd /tmp
        libreoffice7.4 --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp ./${docxFileName}`
        );

        console.log("Exec sync complete with result: " + execSyncResult);

        unlinkSync(`/tmp/${docxFileName}`);
        unlinkSync(`/tmp/${pdfFileName}`);

    }

    console.log("process finished");
};

Build a Docker image from the above and use it for your Lambda function (be sure to set the PDF_ITERATIONS environment variable and increase the Lambda timeout). Then create a test event in the AWS console and run it with any event payload.

The curious thing is that when I run the image locally, it happily creates as many PDFs as I like. Does AWS put some sort of wrapper around the Docker image that limits the number of threads that can be created inside it?
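
One way to test the thread-limit hypothesis is to log, between iterations, how many processes exist in the container and what limits the kernel reports. The sketch below reads the Linux /proc filesystem. All function names are hypothetical, and whether leftover or zombie processes actually accumulate on Lambda is an assumption to verify, not a confirmed cause. AWS documents a quota of 1,024 execution processes/threads per function, so a count that climbs towards that figure would be suggestive.

```javascript
const { readdirSync, readFileSync } = require('fs');

// Parse the text of /proc/<pid>/limits into { "<limit name>": { soft, hard } }.
function parseLimits(text) {
    const limits = {};
    for (const line of text.split('\n').slice(1)) { // skip the header row
        // Columns are separated by runs of two or more spaces: name, soft, hard, [units].
        const cols = line.trim().split(/\s{2,}/);
        if (cols.length >= 3) {
            limits[cols[0]] = { soft: cols[1], hard: cols[2] };
        }
    }
    return limits;
}

// Extract the one-letter state (e.g. 'S' sleeping, 'Z' zombie) from /proc/<pid>/stat.
// The command name is wrapped in parentheses and may itself contain spaces or
// parentheses, so the state is located after the LAST ')'.
function parseStatState(statText) {
    return statText.slice(statText.lastIndexOf(')') + 2).split(' ')[0];
}

// Hypothetical diagnostic: count processes (not individual threads) and zombies
// visible in /proc, and report the "Max processes" limit of the current process.
function snapshotProcesses() {
    const pids = readdirSync('/proc').filter((name) => /^\d+$/.test(name));
    let zombies = 0;
    for (const pid of pids) {
        try {
            if (parseStatState(readFileSync(`/proc/${pid}/stat`, 'utf8')) === 'Z') {
                zombies++;
            }
        } catch (_) {
            // The process exited between listing /proc and reading its stat file.
        }
    }
    const limits = parseLimits(readFileSync('/proc/self/limits', 'utf8'));
    return { processes: pids.length, zombies, maxProcesses: limits['Max processes'] };
}
```

Calling console.log(JSON.stringify(snapshotProcesses())) after each conversion, both locally and on Lambda, would show whether the two environments differ in process count or in the reported limit.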

Really stumped on this one and any help would be appreciated.
