adding encoding options for pdftotext #469

Enzodtz · 2023-06-27T20:05:45Z

Hi,

I'm trying to use this tool to extract text from a PDF file, but it doesn't seem to support passing the encoding directly to pdftotext.

This causes problems with characters that aren't in the default encoding, such as ã, à, and á: they are saved as �.

To fix this, I added a shell_encoding kwarg that lets the caller choose the encoding used by the shell parser (pdftotext, in this case).

To do that, I also had to refactor the argument-parsing code a little.
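
For illustration only, here is a minimal sketch of the idea, not the actual code in this PR. It assumes the encoding is forwarded to pdftotext through its `-enc` option. The helper names `build_pdftotext_args` and `extract_text` and the `decode_as` parameter are hypothetical; only `shell_encoding` comes from the description above.

```python
import subprocess


def build_pdftotext_args(pdf_path, shell_encoding=None):
    """Build the pdftotext command line (hypothetical helper)."""
    args = ["pdftotext"]
    if shell_encoding is not None:
        # -enc selects the encoding pdftotext uses for its text output.
        args += ["-enc", shell_encoding]
    # Passing "-" as the output file makes pdftotext write to stdout.
    args += [pdf_path, "-"]
    return args


def extract_text(pdf_path, shell_encoding=None, decode_as="utf-8"):
    """Run pdftotext and decode its output.

    Requires pdftotext on PATH. decode_as is an assumed extra parameter:
    it must name the Python codec matching shell_encoding, since
    pdftotext encoding names and Python codec names do not always agree.
    """
    result = subprocess.run(
        build_pdftotext_args(pdf_path, shell_encoding),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode(decode_as)
```

With this sketch, `build_pdftotext_args("doc.pdf", "UTF-8")` returns `["pdftotext", "-enc", "UTF-8", "doc.pdf", "-"]`, and omitting `shell_encoding` leaves pdftotext on its default encoding.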

Thanks.

adding encoding options for pdftotext

d80c971
