How to resolve misplaced text and other visual aspects of a PDF manipulated using redactions and insert_htmlbox. #3906
Replies: 3 comments 4 replies
-
I don't have exhaustive advice, but here are a few individual comments:
-
Thanks a lot for your reply.
I now have two main problems -
By the way, I cannot fathom what I would have done without this amazing package! Thanks again for maintaining it and being so proactive!
-
Hello again.
But now I am struggling with the font size.
Without styling code -
With styling code -
What I would like the outcome to be -
-
I am trying to use PyMuPDF to help translate a foreign-language PDF document into an English-language PDF document while maintaining the formatting as best I can. This involves preserving the text location, font color, font styling (bold, italics), annotations (strikethrough, underline), hyperlinks, images, tables, etc.
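As a minimal sketch of how font size, color, and bold/italic styling could be carried over, the helper below turns one span from `page.get_text("dict")` into styled HTML. The function name `span_to_html` and the sample values are hypothetical and not taken from the poster's code; the `size`, `flags`, and `color` keys and the italic/bold bit values follow the span dictionary layout documented by PyMuPDF.

```python
from html import escape

# Bit values of the "flags" field in spans returned by page.get_text("dict")
FLAG_ITALIC = 2
FLAG_BOLD = 16


def span_to_html(span: dict, translated: str) -> str:
    """Wrap translated text in a <span> that mimics the source span's style.

    Hypothetical helper: `span` is one span dictionary from
    page.get_text("dict"); `translated` is the replacement text.
    """
    color = span.get("color", 0)  # sRGB integer, e.g. 0xFF0000 for red
    flags = span.get("flags", 0)
    styles = [
        f"font-size:{span.get('size', 11):g}pt",
        f"color:#{color:06x}",
    ]
    if flags & FLAG_BOLD:
        styles.append("font-weight:bold")
    if flags & FLAG_ITALIC:
        styles.append("font-style:italic")
    # Escape the text so characters like & or < do not break the HTML.
    return f'<span style="{";".join(styles)}">{escape(translated)}</span>'
```

The resulting string could then be passed to `page.insert_htmlbox(rect, html)` once the redaction covering the original text has been applied. Note that `insert_htmlbox` may scale the content down to fit the rectangle, so the rendered size can end up smaller than the size requested in the style.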
I am working with this example Chinese document -
CHINESE.pdf
Here are the output translations -
Option 1 -
CHINESE_OPTION_1_custom_redactions_with_dict.pdf
Option 2 -
CHINESE_OPTION_2_translated_custom_redactions_with_blocks.pdf
Issues -
For some reason the text "ff" is missing throughout the PDF, and I am unsure why. The text is still searchable: searching for "Affairs" highlights it. Here's the screenshot -
The final PDF is not consistent between Option 1 and Option 2. I like the Option 2 output for the Chinese document in question, but the same logic fails when the text is spaced out within a line, for example in a table. I would like to achieve the Option 2 output with Option 1 (as this option allows me to gather font, size, and other information, and performs well for other languages). How can I achieve this?
1. table output for option 1 which is good -
2. table output for option 2 which is bad -
3. Chinese document text misplaced in Option 1 -
4. Chinese document text looks good in Option 2 -
The hyperlink underlines are too long. Is there a way to resolve this as well? I think this happens for two reasons: the font used in the output is not the same as in the source, and the translated text is not necessarily the same size as the input text.
References -
Here's my example code -
Any feedback would be appreciated.