ComboBox choice_values full of empty strings despite PDF having valid choices. #4114

Open
sarahkittyy opened this issue Dec 4, 2024 · 3 comments

@sarahkittyy

Description of the bug

I am using the 940b: https://www.irs.gov/pub/irs-pdf/f940b.pdf

The PDF file has identical pages, and each page has this specific dropdown:
[screenshot of the dropdown]

For the combo box on the first page, choice_values contains only empty strings.

import pymupdf

pdf = pymupdf.open('f940b.pdf')

for page in pdf:
    for widget in page.widgets():
        if widget.field_type_string == 'ComboBox':
            print(widget.choice_values)
        widget.update()
pdf.save('f940b-output.pdf')

Expected output:

[' - Select One - ', '  ', 'Cincinnati, OH 45999', 'Memphis, TN 37501', 'Ogden, UT 84201', 'Philadelphia, PA 19255']
[' - Select One - ', '  ', 'Cincinnati, OH 45999', 'Memphis, TN 37501', 'Ogden, UT 84201', 'Philadelphia, PA 19255']

Actual output:

['', '', '', '', '', '']
[' - Select One - ', '  ', 'Cincinnati, OH 45999', 'Memphis, TN 37501', 'Ogden, UT 84201', 'Philadelphia, PA 19255']

This also affects the resulting f940b-output.pdf, where the first combo box is suddenly completely empty with no choices available.
[screenshot of the empty combo box]

How to reproduce the bug

See above

PyMuPDF version

1.24.13

Operating system

Linux

Python version

3.12

@sarahkittyy

<<
  /Rect [ 213.206 341.196 391.491 361.361 ]
  /Subtype /Widget
  /Parent 86 0 R
  /F 4
  /P 505 0 R
  /StructParent 42
  /Type /Annot
  /MK <<
    /BG [ 1 ]
  >>
  /AP <<
    /N 533 0 R
  >>
>>
<<
  /Rect [ 213.206 341.196 391.491 361.361 ]
  /Subtype /Widget
  /TU (Mail to:)
  /Parent 86 0 R
  /F 4
  /I 47 0 R
  /P 1 0 R
  /StructParent 62
  /V 36 0 R
  /DA (/Helv 12 Tf 0 g)
  /DV 51 0 R
  /Opt 52 0 R
  /Type /Annot
  /Ff 4325376
  /MK <<
    /BG [ 1 ]
  >>
  /AP <<
    /N 45 0 R
  >>
>>

It seems that in this form, only the dropdown on the second page has an /Opt key; the one on the first page has none. Yet all PDF viewers show the options in both dropdowns as expected. What other key links the first dropdown to these choices?

@sarahkittyy

Both widgets have a /Parent entry pointing to xref 86, which contains:

<<
  /TU (Mail to:)
  /I 87 0 R
  /T (p1-t14)
  /V 88 0 R
  /DA (/Helv 12 Tf 0 g)
  /DV 89 0 R
  /Opt 90 0 R
  /FT /Ch
  /Ff 4325376
  /Kids [ 532 0 R 46 0 R ]
>>

And /Opt 90 links to the correct list of options:

[ ( - Select One - ) (  ) (Cincinnati, OH 45999) (Memphis, TN 37501)
  (Ogden, UT 84201) (Philadelphia, PA 19255) ]

So somewhere in pymupdf you need to account for the fact that the /Opt key might live in the /Parent object.
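
A minimal sketch of the kind of lookup that would cover this case: walk up the /Parent chain until the key is found. The names find_inherited_key, objects and stub_get_key are hypothetical and only for illustration; they are not PyMuPDF internals. The get_key callable is assumed to behave like Document.xref_get_key, returning a (type, value) tuple and ("null", "null") for a missing key. The xref numbers 532 and 46 for the two widgets are assumed from the /Kids array above.

```python
def find_inherited_key(get_key, xref, key, max_depth=32):
    """Return (xref, type, value) for the nearest object in the /Parent
    chain that defines `key`, or None if no ancestor defines it."""
    seen = set()  # guards against cyclic /Parent references
    while xref not in seen and len(seen) < max_depth:
        seen.add(xref)
        kind, value = get_key(xref, key)
        if kind != "null":
            return xref, kind, value
        # Key not present here: follow /Parent if it is an indirect reference.
        kind, value = get_key(xref, "Parent")
        if kind != "xref":
            return None
        xref = int(value.split()[0])  # "86 0 R" -> 86
    return None


# Stand-in for the objects quoted in this issue (only the relevant keys).
objects = {
    532: {"Parent": ("xref", "86 0 R")},                          # page 1 widget (assumed xref)
    46: {"Parent": ("xref", "86 0 R"), "Opt": ("xref", "52 0 R")},  # page 2 widget (assumed xref)
    86: {"Opt": ("xref", "90 0 R")},                              # shared parent field
}


def stub_get_key(xref, key):
    # Mimics the assumed (type, value) return shape for a missing key.
    return objects[xref].get(key, ("null", "null"))


print(find_inherited_key(stub_get_key, 532, "Opt"))  # (86, 'xref', '90 0 R')
print(find_inherited_key(stub_get_key, 46, "Opt"))   # (46, 'xref', '52 0 R')
```

Against the real file, the same function could presumably be called as find_inherited_key(pdf.xref_get_key, widget.xref, "Opt"); that call has not been verified on f940b.pdf here.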

@sarahkittyy

sarahkittyy commented Dec 4, 2024

Also, setting widget.field_value = widget.choice_values[foo] does not work either: the field in the output PDF is left completely blank.
