Inline placeholder image causes other normal images to be replaced #150

Open
pimlottc-gov opened this issue Dec 30, 2019 · 4 comments

@pimlottc-gov
Contributor

When an inline placeholder image (i.e. one within image replacement fields) is replaced, other non-placeholder images that appear later in the document can end up being replaced along with it.

For example:

Template: [image: inline template]

Expected: [image: inline correct]

Actual: [image: inline incorrect]

Attached is a modified images_template.docx that demonstrates the issue as shown above:
images_template.docx

@pimlottc-gov
Contributor Author

pimlottc-gov commented Dec 30, 2019

What seems to be happening here is that, when the placeholder image is inline with the tags, start_field and end_field end up within the same w:p tag, so start_node and end_node are the same node. When the ImageBlock tries to collect the body nodes, it pulls in the entire rest of the document, and replace then ends up replacing the first image in every subsequent node.

The solution seems to be to update the body method in blocks.rb to check for this condition:

        def body
          # Inline case: both fields share one w:p, so no nodes lie between them
          return [] if start_node == end_node

However, this code is used by multiple other Block subclasses, and I'm not an expert in WordML, so I'm not certain whether it would cause problems for other blocks or situations.
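To make the failure mode concrete, here is a minimal sketch in plain Ruby. It assumes the body is collected by walking the following siblings of start_node until end_node is reached; that is a guess at the collection strategy based on the behaviour described above, not code taken from blocks.rb. Node, build_siblings and collect_body are hypothetical names used only for this illustration.

```ruby
# Hypothetical stand-in for an XML element; only sibling links are modelled.
class Node
  attr_reader :name
  attr_accessor :next_element

  def initialize(name)
    @name = name
    @next_element = nil
  end
end

# Link a list of names into a chain of sibling nodes.
def build_siblings(names)
  nodes = names.map { |n| Node.new(n) }
  nodes.each_cons(2) { |a, b| a.next_element = b }
  nodes
end

# Assumed strategy: gather the siblings after start_node until end_node is hit.
# With guard: true, the early return proposed in this issue is applied.
def collect_body(start_node, end_node, guard: false)
  return [] if guard && start_node.equal?(end_node)

  body = []
  node = start_node
  while (node = node.next_element) && !node.equal?(end_node)
    body << node
  end
  body
end

paragraphs = build_siblings(%w[p1 p2 p3 p4])
inline = paragraphs.first # start and end field live in this one paragraph

# Without the guard the walk never meets end_node and runs to the last sibling.
without_guard = collect_body(inline, inline).map(&:name)
# With the guard nothing outside the inline paragraph is collected.
with_guard = collect_body(inline, inline, guard: true).map(&:name)
# A regular block spanning several paragraphs is unaffected by the guard.
regular = collect_body(paragraphs[0], paragraphs[3], guard: true).map(&:name)
```

In this model the guard only changes the result when both fields resolve to the same node, which suggests (but does not prove) that blocks whose fields sit in separate paragraphs would behave as before.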

stadelmanma self-assigned this Dec 31, 2019
@stadelmanma
Collaborator

Interesting, I thought I fixed the inline replacement problem in #131, but hopefully I'll have time later this week to look into it further.

@pimlottc-gov
Contributor Author

pimlottc-gov commented Dec 31, 2019

Thanks. I just realized there's another case that my proposed fix doesn't address: when another inline image precedes the placeholder image in the same paragraph. replace searches the entire w:p tag containing the placeholder and matches the first image, even though it comes before the starting tag.

I think what's really needed is a more robust algorithm that walks the XML tree and picks only the nodes that are actually between the start and end nodes, in document order. I'm not sure exactly what that would look like yet, but I think some sort of modified depth-first traversal might do the trick.

This seems like it would be a common problem when parsing Office Open XML documents; perhaps there is a well-known algorithm that can be reused.

Template: [image: inline2-template]

Expected: [image: inline2-correct]

Actual (with proposed fix): [image: inline2-incorrect]

@hoangviet62

Hi guys, I am also facing this issue.
Are there any updates on a fix?
Many thanks,
