Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shift codepoint indices when trimming data from front #60

Conversation

sidkurella
Copy link
Contributor

These codepoint indices are now invalid when trimming data from the front of the string. Shift these left to match the new byte offsets after trim.

Fixes potential panic or invalid data when using UTF-8 encoded strings with nested struct members.

These codepoint indices are now invalid when trimming data from the
front of the string. Shift these left to match the new byte offsets
after trim.

Fixes potential panic or invalid data when using UTF-8 encoded strings
with nested struct members.
Copy link
Owner

@ianlopshire ianlopshire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sidkurella Thanks for taking the time to submit this PR!

I've requested a small change, but once that's complete I'd be happy to merge and cut a new release!

Comment on lines +231 to +237
// We trimmed data from the front of the string.
// We need to adjust the codepoint indices to reflect this, as they have shifted.
removedFromFront := relevantIndices[0]
newIndices := make([]int, 0, len(relevantIndices))
for _, idx := range relevantIndices {
newIndices = append(newIndices, idx-removedFromFront)
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this only needs to happen when the first relevantIndices is non-zero. Can we add a check here to prevent doing an unneeded iteration over the indices?

Suggested change
// We trimmed data from the front of the string.
// We need to adjust the codepoint indices to reflect this, as they have shifted.
removedFromFront := relevantIndices[0]
newIndices := make([]int, 0, len(relevantIndices))
for _, idx := range relevantIndices {
newIndices = append(newIndices, idx-removedFromFront)
}
// If we trimmed data from the front of the string. We need to adjust the
// codepoint indices to reflect this, as they have shifted.
if relevantIndices[0] > 0 {
removedFromFront := relevantIndices[0]
newIndices := make([]int, 0, len(relevantIndices))
for _, idx := range relevantIndices {
newIndices = append(newIndices, idx-removedFromFront)
}
relevantIndices = newIndices
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ian, thanks for the prompt reply. You're right, we only need to do this in that case. However, this fix is actually not sufficient to solve the production issue we are encountering. I have another fix that together solves the problem, just trying to write a test case that exercises it right now.

@sidkurella
Copy link
Contributor Author

Closing this pr and opening new one with updated fix

@sidkurella sidkurella closed this Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants