Potential panic or invalid data when using UTF-8 codepoint boundaries when decoding into a nested struct #61

Closed
sidkurella opened this issue Feb 8, 2024 · 0 comments · Fixed by #62


Hello,

I have noticed a bug that causes a panic when decoding into a nested struct while using codepoint indices, rather than byte offsets, as the field boundaries (that is, with SetUseCodepointIndices(true)). Take the following test as an example:

func TestDecodeSetUseCodepointIndices_Nested(t *testing.T) {
	type Nested struct {
		First  string `fixed:"1,3"`
		Second string `fixed:"4,6"`
	}

	type Test struct {
		First  string `fixed:"1,3"`
		Second Nested `fixed:"4,9"`
		Third  string `fixed:"10,12"`
		Fourth Nested `fixed:"13,18"`
		Fifth  string `fixed:"19,21"`
	}

	for _, tt := range []struct {
		name     string
		raw      []byte
		expected Test
	}{
		{
			name: "Multi-byte characters",
			raw:  []byte("123x☃x456x☃x789x☃x012\n"),
			expected: Test{
				First:  "123",
				Second: Nested{First: "x☃x", Second: "456"},
				Third:  "x☃x",
				Fourth: Nested{First: "789", Second: "x☃x"},
				Fifth:  "012",
			},
		},
	} {
		t.Run(tt.name, func(t *testing.T) {
			d := NewDecoder(bytes.NewReader(tt.raw))
			d.SetUseCodepointIndices(true)
			var s Test
			err := d.Decode(&s)
			if err != nil {
				t.Errorf("Unexpected err: %v", err)
			}
			if !reflect.DeepEqual(tt.expected, s) {
				t.Errorf("Decode(%v) want %v, have %v", tt.raw, tt.expected, s)
			}
		})
	}
}

Currently, this test panics because the codepoint indices are not adjusted when data is trimmed from the front of the string in decode.go:rawValueFromLine.

I believe the issue is here (decode.go, line 217):

	if value.codepointIndices != nil {
		if len(value.codepointIndices) == 0 || startPos > len(value.codepointIndices) {
			return rawValue{data: ""}
		}
		var relevantIndices []int
		var lineData string
		if endPos >= len(value.codepointIndices) {
			relevantIndices = value.codepointIndices[startPos-1:]
			lineData = value.data[relevantIndices[0]:]
		} else {
			relevantIndices = value.codepointIndices[startPos-1 : endPos]
			lineData = value.data[relevantIndices[0]:value.codepointIndices[endPos]]
		}
	} else { // truncated
	}

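Note that lineData is trimmed from the left, but the codepoint indices are not adjusted to match: they still hold byte offsets into the original, untrimmed string. When the nested struct's fields are then decoded against the trimmed data, this can cause an index-out-of-bounds panic or a read from the wrong part of the data string.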
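To illustrate the kind of adjustment I mean, here is a minimal standalone sketch. It is not the library's code and not necessarily what the fix does: `codepointIndices` and `sliceByCodepoints` are hypothetical helpers written only for this example. The idea is to subtract the starting byte offset from each retained index so that the indices stay valid for the trimmed substring.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codepointIndices is a hypothetical helper: it returns the byte offset
// at which each codepoint of s starts.
func codepointIndices(s string) []int {
	indices := make([]int, 0, utf8.RuneCountInString(s))
	for i := range s { // ranging over a string yields the start of each rune
		indices = append(indices, i)
	}
	return indices
}

// sliceByCodepoints is a hypothetical helper: it returns the substring
// covering codepoints startPos..endPos (1-based, inclusive) together with
// codepoint indices rebased so they are relative to that substring.
func sliceByCodepoints(data string, indices []int, startPos, endPos int) (string, []int) {
	if len(indices) == 0 || startPos < 1 || startPos > len(indices) || endPos < startPos {
		return "", nil
	}
	startByte := indices[startPos-1]
	endByte := len(data)
	relevant := indices[startPos-1:]
	if endPos < len(indices) {
		endByte = indices[endPos]
		relevant = indices[startPos-1 : endPos]
	}
	// The key step: shift every offset left by the number of bytes trimmed.
	rebased := make([]int, len(relevant))
	for i, idx := range relevant {
		rebased[i] = idx - startByte
	}
	return data[startByte:endByte], rebased
}

func main() {
	line := "123x☃x456x☃x789x☃x012"

	// Outer field: Second Nested `fixed:"4,9"`.
	nested, nestedIdx := sliceByCodepoints(line, codepointIndices(line), 4, 9)

	// Inner fields, decoded against the trimmed data and rebased indices.
	first, _ := sliceByCodepoints(nested, nestedIdx, 1, 3)
	second, _ := sliceByCodepoints(nested, nestedIdx, 4, 6)

	fmt.Printf("%q %q %q\n", nested, first, second)
}
```

Without the rebasing loop, the indices passed down for the nested struct would still point into the full line, so the inner fields would be sliced at the wrong byte offsets or past the end of the trimmed string.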

I have created a fix in PR #60 for your review.
