Correctly update and trim codepoint indices after trimming data #62

sidkurella · 2024-02-08T17:18:16Z

Shift codepoint indices when trimming from front of string

Trim codepoint indices to be consistent with trimmed data string

Closes #61

These codepoint indices are now invalid when trimming data from the front of the string. Shift these left to match the new byte offsets after trim. Fixes potential panic or invalid data when using UTF-8 encoded strings with nested struct members.

…when-trim-front Shift codepoint indices when trimming data from front

Trim these indices to be consistent with the trimmed string. This prevents a potential index out of bounds when padding is trimmed.

…es-when-trimming-line-data Trim codepoint indices when trimming line data

ianlopshire · 2024-02-08T18:03:41Z

@sidkurella I think there is still a bug in this where padding at the start of a nested struct will cause a misalignment. Consider this test case:

func TestDecode_Nested(t *testing.T) {
	type Nested struct {
		First  string `fixed:"1,3"`
		Second string `fixed:"4,6"`
	}

	type Test struct {
		Nested Nested `fixed:"1,6"`
	}

	for _, tt := range []struct {
		name string
		raw  []byte

		expected Test
	}{
		{
			name: "Multi-byte with trimmed values",
			raw:  []byte(" ☃2B☃\n"),
			expected: Test{
				Nested: Nested{
					First:  "☃2",
					Second: "B☃",
				},
			},
		},
	} {
		t.Run(tt.name, func(t *testing.T) {
			d := NewDecoder(bytes.NewReader(tt.raw))
			d.SetUseCodepointIndices(true)
			var s Test
			err := d.Decode(&s)
			if err != nil {
				t.Errorf("Unexpected err: %v", err)
			}
			if !reflect.DeepEqual(tt.expected, s) {
				t.Errorf("Decode(%v) want %v, have %v", tt.raw, tt.expected, s)
			}
		})
	}
}

A mechanism must be added to prevent trimming on values decoded into the nested struct. I've opened the PR draft #63 as an example.

sidkurella · 2024-02-08T18:08:36Z

@sidkurella I think there is still a bug in this where padding at the start of a nested struct will cause a misalignment. Consider this test case:

func TestDecode_Nested(t *testing.T) {
	type Nested struct {
		First  string `fixed:"1,3"`
		Second string `fixed:"4,6"`
	}

	type Test struct {
		Nested Nested `fixed:"1,6"`
	}

	for _, tt := range []struct {
		name string
		raw  []byte

		expected Test
	}{
		{
			name: "Multi-byte with trimmed values",
			raw:  []byte(" ☃2B☃\n"),
			expected: Test{
				Nested: Nested{
					First:  "☃2",
					Second: "B☃",
				},
			},
		},
	} {
		t.Run(tt.name, func(t *testing.T) {
			d := NewDecoder(bytes.NewReader(tt.raw))
			d.SetUseCodepointIndices(true)
			var s Test
			err := d.Decode(&s)
			if err != nil {
				t.Errorf("Unexpected err: %v", err)
			}
			if !reflect.DeepEqual(tt.expected, s) {
				t.Errorf("Decode(%v) want %v, have %v", tt.raw, tt.expected, s)
			}
		})
	}
}

A mechanism must be added to prevent trimming on values decoded into the nested struct. I've opened the PR draft #63 as an example

Yeah, I think you're right, if you're decoding into a nested struct I think you just don't want to do any padding trimming at all

I feel like that's an issue also in the pure-ASCII case too, no?

sidkurella · 2024-02-08T18:11:23Z

		{
			name: "Pure ASCII with trimmed values",
			raw:  []byte(" A2BC\n"),
			expected: Test{
				Nested: Nested{
					First:  "A2",
					Second: "BC",
				},
			},
		},

Yeah, just tested with this, this fails too.

This may break existing users though?

ianlopshire · 2024-02-08T18:19:11Z

So it seems there are two issues at play: (a) the potential panic caused by not correcting the relevant indices and (b) trimming for nested structs.

I feel fine fixing the first; any current users affected would be seeing a panic. I'm hesitant to change the behavior of the second without some sort of opt-in setting.

Would adding none as an alignment work for your usecase?

type Nested struct {
	First  string `fixed:"1,3"`
	Second string `fixed:"4,6"`
}

type Test struct {
	Nested `fixed:"1,6,none"`
}

sidkurella · 2024-02-08T18:21:38Z

Yeah, that works

ianlopshire · 2024-02-08T18:29:35Z

Cool,

Let's use this PR to solve the first of those issues, and I can update the second PR to add the none tag. I'll review this one now.

ianlopshire

The trimming logic cluttering up the rawValueFromLine and making it hard to understand. Can we factor out the trimming logic into methods on rawValue?

func (rawValue) trimPrefix(prefix string) rawValue {}

func (rawValue) trimSuffix(suffix string) rawValue {}

func (rawValue) trim(cutset string) rawValue {}

Alternatively, you could use the newRawValue function to rebuild the indices after trimming. I think that would be a performance hit, but I'd be fine with it for now.

…ethods Add raw value trim methods

sidkurella · 2024-02-08T19:15:16Z

The trimming logic cluttering up the rawValueFromLine and making it hard to understand. Can we factor out the trimming logic into methods on rawValue?
func (rawValue) trimPrefix(prefix string) rawValue {}

func (rawValue) trimSuffix(suffix string) rawValue {}

func (rawValue) trim(cutset string) rawValue {}
Alternatively, you could use the newRawValue function to rebuild the indices after trimming. I think that would be a performance hit, but I'd be fine with it for now.

Updated

sidkurella and others added 4 commits February 8, 2024 09:46

Merge pull request #1 from clear-street/skurella/shift-codepoint-idx-…

ee607f5

…when-trim-front Shift codepoint indices when trimming data from front

Trim codepoint indices when trimming line data

4fb0249

Trim these indices to be consistent with the trimmed string. This prevents a potential index out of bounds when padding is trimmed.

Merge pull request #2 from clear-street/skurella/trim-codepoint-indic…

26c0724

…es-when-trimming-line-data Trim codepoint indices when trimming line data

ianlopshire requested changes Feb 8, 2024

View reviewed changes

sidkurella and others added 2 commits February 8, 2024 13:57

Add raw value trim methods

d929357

Merge pull request #3 from clear-street/skurella/add-raw-value-trim-m…

a12d767

…ethods Add raw value trim methods

sidkurella requested a review from ianlopshire February 8, 2024 21:42

ianlopshire approved these changes Feb 8, 2024

View reviewed changes

ianlopshire merged commit 66e7247 into ianlopshire:master Feb 8, 2024
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correctly update and trim codepoint indices after trimming data #62

Correctly update and trim codepoint indices after trimming data #62

sidkurella commented Feb 8, 2024 •

edited

Loading

ianlopshire commented Feb 8, 2024

sidkurella commented Feb 8, 2024

sidkurella commented Feb 8, 2024

ianlopshire commented Feb 8, 2024

sidkurella commented Feb 8, 2024

ianlopshire commented Feb 8, 2024

ianlopshire left a comment

sidkurella commented Feb 8, 2024

Correctly update and trim codepoint indices after trimming data #62

Correctly update and trim codepoint indices after trimming data #62

Conversation

sidkurella commented Feb 8, 2024 • edited Loading

ianlopshire commented Feb 8, 2024

sidkurella commented Feb 8, 2024

sidkurella commented Feb 8, 2024

ianlopshire commented Feb 8, 2024

sidkurella commented Feb 8, 2024

ianlopshire commented Feb 8, 2024

ianlopshire left a comment

Choose a reason for hiding this comment

sidkurella commented Feb 8, 2024

sidkurella commented Feb 8, 2024 •

edited

Loading