scannerloop stops after encountering a very long line #81

Open
timmattison opened this issue Jan 31, 2024 · 0 comments

If a file contains a line longer than the 1MB buffer length (don't ask), the for scanner.Scan() condition in scannerloop evaluates to false. When this happens, line counting for the current file stops at that point and the results reported for that file are incorrect.

gocloc/file.go, line 90 in 7b24285:

	for scanner.Scan() {

I could see a few fixes for this.

  1. A new option to set the buffer size, falling back to the current 1MB maximum when the option is unset:
	if opts.MaxLineLength > 0 {
		scanner.Buffer(buf.Bytes(), opts.MaxLineLength)
	} else {
		scanner.Buffer(buf.Bytes(), 1024*1024)
	}
  2. Scanning the files ahead of time to find the longest gap between line endings and then automatically setting that as the buffer size. This does require reading the file twice though.

  3. Changing the scannerloop to use something like mmap instead of scanner.

If you're interested in the third one let me know and I'll work on a PR.

The first one probably touches a bit more of the overall design than I should take on for a first PR.

I think the second one is safe but it does double the I/O required. Disk caching may make the second pass cheaper than a full second read from disk, but it still feels like a last resort.
