-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Buffered reader is default for jsonl #446
Comments
@markcurtis1970 When I wrote that ingest code using If you have a specific (cleansed) file you can share that breaks the ingester, it could be added as a test case. I don't think this is an insurmountable problem. The Note, in general: that I'm very open to a PR, and contributors in general, not just for this part of the codebase, but for the entire project. |
Haha yeah I know that feeling! (im gonna have to circle back to this) Sure, thanks - I was going to try a PR but there wasnt a file object to get the I'll see if I can sanitise that file in the meantime! |
This comment has been minimized.
This comment has been minimized.
@markcurtis1970: The latest
Can you give the new version a try, and re-open this issue if there's still trouble? Thanks for your patience. |
Ok I tested this with the original file that gave the issue
Looks good to me thanks! |
Excellent, thank you @markcurtis1970 🙏 |
This MR contains the following updates: | Package | Update | Change | |---|---|---| | [neilotoole/sq](https://github.com/neilotoole/sq) | patch | `v0.48.4` -> `v0.48.5` | MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.** --- ### Release Notes <details> <summary>neilotoole/sq (neilotoole/sq)</summary> ### [`v0.48.5`](https://github.com/neilotoole/sq/blob/HEAD/CHANGELOG.md#v0485---2025-01-19) [Compare Source](neilotoole/sq@v0.48.4...v0.48.5) ##### Fixed - \[[#​446](neilotoole/sq#446)]: A [`bufio.ErrTooLong`](https://pkg.go.dev/bufio#ErrTooLong) was being returned by [`bufio.Scanner`](https://pkg.go.dev/bufio#Scanner), when splitting lines from input that was too long (larger than [`bufio.MaxScanTokenSize`](https://pkg.go.dev/bufio#MaxScanTokenSize), i.e. `64KB`). This meant that `sq` wasn't able to parse large JSON files, amongst other problems. The maximum buffer size is now configurable via the new [`tuning.scan-buffer-limit`](https://sq.io/docs/config/#tuningscan-buffer-limit) option. Note that the buffer will start small and grow as needed, up to the limit. ```plaintext $ sq config set tuning.scan-buffer-limit 64MB # or 1024B, 64KB, 1GB, etc. ``` A more useful error message is also now returned when the buffer limit is exceeded (the error suggests adjusting `tuning.scan-buffer-limit`). ##### Changed - Renamed config option `tuning.buffer-mem-limit` to [`tuning.buffer-spill-limit`](https://sq.io/docs/config/#tuningbuffer-spill-limit). The new name better reflects the purpose of the option. </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this MR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box --- This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xMjYuMSIsInVwZGF0ZWRJblZlciI6IjM5LjEyNi4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Describe the bug
When using the jsonl source larger files might exceed the standard bufio limit
To Reproduce
Read in a large file
Expected behavior
Ideally we should cope with larger lines
sq
versionPaste the output of
sq version --yaml
into the code block below:Source details
Source didnt load so had no further detail
Logs
Screenshots
n/a
Additional context
When using a bufio.NewScanner the default is 64k
https://pkg.go.dev/bufio#Scanner.Buffer
I have used this example in code before with success
https://stackoverflow.com/questions/39859222/golang-how-to-overcome-scan-buffer-limit-from-bufio/66876416#66876416
However in the this part of the code the file isnt present so we cant really get the info from
file.Stat()
https://github.com/neilotoole/sq/blob/master/drivers/json/ingest_jsonl.go#L38
The text was updated successfully, but these errors were encountered: