S3 sink broken on FreeBSD when using buffer.type = "disk" #2966
Comments
Hi @cannedprimates, that's not good! Is this on ZFS or XFS? Can you show me your test json?
It's a DO box with ZFS. I sent the test.json to [email protected] :)
Thanks! I'm going to try to reproduce.
Hm, I am using AWS S3, not DO Spaces. I'll set up a minimal test instance!
Hm, I did manage to replicate (using the test.json file).
Um, I also just noticed: when I run without the buffer.* stuff, the file uploaded to S3 contains all the events but doesn't seem to be gzipped?? Very weird, since I've been running Vector with a small production service with essentially the same S3 config (longer batch.timeout_secs) and those files are definitely gzipped!
Thanks for letting us know @cannedprimates, I've opened #3064 to investigate the gzip issue. That shouldn't have changed.
Hmmm, that's roughly what I did. I think the only difference is that I used the official Vector packages from […]. Could I invite you to try that? I think it might make deployment easier, too. :)
Well, the package is also a bit out of date right now… Maybe there was some kind of regression between 0.7.1 and 0.9.2. If doing what you did but with […]. But before that: where is it spinning? To check userspace stacks, run it under a debugger: […]. To check kernel stacks, set […].
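(For anyone following along, and not necessarily what was being suggested above: one way to grab both kinds of stacks on FreeBSD is roughly the sketch below. The process name lookup and PID handling are placeholders; adjust to your setup.)

```sh
# Kernel stacks for every thread of the process (FreeBSD's procstat):
procstat -kk $(pgrep -n vector)

# Userspace stacks: attach a debugger and dump every thread's backtrace.
lldb -p $(pgrep -n vector)
# then, at the (lldb) prompt:
#   thread backtrace all
```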
I'll poke around and try to reproduce. :)
Going to see if I can replicate on UFS based on that.
(Do you mean UFS?) Yeah, the kernel stacks aren't that interesting, we need userspace ones (interrupt vector under LLDB and view the backtrace).
Yeah, UFS, sorry. :)
This leads me to suspect something to do with our async/signals.
No, I think you're just looking at the main thread, which is not the one spinning at 100% CPU. You need to switch to the […].
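(If it helps, the usual LLDB steps for finding and inspecting the busy thread look roughly like this; the thread number is just an example.)

```
(lldb) process interrupt     # pause the running process
(lldb) thread list           # the spinning thread is usually obvious here
(lldb) thread select 4       # pick it by number (example)
(lldb) bt                    # backtrace of the selected thread
```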
@myfreeweb It's been a while since I used LLDB directly (CLion spoils me); your pointers are appreciated. 😁
(Backtrace output attached for Thread 4, Thread 6, Thread 7, and Thread 24.)
haha whoops, that's… interesting (maybe […]). Yeah, don't bother with the sleeping threads (the ones that are in […]).
This was a debug build. :) I'm waiting on a UFS build to test, then I'll root around a bit more. Thanks for the pointers!
Same behavior on UFS. (End of day for me -- will return to this issue tomorrow!)
Cool, let me know if there's anything I can do to help! And thanks, everyone, for Vector; it has been a real joy to use so far :)
So I ended up distracted by pre-release blockers yesterday. (And might be today too… but this is on my list!) On a more practical note, I would suggest against using the disk buffers if you can afford to give up the small amount of safety they provide. Our disk buffers add a noticeable slowdown at the moment. We've been investigating ways to improve it. :)
Vector buffers are basically redundant for a logfile managed by runit with log rotation etc. (i.e. Vector doesn't update its file position until all sinks have acked, regardless of buffer setting), right?
The answer here is unfortunately somewhat complicated, so apologies in advance 😄

First of all, we agree that the behavior you describe would be ideal. We plan to implement it, but there are a variety of complications we are still figuring out how we'd like to handle (e.g. sinks that flush data only periodically, sinks that have no explicit flush, sources that provide no ability to ack, transforms that intentionally drop events in between sources and sinks, etc.).

In the meantime, we essentially provide two different modes of operation per sink. The default mode, with small in-memory buffers, is based around backpressure. In this mode, sinks "pull" data through the pipeline at whatever speed they're able. This flows all the way upstream to sources, so a file source (for example) would be reading and checkpointing at roughly the same throughput as the downstream sink is sending. This prevents situations where there is a large gap between data that has been ingested and data that's been fully processed. It's not as precise as the ideal behavior, but it provides some of the same benefits.

The second mode uses our disk buffers. The purpose of the disk buffers is to absorb and smooth out ingestion throughput variability in cases where backpressure would lead to data loss. A good example here is a UDP syslog source, where we have no way to signal the upstream system to slow down and need to simply accept the data as quickly as we can. If you're using something like the file source, however, disk buffers are very likely redundant (unless your files are very aggressively rotated).
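(To make the two modes concrete, here is a minimal sketch of how each is typically configured on a sink in this era of Vector. The sink name and the numeric values are illustrative assumptions, not taken from this thread.)

```toml
# Mode 1 (default): small in-memory buffer; backpressure flows upstream.
[sinks.my_sink.buffer]
  type = "memory"
  max_events = 500        # illustrative size
  when_full = "block"

# Mode 2: disk buffer; absorbs ingest spikes instead of applying backpressure.
# Requires a global `data_dir` so Vector has somewhere to store the buffer files.
# [sinks.my_sink.buffer]
#   type = "disk"
#   max_size = 104900000  # bytes; illustrative
#   when_full = "block"
```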
Release is out the door! I will be looking at this tomorrow. :)
@cannedprimates Can you try a 0.10.0 build? It seems to work for me? Perhaps it was related to some of the changes @bruceg made in #2866 or some other ticket… (This same setup had issues on 0.9.2.)
(I didn't realize this would be closed by me moving the card in my tasks!) Please reopen this if you still face this issue! I really want to make sure the FreeBSD experience is first class. Thank you for this very awesome bug report and the truly fantastic reproduction steps. I'd love to solve issues from you any day, any time, any project of mine. :)
Hi,
I'm trying out Vector 0.9.2 on FreeBSD 12.1.
With this config everything works fine, but when I add buffer.type = "disk" to the S3 sink, Vector immediately starts using 100% CPU, no events are added to any batch (I'm running with -vv), and Vector hangs on shutdown with TERM (it needs to be KILLed).
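(The reporter's actual config isn't reproduced in this copy of the issue. As a rough illustration only, a setup along these lines matches the description; every name, path, and size below is an assumption.)

```toml
data_dir = "/var/lib/vector"          # needed once a disk buffer is in use

[sources.app_logs]
  type = "file"
  include = ["/var/log/app/current"]  # assumed path, e.g. a runit-managed log

[sinks.s3]
  type = "aws_s3"
  inputs = ["app_logs"]
  bucket = "example-bucket"           # assumed
  region = "us-east-1"                # assumed
  compression = "gzip"
  encoding = "ndjson"                 # assumed encoding
  batch.timeout_secs = 60             # assumed; the thread mentions a longer value in production

  # Adding this block is what reportedly triggers the 100% CPU spin on FreeBSD 12.1:
  buffer.type = "disk"
  buffer.max_size = 104900000         # bytes; assumed
  buffer.when_full = "block"
```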
As an immediate workaround, I guess I don't need a disk buffer when I'm ingesting from a log file (i.e. I assume Vector's file checkpointing takes into account whether events have actually been submitted successfully)?
I can set up SSH access to a test box if that helps.
cheers!