
Final newline character missing when using AWS S3 sink #21086

Open
miquelruiz opened this issue Aug 15, 2024 · 2 comments · May be fixed by #21097
Labels
domain: codecs Anything related to Vector's codecs (encoding/decoding) type: bug A code related bug.

Comments

@miquelruiz
Contributor

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When using the AWS S3 sink to upload local logs, the last line of each uploaded object is missing a newline character at the end. This is not the case when using the File sink.

It seems to be caused by the S3 sink calling `encoder.serialize` on the last event of the batch (https://github.com/vectordotdev/vector/blob/master/src/sinks/util/encoding.rs#L51), while the file sink calls `encoder.encode` on all events (https://github.com/vectordotdev/vector/blob/master/src/sinks/file/mod.rs#L427).

I would expect these to behave in the same way using the provided config.
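For illustration, the difference in the final byte can be reproduced with hypothetical files standing in for the two sinks' outputs (these are not actual Vector outputs, just stand-ins):

```shell
# Stand-in for the S3 sink's object: no trailing newline.
printf 'line1\nline2' > s3_object.txt
# Stand-in for the file sink's output: trailing newline present.
printf 'line1\nline2\n' > file_sink.txt

# Inspect the last byte of each file.
tail -c 1 s3_object.txt | od -c   # last byte is '2', not '\n'
tail -c 1 file_sink.txt | od -c   # last byte is '\n'
```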

Configuration

```toml
[sources.messages]
type = "file"
include = [
  "/var/log/messages",
]

[sinks.s3]
inputs = ["messages"]
type = "aws_s3"
bucket = "<redacted>"
region = "us-east-1"
key_prefix = "<redacted>"
healthcheck.enabled = false
filename_append_uuid = false
filename_extension = ""
compression = "none"
encoding.codec = "text"
auth.access_key_id = "<redacted>"
auth.secret_access_key = "<redacted>"
batch.timeout_secs = 60

[sinks.file]
type = "file"
inputs = [ "messages" ]
path = "/tmp/messages-%Y%m%d-%H%M%S.log"
encoding.codec = "text"
```

Version

vector 0.40.0 (aarch64-unknown-linux-gnu 1167aa9 2024-07-29 15:08:44.028365803)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@miquelruiz miquelruiz added the type: bug A code related bug. label Aug 15, 2024
@jszwedko jszwedko added the domain: codecs Anything related to Vector's codecs (encoding/decoding) label Aug 16, 2024
@jszwedko
Member

Good spot @miquelruiz

I think

```rust
match position {
    Position::Last | Position::Only => {
        encoder
            .serialize(event, &mut bytes)
            .map_err(|error| io::Error::new(io::ErrorKind::InvalidData, error))?;
    }
    _ => {
        encoder
            .encode(event, &mut bytes)
            .map_err(|error| io::Error::new(io::ErrorKind::InvalidData, error))?;
    }
}
```

is attempting to handle proper batch encoding when the batch is being encoded as a JSON array. If `encode` is used, a `,` is appended to each element as it is encoded, but when encoding as a JSON array the trailing `,` isn't wanted, which is why `serialize` is used for the last event instead.
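The trailing-comma behavior can be sketched with a simplified stand-in (this is not Vector's actual encoder API, just an illustration of the framing logic): the delimiter goes between events, so the final event must be written without it.

```rust
// Illustrative only: `serialize`-style writing of the final event omits
// the framing delimiter, while `encode`-style writing would append it.
fn encode_batch(events: &[&str], delimiter: u8) -> Vec<u8> {
    let mut bytes = Vec::new();
    bytes.push(b'['); // batch prefix for a JSON array
    for (i, event) in events.iter().enumerate() {
        bytes.extend_from_slice(event.as_bytes());
        // Append the framer delimiter only between events, never after
        // the last one, so no trailing `,` appears before the `]`.
        if i + 1 != events.len() {
            bytes.push(delimiter);
        }
    }
    bytes.push(b']'); // batch suffix
    bytes
}
```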

I think a possible fix for this would be to update:

```rust
pub const fn batch_suffix(&self) -> &[u8] {
    match (&self.framer, &self.serializer) {
        (
            Framer::CharacterDelimited(CharacterDelimitedEncoder { delimiter: b',' }),
            Serializer::Json(_) | Serializer::NativeJson(_),
        ) => b"]",
        _ => &[],
    }
}
```

to have it add a `\n` if the framer is `Framer::NewlineDelimited`.
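A hypothetical sketch of that change, using simplified stand-in enums rather than Vector's real `Framer`/`Serializer` definitions (the actual fix would live in the existing `batch_suffix` method):

```rust
// Simplified stand-ins for illustration; not Vector's actual types.
#[derive(Clone, Copy)]
enum Framer {
    CharacterDelimited { delimiter: u8 },
    NewlineDelimited,
}

#[derive(Clone, Copy)]
enum Serializer {
    Json,
    Text,
}

fn batch_suffix(framer: Framer, serializer: Serializer) -> &'static [u8] {
    match (framer, serializer) {
        // Existing behavior: close the JSON array for comma-delimited JSON.
        (Framer::CharacterDelimited { delimiter: b',' }, Serializer::Json) => b"]",
        // Proposed: terminate the batch with `\n` so the final event ends
        // with a newline, matching the file sink's output.
        (Framer::NewlineDelimited, _) => b"\n",
        _ => &[],
    }
}
```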

@jszwedko
Copy link
Member

Opened a PR: #21097
