Vector not creating last newline for gcp_cloud_storage sink #19641

Open
synepolskyi opened this issue Jan 17, 2024 · 0 comments
Labels
type: bug A code related bug.

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I'm building a pipeline Logs -> Vector GCS sink -> Google Cloud Storage, and then re-processing those GCS files with Vector on demand via the file source (Vector file source -> Logs).

The problem is that log entries are consistently lost: the last line of each log file is omitted.

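From #18341, I understand that this is expected behavior of the file source.

However, the files (chunks) are produced by Vector itself, and the last record in each one is written without a trailing newline (see the cat -vE output under Example Data). I therefore believe this is a bug in the GCS sink.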
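
To make the suspected mechanism concrete, here is a minimal Python model of the two framing behaviors. It is only a sketch of the symptom, not Vector's actual implementation: the function names are hypothetical, and the reader simply assumes that a line counts as complete only once its newline has been seen, which is how I read the behavior described in #18341.

```python
def frame_as_separator(records):
    # Newline placed BETWEEN records: the last record has no newline after it.
    return "\n".join(records).encode()


def frame_as_terminator(records):
    # Newline placed AFTER every record, including the last one.
    return "".join(record + "\n" for record in records).encode()


def read_complete_lines(data):
    # Emit only newline-terminated lines; any unterminated tail is held back.
    lines = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break
        lines.append(data[start:end].decode())
        start = end + 1
    return lines


records = ["foo", "baz", "bar"]
print(read_complete_lines(frame_as_separator(records)))   # ['foo', 'baz']
print(read_complete_lines(frame_as_terminator(records)))  # ['foo', 'baz', 'bar']
```

With separator-style framing the reader never sees a newline after "bar", which matches the missing last record in the example data.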

Configuration

WRITE:
[sources.in]
  type = "stdin"
[sinks.gcs]
  type = "gcp_cloud_storage"
  inputs = ["in"]
  healthcheck.enabled = true
  bucket = "test"
  credentials_path = "creds.json"
  storage_class = "STANDARD"
  batch.timeout_secs = 30
  framing.method = "newline_delimited"
  encoding.codec = "json"
  key_prefix = "test/"
  filename_time_format = "%H:%M:%S"


READ:
[sources.logs]
  type = "file"
  include = ["/tmp/logs/*"]
  fingerprint.strategy = "device_and_inode"
[sinks.out]
  type = "console"
  inputs = [ "logs" ]
  encoding.codec = "json"

Version

0.35.0

Debug Output

No response

Example Data

INFO vector::sources::file_descriptors: Capturing stdin.
INFO vector: Vector has started. debug="false" version="0.35.0" arch="x86_64" revision="e57c0c0 2024-01-08 14:42:10.103908779"
foo
baz
bar

laptop$ gsutil cp -r 'gs://test/**' /tmp/logs
laptop$ cat -vE /tmp/logs/14:11:00.log
{"host":"laptop","message":"foo","source_type":"stdin","timestamp":"2024-01-17T14:11:00.270743111Z"}$
{"host":"laptop","message":"baz","source_type":"stdin","timestamp":"2024-01-17T14:11:01.549495036Z"}$
{"host":"laptop","message":"bar","source_type":"stdin","timestamp":"2024-01-17T14:11:02.518769996Z"}laptop$

2024-01-17T14:12:48.472079Z INFO vector: Vector has started. debug="false" version="0.35.0" arch="x86_64" revision="e57c0c0 2024-01-08 14:42:10.103908779"
2024-01-17T14:12:48.472139Z INFO source{component_kind="source" component_id=logs component_type=file}: vector::sources::file: Starting file server. include=["/tmp/logs/*"] exclude=[]
2024-01-17T14:12:48.472724Z INFO source{component_kind="source" component_id=logs component_type=file}:file_server: file_source::checkpointer: Attempting to read legacy checkpoint files.
2024-01-17T14:12:48.483353Z INFO source{component_kind="source" component_id=logs component_type=file}:file_server: vector::internal_events::file::source: Found new file to watch. file=/tmp/logs/14:11:00.log
{"file":"/tmp/logs/14:11:00.log","host":"laptop","message":"{\"host\":\"laptop\",\"message\":\"foo\",\"source_type\":\"stdin\",\"timestamp\":\"2024-01-17T14:11:00.270743111Z\"}","source_type":"file","timestamp":"2024-01-17T14:12:48.483668135Z"}
{"file":"/tmp/logs/14:11:00.log","host":"laptop","message":"{\"host\":\"laptop\",\"message\":\"baz\",\"source_type\":\"stdin\",\"timestamp\":\"2024-01-17T14:11:01.549495036Z\"}","source_type":"file","timestamp":"2024-01-17T14:12:48.483709234Z"}
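
In the cat -vE output above, the final record has no `$` after it, so the file does not end with a newline, and the "bar" record never appears in the file source output. As a possible stopgap on the reading side, the downloaded files could be patched before Vector reads them. The following Python sketch shows the idea; the helper name ensure_trailing_newline is hypothetical and not part of Vector, and the demo runs against a temporary file rather than /tmp/logs.

```python
import os
import tempfile
from pathlib import Path


def ensure_trailing_newline(path):
    """Append a newline byte if a non-empty file lacks one. Returns True if the file was modified."""
    path = Path(path)
    if path.stat().st_size == 0:
        return False
    with path.open("rb+") as f:
        # Inspect only the last byte of the file.
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True


# Demo on a temporary file shaped like the downloaded chunk (last record unterminated).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.log"
    sample.write_bytes(b'{"message":"foo"}\n{"message":"bar"}')
    print(ensure_trailing_newline(sample))       # True: newline appended
    print(ensure_trailing_newline(sample))       # False: already terminated
    print(sample.read_bytes().endswith(b"\n"))   # True
```

To apply it to the files from this report, one would call the helper on each file under /tmp/logs after the gsutil cp step and before starting the reading Vector instance. This only works around the symptom; it does not change what the sink uploads.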

Additional Context

No response

References

No response

@synepolskyi synepolskyi added the type: bug A code related bug. label Jan 17, 2024