Sink of type 'vector' leaves open file descriptors of logs read by kubernetes_logs source - so the original log files still take disk space #19679

Closed
gadisn opened this issue Jan 22, 2024 · 6 comments
Labels
type: bug A code related bug.

Comments

@gadisn
Contributor

gadisn commented Jan 22, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Topology 1 of vector as ‘agent’:
kubernetes_logs(source) -> 2 transforms -> splunk_hec_logs(sink)

Topology 2 of vector as ‘agent’:
Same as topology 1 above, and in addition, a sink of type 'vector' which uses the same kubernetes_logs source

We run a load on the system: 2000 requests per second, spread over 3 k8s pods.
Each request writes a line to the container log file, so overall we get a lot of logs.

In topology 1, disk usage stays constant.
In topology 2, disk usage grows steadily.

After investigation, it turned out that in topology 2 the original log files get deleted, but their space is not freed on disk, because the files are still referenced by an open file descriptor.

I would expect the original log files to be de-referenced so that they are actually cleaned from the disk.
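
For readers unfamiliar with the mechanism described above, here is a minimal sketch of it, assuming a POSIX system (Linux or macOS). It is not Vector code; the function name `unlinked_file_still_holds_space` is hypothetical. It deletes a file while a descriptor is still open and shows that the name disappears while the data stays allocated until the descriptor is closed.

```python
import os
import tempfile


def unlinked_file_still_holds_space(payload: bytes = b"x" * 4096):
    """Delete a file while it is still open and report what the kernel sees.

    Returns (name_exists, link_count, size_in_bytes), all measured after
    the unlink and before the descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                     # the directory entry is removed
        name_exists = os.path.exists(path)  # the path no longer resolves
        st = os.fstat(fd)                   # the inode is still reachable via fd
        return name_exists, st.st_nlink, st.st_size
    finally:
        os.close(fd)                        # only now can the kernel free the blocks


if __name__ == "__main__":
    print(unlinked_file_still_holds_space())
```

This is why `ls` shows nothing while `df` keeps reporting the space as used: the space is tied to the open descriptor, not to the file name.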

Configuration

Pseudo configuration of the relevant topology ('topology 2' from above):


      sinks:
        vector_aggregator:
          address: ....svc.cluster.local:7500
          inputs:
            - kubernetes_logs
          type: vector
        splunk_hec_dev_test:
          type: splunk_hec_logs
          inputs:
            - ... a transform
          endpoint: ...
          default_token: ...
          encoding:
            codec: "json"
          index: ...
          compression: gzip
          buffer:
            type: memory
      sources:
        kubernetes_logs:
          type: kubernetes_logs
          max_read_bytes: 8192
          glob_minimum_cooldown_ms: 1000
      transforms:
        transform1:
          type: route
          inputs:
            - kubernetes_logs
          route:
            ...

Version

0.34.1-distroless-libc

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@gadisn gadisn added the type: bug A code related bug. label Jan 22, 2024
@jszwedko
Member

Hi @gadisn !

The expected behavior is that Vector will hold open the file handle until it reaches EOF. If it still has the file handle open then that indicates it hasn't finished reading that file.
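
The hold-until-EOF behavior can be illustrated with a small sketch. This is not Vector's implementation, only a model of the idea on a POSIX system; `read_to_eof_after_delete` and `chunk_size` are hypothetical names. The file is deleted before the reader has consumed anything, and the reader still gets every byte because it keeps its handle open until it hits EOF.

```python
import os
import tempfile


def read_to_eof_after_delete(payload: bytes, chunk_size: int = 16) -> bytes:
    """Write payload to a temp file, delete the file, then read it to EOF."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as writer:
        writer.write(payload)

    received = bytearray()
    # Unbuffered, so every read goes to the kernel rather than a Python buffer.
    reader = open(path, "rb", buffering=0)
    try:
        os.unlink(path)  # stands in for log rotation removing the file
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:  # EOF reached: nothing left to read
                break
            received.extend(chunk)
    finally:
        reader.close()  # releasing the handle lets the disk space be reclaimed
    return bytes(received)


if __name__ == "__main__":
    data = b"log line\n" * 10
    print(read_to_eof_after_delete(data) == data)
```

The slower the loop runs, the longer the deleted file keeps occupying disk space, which is the effect reported in this issue.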

@jszwedko
Member

jszwedko commented Jan 22, 2024

This is mentioned over here for the file source: https://vector.dev/docs/reference/configuration/sources/file/#file-deletion. That note should probably be copied to the kubernetes_logs source too.

@gadisn
Contributor Author

gadisn commented Jan 22, 2024

Thanks @jszwedko.
But the problem isn't seen when kubernetes_logs is used only for the splunk_hec_logs sink, so I'm guessing the source does finish reading the log files.

Is it possible that the vector sink somehow doesn't signal to the kubernetes_logs source that the sink is done, so the source still holds the file handle to allow a re-read? (Perhaps I need to enable acknowledgements?)

@jszwedko
Member

> Thanks @jszwedko. But the problem isn't seen when kubernetes_logs is used only for the splunk_hec_logs sink, so I'm guessing the source does finish reading the log files.
>
> Is it possible that the vector sink somehow doesn't signal to the kubernetes_logs source that the sink is done, so the source still holds the file handle to allow a re-read? (Perhaps I need to enable acknowledgements?)

Lines should be marked as "read" as soon as the source reads them. One guess I'd have is that the splunk_hec_logs sink had a higher level of throughput than you are seeing with the vector sink and so Vector is reading more slowly.
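
To make the throughput guess concrete, here is a rough, tick-based simulation. It is a toy model rather than Vector's actual scheduler, and it assumes blocking buffers with fan-out, so the source can only read as many lines as the fullest sink buffer can accept. All names and numbers (`ticks_until_eof`, `read_rate`, `sink_drain_rates`, `buffer_size`) are hypothetical.

```python
def ticks_until_eof(total_lines, read_rate, sink_drain_rates, buffer_size=500):
    """Count ticks until the source has read every line of a file.

    Each tick, every sink ships up to its drain rate from its own buffer,
    then the source reads as many lines as ALL sink buffers can accept.
    """
    if read_rate <= 0 or buffer_size <= 0:
        raise ValueError("read_rate and buffer_size must be positive")
    if not sink_drain_rates or any(rate <= 0 for rate in sink_drain_rates):
        raise ValueError("every sink needs a positive drain rate")

    buffered = [0] * len(sink_drain_rates)
    remaining = total_lines
    ticks = 0
    while remaining > 0:
        ticks += 1
        # Sinks drain first, freeing room in their buffers.
        buffered = [max(0, b - rate) for b, rate in zip(buffered, sink_drain_rates)]
        # The slowest sink determines how much the source may read this tick.
        free = min(buffer_size - b for b in buffered)
        taken = min(read_rate, free, remaining)
        buffered = [b + taken for b in buffered]
        remaining -= taken
    return ticks


if __name__ == "__main__":
    print(ticks_until_eof(10_000, 2_000, [2_000]))
    print(ticks_until_eof(10_000, 2_000, [2_000, 100]))
```

In this model, adding a second, slower sink lengthens the time to EOF for the whole source, even though the first sink is unchanged. During that extra time the source would still hold its file handles.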

@gadisn
Contributor Author

gadisn commented Jan 24, 2024

Thanks, will review

@jszwedko
Member

Closing since this seems to be a case of back-pressure causing the file source to keep files open longer than expected.
