[Logfile] Clean up state folder on init #1114

Merged
merged 1 commit into from
Apr 11, 2024

Conversation


@lisguo lisguo commented Apr 2, 2024

Description of the issue

Currently, customers using the logs plugin have a state folder that tracks the log files being watched on the host. This folder is cleaned up once every hour. If the agent dies or crashes before the hour elapses, the cleanup never runs, so repeated restarts can leave more and more state files behind and exhaust the inodes on the host.

Description of changes

We can clean up the state folder on initialization of the logfile plugin, as well as periodically.
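A minimal sketch of what one cleanup pass might look like, assuming each state file stores an offset on its first line and the tracked log path on its second. That layout, the standalone `cleanupStateFolder(dir)` signature, and the `demo` helper are illustrative assumptions, not the plugin's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cleanupStateFolder removes state files in dir that do not reference an
// existing log file and returns how many were removed.
// ASSUMPTION: a state file holds "<offset>\n<log path>"; the real format may differ.
func cleanupStateFolder(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	removed := 0
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		statePath := filepath.Join(dir, e.Name())
		data, readErr := os.ReadFile(statePath)
		if readErr != nil {
			continue // unreadable: leave it for the next pass
		}
		lines := strings.Split(string(data), "\n")
		if len(lines) >= 2 {
			if _, statErr := os.Stat(strings.TrimSpace(lines[1])); statErr == nil {
				continue // tracked log still exists: keep its saved offset
			}
		}
		if os.Remove(statePath) == nil {
			removed++
		}
	}
	return removed, nil
}

// demo (hypothetical helper) builds a state folder with one live and one
// stale state file, runs a cleanup pass, and reports the removed count and
// whether the live state file survived.
func demo() (int, bool, error) {
	root, err := os.MkdirTemp("", "logfile-demo")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(root)

	stateDir := filepath.Join(root, "state")
	if err := os.Mkdir(stateDir, 0o755); err != nil {
		return 0, false, err
	}
	liveLog := filepath.Join(root, "live.log")
	if err := os.WriteFile(liveLog, []byte("hello\n"), 0o644); err != nil {
		return 0, false, err
	}
	liveState := filepath.Join(stateDir, "live")
	if err := os.WriteFile(liveState, []byte("6\n"+liveLog), 0o644); err != nil {
		return 0, false, err
	}
	staleState := filepath.Join(stateDir, "stale")
	goneLog := filepath.Join(root, "gone.log") // never created
	if err := os.WriteFile(staleState, []byte("0\n"+goneLog), 0o644); err != nil {
		return 0, false, err
	}

	removed, err := cleanupStateFolder(stateDir)
	if err != nil {
		return 0, false, err
	}
	_, statErr := os.Stat(liveState)
	return removed, statErr == nil, nil
}

func main() {
	removed, liveKept, err := demo()
	fmt.Println(removed, liveKept, err)
}
```

Under these assumptions, a pass run at init only removes state whose log is already gone, so offsets for logs that still exist are left in place.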

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

pass unit tests

Requirements

Before committing the code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@lisguo lisguo requested a review from a team as a code owner April 2, 2024 15:10
@codecov-commenter

Codecov Report

Attention: Patch coverage is 0%, with 1 line in your changes missing coverage. Please review.

Project coverage is 64.58%. Comparing base (96d4763) to head (d8aa0e2).
Report is 527 commits behind head on main.

Files Patch % Lines
plugins/inputs/logfile/logfile.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1114      +/-   ##
==========================================
+ Coverage   57.58%   64.58%   +7.00%     
==========================================
  Files         370      373       +3     
  Lines       17548    19709    +2161     
==========================================
+ Hits        10105    12730    +2625     
+ Misses       6848     6331     -517     
- Partials      595      648      +53     


@@ -107,7 +107,8 @@ func (t *LogFile) Start(acc telegraf.Accumulator) error {
 		return fmt.Errorf("failed to create state file directory %s: %v", t.FileStateFolder, err)
 	}

-	// Clean state file regularly
+	// Clean state file on init and regularly
+	t.cleanupStateFolder()

The agent reads the state to track which part of the log file it left off reading. Does this execute before or after the state file has been read?


nit: Move this into the goroutine. I don't think we need to block on the clean up.
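The suggestion above could be sketched roughly as follows: the initial cleanup and the periodic cleanup both run inside one goroutine, so the caller returns immediately. The `startCleanupLoop` function, its `done` channel, and the `initialCleanupRan` helper are hypothetical names for illustration, not the plugin's API.

```go
package main

import (
	"fmt"
	"time"
)

// startCleanupLoop runs cleanup once right away and then on every tick,
// all inside a goroutine so the caller is not blocked.
// The loop stops when done is closed.
func startCleanupLoop(cleanup func(), interval time.Duration, done <-chan struct{}) {
	go func() {
		cleanup() // initial pass, off the caller's goroutine
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				cleanup()
			}
		}
	}()
}

// initialCleanupRan (hypothetical helper) starts the loop with a one-hour
// interval and reports whether the first cleanup ran within two seconds,
// that is, without waiting for the first tick.
func initialCleanupRan() bool {
	ran := make(chan struct{}, 1)
	done := make(chan struct{})
	defer close(done)

	startCleanupLoop(func() {
		select {
		case ran <- struct{}{}:
		default: // already signalled
		}
	}, time.Hour, done)

	select {
	case <-ran:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("initial cleanup ran:", initialCleanupRan())
}
```

One trade-off of this shape is that the caller no longer knows when the first pass has finished, so anything that reads the state folder at startup may run concurrently with it.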


@jefchien jefchien left a comment


Separately, are we sure this is run when the agent is started with the OTEL collector? I'm not sure if the Start function in the logfile plugin is called.

jefchien previously approved these changes Apr 3, 2024

This PR was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label Apr 11, 2024
go func() {
t.cleanupStateFolder()

@adam-mateen adam-mateen Apr 11, 2024


Can you add a unit test or integration test that creates some files monitored by the agent, writes 1 MiB of logs to them, verifies the state files point to EOF, kills the agent, writes more data, then starts the agent back up and makes sure it reports log events from the 1 MiB point onwards (i.e. the state file is used correctly)?

Maybe we already have a test like this, if so great.

My concern is that you are deleting the state files before the agent has a chance to start up and read them. (I think Zhihong has the same concern).
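A scaled-down sketch of the scenario requested above, using a few bytes instead of 1 MiB and plain file I/O instead of a real agent. It assumes the state file stores the offset on its first line followed by the log path; that layout and the `readOffset`, `readFrom`, and `simulateRestart` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readOffset parses the saved offset from the first line of a state file.
// ASSUMPTION: the state file holds "<offset>\n<log path>".
func readOffset(statePath string) (int64, error) {
	data, err := os.ReadFile(statePath)
	if err != nil {
		return 0, err
	}
	first := strings.SplitN(string(data), "\n", 2)[0]
	return strconv.ParseInt(strings.TrimSpace(first), 10, 64)
}

// readFrom returns everything in logPath from offset to EOF.
func readFrom(logPath string, offset int64) (string, error) {
	f, err := os.Open(logPath)
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		return "", err
	}
	rest, err := io.ReadAll(f)
	return string(rest), err
}

// simulateRestart writes a log, saves state at EOF, appends more data while
// the "agent" is down, then resumes reading from the saved offset.
func simulateRestart() (string, error) {
	dir, err := os.MkdirTemp("", "logfile-state")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	logPath := filepath.Join(dir, "app.log")
	statePath := filepath.Join(dir, "app.log.state")

	before := "line 1\nline 2\n"
	if err := os.WriteFile(logPath, []byte(before), 0o644); err != nil {
		return "", err
	}
	state := fmt.Sprintf("%d\n%s", len(before), logPath)
	if err := os.WriteFile(statePath, []byte(state), 0o644); err != nil {
		return "", err
	}

	// Data written while the agent is stopped.
	f, err := os.OpenFile(logPath, os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return "", err
	}
	if _, err := f.WriteString("line 3\n"); err != nil {
		f.Close()
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}

	// On restart, resume from the saved offset rather than from the start.
	offset, err := readOffset(statePath)
	if err != nil {
		return "", err
	}
	return readFrom(logPath, offset)
}

func main() {
	got, err := simulateRestart()
	fmt.Printf("%q %v\n", got, err)
}
```

If a cleanup pass deleted the state file before the restart step, `readOffset` would fail here, which is the ordering concern raised in this thread.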

@lisguo lisguo merged commit df56c0e into aws:main Apr 11, 2024
6 checks passed