This repository has been archived by the owner on Dec 30, 2020. It is now read-only.

If there are more than 50 LogStreams per LogGroup the processing never finishes and keeps iterating over the last batch of streams #20

Open
Quarky9 opened this issue Nov 28, 2017 · 17 comments

Comments

@Quarky9

Quarky9 commented Nov 28, 2017

No description provided.

@chicofranchico
Contributor

chicofranchico commented Dec 5, 2017

I think this is probably due to my fix that limited the number of streams per group via the new max_log_streams_per_group parameter. It should still cope with that, though: whenever there's a new token it fetches the next "50", and once it has finished it should indeed never start from the beginning again but only pick up new stuff.

There should be a way to avoid that, but I'd say it's harmless.
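
For context, here is a minimal sketch of how a cap like max_log_streams_per_group interacts with describe_log_streams pagination. This is illustrative only, not the plugin's actual code; list_streams and max_streams are made-up names, and it assumes the aws-sdk-cloudwatchlogs gem:

require 'aws-sdk-cloudwatchlogs'

# Collect up to max_streams stream descriptions for a group, following
# next_token until the cap is hit or there are no more pages.
def list_streams(client, group, max_streams)
  streams = []
  token = nil
  loop do
    params = { log_group_name: group }
    params[:next_token] = token if token   # omit the token on the first page
    resp = client.describe_log_streams(params)
    streams.concat(resp.log_streams)
    token = resp.next_token
    break if token.nil? || streams.size >= max_streams
  end
  streams.first(max_streams)
end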

@Quarky9
Author

Quarky9 commented Dec 5, 2017 via email

@chicofranchico
Contributor

I'm also not a Ruby coder myself; I did it mostly out of necessity 😄

The idea was that if log_streams_next_token is nil, the next call to describe_log_streams with an empty token would start from the beginning and, since progress is kept in the state store, it wouldn't duplicate anything. I find it strange that it's only getting the last chunk over and over again.

I'll actually have to check on that and see if I also get it.
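
To illustrate the intent described above (a rough sketch under my own naming, not the plugin's code): because ingestion progress is kept per stream in the state store, re-listing streams from an empty token shouldn't re-emit old events; each stream resumes from its stored get_log_events token.

# state is a simple hash of stream name => last forward token, standing in
# for the plugin's state store.
def ingest_stream(client, state, group, stream)
  params = {
    log_group_name:  group,
    log_stream_name: stream,
    start_from_head: state[stream].nil?
  }
  params[:next_token] = state[stream] if state[stream]  # resume if we have a token
  resp = client.get_log_events(params)
  resp.events.each { |e| puts e.message }  # stand-in for routing events into fluentd
  state[stream] = resp.next_forward_token  # remember where we got to
end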

@akiraimafuji

Hi, I made a pull request - I only found this issue just now:
#22

@sampointer
Owner

Good spot. Merged and published as 1.7.0.rc2 - let me know how you get on!

@chicofranchico
Contributor

Thanks for this! Going to check it on my end. Cheers

@wimnat

wimnat commented Jan 24, 2018

This is still a bug for me, even with 1.7.0.rc2 - the state file is never written to.

@sampointer
Owner

1.7.0.rc3 has been pushed. Could you give that a try?

@wimnat

wimnat commented Jan 29, 2018

@sampointer - it seems 1.7.0.rc3 is missing from the repo...

ERROR: Could not find a valid gem 'fluent-plugin-cloudwatch-ingest' (= 1.7.0.rc3) in any repository

Maybe it's a mirror sync issue, as I notice it's not been long since you posted. I'll try again after 24 hours.

@sampointer
Owner

For some reason rubygems thinks the two tags I've tried have already been pushed, despite them not being present.

@sampointer
Owner

I've fixed the CI issues. 1.7.0.rc4 is up on rubygems.org.

@wimnat

wimnat commented Feb 6, 2018

@sampointer thanks Sam. I was able to pull down rc4. Unfortunately, my state file is still not written to. If you need me to do anything - debug logs etc. - let me know.

@sampointer
Owner

I'm afraid I no longer have an active development or production environment in which to develop and test this plugin.

You may have some luck posting your configuration and logging here for others to view.

@leandrol

Hello, I believe I may have found a fix for this issue. It turns out that when you retrieve log events from CloudWatch and reach the end of a stream, the next token stays the same, and there was no check to see whether the next token is the same as the current one. So it was a pretty simple fix.

The plugin finally writes to the state file and I haven't seen any duplicated logs since. Check out the pull request: #27
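
For anyone following along, here is a hedged sketch of the idea behind the fix (not the literal diff from #27; drain_stream is an illustrative name): compare the forward token before and after each get_log_events call, and stop paginating once it stops changing, since CloudWatch returns the same token once you've reached the end of the stream.

def drain_stream(client, group, stream, token)
  loop do
    params = {
      log_group_name:  group,
      log_stream_name: stream,
      start_from_head: token.nil?
    }
    params[:next_token] = token if token
    resp = client.get_log_events(params)
    resp.events.each { |e| puts e.message }    # stand-in for emitting events
    break if resp.next_forward_token == token  # token unchanged: end of stream
    token = resp.next_forward_token
  end
  token  # the caller persists this in the state file
end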

@sampointer
Owner

sampointer commented Jul 25, 2018

Happy to merge this, although I have no ability to test it in a live infrastructure. Pushed 1.7.0.rc7. If you could post test results with that version, I'd be happy to push 1.7.0 shortly thereafter.

@leandrol

I ran the tests with rake spec but I didn't see any additional tests other than checking the version number and checking if false == false.

I have had it running on a live infrastructure, monitoring for any duplicate logs being sent to Elasticsearch. So far no duplicates have been found, and I know for sure that I have log groups with more than 50 log streams and a ton of log events in some of those streams. I have also checked whether I'm still getting the latest log events, and it looks like I am. Lastly, it's able to store the state of each log stream in the state file.

My fluentd config:

<source>
  @type forward
</source>

<match fluent.**>
  @type null
</match>

<source>
  @type cloudwatch_ingest
  @log_level error
  tag cloudwatch
  aws_logging_enabled true
  log_group_name_prefix /aws/lambda
  log_stream_name_prefix 20
  state_file_name /var/lib/fluent/cloudwatch.in.state
  region "#{ENV['AWS_REGION']}"
  interval 60
  limit_events 10000
  get_log_events_interval 0.1
  api_interval 30.0
  error_interval 10.0
  <parse>
    @type cloudwatch_ingest
    expression ^(?<message>.+)$
    time_format %Y-%m-%d %H:%M:%S.%L
    event_time true
    inject_group_name true
    inject_stream_name true
    parse_json_body false
    fail_on_unparsable_json false
    telemetry false
    statsd_endpoint localhost
  </parse>
</source>

<match *.**>
  @type elasticsearch
  @log_level error
  logstash_format true
  logstash_prefix "#{ENV['LOGSTASH_PREFIX']}"
  include_tag_key true
  host "#{ENV['AWS_ELASTICSEARCH_URL']}"
  port 443
  scheme https

  buffer_chunk_limit 10M
  buffer_queue_limit 50
  flush_interval 1s
  max_retry_wait 30
  disable_retry_limit
  num_threads 3

  resurrect_after 5s
  reload_connections false
</match>

@sampointer
Owner

sampointer commented Jul 25, 2018

I've pushed 1.7.0 proper. Hopefully this closes this issue - please confirm before I close it. @Quarky9, could you please confirm that this fixes your original issue, and close if appropriate.
