This repository has been archived by the owner on Dec 30, 2020. It is now read-only.

If there are more than 50 LogStreams per LogGroup the processing never finishes and keeps iterating over the last batch of streams #20

Open
Quarky9 opened this issue Nov 28, 2017 · 17 comments

Comments

@Quarky9

Quarky9 commented Nov 28, 2017

No description provided.

@chicofranchico
Contributor

chicofranchico commented Dec 5, 2017

I think this is probably due to my fix that limited the number of streams per group via the new max_log_streams_per_group parameter. It should still cope with that, though: whenever there's a new token it fetches the next "50", and once it has finished it should indeed never start from the beginning again but only pick up new stuff.

There should be a way to avoid that, but I'd say it's harmless.
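
For context, here is a minimal sketch of how a cap like max_log_streams_per_group interacts with describe_log_streams pagination. This is illustrative only, not the plugin's actual code; list_streams and max_streams are made-up names, and it assumes the aws-sdk-cloudwatchlogs gem:

require 'aws-sdk-cloudwatchlogs'

# Collect up to max_streams stream descriptions for a group, following
# next_token until the cap is hit or there are no more pages.
def list_streams(client, group, max_streams)
  streams = []
  token = nil
  loop do
    params = { log_group_name: group }
    params[:next_token] = token if token   # omit the token on the first page
    resp = client.describe_log_streams(params)
    streams.concat(resp.log_streams)
    token = resp.next_token
    break if token.nil? || streams.size >= max_streams
  end
  streams.first(max_streams)
end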

@Quarky9
Author

Quarky9 commented Dec 5, 2017 via email

@chicofranchico
Contributor

I'm also not a Ruby coder myself; I did it mostly out of necessity 😄

The idea was that if log_streams_next_token is nil, the next call to describe_log_streams with an empty token would start from the beginning and, since progress is kept in the state store, it wouldn't duplicate anything. I find it strange that it's only getting the last chunk over and over again.

I'll actually have to check on that and see if I also get it.
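
To illustrate the intent described above (a rough sketch under my own naming, not the plugin's code): because ingestion progress is kept per stream in the state store, re-listing streams from an empty token shouldn't re-emit old events; each stream resumes from its stored get_log_events token.

# state is a simple hash of stream name => last forward token, standing in
# for the plugin's state store.
def ingest_stream(client, state, group, stream)
  params = {
    log_group_name:  group,
    log_stream_name: stream,
    start_from_head: state[stream].nil?
  }
  params[:next_token] = state[stream] if state[stream]  # resume if we have a token
  resp = client.get_log_events(params)
  resp.events.each { |e| puts e.message }  # stand-in for routing events into fluentd
  state[stream] = resp.next_forward_token  # remember where we got to
end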

@akiraimafuji

Hi, I made a pull request - I only found this issue just now:
#22

@sampointer
Owner

Good spot. Merged and published as 1.7.0.rc2 - let me know how you get on!

@chicofranchico
Contributor

Thanks for this! Going to check it on my end. Cheers

@wimnat

wimnat commented Jan 24, 2018

This is still a bug for me, even with 1.7.0.rc2 - the state file is never written to.

@sampointer
Owner

1.7.0.rc3 has been pushed. Could you give that a try?

@wimnat

wimnat commented Jan 29, 2018

@sampointer - it seems 1.7.0.rc3 is missing from the repo...

ERROR: Could not find a valid gem 'fluent-plugin-cloudwatch-ingest' (= 1.7.0.rc3) in any repository

Maybe it's a mirror sync issue, as I notice it's not been long since you posted. I'll try again after 24 hours.

@sampointer
Owner

For some reason rubygems thinks the two tags I've tried have already been pushed, despite them not being present.

@sampointer
Owner

I've fixed the CI issues. 1.7.0.rc4 is up on rubygems.org.

@wimnat

wimnat commented Feb 6, 2018

@sampointer thanks Sam. I was able to pull down rc4. Unfortunately, my state file is still not written to. If you need me to do anything - debug logs etc. - let me know.

@sampointer
Owner

I'm afraid I no longer have an active development or production environment in which to develop and test this plugin.

You may have some luck posting your configuration and logging here for others to view.

@leandrol

Hello, I believe I may have found a fix for this issue. It turns out that when you retrieve log events from CloudWatch and reach the end of a stream, the next token stays the same, and there was no check to see whether the next token is the same as the current one. So it was a pretty simple fix.

The plugin finally writes to the state file and I haven't seen any duplicated logs since. Check out the pull request: #27
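
For anyone following along, here is a hedged sketch of the idea behind the fix (not the literal diff from #27; drain_stream is an illustrative name): compare the forward token before and after each get_log_events call, and stop paginating once it stops changing, since CloudWatch returns the same token once you've reached the end of the stream.

def drain_stream(client, group, stream, token)
  loop do
    params = {
      log_group_name:  group,
      log_stream_name: stream,
      start_from_head: token.nil?
    }
    params[:next_token] = token if token
    resp = client.get_log_events(params)
    resp.events.each { |e| puts e.message }    # stand-in for emitting events
    break if resp.next_forward_token == token  # token unchanged: end of stream
    token = resp.next_forward_token
  end
  token  # the caller persists this in the state file
end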

@sampointer
Owner

sampointer commented Jul 25, 2018

Happy to merge this, although I have no ability to test it in a live infrastructure. Pushed 1.7.0.rc7. If you could post test results with that version, I'd be happy to push 1.7.0 shortly thereafter.

@leandrol

I ran the tests with rake spec but I didn't see any additional tests other than checking the version number and checking if false == false.

I have had it running on a live infrastructure, monitoring for any duplicate logs being sent to Elasticsearch. So far no duplicates have been found, and I know for sure that I have log groups with more than 50 log streams and a ton of log events in some of those streams. I have also checked whether I'm still getting the latest log events, and it looks like I am. Lastly, it's able to store the state of each log stream in the state file.

My fluentd config:

<source>
  @type forward
</source>

<match fluent.**>
  @type null
</match>

<source>
  @type cloudwatch_ingest
  @log_level error
  tag cloudwatch
  aws_logging_enabled true
  log_group_name_prefix /aws/lambda
  log_stream_name_prefix 20
  state_file_name /var/lib/fluent/cloudwatch.in.state
  region "#{ENV['AWS_REGION']}"
  interval 60
  limit_events 10000
  get_log_events_interval 0.1
  api_interval 30.0
  error_interval 10.0
  <parse>
    @type cloudwatch_ingest
    expression ^(?<message>.+)$
    time_format %Y-%m-%d %H:%M:%S.%L
    event_time true
    inject_group_name true
    inject_stream_name true
    parse_json_body false
    fail_on_unparsable_json false
    telemetry false
    statsd_endpoint localhost
  </parse>
</source>

<match *.**>
  @type elasticsearch
  @log_level error
  logstash_format true
  logstash_prefix "#{ENV['LOGSTASH_PREFIX']}"
  include_tag_key true
  host "#{ENV['AWS_ELASTICSEARCH_URL']}"
  port 443
  scheme https

  buffer_chunk_limit 10M
  buffer_queue_limit 50
  flush_interval 1s
  max_retry_wait 30
  disable_retry_limit
  num_threads 3

  resurrect_after 5s
  reload_connections false
</match>

@sampointer
Owner

sampointer commented Jul 25, 2018

I've pushed 1.7.0 proper. Hopefully this closes this issue - please confirm before I close it. @Quarky9, could you please confirm that this fixes your original issue, and close if appropriate.
