
New glob matcher reports invalid values #90

Closed
kien-truong opened this issue May 9, 2019 · 5 comments · Fixed by #94

@kien-truong
Contributor

We're testing both 0.5.0 and 0.4.2 and getting errors with the new glob matcher.

With 0.5.0, the graphite exporter errors out with a duplicated-metrics error:

collected metric "flink_assigned_partitions" { label:<name:"job" value:"ebs-dumper" > label:<name:"operator" value:"Source:-kafka:-ebs-umts" > label:<name:"subtask" value:"0" > label:<name:"task" value:"Source:-kafka:-ebs-umts-->-Process-ebs-umts-events" > gauge:<value:487941.72 > } was collected before with the same name and label values

The weird part is that we don't know where the value 487941.72 comes from: we never sent that value, and it didn't show up in the debug log either.

With version 0.4.2, the graphite_exporter doesn't error, but reports both valid and invalid values, which screws up the monitoring process.

flink_assigned_partitions{job="ebs-dumper",operator="Source:-kafka:-ebs-lte",subtask="2",task="Source:-kafka:-ebs-lte-->-Process-ebs-lte-events"} 5
flink_assigned_partitions{job="ebs-dumper",operator="Source:-kafka:-ebs-lte",subtask="2",task="Source:-kafka:-ebs-lte-->-Process-ebs-lte-events"} 1.80102799e+06

Reverting to 0.3.0 or using the regex matcher avoids this issue.

This might be related to issue #79.

@matthiasr
Contributor

matthiasr commented May 9, 2019 via email

@kien-truong
Contributor Author

Apologies for the delay. The bug is a bit hard to reproduce reliably; you might have to run issue90_test.go a couple of times to trigger it.

https://github.com/kien-truong/graphite_exporter/tree/bug-90/e2e
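The gist of the reproduction (as noted in the next comment) is that each Graphite line is sent individually over UDP, so the exporter processes lines concurrently. A minimal sketch of that idea, not the linked test itself; the metric paths are made up and the ingestion port is assumed to be the default `localhost:9109`:

```go
// Hypothetical repro sketch: each Graphite plaintext line ("<path> <value> <timestamp>")
// goes out as its own UDP packet, from its own goroutine.
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

func main() {
	// Made-up metric paths for illustration only.
	lines := []string{
		"flink.ebs-dumper.assigned-partitions.0 5",
		"flink.ebs-dumper.assigned-partitions.1 7",
	}
	ts := time.Now().Unix()

	var wg sync.WaitGroup
	for _, l := range lines {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			conn, err := net.Dial("udp", "localhost:9109") // assumed default Graphite port
			if err != nil {
				return
			}
			defer conn.Close()
			fmt.Fprintf(conn, "%s %d\n", l, ts)
		}(l)
	}
	wg.Wait()
}
```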

@PhilsLab

+1, Facing the same issue with a big dataset.

@matthiasr
Contributor

Ha, thank you, the repro case is really useful. One thing I observe is that you are sending each line individually over UDP. I'm going to try to put this into the test proper, but in some manual testing I found a data race when pushing metrics in parallel that doesn't happen when I dump them all in over one TCP connection:

```
==================
WARNING: DATA RACE
Write at 0x00c000190100 by goroutine 40:
  github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper.(*MetricMapper).GetMapping()
      /Users/mr/src/github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper/mapper.go:222 +0x9a9
  main.(*graphiteCollector).processLine()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:118 +0x1d8
  main.(*graphiteCollector).processReader()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:104 +0x103

Previous write at 0x00c000190100 by goroutine 39:
  github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper.(*MetricMapper).GetMapping()
      /Users/mr/src/github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper/mapper.go:222 +0x9a9
  main.(*graphiteCollector).processLine()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:118 +0x1d8
  main.(*graphiteCollector).processReader()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:104 +0x103

Goroutine 40 (running) created at:
  main.main.func2()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:293 +0x1db

Goroutine 39 (finished) created at:
  main.main.func2()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:293 +0x1db
==================
```

I think we're missing some serialization around accesses to mapper.GetMapping; the statsd exporter already serializes these calls.
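For illustration, a minimal sketch of what serializing access to a shared, non-reentrant mapper could look like while keeping concurrent line processors. The `collector` type and `mappingFor` helper are made up for this sketch; the actual fix in #94 takes a different, single-goroutine approach:

```go
// Illustrative sketch only: guard a shared, non-reentrant lookup with a
// mutex so concurrent line processors cannot race on its internal state.
package main

import (
	"fmt"
	"sync"
)

type collector struct {
	mu sync.Mutex
	// in the real exporter this would hold the shared *mapper.MetricMapper
}

// mappingFor stands in for a call to the shared mapper.GetMapping.
func (c *collector) mappingFor(metricPath string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	// ...look up the mapping here; the lock serializes all access...
	return metricPath
}

func main() {
	c := &collector{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Println(c.mappingFor(fmt.Sprintf("some.metric.%d", i)))
		}(i)
	}
	wg.Wait()
}
```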

matthiasr added the bug label May 31, 2019
matthiasr pushed a commit that referenced this issue May 31, 2019
matthiasr pushed a commit that referenced this issue May 31, 2019
and only print the body in that case

Signed-off-by: Matthias Rampke <[email protected]>
matthiasr pushed a commit that referenced this issue May 31, 2019
the same error happens with only one mapping

Signed-off-by: Matthias Rampke <[email protected]>
matthiasr pushed a commit that referenced this issue May 31, 2019
and keep them in byte slices instead of converting to strings and back

Signed-off-by: Matthias Rampke <[email protected]>
matthiasr pushed a commit that referenced this issue May 31, 2019
Fixes #90.

`mapper.GetMapping` is not safe for concurrent access:

```
==================
WARNING: DATA RACE
Write at 0x00c000190100 by goroutine 40:
  github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper.(*MetricMapper).GetMapping()
      /Users/mr/src/github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper/mapper.go:222 +0x9a9
  main.(*graphiteCollector).processLine()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:118 +0x1d8
  main.(*graphiteCollector).processReader()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:104 +0x103

Previous write at 0x00c000190100 by goroutine 39:
  github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper.(*MetricMapper).GetMapping()
      /Users/mr/src/github.com/prometheus/graphite_exporter/vendor/github.com/prometheus/statsd_exporter/pkg/mapper/mapper.go:222 +0x9a9
  main.(*graphiteCollector).processLine()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:118 +0x1d8
  main.(*graphiteCollector).processReader()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:104 +0x103

Goroutine 40 (running) created at:
  main.main.func2()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:293 +0x1db

Goroutine 39 (finished) created at:
  main.main.func2()
      /Users/mr/src/github.com/prometheus/graphite_exporter/main.go:293 +0x1db
==================
```

The statsd exporter does all processing in a single goroutine, so the
graphite exporter should too, at least until/unless we can make the
mapper reentrant.

Signed-off-by: Matthias Rampke <[email protected]>
@matthiasr
Contributor

I have a potential fix in #94.
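
For illustration, a hedged sketch of the single-goroutine pattern the commit message above describes: producers only enqueue raw lines, and one goroutine owns the mapper. This is not the actual #94 diff; names like `lineCh` are illustrative:

```go
// Illustrative sketch of funneling all lines to one processing goroutine,
// so the non-reentrant mapper is only ever called from a single place.
package main

import (
	"bufio"
	"fmt"
	"strings"
	"sync"
)

func main() {
	lineCh := make(chan string, 100)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Single consumer: the only place the mapper would be used.
		for line := range lineCh {
			fmt.Println("processing:", line)
		}
	}()

	// Producers (the TCP/UDP handlers in the real exporter) just enqueue lines.
	sc := bufio.NewScanner(strings.NewReader("foo.bar 1 1557400000\nbaz.qux 2 1557400000\n"))
	for sc.Scan() {
		lineCh <- sc.Text()
	}
	close(lineCh)
	wg.Wait()
}
```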

matthiasr self-assigned this May 31, 2019