Thanos Query: gaps in deduplicated data #7656

ppietka-bp · 2024-08-21T12:14:46Z

Thanos, Prometheus and Golang version used:
thanos, version 0.35.1 (branch: HEAD, revision: 086a698)
build user: root@be0f036fd8fa
build date: 20240528-13:54:20
go version: go1.21.10
platform: linux/amd64
tags: netgo
prometheus, version 2.32.1 (branch: HEAD, revision: 41f1a8125e664985dd30674e5bdf6b683eff5d32)
build user: root@54b6dbd48b97
build date: 20211217-22:08:06
go version: go1.17.5
platform: linux/amd64

Object Storage Provider:
Ceph

What happened:
Thanos Query: gaps in deduplicated data

What you expected to happen:
Two Prometheus instances scrape data from sources and from other federated Prometheus instances on OpenShift.
As long as we query the data without deduplication, the data is continuous.

Anything else we need to know:
Screenshots attached.

What you expected to happen:
Deduplication should properly combine datasets.


ppietka-bp · 2024-08-27T08:00:44Z

Results of the investigation:
Deduplication only works one way. If we deduplicate metrics by a label, e.g. replica, which takes the values 0 or 1, missing data with label replica="0" is filled in by data with replica="1", but missing data with label replica="1" is not filled in by data with replica="0".

In our opinion, deduplication should work both ways and assemble the data according to label replica so as to show a continuous set of data in the metric.

MichaHoffmann · 2024-08-27T19:50:26Z

Deduplicating time series data is surprisingly hard! I have no great idea how to do it properly. The approach that Thanos takes at query time is roughly that we start with some replica and then, if the gap gets too large, we switch over. But this has had numerous edge cases in the past. I wonder how we could improve it.
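
To make the "start with one replica, switch when the gap gets too large" idea concrete, here is a minimal toy model in Python. It is not the Thanos source code. It assumes that the replica not chosen receives a penalty of twice the delta between the last two emitted samples, that a fixed starting penalty of 5000 ms applies before any delta is known, and that timestamps are in milliseconds. The function name `dedup_penalty` and the sample data are made up for illustration.

```python
import bisect

INITIAL_PENALTY_MS = 5000  # assumed starting penalty before any delta is known


def dedup_penalty(a, b, initial_penalty=INITIAL_PENALTY_MS):
    """Merge sorted sample timestamps (ms) from two replicas.

    Toy model: only timestamps are tracked, not sample values.
    Returns a list of (timestamp, replica_name) pairs.
    """
    out = []
    last_t = None
    pen_a = pen_b = 0
    ia = ib = 0  # positions only move forward, like an iterator seek
    while True:
        if last_t is not None:
            # Seek each replica to its first sample at or after
            # last_t + 1 + penalty.
            ia = max(ia, bisect.bisect_left(a, last_t + 1 + pen_a))
            ib = max(ib, bisect.bisect_left(b, last_t + 1 + pen_b))
        a_done, b_done = ia >= len(a), ib >= len(b)
        if a_done and b_done:
            return out
        # Pick the replica whose next sample comes first.
        use_a = b_done or (not a_done and a[ia] <= b[ib])
        t = a[ia] if use_a else b[ib]
        # Penalize the replica that was NOT picked (assumed: 2x last delta).
        penalty = initial_penalty if last_t is None else 2 * (t - last_t)
        if use_a:
            pen_a, pen_b = 0, penalty
        else:
            pen_a, pen_b = penalty, 0
        out.append((t, "a" if use_a else "b"))
        last_t = t


# Replica "a" scrapes every 15 s but misses samples between 30 s and 120 s.
replica_a = [0, 15_000, 30_000, 120_000, 135_000]
# Replica "b" scrapes every 15 s without interruption, offset by 1 s.
replica_b = [1_000 + 15_000 * i for i in range(10)]

merged = dedup_penalty(replica_a, replica_b)
timestamps = [t for t, _ in merged]
print(merged)
```

In this model the sample of replica "b" at 46000 ms is skipped because "b" is still penalized when replica "a" stops delivering, so the merged series jumps from 30000 ms to 61000 ms even though "b" has no gap at all.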

ppietka-bp · 2024-08-29T11:54:50Z

Thanks for your reply. I look forward to solving this agonizing problem.

MichaHoffmann · 2024-08-29T18:48:33Z

Yeah I'm happy to brainstorm about this if you have an idea!

ppietka-bp · 2024-08-30T08:48:27Z

For the moment, I still have no idea where to start or what guidelines we should adopt, except, of course, that there should be one consistent data set after deduplication. I wonder whether applying the Compactor's deduplication algorithm ("--deduplication.func=penalty") to the Querier would solve the problem. Provided, of course, that it is not itself the cause.

MichaHoffmann · 2024-08-30T09:59:40Z

Penalty is the same algorithm that the querier uses though.

lachruzam · 2024-09-13T12:50:09Z

@MichaHoffmann Wouldn't putting a configurable upper bound on the penalty solve this issue (or at least allow fixing it by configuration)?

MichaHoffmann · 2024-09-13T15:23:47Z

> @MichaHoffmann Wouldn't putting a configurable upper bound on the penalty solve this issue (or at least allow fixing it by configuration)?

In the sense that we always switch replica if the gap is at least this configured size?

lachruzam · 2024-10-09T11:18:56Z

Sorry, I missed the response.
We've run into a similar issue. Checking the penalty algorithm, I didn't find any special treatment for the second replica.
However, the penalty for switching back to the first replica is 2x bigger. Additionally, there is no upper bound for it.
Hence the question.
Having an upper bound would make it possible to:

- prevent the penalty from exceeding the lookback period
- keep it in sync with the scrape interval (when the scrape interval is known)

Of course, it isn't the most elegant solution, but it could improve the situation.
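
The effect of such an upper bound can be sketched with a toy model in Python. This is not Thanos code, and `max_penalty` is a hypothetical parameter, not an existing flag. The model assumes the penalty for the replica not chosen is twice the delta between the last two emitted samples, a 30 s scrape interval, and a 5 minute lookback period. Timestamps are in milliseconds.

```python
import bisect

LOOKBACK_MS = 300_000  # assumed 5 minute lookback period


def dedup_capped(a, b, initial_penalty=5000, max_penalty=None):
    """Merge sorted timestamps (ms) of two replicas, toy penalty model.

    max_penalty is a hypothetical upper bound; None means unbounded.
    """
    out = []
    last_t = None
    pen_a = pen_b = 0
    ia = ib = 0  # positions only move forward, like an iterator seek
    while True:
        if last_t is not None:
            ia = max(ia, bisect.bisect_left(a, last_t + 1 + pen_a))
            ib = max(ib, bisect.bisect_left(b, last_t + 1 + pen_b))
        a_done, b_done = ia >= len(a), ib >= len(b)
        if a_done and b_done:
            return out
        use_a = b_done or (not a_done and a[ia] <= b[ib])
        t = a[ia] if use_a else b[ib]
        # Assumed rule: penalty is 2x the delta of the last two samples.
        penalty = initial_penalty if last_t is None else 2 * (t - last_t)
        if max_penalty is not None:
            penalty = min(penalty, max_penalty)  # the proposed upper bound
        if use_a:
            pen_a, pen_b = 0, penalty
        else:
            pen_a, pen_b = penalty, 0
        out.append(t)
        last_t = t


def largest_gap(ts):
    """Largest distance between two consecutive timestamps."""
    return max(later - earlier for earlier, later in zip(ts, ts[1:]))


# Replica "a": 30 s interval, down between 60 s and 300 s, then healthy.
replica_a = [0, 30_000, 60_000] + list(range(300_000, 720_001, 30_000))
# Replica "b": down after 31 s, delivers one sample at 241 s, then stops.
replica_b = [1_000, 31_000, 241_000]

unbounded = dedup_capped(replica_a, replica_b)
capped = dedup_capped(replica_a, replica_b, max_penalty=60_000)

print(largest_gap(unbounded), largest_gap(unbounded) > LOOKBACK_MS)
print(largest_gap(capped), largest_gap(capped) > LOOKBACK_MS)
```

Without a bound, the single late sample from replica "b" produces a penalty of 362 s against replica "a", so the healthy samples of "a" from 300 s to 600 s are skipped and the merged series has a 389 s hole, longer than the assumed lookback. With the penalty capped at twice the scrape interval, the largest remaining gap is the 181 s stretch where both replicas really were down.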
