
[azuremonitorreceiver] Add the ability to pull Resource Metrics using AZ Query Batch API #29593

Conversation

cparkins (Contributor)

Add the ability to pull Resource Metrics using the AZ Query Batch API. This is currently a proof of concept: it does not allow Resources to exist in different regions, and it does not filter based on dimensions the way the previous implementation did (by Resource).

Description:

This is an enhancement that tries to avoid the limitations of querying resources directly by using the AZ Query Batch API that was recently released in the Go SDK.

Link to tracking Issue:

Testing:
Tested locally against a Resource Group in a single region. Some issues still need to be investigated.

Documentation:
This will be updated once the Draft is discussed.

@github-actions github-actions bot added the cmd/configschema, cmd/otelcontribcol, and receiver/azuremonitor labels Nov 30, 2023
```diff
@@ -26,6 +26,7 @@ var (
 	errMissingClientSecret = errors.New(`"ClientSecret" is not specified in config`)
 	errMissingFedTokenFile = errors.New(`"FederatedTokenFile" is not specified in config`)
 	errInvalidCloud        = errors.New(`"Cloud" is invalid`)
+	errInvalidRegion       = errors.New(`"Region" is not specifiec in config`)
```
cparkins (Contributor Author)

Oops! typo.

github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 15, 2023
@ahurtaud commented Dec 21, 2023

I have pulled the code and compiled a version, tested it on one subscription and it's a really good start!

As in the first comment, we are seeing a few errors, but I am most concerned about this one:
"error": "POST https://northeurope.metrics.monitor.azure.com/subscriptions/56ffb7d8-dd02-4217-9f63-11093b6176ba/metrics:getBatch\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Bad Request\nERROR CODE: BadRequest\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"additionalInfo\": [\n {\n \"type\": \"string\",\n \"info\": \"TraceId=ca886d1c7e36172863695ed22c30acf2\"\n },\n {\n \"type\": \"string\",\n \"info\": \"ExceptionType=Microsoft.Online.Metrics.MetricsMP.Utilities.RPRequestFormatException\"\n }\n ],\n \"code\": \"BadRequest\",\n \"message\": \"Number of requests in the batch 301 exceeded max allowed: 50\"\n }\n}

I think this corresponds to this Microsoft limitation:

> Up to 50 unique resource IDs may be specified in each call. Each resource must belong to the same subscription, region, and be of the same resource type. (source)

I would also like to start a discussion (maybe for a second PR / feature request) (ping @altuner @codeboten):
At Amadeus, we have tons of subscriptions (thousands); would it be possible to have auto-discovery of them as well?

  • there is a GET subscriptions endpoint available at https://management.azure.com/subscriptions?api-version=2020-01-01
  • and the resource group list also gives the location (needed for the new getBatch API endpoint), e.g.: https://management.azure.com/subscriptions/14b86a40-8d8f-4e69-abaf-42cbb0b8a331/resourcegroups?api-version=2019-10-01 (see the sketch after this list)
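
For illustration, a minimal sketch of what that auto-discovery could look like using the Go SDK's armsubscriptions and armresources clients (the equivalents of the two REST endpoints above). This is not part of the receiver today, just an assumption about how it could be wired:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armsubscriptions"
)

func main() {
	ctx := context.Background()
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		log.Fatal(err)
	}

	// List every subscription visible to the credential (GET /subscriptions).
	subsClient, err := armsubscriptions.NewClient(cred, nil)
	if err != nil {
		log.Fatal(err)
	}
	subsPager := subsClient.NewListPager(nil)
	for subsPager.More() {
		page, err := subsPager.NextPage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		for _, sub := range page.Value {
			// For each subscription, list resource groups; each group's
			// Location could drive the regional getBatch endpoint.
			rgClient, err := armresources.NewResourceGroupsClient(*sub.SubscriptionID, cred, nil)
			if err != nil {
				log.Fatal(err)
			}
			rgPager := rgClient.NewListPager(nil)
			for rgPager.More() {
				rgPage, err := rgPager.NextPage(ctx)
				if err != nil {
					log.Fatal(err)
				}
				for _, rg := range rgPage.Value {
					fmt.Printf("subscription=%s group=%s location=%s\n",
						*sub.SubscriptionID, *rg.Name, *rg.Location)
				}
			}
		}
	}
}
```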

WDYT? (Please tell me if you want me to raise a new issue for this feature request; this getBatch will be a requirement first anyway.)
Thank you,
Alban

@cparkins (Contributor Author)

@ahurtaud - I ran into a few errors, but not that one specifically. I didn't realize there was a limit, to be honest. The documentation was kind of hard to understand. I had some concerns about the number of Dimensions I was pulling, so I did some testing there; it would be really nice to be able to determine the number of dimensions a Metric can have instead of having to hard-code the values. I think it would be pretty easy to break the requests up into batches of 50 (a sketch follows below); I guess I didn't read all of the documentation :-D
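
To make that concrete, a minimal sketch of the batching, assuming a hypothetical chunkResourceIDs helper (not in the PR) and the 50-ID limit quoted above:

```go
package main

import "fmt"

// chunkResourceIDs splits resource IDs into batches no larger than
// maxBatchSize, matching the documented getBatch limit of 50 unique
// resource IDs per call. Hypothetical helper, not part of the PR.
func chunkResourceIDs(ids []string, maxBatchSize int) [][]string {
	var batches [][]string
	for len(ids) > maxBatchSize {
		batches = append(batches, ids[:maxBatchSize])
		ids = ids[maxBatchSize:]
	}
	if len(ids) > 0 {
		batches = append(batches, ids)
	}
	return batches
}

func main() {
	ids := make([]string, 301)
	for i := range ids {
		ids[i] = fmt.Sprintf("/subscriptions/.../resource-%d", i)
	}
	// 301 IDs -> 7 batches (6 of 50 and 1 of 1) instead of one oversized call.
	fmt.Println(len(chunkResourceIDs(ids, 50)))
}
```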

@github-actions github-actions bot removed the Stale label Dec 22, 2023
@nslaughter (Contributor) commented Dec 28, 2023

Want to ping those who worked on this previously, as I also think the component would benefit from switching to the azquery interfaces in the Azure SDK; that provides a lot of leverage for the purposes of this code. However, when experimenting with the changes I found it difficult to test all paths, and similarly difficult to make the result of changes clear to reviewers, so I thought we could benefit from all of the following.

  1. Separate the fetching responsibility, so the mechanism for each type of fetch (i.e. resources, metric definitions, metric values) can be specified in a simplified strategy (see the sketch after this list).
  2. Separate the caching/providing responsibility, so it is simple for reviewers to verify when state (e.g. azureScraper.resources) is updated.
  3. Potentially change the update trigger for each state, so a pull-through flow model isn't our only option. This would provide some optionality for controlling read consistency if that becomes a concern later.
  4. With more paths tested and verified in a straightforward way, start introducing changes to use higher-level APIs like azquery.
  5. It looks like this would simplify scraping logs with azquery as well, though I haven't investigated it extensively.
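
For point 1, a minimal sketch of what that separation might look like; the interface names are hypothetical, not code that exists in the receiver:

```go
package azuremonitorreceiver // hypothetical placement

import (
	"context"

	"github.com/Azure/azure-sdk-for-go/sdk/monitor/azquery"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources"
)

// Each fetch type gets its own strategy, so tests can swap in fakes
// without standing up the whole scraper.
type resourceFetcher interface {
	FetchResources(ctx context.Context) ([]*armresources.GenericResourceExpanded, error)
}

type metricDefinitionFetcher interface {
	FetchDefinitions(ctx context.Context, resourceID string) ([]*azquery.MetricDefinition, error)
}

type metricValueFetcher interface {
	FetchValues(ctx context.Context, resourceID string, metricNames []string) (azquery.MetricsClientQueryResourceResponse, error)
}
```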

In the long run these changes could help with scalability, though it's unknown when/if that would be helpful. My purpose in proposing the changes now is to simplify testing on more paths.

Before I opened an issue for changes around this I wanted to get input from those who have looked at this receiver and might have informed opinions to share.

Please share any thoughts you have on this: @cparkins @ahurtaud

@nslaughter (Contributor)

> I didn't realize there was a limit, to be honest. The documentation was kind of hard to understand.

I agree. Maybe it's just easy to confuse me when there are multiple numbers involved. Most things in the SDK are straightforward once you get familiar with their patterns and the Azure object model, but I believe some calls are limited to 20, I recall reading that one was limited to 10, and @ahurtaud has errors that say the limit is 50 on something else. Currently the default config in factory.go says the maximum number of metrics in a call is 20, though it's configurable. The scraper uses that field in this ad hoc batching mechanism; that's what the start and end variables are for, but I find it difficult to understand, much less verify.

> I had some concerns about the number of Dimensions I was pulling, so I did some testing there; it would be really nice to be able to determine the number of dimensions a Metric can have instead of having to hard-code the values.

The current mechanism that deals with dimensions is the metricsCompositeKey, where the dimensions field is a string built in this stanza from the metric definitions we get from Azure. Dimensions are sorted before being joined into a comma-delimited string, so that metric values sharing the same key can be requested in batches (sketched below). (Tell me if the rationale doesn't make sense.)
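
Here is the shape of that idea, assuming illustrative names rather than the receiver's exact code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metricsCompositeKey groups metrics that can be requested together:
// dimensions are sorted, then joined into one comma-delimited string,
// so two metrics with the same dimension set map to the same key.
type metricsCompositeKey struct {
	timeGrain  string
	dimensions string
}

func newCompositeKey(timeGrain string, dims []string) metricsCompositeKey {
	sorted := append([]string(nil), dims...)
	sort.Strings(sorted)
	return metricsCompositeKey{timeGrain: timeGrain, dimensions: strings.Join(sorted, ",")}
}

func main() {
	a := newCompositeKey("PT1M", []string{"ApiName", "GeoType"})
	b := newCompositeKey("PT1M", []string{"GeoType", "ApiName"})
	fmt.Println(a == b) // true: dimension order doesn't split the batch
}
```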

> I think it would be pretty easy to break the requests up into batches of 50.

Configurable batch sizes would definitely help. We could depend on a batcher package like this one, for example, make/copy our own like it, or use methods that work almost as well if controlled in the context. The nice thing about the fancy one is that we don't have to manage keying on the metrics composite key: you specify how to aggregate batches via options, and calls can fire as soon as you have a full batch or a configured batch window expires.
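
For example, a minimal sketch (not the linked package) of a batcher that flushes on either a full batch or an expired window:

```go
package main

import (
	"fmt"
	"time"
)

// runBatcher collects items and calls flush when either the batch is
// full or the timeout elapses, whichever comes first.
func runBatcher[T any](in <-chan T, size int, timeout time.Duration, flush func([]T)) {
	var batch []T
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for {
		select {
		case item, ok := <-in:
			if !ok {
				if len(batch) > 0 {
					flush(batch) // drain what's left on shutdown
				}
				return
			}
			batch = append(batch, item)
			if len(batch) >= size {
				flush(batch)
				batch = nil
				if !timer.Stop() {
					select { // drain a pending fire so Reset starts clean
					case <-timer.C:
					default:
					}
				}
				timer.Reset(timeout)
			}
		case <-timer.C:
			if len(batch) > 0 {
				flush(batch)
				batch = nil
			}
			timer.Reset(timeout)
		}
	}
}

func main() {
	in := make(chan int)
	go func() {
		for i := 0; i < 120; i++ {
			in <- i
		}
		close(in)
	}()
	runBatcher(in, 50, 100*time.Millisecond, func(b []int) {
		fmt.Println("flushed", len(b))
	})
}
```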

Maybe this provides some useful information on the code's current state. Hopefully it isn't a complete waste of time once the PoC version is considered. I plan to have a closer look at that code in a couple of hours.

@nslaughter (Contributor)

I like where the PR is headed. I'm planning to open an issue along the lines of "migrate to the azquery package in the SDK", and then we can play with using the QueryResource path or the Batch path. I plan to cite this PR as some of the PoC work in the issue. Happy to see the transition to azquery completed in this PR or opened as something different. I think both of these paths could ease the burden as far as the layers of calls and cache go.

@cparkins (Contributor Author) commented Jan 2, 2024

> a batcher package like this for example

@nslaughter I like your thoughts on how to improve the receiver in general. While writing this code I thought some time should be taken to consider different ways to pull the various bits of data used when scraping. Unfortunately I didn't take much time to look at other ways we could pull the data that might be more efficient. I really like the idea of using the batcher, though. And I think that eventually this code will run into issues with getting consistent metrics because of the way it's written (multiple timers triggering at multiple intervals). I'm open to ideas based on this Draft PR and look forward to seeing what you think once you get more in depth.

@nslaughter (Contributor)

> I had some concerns about the number of Dimensions I was pulling, so I did some testing there; it would be really nice to be able to determine the number of dimensions a Metric can have instead of having to hard-code the values.

The docs say that custom metrics can have up to 10 dimensions. The metric definitions list the preset dimensions for predefined ones. Does this help, or were you getting at something else?

@cparkins (Contributor Author)

> I had some concerns about the number of Dimensions I was pulling, so I did some testing there; it would be really nice to be able to determine the number of dimensions a Metric can have instead of having to hard-code the values.

> The docs say that custom metrics can have up to 10 dimensions. The metric definitions list the preset dimensions for predefined ones. Does this help, or were you getting at something else?

That does help, actually. I was trying to determine the number of standard dimensions for a Storage Account and got to 30... I probably need to look at the REST responses a little more closely to solve this one properly.

github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 27, 2024
…calculating the nearest minute, then adding a check to ensure the last update was not within the time grain. In addition, add more logging around errors received from the Batch API by casting to the Azure Response Error type and logging the full error.
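
For reference, a minimal sketch of what this commit describes, with hypothetical function names and zap assumed for logging:

```go
package main

import (
	"errors"
	"time"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
	"go.uber.org/zap"
)

// shouldScrape aligns "now" to the minute boundary and skips the fetch
// if the last update still falls inside the current time grain.
func shouldScrape(lastUpdate time.Time, timeGrain time.Duration, now time.Time) bool {
	nearestMinute := now.Truncate(time.Minute)
	return nearestMinute.Sub(lastUpdate) >= timeGrain
}

// logBatchError unwraps an *azcore.ResponseError so the Azure error code
// and status are logged rather than a truncated message.
func logBatchError(logger *zap.Logger, err error) {
	var respErr *azcore.ResponseError
	if errors.As(err, &respErr) {
		logger.Error("batch API request failed",
			zap.String("errorCode", respErr.ErrorCode),
			zap.Int("statusCode", respErr.StatusCode))
		return
	}
	logger.Error("batch API request failed", zap.Error(err))
}

func main() {
	logger, _ := zap.NewDevelopment()
	now := time.Now()
	if shouldScrape(now.Add(-2*time.Minute), time.Minute, now) {
		logger.Info("time grain elapsed; scraping")
	}
}
```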
github-actions bot

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 27, 2024
github-actions bot

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Mar 12, 2024
@tesharp commented Nov 20, 2024

@cparkins Any reason this was not continued? With the current throttling limits it can be a hassle to get all the wanted metrics from Azure.

@celian-garcia commented Nov 20, 2024

Hello @tesharp. I have taken over this topic on the Amadeus side. The plan is to split the work into several PRs for readability.
