
[processor]: aggregate_on_attributes function in transform processor not working as expected in conjunction with keep_matching_keys #36517

Open
Shindek77 opened this issue Nov 25, 2024 · 7 comments
Labels
processor/transform Transform processor question Further information is requested

Comments

Shindek77 commented Nov 25, 2024

Component(s)

processor/transformprocessor

Describe the issue you're reporting

Hello,

Our goal: we want to reduce the number of metric labels that are not required in our time series. As output, the number of time series should shrink, and each new, reduced time series should carry a value aggregated from the original time series that existed before the labels were dropped.

Example:
Input Metrics Data

http_requests{region="us-east", service_name="order-service", method="GET", status="200"} 10
http_requests{region="us-east", service_name="order-service", method="GET", status="500"} 5
http_requests{region="us-east", service_name="billing-service", method="GET", status="200"} 8
http_requests{region="us-west", service_name="order-service", method="POST", status="200"} 7
http_requests{region="us-west", service_name="order-service", method="POST", status="500"} 3

Goal:

Resource attributes: region, service_name
Datapoint attributes: method, status

Keep:
region (resource attribute)
method (datapoint attribute)

Drop:
service_name (resource attribute)
status (datapoint attribute)

Final Results:

http_requests{region="us-east", method="GET"} 23
http_requests{region="us-west", method="POST"} 10
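The intended reduction can be sketched in plain Python. This only illustrates the arithmetic (group by the kept labels, sum the values); the helper name `reduce_series` is illustrative, not part of the collector:

```python
# Sketch of the intended label reduction: drop service_name and status,
# then sum data points that share the remaining labels (region, method).
# Series are modeled as (labels_dict, value) pairs.

def reduce_series(series, keep_labels):
    """Group series by the kept labels and sum their values."""
    aggregated = {}
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep_labels))
        aggregated[key] = aggregated.get(key, 0) + value
    return aggregated

series = [
    ({"region": "us-east", "service_name": "order-service",   "method": "GET",  "status": "200"}, 10),
    ({"region": "us-east", "service_name": "order-service",   "method": "GET",  "status": "500"}, 5),
    ({"region": "us-east", "service_name": "billing-service", "method": "GET",  "status": "200"}, 8),
    ({"region": "us-west", "service_name": "order-service",   "method": "POST", "status": "200"}, 7),
    ({"region": "us-west", "service_name": "order-service",   "method": "POST", "status": "500"}, 3),
]

result = reduce_series(series, keep_labels={"region", "method"})
# {region="us-east", method="GET"}  -> 23
# {region="us-west", method="POST"} -> 10
```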

Solution: after some analysis we learned that we can use the transform processor in conjunction with the aggregate_on_attributes function.
Per the docs, aggregate_on_attributes aggregates all datapoints in the metric based on the supplied attributes, and removes all datapoint attributes except the ones specified in the attributes parameter.

However, after testing we found that it works only on datapoint attributes, not on the resource attributes present in our metrics.

The docs also say that aggregate_on_attributes can be used in conjunction with keep_matching_keys or delete_matching_keys. So we tried that: keep_matching_keys to keep only the required resource attributes and drop the others, and aggregate_on_attributes with the list of datapoint attributes we want to keep, which should also perform the aggregation.

Configuration for the same:

data:
  relay: |
    exporters:
      debug:
        verbosity: detailed
      otlphttp/test-vm:
        compression: gzip
        encoding: proto
        endpoint: http://victoria-metrics-cluster-vminsert.metrics-ns.svc.cluster.local:8480/insert/2/opentelemetry
        timeout: 30s
        tls:
          insecure: true
    processors:
      batch:
        timeout: 10s
      groupbyattrs:
      transform/TruncateTime:
        metric_statements:
          - context: datapoint
            statements:
              - set(time, TruncateTime(time, Duration("10s")))
      transform:
        metric_statements:
          - context: metric
            statements:
              - keep_matching_keys(resource.attributes, "^(region).*")
              - aggregate_on_attributes("sum", ["method"])
    service:
      pipelines:
        metrics/-test-label-reduction-transform:
          exporters:
            - otlphttp/test-vm
          processors:
            - batch
            - groupbyattrs
            - transform/TruncateTime
            - transform
          receivers:
            - otlp

But when keep_matching_keys(resource.attributes, "^(region).*") runs, it keeps only region from the resource attributes and removes service_name. The intermediate result comes out as:

http_requests{region="us-east", method="GET", status="200"} (sometimes 10, sometimes 8)
http_requests{region="us-east", method="GET", status="500"} 5
http_requests{region="us-west", method="POST", status="200"} (sometimes 7, sometimes 3)

Then aggregate_on_attributes("sum", ["method"]) runs, which gives the final result:

http_requests{region="us-east", method="GET"} (sometimes 10, sometimes 8) + 5
http_requests{region="us-west", method="POST"} (sometimes 7, sometimes 3)

But per the docs, if both worked together it should give:

http_requests{region="us-east", method="GET"} 23
http_requests{region="us-west", method="POST"} 10

So please help with how we can get the results we want. We do get the expected output if we use only aggregate_on_attributes, but it works only on datapoint attributes, and we also have to run only one replica of the OpenTelemetry collector.

So how can we use aggregate_on_attributes with many OTel collector replicas? And why is it behaving like this?

@Shindek77 Shindek77 added the needs triage New item requiring triage label Nov 25, 2024
@Shindek77 Shindek77 changed the title aggregate_on_attributes function in transform processor not working as expected in conjunction with keep_matching_keys [processor]: aggregate_on_attributes function in transform processor not working as expected in conjunction with keep_matching_keys Nov 25, 2024
@bacherfl
Contributor

Hi @Shindek77 !

I just looked into this - if I understand correctly, the desired behavior might work with the following config, which first removes the service_name attribute and moves the region resource attribute to the datapoint attributes. After that, the region attribute can be used to regroup the data points into one resource per region. Finally, the aggregate_on_attributes function can be used to group the data points by the method attribute:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
    timeout: 10s
  groupbyattrs:
    keys:
      - region
  transform/RemoveServiceName:
    metric_statements:
      - context: resource
        statements:
          - keep_matching_keys(attributes, "^(region).*")
  transform/MoveRegionToDataPoint:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["region"], resource.attributes["region"])
  transform/TruncateTime:
    metric_statements:
      - context: datapoint
        statements:
          - set(time, TruncateTime(time, Duration("10s")))
  transform:
    metric_statements:
      - context: metric
        statements:
          - aggregate_on_attributes("sum", ["method"])


exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors:
        - batch
        - transform/RemoveServiceName
        - transform/MoveRegionToDataPoint
        - groupbyattrs
        - transform/TruncateTime
        - transform
      exporters: [debug]

Hope this helps - if not, please let me know and I will continue to look into this

@bacherfl bacherfl added processor/transform Transform processor question Further information is requested labels Nov 26, 2024

Pinging code owners for processor/transform: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@bacherfl bacherfl removed the needs triage New item requiring triage label Nov 26, 2024

Shindek77 commented Nov 26, 2024

> (quoting @bacherfl's suggested configuration above)

Hello @bacherfl,
Thanks for the quick reply. I tried the same, but the issue is that when it runs, the first processor, transform/RemoveServiceName, removes the service_name resource attribute, which makes series 1 and 3 (and series 4 and 5) identical, so we get the following intermediate time series:

http_requests{region="us-east", method="GET", status="200"} (sometimes 10, sometimes 8)
http_requests{region="us-east", method="GET", status="500"} 5
http_requests{region="us-west", method="POST", status="200"} (sometimes 7, sometimes 3)

After that, the remaining processors run, meaning aggregate_on_attributes ends up operating on the three time series above, so the final answer comes out as:

http_requests{region="us-east", method="GET"} (sometimes 10, sometimes 8) + 5
http_requests{region="us-west", method="POST"} (sometimes 7, sometimes 3)

Just an idea:
As you showed, we can move the region resource attribute to the datapoint attributes using the set function. Could we do this with a regex so that we first move all resource attributes to the datapoint attributes, regardless of their names, and then use aggregate_on_attributes on the datapoint attributes we need? It would then perform the aggregation over the given list and remove the other attributes.

For the example above: first move all resource attributes (both region and service_name) to the datapoint attributes, then use aggregate_on_attributes to group the data points by the method and region datapoint attributes. The final result would then be as expected:

http_requests{region="us-east", method="GET"} 23
http_requests{region="us-west", method="POST"} 10

@bacherfl
Contributor

Thanks for the feedback @Shindek77 - one possibility to move all resource attributes to the datapoints, making them available to aggregate_on_attributes, would be the config below:

  transform/ResourceAttributesToDataPoint:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["resource"], resource.attributes)
          - flatten(attributes)
  transform/TruncateTime:
    metric_statements:
      - context: datapoint
        statements:
          - set(time, TruncateTime(time, Duration("10s")))
  transform:
    metric_statements:
      - context: metric
        statements:
          - aggregate_on_attributes("sum", ["method", "resource.region"])

Note: the flatten function is used because the aggregate_on_attributes function does not seem to support access to nested properties.
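To illustrate what flatten does here (a Python sketch of the behavior described above, not the OTTL implementation): after the resource attributes map is copied into attributes["resource"], flattening turns the nested map into dotted top-level keys such as "resource.region", which aggregate_on_attributes can then reference directly.

```python
# Sketch of flatten() semantics: nested maps become dotted top-level keys.

def flatten(attrs, prefix=""):
    flat = {}
    for key, value in attrs.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))   # recurse into nested maps
        else:
            flat[name] = value
    return flat

datapoint_attrs = {
    "method": "GET",
    "status": "200",
    "resource": {"region": "us-east", "service_name": "order-service"},
}
print(flatten(datapoint_attrs))
# {'method': 'GET', 'status': '200',
#  'resource.region': 'us-east', 'resource.service_name': 'order-service'}
```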

@Shindek77
Author

Hello @bacherfl, I tried the approach below:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  transform/ResourceAttributesToDataPoint:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes[""], resource.attributes)
          - flatten(attributes)
  transform/ResourceAttributesDeletion:
    metric_statements:
      - context: resource
        statements:
          - delete_matching_keys(attributes, "(?i).*")
  transform:
    metric_statements:
      - context: metric
        statements:
          - aggregate_on_attributes("sum", ["region", "method"])

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      exporters:
        - debug
      processors:
        - filter
        - batch
        - transform/ResourceAttributesToDataPoint
        - transform/ResourceAttributesDeletion
        - groupbyattrs
        - transform/TruncateTime
        - transform
      receivers:
        - otlp

With this, it first turns all the resource attributes into datapoint attributes with the same names (region and service_name), as I can see in the otel logs. After that, another processor is needed to delete all of the actual resource attributes. But then, during aggregation, we get only the two time series as expected, yet the value is not aggregated properly.
Final results:

http_requests{region="us-east", method="GET"} value: 10, 5, or 8
http_requests{region="us-west", method="POST"} value: 7 or 3
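A possible explanation for these flapping values (an assumption, not confirmed from the collector source): aggregation only operates on the data points present in one batch, so points belonging to the same final series that arrive in different batches (or on different replicas) are summed separately and exported as competing values instead of one total. A minimal Python sketch of that failure mode:

```python
# Sketch of the suspected failure mode: per-batch aggregation.
# Points for the same final series that land in different batches
# are never summed together.

def aggregate_batch(batch, keep_labels):
    out = {}
    for labels, value in batch:
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep_labels))
        out[key] = out.get(key, 0) + value
    return out

# The us-east series split across two batches (e.g. two scrapes or replicas).
batch_a = [({"region": "us-east", "method": "GET", "status": "200"}, 10),
           ({"region": "us-east", "method": "GET", "status": "500"}, 5)]
batch_b = [({"region": "us-east", "method": "GET", "status": "200"}, 8)]

key = (("method", "GET"), ("region", "us-east"))
print(aggregate_batch(batch_a, {"region", "method"})[key])  # 15
print(aggregate_batch(batch_b, {"region", "method"})[key])  # 8
# The backend receives 15 and 8 for the same series instead of a single 23,
# which would show up as alternating values rather than the expected total.
```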

@Shindek77
Author

Shindek77 commented Dec 11, 2024

Hello @bacherfl,

As you know, we receive a lot of metric data with many labels/attributes (resource and datapoint attributes) on our otel collector, which is deployed as a Deployment on our k8s cluster.

We have tested reducing the metric labels that are not required in our time series; the output time series should carry only the required labels and the aggregated value.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
  groupbyattrs:
  transform/TruncateTime:
    metric_statements:
      - context: datapoint
        statements:
          - set(time, TruncateTime(time, Duration("10s")))
  transform/RemoveOtherResourceAttributes:
    metric_statements:
      - context: resource
        statements:
          - delete_matching_keys(attributes, "^().*")
  transform/ResourceAttributesToDataPoint:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["resource"], resource.attributes)
          - flatten(attributes)
  transform:
    metric_statements:
      - context: metric
        statements:
          - aggregate_on_attributes("sum", ["l7_serviceName", "resource.service.name", "service.name", "resource.service_name", "service_name"])

exporters:
  debug:
    verbosity: detailed
  otlphttp/test-vm-test-label-reduction:
    compression: gzip
    encoding: proto
    endpoint: http://victoria-metrics-cluster-vminsert.metrics-ns.svc.cluster.local:8480/insert/8/opentelemetry
    timeout: 30s
    tls:
      insecure: true
  otlphttp/test-vm-test-without-label-reduction:
    compression: gzip
    encoding: proto
    endpoint: http://victoria-metrics-cluster-vminsert.metrics-ns.svc.cluster.local:8480/insert/9/opentelemetry
    timeout: 30s
    tls:
      insecure: true

service:
  pipelines:
    metrics/with-metrics-label-reduction:
      exporters:
        - otlphttp/test-vm-test-label-reduction
      processors:
        - batch
        - transform/ResourceAttributesToDataPoint
        - transform/ResourceAttributesDeletion
        - groupbyattrs
        - transform/TruncateTime
        - transform
      receivers:
        - otlp
    metrics/without-metrics-label-reduction:
      exporters:
        - otlphttp/test-vm-test-without-label-reduction
      processors:
        - batch
        - transform/ResourceAttributesToDataPoint
        - transform/ResourceAttributesDeletion
        - groupbyattrs
        - transform/TruncateTime
        - transform
      receivers:
        - otlp

With the above configuration we have tested two cases:

1. Keeping otel-collector replicas=1
With a single replica we get aggregated values, but sometimes we see spikes, as shown in the screenshots below.
Without processors:
[screenshot]
With processors:
[screenshot]

2. Keeping otel-collector replicas=6
With multiple replicas the output is quite erratic: for some metrics the aggregation happens sometimes, and for other metrics it is not done properly.
Without processors:
[screenshot]
With processors:
[screenshot]

Could you please help with what is going wrong here and how we should set this up?

@bacherfl
Contributor

bacherfl commented Dec 12, 2024

Hi @Shindek77 and thanks for the update - I will look into this today and try to see what is happening here. Do you also have some example payload that can be sent to the otlp receiver to reproduce this behavior?

Also, if you are using multiple replicas, keep in mind that metrics that should be grouped together need to be sent to the same instance of the collector - is there currently any mechanism in place that ensures that?
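One common way to get that guarantee is a two-tier layout: the first tier of collectors uses the loadbalancing exporter to route data to a second tier that performs the aggregation, keyed so that all data points belonging to the same series land on the same second-tier instance. A sketch of the first-tier exporter config follows; the routing_key values supported for metrics and the resolver details should be verified against the loadbalancing exporter README for your contrib version, and the hostname below is a placeholder:

```
exporters:
  loadbalancing:
    # Route by resource so all points for the same resource reach the
    # same backend collector; check the README for the metric routing_key
    # values available in your release.
    routing_key: resource
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # Placeholder: a headless Service fronting the aggregating tier.
        hostname: otel-aggregator-headless.metrics-ns.svc.cluster.local
        port: 4317
```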
