[META]Extend ppl stats command functionality #3023

Open
YANG-DB opened this issue Sep 13, 2024 · 0 comments
Labels
enhancement (New feature or request), PPL (Piped processing language)

Comments

Member

YANG-DB commented Sep 13, 2024

High-level Review

The OpenSearch Piped Processing Language (PPL) currently lacks some of the advanced statistical aggregation capabilities provided by the eventstats command in the Splunk Search Processing Language (SPL).
This feature request proposes adding new functions and syntax to PPL so that these statistical calculations and aggregations can be run on event data.
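
For readers less familiar with SPL, the difference that motivates this request can be sketched in a few lines of Python. This is only an illustration of the semantics, not PPL or SPL code: a stats-style aggregation collapses events to one row per group, while an eventstats-style aggregation keeps every event and appends the group's aggregate to it. The helper names (`stats_avg`, `eventstats_avg`) and the sample fields are hypothetical.

```python
from collections import defaultdict

def stats_avg(events, field, by):
    """stats-style: collapse the events to one output row per group."""
    groups = defaultdict(list)
    for e in events:
        groups[e[by]].append(e[field])
    return [{by: k, f"avg_{field}": sum(v) / len(v)} for k, v in groups.items()]

def eventstats_avg(events, field, by):
    """eventstats-style: keep every event and append its group's aggregate."""
    per_group = {row[by]: row[f"avg_{field}"] for row in stats_avg(events, field, by)}
    return [{**e, f"avg_{field}": per_group[e[by]]} for e in events]

# Hypothetical sample events, for illustration only.
events = [
    {"host": "a", "latency": 10},
    {"host": "a", "latency": 30},
    {"host": "b", "latency": 5},
]

collapsed = stats_avg(events, "latency", "host")       # one row per host
annotated = eventstats_avg(events, "latency", "host")  # one row per event
```

Here `collapsed` has two rows (one per host), whereas `annotated` still has three rows, each carrying an extra `avg_latency` field.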

Proposed Functionality:

  1. Aggregate statistical calculations:

    • Calculate common statistical measures like sum, count, min, max, avg, etc., on specific fields or expressions.
    • Support grouping events by one or more fields and performing statistical calculations within each group.
    • Allow renaming the calculated fields with custom names.
  2. Conditional aggregations:

    • Perform statistical calculations based on conditional expressions or filters.
    • Evaluate conditional expressions for each event and aggregate the results (e.g., sum of a conditional expression).
  3. Chaining and nesting:

    • Enable chaining and nesting of statistical calculations, similar to how eventstats commands can be chained in SPL.
    • Allow performing multiple levels of aggregations and calculations in a single query.
  4. Integration with existing PPL syntax:

    • Seamlessly integrate the new statistical aggregation capabilities with the existing PPL syntax and functions.
    • Ensure compatibility with other PPL features and maintain the overall usability and readability of the language.
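
To make items 1 and 2 above concrete, here is a minimal Python sketch of the intended semantics: grouping by one or more fields, computing several aggregates per group, naming the outputs, and aggregating a conditional expression. It is not the proposed PPL implementation; the `stats` helper, its `aggs` argument shape, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def stats(events, aggs, by):
    """Group `events` by the fields in `by` and compute each named aggregate.

    `aggs` maps an output field name to a (reducer, per-event expression) pair.
    """
    groups = defaultdict(list)
    for e in events:
        groups[tuple(e[f] for f in by)].append(e)
    rows = []
    for key, members in groups.items():
        row = dict(zip(by, key))
        for name, (reduce_fn, expr) in aggs.items():
            row[name] = reduce_fn(expr(e) for e in members)
        rows.append(row)
    return rows

# Hypothetical sample events, for illustration only.
events = [
    {"operation_id": "op1", "status": "error", "latency": 120},
    {"operation_id": "op1", "status": "ok", "latency": 80},
    {"operation_id": "op2", "status": "error", "latency": 200},
    {"operation_id": "op2", "status": "error", "latency": 50},
]

result = stats(
    events,
    aggs={
        # Conditional aggregation: the analogue of sum(if(cond, 1, 0)).
        "error_count": (sum, lambda e: 1 if e["status"] == "error" else 0),
        "min_latency": (min, lambda e: e["latency"]),
        "max_latency": (max, lambda e: e["latency"]),
    },
    by=["operation_id"],
)
```

Passing the expression as a callable keeps the conditional case and the plain-field case on the same code path, which is roughly what allowing arbitrary expressions inside an aggregate function would mean in the query language.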

Examples:

  1. Calculate the sum of a conditional expression grouped by a field:
     stats sum(if(field1 = "value" and field2 like "%pattern%", 1, 0)) as conditional_sum by group_field
  2. Calculate minimum and maximum values of a field grouped by another field:
     stats min(latency_field) as min_latency, max(latency_field) as max_latency by operation_id
  3. Chain multiple statistical calculations:
     stats sum(count) as total_count by client_id | stats sum(total_count) as overall_total
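
The chained query in the last example can be read as two pipeline stages, where the rows produced by the first stats become the input of the second. The Python sketch below models only that data flow, under assumed sample data; `stats_sum` and `pipe` are hypothetical helpers, not part of PPL or any OpenSearch API.

```python
from collections import defaultdict
from functools import reduce

def stats_sum(field, out, by=None):
    """Return a pipeline stage computing sum(field) as `out`, optionally grouped by `by`."""
    def stage(rows):
        totals = defaultdict(int)
        for r in rows:
            totals[r[by] if by else None] += r[field]
        if by is None:
            # No grouping: a single output row holding the overall sum.
            return [{out: totals[None]}]
        return [{by: k, out: v} for k, v in totals.items()]
    return stage

def pipe(rows, *stages):
    """Feed the output rows of each stage into the next, like `|` in a query."""
    return reduce(lambda acc, stage: stage(acc), stages, rows)

# Hypothetical sample events, for illustration only.
events = [
    {"client_id": "c1", "count": 3},
    {"client_id": "c2", "count": 4},
    {"client_id": "c1", "count": 5},
]

per_client = pipe(events, stats_sum("count", "total_count", by="client_id"))
overall = pipe(
    events,
    stats_sum("count", "total_count", by="client_id"),
    stats_sum("total_count", "overall_total"),
)
```

The second stage can refer to `total_count` only because the first stage renamed its output, which is why the renaming support listed under item 1 matters for chaining.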

Support for PPL extended stats functionality is required for both:

YANG-DB added the enhancement, untriaged, and PPL labels Sep 13, 2024
YANG-DB changed the title from "[META]Extend stats ppl command functionality" to "[META]Extend ppl stats command functionality" Sep 14, 2024
YANG-DB removed the untriaged label Sep 16, 2024
YANG-DB moved this to Todo in PPL Commands Oct 9, 2024
YANG-DB removed this from PPL Commands Oct 9, 2024