[FEATURE] New expand_field PPL Command #3016

Open
YANG-DB opened this issue Sep 13, 2024 · 0 comments
Labels
enhancement New feature or request PPL Piped processing language


Add a new PPL command, expand_field, that expands arrays and nested objects into separate events.

Is your feature request related to a problem? Please describe.
OpenSearch's Piped Processing Language (PPL) currently lacks an efficient way to expand arrays and nested objects into separate events, similar to SQL's UNNEST or JSON expansion functions. This limitation hinders the analysis of complex data structures, particularly when working with JSON logs or documents containing arrays or nested objects.

Describe the solution you'd like
We propose adding a new command to PPL that would allow users to expand arrays and nested objects into separate events, similar to SQL's UNNEST function, but with additional flexibility.

The functionality should:

  1. Expand array fields or nested objects into separate events (similar to SQL's UNNEST)
  2. Retain all other fields from the original event in each new event (which SQL typically achieves only by pairing UNNEST with a join)
  3. Support nested fields and complex JSON structures (going beyond basic SQL capabilities)
  4. Allow for subsequent processing of each expanded value in the PPL pipeline
  5. Work seamlessly with unstructured or semi-structured data (unlike SQL, which typically requires predefined schemas)
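
To make the intended semantics concrete, here is a minimal Python sketch of requirements 1 to 3, run over in-memory events. It is not the PPL engine's implementation: the function name, the dotted-path convention, and the handling of edge cases are assumptions for illustration only. In particular, it assumes events with no array at the path pass through unchanged and that an empty array produces no events (as CROSS JOIN UNNEST would).

```python
from copy import deepcopy
from typing import Any, Iterable, Iterator


def expand_field(events: Iterable[dict], path: str) -> Iterator[dict]:
    """Yield one event per element of the array found at dotted `path`.

    Hypothetical helper: models the proposed command, not an existing API.
    """
    keys = path.split(".")
    for event in events:
        # Walk down to the parent object of the target field.
        parent: Any = event
        for key in keys[:-1]:
            parent = parent.get(key) if isinstance(parent, dict) else None
        value = parent.get(keys[-1]) if isinstance(parent, dict) else None

        if not isinstance(value, list):
            # Assumption: no array at `path` means the event passes through.
            yield event
            continue

        for element in value:
            # Copy the whole event so every other field is retained,
            # then replace the array with a single element.
            expanded = deepcopy(event)
            target = expanded
            for key in keys[:-1]:
                target = target[key]
            target[keys[-1]] = element
            yield expanded
```

Calling expand_field(events, "order.items") on an event whose order.items array holds two objects would yield two events, each carrying the original top-level fields plus one of the two objects under order.items.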

SQL-like example and comparison:
Consider this SQL-like syntax:

SELECT * FROM my_index
CROSS JOIN UNNEST(items) AS expanded_item
WHERE expanded_item.status = 'active'

The proposed OpenSearch PPL equivalent might look like:

source = my_index
| parse my_nested_field `?<items>[*]` as items
| expand_field items
| where items.status = "active"
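
As a rough illustration of what both queries are expected to return, the Python sketch below emulates the expand and filter stages on sample data. The sample documents, the expand and where helpers, and the field names other than items and status are invented for this example. The parse step is omitted by assuming items is already an array field.

```python
def expand(events, field):
    """One output event per array element; other fields are kept (hypothetical helper)."""
    for event in events:
        for item in event.get(field) or []:
            yield {**event, field: item}


def where(events, predicate):
    """Keep only events that satisfy the predicate (hypothetical helper)."""
    return (event for event in events if predicate(event))


# Invented sample data standing in for my_index.
my_index = [
    {"order_id": 1, "items": [{"sku": "A", "status": "active"},
                              {"sku": "B", "status": "retired"}]},
    {"order_id": 2, "items": [{"sku": "C", "status": "active"}]},
]

# Equivalent of: source = my_index | expand_field items | where items.status = "active"
result = list(where(expand(my_index, "items"),
                    lambda e: e["items"]["status"] == "active"))

for row in result:
    print(row)
```

Each surviving row keeps order_id from its parent document, which is the behaviour the CROSS JOIN provides in the SQL version.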

Key differences and advantages:

  1. No need for explicit JOIN syntax, making it more intuitive for log analysis
  2. Automatic handling of nested structures without need for complex JSON parsing functions
  3. Ability to work with dynamic schemas and unstructured data

Describe alternatives you've considered
Current alternatives include:

  1. Using complex JSON path queries, which can be cumbersome
  2. Processing the data outside of OpenSearch, reducing real-time analysis capabilities

Additional context
This feature would bridge the gap between SQL's structured data handling and the need for flexible, real-time analysis of semi-structured log data. It combines the power of SQL's UNNEST with the flexibility required for log and event processing.

Potential Impact

  • Simplified queries for complex data structures in logs and events
  • Enhanced real-time analytics capabilities for nested JSON data
  • Improved performance compared to client-side processing of nested structures
  • Better alignment with SQL-like functionality while maintaining PPL's simplicity

Proposed Implementation
The new command (e.g., expand_field) could be added to the PPL engine, combining the semantics of SQL's UNNEST with the flexibility needed for unstructured log data.

@YANG-DB YANG-DB added enhancement New feature or request untriaged PPL Piped processing language labels Sep 13, 2024
@YANG-DB YANG-DB self-assigned this Sep 13, 2024
@YANG-DB YANG-DB removed the untriaged label Sep 14, 2024
@YANG-DB YANG-DB removed their assignment Sep 14, 2024