Is your feature request related to a problem?
Add a new PPL expand_field command that adds array and nested object expansion functionality to PPL.
Is your feature request related to a problem? Please describe.
OpenSearch's Piped Processing Language (PPL) currently lacks an efficient way to expand arrays and nested objects into separate events, similar to SQL's UNNEST or JSON expansion functions. This limitation hinders the analysis of complex data structures, particularly when working with JSON logs or documents containing arrays or nested objects.
Describe the solution you'd like
We propose adding a new command to PPL that would allow users to expand arrays and nested objects into separate events, similar to SQL's UNNEST function, but with additional flexibility.
The functionality should:
Expand array fields or nested objects into separate events (similar to SQL's UNNEST)
Retain all other fields from the original event in each new event (addressing a limitation of SQL UNNEST)
Support nested fields and complex JSON structures (going beyond basic SQL capabilities)
Allow for subsequent processing of each expanded value in the PPL pipeline
Work seamlessly with unstructured or semi-structured data (unlike SQL, which typically requires predefined schemas)
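The expansion behaviour listed above can be sketched in Python, modelling each event as a dict. This is a minimal illustration under stated assumptions, not the proposed implementation: the function name `expand_field` simply mirrors the command name, and the choices to pass a missing or non-array field through unchanged and to drop events whose array is empty (as a SQL CROSS JOIN UNNEST would) are assumptions.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def expand_field(event: Event, field: str) -> List[Event]:
    """Hypothetical sketch: emit one event per element of event[field].

    Assumptions (not from the proposal): a missing or non-list field passes
    the event through unchanged, and an empty list yields no events.
    """
    value = event.get(field)
    if not isinstance(value, list):
        return [event]
    # Shallow-copy the event so every other field is retained, and replace
    # the array with a single element; key order is preserved.
    return [{**event, field: element} for element in value]


event = {"host": "web-1", "items": [{"status": "active"}, {"status": "closed"}]}
for e in expand_field(event, "items"):
    print(e)
# {'host': 'web-1', 'items': {'status': 'active'}}
# {'host': 'web-1', 'items': {'status': 'closed'}}
```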
SQL-like example and comparison:
Consider this SQL-like syntax:
SELECT * FROM my_index
CROSS JOIN UNNEST(items) AS expanded_item
WHERE expanded_item.status = 'active'
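To make the comparison concrete, the row semantics of the query above can be mirrored as two streaming stages: an expansion stage followed by a filter on the expanded value. This Python sketch is illustrative only; the stage names `expand` and `where`, the sample documents, and exposing each element under a separate alias (as the SQL `AS expanded_item` does) are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]


def expand(events: Iterable[Event], field: str, alias: str) -> Iterator[Event]:
    """Emit one event per array element, keeping every other field."""
    for event in events:
        for element in event.get(field) or []:
            # Like CROSS JOIN UNNEST(items) AS expanded_item: the original
            # fields stay, and the element is exposed under the alias.
            yield {**event, alias: element}


def where(events: Iterable[Event], predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Keep only the events for which the predicate holds."""
    return (event for event in events if predicate(event))


my_index = [
    {"order_id": 1, "items": [{"sku": "a", "status": "active"}, {"sku": "b", "status": "closed"}]},
    {"order_id": 2, "items": [{"sku": "c", "status": "active"}]},
    {"order_id": 3, "items": []},  # an empty array produces no rows
]

rows = where(expand(my_index, "items", "expanded_item"),
             lambda e: e["expanded_item"]["status"] == "active")
for row in rows:
    print(row["order_id"], row["expanded_item"]["sku"])
# 1 a
# 2 c
```

Because both stages are generators, each expanded value flows to the next stage one at a time, which is the kind of subsequent per-value processing the pipeline is meant to allow.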
The proposed OpenSearch PPL equivalent might look like:
Key differences and advantages:
No need for explicit JOIN syntax, making it more intuitive for log analysis
Automatic handling of nested structures without the need for complex JSON parsing functions
Ability to work with dynamic schemas and unstructured data
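Handling nested structures without JSON parsing functions could, for example, mean addressing an array inside nested objects by a dotted path and rebuilding only the enclosing objects. The Python sketch below assumes dotted-path addressing (such as `order.items`) and uses hypothetical helper names `get_path`, `set_path`, and `expand_path`; none of this syntax is specified by the proposal.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def get_path(obj: Any, path: List[str]) -> Any:
    """Follow keys through nested dicts; return None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def set_path(obj: Dict[str, Any], path: List[str], value: Any) -> Dict[str, Any]:
    """Return a copy of obj with value stored at path.

    Only the dicts along the path are copied; sibling fields are shared.
    """
    head, *rest = path
    if not rest:
        return {**obj, head: value}
    return {**obj, head: set_path(obj.get(head, {}), rest, value)}


def expand_path(event: Event, dotted: str) -> List[Event]:
    """Emit one event per element of the array found at a dotted path."""
    path = dotted.split(".")
    value = get_path(event, path)
    if not isinstance(value, list):
        return [event]  # assumption: non-array values pass through
    return [set_path(event, path, element) for element in value]


event = {
    "service": "checkout",
    "order": {"id": 42, "items": [{"sku": "a"}, {"sku": "b"}]},
}
for e in expand_path(event, "order.items"):
    print(e)
# {'service': 'checkout', 'order': {'id': 42, 'items': {'sku': 'a'}}}
# {'service': 'checkout', 'order': {'id': 42, 'items': {'sku': 'b'}}}
```

Copying only the path means sibling fields such as `order.id` and `service` survive in every expanded event, and the input event is left unmodified.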
Describe alternatives you've considered
Current alternatives include:
Using complex JSON path queries, which can be cumbersome
Processing the data outside of OpenSearch, reducing real-time analysis capabilities
Additional context
This feature would bridge the gap between SQL's structured data handling and the need for flexible, real-time analysis of semi-structured log data. It combines the power of SQL's UNNEST with the flexibility required for log and event processing.
Potential Impact
Simplified queries for complex data structures in logs and events
Enhanced real-time analytics capabilities for nested JSON data
Improved performance compared to client-side processing of nested structures
Better alignment with SQL-like functionality while maintaining PPL's simplicity
Proposed Implementation
The new command (e.g., expand_field) could be implemented as a new command in the PPL engine, combining the concepts of SQL's UNNEST with the flexibility needed for unstructured log data.
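One way a pipeline engine could host such a command is as a pull-based operator that wraps its upstream input and buffers only the expansions of one upstream event at a time, rather than materializing the whole result. The class below is a hypothetical Python sketch of that design; `ExpandOperator` and its `has_next`/`next` methods are illustrative names, not the API of either PPL engine.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterator

Event = Dict[str, Any]


class ExpandOperator:
    """Hypothetical pull-based operator: one output event per array element."""

    def __init__(self, child: Iterator[Event], field: str) -> None:
        self.child = child
        self.field = field
        self.pending: Deque[Event] = deque()

    def _refill(self) -> None:
        # Pull upstream events until one produces output or input runs out.
        while not self.pending:
            event = next(self.child, None)
            if event is None:
                return
            value = event.get(self.field)
            if isinstance(value, list):
                # An empty list adds nothing, so the loop pulls again.
                self.pending.extend({**event, self.field: v} for v in value)
            else:
                # Assumption: non-array values pass through unchanged.
                self.pending.append(event)

    def has_next(self) -> bool:
        self._refill()
        return bool(self.pending)

    def next(self) -> Event:
        if not self.has_next():
            raise StopIteration("no more events")
        return self.pending.popleft()


source = iter([
    {"host": "a", "tags": ["x", "y"]},
    {"host": "b", "tags": []},
    {"host": "c", "tags": "z"},
])
op = ExpandOperator(source, "tags")
while op.has_next():
    print(op.next())
# {'host': 'a', 'tags': 'x'}
# {'host': 'a', 'tags': 'y'}
# {'host': 'c', 'tags': 'z'}
```

Downstream operators (a filter, a projection) can wrap this one the same way, which keeps the command composable with the rest of the pipeline.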
Support for PPL expand_field functionality is required for both:
OpenSearch-based PPL engine: expand_field PPL Command #3016
Spark-based PPL engine: expand PPL Command opensearch-spark#657