[META]Add flatten Command to PPL #3029

Open
YANG-DB opened this issue Sep 16, 2024 · 0 comments
Labels
enhancement New feature or request PPL Piped processing language


YANG-DB commented Sep 16, 2024

Is your feature request related to a problem?
OpenSearch Piped Processing Language (PPL) currently lacks a native command to flatten nested objects or arrays in documents. Many datasets, especially those containing JSON objects, have deeply nested fields that are difficult to work with in their raw form. The flatten command will simplify these structures and make it easier to analyze and extract data.

What solution would you like?
Introduce a flatten command in PPL that can handle arrays or nested fields, producing a flattened result that contains all the nested elements at the top level.

Syntax:

source=<data_source> | flatten <nested_field> | fields <fields_to_select>
  • The flatten command takes an array or nested object field. For an array, each element becomes its own result row; for an object, the nested keys are promoted to top-level fields.

Example Use Cases

  1. Flattening an Array Field
source=my-index | flatten bridges | fields _time, name, length, city, country

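This query flattens the bridges array field, producing one result row per array element. Each row carries the element's own keys (name, length) alongside the remaining top-level fields.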

Example Input:

{
  "_time": "2024-09-13T12:00:00",
  "bridges": [
    {"name": "Tower Bridge", "length": 801},
    {"name": "London Bridge", "length": 928}
  ],
  "city": "London",
  "country": "England"
}

Expected Output:

[
  {
    "_time": "2024-09-13T12:00:00",
    "name": "Tower Bridge",
    "length": 801,
    "city": "London",
    "country": "England"
  },
  {
    "_time": "2024-09-13T12:00:00",
    "name": "London Bridge",
    "length": 928,
    "city": "London",
    "country": "England"
  }
]
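
The array case above can be sketched in plain Python to make the intended row semantics concrete. This is only an illustration of the behavior shown in the example, not the PPL implementation: the function name `flatten_array` is hypothetical, and the handling of empty arrays, scalar elements, and non-array fields is an assumption, since the proposal does not specify it.

```python
from typing import Any


def flatten_array(doc: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Return one row per element of doc[field] (hypothetical semantics)."""
    elements = doc.get(field)
    if not isinstance(elements, list):
        # Assumption: a missing or non-array field leaves the document unchanged.
        return [dict(doc)]
    rows = []
    # Assumption: an empty array yields no rows at all.
    for element in elements:
        row: dict[str, Any] = {}
        for key, value in doc.items():
            if key != field:
                row[key] = value
            elif isinstance(element, dict):
                # Splice the element's keys in where the array field used to be.
                row.update(element)
            else:
                # Assumption: scalar elements stay under the original field name.
                row[key] = element
        rows.append(row)
    return rows


doc = {
    "_time": "2024-09-13T12:00:00",
    "bridges": [
        {"name": "Tower Bridge", "length": 801},
        {"name": "London Bridge", "length": 928},
    ],
    "city": "London",
    "country": "England",
}
rows = flatten_array(doc, "bridges")
```

Name collisions between element keys and top-level fields are not handled in this sketch; the command would need a defined rule for them.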
  2. Flattening a Nested Object
source=my-index | flatten details | fields _time, name, age, street, city

This query flattens the details object field, including the address object nested inside it, so that every leaf value becomes a top-level field.

Example Input:

{
  "_time": "2024-09-13T12:00:00",
  "details": {
    "name": "Alice",
    "age": 30,
    "address": {
      "street": "Main St",
      "city": "New York"
    }
  }
}

Expected Output:

{
  "_time": "2024-09-13T12:00:00",
  "name": "Alice",
  "age": 30,
  "street": "Main St",
  "city": "New York"
}
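
The object case can be sketched the same way. The following is a minimal illustration of the expected output above, assuming that flattening recurses through every nested object and keeps only the leaf key names; `flatten_object` is a hypothetical name, not part of any PPL API.

```python
from typing import Any, Iterator


def flatten_object(doc: dict[str, Any], field: str) -> dict[str, Any]:
    """Promote every leaf under doc[field] to the top level (hypothetical semantics)."""

    def leaves(obj: dict[str, Any]) -> Iterator[tuple[str, Any]]:
        # Walk nested objects depth-first and yield (leaf_name, value) pairs.
        for key, value in obj.items():
            if isinstance(value, dict):
                yield from leaves(value)
            else:
                yield key, value

    out: dict[str, Any] = {}
    for key, value in doc.items():
        if key == field and isinstance(value, dict):
            out.update(leaves(value))
        else:
            # Other fields, and a non-object target, pass through unchanged.
            out[key] = value
    return out


doc = {
    "_time": "2024-09-13T12:00:00",
    "details": {
        "name": "Alice",
        "age": 30,
        "address": {"street": "Main St", "city": "New York"},
    },
}
flat = flatten_object(doc, "details")
```

Because only leaf names survive, two leaves with the same name at different depths (for example details.city and details.address.city) would overwrite each other here. The proposal does not say how such collisions should be resolved, so a prefixing or error rule would have to be decided separately.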

Additional Considerations

  • The flatten command should work efficiently with large arrays or deeply nested structures.
  • It must handle complex JSON objects where multiple levels of nesting exist.
  • Consider supporting multi-level flattening for more deeply nested fields (e.g., flatten details.address).
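
One possible reading of a dotted path such as flatten details.address is that only the addressed object is lifted one level into its parent, leaving the rest of the document nested. The sketch below shows that interpretation only; the function name `flatten_path` and the one-level, in-parent behavior are assumptions, since the proposal does not define the semantics.

```python
import copy
from typing import Any


def flatten_path(doc: dict[str, Any], path: str) -> dict[str, Any]:
    """Lift the object at a dotted path one level into its parent (assumed semantics)."""
    result = copy.deepcopy(doc)  # never mutate the caller's document
    *parents, leaf = path.split(".")
    parent: Any = result
    for name in parents:
        parent = parent.get(name)
        if not isinstance(parent, dict):
            # Assumption: an unresolvable path leaves the document unchanged.
            return result
    target = parent.get(leaf)
    if not isinstance(target, dict):
        return result
    # Replace the nested object with its own keys inside the parent.
    del parent[leaf]
    parent.update(target)
    return result


doc = {
    "_time": "2024-09-13T12:00:00",
    "details": {
        "name": "Alice",
        "age": 30,
        "address": {"street": "Main St", "city": "New York"},
    },
}
lifted = flatten_path(doc, "details.address")
top = flatten_path(doc, "details")
```

Under this reading, `lifted` keeps details as an object whose address keys have moved up one level, while `top` promotes name, age, and the still-nested address to the top level.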

Support for the PPL flatten command is required for both:

  • OpenSearch based PPL engine
  • Spark based PPL engine

YANG-DB added the enhancement (New feature or request), untriaged, and PPL (Piped processing language) labels on Sep 16, 2024
YANG-DB moved this to Todo in PPL Commands on Sep 16, 2024
YANG-DB removed the untriaged label on Sep 16, 2024
YANG-DB moved this from Todo to Design in PPL Commands on Oct 8, 2024
Projects
Status: Design
Development

No branches or pull requests

1 participant