Describe the solution you'd like
We propose adding a new fieldsummary command to OpenSearch PPL that would provide summary statistics for all fields in the current result set.
This command should:
Calculate basic statistics for each field (count, distinct count, min, max, avg for numeric fields)
Determine the data type of each field
Show the most frequent values and their counts for each field
Calculate the percentage of events that contain each field
Additionally, the command should support the following key optional parameters:
includefields:
Specify which fields to include in the summary (e.g., | fieldsummary includefields="status_code,user_id,response_time")
excludefields:
Specify which fields to exclude from the summary (e.g., | fieldsummary excludefields="internal_id,debug_info")
topvalues:
Set the number of top values to display for each field (e.g., | fieldsummary topvalues=5)
maxfields:
Limit the number of fields to display (e.g., | fieldsummary maxfields=20)
nulls:
Include null/empty value counts (e.g., | fieldsummary nulls=true)
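One way the field-selection parameters could interact is sketched below in Python. The proposal does not say how includefields, excludefields, and maxfields combine, so the precedence used here (exclusion wins over inclusion, and maxfields truncates last) is an assumption, as is the helper name `select_fields`.

```python
def select_fields(all_fields, includefields=None, excludefields=None, maxfields=None):
    """Pick the fields to summarize (hypothetical helper, not part of PPL)."""

    def parse(spec):
        # Parameters arrive as comma-separated strings, e.g. "status_code,user_id".
        return [name.strip() for name in spec.split(",") if name.strip()] if spec else []

    include = parse(includefields)
    exclude = set(parse(excludefields))
    # With no include list, start from every available field.
    candidates = include or all_fields
    # Assumption: unknown field names are ignored and exclusion overrides inclusion.
    chosen = [f for f in candidates if f in all_fields and f not in exclude]
    # Assumption: maxfields is applied after include/exclude filtering.
    return chosen[:maxfields] if maxfields is not None else chosen
```

For example, `select_fields(["a", "b", "c"], includefields="a,b", excludefields="b")` keeps only `a` under these assumptions.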
Example usage:
source = t
| where timestamp >= "2023-01-01" and timestamp < "2023-02-01"
| fieldsummary includefields="status_code,user_id,response_time" topvalues=3 nulls=true
This command would generate a table with summary statistics for the specified fields in the given date range, showing the top 3 values for each field and including null counts.
Example output:
404 (1500, 15%)
500 (400, 4%)
user456 (95, 1%)
user789 (90, 0.9%)
0.75 (1800, 18%)
1.0 (1500, 15%)