Add aws_securityhub_finding table #156
…curityHubFindingTable to include artifact extractor
Pull Request Overview
This PR introduces the new table "aws_securityhub_finding" along with its supporting mapper, extractor, schema, and documentation. Key changes include:
- New table implementation files (table, mapper, extractor, and model) for ingesting AWS Security Hub findings.
- Updated documentation examples and queries to enable users to explore and collect Security Hub findings.
- Registration of the new table in the AWS plugin and updates to the S3 bucket documentation.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tables/securityhub_finding/securityhub_finding_table.go | Implements the table structure and source metadata for findings. |
tables/securityhub_finding/securityhub_finding_mapper.go | Adds the mapper for converting input data into SecurityHub findings. |
tables/securityhub_finding/securityhub_finding_extractor.go | Provides an extractor to parse JSON data and produce finding records. |
tables/securityhub_finding/securityhub_finding.go | Defines the schema and data structure for Security Hub findings. |
docs/tables/aws_securityhub_finding/queries.md | Introduces SQL query examples to analyze Security Hub findings. |
docs/tables/aws_securityhub_finding/index.md | Adds the table documentation and configuration instructions. |
docs/sources/aws_s3_bucket.md | Updates the S3 bucket source documentation to include the new table. |
aws/plugin.go | Registers the new table with the AWS plugin. |
Comments suppressed due to low confidence (1)
tables/securityhub_finding/securityhub_finding_extractor.go:63
- The 'min' function used to limit the bytes output is not defined in this context. Consider defining a helper function, or use a standard function to safely determine the minimum value.
slog.Debug("Error decoding SecurityHub finding", "error", err, "sample_start", string(jsonBytes[:min(len(jsonBytes), 500)]))
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
This pull request adds a new table for querying AWS Security Hub findings, along with its associated mapper, extractor, and documentation.
- Introduces the aws_securityhub_finding table with its Go definition.
- Adds mapping and extraction logic for processing Security Hub JSON records.
- Updates documentation and plugin registration to include the new table.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tables/securityhub_finding/securityhub_finding_table.go | New table definition with source metadata and row enrichment logic. |
tables/securityhub_finding/securityhub_finding_mapper.go | Implements mapping from raw input data to SecurityHubFinding structs. |
tables/securityhub_finding/securityhub_finding_extractor.go | Implements extraction logic with JSON field mapping adjustments. |
tables/securityhub_finding/securityhub_finding.go | Defines the SecurityHubFinding struct and its JSON/parquet mappings. |
docs/tables/aws_securityhub_finding/queries.md | Adds example queries for the new aws_securityhub_finding table. |
docs/tables/aws_securityhub_finding/index.md | Provides table overview and configuration instructions. |
docs/sources/aws_s3_bucket.md | Updates the source documentation to include the new table. |
aws/plugin.go | Registers the new table in the AWS plugin. |
Comments suppressed due to low confidence (1)
tables/securityhub_finding/securityhub_finding_extractor.go:63
- The function 'min' is used on this line but is not defined or imported, which will result in a compile-time error. Consider defining a helper function or using the built-in math.Min after converting the arguments to float64 and then converting back to int.
slog.Debug("Error decoding SecurityHub finding", "error", err, "sample_start", string(jsonBytes[:min(len(jsonBytes), 500)]))
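A note on the `min` call flagged above: since Go 1.21, `min` is a built-in function, so the line compiles as written on that toolchain or newer, and the `math.Min` round trip through `float64` is not needed. Whether this repository targets Go 1.21+ is not stated in the conversation. The following is a minimal sketch for older toolchains; `minInt` and `sampleStart` are hypothetical names used for illustration only and are not part of the plugin.

```go
package main

import "fmt"

// minInt is a hypothetical helper for toolchains older than Go 1.21,
// where the built-in min is not available.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// sampleStart returns at most limit leading bytes of data as a string,
// suitable for a truncated debug log field. limit is assumed to be >= 0.
func sampleStart(data []byte, limit int) string {
	return string(data[:minInt(len(data), limit)])
}

func main() {
	// Only the first 5 bytes of the payload are kept.
	fmt.Println(sampleStart([]byte(`{"Id":"example"}`), 5))
}
```

On Go 1.21 or newer, `minInt(len(data), limit)` can be replaced by the built-in `min(len(data), limit)` with the same result.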
SQL Query Evaluation Results for
|
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ❌ | The column tp_timestamp is not present in the provided schema. Use time instead. |
STRUCT type columns use dot notation | ✅ | |
JSON type columns use -> and ->> operators | ✅ | N/A |
JSON type columns are wrapped in parenthesis | ✅ | N/A |
Space before and after each -> and ->> | ✅ | N/A |
SQL query syntax uses valid DuckDB syntax | ✅ |
Title and description checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | N/A |
Description explains what the query does | ✅ | |
Description explains why a user would run the query | ✅ | |
Description is concise | ✅ |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ✅ | |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | N/A |
Non-aggregated queries should include columns related to where the resources exist | ✅ | N/A |
Non-aggregated queries should place columns related to where the resources exist last | ✅ | N/A |
Non-aggregated queries should only include tp_index if missing index information in other columns | ✅ | N/A |
Avoid selecting columns with fixed values in WHERE clause | ✅ |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | N/A |
Aggregated queries ordered by count desc or time asc | ✅ |
Recent Findings Analysis ❌
Query
Recent Findings Analysis
Analyze recent security findings with detailed resource and severity information.
select
  tp_timestamp,
  title,
  types,
  severity,
  resources,
  tp_index as account_id,
  region,
  workflow_state,
  remediation.recommendation.text as remediation_text
from
  aws_securityhub_finding
where
  tp_timestamp > current_date - interval '7 days'
order by
  severity.normalized desc,
  tp_timestamp desc;
SQL syntax checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ✅ | |
STRUCT type columns use dot notation | ✅ | |
JSON type columns use -> and ->> operators | ❌ | Use -> or ->> for accessing JSON properties in 'remediation' column |
JSON type columns are wrapped in parenthesis | ❌ | Wrap JSON column access in parentheses |
Space before and after each -> and ->> | ❌ | Add spaces around -> or ->> operators |
SQL query syntax uses valid DuckDB syntax | ✅ |
Title and description checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | N/A |
Description explains what the query does | ✅ | |
Description explains why a user would run the query | ✅ | |
Description is concise | ✅ |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ✅ | N/A |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | |
Non-aggregated queries should include columns related to where the resources exist | ✅ | |
Non-aggregated queries should place columns related to where the resources exist last | ❌ | Move 'account_id' and 'region' to the end of the SELECT statement |
Non-aggregated queries should only include tp_index if missing index information in other columns | ❌ | Remove 'tp_index' as 'account' is already present in the schema |
Avoid selecting columns with fixed values in WHERE clause | ✅ |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | |
Aggregated queries ordered by count desc or time asc | ✅ | N/A |
Top 10 Finding Types ✅
Query
Top 10 Finding Types
Generate a ranked list of the most prevalent Security Hub finding types with severity information.
select
  types,
  count(*) as finding_count,
  round(avg(severity.normalized), 2) as avg_severity
from
  aws_securityhub_finding
group by
  types
order by
  finding_count desc
limit 10;
SQL syntax checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ✅ | |
STRUCT type columns use dot notation | ✅ | |
JSON type columns use -> and ->> operators | ✅ | N/A |
JSON type columns are wrapped in parenthesis | ✅ | N/A |
Space before and after each -> and ->> | ✅ | N/A |
SQL query syntax uses valid DuckDB syntax | ✅ |
Title and description checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | |
Description explains what the query does | ✅ | |
Description explains why a user would run the query | ❌ | Add a sentence explaining the usefulness of this information. |
Description is concise | ✅ |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ✅ | |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | N/A |
Non-aggregated queries should include columns related to where the resources exist | ✅ | N/A |
Non-aggregated queries should place columns related to where the resources exist last | ✅ | N/A |
Non-aggregated queries should only include tp_index if missing index information in other columns | ✅ | N/A |
Avoid selecting columns with fixed values in WHERE clause | ✅ |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | N/A |
Aggregated queries ordered by count desc or time asc | ✅ |
Findings by Account and Region ❌
Query
Findings by Account and Region
Analyze security findings across your AWS organization with detailed severity information.
select
  tp_index as account_id,
  region,
  count(*) as finding_count,
  round(avg(severity.normalized), 2) as avg_severity,
  sum(case when severity.normalized >= 90 then 1 else 0 end) as critical_severity_count,
  sum(case when severity.normalized >= 70 and severity.normalized < 90 then 1 else 0 end) as high_severity_count,
  sum(case when severity.normalized >= 40 and severity.normalized < 70 then 1 else 0 end) as medium_severity_count,
  sum(case when severity.normalized >= 1 and severity.normalized < 40 then 1 else 0 end) as low_severity_count,
  sum(case when severity.normalized = 0 then 1 else 0 end) as informational_severity_count
from
  aws_securityhub_finding
group by
  account_id,
  region
order by
  critical_severity_count desc;
SQL syntax checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ❌ | The column 'severity.normalized' does not exist in the schema. Use 'severity -> 'normalized'' instead. |
STRUCT type columns use dot notation | ❌ | Use 'severity -> 'normalized'' instead of 'severity.normalized' |
JSON type columns use -> and ->> operators | ❌ | Use 'severity -> 'normalized'' instead of 'severity.normalized' |
JSON type columns are wrapped in parenthesis | ❌ | Wrap JSON column access in parentheses: (severity -> 'normalized') |
Space before and after each -> and ->> | ❌ | Add spaces around '->' operator: severity -> 'normalized' |
SQL query syntax uses valid DuckDB syntax | ✅ |
Title and description checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | N/A |
Description explains what the query does | ✅ | |
Description explains why a user would run the query | ✅ | |
Description is concise | ✅ |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ❌ | Remove 'tp_index as account_id' from the SELECT clause |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | N/A |
Non-aggregated queries should include columns related to where the resources exist | ✅ | N/A |
Non-aggregated queries should place columns related to where the resources exist last | ✅ | N/A |
Non-aggregated queries should only include tp_index if missing index information in other columns | ✅ | N/A |
Avoid selecting columns with fixed values in WHERE clause | ✅ | N/A |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | N/A |
Aggregated queries ordered by count desc or time asc | ✅ |
Findings by Severity Level ✅
Query
Findings by Severity Level
Categorize Security Hub findings into severity bands with detailed counts and percentages.
select
  case
    when severity.normalized >= 90 then 'Critical (90-100)'
    when severity.normalized >= 70 then 'High (70-89)'
    when severity.normalized >= 40 then 'Medium (40-69)'
    when severity.normalized >= 1 then 'Low (1-39)'
    else 'Informational (0)'
  end as severity_level,
  count(*) as finding_count,
  round(count(*) * 100.0 / sum(count(*)) over(), 2) as percentage
from
  aws_securityhub_finding
group by
  severity_level
order by
  case severity_level
    when 'Critical (90-100)' then 1
    when 'High (70-89)' then 2
    when 'Medium (40-69)' then 3
    when 'Low (1-39)' then 4
    else 5
  end;
SQL syntax checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ✅ | |
STRUCT type columns use dot notation | ✅ | |
JSON type columns use -> and ->> operators | ✅ | N/A |
JSON type columns are wrapped in parenthesis | ✅ | N/A |
Space before and after each -> and ->> | ✅ | N/A |
SQL query syntax uses valid DuckDB syntax | ✅ |
Title and description checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | N/A |
Description explains what the query does | ✅ | |
Description explains why a user would run the query | ✅ | |
Description is concise | ✅ |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ✅ | |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | N/A |
Non-aggregated queries should include columns related to where the resources exist | ✅ | N/A |
Non-aggregated queries should place columns related to where the resources exist last | ✅ | N/A |
Non-aggregated queries should only include tp_index if missing index information in other columns | ✅ | N/A |
Avoid selecting columns with fixed values in WHERE clause | ✅ |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | N/A |
Aggregated queries ordered by count desc or time asc | ✅ |
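For readers who want to check the banding outside SQL, the severity bands used by the Findings by Severity Level query can be sketched in Go as follows. This is only an illustration of the same thresholds and the two-decimal percentage. `severityBand` and `bandPercentages` are hypothetical names that do not exist in the plugin, and the normalized score is assumed to be an integer from 0 to 100.

```go
package main

import (
	"fmt"
	"math"
)

// severityBand maps a normalized severity score to the band labels used
// in the query. Hypothetical helper; assumes scores are integers in 0-100.
func severityBand(normalized int) string {
	switch {
	case normalized >= 90:
		return "Critical (90-100)"
	case normalized >= 70:
		return "High (70-89)"
	case normalized >= 40:
		return "Medium (40-69)"
	case normalized >= 1:
		return "Low (1-39)"
	default:
		return "Informational (0)"
	}
}

// bandPercentages counts findings per band and returns each band's share
// of the total as a percentage rounded to two decimals, mirroring
// round(count(*) * 100.0 / sum(count(*)) over(), 2) in the query.
func bandPercentages(scores []int) map[string]float64 {
	counts := make(map[string]int)
	for _, s := range scores {
		counts[severityBand(s)]++
	}
	shares := make(map[string]float64, len(counts))
	for band, n := range counts {
		pct := float64(n) * 100 / float64(len(scores))
		shares[band] = math.Round(pct*100) / 100
	}
	return shares
}

func main() {
	// Four made-up scores, one per band except Low.
	fmt.Println(bandPercentages([]int{95, 75, 50, 0}))
}
```

The `switch` evaluates its cases from top to bottom, just as the SQL `case` expression does, so each threshold only needs a lower bound.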
Compliance Status Overview ✅
Query
Compliance Status Overview
Monitor compliance status with detailed severity information.
select
  compliance.status,
  compliance.security_control_id,
  count(*) as finding_count,
  round(avg(severity.normalized), 2) as avg_severity
from
  aws_securityhub_finding
where
  compliance is not null
group by
  compliance.status,
  compliance.security_control_id
order by
  finding_count desc;
SQL syntax checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ✅ | |
STRUCT type columns use dot notation | ✅ | |
JSON type columns use -> and ->> operators | ✅ | N/A |
JSON type columns are wrapped in parenthesis | ✅ | N/A |
Space before and after each -> and ->> | ✅ | N/A |
SQL query syntax uses valid DuckDB syntax | ✅ |
Title and description checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | N/A |
Description explains what the query does | ✅ | |
Description explains why a user would run the query | ❌ | Add a sentence explaining the benefits of monitoring compliance status. |
Description is concise | ✅ |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ✅ | |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | N/A |
Non-aggregated queries should include columns related to where the resources exist | ✅ | N/A |
Non-aggregated queries should place columns related to where the resources exist last | ✅ | N/A |
Non-aggregated queries should only include tp_index if missing index information in other columns | ✅ | N/A |
Avoid selecting columns with fixed values in WHERE clause | ✅ |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | N/A |
Aggregated queries ordered by count desc or time asc | ✅ |
Detect High Severity Findings with Remediation ❌
Query
Detect High Severity Findings with Remediation
select
  tp_timestamp,
  title,
  types,
  severity,
  description,
  tp_index as account_id,
  region,
  resources,
  remediation.recommendation.text as remediation_text
from
  aws_securityhub_finding
where
  severity.normalized >= 70
order by
  severity.normalized desc,
  tp_timestamp desc;
SQL syntax checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ✅ | |
STRUCT type columns use dot notation | ✅ | |
JSON type columns use -> and ->> operators | ❌ | Use -> or ->> for accessing JSON fields, e.g., remediation->'recommendation'->>'text' |
JSON type columns are wrapped in parenthesis | ❌ | Wrap JSON access in parentheses, e.g., (remediation->'recommendation'->>'text') |
Space before and after each -> and ->> | ❌ | Add spaces around -> and ->> operators |
SQL query syntax uses valid DuckDB syntax | ✅ |
Title and description checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | N/A |
Description explains what the query does | ❌ | Add a description explaining what the query does |
Description explains why a user would run the query | ❌ | Add a description explaining why a user would run this query |
Description is concise | ❌ | Add a concise description |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ✅ | N/A |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | |
Non-aggregated queries should include columns related to where the resources exist | ✅ | |
Non-aggregated queries should place columns related to where the resources exist last | ❌ | Move account_id and region to the end of the SELECT statement |
Non-aggregated queries should only include tp_index if missing index information in other columns | ❌ | Remove tp_index as account_id is already present |
Avoid selecting columns with fixed values in WHERE clause | ✅ |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | |
Aggregated queries ordered by count desc or time asc | ✅ | N/A |
Lambda Function Security Issues ❌
Query
Lambda Function Security Issues
Identify security issues in Lambda functions, focusing on public access.
select
  tp_timestamp,
  title,
  severity.normalized as severity,
  json_extract(resources, '$[0].id') as function_arn,
  json_extract(resources, '$[0].details.awslambdafunction.runtime') as runtime,
  workflow_state
from
  aws_securityhub_finding
where
  json_extract(resources, '$[0].type') = '"AwsLambdaFunction"'
  and title ilike '%public access%'
  and severity.normalized >= 70
order by
  severity desc,
  tp_timestamp desc;
SQL syntax checks ❌
Criteria | Pass/Fail | Suggestions |
---|---|---|
Use 2 space indentation | ✅ | |
Query should end with a semicolon | ✅ | |
Keywords should be in lowercase | ✅ | |
Each clause is on its own line | ✅ | |
All columns exist in the schema | ✅ | |
STRUCT type columns use dot notation | ✅ | |
JSON type columns use -> and ->> operators | ❌ | Use -> or ->> operators instead of json_extract function |
JSON type columns are wrapped in parenthesis | ❌ | Wrap JSON column accesses in parentheses |
Space before and after each -> and ->> | ❌ | Add spaces before and after -> or ->> operators |
SQL query syntax uses valid DuckDB syntax | ❌ | Replace json_extract with DuckDB JSON operators |
Title and description checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Title uses title case | ✅ | |
Title accurately describes the query | ✅ | |
Title contains limit value if in query | ✅ | N/A |
Description explains what the query does | ✅ | |
Description explains why a user would run the query | ✅ | |
Description is concise | ✅ |
Query relevance checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Provides useful insights for this log type | ✅ | |
Relevant to security, operational, or performance monitoring | ✅ |
Column selection checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Aggregated queries should not include tp_index/tp_timestamp | ✅ | N/A |
Non-aggregated queries should have tp_timestamp as the first column | ✅ | |
Non-aggregated queries should include columns related to where the resources exist | ✅ | |
Non-aggregated queries should place columns related to where the resources exist last | ✅ | |
Non-aggregated queries should only include tp_index if missing index information in other columns | ✅ | N/A |
Avoid selecting columns with fixed values in WHERE clause | ✅ |
Sorting strategy checks ✅
Criteria | Pass/Fail | Suggestions |
---|---|---|
Non-aggregated queries default to tp_timestamp desc | ✅ | |
Aggregated queries ordered by count desc or time asc | ✅ | N/A |