Add aws_securityhub_finding table #156

Open
wants to merge 12 commits into
base: main

Priyanka-Chatterjee-2000
Contributor

Example query results

Results
Add example SQL query results here (please include the input queries as well)

@cbruno10 cbruno10 requested a review from Copilot April 21, 2025 13:39
@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces the new table "aws_securityhub_finding" along with its supporting mapper, extractor, schema, and documentation. Key changes include:

  • New table implementation files (table, mapper, extractor, and model) for ingesting AWS Security Hub findings.
  • Updated documentation examples and queries to enable users to explore and collect Security Hub findings.
  • Registration of the new table in the AWS plugin and updates to the S3 bucket documentation.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| tables/securityhub_finding/securityhub_finding_table.go | Implements the table structure and source metadata for findings. |
| tables/securityhub_finding/securityhub_finding_mapper.go | Adds the mapper for converting input data into SecurityHub findings. |
| tables/securityhub_finding/securityhub_finding_extractor.go | Provides an extractor to parse JSON data and produce finding records. |
| tables/securityhub_finding/securityhub_finding.go | Defines the schema and data structure for Security Hub findings. |
| docs/tables/aws_securityhub_finding/queries.md | Introduces SQL query examples to analyze Security Hub findings. |
| docs/tables/aws_securityhub_finding/index.md | Adds the table documentation and configuration instructions. |
| docs/sources/aws_s3_bucket.md | Updates the S3 bucket source documentation to include the new table. |
| aws/plugin.go | Registers the new table with the AWS plugin. |

Comments suppressed due to low confidence (1)

tables/securityhub_finding/securityhub_finding_extractor.go:63

  • The 'min' function used to limit the logged bytes is a builtin only from Go 1.21 onward; on older toolchains it is not defined in this context. If the module must build with an earlier Go version, consider defining a small helper function to determine the minimum value safely.
slog.Debug("Error decoding SecurityHub finding", "error", err, "sample_start", string(jsonBytes[:min(len(jsonBytes), 500)]))

@cbruno10 cbruno10 added the review Triggers a workflow to review and validate queries label Apr 22, 2025
@cbruno10 cbruno10 requested a review from Copilot April 22, 2025 13:29
@Copilot Copilot AI left a comment

Pull Request Overview

This pull request adds a new table for querying AWS Security Hub findings including its associated mapper, extractor, and documentation.

  • Introduces the aws_securityhub_finding table with its Go definition.
  • Adds mapping and extraction logic for processing Security Hub JSON records.
  • Updates documentation and plugin registration to include the new table.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| tables/securityhub_finding/securityhub_finding_table.go | New table definition with source metadata and row enrichment logic. |
| tables/securityhub_finding/securityhub_finding_mapper.go | Implements mapping from raw input data to SecurityHubFinding structs. |
| tables/securityhub_finding/securityhub_finding_extractor.go | Implements extraction logic with JSON field mapping adjustments. |
| tables/securityhub_finding/securityhub_finding.go | Defines the SecurityHubFinding struct and its JSON/parquet mappings. |
| docs/tables/aws_securityhub_finding/queries.md | Adds example queries for the new aws_securityhub_finding table. |
| docs/tables/aws_securityhub_finding/index.md | Provides table overview and configuration instructions. |
| docs/sources/aws_s3_bucket.md | Updates the source documentation to include the new table. |
| aws/plugin.go | Registers the new table in the AWS plugin. |

Comments suppressed due to low confidence (1)

tables/securityhub_finding/securityhub_finding_extractor.go:63

  • The function 'min' is used on this line but is not defined or imported. It is a builtin only from Go 1.21, so older toolchains will report a compile-time error. Consider defining a small integer helper rather than using math.Min, which operates on float64 and would require converting the arguments to float64 and the result back to int.
slog.Debug("Error decoding SecurityHub finding", "error", err, "sample_start", string(jsonBytes[:min(len(jsonBytes), 500)]))
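A minimal sketch of the helper approach mentioned in the comment above, assuming the only goal is to cap how many bytes of the undecoded payload reach the debug log. The name `samplePrefix` and its `limit` parameter are illustrative and do not come from the PR. On Go 1.21 or later the builtin `min` should make such a helper unnecessary.

```go
package main

import "fmt"

// samplePrefix returns at most limit bytes from the start of b as a string.
// Hypothetical helper: it is not part of the PR and stands in for the
// min-based slice expression so the code also builds on Go versions
// older than 1.21.
func samplePrefix(b []byte, limit int) string {
	if limit < 0 {
		limit = 0 // treat a negative limit as "log nothing"
	}
	if len(b) > limit {
		b = b[:limit]
	}
	return string(b)
}

func main() {
	// Made-up payload, used only to exercise the helper.
	payload := []byte(`{"Findings": [{"Id": "example"}]}`)

	// A short limit truncates; a limit above len(payload) returns it whole.
	fmt.Println(samplePrefix(payload, 12))
	fmt.Println(samplePrefix(payload, 500))
}
```

With a helper like this, the flagged call could pass `samplePrefix(jsonBytes, 500)` as the `sample_start` value. Truncating by bytes can split a multi-byte UTF-8 character at the cut point, which is usually acceptable for a debug sample.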

SQL Query Evaluation Results for aws_securityhub_finding

Daily Activity Trends ❌

Query

Daily Activity Trends

Analyze the daily distribution of Security Hub findings to identify security patterns and potential security issues over time.

select
  strftime(tp_timestamp, '%Y-%m-%d') as finding_date,
  count(*) as finding_count,
  round(avg(severity.normalized), 2) as avg_severity
from
  aws_securityhub_finding
group by
  finding_date
order by
  finding_date asc;
SQL syntax checks ❌
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema The column tp_timestamp is not present in the provided schema. Use time instead.
STRUCT type columns use dot notation
JSON type columns use -> and ->> operators N/A
JSON type columns are wrapped in parenthesis N/A
Space before and after each -> and ->> N/A
SQL query syntax uses valid DuckDB syntax
Title and description checks ✅
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query N/A
Description explains what the query does
Description explains why a user would run the query
Description is concise
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ✅
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp
Non-aggregated queries should have tp_timestamp as the first column N/A
Non-aggregated queries should include columns related to where the resources exist N/A
Non-aggregated queries should place columns related to where the resources exist last N/A
Non-aggregated queries should only include tp_index if missing index information in other columns N/A
Avoid selecting columns with fixed values in WHERE clause
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc N/A
Aggregated queries ordered by count desc or time asc

Recent Findings Analysis ❌

Query

Recent Findings Analysis

Analyze recent security findings with detailed resource and severity information.

select
  tp_timestamp,
  title,
  types,
  severity,
  resources,
  tp_index as account_id,
  region,
  workflow_state,
  remediation.recommendation.text as remediation_text
from
  aws_securityhub_finding
where
  tp_timestamp > current_date - interval '7 days'
order by
  severity.normalized desc,
  tp_timestamp desc;
SQL syntax checks ❌
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema
STRUCT type columns use dot notation
JSON type columns use -> and ->> operators Use -> or ->> for accessing JSON properties in 'remediation' column
JSON type columns are wrapped in parenthesis Wrap JSON column access in parentheses
Space before and after each -> and ->> Add spaces around -> or ->> operators
SQL query syntax uses valid DuckDB syntax
Title and description checks ✅
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query N/A
Description explains what the query does
Description explains why a user would run the query
Description is concise
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ❌
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp N/A
Non-aggregated queries should have tp_timestamp as the first column
Non-aggregated queries should include columns related to where the resources exist
Non-aggregated queries should place columns related to where the resources exist last Move 'account_id' and 'region' to the end of the SELECT statement
Non-aggregated queries should only include tp_index if missing index information in other columns Remove 'tp_index' as 'account' is already present in the schema
Avoid selecting columns with fixed values in WHERE clause
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc
Aggregated queries ordered by count desc or time asc N/A

Top 10 Finding Types ✅

Query

Top 10 Finding Types

Generate a ranked list of the most prevalent Security Hub finding types with severity information.

select
  types,
  count(*) as finding_count,
  round(avg(severity.normalized), 2) as avg_severity
from
  aws_securityhub_finding
group by
  types
order by
  finding_count desc
limit 10;
SQL syntax checks ✅
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema
STRUCT type columns use dot notation
JSON type columns use -> and ->> operators N/A
JSON type columns are wrapped in parenthesis N/A
Space before and after each -> and ->> N/A
SQL query syntax uses valid DuckDB syntax
Title and description checks ✅
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query
Description explains what the query does
Description explains why a user would run the query Add a sentence explaining the usefulness of this information.
Description is concise
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ✅
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp
Non-aggregated queries should have tp_timestamp as the first column N/A
Non-aggregated queries should include columns related to where the resources exist N/A
Non-aggregated queries should place columns related to where the resources exist last N/A
Non-aggregated queries should only include tp_index if missing index information in other columns N/A
Avoid selecting columns with fixed values in WHERE clause
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc N/A
Aggregated queries ordered by count desc or time asc

Findings by Account and Region ❌

Query

Findings by Account and Region

Analyze security findings across your AWS organization with detailed severity information.

select
  tp_index as account_id,
  region,
  count(*) as finding_count,
  round(avg(severity.normalized), 2) as avg_severity,
  sum(case when severity.normalized >= 90 then 1 else 0 end) as critical_severity_count,
  sum(case when severity.normalized >= 70 and severity.normalized < 90 then 1 else 0 end) as high_severity_count,
  sum(case when severity.normalized >= 40 and severity.normalized < 70 then 1 else 0 end) as medium_severity_count,
  sum(case when severity.normalized >= 1 and severity.normalized < 40 then 1 else 0 end) as low_severity_count,
  sum(case when severity.normalized = 0 then 1 else 0 end) as informational_severity_count
from
  aws_securityhub_finding
group by
  account_id,
  region
order by
  critical_severity_count desc;
SQL syntax checks ❌
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema The column 'severity.normalized' does not exist in the schema. Use 'severity -> 'normalized'' instead.
STRUCT type columns use dot notation Use 'severity -> 'normalized'' instead of 'severity.normalized'
JSON type columns use -> and ->> operators Use 'severity -> 'normalized'' instead of 'severity.normalized'
JSON type columns are wrapped in parenthesis Wrap JSON column access in parentheses: (severity -> 'normalized')
Space before and after each -> and ->> Add spaces around '->' operator: severity -> 'normalized'
SQL query syntax uses valid DuckDB syntax
Title and description checks ✅
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query N/A
Description explains what the query does
Description explains why a user would run the query
Description is concise
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ❌
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp Remove 'tp_index as account_id' from the SELECT clause
Non-aggregated queries should have tp_timestamp as the first column N/A
Non-aggregated queries should include columns related to where the resources exist N/A
Non-aggregated queries should place columns related to where the resources exist last N/A
Non-aggregated queries should only include tp_index if missing index information in other columns N/A
Avoid selecting columns with fixed values in WHERE clause N/A
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc N/A
Aggregated queries ordered by count desc or time asc

Findings by Severity Level ✅

Query

Findings by Severity Level

Categorize Security Hub findings into severity bands with detailed counts and percentages.

select
  case
    when severity.normalized >= 90 then 'Critical (90-100)'
    when severity.normalized >= 70 then 'High (70-89)'
    when severity.normalized >= 40 then 'Medium (40-69)'
    when severity.normalized >= 1 then 'Low (1-39)'
    else 'Informational (0)'
  end as severity_level,
  count(*) as finding_count,
  round(count(*) * 100.0 / sum(count(*)) over(), 2) as percentage
from
  aws_securityhub_finding
group by
  severity_level
order by
  case severity_level
    when 'Critical (90-100)' then 1
    when 'High (70-89)' then 2
    when 'Medium (40-69)' then 3
    when 'Low (1-39)' then 4
    else 5
  end;
SQL syntax checks ✅
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema
STRUCT type columns use dot notation
JSON type columns use -> and ->> operators N/A
JSON type columns are wrapped in parenthesis N/A
Space before and after each -> and ->> N/A
SQL query syntax uses valid DuckDB syntax
Title and description checks ✅
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query N/A
Description explains what the query does
Description explains why a user would run the query
Description is concise
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ✅
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp
Non-aggregated queries should have tp_timestamp as the first column N/A
Non-aggregated queries should include columns related to where the resources exist N/A
Non-aggregated queries should place columns related to where the resources exist last N/A
Non-aggregated queries should only include tp_index if missing index information in other columns N/A
Avoid selecting columns with fixed values in WHERE clause
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc N/A
Aggregated queries ordered by count desc or time asc
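
The severity bands in the Findings by Severity Level query can also be expressed in Go, for example if the same labels were needed outside SQL. This is a sketch only: `severityBand` is a hypothetical name and is not part of the PR, and the thresholds are copied from the query's CASE expression.

```go
package main

import "fmt"

// severityBand maps a normalized severity score (0-100) to the label used
// in the query's CASE expression. Hypothetical helper, not part of the PR.
func severityBand(normalized int) string {
	switch {
	case normalized >= 90:
		return "Critical (90-100)"
	case normalized >= 70:
		return "High (70-89)"
	case normalized >= 40:
		return "Medium (40-69)"
	case normalized >= 1:
		return "Low (1-39)"
	default:
		// Mirrors the query's ELSE branch: anything below 1 lands here.
		return "Informational (0)"
	}
}

func main() {
	// Boundary values on each side of every threshold.
	for _, score := range []int{0, 1, 39, 40, 69, 70, 89, 90, 100} {
		fmt.Printf("%3d -> %s\n", score, severityBand(score))
	}
}
```

One difference worth keeping in mind: in the SQL version, a row whose severity.normalized is NULL also falls through to the else branch and is counted as Informational, whereas this Go sketch takes a plain int and has no notion of a missing value.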

Compliance Status Overview ✅

Query

Compliance Status Overview

Monitor compliance status with detailed severity information.

select
  compliance.status,
  compliance.security_control_id,
  count(*) as finding_count,
  round(avg(severity.normalized), 2) as avg_severity
from
  aws_securityhub_finding
where
  compliance is not null
group by
  compliance.status,
  compliance.security_control_id
order by
  finding_count desc;
SQL syntax checks ✅
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema
STRUCT type columns use dot notation
JSON type columns use -> and ->> operators N/A
JSON type columns are wrapped in parenthesis N/A
Space before and after each -> and ->> N/A
SQL query syntax uses valid DuckDB syntax
Title and description checks ✅
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query N/A
Description explains what the query does
Description explains why a user would run the query Add a sentence explaining the benefits of monitoring compliance status.
Description is concise
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ✅
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp
Non-aggregated queries should have tp_timestamp as the first column N/A
Non-aggregated queries should include columns related to where the resources exist N/A
Non-aggregated queries should place columns related to where the resources exist last N/A
Non-aggregated queries should only include tp_index if missing index information in other columns N/A
Avoid selecting columns with fixed values in WHERE clause
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc N/A
Aggregated queries ordered by count desc or time asc

Detect High Severity Findings with Remediation ❌

Query

Detect High Severity Findings with Remediation

select
  tp_timestamp,
  title,
  types,
  severity,
  description,
  tp_index as account_id,
  region,
  resources,
  remediation.recommendation.text as remediation_text
from
  aws_securityhub_finding
where
  severity.normalized >= 70
order by
  severity.normalized desc,
  tp_timestamp desc;
SQL syntax checks ❌
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema
STRUCT type columns use dot notation
JSON type columns use -> and ->> operators Use -> or ->> for accessing JSON fields, e.g., remediation->'recommendation'->>'text'
JSON type columns are wrapped in parenthesis Wrap JSON access in parentheses, e.g., (remediation->'recommendation'->>'text')
Space before and after each -> and ->> Add spaces around -> and ->> operators
SQL query syntax uses valid DuckDB syntax
Title and description checks ❌
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query N/A
Description explains what the query does Add a description explaining what the query does
Description explains why a user would run the query Add a description explaining why a user would run this query
Description is concise Add a concise description
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ❌
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp N/A
Non-aggregated queries should have tp_timestamp as the first column
Non-aggregated queries should include columns related to where the resources exist
Non-aggregated queries should place columns related to where the resources exist last Move account_id and region to the end of the SELECT statement
Non-aggregated queries should only include tp_index if missing index information in other columns Remove tp_index as account_id is already present
Avoid selecting columns with fixed values in WHERE clause
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc
Aggregated queries ordered by count desc or time asc N/A

Lambda Function Security Issues ❌

Query

Lambda Function Security Issues

Identify security issues in Lambda functions, focusing on public access.

select
  tp_timestamp,
  title,
  severity.normalized as severity,
  json_extract(resources, '$[0].id') as function_arn,
  json_extract(resources, '$[0].details.awslambdafunction.runtime') as runtime,
  workflow_state
from
  aws_securityhub_finding
where
  json_extract(resources, '$[0].type') = '"AwsLambdaFunction"'
  and title ilike '%public access%'
  and severity.normalized >= 70
order by
  severity desc,
  tp_timestamp desc;
SQL syntax checks ❌
Criteria Pass/Fail Suggestions
Use 2 space indentation
Query should end with a semicolon
Keywords should be in lowercase
Each clause is on its own line
All columns exist in the schema
STRUCT type columns use dot notation
JSON type columns use -> and ->> operators Use -> or ->> operators instead of json_extract function
JSON type columns are wrapped in parenthesis Wrap JSON column accesses in parentheses
Space before and after each -> and ->> Add spaces before and after -> or ->> operators
SQL query syntax uses valid DuckDB syntax Replace json_extract with DuckDB JSON operators
Title and description checks ✅
Criteria Pass/Fail Suggestions
Title uses title case
Title accurately describes the query
Title contains limit value if in query N/A
Description explains what the query does
Description explains why a user would run the query
Description is concise
Query relevance checks ✅
Criteria Pass/Fail Suggestions
Provides useful insights for this log type
Relevant to security, operational, or performance monitoring
Column selection checks ✅
Criteria Pass/Fail Suggestions
Aggregated queries should not include tp_index/tp_timestamp N/A
Non-aggregated queries should have tp_timestamp as the first column
Non-aggregated queries should include columns related to where the resources exist
Non-aggregated queries should place columns related to where the resources exist last
Non-aggregated queries should only include tp_index if missing index information in other columns N/A
Avoid selecting columns with fixed values in WHERE clause
Sorting strategy checks ✅
Criteria Pass/Fail Suggestions
Non-aggregated queries default to tp_timestamp desc
Aggregated queries ordered by count desc or time asc N/A

Labels
review Triggers a workflow to review and validate queries

2 participants