feat(aws): add new check cloudwatch_log_group_no_critical_pii_in_logs #5494

Merged
2 changes: 2 additions & 0 deletions docs/tutorials/configuration_file.md
@@ -30,6 +30,8 @@ The following list includes all the AWS checks with configurable variables that
| `cloudtrail_threat_detection_privilege_escalation` | `threat_detection_privilege_escalation_entropy` | Integer |
| `cloudtrail_threat_detection_privilege_escalation` | `threat_detection_privilege_escalation_minutes` | Integer |
| `cloudwatch_log_group_no_secrets_in_logs` | `secrets_ignore_patterns` | List of Strings |
| `cloudwatch_log_group_no_critical_pii_in_logs` | `critical_pii_entities` | List of Strings |
| `cloudwatch_log_group_no_critical_pii_in_logs` | `pii_language` | String |
| `cloudwatch_log_group_retention_policy_specific_days_enabled` | `log_group_retention_days` | Integer |
| `codebuild_project_no_secrets_in_variables` | `excluded_sensitive_environment_variables` | List of Strings |
| `codebuild_project_no_secrets_in_variables` | `secrets_ignore_patterns` | List of Strings |
693 changes: 685 additions & 8 deletions poetry.lock

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions prowler/config/config.yaml
@@ -72,6 +72,31 @@ aws:
  # AWS Cloudwatch Configuration
  # aws.cloudwatch_log_group_retention_policy_specific_days_enabled --> by default is 365 days
  log_group_retention_days: 365
  # aws.cloudwatch_log_group_no_critical_pii_in_logs --> see all available entities in https://microsoft.github.io/presidio/supported_entities/
  critical_pii_entities : [
    "CREDIT_CARD", # Credit card numbers are highly sensitive financial information.
    "CRYPTO", # Crypto wallet numbers (e.g., Bitcoin addresses) can give access to cryptocurrency.
    "IBAN_CODE", # International Bank Account Numbers are critical financial information.
    "US_BANK_NUMBER", # US bank account numbers are sensitive and should be protected.
    "US_SSN", # US Social Security Numbers are critical PII used for identity verification.
    "US_PASSPORT", # US passport numbers can be used for identity theft.
    "US_ITIN", # US Individual Taxpayer Identification Numbers are sensitive personal identifiers.
    #"UK_NHS", # UK NHS numbers can be used to access medical records and other private information.
    #"ES_NIF", # Spanish NIF (Personal tax ID) is critical for identification and tax purposes.
    #"ES_NIE", # Spanish NIE (Foreigners ID card) is a critical identifier for foreign residents.
    #"IT_FISCAL_CODE", # Italian personal identification code is sensitive PII for tax and legal purposes.
    #"IT_PASSPORT", # Italian passport numbers are critical PII.
    #"IT_IDENTITY_CARD", # Italian identity card numbers are critical for personal identification.
    #"PL_PESEL", # Polish PESEL numbers are sensitive personal identifiers.
    #"SG_NRIC_FIN", # Singapore National Registration Identification Card is critical PII.
    #"AU_ABN", # Australian Business Numbers are critical for business identification.
    #"AU_TFN", # Australian Tax File Numbers are sensitive and used for taxation purposes.
    #"AU_MEDICARE", # Australian Medicare numbers are sensitive medical identifiers.
    #"IN_PAN", # Indian Permanent Account Numbers are critical for tax purposes and identity.
    #"IN_AADHAAR", # Indian Aadhaar numbers are highly sensitive and serve as a universal identity number.
    #"FI_PERSONAL_IDENTITY_CODE" # Finnish Personal Identity Code is sensitive PII for personal identification.
  ]
  pii_language: "en" # Language for recognizing PII entities

  # AWS AppStream Session Configuration
  # aws.appstream_fleet_session_idle_disconnect_timeout
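
How a check might consume these two keys can be sketched as follows. This is only an illustration, assuming the parsed configuration is available as a plain dictionary; `resolve_pii_settings` and `DEFAULT_ENTITIES` are hypothetical names, not part of Prowler.

```python
from typing import Dict, List, Tuple

# Hypothetical fallback list, used only when the key is absent from the config.
DEFAULT_ENTITIES: List[str] = ["CREDIT_CARD", "US_SSN", "IBAN_CODE"]


def resolve_pii_settings(audit_config: Dict[str, object]) -> Tuple[List[str], str]:
    """Return (entities, language), falling back to defaults for missing keys."""
    entities = list(audit_config.get("critical_pii_entities", DEFAULT_ENTITIES))
    language = str(audit_config.get("pii_language", "en"))
    return entities, language


# A user config that narrows the scan to two entity types.
custom = {"critical_pii_entities": ["CREDIT_CARD", "US_SSN"], "pii_language": "en"}
print(resolve_pii_settings(custom))  # (['CREDIT_CARD', 'US_SSN'], 'en')

# An empty config falls back to the defaults.
print(resolve_pii_settings({}))  # (['CREDIT_CARD', 'US_SSN', 'IBAN_CODE'], 'en')
```

Keeping the fallback next to the lookup means the check still runs when a user's configuration file predates these keys.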
@@ -0,0 +1,32 @@
{
  "Provider": "aws",
  "CheckID": "cloudwatch_log_group_no_critical_pii_in_logs",
  "CheckTitle": "Check if critical PII exists in CloudWatch logs.",
  "CheckType": [],
  "ServiceName": "cloudwatch",
  "SubServiceName": "",
  "ResourceIdTemplate": "arn:partition:cloudwatch:region:account-id:log-group/resource-id",
  "Severity": "medium",
  "ResourceType": "Other",
  "Description": "Check if critical PII exists in CloudWatch logs.",
  "Risk": "Storing sensitive data in CloudWatch logs could allow an attacker with read-only access to escalate their privileges or gain unauthorised access to systems.",
  "RelatedUrl": "https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudwatch-alarms-for-cloudtrail.html",
  "Remediation": {
    "Code": {
      "CLI": "",
      "NativeIaC": "",
      "Other": "",
      "Terraform": ""
    },
    "Recommendation": {
      "Text": "It is recommended that sensitive information is not logged to CloudWatch logs. Alternatively, sensitive data may be masked using a data protection policy.",
      "Url": "https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/mask-sensitive-log-data.html"
    }
  },
  "Categories": [
    "secrets"
  ],
  "DependsOn": [],
  "RelatedTo": [],
  "Notes": ""
}
@@ -0,0 +1,147 @@
from json import dumps
from typing import Set

from presidio_analyzer import AnalyzerEngine

from prowler.lib.check.models import Check, Check_Report_AWS
from prowler.providers.aws.services.cloudwatch.cloudwatch_service import (
    convert_to_cloudwatch_timestamp_format,
)
from prowler.providers.aws.services.cloudwatch.logs_client import logs_client


class cloudwatch_log_group_no_critical_pii_in_logs(Check):
    def execute(self):
        findings = []

        # Initialize the PII Analyzer engine
        analyzer = AnalyzerEngine()
Review comment (Member): I think this can be done after line 20 at least not to get anything if no logs.
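
The reviewer's suggestion amounts to deferring construction of the analyzer until there is something to scan. A minimal sketch of that idea follows. It uses a stand-in class rather than Presidio's `AnalyzerEngine` so that it runs without extra dependencies; `ExpensiveEngine` and `scan_log_groups` are hypothetical names, not code from this PR.

```python
from typing import Dict, List


class ExpensiveEngine:
    """Stand-in for a costly-to-build analyzer (hypothetical, not Presidio)."""

    instances = 0  # counts how many engines were constructed

    def __init__(self) -> None:
        ExpensiveEngine.instances += 1

    def analyze(self, text: str) -> List[str]:
        # Toy rule: flag any message containing the marker "SSN".
        return ["US_SSN"] if "SSN" in text else []


def scan_log_groups(log_groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the engine only when there is at least one log group to scan."""
    if not log_groups:
        return {}  # early exit: no engine is ever created
    engine = ExpensiveEngine()
    return {
        name: [hit for message in messages for hit in engine.analyze(message)]
        for name, messages in log_groups.items()
    }


print(scan_log_groups({}))  # {}
print(ExpensiveEngine.instances)  # 0
print(scan_log_groups({"app": ["SSN 123"]}))  # {'app': ['US_SSN']}
print(ExpensiveEngine.instances)  # 1
```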


        if logs_client.log_groups:
            critical_pii_entities = logs_client.audit_config.get(
                "critical_pii_entities",
                [
                    "CREDIT_CARD",
                    "EMAIL_ADDRESS",
                    "PHONE_NUMBER",
                    "US_SSN",
                    "US_BANK_NUMBER",
                    "IBAN_CODE",
                    "US_PASSPORT",
                ],
            )
            pii_language = logs_client.audit_config.get("pii_language", "en")
            for log_group in logs_client.log_groups.values():
                report = Check_Report_AWS(self.metadata())
                report.status = "PASS"
                report.status_extended = (
                    f"No critical PII found in {log_group.name} log group."
                )
                report.region = log_group.region
                report.resource_id = log_group.name
                report.resource_arn = log_group.arn
                report.resource_tags = log_group.tags
                log_group_pii = []

                if log_group.log_streams:
                    for log_stream_name in log_group.log_streams:
                        log_stream_pii = {}
                        log_stream_events = [
                            dumps(event["message"])
                            for event in log_group.log_streams[log_stream_name]
                        ]

                        # Process log data in manageable chunks since the limit of Presidio Analyzer is 100,000 characters
                        MAX_CHUNK_SIZE = 100000
                        for i in range(0, len(log_stream_events)):
                            chunk = log_stream_events[i]

                            # Split if chunk exceeds max allowed size for analyzer
                            if len(chunk) > MAX_CHUNK_SIZE:
                                split_chunks = [

Codecov (codecov/patch) warning: added line #L61 of prowler/providers/aws/services/cloudwatch/cloudwatch_log_group_no_critical_pii_in_logs/cloudwatch_log_group_no_critical_pii_in_logs.py was not covered by tests.
                                    chunk[j : j + MAX_CHUNK_SIZE]
                                    for j in range(0, len(chunk), MAX_CHUNK_SIZE)
                                ]
                            else:
                                split_chunks = [chunk]

                            for split_chunk in split_chunks:
                                # PII detection for each split chunk
                                pii_detection_result = analyzer.analyze(
                                    text=split_chunk,
                                    entities=critical_pii_entities,
                                    score_threshold=1,
                                    language=pii_language,
                                )

                                # Track cumulative character count to map PII to log event
                                cumulative_char_count = 0
                                for j, log_event in enumerate(
                                    log_stream_events[i : i + len(split_chunks)]
                                ):
                                    log_event_length = len(log_event)
                                    for pii in pii_detection_result:
                                        # Check if PII start position falls within this log event
                                        if (
                                            cumulative_char_count
                                            <= pii.start
                                            < cumulative_char_count + log_event_length
                                        ):
                                            # j is relative to the slice starting at i,
                                            # so offset it to index the whole stream
                                            flagged_event = log_group.log_streams[
                                                log_stream_name
                                            ][i + j]
                                            cloudwatch_timestamp = (
                                                convert_to_cloudwatch_timestamp_format(
                                                    flagged_event["timestamp"]
                                                )
                                            )
                                            if (
                                                cloudwatch_timestamp
                                                not in log_stream_pii
                                            ):
                                                log_stream_pii[cloudwatch_timestamp] = (
                                                    SecretsDict()
                                                )

                                            # Add the detected PII entity to log_stream_pii
                                            log_stream_pii[
                                                cloudwatch_timestamp
                                            ].add_secret(
                                                pii.start - cumulative_char_count,
                                                pii.entity_type,
                                            )
                                    cumulative_char_count += (
                                        log_event_length + 1
                                    )  # +1 to account for '\n'

                        if log_stream_pii:
                            pii_string = "; ".join(
                                [
                                    f"at {timestamp} - {str(log_stream_pii[timestamp])}"
                                    for timestamp in log_stream_pii
                                ]
                            )
                            log_group_pii.append(
                                f"in log stream {log_stream_name} {pii_string}"
                            )
                if log_group_pii:
                    pii_string = "; ".join(log_group_pii)
                    report.status = "FAIL"
                    report.status_extended = f"Potential critical PII found in log group {log_group.name} {pii_string}."
                findings.append(report)
        return findings
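
The chunking and offset bookkeeping in `execute` can be illustrated in isolation. The sketch below is not the PR's code: `split_text` and `locate_event` are hypothetical helpers, and the tiny chunk size is chosen only to keep the example readable. It assumes events are joined with a single newline, which is what the `+ 1` in the cumulative count accounts for.

```python
from typing import List, Optional


def split_text(text: str, max_chunk_size: int) -> List[str]:
    """Cut text into consecutive pieces of at most max_chunk_size characters."""
    return [text[j : j + max_chunk_size] for j in range(0, len(text), max_chunk_size)]


def locate_event(events: List[str], position: int) -> Optional[int]:
    """Return the index of the event containing character `position` in the
    newline-joined text, or None if it falls on a separator or past the end."""
    cumulative = 0
    for index, event in enumerate(events):
        if cumulative <= position < cumulative + len(event):
            return index
        cumulative += len(event) + 1  # +1 for the '\n' separator
    return None


events = ["alpha", "beta", "gamma"]
joined = "\n".join(events)  # 16 characters in total

print(split_text(joined, 6))  # ['alpha\n', 'beta\ng', 'amma']
print(locate_event(events, 0))  # 0
print(locate_event(events, 7))  # 1
print(locate_event(events, 5))  # None (position 5 is the first separator)
```

Note that an offset reported for a later piece of a split event is relative to that piece, so the piece's starting position has to be added back before the offset is compared against event boundaries.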


class SecretsDict(dict[int, Set[str]]):
Review comment (Member): What is the purpose of this class?

    """Dictionary to track unique PII types on each line."""

    def add_secret(self, line_number: int, pii_type: str) -> None:
        """Add a PII type to a specific line number, ensuring no duplicates."""
        self.setdefault(line_number, set()).add(pii_type)

    def __str__(self) -> str:
        """Generate a formatted string representation of the dictionary."""
        return ", ".join(
            f"{', '.join(sorted(pii_types))} on line {line_number}"
            for line_number, pii_types in sorted(self.items())
        )
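
A short usage sketch may help with the reviewer's question about the class. The class body is reproduced from the diff (docstrings trimmed) so the example is self-contained; the positions and entity names passed in are made-up sample values.

```python
from typing import Set


class SecretsDict(dict[int, Set[str]]):
    """Dictionary to track unique PII types on each line."""

    def add_secret(self, line_number: int, pii_type: str) -> None:
        # A set per key silently drops repeated entity types.
        self.setdefault(line_number, set()).add(pii_type)

    def __str__(self) -> str:
        return ", ".join(
            f"{', '.join(sorted(pii_types))} on line {line_number}"
            for line_number, pii_types in sorted(self.items())
        )


found = SecretsDict()
found.add_secret(12, "US_SSN")
found.add_secret(12, "CREDIT_CARD")
found.add_secret(12, "US_SSN")  # duplicate, ignored by the set
found.add_secret(3, "IBAN_CODE")

print(found)  # IBAN_CODE on line 3, CREDIT_CARD, US_SSN on line 12
```

In other words, it deduplicates entity types per position and renders them in a stable, sorted order for `status_extended`. Note that the check passes a character offset within the event as `line_number`, so "on line N" in the output is an offset rather than a true line number.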
@@ -93,6 +93,8 @@ def __init__(self, provider):
        if (
            "cloudwatch_log_group_no_secrets_in_logs"
            in provider.audit_metadata.expected_checks
            or "cloudwatch_log_group_no_critical_pii_in_logs"
            in provider.audit_metadata.expected_checks
        ):
            self.events_per_log_group_threshold = (
                1000  # The threshold for number of events to return per log group.
1 change: 1 addition & 0 deletions pyproject.toml
@@ -62,6 +62,7 @@ microsoft-kiota-abstractions = "1.3.3"
msgraph-sdk = "1.8.0"
numpy = "2.0.2"
pandas = "2.2.3"
presidio-analyzer = "2.2.355"
py-ocsf-models = "0.2.0"
pydantic = "1.10.18"
python = ">=3.9,<3.13"
20 changes: 20 additions & 0 deletions tests/config/config_test.py
@@ -33,6 +33,16 @@ def mock_prowler_get_latest_release(_, **kwargs):
"ec2_allowed_instance_owners": ["amazon-elb"],
"trusted_account_ids": [],
"log_group_retention_days": 365,
"critical_pii_entities": [
"CREDIT_CARD", # Credit card numbers are highly sensitive financial information.
"CRYPTO", # Crypto wallet numbers (e.g., Bitcoin addresses) can give access to cryptocurrency.
"IBAN_CODE", # International Bank Account Numbers are critical financial information.
"US_BANK_NUMBER", # US bank account numbers are sensitive and should be protected.
"US_SSN", # US Social Security Numbers are critical PII used for identity verification.
"US_PASSPORT", # US passport numbers can be used for identity theft.
"US_ITIN", # US Individual Taxpayer Identification Numbers are sensitive personal identifiers.
],
"pii_language": "en", # Language for recognizing PII entities
"max_idle_disconnect_timeout_in_seconds": 600,
"max_disconnect_timeout_in_seconds": 300,
"max_session_duration_seconds": 36000,
@@ -97,6 +107,16 @@ def mock_prowler_get_latest_release(_, **kwargs):
"fargate_windows_latest_version": "1.0.0",
"trusted_account_ids": [],
"log_group_retention_days": 365,
"critical_pii_entities": [
"CREDIT_CARD", # Credit card numbers are highly sensitive financial information.
"CRYPTO", # Crypto wallet numbers (e.g., Bitcoin addresses) can give access to cryptocurrency.
"IBAN_CODE", # International Bank Account Numbers are critical financial information.
"US_BANK_NUMBER", # US bank account numbers are sensitive and should be protected.
"US_SSN", # US Social Security Numbers are critical PII used for identity verification.
"US_PASSPORT", # US passport numbers can be used for identity theft.
"US_ITIN", # US Individual Taxpayer Identification Numbers are sensitive personal identifiers.
],
"pii_language": "en", # Language for recognizing PII entities
"max_idle_disconnect_timeout_in_seconds": 600,
"max_disconnect_timeout_in_seconds": 300,
"max_session_duration_seconds": 36000,
25 changes: 25 additions & 0 deletions tests/config/fixtures/config.yaml
@@ -72,6 +72,31 @@ aws:
  # AWS Cloudwatch Configuration
  # aws.cloudwatch_log_group_retention_policy_specific_days_enabled --> by default is 365 days
  log_group_retention_days: 365
  # aws.cloudwatch_log_group_no_critical_pii_in_logs --> see all available entities in https://microsoft.github.io/presidio/supported_entities/
  critical_pii_entities : [
    "CREDIT_CARD", # Credit card numbers are highly sensitive financial information.
    "CRYPTO", # Crypto wallet numbers (e.g., Bitcoin addresses) can give access to cryptocurrency.
    "IBAN_CODE", # International Bank Account Numbers are critical financial information.
    "US_BANK_NUMBER", # US bank account numbers are sensitive and should be protected.
    "US_SSN", # US Social Security Numbers are critical PII used for identity verification.
    "US_PASSPORT", # US passport numbers can be used for identity theft.
    "US_ITIN", # US Individual Taxpayer Identification Numbers are sensitive personal identifiers.
    #"UK_NHS", # UK NHS numbers can be used to access medical records and other private information.
    #"ES_NIF", # Spanish NIF (Personal tax ID) is critical for identification and tax purposes.
    #"ES_NIE", # Spanish NIE (Foreigners ID card) is a critical identifier for foreign residents.
    #"IT_FISCAL_CODE", # Italian personal identification code is sensitive PII for tax and legal purposes.
    #"IT_PASSPORT", # Italian passport numbers are critical PII.
    #"IT_IDENTITY_CARD", # Italian identity card numbers are critical for personal identification.
    #"PL_PESEL", # Polish PESEL numbers are sensitive personal identifiers.
    #"SG_NRIC_FIN", # Singapore National Registration Identification Card is critical PII.
    #"AU_ABN", # Australian Business Numbers are critical for business identification.
    #"AU_TFN", # Australian Tax File Numbers are sensitive and used for taxation purposes.
    #"AU_MEDICARE", # Australian Medicare numbers are sensitive medical identifiers.
    #"IN_PAN", # Indian Permanent Account Numbers are critical for tax purposes and identity.
    #"IN_AADHAAR", # Indian Aadhaar numbers are highly sensitive and serve as a universal identity number.
    #"FI_PERSONAL_IDENTITY_CODE" # Finnish Personal Identity Code is sensitive PII for personal identification.
  ]
  pii_language: "en" # Language for recognizing PII entities

  # AWS AppStream Session Configuration
  # aws.appstream_fleet_session_idle_disconnect_timeout