feat(export): add API export system #6878

Merged
merged 42 commits on Feb 26, 2025
Changes from 14 commits
Commits (42)
a790a50
feat(export): add api export system
AdriiiPRodri Feb 4, 2025
63b59e4
chore: apply ruff
AdriiiPRodri Feb 10, 2025
326fddd
Merge branch 'master' into PRWLR-5956-Export-Artifacts-only
AdriiiPRodri Feb 10, 2025
82d53c5
chore: update api schema
AdriiiPRodri Feb 10, 2025
d5e2d75
chore: update api changelog
AdriiiPRodri Feb 10, 2025
747b97f
ref: improve export code
AdriiiPRodri Feb 12, 2025
f7e2740
fix: add condition before close the csv file
AdriiiPRodri Feb 12, 2025
492e9f2
fix: solve duplicated findings
AdriiiPRodri Feb 12, 2025
32e880e
chore: restore rls.py
AdriiiPRodri Feb 12, 2025
d90b4fa
chore: remove comment
AdriiiPRodri Feb 12, 2025
7e7da99
ref: move the api output folder
AdriiiPRodri Feb 12, 2025
cbf8cf7
fix: html close file
AdriiiPRodri Feb 12, 2025
820a880
chore: ruff format
AdriiiPRodri Feb 12, 2025
41aec46
Merge branch 'master' into PRWLR-5956-Export-Artifacts-only
AdriiiPRodri Feb 12, 2025
621e71c
ref: improve code
AdriiiPRodri Feb 13, 2025
d988977
test: add export unittests
AdriiiPRodri Feb 13, 2025
7139683
chore: rename variables
AdriiiPRodri Feb 18, 2025
79aded5
Merge branch 'master' into PRWLR-5956-Export-Artifacts-only
AdriiiPRodri Feb 18, 2025
97d55d7
fix: fix the batch writing when launching the CLI
AdriiiPRodri Feb 18, 2025
38fb72e
fix: s3 unittests tests
AdriiiPRodri Feb 18, 2025
afce5dc
chore: api format
AdriiiPRodri Feb 18, 2025
edbd185
feat: add auth_method and partition information
AdriiiPRodri Feb 19, 2025
df6ba83
feat: add new queue for the reports
AdriiiPRodri Feb 19, 2025
dcf3965
fix: add retry for
AdriiiPRodri Feb 19, 2025
b9b6e18
fix: remove retry due to an error
AdriiiPRodri Feb 19, 2025
f6f121b
feat: add missing finding transformations
AdriiiPRodri Feb 20, 2025
b32cd99
feat: add new reports status code and naming
AdriiiPRodri Feb 21, 2025
1128d54
fix: unittests
AdriiiPRodri Feb 21, 2025
3105abc
fix: rmtree error
AdriiiPRodri Feb 24, 2025
fa1251a
test: add gcp, azure, m365 and k8s unittests for transform_api_output
AdriiiPRodri Feb 25, 2025
f83016e
Merge branch 'master' into PRWLR-5956-Export-Artifacts-only
AdriiiPRodri Feb 25, 2025
b182295
chore: format
AdriiiPRodri Feb 25, 2025
a7e8fba
chore: api format
AdriiiPRodri Feb 25, 2025
d9db052
Merge branch 'master' into PRWLR-5956-Export-Artifacts-only
AdriiiPRodri Feb 25, 2025
3adc8c9
chore: update migration order
AdriiiPRodri Feb 25, 2025
5a1f46c
chore: change schema location
AdriiiPRodri Feb 25, 2025
cf761ab
fix: add missing azure and gcp fields
AdriiiPRodri Feb 25, 2025
f5f62f8
fix: remove parameter used for HTML output
AdriiiPRodri Feb 25, 2025
88a9640
fix: add default value to azure
AdriiiPRodri Feb 26, 2025
28c4f89
fix: change response api type
AdriiiPRodri Feb 26, 2025
c3dff6a
fix: k8s namespace field
AdriiiPRodri Feb 26, 2025
b803f4e
fix: fix resource for kubernetes unittest
AdriiiPRodri Feb 26, 2025
17 changes: 17 additions & 0 deletions .env
@@ -30,6 +30,23 @@ VALKEY_HOST=valkey
VALKEY_PORT=6379
VALKEY_DB=0

# API scan settings
# The AWS access key to be used when uploading scan artifacts to an S3 bucket
# If left empty, default AWS credentials resolution behavior will be used
DJANGO_ARTIFACTS_AWS_ACCESS_KEY_ID=""

# The AWS secret key to be used when uploading scan artifacts to an S3 bucket
DJANGO_ARTIFACTS_AWS_SECRET_ACCESS_KEY=""

# An optional AWS session token
DJANGO_ARTIFACTS_AWS_SESSION_TOKEN=""

# The AWS region where your S3 bucket is located (e.g., "us-east-1")
DJANGO_ARTIFACTS_AWS_DEFAULT_REGION=""

# The name of the S3 bucket where scan artifacts should be stored
DJANGO_ARTIFACTS_AWS_S3_OUTPUT_BUCKET=""

# Django settings
DJANGO_ALLOWED_HOSTS=localhost,127.0.0.1,prowler-api
DJANGO_BIND_ADDRESS=0.0.0.0
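These variables configure how scan artifacts are pushed to and pulled from S3. Below is a simplified sketch (not part of this diff; it uses plain `os.environ` instead of the project's `env` helper and a broader exception catch) of the fallback the comments describe, mirroring the try/fallback pattern used in the report view further down:

```python
# Sketch only: build an S3 client from the DJANGO_ARTIFACTS_* variables and,
# if they are empty or invalid, fall back to boto3's default credential chain.
import os

import boto3
from botocore.exceptions import BotoCoreError, ClientError


def build_artifacts_s3_client():
    try:
        client = boto3.client(
            "s3",
            aws_access_key_id=os.environ.get("DJANGO_ARTIFACTS_AWS_ACCESS_KEY_ID") or None,
            aws_secret_access_key=os.environ.get("DJANGO_ARTIFACTS_AWS_SECRET_ACCESS_KEY") or None,
            aws_session_token=os.environ.get("DJANGO_ARTIFACTS_AWS_SESSION_TOKEN") or None,
            region_name=os.environ.get("DJANGO_ARTIFACTS_AWS_DEFAULT_REGION") or None,
        )
        client.list_buckets()  # cheap call that fails fast on bad credentials
        return client
    except (BotoCoreError, ClientError):
        # Empty or invalid settings: let boto3 resolve credentials on its own
        # (environment variables, shared config, instance/task role, ...)
        return boto3.client("s3")
```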
7 changes: 7 additions & 0 deletions api/CHANGELOG.md
@@ -7,6 +7,13 @@ All notable changes to the **Prowler API** are documented in this file.
## [Unreleased]


---

## [v1.5.0] (Prowler v5.4.0) - 2025-XX-XX

### Added
- Add API scan report system; all scans launched from the API now generate a compressed file with the report in OCSF, CSV and HTML formats [(#6878)](https://github.com/prowler-cloud/prowler/pull/6878).

---

## [v1.4.0] (Prowler v5.3.0) - 2025-02-10
45 changes: 27 additions & 18 deletions api/src/backend/api/decorators.py
@@ -7,7 +7,7 @@
from api.db_utils import POSTGRES_TENANT_VAR, SET_CONFIG_QUERY


def set_tenant(func):
def set_tenant(func=None, *, keep_tenant=False):
"""
Decorator to set the tenant context for a Celery task based on the provided tenant_id.

@@ -40,20 +40,29 @@ def some_task(arg1, **kwargs):
# The tenant context will be set before the task logic executes.
"""

@wraps(func)
@transaction.atomic
def wrapper(*args, **kwargs):
try:
tenant_id = kwargs.pop("tenant_id")
except KeyError:
raise KeyError("This task requires the tenant_id")
try:
uuid.UUID(tenant_id)
except ValueError:
raise ValidationError("Tenant ID must be a valid UUID")
with connection.cursor() as cursor:
cursor.execute(SET_CONFIG_QUERY, [POSTGRES_TENANT_VAR, tenant_id])

return func(*args, **kwargs)

return wrapper
def decorator(func):
@wraps(func)
@transaction.atomic
def wrapper(*args, **kwargs):
try:
if not keep_tenant:
tenant_id = kwargs.pop("tenant_id")
else:
tenant_id = kwargs["tenant_id"]
except KeyError:
raise KeyError("This task requires the tenant_id")
try:
uuid.UUID(tenant_id)
except ValueError:
raise ValidationError("Tenant ID must be a valid UUID")
with connection.cursor() as cursor:
cursor.execute(SET_CONFIG_QUERY, [POSTGRES_TENANT_VAR, tenant_id])

return func(*args, **kwargs)

return wrapper

if func is None:
return decorator
else:
return decorator(func)
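A hedged usage sketch of the updated decorator (the task names below are hypothetical): applied bare it behaves as before and pops `tenant_id` from the kwargs, while `keep_tenant=True` still validates and sets the tenant context but leaves `tenant_id` available to the task body.

```python
@set_tenant
def cleanup_task(**kwargs):
    # tenant_id was popped by the decorator; only the RLS context is set
    ...


@set_tenant(keep_tenant=True)
def generate_report_task(scan_id, **kwargs):
    tenant_id = kwargs["tenant_id"]  # still present for the task to use
    ...
```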
22 changes: 22 additions & 0 deletions api/src/backend/api/migrations/0010_scan_report_output.py
@@ -0,0 +1,22 @@
# Generated by Django 5.1.5 on 2025-02-07 10:59

from django.db import migrations, models


class Migration(migrations.Migration):
dependencies = [
("api", "0009_increase_provider_uid_maximum_length"),
]

operations = [
migrations.AddField(
model_name="scan",
name="output_path",
field=models.CharField(blank=True, max_length=200, null=True),
),
migrations.AddField(
model_name="scan",
name="upload_to_s3",
field=models.BooleanField(blank=True, null=True),
),
]
2 changes: 2 additions & 0 deletions api/src/backend/api/models.py
@@ -414,6 +414,8 @@ class TriggerChoices(models.TextChoices):
scheduler_task = models.ForeignKey(
PeriodicTask, on_delete=models.CASCADE, null=True, blank=True
)
output_path = models.CharField(blank=True, null=True, max_length=200)
upload_to_s3 = models.BooleanField(blank=True, null=True)
# TODO: mutelist foreign key

class Meta(RowLevelSecurityProtectedModel.Meta):
33 changes: 33 additions & 0 deletions api/src/backend/api/specs/v1.yaml
@@ -4105,6 +4105,39 @@ paths:
schema:
$ref: '#/components/schemas/ScanUpdateResponse'
description: ''
/api/v1/scans/{id}/report:
get:
operationId: scans_report_retrieve
description: Returns a ZIP file containing the requested report
summary: Download ZIP report
parameters:
- in: query
name: fields[scan-reports]
schema:
type: array
items:
type: string
enum:
- id
description: endpoint return only specific fields in the response on a per-type
basis by including a fields[TYPE] query parameter.
explode: false
- in: path
name: id
schema:
type: string
format: uuid
description: A UUID string identifying this scan.
required: true
tags:
- Scan
security:
- jwtAuth: []
responses:
'200':
description: Report obtained successfully
'423':
description: There is a problem with the AWS credentials
/api/v1/schedules/daily:
post:
operationId: schedules_daily_create
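For illustration, a short client-side sketch of consuming the new endpoint (the base URL, token and scan id are placeholders, not values from this PR):

```python
import requests

API_BASE = "http://localhost:8080/api/v1"  # assumed local deployment
TOKEN = "<JWT access token>"               # placeholder
SCAN_ID = "3fa85f64-5717-4562-b3fc-2c963f66afa6"  # placeholder scan UUID

resp = requests.get(
    f"{API_BASE}/scans/{SCAN_ID}/report",
    headers={"Authorization": f"Bearer {TOKEN}"},
)

if resp.status_code == 200:
    # Body is a ZIP containing the OCSF, CSV and HTML reports
    with open("scan_report.zip", "wb") as f:
        f.write(resp.content)
elif resp.status_code == 423:
    print("There is a problem with the AWS credentials")
```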
7 changes: 4 additions & 3 deletions api/src/backend/api/tests/test_utils.py
@@ -274,9 +274,10 @@ def test_invitation_expired(self, invitation):
expired_time = datetime.now(timezone.utc) - timedelta(days=1)
invitation.expires_at = expired_time

with patch("api.utils.Invitation.objects.using") as mock_using, patch(
"api.utils.datetime"
) as mock_datetime:
with (
patch("api.utils.Invitation.objects.using") as mock_using,
patch("api.utils.datetime") as mock_datetime,
):
mock_db = mock_using.return_value
mock_db.get.return_value = invitation
mock_datetime.now.return_value = datetime.now(timezone.utc)
8 changes: 8 additions & 0 deletions api/src/backend/api/v1/serializers.py
@@ -819,6 +819,14 @@ class Meta:
]


class ScanReportSerializer(serializers.Serializer):
id = serializers.CharField(source="scan")

class Meta:
resource_name = "scan-reports"
fields = ["id"]


class ResourceTagSerializer(RLSSerializer):
"""
Serializer for the ResourceTag model
97 changes: 92 additions & 5 deletions api/src/backend/api/v1/views.py
@@ -1,9 +1,16 @@
import glob
import os

import boto3
from botocore.exceptions import ClientError, NoCredentialsError, ParamValidationError
from celery.result import AsyncResult
from config.env import env
from django.conf import settings as django_settings
from django.contrib.postgres.aggregates import ArrayAgg
from django.contrib.postgres.search import SearchQuery
from django.db import transaction
from django.db.models import Count, F, OuterRef, Prefetch, Q, Subquery, Sum
from django.http import HttpResponse
from django.db.models.functions import Coalesce
from django.urls import reverse
from django.utils.decorators import method_decorator
@@ -35,7 +42,6 @@
check_provider_connection_task,
delete_provider_task,
delete_tenant_task,
perform_scan_summary_task,
perform_scan_task,
)

@@ -114,6 +120,7 @@
RoleSerializer,
RoleUpdateSerializer,
ScanCreateSerializer,
ScanReportSerializer,
ScanSerializer,
ScanUpdateSerializer,
ScheduleDailyCreateSerializer,
@@ -1073,6 +1080,8 @@ def get_serializer_class(self):
return ScanCreateSerializer
elif self.action == "partial_update":
return ScanUpdateSerializer
elif self.action == "report":
return ScanReportSerializer
return super().get_serializer_class()

def partial_update(self, request, *args, **kwargs):
@@ -1090,6 +1099,88 @@ def partial_update(self, request, *args, **kwargs):
)
return Response(data=read_serializer.data, status=status.HTTP_200_OK)

@extend_schema(
tags=["Scan"],
summary="Download ZIP report",
description="Returns a ZIP file containing the requested report",
request=ScanReportSerializer,
responses={
200: OpenApiResponse(description="Report obtained successfully"),
423: OpenApiResponse(
description="There is a problem with the AWS credentials"
),
},
)
@action(detail=True, methods=["get"], url_name="report")
def report(self, request, pk=None):
scan_instance = Scan.objects.get(pk=pk)
output_path = scan_instance.output_path

if not output_path:
return Response(
{"detail": "No files found"}, status=status.HTTP_404_NOT_FOUND
)

if scan_instance.upload_to_s3:
s3_client = None
try:
s3_client = boto3.client(
"s3",
aws_access_key_id=env.str("DJANGO_ARTIFACTS_AWS_ACCESS_KEY_ID"),
aws_secret_access_key=env.str(
"DJANGO_ARTIFACTS_AWS_SECRET_ACCESS_KEY"
),
aws_session_token=env.str("DJANGO_ARTIFACTS_AWS_SESSION_TOKEN"),
region_name=env.str("DJANGO_ARTIFACTS_AWS_DEFAULT_REGION"),
)
s3_client.list_buckets()
except (ClientError, NoCredentialsError, ParamValidationError):
try:
s3_client = boto3.client("s3")
s3_client.list_buckets()
except (ClientError, NoCredentialsError, ParamValidationError):
return Response(
{"detail": "There is a problem with the AWS credentials."},
status=status.HTTP_423_LOCKED,
)

bucket_name = env.str("DJANGO_ARTIFACTS_AWS_S3_OUTPUT_BUCKET")

try:
key = output_path[len(f"s3://{bucket_name}/") :]
s3_object = s3_client.get_object(Bucket=bucket_name, Key=key)
file_content = s3_object["Body"].read()
filename = os.path.basename(output_path.split("/")[-1])
except ClientError:
return Response(
{"detail": "Error accessing cloud storage"},
status=status.HTTP_500_INTERNAL_SERVER_ERROR,
)

else:
zip_files = glob.glob(output_path)
if not zip_files:
return Response(
{"detail": "No local files found"}, status=status.HTTP_404_NOT_FOUND
)

try:
file_path = zip_files[0]
with open(file_path, "rb") as f:
file_content = f.read()
filename = os.path.basename(file_path)
except IOError:
return Response(
{"detail": "Error reading local file"},
status=status.HTTP_500_INTERNAL_SERVER_ERROR,
)

response = HttpResponse(
file_content, content_type="application/x-zip-compressed"
)
response["Content-Disposition"] = f'attachment; filename="{filename}"'
return response

def create(self, request, *args, **kwargs):
input_serializer = self.get_serializer(data=request.data)
input_serializer.is_valid(raise_exception=True)
@@ -1104,10 +1195,6 @@ def create(self, request, *args, **kwargs):
# Disabled for now
# checks_to_execute=scan.scanner_args.get("checks_to_execute"),
},
link=perform_scan_summary_task.si(
tenant_id=self.request.tenant_id,
scan_id=str(scan.id),
),
)

scan.task_id = task.id
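As a worked example of the S3 download branch in the report action above (the bucket and path values are hypothetical), the object key is derived by stripping the bucket prefix from `Scan.output_path`, and the last path segment becomes the download filename:

```python
import os

bucket_name = "prowler-artifacts"  # DJANGO_ARTIFACTS_AWS_S3_OUTPUT_BUCKET
output_path = f"s3://{bucket_name}/tenant-1234/scan-5678.zip"

key = output_path[len(f"s3://{bucket_name}/"):]
filename = os.path.basename(output_path.split("/")[-1])

print(key)       # tenant-1234/scan-5678.zip
print(filename)  # scan-5678.zip
```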