Describe the bug
The S3 ingestion task throws an unhandled exception when trying to read a Parquet file if assumeRoleArn is provided instead of access keys:
[2025-01-31, 08:00:44 UTC] {datalake_utils.py:69} ERROR - Error fetching file [olxgroup-reservoir-ares/local/odyn/jobs/poc_search_impressions_with_interactions/20230223_133234_04454_w9k7y_93212f86-8e23-4b69-99bf-4201991dbc52] using [S3Config] due to: [Error reading dataframe due to [Forbidden]]
[2025-01-31, 08:00:44 UTC] {status.py:91} WARNING - Wild error while creating Container from bucket details - 'NoneType' object has no attribute 'columns'
[2025-01-31, 08:00:44 UTC] {status.py:92} DEBUG - Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/storage/s3/metadata.py", line 156, in get_containers
yield from self._generate_structured_containers(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/storage/s3/metadata.py", line 382, in _generate_structured_containers
] = self._generate_container_details(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/storage/s3/metadata.py", line 297, in _generate_container_details
columns = self._get_columns(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/storage/storage_service.py", line 337, in _get_columns
extracted_cols = self.extract_column_definitions(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/storage/storage_service.py", line 320, in extract_column_definitions
column_parser = DataFrameColumnParser.create(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/datalake/datalake_utils.py", line 135, in create
parser = ParquetDataFrameColumnParser(data_frame)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/datalake/datalake_utils.py", line 427, in __init__
self._arrow_table = pa.Table.from_pandas(self.data_frame)
File "pyarrow/table.pxi", line 4525, in pyarrow.lib.Table.from_pandas
File "/home/airflow/.local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 570, in dataframe_to_arrays
convert_fields) = _get_columns_to_convert(df, schema, preserve_index,
File "/home/airflow/.local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 349, in _get_columns_to_convert
columns = _resolve_columns_of_interest(df, schema, columns)
File "/home/airflow/.local/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 523, in _resolve_columns_of_interest
columns = df.columns
AttributeError: 'NoneType' object has no attribute 'columns'
Note that a Forbidden error is raised even though the provided role has access to the file.
To Reproduce
1. Upload a Parquet file to S3.
2. Create or obtain an IAM role with access to the bucket and file.
3. Run S3 ingestion with the assumeRoleArn configuration.
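For reference, the relevant part of the ingestion recipe looks roughly like this. This is a sketch: the field names follow the OpenMetadata AWS config schema as I understand it, and the region, role ARN, and session name are placeholders, not values from the actual failing setup:

```yaml
source:
  type: s3
  serviceName: s3_storage_example
  serviceConnection:
    config:
      type: S3
      awsConfig:
        awsRegion: eu-west-1
        # Role-based auth instead of awsAccessKeyId / awsSecretAccessKey
        assumeRoleArn: arn:aws:iam::123456789012:role/OpenMetadataIngestionRole
        assumeRoleSessionName: OpenMetadataSession
  sourceConfig:
    config:
      type: StorageMetadata
```

With this configuration the traceback above is produced; the same ingestion with static access keys reads the file successfully.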
Expected behavior
The Parquet file should be read and its structure ingested into OpenMetadata.
Affected module
Ingestion Framework
Version:
openmetadata-ingestion==1.6.3