
GCS Sink additional scenarios. #2

Status: Open. Wants to merge 4 commits into the develop base branch.

bijay27bit (Owner):

No description provided.

bijay27bit (Owner, Author):

@itsmekumari Please review this internal PR for the additional GCS Sink scenarios. Thanks.

@@ -265,3 +265,175 @@ Feature: GCS sink - Verification of GCS Sink plugin
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket

#Added new scenarios for GCS Sink - Bijay
Collaborator:
Remove the commented line here.


@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: Validate successful records transfer from BigQuery to GCS with macro enabled at sink
Collaborator:
Add the macro scenario to a separate feature file named "macro"; refer to the other plugins' feature files for the naming convention.

Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket

@GCS_SINK_TEST @BQ_SOURCE_TEST @GCS_Sink_Required
Collaborator (itsmekumari, Dec 13, 2024):
Shouldn't this scenario go from a GCS source to a GCS sink? Please re-check and change it accordingly. Also, why is this a macro scenario? Macros are already covered by the macro-enabled scenario.

Owner, Author:
@itsmekumari The scenario was: "Verify and validate an end-to-end test case from a BigQuery source to a GCS sink using the Advanced File System Properties field." Please advise.

@GCS_SINK_TEST @BQ_SOURCE_TEST @GCS_Sink_Required
Scenario Outline: To verify successful data transfer from BigQuery to GCS for different formats with write header true
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
Collaborator:
Use the latest existing steps from the framework.

Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Collaborator:
Add the validation steps to all the scenarios.

Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "gcsFileSysProperty" for key "FileSystemPr"
Collaborator:
I don't see any value added in the parameter file for the file system property.

Owner, Author:
File: PluginParameters.properties
Line # 112
gcsFileSysProperty={"textinputformat.record.delimiter": "@"}
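Assuming the e2e framework reads PluginParameters.properties with the standard `java.util.Properties` loader (an assumption; the framework's own lookup helper is not shown in this thread), a minimal sketch confirms that the entry above resolves to the complete JSON string. The key ends at the first `=`, so the `:` and the space inside the JSON stay part of the value. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class FileSysPropertyDemo {
    // Hypothetical helper: parses properties-file text with java.util.Properties
    // and returns the value stored under the given key (null if absent).
    static String resolve(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Entry quoted in the thread from PluginParameters.properties, line 112.
        String entry = "gcsFileSysProperty={\"textinputformat.record.delimiter\": \"@\"}";
        // The key terminates at the first '='; the rest of the line is the value.
        System.out.println(resolve(entry, "gcsFileSysProperty"));
    }
}
```

Running `main` should print the JSON value unchanged. A missing key would make `resolve` return `null`, which is a quick way to check a parameter name against the file.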

| tsv | text/plain |

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: To verify data is getting transferred successfully from BigQuery to GCS using advanced file system properties field
Collaborator:
Why are we adding a macro here again? It is already covered in the macro-enabled scenario. This scenario should run without macros enabled.

Given Open Datafusion Project to configure pipeline
When Source is BigQuery
Collaborator:
Use the latest existing steps. This is a common review comment across all scenarios.

@@ -65,3 +65,37 @@ Feature: GCS sink - Verify GCS Sink plugin error scenarios
Then Select GCS property format "csv"
Then Click on the Validate button
Then Verify that the Plugin Property: "format" is displaying an in-line error message: "errorMessageInvalidFormat"

@GCS_SINK_TEST @BQ_SOURCE_TEST
Collaborator:
Change the tag order for ease of understanding.

Then Verify data is transferred to target GCS bucket

@GCS_SINK_TEST @BQ_SOURCE_TEST
Scenario Outline: To verify data is getting transferred successfully from BigQuery to GCS with contenttype selection
Collaborator:
Add validation for the file format as well.

itsmekumari (Collaborator) left a comment:
Please check the review comments.

Then Run the Pipeline in Runtime
Then Enter runtime argument value "gcsInvalidBucketNameSink" for key "gcsSinkPath"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state

Add the steps to open, capture, and close the logs.

@@ -33,4 +33,5 @@ errorMessageMultipleFileWithoutClearDefaultSchema=Found a row with 4 fields when
errorMessageInvalidSourcePath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidDestPath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidEncryptionKey=CryptoKeyName.parse: formattedString not in valid format: Parameter "abc@" must be
errorMessageInvalidBucketNameSink=Spark program 'phase-1' failed with error: Errors were encountered during validation. Error code: 400, Unable to read or access GCS bucket. Bucket names must be at least 3 characters in length, got 2: 'gg'. Please check the system logs for more details.

There is no need to include the whole message; just add the relevant part of the error message.

This error message says the bucket name must be at least 3 characters long. Change the bucket name to more than 3 characters.
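Any replacement value for the negative test still has to be an invalid bucket name, just not one that trips the length rule. A minimal local pre-check can help confirm this. The sketch below is hypothetical (it is not part of the e2e framework or the GCS client library) and checks only a subset of the GCS naming rules: 3 to 63 characters; only lowercase letters, digits, dashes, underscores, and dots; and a letter or digit at both ends. Longer dotted names and other GCS restrictions are not modelled.

```java
import java.util.regex.Pattern;

public class BucketNameCheck {
    // Assumed subset of GCS bucket naming rules: first and last characters are
    // lowercase letters or digits, 1..61 allowed characters in between (so 3..63 total).
    // Dotted names (which GCS allows to be longer) and reserved words are not handled.
    private static final Pattern SIMPLE_BUCKET_NAME =
        Pattern.compile("^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$");

    static boolean looksValid(String name) {
        return name != null && SIMPLE_BUCKET_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "gg" fails the length rule quoted in the error message;
        // "abc@" has enough characters but fails on the '@'.
        for (String candidate : new String[] {"gg", "abc@", "abc"}) {
            System.out.println(candidate + " -> " + looksValid(candidate));
        }
    }
}
```

With this check, a value such as `abc@` (already used elsewhere in this parameter file) is still rejected, but for its characters rather than its length. That would make the expected error message differ from the one quoted above.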

@@ -159,6 +159,11 @@ gcsParquetFileSchema=[{"key":"workforce","value":"string"},{"key":"report_year",
{"key":"race_black","value":"long"},{"key":"race_hispanic_latinx","value":"long"},\
{"key":"race_native_american","value":"long"},{"key":"race_white","value":"long"},\
{"key":"tablename","value":"string"}]
gcsInvalidBucketNameSink=gg

Same comment here: change it to more than 3 characters.

bijay27bit (Owner, Author):

@itsmekumari All review comments have been addressed. Please help merge this into the master branch. Thanks.

bijay27bit force-pushed the E2EgcsNewChangesSink_BT branch 6 times, most recently from caf0a54 to 9c57d24 on January 14, 2025 07:52
bijay27bit force-pushed the E2EgcsNewChangesSink_BT branch from c4e93b9 to e8d2829 on January 17, 2025 06:37
bijay27bit force-pushed the E2EgcsNewChangesSink_BT branch from 0bbe285 to 7d5f5fa on January 17, 2025 06:54
4 participants