Compare astronomer-providers and OSS Airflow operators/sensors and scope out deprecation plan #1377

Closed
Lee-W opened this issue Dec 6, 2023 · 5 comments
Labels
feature New feature or request


@Lee-W
Contributor

Lee-W commented Dec 6, 2023

Is your feature request related to a problem? Please describe.
As most of the functionality has been contributed back to OSS Airflow, we're going to leverage the code there and deprecate this project.

Describe the solution you'd like

  1. Check the logic differences between the OSS Airflow and astronomer-providers operators/sensors
  2. Contribute the delta to the OSS provider
  3. Point the astronomer provider to the OSS provider
  4. Release, then wait a month or so before removing the astronomer provider in a major release
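
Steps 3 and 4 could take the form of a thin deprecation shim: the old `*Async` name stays importable, warns, and simply forwards to the OSS operator in deferrable mode. A minimal sketch, assuming the OSS operator exposes a `deferrable` flag (as many OSS operators do); the `BigQueryCheckOperator` class below is a local stand-in with a simplified, hypothetical signature, not the real Google provider class.

```python
import warnings


class BigQueryCheckOperator:
    """Local stand-in for the OSS operator (simplified, hypothetical signature)."""

    def __init__(self, *, task_id: str, sql: str, deferrable: bool = False) -> None:
        self.task_id = task_id
        self.sql = sql
        self.deferrable = deferrable


class BigQueryCheckOperatorAsync(BigQueryCheckOperator):
    """Deprecated astronomer-providers name kept as a thin subclass of the OSS operator."""

    def __init__(self, **kwargs) -> None:
        warnings.warn(
            "BigQueryCheckOperatorAsync is deprecated; use "
            "BigQueryCheckOperator(deferrable=True) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Always run in deferrable mode so the old *Async behavior is preserved.
        kwargs["deferrable"] = True
        super().__init__(**kwargs)
```

Keeping the old name as a subclass lets existing DAGs keep working during the grace period in step 4, and the later major release then only has to delete the shims.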

Describe alternatives you've considered

Additional context

@Lee-W Lee-W added the feature New feature or request label Dec 6, 2023
@Lee-W Lee-W self-assigned this Dec 6, 2023
@Lee-W
Contributor Author

Lee-W commented Dec 20, 2023

  1. amazon/aws
    1. sensors
      1. emr.py
        1. EmrContainerSensorAsync
        2. EmrStepSensorAsync
        3. EmrJobFlowSensorAsync 👷
      2. redshift_cluster.py
        1. RedshiftClusterSensorAsync 👷
          • not yet contributed back
      3. batch.py
        1. BatchSensorAsync
      4. s3.py
        1. S3KeySensorAsync 👷
          • different
        2. S3KeySizeSensorAsync 👷
          • different
        3. S3KeysUnchangedSensorAsync
        4. S3PrefixSensorAsync
          • already deprecated
    2. operators
      1. emr.py
        1. EmrContainerOperatorAsync
      2. redshift_cluster.py
        1. RedshiftDeleteClusterOperatorAsync, RedshiftResumeClusterOperatorAsync, RedshiftPauseClusterOperatorAsync
      3. batch.py
        1. BatchOperatorAsync 👷
      4. redshift_sql.py
        1. RedshiftSQLOperatorAsync 👷
          • not yet contributed back
      5. sagemaker.py
        1. SageMakerProcessingOperatorAsync
          • minor difference
            • Check before defer
        2. SageMakerTransformOperatorAsync
          • minor difference
            • end_time
        3. SageMakerTrainingOperatorAsync
          • differs in the logic after `if self.print_log:`
      6. redshift_data.py 👷
        1. RedshiftDataOperatorAsync
          • not yet contributed back
  2. google/cloud
    1. sensors
      1. gcs.py
        1. GCSObjectExistenceSensorAsync
          • mostly the same
        2. GCSObjectsWithPrefixExistenceSensorAsync
          • mostly the same
        3. GCSUploadSessionCompleteSensorAsync
          • mostly the same
        4. GCSObjectUpdateSensorAsync
          • mostly the same
      2. bigquery.py
        1. BigQueryTableExistenceSensorAsync
          • mostly the same
    2. operators
      1. kubernetes_engine.py
        1. GKEStartPodOperatorAsync
          • different; OSS Airflow uses a newer implementation and deprecates our original implementation → IMO, we should deprecate ours
      2. bigquery.py
        1. BigQueryInsertJobOperatorAsync
          • minor difference
            • Check before defer
        2. BigQueryCheckOperatorAsync
          • mostly the same
        3. BigQueryGetDataOperatorAsync
          • minor difference
            • Check before defer
        4. BigQueryIntervalCheckOperatorAsync
          • minor difference
            • Check before defer
        5. BigQueryValueCheckOperatorAsync
          • mostly the same
      3. dataproc.py
        1. DataprocCreateClusterOperatorAsync
          • minor difference
            • Check before defer
        2. DataprocDeleteClusterOperatorAsync
          • minor difference
            • Check before defer
        3. DataprocSubmitJobOperatorAsync
          • mostly the same
        4. DataprocUpdateClusterOperatorAsync
          • minor difference
            • Check before defer
  3. microsoft/azure
    1. sensors
      1. data_factory.py
        1. AzureDataFactoryPipelineRunStatusSensorAsync
          • mostly the same
      2. wasb.py
        1. WasbBlobSensorAsync
          • mostly the same
        2. WasbPrefixSensorAsync
          • minor difference
            • missing args
    2. operators
      1. data_factory.py
        1. AzureDataFactoryRunPipelineOperatorAsync
          • mostly the same
  4. core/sensors
    1. filesystem.py
      1. FileSensorAsync
        • not yet contributed back
    2. external_task.py
      1. ExternalTaskSensorAsync
        • minor difference
          • handle DagStateTrigger
      2. ExternalDeploymentTaskSensorAsync
        • not yet contributed back
  5. dbt/cloud
    1. sensors/dbt.py
      1. DbtCloudJobRunSensorAsync
        • mostly the same
    2. operators/dbt.py
      1. DbtCloudRunJobOperatorAsync
        • mostly the same
  6. snowflake
    1. sensors/snowflake.py
      1. SnowflakeSensorAsync
        • not yet contributed back
    2. operators/snowflake.py
      1. SnowflakeOperatorAsync, SnowflakeSqlApiOperatorAsync
        • minor difference
          • check state before deferring → contribute to OSS
  7. databricks/operators/databricks.py
    1. DatabricksSubmitRunOperatorAsync, DatabricksRunNowOperatorAsync
      • minor difference
        • check state before deferring → contribute to OSS
  8. apache/hive/sensors
    1. named_hive_partition.py
      1. NamedHivePartitionSensorAsync
        • not yet contributed back
    2. hive_partition.py
      1. HivePartitionSensorAsync
        • not yet contributed back
  9. cncf/kubernetes/operators/kubernetes_pod.py
    1. KubernetesPodOperatorAsync
      • uses a different trigger → use the OSS one
  10. sftp/sensors/sftp.py
    1. SFTPSensorAsync
      • not yet contributed back
  11. http/sensors/http.py
    1. HttpSensorAsync
      • not yet contributed back
  12. apache/livy/operators/livy.py
    1. LivyOperatorAsync
      • mostly the same
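
Several entries above differ only in that the astronomer-providers version checks the job state before deferring. A minimal sketch of that pattern in plain Python, so it runs without Airflow: `TaskDeferred` here is a local stand-in mimicking how Airflow's `BaseOperator.defer()` signals deferral by raising `TaskDeferred`, while `FakeJobClient`, `JobOperatorAsync`, and the state names are hypothetical.

```python
class TaskDeferred(Exception):
    """Local stand-in for Airflow's TaskDeferred, which signals a hand-off to the triggerer."""

    def __init__(self, job_id: str) -> None:
        super().__init__(f"deferring while job {job_id} runs")
        self.job_id = job_id


class FakeJobClient:
    """Hypothetical client that reports a job's current state."""

    def __init__(self, states: dict[str, str]) -> None:
        self._states = states

    def get_state(self, job_id: str) -> str:
        return self._states[job_id]


SUCCESS_STATES = {"SUCCESS"}
FAILURE_STATES = {"FAILED", "CANCELLED"}


class JobOperatorAsync:
    """Hypothetical operator showing the check-before-defer pattern."""

    def __init__(self, client: FakeJobClient, job_id: str) -> None:
        self.client = client
        self.job_id = job_id

    def execute(self) -> str:
        # Look at the job once before handing off to the triggerer.
        state = self.client.get_state(self.job_id)
        if state in SUCCESS_STATES:
            # Already done: return immediately, with no defer/resume cycle.
            return self.job_id
        if state in FAILURE_STATES:
            # Already failed: surface the error right away.
            raise RuntimeError(f"job {self.job_id} ended in state {state}")
        # Still running: defer and let a trigger poll for completion.
        raise TaskDeferred(self.job_id)
```

The cost is one extra status call per run; in exchange, jobs that finish (or fail) quickly skip the round-trip through the triggerer.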

@Lee-W
Contributor Author

Lee-W commented Jan 2, 2024

See apache/airflow#28874 (comment) for the Hive sensors.

@Lee-W
Contributor Author

Lee-W commented Jan 2, 2024

@Lee-W
Contributor Author

Lee-W commented Jan 2, 2024

The majority of the work on this ticket has been done. The last step is migrating the notes to the Notion doc. I will create separate tickets for the provider deprecation.

@Lee-W Lee-W closed this as completed Jan 3, 2024