Releases: GoogleCloudPlatform/DataflowTemplates

Dataflow Templates 2023-03-28-00_RC00

29 Mar 13:54

Release Week of 2023-03-28

Improvements

[Integration Tests] Improve success rate by using failsafe and making tests deterministic even if the subscription takes longer to be created
[Integration Tests] Add annotations for JDBC to BigQuery Integration Test
[Load Tests] TextToBigQuery and bulk compressor
[Load tests] Add support for launching pipelines using Pipeline#run
[PubSubToElasticsearch, BigQueryToElasticsearch] Add new/missing options
New Datastore ResourceManager

Bug Fixes

[MongoDbToBigQuery] Prevent NullPointerException when reading null value during TableRow conversion

Contributors

@bvolpato
@pabloem
@pranavbhandari24

Dataflow Templates 2023-03-21-00_rc00

21 Mar 17:54

Note: This release is in the process of rolling out. It may not be in your region yet.

Improvements

[All Templates] Upgrade Apache Beam to 2.46.0
[GCS to Spanner] Allow configuration of invalid output path
[Docs] Create a README generator based on annotations
[Docs] Add initial version for the generated README docs
[Docs] Fix docs template to assume optional parameter default (false).

Bug Fixes

[Bigtable To Avro] Use ResourceId to resolve relative path for BigtableToAvro template
[BigQuery to Bigtable] Fix BigQuery to Bigtable template parameters
[GCS To Elasticsearch] Throw better exception when headers are missing.
[Integration Tests] Avoid job name conflicts
[Integration Tests] Fix bigtable IT module dependencies
[BigQuery to MongoDB] Fix comparing row key name

Contributors

@bvolpato
@Abacn
@nickuncaged1201
@Polber
@krzysztofcybulski

Dataflow Templates 2023-03-14-00_RC00

15 Mar 21:25

Release Week of 2023-03-14

Note: This release is in the process of rolling out. It may not be in your region yet.

New Templates

  • Added MQTT To Pub/Sub Template.
  • Added Kinesis to Pub/Sub Flex template.
    Note: neither template is included in the list of Google-provided templates yet. They're available for building from source only.

Improvements

  • [BigQueryToElasticsearch] Template now supports UDFs.
  • [TextIOToBigQuery] Template now supports the RECORD type in BigQuery schema definitions.
  • [DatastreamToSpanner] Introduced a flag, roundJsonDecimals, which, if set to true, runs PARSE_JSON() with wide_number_mode=>round on JSON data (default: false).
  • [PubsubToAvro] Template now accepts dash in outputFileNamePrefix.
  • Security fixes: mysql-connector-java version upgraded.
  • Various unit test and integration test improvements, code base cleanup, documentation updates.
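The roundJsonDecimals flag above concerns JSON numbers that are too wide to fit a double exactly. The following is a minimal Python sketch of that exact-versus-round choice, not the template's implementation: the function name `parse_json_number` and the round-trip check are assumptions made for illustration, and BigQuery's own rule for detecting precision loss may differ in detail.

```python
from decimal import Decimal


def parse_json_number(text: str, round_wide: bool = False) -> float:
    """Parse a JSON number literal under an exact-or-round policy.

    Hypothetical helper: it only emulates the idea behind
    wide_number_mode and is not part of any template.
    """
    exact = Decimal(text)
    as_float = float(text)
    # Treat the value as lossless if the shortest repr of the double
    # denotes the same decimal value as the input text.
    if Decimal(repr(as_float)) == exact:
        return as_float
    if round_wide:
        # "round" behaviour: accept the nearest double.
        return as_float
    # "exact" behaviour: refuse values that would lose precision.
    raise ValueError(f"{text} cannot be stored as a double without losing precision")


if __name__ == "__main__":
    wide = "922337203685477580701"
    print(parse_json_number("0.1"))
    print(parse_json_number(wide, round_wide=True))
    try:
        parse_json_number(wide)
    except ValueError as err:
        print("rejected:", err)
```

With `round_wide=False` the sketch rejects the wide value, which mirrors the flag's default of false.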

Bug Fixes

  • Fixed issue #582 (MongoDBtoBigQuery with UDF: ScriptObjectMirror cannot be cast to bson.Document error).

Contributors

@bvolpato @anish97IND @PoulamiR1994 @Polber @pabloem @Deep1998 @pranavbhandari24

Dataflow Templates 2023-03-07-00_RC00

08 Mar 16:18

Release Week of 2023-03-06

New Templates

  • MQTT to Pub/Sub

Improvements

  • [Writes to Elasticsearch] Made trustSelfSignedCerts a parameter
  • [BigQueryIO] Performance tests implemented
  • Load test base implemented
  • [Integration Tests] Add parallelism parameters to allow parallel integration test execution
  • Enable AutoVM Scaevola scans on Dataflow Templates images
  • [DatastreamToSpanner] The template now accepts only the new HarbourBridge session file with tableID and columnID support. The older session file is no longer supported.
  • [DataplexBigQueryToGcs] Add more log information for DataplexBigQueryToGcs failures

Bug Fixes

  • [Templates Plugin] Bug fixes when staging templates to Artifact Registry
  • [DatastoreToPubsub] Templates Options interface has been set to public
  • [MqttToPubSub] Fixed MQTT broker server parameter (regular expression)
  • Avoid errors on surefire 2.X.0 due to threadCount=1.0C not supported
  • [CdcToBigQuery] Added null check on update frequency secs

Contributors

@Abacn
@anish97IND
@bvolpato
@Deep1998
@rguillome
@salvob41

Dataflow Templates 2023-02-27-00_RC00

02 Mar 19:13

Release Week of 2023-02-27

Note: The pipelines created by the Datastream to BigQuery template are not compatible with previous versions. For existing jobs, moving to the latest template requires draining the job rather than updating it.

Improvements

Bump Beam version to 2.45.0
Add JDBC Resource Managers
Changed SpannerChangeStreamsToBigQuery changelog parameter name to be more descriptive
Update Syndeo template to use the proper dependencies
Add Pub/Sub-based DLQ SchemaTransform implementation for Syndeo
Ensure each BigQuery table merge is atomic using a UUID per expected merge issue

Bug Fixes

Fix import from JDK-provided Nashorn to Nashorn Standalone
[PubsubToKafka] Fix resource manager cleanup

Contributors

@bvolpato
@nirfi
@Polber
@svetakvsundhar
@manavgarg

Dataflow Templates 2023-02-21-00_RC00

21 Feb 22:12

Release Week of 2023-02-21

Note: This release is in the process of rolling out. It may not be in your region yet.

New Templates

  • Pub/Sub to Kafka

Improvements

  • Improve flex-template tutorial to use the plugin

Bug Fixes

  • Import/Export: Support spanner.commit_timestamp columns in PostgreSQL dialect
  • Prevent casting to TemporalAccessor when reading datetimes from MSSQL

Contributors

(Listed alphabetically)

  • andreigurau
  • bvolpato
  • pranavbhandari24

Dataflow Templates 2023-02-07-00_RC01

07 Feb 20:10

Release Week of 2023-02-07

Improvements

  • Improve and clean up README.md, moving plugins documentation up
  • [Avro Import Template] Logging schema operations.
  • [Datastream To Spanner Template] Add transactions tags to the writes.
  • [Performance Tests] Draft of PS Lite to BigTable perf test.
  • [Datastream to BigQuery Template] Add flag and option to use deterministic job id
  • [Templates Plugin] Add conscrypt to classpath before other libraries, exclude shaded JAR containing libconscrypt from common
  • [Integration Tests] waitForConditionAndCancel should trigger job cancellation instead of draining
  • [Syndeo Template] Test for create never behavior on BQ
  • [Integration Tests] Create BigQueryToElasticsearchIT, include GCSToElasticsearchIT ES6 variant
  • Adds new license() rules, load statements, and default_applicable_license attributes for root third party packages.
  • [Integration Tests] Increase coverage in Bulk(Compressor|Decompressor)IT
  • [Integration Tests] Create test for FileFormatConversion (+ Avro and Parquet utilities)
  • [Plugin] Improve metadata parent, allow checkstyle check across project
  • [JdbcToBigQuery] ConnectionProperties should not be mandatory
  • [Integration Tests] Initial test for TextImportPipeline (GCS to Spanner) template
  • [Integration Tests] Initial test for PubSubToElasticsearch template
  • [Integration Tests] Change Kafka Bootstrap Server / topics list param to accept commas.
  • [Common] Allow usage of a different project id when doing merge
  • [UDF] Create unit tests for udf-samples
  • [DataStream to BigQuery] Remove deterministic id flag and assign merge jobs to workers by table name and merge concurrency limit instead of randomly
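The last item above replaces random assignment of merge jobs with assignment by table name under a concurrency limit. One way to picture that idea is a stable hash of the table name taken modulo the limit. The sketch below is an assumption for illustration only: the function name `merge_slot` and the choice of SHA-256 are hypothetical, and the release notes do not say how the template actually derives the assignment.

```python
import hashlib


def merge_slot(table_name: str, merge_concurrency: int) -> int:
    """Map a table name to a stable slot in [0, merge_concurrency).

    Hypothetical helper: the same table always lands in the same slot,
    so merges for one table are never spread across several slots.
    """
    if merge_concurrency < 1:
        raise ValueError("merge_concurrency must be at least 1")
    # Use a cryptographic digest rather than Python's built-in hash(),
    # which is randomized per process for strings.
    digest = hashlib.sha256(table_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % merge_concurrency


if __name__ == "__main__":
    for table in ("dataset.orders", "dataset.customers", "dataset.orders"):
        print(table, "->", merge_slot(table, merge_concurrency=4))
```

Because the slot depends only on the table name and the limit, repeated runs and different processes agree on the assignment, which random selection cannot guarantee.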

Bug Fixes

  • [Integration Tests] Improve names to use generic Beam terms
  • [Syndeo Template] Generate BQ and PS configs properly
  • [Datastream to BigQuery Template] Fix deterministic uuid generation
  • [Templates Plugin] Fix stage for projects containing ":"
  • [Security] Bump mysql-connector-java to 8.0.30
  • [Syndeo Template] Update proto-java to 3.21.9 to resolve conflicts

Contributors

@bvolpato
@pabloem
@xianhualiu
@rarsan

Dataflow Templates 2023-01-29-00_RC00

30 Jan 21:47

Release Week of 2023-01-29

Improvements

  • [Elasticsearch] Allow specifying a path on the connection url.
  • [Datastore-to-BQ] Flex template with support for BQ Storage Write API
  • Update Templates to Beam 2.44.0 (except kafka to bigquery)
  • Add better PostgreSQL support for datastream-to-spanner template
  • A number of integration tests added.
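The Elasticsearch item above allows a path on the connection URL, for example when the cluster sits behind a gateway or reverse proxy. A minimal sketch of separating the host part from the path is shown below. The function name `split_connection_url` and the sample URL are assumptions for illustration and do not reflect the template's own parsing code.

```python
from urllib.parse import urlsplit


def split_connection_url(url: str) -> tuple[str, str]:
    """Split a connection URL into (base, path).

    Hypothetical helper: "https://host:9243/gateway/elastic" yields
    ("https://host:9243", "/gateway/elastic"); the path is "" if absent.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a valid http(s) connection URL: {url!r}")
    base = f"{parts.scheme}://{parts.netloc}"
    # Drop a trailing slash so request paths can be appended uniformly.
    path = parts.path.rstrip("/")
    return base, path


if __name__ == "__main__":
    print(split_connection_url("https://es.example.com:9243/gateway/elastic"))
    print(split_connection_url("http://localhost:9200"))
```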

Bug Fixes

  • Fix a bug in the JdbcToBigQuery and DataplexJdbcIngestion templates where the microseconds portion of a timestamp was written incorrectly.
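The release notes do not describe the root cause of the timestamp bug above. As a general illustration of keeping the microseconds portion intact, the sketch below converts a timestamp to microseconds since the Unix epoch using integer arithmetic only, which avoids the rounding that floating-point seconds can introduce. The function name `to_epoch_micros` is hypothetical and the code is not taken from either template.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(ts: datetime) -> int:
    """Return microseconds since the Unix epoch for a timezone-aware datetime.

    Hypothetical helper: uses the integer fields of the timedelta so the
    microseconds portion is carried over exactly.
    """
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


if __name__ == "__main__":
    ts = datetime(2023, 1, 29, 12, 0, 0, 123456, tzinfo=timezone.utc)
    micros = to_epoch_micros(ts)
    print(micros, micros % 1_000_000)
```

The input must be timezone-aware; subtracting the aware `EPOCH` from a naive datetime raises a TypeError.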

Contributors

@an2x
@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24

Dataflow Templates 2023-01-17-01_RC00

20 Jan 21:00

Release Week of 2023-01-17

Note: This release also includes changes from the release 2023-01-10-00_RC00, which was cancelled. If you're looking for a version that includes a bugfix from 2023-01-10-00_RC00, please use the latest version 2023-01-17-01_RC00 instead.

Improvements

  • [Spanner Change Stream Templates] Support import/export for change streams in Cloud Spanner PostgreSQL-dialect databases.
  • Added JDBC and Spanner sinks to StreamingDataGenerator template.
  • A number of integration tests and resource managers added.

Bug Fixes

  • [Datastream Templates] Fixed a bug that sometimes caused duplicated records to be written to the target database.

Contributors

@Abacn
@nancyxu123
@bvolpato
@pranavbhandari24
@nirfi
@Polber

Dataflow Templates 2023-01-10-00_RC00

13 Jan 19:32

Release Week of 2023-01-10

Note: This release has been cancelled and hasn't been fully rolled out to production. Please use version 2023-01-17-01_RC00 or later instead, which includes these changes.

Improvements

  • [Integration Tests] Create Elasticsearch Resource Manager and Create GCS to Elasticsearch integration test
  • [Documentation] Improve documentation for Dataflow CSV import pipeline trailing delimiter.
  • [Flex Templates] Add SecretManagerUtils in v2 common directory

Bug Fixes

  • [Security] Bump postgresql dependency due to CVE-2022-21724 and CVE-2022-31197
  • [Spanner Tests] Fix an invalid column default value in RandomDdlGenerator.

Contributors

@andreigurau
@bvolpato
@oleg-semenov