diff --git a/docs/changelog/0.12.0.rst b/docs/changelog/0.12.0.rst
new file mode 100644
index 000000000..dc4f4b838
--- /dev/null
+++ b/docs/changelog/0.12.0.rst
@@ -0,0 +1,53 @@
+0.12.0 (2024-08-29)
+===================
+
+Breaking Changes
+----------------
+
+- Change the connection URL used for generating HWM names of S3 and Samba sources:
+    * ``smb://host:port`` -> ``smb://host:port/share``
+    * ``s3://host:port`` -> ``s3://host:port/bucket`` (:github:pull:`304`)
+
+- Update the ``Excel`` package from ``0.20.3`` to ``0.20.4`` to add Spark 3.5.1 support. (:github:pull:`306`)
+
+Features
+--------
+
+- Add support for specifying file formats (``ORC``, ``Parquet``, ``CSV``, etc.) in ``HiveWriteOptions.format`` (:github:pull:`292`):
+
+  .. code:: python
+
+      Hive.WriteOptions(format=ORC(compression="snappy"))
+
+- Collect Spark execution metrics in the following methods and log them in DEBUG mode:
+    * ``DBWriter.run()``
+    * ``FileDFWriter.run()``
+    * ``Hive.sql()``
+    * ``Hive.execute()``
+
+  This is implemented using a custom ``SparkListener`` which wraps the entire method call and
+  then reports the collected metrics. However, some metrics may be missing due to Spark architecture,
+  so they are not a reliable source of information. That's why they are logged only in DEBUG mode and
+  are not returned as the method call result. (:github:pull:`303`)
+
+- Generate a default ``jobDescription`` based on the currently executed method. Examples:
+    * ``DBWriter() -> Postgres[host:5432/database]``
+    * ``MongoDB[localhost:27017/admin] -> DBReader.run()``
+    * ``Hive[cluster].execute()``
+
+  If the user has already set a custom ``jobDescription``, it will be left intact. (:github:pull:`304`)
+
+- Log the detected JDBC dialect at INFO level (:github:pull:`305`):
+
+  .. code:: text
+
+      |MySQL| Detected dialect: 'org.apache.spark.sql.jdbc.MySQLDialect'
+
+- Log the estimated size of the in-memory dataframe created by ``JDBC.fetch`` and ``JDBC.execute`` methods. (:github:pull:`303`)
+
+
+Bug Fixes
+---------
+
+- Fix passing ``Greenplum(extra={"options": ...})`` during read/write operations. (:github:pull:`308`)
+- Do not raise an exception if a yield-based hook has code after its one and only ``yield``.
diff --git a/docs/changelog/index.rst b/docs/changelog/index.rst
index 4bdac9467..7700528eb 100644
--- a/docs/changelog/index.rst
+++ b/docs/changelog/index.rst
@@ -3,6 +3,7 @@
     :caption: Changelog
 
     DRAFT
+    0.12.0
     0.11.1
     0.11.0
     0.10.2
diff --git a/docs/changelog/next_release/+yield.feature.rst b/docs/changelog/next_release/+yield.feature.rst
deleted file mode 100644
index efc586068..000000000
--- a/docs/changelog/next_release/+yield.feature.rst
+++ /dev/null
@@ -1 +0,0 @@
-Do not raise exception if yield-based hook whas something past (and only one) ``yield``.
diff --git a/docs/changelog/next_release/292.feature.rst b/docs/changelog/next_release/292.feature.rst
deleted file mode 100644
index e50a5fcd7..000000000
--- a/docs/changelog/next_release/292.feature.rst
+++ /dev/null
@@ -1 +0,0 @@
-Add support for specifying file formats (``ORC``, ``Parquet``, ``CSV``, etc.) in ``HiveWriteOptions.format``: ``Hive.WriteOptions(format=ORC(compression="snappy"))``.
diff --git a/docs/changelog/next_release/303.feature.1.rst b/docs/changelog/next_release/303.feature.1.rst
deleted file mode 100644
index 8c0b1e19e..000000000
--- a/docs/changelog/next_release/303.feature.1.rst
+++ /dev/null
@@ -1 +0,0 @@
-Log estimated size of in-memory dataframe created by ``JDBC.fetch`` and ``JDBC.execute`` methods.
diff --git a/docs/changelog/next_release/303.feature.2.rst b/docs/changelog/next_release/303.feature.2.rst
deleted file mode 100644
index 92bbe13c3..000000000
--- a/docs/changelog/next_release/303.feature.2.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-Collect Spark execution metrics in following methods, and log then in DEBUG mode:
-* ``DBWriter.run()``
-* ``FileDFWriter.run()``
-* ``Hive.sql()``
-* ``Hive.execute()``
-
-This is implemented using custom ``SparkListener`` which wraps the entire method call, and
-then report collected metrics. But these metrics sometimes may be missing due to Spark architecture,
-so they are not reliable source of information. That's why logs are printed only in DEBUG mode, and
-are not returned as method call result.
diff --git a/docs/changelog/next_release/304.breaking.rst b/docs/changelog/next_release/304.breaking.rst
deleted file mode 100644
index 605983210..000000000
--- a/docs/changelog/next_release/304.breaking.rst
+++ /dev/null
@@ -1,3 +0,0 @@
-Change connection URL used for generating HWM names of S3 and Samba sources:
-* ``smb://host:port`` -> ``smb://host:port/share``
-* ``s3://host:port`` -> ``s3://host:port/bucket``
diff --git a/docs/changelog/next_release/304.feature.rst b/docs/changelog/next_release/304.feature.rst
deleted file mode 100644
index 975603547..000000000
--- a/docs/changelog/next_release/304.feature.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-Generate default ``jobDescription`` based on currently executed method. Examples:
-* ``DBWriter() -> Postgres[host:5432/database]``
-* ``MongoDB[localhost:27017/admin] -> DBReader.run()``
-* ``Hive[cluster].execute()``
-
-If user already set custom ``jobDescription``, it will left intact.
diff --git a/docs/changelog/next_release/305.feature.rst b/docs/changelog/next_release/305.feature.rst
deleted file mode 100644
index c4c44dc6d..000000000
--- a/docs/changelog/next_release/305.feature.rst
+++ /dev/null
@@ -1 +0,0 @@
-Add log.info about JDBC dialect usage: ``Detected dialect: 'org.apache.spark.sql.jdbc.MySQLDialect'``
diff --git a/docs/changelog/next_release/306.feature.rst b/docs/changelog/next_release/306.feature.rst
deleted file mode 100644
index 1c2b95f7f..000000000
--- a/docs/changelog/next_release/306.feature.rst
+++ /dev/null
@@ -1 +0,0 @@
-Update ``Excel`` package from ``0.20.3`` to ``0.20.4``, to include Spark 3.5.1 support.
diff --git a/docs/changelog/next_release/308.bugfix.rst b/docs/changelog/next_release/308.bugfix.rst
deleted file mode 100644
index 3ffcdcc58..000000000
--- a/docs/changelog/next_release/308.bugfix.rst
+++ /dev/null
@@ -1 +0,0 @@
-Fix passing ``Greenplum(extra={"options": ...)`` during read/write operations.
diff --git a/onetl/VERSION b/onetl/VERSION
index bc859cbd6..ac454c6a1 100644
--- a/onetl/VERSION
+++ b/onetl/VERSION
@@ -1 +1 @@
-0.11.2
+0.12.0
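The yield-based hook fix in this release concerns hooks written as generators. Below is a minimal sketch of how such a hook could be driven, assuming hooks behave like ``contextlib.contextmanager`` generators: code before the single ``yield`` runs before the wrapped method, and code after it runs once the result is available. ``call_with_hook`` and ``logging_hook`` are hypothetical names used only for illustration; this is not onetl's actual hook implementation.

```python
from typing import Any, Callable, Generator

# Hypothetical driver for a yield-based hook (an illustration, not onetl's code).
# Assumption: the hook is a generator that yields exactly once; code before the
# yield runs before the wrapped method, code after it runs once the result is known.
HookGenerator = Generator[None, Any, None]


def call_with_hook(
    hook: Callable[..., HookGenerator],
    method: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Any:
    gen = hook(*args, **kwargs)
    next(gen)  # run the hook up to its single ``yield``
    result = method(*args, **kwargs)
    try:
        # Resume the hook with the method result; code after ``yield`` runs here.
        gen.send(result)
    except StopIteration:
        # Expected path: the hook finished after its one and only ``yield``.
        return result
    # The hook yielded a second time, which this sketch treats as an error.
    gen.close()
    raise RuntimeError("yield-based hook must yield exactly once")


# Usage: a hook with code past its single ``yield`` completes without errors.
events = []


def logging_hook(value: int) -> HookGenerator:
    events.append(f"before {value}")
    result = yield
    events.append(f"after {result}")


print(call_with_hook(logging_hook, lambda value: value * 2, 21))  # 42
print(events)  # ['before 21', 'after 42']
```

Raising only on a second ``yield`` mirrors how ``contextlib.contextmanager`` rejects generators that do not stop after their first ``yield``, while code after the only ``yield`` is treated as normal post-processing.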