Releases: bpolaszek/bentools-etl

4.0.1

11 Dec 16:28
7ca7124

What's Changed

  • Fix: Multiple recipes were not properly stacked by @bpolaszek in #54

Full Changelog: 4.0...4.0.1

4.0

06 Dec 06:45
fff38db

Hey folks! 👋

It's been more than 4 years since version 3 of bentools/etl was drafted. It never got out of alpha, mostly for lack of time but also, I have to admit, because of uncertainty about the design directions taken.

Introducing bentools/etl v4

PHP 8 and a lot of projects on my side came in between. I recently needed this library again, and I wanted to keep the good ideas of v3 while removing the bad ones.

So I decided that a stable v3 will never see the light of day. Because lots of classes have been renamed and most of them have become immutable, here's a brand new v4.

What's new?

  • This version requires PHP 8.2 as a minimum, is 100% covered by tests (this wasn't the case before), and uses PHPStan at the highest level to ensure type consistency. A GitHub Actions CI has also been set up.

  • It introduces a new EtlState object, which is instantiated at the beginning of the ETL process and passed through the different steps and event listeners. The EtlExecutor (previously the Etl class) is now immutable: it holds the Extractor, the Transformer and the Loader objects, fires events, and gives you the state you need through the EtlState.

  • The EtlState is mostly readonly, but you can still call $state->skip() to skip items, $state->stop() to stop the process, $state->flush() to request an early flush, and you can use the $state->context array to pass arbitrary data between the different steps and events during the whole workflow.

  • The EtlState object also has a nextTick method you can use to perform actions on the next iteration of the loop, for example to do something on an item after an early flush has been triggered.

  • Experimental ReactPHP support, so that you can process incoming data from streams / connections and perform periodic tasks in a long-running process.

  • Improved DX

  • 100% code coverage
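To make the control flow described in the bullets above more concrete, here is a toy model in plain PHP. It is not the library's code: `ToyState`, `SkipSignal`, `StopSignal` and `runToyEtl()` are hypothetical names invented for this sketch, and the real EtlState and EtlExecutor may differ in the details (for instance in what happens to pending items when the process is stopped). It only illustrates the idea of a state object that can skip an item, stop the loop, request an early flush, carry a shared context array, and defer a callback to the next iteration.

```php
<?php

declare(strict_types=1);

// Toy model only: none of these classes come from bentools/etl.
final class SkipSignal extends Exception {}
final class StopSignal extends Exception {}

final class ToyState
{
    /** @var array<string, mixed> arbitrary data shared between steps */
    public array $context = [];
    public bool $flushRequested = false;
    /** @var list<callable(ToyState): void> callbacks deferred to the next iteration */
    public array $deferred = [];

    public function skip(): never { throw new SkipSignal(); }
    public function stop(): never { throw new StopSignal(); }
    public function flush(): void { $this->flushRequested = true; }
    public function nextTick(callable $callback): void { $this->deferred[] = $callback; }
}

/**
 * Runs a minimal extract/load loop and returns the batches that were flushed.
 *
 * @param iterable<mixed> $items
 * @param callable(mixed, ToyState): void $onExtract
 * @return list<list<mixed>>
 */
function runToyEtl(iterable $items, callable $onExtract, ToyState $state, int $flushEvery = 100): array
{
    $batches = [];
    $pending = [];
    foreach ($items as $item) {
        // Run the callbacks that were deferred during the previous iteration.
        $callbacks = $state->deferred;
        $state->deferred = [];
        foreach ($callbacks as $callback) {
            $callback($state);
        }

        try {
            $onExtract($item, $state);
        } catch (SkipSignal) {
            continue; // drop this item, keep going
        } catch (StopSignal) {
            break; // end the whole loop
        }

        $pending[] = $item;
        if ($state->flushRequested || count($pending) >= $flushEvery) {
            $batches[] = $pending; // early or regular flush
            $pending = [];
            $state->flushRequested = false;
        }
    }
    if ($pending !== []) {
        $batches[] = $pending; // final flush (a choice made for this toy model)
    }

    return $batches;
}

$state = new ToyState();
$batches = runToyEtl(
    ['a', 'skip-me', 'b', 'flush-now', 'c', 'stop', 'd'],
    function (string $item, ToyState $state): void {
        $state->context['seen'] = ($state->context['seen'] ?? 0) + 1;
        if ($item === 'skip-me') {
            $state->skip();
        }
        if ($item === 'stop') {
            $state->stop();
        }
        if ($item === 'flush-now') {
            $state->flush();
            // Runs at the start of the next iteration, i.e. after the early flush.
            $state->nextTick(function (ToyState $s): void {
                $s->context['afterFlush'] = true;
            });
        }
    },
    $state,
);

// $batches is [['a', 'b', 'flush-now'], ['c']]; 'd' is never reached.
print_r($batches);
```

In this sketch, skip and stop are modelled as exceptions so that a listener can interrupt the current item from anywhere in its body. That is a design choice of the toy model, not a statement about how the library implements them.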

How does it work?

Here's an example of the new API:

city_english_name,city_local_name,country_iso_code,continent,population
"New York","New York",US,"North America",8537673
"Los Angeles","Los Angeles",US,"North America",39776830
Tokyo,東京,JP,Asia,13929286
...

use BenTools\ETL\EtlConfiguration;
use BenTools\ETL\EtlExecutor;
use BenTools\ETL\EventDispatcher\Event\LoadEvent;
use BenTools\ETL\Extractor\CSVExtractor;
use BenTools\ETL\Loader\JSONLoader;
use BenTools\ETL\Recipe\LoggerRecipe;
use Monolog\Logger;

$etl = (new EtlExecutor(options: new EtlConfiguration(flushEvery: 100)))
    ->extractFrom(new CSVExtractor(options: ['columns' => 'auto']))
    ->transformWith(function (array $city) {
        $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
        yield $city;
    })
    ->loadInto(new JSONLoader())
    ->onLoad(fn (LoadEvent $event) => print("Loaded city `{$event->item['slug']}`".PHP_EOL))
    ->withRecipe(new LoggerRecipe(new Logger('etl-logs')));

$report = $etl->process(
    source: 'file:///tmp/cities.csv',
    destination: 'file:///tmp/cities.json',
);

var_dump($report->output); // file:///tmp/cities.json

[
    {
        "city_english_name": "New York",
        "city_local_name": "New York",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 8537673,
        "slug": "new-york"
    },
    {
        "city_english_name": "Los Angeles",
        "city_local_name": "Los Angeles",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 39776830,
        "slug": "los-angeles"
    },
    {
        "city_english_name": "Tokyo",
        "city_local_name": "東京",
        "country_iso_code": "JP",
        "continent": "Asia",
        "population": 13929286,
        "slug": "tokyo"
    }
]
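The transformer in the example is an ordinary PHP generator function, so its slug logic can be checked on its own, without the library. The sketch below copies the closure body from the example and feeds it three hand-written rows; the `$rows` array and the `$slugs` accumulator are illustration-only names, and the rows are trimmed to the columns the closure actually reads.

```php
<?php

declare(strict_types=1);

// Same body as the closure passed to transformWith() in the example.
$transform = function (array $city): Generator {
    $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
    yield $city;
};

// Hand-written sample rows (illustration only).
$rows = [
    ['city_english_name' => 'New York', 'city_local_name' => 'New York'],
    ['city_english_name' => 'Los Angeles', 'city_local_name' => 'Los Angeles'],
    ['city_english_name' => 'Tokyo', 'city_local_name' => '東京'],
];

$slugs = [];
foreach ($rows as $row) {
    // A generator can yield zero, one or several items per input row.
    foreach ($transform($row) as $city) {
        $slugs[] = $city['slug'];
    }
}

print_r($slugs); // new-york, los-angeles, tokyo
```

Note that `strtolower()` only lowercases ASCII letters here and `strtr()` only replaces spaces, so names with accents or punctuation would need a more thorough slug routine.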

I hope you'll enjoy this release as much as I enjoyed coding it! 😃

4.0-alpha16

05 Dec 11:25
277fe41
Pre-release

What's Changed

  • Feat: Expose nb loaded items since last flush by @bpolaszek in #53

Full Changelog: 4.0-alpha15...4.0-alpha16

4.0-alpha15

29 Nov 10:43
2924577
Pre-release

What's Changed

  • Feat: Enable ReactProcessor to process regular iterables by @bpolaszek in #52

Full Changelog: 4.0-alpha14...4.0-alpha15

4.0-alpha14

21 Nov 21:54
30d45fd
Pre-release

What's Changed

Full Changelog: 4.0-alpha13...4.0-alpha14

4.0-alpha13

20 Nov 10:49
3335c0b
Pre-release

What's Changed

Full Changelog: 4.0-alpha12...4.0-alpha13

4.0-alpha12

17 Nov 14:27
3d39938
Pre-release

What's Changed

Full Changelog: 4.0-alpha11...4.0-alpha12

4.0-alpha11

17 Nov 10:32
80296e6
Pre-release

What's Changed

Full Changelog: 4.0-alpha10...4.0-alpha11

4.0-alpha10

16 Nov 11:46
5782147
Pre-release

What's Changed

Full Changelog: 4.0-alpha9...4.0-alpha10

4.0-alpha9

14 Nov 15:09
a74bf3c
Pre-release

What's Changed

Full Changelog: 4.0-alpha8...4.0-alpha9