- https://www.reddit.com/r/dataengineering/comments/1491swe/a_mustread_data_engineering_collection/
- stack overview https://motherduck.com/blog/data-engineering-toolkit-essential-tools/
- data
- https://github.com/cnstlungu/portable-data-stack-dagster/tree/main
- https://github.com/dagster-io/dagster-open-platform
databases
-
cloud databases
- https://towardsdatascience.com/datastore-choices-sql-vs-nosql-database-ebec24d56106 really great overview
-
postgres based
- AWS policy generator https://github.com/salesforce/policy_sentry/blob/master/README.md
- https://de.wikipedia.org/wiki/Data_Vault
- metrics layer: https://www.sspaeti.com/blog/analytics-api-with-graphql-the-next-level-of-data-engineering/
-
excel automation https://www.xlwings.org/
-
spark
- SQL tuning
- lakehouse spark 3.x & delta: https://www.youtube.com/watch?v=iog5feADeXc
- https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison
- optimizing delta https://www.youtube.com/watch?v=o2k9PICWdx0
-
dagster tips
- https://github.com/sspaeti-com/awesome-dagster
- https://github.com/sephib/dagster-graph-project
- https://github.com/slamer59/dagster-mlflow
- https://github.com/broadinstitute/dagster-utils
- https://github.com/AntonFriberg/dagster-project-example
- https://github.com/thedmi/dagster-celery-docker-example
- https://github.com/kahnwong/dagster-demo
- https://github.com/VladX09/dagster-celery-docker-bug/tree/non-zero-exit-example with dagster-io/dagster#5008
- https://github.com/xyzy-web/dagster-exchangerates
-
airflow problems
-
examples
- dbt-loom with dagster https://github.com/cnolanminich/dbt-loom-example/tree/demo-with-dagster
- https://github.com/borjavb/dbt-iceberg-poc
- dynamic schema https://www.youtube.com/watch?v=No55ImP-Jic
- www.gentlydownthe.stream
- https://gazette.readthedocs.io/en/latest/
- https://github.com/drasi-project/drasi-platform
- k8s operator for blue green deployment
- hive sql unit testing with code coverage https://github.com/HotelsDotCom/mutant-swarm
- pyspark
- wrapping native code
- https://github.com/swoop-inc/spark-records
- https://github.com/swoop-inc/spark-alchemy
- framework ETL
- fine tuning catalyst optimizer
- streaming optimization
- delta merge optimization
- redis tips:
- https://github.com/petl-developers/petl
- https://github.com/python-bonobo/bonobo
- https://www.singer.io
- https://meltano.com/ (reference runner for singer)
- https://airbyte.io/
- https://github.com/dbt-labs/dbt-project-evaluator
- https://github.com/tobymao/sqlglot
- https://github.com/Montreal-Analytics/dbt-gloss
- https://github.com/offbi/pre-commit-dbt
- team structure, monorepo or not: dbt-labs/dbt-core#5244
- https://blog.montrealanalytics.com/blue-green-deployment-with-dbt-and-snowflake-922f1c658011
- https://www.getdbt.com/analytics-engineering/case-for-elt-workflow/
- https://montrealanalytics.notion.site/Coalesce-Workshop-Guide-6382db82046f41599e9ec39afb035bdb and https://github.com/Montreal-Analytics/poutineshop-public and video of the workshop https://www.youtube.com/watch?v=L6ixHejZX5A&list=PL0QYlrC86xQlj9UDGiEwhXQuSjuSyPJHl&index=42
- https://slingdata.io/
- https://airbyte.com/
- https://docs.dagster.io/integrations/embedded-elt
- https://www.prequel.co/
- Airflow
- Prefect
- https://dagster.readthedocs.io
- health dashboard https://github.com/criteo/slab
- stress testing https://github.com/open-chaos/experiment-catalog
- Unlocking model governance and multi-project deployments with dbt-meshify - Coalesce 2023 https://www.youtube.com/watch?v=FAsY0Qx8EyU https://dbt-labs.github.io/dbt-meshify/0.5/
- datafold demo https://www.youtube.com/watch?v=5Xxm6cYRmFg
- JSON schema https://www.youtube.com/watch?v=z8y7mvOUFsY https://github.com/GClunies/Reflekt
- https://playwright.dev/ (apparently good for scraping)
- Postgres
- DuckDB
- StarRocks
- https://github.com/datafuselabs/databend
- https://cratedb.com/
- https://github.com/infinyon/fluvio
- https://risingwave.com/
- https://materialize.com/
- https://www.decodable.co/
- timescale https://www.timescale.com/
- https://www.timeplus.com/
- https://questdb.io/
- https://tembo.io/blog/pg-timeseries
- Redis
- https://github.com/valkey-io/valkey
- https://aws.amazon.com/de/elasticache/
- https://aws.amazon.com/de/memorydb/
- https://www.memcached.org
- https://www.bauplanlabs.com/
- https://hazelcast.com
- https://www.dragonflydb.io/redis-alternative
- https://github.com/microsoft/FASTER and https://github.com/faster-rs/faster-rs
- https://redpanda.com/
- https://kafka.apache.org/