This repo has all the resources you need to become an amazing data engineer!
If you are new to data engineering, start by following this 2024 breaking into data engineering roadmap.
For more applied learning:
- Check out the projects section for more hands-on examples!
- Check out the interviews section for more advice on how to pass data engineering interviews!
- Check out the books section for a list of high quality data engineering books
- Check out the communities section for a list of high quality data engineering communities to join
- Check out the newsletter section to learn via email
Great list of over 25 books
Top 3 must-read books are:
- Fundamentals of Data Engineering
- Designing Data-Intensive Applications
- Designing Machine Learning Systems
Top must-join communities for DE:
Top must-join communities for ML:
- Orchestration
- Data Lake / Cloud
- Data Warehouse
- Data Quality
- Education Companies
- Analytics / Visualization
- Data Integration
- Modern OLAP
- LLM application library
- Real-Time Data
- Netflix
- Uber
- Databricks
- Airbnb
- Amazon AWS Blog
- Microsoft Data Architecture Blogs
- Microsoft Fabric Blog
- Oracle
- Meta
- Onehouse
- A Five-Layered Business Intelligence Architecture
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
- Big Data Quality: A Data Quality Profiling Model
- The Data Lakehouse: Data Warehousing and More
- Spark: Cluster Computing with Working Sets
- The Google File System
- Building a Universal Data Lakehouse
- XTable in Action: Seamless Interoperability in Data Lakes
- MapReduce: Simplified Data Processing on Large Clusters
you have to have >10k subscribers to be added
100k+ subscribers
10k+ subscribers
you have to have >5k followers to be added
100k+ Followers
50k+ Followers
10k+ Followers
5k+ Followers
you have to have >5k followers to be added
100k+ followers
10k+ followers
5k+ followers
you have to have >5k followers to be added
100k+ followers
5k+ followers
TikTok
you have to have >10k followers to be added
50k+ followers
10k+ followers
- The Data Engineering Show
- Data Engineering Podcast
- DataTopics
- The Data Engineering Side Of Data
- DataWare
- The Data Coffee Break Podcast
- The Data Stack Show
- Intricity101 Data Sharks Podcast
- Drill to Detail with Mark Rittman
- Analytics Power Hour
- Catalog & Cocktails
- Datatalks
- Data Brew by Databricks
- The Data Cloud Podcast by Snowflake
- What's New in Data
- Open||Source||Data by DataStax
- Streaming Audio by Confluent
- The Data Scientist Show
- MLOps.community
- Monday Morning Data Chat
- The Data Chief
Great list of 20+ newsletters
Top must-follow newsletters for data engineering:
- Data Engineering Vault
- Airbyte Data Glossary
- Data Engineering Wiki by Reddit
- Secoda Glossary
- Databricks Glossary
- Airtable Glossary
- Data Engineering Glossary by Dagster
- Cumulative Table Design
- Microbatch Deduplication
- The Little Book of Pipelines
- Data Developer Platform
- DataExpert.io course (use code HANDBOOK10 for a discount!)
- LearnDataEngineering.com
- Technical Freelancer Academy (use code zwtech for a discount!)
- IBM Data Engineering for Everyone
- Qwiklabs
- DataCamp
- Udemy Courses from Shruti Mantri
- Rock the JVM teaches Spark (in Scala), Flink and others
- Data Engineering Zoomcamp by DataTalksClub
- Efficient Data Processing in Spark
- Scaler
- DataTeams - Data Engineer hiring platform
- Udemy Courses from Daniel Blanco