Curated resources for data engineering interview prep. Books, blogs, lessons, problem banks, courses, and tools.
Every entry on this list is hand picked. No filler. If a resource is not the best in its category, it is not here. PRs welcome to add anything stronger.
- Books
- Question banks
- Lessons and tutorials
- System design
- Cheatsheets
- Company guides
- Behavioral
- Blogs and newsletters
- Tools to know
- Roadmaps and study plans
- Communities
- Designing Data-Intensive Applications by Martin Kleppmann. The single most useful book in print for DE system design.
- The Data Warehouse Toolkit by Ralph Kimball. The dimensional modeling reference.
- Fundamentals of Data Engineering by Joe Reis and Matt Housley. Modern survey of the field.
- Streaming Systems by Tyler Akidau et al. The deepest treatment of streaming semantics in print.
- The Log by Jay Kreps. Free essay. Foundational reading for stream processing.
- DataDriven SQL interview questions. 854 SQL problems with browser sandboxes, sortable by topic and difficulty.
- DataDriven Python interview questions. 388 DE flavored Python problems.
- DataDriven schema design questions. 56 ERD problems with worked solutions.
- DataDriven pipeline architecture questions. 120 end to end case studies.
- StrataScratch. Real questions from past company interviews. More analyst flavored.
- DataLemur. Well organized by company.
- LeetCode database track. The classic. Tricky joins, light DE flavor.
- Joins lessons. Inner, left, full, semi, anti, lateral, inequality.
- Window functions lessons. Window functions appear in most senior DE screens. Drill them first.
- Aggregating lessons. GROUP BY, grouping sets, conditional aggregation.
- Data modeling track. Keys, normalization, dimensional, SCD, event streams.
- Mode SQL tutorial. Free, browser based, ten years old and still relevant.
- Real Python data engineering tutorials. Strong external supplement.
- system-design-for-data-engineers. 120 DE specific case studies, plus the eight beat framework.
- System Design Primer. Generic backend system design. Fundamentals carry over.
- DataDriven system design framework. Eight beats with worked examples.
- High Scalability. Real production architecture writeups.
- data-engineering-cheatsheet. One page reference for SQL, Python, Spark, Airflow, dbt, Kafka, schema design.
- Pandas cheatsheet. For roles that use pandas heavily.
| Company | Guide | Distinctive focus |
|---|---|---|
| Netflix | companies/netflix/interview | Streaming and OLAP at scale |
| Uber | companies/uber/interview | Real time, geo partitioning |
| Amazon | companies/amazon/interview | Leadership principles, bar raiser |
| Google | companies/google/interview | BigQuery patterns, algorithmic depth |
| Meta | companies/meta/interview | Presto, product sense plus DE |
Full company index: datadriven.io/companies.
- 50 DE behavioral questions. With model answers and the competencies they test.
- Amazon leadership principles. The official source. Internalize before interviewing.
- STAR method. The format for every behavioral answer.
- Netflix Tech Blog. Streaming at scale.
- Uber Engineering. Real time, exactly once.
- Airbnb Engineering. Data quality and Airflow.
- Stripe Engineering. Idempotency and correctness.
- DataDriven blog. New technical writeups weekly.
- Data Engineering Weekly. Curated newsletter.
- Benn Stancil. Opinionated, often correct.
| Category | Tool | Why it shows up in interviews |
|---|---|---|
| Orchestration | Airflow | The default expectation |
| Orchestration | Dagster, Prefect | Modern alternatives, common in tradeoff questions |
| Transformation | dbt | Standard for warehouse modeling |
| Streaming | Kafka | Standard for event ingestion |
| Streaming | Flink, Spark Structured Streaming | Common stream processors |
| Warehouse | Snowflake, BigQuery, Redshift | Pick what your target uses |
| Lakehouse | Databricks, Iceberg, Delta, Hudi | Frequent in modern stack tradeoff questions |
| Format | Parquet, ORC, Avro | Know the tradeoffs |
| Catalog | Unity Catalog, Glue, Polaris | Increasingly asked |
Longer tooling map: datadriven.io/data-engineering-tools.
- DE career roadmap. Analyst to staff DE.
- 12 week study plan. Daily checklist.
- DE resume guide. With examples per level.
- DE salary guide. By company, level, region.
- How to become a DE. For career switchers.
- r/dataengineering. Largest open DE community.
- dbt Community Slack. Largest DE Slack.
- Data Engineering Discord. Smaller, more technical.
- data-engineering-interview-handbook. The flagship handbook.
- data-engineering-interview-questions. The full 1418 question bank.
- awesome-data-engineering-interviews. The DataDriven 75 focused subset.
- system-design-for-data-engineers. 120 case studies.
- data-engineer-interview-prep. 8 week practice track.
- data-engineering-cheatsheet. Single page recall reference.
Open a PR following the awesome list manifesto.
Rules:
- One line note per entry, no marketing copy.
- Free resources preferred. Paid only if best in category.
- No affiliate links.
- No dead links.
Run awesome-lint before opening a PR.
CC0 1.0. Public domain.