
Add 60 interview-style problems and tracking index#1

Open
shiningflash wants to merge 11 commits into main from problems

Conversation

@shiningflash
Owner

Summary

  • Adds 60 new interview problems (Problems 6 through 65) across 10 new categories: Fundamentals, SQL Thinking, System Design, Scenarios, Cloud Decisions, Data Modeling, Debugging, Cost and Performance, Streaming, and People and Process.
  • Adds PROBLEMS.md, a single index table that links every problem to its question and solution and tags each with a category, topic, and difficulty suitable for filtering on a website.
  • Adds Problem 5 (Merging Messy CSVs from Multiple Partners) as a worked example in the same style as the existing Problems 1-4.

What each problem contains

Every problem has its own folder with:

  • question.md: scenario, task list, and what a good answer should cover.
  • solution.md (or solution.py for code-style problems): a walkthrough written the way an experienced engineer would explain it on a whiteboard.

Solutions include diagrams (ASCII / mermaid-style boxes) where the topic benefits from visualization, capacity math where relevant, common-mistake lists, and bonus follow-up questions to anticipate.
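
The folder layout described above can be spot-checked with a short script. This is only a minimal sketch: the helper name `missing_files` is hypothetical and not part of this PR, and it assumes each problem folder holds its files at the top level.

```python
from pathlib import Path


def missing_files(problem_dir: Path) -> list[str]:
    """Return the required files that are absent from one problem folder."""
    # Only look at files directly inside the folder (assumed layout).
    names = {p.name for p in problem_dir.iterdir() if p.is_file()}
    missing = []
    if "question.md" not in names:
        missing.append("question.md")
    # A solution may be written as markdown or as a Python file.
    if not ({"solution.md", "solution.py"} & names):
        missing.append("solution.md or solution.py")
    return missing
```

Running it over every problem folder and printing the non-empty results would give a quick list of incomplete question/solution pairs.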

Format consistency

All new problems follow the same shape as the existing Problems 1-4:

  1. Scenario.
  2. Task list.
  3. "What a good answer covers."
  4. Solution: short version, walkthrough, picture, common mistakes, bonus follow-up.

Coverage

| Category | Problems |
| --- | --- |
| Fundamentals | 6-14 |
| SQL Thinking | 15-20 |
| System Design | 21-28 |
| Scenarios | 29-34 |
| Cloud Decisions | 35-40 |
| Data Modeling | 41-45 |
| Debugging | 46-50 |
| Cost & Performance | 51-55 |
| Streaming | 56-58 |
| People & Process | 59-65 |
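
As a sanity check on the coverage ranges, the arithmetic can be sketched as follows. The ranges are copied from the coverage list above; the function name `check_coverage` is a hypothetical helper, not something this PR ships.

```python
# Inclusive problem-number ranges per category, copied from the coverage list.
COVERAGE = {
    "Fundamentals": (6, 14),
    "SQL Thinking": (15, 20),
    "System Design": (21, 28),
    "Scenarios": (29, 34),
    "Cloud Decisions": (35, 40),
    "Data Modeling": (41, 45),
    "Debugging": (46, 50),
    "Cost & Performance": (51, 55),
    "Streaming": (56, 58),
    "People & Process": (59, 65),
}


def check_coverage(ranges: dict[str, tuple[int, int]], first: int, last: int) -> int:
    """Verify the ranges tile first..last with no gap or overlap; return the total."""
    expected = first
    for category, (lo, hi) in ranges.items():
        if lo != expected or hi < lo:
            raise ValueError(f"{category}: expected to start at {expected}, got {lo}-{hi}")
        expected = hi + 1
    if expected != last + 1:
        raise ValueError(f"ranges end at {expected - 1}, expected {last}")
    return sum(hi - lo + 1 for lo, hi in ranges.values())


print(check_coverage(COVERAGE, 6, 65))  # prints 60
```

The ten ranges are contiguous from 6 through 65 and add up to the 60 problems stated in the summary.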

Test plan

  • Review PROBLEMS.md table renders correctly on GitHub.
  • Spot-check 3-4 problem folders to confirm question/solution pairs are coherent.
  • Confirm category and topic labels are useful for filtering on the website.
  • Check that links in PROBLEMS.md resolve to the right files (URL-encoded paths).
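
The link check in the last item could be automated roughly like this. It is a sketch under assumptions: links are taken to be plain inline Markdown links (`[text](path)`) with no spaces or parentheses in the target, paths are taken to be relative to the folder containing PROBLEMS.md, and `broken_links` is a hypothetical helper name.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlsplit

# Matches the target of an inline Markdown link: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(index_file: Path) -> list[str]:
    """Return relative link targets in index_file that do not exist on disk."""
    root = index_file.parent
    broken = []
    for target in LINK_RE.findall(index_file.read_text(encoding="utf-8")):
        parts = urlsplit(target)
        if parts.scheme or not parts.path:
            continue  # skip external URLs and pure #anchor links
        # Decode URL-encoded characters (e.g. %20) before touching the filesystem.
        if not (root / unquote(parts.path)).exists():
            broken.append(target)
    return broken
```

Calling `broken_links(Path("PROBLEMS.md"))` from the repository root should return an empty list when every URL-encoded path resolves.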

shiningflash and others added 11 commits May 14, 2026 03:54
Adds 9 interview-style fundamentals problems with full question
and solution markdown files, including diagrams and concrete examples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6 SQL-focused interview problems with worked examples and EXPLAIN plan walkthroughs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

8 system design problems covering electricity retailer platform, banking
widgets, surge pricing, streaming aggregations, billing pipelines,
real-time driver tracking, year-in-review batch, and notification dedup.

Each includes architecture diagrams, capacity math, and risk discussion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6 scenario-based interview problems covering silent data bugs, cost spikes,
analyst trust, pipeline ownership transfer, executive pressure, and Kafka
data loss recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6 cloud-decision interview problems comparing Lambda vs Cloud Run, scheduled
serverless jobs, BigQuery vs Snowflake, S3 vs warehouse storage, managed
Airflow vs self-hosted, and BigQuery access control models.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

5 data modeling problems covering Airbnb-style schema, subscription
history with valid_from/to, mixing facts and dimensions, explaining grain,
and current state vs event history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

5 debugging-scenario interview problems covering region-zero revenue,
silent task success with empty output, sudden query slowdowns, vague
user reports, and recurring partition anomalies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

5 cost/performance problems covering BigQuery bill investigation, Spark
job tuning, daily-data hourly-scan waste, the 'throw more memory' reflex,
and partitioning vs clustering vs materialized views.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 streaming-fundamentals problems covering watermarks, Kafka per-partition
ordering, and diagnosing growing consumer lag before scaling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

7 people/process problems covering analyst onboarding, fast-vs-right
trade-offs, metric ownership disputes, blameless postmortems, inheriting
undocumented pipelines, breaking dbt changes with many consumers, and
Airflow scheduler scaling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add rows for problems 6-65 across 10 new categories (Fundamentals, SQL
Thinking, System Design, Scenarios, Cloud Decisions, Data Modeling,
Debugging, Cost and Performance, Streaming, People and Process) plus
expanded legend and difficulty guide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>