A practical SQL toolkit for detecting hidden data quality issues in analytics datasets.
This repository contains the free starter edition of the SQL Data Debugging Toolkit.
It includes a small set of SQL validation checks, a structured debugging workflow and an example dataset with built-in data issues.
Most data issues in analytics systems are caused not by SQL mistakes but by hidden problems in datasets and pipelines.
Common examples:
• broken joins
• duplicate records
• missing values
• unexpected schema changes
• metric spikes
• late-arriving data
When a dashboard suddenly shows incorrect numbers, analysts often start writing ad hoc queries to investigate the issue.
This toolkit provides a structured approach to debugging analytics datasets.
The starter edition includes:
• 6 SQL data validation checks
• Data debugging checklist (PDF)
• Example dataset with built-in data quality issues
These checks demonstrate the debugging framework used in the full toolkit.
Explore the starter toolkit here:
- Load example_dataset.sql into your SQL environment
- Run the SQL checks from the starter/SQL_checks folder
- Use data_debugging_checklist.pdf to follow the debugging workflow
- Investigate the issues detected by the queries
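The quick-start steps above could be scripted roughly as follows. This is a minimal sketch that makes several assumptions. It assumes the check files run unmodified on SQLite through Python's built-in sqlite3 module, although SQLite is not one of the listed target databases. It also assumes each check file holds a single SELECT statement and that any returned row points to a potential issue. The helper name run_checks is hypothetical and is not part of the toolkit.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def run_checks(dataset_path, checks_dir):
    """Load a SQL dataset script, then run every .sql check in checks_dir.

    Hypothetical helper: assumes each check file holds a single SELECT
    statement and that any returned row points to a potential issue.
    """
    results = {}
    with closing(sqlite3.connect(":memory:")) as conn:
        # executescript runs every statement in the dataset file (CREATE, INSERT, ...).
        conn.executescript(Path(dataset_path).read_text(encoding="utf-8"))
        # Sorting by file name follows the numeric prefixes (01_, 03_, 05_, ...).
        for check_file in sorted(Path(checks_dir).glob("*.sql")):
            query = check_file.read_text(encoding="utf-8")
            results[check_file.name] = conn.execute(query).fetchall()
    return results
```

For example, run_checks("starter/example_dataset.sql", "starter/SQL_checks") would return a dict that maps each check file name to its result rows. Under the assumption above, an empty list means the check found nothing. Checks written for other dialects may need small adjustments before they run on SQLite.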
Example query for detecting duplicate primary keys:
```sql
SELECT
    primary_key_column,
    COUNT(*) AS duplicate_count
FROM dataset_table
GROUP BY primary_key_column
HAVING COUNT(*) > 1;
```

The toolkit follows a structured debugging workflow used by analytics teams:
- Schema validation
- Missing data checks
- Duplicate detection
- Join integrity validation
- Distribution & anomaly detection
- Data freshness checks
- Business logic validation
This process helps analysts isolate the root cause of data issues faster.
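The workflow above can be sketched as an ordered list of checks. Each stage runs in sequence, and every stage that returns rows is reported, so the earliest failing stage appears first. This is a minimal illustration, not the toolkit's implementation. The orders and customers tables, their columns, and the sample rows are hypothetical. Only four of the seven stages are shown, and the code runs on SQLite through Python's sqlite3 module. The duplicate-detection stage reuses the example query, with dataset_table and primary_key_column replaced by hypothetical names.

```python
import sqlite3

# Hypothetical sample data: order 2 is duplicated, order 4 has a NULL customer,
# order 5 references a customer that does not exist, and order 6 is negative.
SETUP = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES
  (1, 1, 20.0), (2, 2, 35.5), (2, 2, 35.5),
  (4, NULL, 12.0), (5, 99, 8.0), (6, 1, -5.0);
"""

# Stages in workflow order; each query returns one row per problem found.
WORKFLOW = [
    ("Missing data checks",
     "SELECT order_id FROM orders WHERE customer_id IS NULL"),
    ("Duplicate detection",
     "SELECT order_id, COUNT(*) AS duplicate_count FROM orders "
     "GROUP BY order_id HAVING COUNT(*) > 1"),
    ("Join integrity validation",
     "SELECT o.order_id FROM orders o "
     "LEFT JOIN customers c ON o.customer_id = c.customer_id "
     "WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL"),
    ("Business logic validation",
     "SELECT order_id, amount FROM orders WHERE amount < 0"),
]


def run_workflow(conn, stages):
    """Run stages in order; return (stage_name, rows) for each stage that found rows."""
    findings = []
    for name, query in stages:
        rows = conn.execute(query).fetchall()
        if rows:
            findings.append((name, rows))
    return findings


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
for stage, rows in run_workflow(conn, WORKFLOW):
    print(f"{stage}: {rows}")
```

Running the stages in this order means a NULL or duplicated key is reported before the join and business-logic checks that it may also affect.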
Repository structure:

```
sql-data-debugging-toolkit
starter
│
├── SQL_checks
│   ├── 01_schema_missing_columns.sql
│   ├── 03_null_ratio_check.sql
│   ├── 05_duplicate_primary_key.sql
│   ├── 15_orphan_records.sql
│   ├── 20_metric_spike_detection.sql
│   └── 27_negative_values_check.sql
│
├── example_dataset.sql
├── data_debugging_checklist.pdf
└── README.md
```
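The check file names hint at the techniques involved. The sketch below shows one way a metric spike check such as 20_metric_spike_detection.sql might work, although the file's actual contents are not shown here. It compares each day's total with the previous day's total using the LAG window function and flags days that changed by more than a threshold. The daily_revenue table, its values, and the 50% threshold are hypothetical. The sketch runs on SQLite through Python's sqlite3 module, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical daily metric with one obvious spike on 2024-01-04.
SETUP = """
CREATE TABLE daily_revenue (day TEXT, revenue REAL);
INSERT INTO daily_revenue VALUES
  ('2024-01-01', 100.0), ('2024-01-02', 110.0),
  ('2024-01-03', 105.0), ('2024-01-04', 400.0),
  ('2024-01-05', 115.0);
"""

# Flag days whose revenue moved more than 50% versus the previous day.
SPIKE_QUERY = """
WITH with_previous AS (
  SELECT
    day,
    revenue,
    LAG(revenue) OVER (ORDER BY day) AS previous_revenue
  FROM daily_revenue
)
SELECT day, revenue, previous_revenue
FROM with_previous
WHERE previous_revenue IS NOT NULL
  AND ABS(revenue - previous_revenue) > 0.5 * previous_revenue
ORDER BY day;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
spikes = conn.execute(SPIKE_QUERY).fetchall()
for day, revenue, previous in spikes:
    print(f"{day}: {previous} -> {revenue}")
```

The query flags both the jump on 2024-01-04 and the drop back on 2024-01-05, which is typical of simple day-over-day comparisons. Comparing each day against a trailing average is one common way to reduce that noise.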
The SQL queries follow ANSI SQL principles and should work with most modern databases and warehouses:
• PostgreSQL
• Snowflake
• BigQuery
• Redshift
• DuckDB
• SQL Server
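Checks stay portable when they use standard constructs such as CASE expressions, aggregates and NULLIF instead of dialect-specific functions. Below is a minimal sketch of a null-ratio check written in that style. It is in the spirit of 03_null_ratio_check.sql, but the actual file may differ. The customers table, email column, sample rows, and 10% threshold are hypothetical, and SQLite is used only because it ships with Python. The query multiplies by 1.0 because some engines, including SQLite, PostgreSQL and SQL Server, truncate the result when dividing one integer by another.

```python
import sqlite3

SETUP = """
CREATE TABLE customers (customer_id INTEGER, email TEXT);
INSERT INTO customers VALUES
  (1, 'a@example.com'), (2, NULL), (3, 'c@example.com'), (4, NULL);
"""

# Share of rows where email is NULL. 1.0 * avoids integer division,
# and NULLIF returns NULL instead of dividing by zero on an empty table.
NULL_RATIO_QUERY = """
SELECT
  COUNT(*) AS total_rows,
  SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_rows,
  1.0 * SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0) AS null_ratio
FROM customers;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
total_rows, null_rows, null_ratio = conn.execute(NULL_RATIO_QUERY).fetchone()

# Hypothetical alert threshold: flag the column if more than 10% is missing.
if null_ratio is not None and null_ratio > 0.10:
    print(f"email: {null_rows}/{total_rows} NULL ({null_ratio:.0%})")
```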
The full SQL Data Debugging Toolkit includes:
• 30 SQL validation checks
• additional debugging templates
• extended documentation
• a complete data debugging workflow
Full version available here:
SQL Data Debugging Toolkit (Full Version)
Created by Mikolaj Burzykowski
I build practical tools for data analysts, including SQL debugging workflows, Excel dashboards and data validation systems.



