|
| 1 | +--- |
| 2 | +name: silk-profiler |
| 3 | +description: Investigate backend performance using Django Silk profiling data. Use when investigating a slow endpoint, a potential bottleneck, N+1 queries, or understanding query patterns for a specific request. |
| 4 | +--- |
| 5 | + |
| 6 | +# Investigate Backend Performance with Silk |
| 7 | + |
| 8 | +Use this skill to investigate a slow endpoint or a potential bottleneck. Django Silk (enabled by default in dev via `BASEROW_ENABLE_SILK` in `dev.py`) records every HTTP request, its SQL queries, and full Python stack traces into PostgreSQL. The user may provide a Silk request URL, a request ID, or just describe which endpoint is slow. |
| 9 | + |
| 10 | +## Prerequisites |
| 11 | + |
| 12 | +- Silk must be enabled (default in dev: `BASEROW_ENABLE_SILK=on` in `dev.py`) |
| 13 | +- The slow operation must have been performed recently so Silk has captured it |
| 14 | +- The dev database must be accessible (usually via `docker exec` into the `baserow-db-1` container) |
| 15 | + |
| 16 | +## Connecting to the Database |
| 17 | + |
| 18 | +Read the `DATABASES` setting in `backend/src/baserow/config/settings/base.py` (and `dev.py` which imports it) to find the connection credentials. Env vars may override the defaults. |
| 19 | + |
| 20 | +The database usually runs in a Docker container. Try `docker exec` first: |
| 21 | + |
| 22 | +```bash |
| 23 | +docker exec <db-container> psql -U <user> -d <database> -c "<SQL>" |
| 24 | +``` |
| 25 | + |
| 26 | +To find the container name: |
| 27 | + |
| 28 | +```bash |
| 29 | +docker ps --format '{{.Names}}' | grep -i "db\|postgres" |
| 30 | +``` |
| 31 | + |
| 32 | +If `psql` is available locally, connect directly using the host/port from the settings. **Try connecting first, only ask the user if it fails.** |
| 33 | + |
| 34 | +## Handling User Input |
| 35 | + |
| 36 | +The user may provide: |
| 37 | + |
| 38 | +- **A Silk URL** like `http://localhost:8000/silk/request/<uuid>/sql/` — extract the UUID between `/request/` and the next `/` as the request ID. |
| 39 | +- **A request ID** (UUID) directly. |
| 40 | +- **An endpoint path** like `/api/database/tables/123/` — search for it in `silk_request`. |
| 41 | +- **A description** like "deleting a table is slow" — search by path pattern and sort by time. |
| 42 | + |
| 43 | +## Workflow |
| 44 | + |
| 45 | +### Step 1: Find the Request |
| 46 | + |
| 47 | +If you already have a request ID, skip to Step 2. |
| 48 | + |
| 49 | +```sql |
| 50 | +-- Search by path pattern |
| 51 | +SELECT id, path, method, round(time_taken::numeric, 0) AS ms, |
| 52 | + num_sql_queries AS queries, start_time |
| 53 | +FROM silk_request |
| 54 | +WHERE path LIKE '%/api/PATTERN/%' |
| 55 | +ORDER BY start_time DESC |
| 56 | +LIMIT 10; |
| 57 | +``` |
| 58 | + |
| 59 | +```sql |
| 60 | +-- Slowest requests overall |
| 61 | +SELECT id, path, method, round(time_taken::numeric, 0) AS ms, |
| 62 | + num_sql_queries AS queries, start_time |
| 63 | +FROM silk_request |
| 64 | +ORDER BY time_taken DESC |
| 65 | +LIMIT 10; |
| 66 | +``` |
| 67 | + |
| 68 | +```sql |
| 69 | +-- Requests with the most SQL queries (likely N+1) |
| 70 | +SELECT id, path, method, num_sql_queries AS queries, |
| 71 | + round(time_taken::numeric, 0) AS ms, start_time |
| 72 | +FROM silk_request |
| 73 | +ORDER BY num_sql_queries DESC |
| 74 | +LIMIT 10; |
| 75 | +``` |
| 76 | + |
| 77 | +### Step 2: Analyze Where Time Is Spent |
| 78 | + |
| 79 | +`meta_time_spent_queries` is usually NULL, so compute query time from `silk_sqlquery`: |
| 80 | + |
| 81 | +```sql |
| 82 | +SELECT r.path, r.method, |
| 83 | + round(r.time_taken::numeric, 0) AS total_ms, |
| 84 | + r.num_sql_queries, |
| 85 | + round(q.query_ms::numeric, 0) AS query_ms, |
| 86 | + round((r.time_taken - q.query_ms)::numeric, 0) AS python_ms |
| 87 | +FROM silk_request r |
| 88 | +JOIN ( |
| 89 | + SELECT request_id, SUM(time_taken) AS query_ms |
| 90 | + FROM silk_sqlquery GROUP BY request_id |
| 91 | +) q ON q.request_id = r.id |
| 92 | +WHERE r.id = '<request_id>'; |
| 93 | +``` |
| 94 | + |
| 95 | +If `query_ms` dominates, the problem is database queries. If `python_ms` is high, the bottleneck is in Python code. |
| 96 | + |
| 97 | +### Step 3: Detect N+1 Query Patterns |
| 98 | + |
| 99 | +Group by **normalized** query text (string literals and integer params replaced with `?`) to catch N+1 patterns where the same query runs with different IDs: |
| 100 | + |
| 101 | +```sql |
| 102 | +SELECT COUNT(*) AS count, |
| 103 | + round(SUM(time_taken)::numeric, 1) AS total_ms, |
| 104 | + LEFT(regexp_replace( |
| 105 | + regexp_replace(query, '''[^'']*''', '''?''', 'g'), |
| 106 | + '= \d+', '= ?', 'g' |
| 107 | + ), 200) AS normalized |
| 108 | +FROM silk_sqlquery |
| 109 | +WHERE request_id = '<request_id>' |
| 110 | +GROUP BY regexp_replace( |
| 111 | + regexp_replace(query, '''[^'']*''', '''?''', 'g'), |
| 112 | + '= \d+', '= ?', 'g' |
| 113 | +) |
| 114 | +ORDER BY count DESC |
| 115 | +LIMIT 15; |
| 116 | +``` |
| 117 | + |
| 118 | +Any query appearing more than a handful of times is likely an N+1 problem. |
| 119 | + |
| 120 | +### Step 4: Examine Slow Individual Queries |
| 121 | + |
| 122 | +```sql |
| 123 | +SELECT id, round(time_taken::numeric, 1) AS ms, LEFT(query, 300) AS query_preview |
| 124 | +FROM silk_sqlquery |
| 125 | +WHERE request_id = '<request_id>' |
| 126 | +ORDER BY time_taken DESC |
| 127 | +LIMIT 10; |
| 128 | +``` |
| 129 | + |
| 130 | +```sql |
| 131 | +-- Full query text for a specific slow query |
| 132 | +SELECT query FROM silk_sqlquery WHERE id = <query_id>; |
| 133 | +``` |
| 134 | + |
| 135 | +### Step 5: Analyze Query Plans |
| 136 | + |
| 137 | +Silk runs `EXPLAIN` on every captured query and stores the plan in the `analysis` column of `silk_sqlquery`. This is always available — no extra configuration needed. |
| 138 | + |
| 139 | +```sql |
| 140 | +SELECT id, round(time_taken::numeric, 1) AS ms, analysis |
| 141 | +FROM silk_sqlquery |
| 142 | +WHERE request_id = '<request_id>' |
| 143 | +ORDER BY time_taken DESC |
| 144 | +LIMIT 5; |
| 145 | +``` |
| 146 | + |
| 147 | +These are estimated plans (no actual execution stats). If you need actual row counts and timings, run `EXPLAIN ANALYZE` manually on a specific query: |
| 148 | + |
| 149 | +```sql |
| 150 | +EXPLAIN ANALYZE <paste query from silk_sqlquery here>; |
| 151 | +``` |
| 152 | + |
| 153 | +**What to look for:** |
| 154 | + |
| 155 | +- **Seq Scan on large tables** — a sequential scan on a table that could grow large (e.g. rows, fields, workspaces) means a missing or unused index. Small config tables are fine. |
| 156 | +- **Missing indexes** — if a WHERE or JOIN condition filters on a column without an index, the planner falls back to sequential scans. Fix with `db_index=True` on the model field or a migration with `AddIndex`. |
| 157 | +- **Existing indexes not being used** — an index may exist but the planner ignores it (wrong column order in composite index, type mismatch, function wrapping the column, etc.). Check whether the index is used by *any* query in the codebase — if not, it's dead weight and should be removed. |
| 158 | +- **Nested Loop with high row estimates** — often a sign of missing `select_related` or a missing index on the join column. |
| 159 | + |
| 160 | +> **Do not enable `SILKY_ANALYZE_QUERIES`** (`BASEROW_DANGEROUS_SILKY_ANALYZE_QUERIES`) to upgrade to EXPLAIN ANALYZE globally. It re-executes every UPDATE and breaks data integrity. The default EXPLAIN plans in the `analysis` column are sufficient for most investigations — run `EXPLAIN ANALYZE` manually only on specific queries when needed. |
| 161 | +
|
| 162 | +### Step 6: Read Stack Traces |
| 163 | + |
| 164 | +Stack traces are stored reversed: the **top** is ORM/Silk internals, the **middle** contains Baserow application frames (paths containing `baserow/src/baserow/`), and the **bottom** is Django server/threading boilerplate. Scan for the Baserow frames in the middle — those are the ones that matter. |
| 165 | + |
| 166 | +```sql |
| 167 | +SELECT traceback FROM silk_sqlquery WHERE id = <query_id>; |
| 168 | +``` |
| 169 | + |
| 170 | +To get traces for the most repeated query pattern: |
| 171 | + |
| 172 | +```sql |
| 173 | +SELECT id, traceback |
| 174 | +FROM silk_sqlquery |
| 175 | +WHERE request_id = '<request_id>' |
| 176 | + AND regexp_replace( |
| 177 | + regexp_replace(query, '''[^'']*''', '''?''', 'g'), |
| 178 | + '= \d+', '= ?', 'g' |
| 179 | + ) = ( |
| 180 | + SELECT regexp_replace( |
| 181 | + regexp_replace(query, '''[^'']*''', '''?''', 'g'), |
| 182 | + '= \d+', '= ?', 'g' |
| 183 | + ) |
| 184 | + FROM silk_sqlquery |
| 185 | + WHERE request_id = '<request_id>' |
| 186 | + GROUP BY regexp_replace( |
| 187 | + regexp_replace(query, '''[^'']*''', '''?''', 'g'), |
| 188 | + '= \d+', '= ?', 'g' |
| 189 | + ) |
| 190 | + ORDER BY COUNT(*) DESC LIMIT 1 |
| 191 | + ) |
| 192 | +LIMIT 3; |
| 193 | +``` |
| 194 | + |
| 195 | +### Step 7: Trace to Code and Propose Fixes |
| 196 | + |
| 197 | +Once you've identified the problematic query and its origin in the stack trace: |
| 198 | + |
| 199 | +1. Read the source file and function that triggered the query |
| 200 | +2. Understand the data access pattern |
| 201 | +3. Propose a fix based on the patterns below |
| 202 | + |
| 203 | +## Common Patterns and Fixes |
| 204 | + |
| 205 | +### N+1 Queries — Missing `select_related()` / `prefetch_related()` |
| 206 | + |
| 207 | +**Symptom:** The same SELECT appears dozens/hundreds of times, each with a different WHERE id = value. |
| 208 | + |
| 209 | +**Fix:** Add `select_related('relation')` for ForeignKey/OneToOne, or `prefetch_related('relation')` for reverse FK/M2M, on the queryset that feeds the loop or serializer. |
| 210 | + |
| 211 | +### Queries in Loops That Could Be Batched |
| 212 | + |
| 213 | +**Symptom:** Queries inside a Python for-loop that could be replaced with a single bulk operation. |
| 214 | + |
| 215 | +**Fix:** Collect IDs first, do a single bulk query, then map results back. Common in Baserow: `break_dependencies_for_field` called per-field instead of batching, `FieldCache.reset_cache()` called inside loops invalidating cached data. |
| 216 | + |
| 217 | +### `specific_iterator` or `specific_queryset` / `.specific` Overhead |
| 218 | + |
| 219 | +**Symptom:** Many SELECT queries fetching from different field type tables (`database_textfield`, `database_numberfield`, etc.) one at a time. |
| 220 | + |
| 221 | +**Fix:** Use `specific_iterator()` with a batch of fields rather than calling `.specific` per field in a loop. Or ensure `FieldCache` is populated before the loop and not reset mid-iteration. |
| 222 | + |
| 223 | +### `get_model()` Called Repeatedly |
| 224 | + |
| 225 | +**Symptom:** Repeated queries to `database_field` and `django_content_type` tables to rebuild the same table model. |
| 226 | + |
| 227 | +**Fix:** Cache the model result or pass it as a parameter instead of calling `table.get_model()` multiple times. |
| 228 | + |
| 229 | +### Duplicate Queries Across Serializer Fields |
| 230 | + |
| 231 | +**Symptom:** Multiple serializer fields each trigger the same query independently. |
| 232 | + |
| 233 | +**Fix:** Use `prefetch_related` with a `Prefetch` object on the view's queryset, or add a `@cached_property` on the model. |
| 234 | + |
| 235 | +### Missing Database Indexes |
| 236 | + |
| 237 | +**Symptom:** A single query takes a very long time (>100ms). The query filters on a column that isn't indexed. |
| 238 | + |
| 239 | +**Fix:** Add `db_index=True` to the model field or create a migration with `AddIndex`. |
| 240 | + |
| 241 | +## Silk Table Schema Reference |
| 242 | + |
| 243 | +### silk_request |
| 244 | +| Column | Type | Notes | |
| 245 | +|---|---|---| |
| 246 | +| id | CharField(36) | UUID primary key | |
| 247 | +| path | CharField(190) | Request URL path | |
| 248 | +| method | CharField(10) | HTTP method | |
| 249 | +| start_time | DateTimeField | Request start | |
| 250 | +| time_taken | FloatField | Total time (ms) | |
| 251 | +| num_sql_queries | IntegerField | SQL query count | |
| 252 | +| meta_time_spent_queries | FloatField | **Usually NULL** — compute from silk_sqlquery instead | |
| 253 | +| view_name | CharField(190) | Django view name (usually populated) | |
| 254 | + |
| 255 | +### silk_sqlquery |
| 256 | +| Column | Type | Notes | |
| 257 | +|---|---|---| |
| 258 | +| id | IntegerField | Auto PK | |
| 259 | +| query | TextField | Full SQL text | |
| 260 | +| start_time | DateTimeField | Query start | |
| 261 | +| end_time | DateTimeField | Query end | |
| 262 | +| time_taken | FloatField | Execution time (ms) | |
| 263 | +| request_id | FK | Links to silk_request | |
| 264 | +| traceback | TextField | Python stack trace (reversed — ORM at top, Baserow in middle, server at bottom) | |
| 265 | +| analysis | TextField | EXPLAIN output (always populated). With `SILKY_ANALYZE_QUERIES` enabled, contains EXPLAIN ANALYZE instead. | |
| 266 | + |
| 267 | +### silk_profile |
| 268 | +| Column | Type | Notes | |
| 269 | +|---|---|---| |
| 270 | +| id | IntegerField | Auto PK | |
| 271 | +| name | CharField(300) | Profile name | |
| 272 | +| time_taken | FloatField | Duration (ms) | |
| 273 | +| request_id | FK | Links to silk_request | |
| 274 | +| file_path | CharField(300) | Source file | |
| 275 | +| line_num | IntegerField | Start line | |
| 276 | +| func_name | CharField(300) | Function name | |
| 277 | + |
| 278 | +## Guardrails |
| 279 | + |
| 280 | +- **Use read-only queries only.** Never run INSERT, UPDATE, DELETE, or DDL on any table. |
| 281 | +- **Never enable `SILKY_ANALYZE_QUERIES`** (`BASEROW_DANGEROUS_SILKY_ANALYZE_QUERIES`). It runs every UPDATE twice and breaks data integrity. |
| 282 | +- **Do not truncate Silk tables** without asking the user first. |
| 283 | +- **Ground every claim in data.** Always quote the specific evidence (EXPLAIN output, stack trace lines, source code with file path and line number) when making a diagnosis. Never state that a query is slow, an index is missing, or a `select_related` is needed without showing the data that supports it. |
| 284 | +- When proposing fixes, always read the actual source code first — don't guess from the query text alone. |
| 285 | +- **Confirm findings with the user.** When a user describes a problem without a Silk URL or request ID, confirm the matching request was found and provide its Silk URL (e.g. `http://localhost:8000/silk/request/<uuid>/`) so the user can verify before proceeding. |
0 commit comments