This repository was archived by the owner on Jan 22, 2026. It is now read-only.
docs/internals.md: 9 additions & 6 deletions
@@ -19,7 +19,7 @@ The schema has six main tables:
 - `dependency_changes` records every add, modify, or remove event
 - `dependency_snapshots` stores full dependency state at intervals
 
-Snapshots exist because replaying thousands of change records to answer "what dependencies existed at commit X?" would be slow. Instead, we store the complete dependency set every 20 commits (`SNAPSHOT_INTERVAL`). Point-in-time queries find the nearest snapshot and replay only the changes since then.
+Snapshots exist because replaying thousands of change records to answer "what dependencies existed at commit X?" would be slow. Instead, we store the complete dependency set every 50 commits by default. Point-in-time queries find the nearest snapshot and replay only the changes since then.
 
 ## Git Access
 
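The snapshot-plus-replay lookup described in this hunk can be sketched roughly as follows. This is a minimal illustration rather than the project's code: `Change`, `dependencies_at`, and the in-memory hash and array standing in for the `dependency_snapshots` and `dependency_changes` tables are all hypothetical, and commits are identified by a simple integer position.

```ruby
# Hypothetical stand-in for one row of the dependency_changes table.
Change = Struct.new(:position, :kind, :name, :requirement)

# Returns the dependency set ({name => requirement}) as of commit `position`.
# `snapshots` maps a commit position to the full dependency hash stored there.
def dependencies_at(position, snapshots, changes)
  # Find the nearest snapshot at or before the target commit, if there is one.
  base_position = snapshots.keys.select { |p| p <= position }.max
  state = base_position ? snapshots[base_position].dup : {}
  base_position ||= -1 # no snapshot yet: replay from the very beginning

  # Replay only the changes recorded after the snapshot, up to the target.
  changes
    .select { |c| c.position > base_position && c.position <= position }
    .sort_by(&:position)
    .each do |c|
      case c.kind
      when :added, :modified then state[c.name] = c.requirement
      when :removed then state.delete(c.name)
      end
    end

  state
end
```

With a snapshot every 50 commits, a query like this replays at most the changes from 49 commits instead of the whole history, which is the trade-off the paragraph describes.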
@@ -47,12 +47,12 @@ When you run `git pkgs init` (see [`commands/init.rb`](../lib/git/pkgs/commands/
 2. Switches to bulk write mode (WAL, synchronous off, large cache)
 3. Walks commits chronologically
 4. For each commit with manifest changes, calls `analyzer.analyze_commit`
-5. Batches inserts in transactions of 100 commits
-6. Creates dependency snapshots every 20 commits that changed dependencies
+5. Batches inserts in transactions of 500 commits
+6. Creates dependency snapshots every 50 commits that changed dependencies
 7. Creates indexes after all data is loaded
 8. Switches back to normal sync mode
 
-Deferring index creation until the end speeds things up considerably. The batch size of 100 is a balance between transaction overhead and memory usage.
+Deferring index creation until the end speeds things up considerably. Both batch size and snapshot interval are configurable via environment variables (see Performance Notes below).
 
 ## Incremental Updates
 
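Steps 5 and 6 above interact in a way that is easy to misread: batches count every commit walked, while the snapshot interval counts only commits that changed dependencies. The sketch below shows that bookkeeping under stated assumptions. `plan_import` and the `{ sha:, changed: }` commit hashes are hypothetical, and a real import would wrap each batch in a database transaction instead of returning a plan.

```ruby
# Hypothetical sketch: split commits into transaction-sized batches and
# record which commits would receive a full dependency snapshot.
def plan_import(commits, batch_size: 500, snapshot_interval: 50)
  changed_seen = 0
  snapshot_at = []

  batches = commits.each_slice(batch_size).map do |batch|
    # In a real import, everything in this block would run in one transaction.
    batch.each do |commit|
      next unless commit[:changed] # only dependency-changing commits count

      changed_seen += 1
      snapshot_at << commit[:sha] if (changed_seen % snapshot_interval).zero?
    end
    batch.map { |commit| commit[:sha] }
  end

  { batches: batches, snapshot_at: snapshot_at }
end
```

The counter deliberately persists across batches, so changing the batch size does not move where snapshots fall.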
@@ -139,6 +139,9 @@ ActiveRecord models live in [`lib/git/pkgs/models/`](../lib/git/pkgs/models/). T
 
 ## Performance Notes
 
-Typical init speed is around 300 commits per second. The main bottlenecks are git blob reads and bibliothecary parsing. The blob OID cache helps a lot: if a Gemfile hasn't changed in 50 commits, we parse it once and reuse the result. The manifest path regex filter also helps by skipping commits that only touch source files.
+Typical init speed is around 75-300 commits per second depending on the repository. The main bottlenecks are git blob reads and bibliothecary parsing. The blob OID cache helps a lot: if a Gemfile hasn't changed in 50 commits, we parse it once and reuse the result. The manifest path regex filter also helps by skipping commits that only touch source files.
 
-For repositories with long histories, the database file can grow to tens of megabytes. The periodic snapshots trade storage for query speed. You could tune `SNAPSHOT_INTERVAL` if you care more about one than the other.
+For repositories with long histories, the database file can grow to tens of megabytes. The periodic snapshots trade storage for query speed. Two environment variables let you tune this:
+
+- `GIT_PKGS_BATCH_SIZE` - Number of commits per database transaction (default: 500). Larger batches reduce transaction overhead but use more memory.
+- `GIT_PKGS_SNAPSHOT_INTERVAL` - Store full dependency state every N commits with changes (default: 50). Lower values speed up point-in-time queries but increase database size.
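For reviewers, one plausible way to read these two variables with their documented defaults is sketched below. The variable names and defaults come from the diff; the helper `tuning_from_env` and its strict integer parsing are assumptions for illustration, not necessarily how the tool itself reads them.

```ruby
# Hypothetical helper: read the two tuning knobs, falling back to the
# defaults documented above. Accepts any hash-like env (ENV by default).
def tuning_from_env(env = ENV)
  {
    # Integer() raises ArgumentError on non-numeric input like "abc".
    batch_size: Integer(env.fetch("GIT_PKGS_BATCH_SIZE", "500")),
    snapshot_interval: Integer(env.fetch("GIT_PKGS_SNAPSHOT_INTERVAL", "50")),
  }
end
```

Passing a plain hash instead of `ENV` keeps the helper easy to exercise without touching the real process environment.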