This repository was archived by the owner on Jan 22, 2026. It is now read-only.

Commit b0e9adb

Add benchmarking documentation
1 parent 049cbe0 commit b0e9adb

3 files changed, with 92 additions and 1 deletion

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ The database schema stores:
 - Dependency changes (added/modified/removed) with before/after versions
 - Periodic snapshots of full dependency state for efficient point-in-time queries

-See [docs/internals.md](docs/internals.md) for a detailed architecture overview and [docs/schema.md](docs/schema.md) for the database schema.
+See the [docs](docs/) folder for architecture details, database schema, and benchmarking tools.

 Since the database is just SQLite, you can query it directly for ad-hoc analysis:

docs/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# Documentation

Technical documentation for git-pkgs maintainers and contributors.

- [internals.md](internals.md) - Architecture overview, how commands work, key algorithms
- [schema.md](schema.md) - Database tables and relationships
- [benchmarking.md](benchmarking.md) - Performance profiling tools

For user-facing documentation, see the main [README](../README.md).

docs/benchmarking.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Benchmarking

git-pkgs includes benchmark scripts for profiling performance. Run them with:

```bash
bin/benchmark <type> [repo_path] [sample_size]
```

If omitted, `repo_path` defaults to `/Users/andrew/code/octobox` and `sample_size` defaults to 500 commits. The default path is machine-specific, so pass your own `repo_path` on any other machine.

## Benchmark Types

### full

Full pipeline benchmark with phase breakdown:

```bash
bin/benchmark full /path/to/repo 500
```

Measures time spent in each phase: git diff extraction, manifest filtering, parsing, and database writes. Reports overall throughput in commits/sec.
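
A per-phase breakdown like this can be collected by accumulating monotonic clock time under a phase name. The sketch below is only an illustration of that idea: `PhaseTimer` and the stand-in phase bodies are hypothetical and are not taken from `bin/benchmark`.

```ruby
# Hypothetical helper, not the code used by bin/benchmark.
class PhaseTimer
  attr_reader :totals

  def initialize
    @totals = Hash.new(0.0)
  end

  # Runs the block, adds its elapsed wall time to the named phase,
  # and returns the block's result.
  def measure(phase)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @totals[phase] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  def total
    @totals.values.sum
  end

  # One line per phase: name, seconds, share of the total.
  def report
    @totals.map do |phase, seconds|
      share = total.zero? ? 0.0 : seconds / total * 100
      format("%-12s %.3fs (%.1f%%)", phase, seconds, share)
    end
  end
end

timer = PhaseTimer.new
3.times do
  # Stand-in work; a real run would diff a commit and filter its paths.
  paths = timer.measure(:git_diff) { ["Gemfile", "lib/app.rb"] }
  timer.measure(:filtering) { paths.grep(/Gemfile/) }
end
puts timer.report
```

Using `Process::CLOCK_MONOTONIC` rather than `Time.now` keeps the measurements unaffected by system clock adjustments during a long run.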

### detailed

Granular breakdown of each processing step:

```bash
bin/benchmark detailed /path/to/repo 500
```

Shows timing for blob path extraction, regex pre-filtering, Bibliothecary identification, and manifest parsing. Also breaks down parsing time by platform (rubygems, npm, etc.) and reports how many commits pass each filter stage.
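
The "commits per filter stage" figures amount to a funnel: each stage only sees the commits that survived the previous one. A minimal sketch of that counting, assuming a hypothetical `Commit` struct and made-up stage predicates (the real checks live inside git-pkgs and Bibliothecary):

```ruby
# Hypothetical data shape and stage predicates, for illustration only.
Commit = Struct.new(:sha, :paths)

MANIFEST_NAMES = ["Gemfile", "package.json"].freeze

STAGES = {
  # Cheap, permissive check on the raw path string.
  regex_prefilter: ->(c) { c.paths.any? { |p| p.match?(/Gemfile|package\.json/) } },
  # Stricter check on the exact file name.
  identified: ->(c) { c.paths.any? { |p| MANIFEST_NAMES.include?(File.basename(p)) } }
}.freeze

# Applies the stages in order and records how many commits remain after each.
def funnel(commits, stages)
  counts = { total: commits.size }
  stages.each do |name, passes|
    commits = commits.select(&passes)
    counts[name] = commits.size
  end
  counts
end

commits = [
  Commit.new("a1", ["Gemfile", "lib/x.rb"]),
  Commit.new("b2", ["docs/Gemfile.md"]),
  Commit.new("c3", ["README.md"]),
  Commit.new("d4", ["web/package.json"])
]

counts = funnel(commits, STAGES)
# total: 4, regex_prefilter: 3, identified: 2
counts.each { |stage, n| puts "#{stage}: #{n}" }
```

Here `docs/Gemfile.md` passes the permissive regex but fails the exact-name check, which is the kind of drop-off between stages that the detailed benchmark reports.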

### bulk

Compares data collection vs bulk insert performance:

```bash
bin/benchmark bulk /path/to/repo 500
```

Separates the time spent analyzing commits from the time spent writing to the database. The writes use ActiveRecord's `insert_all` for bulk operations. This helps identify whether the bottleneck is in git/parsing or in database writes.

### db

Individual database operation timing:

```bash
bin/benchmark db /path/to/repo 200
```

Measures each ActiveRecord operation separately: commit creation, branch_commit creation, manifest lookups, change inserts, and snapshot inserts. Shows per-operation averages in milliseconds.

## Interpreting Results

The main bottlenecks are typically:

1. **Git blob reads** - extracting file contents from commits
2. **Bibliothecary parsing** - parsing manifest file contents
3. **Database writes** - inserting records (mitigated by bulk inserts)

The regex pre-filter (`might_have_manifests?`) skips most commits cheaply. On a typical codebase, only 10-20% of commits touch files that could be manifests.
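
To illustrate why such a pre-filter is cheap, here is a minimal sketch that tests changed paths against a single combined regex before any blob is read or parsed. The method name is borrowed from the text above, but its signature and the pattern list are assumptions; the real filter in git-pkgs may look quite different and cover many more ecosystems.

```ruby
# Hypothetical pattern list covering only a few manifest names.
MANIFEST_HINT = Regexp.union(
  /(\A|\/)Gemfile(\.lock)?\z/,
  /(\A|\/)package(-lock)?\.json\z/,
  /\.gemspec\z/
)

# Cheap check: true if any changed path could be a manifest.
# Assumed signature; takes an array of path strings.
def might_have_manifests?(paths)
  paths.any? { |path| MANIFEST_HINT.match?(path) }
end

changed = {
  "a1" => ["lib/app.rb", "README.md"],
  "b2" => ["Gemfile", "Gemfile.lock"],
  "c3" => ["web/package.json"]
}

# Only these commits would go on to the expensive parsing step.
candidates = changed.select { |_sha, paths| might_have_manifests?(paths) }
puts candidates.keys.inspect # prints ["b2", "c3"]
```

The check only looks at path strings, so commits that fail it never incur a blob read or a parser call.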
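
Blob OID caching helps when the same manifest content appears across multiple commits, because identical content has the same OID and only needs to be parsed once. The cache stats printed at the end of a run (`cached_blobs`, `blobs_with_hits`) indicate how often cached content was reused.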
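
The caching idea can be sketched as a hash keyed by blob OID that also counts reuse. `BlobCache` below is hypothetical; its stat keys mirror the names in the sample output, but reading `blobs_with_hits` as "cached blobs that were reused at least once" is an assumption, not something the source confirms.

```ruby
# Hypothetical cache, not the git-pkgs implementation.
class BlobCache
  def initialize
    @results = {}
    @hits = Hash.new(0)
  end

  # Returns the cached result for oid, or runs the block once and stores it.
  def fetch(oid)
    if @results.key?(oid)
      @hits[oid] += 1
      @results[oid]
    else
      @results[oid] = yield
    end
  end

  # Assumed meaning: distinct blobs stored, and how many of them were reused.
  def stats
    { cached_blobs: @results.size, blobs_with_hits: @hits.size }
  end
end

cache = BlobCache.new
parses = 0
%w[abc123 abc123 def456 abc123].each do |oid|
  cache.fetch(oid) do
    parses += 1 # stand-in for an expensive manifest parse
    "parsed #{oid}"
  end
end

puts "parses: #{parses}" # prints parses: 2
stats = cache.stats
puts "cached_blobs: #{stats[:cached_blobs]}, blobs_with_hits: #{stats[:blobs_with_hits]}"
```

Four lookups trigger only two parses here, because `abc123` is served from the cache on its second and third appearance.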

## Example Output

```
Full pipeline benchmark: 500 commits
============================================================

Full pipeline breakdown:
------------------------------------------------------------
git_diff 0.892s (12.3%)
filtering 0.234s (3.2%)
parsing 4.521s (62.4%)
db_writes 1.602s (22.1%)
------------------------------------------------------------
Total 7.249s

Throughput: 69.0 commits/sec
Cache stats: {:cached_blobs=>142, :blobs_with_hits=>89}
```
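
The figures in this sample are internally consistent: each percentage is the phase time divided by the total, and throughput is the commit count divided by the total. The short sketch below recomputes them from the numbers shown; it is a reading aid, not part of the benchmark script.

```ruby
# Phase timings and commit count copied from the sample output above.
phases = { git_diff: 0.892, filtering: 0.234, parsing: 4.521, db_writes: 1.602 }
commits = 500

total = phases.values.sum

phases.each do |name, seconds|
  puts format("%-10s %.3fs (%.1f%%)", name, seconds, seconds / total * 100)
end
puts format("%-10s %.3fs", "Total", total)            # 7.249s
puts format("Throughput: %.1f commits/sec", commits / total) # 69.0
```

In this run parsing accounts for well over half of the total, so speeding up database writes alone could recover at most about 22% of the runtime.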
