Skip to content

Commit 1540c4d

Browse files
committed
Serialize Protocol/ProtocolFn in bootstrap cache, fix startup regression (D95)
Startup regressed 4.5ms→23.3ms and memory 10MB→226MB because reloadProtocolNamespaces re-evaluated ~440 lines of protocols.clj + reducers.clj at every startup. Fix by making Protocol/ProtocolFn values serializable in the bootstrap cache (D81). Key changes: - Protocol: serialize name + method_sigs + impls (nested type→method maps) - ProtocolFn: serialize method_name + protocol var ref (deferred fixup) - Fn: serialize closure_bindings (captured upvalues) - Protocol.generation counter + ProtocolFn.cached_generation for cache invalidation when impls change (fixes reify stale cache bug) - extend_type_method: replace existing methods instead of always appending - Remove reloadProtocolNamespaces from bootstrap restore path - Simplify bootstrapFromCache (remove gc parameter, protocol reload) Result: startup 23.3ms→5.3ms (4.4x), memory 226MB→8.1MB (28x reduction).
1 parent a5054c9 commit 1540c4d

File tree

8 files changed

+397
-69
lines changed

8 files changed

+397
-69
lines changed

.dev/decisions.md

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -560,3 +560,26 @@ all to reach safe points, then collect with combined root sets.
560560
**Scope**: Phase 48.2 adds the mutex + registry. Thread spawning (48.3) will
561561
integrate safe-point coordination. Future optimization: concurrent marking,
562562
thread-local allocation buffers (TLABs), generational collection.
563+
564+
## D95: Protocol/ProtocolFn Serialization — Eliminating Startup Re-evaluation
565+
566+
**Problem**: After D81 (bootstrap cache), Protocol and ProtocolFn values were not
567+
serializable. `restoreFromBootstrapCache` called `reloadProtocolNamespaces` to
568+
re-evaluate protocols.clj + reducers.clj (~440 lines) via TreeWalk at every startup,
569+
causing 23.3ms startup time and 226MB memory usage.
570+
571+
**Decision**: Serialize Protocol and ProtocolFn values in the bootstrap cache.
572+
Protocol stores name + method_sigs + impls (nested map of type_key → method_map).
573+
ProtocolFn stores method_name + protocol var reference (ns + name), resolved via
574+
deferred fixup after env restore. Fn closure_bindings also serialized.
575+
576+
**Cache invalidation**: Protocol gains a `generation` counter, incremented on every
577+
`extend_type_method` / `extend-type` call. ProtocolFn inline cache checks
578+
`cached_generation == protocol.generation` to detect stale entries. This fixes a
579+
latent bug where VM-compiled reify forms share compile-time type keys, causing the
580+
monomorphic cache to return stale methods when the same type key gets new impls.
581+
Also fixed `extend_type_method` to replace existing methods (same name) in the
582+
method map instead of always appending.
583+
584+
**Result**: Startup 23.3ms → 5.3ms (4.4x), memory 226MB → 8.1MB (28x reduction).
585+
All upstream tests pass, no regression.

.dev/memo.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -21,21 +21,21 @@ Native production-grade Clojure runtime. Differentiation vs Babashka:
2121

2222
## Current Task
2323

24-
Ready for next phase planning. See Task Queue below.
24+
Phase 60: Startup & Memory Optimization.
25+
Task 60.3: Binary size audit and optimization.
2526

2627
## Task Queue
2728

28-
(empty — plan next phase)
29+
- 60.3: Binary size audit and optimization
30+
- 60.4: Record benchmarks, update docs
2931

3032
## Previous Task
3133

32-
F99 fix: apply on infinite lazy seq. Added lazy variadic path to applyFn
33-
(cons chain + peel fixed params, threadlocal flag for VM/TreeWalk rest packing).
34-
35-
Previous:
36-
37-
Phase 59: Deferred cleanup & test porting (ALL DONE).
38-
59.1-59.2: Ratio/promote ops upstream tests, coercion fixes.
34+
Phase 60.1-60.2 complete: Protocol/ProtocolFn serialization (D95).
35+
Root cause: reloadProtocolNamespaces re-evaluated ~440 lines at every startup.
36+
Fix: Serialize Protocol/ProtocolFn in bootstrap cache + generation counter
37+
for ProtocolFn inline cache + method replacement in extend_type_method.
38+
Result: startup 23.3ms → 5.3ms, memory 226MB → 8.1MB.
3939

4040
## Known Issues
4141

src/compiler/serialize.zig

Lines changed: 294 additions & 2 deletions
Large diffs are not rendered by default.

src/evaluator/tree_walk.zig

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -849,7 +849,9 @@ pub const TreeWalk = struct {
849849
// Monomorphic inline cache: check if same type as last dispatch
850850
const mutable_pf: *value_mod.ProtocolFn = @constCast(pf);
851851
if (mutable_pf.cached_type_key) |ck| {
852-
if (ck.ptr == type_key.ptr or std.mem.eql(u8, ck, type_key)) {
852+
if (mutable_pf.cached_generation == pf.protocol.generation and
853+
(ck.ptr == type_key.ptr or std.mem.eql(u8, ck, type_key)))
854+
{
853855
return self.callValue(mutable_pf.cached_method, args);
854856
}
855857
}
@@ -866,6 +868,7 @@ pub const TreeWalk = struct {
866868
// Update cache
867869
mutable_pf.cached_type_key = type_key;
868870
mutable_pf.cached_method = fn_val;
871+
mutable_pf.cached_generation = pf.protocol.generation;
869872

870873
// Call the impl function
871874
return self.callValue(fn_val, args);
@@ -954,6 +957,7 @@ pub const TreeWalk = struct {
954957
const new_impls = self.allocator.create(value_mod.PersistentArrayMap) catch return error.OutOfMemory;
955958
new_impls.* = .{ .entries = new_entries };
956959
protocol.impls = new_impls;
960+
protocol.generation +%= 1;
957961

958962
return Value.nil_val;
959963
}
@@ -1007,6 +1011,7 @@ pub const TreeWalk = struct {
10071011
const new_impls = self.allocator.create(value_mod.PersistentArrayMap) catch return error.OutOfMemory;
10081012
new_impls.* = .{ .entries = new_entries };
10091013
protocol.impls = new_impls;
1014+
protocol.generation +%= 1;
10101015
}
10111016

10121017
// Create reified object: a map with __type key

src/main.zig

Lines changed: 18 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -251,9 +251,7 @@ pub fn main() !void {
251251
// No args, no file, no :main — start REPL
252252
var env = Env.init(allocator);
253253
defer env.deinit();
254-
bootstrapFromCache(alloc, &env);
255-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
256-
env.gc = @ptrCast(&gc);
254+
bootstrapFromCache(alloc, &env, &gc);
257255
runRepl(alloc, &env, &gc);
258256
}
259257
}
@@ -435,12 +433,7 @@ fn evalAndPrint(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSwe
435433
// bootstrap and evaluation use gc_alloc (MarkSweepGc) for Values.
436434
var env = Env.init(infra_alloc);
437435
defer env.deinit();
438-
bootstrapFromCache(gc_alloc, &env);
439-
440-
// Enable GC for user evaluation (bootstrap runs without GC).
441-
// Reset threshold to avoid immediate sweep on first safe point.
442-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
443-
env.gc = @ptrCast(gc);
436+
bootstrapFromCache(gc_alloc, &env, gc);
444437

445438
// Dump bytecode if requested (VM only, dump to stderr then exit)
446439
if (dump_bytecode) {
@@ -486,9 +479,7 @@ fn runMainNs(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSweepG
486479
_ = use_vm;
487480
var env = Env.init(infra_alloc);
488481
defer env.deinit();
489-
bootstrapFromCache(gc_alloc, &env);
490-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
491-
env.gc = @ptrCast(gc);
482+
bootstrapFromCache(gc_alloc, &env, gc);
492483

493484
// Generate and evaluate (require 'main-ns)
494485
var buf: [4096]u8 = undefined;
@@ -505,7 +496,9 @@ fn runMainNs(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSweepG
505496
/// Initialize env from pre-compiled bootstrap cache (D81).
506497
/// Registers builtins (Zig function pointers), then restores Clojure-defined
507498
/// Vars from the serialized env snapshot embedded at build time.
508-
fn bootstrapFromCache(gc_alloc: Allocator, env: *Env) void {
499+
/// Protocol/ProtocolFn values are now serialized directly in the cache,
500+
/// so no re-evaluation of protocols.clj/reducers.clj is needed.
501+
fn bootstrapFromCache(gc_alloc: Allocator, env: *Env, gc: ?*gc_mod.MarkSweepGc) void {
509502
registry.registerBuiltins(env) catch {
510503
std.debug.print("Error: failed to register builtins\n", .{});
511504
std.process.exit(1);
@@ -515,6 +508,12 @@ fn bootstrapFromCache(gc_alloc: Allocator, env: *Env) void {
515508
std.process.exit(1);
516509
};
517510
markBootstrapLibs();
511+
512+
// Enable GC after cache restore for subsequent evaluation.
513+
if (gc) |g| {
514+
g.threshold = @max(g.bytes_allocated * 2, g.threshold);
515+
env.gc = @ptrCast(g);
516+
}
518517
}
519518

520519
/// Mark built-in namespaces as loaded so require skips them.
@@ -551,9 +550,7 @@ fn handleTestCommand(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.Ma
551550
// Bootstrap
552551
var env = Env.init(infra_alloc);
553552
defer env.deinit();
554-
bootstrapFromCache(gc_alloc, &env);
555-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
556-
env.gc = @ptrCast(gc);
553+
bootstrapFromCache(gc_alloc, &env, gc);
557554

558555
// Load cljw.edn config
559556
var config_arena = std.heap.ArenaAllocator.init(infra_alloc);
@@ -1565,15 +1562,11 @@ fn readEmbeddedSource(allocator: Allocator) ?[]const u8 {
15651562
fn evalEmbedded(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSweepGc, source: []const u8, cli_args: []const [:0]const u8) void {
15661563
var env = Env.init(infra_alloc);
15671564
defer env.deinit();
1568-
bootstrapFromCache(gc_alloc, &env);
1565+
bootstrapFromCache(gc_alloc, &env, gc);
15691566

15701567
// Set *command-line-args*
15711568
setCommandLineArgs(gc_alloc, &env, cli_args);
15721569

1573-
// Enable GC
1574-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
1575-
env.gc = @ptrCast(gc);
1576-
15771570
// Evaluate using VM backend
15781571
_ = bootstrap.evalStringVM(gc_alloc, &env, source) catch |e| {
15791572
reportError(e);
@@ -1591,10 +1584,7 @@ fn evalEmbedded(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSwe
15911584
fn startNreplWithFile(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSweepGc, filepath: []const u8, nrepl_port: u16) void {
15921585
var env = Env.init(infra_alloc);
15931586
defer env.deinit();
1594-
bootstrapFromCache(gc_alloc, &env);
1595-
1596-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
1597-
env.gc = @ptrCast(gc);
1587+
bootstrapFromCache(gc_alloc, &env, gc);
15981588

15991589
// Set up load paths for require resolution
16001590
const dir = std.fs.path.dirname(filepath) orelse ".";
@@ -1638,13 +1628,10 @@ fn startNreplWithFile(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.M
16381628
fn evalEmbeddedWithNrepl(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSweepGc, source: []const u8, cli_args: []const [:0]const u8, nrepl_port: u16) void {
16391629
var env = Env.init(infra_alloc);
16401630
defer env.deinit();
1641-
bootstrapFromCache(gc_alloc, &env);
1631+
bootstrapFromCache(gc_alloc, &env, gc);
16421632

16431633
setCommandLineArgs(gc_alloc, &env, cli_args);
16441634

1645-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
1646-
env.gc = @ptrCast(gc);
1647-
16481635
// HTTP servers should run in background so nREPL can start after eval.
16491636
http_server.background_mode = true;
16501637

@@ -1725,8 +1712,7 @@ fn handleBuildCommand(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.M
17251712
// Bootstrap runtime from cache
17261713
var env = Env.init(infra_alloc);
17271714
defer env.deinit();
1728-
bootstrapFromCache(gc_alloc, &env);
1729-
_ = gc;
1715+
bootstrapFromCache(gc_alloc, &env, gc);
17301716

17311717
// Set up load paths from entry file directory
17321718
const dir = std.fs.path.dirname(source_file.?) orelse ".";
@@ -1828,9 +1814,7 @@ fn handleBuildCommand(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.M
18281814
fn runEmbeddedBytecode(gc_alloc: Allocator, infra_alloc: Allocator, gc: *gc_mod.MarkSweepGc, module_bytes: []const u8) void {
18291815
var env = Env.init(infra_alloc);
18301816
defer env.deinit();
1831-
bootstrapFromCache(gc_alloc, &env);
1832-
gc.threshold = @max(gc.bytes_allocated * 2, gc.threshold);
1833-
env.gc = @ptrCast(gc);
1817+
bootstrapFromCache(gc_alloc, &env, gc);
18341818

18351819
_ = bootstrap.runBytecodeModule(gc_alloc, &env, module_bytes) catch |e| {
18361820
reportError(e);

src/runtime/bootstrap.zig

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1272,11 +1272,6 @@ pub fn restoreFromBootstrapCache(allocator: Allocator, env: *Env, cache_bytes: [
12721272
thread_pool_mod.initAgentVar(agent_v);
12731273
}
12741274

1275-
// Re-load protocols and reducers namespaces (protocol/protocol_fn values are not
1276-
// serializable, so they are nil after cache restore — re-evaluate from source)
1277-
loadProtocols(allocator, env) catch {};
1278-
loadReducers(allocator, env) catch {};
1279-
12801275
// Ensure *ns* is synced
12811276
syncNsVar(env);
12821277
}

src/runtime/value.zig

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -412,6 +412,9 @@ pub const Protocol = struct {
412412
method_sigs: []const MethodSig,
413413
/// Maps type_key (string) -> method_map (PersistentArrayMap of method_name -> fn)
414414
impls: *PersistentArrayMap,
415+
/// Incremented when impls are modified (by extend_type_method / extend-type).
416+
/// Used by ProtocolFn inline cache to detect stale entries.
417+
generation: u32 = 0,
415418
};
416419

417420
/// Protocol method reference — dispatches on first arg's type key.
@@ -426,6 +429,9 @@ pub const ProtocolFn = struct {
426429
method_name: []const u8,
427430
cached_type_key: ?[]const u8 = null,
428431
cached_method: Value = Value.nil_val,
432+
/// Protocol generation at the time of cache update.
433+
/// Cache is valid only when this matches protocol.generation.
434+
cached_generation: u32 = 0,
429435
};
430436

431437
/// MultiFn — multimethod with dispatch function and 2-level cache.

src/vm/vm.zig

Lines changed: 41 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -838,25 +838,42 @@ pub const VM = struct {
838838
// Get or create method map for this type in protocol.impls
839839
const existing = protocol.impls.get(Value.initString(self.allocator, type_key));
840840
if (existing) |ex_val| {
841-
// Existing method map for this type — add method
841+
// Existing method map for this type — update or add method
842842
if (ex_val.tag() != .map) return error.TypeError;
843843
const old_map = ex_val.asMap();
844-
const new_entries = self.allocator.alloc(Value, old_map.entries.len + 2) catch return error.OutOfMemory;
845-
if (self.gc == null) self.allocated_slices.append(self.allocator, new_entries) catch return error.OutOfMemory;
846-
@memcpy(new_entries[0..old_map.entries.len], old_map.entries);
847-
new_entries[old_map.entries.len] = Value.initString(self.allocator, method_name);
848-
new_entries[old_map.entries.len + 1] = method_fn;
849-
const new_method_map = self.allocator.create(value_mod.PersistentArrayMap) catch return error.OutOfMemory;
850-
new_method_map.* = .{ .entries = new_entries };
851-
if (self.gc == null) self.allocated_maps.append(self.allocator, new_method_map) catch return error.OutOfMemory;
852-
// Update impls: replace the type_key -> method_map entry
853-
const impls = protocol.impls;
854-
var i: usize = 0;
855-
while (i < impls.entries.len) : (i += 2) {
856-
if (impls.entries[i].eql(Value.initString(self.allocator, type_key))) {
857-
// Mutate in place — protocol.impls is mutable
858-
@constCast(impls.entries)[i + 1] = Value.initMap(new_method_map);
859-
break;
844+
845+
// Check if the method already exists in the map (replace in-place)
846+
const method_key = Value.initString(self.allocator, method_name);
847+
var replaced = false;
848+
{
849+
var j: usize = 0;
850+
while (j + 1 < old_map.entries.len) : (j += 2) {
851+
if (old_map.entries[j].eql(method_key)) {
852+
@constCast(old_map.entries)[j + 1] = method_fn;
853+
replaced = true;
854+
break;
855+
}
856+
}
857+
}
858+
859+
if (!replaced) {
860+
// New method for this type — append
861+
const new_entries = self.allocator.alloc(Value, old_map.entries.len + 2) catch return error.OutOfMemory;
862+
if (self.gc == null) self.allocated_slices.append(self.allocator, new_entries) catch return error.OutOfMemory;
863+
@memcpy(new_entries[0..old_map.entries.len], old_map.entries);
864+
new_entries[old_map.entries.len] = method_key;
865+
new_entries[old_map.entries.len + 1] = method_fn;
866+
const new_method_map = self.allocator.create(value_mod.PersistentArrayMap) catch return error.OutOfMemory;
867+
new_method_map.* = .{ .entries = new_entries };
868+
if (self.gc == null) self.allocated_maps.append(self.allocator, new_method_map) catch return error.OutOfMemory;
869+
// Update impls: replace the type_key -> method_map entry
870+
const impls = protocol.impls;
871+
var i: usize = 0;
872+
while (i < impls.entries.len) : (i += 2) {
873+
if (impls.entries[i].eql(Value.initString(self.allocator, type_key))) {
874+
@constCast(impls.entries)[i + 1] = Value.initMap(new_method_map);
875+
break;
876+
}
860877
}
861878
}
862879
} else {
@@ -882,6 +899,9 @@ pub const VM = struct {
882899
protocol.impls = new_impls;
883900
}
884901

902+
// Invalidate ProtocolFn inline caches by bumping generation
903+
protocol.generation +%= 1;
904+
885905
try self.push(Value.nil_val);
886906
},
887907

@@ -1344,7 +1364,9 @@ pub const VM = struct {
13441364
const type_key = valueTypeKey(first_arg);
13451365
const mutable_pf: *ProtocolFn = @constCast(pf);
13461366
if (mutable_pf.cached_type_key) |ck| {
1347-
if (ck.ptr == type_key.ptr or std.mem.eql(u8, ck, type_key)) {
1367+
if (mutable_pf.cached_generation == pf.protocol.generation and
1368+
(ck.ptr == type_key.ptr or std.mem.eql(u8, ck, type_key)))
1369+
{
13481370
// Cache hit — skip full protocol resolution
13491371
self.stack[fn_idx] = mutable_pf.cached_method;
13501372
return self.performCall(arg_count);
@@ -1360,6 +1382,7 @@ pub const VM = struct {
13601382
// Update cache for next call
13611383
mutable_pf.cached_type_key = type_key;
13621384
mutable_pf.cached_method = method_fn;
1385+
mutable_pf.cached_generation = pf.protocol.generation;
13631386
self.stack[fn_idx] = method_fn;
13641387
return self.performCall(arg_count);
13651388
},

0 commit comments

Comments
 (0)