Commit 2a1dab6

Optimize garbage collection with a generational GC

Running jit tests with AtomVM is now 20% faster.

Implement BEAM's `fullsweep_after` `spawn_opt/1` option and `process_flag/2` flag.

Also fix `process_flag/2` spec.

Also fix a bug where a finishing process in spawning state wasn't properly removed from the processes list.

Signed-off-by: Paul Guyot <pguyot@kallisys.net>
1 parent 5ca6095 commit 2a1dab6

26 files changed: +1717 -831 lines changed

doc/src/memory-management.md

Lines changed: 37 additions & 0 deletions
@@ -922,3 +922,40 @@ match binaries, as with the case of refc binaries on the process heap.
 #### Deletion
 
 Once all terms have been copied from the old heap to the new heap, and once the MSO list has been swept for unreachable references, the old heap is simply discarded via the `free` function.
+
+### Generational Garbage Collection
+
+The garbage collection described above is a *full sweep*: every live term is copied from the old heap to the new heap and the entire old heap is freed. While correct, this can be expensive for processes with large heaps, because long-lived data that has already survived previous collections must be copied again each time.
+
+AtomVM implements *generational* (or *minor*) garbage collection to reduce this cost, using the same approach as BEAM. The key observation is that most terms die young: they are allocated, used briefly, and become garbage. Terms that have survived at least one collection are likely to survive many more. Generational GC exploits this by dividing the heap into two generations:
+
+* **Young generation**: recently allocated terms, between the *high water mark* and the current heap pointer.
+* **Old (mature) generation**: terms that have survived at least one minor collection, stored in a separate old heap.
+
+#### High Water Mark
+
+After each garbage collection, the heap pointer position is recorded as the *high water mark*. On the next collection, terms allocated below the high water mark (i.e., terms that existed at the time of the previous collection) are considered mature. Terms allocated above the high water mark are young.
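
As a rough illustration of the high water mark rule, the sketch below classifies a term by its address. `ToyHeap`, `toy_is_mature` and all field names are hypothetical and are not taken from the AtomVM sources. The sketch assumes the heap grows upward from `heap_start` and that a missing high water mark is represented by `NULL`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified heap descriptor (not AtomVM's actual struct).
typedef struct
{
    uintptr_t *heap_start;      // first word of the heap
    uintptr_t *high_water_mark; // heap pointer recorded after the previous GC, or NULL
    uintptr_t *heap_ptr;        // next free word
} ToyHeap;

// A term is mature if it was allocated before the previous collection ended,
// that is, if it lies below the high water mark. Without a high water mark
// nothing is considered mature.
static bool toy_is_mature(const ToyHeap *heap, const uintptr_t *term_ptr)
{
    assert(term_ptr >= heap->heap_start && term_ptr < heap->heap_ptr);
    if (heap->high_water_mark == NULL) {
        return false;
    }
    return term_ptr < heap->high_water_mark;
}
```

With a heap of six used words and the high water mark at word 3, words 0 to 2 would be treated as mature and words 3 to 5 as young.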
+
+#### Minor Collection
+
+During a minor collection:
+
+1. A new young heap is allocated.
+2. Mature terms (below the high water mark) are *promoted*: copied to the old heap rather than the new young heap.
+3. Young terms that are still reachable are copied to the new young heap.
+4. Both the new young heap and the newly promoted old region are scanned for references, since promoted terms may reference young terms and vice versa.
+5. Only the young MSO list is swept; the old MSO list is preserved.
+6. The previous heap is freed, but the old heap persists across minor collections.
+
+Because the old heap is not scanned for garbage during a minor collection, the cost is proportional to the size of the young generation rather than the entire heap.
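
The destination of each term in steps 2, 3 and 6 can be modelled with a small bookkeeping sketch. It only tallies where each term ends up; it does not copy memory, scan references or sweep MSO lists. `ToyTerm`, `ToyMinorResult` and `toy_minor_collect` are hypothetical names, and reachability is assumed to be known up front rather than discovered by tracing from the roots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical description of one term in the previous heap.
typedef struct
{
    size_t offset;  // word offset of the term in the previous heap
    size_t size;    // size of the term in words
    bool reachable; // assumed to be known; a real collector traces from the roots
} ToyTerm;

typedef struct
{
    size_t promoted_words; // reachable mature terms, copied to the old heap
    size_t young_words;    // reachable young terms, copied to the new young heap
    size_t freed_words;    // unreachable terms, discarded with the previous heap
} ToyMinorResult;

// Tally where each term goes during a minor collection.
static ToyMinorResult toy_minor_collect(const ToyTerm *terms, size_t count, size_t high_water_mark)
{
    assert(terms != NULL || count == 0);
    ToyMinorResult result = { 0, 0, 0 };
    for (size_t i = 0; i < count; i++) {
        if (!terms[i].reachable) {
            result.freed_words += terms[i].size;
        } else if (terms[i].offset < high_water_mark) {
            result.promoted_words += terms[i].size;
        } else {
            result.young_words += terms[i].size;
        }
    }
    return result;
}
```

Note that the loop only visits terms of the previous heap; nothing already stored in the old heap is touched, which is where the saving over a full sweep comes from.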
+
+#### When Full vs. Minor Collection Occurs
+
+AtomVM keeps a counter (`gc_count`) of how many minor collections have occurred since the last full sweep. A full sweep is forced when:
+
+* The process has never been garbage collected (no high water mark exists).
+* `gc_count` reaches the `fullsweep_after` threshold.
+* The old heap does not have enough space to accommodate promoted terms.
+* A `MEMORY_FORCE_SHRINK` request is made (e.g., via `erlang:garbage_collect/0`).
+
+The `fullsweep_after` value can be set per-process via [`spawn_opt`](./programmers-guide.md#spawning-processes) or [`erlang:process_flag/2`](./apidocs/erlang/estdlib/erlang.md#process_flag2). The default value is 65535, meaning full sweeps are infrequent under normal operation. Setting it to `0` disables generational collection entirely, forcing a full sweep on every garbage collection event.
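
The four conditions above can be read as a single predicate. The following is a minimal sketch under stated assumptions, not the actual AtomVM code: `ToyGcState` and `toy_needs_full_sweep` are hypothetical, "reaches the threshold" is interpreted as `gc_count >= fullsweep_after`, and the order of the checks is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical per-process GC bookkeeping (field names are illustrative).
typedef struct
{
    bool has_high_water_mark;     // false until the first collection has run
    unsigned int gc_count;        // minor collections since the last full sweep
    unsigned int fullsweep_after; // threshold, 65535 by default
    size_t old_heap_free_words;   // free space left in the old heap
} ToyGcState;

// Returns true when a full sweep should be performed instead of a minor collection.
static bool toy_needs_full_sweep(const ToyGcState *state, size_t words_to_promote, bool force_shrink)
{
    assert(state != NULL);
    if (!state->has_high_water_mark) {
        return true; // never collected before
    }
    if (state->gc_count >= state->fullsweep_after) {
        return true; // threshold reached; always true when fullsweep_after is 0
    }
    if (words_to_promote > state->old_heap_free_words) {
        return true; // promoted terms would not fit in the old heap
    }
    return force_shrink; // explicit shrink request
}
```

Under this reading, a `fullsweep_after` of `0` makes the second check succeed on every call, which matches the documented behavior of disabling generational collection.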

doc/src/programmers-guide.md

Lines changed: 1 addition & 0 deletions
@@ -365,6 +365,7 @@ The [options](./apidocs/erlang/estdlib/erlang.md#spawn_option) argument is a pro
 |-----|------------|---------------|-------------|
 | `min_heap_size` | `non_neg_integer()` | none | Minimum heap size of the process. The heap will shrink no smaller than this size. |
 | `max_heap_size` | `non_neg_integer()` | unbounded | Maximum heap size of the process. The heap will grow no larger than this size. |
+| `fullsweep_after` | `non_neg_integer()` | 65535 | Maximum number of [minor garbage collections](./memory-management.md#generational-garbage-collection) before a full sweep is forced. Set to `0` to disable generational garbage collection. |
 | `link` | `boolean()` | `false` | Whether to link the spawned process to the spawning process. |
 | `monitor` | `boolean()` | `false` | Whether to link the spawning process should monitor the spawned process. |
 | `atomvm_heap_growth` | `bounded_free \| minimum \| fibonacci` | `bounded_free` | [Strategy](./memory-management.md#heap-growth-strategies) to grow the heap of the process. |

libs/estdlib/src/erlang.erl

Lines changed: 4 additions & 1 deletion
@@ -177,6 +177,7 @@
 -type spawn_option() ::
     {min_heap_size, pos_integer()}
     | {max_heap_size, pos_integer()}
+    | {fullsweep_after, non_neg_integer()}
     | {atomvm_heap_growth, atomvm_heap_growth_strategy()}
     | link
     | monitor.
@@ -1308,7 +1309,9 @@ group_leader(_Leader, _Pid) ->
 %%
 %% @end
 %%-----------------------------------------------------------------------------
--spec process_flag(Flag :: trap_exit, Value :: boolean()) -> pid().
+-spec process_flag
+    (trap_exit, boolean()) -> boolean();
+    (fullsweep_after, non_neg_integer()) -> non_neg_integer().
 process_flag(_Flag, _Value) ->
     erlang:nif_error(undefined).

libs/jit/src/jit_aarch64.erl

Lines changed: 8 additions & 8 deletions
@@ -164,25 +164,25 @@
     | {maybe_free_aarch64_register(), '&', non_neg_integer(), '!=', integer()}
     | {{free, aarch64_register()}, '==', {free, aarch64_register()}}.
 
-% ctx->e is 0x28
-% ctx->x is 0x30
+% ctx->e is 0x50
+% ctx->x is 0x58
 -define(WORD_SIZE, 8).
 -define(CTX_REG, r0).
 -define(JITSTATE_REG, r1).
 -define(NATIVE_INTERFACE_REG, r2).
--define(Y_REGS, {?CTX_REG, 16#28}).
--define(X_REG(N), {?CTX_REG, 16#30 + (N * ?WORD_SIZE)}).
--define(CP, {?CTX_REG, 16#B8}).
--define(FP_REGS, {?CTX_REG, 16#C0}).
+-define(Y_REGS, {?CTX_REG, 16#50}).
+-define(X_REG(N), {?CTX_REG, 16#58 + (N * ?WORD_SIZE)}).
+-define(CP, {?CTX_REG, 16#E0}).
+-define(FP_REGS, {?CTX_REG, 16#E8}).
 -define(FP_REG_OFFSET(State, F),
     (F *
         case (State)#state.variant band ?JIT_VARIANT_FLOAT32 of
             0 -> 8;
             _ -> 4
         end)
 ).
--define(BS, {?CTX_REG, 16#C8}).
--define(BS_OFFSET, {?CTX_REG, 16#D0}).
+-define(BS, {?CTX_REG, 16#F0}).
+-define(BS_OFFSET, {?CTX_REG, 16#F8}).
 -define(JITSTATE_MODULE, {?JITSTATE_REG, 0}).
 -define(JITSTATE_CONTINUATION, {?JITSTATE_REG, 16#8}).
 -define(JITSTATE_REDUCTIONCOUNT, {?JITSTATE_REG, 16#10}).

libs/jit/src/jit_armv6m.erl

Lines changed: 7 additions & 7 deletions
@@ -166,15 +166,15 @@
     | {{free, armv6m_register()}, '==', {free, armv6m_register()}}.
 
 % ctx->e is 0x28
-% ctx->x is 0x30
+% ctx->x is 0x2C
 -define(CTX_REG, r0).
 -define(NATIVE_INTERFACE_REG, r2).
--define(Y_REGS, {?CTX_REG, 16#14}).
--define(X_REG(N), {?CTX_REG, 16#18 + (N * 4)}).
--define(CP, {?CTX_REG, 16#5C}).
--define(FP_REGS, {?CTX_REG, 16#60}).
--define(BS, {?CTX_REG, 16#64}).
--define(BS_OFFSET, {?CTX_REG, 16#68}).
+-define(Y_REGS, {?CTX_REG, 16#28}).
+-define(X_REG(N), {?CTX_REG, 16#2C + (N * 4)}).
+-define(CP, {?CTX_REG, 16#70}).
+-define(FP_REGS, {?CTX_REG, 16#74}).
+-define(BS, {?CTX_REG, 16#78}).
+-define(BS_OFFSET, {?CTX_REG, 16#7C}).
 % JITSTATE is on stack, accessed via stack offset
 % These macros now expect a register that contains the jit_state pointer
 -define(JITSTATE_MODULE(Reg), {Reg, 0}).

libs/jit/src/jit_riscv32.erl

Lines changed: 8 additions & 8 deletions
@@ -195,16 +195,16 @@
     | {{free, riscv32_register()}, '==', {free, riscv32_register()}}.
 
 % Context offsets (32-bit architecture)
-% ctx->e is 0x14
-% ctx->x is 0x18
+% ctx->e is 0x28
+% ctx->x is 0x2C
 -define(CTX_REG, a0).
 -define(NATIVE_INTERFACE_REG, a2).
--define(Y_REGS, {?CTX_REG, 16#14}).
--define(X_REG(N), {?CTX_REG, 16#18 + (N * 4)}).
--define(CP, {?CTX_REG, 16#5C}).
--define(FP_REGS, {?CTX_REG, 16#60}).
--define(BS, {?CTX_REG, 16#64}).
--define(BS_OFFSET, {?CTX_REG, 16#68}).
+-define(Y_REGS, {?CTX_REG, 16#28}).
+-define(X_REG(N), {?CTX_REG, 16#2C + (N * 4)}).
+-define(CP, {?CTX_REG, 16#70}).
+-define(FP_REGS, {?CTX_REG, 16#74}).
+-define(BS, {?CTX_REG, 16#78}).
+-define(BS_OFFSET, {?CTX_REG, 16#7C}).
 % JITSTATE is in a1 register (no prolog, following aarch64 model)
 -define(JITSTATE_REG, a1).
 % Return address register (like LR in AArch64)

libs/jit/src/jit_x86_64.erl

Lines changed: 12 additions & 12 deletions
@@ -150,28 +150,28 @@
 -define(WORD_SIZE, 8).
 
 % Following offsets are verified with static asserts in jit.c
-% ctx->e is 0x28
-% ctx->x is 0x30
-% ctx->cp is 0xB8
-% ctx->fr is 0xC0
-% ctx->bs is 0xC8
-% ctx->bs_offset is 0xD0
+% ctx->e is 0x50
+% ctx->x is 0x58
+% ctx->cp is 0xE0
+% ctx->fr is 0xE8
+% ctx->bs is 0xF0
+% ctx->bs_offset is 0xF8
 -define(CTX_REG, rdi).
 -define(JITSTATE_REG, rsi).
 -define(NATIVE_INTERFACE_REG, rdx).
--define(Y_REGS, {16#28, ?CTX_REG}).
--define(X_REG(N), {16#30 + (N * ?WORD_SIZE), ?CTX_REG}).
--define(CP, {16#B8, ?CTX_REG}).
--define(FP_REGS, {16#C0, ?CTX_REG}).
+-define(Y_REGS, {16#50, ?CTX_REG}).
+-define(X_REG(N), {16#58 + (N * ?WORD_SIZE), ?CTX_REG}).
+-define(CP, {16#E0, ?CTX_REG}).
+-define(FP_REGS, {16#E8, ?CTX_REG}).
 -define(FP_REG_OFFSET(State, F),
     (F *
         case (State)#state.variant band ?JIT_VARIANT_FLOAT32 of
             0 -> 8;
             _ -> 4
         end)
 ).
--define(BS, {16#C8, ?CTX_REG}).
--define(BS_OFFSET, {16#D0, ?CTX_REG}).
+-define(BS, {16#F0, ?CTX_REG}).
+-define(BS_OFFSET, {16#F8, ?CTX_REG}).
 -define(JITSTATE_MODULE, {0, ?JITSTATE_REG}).
 -define(JITSTATE_CONTINUATION, {16#8, ?JITSTATE_REG}).
 -define(JITSTATE_REMAINING_REDUCTIONS, {16#10, ?JITSTATE_REG}).
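
The comment in this hunk notes that the offsets are verified with static asserts in `jit.c`. The sketch below shows the general technique on a made-up structure; it is not the content of `jit.c`. `ToyContext64`, `TOY_MAX_REG` and `toy_x_reg_offset` are hypothetical. The opaque leading fields are modelled as ten 64-bit words, and the 17-entry `x` array is only inferred from the gap between the `x` and `cp` offsets quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REG 16 // inferred: (0xE0 - 0x58) / 8 = 17 slots

// Hypothetical stand-in for the 64-bit layout described by the new offsets.
typedef struct
{
    uint64_t preceding_fields[10]; // 0x00..0x4F, contents not modelled
    uint64_t e;                    // 0x50
    uint64_t x[TOY_MAX_REG + 1];   // 0x58
    uint64_t cp;                   // 0xE0
    uint64_t fr;                   // 0xE8
    uint64_t bs;                   // 0xF0
    uint64_t bs_offset;            // 0xF8
} ToyContext64;

// Compile-time checks: the build fails if the layout and the constants diverge.
_Static_assert(offsetof(ToyContext64, e) == 0x50, "ctx->e is 0x50");
_Static_assert(offsetof(ToyContext64, x) == 0x58, "ctx->x is 0x58");
_Static_assert(offsetof(ToyContext64, cp) == 0xE0, "ctx->cp is 0xE0");
_Static_assert(offsetof(ToyContext64, fr) == 0xE8, "ctx->fr is 0xE8");
_Static_assert(offsetof(ToyContext64, bs) == 0xF0, "ctx->bs is 0xF0");
_Static_assert(offsetof(ToyContext64, bs_offset) == 0xF8, "ctx->bs_offset is 0xF8");

// Mirrors the arithmetic of the X_REG(N) macro for an 8-byte word size.
static size_t toy_x_reg_offset(size_t n)
{
    assert(n <= TOY_MAX_REG);
    return 0x58 + n * sizeof(uint64_t);
}
```

Checks of this kind are useful here because the Erlang backends hard-code the offsets: when fields are added ahead of `e`, as in this commit, a stale constant is caught at build time rather than at run time.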

src/libAtomVM/context.c

Lines changed: 25 additions & 0 deletions
@@ -81,6 +81,8 @@ Context *context_new(GlobalContext *glb)
     ctx->min_heap_size = 0;
     ctx->max_heap_size = 0;
     ctx->heap_growth_strategy = BoundedFreeHeapGrowth;
+    ctx->fullsweep_after = 65535;
+    ctx->gc_count = 0;
     ctx->has_min_heap_size = 0;
     ctx->has_max_heap_size = 0;
 
@@ -136,6 +138,14 @@ Context *context_new(GlobalContext *glb)
 
 void context_destroy(Context *ctx)
 {
+    // If the process was never scheduled (still in Spawning state),
+    // it is still in the waiting_processes list and must be removed.
+    if (ctx->flags & Spawning) {
+        SMP_SPINLOCK_LOCK(&ctx->global->processes_spinlock);
+        list_remove(&ctx->processes_list_head);
+        SMP_SPINLOCK_UNLOCK(&ctx->global->processes_spinlock);
+    }
+
     // Another process can get an access to our mailbox until this point.
     struct ListHead *processes_table_list = synclist_wrlock(&ctx->global->processes_table);
     UNUSED(processes_table_list);
@@ -525,6 +535,7 @@ bool context_get_process_info(Context *ctx, term *out, size_t *term_size, term a
         case MESSAGE_QUEUE_LEN_ATOM:
         case REGISTERED_NAME_ATOM:
         case MEMORY_ATOM:
+        case FULLSWEEP_AFTER_ATOM:
             ret_size = TUPLE_SIZE(2);
             break;
         case LINKS_ATOM: {
@@ -675,6 +686,12 @@ bool context_get_process_info(Context *ctx, term *out, size_t *term_size, term a
             break;
         }
 
+        case FULLSWEEP_AFTER_ATOM: {
+            term_put_tuple_element(ret, 0, FULLSWEEP_AFTER_ATOM);
+            term_put_tuple_element(ret, 1, term_from_int(ctx->fullsweep_after));
+            break;
+        }
+
         case CURRENT_STACKTRACE_ATOM: {
             term_put_tuple_element(ret, 0, CURRENT_STACKTRACE_ATOM);
             // FIXME: since it's not possible how to build stacktrace here with the current API,
@@ -1209,6 +1226,14 @@ COLD_FUNC void context_dump(Context *ctx)
         ct++;
     }
 
+    fprintf(stderr, "\n\nHeap\n----\n");
+    fprintf(stderr, "young heap: %zu words\n", (size_t) (ctx->heap.heap_end - ctx->heap.heap_start));
+    if (ctx->heap.old_heap_start) {
+        fprintf(stderr, "old heap: %zu words (used: %zu)\n",
+            (size_t) (ctx->heap.old_heap_end - ctx->heap.old_heap_start),
+            (size_t) (ctx->heap.old_heap_ptr - ctx->heap.old_heap_start));
+    }
+
     fprintf(stderr, "\n\nMailbox\n-------\n");
     mailbox_crashdump(ctx);
src/libAtomVM/context.h

Lines changed: 2 additions & 0 deletions
@@ -113,6 +113,8 @@ struct Context
     size_t min_heap_size;
     size_t max_heap_size;
     enum HeapGrowthStrategy heap_growth_strategy;
+    unsigned int fullsweep_after;
+    unsigned int gc_count;
 
     // saved state when scheduled out
     Module *saved_module;

src/libAtomVM/defaultatoms.def

Lines changed: 1 addition & 0 deletions
@@ -207,6 +207,7 @@ X(EMU_ATOM, "\x3", "emu")
 X(JIT_ATOM, "\x3", "jit")
 X(EMU_FLAVOR_ATOM, "\xA", "emu_flavor")
 X(CODE_SERVER_ATOM, "\xB", "code_server")
+X(FULLSWEEP_AFTER_ATOM, "\xF", "fullsweep_after")
 X(LOAD_ATOM, "\x4", "load")
 X(JIT_X86_64_ATOM, "\xA", "jit_x86_64")
 X(JIT_AARCH64_ATOM, "\xB", "jit_aarch64")
