
[PPC64LE] Add PPC64LE dynarec infrastructure with MOV opcodes#3592

Open
runlevel5 wants to merge 3 commits into ptitSeb:main from runlevel5:ppc64le-pr2-dynarec-infra

Conversation

Contributor

@runlevel5 runlevel5 commented Feb 28, 2026

Summary

Add the core dynarec infrastructure for PPC64LE with 3 MOV opcodes as proof the pipeline works end-to-end. Follows the incremental approach of the RV64 dynarec initial commit (5a9b896).

Depends on #3591 (platform support PR) — please merge that first.

Changes (36 files, ~9200 lines)

Core Infrastructure

  • Emitter (ppc64le_emitter.h): Instruction encoding macros for PPC64LE ISA. Argument order follows Power ISA assembly convention (Rt, offset, Ra) — documented in header comment.
  • Helper (dynarec_ppc64le_helper.h/c): STEPNAME-aliased macros and core functions (geted, move32/move64, jump_to_epilog/jump_to_next, call_c/call_n, grab_segdata, emit_pf). Displacement modes use named constants (DISP_NONE/DISP_D/DISP_DQ) and magic numbers replaced with PPC64_DISP_MAX.
  • Register mapping (ppc64le_mapping.h): x86-64 to PPC64LE register mapping, ELFv2 ABI-compliant.
  • Functions (dynarec_ppc64le_functions.c/h): fpu_reset, inst_name_pass3, updateNativeFlags, get_free_scratch, ppc64le_fast_hash, plus FPU register management stubs.
  • Architecture (dynarec_ppc64le_arch.c): CancelBlock, FillBlock, AddMarkRecursive, native address resolution, jump table support.
  • Constants (dynarec_ppc64le_consts.c/h): Constants table management (unused arrays removed per review).

Assembly

  • ppc64le_prolog.S / ppc64le_epilog.S: Dynarec entry/exit.
  • ppc64le_next.S: Block-to-block dispatcher.
  • ppc64le_lock.S/h: Atomic operations (LL/SC pairs, mutex-based 128-bit CAS).

Opcodes (3 MOV instructions in _00.c)

  • 0x89 — MOV Ed,Gd
  • 0x8B — MOV Gd,Ed
  • 0x8D — LEA Gd,Ed

Stubs

  • Printer returns "???" for all instructions (will be expanded later).
  • _66.c, _f0.c, _66f0.c contain only DEFAULT fallback.
  • FPU/SSE/AVX cache functions are empty stubs, following RV64 initial commit pattern.

Shared Dynarec Modifications

  • CMakeLists.txt, dynarec_native_pass.c, dynarec_arch.h, dynarec_helper.h, dynarec_next.h, native_lock.h, dynacache_reloc.h, dynarec.c: PPC64LE #elif branches.

Review Changes

  • Removed dynarec_ppc64le_arch.h and WIN32 guard (ptitSeb)
  • Removed unused constant arrays from dynarec_ppc64le_consts.c (ptitSeb)
  • Renamed DQ_ALIGN to DISP_NONE/DISP_D/DISP_DQ named constants (ptitSeb)
  • Replaced magic number 32768 with PPC64_DISP_MAX constant (ptitSeb)
  • Added emitter argument order convention comment (ptitSeb/classilla)
  • Removed unused GOCOND macro (~80 lines) (ptitSeb)
  • Fixed x87_get_st_empty macro bug, removed duplicate dynarec64_F0 (self-audit)
  • Removed BEQZ_safe alias no longer needed after GOCOND removal

Notes on Reviewer Feedback

Re: unused macros in helper.h (ksco): Most macros in dynarec_ppc64le_helper.h are required by the shared dynarec_native_pass.c or are building blocks used by other macros. The genuinely unused items (GOCOND, duplicate dynarec64_F0) have been removed.

Re: unused functions in functions.c (ksco): All 19 public functions are called from shared dynarec infrastructure (dynarec_native.c, dynarec_native_pass.c, dynarec_native_functions.c, dynarec.c). None can be removed without breaking compilation.

Re: ALSL/BSTRPICK_D/BSTRINS_D names (ksco): These LoongArch names do not exist in the PPC64LE code. The PPC equivalents are already named SLADDy, BF_EXTRACT, and BF_INSERT.

Re: AVX 256-bit (ksco): PPC64LE has 128-bit VSX only. The AVX macros are shared infrastructure stubs required by dynarec_native_pass.c — they don't imply 256-bit support.


@runlevel5 runlevel5 marked this pull request as draft February 28, 2026 13:00
@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch from d02e0ea to 9d67773 on February 28, 2026 14:18
@runlevel5 runlevel5 marked this pull request as ready for review February 28, 2026 14:19
Owner

ptitSeb commented Feb 28, 2026

Sorry, but you'll have to update from main and add uint8_t is_file_mapped:1; to your dynarec_ppc64_t type for it to build.

@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch 2 times, most recently from 52308d3 to 942b062 on February 28, 2026 20:27
Owner

@ptitSeb ptitSeb left a comment


More comments to address, sorry.

@runlevel5 runlevel5 marked this pull request as draft March 2, 2026 03:40
@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch from 942b062 to 4093ed9 on March 2, 2026 04:54
@@ -0,0 +1,76 @@
// PPC64LE next linker for dynarec
Contributor Author


Reviewer Guide — Block Dispatcher (ppc64le_next)

This is the block-chaining trampoline, called when a dynarec block needs to jump to the next block. The flow is:

  1. Save all volatile registers (r3-r10) that the dynarec is using as scratch/args
  2. Call LinkNext(emu, ip, from, &rip_on_stack) to resolve the target address
  3. Restore volatile registers (RIP may have been updated by LinkNext)
  4. Jump to resolved target via mtctr r12; bctr

Note the .8byte 0 before the entry point: this is a NULL pointer sentinel that getDB() uses to find the start of the dispatch table. Same pattern as the other backends.

PPC64LE-specific detail: The bl LinkNext; nop pair at the call site — the nop is a TOC restore slot. The ELFv2 linker may replace it with ld r2, 24(r1) if LinkNext is in a different DSO and needs TOC switching. ARM64 has no equivalent because it doesn't use a TOC.

#define ARCH_UDF 0x00000000 /* illegal instruction (all zeros) */
// PPC64LE CreateJmpNext needs 5 instructions (20 bytes) for PC-relative load + branch,
// so the jmpnext area needs 5 void* slots (40 bytes) instead of the default 4 (32 bytes).
#define JMPNEXT_SIZE (5*sizeof(void*))
Contributor Author


Reviewer Guide — JMPNEXT_SIZE = 5*sizeof(void*) = 40 bytes

Each dynarec block has a "jmpnext" area at the end — a small trampoline that chains to the next block. JMPNEXT_SIZE reserves space for it.

Why PPC64LE needs 5 slots (40 bytes) vs ARM64's 4 (32 bytes):

ARM64 can do PC-relative loads directly:

LDR X0, [PC, #offset]   // 1 insn: load target address from nearby literal pool
BR  X0                   // 1 insn: branch
<8-byte target address>  // data
// Total: 2 instructions + 1 data slot = 3 × 8 = 24 bytes, rounded to 4 slots (32 bytes)

PPC64LE has no PC-relative load (pre-POWER10), so it uses the bcl 20,31 trick to discover its own address:

bcl   20, 31, .+4      // 1 insn: branch-and-link to next insn, putting PC into LR
mflr  r_tmp            // 1 insn: move LR → GPR
ld    r_tmp, 12(r_tmp) // 1 insn: load target from 12 bytes ahead
mtctr r_tmp            // 1 insn: move to CTR (branch target register)
bctr                   // 1 insn: branch via CTR
<8-byte target addr>   // data
// Total: 5 instructions + 1 data slot → needs 5 void* slots (40 bytes)

This is the same bcl trick used in ppc64le_next.S and ppc64le_lock.S for any code that needs its own runtime address.

@runlevel5 runlevel5 marked this pull request as ready for review March 3, 2026 06:34
INST_NAME("MOV Ed, Gd");
nextop = F8;
GETGD;
SCRATCH_USAGE(0);
Owner


Having SCRATCH_USAGE(...), are you planning to have flag-less fused comparison + conditional jump like RV64 (and LA64 on the no-LBT path)?

Contributor Author


Yes, exactly — the plan is flag-less fused compare+branch, same approach as RV64 and LA64. SCRATCH_USAGE is the tracking hook for pass0 to know a scratch register is needed for the fused sequence.

Add the core dynarec infrastructure for PPC64LE, following the
incremental approach used by the RV64 dynarec (initial commit 5a9b896).

This includes:
- PPC64LE emitter macros (ppc64le_emitter.h)
- Register mapping and private types (ppc64le_mapping.h, private.h)
- Dynarec helper macros (dynarec_ppc64le_helper.h)
- Core helper functions: geted, move32/64, jump_to_epilog/next,
  call_c/call_n, grab_segdata, emit_pf, fpu_reset_cache,
  fpu_propagate_stack (dynarec_ppc64le_helper.c)
- Architecture-specific functions: dynarec_ppc64le_arch.c
- Constants table: dynarec_ppc64le_consts.c
- Functions for FPU state management (stubbed for x87/SSE/AVX,
  following RV64 precedent of empty TODO stubs)
- Stub printer (returns '???' for all instructions)
- Assembly: prolog, epilog, next dispatcher, lock primitives
- CMakeLists.txt integration for PPC64LE dynarec
- Shared dynarec infrastructure modifications for PPC64LE support
- 3 MOV opcodes in _00.c: 0x89 (MOV Ed,Gd), 0x8B (MOV Gd,Ed),
  0x8D (LEA Gd,Ed)

FPU/x87/MMX/SSE/AVX cache management functions are stubbed with empty
bodies, matching the RV64 initial commit pattern where all FPU functions
were empty TODO stubs. These will be expanded in a follow-up PR when
the first FPU opcodes are implemented.
@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch from 4093ed9 to 6875dc9 on March 4, 2026 02:21
…ssembly

Replace the stub printer (returning '???') with a full PPC64LE
instruction disassembler that uses dual GPR name tables (Rn[] and
RnZ[]) to correctly display r0 as '0' in base-register (RA) positions
where the ISA treats it as literal zero, following the ARM64 xSP/xZR
pattern per reviewer request.
BCL(20, 31, 4);
MFLR(reg);
LD(reg, (int16_t)offset, reg);
MTCTR(reg);
Owner


no need to setup r12 here? because it's an internal jump to box64 I suppose?

Contributor Author


Yes, exactly — it's an internal jump to box64 code. The targets are ppc64le_next or ppc64le_epilog, both hand-written .S with no .localentry and no addis r2, r12, ... TOC preamble, so the ELFv2 r12 requirement doesn't apply here.

That said, r12 coincidentally already holds the target address (it's the scratch register I used to load it), so the convention is satisfied anyway as a defensive bonus.

ppc64le_lock_read_b:
// address is r3, return is r3
lwsync
lbarx 3, 0, 3
Owner


Side note: this convention of using plain numbers for registers is really bad IMO. I really prefer when things are clear and registers have names, not just numbers.

Nothing to do of course, it's just me ranting.

Contributor Author


You're right — bare numbers are hard to read and error-prone, especially in the load-reserve/store-conditional sequences where a misplaced operand is silent.

I see two options:

Option A: Use %rN syntax throughout the .S files. PPC64LE GAS accepts %r-prefixed register names (this is what GCC itself emits with -S), so lbarx 3, 0, 3 becomes lbarx %r3, 0, %r3. The ASM_MAPPING defines would also change from #define RAX 14 to #define RAX %r14. Straightforward search-and-replace, no invented names.

Option B: Add ABI-name #defines on top of %rN, similar to RV64's lock.S. So a0/a1/t0 etc. mapping to %r3/%r4/%r11, giving us lbarx a0, 0, a0 in lock.S.

I think Option A is the more pragmatic choice — it matches GCC output, Power ISA documentation, and doesn't introduce custom aliases that aren't native to PPC tooling. But happy to go with Option B if you prefer the extra semantic clarity.

Which would you prefer? I can do this in a follow-up commit.

// We need to load the TOC for LinkNext - use the saved TOC
// Note: In ELFv2, r12 must point to the function entry for local calls
// For external calls via PLT, the linker handles it
bl LinkNext
Owner


don't you need to setup r12 here? this is confusing.

Contributor Author


Same as jmpnext — I don't need r12 setup here because the targets are never GCC-compiled functions:

  • Cache-hit path (line 78): target is a JIT'd dynarec block from block_cache. r12 already holds the address since I used it as the load scratch.
  • Slow path (line 132): LinkNext returns a dynarec block or ppc64le_epilog. I copy the return value into r12 with mr 12, 3 on line 116 before the branch, so r12 is already set.

The ELFv2 r12 requirement only matters for calls into C functions with a global entry point TOC setup — I handle those in call_c() and call_n() in the helper, which explicitly set r12 before BCTRL.

Owner

ptitSeb commented Mar 4, 2026

I'm ok with the commit in its current state, with the following remarks:

  1. The setup of r12 is not clear to me as to when it should be done and when it's not important. I have seen many cases where r12 is not set up, even when jumping to some C function, so it's confusing as to when it must be done.
  2. I have personally reviewed only the active code as much as I could (so the 3 opcodes and the basic infrastructure of the dynarec). The PR is still huge and still contains lots of dead code. Everything x87 and SSE+ has not been reviewed and might be criticized (much) later.
  3. For the next commit, instead of having a ton of code targeting random stuff, it would be good if you could focus on the opcodes that are used in the tests. You can see which ones are needed by doing something like BOX64_DYNAREC_MISSING=1 ../tests/test01 from the build folder.

@ksco when you have time, check if you are ok or not.
