
[PPC64LE] Add PPC64LE dynarec infrastructure with MOV opcodes#3592

Open
runlevel5 wants to merge 3 commits into ptitSeb:main from runlevel5:ppc64le-pr2-dynarec-infra

Conversation

Contributor

@runlevel5 runlevel5 commented Feb 28, 2026

Summary

Add the core dynarec infrastructure for PPC64LE with 3 MOV opcodes as proof the pipeline works end-to-end. Follows the incremental approach of the RV64 dynarec initial commit (5a9b896).

Depends on #3591 (platform support PR) — please merge that first.

Changes (36 files, ~9200 lines)

Core Infrastructure

  • Emitter (ppc64le_emitter.h): Instruction encoding macros for PPC64LE ISA. Argument order follows Power ISA assembly convention (Rt, offset, Ra) — documented in header comment.
  • Helper (dynarec_ppc64le_helper.h/c): STEPNAME-aliased macros and core functions (geted, move32/move64, jump_to_epilog/jump_to_next, call_c/call_n, grab_segdata, emit_pf). Displacement modes use named constants (DISP_NONE/DISP_D/DISP_DQ) and magic numbers replaced with PPC64_DISP_MAX.
  • Register mapping (ppc64le_mapping.h): x86-64 to PPC64LE register mapping, ELFv2 ABI-compliant.
  • Functions (dynarec_ppc64le_functions.c/h): fpu_reset, inst_name_pass3, updateNativeFlags, get_free_scratch, ppc64le_fast_hash, plus FPU register management stubs.
  • Architecture (dynarec_ppc64le_arch.c): CancelBlock, FillBlock, AddMarkRecursive, native address resolution, jump table support.
  • Constants (dynarec_ppc64le_consts.c/h): Constants table management (unused arrays removed per review).

Assembly

  • ppc64le_prolog.S / ppc64le_epilog.S: Dynarec entry/exit.
  • ppc64le_next.S: Block-to-block dispatcher.
  • ppc64le_lock.S/h: Atomic operations (LL/SC pairs, mutex-based 128-bit CAS).

Opcodes (3 MOV instructions in _00.c)

  • 0x89 — MOV Ed,Gd
  • 0x8B — MOV Gd,Ed
  • 0x8D — LEA Gd,Ed

Stubs

  • Printer returns "???" for all instructions (will be expanded later).
  • _66.c, _f0.c, _66f0.c contain only DEFAULT fallback.
  • FPU/SSE/AVX cache functions are empty stubs, following RV64 initial commit pattern.

Shared Dynarec Modifications

  • CMakeLists.txt, dynarec_native_pass.c, dynarec_arch.h, dynarec_helper.h, dynarec_next.h, native_lock.h, dynacache_reloc.h, dynarec.c: PPC64LE #elif branches.

Review Changes

  • Removed dynarec_ppc64le_arch.h and WIN32 guard (ptitSeb)
  • Removed unused constant arrays from dynarec_ppc64le_consts.c (ptitSeb)
  • Renamed DQ_ALIGN to DISP_NONE/DISP_D/DISP_DQ named constants (ptitSeb)
  • Replaced magic number 32768 with PPC64_DISP_MAX constant (ptitSeb)
  • Added emitter argument order convention comment (ptitSeb/classilla)
  • Removed unused GOCOND macro (~80 lines) (ptitSeb)
  • Fixed x87_get_st_empty macro bug, removed duplicate dynarec64_F0 (self-audit)
  • Removed BEQZ_safe alias no longer needed after GOCOND removal

Notes on Reviewer Feedback

Re: unused macros in helper.h (ksco): Most macros in dynarec_ppc64le_helper.h are required by the shared dynarec_native_pass.c or are building blocks used by other macros. The genuinely unused items (GOCOND, duplicate dynarec64_F0) have been removed.

Re: unused functions in functions.c (ksco): All 19 public functions are called from shared dynarec infrastructure (dynarec_native.c, dynarec_native_pass.c, dynarec_native_functions.c, dynarec.c). None can be removed without breaking compilation.

Re: ALSL/BSTRPICK_D/BSTRINS_D names (ksco): These LoongArch names do not exist in the PPC64LE code. The PPC equivalents are already named SLADDy, BF_EXTRACT, and BF_INSERT.

Re: AVX 256-bit (ksco): PPC64LE has 128-bit VSX only. The AVX macros are shared infrastructure stubs required by dynarec_native_pass.c — they don't imply 256-bit support.


@runlevel5 runlevel5 marked this pull request as draft February 28, 2026 13:00
@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch from d02e0ea to 9d67773 on February 28, 2026 14:18
@runlevel5 runlevel5 marked this pull request as ready for review February 28, 2026 14:19
Owner

ptitSeb commented Feb 28, 2026

Sorry, but you'll have to update from main and add uint8_t is_file_mapped:1; to your dynarec_ppc64_t type for it to build.

@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch 2 times, most recently from 52308d3 to 942b062 on February 28, 2026 20:27
Owner

@ptitSeb ptitSeb left a comment


More comments to address, sorry.

@runlevel5 runlevel5 marked this pull request as draft March 2, 2026 03:40
@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch from 942b062 to 4093ed9 on March 2, 2026 04:54
@@ -0,0 +1,76 @@
// PPC64LE next linker for dynarec
Contributor Author


Reviewer Guide — Block Dispatcher (ppc64le_next)

This is the block-chaining trampoline, called when a dynarec block needs to jump to the next block. The flow is:

  1. Save all volatile registers (r3-r10) that the dynarec is using as scratch/args
  2. Call LinkNext(emu, ip, from, &rip_on_stack) to resolve the target address
  3. Restore volatile registers (RIP may have been updated by LinkNext)
  4. Jump to resolved target via mtctr r12; bctr

Note the .8byte 0 before the entry point: this is a NULL pointer sentinel that getDB() uses to find the start of the dispatch table. Same pattern as the other backends.

PPC64LE-specific detail: The bl LinkNext; nop pair at the call site — the nop is a TOC restore slot. The ELFv2 linker may replace it with ld r2, 24(r1) if LinkNext is in a different DSO and needs TOC switching. ARM64 has no equivalent because it doesn't use a TOC.

#define ARCH_UDF 0x00000000 /* illegal instruction (all zeros) */
// PPC64LE CreateJmpNext needs 5 instructions (20 bytes) for PC-relative load + branch,
// so the jmpnext area needs 5 void* slots (40 bytes) instead of the default 4 (32 bytes).
#define JMPNEXT_SIZE (5*sizeof(void*))
Contributor Author


Reviewer Guide — JMPNEXT_SIZE = 5*sizeof(void*) = 40 bytes

Each dynarec block has a "jmpnext" area at the end — a small trampoline that chains to the next block. JMPNEXT_SIZE reserves space for it.

Why PPC64LE needs 5 slots (40 bytes) vs ARM64's 4 (32 bytes):

ARM64 can do PC-relative loads directly:

LDR X0, [PC, #offset]   // 1 insn: load target address from nearby literal pool
BR  X0                   // 1 insn: branch
<8-byte target address>  // data
// Total: 2 instructions + 1 data slot = 3 × 8 = 24 bytes, rounded to 4 slots (32 bytes)

PPC64LE has no PC-relative load (pre-POWER10), so it uses the bcl 20,31 trick to discover its own address:

bcl   20, 31, .+4      // 1 insn: branch-and-link to next insn, putting PC into LR
mflr  r_tmp            // 1 insn: move LR → GPR
ld    r_tmp, 12(r_tmp) // 1 insn: load target from 12 bytes ahead
mtctr r_tmp            // 1 insn: move to CTR (branch target register)
bctr                   // 1 insn: branch via CTR
<8-byte target addr>   // data
// Total: 5 instructions + 1 data slot → needs 5 void* slots (40 bytes)

This is the same bcl trick used in ppc64le_next.S and ppc64le_lock.S for any code that needs its own runtime address.

@runlevel5 runlevel5 marked this pull request as ready for review March 3, 2026 06:34
INST_NAME("MOV Ed, Gd");
nextop = F8;
GETGD;
SCRATCH_USAGE(0);
Owner


Having SCRATCH_USAGE(...), are you planning to have flag-less fused comparison + conditional jump like RV64 (and LA64 on the no-LBT path)?

Contributor Author


Yes, exactly — the plan is flag-less fused compare+branch, same approach as RV64 and LA64. SCRATCH_USAGE is the tracking hook for pass0 to know a scratch register is needed for the fused sequence.

Add the core dynarec infrastructure for PPC64LE, following the
incremental approach used by the RV64 dynarec (initial commit 5a9b896).

This includes:
- PPC64LE emitter macros (ppc64le_emitter.h)
- Register mapping and private types (ppc64le_mapping.h, private.h)
- Dynarec helper macros (dynarec_ppc64le_helper.h)
- Core helper functions: geted, move32/64, jump_to_epilog/next,
  call_c/call_n, grab_segdata, emit_pf, fpu_reset_cache,
  fpu_propagate_stack (dynarec_ppc64le_helper.c)
- Architecture-specific functions: dynarec_ppc64le_arch.c
- Constants table: dynarec_ppc64le_consts.c
- Functions for FPU state management (stubbed for x87/SSE/AVX,
  following RV64 precedent of empty TODO stubs)
- Stub printer (returns '???' for all instructions)
- Assembly: prolog, epilog, next dispatcher, lock primitives
- CMakeLists.txt integration for PPC64LE dynarec
- Shared dynarec infrastructure modifications for PPC64LE support
- 3 MOV opcodes in _00.c: 0x89 (MOV Ed,Gd), 0x8B (MOV Gd,Ed),
  0x8D (LEA Gd,Ed)

FPU/x87/MMX/SSE/AVX cache management functions are stubbed with empty
bodies, matching the RV64 initial commit pattern where all FPU functions
were empty TODO stubs. These will be expanded in a follow-up PR when
the first FPU opcodes are implemented.
@runlevel5 runlevel5 force-pushed the ppc64le-pr2-dynarec-infra branch from 4093ed9 to 6875dc9 on March 4, 2026 02:21
…ssembly

Replace the stub printer (returning '???') with a full PPC64LE
instruction disassembler that uses dual GPR name tables (Rn[] and
RnZ[]) to correctly display r0 as '0' in base-register (RA) positions
where the ISA treats it as literal zero, following the ARM64 xSP/xZR
pattern per reviewer request.
BCL(20, 31, 4);
MFLR(reg);
LD(reg, (int16_t)offset, reg);
MTCTR(reg);
Owner


no need to setup r12 here? because it's an internal jump to box64 I suppose?

Contributor Author


Yes, exactly — it's an internal jump to box64 code. The targets are ppc64le_next or ppc64le_epilog, both hand-written .S with no .localentry and no addis r2, r12, ... TOC preamble, so the ELFv2 r12 requirement doesn't apply here.

That said, r12 coincidentally already holds the target address (it's the scratch register I used to load it), so the convention is satisfied anyway as a defensive bonus.

ppc64le_lock_read_b:
// address is r3, return is r3
lwsync
lbarx 3, 0, 3
Owner


Side note: this convention of using plain numbers for registers is really bad IMO. I really prefer when things are clear and registers have names, not just numbers.

Nothing to do of course, it's just me ranting.

Contributor Author


You're right — bare numbers are hard to read and error-prone, especially in the load-reserve/store-conditional sequences where a misplaced operand is silent.

I see two options:

Option A: Use %rN syntax throughout the .S files. PPC64LE GAS accepts %r-prefixed register names (this is what GCC itself emits with -S), so lbarx 3, 0, 3 becomes lbarx %r3, 0, %r3. The ASM_MAPPING defines would also change from #define RAX 14 to #define RAX %r14. Straightforward search-and-replace, no invented names.

Option B: Add ABI-name #defines on top of %rN, similar to RV64's lock.S. So a0/a1/t0 etc. mapping to %r3/%r4/%r11, giving us lbarx a0, 0, a0 in lock.S.

I think Option A is the more pragmatic choice — it matches GCC output, Power ISA documentation, and doesn't introduce custom aliases that aren't native to PPC tooling. But happy to go with Option B if you prefer the extra semantic clarity.

Which would you prefer? I can do this in a follow-up commit.

// We need to load the TOC for LinkNext - use the saved TOC
// Note: In ELFv2, r12 must point to the function entry for local calls
// For external calls via PLT, the linker handles it
bl LinkNext
Owner


don't you need to setup r12 here? this is confusing.

Contributor Author


Same as jmpnext — I don't need r12 setup here because the targets are never GCC-compiled functions:

  • Cache-hit path (line 78): target is a JIT'd dynarec block from block_cache. r12 already holds the address since I used it as the load scratch.
  • Slow path (line 132): LinkNext returns a dynarec block or ppc64le_epilog. I copy the return value into r12 with mr 12, 3 on line 116 before the branch, so r12 is already set.

The ELFv2 r12 requirement only matters for calls into C functions with a global entry point TOC setup — I handle those in call_c() and call_n() in the helper, which explicitly set r12 before BCTRL.

Owner

ptitSeb commented Mar 4, 2026

I'm ok with the commit in its current state, with the following remarks:

  1. The setup of r12 is not clear to me as to when it should be done and when it's not important. I have seen many cases where r12 is not set up, even when jumping to some C function, so it's confusing as to when it must be done.
  2. I have personally reviewed only the active code as much as I could (so the 3 opcodes and the basic infrastructure of the dynarec). The PR is still huge and still contains lots of dead code. Everything x87 and SSE+ has not been reviewed and might be criticized (much) later.
  3. For the next commit, instead of having a ton of code targeting random stuff, it would be good if you could focus on the opcodes that are used in the tests. You can see which ones are needed by doing something like BOX64_DYNAREC_MISSING=1 ../tests/test01 from the build folder.

@ksco when you have time, check if you are ok or not.
