[PPC64LE] Add PPC64LE dynarec infrastructure with MOV opcodes #3592
runlevel5 wants to merge 3 commits into ptitSeb:main
Conversation
force-pushed from d02e0ea to 9d67773
Sorry, but you'll have to update from main and add …
force-pushed from 52308d3 to 942b062
ptitSeb left a comment:
More comments to address, sorry.
force-pushed from 942b062 to 4093ed9
@@ -0,0 +1,76 @@
// PPC64LE next linker for dynarec
Reviewer Guide — Block Dispatcher (ppc64le_next)
This is the block-chaining trampoline, called when a dynarec block needs to jump to the next block. The flow is:
- Save all volatile registers (r3-r10) that the dynarec is using as scratch/args
- Call LinkNext(emu, ip, from, &rip_on_stack) to resolve the target address
- Restore volatile registers (RIP may have been updated by LinkNext)
- Jump to the resolved target via mtctr r12; bctr
Note the .8byte 0 before the entry point: this is a NULL pointer sentinel that getDB() uses to find the start of the dispatch table. Same pattern as the other backends.
PPC64LE-specific detail: The bl LinkNext; nop pair at the call site — the nop is a TOC restore slot. The ELFv2 linker may replace it with ld r2, 24(r1) if LinkNext is in a different DSO and needs TOC switching. ARM64 has no equivalent because it doesn't use a TOC.
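The patched form of that slot can be sanity-checked numerically. A minimal sketch of the DS-form `ld` encoding (enc_ld is a hypothetical helper, not box64's emitter), showing that `ld r2, 24(r1)`, the TOC restore the linker may patch over the nop, assembles to 0xE8410018:

```c
#include <stdint.h>

/* Hypothetical helper, not box64's emitter: DS-form `ld rt, ds(ra)`.
   Primary opcode 58; the low two bits of the offset field are the XO
   (00 selects `ld`), so the displacement must be a multiple of 4. */
uint32_t enc_ld(int rt, int ds, int ra)
{
    return (58u << 26) | ((uint32_t)rt << 21) | ((uint32_t)ra << 16)
         | ((uint32_t)ds & 0xFFFCu);
}
```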
#define ARCH_UDF 0x00000000 /* illegal instruction (all zeros) */
// PPC64LE CreateJmpNext needs 5 instructions (20 bytes) for PC-relative load + branch,
// so the jmpnext area needs 5 void* slots (40 bytes) instead of the default 4 (32 bytes).
#define JMPNEXT_SIZE (5*sizeof(void*))
Reviewer Guide — JMPNEXT_SIZE = 5*sizeof(void*) = 40 bytes
Each dynarec block has a "jmpnext" area at the end — a small trampoline that chains to the next block. JMPNEXT_SIZE reserves space for it.
Why PPC64LE needs 5 slots (40 bytes) vs ARM64's 4 (32 bytes):
ARM64 can do PC-relative loads directly:
LDR X0, [PC, #offset] // 1 insn: load target address from nearby literal pool
BR X0 // 1 insn: branch
<8-byte target address> // data
// Total: 2 instructions + 1 data slot = 3 × 8 = 24 bytes, rounded to 4 slots (32 bytes)
PPC64LE has no PC-relative load (pre-POWER10), so it uses the bcl 20,31 trick to discover its own address:
bcl 20, 31, .+4 // 1 insn: branch-and-link to next insn, putting PC into LR
mflr r_tmp // 1 insn: move LR → GPR
ld r_tmp, 12(r_tmp) // 1 insn: load target from 12 bytes ahead
mtctr r_tmp // 1 insn: move to CTR (branch target register)
bctr // 1 insn: branch via CTR
<8-byte target addr> // data
// Total: 5 instructions + 1 data slot → needs 5 void* slots (40 bytes)
This is the same bcl trick used in ppc64le_next.S and ppc64le_lock.S for any code that needs its own runtime address.
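For the record, the first two words of that sequence can be checked against well-known encodings. A minimal sketch under the Power ISA encoding rules (helper names are made up, not box64's emitter macros):

```c
#include <stdint.h>

/* Hypothetical helpers, not box64's emitter macros. */

/* B-form `bcl bo, bi, target` (opcode 16, LK=1); bd is the byte offset. */
uint32_t enc_bcl(int bo, int bi, int bd)
{
    return (16u << 26) | ((uint32_t)bo << 21) | ((uint32_t)bi << 16)
         | ((uint32_t)bd & 0xFFFCu) | 1u;
}

/* XFX-form `mfspr rt, spr` with the split SPR field; LR is SPR 8. */
uint32_t enc_mflr(int rt)
{
    uint32_t spr = 8, split = ((spr & 0x1Fu) << 5) | ((spr >> 5) & 0x1Fu);
    return (31u << 26) | ((uint32_t)rt << 21) | (split << 11) | (339u << 1);
}
```

The 0x429F0005 word for `bcl 20,31,.+4` is the same constant GCC emits for position-independent address discovery on 32-bit PowerPC.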
INST_NAME("MOV Ed, Gd");
nextop = F8;
GETGD;
SCRATCH_USAGE(0);
Having SCRATCH_USAGE(...), so you are planning to have flag-less fused comparison + conditional jump like RV64 (and LA64 on the no-LBT path)?
Yes, exactly — the plan is flag-less fused compare+branch, same approach as RV64 and LA64. SCRATCH_USAGE is the tracking hook for pass0 to know a scratch register is needed for the fused sequence.
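In C terms, the difference between the two lowering strategies looks roughly like this (a conceptual sketch only; the function names are invented and this is not box64 code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Generic path: CMP materializes EFLAGS, then Jcc tests one bit of it. */
bool flags_path_jz(uint64_t a, uint64_t b)
{
    uint64_t res = a - b;      /* the subtraction a CMP performs */
    int zf = (res == 0);       /* ZF written into the emulated flags */
    return zf != 0;            /* JZ reads ZF back */
}

/* Fused path: the CMP+Jcc pair is recognized and lowered to a single
   native compare-and-branch (e.g. cmpd + beq on PPC64LE), never
   touching the emulated flags; the scratch register tracked by
   SCRATCH_USAGE holds the comparison operand across the fusion. */
bool fused_path_jz(uint64_t a, uint64_t b)
{
    return a == b;
}
```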
Add the core dynarec infrastructure for PPC64LE, following the incremental approach used by the RV64 dynarec (initial commit 5a9b896). This includes:
- PPC64LE emitter macros (ppc64le_emitter.h)
- Register mapping and private types (ppc64le_mapping.h, private.h)
- Dynarec helper macros (dynarec_ppc64le_helper.h)
- Core helper functions: geted, move32/64, jump_to_epilog/next, call_c/call_n, grab_segdata, emit_pf, fpu_reset_cache, fpu_propagate_stack (dynarec_ppc64le_helper.c)
- Architecture-specific functions: dynarec_ppc64le_arch.c
- Constants table: dynarec_ppc64le_consts.c
- Functions for FPU state management (stubbed for x87/SSE/AVX, following the RV64 precedent of empty TODO stubs)
- Stub printer (returns '???' for all instructions)
- Assembly: prolog, epilog, next dispatcher, lock primitives
- CMakeLists.txt integration for the PPC64LE dynarec
- Shared dynarec infrastructure modifications for PPC64LE support
- 3 MOV opcodes in _00.c: 0x89 (MOV Ed,Gd), 0x8B (MOV Gd,Ed), 0x8D (LEA Gd,Ed)

FPU/x87/MMX/SSE/AVX cache management functions are stubbed with empty bodies, matching the RV64 initial commit pattern where all FPU functions were empty TODO stubs. These will be expanded in a follow-up PR when the first FPU opcodes are implemented.
force-pushed from 4093ed9 to 6875dc9
…ssembly
Replace the stub printer (returning '???') with a full PPC64LE instruction disassembler that uses dual GPR name tables (Rn[] and RnZ[]) to correctly display r0 as '0' in base-register (RA) positions where the ISA treats it as literal zero, following the ARM64 xSP/xZR pattern per reviewer request.
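A tiny sketch of the dual-table idea (illustrative arrays only, truncated to 4 entries; not the printer's actual tables):

```c
#include <string.h>

/* In RA (base-register) position, r0 reads as literal zero for many
   load/store and addi forms, so the disassembler uses a second name
   table there, mirroring ARM64's xSP/xZR split. Truncated to 4
   entries for brevity; the real tables cover r0-r31. */
const char *Rn[4]  = { "r0", "r1", "r2", "r3" };  /* generic positions */
const char *RnZ[4] = { "0",  "r1", "r2", "r3" };  /* RA positions */

/* Pick the name to print for an RA operand. */
const char *ra_name(int reg) { return RnZ[reg & 3]; }
```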
BCL(20, 31, 4);
MFLR(reg);
LD(reg, (int16_t)offset, reg);
MTCTR(reg);
No need to set up r12 here? Because it's an internal jump to box64, I suppose?
Yes, exactly — it's an internal jump to box64 code. The targets are ppc64le_next or ppc64le_epilog, both hand-written .S with no .localentry and no addis r2, r12, ... TOC preamble, so the ELFv2 r12 requirement doesn't apply here.
That said, r12 coincidentally already holds the target address (it's the scratch register I used to load it), so the convention is satisfied anyway as a defensive bonus.
src/dynarec/ppc64le/ppc64le_lock.S (outdated)
ppc64le_lock_read_b:
// address is r3, return is r3
lwsync
lbarx 3, 0, 3
Side note: this convention of using plain numbers for registers is really bad IMO. I really prefer when things are clear and registers have a name, not just a number.
Nothing to do of course, it's just me ranting.
You're right — bare numbers are hard to read and error-prone, especially in the load-reserve/store-conditional sequences where a misplaced operand is silent.
I see two options:
Option A: Use %rN syntax throughout the .S files. PPC64LE GAS accepts %r-prefixed register names (this is what GCC itself emits with -S), so lbarx 3, 0, 3 becomes lbarx %r3, 0, %r3. The ASM_MAPPING defines would also change from #define RAX 14 to #define RAX %r14. Straightforward search-and-replace, no invented names.
Option B: Add ABI-name #defines on top of %rN, similar to RV64's lock.S. So a0/a1/t0 etc. mapping to %r3/%r4/%r11, giving us lbarx a0, 0, a0 in lock.S.
I think Option A is the more pragmatic choice — it matches GCC output, Power ISA documentation, and doesn't introduce custom aliases that aren't native to PPC tooling. But happy to go with Option B if you prefer the extra semantic clarity.
Which would you prefer? I can do this in a follow-up commit.
// We need to load the TOC for LinkNext - use the saved TOC
// Note: In ELFv2, r12 must point to the function entry for local calls
// For external calls via PLT, the linker handles it
bl LinkNext
Don't you need to set up r12 here? This is confusing.
Same as jmpnext: I don't need r12 setup here because the targets are never GCC-compiled functions:
- Cache-hit path (line 78): the target is a JIT'd dynarec block from block_cache. r12 already holds the address since I used it as the load scratch.
- Slow path (line 132): LinkNext returns a dynarec block or ppc64le_epilog. I copy the return value into r12 with mr 12, 3 on line 116 before the branch, so r12 is already set.
The ELFv2 r12 requirement only matters for calls into C functions with a global entry point TOC setup — I handle those in call_c() and call_n() in the helper, which explicitly set r12 before BCTRL.
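That convention is easy to check at the encoding level. A minimal sketch (hypothetical helpers, not the call_c implementation) of the two words behind a ctr-indirect call issued after r12 is set:

```c
#include <stdint.h>

/* Hypothetical helpers, not box64's emitter macros. */

/* XFX-form `mtspr spr, rs`; CTR is SPR 9, with the split SPR field. */
uint32_t enc_mtctr(int rs)
{
    uint32_t spr = 9, split = ((spr & 0x1Fu) << 5) | ((spr >> 5) & 0x1Fu);
    return (31u << 26) | ((uint32_t)rs << 21) | (split << 11) | (467u << 1);
}

/* XL-form `bcctr` with BO=20 (branch always); lk=1 gives bctrl. */
uint32_t enc_bcctr(int lk)
{
    return (19u << 26) | (20u << 21) | (528u << 1) | (uint32_t)lk;
}
```

With r12 holding the target, `mtctr r12; bctrl` is the ELFv2-conformant indirect call: the callee's global entry point can rebuild its TOC pointer from r12.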
I'm OK with the commit in its current state, with the following remark:
@ksco when you have time, check if you are ok or not.
Summary
Add the core dynarec infrastructure for PPC64LE with 3 MOV opcodes as proof the pipeline works end-to-end. Follows the incremental approach of the RV64 dynarec initial commit (5a9b896).
Depends on #3591 (platform support PR): please merge that first.
Changes (36 files, ~9200 lines)
Core Infrastructure
- ppc64le_emitter.h: Instruction encoding macros for the PPC64LE ISA. Argument order follows Power ISA assembly convention (Rt, offset, Ra), documented in a header comment.
- dynarec_ppc64le_helper.h/c: STEPNAME-aliased macros and core functions (geted, move32/move64, jump_to_epilog/jump_to_next, call_c/call_n, grab_segdata, emit_pf). Displacement modes use named constants (DISP_NONE/DISP_D/DISP_DQ), and magic numbers were replaced with PPC64_DISP_MAX.
- ppc64le_mapping.h: x86-64 to PPC64LE register mapping, ELFv2 ABI-compliant.
- dynarec_ppc64le_functions.c/h: fpu_reset, inst_name_pass3, updateNativeFlags, get_free_scratch, ppc64le_fast_hash, plus FPU register management stubs.
- dynarec_ppc64le_arch.c: CancelBlock, FillBlock, AddMarkRecursive, native address resolution, jump table support.
- dynarec_ppc64le_consts.c/h: Constants table management (unused arrays removed per review).
Assembly
- ppc64le_prolog.S / ppc64le_epilog.S: Dynarec entry/exit.
- ppc64le_next.S: Block-to-block dispatcher.
- ppc64le_lock.S/h: Atomic operations (LL/SC pairs, mutex-based 128-bit CAS).
Opcodes (3 MOV instructions in _00.c)
- 0x89: MOV Ed,Gd
- 0x8B: MOV Gd,Ed
- 0x8D: LEA Gd,Ed
Stubs
- _66.c, _f0.c, _66f0.c contain only the DEFAULT fallback.
Shared Dynarec Modifications
- CMakeLists.txt, dynarec_native_pass.c, dynarec_arch.h, dynarec_helper.h, dynarec_next.h, native_lock.h, dynacache_reloc.h, dynarec.c: PPC64LE #elif branches.
Review Changes
- dynarec_ppc64le_arch.h and WIN32 guard (ptitSeb)
- dynarec_ppc64le_consts.c (ptitSeb)
- DQ_ALIGN to DISP_NONE/DISP_D/DISP_DQ named constants (ptitSeb)
- PPC64_DISP_MAX constant (ptitSeb)
- GOCOND macro (~80 lines) (ptitSeb)
- x87_get_st_empty macro bug, removed duplicate dynarec64_F0 (self-audit)
- BEQZ_safe alias no longer needed after GOCOND removal
Re: unused macros in helper.h (ksco): Most macros in
dynarec_ppc64le_helper.hare required by the shareddynarec_native_pass.cor are building blocks used by other macros. The genuinely unused items (GOCOND, duplicate dynarec64_F0) have been removed.Re: unused functions in functions.c (ksco): All 19 public functions are called from shared dynarec infrastructure (
dynarec_native.c,dynarec_native_pass.c,dynarec_native_functions.c,dynarec.c). None can be removed without breaking compilation.Re: ALSL/BSTRPICK_D/BSTRINS_D names (ksco): These LoongArch names do not exist in the PPC64LE code. The PPC equivalents are already named
SLADDy,BF_EXTRACT, andBF_INSERT.Re: AVX 256-bit (ksco): PPC64LE has 128-bit VSX only. The AVX macros are shared infrastructure stubs required by
dynarec_native_pass.c— they don't imply 256-bit support.Related
5a9b896