
[RV64_DYNAREC] Add BOX64_DYNAREC_CALLN_RESTORE option for call_n a0/a1 preservation#3576

Open
zqb-all wants to merge 2 commits into ptitSeb:main from zqb-all:rv64-call_n

Conversation

zqb-all (Contributor) commented Feb 27, 2026

RV64 maps x86 registers directly to RISC-V a0-a6, which are caller-saved and clobbered by native function calls. Unlike ARM64 where x86 state lives in callee-saved regs (x9+), RV64's call_n was not restoring these registers after JALR, leaving garbage values that propagated through subsequent dynarec blocks.

This caused crashes in coroutine-heavy code (e.g. WeMeet) where a wrapped function's return value (0) in a0/xRDI was misinterpreted as a null context pointer by co_jump_to_link, triggering exit(1).

ptitSeb (Owner) commented Feb 27, 2026

I still don't understand the issue here. All those registers that you restored are caller-saved in the System V ABI. They are not supposed to be preserved when calling a function. Are you sure this is the fix for the coroutine issue? This seems very strange to me.

zqb-all (Contributor, Author) commented Feb 27, 2026

Yes, they are caller-saved in x86 System V ABI. But the issue is on the RISC-V side, not the x86 side.

In RV64, x86 registers (RDI/RSI/RDX/RCX/R8/R9/RAX) are mapped directly to RISC-V a0-a6, which are caller-saved in the RISC-V ABI. After JALR in call_n, a0-a6 are clobbered by the native call, but the dynarec continues using these RISC-V registers as if they still hold valid x86 state.

call_c already handles this correctly:

#define GO(A) \
if (ret != x##A) { LOAD_REG(A); }
GO(RDI);
GO(RSI);
GO(RDX);
GO(RCX);
GO(R8);
GO(R9);
GO(RAX);

call_n was missing the LOAD_REG part.

ptitSeb (Owner) commented Feb 27, 2026

But that was the point of mapping the x64 ABI call registers to the RISC-V ABI call registers: so they don't have to be saved when calling a native function.

Again, those RV64 regs are exclusively used to map x64 regs here, so them being clobbered should be of no consequence, because all those x64 regs are caller-saved.

Do you have a concrete example of where the clobbering is an issue?

zqb-all (Contributor, Author) commented Feb 27, 2026

You're right, I was thinking about it incorrectly. From an ABI perspective there's no need to restore those registers. This patch did avoid the WeMeet crash, but I need to reinvestigate to find the actual root cause — it may have just masked the real issue. I'll report back with a concrete trace.

@zqb-all zqb-all marked this pull request as draft February 27, 2026 11:28
ptitSeb (Owner) commented Feb 27, 2026

Check, using BOX64_SHOWSEGV=1, if you see a regular signal, like 24 or something like that, which could be linked to a timer. If the coroutines are preemptively multitasked using a combination of a timer signal and longjmp, the issue might be much more complex!

zqb-all (Contributor, Author) commented Mar 3, 2026

Thanks for the advice — I tried that and no timer signals were involved.

After some tracing, I think the root cause is that WeMeet violates the x86-64 System V ABI.
WeMeet uses a custom coroutine library. When a coroutine function finishes, it returns to a trampoline called co_jump_to_link. This trampoline checks xRDI:

000000000094c82c <co_jump_to_link>:
  94c82c: 48 c7 c0 00 00 00 00          movq    $0x0, %rax
  94c833: 48 83 ff 00                   cmpq    $0x0, %rdi
  94c837: 75 03                         jne     0x94c83c <co_jump_to_link+0x10>
  94c839: 41 ff d5                      callq   *%r13    ; exit(1) if xRDI == 0
  94c83c: 4c 89 e7                      movq    %r12, %rdi
  94c83f: 41 ff d6                      callq   *%r14

xRDI is a caller-saved register, so its value is undefined after a function returns.
On native x86, the coroutine function returns 0 in xRAX, but xRDI happens to still hold its pre-call value (a non-zero pointer), so it works.
On RV64 with Box64, xRDI is mapped to a0, which is also the return value register. After call_n, a0 = 0 for the return value, and MV(xRAX, A0) copies a0 to xRAX — but now xRDI = 0, so co_jump_to_link detects that and calls exit(1).

This is not a Box64 bug, but could we add a workaround? For example, an environment variable to optionally restore xRDI after call_n?

ptitSeb (Owner) commented Mar 3, 2026

Sure, why not. I guess it can be useful for every platform that wants to run WeMeet.
I'm still surprised it runs fine on native x86. Do you have any idea what kind of functions are called? Are there many functions, or just a specific one?

zqb-all (Contributor, Author) commented Mar 3, 2026

A quick analysis with Claude's help shows:

At the call site, the coroutine function pointer is hardcoded:

  $ llvm-objdump -d --start-address=0x92c1d2 --stop-address=0x92c1f5 libwemeet_base.so
    92c1d9: 48 05 a0 00 00 00    addq   $0xa0, %rax
    92c1df: 48 89 c7             movq   %rax, %rdi          # RDI = coroutine context (obj+0xa0)
    92c1e2: 48 8d 35 d7 0a 00 00 leaq   0xad7(%rip), %rsi   # RSI = 0x92ccc0 (the ONE coroutine function)
    92c1e9: 48 8b 95 50 ff ff ff movq   -0xb0(%rbp), %rdx   # RDX = obj (passed as arg)
    92c1f0: e8 db c7 03 00       callq  0x9689d0            # co_init_context(ctx, func, arg)

co_init_context sets up the coroutine stack

The coroutine function 0x92ccc0 ends with:

  $ llvm-objdump -d --start-address=0x92d004 --stop-address=0x92d02f libwemeet_base.so
    92d004: callq  0x92aa20             # returns some pointer in RAX
    92d009: movq   -0x10(%rbp), %rcx
    92d014: movq   %rax, %rdi           # RDI = RAX (non-zero pointer)
    92d017: movl   $0x1, %esi
    92d01c: movl   $0x3, %edx
    92d021: callq  0x92ab30             # last call before ret, returns 0 in RAX
    92d026: addq   $0x130, %rsp
    92d02d: popq   %rbp
    92d02e: retq                        # -> co_jump_to_link

Now, xRDI depends on 0x92ab30:

00000000009298c0 <_ZNSt10shared_ptrIiEC2ERKS0_>:
  92ab30: 55                            pushq   %rbp
  92ab31: 48 89 e5                      movq    %rsp, %rbp
  92ab34: 48 83 ec 40                   subq    $0x40, %rsp
  92ab38: 48 89 7d f8                   movq    %rdi, -0x8(%rbp)
  92ab3c: 89 75 f4                      movl    %esi, -0xc(%rbp)
  92ab3f: 89 55 f0                      movl    %edx, -0x10(%rbp)
  92ab42: 48 89 4d e8                   movq    %rcx, -0x18(%rbp)
  92ab46: 48 8b 4d f8                   movq    -0x8(%rbp), %rcx
  92ab4a: 48 89 cf                      movq    %rcx, %rdi
  92ab4d: 48 83 c7 58                   addq    $0x58, %rdi
  92ab51: 48 8d 45 e0                   leaq    -0x20(%rbp), %rax
  92ab55: 48 89 7d c8                   movq    %rdi, -0x38(%rbp)
  92ab59: 48 89 c7                      movq    %rax, %rdi
  92ab5c: 48 8b 75 c8                   movq    -0x38(%rbp), %rsi
  92ab60: 48 89 4d c0                   movq    %rcx, -0x40(%rbp)
  92ab64: e8 47 32 94 ff                callq   0x26ddb0 <_ZNSt10lock_guardISt5mutexEC2ERS0_@plt>
  92ab69: 8b 75 f4                      movl    -0xc(%rbp), %esi
  92ab6c: 8b 55 f0                      movl    -0x10(%rbp), %edx
  92ab6f: 48 8b 4d e8                   movq    -0x18(%rbp), %rcx
  92ab73: 48 8b 7d c0                   movq    -0x40(%rbp), %rdi
  92ab77: e8 34 00 00 00                callq   0x92abb0 <_ZNSt10shared_ptrIiEC2ERKS0_+0x12f0>
  92ab7c: e9 00 00 00 00                jmp     0x92ab81 <_ZNSt10shared_ptrIiEC2ERKS0_+0x12c1>
  92ab81: 48 8d 7d e0                   leaq    -0x20(%rbp), %rdi
  92ab85: e8 d6 8e 93 ff                callq   0x263a60 <_ZNSt10lock_guardISt5mutexED2Ev@plt>
  92ab8a: 48 83 c4 40                   addq    $0x40, %rsp
  92ab8e: 5d                            popq    %rbp
  92ab8f: c3                            retq

Finally, xRDI depends on ~lock_guard() -> mutex.unlock()

ptitSeb (Owner) commented Mar 3, 2026


That last function doesn't preserve RDI. The initial value of RDI is saved at rbp-8 but never restored on return.

…1 preservation

On RV64, xRDI is mapped to a0 and xRSI to a1 (RISC-V return value registers).
After call_n's JALR, a0/a1 contain the native function's return value, silently
overwriting xRDI/xRSI. Per x86-64 ABI this is correct (RDI/RSI are caller-saved),
but some programs (e.g. WeMeet's coroutine library) violate the ABI by depending
on caller-saved registers surviving function calls.

On native x86-64, RAX and RDI are separate physical registers, so a function
returning 0 in RAX doesn't affect RDI. On RV64, the aliasing (xRDI=a0=return
value) causes these ABI-violating programs to crash.

BOX64_DYNAREC_CALLN_RESTORE=1 saves/restores xRDI(a0) and xRSI(a1) around
native calls in call_n. Off by default. Enable per-app via box64rc.
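The per-app enabling mentioned above could look like this in a box64rc file (a sketch: the section name `wemeetapp` is an assumption standing in for the real WeMeet process name, which may differ on a given system):

```ini
# ~/.box64rc -- per-process section keyed by the executable's name
[wemeetapp]
BOX64_DYNAREC_CALLN_RESTORE=1
```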
@zqb-all zqb-all changed the title [RV64_DYNAREC] Restore caller-saved x86 regs after native call in call_n [RV64_DYNAREC] Add BOX64_DYNAREC_CALLN_RESTORE option for call_n a0/a1 preservation Mar 3, 2026
@zqb-all zqb-all marked this pull request as ready for review March 3, 2026 17:04
ptitSeb (Owner) commented Mar 3, 2026

I am still not convinced this is the right fix for the issue, as the code you show is not doing what you said it's doing.

zqb-all (Contributor, Author) commented Mar 4, 2026

Do you mean that, from this code, it's not clear why the x86 version works? My understanding is that the existing x86 code path is actually unsafe: it doesn't restore RDI, but relies on the fact that after the last function uses RDI, its value happens to be non-zero.

ptitSeb (Owner) commented Mar 4, 2026

Do you mean that from this code, it's not clear why the x86 version works? My understanding is that the existing x86 implementation is actually unsafe; it doesn't restore RDI, but relies on the fact that after the last function uses RDI, its value happens to be non-zero.

From this code, your explanation doesn't add up. I see zero evidence that RDI should be preserved across some external call. This is absolutely not proof at all.

I still think your change is just a lucky one that happens to fix the startup case on your machine, but there is zero guarantee it will work on other machines, or that it actually fixes anything at all.

zqb-all (Contributor, Author) commented Mar 4, 2026

Restoring RDI to a non-zero value has the direct effect of preventing the co_jump_to_link function from calling exit(1) due to detecting RDI as zero. However, I haven't thoroughly researched what the coroutine function actually does internally, or what role RDI plays after passing the checkpoint in co_jump_to_link.

zqb-all (Contributor, Author) commented Mar 4, 2026

I also feel this approach isn't ideal. This patch essentially restores an earlier RDI value, so if the x86 callee modifies RDI, we won't be able to stay consistent with it. I just haven't thought of a better way to handle this problem for now.
