
[RV64_DYNAREC] Add BOX64_DYNAREC_CALLN_RESTORE option for call_n a0/a1 preservation#3576

Open
zqb-all wants to merge 2 commits into ptitSeb:main from zqb-all:rv64-call_n

Conversation

zqb-all (Contributor) commented Feb 27, 2026

RV64 maps x86 registers directly to RISC-V a0-a6, which are caller-saved and clobbered by native function calls. Unlike ARM64 where x86 state lives in callee-saved regs (x9+), RV64's call_n was not restoring these registers after JALR, leaving garbage values that propagated through subsequent dynarec blocks.

This caused crashes in coroutine-heavy code (e.g. WeMeet) where a wrapped function's return value (0) in a0/xRDI was misinterpreted as a null context pointer by co_jump_to_link, triggering exit(1).

ptitSeb (Owner) commented Feb 27, 2026

I still don't understand the issue here. All those registers that you restored are caller-saved in the System V ABI. They are not supposed to be preserved when calling a function. Are you sure this is the fix for the coroutine issue? This seems very strange to me.

zqb-all (Contributor, Author) commented Feb 27, 2026

Yes, they are caller-saved in x86 System V ABI. But the issue is on the RISC-V side, not the x86 side.

In RV64, x86 registers (RDI/RSI/RDX/RCX/R8/R9/RAX) are mapped directly to RISC-V a0-a6, which are caller-saved in the RISC-V ABI. After JALR in call_n, a0-a6 are clobbered by the native call, but the dynarec continues using these RISC-V registers as if they still hold valid x86 state.

call_c already handles this correctly:

#define GO(A) \
if (ret != x##A) { LOAD_REG(A); }
GO(RDI);
GO(RSI);
GO(RDX);
GO(RCX);
GO(R8);
GO(R9);
GO(RAX);

call_n was missing the LOAD_REG part.

ptitSeb (Owner) commented Feb 27, 2026

But that was the point of mapping the x64 ABI call registers to the RISC-V ABI call registers: so they don't have to be saved when calling a native function.

Again, those RV64 regs are exclusively used to map x64 regs here, so them being clobbered should be of no consequence, because all those x64 regs are caller-saved.

Do you have a concrete example of where the clobbering is an issue?

zqb-all (Contributor, Author) commented Feb 27, 2026

You're right, I was thinking about it incorrectly. From an ABI perspective there's no need to restore those registers. This patch did avoid the WeMeet crash, but I need to reinvestigate to find the actual root cause — it may have just masked the real issue. I'll report back with a concrete trace.

@zqb-all zqb-all marked this pull request as draft February 27, 2026 11:28
ptitSeb (Owner) commented Feb 27, 2026

Check, using BOX64_SHOWSEGV=1, if you see a regular signal, like 24 or something like that, which could be linked to a timer. If the coroutines are preemptively multitasked using a combination of a timer signal and longjmp, the issue might be much more complex!

zqb-all (Contributor, Author) commented Mar 3, 2026

Thanks for the advice — I tried that and no timer signals were involved.

After some tracing, I think the root cause is that WeMeet violates the x86-64 System V ABI.
WeMeet uses a custom coroutine library. When a coroutine function finishes, it returns to a trampoline called co_jump_to_link. This trampoline checks xRDI:

000000000094c82c <co_jump_to_link>:
  94c82c: 48 c7 c0 00 00 00 00          movq    $0x0, %rax
  94c833: 48 83 ff 00                   cmpq    $0x0, %rdi
  94c837: 75 03                         jne     0x94c83c <co_jump_to_link+0x10>
  94c839: 41 ff d5                      callq   *%r13    ; exit(1) if xRDI == 0
  94c83c: 4c 89 e7                      movq    %r12, %rdi
  94c83f: 41 ff d6                      callq   *%r14

xRDI is a caller-saved register, so its value is undefined after a function returns.
On native x86, the coroutine function returns 0 in xRAX, but xRDI happens to still hold its pre-call value (a non-zero pointer), so it works.
On RV64 with Box64, xRDI is mapped to a0, which is also the return value register. After call_n, a0 = 0 for the return value, and MV(xRAX, A0) copies a0 to xRAX — but now xRDI = 0, so co_jump_to_link detects that and calls exit(1).

This is not a Box64 bug, but could we add a workaround? For example, an environment variable to optionally restore xRDI after call_n?

ptitSeb (Owner) commented Mar 3, 2026

Sure, why not. I guess it can be useful for every platform that wants to run WeMeet.
I'm still surprised it runs fine on native x86. Do you have any idea what kind of functions are called? Are there many functions, or just a specific one?

zqb-all (Contributor, Author) commented Mar 3, 2026

A quick analysis with Claude's help shows:

At the call site, the coroutine function pointer is hardcoded:

  $ llvm-objdump -d --start-address=0x92c1d2 --stop-address=0x92c1f5 libwemeet_base.so
    92c1d9: 48 05 a0 00 00 00    addq   $0xa0, %rax
    92c1df: 48 89 c7             movq   %rax, %rdi          # RDI = coroutine context (obj+0xa0)
    92c1e2: 48 8d 35 d7 0a 00 00 leaq   0xad7(%rip), %rsi   # RSI = 0x92ccc0 (the ONE coroutine function)
    92c1e9: 48 8b 95 50 ff ff ff movq   -0xb0(%rbp), %rdx   # RDX = obj (passed as arg)
    92c1f0: e8 db c7 03 00       callq  0x9689d0            # co_init_context(ctx, func, arg)

co_init_context sets up the coroutine stack

The coroutine function 0x92ccc0 ends with:

  $ llvm-objdump -d --start-address=0x92d004 --stop-address=0x92d02f libwemeet_base.so
    92d004: callq  0x92aa20             # returns some pointer in RAX
    92d009: movq   -0x10(%rbp), %rcx
    92d014: movq   %rax, %rdi           # RDI = RAX (non-zero pointer)
    92d017: movl   $0x1, %esi
    92d01c: movl   $0x3, %edx
    92d021: callq  0x92ab30             # last call before ret, returns 0 in RAX
    92d026: addq   $0x130, %rsp
    92d02d: popq   %rbp
    92d02e: retq                        # -> co_jump_to_link

Now, xRDI depends on 0x92ab30:

00000000009298c0 <_ZNSt10shared_ptrIiEC2ERKS0_>:
  92ab30: 55                            pushq   %rbp
  92ab31: 48 89 e5                      movq    %rsp, %rbp
  92ab34: 48 83 ec 40                   subq    $0x40, %rsp
  92ab38: 48 89 7d f8                   movq    %rdi, -0x8(%rbp)
  92ab3c: 89 75 f4                      movl    %esi, -0xc(%rbp)
  92ab3f: 89 55 f0                      movl    %edx, -0x10(%rbp)
  92ab42: 48 89 4d e8                   movq    %rcx, -0x18(%rbp)
  92ab46: 48 8b 4d f8                   movq    -0x8(%rbp), %rcx
  92ab4a: 48 89 cf                      movq    %rcx, %rdi
  92ab4d: 48 83 c7 58                   addq    $0x58, %rdi
  92ab51: 48 8d 45 e0                   leaq    -0x20(%rbp), %rax
  92ab55: 48 89 7d c8                   movq    %rdi, -0x38(%rbp)
  92ab59: 48 89 c7                      movq    %rax, %rdi
  92ab5c: 48 8b 75 c8                   movq    -0x38(%rbp), %rsi
  92ab60: 48 89 4d c0                   movq    %rcx, -0x40(%rbp)
  92ab64: e8 47 32 94 ff                callq   0x26ddb0 <_ZNSt10lock_guardISt5mutexEC2ERS0_@plt>
  92ab69: 8b 75 f4                      movl    -0xc(%rbp), %esi
  92ab6c: 8b 55 f0                      movl    -0x10(%rbp), %edx
  92ab6f: 48 8b 4d e8                   movq    -0x18(%rbp), %rcx
  92ab73: 48 8b 7d c0                   movq    -0x40(%rbp), %rdi
  92ab77: e8 34 00 00 00                callq   0x92abb0 <_ZNSt10shared_ptrIiEC2ERKS0_+0x12f0>
  92ab7c: e9 00 00 00 00                jmp     0x92ab81 <_ZNSt10shared_ptrIiEC2ERKS0_+0x12c1>
  92ab81: 48 8d 7d e0                   leaq    -0x20(%rbp), %rdi
  92ab85: e8 d6 8e 93 ff                callq   0x263a60 <_ZNSt10lock_guardISt5mutexED2Ev@plt>
  92ab8a: 48 83 c4 40                   addq    $0x40, %rsp
  92ab8e: 5d                            popq    %rbp
  92ab8f: c3                            retq

Finally, xRDI depends on ~lock_guard() -> mutex.unlock()

ptitSeb (Owner) commented Mar 3, 2026


That last function doesn't preserve RDI. The initial value of RDI is saved at rbp-8 but never restored on return.

…1 preservation

On RV64, xRDI is mapped to a0 and xRSI to a1 (RISC-V return value registers).
After call_n's JALR, a0/a1 contain the native function's return value, silently
overwriting xRDI/xRSI. Per x86-64 ABI this is correct (RDI/RSI are caller-saved),
but some programs (e.g. WeMeet's coroutine library) violate the ABI by depending
on caller-saved registers surviving function calls.

On native x86-64, RAX and RDI are separate physical registers, so a function
returning 0 in RAX doesn't affect RDI. On RV64, the aliasing (xRDI=a0=return
value) causes these ABI-violating programs to crash.

BOX64_DYNAREC_CALLN_RESTORE=1 saves/restores xRDI(a0) and xRSI(a1) around
native calls in call_n. Off by default. Enable per-app via box64rc.
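The per-app enabling mentioned above could look like this in a box64rc file (a sketch: the section name `wemeetapp` is an assumption standing in for the real WeMeet process name, which may differ on a given system):

```ini
# ~/.box64rc -- per-process section keyed by the executable's name
[wemeetapp]
BOX64_DYNAREC_CALLN_RESTORE=1
```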
@zqb-all zqb-all changed the title [RV64_DYNAREC] Restore caller-saved x86 regs after native call in call_n [RV64_DYNAREC] Add BOX64_DYNAREC_CALLN_RESTORE option for call_n a0/a1 preservation Mar 3, 2026
@zqb-all zqb-all marked this pull request as ready for review March 3, 2026 17:04
ptitSeb (Owner) commented Mar 3, 2026

I am still not convinced this is the right fix for the issue, as the code you show is not doing what you said it's doing.

zqb-all (Contributor, Author) commented Mar 4, 2026

Do you mean that, from this code, it's not clear why the x86 version works? My understanding is that the existing x86 code path is actually unsafe: it doesn't restore RDI, but relies on the fact that after the last function uses RDI, its value happens to be non-zero.

ptitSeb (Owner) commented Mar 4, 2026

Do you mean that from this code, it's not clear why the x86 version works? My understanding is that the existing x86 implementation is actually unsafe; it doesn't restore RDI, but relies on the fact that after the last function uses RDI, its value happens to be non-zero.

From this code, your explanation doesn't add up. I see zero evidence that RDI should be preserved across some external call. This is absolutely not proof at all.

I still think your change is just a lucky one that happens to fix the startup case on your machine, but there is zero guarantee it will work on other machines, or that it actually fixes anything at all.

zqb-all (Contributor, Author) commented Mar 4, 2026

Restoring RDI to a non-zero value has the direct effect of preventing the co_jump_to_link function from calling exit(1) due to detecting RDI as zero. However, I haven't thoroughly researched what the coroutine function actually does internally, or what role RDI plays after passing the checkpoint in co_jump_to_link.

zqb-all (Contributor, Author) commented Mar 4, 2026

I also feel this approach isn't ideal. This patch essentially restores an earlier RDI value, so if the x86 callee modifies RDI, we won't be able to stay consistent with it. I just haven't thought of a better way to handle this problem for now.
