
Conversation

@pull pull bot commented Aug 9, 2025

See Commits and Changes for more details.



XrXr and others added 4 commits August 8, 2025 18:54
These `...` ISEQs have a special calling convention in the interpreter, and our stubs and JIT calling convention don't handle it well. Reject them for now. Debugged with help from `@tekknolagi` and `tool/zjit_bisect.rb`.

Merely avoiding direct sends is enough to pass the attached test, but we also avoid compiling ISEQs with a `...` parameter to limit exposure for now.

`SendWithoutBlock`, which does dynamic dispatch using interpreter code, seems to handle calling into forwardable ISEQs correctly, so those are fine -- we can't predict where these dynamic sends land anyway.
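
For context, a minimal sketch (not the test attached to this commit; the method names are made up) of the kind of forwardable ISEQ this refers to: a method whose only parameter is `...`:

```ruby
# Hypothetical illustration only; not from the commit.
def log(level, message)
  puts "#{level}: #{message}"
end

# `forward` declares a `...` parameter, so its ISEQ uses the special
# argument-forwarding calling convention described above.
def forward(...)
  log(...)
end

forward(:info, "hello")  # prints "info: hello"
```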
rb_gc_impl_writebarrier_remember is not Ractor-safe because it writes to bitmaps and also pushes onto the mark stack during incremental marking. We should acquire the VM lock to prevent race conditions.

When the object is not old, there is no performance impact. However, we can see a performance impact in this microbenchmark, where the object is old:

    4.times.map do
      Ractor.new do
        ary = []

        3.times { GC.start }

        10_000_000.times do |i|
          ary.push(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
          ary.clear
        end
      end
    end.map(&:value)

Before:

    Time (mean ± σ):     682.4 ms ±   5.1 ms    [User: 2564.8 ms, System: 16.0 ms]

After:

    Time (mean ± σ):      5.522 s ±  0.096 s    [User: 8.237 s, System: 7.931 s]

Co-Authored-By: Luke Gruber <luke.gruber@shopify.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
…d_wakeup()

In rb_ractor_sched_wait() (ex: Ractor.receive), we acquire
RACTOR_LOCK(cr) and then thread_sched_lock(cur_th). However, on wakeup,
if we're a dedicated native thread (dnt), in thread_sched_wait_running_turn()
we acquire thread_sched_lock(cur_th) after the condvar wakeup and then
RACTOR_LOCK(cr). This lock inversion can cause a deadlock with
rb_ractor_wakeup_all() (ex: port.send(obj)), where we acquire
RACTOR_LOCK(other_r) and then thread_sched_lock(other_th).

So the deadlock plays out like this:

nt 1:   Ractor.receive
            rb_ractor_sched_wait() after condvar wakeup in thread_sched_wait_running_turn():
              - thread_sched_lock(cur_th) (condvar) # acquires lock
              - rb_ractor_lock_self(cr) # deadlock here: tries to acquire, HANGS

nt 2: port.send
            ractor_wakeup_all()
              - RACTOR_LOCK(port_r) # acquires lock
              - thread_sched_lock # tries to acquire, HANGS
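
As a simplified, Ruby-level sketch of the lock-ordering inversion above (a Mutex analogy, not the actual C locking code; the variable names are illustrative only):

```ruby
# Simplified analogy of the inversion; with unlucky (here, forced) timing,
# each thread holds one lock and waits forever for the other.
ractor_lock = Mutex.new  # stands in for RACTOR_LOCK(cr)
sched_lock  = Mutex.new  # stands in for thread_sched_lock(cur_th)

t1 = Thread.new do       # like nt 1: wakeup path in thread_sched_wait_running_turn()
  sched_lock.synchronize do
    sleep 0.1
    ractor_lock.synchronize {}  # blocks waiting on t2
  end
end

t2 = Thread.new do       # like nt 2: port.send -> ractor_wakeup_all()
  ractor_lock.synchronize do
    sleep 0.1
    sched_lock.synchronize {}   # blocks waiting on t1
  end
end

# Both joins time out: classic deadlock from inconsistent lock ordering.
puts "deadlocked" unless [t1, t2].map { |t| t.join(2) }.all?
```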

To fix it, we now unlock the thread_sched_lock before acquiring the
ractor_lock in rb_ractor_sched_wait().

Script that reproduces the issue:

```ruby
require "async"
class RactorWrapper
  def initialize
    @ractor = Ractor.new do
      Ractor.receive # Ractor doesn't start until explicitly told to
      # Do some calculations
      fib = ->(x) { x < 2 ? 1 : fib.call(x - 1) + fib.call(x - 2) }
      fib.call(20)
    end
  end

  def take_async
    @ractor.send(nil)
    Thread.new { @ractor.value }.value
  end
end

Async do |task|
  10_000.times do |i|
    task.async do
      RactorWrapper.new.take_async
      puts i
    end
  end
end
exit 0
```

Fixes [Bug #21398]

Co-authored-by: John Hawthorn <john.hawthorn@shopify.com>
Previously, if GC was in progress while we were initially building the
id2ref table, it could see the empty table and then crash when trying to
remove IDs from it. This commit fixes the bug by only publishing the
table after GC is done.
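
For context, a hypothetical snippet (not part of this commit) of the kind of workload involved: the first object_id calls lazily build the id2ref table, while GC may concurrently try to remove IDs of freed objects from it:

```ruby
# Hypothetical illustration only.
objs = Array.new(100_000) { Object.new }
ids  = objs.map(&:object_id)  # first use lazily builds the id2ref table
objs = nil
GC.start                      # GC removes ids of freed objects from the table
p ids.size
```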

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
@pull pull bot locked and limited conversation to collaborators Aug 9, 2025
@pull pull bot added the ⤵️ pull label Aug 9, 2025
@pull pull bot merged commit d80c03d into turkdevops:master Aug 9, 2025
1 of 2 checks passed