fix: Fix `dispose()` stalling because of `Dispatcher` by mrousavy · Pull Request #3704 · Shopify/react-native-skia

mrousavy · 2026-02-11T18:08:06Z

Disclaimer

I am creating this as a PoC PR to discuss further steps - I can only assume what your implementation does, but if you have any specific reason for this to absolutely be there, then we need to discuss next steps.

Journey

In my tests, my Camera pipeline stalled after rendering a few Frames with Skia.
I diagnosed this from top-down - Camera frames are properly freed, NativeBuffers are released, and so on - until I hit TextureHolder from Skia - I added logs in the constructor and destructor:

react-native-skia/packages/skia/apple/SkiaCVPixelBufferUtils.mm

Lines 47 to 51 in 0081cf0

    
           TextureHolder::TextureHolder(CVMetalTextureRef texture) : _texture(texture) {} 
        
           TextureHolder::~TextureHolder() { 
        
             // ARC will now automatically release _texture. 
        
             CFRelease(_texture); 
        
           }

And to my surprise, there were more allocations than deallocations:

It's exactly 6 images not being released before the pipeline stalls - this is just how AVFoundation works, it has a maximum queue of 6 buffers, and once that is exceeded, it stalls:

 [...]
 LOG  render
 LOG  render
 LOG  render
 WARN  Frame Dropped! Reason: out-of-buffers
 WARN  Frame Dropped! Reason: out-of-buffers
 WARN  Frame Dropped! Reason: out-of-buffers

Now I got curious - why are some TextureHolders not deallocated?

The reason for that is the JsiSkImage's destructor - which schedules the deletion on a Dispatcher:

react-native-skia/packages/skia/cpp/api/JsiSkImage.h

Lines 297 to 307 in 0081cf0

    
           void releaseResources() override { 
        
             // Queue deletion on the creation thread if needed 
        
             auto image = getObjectUnchecked(); 
        
             if (image && _dispatcher) { 
        
               _dispatcher->run([image]() { 
        
                 // Image will be deleted when this lambda is destroyed 
        
               }); 
        
             } 
        
             // Clear the object 
        
             JsiSkWrappingSkPtrHostObject<SkImage>::releaseResources(); 
        
           }

..and the Dispatcher's job is only ever called when you create a new JsiSkImage, as it calls processQueue() in it's constructor:

react-native-skia/packages/skia/cpp/api/JsiSkImage.h

Lines 286 to 294 in 0081cf0

    
           JsiSkImage(std::shared_ptr<RNSkPlatformContext> context, 
        
                      const sk_sp<SkImage> image) 
        
               : JsiSkWrappingSkPtrHostObject<SkImage>(std::move(context), 
        
                                                       std::move(image)) { 
        
             // Get the dispatcher for the current thread 
        
             _dispatcher = Dispatcher::getDispatcher(); 
        
             // Process any pending operations 
        
             _dispatcher->processQueue(); 
        
           }

Note

This is the first problem. Why is deletion scheduled? This goes against a few principles in OOP, and languages like Swift even warn you if you run lambdas in a destructor. It's not really a safe way to destroy/cleanup state.

Then I found out that while there are indeed JsiSkImages being created next, they use a different Dispatcher - because your Dispatcher is cached per C++ Thread - see thread_local here:

react-native-skia/packages/skia/cpp/api/JsiSkDispatcher.cpp

Lines 5 to 7 in 0081cf0

    
           // Thread-local storage definition 
        
           thread_local std::shared_ptr<Dispatcher::DispatcherData> 
        
               Dispatcher::_threadDispatcher;

This is a problem for modern (mobile) architectures, as a Thread =/= Queue.

Both iOS and Android use Thread Pools for their Queues - e.g. on Android a Coroutine uses a Thread Pool, and on iOS GCD (DispatchQueue, which is the default for anything AVFoundation related) uses Thread Pools.
So thread_local is not a safe option to cache things.

Note

This also bit me when I tried to cache SkSurface - that apparently also needs to be invalidated if called on a new Thread - I currently work around this issue by invalidating the cache if a thread_local counter I increment is not the same as the previous one - but that's a topic for another day.

To demonstrate this, I created a thread_local counter in C++:

double HybridThreadHelper::getCurrentThreadMarker() {
  static std::atomic_size_t threadCounter{1};
  static thread_local size_t thisThreadId{0};
  if (thisThreadId == 0) {
    thisThreadId = threadCounter.fetch_add(1);
  }
  return static_cast<double>(thisThreadId);
}

When I call this in my JS Frame Processor, I get these results:

..and it grows all the way 'til 12, or possibly even more. Again, this is because of GCD using a Thread Pool.
Kotlin (Coroutine) also does this.

The stalling problem is then logically explainable: Each time the GCD pool uses a different Thread, the previous JsiSkImage's constructor calls Dispatcher::processQueue() on an empty queue - the previously enqueued deletion is then left on the old Thread - which might not be touched again for a few more calls. If there are 5 Thread hops, there's 4 stale deletions. Once it's 6 stale deletions, my Camera pipeline stalls.

So thread_local is not a safe way to assume same caller context - and in your case you rely on JsiSkImage grabbing the Dispatcher for the current Thread, which at first call and second call might be the same, but on third call it's a different one, then on the 10th call it might be another one, and so on.
In total, you'd end up with around 6 different Dispatchers, and

Solution

Just remove the Dispatcher. The image/picture will be deleted immediately in the destructor/dispose function. No need to schedule something, no need to lock a mutex, no need to queue remaining deletions on creation of a new Image.

I removed it (see this PR), and it works fine - no more crashes, no more pipeline stalls.

Now, memory usage is stable, nothing stalls, no Images become stale - even at 60 FPS rendering/SkImage creation throughput:

Motivation before

I can see that you introduced this change in #3407 in attempt to fix #3390.

My app has been running for minutes and Hades GC did run, but the images still get destroyed fine.

We can try to reproduce the issue again (as you mentioned you were able to reproduce), and provide a real fix if the issue actually occurs again.

mrousavy · 2026-02-11T18:43:38Z

Update:

I understand that this issue only happens if Hades (Hermes' GC) destroys the SkImage reference- because then the destructor is called on a separate Thread, which seemed to crash for @jfhamlin. In my case, it isn't Hades GC destroying it but instead I am manually destroying (via dispose()), that's why I probably don't see the crash he reported.

I quickly checked the Skia docs and I think their assumption is;

A raster/CPU resource can be destroyed on any Thread
A GPU resource should be destroyed on the same Thread it was created on

So I guess in the case of a GPU resource, we might actually need to jump to the original Thread to run the deletion, if the destructor is being called on another Thread.

If the crash can actually be reproduced again with my PR, then I suggest the following step by step plan:

dispose() should always delete instantly - not schedule deletion (this would already unblock me)
Scheduling to a different Thread in the destructor should only happen if we are really on another Thread, otherwise we can also just delete immediately (this prevents objects floating around until someone calls processQueue(), which might not happen for a while)
Scheduling to a different Thread should actually do scheduling - we need an actual Dispatcher for this - not a queue that will be picked up later. Then the deletion call is always scheduled, and we don't depend on a new SkImage calling processQueue().

wdyt @wcandillon?

mrousavy · 2026-02-11T18:53:22Z

I created a PR to implement point 1 in my comment above here: #3705

wcandillon · 2026-02-12T08:17:25Z

@mrousavy I agree with all three points above. I am leaning toward removing the Dispatcher altogether. Explicit resource management of GPU-bound objects would be symmetric with WebGPU. I could review our internal APIs to see if this model would fit and implement it.

To your point above, if we still need a Dispatcher for some reason, it should 1) not prevent immediate explicit deletion; 2) work as a real dispatcher (right now we play tricks to drain the dispatch queue).

wcandillon · 2026-02-12T08:19:38Z

The PR makes sense, I just need to review/test everything. I know already of some corner cases but I think these can be handled elegantly with careful thoughts.

wcandillon · 2026-02-12T11:37:05Z

The change in this PR is in the right direction, we just need to test that internally with have explicit GPU resource management working nicely as well as making sure that we don't crash if the object is being GC'd if the user forgot/didn't invoke dipose yet. So just a bit more one to do on this.

mrousavy · 2026-02-12T14:20:55Z

agreed! let me know how I can help. Both Terry and I can do some extensive real world testing

mrousavy added 2 commits February 11, 2026 18:51

fix: Fix dispose() stalling because of Dispatcher

3b6d6a1

delete JsiSkDispatcher

9eee995

mrousavy mentioned this pull request Feb 11, 2026

fix: Schedule deletion only if object was GC'd - allow dispose() fast path #3705

Merged

Merge branch 'main' into fix/fix-dispose-leak

66148b0

mrousavy mentioned this pull request Feb 13, 2026

fix: Purge cache on memory warning to prevent OOM crashes #3711

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Fix `dispose()` stalling because of `Dispatcher`#3704

fix: Fix `dispose()` stalling because of `Dispatcher`#3704
mrousavy wants to merge 3 commits intoShopify:mainfrom
mrousavy:fix/fix-dispose-leak

mrousavy commented Feb 11, 2026 •

edited

Loading

Uh oh!

mrousavy commented Feb 11, 2026 •

edited

Loading

Uh oh!

mrousavy commented Feb 11, 2026

Uh oh!

wcandillon commented Feb 12, 2026

Uh oh!

wcandillon commented Feb 12, 2026

Uh oh!

wcandillon commented Feb 12, 2026

Uh oh!

mrousavy commented Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	TextureHolder::TextureHolder(CVMetalTextureRef texture) : _texture(texture) {}
	TextureHolder::~TextureHolder() {
	// ARC will now automatically release _texture.
	CFRelease(_texture);
	}

	void releaseResources() override {
	// Queue deletion on the creation thread if needed
	auto image = getObjectUnchecked();
	if (image && _dispatcher) {
	_dispatcher->run([image]() {
	// Image will be deleted when this lambda is destroyed
	});
	}
	// Clear the object
	JsiSkWrappingSkPtrHostObject<SkImage>::releaseResources();
	}

	JsiSkImage(std::shared_ptr<RNSkPlatformContext> context,
	const sk_sp<SkImage> image)
	: JsiSkWrappingSkPtrHostObject<SkImage>(std::move(context),
	std::move(image)) {
	// Get the dispatcher for the current thread
	_dispatcher = Dispatcher::getDispatcher();
	// Process any pending operations
	_dispatcher->processQueue();
	}

	// Thread-local storage definition
	thread_local std::shared_ptr<Dispatcher::DispatcherData>
	Dispatcher::_threadDispatcher;

Conversation

mrousavy commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Disclaimer

Journey

Solution

Motivation before

Uh oh!

mrousavy commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mrousavy commented Feb 11, 2026

Uh oh!

wcandillon commented Feb 12, 2026

Uh oh!

wcandillon commented Feb 12, 2026

Uh oh!

wcandillon commented Feb 12, 2026

Uh oh!

mrousavy commented Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mrousavy commented Feb 11, 2026 •

edited

Loading

mrousavy commented Feb 11, 2026 •

edited

Loading