Add LlmModule thread safety instrumentation tests by psiddh · Pull Request #19715 · pytorch/executorch

psiddh · 2026-05-21T07:24:05Z

Summary: -

Differential Revision: D105886777

pytorch-bot · 2026-05-21T07:24:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19715

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull request jobs on OSDC runners in shadow mode

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-05-21T07:24:14Z

@psiddh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105886777.

github-actions · 2026-05-21T07:25:04Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

Adds Android instrumentation tests that exercise the documented concurrency/lifecycle contract of org.pytorch.executorch.extension.llm.LlmModule, focusing on concurrent generate() calls, cross-thread stop(), and close() idempotency.

Changes:

Introduces a new LlmThreadSafetyTest androidTest suite for LlmModule.
Adds coverage for concurrent generation behavior, stop signaling from another thread, idle-stop behavior, and use-after-close semantics.

Comments suppressed due to low confidence (2)

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:60

Same issue as above for the tokenizer fixture: if the resource stream is null, .use {} will NPE before the requireNotNull message is evaluated. Use the requireNotNull(getResourceAsStream(...)) { ... }.use { ... } pattern to get a clear failure when android_test_setup.sh hasn’t run.

    javaClass.getResourceAsStream(TOKENIZER_FILE_NAME).use { stream ->
      requireNotNull(stream) {
        "Test resource $TOKENIZER_FILE_NAME not found; did android_test_setup.sh run?"
      }
      FileUtils.copyInputStreamToFile(stream, tokenizerFile)
    }

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:199

thresholdReached.await() / generateDone.await() are unbounded waits. If token callbacks never arrive (e.g., model load failure), this will hang until the overall test timeout. Using await(timeout, unit) with an assert (and possibly calling llmModule.stop() on timeout) makes failures faster and helps avoid orphaned generate threads.

    // Wait for exactly TOKEN_THRESHOLD tokens, then signal stop
    thresholdReached.await()
    llmModule.stop()

    // Wait for generate() to return
    generateDone.await()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Summary: - Differential Revision: D105886777

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:118

The test uses several 20s awaits (e.g., waiting for the first token) and a 30s method timeout. Existing LLM instrumentation/perf tests allow much longer for time-to-first-token on emulator (often up to ~30s) and overall test wall-time (minutes). To avoid CI flakes, consider bumping these awaits/timeouts (or making them configurable via instrumentation args) so they can accommodate slow environments.

        assertTrue(
            "Thread A did not produce a token in time",
            threadAProducedToken.await(20, TimeUnit.SECONDS),
        )

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:100

testConcurrentGenerateThrowsOrSerializes() intends to guarantee Thread A is still holding the lock when Thread B calls generate(), but the callback only counts down a latch and does not block generation. On fast devices, Thread A could finish soon after the first token, so Thread B may never contend for the lock and the test won’t actually validate serialization behavior. Consider adding a second latch to pause Thread A’s callback after the first token until Thread B has entered generate() (or until Thread B signals it is blocked), then release Thread A to complete.

              override fun onResult(result: String) {
                threadATokens.incrementAndGet()
                threadAProducedToken.countDown()
              }

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:136

Catching RuntimeException here is too broad and can mask real bugs (e.g., NullPointerException) as an “expected” outcome. LlmModule.generate() failures are expected to surface as ExecutorchRuntimeException (a RuntimeException subclass) or IllegalStateException; consider catching ExecutorchRuntimeException explicitly instead of all RuntimeException so unexpected runtime errors still fail the test.

      } catch (_: RuntimeException) {
        // Valid: serialized second generate() may fail (e.g., dirty KV cache state)
        threadBRejected.set(true)
      } catch (e: Exception) {

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:182

testStopFromDifferentThread() can be timing-flaky: after thresholdReached is counted down, generation continues running while the main thread wakes up and calls stop(), so on a fast device it’s possible to reach LONG_SEQ_LEN before stop() is issued, violating the final assertion. To make the contract deterministic, consider pausing generation at the threshold (e.g., have onResult await a “stopIssued” latch after counting down thresholdReached) so stop() is guaranteed to happen before additional tokens are produced.

              override fun onResult(result: String) {
                if (tokensReceived.incrementAndGet() == TOKEN_THRESHOLD) {
                  thresholdReached.countDown()
                }
              }

meta-codesync · 2026-05-21T07:53:48Z

@psiddh has imported this pull request. If you are a Meta employee, you can view this in D105886777.

Copilot AI review requested due to automatic review settings May 21, 2026 07:24

psiddh requested a review from kirklandsign as a code owner May 21, 2026 07:24

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 21, 2026

meta-codesync Bot added fb-exported meta-exported labels May 21, 2026

Copilot started reviewing on behalf of psiddh May 21, 2026 07:24 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread ...ndroid/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt Outdated

Comment thread ...ndroid/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt

psiddh force-pushed the export-D105886777 branch from f2e80c4 to f656c1d Compare May 21, 2026 07:32

Add LlmModule thread safety instrumentation tests

eaa081d

Summary: - Differential Revision: D105886777

Copilot AI review requested due to automatic review settings May 21, 2026 07:45

psiddh force-pushed the export-D105886777 branch from f656c1d to eaa081d Compare May 21, 2026 07:45

Copilot started reviewing on behalf of psiddh May 21, 2026 07:46 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread ...ndroid/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt

Comment thread ...ndroid/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt

kirklandsign approved these changes May 21, 2026

View reviewed changes

psiddh merged commit ab8fcba into pytorch:main May 21, 2026
177 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LlmModule thread safety instrumentation tests#19715

Add LlmModule thread safety instrumentation tests#19715
psiddh merged 1 commit into
pytorch:mainfrom
psiddh:export-D105886777

psiddh commented May 21, 2026

Uh oh!

pytorch-bot Bot commented May 21, 2026 •

edited

Loading

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

psiddh commented May 21, 2026

Uh oh!

pytorch-bot Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19715

❗ 1 Active SEVs

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

This PR needs a release notes: label

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

pytorch-bot Bot commented May 21, 2026 •

edited

Loading

This PR needs a `release notes:` label