Skip to content

Add LlmModule thread safety instrumentation tests#19715

Merged
psiddh merged 1 commit into
pytorch:mainfrom
psiddh:export-D105886777
May 21, 2026
Merged

Add LlmModule thread safety instrumentation tests#19715
psiddh merged 1 commit into
pytorch:mainfrom
psiddh:export-D105886777

Conversation

@psiddh
Copy link
Copy Markdown
Contributor

@psiddh psiddh commented May 21, 2026

Summary: -

Differential Revision: D105886777

Copilot AI review requested due to automatic review settings May 21, 2026 07:24
@psiddh psiddh requested a review from kirklandsign as a code owner May 21, 2026 07:24
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented May 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19715

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 21, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 21, 2026

@psiddh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105886777.

@github-actions
Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Android instrumentation tests that exercise the documented concurrency/lifecycle contract of org.pytorch.executorch.extension.llm.LlmModule, focusing on concurrent generate() calls, cross-thread stop(), and close() idempotency.

Changes:

  • Introduces a new LlmThreadSafetyTest androidTest suite for LlmModule.
  • Adds coverage for concurrent generation behavior, stop signaling from another thread, idle-stop behavior, and use-after-close semantics.
Comments suppressed due to low confidence (2)

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:60

  • Same issue as above for the tokenizer fixture: if the resource stream is null, .use {} will NPE before the requireNotNull message is evaluated. Use the requireNotNull(getResourceAsStream(...)) { ... }.use { ... } pattern to get a clear failure when android_test_setup.sh hasn’t run.
    javaClass.getResourceAsStream(TOKENIZER_FILE_NAME).use { stream ->
      requireNotNull(stream) {
        "Test resource $TOKENIZER_FILE_NAME not found; did android_test_setup.sh run?"
      }
      FileUtils.copyInputStreamToFile(stream, tokenizerFile)
    }

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:199

  • thresholdReached.await() / generateDone.await() are unbounded waits. If token callbacks never arrive (e.g., model load failure), this will hang until the overall test timeout. Using await(timeout, unit) with an assert (and possibly calling llmModule.stop() on timeout) makes failures faster and helps avoid orphaned generate threads.
    // Wait for exactly TOKEN_THRESHOLD tokens, then signal stop
    thresholdReached.await()
    llmModule.stop()

    // Wait for generate() to return
    generateDone.await()


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@psiddh psiddh force-pushed the export-D105886777 branch from f2e80c4 to f656c1d Compare May 21, 2026 07:32
Summary: -

Differential Revision: D105886777
Copilot AI review requested due to automatic review settings May 21, 2026 07:45
@psiddh psiddh force-pushed the export-D105886777 branch from f656c1d to eaa081d Compare May 21, 2026 07:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:118

  • The test uses several 20s awaits (e.g., waiting for the first token) and a 30s method timeout. Existing LLM instrumentation/perf tests allow much longer for time-to-first-token on emulator (often up to ~30s) and overall test wall-time (minutes). To avoid CI flakes, consider bumping these awaits/timeouts (or making them configurable via instrumentation args) so they can accommodate slow environments.
        assertTrue(
            "Thread A did not produce a token in time",
            threadAProducedToken.await(20, TimeUnit.SECONDS),
        )

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:100

  • testConcurrentGenerateThrowsOrSerializes() intends to guarantee Thread A is still holding the lock when Thread B calls generate(), but the callback only counts down a latch and does not block generation. On fast devices, Thread A could finish soon after the first token, so Thread B may never contend for the lock and the test won’t actually validate serialization behavior. Consider adding a second latch to pause Thread A’s callback after the first token until Thread B has entered generate() (or until Thread B signals it is blocked), then release Thread A to complete.
              override fun onResult(result: String) {
                threadATokens.incrementAndGet()
                threadAProducedToken.countDown()
              }

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:136

  • Catching RuntimeException here is too broad and can mask real bugs (e.g., NullPointerException) as an “expected” outcome. LlmModule.generate() failures are expected to surface as ExecutorchRuntimeException (a RuntimeException subclass) or IllegalStateException; consider catching ExecutorchRuntimeException explicitly instead of all RuntimeException so unexpected runtime errors still fail the test.
      } catch (_: RuntimeException) {
        // Valid: serialized second generate() may fail (e.g., dirty KV cache state)
        threadBRejected.set(true)
      } catch (e: Exception) {

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmThreadSafetyTest.kt:182

  • testStopFromDifferentThread() can be timing-flaky: after thresholdReached is counted down, generation continues running while the main thread wakes up and calls stop(), so on a fast device it’s possible to reach LONG_SEQ_LEN before stop() is issued, violating the final assertion. To make the contract deterministic, consider pausing generation at the threshold (e.g., have onResult await a “stopIssued” latch after counting down thresholdReached) so stop() is guaranteed to happen before additional tokens are produced.
              override fun onResult(result: String) {
                if (tokensReceived.incrementAndGet() == TOKEN_THRESHOLD) {
                  thresholdReached.countDown()
                }
              }

@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 21, 2026

@psiddh has imported this pull request. If you are a Meta employee, you can view this in D105886777.

@psiddh psiddh merged commit ab8fcba into pytorch:main May 21, 2026
177 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants