Add LlmGenerationConfig, error handling, and callback contract tests by psiddh · Pull Request #19714 · pytorch/executorch

psiddh · 2026-05-21T07:23:03Z

Summary: Tests the production LlmGenerationConfig builder API, error paths (invalid model, empty prompt, use-after-close), and callback contract (ordering, frequency, JSON schema). Covers OKR 3.1 (E2E) and 3.2 (feature testing).

Differential Revision: D105886676

pytorch-bot · 2026-05-21T07:23:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19714

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull request jobs on OSDC runners in shadow mode

✅ No Failures

As of commit 171fb1b with merge base b4a9e72 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-05-21T07:23:12Z

@psiddh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105886676.

github-actions · 2026-05-21T07:23:59Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

Adds a new Android instrumentation test suite to exercise the public LlmGenerationConfig builder API when calling LlmModule.generate(prompt, config, callback), and to validate key error paths and callback “contract” behaviors for the LLM Android wrapper.

Changes:

Introduces LlmGenerationConfigTest covering config-driven generation (default config, seqLen, echo).
Adds tests for error handling paths (invalid model path, empty prompt behavior, generate-after-close).
Adds callback contract assertions (onResult frequency bounds, onStats JSON schema, callback ordering).

Comments suppressed due to low confidence (3)

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmGenerationConfigTest.kt:95

testEchoModeTrue / testEchoModeFalse assert on generated content (e.g., output contains "Hello" / doesn’t start with the prompt). Other LLM instrumentation tests in this suite explicitly avoid asserting content for the TinyStories fixture because it’s not instruction-tuned and content is not stable. These checks are also vulnerable to false failures when the model happens to generate the same prefix as the prompt even with echo disabled. Consider asserting a structural invariant instead (e.g., compare echo=true vs echo=false outputs from a reset context and verify echo=true output is the prompt prefix + echo=false output).

  @Test(timeout = MAX_TEST_TIMEOUT_MS)
  fun testEchoModeTrue() {
    val config = buildConfig(echo = true)
    val callback = CollectingCallback()
    llmModule.generate(TEST_PROMPT, config, callback)
    val output = callback.results.joinToString("")
    assertTrue(
        "Echo mode should include prompt tokens",
        output.contains("Hello") || output.contains("hello"),
    )
  }

  @Test(timeout = MAX_TEST_TIMEOUT_MS)
  fun testEchoModeFalse() {
    val config = buildConfig(echo = false)
    val callback = CollectingCallback()
    llmModule.generate(TEST_PROMPT, config, callback)
    assertTrue("Should produce output", callback.results.isNotEmpty())
    val output = callback.results.joinToString("")
    assertFalse(
        "With echo=false, output should NOT start with prompt text",
        output.startsWith(TEST_PROMPT),
    )
  }

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmGenerationConfigTest.kt:107

The comment that onResult callbacks are “1:1 with tokens” is inaccurate: the JNI layer buffers tokens until it has valid UTF-8 and may coalesce multiple tokens into a single onResult (see extension/android/jni/jni_layer_llama.cpp token_buffer logic). Using results.size as a token count can therefore be misleading; consider rewording the comment and/or validating seqLen via stats (e.g., generated_tokens) instead of callback count.

    // Note: results.size counts onResult callbacks, which is 1:1 with tokens for LlmModule
    assertTrue("Should produce at least 1 token", callback.results.isNotEmpty())
    assertTrue(
        "Token count (${callback.results.size}) should be <= seqLen ($shortSeqLen)",
        callback.results.size <= shortSeqLen,

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmGenerationConfigTest.kt:240

This file defines a custom assertThrows that catches all Throwable. The project already uses JUnit’s org.junit.Assert.assertThrows in other Android tests, which avoids swallowing Errors and keeps stack traces consistent. Consider switching to the standard JUnit helper and deleting the custom implementation.

  private fun assertThrows(exClass: Class<out Throwable>, block: () -> Unit) {
    try {
      block()
      fail("Expected ${exClass.simpleName} but no exception was thrown")
    } catch (e: Throwable) {
      assertTrue(
          "Expected ${exClass.simpleName} but got ${e.javaClass.simpleName}: ${e.message}",
          exClass.isInstance(e),
      )
    }
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Summary: Tests the production LlmGenerationConfig builder API, error paths (invalid model, empty prompt, use-after-close), and callback contract (ordering, frequency, JSON schema). Covers OKR 3.1 (E2E) and 3.2 (feature testing). Differential Revision: D105886676

meta-codesync · 2026-05-21T07:52:59Z

@psiddh has imported this pull request. If you are a Meta employee, you can view this in D105886676.

Copilot AI review requested due to automatic review settings May 21, 2026 07:23

psiddh requested a review from kirklandsign as a code owner May 21, 2026 07:23

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 21, 2026

meta-codesync Bot added fb-exported meta-exported labels May 21, 2026

Copilot started reviewing on behalf of psiddh May 21, 2026 07:23 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread ...id/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmGenerationConfigTest.kt Outdated

psiddh force-pushed the export-D105886676 branch from f029804 to 171fb1b Compare May 21, 2026 07:47

kirklandsign approved these changes May 21, 2026

View reviewed changes

psiddh merged commit 50efdcb into pytorch:main May 21, 2026
174 of 175 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LlmGenerationConfig, error handling, and callback contract tests#19714

Add LlmGenerationConfig, error handling, and callback contract tests#19714
psiddh merged 1 commit into
pytorch:mainfrom
psiddh:export-D105886676

psiddh commented May 21, 2026

Uh oh!

pytorch-bot Bot commented May 21, 2026 •

edited

Loading

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

psiddh commented May 21, 2026

Uh oh!

pytorch-bot Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19714

❗ 1 Active SEVs

✅ No Failures

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

This PR needs a release notes: label

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

meta-codesync Bot commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

pytorch-bot Bot commented May 21, 2026 •

edited

Loading

This PR needs a `release notes:` label