Skip to content

Add LlmGenerationConfig, error handling, and callback contract tests#19714

Merged
psiddh merged 1 commit into
pytorch:mainfrom
psiddh:export-D105886676
May 21, 2026
Merged

Add LlmGenerationConfig, error handling, and callback contract tests#19714
psiddh merged 1 commit into
pytorch:mainfrom
psiddh:export-D105886676

Conversation

@psiddh
Copy link
Copy Markdown
Contributor

@psiddh psiddh commented May 21, 2026

Summary: Tests the production LlmGenerationConfig builder API, error paths (invalid model, empty prompt, use-after-close), and callback contract (ordering, frequency, JSON schema). Covers OKR 3.1 (E2E) and 3.2 (feature testing).

Differential Revision: D105886676

Copilot AI review requested due to automatic review settings May 21, 2026 07:23
@psiddh psiddh requested a review from kirklandsign as a code owner May 21, 2026 07:23
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented May 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19714

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 171fb1b with merge base b4a9e72 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 21, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 21, 2026

@psiddh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105886676.

@github-actions
Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Android instrumentation test suite to exercise the public LlmGenerationConfig builder API when calling LlmModule.generate(prompt, config, callback), and to validate key error paths and callback “contract” behaviors for the LLM Android wrapper.

Changes:

  • Introduces LlmGenerationConfigTest covering config-driven generation (default config, seqLen, echo).
  • Adds tests for error handling paths (invalid model path, empty prompt behavior, generate-after-close).
  • Adds callback contract assertions (onResult frequency bounds, onStats JSON schema, callback ordering).
Comments suppressed due to low confidence (3)

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmGenerationConfigTest.kt:95

  • testEchoModeTrue / testEchoModeFalse assert on generated content (e.g., output contains "Hello" / doesn’t start with the prompt). Other LLM instrumentation tests in this suite explicitly avoid asserting content for the TinyStories fixture because it’s not instruction-tuned and content is not stable. These checks are also vulnerable to false failures when the model happens to generate the same prefix as the prompt even with echo disabled. Consider asserting a structural invariant instead (e.g., compare echo=true vs echo=false outputs from a reset context and verify echo=true output is the prompt prefix + echo=false output).
  @Test(timeout = MAX_TEST_TIMEOUT_MS)
  fun testEchoModeTrue() {
    val config = buildConfig(echo = true)
    val callback = CollectingCallback()
    llmModule.generate(TEST_PROMPT, config, callback)
    val output = callback.results.joinToString("")
    assertTrue(
        "Echo mode should include prompt tokens",
        output.contains("Hello") || output.contains("hello"),
    )
  }

  @Test(timeout = MAX_TEST_TIMEOUT_MS)
  fun testEchoModeFalse() {
    val config = buildConfig(echo = false)
    val callback = CollectingCallback()
    llmModule.generate(TEST_PROMPT, config, callback)
    assertTrue("Should produce output", callback.results.isNotEmpty())
    val output = callback.results.joinToString("")
    assertFalse(
        "With echo=false, output should NOT start with prompt text",
        output.startsWith(TEST_PROMPT),
    )
  }

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmGenerationConfigTest.kt:107

  • The comment that onResult callbacks are “1:1 with tokens” is inaccurate: the JNI layer buffers tokens until it has valid UTF-8 and may coalesce multiple tokens into a single onResult (see extension/android/jni/jni_layer_llama.cpp token_buffer logic). Using results.size as a token count can therefore be misleading; consider rewording the comment and/or validating seqLen via stats (e.g., generated_tokens) instead of callback count.
    // Note: results.size counts onResult callbacks, which is 1:1 with tokens for LlmModule
    assertTrue("Should produce at least 1 token", callback.results.isNotEmpty())
    assertTrue(
        "Token count (${callback.results.size}) should be <= seqLen ($shortSeqLen)",
        callback.results.size <= shortSeqLen,

extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmGenerationConfigTest.kt:240

  • This file defines a custom assertThrows that catches all Throwable. The project already uses JUnit’s org.junit.Assert.assertThrows in other Android tests, which avoids swallowing Errors and keeps stack traces consistent. Consider switching to the standard JUnit helper and deleting the custom implementation.
  private fun assertThrows(exClass: Class<out Throwable>, block: () -> Unit) {
    try {
      block()
      fail("Expected ${exClass.simpleName} but no exception was thrown")
    } catch (e: Throwable) {
      assertTrue(
          "Expected ${exClass.simpleName} but got ${e.javaClass.simpleName}: ${e.message}",
          exClass.isInstance(e),
      )
    }
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Summary: Tests the production LlmGenerationConfig builder API, error paths (invalid model, empty prompt, use-after-close), and callback contract (ordering, frequency, JSON schema). Covers OKR 3.1 (E2E) and 3.2 (feature testing).

Differential Revision: D105886676
@psiddh psiddh force-pushed the export-D105886676 branch from f029804 to 171fb1b Compare May 21, 2026 07:47
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 21, 2026

@psiddh has imported this pull request. If you are a Meta employee, you can view this in D105886676.

@psiddh psiddh merged commit 50efdcb into pytorch:main May 21, 2026
174 of 175 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants