Allow users to submit a natural language prompt instead of code.
A new POST /ai/{leaderboard}/{gpu}/{mode} endpoint generates kernel
code via Claude API, then feeds it through the existing submission
pipeline for evaluation.
Pull request overview
This pull request adds AI-powered kernel code generation functionality to the KernelBot API. It integrates Claude AI (via the Anthropic API) to generate GPU kernel code from natural language prompts, then automatically submits the generated code through the existing evaluation pipeline.
Changes:
- Added new `POST /ai/{leaderboard}/{gpu}/{mode}` endpoint that accepts natural language prompts and generates kernel code via the Claude API
- Created `ai_generate.py` module with a `generate_kernel()` function that builds context-rich prompts from leaderboard metadata and generates code
- Added the `anthropic` Python dependency and `ANTHROPIC_API_KEY` environment variable
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Adds anthropic package v0.84.0 and its dependencies (jiter, distro, docstring-parser) to the lock file |
| src/libkernelbot/ai_generate.py | New module implementing kernel code generation using Claude AI with context from task specs, templates, and test cases |
| src/kernelbot/env.py | Adds ANTHROPIC_API_KEY environment variable configuration |
| src/kernelbot/api/main.py | Implements new AI submission endpoint that generates code and submits it through existing pipeline |
| pyproject.toml | Adds anthropic dependency to project dependencies |
Comments suppressed due to low confidence (11)
src/kernelbot/api/main.py:474
- The prompt content should be sanitized or validated before being sent to the Anthropic API. While the length is checked (max 10000 characters), there's no validation of the prompt content itself. Consider adding checks for potentially malicious content, prompt injection attempts, or rate limiting per user to prevent abuse of the AI API (which likely has associated costs).
prompt = payload.get("prompt")
if not prompt or not isinstance(prompt, str):
raise HTTPException(status_code=400, detail="Missing or invalid 'prompt' in request body")
if len(prompt) > 10000:
raise HTTPException(status_code=400, detail="Prompt too long (max 10000 characters)")
src/kernelbot/api/main.py:468
- The AI submission endpoint should log the request details for observability and debugging, similar to how the regular submission endpoint logs "Received submission request for..." at line 567-568. Add logging at the start of the AI submission handler to track leaderboard name, GPU type, submission mode, and user info for operational visibility.
try:
await simple_rate_limit()
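A minimal version of the suggested log line (the helper name and field layout are illustrative; the intent is to mirror the regular submission endpoint's "Received submission request for..." message):

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative: log the AI submission request up front so operators can
# correlate AI API costs and failures with specific leaderboards and users.
def log_ai_request(leaderboard: str, gpu: str, mode: str, user_name: str) -> str:
    msg = (
        f"Received AI submission request for leaderboard={leaderboard}, "
        f"gpu={gpu}, mode={mode}, user={user_name}"
    )
    logger.info(msg)
    return msg
```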
src/kernelbot/api/main.py:468
- The AI generation endpoint uses the existing simple_rate_limit function which allows 10 requests per second globally. However, AI API calls likely have higher costs and potentially different rate limits than regular submissions. Consider implementing a separate, more restrictive rate limiter specifically for AI generation requests to prevent unexpected API costs and to respect Anthropic's rate limits. You may also want to implement per-user rate limiting for AI requests.
await simple_rate_limit()
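A sketch of the stricter limiter this comment asks for, assuming a sliding-window budget per user (the class name and limits are illustrative; an in-memory dict only works for a single worker, so a real deployment would need shared state such as Redis):

```python
import time
from collections import defaultdict

class PerUserRateLimiter:
    """Illustrative per-user limiter for the AI endpoint: at most
    `max_calls` requests per `window` seconds per user, much stricter
    than the global 10 req/s limiter used by regular submissions."""

    def __init__(self, max_calls: int = 3, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        self._calls: dict[str, list[float]] = defaultdict(list)

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        # Drop timestamps outside the window, then check the budget.
        calls = [t for t in self._calls[user_id] if now - t < self.window]
        self._calls[user_id] = calls
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```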
src/libkernelbot/ai_generate.py:66
- If the regex fails to find a code block (match is None), the function returns the raw response text. However, if the AI response contains explanatory text around the code but no proper code fence, this could result in invalid code being submitted. Consider adding validation after extraction to ensure the extracted code is not empty and contains expected patterns (e.g., function definitions), or require the AI to always use code blocks by being more explicit in the system prompt.
# Extract code from a fenced code block if present
match = re.search(r"```(?:\w+)?\n(.*?)```", raw, re.DOTALL)
code = match.group(1).strip() if match else raw.strip()
src/kernelbot/api/main.py:503
- The error message logged contains the full exception details which may include sensitive information from the Anthropic API response. Consider sanitizing the error message before including it in the HTTPException detail, especially since this error is returned to the client. Log the full error server-side for debugging, but return a more generic error message to the client.
except Exception as e:
logger.error(f"AI generation failed: {e}")
raise HTTPException(status_code=502, detail=f"AI code generation failed: {e}") from e
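The suggested split between server-side and client-facing errors could be as small as this sketch (the helper name is illustrative): log the full exception for debugging, but return only a generic message plus the exception class name to the client.

```python
import logging

logger = logging.getLogger(__name__)

def safe_ai_error(exc: Exception) -> str:
    """Log full details server-side; return a sanitized client message."""
    logger.error("AI generation failed: %r", exc)
    return f"AI code generation failed ({type(exc).__name__})"
```

The endpoint would then pass `safe_ai_error(e)` as the `HTTPException` detail instead of interpolating `e` directly.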
src/libkernelbot/ai_generate.py:62
- The code assumes the response will have a text attribute at `response.content[0].text`, but this could fail if the response structure is different. The Anthropic API could return different content types or an empty content array. Add validation to check that the response has content and that the first content block is of type 'text' before accessing the text attribute.
raw = response.content[0].text
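A defensive accessor along the lines this comment suggests might look like the sketch below (the function name is illustrative; it checks for an empty content array and a non-text first block before touching `.text`):

```python
def first_text_block(response) -> str:
    """Illustrative: validate the Anthropic response shape before reading
    response.content[0].text, raising a clear error otherwise."""
    content = getattr(response, "content", None)
    if not content:
        raise ValueError("AI response contained no content blocks")
    block = content[0]
    if getattr(block, "type", None) != "text" or not hasattr(block, "text"):
        raise ValueError(
            f"Unexpected content block type: {getattr(block, 'type', None)!r}"
        )
    return block.text
```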
src/kernelbot/api/main.py:528
- The generated code is included in the response body, which could result in very large responses if the AI generates lengthy code. This could cause issues with response size limits or consume significant bandwidth. Consider whether returning the full generated code is necessary, or if it should be optional (e.g., via a query parameter) since the code is already stored in the submission and can be retrieved via the submission ID.
"generated_code": code,
src/libkernelbot/ai_generate.py:69
- The new AI kernel generation functionality lacks test coverage. Given that the codebase has comprehensive tests for other API endpoints (as seen in test_admin_api.py, test_submission.py, etc.), tests should be added for the new generate_kernel function and the run_ai_submission endpoint. Consider adding tests for: successful generation, handling of invalid prompts, missing API key scenarios, AI API failures, and various template/task configurations.
async def generate_kernel(
prompt: str,
task: LeaderboardTask,
description: str,
templates: dict[str, str],
) -> tuple[str, str]:
"""Generate kernel code from a natural language prompt using Claude.
Args:
prompt: The user's natural language description of the kernel to generate.
task: The LeaderboardTask containing file signatures, tests, and config.
description: The leaderboard's problem description.
templates: Template/starter code files keyed by language name.
Returns:
A tuple of (generated_code, file_name).
"""
# Build context from the task
system_parts = [
"You are an expert GPU kernel programmer. Generate code that solves the given problem.",
"Return ONLY the code inside a single code block. No explanation outside the code block.",
]
if description:
system_parts.append(f"## Problem Description\n{description}")
# Include template code so the AI knows the expected function signatures
if templates:
for lang, code in templates.items():
system_parts.append(f"## Template ({lang})\n```\n{code}\n```")
# Include reference/test files for additional context (skip submission placeholder)
for name, content in task.files.items():
if content != "@SUBMISSION@":
system_parts.append(f"## Reference file: {name}\n```\n{content}\n```")
# Include test specs so the AI knows input sizes / shapes
if task.tests:
system_parts.append(f"## Test cases\n{task.tests}")
system_prompt = "\n\n".join(system_parts)
client = anthropic.AsyncAnthropic()
response = await client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system=system_prompt,
messages=[{"role": "user", "content": prompt}],
)
raw = response.content[0].text
# Extract code from a fenced code block if present
match = re.search(r"```(?:\w+)?\n(.*?)```", raw, re.DOTALL)
code = match.group(1).strip() if match else raw.strip()
file_name = "submission.py" if task.lang == Language.Python else "submission.cu"
return code, file_name
src/libkernelbot/ai_generate.py:54
- The AsyncAnthropic client is created without passing the API key. While the Anthropic SDK will automatically read from the ANTHROPIC_API_KEY environment variable, the code should explicitly validate that this environment variable is set before making the API call, or pass it explicitly to provide clearer error messages. If the API key is not set, the user would receive a cryptic error from the Anthropic SDK instead of a clear validation error from the application.
client = anthropic.AsyncAnthropic()
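The explicit check suggested here could be a small helper run before constructing the client (the helper name is illustrative):

```python
import os

def get_anthropic_api_key() -> str:
    """Illustrative: fail with a clear application-level error when the
    key is unset, instead of a cryptic SDK error at request time."""
    key = os.getenv("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; AI kernel generation is unavailable"
        )
    return key

# The client would then be created explicitly, e.g.:
#   client = anthropic.AsyncAnthropic(api_key=get_anthropic_api_key())
```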
src/libkernelbot/ai_generate.py:68
- The file name is hardcoded based on the task language, but this doesn't account for other supported languages. The Language enum in consts.py only has Python and CUDA, but the templates support additional languages like Triton, HIP, and CuteDSL (as seen in task.py line 149). If a task uses one of these other languages, the file name logic will fall back to "submission.cu" which may be incorrect. Consider adding proper handling for all supported languages.
file_name = "submission.py" if task.lang == Language.Python else "submission.cu"
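An explicit mapping would make the language handling this comment describes easy to extend. In the sketch below the language names mirror the review's list (Triton, HIP, CuteDSL), but the extension choices are assumptions, not confirmed by the PR:

```python
# Illustrative mapping from template language name to submission file name.
FILE_NAME_BY_LANG = {
    "Python": "submission.py",
    "Triton": "submission.py",   # Triton kernels are plain Python files
    "CuteDSL": "submission.py",  # assumption: CuteDSL is Python-embedded
    "CUDA": "submission.cu",
    "HIP": "submission.hip",     # assumption; some projects use .cpp
}

def submission_file_name(lang: str) -> str:
    # Fall back to the current behavior (.cu) for unknown languages,
    # but make the mapping explicit rather than a two-way ternary.
    return FILE_NAME_BY_LANG.get(lang, "submission.cu")
```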
src/kernelbot/env.py:44
- The ANTHROPIC_API_KEY environment variable is defined but not validated in the init_environment function. Unlike other critical environment variables like GITHUB_TOKEN and DISCORD_TOKEN that are validated on startup, this API key is only used when someone calls the AI endpoint. Consider whether this should be a required environment variable that's validated on startup, or if the endpoint should gracefully handle the case where it's not set with a clear error message.
env.ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
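A startup-time check along the lines suggested here might look like this sketch (the function name and `require_ai` flag are illustrative; the idea is to mirror how `GITHUB_TOKEN` and `DISCORD_TOKEN` are validated in `init_environment`, while still allowing deployments that don't enable AI submissions):

```python
import os

def check_ai_env(require_ai: bool = False) -> bool:
    """Illustrative: return True if AI submissions can be served; fail
    fast at startup when AI is required but the key is missing."""
    if os.getenv("ANTHROPIC_API_KEY"):
        return True
    if require_ai:
        raise RuntimeError(
            "ANTHROPIC_API_KEY must be set when AI submissions are enabled"
        )
    return False
```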
Vibe-coded prototype to see if we can support prompt-to-kernel submissions directly in the service and pay for everyone's AI credits.
We'd still likely need to update the site to support this new API endpoint; we probably don't need to change the popcorn-cli, though.
Summary
- `POST /ai/{leaderboard}/{gpu}/{mode}` API endpoint that accepts a natural language prompt, generates kernel code, and submits it through the existing evaluation pipeline
- `src/libkernelbot/ai_generate.py` module with `generate_kernel()` that builds a context-rich prompt from the leaderboard's description, templates, reference files, and test specs
- `anthropic` dependency and `ANTHROPIC_API_KEY` env var

Test plan
- `uv run ruff check` passes on changed files
- Existing tests pass (`uv run pytest tests/ -v`)
- Manual request to the endpoint returns the `generated_code` and a submission ID
- The submission is retrievable via `GET /user/submissions/{id}`