Allow users to submit a natural language prompt instead of code.
A new POST /ai/{leaderboard}/{gpu}/{mode} endpoint generates kernel
code via Claude API, then feeds it through the existing submission
pipeline for evaluation.
Pull request overview
This pull request adds AI-powered kernel code generation functionality to the KernelBot API. It integrates Claude AI (via the Anthropic API) to generate GPU kernel code from natural language prompts, then automatically submits the generated code through the existing evaluation pipeline.
Changes:
- Added new `POST /ai/{leaderboard}/{gpu}/{mode}` endpoint that accepts natural language prompts and generates kernel code via the Claude API
- Created `ai_generate.py` module with a `generate_kernel()` function that builds context-rich prompts from leaderboard metadata and generates code
- Added the `anthropic` Python dependency and `ANTHROPIC_API_KEY` environment variable
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Adds anthropic package v0.84.0 and its dependencies (jiter, distro, docstring-parser) to the lock file |
| src/libkernelbot/ai_generate.py | New module implementing kernel code generation using Claude AI with context from task specs, templates, and test cases |
| src/kernelbot/env.py | Adds ANTHROPIC_API_KEY environment variable configuration |
| src/kernelbot/api/main.py | Implements new AI submission endpoint that generates code and submits it through existing pipeline |
| pyproject.toml | Adds anthropic dependency to project dependencies |
Comments suppressed due to low confidence (11)
src/kernelbot/api/main.py:474
- The prompt content should be sanitized or validated before being sent to the Anthropic API. While the length is checked (max 10000 characters), there's no validation of the prompt content itself. Consider adding checks for potentially malicious content, prompt injection attempts, or rate limiting per user to prevent abuse of the AI API (which likely has associated costs).
prompt = payload.get("prompt")
if not prompt or not isinstance(prompt, str):
raise HTTPException(status_code=400, detail="Missing or invalid 'prompt' in request body")
if len(prompt) > 10000:
raise HTTPException(status_code=400, detail="Prompt too long (max 10000 characters)")
src/kernelbot/api/main.py:468
- The AI submission endpoint should log the request details for observability and debugging, similar to how the regular submission endpoint logs "Received submission request for..." at line 567-568. Add logging at the start of the AI submission handler to track leaderboard name, GPU type, submission mode, and user info for operational visibility.
try:
await simple_rate_limit()
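A minimal version of the suggested log line (the helper name and field layout are illustrative; the intent is to mirror the regular submission endpoint's "Received submission request for..." message):

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative: log the AI submission request up front so operators can
# correlate AI API costs and failures with specific leaderboards and users.
def log_ai_request(leaderboard: str, gpu: str, mode: str, user_name: str) -> str:
    msg = (
        f"Received AI submission request for leaderboard={leaderboard}, "
        f"gpu={gpu}, mode={mode}, user={user_name}"
    )
    logger.info(msg)
    return msg
```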
src/kernelbot/api/main.py:468
- The AI generation endpoint uses the existing simple_rate_limit function which allows 10 requests per second globally. However, AI API calls likely have higher costs and potentially different rate limits than regular submissions. Consider implementing a separate, more restrictive rate limiter specifically for AI generation requests to prevent unexpected API costs and to respect Anthropic's rate limits. You may also want to implement per-user rate limiting for AI requests.
await simple_rate_limit()
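A sketch of the stricter limiter this comment asks for, assuming a sliding-window budget per user (the class name and limits are illustrative; an in-memory dict only works for a single worker, so a real deployment would need shared state such as Redis):

```python
import time
from collections import defaultdict

class PerUserRateLimiter:
    """Illustrative per-user limiter for the AI endpoint: at most
    `max_calls` requests per `window` seconds per user, much stricter
    than the global 10 req/s limiter used by regular submissions."""

    def __init__(self, max_calls: int = 3, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        self._calls: dict[str, list[float]] = defaultdict(list)

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        # Drop timestamps outside the window, then check the budget.
        calls = [t for t in self._calls[user_id] if now - t < self.window]
        self._calls[user_id] = calls
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```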
src/libkernelbot/ai_generate.py:66
- If the regex fails to find a code block (match is None), the function returns the raw response text. However, if the AI response contains explanatory text around the code but no proper code fence, this could result in invalid code being submitted. Consider adding validation after extraction to ensure the extracted code is not empty and contains expected patterns (e.g., function definitions), or require the AI to always use code blocks by being more explicit in the system prompt.
# Extract code from a fenced code block if present
match = re.search(r"```(?:\w+)?\n(.*?)```", raw, re.DOTALL)
code = match.group(1).strip() if match else raw.strip()
src/kernelbot/api/main.py:503
- The error message logged contains the full exception details which may include sensitive information from the Anthropic API response. Consider sanitizing the error message before including it in the HTTPException detail, especially since this error is returned to the client. Log the full error server-side for debugging, but return a more generic error message to the client.
except Exception as e:
logger.error(f"AI generation failed: {e}")
raise HTTPException(status_code=502, detail=f"AI code generation failed: {e}") from e
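The suggested split between server-side and client-facing errors could be as small as this sketch (the helper name is illustrative): log the full exception for debugging, but return only a generic message plus the exception class name to the client.

```python
import logging

logger = logging.getLogger(__name__)

def safe_ai_error(exc: Exception) -> str:
    """Log full details server-side; return a sanitized client message."""
    logger.error("AI generation failed: %r", exc)
    return f"AI code generation failed ({type(exc).__name__})"
```

The endpoint would then pass `safe_ai_error(e)` as the `HTTPException` detail instead of interpolating `e` directly.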
src/libkernelbot/ai_generate.py:62
- The code assumes the response will have a text attribute at `response.content[0].text`, but this could fail if the response structure is different. The Anthropic API could return different content types or an empty content array. Add validation to check that the response has content and that the first content block is of type 'text' before accessing the text attribute.
raw = response.content[0].text
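A defensive accessor along the lines this comment suggests might look like the sketch below (the function name is illustrative; it checks for an empty content array and a non-text first block before touching `.text`):

```python
def first_text_block(response) -> str:
    """Illustrative: validate the Anthropic response shape before reading
    response.content[0].text, raising a clear error otherwise."""
    content = getattr(response, "content", None)
    if not content:
        raise ValueError("AI response contained no content blocks")
    block = content[0]
    if getattr(block, "type", None) != "text" or not hasattr(block, "text"):
        raise ValueError(
            f"Unexpected content block type: {getattr(block, 'type', None)!r}"
        )
    return block.text
```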
src/kernelbot/api/main.py:528
- The generated code is included in the response body, which could result in very large responses if the AI generates lengthy code. This could cause issues with response size limits or consume significant bandwidth. Consider whether returning the full generated code is necessary, or if it should be optional (e.g., via a query parameter) since the code is already stored in the submission and can be retrieved via the submission ID.
"generated_code": code,
src/libkernelbot/ai_generate.py:69
- The new AI kernel generation functionality lacks test coverage. Given that the codebase has comprehensive tests for other API endpoints (as seen in test_admin_api.py, test_submission.py, etc.), tests should be added for the new generate_kernel function and the run_ai_submission endpoint. Consider adding tests for: successful generation, handling of invalid prompts, missing API key scenarios, AI API failures, and various template/task configurations.
async def generate_kernel(
prompt: str,
task: LeaderboardTask,
description: str,
templates: dict[str, str],
) -> tuple[str, str]:
"""Generate kernel code from a natural language prompt using Claude.
Args:
prompt: The user's natural language description of the kernel to generate.
task: The LeaderboardTask containing file signatures, tests, and config.
description: The leaderboard's problem description.
templates: Template/starter code files keyed by language name.
Returns:
A tuple of (generated_code, file_name).
"""
# Build context from the task
system_parts = [
"You are an expert GPU kernel programmer. Generate code that solves the given problem.",
"Return ONLY the code inside a single code block. No explanation outside the code block.",
]
if description:
system_parts.append(f"## Problem Description\n{description}")
# Include template code so the AI knows the expected function signatures
if templates:
for lang, code in templates.items():
system_parts.append(f"## Template ({lang})\n```\n{code}\n```")
# Include reference/test files for additional context (skip submission placeholder)
for name, content in task.files.items():
if content != "@SUBMISSION@":
system_parts.append(f"## Reference file: {name}\n```\n{content}\n```")
# Include test specs so the AI knows input sizes / shapes
if task.tests:
system_parts.append(f"## Test cases\n{task.tests}")
system_prompt = "\n\n".join(system_parts)
client = anthropic.AsyncAnthropic()
response = await client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system=system_prompt,
messages=[{"role": "user", "content": prompt}],
)
raw = response.content[0].text
# Extract code from a fenced code block if present
match = re.search(r"```(?:\w+)?\n(.*?)```", raw, re.DOTALL)
code = match.group(1).strip() if match else raw.strip()
file_name = "submission.py" if task.lang == Language.Python else "submission.cu"
return code, file_name
src/libkernelbot/ai_generate.py:54
- The AsyncAnthropic client is created without passing the API key. While the Anthropic SDK will automatically read from the ANTHROPIC_API_KEY environment variable, the code should explicitly validate that this environment variable is set before making the API call, or pass it explicitly to provide clearer error messages. If the API key is not set, the user would receive a cryptic error from the Anthropic SDK instead of a clear validation error from the application.
client = anthropic.AsyncAnthropic()
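The explicit check suggested here could be a small helper run before constructing the client (the helper name is illustrative):

```python
import os

def get_anthropic_api_key() -> str:
    """Illustrative: fail with a clear application-level error when the
    key is unset, instead of a cryptic SDK error at request time."""
    key = os.getenv("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; AI kernel generation is unavailable"
        )
    return key

# The client would then be created explicitly, e.g.:
#   client = anthropic.AsyncAnthropic(api_key=get_anthropic_api_key())
```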
src/libkernelbot/ai_generate.py:68
- The file name is hardcoded based on the task language, but this doesn't account for other supported languages. The Language enum in consts.py only has Python and CUDA, but the templates support additional languages like Triton, HIP, and CuteDSL (as seen in task.py line 149). If a task uses one of these other languages, the file name logic will fall back to "submission.cu" which may be incorrect. Consider adding proper handling for all supported languages.
file_name = "submission.py" if task.lang == Language.Python else "submission.cu"
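An explicit mapping would make the language handling this comment describes easy to extend. In the sketch below the language names mirror the review's list (Triton, HIP, CuteDSL), but the extension choices are assumptions, not confirmed by the PR:

```python
# Illustrative mapping from template language name to submission file name.
FILE_NAME_BY_LANG = {
    "Python": "submission.py",
    "Triton": "submission.py",   # Triton kernels are plain Python files
    "CuteDSL": "submission.py",  # assumption: CuteDSL is Python-embedded
    "CUDA": "submission.cu",
    "HIP": "submission.hip",     # assumption; some projects use .cpp
}

def submission_file_name(lang: str) -> str:
    # Fall back to the current behavior (.cu) for unknown languages,
    # but make the mapping explicit rather than a two-way ternary.
    return FILE_NAME_BY_LANG.get(lang, "submission.cu")
```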
src/kernelbot/env.py:44
- The ANTHROPIC_API_KEY environment variable is defined but not validated in the init_environment function. Unlike other critical environment variables like GITHUB_TOKEN and DISCORD_TOKEN that are validated on startup, this API key is only used when someone calls the AI endpoint. Consider whether this should be a required environment variable that's validated on startup, or if the endpoint should gracefully handle the case where it's not set with a clear error message.
env.ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
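A startup-time check along the lines suggested here might look like this sketch (the function name and `require_ai` flag are illustrative; the idea is to mirror how `GITHUB_TOKEN` and `DISCORD_TOKEN` are validated in `init_environment`, while still allowing deployments that don't enable AI submissions):

```python
import os

def check_ai_env(require_ai: bool = False) -> bool:
    """Illustrative: return True if AI submissions can be served; fail
    fast at startup when AI is required but the key is missing."""
    if os.getenv("ANTHROPIC_API_KEY"):
        return True
    if require_ai:
        raise RuntimeError(
            "ANTHROPIC_API_KEY must be set when AI submissions are enabled"
        )
    return False
```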
Vibe-coded prototype to see if we can support prompt-to-kernel submissions directly in the service and pay for everyone's AI credits.
We'd still likely need to update the site to support this new API endpoint; we probably don't need to change the popcorn-cli, though.
Summary
- `POST /ai/{leaderboard}/{gpu}/{mode}` API endpoint that accepts a natural language prompt, generates kernel code, and submits it through the existing evaluation pipeline
- `src/libkernelbot/ai_generate.py` module with `generate_kernel()` that builds a context-rich prompt from the leaderboard's description, templates, reference files, and test specs
- `anthropic` dependency and `ANTHROPIC_API_KEY` env var

Test plan
- `uv run ruff check` passes on changed files
- Existing tests pass (`uv run pytest tests/ -v`)
- Manual request to the endpoint returns the `generated_code` and a submission ID
- The submission is retrievable via `GET /user/submissions/{id}`