Commit 01236e5

fix: refactor run to complete
1 parent 7a3b415 commit 01236e5

19 files changed: +4428 -393 lines changed

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+POSTGRES_HOST="localhost"
+POSTGRES_DATABASE="swebench"
+POSTGRES_USER="swebench"
+POSTGRES_PASSWORD="swebench"
+POSTGRES_PORT="5432"
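Reviewer sketch: the commit page does not show how these variables are consumed, so the following is only one plausible way to turn them into a PostgreSQL connection URI. The helper name `build_postgres_dsn` and the URI layout are assumptions for illustration; the defaults simply mirror the values added in the hunk above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote

# Defaults mirror the values added in the hunk above. Everything else here
# (helper name, URI layout) is a hypothetical illustration, not commit code.
DEFAULTS = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_DATABASE": "swebench",
    "POSTGRES_USER": "swebench",
    "POSTGRES_PASSWORD": "swebench",
    "POSTGRES_PORT": "5432",
}


def build_postgres_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a postgresql:// URI from POSTGRES_* variables."""
    env = os.environ if env is None else env
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    port = int(cfg["POSTGRES_PORT"])  # fail early on a non-numeric port
    # Percent-encode credentials so characters such as "@" or ":" stay valid.
    user = quote(cfg["POSTGRES_USER"], safe="")
    password = quote(cfg["POSTGRES_PASSWORD"], safe="")
    host = cfg["POSTGRES_HOST"]
    database = quote(cfg["POSTGRES_DATABASE"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.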
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+.env.db
Lines changed: 84 additions & 23 deletions
@@ -1,35 +1,96 @@
-# INSTRUCTIONS
+# SWE-bench Agent Runner

-1. Create a `.env` file in the `swebench_agent_run` directory (codegen-examples/examples/swebench_agent_run) and add your API keys.
+Tool for running and evaluating model fixes using SWE-bench.

-1. cd into the `codegen-examples/examples/swebench_agent_run` directory
+## Setup

-1. Create a `.venv` with `uv venv` and activate it with `source .venv/bin/activate`
+1. Using the `.env.template` reference, create a `.env` file in the project root and add your API keys:

-1. Install the dependencies with `uv pip install .`
+```env
+OPENAI_API_KEY=your_key_here
+MODAL_TOKEN_ID=your_token_id
+MODAL_TOKEN_SECRET=your_token_secret
+```

-1. Install the codegen dependencies with `uv add codegen`
+1. Create and activate a virtual environment:

-- Note: If you'd like to install the dependencies using the global environment, use `uv pip install -e ../../../` instead of `uv pip install .`. This will allow you to test modifications to the codegen codebase. You will need to run `uv pip install -e ../../../` each time you make changes to the codebase.
+```bash
+uv venv
+source .venv/bin/activate
+```

-6. Ensure that you have a modal account and profile set up. If you don't have one, you can create one at https://modal.com/
+1. Install the package:

-1. Activate the appropriate modal profile `python -m modal profile activate <profile_name>`
+```bash
+# Basic installation
+uv pip install -e .

-1. Launch the modal app with `python -m modal deploy --env=<env_name> entry_point.py`
+# With metrics support
+uv pip install -e ".[metrics]"

-1. Run the evaluation with `python -m run_eval` with the desired options:
+# With development tools
+uv pip install -e ".[dev]"

-- ```bash
-$ python run_eval.py --help
-Usage: run_eval.py [OPTIONS]
+# Install everything
+uv pip install -e ".[all]"
+```

-Options:
---use-existing-preds TEXT The run ID of the existing predictions to
-use.
---dataset [lite|full|verified] The dataset to use.
---length INTEGER The number of examples to process.
---instance-id TEXT The instance ID of the example to process.
---repo TEXT The repo to use.
---help Show this message and exit.
-```
+1. Set up Modal:
+
+- Create an account at https://modal.com/ if you don't have one
+- Activate your Modal profile:
+```bash
+python -m modal profile activate <profile_name>
+```
+- Deploy the Modal app:
+```bash
+python -m modal deploy entry_point.py
+```
+
+## Usage
+
+The package provides two main command-line tools:
+
+### Testing SWE CodeAgent
+
+Run the agent on a specific repository:
+
+```bash
+# Using the installed command
+swe-agent --repo pallets/flask --prompt "Analyze the URL routing system"
+
+# Options
+swe-agent --help
+Options:
+--agent-class [DefaultAgent|CustomAgent] Agent class to use
+--repo TEXT Repository to analyze (owner/repo)
+--prompt TEXT Prompt for the agent
+--help Show this message and exit
+```
+
+### Running SWE-Bench Eval
+
+Deploy modal app
+
+```bash
+./deploy.sh
+```
+
+Run evaluations on model fixes:
+
+```bash
+# Using the installed command
+swe-eval --dataset lite --length 10
+
+# Options
+swe-eval --help
+Options:
+--use-existing-preds TEXT Run ID of existing predictions
+--dataset [lite|full|verified] Dataset to use
+--length INTEGER Number of examples to process
+--instance-id TEXT Specific instance ID to process
+--repo TEXT Specific repo to evaluate
+--local Run evaluation locally
+--push-metrics Push results to metrics database (Requires additional database environment variables)
+--help Show this message and exit
+```
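Reviewer sketch: the new README setup step asks for three keys in `.env`, but nothing on this page shows them being checked before a run. A small pre-flight check along the following lines could surface a missing key early. The key names come from the README's `.env` example; the helper `missing_env_keys` is hypothetical and not part of this commit.

```python
import os
from typing import List, Mapping, Optional

# Key names are copied from the README's `.env` example. The helper itself
# is a hypothetical addition for illustration only.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_env_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or blank, in declaration order."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        raise SystemExit(f"Missing required settings: {', '.join(missing)}")
    print("All required settings are present.")
```

Treating blank values as missing catches the common case where a template placeholder was cleared but never filled in.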
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+import click
+import modal
+from codegen import CodeAgent, Codebase
+
+image = modal.Image.debian_slim(python_version="3.13").apt_install("git").pip_install("codegen")
+
+app = modal.App(
+    name="codegen-examples",
+    image=image,
+    secrets=[modal.Secret.from_dotenv()],
+)
+
+
+@app.function()
+def run_agent(repo_name: str, prompt: str) -> bool:
+    codebase = Codebase.from_repo(repo_full_name=repo_name)
+    agent = CodeAgent(codebase)
+    return agent.run(prompt=prompt)
+
+
+@click.command()
+@click.option(
+    "--repo",
+    type=str,
+    default="pallets/flask",
+    help="The repository to analyze (format: owner/repo)",
+)
+@click.option(
+    "--prompt",
+    type=str,
+    default="Tell me about the codebase and the files in it.",
+    help="The prompt to send to the agent",
+)
+def main(repo: str, prompt: str):
+    """Run a codegen agent on a GitHub repository."""
+    # Import agent class dynamically based on name
+
+    click.echo(f"Running on {repo}")
+    click.echo(f"Prompt: {prompt}")
+
+    try:
+        with app.run():
+            result = run_agent.remote(repo, prompt)
+            if result:
+                click.echo("✅ Analysis completed successfully:")
+                click.echo(result)
+            else:
+                click.echo("❌ Analysis failed")
+    except Exception as e:
+        click.echo(f"❌ Error: {str(e)}", err=True)
+        raise click.Abort()
+
+
+if __name__ == "__main__":
+    main()
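Reviewer sketch: the `--repo` option documents an `owner/repo` format, yet the value is passed to the remote function unchecked, so a malformed name would only fail after the Modal app has started. One possible local guard is shown below. The helper `parse_repo_full_name` and its character set are assumptions for illustration and do not appear in this commit.

```python
import re
from typing import Tuple

# Hypothetical helper: accepts letters, digits, "_", "." and "-" on each side
# of a single slash. This character set is an assumption, not a GitHub rule
# taken from the commit.
_REPO_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def parse_repo_full_name(value: str) -> Tuple[str, str]:
    """Split 'owner/repo' into (owner, repo), raising ValueError if malformed."""
    if not _REPO_PATTERN.fullmatch(value):
        raise ValueError(f"Expected 'owner/repo', got {value!r}")
    owner, name = value.split("/")
    return owner, name
```

Calling this at the top of `main` before entering `app.run()` would let the CLI reject bad input without any remote round trip.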
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+#! /bin/bash
+
+uv run modal deploy swebench_agent_run/modal_harness/entry_point.py
