|
1 | | -# INSTRUCTIONS |
| 1 | +# SWE-bench Agent Runner |
2 | 2 |
|
3 | | -1. Create a `.env` file in the `swebench_agent_run` directory (codegen-examples/examples/swebench_agent_run) and add your API keys. |
| 3 | +Tool for running and evaluating model fixes using SWE-bench. |
4 | 4 |
|
5 | | -1. cd into the `codegen-examples/examples/swebench_agent_run` directory |
| 5 | +## Setup |
6 | 6 |
|
7 | | -1. Create a `.venv` with `uv venv` and activate it with `source .venv/bin/activate` |
| 7 | +1. Using the `.env.template` reference, create a `.env` file in the project root and add your API keys: |
8 | 8 |
|
9 | | -1. Install the dependencies with `uv pip install .` |
| 9 | + ```env |
| 10 | + OPENAI_API_KEY=your_key_here |
| 11 | + MODAL_TOKEN_ID=your_token_id |
| 12 | + MODAL_TOKEN_SECRET=your_token_secret |
| 13 | + ``` |
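The tooling reads the `.env` file itself, but it can be handy to have the same variables exported into your current shell (for example before running Modal commands by hand). A minimal sketch using `set -a` auto-export, shown here in a throwaway temp directory with stand-in values:

```shell
# Demo in a temp dir: create a sample .env, then auto-export its contents.
# The key value is a stand-in; your real .env holds the actual secrets.
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'OPENAI_API_KEY=sk-example\n' > .env

set -a        # auto-export every variable assigned from here on
. ./.env      # read the KEY=value assignments
set +a        # stop auto-exporting

echo "loaded: ${OPENAI_API_KEY:+yes}"   # -> loaded: yes
```

This assumes `.env` contains simple `KEY=value` lines with no spaces around `=`, which matches the template above.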
10 | 14 |
|
11 | | -1. Install the codegen dependencies with `uv add codegen` |
| 15 | +1. Create and activate a virtual environment: |
12 | 16 |
|
13 | | -- Note: If you'd like to install the dependencies using the global environment, use `uv pip install -e ../../../` instead of `uv pip install .`. This will allow you to test modifications to the codegen codebase. You will need to run `uv pip install -e ../../../` each time you make changes to the codebase. |
| 17 | + ```bash |
| 18 | + uv venv |
| 19 | + source .venv/bin/activate |
| 20 | + ``` |
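If `uv` is not available yet, the stdlib `venv` module behaves the same way for this step; a quick demo (using a `demo-venv` directory so it doesn't clash with the project's `.venv`) that activation rewires `python` resolution:

```shell
# Create a virtual environment with the stdlib venv module (stand-in for
# `uv venv`, which produces the same layout) and activate it.
python3 -m venv demo-venv
. demo-venv/bin/activate          # on Windows: demo-venv\Scripts\activate

echo "VIRTUAL_ENV=$VIRTUAL_ENV"   # points at the demo-venv directory

deactivate                        # leave the environment again
```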
14 | 21 |
|
15 | | -6. Ensure that you have a modal account and profile set up. If you don't have one, you can create one at https://modal.com/ |
| 22 | +1. Install the package: |
16 | 23 |
|
17 | | -1. Activate the appropriate modal profile `python -m modal profile activate <profile_name>` |
| 24 | + ```bash |
| 25 | + # Basic installation |
| 26 | + uv pip install -e . |
18 | 27 |
|
19 | | -1. Launch the modal app with `python -m modal deploy --env=<env_name> entry_point.py` |
| 28 | + # With metrics support |
| 29 | + uv pip install -e ".[metrics]" |
20 | 30 |
|
21 | | -1. Run the evaluation with `python -m run_eval` with the desired options: |
| 31 | + # With development tools |
| 32 | + uv pip install -e ".[dev]" |
22 | 33 |
|
23 | | -- ```bash |
24 | | - $ python run_eval.py --help |
25 | | - Usage: run_eval.py [OPTIONS] |
| 34 | + # Install everything |
| 35 | + uv pip install -e ".[all]" |
| 36 | + ``` |
26 | 37 |
|
27 | | - Options: |
28 | | - --use-existing-preds TEXT The run ID of the existing predictions to |
29 | | - use. |
30 | | - --dataset [lite|full|verified] The dataset to use. |
31 | | - --length INTEGER The number of examples to process. |
32 | | - --instance-id TEXT The instance ID of the example to process. |
33 | | - --repo TEXT The repo to use. |
34 | | - --help Show this message and exit. |
35 | | - ``` |
| 38 | +1. Set up Modal: |
| 39 | + |
| 40 | + - Create an account at https://modal.com/ if you don't have one |
| 41 | + - Activate your Modal profile: |
| 42 | + ```bash |
| 43 | + python -m modal profile activate <profile_name> |
| 44 | + ``` |
| 45 | + - Deploy the Modal app: |
| 46 | + ```bash |
| 47 | + python -m modal deploy entry_point.py |
| 48 | + ``` |
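Deploys fail in confusing ways when the Modal credentials from step 1 are missing, so it can help to fail fast first. A sketch using POSIX `${VAR:?msg}` expansion, which aborts with `msg` when the variable is unset (the `check_modal_env` helper name and the demo values are illustrative, not part of the project):

```shell
# Sketch: abort with a clear message if the Modal credentials from .env
# are not set before attempting `python -m modal deploy`.
check_modal_env() {
  : "${MODAL_TOKEN_ID:?set MODAL_TOKEN_ID in .env first}"
  : "${MODAL_TOKEN_SECRET:?set MODAL_TOKEN_SECRET in .env first}"
  echo "Modal credentials present"
}

# Demo invocation with stand-in values scoped to this one call.
MODAL_TOKEN_ID=demo-id MODAL_TOKEN_SECRET=demo-secret check_modal_env
```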
| 49 | + |
| 50 | +## Usage |
| 51 | + |
| 52 | +The package provides two main command-line tools: |
| 53 | + |
| 54 | +### Testing SWE CodeAgent |
| 55 | + |
| 56 | +Run the agent on a specific repository: |
| 57 | + |
| 58 | +```bash |
| 59 | +# Using the installed command |
| 60 | +swe-agent --repo pallets/flask --prompt "Analyze the URL routing system" |
| 61 | +
|
| 62 | +# Options |
| 63 | +swe-agent --help |
| 64 | +Options: |
| 65 | + --agent-class [DefaultAgent|CustomAgent] Agent class to use |
| 66 | + --repo TEXT Repository to analyze (owner/repo) |
| 67 | + --prompt TEXT Prompt for the agent |
| 68 | + --help Show this message and exit |
| 69 | +``` |
| 70 | + |
| 71 | +### Running SWE-bench Eval |
| 72 | + |
| 73 | +Deploy the Modal app: |
| 74 | + |
| 75 | +```bash |
| 76 | +./deploy.sh |
| 77 | +``` |
| 78 | + |
| 79 | +Run evaluations on model fixes: |
| 80 | + |
| 81 | +```bash |
| 82 | +# Using the installed command |
| 83 | +swe-eval --dataset lite --length 10 |
| 84 | +
|
| 85 | +# Options |
| 86 | +swe-eval --help |
| 87 | +Options: |
| 88 | + --use-existing-preds TEXT Run ID of existing predictions |
| 89 | + --dataset [lite|full|verified] Dataset to use |
| 90 | + --length INTEGER Number of examples to process |
| 91 | + --instance-id TEXT Specific instance ID to process |
| 92 | + --repo TEXT Specific repo to evaluate |
| 93 | + --local Run evaluation locally |
| 94 | + --push-metrics Push results to the metrics database (requires additional database environment variables) |
| 95 | + --help Show this message and exit |
| 96 | +``` |
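The flags above compose into a two-pass workflow: generate predictions once, then re-score them with `--use-existing-preds`. A sketch of that flow, guarded so it is a no-op when `swe-eval` is not on `PATH` (`my-run-id` is a placeholder for the run ID printed by the first pass):

```shell
# Sketch: typical two-pass eval using the options above. Guarded so the
# snippet does nothing if the package from the Setup section isn't installed.
if command -v swe-eval >/dev/null 2>&1; then
  # First pass: generate predictions on a small slice of the lite dataset.
  swe-eval --dataset lite --length 5
  # Second pass: re-score an earlier run without regenerating predictions.
  swe-eval --use-existing-preds my-run-id --dataset lite --local
  status="ran"
else
  echo "swe-eval not installed; run the Setup steps first"
  status="skipped"
fi
```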