
Commit 316d173

add dynamic timeout
1 parent aeba623

12 files changed

Lines changed: 361 additions & 16 deletions


AGENTS.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@

# AGENTS.md

Guide for AI agents working on the CodeEvolve codebase.

## Project Overview

CodeEvolve is an LLM-driven evolutionary algorithm framework that discovers and optimizes code solutions. Programs evolve across distributed islands via LLM-generated SEARCH/REPLACE diffs, evaluated in sandboxed environments.

## Repository Layout

```
src/codeevolve/
    cli.py        # CLI entry point, arg parsing, island spawning
    runner.py     # Process management, signal handling, log daemon
    evolution.py  # Main evolutionary loop and component setup
    database.py   # Program dataclass, ProgramDatabase, MAP-Elites
    evaluator.py  # Sandboxed execution with time/memory limits
    scheduler.py  # Exploration rate schedulers
    lm/           # LLM interfaces (base.py, openai.py)
    islands/      # Topology (graph.py), sync (sync.py), migration (migration.py)
    prompt/       # Prompt templates (template.py), conversation builder (sampler.py)
    utils/        # Constants, checkpointing, config, diff parsing, logging, locking
tests/            # pytest suite mirroring src/ structure
configs/          # Example YAML configs (mock, qwen, gemini)
problems/         # Benchmark problems and problem_template
```

## Build & Test

- **Python**: >=3.13.5, managed via conda (`environment.yml`)
- **Install**: `conda env create -f environment.yml && conda activate codeevolve`
- **Run tests**: `pytest tests/` (use `pytest tests/ -v` for verbose)
- **Formatting**: `black` (line-length 100, target py313) and `isort` (profile "black")
- **Async tests**: use `@pytest.mark.asyncio` decorator; pytest-asyncio is configured in strict mode

## Code Conventions

- **Type hints**: use throughout; prefer `X | None` over `Optional[X]` for new code
- **Docstrings**: Google-style with Args/Returns/Raises sections
- **File headers**: each source file starts with the Apache-2.0 license block
- **Constants**: centralized in `utils/constants.py`; do not scatter magic values
- **Dataclasses**: `Program` is a dataclass in `database.py`; its `depth` field tracks evolutionary lineage depth
- **Config**: YAML-based; top-level keys (`EVAL_TIMEOUT`, `SYS_MSG`, etc.) and an `EVOLVE_CONFIG` dict for evolutionary parameters

## Key Patterns

### Evolutionary Loop (`evolution.py`)

The loop in `codeevolve_loop()` delegates to focused helper functions:

1. `select_parents()` — selection phase
2. `run_meta_prompting()` — optional prompt evolution
3. `generate_solution()` — LLM-driven code generation
4. `evaluate_and_store()` — sandboxed evaluation
5. `handle_migration()` — inter-island exchange

Component setup lives in `setup_codeevolve_components()` and the `_create_*` factory functions.

### Prompt Assembly (`prompt/`)

System messages are assembled in `sampler.py:build()` as:
`[user SYS_MSG] + [eval_budget] + [task template]`

Task templates live in `template.py` as composable sections (`_CORE_RULES`, `_MODIFICATION_FORMAT`, etc.) assembled by `get_*_task_template()` factory functions.

### Evaluator (`evaluator.py`)

The `Evaluator` class runs programs via subprocess with a timeout and optional memory monitoring. The `execute()` method takes a `Program` and returns `(returncode, output, warning, error, eval_metrics)`.

### Mock Models

Set `model_name` to `"MOCK"` in the config to use `MockOpenAILM` for testing without API calls. The mock returns identity SEARCH/REPLACE diffs.

## Testing Guidelines

- Mirror the module structure: `test_<module>.py`
- Use class-based test organization (`class TestFeatureName`)
- Prefix helpers with `_make_*` for fixture-like factory methods
- Test both success and error/edge cases
- For async functions, use `@pytest.mark.asyncio`
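For a concrete instance of these testing conventions, here is a minimal sketch of a test for the `compute_dynamic_timeout` helper this commit adds in `evolution.py` below (the file name and test names are hypothetical, not part of the commit):

```python
# Hypothetical tests/test_evolution.py excerpt; mirrors the conventions above.
from codeevolve.evolution import compute_dynamic_timeout


class TestComputeDynamicTimeout:
    def _make_kwargs(self, **overrides):
        """Fixture-like factory for the helper's keyword arguments."""
        kwargs = dict(max_timeout_s=60, min_timeout_s=5, max_depth=10)
        kwargs.update(overrides)
        return kwargs

    def test_depth_zero_gets_the_floor(self):
        assert compute_dynamic_timeout(depth=0, **self._make_kwargs()) == 5

    def test_depth_at_or_beyond_max_depth_gets_the_ceiling(self):
        assert compute_dynamic_timeout(depth=10, **self._make_kwargs()) == 60
        assert compute_dynamic_timeout(depth=99, **self._make_kwargs()) == 60

    def test_intermediate_depth_scales_linearly(self):
        # ratio = 4/10 -> int(5 + 55 * 0.4) == 27
        assert compute_dynamic_timeout(depth=4, **self._make_kwargs()) == 27
```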

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -44,7 +44,7 @@

 <div align="center">
 <p align="center">
-    <img src="https://img.shields.io/badge/version-v0.3-green" alt="v0.3"></a>
+    <img src="https://img.shields.io/badge/version-v0.3-green" alt="v0.3.1"></a>
     <a href="https://arxiv.org/abs/2510.14150"><img src="https://img.shields.io/badge/arxiv-2510.14150-red" alt="Arxiv"></a>
     <a href="https://github.com/inter-co/science-codeevolve/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache--2.0-blue.svg" alt="License"></a>
 </p>
```

configs/config_mock.yaml

Lines changed: 3 additions & 5 deletions
```diff
@@ -8,16 +8,14 @@ SYS_MSG: |
   1. **benchmark_ratio**: fitness of the solution (higher is better)
   2. **eval_time**: Execution time in seconds (keep reasonable)

-  COMPUTATIONAL BUDGET:
-  - **Time limit**: 60 seconds maximum execution time
-  - **Memory limit**: 1 GB
-
 # PROMPT-BLOCK-END

 CODEBASE_PATH: 'src/'
 INIT_FILE_DATA: {filename: 'init_program.py', language: 'python'}
 EVAL_FILE_NAME: 'evaluate.py'
-EVAL_TIMEOUT: 60
+EVAL_TIMEOUT: 60
+EVAL_MIN_TIMEOUT: 5
+DYNAMIC_EVAL_TIMEOUT: true

 MAX_MEM_BYTES: 1000000000
 MEM_CHECK_INTERVAL_S: 0.1
```
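Two things happen here: the static COMPUTATIONAL BUDGET block leaves `SYS_MSG` (the budget is now injected per child via `format_eval_budget`; see `evolution.py` and `sampler.py` below), and two new keys arrive. For intuition on the resulting ramp with `EVAL_TIMEOUT: 60` and `EVAL_MIN_TIMEOUT: 5`, here is a standalone sketch assuming a reference depth of 10 (the loop actually derives it from `max_chat_depth` or `num_epochs`):

```python
# Standalone copy of the formula added in src/codeevolve/evolution.py below.
def compute_dynamic_timeout(depth: int, max_timeout_s: int, min_timeout_s: int, max_depth: int) -> int:
    ratio = min(depth / max_depth, 1.0)
    return max(min_timeout_s, int(min_timeout_s + (max_timeout_s - min_timeout_s) * ratio))

# EVAL_TIMEOUT=60, EVAL_MIN_TIMEOUT=5, assumed max_depth=10:
for depth in (0, 2, 5, 10, 20):
    print(depth, compute_dynamic_timeout(depth, max_timeout_s=60, min_timeout_s=5, max_depth=10))
# 0 -> 5, 2 -> 16, 5 -> 32, 10 -> 60, 20 -> 60 (clamped at EVAL_TIMEOUT)
```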

src/codeevolve/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,4 +9,4 @@
 # This file initializes the main module for CodeEvolve.
 #
 # ===--------------------------------------------------------------------------------------===#
-__version__ = "0.3"
+__version__ = "0.3.1"
```

src/codeevolve/evaluator.py

Lines changed: 9 additions & 4 deletions
```diff
@@ -195,7 +195,7 @@ def __repr__(self):
         )

     def execute(
-        self, prog: Program
+        self, prog: Program, timeout_s: Optional[int] = None
     ) -> Tuple[int, Optional[str], Optional[str], Optional[str], Dict[str, Any]]:
         """Executes a program and updates it with execution results and metrics.

@@ -211,6 +211,8 @@ def execute(
             prog: Program object containing the code to execute. This object will be
                 modified in-place with execution results including returncode, error
                 messages, and evaluation metrics.
+            timeout_s: Optional timeout override in seconds. When provided,
+                this value is used instead of ``self.timeout_s``.

         Returns:
             returncode: Exit code of the program (0 for success)
@@ -219,7 +221,10 @@
             error: String with stderr
             eval_metrics: Dictionary of evaluation metrics if successful
         """
-        self.logger.info("Attempting to evaluate program...")
+        effective_timeout: int = timeout_s if timeout_s is not None else self.timeout_s
+        self.logger.info(
+            f"Attempting to evaluate program (depth={prog.depth}, timeout={effective_timeout}s)..."
+        )

         extension: str = LANGUAGE_TO_EXTENSION.get(prog.language, DEFAULT_EXTENSION)
         returncode: int = 1
@@ -297,7 +302,7 @@
             mem_monitor_daemon.start()

         try:
-            stdout, stderr = process.communicate(timeout=self.timeout_s)
+            stdout, stderr = process.communicate(timeout=effective_timeout)
             kill_flag.set()
             if mem_monitor_daemon is not None:
                 mem_monitor_daemon.join(timeout=1)
@@ -328,7 +333,7 @@
                 process.communicate(timeout=1)
             except Exception:
                 pass
-            error = f"TimeoutError: Evaluation time usage exceeded maximum time limit of {self.timeout_s} seconds."
+            error = f"TimeoutError: Evaluation time usage exceeded maximum time limit of {effective_timeout} seconds."

         except Exception as err:
             self.logger.error(f"Unexpected error during evaluation: {err}")
```
src/codeevolve/evolution.py

Lines changed: 68 additions & 3 deletions
```diff
@@ -27,11 +27,13 @@
 from codeevolve.islands.sync import GlobalSyncData
 from codeevolve.lm.openai import OpenAIEmbedding, OpenAIEnsemble
 from codeevolve.prompt.sampler import PromptSampler, format_prog_msg
+from codeevolve.prompt.template import format_eval_budget
 from codeevolve.scheduler import SCHEDULER_TYPES, ExplorationRateScheduler
 from codeevolve.utils.ckpt import load_ckpt, save_ckpt, save_run_metadata
 from codeevolve.utils.constants import (
     BEST_PROMPT_FILE,
     BEST_SOLUTION_FILE,
+    DEFAULT_EVAL_MIN_TIMEOUT_S,
     DEFAULT_EVAL_TIMEOUT_S,
     DEFAULT_EVOLVE_END_MARKER,
     DEFAULT_EVOLVE_START_MARKER,
@@ -47,6 +49,34 @@
 from codeevolve.utils.logging import get_elapsed_time, get_logger
 from codeevolve.utils.parsing import apply_diff

+# ---------------------------------------------------------------------------
+# Dynamic timeout
+# ---------------------------------------------------------------------------
+
+
+def compute_dynamic_timeout(
+    depth: int,
+    max_timeout_s: int,
+    min_timeout_s: int,
+    max_depth: int,
+) -> int:
+    """Computes the effective timeout for a program based on its depth.
+
+    Scales linearly from ``min_timeout_s`` at depth 0 up to
+    ``max_timeout_s`` at depth >= ``max_depth``.
+
+    Args:
+        depth: Evolutionary depth of the program.
+        max_timeout_s: Maximum timeout in seconds (upper bound).
+        min_timeout_s: Minimum timeout in seconds (floor at depth 0).
+        max_depth: Reference depth at which the full timeout is reached.
+
+    Returns:
+        Effective timeout in seconds, always in [min_timeout_s, max_timeout_s].
+    """
+    ratio: float = min(depth / max_depth, 1.0)
+    return max(min_timeout_s, int(min_timeout_s + (max_timeout_s - min_timeout_s) * ratio))
+
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
@@ -352,6 +382,7 @@ async def generate_solution(
     chat_depth: Optional[int],
     exploitation: bool,
     logger: logging.Logger,
+    eval_budget: Optional[str] = None,
 ) -> Tuple[Optional[Program], bool]:
     """
     Generate a new solution program by querying an LLM ensemble with structured context.
@@ -383,6 +414,8 @@
         evolve_state: State dictionary for tracking token usage and errors
         gen_init_pop: Whether generating initial population
         logger: Logger instance
+        eval_budget: Optional pre-formatted evaluation budget string to inject
+            into the system prompt so the LLM is aware of resource constraints.

     Returns:
         Tuple of (child_sol, success) where:
@@ -401,6 +434,7 @@
         inspirations=inspirations,
         max_chat_depth=chat_depth,
         exploitation=exploitation,
+        eval_budget=eval_budget,
     )
     logger.info(f"Chat consists of {len(messages)} messages (max_chat_depth = {chat_depth}).")

@@ -480,6 +514,7 @@ async def evaluate_and_store(
     evolve_state: Dict[str, Any],
     epoch: int,
     logger: logging.Logger,
+    timeout_s: Optional[int] = None,
 ) -> bool:
     """
     Evaluate a solution program and add it to the database if valid.
@@ -510,13 +545,14 @@
         evolve_state: State dictionary for tracking token usage and errors
         epoch: Current epoch number
         logger: Logger instance
+        timeout_s: Optional timeout override passed to the evaluator.

     Returns:
         Boolean indicating whether this child became the new global best solution
     """
     ## EVALUATING CHILD PROGRAM
     child_sol.returncode, _, _, child_sol.error, child_sol.eval_metrics = evaluator.execute(
-        child_sol
+        child_sol, timeout_s=timeout_s
     )
     child_sol.fitness = child_sol.eval_metrics.get(evolve_config["fitness_key"], 0)

@@ -636,6 +672,7 @@ async def codeevolve_loop(
     evolve_state: Dict[str, Any],
     init_sol: Program,
     init_prompt: Program,
+    config: Dict[str, Any],
     evolve_config: Dict[str, Any],
     args: Dict[str, Any],
     isl_data: IslandCommunicationData,
@@ -670,8 +707,8 @@
         evolve_state: Dictionary tracking algorithm state.
         init_sol: Initial solution program.
         init_prompt: Initial prompt program.
-        config: Full configuration dictionary.
-        evolve_config: Evolution-specific configuration.
+        config: Full configuration dictionary (top-level YAML).
+        evolve_config: Evolution-specific configuration subset.
         args: Command-line arguments.
         isl_data: Island communication data.
         global_data: Shared data structures.
@@ -715,6 +752,19 @@ def _do_checkpoint(epoch_num: int) -> None:
     exploration_rate: float = (
         scheduler.exploration_rate if scheduler is not None else evolve_config["exploration_rate"]
     )
+    dynamic_timeout: bool = config.get("DYNAMIC_EVAL_TIMEOUT", False)
+    min_timeout_s: int = config.get("EVAL_MIN_TIMEOUT", DEFAULT_EVAL_MIN_TIMEOUT_S)
+    max_depth: Optional[int] = None
+    if dynamic_timeout:
+        raw_mcd: Optional[int] = evolve_config.get("max_chat_depth")
+        if raw_mcd is not None and raw_mcd > 0:
+            max_depth = raw_mcd
+        else:
+            max_depth = evolve_config["num_epochs"]
+
+    eval_budget: str = format_eval_budget(
+        timeout_s=evaluator.timeout_s, max_mem_b=evaluator.max_mem_b
+    )
     epoch: int = start_epoch + 1

     for epoch in range(start_epoch + 1, evolve_config["num_epochs"] + 1):
@@ -789,6 +839,18 @@ def _do_checkpoint(epoch_num: int) -> None:
         )
         chat_depth: Optional[int] = evolve_config.get("max_chat_depth", None) if exploitation else 0

+        child_timeout: Optional[int] = None
+        if dynamic_timeout and max_depth is not None:
+            child_timeout = compute_dynamic_timeout(
+                depth=parent_sol.depth + 1,
+                max_timeout_s=evaluator.timeout_s,
+                min_timeout_s=min_timeout_s,
+                max_depth=max_depth,
+            )
+            eval_budget = format_eval_budget(
+                timeout_s=child_timeout, max_mem_b=evaluator.max_mem_b
+            )
+
         child_sol, evolve_success = await generate_solution(
             ensemble=ensemble,
             prompt_sampler=prompt_sampler,
@@ -804,6 +866,7 @@ def _do_checkpoint(epoch_num: int) -> None:
             chat_depth=chat_depth,
             exploitation=exploitation,
             logger=logger,
+            eval_budget=eval_budget,
         )

         # EVALUATE AND ADD TO DB
@@ -820,6 +883,7 @@ def _do_checkpoint(epoch_num: int) -> None:
             evolve_state=evolve_state,
             epoch=epoch,
             logger=logger,
+            timeout_s=child_timeout,
         )

         # MIGRATION
@@ -1377,6 +1441,7 @@ async def codeevolve(
         components.evolve_state,
         components.init_sol,
         components.init_prompt,
+        components.config,
         components.evolve_config,
         args,
         isl_data,
```
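The reference-depth fallback in the `@@ -715,6 +752,19` hunk is easy to misread, so here it is restated as a standalone function (an editorial sketch mirroring the hunk, not repo code): a positive `max_chat_depth` wins, while `0` or a missing key falls back to `num_epochs`, and the feature being off yields no reference depth at all.

```python
# Sketch of the reference-depth selection added above; values illustrative.
from typing import Optional


def pick_max_depth(evolve_config: dict, dynamic_timeout: bool) -> Optional[int]:
    if not dynamic_timeout:
        return None
    raw_mcd: Optional[int] = evolve_config.get("max_chat_depth")
    if raw_mcd is not None and raw_mcd > 0:
        return raw_mcd
    return evolve_config["num_epochs"]


assert pick_max_depth({"max_chat_depth": 8, "num_epochs": 100}, True) == 8
assert pick_max_depth({"max_chat_depth": 0, "num_epochs": 100}, True) == 100  # 0 treated as unset
assert pick_max_depth({"num_epochs": 100}, False) is None                     # feature disabled
```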

src/codeevolve/prompt/sampler.py

Lines changed: 11 additions & 1 deletion
```diff
@@ -195,6 +195,7 @@ def build(
         inspirations: Optional[List[Program]] = None,
         max_chat_depth: Optional[int] = None,
         exploitation: bool = False,
+        eval_budget: Optional[str] = None,
     ) -> List[Dict[str, str]]:
         """Builds a conversation prompt from program lineage and inspirations.

@@ -203,6 +204,9 @@
         can be used to generate the next program iteration. It optionally
         includes inspiration programs and limits conversation depth.

+        The system message is assembled as:
+        [user's SYS_MSG] + [eval_budget] + [task template]
+
         Args:
             prompt: The system prompt program defining the task and instructions.
             prog: The current program to build conversation history from.
@@ -211,6 +215,9 @@
             max_chat_depth: Maximum depth to trace back in the conversation history.
                 If None, traces back to the root program.
             exploitation: If True, use exploitation templates; if False, use exploration templates.
+            eval_budget: Optional pre-formatted evaluation budget string (from
+                ``format_eval_budget``) to inject between the user's system
+                prompt and the task template.

         Returns:
             A list of message dictionaries following the OpenAI chat format,
@@ -240,7 +247,10 @@
                     "content": EVOLVE_PROG_TEMPLATE.format(program=db.programs[curr_pid].prog_msg),
                 }
             )
-        messages.appendleft({"role": "system", "content": prompt.code})
+        sys_content: str = prompt.code
+        if eval_budget:
+            sys_content += "\n" + eval_budget
+        messages.appendleft({"role": "system", "content": sys_content})

         task_template: str
         if inspirations and len(inspirations):
```