
Commit 316d173

add dynamic timeout
1 parent aeba623

12 files changed

Lines changed: 361 additions & 16 deletions


AGENTS.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@

# AGENTS.md

Guide for AI agents working on the CodeEvolve codebase.

## Project Overview

CodeEvolve is an LLM-driven evolutionary algorithm framework that discovers and optimizes code solutions. Programs evolve across distributed islands via LLM-generated SEARCH/REPLACE diffs, evaluated in sandboxed environments.

## Repository Layout

```
src/codeevolve/
    cli.py        # CLI entry point, arg parsing, island spawning
    runner.py     # Process management, signal handling, log daemon
    evolution.py  # Main evolutionary loop and component setup
    database.py   # Program dataclass, ProgramDatabase, MAP-Elites
    evaluator.py  # Sandboxed execution with time/memory limits
    scheduler.py  # Exploration rate schedulers
    lm/           # LLM interfaces (base.py, openai.py)
    islands/      # Topology (graph.py), sync (sync.py), migration (migration.py)
    prompt/       # Prompt templates (template.py), conversation builder (sampler.py)
    utils/        # Constants, checkpointing, config, diff parsing, logging, locking
tests/            # pytest suite mirroring src/ structure
configs/          # Example YAML configs (mock, qwen, gemini)
problems/         # Benchmark problems and problem_template
```

## Build & Test

- **Python**: >=3.13.5, managed via conda (`environment.yml`)
- **Install**: `conda env create -f environment.yml && conda activate codeevolve`
- **Run tests**: `pytest tests/` (use `pytest tests/ -v` for verbose)
- **Formatting**: `black` (line-length 100, target py313) and `isort` (profile "black")
- **Async tests**: use `@pytest.mark.asyncio` decorator; pytest-asyncio is configured in strict mode

## Code Conventions

- **Type hints**: use throughout; prefer `X | None` over `Optional[X]` for new code
- **Docstrings**: Google-style with Args/Returns/Raises sections
- **File headers**: each source file starts with the Apache-2.0 license block
- **Constants**: centralized in `utils/constants.py`; do not scatter magic values
- **Dataclasses**: `Program` is a dataclass in `database.py`; its `depth` field tracks evolutionary lineage depth
- **Config**: YAML-based; top-level keys (`EVAL_TIMEOUT`, `SYS_MSG`, etc.) and an `EVOLVE_CONFIG` dict for evolutionary parameters

## Key Patterns

### Evolutionary Loop (`evolution.py`)

The loop in `codeevolve_loop()` delegates to focused helper functions:

1. `select_parents()` — selection phase
2. `run_meta_prompting()` — optional prompt evolution
3. `generate_solution()` — LLM-driven code generation
4. `evaluate_and_store()` — sandboxed evaluation
5. `handle_migration()` — inter-island exchange

Component setup lives in `setup_codeevolve_components()` and the `_create_*` factory functions.

### Prompt Assembly (`prompt/`)

System messages are assembled in `sampler.py:build()` as:
`[user SYS_MSG] + [eval_budget] + [task template]`

Task templates live in `template.py` as composable sections (`_CORE_RULES`, `_MODIFICATION_FORMAT`, etc.) assembled by `get_*_task_template()` factory functions.

### Evaluator (`evaluator.py`)

The `Evaluator` class runs programs via subprocess with a timeout and optional memory monitoring. The `execute()` method takes a `Program` and returns `(returncode, output, warning, error, eval_metrics)`.

### Mock Models

Set `model_name` to `"MOCK"` in the config to use `MockOpenAILM` for testing without API calls. The mock returns identity SEARCH/REPLACE diffs.

## Testing Guidelines

- Mirror the module structure: `test_<module>.py`
- Use class-based test organization (`class TestFeatureName`)
- Prefix helpers with `_make_*` for fixture-like factory methods
- Test both success and error/edge cases
- For async functions, use `@pytest.mark.asyncio`
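For a concrete instance of these testing conventions, here is a minimal sketch of a test for the `compute_dynamic_timeout` helper this commit adds in `evolution.py` below (the file name and test names are hypothetical, not part of the commit):

```python
# Hypothetical tests/test_evolution.py excerpt; mirrors the conventions above.
from codeevolve.evolution import compute_dynamic_timeout


class TestComputeDynamicTimeout:
    def _make_kwargs(self, **overrides):
        """Fixture-like factory for the helper's keyword arguments."""
        kwargs = dict(max_timeout_s=60, min_timeout_s=5, max_depth=10)
        kwargs.update(overrides)
        return kwargs

    def test_depth_zero_gets_the_floor(self):
        assert compute_dynamic_timeout(depth=0, **self._make_kwargs()) == 5

    def test_depth_at_or_beyond_max_depth_gets_the_ceiling(self):
        assert compute_dynamic_timeout(depth=10, **self._make_kwargs()) == 60
        assert compute_dynamic_timeout(depth=99, **self._make_kwargs()) == 60

    def test_intermediate_depth_scales_linearly(self):
        # ratio = 4/10 -> int(5 + 55 * 0.4) == 27
        assert compute_dynamic_timeout(depth=4, **self._make_kwargs()) == 27
```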

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -44,7 +44,7 @@

 <div align="center">
 <p align="center">
-    <img src="https://img.shields.io/badge/version-v0.3-green" alt="v0.3"></a>
+    <img src="https://img.shields.io/badge/version-v0.3-green" alt="v0.3.1"></a>
     <a href="https://arxiv.org/abs/2510.14150"><img src="https://img.shields.io/badge/arxiv-2510.14150-red" alt="Arxiv"></a>
     <a href="https://github.com/inter-co/science-codeevolve/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache--2.0-blue.svg" alt="License"></a>
 </p>
```

configs/config_mock.yaml

Lines changed: 3 additions & 5 deletions
```diff
@@ -8,16 +8,14 @@ SYS_MSG: |
   1. **benchmark_ratio**: fitness of the solution (higher is better)
   2. **eval_time**: Execution time in seconds (keep reasonable)

-  COMPUTATIONAL BUDGET:
-  - **Time limit**: 60 seconds maximum execution time
-  - **Memory limit**: 1 GB
-
 # PROMPT-BLOCK-END

 CODEBASE_PATH: 'src/'
 INIT_FILE_DATA: {filename: 'init_program.py', language: 'python'}
 EVAL_FILE_NAME: 'evaluate.py'
-EVAL_TIMEOUT: 60
+EVAL_TIMEOUT: 60
+EVAL_MIN_TIMEOUT: 5
+DYNAMIC_EVAL_TIMEOUT: true

 MAX_MEM_BYTES: 1000000000
 MEM_CHECK_INTERVAL_S: 0.1
```
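Two things happen here: the static COMPUTATIONAL BUDGET block leaves `SYS_MSG` (the budget is now injected per child via `format_eval_budget`; see `evolution.py` and `sampler.py` below), and two new keys arrive. For intuition on the resulting ramp with `EVAL_TIMEOUT: 60` and `EVAL_MIN_TIMEOUT: 5`, here is a standalone sketch assuming a reference depth of 10 (the loop actually derives it from `max_chat_depth` or `num_epochs`):

```python
# Standalone copy of the formula added in src/codeevolve/evolution.py below.
def compute_dynamic_timeout(depth: int, max_timeout_s: int, min_timeout_s: int, max_depth: int) -> int:
    ratio = min(depth / max_depth, 1.0)
    return max(min_timeout_s, int(min_timeout_s + (max_timeout_s - min_timeout_s) * ratio))

# EVAL_TIMEOUT=60, EVAL_MIN_TIMEOUT=5, assumed max_depth=10:
for depth in (0, 2, 5, 10, 20):
    print(depth, compute_dynamic_timeout(depth, max_timeout_s=60, min_timeout_s=5, max_depth=10))
# 0 -> 5, 2 -> 16, 5 -> 32, 10 -> 60, 20 -> 60 (clamped at EVAL_TIMEOUT)
```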

src/codeevolve/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,4 +9,4 @@
 # This file initializes the main module for CodeEvolve.
 #
 # ===--------------------------------------------------------------------------------------===#
-__version__ = "0.3"
+__version__ = "0.3.1"
```

src/codeevolve/evaluator.py

Lines changed: 9 additions & 4 deletions
```diff
@@ -195,7 +195,7 @@ def __repr__(self):
         )

     def execute(
-        self, prog: Program
+        self, prog: Program, timeout_s: Optional[int] = None
     ) -> Tuple[int, Optional[str], Optional[str], Optional[str], Dict[str, Any]]:
         """Executes a program and updates it with execution results and metrics.

@@ -211,6 +211,8 @@ def execute(
             prog: Program object containing the code to execute. This object will be
                 modified in-place with execution results including returncode, error
                 messages, and evaluation metrics.
+            timeout_s: Optional timeout override in seconds. When provided,
+                this value is used instead of ``self.timeout_s``.

         Returns:
             returncode: Exit code of the program (0 for success)
@@ -219,7 +221,10 @@
             error: String with stderr
             eval_metrics: Dictionary of evaluation metrics if successful
         """
-        self.logger.info("Attempting to evaluate program...")
+        effective_timeout: int = timeout_s if timeout_s is not None else self.timeout_s
+        self.logger.info(
+            f"Attempting to evaluate program (depth={prog.depth}, timeout={effective_timeout}s)..."
+        )

         extension: str = LANGUAGE_TO_EXTENSION.get(prog.language, DEFAULT_EXTENSION)
         returncode: int = 1
@@ -297,7 +302,7 @@
             mem_monitor_daemon.start()

         try:
-            stdout, stderr = process.communicate(timeout=self.timeout_s)
+            stdout, stderr = process.communicate(timeout=effective_timeout)
             kill_flag.set()
             if mem_monitor_daemon is not None:
                 mem_monitor_daemon.join(timeout=1)
@@ -328,7 +333,7 @@
                 process.communicate(timeout=1)
             except Exception:
                 pass
-            error = f"TimeoutError: Evaluation time usage exceeded maximum time limit of {self.timeout_s} seconds."
+            error = f"TimeoutError: Evaluation time usage exceeded maximum time limit of {effective_timeout} seconds."

         except Exception as err:
             self.logger.error(f"Unexpected error during evaluation: {err}")
```
src/codeevolve/evolution.py

Lines changed: 68 additions & 3 deletions
```diff
@@ -27,11 +27,13 @@
 from codeevolve.islands.sync import GlobalSyncData
 from codeevolve.lm.openai import OpenAIEmbedding, OpenAIEnsemble
 from codeevolve.prompt.sampler import PromptSampler, format_prog_msg
+from codeevolve.prompt.template import format_eval_budget
 from codeevolve.scheduler import SCHEDULER_TYPES, ExplorationRateScheduler
 from codeevolve.utils.ckpt import load_ckpt, save_ckpt, save_run_metadata
 from codeevolve.utils.constants import (
     BEST_PROMPT_FILE,
     BEST_SOLUTION_FILE,
+    DEFAULT_EVAL_MIN_TIMEOUT_S,
     DEFAULT_EVAL_TIMEOUT_S,
     DEFAULT_EVOLVE_END_MARKER,
     DEFAULT_EVOLVE_START_MARKER,
@@ -47,6 +49,34 @@
 from codeevolve.utils.logging import get_elapsed_time, get_logger
 from codeevolve.utils.parsing import apply_diff

+# ---------------------------------------------------------------------------
+# Dynamic timeout
+# ---------------------------------------------------------------------------
+
+
+def compute_dynamic_timeout(
+    depth: int,
+    max_timeout_s: int,
+    min_timeout_s: int,
+    max_depth: int,
+) -> int:
+    """Computes the effective timeout for a program based on its depth.
+
+    Scales linearly from ``min_timeout_s`` at depth 0 up to
+    ``max_timeout_s`` at depth >= ``max_depth``.
+
+    Args:
+        depth: Evolutionary depth of the program.
+        max_timeout_s: Maximum timeout in seconds (upper bound).
+        min_timeout_s: Minimum timeout in seconds (floor at depth 0).
+        max_depth: Reference depth at which the full timeout is reached.
+
+    Returns:
+        Effective timeout in seconds, always in [min_timeout_s, max_timeout_s].
+    """
+    ratio: float = min(depth / max_depth, 1.0)
+    return max(min_timeout_s, int(min_timeout_s + (max_timeout_s - min_timeout_s) * ratio))
+
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
@@ -352,6 +382,7 @@ async def generate_solution(
     chat_depth: Optional[int],
     exploitation: bool,
     logger: logging.Logger,
+    eval_budget: Optional[str] = None,
 ) -> Tuple[Optional[Program], bool]:
     """
     Generate a new solution program by querying an LLM ensemble with structured context.
@@ -383,6 +414,8 @@
         evolve_state: State dictionary for tracking token usage and errors
         gen_init_pop: Whether generating initial population
         logger: Logger instance
+        eval_budget: Optional pre-formatted evaluation budget string to inject
+            into the system prompt so the LLM is aware of resource constraints.

     Returns:
         Tuple of (child_sol, success) where:
@@ -401,6 +434,7 @@
         inspirations=inspirations,
         max_chat_depth=chat_depth,
         exploitation=exploitation,
+        eval_budget=eval_budget,
     )
     logger.info(f"Chat consists of {len(messages)} messages (max_chat_depth = {chat_depth}).")

@@ -480,6 +514,7 @@ async def evaluate_and_store(
     evolve_state: Dict[str, Any],
     epoch: int,
     logger: logging.Logger,
+    timeout_s: Optional[int] = None,
 ) -> bool:
     """
     Evaluate a solution program and add it to the database if valid.
@@ -510,13 +545,14 @@
         evolve_state: State dictionary for tracking token usage and errors
         epoch: Current epoch number
         logger: Logger instance
+        timeout_s: Optional timeout override passed to the evaluator.

     Returns:
         Boolean indicating whether this child became the new global best solution
     """
     ## EVALUATING CHILD PROGRAM
     child_sol.returncode, _, _, child_sol.error, child_sol.eval_metrics = evaluator.execute(
-        child_sol
+        child_sol, timeout_s=timeout_s
     )
     child_sol.fitness = child_sol.eval_metrics.get(evolve_config["fitness_key"], 0)

@@ -636,6 +672,7 @@ async def codeevolve_loop(
     evolve_state: Dict[str, Any],
     init_sol: Program,
     init_prompt: Program,
+    config: Dict[str, Any],
     evolve_config: Dict[str, Any],
     args: Dict[str, Any],
     isl_data: IslandCommunicationData,
@@ -670,8 +707,8 @@
         evolve_state: Dictionary tracking algorithm state.
         init_sol: Initial solution program.
         init_prompt: Initial prompt program.
-        config: Full configuration dictionary.
-        evolve_config: Evolution-specific configuration.
+        config: Full configuration dictionary (top-level YAML).
+        evolve_config: Evolution-specific configuration subset.
         args: Command-line arguments.
         isl_data: Island communication data.
         global_data: Shared data structures.
@@ -715,6 +752,19 @@ def _do_checkpoint(epoch_num: int) -> None:
     exploration_rate: float = (
         scheduler.exploration_rate if scheduler is not None else evolve_config["exploration_rate"]
     )
+    dynamic_timeout: bool = config.get("DYNAMIC_EVAL_TIMEOUT", False)
+    min_timeout_s: int = config.get("EVAL_MIN_TIMEOUT", DEFAULT_EVAL_MIN_TIMEOUT_S)
+    max_depth: Optional[int] = None
+    if dynamic_timeout:
+        raw_mcd: Optional[int] = evolve_config.get("max_chat_depth")
+        if raw_mcd is not None and raw_mcd > 0:
+            max_depth = raw_mcd
+        else:
+            max_depth = evolve_config["num_epochs"]
+
+    eval_budget: str = format_eval_budget(
+        timeout_s=evaluator.timeout_s, max_mem_b=evaluator.max_mem_b
+    )
     epoch: int = start_epoch + 1

     for epoch in range(start_epoch + 1, evolve_config["num_epochs"] + 1):
@@ -789,6 +839,18 @@ def _do_checkpoint(epoch_num: int) -> None:
         )
         chat_depth: Optional[int] = evolve_config.get("max_chat_depth", None) if exploitation else 0

+        child_timeout: Optional[int] = None
+        if dynamic_timeout and max_depth is not None:
+            child_timeout = compute_dynamic_timeout(
+                depth=parent_sol.depth + 1,
+                max_timeout_s=evaluator.timeout_s,
+                min_timeout_s=min_timeout_s,
+                max_depth=max_depth,
+            )
+            eval_budget = format_eval_budget(
+                timeout_s=child_timeout, max_mem_b=evaluator.max_mem_b
+            )
+
         child_sol, evolve_success = await generate_solution(
             ensemble=ensemble,
             prompt_sampler=prompt_sampler,
@@ -804,6 +866,7 @@ def _do_checkpoint(epoch_num: int) -> None:
             chat_depth=chat_depth,
             exploitation=exploitation,
             logger=logger,
+            eval_budget=eval_budget,
         )

         # EVALUATE AND ADD TO DB
@@ -820,6 +883,7 @@ def _do_checkpoint(epoch_num: int) -> None:
             evolve_state=evolve_state,
             epoch=epoch,
             logger=logger,
+            timeout_s=child_timeout,
         )

         # MIGRATION
@@ -1377,6 +1441,7 @@ async def codeevolve(
         components.evolve_state,
         components.init_sol,
         components.init_prompt,
+        components.config,
         components.evolve_config,
         args,
         isl_data,
```
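The reference-depth fallback in the `@@ -715,6 +752,19` hunk is easy to misread, so here it is restated as a standalone function (an editorial sketch mirroring the hunk, not repo code): a positive `max_chat_depth` wins, while `0` or a missing key falls back to `num_epochs`, and the feature being off yields no reference depth at all.

```python
# Sketch of the reference-depth selection added above; values illustrative.
from typing import Optional


def pick_max_depth(evolve_config: dict, dynamic_timeout: bool) -> Optional[int]:
    if not dynamic_timeout:
        return None
    raw_mcd: Optional[int] = evolve_config.get("max_chat_depth")
    if raw_mcd is not None and raw_mcd > 0:
        return raw_mcd
    return evolve_config["num_epochs"]


assert pick_max_depth({"max_chat_depth": 8, "num_epochs": 100}, True) == 8
assert pick_max_depth({"max_chat_depth": 0, "num_epochs": 100}, True) == 100  # 0 treated as unset
assert pick_max_depth({"num_epochs": 100}, False) is None                     # feature disabled
```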

src/codeevolve/prompt/sampler.py

Lines changed: 11 additions & 1 deletion
```diff
@@ -195,6 +195,7 @@ def build(
         inspirations: Optional[List[Program]] = None,
         max_chat_depth: Optional[int] = None,
         exploitation: bool = False,
+        eval_budget: Optional[str] = None,
     ) -> List[Dict[str, str]]:
         """Builds a conversation prompt from program lineage and inspirations.

@@ -203,6 +204,9 @@
         can be used to generate the next program iteration. It optionally
         includes inspiration programs and limits conversation depth.

+        The system message is assembled as:
+        [user's SYS_MSG] + [eval_budget] + [task template]
+
         Args:
             prompt: The system prompt program defining the task and instructions.
             prog: The current program to build conversation history from.
@@ -211,6 +215,9 @@
             max_chat_depth: Maximum depth to trace back in the conversation history.
                 If None, traces back to the root program.
             exploitation: If True, use exploitation templates; if False, use exploration templates.
+            eval_budget: Optional pre-formatted evaluation budget string (from
+                ``format_eval_budget``) to inject between the user's system
+                prompt and the task template.

         Returns:
             A list of message dictionaries following the OpenAI chat format,
@@ -240,7 +247,10 @@
                     "content": EVOLVE_PROG_TEMPLATE.format(program=db.programs[curr_pid].prog_msg),
                 }
             )
-        messages.appendleft({"role": "system", "content": prompt.code})
+        sys_content: str = prompt.code
+        if eval_budget:
+            sys_content += "\n" + eval_budget
+        messages.appendleft({"role": "system", "content": sys_content})

         task_template: str
         if inspirations and len(inspirations):
```