feat: multi-language orchestration loop with per-language config discovery#1859

Open
mashraf-222 wants to merge 21 commits into main from fix/set-language-singleton-early

Conversation

mashraf-222 (Contributor) commented Mar 18, 2026

Summary

Adds multi-language orchestration to the Codeflash CLI. The optimizer now discovers all language configs in a project (Python, Java, JS/TS), and runs a full optimization pass for each language automatically.

This PR is part of a 3-repo change set:

  • codeflash (this PR) — Multi-language orchestration loop, config discovery, and language-agnostic git diff
  • codeflash-cc-plugin — Unified hook that triggers a single codeflash --subagent call
  • optimize-me — Mixed-language test fixture (Python + Java + JS/TS) for E2E validation

The CLI orchestration loop is the core — the cc-plugin delegates to it, and optimize-me validates it end-to-end.

What Changed

Config Discovery

  • find_all_config_files() walks CWD→root, discovers pyproject.toml (Python), codeflash.toml (Java), package.json (JS/TS). Closest config wins per language.
  • New LanguageConfig dataclass holds config dict, path, and language enum per discovered config.
  • Extracted normalize_toml_config() shared helper for consistent config normalization.
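
The discovery rule described above (walk from the working directory toward the root, keep the closest config per language) can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `find_all_config_files_sketch`, the `Language` enum values, and the empty `config` dict are assumptions, and the sketch only checks that a file exists rather than parsing it or looking for a codeflash section.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class Language(Enum):
    # Hypothetical enum values; the real Language enum may differ.
    PYTHON = "python"
    JAVA = "java"
    JAVASCRIPT = "javascript"


# Config file name -> language, as listed in the PR description.
CONFIG_FILE_LANGUAGES = {
    "pyproject.toml": Language.PYTHON,
    "codeflash.toml": Language.JAVA,
    "package.json": Language.JAVASCRIPT,
}


@dataclass
class LanguageConfig:
    config: dict
    config_path: Path
    language: Language


def find_all_config_files_sketch(start: Path) -> list[LanguageConfig]:
    found: dict[Language, LanguageConfig] = {}
    current = start.resolve()
    # Walk from the start directory up to the filesystem root.
    for directory in [current, *current.parents]:
        for file_name, language in CONFIG_FILE_LANGUAGES.items():
            candidate = directory / file_name
            # The first hit for a language is the closest one, so later hits are ignored.
            if language not in found and candidate.is_file():
                found[language] = LanguageConfig({}, candidate, language)
    return list(found.values())
```

Keying the result by language is what makes "closest config wins" fall out of the walk order: a parent directory's pyproject.toml is never considered once a nearer one has been recorded.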

Orchestration Loop

  • main() iterates over discovered configs, deep-copies args per language, calls apply_language_config() then optimizer.run_with_args() for each.
  • --file flag filters to the matching language only. --all and no-flags run all discovered languages.
  • Per-language error isolation — one language failing doesn't block others.
  • Summary logging after all passes complete.
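
A rough sketch of the loop shape described above, assuming a simplified interface: `run_all_languages`, the `(language, config)` tuples, and the `run_pass` callback are hypothetical stand-ins for the PR's `LanguageConfig`, `apply_language_config()`, and `optimizer.run_with_args()`. It only illustrates the deep copy per pass, the per-language error isolation, and the final summary.

```python
import copy
import logging
from argparse import Namespace
from typing import Callable

logger = logging.getLogger(__name__)


def run_all_languages(
    args: Namespace,
    language_configs: list[tuple[str, dict]],
    run_pass: Callable[[Namespace], None],
) -> dict[str, str]:
    results: dict[str, str] = {}
    for language, config in language_configs:
        # Deep-copy so settings applied for one language never leak into the next.
        pass_args = copy.deepcopy(args)
        for key, value in config.items():
            setattr(pass_args, key, value)
        try:
            run_pass(pass_args)
            results[language] = "success"
        except Exception:
            # Error isolation: record the failure and continue with the next language.
            logger.exception("Optimization pass failed for %s", language)
            results[language] = "failed"
    logger.info("Summary: %s", ", ".join(f"{k}={v}" for k, v in results.items()))
    return results
```

Because each pass works on its own copy, the caller's `args` object is left untouched even when a pass fails partway through.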

Auto-Detection

  • detect_unconfigured_languages() compares configs vs git diff to find languages with changed files but no config.
  • auto_configure_language() creates on-the-fly configs by detecting project roots (pom.xml, package.json).
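
The comparison behind detect_unconfigured_languages() might look something like the sketch below. The extension table is an illustrative subset, grouping TypeScript under "javascript" is an assumption, and `detect_unconfigured_languages_sketch` takes plain sets and lists rather than the PR's actual argument types.

```python
from pathlib import Path

# Extension -> language name; an illustrative subset, not the real registry.
EXTENSION_LANGUAGES = {
    ".py": "python",
    ".java": "java",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "javascript",
    ".tsx": "javascript",
}


def detect_unconfigured_languages_sketch(configured: set[str], changed_files: list[Path]) -> set[str]:
    # Languages that appear in the diff, judged by file extension only.
    changed_languages = {
        EXTENSION_LANGUAGES[path.suffix]
        for path in changed_files
        if path.suffix in EXTENSION_LANGUAGES
    }
    # Anything changed but not configured is a candidate for auto-configuration.
    return changed_languages - configured
```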

Language-Agnostic Git Diff

  • get_git_diff() now uses get_supported_extensions() from the registry instead of the singleton. The coarse filter lets through all supported files; per-file language detection happens downstream.
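
As a small illustration of the coarse filter, the sketch below keeps any changed file whose extension is supported by some language, independent of which language is currently active. `filter_supported_files` is a hypothetical helper, and the extension set is passed in explicitly instead of coming from get_supported_extensions().

```python
from pathlib import Path


def filter_supported_files(changed_files: list[str], supported_extensions: set[str]) -> list[str]:
    # Coarse filter only: per-file language detection is assumed to happen later.
    return [name for name in changed_files if Path(name).suffix in supported_extensions]
```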

Tests

  • 73 tests in test_multi_language_orchestration.py (orchestration loop, apply_language_config, summary logging, CLI path routing, unconfigured detection)
  • 11 tests in test_multi_config_discovery.py (find_all_config_files, LanguageConfig)
  • 15 tests in test_git_utils.py (language-agnostic diff filtering)
  • Registry test pollution fix in test_registry.py

All tests pass locally.

E2E Validation

Full E2E session with local backend services — Fibonacci (java-test-project), encodedLength (aerospike), isPalindrome + mergeSorted (optimize-me Java), retryWithBackoff (optimize-me JS) all passed.

Related PRs

codeflash-cc-plugin — Unified multi-language hook

The cc-plugin is the entry point that triggers Codeflash from Claude Code. Previously it had 3 separate per-language code paths that each invoked the CLI differently. With the orchestration loop introduced in this PR, the cc-plugin was simplified to a single codeflash --subagent call — it only needs to detect that any config exists and find the binary. All multi-language discovery and dispatch is now handled by find_all_config_files() and the orchestration loop in main.py from this PR.

optimize-me — Mixed-language test fixture

The optimize-me repo was extended with Java and JS/TS subprojects to serve as an E2E validation target for the orchestration loop. Each subproject has its own codeflash config file, so running Codeflash at the repo root exercises find_all_config_files() discovering multiple configs and the loop running separate optimization passes per language. The E2E validation results above were produced by running this PR's code against the optimize-me fixture.

Generated with Claude Code

…r all languages

The language singleton was only set after function discovery, but get_git_diff()
needs it during discovery to filter by file extension. Now set it in
process_pyproject_config() based on the config file type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude bot (Contributor) commented Mar 18, 2026

Claude finished @mashraf-222's task in 6m 18s


PR Review Summary

  • Triage PR size → LARGE (382+ lines of production code)
  • Lint and typecheck
  • Resolve stale threads (none found)
  • Code review
  • Duplicate detection
  • Test coverage
  • Post summary

Prek Checks

ruff-format was failing in CI: config_parser.py lines 169-171 had a 3-line configs.append(...) that fits on one line. Fixed and pushed in commit 275527db.

ruff-check F811 (redefined-while-unused) in git_utils.py was flagged in the CI run against commit c5d7c394. The current HEAD does not have a duplicate import — the inline from codeflash.languages.registry import get_supported_extensions inside get_git_diff() is the only import, with no top-level shadow. This appears to have been introduced and then resolved during the 20-commit history before this run. ✅ Clean on HEAD after the format fix.


Code Review

🔴 Accidental File Inclusions

.planning/ directory committed — 5 files added: .planning/STATE.md, .planning/config.json, .planning/phases/08-*/08-02-PLAN.md, 08-02-SUMMARY.md, 08-03-PLAN.md. These are AI agent internal workflow artifacts, not production code. They should either be removed from the PR or added to .gitignore.


🟠 Design Issues

1. get_changed_file_paths() in main.py duplicates get_git_diff() logic

main.py:188 introduces a new subprocess.run(["git", "diff", "--name-only", "HEAD~1"]) implementation while codeflash/code_utils/git_utils.py already has the full-featured get_git_diff(). Problems with the new function:

  • Returns relative Path objects (git diff --name-only output is repo-relative, not absolute)
  • HEAD~1 fails on an initial commit with no parent
  • Placed in main.py (an entry point) instead of git_utils.py (where all git utilities live)

The fix should call into get_git_diff() or extract a helper in git_utils.py. Concretely, get_git_diff() already returns dict[str, list[int]], so get_changed_file_paths() only needs the keys.
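
One way the suggestion could look, as a sketch only: `changed_file_paths_from_diff` is a hypothetical helper that takes a mapping shaped like the dict[str, list[int]] result mentioned above and returns absolute paths. Whether the real keys are relative or absolute is not stated here, so the sketch handles both.

```python
from pathlib import Path


def changed_file_paths_from_diff(diff: dict[str, list[int]], repo_root: Path) -> list[Path]:
    paths = []
    for file_name in diff:
        path = Path(file_name)
        # Anchor repo-relative names at the repo root; leave absolute ones alone.
        paths.append(path if path.is_absolute() else repo_root / path)
    return paths
```

Deriving the paths from the existing diff result would also avoid the second `git diff` subprocess and its HEAD~1 assumption.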

2. Duplicate tests_root detection logic

The Java/JS/TS tests_root fallback logic is copy-pasted verbatim between process_pyproject_config (cli.py:126-155) and apply_language_config (cli.py:289-312). Both blocks look identical. A private helper _resolve_tests_root(args, is_java, is_js_ts) should be extracted.

3. Business logic in main.py entry point

detect_unconfigured_languages, detect_project_for_language, and auto_configure_language (main.py:176-264) contain significant domain logic that should live in setup/ or code_utils/, not the CLI entry point. Per the architecture doc, setup logic belongs in setup/. Moving them would also improve testability.

4. apply_language_config missing --benchmark validation

process_pyproject_config (cli.py:159-183) validates --benchmarks-root is set and that the GitHub app is installed when --benchmark is used. apply_language_config skips all of this. Multi-language runs with --benchmark will silently proceed without these guards.

5. detect_project_for_language imports private-prefixed functions

main.py:203-211 imports _detect_formatter, _detect_ignore_paths, _detect_java_module_root, _detect_js_module_root, _detect_python_module_root, _detect_test_runner, _detect_tests_root from setup/detector.py. CLAUDE.md states: "NEVER use leading underscores — Python has no true private functions, use public names." Either rename those detector functions to remove the underscore prefix, or expose a public API from detector.py that main.py can call without reaching into private helpers.


🟡 Minor Issues

Docstrings on new functions — CLAUDE.md: "Do not add docstrings to new or changed code unless explicitly asked." The following new/changed functions have docstrings that should be removed: _handle_config_loading (main.py:267), print_codeflash_banner (main.py:313), _handle_reset_config (cli.py:392).

Silent exception swallowing in find_all_config_files: config_parser.py:172 has except Exception: continue, which silently swallows all config parse errors. At minimum, a logger.debug call should log the exception to aid debugging.
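
A minimal sketch of the suggested change, using JSON parsing of package.json-style files as a stand-in for the real config parsing. `load_json_configs` is a hypothetical name; the point is only that the skip is logged instead of being silent.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_json_configs(paths: list[Path]) -> dict[Path, dict]:
    configs: dict[Path, dict] = {}
    for path in paths:
        try:
            configs[path] = json.loads(path.read_text(encoding="utf-8"))
        except Exception as exc:
            # Still skip the broken file, but leave a trace for debugging.
            logger.debug("Skipping unparsable config %s: %s", path, exc)
            continue
    return configs
```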


Duplicate Detection

HIGH confidence: get_changed_file_paths() (main.py:188) vs get_git_diff() (git_utils.py:21): both call git diff to find changed files; the new function is a simplified re-implementation of the existing one. Should be eliminated in favour of the existing utility.

MEDIUM confidence: detect_project_for_language() (main.py:202) vs detect_project() (setup/detector.py:79): both call the same individual _detect_* helpers and build a DetectedProject. The key difference is that detect_project auto-detects the language while detect_project_for_language takes an explicit language. This near-duplication could be resolved by adding a language override parameter to detect_project().
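
The override-parameter idea could be sketched as below. Everything here is hypothetical: `DetectedProjectSketch` has far fewer fields than a real DetectedProject would, and the marker-file table is a simplified guess at how auto-detection might work.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Marker file -> language; a simplified assumption about auto-detection.
MARKER_FILES = {"pom.xml": "java", "package.json": "javascript", "pyproject.toml": "python"}


@dataclass
class DetectedProjectSketch:
    language: str
    project_root: Path


def detect_language(root: Path) -> str:
    for marker, language in MARKER_FILES.items():
        if (root / marker).is_file():
            return language
    return "python"


def detect_project_sketch(root: Path, language: Optional[str] = None) -> DetectedProjectSketch:
    # An explicit language skips auto-detection; otherwise fall back to it.
    resolved = language if language is not None else detect_language(root)
    return DetectedProjectSketch(language=resolved, project_root=root)
```

With a single function and an optional argument, the explicit-language path and the auto-detect path share the same downstream detection code instead of two parallel implementations.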


Test Coverage

All 73 new tests pass. Coverage on the key new paths:

File Coverage
codeflash/main.py 70%
codeflash/code_utils/config_parser.py 45%
codeflash/code_utils/git_utils.py 64%
codeflash/cli_cmds/cli.py 20%

cli.py at 20% is expected (interactive CLI). The new multi-language orchestration logic in main.py at 70% is reasonable. auto_configure_language (main.py:244) is covered; detect_project_for_language (main.py:202) lacks direct test coverage — consider adding at least a happy-path unit test.


Pushed fix: style: auto-fix ruff formatting in config_parser.py (commit 275527db)

…r all languages

The language singleton was only set after function discovery, but get_git_diff()
needs it during discovery to filter by file extension.

- config_parser.py: set config["language"] based on config file type (codeflash.toml
  → java, pyproject.toml → python) so all project types return a language
- cli.py: call set_current_language() in process_pyproject_config() using the
  config value, before the optimizer runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mashraf-222 mashraf-222 force-pushed the fix/set-language-singleton-early branch from 8c9aab5 to d9dfb0b Compare March 18, 2026 00:34
mashraf-222 and others added 18 commits March 18, 2026 04:03
…ensions

- Replace current_language_support().file_extensions with get_supported_extensions() from registry
- Update tests: remove singleton dependency, add unsupported extension filtering test
- Mixed Python+Java diffs now return both file types regardless of singleton state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add LanguageConfig dataclass with config, config_path, language fields
- Add find_all_config_files() that discovers all codeflash configs in project hierarchy
- Supports pyproject.toml (Python), codeflash.toml (Java), package.json (JS/TS)
- Skips configs without [tool.codeflash] section, closest config wins per language
- Add 6 tests covering discovery, filtering, parent directory search, deduplication

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…DISC-04)

- Add smoke test confirming get_language_support usage, not singleton
- No code changes needed, function already uses per-file registry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add apply_language_config() to cli.py for multi-language mode config application
- Import LanguageConfig and Language enum in cli.py
- Create test_multi_language_orchestration.py with 9 tests covering:
  module_root/tests_root setting, path resolution, project_root,
  CLI override preservation, formatter_cmds, language singleton,
  Python/Java config handling, Java default tests_root

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add orchestration loop that iterates over all discovered LanguageConfigs
- Deep-copy args per language pass to prevent mutation leakage
- Run git/GitHub checks once before loop via handle_optimize_all_arg_parsing
- Preserve fallback to single-config path when find_all_config_files returns empty
- Add 4 orchestration tests: sequential passes, singleton per pass,
  fallback to single config, args isolation between passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ig_files

- Extract shared normalization logic (path resolution, defaults, key conversion) into normalize_toml_config()
- Use it in both find_all_config_files and parse_config_file to eliminate duplication
- Add 6 tests verifying normalization behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap each language pass in try/except so one failure doesn't block others
- Track per-language status (success/failed/skipped) in results dict
- Add 3 tests verifying error isolation and failure tracking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test summary format with all success, mixed statuses, and empty results
- Test skipped status when formatter check fails
- 4 new tests covering _log_orchestration_summary behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation

- Detect file language via get_language_support(Path(args.file))
- Filter language_configs to only the matching language before loop
- Gracefully handle unsupported extensions and missing configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_file_flag_filters_to_matching_language: Java file runs only Java pass
- test_file_flag_python_file_filters_to_python: Python file runs only Python pass
- test_file_flag_unknown_extension_runs_all: .rs file runs all language passes
- test_file_flag_no_matching_config_runs_all: Java file with only Python config runs all
- test_all_flag_sets_module_root_per_language: --all sets pass_args.all per language
- test_no_flags_runs_all_language_passes: no flags runs all language passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Tests for detect_unconfigured_languages() function
- Tests for auto_configure_language() success and failure paths
- Test for per-language logging output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion

- Add detect_unconfigured_languages() to identify languages in changed files lacking configs
- Add detect_project_for_language() using per-language detection helpers (avoids wrong-language pitfall)
- Add auto_configure_language() that writes config and re-discovers it in one step
- Add get_changed_file_paths() helper using git diff
- Wire auto-config into orchestration loop (only for subagent/no-flags path)
- Failed auto-config logs warning with manual setup instructions, continues gracefully
- Per-language "Processing {lang} (config: {path})" logging confirmed working

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add JS/TS config discovery tests (package.json, all three config types)
- Add malformed TOML and missing codeflash section tests
- Add JS/TS extension git diff tests (.js, .ts, .jsx, .tsx)
- Add mixed three-language git diff test
- Add TypeScript/JSX file flag routing tests
- Add direct function coverage for get_changed_file_paths, detect_project_for_language
- Add empty config normalize test
- 13 new tests across 3 files (60 -> 73 total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mock posthog and sentry initialization in all tests calling main() to
  prevent SystemExit when prior tests overwrite CODEFLASH_API_KEY
- Re-register JavaSupport in clear_registry test to prevent Java language
  lookup failures in subsequent tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mashraf-222 mashraf-222 changed the title fix: set language singleton early so git diff auto-detection works for non-Python languages feat: multi-language orchestration loop with per-language config discovery Mar 19, 2026