feat: zero-config Java projects + smart ReplayHelper for end-to-end optimization by misrasaurabh1 · Pull Request #1880 · codeflash-ai/codeflash

misrasaurabh1 · 2026-03-20T03:20:02Z

Summary

Eliminates codeflash.toml for Java projects and fixes the complete trace → optimize pipeline to work end-to-end on real Java projects (validated on aerospike-client-java).

Zero-config Java support

Auto-detect Java projects from pom.xml / build.gradle — no config file needed
Read custom settings from pom.xml <properties> or gradle.properties (codeflash.* keys)
Multi-module Maven scanning: parses each module's <sourceDirectory> / <testSourceDirectory>, picks module with most Java files as source root
Deleted all codeflash.toml files
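The detection steps above can be sketched roughly as follows. This is an illustration only, not the PR's implementation: the function names are hypothetical, the property name `codeflash.moduleRoot` is assumed from the `codeflash.*` convention, and only `gradle.properties` is read here.

```python
from __future__ import annotations

from pathlib import Path

BUILD_FILES = ("pom.xml", "build.gradle", "build.gradle.kts")


def detect_build_file(project_root: Path) -> str | None:
    """Return the first build file found in project_root, or None."""
    for name in BUILD_FILES:
        if (project_root / name).exists():
            return name
    return None


def read_gradle_codeflash_properties(project_root: Path) -> dict[str, str]:
    """Collect codeflash.* keys from gradle.properties (prefix stripped)."""
    props_path = project_root / "gradle.properties"
    settings: dict[str, str] = {}
    if not props_path.exists():
        return settings
    for raw_line in props_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without a key=value shape.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("codeflash."):
            settings[key[len("codeflash."):]] = value.strip()
    return settings


def pick_source_module(module_dirs: list[Path]) -> Path | None:
    """Pick the module directory containing the most .java files."""
    if not module_dirs:
        return None
    return max(module_dirs, key=lambda d: sum(1 for _ in d.rglob("*.java")))
```

A project with neither build file yields `None` from `detect_build_file`, which is where a caller would fall back to non-Java detection.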

Smart ReplayHelper (behavior + performance parity)

ReplayHelper.replay() now reads CODEFLASH_MODE env var and produces the same output as existing test instrumentation
Behavior mode: captures return value via Kryo, writes to SQLite test_results table for correctness comparison
Performance mode: runs inner loop for JIT warmup, prints timing markers matching the optimizer's expected format
No mode: just invokes (trace-only or manual testing)
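The three-way dispatch can be pictured with a small Python sketch. The real helper is Java; here the mode strings `"behavior"` and `"performance"`, the in-memory `captured_results` list (standing in for the SQLite `test_results` table), and the `HYPOTHETICAL_TIMING` marker text are all assumptions made for illustration.

```python
from __future__ import annotations

import os
import time
from typing import Any, Callable

# Stand-in for the SQLite test_results table (assumption for this sketch).
captured_results: list[Any] = []


def replay(target: Callable[[], Any], mode: str | None = None, inner_loops: int = 3) -> Any:
    """Invoke target according to CODEFLASH_MODE (or an explicit mode)."""
    if mode is None:
        mode = os.environ.get("CODEFLASH_MODE", "")
    if mode == "behavior":
        # Capture the return value so a later run can be compared against it.
        result = target()
        captured_results.append(result)
        return result
    if mode == "performance":
        # Repeat the call so later iterations run on warmed-up code.
        result = None
        for loop_index in range(inner_loops):
            start = time.perf_counter_ns()
            result = target()
            elapsed = time.perf_counter_ns() - start
            print(f"HYPOTHETICAL_TIMING loop={loop_index} ns={elapsed}")
        return result
    # No mode: plain invocation (trace-only or manual testing).
    return target()
```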

Bug fixes

JFR parser: normalize / → . in class names (JVM internal format vs Java package format)
Graceful timeout: send SIGTERM before SIGKILL so JFR can dump recording and shutdown hooks run
TracingTransformer: remove isRecording() check that prevented instrumenting classes loaded during serialization (was causing 3 captures instead of 10,000+)
Replay test generator: JUnit 4 support (org.junit.Test vs org.junit.jupiter.api.Test), detect from project build config
Overloaded methods: global counter per method name to avoid duplicate replay test method names
Instrumentation: fix _add_behavior_instrumentation for compact @Test lines (annotation + signature on same line)
project_root: use build root directory (not sub-module) for multi-module Maven projects
optimize subparser: add_help=False so -h in Java commands isn't intercepted as --help
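The graceful-timeout fix in the list above follows a common pattern, sketched here with the standard `subprocess` module. The function name and the grace period are hypothetical; the PR's actual code may differ.

```python
from __future__ import annotations

import subprocess


def run_with_graceful_timeout(cmd: list[str], timeout: float, grace: float = 5.0) -> int:
    """Run cmd; on timeout send SIGTERM first, and SIGKILL only if needed."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # terminate() sends SIGTERM on POSIX, giving the JVM a chance to
        # run shutdown hooks and let JFR dump its recording.
        proc.terminate()
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # The process ignored SIGTERM: fall back to SIGKILL.
            proc.kill()
            return proc.wait()
```

On Windows, `terminate()` and `kill()` both end the process immediately, so the two-step escalation only matters on POSIX systems.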

Validated end-to-end on aerospike-client-java

10,500+ invocations traced across 282 methods
41 functions ranked by JFR CPU profiling data
55 replay test files generated (JUnit 4 compatible)
Replay tests compile, run, and pass (129 tests for Crypto.computeDigest)
Behavior baseline established with timing data (4.81ms over 119 loops)
Candidates correctly verified and rejected when behavior doesn't match

Test plan

33 config detection tests (build tool, source/test root, Maven/Gradle properties, multi-module)
13 JFR parser tests (normalization, filtering, ranking, timeout, project_root)
10 replay test generation tests (JUnit 4/5, overloads, instrumentation)
8 tracer e2e tests (agent capture, replay generation, orchestration)
6 integration tests (full pipeline: discover → rank → compile)
2 replay test discovery tests
Full optimizer pipeline on aerospike benchmark: trace → discover → rank → optimize → verify

🤖 Generated with Claude Code

…iles

Java projects no longer need a standalone config file. Codeflash reads config from pom.xml <properties> or gradle.properties, and auto-detects source/test roots from build tool conventions.

Changes:
- Add parse_java_project_config() to read codeflash.* properties from pom.xml and gradle.properties
- Add multi-module Maven scanning: parses each module's pom.xml for <sourceDirectory> and <testSourceDirectory>, picks module with most Java files as source root, identifies test modules by name
- Route Java projects through build-file detection in config_parser.py before falling back to pyproject.toml
- Detect Java language from pom.xml/build.gradle presence (no config needed)
- Fix project_root for multi-module projects (was resolving to sub-module)
- Fix JFR parser / separators (JVM uses com/example, normalized to com.example)
- Fix graceful timeout (SIGTERM before SIGKILL for JFR dump + shutdown hooks)
- Remove isRecording() check from TracingTransformer (was preventing class instrumentation for classes loaded during serialization)
- Delete all codeflash.toml files from fixtures and code_to_optimize
- Add 33 config detection tests
- Update docs for zero-config Java setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replay tests call helper.replay() via reflection, not the target function directly. The behavior instrumentation can't wrap indirect calls and produces malformed output (code emitted outside class body) for large replay test files. For replay tests, just rename the class without adding instrumentation — JUnit pass/fail results verify correctness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Detect test framework from project build config and generate replay tests with appropriate imports (org.junit.Test for JUnit 4, org.junit.jupiter.api.Test for JUnit 5). Fixes compilation failures on projects using JUnit 4 (like aerospike-client-java). Also passes test_framework through run_java_tracer to generate_replay_tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
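A minimal sketch of the framework-dependent import selection described above. The `"junit4"` / `"junit5"` identifiers and the substring heuristics for reading the build file are assumptions for illustration; only the two import targets come from the commit message.

```python
def detect_test_framework(build_file_text: str) -> str:
    """Guess the JUnit generation from build file contents (heuristic)."""
    if "junit-jupiter" in build_file_text:
        return "junit5"
    if "<artifactId>junit</artifactId>" in build_file_text or "junit:junit" in build_file_text:
        return "junit4"
    # Assumption: default to JUnit 5 when nothing is declared.
    return "junit5"


def junit_test_import(test_framework: str) -> str:
    """Return the Java import line for the @Test annotation."""
    if test_framework == "junit4":
        return "import org.junit.Test;"
    return "import org.junit.jupiter.api.Test;"
```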

…ay tests Use a global counter per method name across all descriptors to generate unique test method names. Previously, overloaded methods (same name, different descriptor) would generate duplicate replay_methodName_N methods, causing compilation errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
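The naming scheme can be sketched as below. It assumes invocations arrive as `(method_name, descriptor)` pairs and that numbering starts at 0; both are illustrative choices, not confirmed details of the generator.

```python
from __future__ import annotations

from collections import defaultdict


def unique_replay_names(invocations: list[tuple[str, str]]) -> list[str]:
    """Name replay tests with one counter per method name, ignoring descriptors."""
    counters: dict[str, int] = defaultdict(int)
    names = []
    for method_name, _descriptor in invocations:
        # The counter is keyed by name only, so overloads share one sequence.
        names.append(f"replay_{method_name}_{counters[method_name]}")
        counters[method_name] += 1
    return names
```

A counter keyed by `(method_name, descriptor)` would restart at 0 for each overload and reproduce the duplicate-name compilation error.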

@AfterClass

…on skip

10 new tests covering:
- JUnit 5 replay test generation (imports, class visibility)
- JUnit 4 replay test generation (imports, public methods, @AfterClass)
- Overloaded method handling (no duplicate test method names)
- Instrumentation skip for replay tests (behavior + perf mode)
- Regular tests still get instrumented normally

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…solution

13 new tests covering:
- JFR class name normalization (/ to . conversion)
- Package-based sample filtering
- Addressable time calculation from JFR samples
- Method ranking order and format
- Graceful timeout (SIGTERM before SIGKILL)
- Multi-module project root detection (Path not str)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
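The first two items in that list might look like this in isolation. The function names are hypothetical; the `/` to `.` conversion is the behavior the commit describes.

```python
from __future__ import annotations


def normalize_class_name(jvm_name: str) -> str:
    """Convert JVM internal form (com/example/Foo) to com.example.Foo."""
    return jvm_name.replace("/", ".")


def in_packages(class_name: str, packages: list[str]) -> bool:
    """True if the (normalized) class lives in one of the given packages."""
    name = normalize_class_name(class_name)
    # Require a dot after the package so com.example does not match com.examplex.
    return any(name.startswith(package + ".") for package in packages)
```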

@Test

The behavior instrumentation was producing malformed output for compact @Test lines (annotation + method signature on same line, common in replay tests). The method signature collection loop would skip past the opening brace and consume subsequent methods' content.

Fix: detect when the @Test annotation line already contains { and treat it as both annotation and method signature, avoiding the separate signature search that was over-consuming lines. Reverted the instrumentation skip for replay tests; they now get properly instrumented for both behavior capture and performance timing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
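The compact-line check can be illustrated with a simplified scanner. This is a sketch, not the actual `_add_behavior_instrumentation` logic: it works on raw lines, ignores braces inside annotation arguments or comments, and the function name is hypothetical.

```python
from __future__ import annotations

import re

_TEST_ANNOTATION = re.compile(r"^\s*@Test\b")


def find_test_signature_spans(lines: list[str]) -> list[tuple[int, int]]:
    """Return (annotation_index, signature_end_index) for each @Test method."""
    spans = []
    i = 0
    while i < len(lines):
        if _TEST_ANNOTATION.match(lines[i]):
            if "{" in lines[i]:
                # Compact form: annotation and signature share one line.
                spans.append((i, i))
            else:
                # Regular form: scan forward to the line with the opening brace.
                j = i + 1
                while j < len(lines) and "{" not in lines[j]:
                    j += 1
                if j < len(lines):
                    spans.append((i, j))
                i = j
        i += 1
    return spans
```

Without the compact-form branch, the forward scan would start on the next line and stop at the following method's brace, which matches the over-consumption the commit describes.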

claude · 2026-03-20T03:20:44Z

Claude finished @misrasaurabh1's task in 6s

PR Review Summary

Triage PR size — LARGE (1940 additions, 346 deletions, 10 production Python files + Java runtime)
Run lint/typecheck — prek passes; mypy errors in config_parser.py and config_writer.py are pre-existing
Resolve stale threads — no unresolved threads found
Review code
Duplicate detection
Test coverage

Prek Checks

All clean. ✅

Code Review

Bug: `has_existing_config()` always returns `True` for any Java project

detector.py:904-906 returns (True, "pom.xml") for any project that has a pom.xml or build.gradle — even before codeflash has been configured. Previously the equivalent Java check looked for [tool.codeflash] in codeflash.toml. The new check completely removed the presence validation:

for build_file in ("pom.xml", "build.gradle", "build.gradle.kts"):
    if (project_root / build_file).exists():
        return True, build_file  # ← always True for any Java project!

Consequence: codeflash init on a fresh Java project would say "config already exists" and skip initialization.
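One possible shape for a stricter check is sketched below: it reports existing config only when a `codeflash.*` property is actually present. The function name and the regexes are hypothetical, and whether this fits the zero-config design is a judgment for the PR author.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches an opening tag such as <codeflash.moduleRoot> inside pom.xml.
_POM_PROPERTY = re.compile(r"<codeflash\.[\w.-]+>")
# Matches a line such as codeflash.moduleRoot=... in gradle.properties.
_GRADLE_PROPERTY = re.compile(r"^\s*codeflash\.[\w.-]+\s*=", re.MULTILINE)


def has_existing_java_config(project_root: Path) -> tuple[bool, str | None]:
    """Return (True, file) only if a codeflash.* property is really set."""
    pom = project_root / "pom.xml"
    if pom.exists() and _POM_PROPERTY.search(pom.read_text(encoding="utf-8")):
        return True, "pom.xml"
    props = project_root / "gradle.properties"
    if props.exists() and _GRADLE_PROPERTY.search(props.read_text(encoding="utf-8")):
        return True, "gradle.properties"
    return False, None
```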

Bug: `_write_maven_properties()` destroys `pom.xml` formatting

config_writer.py:132-136 uses ET.parse() + tree.write() to modify pom.xml. This is destructive: with ElementTree's default parser and serializer, the round trip drops all XML comments, rewrites the default xmlns declaration as generated ns0: prefixes on every element, and can alter the file's layout. A user's well-maintained pom.xml with comments explaining each dependency would be silently mangled. For Maven specifically, the rewritten namespace prefixes can also break mvn parsing.

The source-code.md rules confirm: use libcst for code modification to preserve formatting. For XML, the equivalent is a text/regex-based approach rather than parse-and-serialize.
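The difference between the two approaches can be reproduced with the standard library alone. The sample POM and both function names are made up for this sketch, and the text-based variant is deliberately minimal: it does not handle a missing `<properties>` block beyond raising, nor an already existing key.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <!-- keep me -->
  <properties>
    <java.version>17</java.version>
  </properties>
</project>
"""


def roundtrip_with_elementtree(xml_text: str) -> str:
    """Parse and re-serialize with ElementTree defaults."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


def add_property_textually(xml_text: str, key: str, value: str) -> str:
    """Insert <key>value</key> before </properties>, leaving other bytes alone."""
    match = re.search(r"^([ \t]*)</properties>", xml_text, re.MULTILINE)
    if match is None:
        raise ValueError("no <properties> block found")
    indent = match.group(1)
    new_line = f"{indent}  <{key}>{escape(value)}</{key}>\n"
    return xml_text[: match.start()] + new_line + xml_text[match.start():]
```

Running `roundtrip_with_elementtree(SAMPLE_POM)` loses the comment and emits `ns0:`-prefixed tags, while `add_property_textually` keeps both the comment and the original `xmlns` declaration.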

Bug: Write/remove priority mismatch for Java config

_write_java_build_config() (line 119) writes to pom.xml first when it exists. _remove_java_build_config() (line 1224) tries gradle.properties first. On a Maven project with both files, config written to pom.xml won't be cleaned up by remove.
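One way to keep the two code paths aligned is to route both through a single ordered lookup, as in this sketch. `JAVA_CONFIG_TARGETS` and `select_config_target` are hypothetical names, and the Maven-first order is taken from the write path described above.

```python
from __future__ import annotations

from pathlib import Path

# Single source of truth for priority, shared by the write and remove paths.
JAVA_CONFIG_TARGETS = ("pom.xml", "gradle.properties")


def select_config_target(project_root: Path) -> Path | None:
    """Return the config file that both write and remove should operate on."""
    for name in JAVA_CONFIG_TARGETS:
        candidate = project_root / name
        if candidate.exists():
            return candidate
    return None
```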

Design: `add_help=False` breaks `codeflash optimize --help`

cli.py:382 disables help for the entire optimize subparser to prevent -h from being intercepted when it appears in a Java command like java -h. But this also silently disables codeflash optimize --help for users. A cleaner fix would be to require users to separate their Java command from codeflash flags with --, or only suppress help when the language is Java.
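A third option, sketched here with plain `argparse`, keeps `--help` while leaving `-h` free for the wrapped Java command. The standalone parser and the use of `parse_known_args` are assumptions for illustration; the real `optimize` subparser may be wired differently.

```python
import argparse


def build_optimize_parser() -> argparse.ArgumentParser:
    """Parser that answers --help but does not claim the short -h flag."""
    parser = argparse.ArgumentParser(prog="codeflash optimize", add_help=False)
    # Re-register help under the long option only.
    parser.add_argument("--help", action="help", help="show this help message and exit")
    return parser
```

With this parser, `parse_known_args(["java", "-h"])` leaves both tokens in the unrecognized list for the Java command, while `--help` still prints usage and exits.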

Design: `_try_parse_java_build_config()` takes priority over JS/Python config

config_parser.py:453-457 runs Java config detection before find_package_json(). In a full-stack monorepo where the parent directory has a pom.xml and a subdirectory has the JS project with package.json, running codeflash from the JS subdirectory would incorrectly load Java config. The CWD-walk in _try_parse_java_build_config() will find the parent pom.xml and return Java config even though the user is working in a JS project.
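A language-neutral alternative is to pick whichever project marker is closest to the working directory, as sketched below. The function name is hypothetical and the marker list is only an example.

```python
from __future__ import annotations

from pathlib import Path

PROJECT_MARKERS = ("package.json", "pyproject.toml", "pom.xml", "build.gradle", "build.gradle.kts")


def nearest_project_marker(start: Path, markers: tuple[str, ...] = PROJECT_MARKERS) -> Path | None:
    """Walk upward from start and return the first marker file encountered."""
    for directory in (start, *start.parents):
        for marker in markers:
            candidate = directory / marker
            if candidate.exists():
                return candidate
    return None
```

In the monorepo case from the review, starting inside the JS subdirectory returns its `package.json` before the parent `pom.xml` is ever considered.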

Accidental binary file committed

codeflash/languages/java/resources/codeflash-runtime-1.0.0.jar is a binary file that changed in this PR. If this is intentional (new runtime built from the ReplayHelper.java changes), it should be noted explicitly. Binary files in resources/ that are auto-generated from source-controlled Java code could get out of sync.

Duplicate Detection

MEDIUM confidence: The key_map dict (kebab-case → camelCase) is defined identically in both _write_maven_properties() (line 142) and _write_gradle_properties() (line 176) in config_writer.py. This should be a module-level constant shared by both functions.

_JAVA_CONFIG_KEY_MAP = {
    "module-root": "moduleRoot",
    "tests-root": "testsRoot",
    ...
}

No other duplicates found across language modules.

Test Coverage

New _write_maven_properties(), _write_gradle_properties(), _write_java_build_config(), _remove_java_build_config() functions in config_writer.py have no tests. These are risky file-mutation operations that should be covered.
Updated has_existing_config() in detector.py has existing tests in test_detector.py, but no test for the new Java behavior (i.e., that a fresh project with pom.xml is handled correctly). Given the false-positive bug above, a test should be added.
783/791 tests pass locally; 8 failures are in integration tests requiring Java runtime (expected in this environment).

Last updated: 2026-03-20T07:05:00Z

ReplayHelper now reads CODEFLASH_MODE env var and produces the same output as the existing test instrumentation:
- Behavior mode: captures return value via Kryo serialization, writes to SQLite (test_results table) for correctness comparison, prints start/end timing markers
- Performance mode: runs inner loop for JIT warmup, prints timing markers for each iteration matching the expected format
- No mode: just invokes the method (trace-only or manual testing)

This achieves feature parity with the existing test instrumentation for replay tests, which call functions via reflection and can't be wrapped by text-level instrumentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ay tests + speedups

- Trigger on any codeflash/** or tests/** changes (not just java subset)
- Validate replay test files are discovered per-function
- Already validates: replay test generation, global discovery count, optimization success, and minimum speedup percentage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The refactored Java project_root handling moved args.tests_root resolution after the project_root_from_module_root call, which passed a string instead of a Path. Restore the original order: resolve tests_root to Path first, then set test_project_root, then override both for Java multi-module projects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use Path comparisons instead of forward-slash substring matching
- Avoid parse_args() in test (reads stdin on Windows); use Namespace directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
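Why substring matching fails on Windows can be shown with pure paths, which behave the same on any host OS. The helper name `is_under` is hypothetical.

```python
from pathlib import PurePath, PureWindowsPath


def is_under(path: PurePath, root: PurePath) -> bool:
    """True if path is root or lies below it, independent of separator style."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


windows_path = PureWindowsPath("C:/proj/src/main/java")

# The string form uses backslashes, so a forward-slash substring test misses.
substring_match = "src/main/java" in str(windows_path)

# Comparing path objects works regardless of the separator.
path_match = is_under(windows_path, PureWindowsPath("C:/proj/src"))
```

Here `substring_match` is `False` while `path_match` is `True`.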

Use print(flush=True) instead of logging.info for subprocess output so CI logs show progress in real-time instead of buffering until completion. Also set PYTHONUNBUFFERED=1 for the subprocess. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
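A minimal sketch of that streaming pattern follows. The function name is hypothetical; `PYTHONUNBUFFERED` only affects Python child processes.

```python
from __future__ import annotations

import os
import subprocess


def run_streaming(cmd: list[str]) -> tuple[int, list[str]]:
    """Run cmd and echo each output line as soon as it arrives."""
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
        env=env,
    )
    lines = []
    assert proc.stdout is not None
    for line in proc.stdout:
        # flush=True pushes the line to the CI log immediately.
        print(line, end="", flush=True)
        lines.append(line.rstrip("\n"))
    return proc.wait(), lines
```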

…_write_gradle_properties Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>

…ions harder

- Set jdk.ExecutionSample#period=1ms (default was 10ms) so JFR captures samples from shorter-running programs
- Workload.main now runs 1000 rounds with larger inputs so JFR can capture method-level CPU samples (repeatString with O(n²) concat dominates ~75% of samples)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

misrasaurabh1 and others added 7 commits March 19, 2026 19:11

misrasaurabh1 changed the title ~~Java config redesign + bugfixs for Tracer~~ feat: zero-config Java projects + smart ReplayHelper for end-to-end optimization Mar 20, 2026

github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Mar 20, 2026

misrasaurabh1 and others added 5 commits March 19, 2026 22:40

fix: Windows compatibility for Java config detection tests

74cbe2a

fix: add missing type params for dict in _write_maven_properties and …

803fb64

…_write_gradle_properties Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>

misrasaurabh1 wants to merge 14 commits into main from java-config-redesign
