
feat: Java tracing agent with end-to-end optimization pipeline #1874

Merged
misrasaurabh1 merged 14 commits into main from java-tracer on Mar 19, 2026


@misrasaurabh1
Contributor

Summary

Adds a complete Java tracing pipeline that captures method arguments from running Java programs, generates JUnit 5 replay tests, and feeds them into the optimization pipeline — achieving feature parity with the Python tracer.

Two-stage approach:

  1. JFR profiling — uses Java Flight Recorder for accurate method-level timing (JIT-friendly, ~1% overhead)
  2. Argument capture — uses a bytecode instrumentation agent (ASM) to serialize method arguments via Kryo into SQLite
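A minimal Python sketch of how the orchestrator might assemble the two JVM launches. `-XX:StartFlightRecording` and `-javaagent:` are standard HotSpot options; the function name, the agent JAR and config paths, and the format of everything after the `trace=` prefix are assumptions for illustration, not the PR's actual command line.

```python
from pathlib import Path


def build_stage_commands(
    java_cmd: list[str],
    jfr_file: Path,
    agent_jar: Path,
    agent_config: Path,
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (profiling_cmd, tracing_cmd) for the two stages."""
    java, *rest = java_cmd  # e.g. ["java", "-cp", "classes", "Workload"]
    # Stage 1: JFR recording; settings=profile selects the JDK's bundled
    # profiling configuration instead of the default one.
    profiling_cmd = [
        java,
        f"-XX:StartFlightRecording=filename={jfr_file},settings=profile",
        *rest,
    ]
    # Stage 2: attach the ASM agent. The "trace=" prefix routes to tracer mode;
    # passing a config-file path after it is an assumption.
    tracing_cmd = [java, f"-javaagent:{agent_jar}=trace={agent_config}", *rest]
    return profiling_cmd, tracing_cmd
```

Running the same program twice keeps the JFR timings free of instrumentation overhead, while only the second run pays for argument serialization.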

The traced data is used to generate replay tests that exercise the original functions with real-world inputs, which are then used by the optimizer to verify correctness and benchmark candidates.

Java agent (codeflash-java-runtime)

  • TracerAgent / TracerConfig — agent entry point, JSON config parsing
  • TracingTransformer / TracingClassVisitor / TracingMethodAdapter — ASM bytecode instrumentation (uses COMPUTE_MAXS to avoid classloader deadlocks)
  • TraceRecorder / TraceWriter — async SQLite writer with Kryo serialization timeout (500ms via CachedThreadPool)
  • ReplayHelper — runtime class for replay tests: deserializes args from trace DB, invokes methods via reflection
  • AgentDispatcher — routes to tracer mode via trace= agent arg prefix
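The serialization timeout and async writer can be sketched in Python as an analogy, with pickle standing in for Kryo and `ThreadPoolExecutor` standing in for Java's cached thread pool (unlike a cached pool, it caps its worker count). The class name, table schema and drop-on-timeout policy are hypothetical. Note that a timed-out task keeps running in the pool; only the caller stops waiting.

```python
import pickle
import queue
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

SERIALIZE_TIMEOUT_S = 0.5  # mirrors the 500 ms budget described above


class AsyncTraceWriter:
    """Hypothetical sketch: bounded-time serialization plus one background SQLite writer."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._pool = ThreadPoolExecutor()  # stand-in for a cached thread pool
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def record(self, method: str, args: tuple) -> bool:
        """Serialize args with a timeout; drop the invocation if that fails."""
        future = self._pool.submit(pickle.dumps, args)
        try:
            blob = future.result(timeout=SERIALIZE_TIMEOUT_S)
        except FutureTimeout:
            return False  # too slow: skip rather than stall the traced program
        except Exception:
            return False  # argument could not be serialized
        self._queue.put((method, blob))
        return True

    def _drain(self) -> None:
        # By default a sqlite3 connection may only be used by the thread that
        # created it, so the writer thread opens its own.
        conn = sqlite3.connect(self._db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS calls (method TEXT, args BLOB)")
        while True:
            item = self._queue.get()
            if item is None:  # sentinel from close()
                break
            conn.execute("INSERT INTO calls VALUES (?, ?)", item)
            conn.commit()
        conn.close()

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
        self._pool.shutdown(wait=False)
```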

Python orchestration

  • codeflash/languages/java/tracer.py: JavaTracer two-stage flow (JFR + agent), run_java_tracer() entry point
  • codeflash/languages/java/replay_test.py: generates JUnit 5 replay tests from the trace SQLite DB, with metadata comments
  • codeflash/languages/java/jfr_parser.py: parses JFR files via the jfr CLI tool for method-level profiling
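Replay-test generation and metadata-based discovery can be sketched together in Python: the generator writes one header comment per traced function, and discovery reads those comments back instead of analysing the calls inside the test bodies. The comment format, test-method naming, and the `ReplayHelper` constructor and `replay` signature below are assumptions; the PR does not show them.

```python
import re

# Hypothetical metadata format: one header comment per traced function.
META_RE = re.compile(r"^// codeflash:function=(?P<cls>[\w.$]+)#(?P<method>\w+)$")


def generate_replay_test(class_name: str, trace_db: str, calls: list[tuple[str, str, int]]) -> str:
    """Render a JUnit 5 class; each call is (qualified_class, method, invocation_id)."""
    functions = sorted({(cls, method) for cls, method, _ in calls})
    lines = [f"// codeflash:function={cls}#{method}" for cls, method in functions]
    lines += [
        "import org.junit.jupiter.api.Test;",
        # ReplayHelper's import is omitted: its package is not given in the PR.
        "",
        f"public class {class_name} {{",
        # Assumed constructor and replay() signature, for illustration only.
        f'    private static final ReplayHelper helper = new ReplayHelper("{trace_db}");',
    ]
    for i, (cls, method, invocation) in enumerate(calls):
        lines += [
            "",
            "    @Test",
            f"    void replay_{method}_{i}() throws Exception {{",
            f'        helper.replay("{cls}", "{method}", {invocation});',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


def parse_replay_metadata(source: str) -> list[tuple[str, str]]:
    """Recover traced (class, method) pairs from header comments; no Java parsing needed."""
    pairs = []
    for line in source.splitlines():
        match = META_RE.match(line)
        if match:
            pairs.append((match["cls"], match["method"]))
    return pairs
```

Because discovery only reads the header comments, it does not depend on how the test bodies call into ReplayHelper.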

Integration with optimizer pipeline

  • codeflash/tracer.py: language detection from codeflash.toml config; routes Java projects to _run_java_tracer()
  • codeflash/discovery/functions_to_optimize.py: _get_java_replay_test_functions() parses replay test metadata to discover traced functions
  • codeflash/languages/java/test_discovery.py: discovers ReplayTest_*.java files via metadata comments (static analysis can't trace the string arguments passed to helper.replay())
  • codeflash/discovery/discover_unit_tests.py: classifies replay tests as TestType.REPLAY_TEST using the TestInfo.is_replay flag
  • codeflash/benchmarking/function_ranker.py: JavaFunctionRanker ranks by JFR samples, with a min_functions=5 escape hatch for short workloads
  • codeflash/optimization/optimizer.py: extracts Java packages from file paths for JFR filtering; uses JavaFunctionRanker when language == "java"
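One plausible reading of the min_functions escape hatch, sketched in Python: rank candidates by JFR sample count, keep those above a share threshold, and fall back to the top min_functions when a short workload leaves too few above it. The threshold, data shape and function name are assumptions, not JavaFunctionRanker's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodSamples:  # hypothetical shape of one JFR ranking entry
    name: str  # e.g. "com.example.Workload.repeatString"
    samples: int


def rank_functions(
    ranking: list[MethodSamples],
    candidates: list[str],
    min_share: float = 0.01,  # hypothetical importance threshold
    min_functions: int = 5,
) -> list[str]:
    """Order candidates by sample count, keeping only the hot ones."""
    by_name = {m.name: m.samples for m in ranking}
    total = sum(by_name.values()) or 1  # avoid dividing by zero with no samples
    ordered = sorted(candidates, key=lambda f: by_name.get(f, 0), reverse=True)
    hot = [f for f in ordered if by_name.get(f, 0) / total >= min_share]
    # Escape hatch: a short run yields few samples, so instead of dropping
    # nearly everything, keep at least the top `min_functions` candidates.
    if len(hot) < min_functions:
        return ordered[:min_functions]
    return hot
```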

Verified end-to-end

Ran the full pipeline on the Workload.java fixture:

  • Traced 10 invocations across 5 methods
  • Generated 1 replay test file with 10 test methods (correctly classified as replay tests)
  • Optimized 4 functions, found winning candidates for 3:
    • repeatString: 2.58x faster (StringBuilder → String.repeat())
    • filterEvens: 32% faster (bitwise parity, pre-sized list)
    • instanceMethod: 75% faster (inlined computation)

Test plan

  • 8 tracer e2e tests (test_java_tracer_e2e.py) — agent capture, replay test generation, two-stage orchestration
  • 6 integration tests (test_java_tracer_integration.py) — function discovery, test discovery, JFR parsing, ranking, Maven compilation
  • 2 replay test discovery tests (test_test_discovery.py) — metadata-based mapping, no confusion with regular tests
  • All 40 tracer-related tests pass; 643/644 total Java tests pass (1 pre-existing flaky timing test)
  • Full optimizer pipeline run on Workload fixture: trace → discover → rank → optimize → verify → benchmark
  • GitHub Actions workflow (e2e-java-tracer.yaml) for CI validation

🤖 Generated with Claude Code

@github-actions github-actions bot added the workflow-modified label (This PR modifies GitHub Actions workflows) on Mar 19, 2026
@claude
Contributor

claude bot commented Mar 19, 2026

Claude finished @misrasaurabh1's task in 7s.


PR Review Summary

Tasks

  • Triage PR size (LARGE: 3252 additions)
  • Lint and typecheck
  • Resolve stale review threads (none found)
  • Code review
  • Duplicate detection
  • Test coverage
  • Merge optimization PRs

Prek Checks

✅ Ruff and formatting: pass (no issues)

Mypy flags 3 new errors introduced by this PR:

  • jfr_parser.py, lines 112–113: top_method_key is typed as str | None but is used as a dict index and passed to _store_method_info; needs narrowing
  • base.py, line 697: tuple[bool, list] is missing a type parameter; should be tuple[bool, list[Any]]

For jfr_parser.py:111-113, the logic is actually correct (the matches_packages_cached guard returns False for None), but mypy can't see through it. Fix: change the guard to if top_method_key and matches_packages_cached(top_method_key): to narrow the type explicitly.


Code Review

Bugs / Correctness

1. _get_java_replay_test_functions — LSP-mode silent return with invalid path (codeflash/discovery/functions_to_optimize.py:608–616)

exit_with_message doesn't call sys.exit when LSP is enabled; it logs and returns. When trace_file_path is set but the file doesn't exist on disk, execution falls through to return dict(functions), trace_file_path, returning a path to a nonexistent file. There's already a raise AssertionError("Unreachable") guard for the None case above; the same pattern should be applied here.
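A minimal sketch of the suggested guard. exit_with_message here is a stand-in that mimics the behaviour described above (exits normally, only logs under LSP); LSP_ENABLED and resolve_trace_file are hypothetical names.

```python
from pathlib import Path

LSP_ENABLED = True  # hypothetical flag standing in for the real LSP-mode check


def exit_with_message(message: str) -> None:
    """Stand-in mimicking the described behaviour: exits, except under LSP."""
    if not LSP_ENABLED:
        raise SystemExit(message)
    print(message)  # under LSP the helper only reports and returns


def resolve_trace_file(trace_file_path: Path) -> Path:
    """Hypothetical helper showing the guard after exit_with_message."""
    if not trace_file_path.exists():
        exit_with_message(f"Trace file not found: {trace_file_path}")
        # exit_with_message returns under LSP, so fail loudly rather than
        # handing a nonexistent path back to the caller.
        raise AssertionError("Unreachable")
    return trace_file_path
```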

2. CompilationCache.clear() called unconditionally for all languages (codeflash/optimization/optimizer.py:751–753)

cleanup_temporary_paths always imports Java's CompilationCache and calls .clear(), even for Python and JS projects. This is a Java-specific side effect applied globally. It should be guarded with a language check (or CompilationCache should handle being cleared when empty).
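A sketch of the suggested language guard, using stand-in Language and CompilationCache classes (hypothetical mirrors, not the project's definitions). In the real module the Java-specific import would also move inside the branch so non-Java runs never load it.

```python
from enum import Enum
from typing import ClassVar


class Language(str, Enum):  # hypothetical mirror of the project's enum
    PYTHON = "python"
    JAVASCRIPT = "javascript"
    JAVA = "java"


class CompilationCache:  # hypothetical stand-in for the Java compilation cache
    _entries: ClassVar[dict[str, bytes]] = {}

    @classmethod
    def clear(cls) -> None:
        cls._entries.clear()


def cleanup_temporary_paths(language: Language) -> None:
    # ... language-independent cleanup would run here ...
    if language == Language.JAVA:
        # Only Java runs populate the cache, so only they clear it.
        CompilationCache.clear()
```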

Design Issues

3. Language guard uses string literal instead of enum (codeflash/optimization/optimizer.py:385)

if functions_only and functions_only[0].language == "java":

The codebase mixes string comparisons ("java", "javascript") with Language enum usage — line 264 in functions_to_optimize.py uses Language.JAVASCRIPT. This is inconsistent and error-prone. Per language-patterns.md, the language should be compared via the Language enum. (Low priority since this pattern exists elsewhere in the codebase, but worth noting.)
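A short sketch of why the enum comparison is safer, using a hypothetical str-mixin Language enum and a trimmed-down stand-in for the function record. Because the str mixin makes Language.JAVA == "java" true, the comparison also works if the attribute still holds a plain string; a misspelled member raises AttributeError, whereas a misspelled literal silently compares False.

```python
from dataclasses import dataclass
from enum import Enum


class Language(str, Enum):  # hypothetical mirror of the project's enum
    PYTHON = "python"
    JAVASCRIPT = "javascript"
    JAVA = "java"


@dataclass
class FunctionToOptimize:  # trimmed-down stand-in, not the real class
    name: str
    language: Language


def uses_jfr_ranking(functions: list[FunctionToOptimize]) -> bool:
    # Language.JVAA would fail immediately; "jvaa" would just be False.
    return bool(functions) and functions[0].language == Language.JAVA
```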

4. Binary JAR committed to git (codeflash/languages/java/resources/codeflash-runtime-1.0.0.jar)

A 16MB JAR is modified in the diff (15.95MB → 15.97MB) and stored directly in git. There's no .gitattributes LFS config for it. This permanently bloats repo clone size. The code_to_optimize/java-gradle/libs/codeflash-runtime-1.0.0.jar (14.6MB) is also newly added in this PR for the test fixture. Consider LFS or a CI download step instead.

5. _run_java_tracer broad silent exception swallowing (codeflash/tracer.py:66, 80)

_detect_non_python_language has two except Exception: pass blocks that silently swallow any errors during language detection. If, say, the config file is malformed or the file path doesn't exist, the function returns None (treats it as Python) with no feedback. At minimum, a logger.debug in the except block would help with debugging.


Duplicate Detection

No meaningful duplicates detected. parse_replay_test_metadata is defined once in replay_test.py and wrapped by _parse_replay_metadata in test_discovery.py (thin delegation, not a duplicate).

detect_packages_from_source is a new JavaTracer static method with no equivalent elsewhere.


Test Coverage

The new files have accompanying tests:

  • test_java_tracer_e2e.py (8 tests)
  • test_java_tracer_integration.py (6 tests)
  • test_test_discovery.py (2 new tests)

_detect_non_python_language in codeflash/tracer.py is not exercised by unit tests — it's covered only by the e2e suite. Consider adding a unit test for the LSP-mode path in _get_java_replay_test_functions given the bug identified above.


Optimization PRs

PR #1877 (JfrProfile.get_method_ranking 73% speedup): Has merge conflicts with java-tracer (due to the previously merged #1876 touching the same file) and CI unit test failures. Leaving open — PR is less than 3 days old. The conflicts need manual resolution before it can be merged.


misrasaurabh1 and others added 4 commits March 18, 2026 23:17
- Use `uv run -m codeflash.main` instead of direct file path
- Remove redundant --no-pr (already hardcoded in _run_java_tracer)
- Clean up leftover replay tests between retry attempts
- Add error logging for subprocess output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Git doesn't track empty directories, so src/test/java must be created
before process_pyproject_config validates tests-root exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Unwrap logger.info call in tracer.py that fits within 120-char limit
- Revert auto-generated dev version string in version.py back to 0.20.3

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The original code performed a linear scan over `self._ranking` on every call to `get_function_addressable_time`, which `rank_functions` invokes repeatedly (once per function to filter, plus once per function to sort). The optimized version builds a hash map `_ranking_by_name` during `__init__`, replacing the O(n) loop with an O(1) dictionary lookup. Line profiler confirms the loop and comparison accounted for 94.7% of original runtime. When `rank_functions` calls `get_function_addressable_time` dozens or hundreds of times across a 1000-method ranking (as in `test_large_number_of_methods_and_repeated_queries_perf_and_correctness`), the lookup cost drops from ~293 µs to ~10 µs per call, yielding the 1244% overall speedup. The optimization also consolidates the two calls to `get_addressable_time_ns` in `get_function_stats_summary` into a single call, stored in a local variable, eliminating redundant work.
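The before/after change described in this commit can be sketched with simplified, hypothetical classes: the linear version scans the ranking on every call, while the indexed version builds a name-to-entry map once in `__init__`. Using `setdefault` keeps the first entry for a repeated name, so both versions return the same answers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RankedMethod:  # hypothetical shape of one ranking entry
    name: str
    addressable_time_ns: int


class LinearRanker:
    """Before: every lookup scans the whole ranking (O(n) per call)."""

    def __init__(self, ranking: list[RankedMethod]) -> None:
        self._ranking = ranking

    def get_function_addressable_time(self, name: str) -> Optional[int]:
        for entry in self._ranking:
            if entry.name == name:
                return entry.addressable_time_ns
        return None


class IndexedRanker(LinearRanker):
    """After: build a name -> entry map once in __init__ (O(1) per call)."""

    def __init__(self, ranking: list[RankedMethod]) -> None:
        super().__init__(ranking)
        self._ranking_by_name: dict[str, RankedMethod] = {}
        for entry in ranking:
            # Keep the first occurrence, matching what the linear scan returns.
            self._ranking_by_name.setdefault(entry.name, entry)

    def get_function_addressable_time(self, name: str) -> Optional[int]:
        entry = self._ranking_by_name.get(name)
        return entry.addressable_time_ns if entry is not None else None
```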
@codeflash-ai
Contributor

codeflash-ai bot commented Mar 19, 2026

⚡️ Codeflash found optimizations for this PR

📄 1,245% (12.45x) speedup for JavaFunctionRanker.get_function_addressable_time in codeflash/benchmarking/function_ranker.py

⏱️ Runtime : 1.14 milliseconds → 85.0 microseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch java-tracer).


…2026-03-19T06.41.55

⚡️ Speed up method `JavaFunctionRanker.get_function_addressable_time` by 1,245% in PR #1874 (`java-tracer`)
@codeflash-ai
Contributor

codeflash-ai bot commented Mar 19, 2026

misrasaurabh1 and others added 4 commits March 19, 2026 00:05
- Read --timeout from both config.timeout and config.tracer_timeout
- Handle multi-line /* */ block comments in package detection (aerospike
  source files start with license block comments before package declaration)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n_ranker

- Import Language from codeflash.languages (exported) not codeflash.languages.base
- Fix _detect_non_python_language return type: object | None -> Language | None
- Fix bare dict type annotations: dict -> dict[str, Any] in jfr_parser.py and function_ranker.py
- Fix pytest_splits/test_paths type narrowing by separating assignment from None check

Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>
The optimization precomputes all frame-to-key conversions for a stack trace once (into a `keys` list) instead of calling `_frame_to_key` repeatedly inside the caller-callee loop, cutting per-frame extraction from ~3.3 µs to ~0.19 µs (83% reduction) and lifting `_frame_to_key` from 20.8% of total time to 43.2% (the loop cost is now dominated by the upfront list comprehension rather than repeated calls). A local `matches_packages_cached` closure memoizes package-filter results to avoid re-checking the same method keys across caller relationships, reducing `_matches_packages` overhead from 12.6% to 0.8% of total time; profiler data shows `_matches_packages` hits dropped from 18,364 to 1,500. The timestamp-duration calculation switched from accumulating a list then calling `max()`/`min()` to inline min/max tracking, removing intermediate allocations; combined, these changes yield a 42% overall speedup (46.4 ms → 32.6 ms).
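The three techniques described (convert each frame to a key once per stack, memoize the package filter, track min/max timestamps inline) can be sketched on a simplified event shape. The startTime/frames/type/method fields are a hypothetical flattening, not the exact JSON emitted by `jfr print --json`, and the function name and returned counters are illustrative.

```python
from collections import Counter
from typing import Any


def frame_to_key(frame: dict[str, Any]) -> str:
    return f"{frame['type']}.{frame['method']}"


def parse_samples(
    events: list[dict[str, Any]], packages: tuple[str, ...]
) -> tuple[Counter[str], Counter[tuple[str, str]], int]:
    self_samples: Counter[str] = Counter()
    caller_edges: Counter[tuple[str, str]] = Counter()
    match_cache: dict[str, bool] = {}

    def matches_packages_cached(key: str) -> bool:
        # Memoize: the same method keys recur across many stacks.
        result = match_cache.get(key)
        if result is None:
            result = match_cache[key] = key.startswith(packages)
        return result

    first_ts: int | None = None
    last_ts: int | None = None
    for event in events:
        ts = event["startTime"]
        # Inline min/max instead of collecting every timestamp into a list.
        first_ts = ts if first_ts is None or ts < first_ts else first_ts
        last_ts = ts if last_ts is None or ts > last_ts else last_ts

        keys = [frame_to_key(f) for f in event["frames"]]  # convert each frame once
        if keys and matches_packages_cached(keys[0]):
            self_samples[keys[0]] += 1
        # frames[0] is the top of the stack, so keys[i + 1] called keys[i].
        for callee, caller in zip(keys, keys[1:]):
            if matches_packages_cached(callee):
                caller_edges[(caller, callee)] += 1

    duration_ns = (last_ts - first_ts) if first_ts is not None and last_ts is not None else 0
    return self_samples, caller_edges, duration_ns
```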
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Mar 19, 2026

⚡️ Codeflash found optimizations for this PR

📄 42% (0.42x) speedup for JfrProfile._parse_json in codeflash/languages/java/jfr_parser.py

⏱️ Runtime : 46.4 milliseconds → 32.6 milliseconds (best of 32 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch java-tracer).


@codeflash-ai
Contributor

codeflash-ai bot commented Mar 19, 2026

⚡️ Codeflash found optimizations for this PR

📄 73% (0.73x) speedup for JfrProfile.get_method_ranking in codeflash/languages/java/jfr_parser.py

⏱️ Runtime : 4.38 milliseconds → 2.53 milliseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch java-tracer).


misrasaurabh1 and others added 2 commits March 19, 2026 11:37
…, filter empty names

- Consolidate _parse_replay_metadata to call parse_replay_test_metadata
  instead of duplicating the parsing logic
- Replace hardcoded fallback java command with a clear error message
  when no java command is provided
- Filter empty strings from the function_names split ("".split(",")
  returns [""], which is truthy)
- Fix import ordering in tracer.py (ruff I001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…2026-03-19T08.11.52

⚡️ Speed up method `JfrProfile._parse_json` by 42% in PR #1874 (`java-tracer`)
@codeflash-ai
Contributor

codeflash-ai bot commented Mar 19, 2026

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

misrasaurabh1 and others added 2 commits March 19, 2026 11:48
These files were unrelated to the PR and got swept in during a
stash/pop operation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@misrasaurabh1 misrasaurabh1 merged commit 59031a1 into main Mar 19, 2026
29 of 31 checks passed
@misrasaurabh1 misrasaurabh1 deleted the java-tracer branch March 19, 2026 22:12
