
⚡️ Speed up method JfrProfile.get_method_ranking by 73% in PR #1874 (java-tracer) #1877

Closed
codeflash-ai[bot] wants to merge 1 commit into java-tracer from codeflash/optimize-pr1874-2026-03-19T09.01.32

Conversation


@codeflash-ai codeflash-ai bot commented Mar 19, 2026

⚡️ This pull request contains optimizations for PR #1874

If you approve this dependent PR, these changes will be merged into the original PR branch java-tracer.

This PR will be automatically closed if the original PR is merged.


📄 73% (0.73x) speedup for JfrProfile.get_method_ranking in codeflash/languages/java/jfr_parser.py

⏱️ Runtime : 4.38 milliseconds → 2.53 milliseconds (best of 5 runs)

📝 Explanation and details

The optimization eliminates redundant dictionary lookups and string operations by hoisting the percentage multiplier out of the loop, replacing the lambda sort key with operator.itemgetter(1), and consolidating _method_info.get() calls that previously checked the same key twice for class_name and method_name. Line profiler data shows sorting time dropped from 3.86 µs to 1.92 µs (50% reduction) by using itemgetter, and the rsplit(".", 1) fallback is now invoked only when a field is genuinely missing rather than on every iteration. These changes cut per-method processing from ~769 ns to ~542 ns on average, yielding a 73% speedup (4.38 ms → 2.53 ms) across workloads ranging from small rankings to 1000-method profiles.
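The three changes described above can be sketched as a standalone function. This is an illustrative reconstruction, not the actual implementation in codeflash/languages/java/jfr_parser.py; the free-function signature and variable names here are assumptions:

```python
from operator import itemgetter


def get_method_ranking(method_samples, method_info, total_samples):
    """Sketch of the optimized ranking (names and signature are illustrative)."""
    # Guard: no samples or a zero total yields an empty ranking (avoids division by zero).
    if not method_samples or total_samples == 0:
        return []
    # Optimization 1: hoist the percentage multiplier out of the loop.
    pct_multiplier = 100.0 / total_samples
    ranking = []
    # Optimization 2: itemgetter(1) replaces `lambda kv: kv[1]` as the sort key.
    for key, count in sorted(method_samples.items(), key=itemgetter(1), reverse=True):
        # Optimization 3: one .get() per key instead of probing the same key twice.
        info = method_info.get(key)
        if info is None:
            # The rsplit(".", 1) fallback now runs only when the info entry is
            # genuinely missing. With no dot, both parts fall back to the whole key.
            parts = key.rsplit(".", 1)
            class_name, method_name = parts[0], parts[-1]
        else:
            class_name = info["class_name"]
            method_name = info["method_name"]
        ranking.append({
            "class_name": class_name,
            "method_name": method_name,
            "sample_count": count,
            "pct_of_total": count * pct_multiplier,
        })
    return ranking
```

Using `operator.itemgetter(1)` instead of a lambda avoids a Python-level function call per comparison, which is where the roughly 50% sort-time reduction cited above comes from.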

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 79 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Click to see Generated Regression Tests
from pathlib import Path

# imports
from codeflash.languages.java.jfr_parser import JfrProfile


def test_empty_profile_returns_empty_list():
    # Create a JfrProfile with a Path that does NOT exist so _parse exits early
    p = JfrProfile(Path("/this/path/should/not/exist/jfrfile.jfr"), packages=[])
    # Since no parsing happened and totals are zero, get_method_ranking must return an empty list
    assert p.get_method_ranking() == []  # 742ns -> 721ns (2.91% faster)


def test_basic_single_entry_ranking_and_pct():
    # Create instance with non-existent file to avoid invoking _find_jfr_tool
    prof = JfrProfile(Path("/no/such/file.jfr"), packages=[])
    # Manually populate internal state to simulate parsed data: one method with 5 samples
    prof._method_samples = {"com.example.Foo.bar": 5}
    prof._method_info = {"com.example.Foo.bar": {"class_name": "com.example.Foo", "method_name": "bar"}}
    prof._total_samples = 5
    # Call the function under test
    ranking = prof.get_method_ranking()  # 4.35μs -> 3.76μs (15.7% faster)
    # One result expected
    assert isinstance(ranking, list)
    assert len(ranking) == 1
    entry = ranking[0]
    # Validate fields and exact pct calculation (5/5 * 100 == 100.0)
    assert entry["class_name"] == "com.example.Foo"
    assert entry["method_name"] == "bar"
    assert entry["sample_count"] == 5
    assert abs(entry["pct_of_total"] - 100.0) < 1e-12


def test_missing_method_info_uses_fallback_names():
    # Create instance and populate only _method_samples, leaving out _method_info for fallback behavior
    prof = JfrProfile(Path("/no/such/file2.jfr"), packages=[])
    prof._method_samples = {"com.example.Bar.baz": 3}
    prof._method_info = {}  # deliberately empty to force fallback logic
    prof._total_samples = 3
    ranking = prof.get_method_ranking()  # 4.11μs -> 4.30μs (4.42% slower)
    # Fallback should split on the last '.' to produce class_name and method_name
    assert len(ranking) == 1
    entry = ranking[0]
    assert entry["class_name"] == "com.example.Bar"  # all before last dot
    assert entry["method_name"] == "baz"  # last segment after last dot
    assert entry["sample_count"] == 3
    assert abs(entry["pct_of_total"] - 100.0) < 1e-12


def test_method_key_without_dot_fallback_behavior():
    # If a method key does not include any dot, both class_name and method_name fall back to entire key
    prof = JfrProfile(Path("/no/file3.jfr"), packages=[])
    prof._method_samples = {"noperiodmethod": 7}
    prof._method_info = {}  # force fallback
    prof._total_samples = 7
    ranking = prof.get_method_ranking()  # 3.89μs -> 4.05μs (3.95% slower)
    assert len(ranking) == 1
    entry = ranking[0]
    # rsplit on '.' with no '.' returns the whole string for both parts
    assert entry["class_name"] == "noperiodmethod"
    assert entry["method_name"] == "noperiodmethod"
    assert entry["sample_count"] == 7
    assert abs(entry["pct_of_total"] - 100.0) < 1e-12


def test_multiple_entries_sorted_descending_and_pcts():
    # Create profile with several methods to test sorting and percentage computations
    prof = JfrProfile(Path("/no/file4.jfr"), packages=[])
    prof._method_samples = {"a.A.m1": 10, "b.B.m2": 30, "c.C.m3": 20}
    # Provide some method info for one and leave others to fallback
    prof._method_info = {
        "a.A.m1": {"class_name": "a.A", "method_name": "m1"}
        # others intentionally omitted to test fallback
    }
    prof._total_samples = sum(prof._method_samples.values())  # 60
    ranking = prof.get_method_ranking()  # 6.05μs -> 5.54μs (9.24% faster)
    # Expect sorting by sample_count descending: m2 (30), m3 (20), m1 (10)
    assert [e["sample_count"] for e in ranking] == [30, 20, 10]
    # Validate the first entry has pct 50.0 (30/60)
    first = ranking[0]
    assert first["class_name"] == "b.B"  # fallback used for b.B.m2
    assert first["method_name"] == "m2"
    assert abs(first["pct_of_total"] - 50.0) < 1e-12
    # Validate last entry uses provided info for class_name/method_name
    last = ranking[-1]
    assert last["class_name"] == "a.A"
    assert last["method_name"] == "m1"
    assert abs(last["pct_of_total"] - (10 / 60 * 100)) < 1e-12


def test_zero_total_samples_returns_empty_even_with_samples_present():
    # If total_samples is zero, the function must return an empty list regardless of _method_samples contents
    prof = JfrProfile(Path("/no/file5.jfr"), packages=[])
    prof._method_samples = {"x.Y.z": 1, "u.V.w": 2}
    prof._total_samples = 0  # explicit zero should trigger the empty result
    ranking = prof.get_method_ranking()  # 761ns -> 751ns (1.33% faster)
    assert ranking == []


def test_large_scale_ranking_correctness_and_performance():
    # Test with multiple calls on large datasets to exercise real sorting/computation patterns
    # First dataset: 300 entries with hot methods pattern
    prof1 = JfrProfile(Path("/no/file6a.jfr"), packages=[])
    samples1 = {}
    counts_list1 = []

    # Pattern 1: Few very hot methods
    for i in range(3):
        count = 5000 - (i * 1000)
        counts_list1.append(count)

    # Pattern 2: Medium-frequency methods
    for i in range(50):
        count = 500 + (i * 15)
        counts_list1.append(count)

    # Pattern 3: Long tail of cold methods
    for i in range(247):
        count = 2 + (i % 100)
        counts_list1.append(count)

    for i, count in enumerate(counts_list1):
        if i < 30:
            key = f"com.company.core.Algorithm{i}.compute"
        elif i < 100:
            key = f"com.company.util.Helper{i}.process"
        else:
            key = f"com.company.app.service.Handler{i}.execute"
        samples1[key] = count

    prof1._method_samples = samples1
    prof1._method_info = {}
    prof1._total_samples = sum(counts_list1)

    ranking1 = prof1.get_method_ranking()  # 158μs -> 116μs (35.8% faster)

    assert len(ranking1) == 300
    sample_counts1 = [entry["sample_count"] for entry in ranking1]
    for i in range(len(sample_counts1) - 1):
        assert sample_counts1[i] >= sample_counts1[i + 1]
    assert ranking1[0]["sample_count"] == max(counts_list1)
    assert ranking1[-1]["sample_count"] == min(counts_list1)
    assert abs(ranking1[0]["pct_of_total"] - (ranking1[0]["sample_count"] / prof1._total_samples * 100)) < 1e-12
    assert abs(ranking1[-1]["pct_of_total"] - (ranking1[-1]["sample_count"] / prof1._total_samples * 100)) < 1e-12
    total_pct1 = sum(entry["pct_of_total"] for entry in ranking1)
    assert abs(total_pct1 - 100.0) < 1e-10

    # Second dataset: Different scale and distribution to test sorting stability
    prof2 = JfrProfile(Path("/no/file6b.jfr"), packages=[])
    samples2 = {}
    counts_list2 = []

    # Uniform distribution pattern
    for i in range(150):
        count = 100 + (i % 50) * 2
        counts_list2.append(count)

    # Skewed distribution
    for i in range(150):
        count = int(10000 / (i + 1))
        counts_list2.append(count)

    for i, count in enumerate(counts_list2):
        key = f"org.framework.backend.Service{i}.method"
        samples2[key] = count

    prof2._method_samples = samples2
    prof2._method_info = {}
    prof2._total_samples = sum(counts_list2)

    ranking2 = prof2.get_method_ranking()

    assert len(ranking2) == 300
    sample_counts2 = [entry["sample_count"] for entry in ranking2]
    for i in range(len(sample_counts2) - 1):
        assert sample_counts2[i] >= sample_counts2[i + 1]
    mid_idx = len(ranking2) // 2
    assert (
        abs(ranking2[mid_idx]["pct_of_total"] - (ranking2[mid_idx]["sample_count"] / prof2._total_samples * 100))
        < 1e-12
    )  # 156μs -> 115μs (35.6% faster)
    total_pct2 = sum(entry["pct_of_total"] for entry in ranking2)
    assert abs(total_pct2 - 100.0) < 1e-10

    # Third call: Same data as prof2, different instance to verify consistent computation
    prof3 = JfrProfile(Path("/no/file6c.jfr"), packages=[])
    prof3._method_samples = dict(samples2)
    prof3._method_info = {}
    prof3._total_samples = sum(counts_list2)

    ranking3 = prof3.get_method_ranking()

    assert len(ranking3) == len(ranking2)
    assert ranking3[0]["sample_count"] == ranking2[0]["sample_count"]
    assert ranking3[-1]["sample_count"] == ranking2[-1]["sample_count"]
    for i in range(len(ranking3)):
        assert abs(ranking3[i]["pct_of_total"] - ranking2[i]["pct_of_total"]) < 1e-12
# ---- second generated test module ----
from pathlib import Path

# imports
from codeflash.languages.java.jfr_parser import JfrProfile


def test_get_method_ranking_empty_samples():
    """Test that get_method_ranking returns empty list when no samples are present."""
    # Create a JfrProfile instance with a non-existent file
    # This will result in _method_samples being empty
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    result = profile.get_method_ranking()  # 882ns -> 892ns (1.12% slower)
    # Should return empty list when no samples are recorded
    assert result == []
    assert isinstance(result, list)


def test_get_method_ranking_zero_total_samples():
    """Test that get_method_ranking returns empty list when total_samples is zero."""
    # Create a JfrProfile instance
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    # Manually set _method_samples to non-empty but _total_samples to zero
    profile._method_samples = {"com.example.MyClass.myMethod": 5}
    profile._total_samples = 0
    result = profile.get_method_ranking()  # 861ns -> 792ns (8.71% faster)
    # Should return empty list when total samples is zero (division safety)
    assert result == []


def test_get_method_ranking_single_method():
    """Test get_method_ranking with a single method sample."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    # Manually set up method samples and metadata
    profile._method_samples = {"com.example.MyClass.myMethod": 10}
    profile._method_info = {
        "com.example.MyClass.myMethod": {"class_name": "com.example.MyClass", "method_name": "myMethod"}
    }
    profile._total_samples = 10

    result = profile.get_method_ranking()  # 5.30μs -> 4.98μs (6.45% faster)

    # Should return a list with one entry
    assert len(result) == 1
    assert result[0]["class_name"] == "com.example.MyClass"
    assert result[0]["method_name"] == "myMethod"
    assert result[0]["sample_count"] == 10
    assert result[0]["pct_of_total"] == 100.0


def test_get_method_ranking_multiple_methods_sorted():
    """Test that get_method_ranking sorts methods by sample count in descending order."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    # Set up multiple methods with different sample counts
    profile._method_samples = {
        "com.example.ClassA.methodA": 5,
        "com.example.ClassB.methodB": 15,
        "com.example.ClassC.methodC": 10,
    }
    profile._method_info = {
        "com.example.ClassA.methodA": {"class_name": "com.example.ClassA", "method_name": "methodA"},
        "com.example.ClassB.methodB": {"class_name": "com.example.ClassB", "method_name": "methodB"},
        "com.example.ClassC.methodC": {"class_name": "com.example.ClassC", "method_name": "methodC"},
    }
    profile._total_samples = 30

    result = profile.get_method_ranking()  # 6.28μs -> 5.40μs (16.3% faster)

    # Should be sorted by sample count descending: 15, 10, 5
    assert len(result) == 3
    assert result[0]["sample_count"] == 15
    assert result[1]["sample_count"] == 10
    assert result[2]["sample_count"] == 5


def test_get_method_ranking_percentage_calculation():
    """Test that percentage of total is calculated correctly."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {"com.example.MyClass.method1": 25}
    profile._method_info = {
        "com.example.MyClass.method1": {"class_name": "com.example.MyClass", "method_name": "method1"}
    }
    profile._total_samples = 100

    result = profile.get_method_ranking()  # 4.04μs -> 3.91μs (3.33% faster)

    # 25/100 * 100 = 25.0%
    assert result[0]["pct_of_total"] == 25.0


def test_get_method_ranking_missing_method_info():
    """Test that get_method_ranking handles missing method info by parsing method key."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    # Method key exists but no info entry for it
    profile._method_samples = {"com.example.MyClass.myMethod": 10}
    profile._method_info = {}  # No info for this method
    profile._total_samples = 10

    result = profile.get_method_ranking()  # 4.22μs -> 4.17μs (1.20% faster)

    # Should parse class and method names from the key
    assert len(result) == 1
    assert result[0]["class_name"] == "com.example.MyClass"
    assert result[0]["method_name"] == "myMethod"
    assert result[0]["sample_count"] == 10


def test_get_method_ranking_all_fields_present():
    """Test that all expected fields are present in ranking results."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {"com.example.Test.run": 50}
    profile._method_info = {"com.example.Test.run": {"class_name": "com.example.Test", "method_name": "run"}}
    profile._total_samples = 50

    result = profile.get_method_ranking()  # 4.40μs -> 4.10μs (7.35% faster)

    # Check all required fields are present
    assert "class_name" in result[0]
    assert "method_name" in result[0]
    assert "sample_count" in result[0]
    assert "pct_of_total" in result[0]


def test_get_method_ranking_with_equal_sample_counts():
    """Test ranking when multiple methods have the same sample count."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {
        "com.example.ClassA.methodA": 10,
        "com.example.ClassB.methodB": 10,
        "com.example.ClassC.methodC": 10,
    }
    profile._method_info = {
        "com.example.ClassA.methodA": {"class_name": "com.example.ClassA", "method_name": "methodA"},
        "com.example.ClassB.methodB": {"class_name": "com.example.ClassB", "method_name": "methodB"},
        "com.example.ClassC.methodC": {"class_name": "com.example.ClassC", "method_name": "methodC"},
    }
    profile._total_samples = 30

    result = profile.get_method_ranking()  # 6.06μs -> 5.18μs (17.0% faster)

    # All should have same sample count
    assert len(result) == 3
    assert all(r["sample_count"] == 10 for r in result)
    # All should have 33.33...% approximately
    assert all(abs(r["pct_of_total"] - 33.33333333) < 0.01 for r in result)


def test_get_method_ranking_with_very_small_percentages():
    """Test ranking with many methods where some have very small percentages."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {
        "com.example.ClassA.methodA": 1000,
        "com.example.ClassB.methodB": 1,
        "com.example.ClassC.methodC": 1,
    }
    profile._method_info = {
        "com.example.ClassA.methodA": {"class_name": "com.example.ClassA", "method_name": "methodA"},
        "com.example.ClassB.methodB": {"class_name": "com.example.ClassB", "method_name": "methodB"},
        "com.example.ClassC.methodC": {"class_name": "com.example.ClassC", "method_name": "methodC"},
    }
    profile._total_samples = 1002

    result = profile.get_method_ranking()  # 5.94μs -> 4.96μs (19.8% faster)

    # Check that small percentages are calculated correctly
    assert abs(result[0]["pct_of_total"] - 99.80039920) < 0.001
    assert abs(result[1]["pct_of_total"] - 0.0998) < 0.001
    assert abs(result[2]["pct_of_total"] - 0.0998) < 0.001


def test_get_method_ranking_method_key_with_no_dot():
    """Test ranking when method key has unusual format (edge case for parsing)."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    # Use a method key with minimal dots
    profile._method_samples = {"method": 5}
    profile._method_info = {}
    profile._total_samples = 5

    result = profile.get_method_ranking()  # 4.00μs -> 4.20μs (4.79% slower)

    # Should handle rsplit gracefully
    assert len(result) == 1
    assert result[0]["sample_count"] == 5


def test_get_method_ranking_method_key_with_many_dots():
    """Test ranking with deeply nested class names."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {"com.example.outer.inner.ClassName.methodName": 10}
    profile._method_info = {}
    profile._total_samples = 10

    result = profile.get_method_ranking()  # 4.26μs -> 4.28μs (0.468% slower)

    # rsplit with maxsplit=1 should split at the last dot
    assert len(result) == 1
    assert result[0]["class_name"] == "com.example.outer.inner.ClassName"
    assert result[0]["method_name"] == "methodName"


def test_get_method_ranking_return_type_is_list():
    """Test that return type is always a list."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {"com.example.Test.method": 10}
    profile._method_info = {"com.example.Test.method": {"class_name": "com.example.Test", "method_name": "method"}}
    profile._total_samples = 10

    result = profile.get_method_ranking()  # 4.21μs -> 4.03μs (4.49% faster)

    assert isinstance(result, list)
    assert len(result) > 0
    assert isinstance(result[0], dict)


def test_get_method_ranking_return_dicts_are_independent():
    """Test that returned dictionaries are independent copies."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {"com.example.Test.method": 10}
    profile._method_info = {"com.example.Test.method": {"class_name": "com.example.Test", "method_name": "method"}}
    profile._total_samples = 10

    result1 = profile.get_method_ranking()  # 4.29μs -> 3.86μs (11.2% faster)
    result1[0]["sample_count"] = 999
    result2 = profile.get_method_ranking()

    # Second call should not be affected by modification to first result
    assert result2[0]["sample_count"] == 10  # 1.80μs -> 1.65μs (9.14% faster)


def test_get_method_ranking_percentage_precision():
    """Test that percentage calculations maintain floating point precision."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {"com.example.Test.method": 1}
    profile._method_info = {"com.example.Test.method": {"class_name": "com.example.Test", "method_name": "method"}}
    profile._total_samples = 3

    result = profile.get_method_ranking()  # 4.21μs -> 3.71μs (13.5% faster)

    # 1/3 * 100 = 33.333...
    expected_pct = (1 / 3) * 100
    assert result[0]["pct_of_total"] == expected_pct


def test_get_method_ranking_with_special_characters_in_method_name():
    """Test ranking with special characters in method names."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])
    profile._method_samples = {"com.example.Test.<init>": 5}
    profile._method_info = {"com.example.Test.<init>": {"class_name": "com.example.Test", "method_name": "<init>"}}
    profile._total_samples = 5

    result = profile.get_method_ranking()  # 4.17μs -> 3.72μs (12.1% faster)

    assert result[0]["method_name"] == "<init>"
    assert result[0]["sample_count"] == 5


def test_get_method_ranking_with_100_methods():
    """Test ranking performance with 100 methods."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])

    # Create 100 methods with varying sample counts
    total_samples = 0
    for i in range(100):
        count = 100 - i  # Decreasing counts: 100, 99, 98, ...
        method_key = f"com.example.Class{i}.method{i}"
        profile._method_samples[method_key] = count
        profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
        total_samples += count

    profile._total_samples = total_samples

    # Test with different sample counts each time
    result1 = profile.get_method_ranking()  # 65.2μs -> 39.3μs (66.1% faster)
    assert len(result1) == 100
    assert result1[0]["sample_count"] == 100
    assert result1[99]["sample_count"] == 1
    for i in range(len(result1) - 1):
        assert result1[i]["sample_count"] >= result1[i + 1]["sample_count"]

    # Modify samples and test again with different data
    for i in range(50):
        profile._method_samples[f"com.example.Class{i}.method{i}"] = 1
    profile._total_samples = sum(profile._method_samples.values())

    result2 = profile.get_method_ranking()
    assert len(result2) == 100
    for i in range(len(result2) - 1):
        assert result2[i]["sample_count"] >= result2[i + 1]["sample_count"]

    # Reset and test once more
    profile._method_samples.clear()
    profile._method_info.clear()
    for i in range(100):
        count = (i + 1) * 10
        method_key = f"com.example.Class{i}.method{i}"
        profile._method_samples[method_key] = count
        profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
    profile._total_samples = sum(profile._method_samples.values())

    result3 = profile.get_method_ranking()
    assert len(result3) == 100
    for i in range(len(result3) - 1):
        assert result3[i]["sample_count"] >= result3[i + 1]["sample_count"]


def test_get_method_ranking_with_1000_methods():
    """Test ranking performance and correctness with 1000 methods."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])

    # Create 1000 methods
    total_samples = 0
    for i in range(1000):
        count = 1001 - i  # Decreasing counts
        method_key = f"com.example.Class{i}.method{i}"
        profile._method_samples[method_key] = count
        profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
        total_samples += count

    profile._total_samples = total_samples

    result1 = profile.get_method_ranking()  # 559μs -> 310μs (80.3% faster)
    assert len(result1) == 1000
    for i in range(len(result1) - 1):
        assert result1[i]["sample_count"] >= result1[i + 1]["sample_count"]
    total_pct = sum(r["pct_of_total"] for r in result1)
    assert abs(total_pct - 100.0) < 0.001

    # Test with modified data
    for i in range(500):
        profile._method_samples[f"com.example.Class{i}.method{i}"] = 100
    profile._total_samples = sum(profile._method_samples.values())  # 575μs -> 302μs (90.5% faster)

    result2 = profile.get_method_ranking()
    assert len(result2) == 1000
    for i in range(len(result2) - 1):
        assert result2[i]["sample_count"] >= result2[i + 1]["sample_count"]

    # Test with different subset
    profile._method_samples = {}
    profile._method_info = {}
    total_samples = 0
    for i in range(1000, 2000):
        count = 500
        method_key = f"com.example.Class{i}.method{i}"
        profile._method_samples[method_key] = count
        profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
        total_samples += count
    profile._total_samples = total_samples  # 568μs -> 315μs (80.0% faster)

    result3 = profile.get_method_ranking()
    assert len(result3) == 1000
    for i in range(len(result3) - 1):
        assert result3[i]["sample_count"] >= result3[i + 1]["sample_count"]


def test_get_method_ranking_with_skewed_distribution():
    """Test ranking with highly skewed sample distribution (few hot methods)."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])

    # Create 500 methods where first few have most samples
    total_samples = 0
    for i in range(500):
        if i == 0:
            count = 50000  # One very hot method
        elif i == 1:
            count = 40000  # Second hot method
        elif i < 10:
            count = 1000
        else:
            count = 10

        method_key = f"com.example.Class{i}.method{i}"
        profile._method_samples[method_key] = count
        profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
        total_samples += count

    profile._total_samples = total_samples

    result1 = profile.get_method_ranking()  # 278μs -> 148μs (87.5% faster)
    assert result1[0]["sample_count"] == 50000
    assert result1[1]["sample_count"] == 40000
    top_two_pct = result1[0]["pct_of_total"] + result1[1]["pct_of_total"]
    assert top_two_pct > 80.0

    # Test with different skew
    profile._method_samples = {}
    profile._method_info = {}
    total_samples = 0
    for i in range(500):
        if i == 0:
            count = 100000
        elif i < 5:
            count = 5000
        else:
            count = 50

        method_key = f"com.example.ClassX{i}.methodX{i}"
        profile._method_samples[method_key] = count
        profile._method_info[method_key] = {"class_name": f"com.example.ClassX{i}", "method_name": f"methodX{i}"}
        total_samples += count
    profile._total_samples = total_samples

    result2 = profile.get_method_ranking()
    assert result2[0]["sample_count"] == 100000
    top_pct = result2[0]["pct_of_total"]
    assert top_pct > 85.0

    # Test with inverted distribution
    profile._method_samples = {}
    profile._method_info = {}
    total_samples = 0  # 267μs -> 150μs (77.1% faster)
    for i in range(500):
        if i < 400:
            count = 1000
        else:
            count = 100

        method_key = f"com.example.ClassY{i}.methodY{i}"
        profile._method_samples[method_key] = count
        profile._method_info[method_key] = {"class_name": f"com.example.ClassY{i}", "method_name": f"methodY{i}"}
        total_samples += count
    profile._total_samples = total_samples

    result3 = profile.get_method_ranking()
    assert result3[0]["sample_count"] == 1000
    for i in range(len(result3) - 1):
        assert result3[i]["sample_count"] >= result3[i + 1]["sample_count"]


def test_get_method_ranking_with_uniform_distribution():
    """Test ranking with uniform sample distribution across many methods."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])

    # Create 250 methods with equal sample counts
    count_per_method = 4  # 250 * 4 = 1000
    total_samples = 0
    for i in range(250):
        method_key = f"com.example.Class{i}.method{i}"
        profile._method_samples[method_key] = count_per_method
        profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
        total_samples += count_per_method

    profile._total_samples = total_samples

    result1 = profile.get_method_ranking()  # 140μs -> 77.8μs (80.6% faster)
    assert len(result1) == 250
    assert all(r["sample_count"] == count_per_method for r in result1)
    expected_pct = 0.4
    assert all(abs(r["pct_of_total"] - expected_pct) < 0.01 for r in result1)

    # Test with different uniform count
    profile._method_samples = {}
    profile._method_info = {}
    count_per_method = 2
    total_samples = 0
    for i in range(250):
        method_key = f"com.example.ClassB{i}.methodB{i}"
        profile._method_samples[method_key] = count_per_method
        profile._method_info[method_key] = {"class_name": f"com.example.ClassB{i}", "method_name": f"methodB{i}"}
        total_samples += count_per_method
    profile._total_samples = total_samples

    result2 = profile.get_method_ranking()
    assert len(result2) == 250
    assert all(r["sample_count"] == count_per_method for r in result2)
    expected_pct = 0.2
    assert all(abs(r["pct_of_total"] - expected_pct) < 0.01 for r in result2)

    # Test with different number of methods
    profile._method_samples = {}
    profile._method_info = {}
    count_per_method = 1
    num_methods = 500  # 270μs -> 147μs (84.3% faster)
    total_samples = 0
    for i in range(num_methods):
        method_key = f"com.example.ClassC{i}.methodC{i}"
        profile._method_samples[method_key] = count_per_method
        profile._method_info[method_key] = {"class_name": f"com.example.ClassC{i}", "method_name": f"methodC{i}"}
        total_samples += count_per_method
    profile._total_samples = total_samples

    result3 = profile.get_method_ranking()
    assert len(result3) == num_methods
    assert all(r["sample_count"] == count_per_method for r in result3)
    expected_pct = 0.2
    assert all(abs(r["pct_of_total"] - expected_pct) < 0.01 for r in result3)


def test_get_method_ranking_percentages_sum_to_100():
    """Test that percentages of all methods sum to 100% for various sizes."""
    for num_methods in [10, 50, 100, 500]:
        profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])

        total_samples = 0
        for i in range(num_methods):
            count = num_methods - i
            method_key = f"com.example.Class{i}.method{i}"
            profile._method_samples[method_key] = count
            profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
            total_samples += count

        profile._total_samples = total_samples

        result = profile.get_method_ranking()  # 382μs -> 215μs (77.3% faster)

        # Sum of percentages should be 100%
        total_pct = sum(r["pct_of_total"] for r in result)
        assert abs(total_pct - 100.0) < 0.001


def test_get_method_ranking_order_stability_with_ties():
    """Test that ordering is stable when multiple methods have same sample count."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])

    # Create methods with tied sample counts
    total_samples = 0
    for i in range(100):
        # All methods have 10 samples (ties)
        method_key = f"com.example.Class{i}.method{i}"
        profile._method_samples[method_key] = 10
        profile._method_info[method_key] = {"class_name": f"com.example.Class{i}", "method_name": f"method{i}"}
        total_samples += 10

    profile._total_samples = total_samples

    result1 = profile.get_method_ranking()  # 59.6μs -> 35.6μs (67.1% faster)
    assert len(result1) == 100
    assert all(r["sample_count"] == 10 for r in result1)
    assert all(r["pct_of_total"] == 1.0 for r in result1)

    # Test with different tie values
    profile._method_samples = {}
    profile._method_info = {}
    total_samples = 0
    for i in range(100):
        method_key = f"com.example.ClassB{i}.methodB{i}"
        profile._method_samples[method_key] = 50
        profile._method_info[method_key] = {"class_name": f"com.example.ClassB{i}", "method_name": f"methodB{i}"}
        total_samples += 50
    profile._total_samples = total_samples

    result2 = profile.get_method_ranking()
    assert len(result2) == 100
    assert all(r["sample_count"] == 50 for r in result2)
    assert all(r["pct_of_total"] == 1.0 for r in result2)

    # Test with smaller group of ties
    profile._method_samples = {}
    profile._method_info = {}
    total_samples = 0
    for i in range(100):
        method_key = f"com.example.ClassC{i}.methodC{i}"
        profile._method_samples[method_key] = 25
        profile._method_info[method_key] = {"class_name": f"com.example.ClassC{i}", "method_name": f"methodC{i}"}
        total_samples += 25
    profile._total_samples = total_samples

    result3 = profile.get_method_ranking()  # 55.9μs -> 31.1μs (79.4% faster)
    assert len(result3) == 100
    assert all(r["sample_count"] == 25 for r in result3)
    assert all(r["pct_of_total"] == 1.0 for r in result3)


def test_get_method_ranking_large_sample_counts():
    """Test ranking with very large sample counts."""
    profile = JfrProfile(Path("/nonexistent/file.jfr"), ["com.example"])

    # Use very large numbers
    profile._method_samples = {
        "com.example.Class1.method1": 1000000,
        "com.example.Class2.method2": 500000,
        "com.example.Class3.method3": 250000,
    }
    profile._method_info = {
        "com.example.Class1.method1": {"class_name": "com.example.Class1", "method_name": "method1"},
        "com.example.Class2.method2": {"class_name": "com.example.Class2", "method_name": "method2"},
        "com.example.Class3.method3": {"class_name": "com.example.Class3", "method_name": "method3"},
    }
    profile._total_samples = 1750000

    result = profile.get_method_ranking()  # 5.50μs -> 4.93μs (11.6% faster)

    assert len(result) == 3
    assert result[0]["sample_count"] == 1000000
    assert abs(result[0]["pct_of_total"] - 57.14285714) < 0.01
    assert abs(result[1]["pct_of_total"] - 28.57142857) < 0.01
    assert abs(result[2]["pct_of_total"] - 14.28571429) < 0.01

To edit these changes, run `git checkout codeflash/optimize-pr1874-2026-03-19T09.01.32` and push.


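The optimization described in the PR summary (hoisted percentage multiplier, `operator.itemgetter(1)` sort key, a single `_method_info.get()` lookup per method with an `rsplit(".", 1)` fallback) can be sketched as a standalone function. This is a minimal sketch inferred from the tests above, not the actual `JfrProfile` method; the parameter names `method_samples`, `method_info`, and `total_samples` are assumptions standing in for the instance attributes.

```python
from operator import itemgetter


def get_method_ranking(method_samples, method_info, total_samples):
    """Rank methods by sample count, descending (sketch of the optimized shape)."""
    # Hoisted out of the loop: one division total instead of one per method.
    pct_multiplier = 100.0 / total_samples if total_samples else 0.0
    # itemgetter(1) is a C-level key function, cheaper than an equivalent lambda.
    ranked = sorted(method_samples.items(), key=itemgetter(1), reverse=True)
    result = []
    for method_key, count in ranked:
        # Single dict lookup; the rsplit fallback runs only when info is missing.
        info = method_info.get(method_key)
        if info is not None:
            class_name = info["class_name"]
            method_name = info["method_name"]
        else:
            parts = method_key.rsplit(".", 1)
            class_name, method_name = parts if len(parts) == 2 else ("", method_key)
        result.append({
            "class_name": class_name,
            "method_name": method_name,
            "sample_count": count,
            "pct_of_total": count * pct_multiplier,
        })
    return result
```

Because `sorted()` is stable, tied sample counts keep insertion order, which is what the tie-stability test above relies on.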
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Mar 19, 2026
@claude
Contributor

claude bot commented Mar 19, 2026

CI failures are pre-existing on the base branch (not caused by this PR): unit-tests (all platforms) fail due to test_git_utils tests that test old language-filtering behavior — these will be fixed once PR #1878 merges. Leaving open for merge once base branch CI is fixed.

@claude
Contributor

claude bot commented Mar 19, 2026

The merge conflict was introduced by #1876 being merged moments ago. Both PRs add an import to jfr_parser.py, causing a trivial conflict. The Windows unit test failure is the pre-existing flaky timing test mentioned in the base PR description. Please rebase onto java-tracer to resolve.

@claude
Contributor

claude bot commented Mar 19, 2026

CI failures are pre-existing on the base branch (not caused by this PR): test_java_diff_ignored_when_language_is_python and test_mixed_lang_diff_filters_by_current_language were updated in the base PR (#1874) to expect new filtering behavior that hasn't been implemented yet in get_git_diff. These failures are unrelated to the JfrProfile.get_method_ranking optimization. Leaving open for merge once the base branch CI is fixed.

@codeflash-ai
Contributor Author

codeflash-ai bot commented Mar 19, 2026

This PR has been automatically closed because the original PR #1874 by misrasaurabh1 was closed.

@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr1874-2026-03-19T09.01.32 branch March 19, 2026 22:12