⚡️ Speed up method `EnvironmentReader.int` by 8% by codeflash-ai[bot] · Pull Request #75 · codeflash-ai/graphrag

codeflash-ai · 2025-10-11T05:31:19Z

📄 8% (0.08x) speedup for `EnvironmentReader.int` in `graphrag/config/environment_reader.py`

⏱️ Runtime : 381 microseconds → 354 microseconds (best of 47 runs)
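
The two runtimes above correspond to a figure of about 7% or about 8%, depending on which runtime is taken as the baseline. The sketch below only illustrates that arithmetic. The function names are hypothetical, and which convention Codeflash actually reports is an assumption inferred from the "0.08x" figure.

```python
def speedup_vs_optimized(original_us: float, optimized_us: float) -> float:
    """Time saved, expressed relative to the optimized runtime."""
    return (original_us - optimized_us) / optimized_us


def reduction_vs_original(original_us: float, optimized_us: float) -> float:
    """Time saved, expressed relative to the original runtime."""
    return (original_us - optimized_us) / original_us


# Runtimes reported in this PR, in microseconds.
original_us, optimized_us = 381.0, 354.0

print(f"{speedup_vs_optimized(original_us, optimized_us):.1%}")   # 7.6%
print(f"{reduction_vs_original(original_us, optimized_us):.1%}")  # 7.1%
```

Rounded to whole percentages, the first convention gives 8% and the second gives 7%.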

📝 Explanation and details

The optimized code achieves a speedup of roughly 7-8% (381 μs → 354 μs) through several targeted micro-optimizations that reduce Python interpreter overhead:

**Key Optimizations:**

1. **Replaced per-call list creation in `_read_env`**: When `env_key` is a string (the common case), the original creates a new list `[env_key]` on every call. The optimized version wraps it in a tuple `(env_key,)` instead, which is cheaper to create and iterate over.
2. **Reduced attribute lookup overhead**: The `int` method now binds `self._env.int` to a local variable `_env_int` before building the lambda, so the lambda no longer repeats the attribute lookups each time it runs.
3. **Optimized section attribute access**: Replaced `self.section and key in self.section` with `getattr(self, 'section', None)` followed by a `None` check, so `section` is read once rather than twice and the logic is more explicit.
4. **Reordered type check in `read_key`**: Changed from `if not isinstance(value, str)` to `if isinstance(value, str)`, which is slightly more efficient since strings are the most common case.
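
The patterns described above can be sketched as follows. The real `EnvironmentReader` source is not reproduced in this PR text, so `FakeEnv` and `MiniReader` are hypothetical, simplified stand-ins, and details such as the `is not default_value` check are assumptions made for illustration only.

```python
from enum import Enum


class FakeEnv:
    """Hypothetical stand-in for environs.Env; only .int() is modelled."""

    def __init__(self, mapping):
        self.mapping = mapping

    def int(self, key, default=None):
        # Return the parsed integer if the key exists, otherwise the default.
        return int(self.mapping[key]) if key in self.mapping else default


def read_key(value):
    # Check the common case (a plain string) first; Enums fall through.
    if isinstance(value, str):
        return value.lower()
    return value.value.lower()


class MiniReader:
    """Hypothetical, simplified reader illustrating the patterns above."""

    def __init__(self, env):
        self._env = env

    def _read_env(self, env_key, default_value, read):
        # Wrap a single key in a one-element tuple rather than a list.
        keys = (env_key,) if isinstance(env_key, str) else env_key
        for k in keys:
            result = read(k.upper(), default_value)
            if result is not default_value:  # assumed "found" test
                return result
        return default_value

    def int(self, key, env_key=None, default_value=None):
        key = read_key(key)
        # Bind the bound method once; the lambda then calls the local name
        # instead of resolving self._env.int for every key it tries.
        _env_int = self._env.int
        return self._read_env(
            env_key or key, default_value, lambda k, dv: _env_int(k, dv)
        )


reader = MiniReader(FakeEnv({"FOO": "123", "BAR": "77"}))
print(reader.int("foo"))                                         # 123
print(reader.int("x", env_key=["baz", "bar"], default_value=0))  # 77
```

The local binding matters most when `env_key` is a long list, because the lambda runs once per key.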

**Performance Impact by Test Case:**

- The best improvements (10-12% faster) occur with large environment key lists, where the reduced attribute lookups are repeated for every key tried. The tuple wrapping does not apply there, since it only affects a single string key.
- The simple single-key cases mostly measure slower (up to about 16% in the per-test annotations), likely due to profiling variance at microsecond scale; the overall 381 μs → 354 μs result still shows a net positive impact.
- The optimizations particularly benefit scenarios with many environment variable lookups, which is typical in configuration reading workflows.
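
One way to reconcile slower individual tests with a net improvement is to total the per-test timings. The sketch below does this with the original → optimized values annotated on the first generated test file later in this description (in microseconds). The split into "small" and "large" cases is an assumption made for illustration, not something Codeflash reports.

```python
# (original_us, optimized_us) pairs copied from the per-test annotations.
small_cases = [
    (4.34, 4.46), (3.03, 3.29), (2.60, 2.94), (3.37, 4.00), (2.91, 3.29),
    (3.69, 3.82), (3.24, 3.52), (4.32, 4.32), (1.42, 1.48), (0.891, 0.971),
    (5.27, 5.79), (4.87, 5.33), (4.62, 5.09), (2.97, 3.28), (4.50, 4.42),
    (4.44, 4.49), (3.11, 3.42), (1.47, 1.54), (2.94, 3.17), (4.25, 4.66),
    (3.35, 3.70),
]
# The three tests that scan half or all of a 1000-entry env_key list.
large_cases = [(122.0, 108.0), (122.0, 110.0), (64.6, 57.7)]


def totals(cases):
    """Sum the original and optimized timings of a group of tests."""
    return sum(o for o, _ in cases), sum(n for _, n in cases)


small_before, small_after = totals(small_cases)
large_before, large_after = totals(large_cases)

print(f"small: {small_before:.1f} -> {small_after:.1f} us")
print(f"large: {large_before:.1f} -> {large_after:.1f} us")
print(f"all:   {small_before + large_before:.1f} -> {small_after + large_after:.1f} us")
```

Under this reading, the three large-list tests account for most of the measured time, so their gains outweigh the slower small cases. The totals land close to the reported 381 μs → 354 μs, although the PR text does not state that the headline figure is computed this way.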

These micro-optimizations target Python's interpreter overhead without changing external behavior, which makes them a good fit for configuration-reading code that performs many lookups.

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 49 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |

🌀 Generated Regression Tests and Runtime

from collections.abc import Callable
from enum import Enum
from typing import TypeVar

# imports
import pytest  # used for our unit tests
from environs import Env
from graphrag.config.environment_reader import EnvironmentReader

# ------------------- UNIT TESTS -------------------

class DummyEnv:
    """
    Dummy Env class to simulate environs.Env for integer reading.
    """
    def __init__(self, mapping):
        # mapping: dict of upper-case keys to values (as strings)
        self.mapping = mapping

    def int(self, key, default):
        # Simulate environs.Env.int(key, default):
        # return the default when the key is absent; otherwise convert with
        # int(), letting ValueError propagate for non-integer values.
        if key not in self.mapping:
            return default
        return int(self.mapping[key])


class DummyEnum(Enum):
    FOO = "FOO"
    BAR = "BAR"


# ---------- BASIC TEST CASES ----------

def test_int_basic_str_key_found():
    """Test reading an integer from env with a string key present."""
    env = DummyEnv({"FOO": "123"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo") # 4.34μs -> 4.46μs (2.64% slower)

def test_int_basic_str_key_not_found_with_default():
    """Test reading an integer with a string key not present, uses default."""
    env = DummyEnv({})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", default_value=42) # 3.03μs -> 3.29μs (7.70% slower)

def test_int_basic_str_key_not_found_no_default():
    """Test reading an integer with a string key not present, no default."""
    env = DummyEnv({})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo") # 2.60μs -> 2.94μs (11.7% slower)

def test_int_basic_env_key_list_first_found():
    """Test reading with env_key as list, finds first present key."""
    env = DummyEnv({"BAR": "77"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=["baz", "bar", "foo"], default_value=0) # 3.37μs -> 4.00μs (15.8% slower)

def test_int_basic_env_key_list_none_found():
    """Test reading with env_key as list, none found, uses default."""
    env = DummyEnv({})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=["baz", "bar"], default_value=99) # 2.91μs -> 3.29μs (11.6% slower)

def test_int_basic_enum_key_found():
    """Test reading with Enum key, present in env."""
    env = DummyEnv({"BAR": "888"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int(DummyEnum.BAR) # 3.69μs -> 3.82μs (3.46% slower)

def test_int_basic_enum_key_not_found_with_default():
    """Test reading with Enum key, not present in env, uses default."""
    env = DummyEnv({})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int(DummyEnum.BAR, default_value=555) # 3.24μs -> 3.52μs (7.82% slower)



def test_int_basic_env_key_case_insensitive():
    """Test that env lookup is case-insensitive (uppercased)."""
    env = DummyEnv({"FOO": "321"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("FOO") # 4.32μs -> 4.32μs (0.023% slower)
    codeflash_output = reader.int("foo") # 1.42μs -> 1.48μs (4.32% slower)
    codeflash_output = reader.int("FoO") # 891ns -> 971ns (8.24% slower)

# ---------- EDGE TEST CASES ----------

def test_int_edge_invalid_env_value_raises():
    """Test that an invalid integer value in env raises ValueError."""
    env = DummyEnv({"FOO": "notanint"})
    reader = EnvironmentReader(env)
    with pytest.raises(ValueError):
        reader.int("foo") # 5.27μs -> 5.79μs (9.08% slower)

def test_int_edge_env_value_is_float_string():
    """Test that a float string in env raises ValueError."""
    env = DummyEnv({"FOO": "12.34"})
    reader = EnvironmentReader(env)
    with pytest.raises(ValueError):
        reader.int("foo") # 4.87μs -> 5.33μs (8.58% slower)

def test_int_edge_env_value_is_empty_string():
    """Test that an empty string in env raises ValueError."""
    env = DummyEnv({"FOO": ""})
    reader = EnvironmentReader(env)
    with pytest.raises(ValueError):
        reader.int("foo") # 4.62μs -> 5.09μs (9.39% slower)

def test_int_edge_default_value_is_none():
    """Test that default_value=None is returned if key not found."""
    env = DummyEnv({})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", default_value=None) # 2.97μs -> 3.28μs (9.63% slower)




def test_int_edge_env_key_as_empty_list():
    """Test that env_key as empty list returns default_value."""
    env = DummyEnv({"FOO": "1"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=[], default_value=123) # 4.50μs -> 4.42μs (1.85% faster)

def test_int_edge_env_key_list_with_none():
    """Test that env_key list containing None is handled gracefully."""
    env = DummyEnv({"FOO": "1"})
    reader = EnvironmentReader(env)
    # None should be skipped, so only "foo" is checked
    codeflash_output = reader.int("foo", env_key=[None, "foo"], default_value=456)

def test_int_edge_env_key_list_with_duplicates():
    """Test that env_key list with duplicate keys works (first match)."""
    env = DummyEnv({"FOO": "2"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=["foo", "foo"], default_value=0) # 4.44μs -> 4.49μs (1.09% slower)

def test_int_edge_zero_and_negative_values():
    """Test that zero and negative integer values are handled."""
    env = DummyEnv({"ZERO": "0", "NEG": "-42"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("zero") # 3.11μs -> 3.42μs (9.13% slower)
    codeflash_output = reader.int("neg") # 1.47μs -> 1.54μs (4.55% slower)

def test_int_edge_large_integer_value():
    """Test that a very large integer value is handled."""
    large_val = str(2**60)
    env = DummyEnv({"BIG": large_val})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("big") # 2.94μs -> 3.17μs (7.34% slower)


def test_int_edge_env_key_as_enum():
    """Test that env_key can be an Enum."""
    env = DummyEnv({"BAR": "42"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=DummyEnum.BAR, default_value=0)

# ---------- LARGE SCALE TEST CASES ----------

def test_int_large_env_key_list():
    """Test with a large env_key list where every key is present."""
    keys = [f"key{i}" for i in range(1000)]
    env = DummyEnv({k.upper(): str(i) for i, k in enumerate(keys)})
    reader = EnvironmentReader(env)
    # All 1000 keys are set; the reversed list tries KEY999 (value 999) first
    codeflash_output = reader.int("foo", env_key=keys[::-1], default_value=-1) # 4.25μs -> 4.66μs (8.82% slower)



def test_int_large_env_key_list_none_found():
    """Test large env_key list where none are present, returns default."""
    keys = [f"key{i}" for i in range(1000)]
    env = DummyEnv({})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=keys, default_value=42) # 122μs -> 108μs (12.3% faster)

def test_int_large_env_key_list_first_match():
    """Test large env_key list, first key matches."""
    keys = [f"key{i}" for i in range(1000)]
    env = DummyEnv({"KEY0": "111"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=keys, default_value=0) # 3.35μs -> 3.70μs (9.25% slower)

def test_int_large_env_key_list_last_match():
    """Test large env_key list, last key matches."""
    keys = [f"key{i}" for i in range(1000)]
    env = DummyEnv({"KEY999": "999"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=keys, default_value=0) # 122μs -> 110μs (10.8% faster)

def test_int_large_env_key_list_middle_match():
    """Test large env_key list, middle key matches."""
    keys = [f"key{i}" for i in range(1000)]
    env = DummyEnv({"KEY500": "500"})
    reader = EnvironmentReader(env)
    codeflash_output = reader.int("foo", env_key=keys, default_value=0) # 64.6μs -> 57.7μs (12.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
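
The comment above says `codeflash_output` is compared between the original and the optimized code. How Codeflash performs that comparison internally is not shown in this PR. The sketch below only illustrates the general idea, using hypothetical helpers (`outcome`, `outcomes_match`) and hypothetical stand-in functions that compare return values and raised exception types.

```python
def outcome(fn, *args, **kwargs):
    """Run fn and capture either its return value or the exception type raised."""
    try:
        return ("returned", fn(*args, **kwargs))
    except Exception as exc:
        return ("raised", type(exc))


def outcomes_match(original, optimized, calls):
    """True if both callables behave identically on every (args, kwargs) pair."""
    return all(
        outcome(original, *args, **kwargs) == outcome(optimized, *args, **kwargs)
        for args, kwargs in calls
    )


# Hypothetical stand-ins for an original and an optimized implementation.
def parse_original(text, default=None):
    keys = [text]  # list wrapper
    return int(keys[0]) if keys[0] else default


def parse_optimized(text, default=None):
    keys = (text,)  # tuple wrapper
    return int(keys[0]) if keys[0] else default


calls = [(("123",), {}), (("",), {"default": 7}), (("12.34",), {})]
print(outcomes_match(parse_original, parse_optimized, calls))  # True
```

Comparing exception types as well as return values matters here, because several of the generated tests expect a `ValueError`.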
from collections.abc import Callable
from enum import Enum
from typing import TypeVar

# imports
import pytest  # used for our unit tests
from environs import Env
from graphrag.config.environment_reader import EnvironmentReader

# =========================
# UNIT TESTS FOR int METHOD
# =========================

class DummyEnv:
    """A dummy Env class to simulate environs.Env.int behavior for testing."""
    def __init__(self, values=None):
        # values: dict of environment variable keys (UPPERCASE) to their string/int values
        self.values = values or {}

    def int(self, key, default=None):
        # Simulate environs.Env.int: returns int value if present, else default
        if key in self.values:
            v = self.values[key]
            if v is None:
                raise ValueError(f"Environment variable {key} is None")
            try:
                return int(v)
            except Exception as e:
                # Chain the original error so the traceback keeps its cause.
                raise ValueError(f"Cannot convert {v!r} to int: {e}") from e
        if default is not None:
            return default
        raise Exception(f"Environment variable {key} not found and no default provided")


# For Enum key testing
class MyEnum(Enum):
    FOO = "Foo"
    BAR = "Bar"


# ----------------------
# 1. BASIC TEST CASES
# ----------------------

To edit these changes, `git checkout codeflash/optimize-EnvironmentReader.int-mglu7w87` and push.


codeflash-ai bot requested a review from mashraf-222 October 11, 2025 05:31

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 11, 2025

codeflash-ai[bot] wants to merge 1 commit into `main` from `codeflash/optimize-EnvironmentReader.int-mglu7w87`