
⚡️ Speed up function _timestamp_message by 9% #22

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-_timestamp_message-mh4ja3ia

@codeflash-ai codeflash-ai Bot commented Oct 24, 2025

📄 9% (0.09x) speedup for _timestamp_message in src/deepgram/extensions/telemetry/proto_encoder.py

⏱️ Runtime : 11.8 milliseconds → 10.8 milliseconds (best of 107 runs)
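
The headline percentage can be reproduced from the two runtimes. The sketch below assumes the speedup is computed as (original - optimized) / optimized; this PR does not state Codeflash's exact formula, so that definition and the helper name `speedup_percent` are assumptions, although they match the per-test figures reported further down.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent, assuming (original - optimized) / optimized."""
    return (original - optimized) / optimized * 100.0


# Headline runtimes from this PR, in milliseconds.
headline = speedup_percent(11.8, 10.8)

# One per-test figure from the bulk tests: 862 microseconds down to
# 774 microseconds, which the report lists as 11.4% faster.
bulk = speedup_percent(862.0, 774.0)

print(f"headline: {headline:.1f}%")  # about 9.3%, reported as 9%
print(f"bulk:     {bulk:.1f}%")      # about 11.4%
```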

📝 Explanation and details

The optimization achieves a 9% speedup by replacing bytearray objects with regular Python list objects in two key functions:

Key Changes:

  1. In _varint(): Changed out = bytearray() to out = [], so the existing out.append() calls now append to a list
  2. In _timestamp_message(): Changed msg = bytearray() to msg = [], replaced msg += ... concatenations with msg.append() calls, and used b''.join(msg) for final assembly
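
As a rough illustration of the first change, here is a minimal sketch of a base-128 varint encoder written both ways. The PR does not show the body of `_varint()`, so `varint_bytearray` and `varint_list` are hypothetical stand-ins, and returning `bytes(out)` is an assumption about how the buffer is finalized.

```python
def varint_bytearray(value: int) -> bytes:
    """Encode a non-negative int as a varint, collecting bytes in a bytearray."""
    out = bytearray()
    while True:
        byte = value & 0x7F      # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            break
    return bytes(out)


def varint_list(value: int) -> bytes:
    """Same encoding, collecting the byte values in a plain list (hypothetical)."""
    out = []
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            break
    return bytes(out)  # bytes() accepts an iterable of ints in range(256)


print(varint_bytearray(300))  # b'\xac\x02'
print(varint_list(300))       # b'\xac\x02'
```

Both variants produce identical bytes; only the intermediate container differs.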

Why This is Faster:

  • List appends are cheap in CPython: each append stores a reference to an existing object instead of copying bytes into a growing buffer
  • One copy at the end instead of many: The original code used msg += _int64(...), which extends the bytearray in place and copies each chunk's bytes as it arrives. The optimized version appends complete byte strings to a list and joins them once at the end
  • Fewer reallocations: b''.join(msg) can size the result from the known chunk lengths up front, while a bytearray may have to reallocate as it grows
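
The two assembly strategies can be compared with a small micro-benchmark. This is a sketch, not the code from the PR: `build_with_bytearray`, `build_with_list`, and the `CHUNKS` sample data are hypothetical, and timings depend on the interpreter version and on chunk sizes, so no result is claimed here. The helper `bytearray_iadd_is_in_place` checks that `+=` on a bytearray extends the same object in CPython.

```python
import timeit

# Arbitrary small byte strings standing in for encoded fields (assumption).
CHUNKS = [b"\x08", b"\x96\x01", b"\x10", b"\xc0\x84\x3d"]


def build_with_bytearray(chunks):
    """Assemble a message by extending a bytearray chunk by chunk."""
    msg = bytearray()
    for chunk in chunks:
        msg += chunk  # in-place extend; copies the chunk's bytes
    return bytes(msg)


def build_with_list(chunks):
    """Assemble a message by collecting chunks and joining once."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)  # stores a reference, no byte copy yet
    return b"".join(parts)   # single allocation and copy


def bytearray_iadd_is_in_place():
    """Return True if `+=` keeps the same bytearray object."""
    msg = bytearray()
    before = id(msg)
    msg += b"abc"
    return id(msg) == before


if __name__ == "__main__":
    for fn in (build_with_bytearray, build_with_list):
        seconds = timeit.timeit(lambda: fn(CHUNKS), number=200_000)
        print(f"{fn.__name__}: {seconds:.3f}s for 200,000 calls")
```

Measuring on the target interpreter matters here, because the relative cost of the two approaches can shift with the number and size of the chunks.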

Performance Benefits by Test Type:

  • Simple cases (whole seconds, zero values): 7-20% faster due to reduced bytearray overhead
  • Complex cases (fractional seconds requiring nanos field): 3-16% faster from eliminating intermediate concatenations
  • Bulk operations (1000+ timestamps): 6-12% faster, showing consistent gains across workloads

The optimization is particularly effective for protobuf encoding workloads where many small byte sequences need to be assembled into larger messages.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 8036 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import struct

# imports
import pytest
from deepgram.extensions.telemetry.proto_encoder import _timestamp_message

# --- Helper functions for testing ---

def decode_varint(data: bytes, offset=0):
    """Decodes a varint from bytes starting at offset, returns (value, new_offset)."""
    shift = 0
    result = 0
    while True:
        b = data[offset]
        result |= (b & 0x7F) << shift
        offset += 1
        if not (b & 0x80):
            break
        shift += 7
    return result, offset

def parse_timestamp_message(msg: bytes):
    """Parse the protobuf wire-encoded timestamp message into a dict."""
    offset = 0
    result = {}
    while offset < len(msg):
        key, offset = decode_varint(msg, offset)
        field_number = key >> 3
        wire_type = key & 0x7
        if wire_type != 0:
            raise ValueError("Only varint wire type supported in this test parser")
        value, offset = decode_varint(msg, offset)
        result[field_number] = value
    return result

# --- Unit Tests ---

# 1. Basic Test Cases

def test_zero_seconds():
    # 0 seconds, should only encode seconds field, nanos omitted
    codeflash_output = _timestamp_message(0.0); msg = codeflash_output # 3.81μs -> 3.44μs (10.4% faster)
    fields = parse_timestamp_message(msg)

def test_positive_integer_seconds():
    # 42 seconds, no nanos
    codeflash_output = _timestamp_message(42.0); msg = codeflash_output # 2.77μs -> 2.47μs (12.3% faster)
    fields = parse_timestamp_message(msg)

def test_positive_seconds_with_nanos():
    # 10.25 seconds, nanos = 250_000_000
    codeflash_output = _timestamp_message(10.25); msg = codeflash_output # 4.21μs -> 4.02μs (4.73% faster)
    fields = parse_timestamp_message(msg)

def test_fractional_seconds_rounding():
    # 3.9999999995 rounds up to 4 seconds, nanos = 0
    codeflash_output = _timestamp_message(3.9999999995); msg = codeflash_output # 2.94μs -> 2.57μs (14.5% faster)
    fields = parse_timestamp_message(msg)

def test_fractional_seconds_with_trailing_nanos():
    # 5.000000001, nanos = 1
    codeflash_output = _timestamp_message(5.000000001); msg = codeflash_output # 3.23μs -> 3.01μs (7.10% faster)
    fields = parse_timestamp_message(msg)

def test_small_fractional_seconds():
    # 7.000000001, nanos = 1
    codeflash_output = _timestamp_message(7.000000001); msg = codeflash_output # 3.15μs -> 2.83μs (11.5% faster)
    fields = parse_timestamp_message(msg)

def test_exactly_one_second():
    # 1.0 seconds, should only encode seconds field, nanos omitted
    codeflash_output = _timestamp_message(1.0); msg = codeflash_output # 2.63μs -> 2.20μs (19.6% faster)
    fields = parse_timestamp_message(msg)

# 2. Edge Test Cases

def test_negative_zero():
    # -0.0 should be treated as 0
    codeflash_output = _timestamp_message(-0.0); msg = codeflash_output # 2.69μs -> 2.28μs (17.9% faster)
    fields = parse_timestamp_message(msg)

def test_negative_integer_seconds():
    # -5.0 seconds, should encode seconds as -5, no nanos
    codeflash_output = _timestamp_message(-5.0); msg = codeflash_output # 4.76μs -> 4.41μs (8.08% faster)
    fields = parse_timestamp_message(msg)

def test_negative_seconds_with_nanos():
    # -2.75 seconds, should encode seconds as -2, nanos as -750_000_000
    codeflash_output = _timestamp_message(-2.75); msg = codeflash_output # 6.25μs -> 5.77μs (8.25% faster)
    fields = parse_timestamp_message(msg)
    # nanos field is negative, so two's complement encoding
    expected_sec = -2 if fields[1] <= 2**63-1 else fields[1] - 2**64
    expected_nanos = int(round((-2.75 - int(-2.75)) * 1_000_000_000))
    if expected_nanos < 0:
        expected_nanos += 1_000_000_000
        expected_sec -= 1

def test_large_integer_seconds():
    # Very large seconds value (2**40)
    codeflash_output = _timestamp_message(float(2**40)); msg = codeflash_output # 3.82μs -> 3.55μs (7.78% faster)
    fields = parse_timestamp_message(msg)

def test_large_seconds_with_nanos():
    # Large seconds and nanos
    codeflash_output = _timestamp_message(2**40 + 0.123456789); msg = codeflash_output # 4.99μs -> 4.76μs (4.72% faster)
    fields = parse_timestamp_message(msg)

def test_nanos_exactly_one_billion():
    # 1.9999999995 should round nanos to 1_000_000_000, so sec=2, nanos=0
    codeflash_output = _timestamp_message(1.9999999995); msg = codeflash_output # 2.83μs -> 2.46μs (14.9% faster)
    fields = parse_timestamp_message(msg)

def test_nanos_just_below_one_billion():
    # 1.9999999994 should round nanos to 999_999_999
    codeflash_output = _timestamp_message(1.9999999994); msg = codeflash_output # 4.15μs -> 4.02μs (3.29% faster)
    fields = parse_timestamp_message(msg)

def test_negative_fractional_seconds_rounding():
    # -1.9999999995 should round to -2, nanos=0
    codeflash_output = _timestamp_message(-1.9999999995); msg = codeflash_output # 6.16μs -> 5.93μs (3.85% faster)
    fields = parse_timestamp_message(msg)
    expected_sec = -2 if fields[1] <= 2**63-1 else fields[1] - 2**64

def test_minimum_float():
    # Smallest positive float
    codeflash_output = _timestamp_message(5e-324); msg = codeflash_output # 2.77μs -> 2.43μs (13.9% faster)
    fields = parse_timestamp_message(msg)

def test_maximum_float():
    # float(2**63 - 1) rounds to 2.0**63, one past the int64 maximum; should not error
    codeflash_output = _timestamp_message(float(2**63-1)); msg = codeflash_output # 4.41μs -> 4.22μs (4.36% faster)
    fields = parse_timestamp_message(msg)

def test_negative_large_float():
    # Large negative float
    codeflash_output = _timestamp_message(float(-(2**63))); msg = codeflash_output # 4.41μs -> 4.12μs (6.86% faster)
    fields = parse_timestamp_message(msg)
    # Should encode as two's complement
    expected_sec = fields[1] if fields[1] < 2**63 else fields[1] - 2**64

def test_subsecond_just_below_zero():
    # -0.000000001, nanos should be 999_999_999 and sec -1
    codeflash_output = _timestamp_message(-0.000000001); msg = codeflash_output # 5.17μs -> 4.86μs (6.42% faster)
    fields = parse_timestamp_message(msg)
    expected_sec = -1 if fields[1] <= 2**63-1 else fields[1] - 2**64

# 3. Large Scale Test Cases

def test_many_increasing_seconds():
    # Test a sequence of timestamps from 0 to 999
    for i in range(1000):
        codeflash_output = _timestamp_message(float(i)); msg = codeflash_output # 862μs -> 774μs (11.4% faster)
        fields = parse_timestamp_message(msg)

def test_many_fractional_seconds():
    # Test a sequence of timestamps with increasing nanos
    for i in range(1000):
        ts = i + 0.123456789
        codeflash_output = _timestamp_message(ts); msg = codeflash_output # 1.55ms -> 1.41ms (9.62% faster)
        fields = parse_timestamp_message(msg)

def test_many_negative_seconds():
    # Test a sequence of negative timestamps
    for i in range(1000):
        ts = -float(i)
        codeflash_output = _timestamp_message(ts); msg = codeflash_output # 1.57ms -> 1.48ms (6.13% faster)
        fields = parse_timestamp_message(msg)
        expected_sec = fields[1] if fields[1] < 2**63 else fields[1] - 2**64

def test_many_small_fractions():
    # Test subsecond values from 0.000001 to 0.001
    for i in range(1, 1000):
        ts = i / 1_000_000
        codeflash_output = _timestamp_message(ts); msg = codeflash_output # 1.39ms -> 1.24ms (11.7% faster)
        fields = parse_timestamp_message(msg)
        expected_nanos = int(round(ts * 1_000_000_000))
        if expected_nanos:
            pass
        else:
            pass

def test_performance_large_batch():
    # Test a batch of 1000 diverse timestamps for performance and correctness
    for i in range(1000):
        ts = (i * 123456789.987654321) % (2**40)
        codeflash_output = _timestamp_message(ts); msg = codeflash_output # 1.94ms -> 1.82ms (6.92% faster)
        fields = parse_timestamp_message(msg)
        nanos = int(round((ts - int(ts)) * 1_000_000_000))
        if nanos >= 1_000_000_000:
            pass
        elif nanos:
            pass
        else:
            pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
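
Several tests above convert a decoded value with `fields[1] - 2**64`. That step is needed because protobuf encodes a negative int64 as the varint of its 64-bit two's-complement value, so a plain varint decoder returns a large unsigned number. Below is a minimal sketch of the conversion; `to_signed64` is a hypothetical helper name, and the message is built by hand for seconds = -5 rather than produced by `_timestamp_message`.

```python
def decode_varint(data: bytes, offset: int = 0):
    """Decode one varint starting at offset; return (value, new_offset)."""
    shift = 0
    result = 0
    while True:
        b = data[offset]
        result |= (b & 0x7F) << shift
        offset += 1
        if not (b & 0x80):
            return result, offset
        shift += 7


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed two's-complement int64."""
    return value - (1 << 64) if value >= (1 << 63) else value


# Field 1 (seconds), wire type 0 gives key byte 0x08, followed by the
# 10-byte varint of 2**64 - 5, the two's-complement form of -5.
message = b"\x08" + b"\xfb" + b"\xff" * 8 + b"\x01"

key, offset = decode_varint(message)
raw, offset = decode_varint(message, offset)

print(key >> 3, to_signed64(raw))  # 1 -5
```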
#------------------------------------------------
from __future__ import annotations

import struct

# imports
import pytest  # used for our unit tests
from deepgram.extensions.telemetry.proto_encoder import _timestamp_message

# unit tests

# Helper to decode protobuf varint (for test validation)
def decode_varint(data, offset=0):
    shift = 0
    result = 0
    while True:
        b = data[offset]
        result |= ((b & 0x7F) << shift)
        offset += 1
        if not (b & 0x80):
            break
        shift += 7
    return result, offset

# Helper to decode the protobuf-encoded timestamp message
def decode_timestamp_message(b):
    # Returns (seconds, nanos)
    offset = 0
    seconds = None
    nanos = 0
    while offset < len(b):
        key, offset2 = decode_varint(b, offset)
        field_number = key >> 3
        wire_type = key & 0x7
        offset = offset2
        if wire_type != 0:
            raise ValueError("Unexpected wire type: %d" % wire_type)
        value, offset = decode_varint(b, offset)
        if field_number == 1:
            seconds = value
        elif field_number == 2:
            nanos = value
        else:
            raise ValueError("Unexpected field number: %d" % field_number)
    return seconds, nanos

# ------------------------
# 1. Basic Test Cases
# ------------------------

def test_whole_second():
    # 123.0 should encode to seconds=123, nanos=0
    codeflash_output = _timestamp_message(123.0); b = codeflash_output # 2.79μs -> 2.60μs (7.43% faster)
    sec, nanos = decode_timestamp_message(b)

def test_simple_fractional_second():
    # 123.456 should encode to seconds=123, nanos=456_000_000
    codeflash_output = _timestamp_message(123.456); b = codeflash_output # 4.59μs -> 4.35μs (5.61% faster)
    sec, nanos = decode_timestamp_message(b)

def test_exact_nanoseconds():
    # 1.000000789 should encode to seconds=1, nanos=789
    codeflash_output = _timestamp_message(1.000000789); b = codeflash_output # 3.79μs -> 3.26μs (16.3% faster)
    sec, nanos = decode_timestamp_message(b)

def test_zero():
    # 0.0 should encode to seconds=0, nanos=0
    codeflash_output = _timestamp_message(0.0); b = codeflash_output # 2.65μs -> 2.47μs (7.62% faster)
    sec, nanos = decode_timestamp_message(b)

def test_negative_zero():
    # -0.0 should encode to seconds=0, nanos=0 (identical to 0.0)
    codeflash_output = _timestamp_message(-0.0); b = codeflash_output # 2.75μs -> 2.29μs (19.9% faster)
    sec, nanos = decode_timestamp_message(b)

def test_negative_whole_second():
    # -42.0 should encode to seconds=-42, nanos=0
    codeflash_output = _timestamp_message(-42.0); b = codeflash_output # 4.50μs -> 4.38μs (2.69% faster)
    sec, nanos = decode_timestamp_message(b)

def test_negative_fractional():
    # -42.75 should encode to seconds=-42, nanos=-0.75e9 = -750_000_000
    codeflash_output = _timestamp_message(-42.75); b = codeflash_output # 6.15μs -> 6.01μs (2.35% faster)
    sec, nanos = decode_timestamp_message(b)
    # nanos will be 2**64 - 750_000_000, but since nanos must be non-negative, this is a special edge
    # However, the function as written will encode negative nanos as unsigned varint
    # Let's check the actual encoding
    expected_nanos = int(round((-42.75 - int(-42.75)) * 1_000_000_000))

# ------------------------
# 2. Edge Test Cases
# ------------------------

def test_fractional_rounding_up():
    # 1.9999999996 rounds nanos to 1_000_000_000, should increment seconds
    codeflash_output = _timestamp_message(1.9999999996); b = codeflash_output # 2.82μs -> 2.52μs (11.8% faster)
    sec, nanos = decode_timestamp_message(b)

def test_fractional_rounding_down():
    # 1.0000000003 rounds nanos to 0 (since 0.0000000003 * 1e9 = 0.3, rounds to 0)
    codeflash_output = _timestamp_message(1.0000000003); b = codeflash_output # 2.36μs -> 2.19μs (7.80% faster)
    sec, nanos = decode_timestamp_message(b)

def test_fractional_just_below_increment():
    # 1.9999999994 rounds nanos to 999_999_999
    codeflash_output = _timestamp_message(1.9999999994); b = codeflash_output # 4.27μs -> 4.07μs (4.94% faster)
    sec, nanos = decode_timestamp_message(b)

def test_maximum_possible_nanoseconds():
    # 0.9999999995 rounds to 1_000_000_000, so seconds should be 1, nanos 0
    codeflash_output = _timestamp_message(0.9999999995); b = codeflash_output # 2.69μs -> 2.41μs (11.8% faster)
    sec, nanos = decode_timestamp_message(b)

def test_minimum_possible_nanoseconds():
    # 0.0000000004 rounds to 0
    codeflash_output = _timestamp_message(0.0000000004); b = codeflash_output # 2.68μs -> 2.24μs (19.5% faster)
    sec, nanos = decode_timestamp_message(b)

def test_large_positive_timestamp():
    # 2**40 + 0.123456789
    val = float(2**40) + 0.123456789
    codeflash_output = _timestamp_message(val); b = codeflash_output # 5.20μs -> 4.97μs (4.57% faster)
    sec, nanos = decode_timestamp_message(b)

def test_large_negative_timestamp():
    # -2**40 + 0.987654321
    val = -float(2**40) + 0.987654321
    codeflash_output = _timestamp_message(val); b = codeflash_output # 6.41μs -> 6.18μs (3.75% faster)
    sec, nanos = decode_timestamp_message(b)
    expected_nanos = int(round((val - int(val)) * 1_000_000_000)) & ((1<<64)-1)


def test_inf_input():
    # inf input should raise OverflowError, since int(float('inf')) raises OverflowError
    with pytest.raises(OverflowError):
        _timestamp_message(float('inf')) # 1.06μs -> 1.04μs (1.15% faster)
    with pytest.raises(OverflowError):
        _timestamp_message(float('-inf')) # 413ns -> 425ns (2.82% slower)

def test_subsecond_precision_loss():
    # 0.123456789123456 should round nanos to 123456789
    codeflash_output = _timestamp_message(0.123456789123456); b = codeflash_output # 6.95μs -> 6.16μs (12.9% faster)
    sec, nanos = decode_timestamp_message(b)

def test_negative_fractional_rounding():
    # -1.9999999996 rounds nanos to -1_000_000_000, sec should be -2, nanos 0
    codeflash_output = _timestamp_message(-1.9999999996); b = codeflash_output # 6.71μs -> 6.34μs (5.90% faster)
    sec, nanos = decode_timestamp_message(b)

# ------------------------
# 3. Large Scale Test Cases
# ------------------------

def test_many_whole_seconds():
    # Test a range of whole seconds
    for i in range(-500, 500):
        codeflash_output = _timestamp_message(float(i)); b = codeflash_output # 1.21ms -> 1.11ms (8.27% faster)
        sec, nanos = decode_timestamp_message(b)
        expected_sec = i if i >= 0 else (2**64 + i)

def test_many_fractional_seconds():
    # Test 1000 values between 0 and 1 second
    for i in range(1000):
        frac = i / 1000.0
        codeflash_output = _timestamp_message(frac); b = codeflash_output # 1.54ms -> 1.39ms (10.7% faster)
        sec, nanos = decode_timestamp_message(b)



def test_all_possible_nanoseconds():
    # Test for all possible nanos in increments of 1_000_000 (millisecond precision)
    for nanos in range(0, 1_000_000_000, 1_000_000):
        ts = nanos / 1_000_000_000
        codeflash_output = _timestamp_message(ts); b = codeflash_output # 1.55ms -> 1.40ms (10.6% faster)
        sec, nanos_out = decode_timestamp_message(b)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from deepgram.extensions.telemetry.proto_encoder import _timestamp_message

def test__timestamp_message():
    _timestamp_message(-2.25)
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_7zeygj7s/tmpfxa8_bv5/test_concolic_coverage.py::test__timestamp_message | 7.61μs | 7.60μs | 0.079%✅ |

To edit these changes, run git checkout codeflash/optimize-_timestamp_message-mh4ja3ia and push.

@codeflash-ai codeflash-ai Bot requested a review from mashraf-222 October 24, 2025 07:32
@codeflash-ai codeflash-ai Bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 24, 2025