Skip to content

Commit c07cbfa

Browse files
authored
FEAT: mssql-python driver changes to support bulk copy logging. (#430)
### Work Item / Issue Reference <!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below For external contributors: Insert Github Issue number below Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> AB#42113 ------------------------------------------------------------------- ### Summary This pull request introduces significant improvements to the logging system in the `mssql_python` package, focusing on performance optimization, better integration with Rust (`py-core`), and enhanced test coverage. The changes optimize how logging levels are checked, ensure that logging from Rust code is handled correctly, and add thorough tests for these new features. **Key changes:** ### Logging Performance and Behavior Improvements - Switched the default logging level from `CRITICAL` to `WARNING`, ensuring that warnings and errors are visible by default without explicit configuration. Also, a default stderr handler is now added for warnings and errors. [[1]](diffhunk://#diff-a8f9a63854a2b8541d3d9260e97948990dba646ca62b8e0d135055fd7f543f28L91-R91) [[2]](diffhunk://#diff-a8f9a63854a2b8541d3d9260e97948990dba646ca62b8e0d135055fd7f543f28L107-R122) - Introduced cached logging level checks (`_cached_level` and `_is_debug_enabled`) for fast, low-overhead decision-making in logging calls, with a new `is_debug_enabled` property for quick checks. [[1]](diffhunk://#diff-a8f9a63854a2b8541d3d9260e97948990dba646ca62b8e0d135055fd7f543f28L107-R122) [[2]](diffhunk://#diff-a8f9a63854a2b8541d3d9260e97948990dba646ca62b8e0d135055fd7f543f28R505-R515) [[3]](diffhunk://#diff-a8f9a63854a2b8541d3d9260e97948990dba646ca62b8e0d135055fd7f543f28R621-R625) - Updated all logging calls in `_bulkcopy` (in `cursor.py`) to use the new fast check for debug logging, and to avoid unnecessary logger passing to Rust unless logging is enabled. [[1]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280R2577-R2591) [[2]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280L2691-R2706) [[3]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280R2719-R2727) ### Rust (`py-core`) Logging Integration - Added a new `py_core_log` method to the logger, allowing Rust-side code to emit logs with custom source locations and levels, and ensuring these logs are formatted and filtered consistently with Python logs. - Enhanced the CSV log formatter to recognize and format logs originating from `py-core` distinctly. ### Robustness and Error Handling - Ensured that logging always allows `WARNING` and `ERROR` messages, even when the logger is otherwise disabled, both in Python and Rust-originated logs. [[1]](diffhunk://#diff-a8f9a63854a2b8541d3d9260e97948990dba646ca62b8e0d135055fd7f543f28L355-R416) [[2]](diffhunk://#diff-a8f9a63854a2b8541d3d9260e97948990dba646ca62b8e0d135055fd7f543f28R349-R386) - Improved error logging in `_bulkcopy` for various failure cases, such as missing dependencies or invalid parameters. [[1]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280R2577-R2591) [[2]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280R2623) [[3]](diffhunk://#diff-deceea46ae01082ce8400e14fa02f4b7585afb7b5ed9885338b66494f5f38280R2633-R2637) ### Test Suite Enhancements - Added comprehensive tests for the new `is_debug_enabled` property and for `py_core_log`, covering level filtering, custom source locations, CSV formatting, error fallback, and integration with different Rust files. - Updated test cleanup utilities to maintain consistency with the new cached logging level logic. [[1]](diffhunk://#diff-39a609c16b2d72fabefaf23a0f99dccae8bd4bf5e481928bdeaf53d6131a0fd4R34-R35) [[2]](diffhunk://#diff-39a609c16b2d72fabefaf23a0f99dccae8bd4bf5e481928bdeaf53d6131a0fd4R51-R52) These changes collectively improve logging performance, reliability, and cross-language integration, while ensuring robust test coverage for the new features. **High Level Design** <img width="426" height="986" alt="image" src="https://github.com/user-attachments/assets/be4ddc70-7974-484e-9ad0-22e17321e16d" /> **Success Test Log** <img width="1790" height="952" alt="image" src="https://github.com/user-attachments/assets/073a07e1-3235-4587-a747-274e5aa06113" /> **Failure Test Log** <img width="1887" height="957" alt="image" src="https://github.com/user-attachments/assets/8d39e902-1f13-4cce-a252-2dc12deb94ac" />
1 parent ba369c7 commit c07cbfa

File tree

3 files changed

+302
-16
lines changed

3 files changed

+302
-16
lines changed

mssql_python/cursor.py

Lines changed: 21 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2575,16 +2575,21 @@ def bulkcopy(
25752575
ValueError: If table_name is empty or parameters are invalid
25762576
RuntimeError: If connection string is not available
25772577
"""
2578+
# Fast check if logging is enabled to avoid overhead
2579+
is_logging_enabled = logger.is_debug_enabled
2580+
25782581
try:
25792582
import mssql_py_core
25802583
except ImportError as exc:
2584+
logger.error("_bulkcopy: Failed to import mssql_py_core module")
25812585
raise ImportError(
25822586
"Bulk copy requires the mssql_py_core library which is not available. "
25832587
"This is an unexpected error. "
25842588
) from exc
25852589

25862590
# Validate inputs
25872591
if not table_name or not isinstance(table_name, str):
2592+
logger.error("_bulkcopy: Invalid table_name parameter")
25882593
raise ValueError("table_name must be a non-empty string")
25892594

25902595
# Validate that data is iterable (but not a string or bytes, which are technically iterable)
@@ -2616,6 +2621,7 @@ def bulkcopy(
26162621

26172622
# Get and parse connection string
26182623
if not hasattr(self.connection, "connection_str"):
2624+
logger.error("_bulkcopy: Connection string not available")
26192625
raise RuntimeError("Connection string not available for bulk copy")
26202626

26212627
# Use the proper connection string parser that handles braced values
@@ -2652,9 +2658,14 @@ def bulkcopy(
26522658
pycore_connection = None
26532659
pycore_cursor = None
26542660
try:
2655-
pycore_connection = mssql_py_core.PyCoreConnection(pycore_context)
2661+
# Only pass logger to Rust if logging is enabled (performance optimization)
2662+
pycore_connection = mssql_py_core.PyCoreConnection(
2663+
pycore_context, python_logger=logger if is_logging_enabled else None
2664+
)
26562665
pycore_cursor = pycore_connection.cursor()
26572666

2667+
# Call bulkcopy with explicit keyword arguments
2668+
# The API signature: bulkcopy(table_name, data_source, batch_size=0, timeout=30, ...)
26582669
result = pycore_cursor.bulkcopy(
26592670
table_name,
26602671
iter(data),
@@ -2667,6 +2678,14 @@ def bulkcopy(
26672678
keep_nulls=keep_nulls,
26682679
fire_triggers=fire_triggers,
26692680
use_internal_transaction=use_internal_transaction,
2681+
python_logger=logger if is_logging_enabled else None, # Only pass logger if enabled
2682+
)
2683+
2684+
logger.info(
2685+
"_bulkcopy: Bulk copy completed successfully - rows_copied=%s, batch_count=%s, elapsed_time=%s",
2686+
result.get("rows_copied", "N/A"),
2687+
result.get("batch_count", "N/A"),
2688+
result.get("elapsed_time", "N/A"),
26702689
)
26712690

26722691
return result
@@ -2694,8 +2713,7 @@ def bulkcopy(
26942713
try:
26952714
resource.close()
26962715
except Exception as cleanup_error:
2697-
# Log cleanup errors at debug level to aid troubleshooting
2698-
# without masking the original exception
2716+
# Log cleanup errors only - aids troubleshooting without masking original exception
26992717
logger.debug(
27002718
"Failed to close bulk copy resource %s: %s",
27012719
type(resource).__name__,

mssql_python/logging.py

Lines changed: 90 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -104,7 +104,22 @@ def __init__(self):
104104
self._handler_lock = threading.RLock() # Reentrant lock for handler operations
105105
self._cleanup_registered = False # Track if atexit cleanup is registered
106106

107-
# Don't setup handlers yet - do it lazily when setLevel is called
107+
# Cached level for fast checks (avoid repeated isEnabledFor calls)
108+
self._cached_level = logging.WARNING
109+
self._is_debug_enabled = False
110+
111+
# Set up default stderr handler for WARNING and ERROR messages
112+
# This ensures warnings are always visible even when logging is not enabled
113+
import sys
114+
115+
default_handler = logging.StreamHandler(sys.stderr)
116+
default_handler.setLevel(logging.WARNING)
117+
# Simple format for warnings - no CSV formatting needed
118+
default_handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
119+
self._logger.addHandler(default_handler)
120+
self._default_handler = default_handler # Keep reference for later removal
121+
122+
# Don't setup full handlers yet - do it lazily when setLevel is called
108123
# This prevents creating log files when user changes output mode before enabling logging
109124

110125
def _setup_handlers(self):
@@ -145,15 +160,20 @@ def _setup_handlers(self):
145160
# Custom formatter to extract source from message and format as CSV
146161
class CSVFormatter(logging.Formatter):
147162
def format(self, record):
148-
# Extract source from message (e.g., [Python] or [DDBC])
149-
msg = record.getMessage()
150-
if msg.startswith("[") and "]" in msg:
151-
end_bracket = msg.index("]")
152-
source = msg[1:end_bracket]
153-
message = msg[end_bracket + 2 :].strip() # Skip '] '
163+
# Check if this is from py-core (via py_core_log method)
164+
if hasattr(record, "funcName") and record.funcName == "py-core":
165+
source = "py-core"
166+
message = record.getMessage()
154167
else:
155-
source = "Unknown"
156-
message = msg
168+
# Extract source from message (e.g., [Python] or [DDBC])
169+
msg = record.getMessage()
170+
if msg.startswith("[") and "]" in msg:
171+
end_bracket = msg.index("]")
172+
source = msg[1:end_bracket]
173+
message = msg[end_bracket + 2 :].strip() # Skip '] '
174+
else:
175+
source = "Unknown"
176+
message = msg
157177

158178
# Format timestamp with milliseconds using period separator
159179
timestamp = self.formatTime(record, "%Y-%m-%d %H:%M:%S")
@@ -326,6 +346,44 @@ def _write_log_header(self):
326346
pass # Even stderr notification failed
327347
# Don't crash - logging continues without header
328348

349+
def py_core_log(self, level: int, msg: str, filename: str = "cursor.rs", lineno: int = 0):
350+
"""
351+
Logging method for py-core (Rust/TDS) code with custom source location.
352+
353+
Args:
354+
level: Log level (DEBUG, INFO, WARNING, ERROR)
355+
msg: Message string (already formatted)
356+
filename: Source filename (e.g., 'cursor.rs')
357+
lineno: Line number in source file
358+
"""
359+
try:
360+
# Fast level check using cached level (same optimization as _log method)
361+
# Exception: Always allow WARNING and ERROR messages through
362+
if level < self._cached_level and level < logging.WARNING:
363+
return
364+
365+
# Create a custom LogRecord with Rust source location
366+
import logging as log_module
367+
368+
record = log_module.LogRecord(
369+
name=self._logger.name,
370+
level=level,
371+
pathname=filename,
372+
lineno=lineno,
373+
msg=msg,
374+
args=(),
375+
exc_info=None,
376+
func="py-core",
377+
sinfo=None,
378+
)
379+
self._logger.handle(record)
380+
except Exception:
381+
# Fallback - use regular logging
382+
try:
383+
self._logger.log(level, msg)
384+
except:
385+
pass
386+
329387
def _log(self, level: int, msg: str, add_prefix: bool = True, *args, **kwargs):
330388
"""
331389
Internal logging method with exception safety.
@@ -352,8 +410,10 @@ def _log(self, level: int, msg: str, add_prefix: bool = True, *args, **kwargs):
352410
All other failures are silently ignored to prevent app crashes.
353411
"""
354412
try:
355-
# Fast level check (zero overhead if disabled)
356-
if not self._logger.isEnabledFor(level):
413+
# Fast level check using cached level (zero overhead if disabled)
414+
# This avoids the overhead of isEnabledFor() method call
415+
# Exception: Always allow WARNING and ERROR messages through
416+
if level < self._cached_level and level < logging.WARNING:
357417
return
358418

359419
# Add prefix if requested (only after level check)
@@ -364,8 +424,9 @@ def _log(self, level: int, msg: str, add_prefix: bool = True, *args, **kwargs):
364424
if args:
365425
msg = msg % args
366426

367-
# Log the message (no args since already formatted)
368-
self._logger.log(level, msg, **kwargs)
427+
# Log the message with proper stack level to capture caller's location
428+
# stacklevel=3 skips: _log -> debug/info/warning/error -> actual caller
429+
self._logger.log(level, msg, stacklevel=3, **kwargs)
369430
except Exception:
370431
# Last resort: Try stderr fallback for any logging failure
371432
# This helps diagnose critical issues (disk full, permission denied, etc.)
@@ -441,6 +502,17 @@ def _setLevel(
441502
# Set level (atomic operation, no lock needed)
442503
self._logger.setLevel(level)
443504

505+
# Cache level for fast checks (avoid repeated isEnabledFor calls)
506+
# Note: These updates are not atomic across both variables, creating a brief
507+
# window where reads might see inconsistent state (e.g., updated _cached_level
508+
# but old _is_debug_enabled). This is an acceptable benign race condition:
509+
# - Worst case: one log statement might be incorrectly allowed/blocked
510+
# - Duration: nanoseconds (single Python bytecode instruction gap)
511+
# - Impact: negligible - next check will see consistent state
512+
# - Alternative (locking) would add overhead to every log call
513+
self._cached_level = level
514+
self._is_debug_enabled = level <= logging.DEBUG
515+
444516
# Notify C++ bridge of level change
445517
self._notify_cpp_level_change(level)
446518

@@ -546,6 +618,11 @@ def level(self) -> int:
546618
"""Get the current logging level"""
547619
return self._logger.level
548620

621+
@property
622+
def is_debug_enabled(self) -> bool:
623+
"""Fast check if debug logging is enabled (cached for performance)"""
624+
return self._is_debug_enabled
625+
549626

550627
# ============================================================================
551628
# Module-level exports (Primary API)

0 commit comments

Comments
 (0)