I discovered the mssql-python library and the Microsoft dev blog comparing its performance against pyodbc here: https://devblogs.microsoft.com/python/mssql-python-vs-pyodbc-benchmarking-sql-server-performance/
So I started looking into replacing our current pyodbc-based implementation with mssql-python, switching from pyodbc/SQLAlchemy with `pandas.read_sql` to the `.arrow().to_pandas()` functionality of the mssql-python library.
Using the Arrow integration, I found some significant speed-ups in our data getters. This matters for some of our workflows which have historically been slow with our current SQL-to-Python path (e.g. 15s vs 42s).
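For context, the two paths I'm comparing look roughly like this (a sketch; the connection string, table name, and query are placeholder assumptions, not our real ones):

```python
import pandas as pd

# Placeholder connection string and query -- assumptions for illustration only.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myserver;Database=mydb;Trusted_Connection=yes;"
)
QUERY = "SELECT * FROM dbo.some_table"


def fetch_via_pandas_pyodbc(conn_str: str, query: str) -> pd.DataFrame:
    """Current path: SQLAlchemy engine over pyodbc + pandas.read_sql."""
    import sqlalchemy

    url = sqlalchemy.engine.URL.create(
        "mssql+pyodbc", query={"odbc_connect": conn_str}
    )
    engine = sqlalchemy.create_engine(url)
    with engine.connect() as conn:
        return pd.read_sql(sqlalchemy.text(query), conn)


def fetch_via_mssql_python_arrow(conn_str: str, query: str) -> pd.DataFrame:
    """Candidate path: mssql-python cursor -> Arrow table -> pandas."""
    import mssql_python

    conn = mssql_python.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        # .arrow().to_pandas() is the Arrow fetch path discussed above.
        return cursor.arrow().to_pandas()
    finally:
        conn.close()
```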
In testing, I have looked into the various workflows to compare the two libraries, using our existing DBs and queries to test timings for:
- `.fetchall()`: pyodbc vs mssql-python
- pyodbc vs mssql-python via SQLAlchemy/pandas (using the latest SQLAlchemy beta for mssql-python compatibility)
- mssql-python via `.arrow()`
- Used the performance test script from the original blog article here: https://github.com/microsoft/mssql-python/blob/main/benchmarks/bench_mssql.py
For these tests I initially used our existing infrastructure with SQL Server 2019 and Python 3.10. However, to test whether they played any factor, I also tried Python 3.13 and SQL Server 2025 (via a fresh Docker container).
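A minimal harness for this kind of min/median timing could look like the following (a sketch; the repeat count and the stand-in workload are illustrative assumptions, and in practice the callable would wrap one of the fetch paths above):

```python
import statistics
import time


def bench(fn, repeats=5):
    """Call fn() `repeats` times and return (min, median) wall-clock
    seconds, mirroring the min/med columns in the results table."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times), statistics.median(times)


# Example usage with a stand-in workload instead of a real query:
fastest, median = bench(lambda: sum(range(100_000)), repeats=3)
```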
Going via pyarrow is always the fastest, which is no surprise, and it generally provides the best features, like fully typed columns and future compatibility with other libraries, which is perfect.
However, in testing the other scenarios, which are a fairer test between pyodbc and mssql-python since they compare equivalent functionality, I keep finding that pyodbc is either approximately as performant as, or quicker than, mssql-python.
Here's a list of results using the tests from the article:
Operation             pyodbc min   mssql min   Ratio (min)      pyodbc med   mssql med   Ratio (med)
----------------------------------------------------------------------------------------------------
SELECT                   57.83ms     58.22ms   0.99x (pyodbc)      65.94ms     67.01ms   0.98x (pyodbc)
INSERT                   14.43ms     16.22ms   0.89x (pyodbc)      17.05ms     18.86ms   0.90x (pyodbc)
UPDATE                   14.24ms     17.37ms   0.82x (pyodbc)      16.66ms     20.57ms   0.81x (pyodbc)
DELETE                    6.04ms      7.98ms   0.76x (pyodbc)       7.72ms     10.64ms   0.73x (pyodbc)
Complex query             5.79ms      7.80ms   0.74x (pyodbc)       7.05ms      9.98ms   0.71x (pyodbc)
Multiple connections     67.30ms     86.74ms   0.78x (pyodbc)      74.46ms     98.73ms   0.75x (pyodbc)
Fetch one                 5.86ms      7.55ms   0.78x (pyodbc)       8.07ms      9.75ms   0.83x (pyodbc)
Fetch many                5.78ms      7.81ms   0.74x (pyodbc)       6.78ms      9.62ms   0.70x (pyodbc)
Stored procedure          5.80ms      7.53ms   0.77x (pyodbc)       6.79ms      8.75ms   0.78x (pyodbc)
Nested query              5.93ms      7.74ms   0.77x (pyodbc)       7.30ms      9.76ms   0.75x (pyodbc)
Large data set            5.76ms      8.21ms   0.70x (pyodbc)       6.78ms      9.70ms   0.70x (pyodbc)
Join query                6.13ms      8.50ms   0.72x (pyodbc)       7.61ms     10.19ms   0.75x (pyodbc)
Execute many             22.56ms     21.60ms   1.04x (mssql)       27.91ms     26.57ms   1.05x (mssql)
100 inserts             187.36ms     23.13ms   8.10x (mssql)      204.97ms     25.67ms   7.98x (mssql)
Transaction              22.62ms     24.63ms   0.92x (pyodbc)      28.83ms     30.79ms   0.94x (pyodbc)
Update with join          5.83ms      7.52ms   0.77x (pyodbc)       7.18ms      9.15ms   0.78x (pyodbc)
Delete with join          5.45ms      7.80ms   0.70x (pyodbc)       6.54ms      9.10ms   0.72x (pyodbc)
And an example benchmark for one of our internal queries (24 columns):
- 1,709,738 rows:
  pyodbc fetchall        23.5540s
  mssql-python fetchall  37.0886s
  SQLAlchemy+pyodbc      41.9620s
  SQLAlchemy+mssql-py    53.1163s
- 77,712 rows:
  pyodbc fetchall        1.0679s
  mssql-python fetchall  1.2697s
  SQLAlchemy+pyodbc      1.7786s
  SQLAlchemy+mssql-py    2.0846s
I am by no means a SQL expert, so maybe something else is going on that stops mssql-python from reaching its peak speeds.
Questions:
- Any ideas why I cannot replicate something similar to the original benchmarks, where there's at least a 2x speed-up in most situations?
- If there is some sort of performance regression here, does that mean the Arrow integration could be even faster than what I've seen in this initial testing?
Spec Details:
Python version: 3.10.11 and 3.13.12
SQL Server Version: 2019 (existing) and 2025 (test docker container)
OS: Windows 10
mssql-python: 1.6.0
pyodbc: 5.3.0
sqlalchemy: 2.1.0b1