-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python: Add Pandas SQLi sinks #19594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces explicit SQL injection modeling for pandas.read_sql and pandas.read_sql_query, updates existing tests to use an sqlite3 connection for sink detection, and adds a change note entry.
- Updated pandas DataFrame test to connect via
sqlite3and mark SQL sinks with$getSql. - Added a
ReadSQLCallQL class to flag raw SQL queries in Pandas. - Documented the new analysis in the change notes.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/ql/test/library-tests/frameworks/pandas/dataframe_query.py | Replaced path-based calls with sqlite3.connect, updated sink markers |
| python/ql/src/change-notes/2025-05-26-pandas-sqli-sinks.md | Added entry for Pandas SQLi sinks |
| python/ql/lib/semmle/python/frameworks/Pandas.qll | Introduced ReadSQLCall to model read_sql and read_sql_query as sinks |
Comments suppressed due to low confidence (2)
python/ql/test/library-tests/frameworks/pandas/dataframe_query.py:59
- This test covers positional arguments but does not cover keyword argument usage (e.g.,
sql=andcon=). Adding a test forpd.read_sql_query(sql="...", con=connection)would ensure named-parameter sinks are detected.
df = pd.read_sql_query("sql query", connection) # $getSql="sql query"
python/ql/lib/semmle/python/frameworks/Pandas.qll:161
- Unconditionally treating
read_sqlas a sink may flag saferead_sql_tablecalls (whichread_sqlcan dispatch to). Consider refining the model to only treat calls that route toread_sql_queryor inspect argument patterns to avoid false positives.
ReadSQLCall() { this = API::moduleImport("pandas").getMember(["read_sql", "read_sql_query"]).getACall() }
python/ql/test/library-tests/frameworks/pandas/dataframe_query.py
Outdated
Show resolved
Hide resolved
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short and sweet! Looks good to me. 👍
|
Thanks @tausbn ! I fixed some nitpicks from automation :) |
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the additional fixes! 🙂
|
Hi @tausbn I don't have the rights to merge, this one is ready from my side :) |
Pandas has a
read_sqlsink, which allows for executing raw SQL queries with untrusted input.read_sqlis a convenience method, that dispatches the query either toread_sql_tableorread_sql_query. Please note thatread_sql_tableis not vulnerable to SQLi, butread_sql_queryis.Note that instead of adding new tests, I edited the previous tests, because they contained the vulnerable
read_sqlandread_sql_querymethod, which made them show up as failed.