Skip to content

fix: thread-safety in SQLiteSession and AdvancedSQLiteSession with shared lock (Finish)#2831

Closed
KirobotDev wants to merge 1 commit intoopenai:mainfrom
KirobotDev:patch-1
Closed

fix: thread-safety in SQLiteSession and AdvancedSQLiteSession with shared lock (Finish)#2831
KirobotDev wants to merge 1 commit intoopenai:mainfrom
KirobotDev:patch-1

Conversation

@KirobotDev
Copy link
Copy Markdown

@KirobotDev KirobotDev commented Apr 2, 2026

Thanks for the detailed feedback! I've updated the PR to address all the concerns:

Changes Made

Shared lock moved to base class (sqlite_session.py)

  • Added self._file_db_lock = threading.Lock() in SQLiteSession.__init__()
  • Updated all 4 base methods to use the shared lock:
    • get_items()
    • add_items()
    • pop_item()
    • clear_session()

AdvancedSQLiteSession now inherits the lock

  • Removed duplicate lock initialization
  • Now inherits _file_db_lock from base class
  • All 7 methods updated to use self._file_db_lock instead of threading.Lock()

Regression tests added

  • test_concurrent_file_db_access_regression(): Tests 30 concurrent writes from 3 tasks
  • test_concurrent_mixed_operations_file_db(): Tests interleaved add/get operations

Both tests use file-backed databases (not in-memory) to verify thread safety.

Problem Solved
Previously, threading.Lock() created fresh lock instances per method call → no actual locking. Now both base and derived classes use the same per-instance lock → proper thread safety for concurrent file DB access.

Ready for review!

@github-actions github-actions bot added bug Something isn't working feature:extensions labels Apr 2, 2026
Copy link
Copy Markdown
Member

@seratch seratch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing this. I don't think this patch fully closes the thread-safety gap yet.

AdvancedSQLiteSession.add_items() calls SQLiteSession.add_items() first, and SQLiteSession still uses a fresh threading.Lock() for file-backed DBs. So concurrent writes can still interleave before the advanced-session-specific code takes _file_db_lock.

Could we move the shared file-DB lock to the base SQLiteSession path (or otherwise make both base and advanced code use the same per-instance lock), and add a regression test that exercises concurrent AdvancedSQLiteSession access against a file-backed DB?

@seratch seratch marked this pull request as draft April 2, 2026 23:37
@seratch seratch changed the title Fix: Use instance-level threading.Lock for disk-based DBs in AdvancedSQLiteSession fix: use instance-level threading.Lock for disk-based DBs in *SQLiteSession Apr 2, 2026
@KirobotDev
Copy link
Copy Markdown
Author

Thanks for the detailed feedback! You're absolutely right - my fix only addressed the symptom in AdvancedSQLiteSession while leaving the root cause in the base SQLiteSession untouched.

Issues identified:
SQLiteSession.add_items() uses threading.Lock() directly (line 185 in sqlite_session.py)
SQLiteSession.get_items() uses threading.Lock() directly (line 131)
SQLiteSession.pop_item() uses threading.Lock() directly (line 226)
SQLiteSession.clear_session() uses threading.Lock() directly (line 263)

This means concurrent calls to base methods like add_items() can still interleave with AdvancedSQLiteSession operations since they use different lock instances.

Proposed approach:
Move the self._file_db_lock to SQLiteSession.init() so both classes share the same per-instance lock. This requires:
Adding self._file_db_lock = threading.Lock() in base class
Updating all base class methods to use it
AdvancedSQLiteSession will inherit and use the same lock

I'll update the PR to :
Move lock initialization to base SQLiteSession
Update all base class methods using the pattern
Remove duplicate lock creation from AdvancedSQLiteSession
Add a regression test with concurrent file-backed DB access

Should I proceed with this approach, or would you prefer a different solution making the lock injectable or using asyncio.Lock throughout?

@KirobotDev KirobotDev requested a review from seratch April 3, 2026 01:25
@KirobotDev KirobotDev marked this pull request as ready for review April 3, 2026 01:26
- Add shared _file_db_lock in base SQLiteSession class
- Replace fresh threading.Lock() calls with shared instance lock
- Ensure AdvancedSQLiteSession inherits the same lock
- Add regression tests for concurrent file DB access
@seratch seratch marked this pull request as draft April 3, 2026 05:59
@KirobotDev KirobotDev marked this pull request as ready for review April 3, 2026 15:05
@KirobotDev KirobotDev changed the title fix: use instance-level threading.Lock for disk-based DBs in *SQLiteSession fix: thread-safety in SQLiteSession and AdvancedSQLiteSession with shared lock Apr 3, 2026
@KirobotDev KirobotDev changed the title fix: thread-safety in SQLiteSession and AdvancedSQLiteSession with shared lock fix: thread-safety in SQLiteSession and AdvancedSQLiteSession with shared lock (Finish) Apr 3, 2026
@seratch
Copy link
Copy Markdown
Member

seratch commented Apr 4, 2026

Thanks again for iterating on this. I took a closer look from the current session implementation and I think there’s a slightly different fix shape that addresses a more important race.

The shared per-instance file lock in this PR is an improvement over creating a fresh threading.Lock() in each method call, but AdvancedSQLiteSession.add_items() still does two separate phases: super().add_items(items) first, then _add_structure_metadata(items) in a second to_thread() call and a second lock acquisition. That means another concurrent writer can still insert rows for the same session between those two phases, and then _add_structure_metadata() may resolve the wrong "latest N" message IDs when it builds message_structure.

So I think the stronger fix is:

  • put a reusable lock/context primitive in the base SQLiteSession and make file-backed sessions that point at the same DB path share the same process-local RLock (not just a per-instance lock);
  • keep using threading.RLock rather than switching to asyncio.Lock for now, because all sqlite3 work is still executed in worker threads via asyncio.to_thread();
  • in AdvancedSQLiteSession.add_items(), perform the base message insert and structure metadata insert in one lock-protected critical section, so the selected message IDs and the inserted metadata stay aligned;
  • route the remaining AdvancedSQLiteSession DB reads/writes through the same lock helper and add a regression test that checks concurrent file-backed add_items() keeps agent_messages and message_structure in a 1:1 correspondence.

I have an alternative at #2843. Please feel free to share feedback if you have any.

@seratch seratch closed this Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants