fix: reload session file on read to prevent stale chat history#1764
fix: reload session file on read to prevent stale chat history#1764cursor[bot] wants to merge 1 commit into
Conversation
_load_session returned an in-memory cache without checking disk when a session JSON file already existed. A second DefaultSessionStore instance (or direct file edits) could add messages that get_chat_history never saw until invalidate_cache was called manually. Always reload from disk under FileLock when the session file exists. Adds regression test for cross-instance reads. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
|
@coderabbitai review |
|
/review |
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthrough
ChangesSession store disk-first loading
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings. Review areas:
|
Review complete. Findings after checking all available reviewer outputs first:
PR assessment:
|
|
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
|
Claude finished @MervinPraison's task —— View job Final Architecture Review Complete ✅Todo List:
Phase 1: Architecture Review ✅PASSED - All architecture requirements met: ✅ Protocol-driven: Core SDK session store implementation - appropriate layer Phase 2: Issue Analysis ✅No valid issues found to fix from previous reviewers:
Fix Quality Assessment:
Phase 3: Final Verdict ✅APPROVED - This PR should be merged. Summary: Clean, focused fix for a legitimate data consistency bug. The solution properly trades minimal performance cost for correctness, follows established patterns, and includes comprehensive test coverage. No architecture violations or valid reviewer concerns identified. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/praisonai-agents/praisonaiagents/session/store.py`:
- Around line 271-283: Race between the os.path.exists check and returning
cached data can cause a stale-read; fix by checking the in-memory cache first,
then if the file exists acquire the FileLock and re-check the cache before
loading from disk. Concretely: in the method that uses filepath/session_id, do
"with self._lock: if session_id in self._cache: return self._cache[session_id]"
then if os.path.exists(filepath) acquire FileLock(filepath, self.lock_timeout)
and, while holding self._lock again, re-check self._cache (return if present)
else call _load_session_from_disk(session_id, filepath), set
self._cache[session_id]=session and return; if file does not exist create
SessionData(session_id=...) under self._lock, store and return. Ensure you
reference _load_session_from_disk, self._lock, self._cache, FileLock,
self.lock_timeout, session_id and filepath.
In `@src/praisonai-agents/tests/unit/session/test_session_store.py`:
- Around line 417-429: The unit test
test_get_chat_history_sees_other_store_instance only exercises
DefaultSessionStore internals; add an end-to-end agentic regression that uses
Agent.start() to exercise the same cross-process session visibility: create a
temporary session_dir, instantiate an Agent (or two Agent instances) configured
to use DefaultSessionStore(session_dir=tmpdir), call agent.start() with a real
prompt to produce an actual text response (hitting the LLM), then instantiate a
separate Agent or DefaultSessionStore pointing at the same session_dir and
assert get_chat_history(session_id) contains the messages produced by the first
agent (use DefaultSessionStore._load_session / get_chat_history to verify the
new message is visible). Ensure the test uses real agent.start() invocation and
validates text response content and session persistence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1d478e00-5cd8-4666-858e-fe9f68d3196b
📒 Files selected for processing (2)
src/praisonai-agents/praisonaiagents/session/store.pysrc/praisonai-agents/tests/unit/session/test_session_store.py
| if os.path.exists(filepath): | ||
| with FileLock(filepath, self.lock_timeout): | ||
| session = self._load_session_from_disk(session_id, filepath) | ||
| with self._lock: | ||
| self._cache[session_id] = session | ||
| return session | ||
|
|
||
| with FileLock(filepath, self.lock_timeout): | ||
| try: | ||
| with open(filepath, "r", encoding="utf-8") as f: | ||
| data = json.load(f) | ||
| session = SessionData.from_dict(data) | ||
| except (json.JSONDecodeError, IOError) as e: | ||
| logger.warning(f"Failed to load session {session_id}: {e}") | ||
| session = SessionData(session_id=session_id) | ||
|
|
||
|
|
||
| with self._lock: | ||
| if session_id in self._cache: | ||
| return self._cache[session_id] | ||
| session = SessionData(session_id=session_id) | ||
| self._cache[session_id] = session | ||
|
|
||
| return session | ||
| return session |
There was a problem hiding this comment.
Avoid TOCTOU race between existence check and cache return.
Line 271 checks file existence before locking, but Lines 279-280 can still return cached data if another instance creates/writes the file in between. That can reintroduce a stale-read edge case on this path.
Suggested fix
def _load_session(self, session_id: str) -> SessionData:
"""Load session from disk with file locking."""
filepath = self._get_session_path(session_id)
-
- # When a session file exists, always reload under FileLock so reads
- # from another DefaultSessionStore instance (or process) are visible.
- if os.path.exists(filepath):
- with FileLock(filepath, self.lock_timeout):
- session = self._load_session_from_disk(session_id, filepath)
- with self._lock:
- self._cache[session_id] = session
- return session
-
- with self._lock:
- if session_id in self._cache:
- return self._cache[session_id]
- session = SessionData(session_id=session_id)
- self._cache[session_id] = session
- return session
+ # Serialize existence/read decision to avoid stale cache races.
+ with FileLock(filepath, self.lock_timeout):
+ if os.path.exists(filepath):
+ session = self._load_session_from_disk(session_id, filepath)
+ else:
+ with self._lock:
+ session = self._cache.get(session_id) or SessionData(session_id=session_id)
+
+ with self._lock:
+ self._cache[session_id] = session
+ return session🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/praisonai-agents/praisonaiagents/session/store.py` around lines 271 -
283, Race between the os.path.exists check and returning cached data can cause a
stale-read; fix by checking the in-memory cache first, then if the file exists
acquire the FileLock and re-check the cache before loading from disk.
Concretely: in the method that uses filepath/session_id, do "with self._lock: if
session_id in self._cache: return self._cache[session_id]" then if
os.path.exists(filepath) acquire FileLock(filepath, self.lock_timeout) and,
while holding self._lock again, re-check self._cache (return if present) else
call _load_session_from_disk(session_id, filepath), set
self._cache[session_id]=session and return; if file does not exist create
SessionData(session_id=...) under self._lock, store and return. Ensure you
reference _load_session_from_disk, self._lock, self._cache, FileLock,
self.lock_timeout, session_id and filepath.
| def test_get_chat_history_sees_other_store_instance(self, temp_store): | ||
| """Another store on the same session_dir must see new messages without invalidate_cache.""" | ||
| with tempfile.TemporaryDirectory() as tmpdir: | ||
| writer = DefaultSessionStore(session_dir=tmpdir) | ||
| reader = DefaultSessionStore(session_dir=tmpdir) | ||
|
|
||
| writer.add_user_message("session-1", "first") | ||
| reader._load_session("session-1") | ||
| writer.add_user_message("session-1", "second") | ||
|
|
||
| history = reader.get_chat_history("session-1") | ||
| assert len(history) == 2 | ||
| assert history[1]["content"] == "second" |
There was a problem hiding this comment.
Add agent-level regression coverage for this feature path.
This new regression test is useful, but it only validates store internals. Per repo policy, this feature also needs a real agentic test (agent.start() with a real prompt and text response) to verify end-to-end behavior.
As per coding guidelines, "Real agentic tests are MANDATORY for every feature: Agent must call agent.start() with a real prompt, call the LLM, and produce actual text response—not just smoke tests of object construction".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/praisonai-agents/tests/unit/session/test_session_store.py` around lines
417 - 429, The unit test test_get_chat_history_sees_other_store_instance only
exercises DefaultSessionStore internals; add an end-to-end agentic regression
that uses Agent.start() to exercise the same cross-process session visibility:
create a temporary session_dir, instantiate an Agent (or two Agent instances)
configured to use DefaultSessionStore(session_dir=tmpdir), call agent.start()
with a real prompt to produce an actual text response (hitting the LLM), then
instantiate a separate Agent or DefaultSessionStore pointing at the same
session_dir and assert get_chat_history(session_id) contains the messages
produced by the first agent (use DefaultSessionStore._load_session /
get_chat_history to verify the new message is visible). Ensure the test uses
real agent.start() invocation and validates text response content and session
persistence.
Bug and impact
DefaultSessionStore.get_chat_history()could return stale or truncated history when anotherDefaultSessionStoreinstance (samesession_dir) or another process had written messages to disk. Users resuming managed-agent sessions, running bots with separate store instances, or using the JSON persistence examples could silently miss recent messages.Root cause
_load_session()short-circuited on the in-process cache whenever a session id was already cached, without re-reading the JSON file. Recent write-path fixes (#1724, #1727) used locked read-modify-write for mutations, but read paths still served stale cache entries.Fix and validation
_load_session()now reloads underFileLock(same as writes) before returning.test_get_chat_history_sees_other_store_instanceregression test.test_invalidate_cacheexpectations for correct reload behaviour.pytest tests/unit/session/test_session_store.py— 35 passed.Summary by CodeRabbit