fix: Security vulnerabilities in wrapper layer#1744
fix: Security vulnerabilities in wrapper layer#1744praisonai-triage-agent[bot] wants to merge 2 commits into
Conversation
- Fix 1: Gateway magic-link auth secret mismatch - export resolved token to environment for single source of truth across HTTP and WebSocket auth paths - Fix 2a: SSH sandbox timeout cleanup - add try/finally to ensure remote temp files are always cleaned up - Fix 2b: SSH sandbox remote process orphaning - wrap commands with timeout to ensure remote processes self-terminate - Fix 2c: Docker sandbox container orphaning - add container names and docker kill on timeout to stop actual containers - Fix 3: Cloud deploy API key exposure - replace inline argv secrets with secure temporary YAML file approach 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
|
@coderabbitai review |
|
/review |
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis PR hardens auth token consistency, sandbox workload lifecycle, and API key confidentiality across the gateway, cloud deployment, and sandbox execution paths. Auth token is now always exported to the environment at gateway init; deployment no longer exposes API keys in gcloud argv; and sandbox timeouts now reliably clean up remote files and force-stop running workloads. ChangesSecurity & Lifecycle Fixes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings. Review areas:
|
Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/174baeba-8504-435e-875b-60dc62d3972c Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Completed a full review and incorporated all available reviewer context before acting. Qodo had no review output (paused), CodeRabbit had only a “review in progress” status with no actionable findings, and there were no Gemini PR-thread findings to apply. I found and fixed one valid issue in Review summary:
One further improvement suggestion (optional) with code example: in SSH execution, consider fallback if remote |
|
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
|
Claude finished @MervinPraison's task —— View job Final Architecture Review ✅Acting as final architecture reviewer for security fixes in wrapper layer. Todo List:
Scope: Python packages only (praisonaiagents, praisonai) - excluding praisonai-rust and praisonai-ts. Phase 1: Architecture Review ✅✅ Protocol-Driven Core Compliance
✅ Backward Compatibility
✅ Performance Considerations
Phase 2: Security Fixes Analysis ✅Fix 1: Gateway Auth Token Sync ( os.environ["GATEWAY_AUTH_TOKEN"] = self.config.auth_token
Fix 2: Docker Timeout Resource Leaks ( # Kill the container, not just the client
kill_proc = await asyncio.create_subprocess_exec(
"docker", "kill", container_name, ...
)
Fix 3: SSH Timeout Resource Leaks ( timeout_cmd = f"timeout {int(timeout)} sh -c {shlex.quote(command)}"
# ... plus try/finally cleanup in lines 221-225
Fix 4: Cloud Deploy API Key Exposure ( # Create temporary env vars file to avoid exposing secrets in argv
fd, env_vars_file = tempfile.mkstemp(suffix=".yaml", prefix="praisonai-deploy-")
os.chmod(env_vars_file, 0o600) # Secure file permissions
Phase 3: Final Verdict ✅All Issues Resolved: ✅ No blocking architectural or security issues remain Code Quality: ✅ All modules pass syntax validation Reviewer Feedback: ✅ Copilot's valid issue properly addressed with regression test Architecture Compliance: ✅ Perfect adherence to AGENTS.md guidelines APPROVED 🚀This PR successfully addresses three critical security vulnerabilities while maintaining perfect architectural compliance. All fixes are properly implemented in the wrapper layer with appropriate test coverage. Ready for merge. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/praisonai/praisonai/sandbox/ssh.py (1)
537-543: 💤 Low valueDouble timeout wrapping when called from
execute().When
execute()is used,_build_command()(lines 510-512) already wraps the command withtimeout {seconds}. Then_run_command_with_limits()wraps it again, resulting in:timeout 30 sh -c 'timeout 30 python3 script.py'This is redundant but not harmful. Consider either:
- Removing the timeout from
_build_command()since_run_command_with_limits()now handles it, or- Adding a flag to skip the wrapper here if the command already has timeout
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai/praisonai/sandbox/ssh.py` around lines 537 - 543, The _build_command() function is redundantly wrapping commands with "timeout" while _run_command_with_limits() already applies the remote timeout; remove the timeout wrapping from _build_command() (the block that constructs timeout_cmd and uses it to form full_command) so it only returns "cd <working_dir> && <command>", and rely on _run_command_with_limits() to add the timeout; ensure callers like execute() and any tests still get the correct behavior after this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/praisonai/praisonai/sandbox/docker.py`:
- Around line 259-263: The execute() path still can leave orphaned containers:
update _build_docker_command() to accept or generate the same deterministic
container_name (e.g., f"praisonai-{execution_id}") and include the "--name",
container_name pair in the returned docker command just like run_command() does,
and modify execute()'s timeout handler to invoke a docker kill for that
container_name (in addition to proc.kill()) so the container is forcibly removed
on timeout; reference the _build_docker_command() function to add the name and
the execute() method's timeout handler to add the docker kill call.
---
Nitpick comments:
In `@src/praisonai/praisonai/sandbox/ssh.py`:
- Around line 537-543: The _build_command() function is redundantly wrapping
commands with "timeout" while _run_command_with_limits() already applies the
remote timeout; remove the timeout wrapping from _build_command() (the block
that constructs timeout_cmd and uses it to form full_command) so it only returns
"cd <working_dir> && <command>", and rely on _run_command_with_limits() to add
the timeout; ensure callers like execute() and any tests still get the correct
behavior after this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9e5f371a-2130-4b67-af2c-433ca868b433
📒 Files selected for processing (4)
src/praisonai/praisonai/deploy.pysrc/praisonai/praisonai/gateway/server.pysrc/praisonai/praisonai/sandbox/docker.pysrc/praisonai/praisonai/sandbox/ssh.py
| ['gcloud', 'run', 'deploy', 'praisonai-service', | ||
| '--image', f'us-central1-docker.pkg.dev/{project_id}/praisonai-repository/praisonai-app:latest', | ||
| '--platform', 'managed', '--region', 'us-central1', '--allow-unauthenticated', | ||
| '--env-vars-file', env_vars_file] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if gcp.py still embeds secrets in --set-env-vars argv
rg -n --type=py '\-\-set-env-vars' -C3Repository: MervinPraison/PraisonAI
Length of output: 725
🏁 Script executed:
#!/bin/bash
# Check if this alternate provider is used anywhere
rg -n --type=py 'GCPDeployProvider|gcp\.py|providers\.gcp' -C2Repository: MervinPraison/PraisonAI
Length of output: 2542
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect deploy.py around the referenced snippet
echo "---- deploy.py (around lines 160-210) ----"
sed -n '140,210p' src/praisonai/praisonai/deploy.py | nl -ba | sed -n '1,120p'
# Inspect gcp.py around the --set-env-vars usage
echo "---- gcp.py (around lines 80-160) ----"
sed -n '80,170p' src/praisonai/praisonai/deploy/providers/gcp.py | nl -ba | sed -n '1,120p'
# Check what keys are placed into env_vars (search for OPENAI_API_KEY specifically)
echo "---- Search for OPENAI_API_KEY in deploy/providers and config building ----"
rg -n "OPENAI_API_KEY" src/praisonai/praisonai/deploy -S || true
echo "---- Search for any _API_KEY placed into env_vars in deploy ----"
rg -n "(API_KEY|_API_KEY)" src/praisonai/praisonai/deploy -S || true
# Find where config.env_vars is populated for GCPProvider
echo "---- Search for env_vars assignment for GCPProvider/config ----"
rg -n "env_vars\s*=" src/praisonai/praisonai/deploy -S || true
rg -n "config\.env_vars" src/praisonai/praisonai/deploy -S || trueRepository: MervinPraison/PraisonAI
Length of output: 171
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- deploy.py (around lines 160-210) ----"
sed -n '140,210p' src/praisonai/praisonai/deploy.py | awk '{print NR+139 ":\t"$0}'
echo "---- gcp.py (around lines 80-170) ----"
sed -n '80,170p' src/praisonai/praisonai/deploy/providers/gcp.py | awk '{print NR+79 ":\t"$0}'
echo "---- Search for OPENAI_API_KEY in deploy ----"
rg -n "OPENAI_API_KEY" src/praisonai/praisonai/deploy -S || true
echo "---- Search for any _API_KEY placed into env_vars in deploy ----"
rg -n "(API_KEY|_API_KEY)" src/praisonai/praisonai/deploy -S || true
echo "---- Search for env_vars assignment for GCPProvider/config ----"
rg -n "env_vars\s*=" src/praisonai/praisonai/deploy -S || true
rg -n "config\.env_vars" src/praisonai/praisonai/deploy -S || trueRepository: MervinPraison/PraisonAI
Length of output: 9217
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Where CloudConfig.env_vars is defined/used ----"
rg -n "class CloudConfig|env_vars" src/praisonai/praisonai/deploy/models -S || true
rg -n "env_vars" src/praisonai/praisonai -S | head -n 200
echo "---- Where env_vars is populated with OPENAI_API_KEY (any module) ----"
rg -n "env_vars.*OPENAI_API_KEY|OPENAI_API_KEY.*env_vars|OPENAI_API_KEY" src/praisonai/praisonai -S
echo "---- How GCP provider is selected/used ----"
rg -n "get_provider|GCPProvider|providers\.gcp|deploy/providers/gcp\.py|cloud.*gcp" src/praisonai/praisonai -SRepository: MervinPraison/PraisonAI
Length of output: 29867
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Find CloudConfig construction in deploy/* ----"
rg -n "CloudConfig\s*\(" src/praisonai/praisonai/deploy -S
echo "---- Search for env_vars population in deploy/main.py and deploy/* ----"
rg -n "env_vars" src/praisonai/praisonai/deploy -S
echo "---- Inspect deploy/main.py around provider deployment (likely where CloudConfig/env_vars are set) ----"
sed -n '1,260p' src/praisonai/praisonai/deploy/main.py | awk '{print NR ":" $0}' | sed -n '1,260p'
echo "---- Inspect deploy/main.py further (later sections) ----"
sed -n '260,420p' src/praisonai/praisonai/deploy/main.py | awk '{print NR+260 ":" $0}' | sed -n '1,200p'
echo "---- Inspect deploy/schema.py / related models for env_vars semantics ----"
ls -1 src/praisonai/praisonai/deploy
rg -n "env_vars" src/praisonai/praisonai/deploy/schema.py src/praisonai/praisonai/deploy/models.py 2>/dev/null || trueRepository: MervinPraison/PraisonAI
Length of output: 14479
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Inspect deploy/models.py (CloudConfig) ----"
sed -n '1,140p' src/praisonai/praisonai/deploy/models.py | nl -ba
echo "---- Inspect deploy/schema.py for env_vars/cloud parsing ----"
sed -n '1,220p' src/praisonai/praisonai/deploy/schema.py | nl -ba
echo "---- Search schema.py for env_vars keys ----"
rg -n "env_vars|env-vars|environment" src/praisonai/praisonai/deploy/schema.py -S || true
echo "---- Inspect top-level deploy.py for flow/entrypoints and whether it uses provider/gcp.py ----"
sed -n '1,260p' src/praisonai/praisonai/deploy.py | nl -ba
echo "---- Does top-level deploy.py import/use deploy/providers/gcp.py or CloudConfig? ----"
rg -n "deploy\.providers\.gcp|GCPProvider|CloudConfig|providers\/gcp" src/praisonai/praisonai/deploy.py -S || true
echo "---- Where is deploy.py used from CLI (entrypoints)? ----"
rg -n "praisonai\.deploy\.py|from praisonai import deploy|import .*deploy\.py|import praisonai\.deploy|deploy\.py" src/praisonai/praisonai/cli -S || true
echo "---- Where is provider-based cloud deploy used from CLI? ----"
rg -n "DeployType\.CLOUD|_deploy_cloud|get_provider\(self\.config\.cloud\)|from praisonai\.deploy import Deploy" src/praisonai/praisonai/cli -S || trueRepository: MervinPraison/PraisonAI
Length of output: 177
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- deploy/models.py (first 120 lines) ----"
sed -n '1,140p' src/praisonai/praisonai/deploy/models.py | cat -n | sed -n '1,140p'
echo "---- deploy/schema.py (first 220 lines) ----"
sed -n '1,220p' src/praisonai/praisonai/deploy/schema.py | cat -n
echo "---- Search schema.py for env_vars/cloud parsing ----"
rg -n "env_vars|env-vars|cloud_data|CloudConfig|service_name|provider" src/praisonai/praisonai/deploy/schema.py -S || true
echo "---- top-level deploy.py (first 320 lines) ----"
sed -n '1,320p' src/praisonai/praisonai/deploy.py | cat -n | sed -n '1,320p'
echo "---- Search top-level deploy.py for provider usage ----"
rg -n "GCPProvider|get_provider|providers/gcp|CloudConfig|deploy/providers" src/praisonai/praisonai/deploy.py -S || true
echo "---- CLI usage: where Deploy class is invoked for cloud deploy ----"
rg -n "DeployType\.CLOUD|_deploy_cloud|Deploy\(|praisonai\.deploy" src/praisonai/praisonai/cli -S || trueRepository: MervinPraison/PraisonAI
Length of output: 28942
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Is praisonai/deploy a package? (check __init__.py) ----"
ls -la src/praisonai/praisonai/deploy
ls -la src/praisonai/praisonai/deploy/__init__.py || true
cat src/praisonai/praisonai/deploy/__init__.py 2>/dev/null || true
echo "---- Inspect cli/features/deploy.py for which Deploy is used for CLOUD (GCP) ----"
sed -n '1,140p' src/praisonai/praisonai/cli/features/deploy.py | cat -n
sed -n '140,260p' src/praisonai/praisonai/cli/features/deploy.py | cat -n
sed -n '260,460p' src/praisonai/praisonai/cli/features/deploy.py | cat -n
echo "---- Inspect deploy/providers/base.py for env_vars/secrets handling ----"
sed -n '1,220p' src/praisonai/praisonai/deploy/providers/base.py | cat -n
echo "---- Inspect deploy/providers/gcp.py (class + where env_vars comes from) ----"
sed -n '1,220p' src/praisonai/praisonai/deploy/providers/gcp.py | cat -n
echo "---- Search providers for env_vars construction/assignment ----"
rg -n "env_vars\s*=" src/praisonai/praisonai/deploy/providers -S
rg -n "OPENAI_API_KEY|api_key|secrets" src/praisonai/praisonai/deploy/providers -SRepository: MervinPraison/PraisonAI
Length of output: 38666
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "env_vars" src/praisonai/tests -S || true
rg -n "OPENAI_API_KEY" src/praisonai/tests -S || true
rg -n "deploy.*gcp|gcp.*deploy|GCPProvider" src/praisonai/tests -S || trueRepository: MervinPraison/PraisonAI
Length of output: 26517
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Inspect tests for GCP provider in test_providers.py (around relevant tests) ----"
sed -n '1,260p' src/praisonai/tests/unit/deploy/test_providers.py | cat -n
echo "---- Inspect later parts of test_providers.py (after 260) ----"
sed -n '260,420p' src/praisonai/tests/unit/deploy/test_providers.py | cat -nRepository: MervinPraison/PraisonAI
Length of output: 12285
GCP provider still exposes secrets via gcloud run deploy --set-env-vars argv
src/praisonai/praisonai/deploy/providers/gcp.py’s GCPProvider.deploy() constructs env_str from self.config.env_vars and passes it as --set-env-vars <k=v,...>; if env_vars contains API keys/secrets, they will end up in the gcloud command line (process tables/command logs). This remains unaddressed compared to the src/praisonai/praisonai/deploy.py fix that switches to --env-vars-file.
- Update the GCP provider deploy path to use a secure temp env-var file +
--env-vars-file(mode0600) instead of--set-env-vars.
| # Generate container name for timeout cleanup | ||
| container_name = f"praisonai-{execution_id}" | ||
|
|
||
| docker_cmd = [ | ||
| "docker", "run", "--rm", | ||
| "docker", "run", "--rm", "--name", container_name, |
There was a problem hiding this comment.
Container naming logic is correct, but the same fix is missing from execute().
The deterministic container naming and --name flag correctly address the orphan container issue for run_command(). However, the execute() method (lines 145-151) uses _build_docker_command() which does not add --name, and its timeout handler (lines 180-191) only calls proc.kill() without docker kill. This leaves execute() vulnerable to the same orphan container problem this PR aims to fix.
Consider applying the same pattern to execute() by:
- Adding container naming to
_build_docker_command() - Adding
docker killto the timeout handler inexecute()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/praisonai/praisonai/sandbox/docker.py` around lines 259 - 263, The
execute() path still can leave orphaned containers: update
_build_docker_command() to accept or generate the same deterministic
container_name (e.g., f"praisonai-{execution_id}") and include the "--name",
container_name pair in the returned docker command just like run_command() does,
and modify execute()'s timeout handler to invoke a docker kill for that
container_name (in addition to proc.kill()) so the container is forcibly removed
on timeout; reference the _build_docker_command() function to add the name and
the execute() method's timeout handler to add the docker kill call.
Greptile SummaryThis PR fixes three security issues in the gateway/sandbox/deploy layer: unifying the auth token source of truth, preventing resource leaks on sandbox timeout, and avoiding API key exposure in process arguments.
Confidence Score: 4/5Safe to merge; all three stated fixes are implemented correctly and the Docker/deploy changes are straightforward improvements with no regressions. The gateway token write-back changes process-global state in init, which works correctly for the intended single-instance deployment but could silently overwrite a concurrent instance's token in the same process. The SSH _run_command_with_limits now double-wraps the timeout for the execute_code path (harmless, but the redundancy suggests the _build_command timeout injection could be removed now). Neither issue affects correctness in normal deployment. ssh.py — the interaction between _build_command's timeout injection and the new _run_command_with_limits wrapper is worth a second look to decide whether one of the two wrappings should be removed. Important Files Changed
Reviews (1): Last reviewed commit: "fix: ensure gateway auth token env stays..." | Re-trigger Greptile |
| # Build command with remote-side timeout if needed | ||
| if timeout: | ||
| # Use timeout command to ensure remote process terminates | ||
| timeout_cmd = f"timeout {int(timeout)} sh -c {shlex.quote(command)}" | ||
| full_command = f"cd {shlex.quote(working_dir)} && {timeout_cmd}" | ||
| else: | ||
| full_command = f"cd {shlex.quote(working_dir)} && {command}" |
There was a problem hiding this comment.
Double remote-side timeout in
execute_code path
_run_command_with_limits now wraps the incoming command with timeout {N} sh -c '...', but for the execute_code caller the command string was already built by _build_command, which injects timeout {N} directly. The final remote command becomes cd /wd && timeout 30 sh -c 'ulimit -v … && timeout 30 python /file.py' — two nested timeout calls for the same duration. The fix is effective and needed for the execute_file path (which skips _build_command), but the inner timeout from _build_command is now redundant for execute_code.
| if self.config.auth_token: | ||
| os.environ["GATEWAY_AUTH_TOKEN"] = self.config.auth_token |
There was a problem hiding this comment.
Auth token written to process environment unconditionally
os.environ["GATEWAY_AUTH_TOKEN"] is now set during __init__, meaning every subprocess spawned from this point forward inherits the token. In a typical single-gateway deployment this is the intended single-source-of-truth effect, but in multi-tenant or test-runner scenarios where multiple WebSocketGateway instances are constructed in the same process, a later instance's token will silently overwrite an earlier one's in the global env, potentially causing auth mismatches for already-running request handlers that re-read the env var.
Fixes #1743
Summary
This PR addresses three critical security vulnerabilities identified in the PraisonAI wrapper layer:
Fix 1: Gateway Magic-Link Auth Secret Mismatch
Problem: HTTP/magic-link verification and WebSocket verification used different secret sources, breaking default login flow and creating auth inconsistencies.
Solution: Export resolved token to environment to ensure single source of truth across all auth paths.
Fix 2: Sandbox Timeout Resource Leaks
Problem:
Solution:
Fix 3: Cloud Deploy API Key Exposure
Problem: OPENAI_API_KEY exposed in gcloud argv (visible in process table/CI logs).
Solution: Replace inline argv secrets with secure temporary YAML file approach.
Security Impact
Files Changed
Generated with Claude Code
Summary by CodeRabbit