Skip to content

Add proactive port conflict detection before container start#62

Open
Joxx0r wants to merge 3 commits intomainfrom
feature/proactive-port-conflict-detection
Open

Add proactive port conflict detection before container start#62
Joxx0r wants to merge 3 commits intomainfrom
feature/proactive-port-conflict-detection

Conversation

@Joxx0r
Copy link
Collaborator

@Joxx0r Joxx0r commented Feb 24, 2026

Summary

  • Before running docker compose up, the backend now checks if workspace ports are in use by non-Docker processes
  • If a conflict is found, the frontend shows clear "Kill & Start" and "Start Anyway" options instead of a cryptic Docker error
  • Extracted shared checkPortInUse() and killPort() helpers, removing duplicate code from existing port-check/port-kill API handlers
  • Updated all three frontend call sites: workspace list Start button, wizard Start Container, and rebuild flow

Test plan

  • Start a process on a workspace port (e.g. wsl -- bash -c "node -e 'require(\"net\").createServer().listen(3847)'")
  • Try starting the workspace from the setup dashboard — should see port conflict message with process info
  • Click "Kill & Start" — process should be killed and container should start
  • Click "Start Anyway" — should attempt Docker start regardless (may fail if port still occupied)
  • With no conflicts, container should start normally with no extra delay
  • Test the same flow from the wizard (step 5) and from the rebuild button

🤖 Generated with Claude Code

Before running docker compose up, check if the workspace port is already
in use by a non-Docker process. If a conflict is found, return it to the
frontend which shows "Kill & Start" and "Start Anyway" buttons. This
avoids cryptic Docker errors and gives users a clear resolution path.

Backend:
- Add port pre-check to POST /api/docker/start with force/killConflicts opts
- Extract checkPortInUse() and killPort() shared helpers from existing handlers
- Skip docker-proxy/com.docker processes (container itself)

Frontend:
- Handle portConflicts response in startWorkspace(), startContainer(),
  and startAfterRebuild() with Kill & Start / Start Anyway buttons
- Remove dead killPortProcess() and killPortAndRetryWizard() functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #62 — Proactive port conflict detection

The overall approach is solid — moving conflict detection ahead of Docker start gives a much better UX, and extracting checkPortInUse()/killPort() is a good deduplication. A few issues need fixing before merge.


Bug: startContainer() doesn't reset button text on port conflict

setup.htmlstartContainer():

btn.disabled = true;
btn.textContent = 'Starting...';   // set here

// ...if port conflict:
btn.disabled = false;              // re-enabled, but text stays "Starting..."
return;

The button is re-enabled with the text still showing "Starting..." — users see a clickable button labelled "Starting...". startWorkspace() correctly resets btn.textContent = 'Start' in the same path; startContainer() must do the same. The button's original label is "Start Container".


XSS: c.port and c.pid interpolated into innerHTML without escaping

In three places (startWorkspace, startContainer, startAfterRebuild):

const procInfo = c.process ? `${esc(c.process)} (PID ${c.pid})` : 'unknown process';
statusEl.innerHTML = `Port ${c.port} is in use by ${procInfo}. ...`;

c.process is correctly escaped, but c.port and c.pid are not passed through esc(). In practice they are an integer and parsed-integer respectively, so there's no realistic attack vector here. But this breaks the consistent escaping discipline used throughout the file. Use esc(String(c.port)) / esc(String(c.pid)) to be safe and consistent.


Fragile DOM traversal in inline onclick attributes

onclick="startWorkspace('${escAttr(name)}', this.closest('.ws-actions').querySelector('.btn-success, .btn-primary'), {killConflicts:true})"

this inside an onclick attribute refers to the button the click fires on (the dynamically-added "Kill & Start" button), not the original Start button. this.closest('.ws-actions') traverses up to find the actions container, then queries for the original Start button. This is fragile:

  • If .ws-actions is not an ancestor (e.g. the button renders outside that container), closest() returns null, causing a crash.
  • The selector '.btn-success, .btn-primary' matches the first green/primary button — any layout change could grab the wrong element.

Consider giving the original Start button a stable id or data- attribute so the onclick can reference it directly, or pass the workspace name and look it up by a data-workspace selector.


500 ms kill wait may be insufficient

await new Promise(r => setTimeout(r, 500));
// then immediately runs docker compose up

The old retry flow waited 1500 ms. Some processes (especially under WSL) can take longer to fully release a port after SIGKILL. Consider polling checkPortInUse() in a short loop instead of a fixed sleep, or at minimum extend the wait to 1000–1500 ms.


No automated tests for backend logic

The new checkPortInUse() / killPort() helpers and the port-conflict pre-check in /api/docker/start have no unit tests. The PR test plan is entirely manual. Given this is process-killing code, a few tests mocking execSync to verify: conflict detection returns correct shape, docker-proxy is skipped, killConflicts triggers kill then proceeds, force skips the check entirely, would meaningfully reduce regression risk.


Minor: docker-compose.yml changes are unrelated to the PR title

The docker-compose rename (discovery/editormainplay) and volume naming changes are infrastructure refactoring unrelated to port conflict detection. Worth splitting into a separate PR so this one stays focused and reviewable.


Summary: The port conflict bug in startContainer() and the escaping inconsistency are the must-fix items. The DOM traversal fragility is worth addressing before this ships. The 500 ms wait and test coverage are lower priority but worth noting.

Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Add proactive port conflict detection before container start

Overall the approach is solid — moving detection before docker compose up gives much better UX than parsing Docker error strings after the fact. The helper extraction is a clean refactor. A few issues worth addressing before merging.


Security: Missing port validation in checkPortInUse (Medium)

// src/setup-gui.js
function checkPortInUse(port) {
  // ...
  output = execSync(`ss -tlnp 2>/dev/null | grep ':${port} ' || true`, ...)
}

The function uses port directly in a shell string without validating it's a safe integer. The existing /api/port-check and /api/port-kill handlers validate with parseInt before calling in, but the new internal call in /api/docker/start passes ws.port from config without validation:

const info = checkPortInUse(ws.port);  // ws.port comes straight from workspaces config

If a workspace config had a tampered port value (e.g. "port": "3847; malicious_command"), it would execute shell commands. The attack surface is limited to local config files, but the helper itself should be defensive. Fix: add port = parseInt(port, 10); if (!port || port < 1 || port > 65535) return { inUse: false }; at the top of checkPortInUse.


Bug: Null button dereference in inline onclick handlers (Medium)

The conflict UI injects onclick strings that try to find the original Start button via DOM traversal:

statusEl.innerHTML = `... onclick="startWorkspace('${escAttr(name)}', 
  this.closest('.ws-actions').querySelector('.btn-success, .btn-primary'), 
  {killConflicts:true})" ...`

If .ws-actions doesn't exist in the parent chain, or the button selector doesn't match (e.g. class changes), querySelector returns null. Then inside startWorkspace:

btn.disabled = true;  // TypeError: Cannot set properties of null

The function handles statusEl being null but assumes btn is always valid. This could silently break the Kill & Start flow. Consider passing the button reference differently, or adding a null guard at the top of startWorkspace.


Bug: startAfterRebuild has no button disable/re-enable (Low)

startWorkspace and startContainer both disable their button before the async fetch to prevent double-submission. startAfterRebuild does not disable anything — the Kill & Start / Start Anyway buttons remain clickable during the in-flight request, which can trigger duplicate concurrent starts.


Unrelated changes in docker-compose.yml (Note)

The diff renames workspaces (discovery/editormainplay), reduces memory limits, and adds named volumes. These aren't mentioned in the PR description and appear to be workspace-specific config changes bundled with the feature. If intentional, worth calling out explicitly; if not, should be a separate commit/PR.


Minor: Helper functions defined after their call sites

checkPortInUse and killPort are defined around line 1463 but called from the /api/docker/start handler at ~line 1244. JavaScript function hoisting makes this work, but it's easier to follow if helpers are defined before use.


What's working well

  • Proactive detection before Docker start is the right approach
  • Filtering out docker-proxy/com.docker processes is correct
  • Helper extraction removes real duplication from the port-check/port-kill handlers
  • Frontend fallback path (parsing Docker error strings) is preserved as a safety net
  • escAttr usage in onclick attributes and esc in displayed text looks correct

Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Add proactive port conflict detection before container start

Overall the approach is solid — moving detection before docker compose up gives much better UX than parsing Docker error strings after the fact. The helper extraction is a clean refactor. A few issues worth addressing before merging.


Security: Missing port validation in checkPortInUse (Medium)

The function uses port directly in a shell string without validating it is a safe integer. The existing /api/port-check and /api/port-kill handlers validate with parseInt before calling in, but the new internal call in /api/docker/start passes ws.port from config without validation:

const info = checkPortInUse(ws.port);  // ws.port comes straight from workspaces config

If a workspace config had a tampered port value (e.g. port: "3847; malicious_command"), it would execute shell commands. The attack surface is limited to local config files, but the helper itself should be defensive.

Fix: add port validation at the top of checkPortInUse:

function checkPortInUse(port) {
  port = parseInt(port, 10);
  if (!port || port < 1 || port > 65535) return { inUse: false };
  // ...
}

Bug: Null button dereference in inline onclick handlers (Medium)

The conflict UI injects onclick strings that find the original Start button via DOM traversal:

statusEl.innerHTML = `... onclick="startWorkspace('...', 
  this.closest('.ws-actions').querySelector('.btn-success, .btn-primary'), 
  {killConflicts:true})" ...`

If .ws-actions does not exist in the parent chain, querySelector returns null. Inside startWorkspace:

btn.disabled = true;  // TypeError: Cannot set properties of null

The function handles statusEl being null but assumes btn is always valid. Add a null guard at the top of startWorkspace, or pass the btn reference more robustly.


Bug: startAfterRebuild has no button disable/re-enable (Low)

startWorkspace and startContainer both disable their button before the async fetch to prevent double-submission. startAfterRebuild does not — the Kill & Start / Start Anyway buttons remain clickable during the in-flight request, risking duplicate concurrent starts.


Unrelated changes in docker-compose.yml (Note)

The diff renames workspaces (discovery/editormain to play), reduces memory limits, and adds named volumes. These are not mentioned in the PR description and appear to be workspace-specific config changes bundled with the feature. If intentional, worth calling out explicitly; if not, should be a separate commit/PR.


Minor: Helpers defined after their call sites

checkPortInUse and killPort are defined around line 1463 but called from the /api/docker/start handler at ~line 1244. JavaScript hoisting makes this work but harder to follow. Consider moving helpers above their first call site.


What is working well

  • Proactive detection before Docker start is the right approach
  • Filtering out docker-proxy/com.docker processes is correct
  • Helper extraction removes real duplication from the port-check/port-kill handlers
  • Frontend fallback path (parsing Docker error strings) is preserved as a safety net
  • escAttr usage in onclick attributes and esc in displayed text looks correct

- Fix startContainer() not resetting button text on port conflict
- Escape c.port and c.pid with esc() in all innerHTML interpolations
- Use data-workspace attribute for stable Start button lookup instead
  of fragile this.closest('.ws-actions') DOM traversal
- Increase kill wait from 500ms to 1000ms for WSL process release time

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Joxx0r
Copy link
Collaborator Author

Joxx0r commented Feb 24, 2026

Addressed review feedback in 342351d:

  1. Bug fix: startContainer() now resets button text to "Start Container" on port conflict (was stuck on "Starting...")
  2. XSS consistency: All c.port and c.pid values now passed through esc() before innerHTML interpolation
  3. DOM stability: Start button now has data-workspace attribute; dynamic onclick handlers use document.querySelector('[data-workspace="..."]') instead of fragile this.closest('.ws-actions') traversal
  4. Kill wait: Increased from 500ms to 1000ms to give WSL processes more time to release ports

Not addressed (lower priority):

  • Automated tests: Would be valuable but out of scope for this PR
  • docker-compose.yml split: Pre-existing change, can be separated in a follow-up

- Validate port as integer in checkPortInUse() to prevent shell injection
  from tampered workspace config values
- Add null guard at top of startWorkspace() for missing btn reference
- Disable buttons in startAfterRebuild() during in-flight request to
  prevent duplicate concurrent starts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Joxx0r
Copy link
Collaborator Author

Joxx0r commented Feb 24, 2026

Addressed second review round in e1ebd05:

  1. Security: Added port validation (parseInt + range check) at the top of checkPortInUse() to prevent shell injection from tampered config values
  2. Null guard: Added if (!btn) return; at the top of startWorkspace() to prevent TypeError if the button reference is null
  3. Double-submit prevention: startAfterRebuild() now disables any action buttons in the status area before making the fetch request

Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Add proactive port conflict detection before container start

This PR has gone through two prior review rounds and most of the previously flagged issues have been addressed well. The core approach is correct and the escaping discipline in the new proactive path is solid. However, there are a few remaining issues that need fixes before merging.


Must-fix issues

1. startContainer() does not re-enable the button in the fallback Docker error path

File: public/setup.htmlstartContainer(), the else block (fallback Docker error string parsing)

The proactive conflict path correctly resets btn.textContent and btn.disabled, but the older fallback path (which parses Docker error output for address already in use) was missed. If the fallback fires and the port regex fails to match, the user is left with a permanently disabled "Starting..." button and must reload the page.

Fix: add btn.textContent = 'Start Container'; btn.disabled = false; before each status.innerHTML/status.textContent assignment in the fallback else block — consistent with how startWorkspace handles errors.

2. startAfterRebuild missing null guard for status element

File: public/setup.htmlstartAfterRebuild()

function startAfterRebuild(name, opts) {
  const status = document.getElementById('ws-rebuild-status-' + name);
  status.querySelectorAll('button')...  // TypeError if status is null

If the element doesn't exist (UI re-rendered between click and function invocation), this crashes with a TypeError. startWorkspace and startContainer both guard with if (!statusEl) return;. One-line fix: add if (!status) return; at the top of startAfterRebuild.

3. killPort() — no integer validation on pid before shell interpolation

File: src/setup-gui.jskillPort() function

pid comes from checkPortInUse() which stores it as parseInt(pidMatch[1], 10) — so it's an integer in practice. However, killPort itself doesn't assert this before interpolating into the shell string. If checkPortInUse is ever changed, this becomes a shell injection vector. Suggested guard:

if (typeof pid !== 'number' || !Number.isFinite(pid) || pid <= 0) return { killed: false };

Low-priority notes (no fix required)

startAfterRebuild button disable is moot: The querySelectorAll('button').forEach(b => b.disabled = true) runs right before status.textContent = '...' which removes all child nodes, discarding the buttons. The double-submission window is effectively closed (buttons are gone), but the code is misleading. Not a bug but worth a cleanup comment.

Only the first port conflict is shown: The backend returns portConflicts as an array but the frontend always uses portConflicts[0]. Multi-workspace starts could have multiple conflicts; the user resolves them one at a time. Functional but worth a follow-up.

No unit tests for checkPortInUse/killPort: Acknowledged by the author as out of scope. These functions kill OS processes — please file a follow-up issue for regression coverage.

docker-compose.yml changes are unrelated to the feature (workspace rename, memory limit tweaks, named volumes). Fine to include if intentional.


What's working well

  • The pre-check-then-act pattern is the right approach; parsing Docker errors after the fact was brittle.
  • The docker-proxy/com.docker filter correctly avoids killing Docker's own port bindings.
  • escAttr() used in onclick attributes, esc() used in displayed text — escaping is consistent in the new path.
  • data-workspace attribute for button lookup is stable; if (!btn) return; guards are present in startWorkspace.
  • Port integer validation in checkPortInUse defends against shell injection from malformed config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant