
fix(kiloclaw): Handle Fly 400 "no capacity" as a capacity error#787

Merged
pandemicsyn merged 1 commit into main from florian/fix/new-cap-error
Mar 3, 2026

pandemicsyn commented Mar 3, 2026

Summary

  • Fly returns 400 with {"error":"no capacity"} on createVolume when a region lacks capacity. isFlyInsufficientResources did not recognize this response, so provisioning failed with an unhandled error instead of falling back to the next region.
  • Adds 400 to CAPACITY_STATUS_CODES and "no capacity" to CAPACITY_MARKERS
  • Includes the region in createVolume error context for easier debugging (e.g. Fly API createVolume [iad] failed (400))
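To illustrate the third bullet, here is a minimal sketch of how a call-site label such as createVolume [iad] could end up in an error message like the example above. The ResponseLike interface, the FlyApiError class, and the exact message format are assumptions modeled on that example message; the real assertOk in kiloclaw is not shown in this PR and may differ.

```typescript
// Minimal subset of the fetch Response interface used by this sketch.
interface ResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

// Hypothetical error type: keeps the raw status and body so a capacity
// check can inspect them later, and puts the call-site label in the message.
class FlyApiError extends Error {
  readonly status: number;
  readonly body: string;

  constructor(status: number, body: string, context: string) {
    super(`Fly API ${context} failed (${status})`);
    this.name = "FlyApiError";
    this.status = status;
    this.body = body;
  }
}

// Hypothetical assertOk: resolves for 2xx responses, otherwise rejects with
// a FlyApiError that carries the status, body, and context label.
async function assertOk(resp: ResponseLike, context: string): Promise<void> {
  if (resp.ok) return;
  const body = await resp.text();
  throw new FlyApiError(resp.status, body, context);
}

// At the call site, the region is folded into the label:
//   await assertOk(resp, `createVolume [${request.region}]`);
```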

Capacity error codes handled (after this PR)

| Status | Body marker | During / because |
|---|---|---|
| 400 | "no capacity" | createVolume; observed in production, root cause unclear |
| 412 | "insufficient resources" | Machine creation on volume-pinned host |
| 409 | "insufficient memory" | Host memory exhausted on start/update |
| 403 | "over the allowed quota" | Org regional memory quota exceeded |
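Read as data, the table above maps onto the two lookups named in the summary. The sketch below assumes that CAPACITY_STATUS_CODES is a set of the four status codes, that CAPACITY_MARKERS is a list of lowercase body substrings, and that an error counts as capacity-related only when both the status and a marker match. isCapacityError and FlyErrorLike are hypothetical names; this is not the kiloclaw implementation of isFlyInsufficientResources.

```typescript
// Hypothetical error shape: HTTP status plus raw response body.
interface FlyErrorLike {
  status: number;
  body: string;
}

// Status codes listed in the table (assumed to be stored as a Set).
const CAPACITY_STATUS_CODES: ReadonlySet<number> = new Set([400, 403, 409, 412]);

// Body substrings listed in the table, lowercased for case-insensitive matching.
const CAPACITY_MARKERS: readonly string[] = [
  "no capacity",
  "insufficient resources",
  "insufficient memory",
  "over the allowed quota",
];

// Assumption: both the status code and a body marker must match.
function isCapacityError(err: FlyErrorLike): boolean {
  if (!CAPACITY_STATUS_CODES.has(err.status)) return false;
  const body = err.body.toLowerCase();
  return CAPACITY_MARKERS.some((marker) => body.includes(marker));
}

isCapacityError({ status: 400, body: '{"error":"no capacity"}' });          // true
isCapacityError({ status: 400, body: '{"error":"invalid volume size"}' }); // false
```

Requiring a marker as well as the status is what keeps an ordinary 400 (bad input, for instance) from being mistaken for a capacity problem and retried in another region.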

Testing

pnpm typecheck && pnpm test && pnpm lint: all pass (491 tests).

Fly returns 400 with {"error":"no capacity"} on createVolume when a region
lacks capacity. This was not recognized by isFlyInsufficientResources, causing
provisioning to fail instead of falling back to the next region.

- Add 400 to CAPACITY_STATUS_CODES and "no capacity" to CAPACITY_MARKERS
- Include region in createVolume error context for easier debugging
- Add tests for the new 400 capacity case and non-capacity 400 rejection
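The fallback behavior the commit message relies on can be sketched as a loop over candidate regions that only moves on when the failure is classified as a capacity error. provisionWithFallback and its parameters are hypothetical; the real createVolumeWithFallback in kiloclaw/src/fly/client.ts is not shown in this PR and may be structured differently.

```typescript
// Hypothetical helper: tries each region in order and only moves on when the
// failure is classified as a capacity problem.
async function provisionWithFallback<T>(
  regions: readonly string[],
  createInRegion: (region: string) => Promise<T>,
  isCapacityError: (err: unknown) => boolean,
): Promise<{ region: string; volume: T }> {
  let lastCapacityError: unknown = undefined;
  for (const region of regions) {
    try {
      const volume = await createInRegion(region);
      return { region, volume };
    } catch (err) {
      // Anything that is not a capacity error (bad request, auth, ...) aborts
      // immediately instead of being retried in another region.
      if (!isCapacityError(err)) throw err;
      lastCapacityError = err;
    }
  }
  // Every region reported a capacity problem, or the list was empty.
  throw lastCapacityError ?? new Error("no regions left to try");
}
```

With a predicate that did not recognize the 400 "no capacity" response, a loop like this rethrows on the first region, which matches the failure described above.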
pandemicsyn changed the title (Mar 3, 2026)
  from: Handle Fly 400 "no capacity" as a capacity error with region fallback
  to:   fix(kiloclaw): Handle Fly 400 "no capacity" as a capacity error
pandemicsyn marked this pull request as ready for review March 3, 2026 21:56

kilo-code-bot (bot) commented Mar 3, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

| File | Line | Issue |
|---|---|---|
| kiloclaw/src/fly/client.ts | 204 | SUGGESTION: The JSDoc on createVolumeWithFallback says "On capacity-related 412 errors, the next region is tried" but now 400 errors are also capacity-related. Consider updating to e.g. "On capacity-related errors (400/412), the next region is tried" for consistency with the rest of this PR. |
Files Reviewed (2 files)
  • kiloclaw/src/fly/client.ts - 0 issues in diff
  • kiloclaw/src/fly/client.test.ts - 0 issues

   body: JSON.stringify(request),
 });
-await assertOk(resp, 'createVolume');
+await assertOk(resp, `createVolume [${request.region}]`);

pandemicsyn (author) commented Mar 3, 2026

mostly out of morbid curiosity, but having the region handy here might be helpful later

pandemicsyn merged commit 3276266 into main Mar 3, 2026
13 checks passed
pandemicsyn deleted the florian/fix/new-cap-error branch March 3, 2026 22:10