Production httpbin site randomly timing out #12

A few days ago I noticed that https://httpbin.dmuth.org/ had started hanging intermittently, with no obvious cause. My dashboards looked like this:

![Screenshot by Dropbox Capture](https://github.com/dmuth/fastapi-httpbin/assets/374060/c9f6aacc-5df0-4b08-b09a-f3a253732f7a)

![Screenshot by Dropbox Capture](https://github.com/dmuth/fastapi-httpbin/assets/374060/5a5e152d-7256-4447-a63d-465999da6ecf)

...and I started seeing errors like this one in the Fly.io logs:

```
could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)
```

I then SSHed into the instance and saw that `uvicorn` was using 100% of the CPU.

I also poked around in `/proc/` and saw that there were only about a dozen file descriptors open, so this doesn't appear to be a resource exhaustion issue.
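
The file descriptor check can be scripted the same way: each entry in `/proc/<pid>/fd` is one open descriptor, and the per-process soft limit is listed in `/proc/<pid>/limits`. A minimal sketch, assuming Linux procfs and sufficient permissions (reading another user's process usually needs root); the helper names are hypothetical.

```python
import os
from typing import Optional


def count_open_fds(pid: int) -> int:
    """Count open file descriptors for `pid` by listing /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid: int) -> Optional[int]:
    """Return the soft 'Max open files' limit for `pid`, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    pid = os.getpid()
    print(f"PID {pid}: {count_open_fds(pid)} open fds, soft limit {soft_fd_limit(pid)}")
```

A count of about a dozen against a soft limit in the thousands supports ruling out descriptor exhaustion.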

I have tried the following so far, but none of it has resolved the issue:

- ✅ Restarting the VM
- ✅ Changing the count of machines with the `fly scale` command to 0 and then 1 to spin up a new machine
- ✅ Running `fly deploy` again
- ✅ Turning off Fly's raw TCP check, thinking it was tripping up Uvicorn somehow.

I am continuing to investigate and have a few other things to try:

- ✅ Turning off the HTTP check from Fly.io
- ✅ Adjusting the URLs that NodePing is hitting
- ✅ Upgrading FastAPI to the latest version and redeploying (this is in progress)
- ✅ Increasing the number of workers to 3
- Seeing if I can capture log output from Uvicorn by setting an environment variable.
- Changing the server to Hypercorn
