-
Notifications
You must be signed in to change notification settings - Fork 2.8k
fix: request failure during reload after any Eureka node fails #12906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: request failure during reload after any Eureka node fails #12906
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
apisix/discovery/eureka/init.lua
Outdated
| local fetch_interval = local_conf.discovery.eureka.fetch_interval or 30 | ||
| log.info("fetch_interval:", fetch_interval, ".") | ||
|
|
||
| endpoints = build_endpoints() |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If build_endpoints() returns nil due to missing configuration, the timers will still be started, causing them to fail silently on every execution. Consider adding error handling to prevent timer initialization when endpoints cannot be built, similar to how other discovery services handle init failures.
| endpoints = build_endpoints() | |
| endpoints = build_endpoints() | |
| if not endpoints or #endpoints == 0 then | |
| log.error("failed to init eureka discovery: no valid endpoints " .. | |
| "could be built, please check discovery.eureka configuration") | |
| return | |
| end |
apisix/discovery/eureka/init.lua
Outdated
| applications = up_apps | ||
| log.info("successfully updated service registry, services count=", | ||
| core.table.nkeys(up_apps), "; source=", | ||
| selected_endpoint and selected_endpoint.url or "unknown") |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition selected_endpoint and selected_endpoint.url or 'unknown' is redundant since this log line is only reached when selected_endpoint is not nil (line 170 returns early if nil). Simplify to just selected_endpoint.url.
| selected_endpoint and selected_endpoint.url or "unknown") | |
| selected_endpoint.url) |
| --- request | ||
| GET /eureka/apps/APISIX-EUREKA | ||
| --- response_body_like | ||
| .*<name>APISIX-EUREKA</name>.* |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test verifies the response body but doesn't verify that the request was actually routed through the working endpoint (127.0.0.1:8761) rather than the failing one (127.0.0.1:20997). Consider adding an assertion to check that the upstream actually received the request.
| if r and r.body and r.status == 200 then | ||
| selected_endpoint = endpoint | ||
| selected_body = r.body | ||
| break | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This modification essentially ensures that if a registry unexpectedly fails, it will retry until a good registry is used. While this may introduce some additional latency, is it necessary to document it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should add a similar description to all service discovery documentation.
…young/fix/eureka-requests
apisix/discovery/eureka/init.lua
Outdated
| if not res.body or res.status ~= 200 then | ||
| log.error("failed to fetch registry, status = ", res.status) | ||
| if not selected_endpoint then | ||
| log.error("failed to fetch registry from all eureka hosts") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need more information, eg: no healthy registry, the count of all hosts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
Description
Summary
Fix eureka service discovery to retry all configured endpoints when one fails, preventing 503 errors on APISIX restart when some eureka services are unhealthy.
Problem
When APISIX restarts with multiple eureka services configured (some healthy, some unhealthy), each worker randomly selects one eureka node. Workers that hit an unhealthy node fail silently, leaving their
applicationsvariable asnil, causing 503 errors for requests routed to those workers.Changes
Add
build_endpoints()function - Converts all eureka hosts from config to endpoint objects with URL and auth info, handling prefix and trailing slash normalization.Implement endpoint failover in
fetch_full_registry()- Start from a random endpoint position for load balancing, then try endpoints sequentially until one succeeds. If all fail, log error and return without clearing existing applications data.Add test case - Added TEST 4 in
t/discovery/eureka.tto verify failover behavior when first eureka host returns 502.refs
#12734
Which issue(s) this PR fixes:
Fixes #12610
Checklist