fix: request failure during reload after any Eureka node fails #12906

SkyeYoung · 2026-01-14T06:17:28Z

Description

Summary

Fix Eureka service discovery to retry all configured endpoints when one fails, preventing 503 errors after an APISIX restart when some Eureka servers are unhealthy.

Problem

When APISIX restarts with multiple Eureka servers configured (some healthy, some unhealthy), each worker randomly selects one Eureka node. A worker that picks an unhealthy node fails silently and its applications variable stays nil, so requests routed to that worker get 503 errors.

Changes

Add build_endpoints() function - Converts all Eureka hosts from the config into endpoint objects with URL and auth info, handling prefix and trailing slash normalization.
Implement endpoint failover in fetch_full_registry() - Start from a random endpoint position for load balancing, then try endpoints sequentially until one succeeds. If all fail, log an error and return without clearing the existing applications data.
Add test case - TEST 4 in t/discovery/eureka.t verifies failover behavior when the first Eureka host returns 502.
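The failover described above can be sketched in plain Lua. This is a minimal illustration under stated assumptions, not the code merged in this PR: the argument lists, the return values, the injected http_get function, and the fake_get stub are all hypothetical, and auth handling is left out.

```lua
-- Hypothetical sketch; function names, arguments, and return values are
-- assumptions for illustration, not the code merged in this PR.

-- Build one endpoint object per configured host, normalizing slashes so
-- every URL ends with exactly one "/". Auth handling is omitted here.
local function build_endpoints(hosts, prefix)
    local endpoints = {}
    local path = (prefix or ""):gsub("^/+", ""):gsub("/+$", "")
    for _, host in ipairs(hosts or {}) do
        local url = host:gsub("/+$", "")
        if path ~= "" then
            url = url .. "/" .. path
        end
        endpoints[#endpoints + 1] = { url = url .. "/" }
    end
    return endpoints
end

-- Try every endpoint once, starting at `start` (random by default) and
-- wrapping around. `http_get` is an injected function returning a table
-- with `status` and `body`, or nil on a transport error.
local function fetch_with_failover(endpoints, http_get, start)
    local n = #endpoints
    if n == 0 then
        return nil, nil, "no eureka endpoints configured"
    end
    start = start or math.random(n)
    for i = 0, n - 1 do
        local endpoint = endpoints[(start - 1 + i) % n + 1]
        local res = http_get(endpoint.url .. "apps")
        if res and res.body and res.status == 200 then
            return res.body, endpoint
        end
    end
    -- Callers should keep their existing registry data in this case.
    return nil, nil, "all " .. n .. " eureka endpoints failed"
end

local endpoints = build_endpoints(
    { "http://127.0.0.1:20997", "http://127.0.0.1:8761/" }, "/eureka/")

-- Stub transport: the first host answers 502, the second answers 200.
local function fake_get(url)
    if url:find(":20997", 1, true) then
        return { status = 502, body = "" }
    end
    return { status = 200, body = "<applications/>" }
end

local body, endpoint = fetch_with_failover(endpoints, fake_get, 1)
print(endpoint.url, body)
```

Injecting the transport keeps the sketch runnable outside OpenResty. Forcing the start position to 1 makes the run deterministic, so the failing host is always tried first.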

refs

#12734

Which issue(s) this PR fixes:

Fixes #12610

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change
I have updated the documentation to reflect this change
I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Copilot · 2026-01-15T02:32:51Z

apisix/discovery/eureka/init.lua

    local fetch_interval = local_conf.discovery.eureka.fetch_interval or 30
    log.info("fetch_interval:", fetch_interval, ".")
+
+    endpoints = build_endpoints()


If build_endpoints() returns nil due to missing configuration, the timers will still be started, causing them to fail silently on every execution. Consider adding error handling to prevent timer initialization when endpoints cannot be built, similar to how other discovery services handle init failures.

Suggested change

-    endpoints = build_endpoints()
+    endpoints = build_endpoints()
+    if not endpoints or #endpoints == 0 then
+        log.error("failed to init eureka discovery: no valid endpoints " ..
+                  "could be built, please check discovery.eureka configuration")
+        return
+    end

Copilot · 2026-01-15T02:32:51Z

apisix/discovery/eureka/init.lua

    applications = up_apps
+    log.info("successfully updated service registry, services count=",
+             core.table.nkeys(up_apps), "; source=",
+             selected_endpoint and selected_endpoint.url or "unknown")


The condition selected_endpoint and selected_endpoint.url or 'unknown' is redundant since this log line is only reached when selected_endpoint is not nil (line 170 returns early if nil). Simplify to just selected_endpoint.url.

Suggested change

-             selected_endpoint and selected_endpoint.url or "unknown")
+             selected_endpoint.url)

Copilot · 2026-01-15T02:32:52Z

t/discovery/eureka.t

+--- request
+GET /eureka/apps/APISIX-EUREKA
+--- response_body_like
+.*<name>APISIX-EUREKA</name>.*


The test verifies the response body but doesn't verify that the request was actually routed through the working endpoint (127.0.0.1:8761) rather than the failing one (127.0.0.1:20997). Consider adding an assertion to check that the upstream actually received the request.

bzp2010 · 2026-01-15T07:13:14Z

apisix/discovery/eureka/init.lua

+        if r and r.body and r.status == 200 then
+            selected_endpoint = endpoint
+            selected_body = r.body
+            break
+        end


This modification essentially ensures that if a registry unexpectedly fails, the fetch is retried against the other registries until a healthy one responds. Since this may introduce some additional latency, should we document it?

Perhaps we should add a similar description to all service discovery documentation.

membphis · 2026-01-15T09:58:56Z

apisix/discovery/eureka/init.lua

-    if not res.body or res.status ~= 200 then
-        log.error("failed to fetch registry, status = ", res.status)
+    if not selected_endpoint then
+        log.error("failed to fetch registry from all eureka hosts")


This log needs more information, e.g. that no healthy registry was found and the count of all hosts.
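One way to read this request is sketched below. The helper name, its parameters, and the message wording are assumptions for illustration. The message that was finally merged is not shown in this thread.

```lua
-- Hypothetical helper; the name and message wording are assumptions.
-- It states that no healthy host was found and how many hosts were tried.
local function describe_fetch_failure(total_hosts, last_err)
    return string.format(
        "failed to fetch registry: no healthy eureka host, " ..
        "tried %d host(s), last error: %s",
        total_hosts, last_err or "unknown")
end

print(describe_fetch_failure(2, "status 502"))
```

Including the host count lets an operator tell a fully unreachable cluster from a single misconfigured host.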

SkyeYoung added 7 commits January 14, 2026 06:08

fix: request failure during reload after any Eureka node fails

650d436

chore: add more comments

d26e37d

fix

8074ef1

chore: simplify

99306c7

test: add one node failed case

f62113b

simplify

0bf938d

test: add more assert

4f1eb86

SkyeYoung marked this pull request as ready for review January 14, 2026 11:21

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) labels Jan 14, 2026

SkyeYoung requested review from Baoyuantop, bzp2010, membphis, moonming and nic-6443 January 14, 2026 11:24

moonming requested a review from Copilot January 14, 2026 11:36

Copilot AI reviewed Jan 14, 2026

moonming requested a review from Copilot January 15, 2026 02:30

Copilot started reviewing on behalf of moonming January 15, 2026 02:30

Copilot AI reviewed Jan 15, 2026

SkyeYoung added 3 commits January 15, 2026 03:12

rollback: use ramdom order

57c8aa3

chore: mv build_endpoints to fetch_full_registry

a88620b

fix: solve comments by @copilot

32a5add

SkyeYoung commented Jan 15, 2026

Apply suggestion from @SkyeYoung

30390d1

bzp2010 reviewed Jan 15, 2026

Refactor build_endpoints function for clarity

e7e457a

moonming reviewed Jan 15, 2026

membphis reviewed Jan 15, 2026

fix: solve comments by @membphis

8da8daa

Merge remote-tracking branch 'origin/young/fix/eureka-requests' into …

ab79551

…young/fix/eureka-requests

membphis previously approved these changes Jan 15, 2026

moonming previously approved these changes Jan 15, 2026

fix: solve comments by @membphis

b70606c

SkyeYoung dismissed stale reviews from moonming and membphis via b70606c January 15, 2026 10:10

membphis approved these changes Jan 15, 2026

SkyeYoung requested review from bzp2010 and moonming January 15, 2026 10:30

moonming approved these changes Jan 15, 2026

bzp2010 approved these changes Jan 16, 2026

SkyeYoung merged commit 9718c69 into apache:master Jan 16, 2026
23 checks passed

SkyeYoung deleted the young/fix/eureka-requests branch January 16, 2026 02:57

SkyeYoung mentioned this pull request Jan 16, 2026

fix: reloading after any node failure in eureka will lose half of the requests #12695

Closed

5 tasks
