feat: replace events library with shdict #12353
Conversation
```lua
function _M.dump_data()
    return {config = local_conf.discovery.nacos, services = applications or {}}
```
config is removed from here as it's a security risk.
```lua
local http = require("resty.http")
local uri = "http://127.0.0.1:" .. ngx.var.server_port

-- Wait for 2 seconds for APISIX initialization
```
The test is modified to add this sleep because the retry mechanism is removed: it may take some time for the privileged agent to fill data into the shdict, so we want to send the request after that.
apisix/discovery/nacos/init.lua
```lua
local value = nacos_dict:get(key)
if value then
    local nodes = core.json.decode(value)
    if nodes and #nodes > 0 then
```
dump_data is for debugging purposes. If the value in the shared dict for this key is empty, we should return the real empty value for debugging instead of hiding it.
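To illustrate the point above, here is a minimal sketch of a `dump_data` that returns whatever the shared dict actually holds, empty values included. This is not the PR's exact code: the dict name, key layout, and the `core.json` helper usage are assumptions based on the surrounding snippets.

```lua
local core = require("apisix.core")
local nacos_dict = ngx.shared["nacos"]  -- assumed shdict name

local function dump_data()
    local services = {}
    -- get_keys(0) returns every key currently stored in the shdict
    for _, key in ipairs(nacos_dict:get_keys(0)) do
        local value = nacos_dict:get(key)
        -- decode and return the value as-is, even if it is an empty
        -- list, so the debug endpoint reflects the real stored state
        services[key] = value and core.json.decode(value) or nil
    end
    return {services = services}
end
```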
nic-6443 left a comment:
One thing you missed: remove unused keys from the shared dict in every fetch_full_registry run.
apisix/discovery/nacos/init.lua
```lua
local old_curr_service_in_use = curr_service_in_use
curr_service_in_use = service_names
-- remove services that are not in use anymore
for key, _ in pairs(old_curr_service_in_use) do
    if not service_names[key] then
        nacos_dict:delete(key)
    end
end
```
Suggested change:

```diff
-local old_curr_service_in_use = curr_service_in_use
-curr_service_in_use = service_names
--- remove services that are not in use anymore
-for key, _ in pairs(old_curr_service_in_use) do
-    if not service_names[key] then
-        nacos_dict:delete(key)
-    end
-end
+-- remove services that are not in use anymore
+for key, _ in pairs(curr_service_in_use) do
+    if not service_names[key] then
+        nacos_dict:delete(key)
+    end
+end
+curr_service_in_use = service_names
```
@bzp2010 I ran a benchmark to compare performance before and after this change.

Configuration used:

`config.yaml`:

```yaml
apisix:
  node_listen: 1984
deployment:
  role: data_plane
  role_data_plane:
    config_provider: yaml
nginx_config:
  worker_processes: 8
discovery:
  nacos:
    host:
      - "http://nacos:nacos@127.0.0.1:8848"
    prefix: "/nacos/v1/"
    fetch_interval: 1
    weight: 1
    timeout:
      connect: 2000
      send: 2000
      read: 5000
```

`apisix.yaml`:

```yaml
routes:
  -
    uri: /hello
    upstream:
      service_name: APISIX-NACOS
      discovery_type: nacos
      type: roundrobin
#END
```

Here are the results.

On master:

```
❯ wrk -c 10 -t 5 -d 30s -R 50000 http://localhost:1984/hello
Running 30s test @ http://localhost:1984/hello
  5 threads and 10 connections
  Thread calibration: mean lat.: 2999.418ms, rate sampling interval: 10395ms
  Thread calibration: mean lat.: 2885.546ms, rate sampling interval: 10125ms
  Thread calibration: mean lat.: 2857.330ms, rate sampling interval: 10059ms
  Thread calibration: mean lat.: 2967.000ms, rate sampling interval: 10231ms
  Thread calibration: mean lat.: 2953.790ms, rate sampling interval: 10256ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.12s     3.06s   16.47s    57.49%
    Req/Sec     4.52k    82.05     4.63k    60.00%
  689405 requests in 30.00s, 110.45MB read
Requests/sec:  22980.47
Transfer/sec:      3.68MB
```
After changes:

```
❯ wrk -c 10 -t 5 -d 30s -R 50000 http://localhost:1984/hello
Running 30s test @ http://localhost:1984/hello
  5 threads and 10 connections
  Thread calibration: mean lat.: 3060.053ms, rate sampling interval: 10584ms
  Thread calibration: mean lat.: 3084.401ms, rate sampling interval: 10772ms
  Thread calibration: mean lat.: 3254.041ms, rate sampling interval: 11198ms
  Thread calibration: mean lat.: 3093.539ms, rate sampling interval: 10985ms
  Thread calibration: mean lat.: 3062.424ms, rate sampling interval: 10772ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.89s     3.36s   17.74s    56.68%
    Req/Sec     4.13k    39.11     4.18k    60.00%
  623009 requests in 30.00s, 99.81MB read
Requests/sec:  20767.18
Transfer/sec:      3.33MB
```

Result

The QPS dropped from 22980.47 to 20767.18, a difference of ~2k. I think this is an acceptable tradeoff given the issue with the events library that this PR solves.
Just to confirm my understanding: now only the privileged agent will fetch data from Nacos and write it to the shdict, while all workers will read from it? In the current code, does each worker periodically fetch registration information from Nacos?
@Revolyssup I upgraded APISIX from 3.7 to 3.14.1, and my Nacos server receives 40x more requests than with the original APISIX (worker_processes = 40). It seems every worker is fetching data from Nacos; I checked.
Hi @9268, thank you for your report. After reviewing the code, I found that Nacos service discovery currently issues data requests from every worker; this needs to be changed so that only the privileged agent retrieves the data. Please feel free to submit an issue to document this problem or a PR with a fix.
@Baoyuantop Hi, I have submitted a PR to fix this issue; please review #12867.
Breakout from: #12263
Fixes #
Currently nacos discovery uses the events library, and during testing it was found that with lua-resty-events, sometimes not all workers receive the events, and inconsistencies emerge. Moreover, the idiomatic way to share data between workers is through a shared dict. Therefore this PR migrates nacos discovery from the older events mechanism to the newer shared-dict approach. Now only the privileged agent fetches data from nacos and writes it to the shdict, while all workers read from it.
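The data flow described above can be sketched as follows. This is an illustrative outline, not the PR's actual code: the shdict name, `fetch_full_registry` helper, and timer wiring are assumptions; it also assumes the privileged agent has been enabled via `process.enable_privileged_agent()` during master init.

```lua
local core = require("apisix.core")
local process = require("ngx.process")

local nacos_dict = ngx.shared["nacos"]  -- assumed shdict name
local fetch_interval = 1                -- seconds, from discovery config

local function init_worker()
    -- only the privileged agent runs the fetch loop and writes
    if process.type() == "privileged agent" then
        ngx.timer.every(fetch_interval, function()
            local services = fetch_full_registry()  -- assumed helper
            for name, nodes in pairs(services) do
                nacos_dict:set(name, core.json.encode(nodes))
            end
        end)
    end
end

local function nodes(service_name)
    -- every worker reads the latest data published by the privileged agent
    local value = nacos_dict:get(service_name)
    return value and core.json.decode(value) or nil
end
```

Because all workers read from the same shared dict, there is no per-worker event delivery to go missing, which is exactly the inconsistency this PR removes.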
Checklist