fix(limit-conn): implement configurable redis key expiry #12872
Conversation
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Signed-off-by: Shreemaan Abhishek <shreemaan.abhishek@gmail.com>
…-abhishek/apisix into fea/redis-ttl-limiting-plugins
docs/en/latest/plugins/limit-conn.md
Outdated
| only_use_default_delay | boolean | False | false | | If false, delay requests proportionally based on how much they exceed the `conn` limit. The delay grows larger as congestion increases. For instance, with `conn` being `5`, `burst` being `3`, and `default_conn_delay` being `1`, 6 concurrent requests would result in a 1-second delay, 7 requests a 2-second delay, 8 requests a 3-second delay, and so on, until the total limit of `conn + burst` is reached, beyond which requests are rejected. If true, use `default_conn_delay` to delay all excessive requests within the `burst` range. Requests beyond `conn + burst` are rejected immediately. For instance, with `conn` being `5`, `burst` being `3`, and `default_conn_delay` being `1`, 6, 7, or 8 concurrent requests are all delayed by exactly 1 second each. |
| key_type | string | False | var | ["var","var_combination"] | The type of key. If the `key_type` is `var`, the `key` is interpreted as a variable. If the `key_type` is `var_combination`, the `key` is interpreted as a combination of variables. |
| key | string | False | remote_addr | | The key to count requests by. If the `key_type` is `var`, the `key` is interpreted as a variable. The variable does not need to be prefixed by a dollar sign (`$`). If the `key_type` is `var_combination`, the `key` is interpreted as a combination of variables. All variables should be prefixed by dollar signs (`$`). For example, to configure the `key` to use a combination of two request headers `custom-a` and `custom-b`, the `key` should be configured as `$http_custom_a $http_custom_b`. |
| key_ttl | integer | False | 60 | | The TTL of the Redis key in seconds. Used when `policy` is `redis` or `redis-cluster`. |
- update the `limit-req` doc for the parameter
- add the corresponding zh docs
done
apisix/plugins/limit-req/util.lua
Outdated
```lua
local err
ok, err = red:set(excess_key, excess)
-- limit-req does rate limiting per second, so we set the ttl to 2 seconds
local ttl = 2
```
The TTL for limit-req is hardcoded to 2 seconds. This feels arbitrary and lacks flexibility.
apisix/utils/redis-schema.lua
Outdated
```lua
function _M.ttl_policy_schema(kind, ttl)
    local schema = policy_to_additional_properties[kind]
```
The ttl_policy_schema function modifies a shared table, is this safe?
No it is not. Good catch. Let me fix this.
Need to add a scenario where a key's TTL expires. The test should verify that the counter is effectively gone and that a new request can proceed without being blocked by a stale counter.
- Example: set a short TTL (e.g., 2s), make a request, wait for 3s, and then confirm that a subsequent request does not inherit the old counter value (see the sketch below).
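A minimal sketch of such a check using lua-resty-redis directly; the key name, TTL, and connection details are illustrative, not the PR's actual test code:

```lua
local redis = require "resty.redis"

local red = redis:new()
red:set_timeout(1000)
assert(red:connect("127.0.0.1", 6379))

-- simulate the counter the plugin would create, with a short TTL
local key = "limit_conn:test_key" -- illustrative key name
red:set(key, 1)
red:expire(key, 2)

ngx.sleep(3) -- wait past the TTL

-- the key should be gone, so a new request starts from a clean counter
local val = red:get(key)
assert(val == ngx.null, "stale counter should have expired")
```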
done
Pull request overview
This PR adds TTL (Time To Live) configuration to Redis keys used by the limit-conn and limit-req plugins to prevent keys from persisting indefinitely when APISIX nodes crash before decrementing counters. This addresses a critical issue where user quotas could be permanently occupied or users could be permanently banned due to orphaned Redis keys.
Changes:
- Added configurable `key_ttl` parameter to `limit-conn` plugin with a default value of 60 seconds
- Implemented hardcoded 2-second TTL for `limit-req` plugin Redis keys
- Added comprehensive test coverage for both plugins with Redis and Redis Cluster configurations
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| apisix/utils/redis-schema.lua | Added ttl_policy_schema function to extend Redis policy schemas with TTL configuration |
| apisix/plugins/limit-conn.lua | Updated schema to use ttl_policy_schema function for Redis policies |
| apisix/plugins/limit-conn/util.lua | Implemented TTL setting using Redis pipeline for both increment and decrement operations |
| apisix/plugins/limit-req/util.lua | Added hardcoded 2-second TTL to Redis SET operations for rate limiting keys |
| docs/en/latest/plugins/limit-conn.md | Documented the new key_ttl configuration parameter |
| t/plugin/limit-req-redis-ttl.t | Added test to verify TTL is set correctly for limit-req with Redis |
| t/plugin/limit-req-redis-cluster-ttl.t | Added test to verify TTL is set correctly for limit-req with Redis Cluster |
| t/plugin/limit-conn-redis-ttl.t | Added tests for default and custom TTL values with Redis |
| t/plugin/limit-conn-redis-cluster-ttl.t | Added tests for default and custom TTL values with Redis Cluster |
Comments suppressed due to low confidence (1)
apisix/plugins/limit-conn/util.lua:52
- When a request is rejected because it exceeds max + burst, the counter is decremented (line 48) but the TTL is not refreshed in this path. This could lead to the key expiring prematurely if there are many rejected requests, potentially causing the counter to be lost even though there are still active connections.
Consider adding TTL refresh logic after the decrement operation on line 48, similar to how it's done in the leaving function.
```lua
if conn > max + self.burst then
    conn, err = red:incrby(key, -1)
    if not conn then
        return nil, err
    end
    return nil, "rejected"
```
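A sketch of the refresh Copilot suggests, pipelining the decrement together with an `expire`; `self.key_ttl` is assumed here to carry the configured TTL:

```lua
if conn > max + self.burst then
    -- decrement and refresh the TTL together so the counter does not
    -- expire prematurely while other connections are still active
    red:init_pipeline()
    red:incrby(key, -1)
    red:expire(key, self.key_ttl) -- key_ttl assumed from the plugin conf
    local results, err = red:commit_pipeline()
    if not results then
        return nil, err
    end
    return nil, "rejected"
end
```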
```lua
-- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
if ttl > 0 and ttl <= 11 then
```
Copilot AI · Jan 12, 2026
The comment indicates that the TTL calculation formula is ceil(burst/rate) + 1, which would equal 11 seconds for rate=1 and burst=10. However, this formula and comment appear to be copied from limit-req tests but are actually checking a hardcoded 2-second TTL in the implementation. This comment is misleading and should be updated to reflect the actual TTL value of 2 seconds that is set in the implementation.
Suggested change:

```diff
- -- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
- if ttl > 0 and ttl <= 11 then
+ -- ttl is expected to be 2 seconds as set by the current limit-req implementation.
+ if ttl == 2 then
```
apisix/utils/redis-schema.lua
Outdated
```lua
function _M.ttl_policy_schema(kind, ttl)
    local schema = policy_to_additional_properties[kind]
    schema.properties.key_ttl = {
        type = "integer", default = ttl,
    }
    return schema
end
```
Copilot AI · Jan 12, 2026
This function mutates the original schema object from policy_to_additional_properties by directly adding the key_ttl property. This means that after the first call, the schema will be modified permanently for all subsequent uses. This could lead to unexpected behavior if the schema is reused elsewhere.
Consider creating a deep copy of the schema before adding the key_ttl property to avoid mutating the shared schema object.
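A minimal sketch of that fix, assuming APISIX's `core.table.deepcopy` helper; treat the exact helper as an assumption rather than the PR's final code:

```lua
local core = require("apisix.core")

function _M.ttl_policy_schema(kind, ttl)
    -- work on a copy so the shared policy_to_additional_properties
    -- table is never mutated across calls
    local schema = core.table.deepcopy(policy_to_additional_properties[kind])
    schema.properties.key_ttl = {
        type = "integer", default = ttl,
    }
    return schema
end
```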
apisix/plugins/limit-req/util.lua
Outdated
```lua
-- limit-req does rate limiting per second, so we set the ttl to 2 seconds
local ttl = 2
```
Copilot AI · Jan 12, 2026
The limit-req plugin uses a hardcoded TTL of 2 seconds for Redis keys, while the limit-conn plugin has a configurable key_ttl parameter with a default of 60 seconds. This inconsistency means:
- Users cannot configure the TTL for limit-req Redis keys
- The 2-second TTL may be too short for high-burst scenarios where requests are queued
Consider making the TTL configurable for limit-req as well, similar to limit-conn. The TTL could be calculated as ceil(burst/rate) + a safety margin to ensure keys persist long enough for delayed requests to be processed.
Suggested change:

```diff
- -- limit-req does rate limiting per second, so we set the ttl to 2 seconds
- local ttl = 2
+ -- limit-req does rate limiting per second. Allow configurable TTL via self.key_ttl,
+ -- defaulting to 2 seconds to preserve existing behavior if not set.
+ local ttl = self.key_ttl or 2
```
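A sketch of the calculated-TTL alternative mentioned above; `self.rate` and `self.burst` are assumed to be the plugin's configured parameters:

```lua
-- a key only matters while queued (burst) requests can still drain,
-- which takes at most ceil(burst / rate) seconds; add 1s as a safety margin
local ttl = math.ceil(self.burst / self.rate) + 1
```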
t/plugin/limit-req-redis-ttl.t
Outdated
```lua
end
local ttl = red:ttl(keys[1])
if ttl > 0 and ttl <= 11 then
```
Copilot AI · Jan 12, 2026
The test expects a TTL of up to 11 seconds (based on the formula ceil(burst/rate) + 1 = ceil(10/1) + 1 = 11), but the implementation in apisix/plugins/limit-req/util.lua sets a hardcoded TTL of 2 seconds. This test will pass since 2 is within the range (0, 11], but the test is validating against the wrong expected value.
The test should check for TTL <= 2 to match the actual implementation, or the implementation should be updated to use a calculated TTL based on burst and rate parameters.
Suggested change:

```diff
- if ttl > 0 and ttl <= 11 then
+ if ttl > 0 and ttl <= 2 then
```
```lua
-- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
if ttl > 0 and ttl <= 11 then
```
Copilot AI · Jan 12, 2026
Similar to the redis test, this test expects a TTL of up to 11 seconds based on the formula ceil(burst/rate) + 1, but the implementation uses a hardcoded 2-second TTL. The test should check for TTL <= 2 to match the actual implementation, or the implementation should use a calculated TTL.
Suggested change:

```diff
- -- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
- if ttl > 0 and ttl <= 11 then
+ -- implementation currently uses a fixed TTL of 2 seconds
+ if ttl > 0 and ttl <= 2 then
```
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
membphis left a comment
I think:
- you'd better split this PR into two PRs
- one PR per plugin, e.g. limit-count, limit-req and limit-conn
- a smaller PR is much easier to review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…-abhishek/apisix into fea/redis-ttl-limiting-plugins
Setting a TTL for Redis keys does not seem to solve the scenario where a node in an APISIX cluster goes down under high traffic. When the traffic is heavy, APISIX keeps performing Redis operations, which continuously refresh the TTL. As a result, the counter associated with the failed node in Redis cannot be reset through TTL expiration.

I am not sure if I fully understand what you are trying to convey. The counter key will be the same for all apisix instances in a cluster. Am I missing something 🤔
apisix/utils/redis-schema.lua
Outdated
```lua
local limit_conn_redis_cluster_schema = policy_to_additional_properties["redis-cluster"]
limit_conn_redis_cluster_schema.properties.key_ttl = {
    type = "integer", default = 60,
```
I think the default value can be 3600, meaning 1 hour. It should be enough.
makes sense, I have set the default ttl to be 3600.
1a0bffb
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
@ChuanFF I have updated the redis script to ensure each redis key has a ttl assigned to it. So the issue that you described should not occur.
apisix/plugins/limit-conn/util.lua
Outdated
```lua
ngx.ctx.limit_conn_req_ids[raw_key] = req_id

local now = ngx_time()
local res, err = red:eval(redis_incoming_script, 1, key,
```
you can take a look at this example, which gives better performance: we can cache the SHA1 of the Lua script, then call evalsha
```lua
local redis = require "resty.redis"
local red = redis:new()

-- 1. Define your fixed script and pre-calculate its SHA1 locally
-- You can use an online SHA1 tool or calculate it via code
local script_content = "return redis.call('GET', KEYS[1])"
local script_sha1 = "6bce267422d7793d3e0034d7c99602029c5737e6"

red:set_timeout(1000) -- 1 sec

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "failed to connect: ", err)
    return
end

-- Function to execute script with EVALSHA optimization
local function optimized_eval(red, sha1, script, num_keys, ...)
    -- Try EVALSHA first
    local res, err = red:evalsha(sha1, num_keys, ...)

    -- If the script is missing in Redis (NOSCRIPT error)
    if not res and err and string.find(err, "NOSCRIPT") then
        ngx.log(ngx.INFO, "Script not found in cache, loading now...")

        -- Load the script to get the SHA1 (or verify it)
        local new_sha, load_err = red:script("LOAD", script)
        if not new_sha then
            return nil, "failed to load script: " .. (load_err or "")
        end

        -- Retry EVALSHA with the loaded script
        return red:evalsha(new_sha, num_keys, ...)
    end

    return res, err
end

-- 2. Call the function
local res, err = optimized_eval(red, script_sha1, script_content, 1, "my_key")
if not res then
    ngx.say("Error: ", err)
else
    ngx.say("Result: ", res)
end

-- Put it back in the connection pool
red:set_keepalive(10000, 100)
```
done
apisix/plugins/limit-conn/util.lua
Outdated
```lua
ngx.ctx.limit_conn_req_ids[raw_key] = req_id

local now = ngx_time()
local res, err = red:evalsha(redis_incoming_script_sha, 1, key,
```
another example, generating the sha1 via `red:script`:
```lua
local function fetch_latest_redis_script_sha1()
    if redis_incoming_script_sha then
        return redis_incoming_script_sha
    end

    local sha1, load_err = red:script("LOAD", lua_script)
    if load_err then
        return nil, load_err
    end

    redis_incoming_script_sha = sha1
    return redis_incoming_script_sha
end

local sha1, err = fetch_latest_redis_script_sha1()
if not sha1 then
    ngx.log(ngx.ERR, "Failed to load Lua script into Redis: ", err)
    return nil, err
end

local res, err = red:evalsha(sha1, ...)
if err and core.string.has_prefix(err, "NOSCRIPT") then
    redis_incoming_script_sha = nil
end
```
it is safer than generating it via the ngx API
got it, I have fixed this too now.
FYI, lua-resty-redis-cluster didn't work with evalsha, so I made the usage conditional: plain redis `eval` for the `redis-cluster` policy and `evalsha` for the `redis` policy.
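A minimal sketch of that conditional dispatch; the policy field, variable names, and script arguments are assumptions for illustration, not the PR's exact code:

```lua
local res, err
if policy == "redis-cluster" then
    -- lua-resty-redis-cluster does not support evalsha, use plain eval
    res, err = red:eval(redis_incoming_script, 1, key, req_id, now, ttl)
else
    -- single-node redis: reuse the cached script SHA via evalsha
    res, err = red:evalsha(redis_incoming_script_sha, 1, key, req_id, now, ttl)
end
```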
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Signed-off-by: Shreemaan Abhishek <shreemaan.abhishek@gmail.com>
Description
When using limit-conn, if an APISIX node crashes and is unable to perform the "decrement count" operation, the key remains permanently in Redis as there is no expiry configured, causing the user's concurrency quota to be permanently occupied or even the user to be permanently blocked.
UPDATE:
Thanks to @ChuanFF's comment: #12872 (comment), we realised that the existing method of counting connections would not yield accurate and correct connection limiting in a distributed deployment scenario.
The correct solution to ensure keys expire in a timely manner is to tie an expiry to each individual increment (each connection). To achieve this we use a Redis sorted set to count each connection by a `request_id`, assign a TTL to each `request_id`, and promptly remove all `request_id`s that have exceeded the TTL.
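A minimal sketch of that sorted-set scheme as a server-side Redis Lua script; the argument layout and calls are illustrative, not the PR's exact script:

```lua
-- illustrative server-side script: each live connection is a sorted-set
-- member scored by its expiry deadline, so stale entries can be purged
local incoming_script_sketch = [[
    local key, req_id = KEYS[1], ARGV[1]
    local now, ttl = tonumber(ARGV[2]), tonumber(ARGV[3])

    -- drop request_ids whose deadline has passed (covers crashed nodes)
    redis.call('ZREMRANGEBYSCORE', key, '-inf', now)

    -- register this connection with its own deadline
    redis.call('ZADD', key, now + ttl, req_id)
    redis.call('EXPIRE', key, ttl)

    -- the remaining cardinality is the live connection count
    return redis.call('ZCARD', key)
]]
```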