
Conversation

@shreemaan-abhishek
Contributor

@shreemaan-abhishek shreemaan-abhishek commented Jan 8, 2026

Description

When using limit-conn, if an APISIX node crashes and is unable to perform the "decrement count" operation, the key remains in Redis permanently because no expiry is configured, causing the user's concurrency quota to be permanently occupied or even the user to be permanently blocked.

UPDATE:

Thanks to @ChuanFF's comment: #12872 (comment), we realised that the existing method of counting connections would not yield accurate and correct connection limiting in distributed deployment scenarios.

The correct way to ensure keys expire in a timely manner is to tie an expiry to each individual increment (each connection). To achieve this, we use a Redis sorted set to count each connection by a request_id, assign a TTL to each request_id, and promptly remove all request_ids that have exceeded their TTL.
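As a sketch of that scheme (simulated in Python rather than the actual Redis Lua script; the class and method names here are illustrative, not APISIX's API), each connection becomes a sorted-set member whose score is its expiry time, so entries left behind by a crashed node simply age out:

```python
import time

# Illustrative simulation of the sorted-set counting scheme: each active
# connection is stored under its request_id with a score equal to the time
# it expires (ZADD), stale members are pruned before counting
# (ZREMRANGEBYSCORE), and the active count is the set size (ZCARD).
class SortedSetConnCounter:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.zset = {}  # request_id -> expiry time (stands in for a Redis ZSET)

    def _prune(self, now):
        # ZREMRANGEBYSCORE key -inf now: drop entries whose TTL has lapsed
        self.zset = {rid: exp for rid, exp in self.zset.items() if exp > now}

    def incoming(self, request_id, now=None):
        now = time.time() if now is None else now
        self._prune(now)
        self.zset[request_id] = now + self.ttl  # ZADD key (now + ttl) request_id
        return len(self.zset)                   # ZCARD key

    def leaving(self, request_id, now=None):
        now = time.time() if now is None else now
        self.zset.pop(request_id, None)         # ZREM key request_id (normal path)
        self._prune(now)
        return len(self.zset)

counter = SortedSetConnCounter(ttl_seconds=10)
print(counter.incoming("req-1", now=0))     # → 1
print(counter.incoming("req-2", now=1))     # → 2
# req-1's node crashes and never calls leaving(); once its TTL lapses the
# stale entry is pruned, so the quota is not occupied forever:
print(counter.incoming("req-3", now=10.5))  # → 2 (req-1 aged out)
```

In the real plugin the pruning and counting run atomically inside a Redis script; the simulation only shows why a per-connection expiry survives node crashes.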

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
@shreemaan-abhishek shreemaan-abhishek marked this pull request as ready for review January 8, 2026 03:00
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jan 8, 2026
| only_use_default_delay | boolean | False | false | | If false, delay requests proportionally based on how much they exceed the `conn` limit. The delay grows larger as congestion increases. For instance, with `conn` being `5`, `burst` being `3`, and `default_conn_delay` being `1`, 6 concurrent requests would result in a 1-second delay, 7 requests a 2-second delay, 8 requests a 3-second delay, and so on, until the total limit of `conn + burst` is reached, beyond which requests are rejected. If true, use `default_conn_delay` to delay all excessive requests within the `burst` range. Requests beyond `conn + burst` are rejected immediately. For instance, with `conn` being `5`, `burst` being `3`, and `default_conn_delay` being `1`, 6, 7, or 8 concurrent requests are all delayed by exactly 1 second each. |
| key_type | string | False | var | ["var","var_combination"] | The type of key. If the `key_type` is `var`, the `key` is interpreted as a variable. If the `key_type` is `var_combination`, the `key` is interpreted as a combination of variables. |
| key | string | False | remote_addr | | The key to count requests by. If the `key_type` is `var`, the `key` is interpreted as a variable. The variable does not need to be prefixed by a dollar sign (`$`). If the `key_type` is `var_combination`, the `key` is interpreted as a combination of variables. All variables should be prefixed by dollar signs (`$`). For example, to configure the `key` to use a combination of two request headers `custom-a` and `custom-b`, the `key` should be configured as `$http_custom_a $http_custom_b`. |
| key_ttl | integer | False | 60 | | The TTL of the Redis key in seconds. Used when `policy` is `redis` or `redis-cluster`. |
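For illustration, a limit-conn route configuration exercising the parameters documented above might look like the following (the values, and the Redis connection fields shown, are illustrative only):

```json
{
  "plugins": {
    "limit-conn": {
      "conn": 5,
      "burst": 3,
      "default_conn_delay": 1,
      "key_type": "var",
      "key": "remote_addr",
      "policy": "redis",
      "redis_host": "127.0.0.1",
      "redis_port": 6379,
      "key_ttl": 60
    }
  }
}
```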
Member

  1. update limit-req doc for the parameter
  2. add the corresponding zh docs

Contributor Author

done

local err
ok, err = red:set(excess_key, excess)
-- limit-req does rate limiting per second, so we set the ttl to 2 seconds
local ttl = 2
Member

The TTL for limit-req is hardcoded to 2 seconds. This feels arbitrary and lacks flexibility.



function _M.ttl_policy_schema(kind, ttl)
local schema = policy_to_additional_properties[kind]
Member

The ttl_policy_schema function modifies a shared table, is this safe?

Contributor Author

No it is not. Good catch. Let me fix this.

@@ -0,0 +1,334 @@
#
Member

Need to add a scenario where a key's TTL expires. The test should verify that the counter is effectively gone and that a new request can proceed without being blocked by a stale counter.

  • Example: Set a short TTL (e.g., 2s), make a request, wait for 3s, and then confirm that a subsequent request does not inherit the old counter value.
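A standalone simulation of that scenario (a hypothetical helper, not the actual test harness) shows the behavior being asked for:

```python
# Simulated Redis string counter with a TTL: increment, "crash" before the
# decrement ever happens, wait past the TTL, and confirm the next request
# starts from a fresh counter instead of inheriting the stale value.
class FakeRedisCounter:
    def __init__(self):
        self.value = None
        self.expires_at = None

    def incr_with_ttl(self, now, ttl):
        # INCR + EXPIRE, resetting the key if its TTL has already lapsed
        if self.expires_at is not None and now >= self.expires_at:
            self.value = None  # the key expired
        self.value = (self.value or 0) + 1
        self.expires_at = now + ttl
        return self.value

r = FakeRedisCounter()
print(r.incr_with_ttl(now=0, ttl=2))  # → 1 (first request)
# node crashes here: no decrement is performed; 3s later (> TTL) the key
# has expired, so the counter does not carry the stale value forward:
print(r.incr_with_ttl(now=3, ttl=2))  # → 1, not 2
```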

Contributor Author

done

Copilot AI left a comment

Pull request overview

This PR adds TTL (Time To Live) configuration to Redis keys used by the limit-conn and limit-req plugins to prevent keys from persisting indefinitely when APISIX nodes crash before decrementing counters. This addresses a critical issue where user quotas could be permanently occupied or users could be permanently banned due to orphaned Redis keys.

Changes:

  • Added configurable key_ttl parameter to limit-conn plugin with a default value of 60 seconds
  • Implemented hardcoded 2-second TTL for limit-req plugin Redis keys
  • Added comprehensive test coverage for both plugins with Redis and Redis Cluster configurations

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| apisix/utils/redis-schema.lua | Added ttl_policy_schema function to extend Redis policy schemas with TTL configuration |
| apisix/plugins/limit-conn.lua | Updated schema to use ttl_policy_schema function for Redis policies |
| apisix/plugins/limit-conn/util.lua | Implemented TTL setting using Redis pipeline for both increment and decrement operations |
| apisix/plugins/limit-req/util.lua | Added hardcoded 2-second TTL to Redis SET operations for rate limiting keys |
| docs/en/latest/plugins/limit-conn.md | Documented the new key_ttl configuration parameter |
| t/plugin/limit-req-redis-ttl.t | Added test to verify TTL is set correctly for limit-req with Redis |
| t/plugin/limit-req-redis-cluster-ttl.t | Added test to verify TTL is set correctly for limit-req with Redis Cluster |
| t/plugin/limit-conn-redis-ttl.t | Added tests for default and custom TTL values with Redis |
| t/plugin/limit-conn-redis-cluster-ttl.t | Added tests for default and custom TTL values with Redis Cluster |
Comments suppressed due to low confidence (1)

apisix/plugins/limit-conn/util.lua:52

  • When a request is rejected because it exceeds max + burst, the counter is decremented (line 48) but the TTL is not refreshed in this path. This could lead to the key expiring prematurely if there are many rejected requests, potentially causing the counter to be lost even though there are still active connections.

Consider adding TTL refresh logic after the decrement operation on line 48, similar to how it's done in the leaving function.

        if conn > max + self.burst then
            conn, err = red:incrby(key, -1)
            if not conn then
                return nil, err
            end
            return nil, "rejected"


Comment on lines 103 to 104
-- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
if ttl > 0 and ttl <= 11 then
Copilot AI Jan 12, 2026

The comment indicates that the TTL calculation formula is ceil(burst/rate) + 1, which would equal 11 seconds for rate=1 and burst=10. However, this formula and comment appear to be copied from limit-req tests but are actually checking a hardcoded 2-second TTL in the implementation. This comment is misleading and should be updated to reflect the actual TTL value of 2 seconds that is set in the implementation.

Suggested change
-- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
if ttl > 0 and ttl <= 11 then
-- ttl is expected to be 2 seconds as set by the current limit-req implementation.
if ttl == 2 then

Comment on lines 82 to 88
function _M.ttl_policy_schema(kind, ttl)
local schema = policy_to_additional_properties[kind]
schema.properties.key_ttl = {
type = "integer", default = ttl,
}
return schema
end
Copilot AI Jan 12, 2026

This function mutates the original schema object from policy_to_additional_properties by directly adding the key_ttl property. This means that after the first call, the schema will be modified permanently for all subsequent uses. This could lead to unexpected behavior if the schema is reused elsewhere.

Consider creating a deep copy of the schema before adding the key_ttl property to avoid mutating the shared schema object.

Comment on lines 60 to 61
-- limit-req does rate limiting per second, so we set the ttl to 2 seconds
local ttl = 2
Copilot AI Jan 12, 2026

The limit-req plugin uses a hardcoded TTL of 2 seconds for Redis keys, while the limit-conn plugin has a configurable key_ttl parameter with a default of 60 seconds. This inconsistency means:

  1. Users cannot configure the TTL for limit-req Redis keys
  2. The 2-second TTL may be too short for high-burst scenarios where requests are queued

Consider making the TTL configurable for limit-req as well, similar to limit-conn. The TTL could be calculated as ceil(burst/rate) + a safety margin to ensure keys persist long enough for delayed requests to be processed.

Suggested change
-- limit-req does rate limiting per second, so we set the ttl to 2 seconds
local ttl = 2
-- limit-req does rate limiting per second. Allow configurable TTL via self.key_ttl,
-- defaulting to 2 seconds to preserve existing behavior if not set.
local ttl = self.key_ttl or 2

end
local ttl = red:ttl(keys[1])
if ttl > 0 and ttl <= 11 then
Copilot AI Jan 12, 2026

The test expects a TTL of up to 11 seconds (based on the formula ceil(burst/rate) + 1 = ceil(10/1) + 1 = 11), but the implementation in apisix/plugins/limit-req/util.lua sets a hardcoded TTL of 2 seconds. This test will pass since 2 is within the range (0, 11], but the test is validating against the wrong expected value.

The test should check for TTL <= 2 to match the actual implementation, or the implementation should be updated to use a calculated TTL based on burst and rate parameters.

Suggested change
if ttl > 0 and ttl <= 11 then
if ttl > 0 and ttl <= 2 then

Comment on lines 103 to 104
-- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
if ttl > 0 and ttl <= 11 then
Copilot AI Jan 12, 2026

Similar to the redis test, this test expects a TTL of up to 11 seconds based on the formula ceil(burst/rate) + 1, but the implementation uses a hardcoded 2-second TTL. The test should check for TTL <= 2 to match the actual implementation, or the implementation should use a calculated TTL.

Suggested change
-- for rate=1, burst=10 -> ttl = ceil(10/1)+1 = 11.
if ttl > 0 and ttl <= 11 then
-- implementation currently uses a fixed TTL of 2 seconds
if ttl > 0 and ttl <= 2 then

Member

@membphis membphis left a comment

I think you'd better split this PR into multiple PRs, one PR per plugin (e.g. limit-count, limit-req and limit-conn).

A smaller PR is much easier to review.

shreemaan-abhishek and others added 4 commits January 12, 2026 11:51
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@ChuanFF
Contributor

ChuanFF commented Jan 12, 2026

when an APISIX node crashes and is unable to perform the "decrement count" operation, this key remains permanently in Redis as there is not expiry configured, causing the user's concurrency quota to be permanently occupied or even the user to be banned.

Setting a TTL for Redis keys does not seem to solve the scenario where a node in an APISIX cluster goes down under high traffic. When the traffic is heavy, APISIX keeps performing Redis operations, which continuously refresh the TTL. As a result, the counter associated with the failed node in Redis cannot be reset through TTL expiration.

@shreemaan-abhishek
Contributor Author

As a result, the counter associated with the failed node in Redis cannot be reset through TTL expiration.

I am not sure I fully understand what you are trying to convey. The counter key will be the same for all APISIX instances in a cluster. Am I missing something 🤔

@shreemaan-abhishek shreemaan-abhishek changed the title feat: add ttl to redis counters in redis policy rate limiters fix(limit-conn): implement configurable redis key expiry Jan 12, 2026

local limit_conn_redis_cluster_schema = policy_to_additional_properties["redis-cluster"]
limit_conn_redis_cluster_schema.properties.key_ttl = {
type = "integer", default = 60,
Member

I think the default value can be 3600, meaning 1 hour.

It should be enough.

Contributor Author

Makes sense, I have set the default TTL to 3600.

@shreemaan-abhishek shreemaan-abhishek dismissed stale reviews from nic-6443 and membphis via 1a0bffb January 13, 2026 04:54
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jan 13, 2026
@shreemaan-abhishek
Copy link
Contributor Author

@ChuanFF I have updated the Redis script to ensure each Redis key has a TTL assigned to it, so the issue you described should not occur.

ngx.ctx.limit_conn_req_ids[raw_key] = req_id

local now = ngx_time()
local res, err = red:eval(redis_incoming_script, 1, key,
Member

You can take a look at this example, which gives better performance.

We can cache the SHA1 of the Lua script, then call evalsha:

local redis = require "resty.redis"
local red = redis:new()

-- 1. Define your fixed script and pre-calculate its SHA1 locally
-- You can use an online SHA1 tool or calculate it via code
local script_content = "return redis.call('GET', KEYS[1])"
local script_sha1 = "6bce267422d7793d3e0034d7c99602029c5737e6"

red:set_timeout(1000) -- 1 sec
local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "failed to connect: ", err)
    return
end

--- Function to execute script with EVALSHA optimization
local function optimized_eval(red, sha1, script, num_keys, ...)
    -- Try EVALSHA first
    local res, err = red:evalsha(sha1, num_keys, ...)
    
    -- If the script is missing in Redis (NOSCRIPT error)
    if not res and err and string.find(err, "NOSCRIPT") then
        ngx.log(ngx.INFO, "Script not found in cache, loading now...")
        
        -- Load the script to get the SHA1 (or verify it)
        local new_sha, load_err = red:script("LOAD", script)
        if not new_sha then
            return nil, "failed to load script: " .. (load_err or "")
        end
        
        -- Retry EVALSHA with the loaded script
        return red:evalsha(new_sha, num_keys, ...)
    end
    
    return res, err
end

-- 2. Call the function
local res, err = optimized_eval(red, script_sha1, script_content, 1, "my_key")

if not res then
    ngx.say("Error: ", err)
else
    ngx.say("Result: ", res)
end

-- Put it back in the connection pool
red:set_keepalive(10000, 100)

Contributor Author

done

nic-6443
nic-6443 previously approved these changes Jan 22, 2026
ngx.ctx.limit_conn_req_ids[raw_key] = req_id

local now = ngx_time()
local res, err = red:evalsha(redis_incoming_script_sha, 1, key,
Member

Another example: generate the SHA1 via red:script:

local function fetch_latest_redis_script_sha1()
    if redis_incoming_script_sha then
        return redis_incoming_script_sha
    end
    
    local sha1, load_err = red:script("LOAD", lua_script)
    if load_err then
        return nil, load_err
    end

    redis_incoming_script_sha = sha1
    return redis_incoming_script_sha
end

local sha1, err = fetch_latest_redis_script_sha1()
if not sha1 then
    ngx.log(ngx.ERR, "Failed to load Lua script into Redis: ", err)
    return nil, err
end

local res, err = red:evalsha(sha1, ...)
if err and core.string.has_prefix(err, "NOSCRIPT") then
    redis_incoming_script_sha = nil
end

Member

It is safer than generating it via the ngx API.

Contributor Author

Got it, I have fixed this too now.

FYI, lua-resty-redis-cluster didn't work with evalsha, so I made the redis-cluster policy use plain eval while the redis policy uses evalsha.

@shreemaan-abhishek shreemaan-abhishek merged commit b7ec0c8 into apache:master Jan 27, 2026
23 of 24 checks passed