Skip to content

refactor: extract reusable building blocks from Consul discovery#13230

Open
nic-6443 wants to merge 13 commits intoapache:masterfrom
nic-6443:feat/consul-registry-refactor
Open

refactor: extract reusable building blocks from Consul discovery#13230
nic-6443 wants to merge 13 commits intoapache:masterfrom
nic-6443:feat/consul-registry-refactor

Conversation

@nic-6443
Copy link
Copy Markdown
Member

@nic-6443 nic-6443 commented Apr 16, 2026

Following the pattern established by #13201 (Nacos + K8s refactor), this PR refactors the Consul discovery module to support a multi-instance registry API while preserving all existing behavior.

The goal is to let the Consul module be driven by an external caller so that multiple Consul registries can coexist, each with their own configuration.

What changed:

  • New client.lua: HTTP/protocol primitives extracted from init.lua — blocking query watchers, service fetching, URL parsing, sort helpers, and service scanner
  • Refactored init.lua: create_registry() / start_registry() / stop_registry() / get_registry() / get_nodes() multi-instance API, following the same pattern as nacos
  • Mutable module state moved into per-registry object (skip_service_map, dump_params, consul_services, default_service)
  • Shared dict keys use registry ID prefix (default/service_name) for multi-instance isolation
  • all_nodes() / dump_data() / show_dump_file() strip the prefix for backward compatibility
  • Optional metadata filtering via get_nodes(key, metadata)
  • Privileged-agent-only gating preserved for write operations
  • No schema changes — all existing tests should pass as-is

Following the pattern established by PR apache#13201 (Nacos + K8s refactor),
refactor the Consul discovery module to support multi-instance registry
API while preserving all existing behavior.

Changes:
- Create client.lua: extracted HTTP/protocol primitives (blocking query
  watchers, service fetching, URL parsing, sort helpers, service scanner)
- Refactor init.lua: add create_registry/start_registry/stop_registry/
  get_registry/get_nodes multi-instance registry API
- Move mutable module state (skip_service_map, dump_params, consul_services,
  default_service) into per-registry reg object
- Shared dict keys now use registry ID prefix ("default/service_name")
  for multi-instance isolation
- all_nodes/dump_data/show_dump_file strip prefix for backward compat
- Add metadata filtering support via get_nodes(key, metadata)
- Preserve privileged-agent-only gating for write operations
- No schema changes, all existing tests should pass
Copilot AI review requested due to automatic review settings April 16, 2026 03:59
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request labels Apr 16, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the Consul discovery implementation to follow the multi-instance “registry” abstraction (similar to the Nacos/K8s refactor), extracting reusable Consul HTTP/watch logic into a dedicated client module and isolating shared-dict keys by registry ID while keeping default behavior backward compatible.

Changes:

  • Added apisix/discovery/consul/client.lua to host reusable Consul HTTP/watch, URL parsing, sorting, fetching, and service scanning helpers.
  • Refactored apisix/discovery/consul/init.lua to expose a multi-instance registry lifecycle API (create_registry/start_registry/stop_registry/get_registry/get_nodes) and to prefix shared-dict keys with {id}/.
  • Kept default-mode compatibility via default/... prefix handling in nodes() and all_nodes(), plus optional metadata filtering support.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
apisix/discovery/consul/init.lua Introduces multi-instance registry management, shared-dict key prefixing, dump handling changes, and a new get_nodes(key, metadata) API.
apisix/discovery/consul/client.lua New module extracting Consul client/watch and service-fetching primitives from init.lua for reuse by registry instances.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread apisix/discovery/consul/init.lua Outdated
Comment thread apisix/discovery/consul/init.lua Outdated
Comment thread apisix/discovery/consul/init.lua Outdated
Comment thread apisix/discovery/consul/init.lua
Comment thread apisix/discovery/consul/init.lua Outdated
Comment thread apisix/discovery/consul/client.lua
- Clone default_service in create_registry to avoid mutating caller's table
- Create default registry on all workers (not just privileged agent) so
  nodes()/control_api() work on non-privileged workers
- Replace flush_all() with prefix-based flush to avoid clobbering other
  registries in the shared dict
- Resolve default_reg inside control_api handler to avoid stale capture
- Replace ngx.timer.every with self-rescheduling ngx.timer.at so stop_flag
  can actually halt future wakeups for non-keepalive servers
- Remove unused ngx_timer_every import
- Fix fetch_services_from_server docstring to match actual return values
Fix luacheck lint errors:
- client.lua: add local tostring
- init.lua: add local next
Return nil instead of crashing when shared dict value fails to decode.
The stream subsystem requires a separate shared dict named
'consul-stream' (following the same pattern as nacos-stream,
kubernetes-stream, etc). The refactored code correctly uses
the stream-specific dict name, but t/APISIX.pm was missing
this declaration.
Add 'consul registry updated, id: <id>, services: <count>' log message
to match the nacos discovery pattern, providing clear observable signal
when consul service synchronization completes.
- get_nodes() now uses per-worker LRU cache (was only in nodes())
- Fix stale key cleanup when all Consul services are deleted
- Add comment for all_nodes() diagnostic-only usage
- Fix regression: compare new indexes against stored ones before calling
  update_all_services, avoiding accidental service clearing on no-change
  blocking query timeouts
- Add TEST 16: get_nodes metadata filtering test via shared dict
@nic-6443 nic-6443 closed this Apr 17, 2026
@nic-6443 nic-6443 reopened this Apr 17, 2026
Introduce DEFAULT_REGISTRY_ID = 'default' constant in client.lua. The
get_consul_services(id) scanner now filters discovered services by
registry id:

- If service_name contains '/', treat as '{id}/{name}': match by strict
  prefix and return stripped name.
- If service_name has no '/', treat as belonging to the implicit default
  namespace; accept only when id == DEFAULT_REGISTRY_ID.

The default service_scanner in create_registry now passes reg.id, so the
core module cleanly supports multi-instance registries while APISIX
single-instance usage continues to work without any user-facing change
(users keep writing 'service_name: my-svc').
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants