Add pool/claim types and controllers for numbered resource allocation#258
Add pool/claim types and controllers for numbered resource allocation#258felix-kaestner wants to merge 7 commits into
Conversation
fe66c1f to
02d7fbd
Compare
02d7fbd to
1a226ce
Compare
1a226ce to
47ce13c
Compare
47ce13c to
ccb8443
Compare
cc6b628 to
f8ff9ec
Compare
3b85728 to
9d64fe5
Compare
The previously used logo had a low resolution and didn't work well in dark mode and the navigation header. With this change we are replacing it with the standard IronCore logo which is also inline with other projects such as the metal-operator. Signed-off-by: Felix Kästner <felix.kaestner@sap.com>
SVGs are resolution-independent, render natively in browsers, and avoid Git LFS complications for the docs build. Signed-off-by: Felix Kästner <felix.kaestner@sap.com>
Signed-off-by: Felix Kästner <felix.kaestner@sap.com>
Introduces three pool types (IndexPool, IPAddressPool, IPPrefixPool) and a Claim type that references a pool via spec.poolRef. The claim controller allocates the lowest available resource from the referenced pool and writes it back to both the pool status and the claim status. Pools track allocations with ClaimRef+ClaimUID for idempotency, and support Recycle/Retain reclaim policies on claim deletion. All pool types expose an Available condition (HasCapacity/Exhausted). A preferred value can be requested by setting the annotation pool.networking.metal.ironcore.dev/preferred-value on a Claim. The controller will attempt to allocate that exact value; if it is outside the pool's configured ranges or already taken, the claim enters a terminal error state with reason PreferredValueUnavailable. Removing the annotation re-triggers reconciliation and falls back to normal allocation. Signed-off-by: Felix Kästner <felix.kaestner@sap.com>
Documents the pool-based allocation system introduced alongside the IndexPool, IPAddressPool, IPPrefixPool, and Claim types. Covers pool and claim concepts, reclaim policies, the preferred-value annotation, and allocation result fields. Includes light/dark theme diagram images and links to sample manifests in the repository.
866f5e6 to
012e6c0
Compare
Replace the embedded Status.Allocations lists on pool types with dedicated Kubernetes objects (Index, IPAddress, IPPrefix) that model allocations as first-class resources following the PV/PVC pattern. Previously, the claim controller reserved values by appending entries to a pool's status and used optimistic locking on the pool status update to prevent races. This coupled allocation state tightly to the pool object and made it impossible to pre-provision or inspect individual allocations. The new design introduces three allocation types — Index, IPAddress, and IPPrefix — each with a spec containing the reserved value, a poolRef back to the owning pool, and an optional claimRef binding it to a Claim. The claim controller now creates allocation objects with deterministic names derived from the pool name and value (e.g. my-pool-64512), using the API server's name-uniqueness guarantee as the concurrency guard: two controllers racing for the same value both attempt Create, exactly one succeeds, the other retries via retry.OnError with a fresh list of used values. Pool types implement a Pool interface with Allocate and ListAllocations methods. Allocate receives the existing allocation objects, builds a used-value set, finds the first free slot, and returns a ready-to-create allocation object. ListAllocations encapsulates the typed List call and returns []Allocation, an interface providing GetClaimRef, SetClaimRef, and GetValue so the claim controller operates generically without type switches. Pool controllers count allocations by listing objects via a shared poolRef field index rather than reading Status.Allocations. The allocation type controllers (Index, IPAddress, IPPrefix) validate each object's value against its pool's ranges/prefixes and set a Valid condition. The claim controller's finalize path applies the pool's reclaim policy: Recycle deletes the allocation object, Retain clears its claimRef so the value persists as reserved but unbound. The AllowBindingAnnotation enables rebinding of allocation objects whose claimRef name matches a Claim but whose UID is stale, such as after a Claim is deleted and recreated with the same name. Signed-off-by: Felix Kästner <felix.kaestner@sap.com>
Reflect the migration from embedded pool status allocations to dedicated allocation objects. Document the new allocation types (Index, IPAddress, IPPrefix), deterministic naming, pre-provisioning with the allow-binding annotation, and the updated claim status fields (status.value, status.allocationRef). Update the flow diagram to match the simplified allocation and reclaim logic. Signed-off-by: Felix Kästner <felix.kaestner@sap.com>
012e6c0 to
d78ee8b
Compare
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
nikatza
left a comment
There was a problem hiding this comment.
This is fantastic contribution!! 🚀🚀🚀
Since this is a very large PR I came with a larger number of comments than usual, but they are mostly to help me understand the new concepts, so please bear with me.
Most of my comments reduce to one of these 3 points:
- Pools are quite central to the
Fabriccontroller (or users). Hence, we may want to increase our test coverage to make sure we don't run into regressions by accident in the future. - Unless there is a valid use case, I think I would try to make
ReclaimPolicyimmutable and avoid complex management corner cases for the time being. - We should extend our user docs with the expected behavior for a resizing of the pool. In this regard, are we covering it correctly? For example, after downsizing a pool, e.g., an allocated prefix gets removed, the controllers sets an
ValueOutOfRangeReason, but the Claim remains Allocated and continues to report its value to the controller.
Happy to hear your thoughts on this. Please let me know if any of the comments needs further explanation. But again, amazing work, looking forward merging this PR!
| @@ -0,0 +1,92 @@ | |||
| // SPDX-FileCopyrightText: 2025 SAP SE or an SAP affiliate company and IronCore contributors | |||
There was a problem hiding this comment.
nit: 2025 -> 2026 (other new files are also affected)
| Start uint64 `json:"-"` | ||
| End uint64 `json:"-"` |
There was a problem hiding this comment.
The API conventions document does not seem to apply here as this is then later serialized as a string in IndexRange. However, this is later used in Index.Spec.Index, which is an int64. In theory we could run into an overflow I think. In fact, the nosec directive indexpool_types.go:155 is (mis?)handling this issue. May be I am missing part of the puzzle: Is my understanding here correct?
| const ( | ||
| // PreferredValueAnnotation is an optional annotation on a Claim that requests a specific | ||
| // allocation value. The format depends on the pool type: | ||
| // - IndexPool: decimal uint64, e.g. "64512" |
There was a problem hiding this comment.
If the comments from earlier are valid then this should be int64
| // Returns ErrPoolExhausted when no free value remains. | ||
| Allocate(claim *Claim, existing []Allocation) (Allocation, error) | ||
| } | ||
| type ClaimSpec struct { |
There was a problem hiding this comment.
We need a comment for the function.
| const ( | ||
| // AllowBindingAnnotation permits an allocation object whose claimRef name matches | ||
| // a Claim but whose UID is stale (e.g. after the Claim was deleted and recreated) | ||
| // to be rebound by updating the UID to the current Claim. | ||
| AllowBindingAnnotation = "pool.networking.metal.ironcore.dev/allow-binding" | ||
| ) |
There was a problem hiding this comment.
As a user I would have first looked into groupversion_info.go, seems a better home for this. That said, here is also fine.
| UpdateFunc: func(e event.UpdateEvent) bool { | ||
| oldPool := e.ObjectOld.(*poolv1alpha1.IPPrefixPool) | ||
| newPool := e.ObjectNew.(*poolv1alpha1.IPPrefixPool) | ||
| return oldPool.Status.Allocated != newPool.Status.Allocated |
There was a problem hiding this comment.
Should we consider also .Status.Total here? If the pool is exhausted, we change the spec and add new ranges. The controller won't trigger a reconciliation and we are anyhow stucked due to the TerminalError() from the exhaustion.
| return oldPool.Status.Allocated != newPool.Status.Allocated | |
| return oldPool.Status.Allocated != newPool.Status.Allocated || | |
| oldPool.Status.Total != newPool.Status.Total |
| UpdateFunc: func(e event.UpdateEvent) bool { | ||
| oldPool := e.ObjectOld.(*poolv1alpha1.IPAddressPool) | ||
| newPool := e.ObjectNew.(*poolv1alpha1.IPAddressPool) | ||
| return oldPool.Status.Allocated != newPool.Status.Allocated |
| UpdateFunc: func(e event.UpdateEvent) bool { | ||
| oldPool := e.ObjectOld.(*poolv1alpha1.IndexPool) | ||
| newPool := e.ObjectNew.(*poolv1alpha1.IndexPool) | ||
| return oldPool.Status.Allocated != newPool.Status.Allocated |
| Reason: poolv1alpha1.PoolExhaustedReason, | ||
| Message: "Referenced pool is exhausted", | ||
| }) | ||
| return reconcile.TerminalError(err) |
There was a problem hiding this comment.
Assuming that we can change the pool size in run-time, we might want how we handle this case. If we keep the TerminalError then we probably need to modify the watch above to consider the .Status.Total. Alternatively, we could also check on the Available condition. By this we might ensure a retry if someone frees resources.
| "github.com/ironcore-dev/network-operator/internal/conditions" | ||
| ) | ||
|
|
||
| var _ = Describe("Claim Controller", func() { |
There was a problem hiding this comment.
I was wondering if we should be more exhaustive here and consider the following tests:
- Pool owns the claim and it owns the allocation
- Pool
Availablecondition is correctly set - Claim conditions are correctly set
- Allocation
claimRefhas correct UID - Re-reconciliation doesn't double allocate
- On conflict retry with the next slot
- Exhausted claims retry once a slot is free
- Exhausted claims retry after pool is expanded
- Behavior when pool is reduced?
- Pool deletion cascades on the claims
Introduces a pool-based allocation system for numbered resources (ASNs, VLAN IDs, IP addresses, prefixes). Three pool types —
IndexPool,IPAddressPool,IPPrefixPool— define the available value ranges. AClaimreferences a pool viaspec.poolRefand the controller allocates the next available value by creating a dedicated allocation object (Index,IPAddress, orIPPrefix).Allocation objects are first-class Kubernetes resources with a deterministic name derived from the pool name and value (e.g.
asn-pool-64512). This naming scheme uses the API server's name-uniqueness guarantee as the concurrency guard: two controllers racing for the same value both attemptCreate, exactly one succeeds, and the other retries with a fresh list. Each allocation object carries aspec.claimRefbinding it to the owning claim and aspec.poolRefback to the pool.Pools support
Recycle(default) andRetainreclaim policies. On claim deletion,Recycledeletes the allocation object to free the value.Retainclears theclaimRefso the allocation persists as reserved but unbound — it can be rebound later using theallow-bindingannotation. Owner references from the pool to both claims and allocation objects enable automatic garbage collection when a pool is deleted.Pool types implement a
Poolinterface withAllocateandListAllocationsmethods. Allocation types implement anAllocationinterface withGetClaimRef,SetClaimRef, andGetValue. The claim controller operates generically through these interfaces with no type-specific logic beyond a single kind switch for pool instantiation.Concept
Usage