filter_ecs: Retrieve container ID from record field#9033
Conversation
b4ba446 to
f14ac5f
Compare
|
the PR is not passing the unit test: https://github.com/fluent/fluent-bit/actions/runs/9771469517/job/27099695827?pr=9033 |
da51c32 to
59cd20b
Compare
|
@edsiper Added format checks as well as well as an extra test for invalid data. |
|
@edsiper One of the test fails, though it doesn't seem related to the changes I made, is this a random failure or was some kind of side-effect introduced? I had all passing when running locally. |
|
Bump @edsiper @PettitWesley |
|
Bump @edsiper @PettitWesley 👀 |
|
Bump @edsiper @PettitWesley |
|
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
|
Bump @edsiper @PettitWesley |
|
Bump @edsiper @PettitWesley . To give more context as to why this is useful, the current plugin retrieves the container ID through the tag, which can be overridden and will break this plugin. Why would anyone override the default tag you might ask? For routing purposes, which is currently impossible as the docker ID . Being able to have a custom tag would allow better routing capabilities. |
|
@patrick-stephens would you have time to review this PR? |
patrick-stephens
left a comment
There was a problem hiding this comment.
Looks ok to me but I am concerned about the lack of codeowners...
It needs rebasing as well.
Signed-off-by: Ra'Jiska <dodo.lasticot@gmail.com>
Signed-off-by: Ra'Jiska <dodo.lasticot@gmail.com>
Signed-off-by: Ra'Jiska <dodo.lasticot@gmail.com>
59cd20b to
8c960c9
Compare
WalkthroughThe ECS filter plugin now supports configurable container ID extraction. A new Changes
Sequence DiagramsequenceDiagram
participant Init as Initialization
participant Config as Configuration
participant Filter as Filter (cb_ecs_filter)
participant Tag as Tag-based Path
participant Field as Field-based Path
participant Meta as Metadata Store
Init->>Config: Load container_id_field_name setting
Config-->>Init: Config loaded
rect rgb(200, 220, 240)
Note over Filter: Per log event
Filter->>Filter: Check if container_id_field_name set
alt Field-based extraction enabled
Filter->>Field: deduce_short_container_id_from_record()
Field-->>Filter: container_short_id (or error)
Filter->>Meta: get_metadata_by_id(container_short_id)
Meta-->>Filter: metadata (or use cluster fallback)
else Tag-based extraction (default)
Filter->>Tag: deduce_short_container_id_from_tag()
Tag-->>Filter: container_short_id
Filter->>Meta: get_metadata_by_id(container_short_id)
Meta-->>Filter: metadata
end
Filter->>Filter: Populate resource with metadata
end
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
plugins/filter_ecs/ecs.c(7 hunks)plugins/filter_ecs/ecs.h(1 hunks)tests/runtime/filter_ecs.c(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/filter_ecs/ecs.c (3)
src/flb_sds.c (2)
flb_sds_create_len(58-76)flb_sds_destroy(389-399)src/flb_record_accessor.c (3)
flb_ra_create(271-358)flb_ra_get_value_object(803-814)flb_ra_destroy(232-248)src/flb_ra_key.c (1)
flb_ra_key_value_destroy(842-851)
tests/runtime/filter_ecs.c (2)
plugins/out_s3/s3.c (1)
setenv(57-60)src/flb_lib.c (3)
flb_filter_set(618-649)flb_start(983-994)flb_lib_push(843-870)
🔇 Additional comments (11)
plugins/filter_ecs/ecs.h (1)
148-148: LGTM!The new field
container_id_field_nameis appropriately placed in the struct and aligns with the feature requirements.tests/runtime/filter_ecs.c (4)
435-482: LGTM!The test correctly validates that container metadata is populated when a valid container_id field is present. The use of "randomtag" is appropriate since tag-based extraction is not used with the field-based approach.
484-531: LGTM!The test correctly validates fallback behavior when the configured field name doesn't exist in the record. The expected pattern "cluster_name.." confirms that only cluster metadata is attached.
533-586: LGTM!The test correctly validates handling of invalid container ID values (string too short and non-string type). Both records appropriately fall back to cluster metadata only.
596-598: LGTM!All three new tests are properly registered in the TEST_LIST.
plugins/filter_ecs/ecs.c (6)
202-206: LGTM!The debug logging appropriately informs which container ID retrieval method is in use based on the configuration.
1298-1328: LGTM!The helper function correctly extracts the 12-character container short ID from the tag using the configured prefix. Error messages are clear and the memory management is sound.
1368-1402: LGTM!The refactored function correctly accepts the pre-deduced
container_short_idand provides appropriate logging based on the extraction method in use.
1532-1563: LGTM!The tag-based extraction path correctly maintains backward compatibility, including the failure tracking mechanism to avoid repeated failed metadata requests. The fallback to cluster metadata is appropriate.
1593-1612: Verify interaction with the performance issue flagged earlier.This code block calls
deduce_short_container_id_from_recordfor every log event. As noted in the earlier comment (lines 1330-1366), creating a record accessor per event is expensive. Once that issue is addressed, this call site should be updated to pass the pre-created record accessor.The logic here is sound—extracting container ID per record is necessary since different records in the same batch can have different container IDs. The fallback behavior and cleanup are correct.
1796-1801: LGTM!The configuration map entry is complete with an appropriate description documenting that this option supersedes
ecs_tag_prefixwhen set.
| static int deduce_short_container_id_from_record(struct flb_filter_ecs *ctx, | ||
| const msgpack_object *record, | ||
| flb_sds_t *container_short_id) | ||
| { | ||
| struct flb_record_accessor *ra_key; | ||
| struct flb_ra_value *rval; | ||
|
|
||
| ra_key = flb_ra_create(ctx->container_id_field_name, FLB_TRUE); | ||
| if (!ra_key) { | ||
| flb_errno(); | ||
| return -1; | ||
| } | ||
| rval = flb_ra_get_value_object(ra_key, *record); | ||
| if (!rval) { | ||
| flb_plg_debug(ctx->ins, "Container field \"%s\" not found in record", | ||
| ctx->container_id_field_name); | ||
| flb_ra_destroy(ra_key); | ||
| return -1; | ||
| } | ||
| flb_ra_destroy(ra_key); | ||
| if (rval->type != FLB_RA_STRING || rval->o.via.str.size < 12) { | ||
| flb_plg_debug(ctx->ins, "Container field \"%s\" needs to represent a container " | ||
| "ID a string of at least 12 characters " | ||
| "representing the container ID", | ||
| ctx->container_id_field_name); | ||
| flb_ra_key_value_destroy(rval); | ||
| return -1; | ||
| } | ||
| *container_short_id = flb_sds_create_len(rval->val.string, 12); | ||
| if (!*container_short_id) { | ||
| flb_errno(); | ||
| flb_ra_key_value_destroy(rval); | ||
| return -1; | ||
| } | ||
| flb_ra_key_value_destroy(rval); | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Performance: Record accessor should be created once during initialization.
This function is called for every log event when container_id_field_name is configured (see line 1595). Creating and destroying a record accessor per event is expensive and will degrade throughput with high-volume logs.
Consider this refactor:
- Add a field to
struct flb_filter_ecs:
struct flb_record_accessor *container_id_ra;- Initialize it once in
cb_ecs_init(after line 206):
+ if (*ctx->container_id_field_name) {
+ ctx->container_id_ra = flb_ra_create(ctx->container_id_field_name, FLB_TRUE);
+ if (!ctx->container_id_ra) {
+ flb_plg_error(f_ins, "Could not create record accessor for container_id_field_name");
+ goto error;
+ }
+ }- Simplify this function to accept and use the pre-created record accessor:
-static int deduce_short_container_id_from_record(struct flb_filter_ecs *ctx,
+static int deduce_short_container_id_from_record(struct flb_record_accessor *ra_key,
+ struct flb_filter_ecs *ctx,
const msgpack_object *record,
flb_sds_t *container_short_id)
{
- struct flb_record_accessor *ra_key;
struct flb_ra_value *rval;
- ra_key = flb_ra_create(ctx->container_id_field_name, FLB_TRUE);
- if (!ra_key) {
- flb_errno();
- return -1;
- }
rval = flb_ra_get_value_object(ra_key, *record);
if (!rval) {
flb_plg_debug(ctx->ins, "Container field \"%s\" not found in record",
ctx->container_id_field_name);
- flb_ra_destroy(ra_key);
return -1;
}
- flb_ra_destroy(ra_key);- Clean up in
flb_filter_ecs_destroy(after line 1763):
+ if (ctx->container_id_ra) {
+ flb_ra_destroy(ctx->container_id_ra);
+ }- Update the call site at line 1595:
- ret = deduce_short_container_id_from_record(ctx,
+ ret = deduce_short_container_id_from_record(ctx->container_id_ra, ctx,
log_event.body,
&container_short_id);🤖 Prompt for AI Agents
In plugins/filter_ecs/ecs.c around lines 1330 to 1366, the function
creates/destroys a record accessor on every call which is costly; change it to
use a single pre-created accessor stored in ctx: add struct flb_record_accessor
*container_id_ra to struct flb_filter_ecs, create and assign
ctx->container_id_ra in cb_ecs_init (handle flb_ra_create failure), modify
deduce_short_container_id_from_record to use ctx->container_id_ra instead of
creating one (remove flb_ra_create/flb_ra_destroy calls and associated error
handling), ensure you still check rval for NULL and validate type/length as
before, free/ destroy ctx->container_id_ra in flb_filter_ecs_destroy, and update
the call site at line 1595 to rely on the ctx-held accessor (or pass it in) and
handle the case where the accessor is NULL.
| flb_plg_debug(ctx->ins, "Container field \"%s\" needs to represent a container " | ||
| "ID a string of at least 12 characters " | ||
| "representing the container ID", | ||
| ctx->container_id_field_name); |
There was a problem hiding this comment.
Minor grammar issue in error message.
The phrasing "needs to represent a container ID a string of at least 12 characters" is awkward.
Suggest:
- flb_plg_debug(ctx->ins, "Container field \"%s\" needs to represent a container "
- "ID a string of at least 12 characters "
- "representing the container ID",
+ flb_plg_debug(ctx->ins, "Container field \"%s\" must be a string of at least "
+ "12 characters representing the container ID",
ctx->container_id_field_name);🤖 Prompt for AI Agents
In plugins/filter_ecs/ecs.c around lines 1351 to 1354, the log message text is
grammatically awkward; update the format string so it reads clearly (for
example: "Container field \"%s\" must be a string of at least 12 characters
representing the container ID") and keep using ctx->container_id_field_name as
the argument; ensure spacing and punctuation are correct and the message remains
passed to flb_plg_debug unchanged otherwise.
|
Hi @RaJiska, thanks for the contribution. I read thru the PR and the attached issue but Im having trouble fully understanding the use-case here. Can you describe in more detail why you need this? Can you please explain the end-to-end reason why you need this? You can include any necessary details like full fluent-bit config file, ecs task definition, awscli commands, any other relevant aws infra, etc etc. Please also indicate what about that fluent-bit config is broken and where this new argument would fit in. Thank you. I see this quote, but I don't fully understand what you mean by this:
|
|
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
Adds
container_id_field_nameto retrieve the container ID instead of deducing it from the tags viaecs_tag_prefixwhich is too restrictive to use as too invasive over the logic the user wants to use.Maintained the retrieval of the container ID to not introduce breaking change, but using
container_id_field_namewill override and retrieve container ID from the specified field.Fixes the following issue: #9028
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
fluentbit-valgrind-containerid.txt
fluentbit-valgrind-ecsprefix.txt
The above includes config file as well as debug outputs.
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
pipelines: filters: ecs: Retrieve container ID from record field fluent-bit-docs#1402
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Tests