Skip to content

Conversation

@abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Dec 29, 2025

This patch adds task dimensions to the ServiceEmitter so individual monitors no longer need to pass the task holder to retrieve task dimensions and append them on every monitor call. Some of the monitors were even missing these task dimensions.

The added dimensions include taskId, dataSource, taskType and groupId. This change ensures that all task metrics are emitted with these task dimensions by default and in a consistent format.

Changes:

  • Plumb TaskHolder into the ServiceEmitter via the EmitterModule.
  • Add a new method, TaskHolder.getMetricDimensions() and override it as appropriate.
  • Initialize the dimensions once during the ServiceEmitter’s start() by calling TaskHolder.getMetricDimensions(). This is done during the start() explicitly to ensure all the dependencies are ready.
  • Move the TaskHolder and NoopTaskHolder classes from the server module to the processing module for better reuse.
  • Remove and cleanup all usages of the dimension map from the individual monitor implementation since this is taken care of already.
  • Add TestLoadSpecHolder; this isn’t really in scope for this change, so I’m happy to move it out to simplify review if needed.
  • The remainder of the changes are primarily tests and related refactoring.

Release note

All task metrics are emitted with the following dimensions: taskId, dataSource, taskType, groupId, and id (for backward compatibility; id will be removed in favor of the taskId dimension in a future release).

This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

This patch adds task dimensions to the ServiceEmitter so different
monitors don't need to pass the holder to get the dimensions and append
it during every monitor call. This also ensures that all task metrics emitted
contain the task dimensions by default and are emitted in the same format.

The dimensions added include  taskId, dataSource, taskType, groupId and id.
id is added primarily for backward compatibility as some monitors were previously
emitting the id dimension rather than taskId. We can remove it in later release as it's
been deprecated already.
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, @abhishekrb19 ! The task dimensions wiring is much cleaner and easy to follow now!

* @return a universally useful JVM-wide monitor
*/
public static Monitor createCompoundJvmMonitor(Map<String, String[]> dimensions)
public static Monitor createCompoundJvmMonitor()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class/method used anywhere? I couldn't find any usage.

Copy link
Contributor Author

@abhishekrb19 abhishekrb19 Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think they’re used anywhere. Opened a separate PR to remove this unused class altogether: #18877

/**
* Provides identifying information for a task. Implementations return {@code null}
* when used in server processes that are not {@code CliPeon}.
* when used in server processes that are not {@code CliPeon}. Note that t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete javadoc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, removed.

String getTaskId();

/**
* @return the taskId, or {@code null} if called from a server that is not {@code CliPeon}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return the taskId, or {@code null} if called from a server that is not {@code CliPeon}.
* @return the type name of this task, or {@code null} if called from a server that is not {@code CliPeon}.

String getTaskType();

/**
* @return the taskId, or {@code null} if called from a server that is not {@code CliPeon}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return the taskId, or {@code null} if called from a server that is not {@code CliPeon}.
* @return the group ID of this task, or {@code null} if called from a server that is not {@code CliPeon}.

}

/**
* Initialize a stub service emitter and auto-{@link #start()} it for test convenience.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing the auto-start in the constructor, add a static utility method that may be used something like

ServiceEmitter emitter = StubServiceEmitter.createStarted();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done

public Map<String, String> getMetricDimensions()
{
return Map.of(
DruidMetrics.DATASOURCE, getDataSource(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Each getter invokes the taskProvider separately. Maybe just invoke it once and assign the result to a Task task variable.

/**
* Initialize a stub service emitter. Tests must explicitly call {@link #start()}.
*/
public StubServiceEmitter(String service, String host, TaskHolder taskHolder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the LatchableEmitter constructor to accept a TaskHolder too.
Otherwise, the task dimensions will not show up in embedded tests.
You could also add a short test method in any of the existing tests to verify the new dimensions.

Copy link
Contributor

@kfaraz kfaraz Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add a short test method in any of the existing tests to verify the new dimensions.

Oh, I suppose this would require MMs. You could add a metric verification in the KubernetesTaskRunnerDockerTest since that test runs peons and also uses an event collector + latchable emitter combo to wait on metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the LatchableEmitter constructor to accept a TaskHolder too.
Otherwise, the task dimensions will not show up in embedded tests.

Sounds good, updated.

Oh, I suppose this would require MMs. You could add a metric verification in the KubernetesTaskRunnerDockerTest since that test runs peons and also uses an event collector + latchable emitter combo to wait on metrics.

I’m trying to figure out how to run the Docker tests locally to verify this...I’m looking at the GHA Docker tests to see what I’m missing in my setup. It would probably be a good idea to add a brief README or update the javadocs, to document how devs can run these tests locally, since it requires some additional setup like installing dependencies like k3s, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fair, let me create a PR to add the steps for running these tests.
Meanwhile, we can merge this PR and you can add the tests in a follow up.

since it requires some additional setup like installing dependencies like k3s, etc.

I might need to check but AFAIR, the tests just need docker to be running. There are some steps required though, like setting the Druid image name etc.

public void close()
{
try {
emitter.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the emitter always a NoopEmitter, why do we need to invoke start and close on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I intended to delegate this to the base class via super.close() here, similar to how start() invokes super.start() to delegate to the base class, which takes care of additional initialization. We could skip this in close() as it doesn't do much, but it feels cleaner to do the same way here as well for completeness.

{
private final ImmutableMap<String, String> serviceDimensions;
private final Emitter emitter;
protected final Emitter emitter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not private?

import org.apache.druid.guice.ManageLifecycle;
import org.apache.druid.java.util.emitter.core.Emitter;

public class StubServiceEmitterModule implements Module
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding this, I wonder if we shouldn't just move LatchableEmitterModule to processing/src/test/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered doing this. However, all the unit tests use StubServiceEmitter directly and moving LatchableEmitterModule to processing would also require moving non-test classes like DruidNode and other dependencies from the server module into processing. Given that, I’ve left this test stub module in place for now and we can refactor these test dependencies later if needed.

I’ve also wired TaskHolder into LatchableEmitterModule so the embedded tests can make use of it as needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that, I’ve left this test stub module in place for now and we can refactor these test dependencies later if needed.

Makes sense, thanks!

@kfaraz
Copy link
Contributor

kfaraz commented Dec 30, 2025

Some of the monitors were even missing these task dimensions.

@abhishekrb19 , did you mean monitors like CgroupCpuMonitor which do have a Map<String, String[]> dimensions as a constructor arg but were still not wired up correctly? I couldn't figure out how these dimensions were getting injected in the original code.

@abhishekrb19
Copy link
Contributor Author

@kfaraz thanks for taking a look!

@abhishekrb19 , did you mean monitors like CgroupCpuMonitor which do have a Map<String, String[]> dimensions as a constructor arg but were still not wired up correctly? I couldn't figure out how these dimensions were getting injected in the original code.

That's right, all the Cgroup* monitors were previously initialized with an empty dimensions map via the default constructor.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏻

@abhishekrb19 abhishekrb19 merged commit 9cea875 into apache:master Dec 30, 2025
49 checks passed
@abhishekrb19 abhishekrb19 deleted the peon_dimension_service_emitter branch December 30, 2025 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants