Extractor timings #865

Open
Aske-Rosted wants to merge 4 commits into graphnet-team:main from Aske-Rosted:extractor_timings

Conversation

@Aske-Rosted
Collaborator

This PR adds a wrapper around the base extractor class that is enabled if the root logger level is set to DEBUG.

The wrapper records and logs the time used whenever the extractor is called. This produces quite a bit of logging, but it is useful when developing new extractors to determine the time taken by each extractor call.

I do not believe the DEBUG level is used much anywhere else in GraphNeT.

I am not using an init argument of the base extractor class to enable the timing, as that would require each of the subclasses to also take this argument and pass it on to the super class. This can be done if preferred, but it would require a lot of manual argument passing.
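For concreteness, the mechanism described above might look roughly like this. This is a sketch, not the PR's actual code: the `Extractor` class body, the logger used, and the wrapper details are all assumptions for illustration.

```python
import logging
import time


class Extractor:
    """Minimal stand-in for the base extractor class (assumed shape)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap __call__ with a timer, but only if DEBUG is enabled at the
        # moment the subclass is defined (this is the behavior the PR
        # describes; the check happens once, at class-definition time).
        if logging.getLogger().getEffectiveLevel() <= logging.DEBUG:
            original_call = cls.__call__

            def timed_call(self, *args, **kwargs):
                start = time.time()
                result = original_call(self, *args, **kwargs)
                logging.getLogger().debug(
                    "%s took %.4f s", type(self).__name__, time.time() - start
                )
                return result

            cls.__call__ = timed_call

    def __call__(self, frame):
        raise NotImplementedError
```

Because `__init_subclass__` runs when each subclass is defined, no per-subclass `__init__` argument is needed, which is the trade-off the paragraph above describes.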

@sevmag (Collaborator) left a comment


Great feature! Claude has a few opinions on this that I share:

A few concerns with this PR:

1. The enable-timing check runs at class-definition time, not at call time.
__init_subclass__ executes once per subclass, at import. That freezes the timing decision for the lifetime of the process. The very common workflow — import the library, then bump the logger to DEBUG to investigate something — will silently produce no timing output. The check should live inside the wrapper itself (if self._logger.isEnabledFor(logging.DEBUG): ...) so log-level changes are respected at runtime.

2. Coupling timing to the global log level is fragile.
The PR description notes that DEBUG isn't used much elsewhere in GraphNeT — but that's a landmine, not a justification. The first time someone enables DEBUG for an unrelated reason, they'll get a flood of extractor-timing output they didn't ask for. Timing should have its own switch (init arg, env var, or dedicated logger name), not piggyback on the root level.

3. Inconsistent behavior across the class hierarchy.
Because __init_subclass__ fires per subclass, a subclass defined while DEBUG is on gets timing and a sub-subclass defined later (e.g. in user code) when DEBUG is off doesn't. Mixed hierarchies will be inconsistently instrumented.

4. Wrong clock. Use time.perf_counter() instead of time.time() — wall-clock time is subject to NTP/system-clock adjustments and is not appropriate for measuring elapsed durations.

5. Magic number. getEffectiveLevel() > 10 should be > logging.DEBUG.

6. Misleading parameter name. In timed_call(cls: "Extractor", *args), the first argument is self, not cls: __call__ is an instance method, so naming the parameter cls will confuse readers.

7. Duplicated branches. Both if/else paths do cls.__call__ = cls.exclude(cls.__call__). Collapse to one unconditional line, then optionally wrap with the timer.

8. Inline import time. time is stdlib; move it to the top of the file.

9. No tests. A behavior change conditional on logger state should at minimum have one test for each branch.

10. Minor: original_call is captured after exclude is applied, so the reported duration includes exclusion overhead — probably not what you want when profiling extractor logic.
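A version incorporating several of these points (call-time level check, perf_counter, the logging.DEBUG constant, a single unconditional wrap, and a self parameter) might look like the sketch below. Class and logger names are assumptions, not GraphNeT's actual API.

```python
import logging
import time


class Extractor:
    """Sketch of the suggested fixes, not GraphNeT's actual base class."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        original_call = cls.__call__  # capture before any further wrapping

        def timed_call(self, *args, **kwargs):  # instance method: self, not cls
            # Check the log level at call time, so enabling DEBUG after
            # import still produces timing output (point 1). A dedicated
            # per-class logger avoids piggybacking on the root level (point 2).
            logger = logging.getLogger(type(self).__name__)
            if not logger.isEnabledFor(logging.DEBUG):
                return original_call(self, *args, **kwargs)
            start = time.perf_counter()  # monotonic; immune to clock jumps (point 4)
            result = original_call(self, *args, **kwargs)
            logger.debug(
                "%s.__call__ took %.6f s",
                type(self).__name__,
                time.perf_counter() - start,
            )
            return result

        cls.__call__ = timed_call  # one unconditional wrap (point 7)

    def __call__(self, frame):
        raise NotImplementedError
```

With the check inside the wrapper, the DEBUG-on and DEBUG-off branches become natural targets for the two tests asked for in point 9.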

