Skip to content

Comments

Minor logging setup refactors#4060

Open
nrwahl2 wants to merge 39 commits intoClusterLabs:mainfrom
nrwahl2:nrwahl2-logging
Open

Minor logging setup refactors#4060
nrwahl2 wants to merge 39 commits intoClusterLabs:mainfrom
nrwahl2:nrwahl2-logging

Conversation

@nrwahl2
Copy link
Contributor

@nrwahl2 nrwahl2 commented Feb 18, 2026

I think this is everything I have queued up for logging. I did this work a few months ago and updated it just now.

I also really want to make sense of the callsite stuff. However, this setup/scaffolding code is the easier part, so I might as well make it less of an eyesore first.

I am not actively working on anything else in the Pacemaker logging code. I'm opening this PR to avoid conflicts with anything Chris is working on.

With a clang compiler, this is a no-op.

With a non-clang compiler, this is defined as QB_LOG_INIT_DATA().
However, that macro did not actually initialize data. Rather, it is a
sanity check of some wildly complicated "linker magic" in libqb.

The libqb maintainers decided to drop the linker magic altogether in
v2.0.0, deciding that it provided little or no performance savings.
Beginning with v2.0.0, QB_LOG_INIT_DATA() is a no-op.

For a description of what QB_LOG_INIT_DATA() did and why it was dropped
(along with a LOT of related code), refer to the commit message and the
removed code/comments in commit 633f2624.
ClusterLabs/libqb@633f2624

To summarize:
* Prior to libqb v2.0.0, QB_LOG_INIT_DATA() was only an optional sanity
  check (and was eventually documented accordingly).
* Starting with libqb v2.0.0, QB_LOG_INIT_DATA() did nothing at all.

So we lose nothing by dropping our calls to it.

Ref T1

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Ref T622

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To handle_glib_message(), to avoid public API prefix "crm_". Also add
Doxygen, and name and document arguments the same way as the GLogFunc
definition.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Nothing uses it.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This avoids having to track so many variables, for one thing.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Below crm_log_init() is a more reasonable location, and soon we'll want
cleanup_tracing() defined above crm_log_deinit().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And use it to free all internal logging data structures. Nothing was
calling this function before.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
If this is a daemon. There's no need to wait to set it until we enable
(or disable, for some reason) the blackbox. We also make it non-volatile
(it can only be set once) and free it in crm_log_deinit().

Also cast getpid() to long long, since pid_t is a signed integer type.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And scope the variables to their respective loops.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
No other changes besides splitting it up right now. This keeps the diff
small.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To enable_and_write_blackbox().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
* Drop a variable (qb_facility) that doesn't contribute much to
  readability.
* Zero-initialize the struct utsname to avoid a later memset().
* Use pcmk__str_empty() for clarity.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also drop pointless CRM_LOG_ASSERT() call, and improve formatting spec
notes.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
It's always crm_system_name.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also rename arguments.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
When this is called, crm_log_level is always LOG_CRIT.

crm_log_preinit() -> set_format_string() -> crm_tracing_enabled()

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
It's very short and only used once, and the name was misleading in my
opinion. It's more accurate to say that the argument is ignored. If
logging to some other file has been enabled, we don't disable it based
on these special values.

Also remove a redundant check against PCMK_VALUE_NONE in crm_log_init().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
For whatever reason, the intent here is to ensure that the daemon group
owns the log file and can read and write it.

There are other ways that the daemon user could read and write the file.
For example, the file could belong to the daemon user even if it belongs
to a different group, or the group owner could be one of the daemon
user's secondary groups. It's not clear that we should be changing the
ownership in those cases. Also, the daemon group could have read/write
permissions via ACLs.

The current logic seems to try to ensure that ANY member of the daemon
group can read and write the file, although as discussed in the "\todo"
directive, it may not achieve that goal.

If we ever decide that we only want to ensure that the daemon user can
access the file, then we could impersonate the daemon user, try to open
the file, and then switch back to root. Or we could simply drop all of
this.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also don't exclude a 0 mode, even though it wouldn't make any sense.
There are other modes that wouldn't deny access to the daemon user, and
we don't treat those specially.

Note that pcmk__scan_ll() in its current form doesn't support the error
checking that we're doing here (it's at least missing a way to check for
trailing garbage).

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Anything that returned non-pcmk_rc_ok has already logged a warning.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@nrwahl2 nrwahl2 requested a review from clumens February 18, 2026 03:06
@clumens clumens closed this Feb 18, 2026
@clumens clumens reopened this Feb 18, 2026

free(crm_system_name);
crm_system_name = NULL;
crm_log_deinit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like nothing has ever called crm_log_deinit aside from an old test program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But since we have crm_log_init(), it seems reasonable to put the cleanup stuff in crm_log_deinit().

Or we could go ahead and deprecate it, and just have a private cleanup function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm fine with using it. I was just curious about the historical context, as well as whether there's a way to determine whether you can find all references to a string throughout history using git.

"descriptor: %s", filename, strerror(rc));
return rc;
pcmk__warn("Logging to '%s' is disabled because open() failed as root",
filename, strerror(rc));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many arguments for format string

@clumens
Copy link
Contributor

clumens commented Feb 18, 2026

You've done so much work to lib/common/logging.c that it would be nice to also run include-what-you-use and take care of that at the same time.

@clumens clumens added the review: in progress PRs that are currently being reviewed label Feb 18, 2026
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Don't bother opening in append mode with fopen() unless we're root. In
that case, do it in set_logfile_permissions() to limit scope of the file
descriptor. We use that file descriptor only for passing to fstat(),
fchown(), and fchmod().

This also lets us simplify the error message, since the user must be
root, and lets us drop a fileno() error check.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This code has been disabled since it was added by commit 14d037e in
2013. Synchronous logging would be nice, especially for root cause
analysis when a node is fenced. However, we do a ton of logging, so
calling fsync() on each log call seems too expensive.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
...and helpers. This variable does not hold a file descriptor but rather
a libqb target identifier, which is of type int32_t.

Also change variable types to int32_t accordingly.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@clumens
Copy link
Contributor

clumens commented Feb 18, 2026

I mentioned this in slack the other day, but it's worth repeating here. I really dislike how i18n is configured at the end of crm_log_preinit. It's clearly its own setup step unrelated to logging.

First, its position in that function and file makes it feel like something that'll happen as part of library init (I don't think this code gets called anywhere but for command line programs, but it's hard to say with the logging stuff) which shouldn't be done.

Second, it needs to happen pretty much first thing in any command line program, earlier than logging setup. This is in case we ever get around to translating --help output. The basic order needs to be: i18n, argument processing, formatted output, logging. One wrinkle here is that i18n setup calls pcmk__assert which could end up calling logging functions. Oh well.

Anyway, I was wondering if you had any thoughts on this since you're in the area, or if this was something you were planning on looking at in an upcoming PR.

@clumens
Copy link
Contributor

clumens commented Feb 18, 2026

One last observation about logging in general. I hate that we have pcmk__cli_init_logging, crm_log_init, and crm_log_preinit. I hate that they call each other. I hate that daemons call pcmk__cli_init_logging (which calls the other two) and then call the others as well. I do like that command line tools only call pcmk__cli_init_logging... except for crm_shadow, which also calls crm_log_preinit, which I hate.

The whole stack is just so annoying.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 18, 2026

Anyway, I was wondering if you had any thoughts on this since you're in the area, or if this was something you were planning on looking at in an upcoming PR.

I have given this zero thought up to this point, and so it's not something I was specifically planning to address in an upcoming PR. Right now as far as refactors go, I'm focused on pacemaker-based and ACL/XML code.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 18, 2026

First, its position in that function and file makes it feel like something that'll happen as part of library init... which shouldn't be done.

What do you mean it shouldn't be done? Why not?

I don't think this code gets called anywhere but for command line programs, but it's hard to say with the logging stuff

It gets called by all of our CLI tools, yes, and also by all of our daemons. Nearly everything that Pacemaker runs as a process -- agents are the only exceptions that come to mind -- will call this, often via crm_log_init(). Further, crm_log_init() has been our de facto library init function. I think there's an upstream task to improve that situation.

Second, it needs to happen pretty much first thing in any command line program, earlier than logging setup.

Right, this is another motivation to improve the library init organization. I don't have much else to say about that. I agree that it's not great.

One wrinkle here is that i18n setup calls pcmk__assert which could end up calling logging functions.

Yeah, it makes sense that we can't log an internationalized message if internationalization has failed, unfortunately.

@clumens
Copy link
Contributor

clumens commented Feb 19, 2026

Oh, one more question I had. Have you tried any of the blackbox stuff, either before or after these patches? I haven't, so I'm curious if it works and if it's useful.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 19, 2026

Oh, one more question I had. Have you tried any of the blackbox stuff, either before or after these patches? I haven't, so I'm curious if it works and if it's useful.

I have not.

I think I've used blackbox once or twice in my life (several years ago at this point), and it was more out of curiosity than anything. I don't remember gaining anything useful from it. Which leads me to wonder when or if it's ever useful in practice. We could be either missing out on something that would save us time diagnosing crashes, etc., or maintaining clutter for no reason haha.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 19, 2026

Which leads me to wonder when or if it's ever useful in practice. We could be either missing out on something that would save us time diagnosing crashes, etc., or maintaining clutter for no reason haha.

Based on its description, it seems mostly intended for diagnosing crashes. (I don't know when a SIGTRAP would ever be relevant.) I've mostly used gdb combined with simple log analysis for that so far, and it's served me pretty well. Crashes and assertion failures typically dump a core.

I want to be charitable and think of ways this could help us, but I'm struggling to come up with anything that we can't get by simply reading /var/log/pacemaker/pacemaker.log. The only thing that's coming to mind, is that it avoids the need to explicitly enable debugging and tracing for specific components/functions/files -- the blackbox, once enabled, keeps log messages of all severities.

# Enable blackbox logging globally (yes or no) or by subsystem. A blackbox
# contains a rolling buffer of all logs (of all severities). Blackboxes are
# stored under /var/lib/pacemaker/blackbox by default, and their contents can
# be viewed using the qb-blackbox(8) command.
#
# The blackbox recorder can be enabled at start using this variable, or at
# runtime by sending a Pacemaker subsystem daemon process a SIGUSR1 or SIGTRAP
# signal, and disabled by sending SIGUSR2 (see kill(1)). The blackbox will be
# written after a crash, assertion failure, or SIGTRAP signal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review: in progress PRs that are currently being reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants