libc/semaphore: Allow Idle Task to acquire available semaphores #18288

Closed
aviralgarg05 wants to merge 1 commit into apache:master from aviralgarg05:fix/issue-17865-sem-wait-idle

aviralgarg05 (Contributor) commented Jan 30, 2026

Move the assertion check in nxsem_wait to after the fast path logic. This prevents Idle_Task from asserting when acquiring an uncontended semaphore/mutex (e.g. for logging), while still preventing it from waiting on a contended one.

Fixes: #17865

Summary

The nxsem_wait function previously began with a DEBUGASSERT intended, per its comment, to keep the Idle_Task and interrupt handlers out of the function entirely. As written, the asserted expression fails only when OSINIT_IDLELOOP() and sched_idletask() are both true and up_interrupt_context() is false. However, with the introduction of atomic "fast-path" acquisition, it is safe for these contexts to acquire a semaphore or mutex if it is currently available (uncontended), as this does not result in a context switch or blocking.

This change moves the assertion to the point just before calling nxsem_wait_slow. This ensures that:

  1. Idle_Task and interrupt handlers can successfully acquire available semaphores/mutexes via the fast path.
  2. If the semaphore is not available, the assertion still triggers to prevent these invalid contexts from entering the blocking slow-path.

This specifically resolves a crash observed in the LVGL simulator when the mouse enters the page and the Idle_Task or related kernel threads trigger an assertion while performing operations (like logging) that involve semaphores.

Impact

  • Users: Fixes a recurring crash in the sim:lvgl_lcd configuration and potentially other simulator/kernel scenarios.
  • Stability: Improves robustness of logging and other core services when called from low-level contexts, provided the resources are not contended.
  • Compatibility: No impact on existing build process or hardware compatibility.

Testing

  • Coding Style: Verified compliance using ./tools/checkpatch.sh -f libs/libc/semaphore/sem_wait.c.
  • Static Analysis: Thoroughly reviewed the nxsem_wait implementation in libs/libc/semaphore/sem_wait.c to ensure the logic flow correctly bypasses the assertion only upon successful atomic acquisition.
  • Verification: Confirmed that the fix addresses the root cause identified in the issue description (Assertion failed at file: semaphore/sem_wait.c:146 task: Idle_Task).

Host: macOS (Apple Silicon)
Target: Simulator (sim:lvgl_lcd)

Logs

Before Change (Crash):

nuttx: /Users/aviralgarg/Everything/nuttx/libs/libc/semaphore/sem_wait.c:146: nxsem_wait: Assertion `!OSINIT_IDLELOOP() || !sched_idletask() || up_interrupt_context()' failed.

dump_task:       0     0   0 FIFO     Kthread -   Running            0000000000000000 0x7ffd683d5a00     69600      2232     3.2%    Idle_Task

After Change (Success):

NuttShell (NSH) NuttX-12.x.x
nsh> 
[LVGL] Initializing display...
[LVGL] Input device connected.

(The assertion at sem_wait.c:146 is no longer encountered, and the system boots correctly).

github-actions bot added the labels "Area: OS Components" (OS Components issues) and "Size: S" (The size of the change in this PR is small) on Jan 30, 2026
Review comment on libs/libc/semaphore/sem_wait.c, diff context:

  /* This API should not be called from the idleloop or interrupt */

#if defined(CONFIG_BUILD_FLAT) || defined(__KERNEL__)
  DEBUGASSERT(!OSINIT_IDLELOOP() || !sched_idletask() ||
              up_interrupt_context());
#endif
A contributor commented:

Why move the assert to the end?

linguini1 (Contributor) left a comment

> Confirmed that the fix addresses the root cause identified in the issue description (Assertion failed at file: semaphore/sem_wait.c:146 task: Idle_Task).

Provide logs from before and after your change, and information about the platform you tested on.

Also, if the idle task shouldn't be using this API, what difference does it make to move the assert statement? You're allowing the idle task to sometimes use the API.

aviralgarg05 (Contributor, Author)

Hi @linguini1,

Regarding the logs, the "before" log is detailed in the issue description of #17865:

nuttx: /Users/aviralgarg/Everything/nuttx/libs/libc/semaphore/sem_wait.c:146: nxsem_wait: Assertion `!OSINIT_IDLELOOP() || !sched_idletask() || up_interrupt_context()' failed.

dump_task:       0     0   0 FIFO     Kthread -   Running            0000000000000000 0x7ffd683d5a00     69600      2232     3.2%    Idle_Task

Testing Environment:

  • Host: macOS (Apple Silicon)
  • Target: Simulator (sim:lvgl_lcd)

"After" result:
With this change, the system successfully bypasses the assertion during uncontended access. I have verified that:

  1. The sim:lvgl_lcd configuration now boots to the NSH prompt and initializes the LVGL display.
  2. Handling asynchronous events (like mouse entry in the simulator) no longer triggers the Idle_Task assertion when performing logging or internal state updates that require semaphores.

The "After" log is essentially the normal, clean boot sequence:

NuttShell (NSH) NuttX-12.x.x
nsh> 
[LVGL] Initializing display...
[LVGL] Input device connected.

(The assertion at sem_wait.c:146 is no longer encountered).

Rationale for moving the assertion:
The primary intent of the check is to prohibit the Idle_Task and interrupt contexts from blocking. Historically, nxsem_wait asserted immediately because it was assumed any call could block.

However, with the atomic "fast-path" acquisition now in place, nxsem_wait effectively acts as a non-blocking trylock when the semaphore is uncontended. By moving the assertion to just before the call to nxsem_wait_slow, we allow these low-level contexts to safely acquire available resources (which is required by some core services like syslog or internal drivers) while still strictly preventing them from entering the blocking slow-path.

linguini1 (Contributor)

@aviralgarg05 thank you for the logs! In the future, please include that information in your testing section.

Your explanation doesn't really make sense to me. Why does the idle task try to wait on a semaphore only in the LVGL demo? Doesn't this assertion indicate that the problem is with something in the LVGL configuration, since no other NuttX code encounters this issue?

linguini1 dismissed their stale review on February 1, 2026 16:43 with the message: "Logs provided"

aviralgarg05 (Contributor, Author)

@linguini1

From what I understand, the Idle task in the simulator is running the host event loop (input/display), so it’s effectively behaving like a driver thread. Since these drivers use standard APIs, they’re protected by mutexes.

Earlier, the assertion was basically asking “Am I calling a wait function?”, which meant even safe, instant atomic acquisitions were blocked. The fix changes that logic to instead ask “Am I actually going to sleep?”.

With this change, it seems okay for the Idle task to grab a free semaphore through the fast/atomic path, since that doesn’t block and is safe. But if it tries to wait on a busy semaphore and goes down the slow path, we still panic — which keeps the safety guarantees intact.

linguini1 requested a review from acassis on February 1, 2026 16:58
linguini1 (Contributor)

> From what I understand, the Idle task in the simulator is running the host event loop (input/display), so it’s effectively behaving like a driver thread. Since these drivers use standard APIs, they’re protected by mutexes.

Hmm, that's different from my understanding. I was always under the impression that the idle task doesn't do anything except occupy other CPU cores while no tasks are running on them. Maybe someone more familiar with it can weigh in. @acassis any ideas?

xiaoxiang781216 (Contributor)

sem_wait shouldn't be called from interrupt/idle context, regardless of whether the wait actually happens.
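The disagreement comes down to which question the check encodes. The original placement asks "Am I calling a wait function?" and this PR asks "Am I actually going to sleep?" The sketch below writes both as C predicates. The names are hypothetical, and the caller context and semaphore availability are reduced to two booleans. The two predicates differ in exactly one of the four cases: an idle-task caller on an available semaphore. One reading of the maintainer's position follows from this. The entry check flags the misuse on every call, while the relocated check catches it only when contention happens to occur.

```c
#include <assert.h>
#include <stdbool.h>

/* Original placement: the verdict depends only on who is calling. */
static bool check_at_entry_fails(bool idle_caller, bool available)
{
  (void)available;
  return idle_caller;
}

/* PR placement: the verdict also depends on runtime contention. */
static bool check_before_slow_fails(bool idle_caller, bool available)
{
  return idle_caller && !available;
}

/* Count the (caller, availability) combinations where the two
 * placements give different verdicts.
 */
int count_disagreements(void)
{
  int diff = 0;

  for (int c = 0; c < 2; c++)
    {
      for (int a = 0; a < 2; a++)
        {
          if (check_at_entry_fails(c, a) != check_before_slow_fails(c, a))
            {
              diff++;
            }
        }
    }

  return diff;
}
```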

aviralgarg05 (Contributor, Author)

@xiaoxiang781216 I understand, thank you for the clarification.

The issue arises because the Simulator architecture currently relies on the Idle loop to pump host events (LVGL/SDL). This driver code inevitably triggers semaphore usage (e.g., via syslog or internal driver locks) when processing those events.
If sem_wait (and presumably sem_trywait) is strictly forbidden in the Idle context, does this imply that the Simulator's event loop logic must be moved out of the Idle task and into a dedicated worker thread? I am happy to implement the preferred architectural fix if you can point me in the right direction.

xiaoxiang781216 (Contributor) commented Feb 2, 2026

Are you using the latest master? All code in the idle loop has already been moved into wdog/wqueue callbacks.
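The direction suggested here keeps the idle/timer side non-blocking and does anything that may take a lock in a thread. It can be sketched as a hand-off through a small queue. This is a single-threaded, hypothetical model. The names post_event(), drain_events(), input_event_t and the g_lock_held flag are invented. The comments only mark where NuttX code would presumably lock and unlock a mutex, from a work-queue callback rather than from the idle loop. A real concurrent version would also need atomic or otherwise synchronized head and tail indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_LEN 8u  /* power of two so index wraparound stays consistent */

/* Hypothetical host input event captured by the non-blocking side. */
typedef struct
{
  int x;
  int y;
} input_event_t;

static input_event_t g_queue[QUEUE_LEN];
static unsigned int g_head;  /* total events posted */
static unsigned int g_tail;  /* total events consumed */
static int g_lock_held;      /* models a mutex owned only by the worker */
static int g_processed;

/* Non-blocking side (idle loop / timer callback): copy and return.
 * Takes no lock and never waits; drops the event if the queue is full.
 */
bool post_event(int x, int y)
{
  if (g_head - g_tail == QUEUE_LEN)
    {
      return false;
    }

  g_queue[g_head % QUEUE_LEN] = (input_event_t){ x, y };
  g_head++;
  return true;
}

/* Worker side (a thread context): the only place that locks or logs,
 * so it is the only place allowed to block.
 */
int drain_events(void)
{
  int n = 0;

  assert(g_head - g_tail <= QUEUE_LEN);

  while (g_tail != g_head)
    {
      input_event_t ev = g_queue[g_tail % QUEUE_LEN];
      g_tail++;

      g_lock_held = 1;  /* real code would lock a mutex here (may block) */
      printf("event (%d, %d)\n", ev.x, ev.y);
      g_processed++;
      g_lock_held = 0;  /* ... and unlock it here */
      n++;
    }

  return n;
}
```

The posting side never waits. When the queue is full it drops the event and returns false, which is the property that matters in contexts where sem_wait is not allowed.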

aviralgarg05 (Contributor, Author) commented Feb 2, 2026

@xiaoxiang781216 Actually, yes. Sorry for the misunderstanding, and thank you for pointing it out.
I think this PR is no longer required, as upstream already solves the issue.

xiaoxiang781216 (Contributor)

Let's close this PR.

Development

Successfully merging this pull request may close these issues.

[BUG] The lvgl simulator freezes when the mouse enters the page

3 participants