This fixes a subtle race condition in the thread timeout expiration
handler z_thread_timeout(). There was a small window between when
sys_clock_announce() unlocked interrupts and when the handler re-locked
them, during which one or more higher priority interrupts (or threads
running on another CPU in an SMP environment) could abort the thread's
timeout.
The fix has two parts. Part one ensures that _sched_spinlock is held
in every location before a thread's timeout can be canceled. Of the
various locations, only z_unpend_thread() was found to need updating.
Part two updates the timeout handler z_thread_timeout() to bail early
if the thread's timeout has been found to be canceled (or re-used)
during that aforementioned window.
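Part two can be pictured with a small single-threaded model. This is
only a sketch under stated assumptions: struct sim_thread,
sim_abort_timeout() and sim_thread_timeout() are hypothetical stand-ins
rather than the kernel's code, and a plain flag plays the role of
_sched_spinlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a thread with a pending timeout. */
struct sim_thread {
	bool timeout_active; /* true while the timeout may still expire */
	bool woken;          /* set once the expiry handler readied the thread */
};

static bool sim_sched_locked; /* stands in for _sched_spinlock */

static void sim_lock(void)
{
	assert(!sim_sched_locked); /* the model lock is not recursive */
	sim_sched_locked = true;
}

static void sim_unlock(void)
{
	assert(sim_sched_locked);
	sim_sched_locked = false;
}

/* Canceling only happens with the lock held (part one of the fix). */
static void sim_abort_timeout(struct sim_thread *t)
{
	sim_lock();
	t->timeout_active = false;
	sim_unlock();
}

/*
 * Expiry handler: re-check under the lock and bail early if the timeout
 * was canceled during the unlocked window (part two of the fix).
 * Returns true if this call woke the thread.
 */
static bool sim_thread_timeout(struct sim_thread *t)
{
	bool woke = false;

	sim_lock();
	if (t->timeout_active) {
		t->timeout_active = false;
		t->woken = true;
		woke = true;
	}
	sim_unlock();

	return woke;
}
```

In this model, calling sim_abort_timeout() before sim_thread_timeout()
turns the handler into a no-op instead of waking a thread whose timeout
is no longer valid.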
Fixes #106653
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Extend z_spin_is_locked() to non-SMP configurations so assertions like
the one in z_unpend_all_locked() can validate lock ownership in UP
builds too. In UP a spinlock reduces to an IRQ lock, so the check
samples the current IRQ state via arch_irq_lock() / arch_irq_unlock().
Drop the now-unnecessary CONFIG_SMP guard around the sched spinlock
assertion in z_unpend_all_locked().
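A rough sketch of the UP check, assuming a simulated interrupt-enable
flag. sim_irq_lock(), sim_irq_unlock() and sim_spin_is_locked() are
hypothetical names that only mimic the shape of arch_irq_lock() /
arch_irq_unlock(); the real key encoding is architecture specific.

```c
#include <stdbool.h>

static bool sim_irqs_enabled = true; /* simulated CPU interrupt-enable flag */

/* Disable interrupts; the key records whether they were enabled before. */
static unsigned int sim_irq_lock(void)
{
	unsigned int key = sim_irqs_enabled ? 1U : 0U;

	sim_irqs_enabled = false;
	return key;
}

/* Restore the interrupt state captured in the key. */
static void sim_irq_unlock(unsigned int key)
{
	sim_irqs_enabled = (key != 0U);
}

/*
 * UP-style check: a "spinlock" counts as held if interrupts were already
 * disabled when the state was sampled. The sample is taken by locking
 * and immediately restoring, so the state is left unchanged.
 */
static bool sim_spin_is_locked(void)
{
	unsigned int key = sim_irq_lock();

	sim_irq_unlock(key);
	return key == 0U;
}
```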
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
IAR emits diagnostic Go004 ("function cannot be inlined") for
every ALWAYS_INLINE function when optimisation is disabled, e.g.
in debug builds. The previous workaround wrapped each affected
function in per-function preprocessor guard pairs:
#ifdef IAR_SUPPRESS_ALWAYS_INLINE_WARNING_FLAG
TOOLCHAIN_DISABLE_WARNING(TOOLCHAIN_WARNING_ALWAYS_INLINE)
#endif
static ALWAYS_INLINE void foo(...) { ... }
#ifdef IAR_SUPPRESS_ALWAYS_INLINE_WARNING_FLAG
TOOLCHAIN_ENABLE_WARNING(TOOLCHAIN_WARNING_ALWAYS_INLINE)
#endif
This pattern is highly intrusive, scatters toolchain-specific
knowledge across generic source files, and requires a guard pair
every time a new ALWAYS_INLINE function is added for IAR.
Replace it with a single override of ALWAYS_INLINE inside
iccarm.h, using the C99 _Pragma operator to embed the diagnostic
suppression in the macro itself.
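The idea can be sketched roughly as follows. SKETCH_ALWAYS_INLINE is a
hypothetical name, and the exact pragma strings in the IAR branch are an
assumption about how such an override could look, not a copy of
iccarm.h. The non-IAR branch exists only so the sketch builds with GCC
or Clang.

```c
#if defined(__ICCARM__)
/*
 * IAR: carry the diagnostic suppression and the forced-inline request
 * inside the macro, so no per-function guard pairs are needed.
 * (Pragma spelling assumed for illustration.)
 */
#define SKETCH_ALWAYS_INLINE \
	_Pragma("diag_suppress=Go004") _Pragma("inline=forced")
#else
/* GCC/Clang fallback so the example compiles elsewhere. */
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#endif

/* Call sites stay free of toolchain-specific guards. */
static SKETCH_ALWAYS_INLINE int add_one(int x)
{
	return x + 1;
}
```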
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Migrate scheduler API implementations (k_sched_lock/unlock,
z_reschedule, z_yield_current, etc.) and their private declarations
from ksched.h/sched.c into scheduler.c and scheduler.h.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Move time-slice related declarations from ksched.h into the
dedicated kernel/include/timeslicing.h header.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Migrate z_add_thread_to_ready_q(), z_remove_thread_from_ready_q(),
and related helpers from sched.c to scheduler.c.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Move z_sched_init to scheduler.c and simplify the implementation by
getting rid of the single-use init_ready_q.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Reorder so that z_ready_thread and z_unready_thread are adjacent,
improving code locality for related queue operations.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Move run-queue management functions (add/remove/peek thread,
choose_next_thread) from sched.c into the new
kernel/include/run_q.h header.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Move meta-IRQ (highest-priority cooperative queue) scheduling
functions from sched.c into a new kernel/include/metairq.h header
to reduce sched.c size and group related logic.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Reduce complexity of sched.c by encapsulating sleep handling code
(k_sleep, k_usleep, k_msleep) into its own file.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Relocate k_thread_start(), k_thread_abort(), k_thread_suspend(), and
k_thread_resume() from sched.c to thread.c alongside related thread
lifecycle code.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Add a runtime assertion in z_unpend_all_locked() to verify that
_sched_spinlock is actually held by the caller. This catches misuse
early given the function call depth involved.
Extend the availability of z_spin_is_locked() from CONFIG_SMP &&
CONFIG_TEST to also include CONFIG_ASSERT, so the check can be
used in __ASSERT() outside of test builds.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
When halt_thread() calls k_thread_perms_all_clear() under
_sched_spinlock, the permission cleanup can trigger k_free() on
dynamic objects. k_heap_free() then calls z_unpend_all() which
attempts to take _sched_spinlock again, causing a recursive lock.
Fix this by introducing k_heap_free_sched_locked() and
k_free_sched_locked() variants that use z_unpend_all_locked()
to operate on the wait queue without re-acquiring the scheduler
lock. The existing z_unpend_all() becomes a wrapper that takes
the lock and delegates to z_unpend_all_locked().
unref_check() gains a sched_locked parameter: the abort path
(clear_perms_cb) passes true to use the locked free variant,
while k_thread_perms_clear() passes false for the normal path.
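The locked/unlocked split can be sketched as below. All sim_* names are
hypothetical; a boolean with assertions stands in for _sched_spinlock so
that a recursive acquisition would trip an assert instead of
deadlocking.

```c
#include <assert.h>
#include <stdbool.h>

static bool sim_sched_locked; /* stands in for _sched_spinlock */
static int sim_waiters = 3;   /* threads pending on a hypothetical wait queue */

static void sim_lock(void)
{
	assert(!sim_sched_locked); /* a second acquisition is the recursive lock */
	sim_sched_locked = true;
}

static void sim_unlock(void)
{
	assert(sim_sched_locked);
	sim_sched_locked = false;
}

/* Caller must already hold the lock. Returns the number of threads woken. */
static int sim_unpend_all_locked(void)
{
	int woken = sim_waiters;

	assert(sim_sched_locked);
	sim_waiters = 0;
	return woken;
}

/* Wrapper for callers that do not hold the lock. */
static int sim_unpend_all(void)
{
	int woken;

	sim_lock();
	woken = sim_unpend_all_locked();
	sim_unlock();
	return woken;
}

/* Free path: the flag selects the variant, like the sched_locked parameter. */
static int sim_free(bool sched_locked)
{
	return sched_locked ? sim_unpend_all_locked() : sim_unpend_all();
}

/* Abort-style path: runs under the lock, so it must pass true. */
static int sim_halt_thread(void)
{
	int woken;

	sim_lock();
	woken = sim_free(true);
	sim_unlock();
	return woken;
}
```

Passing false from sim_halt_thread() would hit the assertion in
sim_lock(), which is the model's equivalent of the recursive lock
described above.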
Fixes #106659
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Move signal_pending_ipi() inside the K_SPINLOCK block in
z_get_next_switch_handle(). Calling it after the lock release creates a
window where a CPU can consume its own pending IPI bit via atomic_clear
in signal_pending_ipi(), then silently drop it in
arch_sched_directed_ipi() which skips the calling CPU (i == id).
In configurations where secondary CPUs have a single pinned thread and
take no timer or external interrupts, this can lead to a permanent hang:
the idle CPU can only be woken by IPIs, but no IPIs are pending and no
timeslicing IPIs will be generated since the idle thread is not sliceable.
This was reproduced when running under QEMU with the following sequence
of events observed:
CPU 0                               CPU 1
─────                               ─────
                                    Thread calls k_poll(K_MSEC(1))
                                      z_pend_curr():
                                        mark thread PENDING
                                        z_add_timeout(1ms)
                                      do_swap() to idle thread
                                    WFI
Timer tick fires
sys_clock_announce():
  slice_timeout(cpu1):
    flag_ipi(BIT(1))
  signal_pending_ipi():
    MSIP[cpu1] = 1
                                    CPU1 wakes from WFI
                                    z_get_next_switch_handle():
                                      acquire _sched_spinlock
                                      next_up() → idle
                                        (thread still PENDING,
                                         timeout hasn't fired yet)
                                      release _sched_spinlock
Timer tick fires
sys_clock_announce():
  z_thread_timeout(thread):
    z_unpend_thread(thread)
    z_ready_thread(thread):
      flag_ipi(BIT(1))
                                      signal_pending_ipi():
                                        atomic_clear(pending_ipi)
                                          returns BIT(1)
                                        arch_sched_directed_ipi(BIT(1))
                                          skips self, IPI silently lost
                                      return to idle thread
                                    WFI
                                    thread still on ready queue
Such an interleaving of events is, of course, likely only reproducible in
practice in virtualized environments where (v)CPUs can be descheduled.
With signal_pending_ipi() inside the lock, next_up() and the IPI
dispatch are atomic. Either the concurrent flag_ipi lands before the
lock is acquired (and next_up sees the thread), or it lands after the
lock is released (and the caller dispatches the IPI). There is no
window where a CPU can consume its own bit for a thread it hasn't seen.
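The lost IPI itself can be reproduced in a small sequential model. This
is only an illustration: the sim_* functions are hypothetical, a plain
variable replaces the atomic pending mask, and the interleaving is
replayed by hand rather than by real CPUs.

```c
#include <stdint.h>

#define SIM_BIT(n) (1U << (n))

static uint32_t sim_pending_ipi; /* models the pending IPI mask */

/* Another CPU requests an IPI for the CPUs in mask. */
static void sim_flag_ipi(uint32_t mask)
{
	sim_pending_ipi |= mask;
}

/* Models atomic_clear(): return the old mask and leave it empty. */
static uint32_t sim_consume_pending(void)
{
	uint32_t old = sim_pending_ipi;

	sim_pending_ipi = 0U;
	return old;
}

/*
 * Models a directed IPI that skips the calling CPU.
 * Returns the mask of CPUs that were actually interrupted.
 */
static uint32_t sim_directed_ipi(uint32_t mask, unsigned int self)
{
	return mask & ~SIM_BIT(self);
}

/* CPU self dispatches pending IPIs; returns the CPUs actually signaled. */
static uint32_t sim_signal_pending_ipi(unsigned int self)
{
	uint32_t mask = sim_consume_pending();

	return (mask != 0U) ? sim_directed_ipi(mask, self) : 0U;
}
```

If BIT(1) is flagged and CPU 1 then calls sim_signal_pending_ipi(1), the
bit is cleared but nobody is signaled, which mirrors the sequence in the
diagram. The same call made from CPU 0 delivers the IPI to CPU 1.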
Similar races exist in reschedule() and z_reschedule_irqlock() as well.
Although they won't cause the permanent hang described above, they can
result in unnecessary rescheduling latency. Fix reschedule(), and add a
TODO to z_reschedule_irqlock(), which does not currently take the sched
spinlock.
Signed-off-by: Andrew Bresticker <abrestic@meta.com>
Fix several incorrect uses of the Doxygen `@retval` and `@return` commands in
kernel sources.
- Convert @return to structured @retval where functions return
discrete values.
- Replace incorrect @retval usage with @return for non-discrete
return types.
Signed-off-by: Tharaka Jayasena <9dmpires2k17.tuj@gmail.com>
When CONFIG_TIMEOUT_64BIT is not set, k_ticks_t is uint32_t. The previous
code cast left_ticks through int32_t but then stored the result back in
k_ticks_t (uint32_t), losing the sign. The subsequent ticks > 0 check was
therefore an unsigned comparison, causing a past-due wakeup (where the
subtraction wraps to a large uint32_t) to be misread as a large positive
remainder and propagated up through k_sleep() as INT_MAX ms.
Fix by retaining the signed intermediate and comparing it directly as
int32_t so negative remainders (past-due) correctly fall through to
return 0.
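The difference between the two comparisons can be shown with plain
32-bit arithmetic. The function names are hypothetical and the tick
values are made up for illustration. The conversion of an out-of-range
value to int32_t is implementation-defined in C, and the sketch assumes
the usual two's-complement wrap.

```c
#include <stdint.h>

/* Old shape: the signed intermediate is stored back into an unsigned type. */
static uint32_t remaining_unsigned(uint32_t wakeup, uint32_t now)
{
	uint32_t ticks = (uint32_t)(int32_t)(wakeup - now); /* sign is lost here */

	return (ticks > 0U) ? ticks : 0U; /* unsigned compare: past-due looks huge */
}

/* Fixed shape: keep the signed value and compare it as int32_t. */
static uint32_t remaining_signed(uint32_t wakeup, uint32_t now)
{
	int32_t left = (int32_t)(wakeup - now);

	return (left > 0) ? (uint32_t)left : 0U; /* past-due falls through to 0 */
}
```

With a wakeup at tick 100 and the current tick at 105, the first variant
reports 4294967291 remaining ticks while the second reports 0. Both
agree when the wakeup is still in the future.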
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
This function was a little clumsy, taking the scheduler lock,
releasing it, and then calling z_reschedule_unlocked() instead of the
normal locked variant of reschedule. Don't take the lock twice.
Mostly this is a code size and hygiene win. Obviously the sched lock
is not normally a performance path, but I happened to have picked this
API for my own microbenchmark in tests/benchmarks/swap and so noticed
the double-lock while staring at disassembly.
Signed-off-by: Andy Ross <andyross@google.com>
z_reschedule() is the basic kernel entry point for context switch,
wrapping z_swap(), and thence arch_switch(). It's currently defined
as a first class function for entry from other files in the kernel and
elsewhere (e.g. IPC library code).
But in practice it's actually a very thin wrapper without a lot of
logic of its own, and the context switch layers of some of the more
obnoxiously clever architectures are designed to interoperate with the
compiler's own spill/fill logic to avoid double saving. And with a
small z_reschedule() there's not a lot to work with.
Make reschedule() an inlinable static, so the compiler has more
options.
Signed-off-by: Andy Ross <andyross@google.com>
Pick some low hanging fruit on non-SMP code paths:
+ The scheduler spinlock is always taken, but as we're already in an
irqlocked state that's a noop. But the optimizer can't tell, because
arch_irq_lock() involves an asm block it can't see inside. Elide
the call when possible.
+ The z_swap_next_thread() function evaluates to just a single load of
_kernel.ready_q.cache when !SMP, but wasn't being inlined because of
function location. Move that test up into do_swap() so it's always
done correctly.
Signed-off-by: Andy Ross <andyross@google.com>
When CONFIG_WAITQ_SCALABLE=y, wake up all threads from a post-waitq-walk
callback which is invoked while the scheduler spinlock is still held. This
solves the race condition that was worked around via the `no_wake_in_timeout`
flag in k_thread and the `is_timeout` parameter of z_sched_wake_thread_locked(),
both of which can now be dropped.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
Modify z_sched_waitq_walk() to accept an optional callback invoked after
the walk while still holding the scheduler spinlock. This can be used to
perform post-walk operations "atomically". Update all callers to work with
this new function signature.
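A minimal sketch of a walk with an optional post-walk callback follows.
The typedefs, the array-based wait queue and every sim_* name are
assumptions made for illustration and do not reflect the actual
signature of z_sched_waitq_walk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; a nonzero return stops the walk. */
typedef int (*sim_walk_cb_t)(int thread_id, void *data);
typedef void (*sim_post_walk_cb_t)(int status, void *data);

static bool sim_sched_locked; /* stands in for the scheduler spinlock */

static int sim_waitq_walk(const int *waiters, size_t count,
			  sim_walk_cb_t walk_cb, sim_post_walk_cb_t post_cb,
			  void *data)
{
	int status = 0;

	assert(!sim_sched_locked);
	sim_sched_locked = true;

	for (size_t i = 0; (i < count) && (status == 0); i++) {
		status = walk_cb(waiters[i], data);
	}

	/* Optional: runs after the walk, before the lock is released. */
	if (post_cb != NULL) {
		post_cb(status, data);
	}

	sim_sched_locked = false;
	return status;
}

struct sim_walk_state {
	int visited;          /* number of waiters seen by the walk callback */
	bool post_ran_locked; /* true if the post callback saw the lock held */
};

static int sim_count_cb(int thread_id, void *data)
{
	struct sim_walk_state *state = data;

	(void)thread_id;
	state->visited++;
	return 0;
}

static void sim_post_cb(int status, void *data)
{
	struct sim_walk_state *state = data;

	(void)status;
	state->post_ran_locked = sim_sched_locked;
}
```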
While at it, create dedicated (private) typedefs for the callbacks and
clean up/improve the routine and callbacks' documentation.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
z_sched_waitq_walk() used _WAIT_Q_FOR_EACH, a wrapper around the
"unsafe" SYS_DLIST_FOR_EACH_CONTAINER which does not allow detaching
elements from the list during the walk. As a result, attempting to
detach threads from the wait queue as part of the callback provided
to z_sched_waitq_walk() would result in breakage.
Introduce new _WAIT_Q_FOR_EACH_SAFE macro as wrapper around the "safe"
SYS_DLIST_FOR_EACH_CONTAINER_SAFE which allows detaching nodes from
the list during the walk, and use it inside z_sched_waitq_walk().
While at it:
- add documentation on the _WAIT_Q_FOR_EACH macro, including a warning
about detaching elements as part of the loop not being allowed
- add note to documentation of z_sched_waitq_walk() indicating that
the callback can safely remove the thread from wait queue as this
will no longer break the FOR_EACH loop
- add _WAIT_Q_FOR_EACH_SAFE to the list of ForEachMacros in .clang-format
NOTE: this new "safe removal inside callback" behavior is only available
when CONFIG_WAITQ_SCALABLE=n. When the option is 'y', red-black trees are
used instead of doubly-linked lists, which prevents mutation of the tree
while it is being walked. This limitation is explicitly documented.
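Why the "safe" iteration tolerates removal can be seen in a standalone
doubly-linked list. This is a sketch with hypothetical sim_* names, not
the sys_dlist implementation: the loop caches the next node before the
body runs, so the body may detach the current one.

```c
#include <stddef.h>

struct sim_node {
	struct sim_node *prev;
	struct sim_node *next;
	int id;
};

/* Circular list with a sentinel head node. */
static void sim_list_init(struct sim_node *head)
{
	head->prev = head;
	head->next = head;
}

static void sim_list_append(struct sim_node *head, struct sim_node *node)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void sim_list_remove(struct sim_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = NULL; /* poison: an unsafe loop would now follow NULL */
	node->next = NULL;
}

/* The successor is saved before the body runs, so cur may be detached. */
#define SIM_FOR_EACH_SAFE(head, cur, nxt)                          \
	for ((cur) = (head)->next, (nxt) = (cur)->next;            \
	     (cur) != (head);                                      \
	     (cur) = (nxt), (nxt) = (cur)->next)

/* Detach every node with an even id during the walk; return nodes visited. */
static int sim_remove_even(struct sim_node *head)
{
	struct sim_node *cur;
	struct sim_node *nxt;
	int visited = 0;

	SIM_FOR_EACH_SAFE(head, cur, nxt) {
		visited++;
		if ((cur->id % 2) == 0) {
			sim_list_remove(cur);
		}
	}
	return visited;
}

static int sim_list_len(const struct sim_node *head)
{
	int len = 0;

	for (const struct sim_node *n = head->next; n != head; n = n->next) {
		len++;
	}
	return len;
}
```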
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
Don't acquire the _sched_spinlock in z_sched_wake_thread(). This allows
calling the function from callbacks which already own the spinlock. The
function is renamed to z_sched_wake_thread_locked() to reflect this new
behavior, and all existing callers are updated to ensure they hold the
_sched_spinlock as is now required.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
`k_yield()` can't be called when interrupts are disabled; update
`k_can_yield()` to reflect that.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
As per Zephyr coding guideline #59, "operands shall not be of an
inappropriate essential type". This makes sure boolean variables are
initialized with true/false, not 1/0.
Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
Instead of performing a linear search to determine if a given
thread is running on another CPU, or if it is marked as being
preempted by a metaIRQ on any CPU, do this in O(1) time.
On SMP systems, Zephyr already tracks the CPU on which a thread
executes (or last executed). This information is leveraged to
do the search in O(1) time.
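The two lookups can be compared in a simplified model. The structures
and names below are hypothetical and much smaller than the kernel's
per-CPU and thread structures, and the model only asks whether a thread
is current on some CPU.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_NUM_CPUS 4

struct sim_thread {
	unsigned int cpu; /* CPU the thread runs on, or last ran on */
};

struct sim_cpu {
	struct sim_thread *current; /* thread currently running on this CPU */
};

static struct sim_cpu sim_cpus[SIM_NUM_CPUS];

/* Linear search: inspect every CPU. */
static bool sim_is_running_linear(const struct sim_thread *thread)
{
	for (size_t i = 0; i < SIM_NUM_CPUS; i++) {
		if (sim_cpus[i].current == thread) {
			return true;
		}
	}
	return false;
}

/* O(1): only the CPU recorded in the thread can be running it. */
static bool sim_is_running_o1(const struct sim_thread *thread)
{
	return sim_cpus[thread->cpu].current == thread;
}
```

Both functions agree as long as the recorded CPU index is kept up to
date whenever a thread is switched in, which is the property the commit
relies on.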
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The IAR compiler may emit Error[Go004]: Could not inline function
when handling functions marked as always_inline or inline=forced,
especially in complex kernel code.
Signed-off-by: Thinh Le Cong <thinh.le.xr@bp.renesas.com>
Re-instate a z_is_thread_ready() check on the preempted metaIRQ
thread before selecting it as the preferred next thread to
schedule. This code exists because of a corner case where the
thread that was recorded as being preempted by a meta-IRQ thread
can be marked as not 'ready to run' when the meta-IRQ thread(s)
complete.
Such a scenario may occur if an interrupt ...
1. suspends the interrupted thread, then
2. readies a meta-IRQ thread, then
3. exits
The resulting reschedule can result in the suspended interrupted
thread being recorded as being interrupted by a meta-IRQ thread.
There may be other scenarios too.
Fixes #101296
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
If the thread being aborted or suspended was preempted by a metaIRQ
thread then clear the metairq_preempted record. In the case of
aborting a thread, this prevents a re-used thread from being
mistaken for a preempted thread. Furthermore, it removes the need
to test the recorded thread for readiness in next_up().
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
When a cooperative thread (temporary or otherwise) is preempted by a
metaIRQ thread on SMP, it is no longer re-inserted into the readyQ.
This prevents it from being scheduled by another CPU while the
preempting metaIRQ thread runs.
Fixes #95081
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Adjust the bounds for tracking metairq preemption to include the
case where the number of metairq threads matches the number of
cooperative threads. This is needed as a thread that is schedule
locked through k_sched_lock() is documented to be treated as a
cooperative thread. This implies that if such a thread is preempted
by a metairq thread that execution control must return to that
thread after the metairq thread finishes its work.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
arch_mem_coherent() is cache related so it is better to move it
under cache subsys. It is renamed to sys_cache_is_mem_coherent()
to reflect this change.
The only user of arch_mem_coherent() is Xtensa. However, it is
not an architecture feature. That's why it is moved to the cache
subsys.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
It is now more obvious that the move_current_to_end_of_prio_q() logic
is supposed to match that of k_yield() (without the schedule point).
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
All instances of the internal routine move_thread_to_end_of_prio_q()
use the current thread. Rename it to move_current_to_end_of_prio_q()
to reflect that.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The routine z_move_thread_to_end_of_prio_q() has been renamed to
z_yield_testing_only() as it was only used by test code
and always operated on the current thread.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Commit d4d51dc062 ("kernel: Replace redundant switch_handle assignment
with assertion") introduced an assertion check that may be triggered
as follows by tests/kernel/smp_abort:
CPU0 CPU1 CPU2
---- ---- ----
* [thread A] * [thread B] * [thread C]
* irq_offload() * irq_offload() * irq_offload()
* k_thread_abort(thread B)
* k_thread_abort(thread C)
* k_thread_abort(thread A)
* thread_halt_spin()
* z_is_thread_halting(_current) is false
* while (z_is_thread_halting(thread B));
* thread_halt_spin()
* z_is_thread_halting(_current) is true
* halt_thread(_current...);
* z_dummy_thread_init()
- dummy_thread->switch_handle = NULL;
- _current = dummy_thread;
* while (z_is_thread_halting(thread C));
* z_get_next_switch_handle()
* z_arm64_context_switch()
* [thread A is dead]
* thread_halt_spin()
* z_is_thread_halting(_current) is true
* halt_thread(_current...);
* z_dummy_thread_init()
- dummy_thread->switch_handle = NULL;
- _current = dummy_thread;
* while(z_is_thread_halting(thread A));
* z_get_next_switch_handle()
- old_thread == dummy_thread
- __ASSERT(old_thread->switch_handle == NULL) OK
* z_arm64_context_switch()
- str x1, [x1, #___thread_t_switch_handle_OFFSET]
* [thread B is dead]
* %%% dummy_thread->switch_handle no longer NULL %%%
* z_get_next_switch_handle()
- old_thread == dummy_thread
- __ASSERT(old_thread->
switch_handle == NULL) FAIL
This needs at least 3 CPUs and the perfect timing for the race to work as
sometimes CPUs 1 and 2 may be close enough in their execution paths for
the assertion to pass. For example, QEMU is OK while FVP is not.
Also adding sufficient debug traces can make the issue go away.
This happens because the dummy thread is shared among concurrent CPUs.
It could be argued that a per-CPU dummy thread structure would be the
proper solution to this problem. However the purpose of a dummy thread
structure is to provide a dumping ground for the scheduler code to work
while the original thread structure might already be reused and
therefore can't be clobbered as demonstrated above. But the dummy
structure _can_ be clobbered to some extent and it is not worth the
additional memory footprint implied by per-CPU instances. We just have
to ignore some validity tests when the dummy thread is concerned.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
When a thread halts and dummifies, set its switch_handle to (void *)1
instead of the thread pointer itself. This maintains the non-NULL value
required to prevent deadlock in k_thread_join() while making it obvious
that this value is not meant to be dereferenced or used.
The switch_handle should be an opaque architecture-specific value and
not be assumed to be a thread pointer in generic code. Using 1 makes
the intent clearer.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
The switch_handle for the outgoing thread is expected to be NULL
at the start of a context switch.
The previous code performed a redundant assignment to NULL.
This change replaces the assignment with an __ASSERT(). This makes the
code more robust by explicitly enforcing this precondition, helping to
catch potential scheduler bugs earlier.
Also, the switch_handle pointer is used to check a thread's state during a
context switch. For dummy threads, this pointer was left uninitialized,
potentially holding an unexpected value.
Set the handle to NULL during initialization to ensure these threads are
handled safely and predictably.
Signed-off-by: TaiJu Wu <tjwu1217@gmail.com>
1. There is debug info within k_sched_unlock, so we should add the
same debug info to k_sched_lock.
2. A thread in the run queue should be a normal or metairq thread, so
check that it is not a dummy thread.
Signed-off-by: TaiJu Wu <tjwu1217@gmail.com>
Replace all in-function instances of MIN/MAX/CLAMP with the single
evaluation version min/max/clamp.
There are probably no race conditions in these files, but the single
evaluation ones save a couple of instructions each so they should save
a few code bytes and potentially perform better, so they should be
preferred in general.
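The evaluation count difference is easy to demonstrate. In this sketch
SIM_MIN_MACRO and sim_min() are hypothetical stand-ins; an inline
function is used for the single-evaluation form as an assumption to
keep the example in standard C, which is not necessarily how the
lowercase helpers are implemented.

```c
static int sim_calls; /* counts how often the argument expression runs */

static int sim_next_value(void)
{
	sim_calls++;
	return sim_calls;
}

/* Classic macro: the smaller argument is evaluated a second time. */
#define SIM_MIN_MACRO(a, b) (((a) < (b)) ? (a) : (b))

/* Single-evaluation form: arguments are evaluated once at the call. */
static inline int sim_min(int a, int b)
{
	return (a < b) ? a : b;
}

/* Returns how many times the argument ran with the macro. */
static int sim_calls_with_macro(void)
{
	sim_calls = 0;
	(void)SIM_MIN_MACRO(sim_next_value(), 10);
	return sim_calls;
}

/* Returns how many times the argument ran with the function. */
static int sim_calls_with_function(void)
{
	sim_calls = 0;
	(void)sim_min(sim_next_value(), 10);
	return sim_calls;
}
```

Here the macro runs sim_next_value() twice (once for the comparison and
once to produce the result), while the function form runs it once.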
Signed-off-by: Fabio Baltieri <fabiobaltieri@google.com>
This commit replaces negative thread state checks with a new,
more descriptive positive check.
The expression `!z_is_thread_prevented_from_running()`
is updated to `z_is_thread_ready()` where appropriate, making
the code's intent clearer.
It also removes a redundant `IS_ENABLED(CONFIG_SMP)` check from code
that is already enclosed in an #ifdef.
Finally, this patch adds the missing `#endif` directive.
Signed-off-by: TaiJu Wu <tjwu1217@gmail.com>
This patch moves `is_aborting()` and `is_halting()`
from `kernel/sched.c` to `kernel/include/kthread.h`
and renames them to `z_is_thread_aborting()` and `z_is_thread_halting()`,
for consistency with other internal kernel APIs.
It replaces the previous inline function definitions in `sched.c`
with calls to the new header functions. Additionally, direct bitwise
checks like `(thread->base.thread_state & _THREAD_DEAD) != 0U`
are updated to use the new `z_is_thread_dead()` helper function.
This enhances code readability and maintainability.
Signed-off-by: TaiJu Wu <tjwu1217@gmail.com>
k_thread_absolute_deadline_set is similar to the existing
k_thread_deadline_set. The difference is that k_thread_deadline_set
takes a deadline as a time delta from the current time, while
k_thread_absolute_deadline_set expects a timestamp
in the same units used by k_cycle_get_32().
This allows deadlines for several threads to be calculated and
set in a deterministic way, using a common timestamp as
a "now" time base.
Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>