Commit graph

3,340 commits

Author SHA1 Message Date
Anas Nashif
243012c33c kernel: move thread_entry from lib/os to kernel
Not really library code; this is a core component that is part of the core
os/kernel.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-14 22:31:16 -04:00
Anas Nashif
c60e0e9436 kernel: move userspace sem into kernel/sys
This is a kernel primitive for use with userspace, so move it under
kernel.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-14 22:31:16 -04:00
Anas Nashif
b572cb23fc kernel: userspace: move mutex/user_work to userspace
Move userspace code out of lib/os into userspace folder under kernel.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-14 22:31:16 -04:00
Anas Nashif
85ca9bb992 kernel: move smp code into smp/
Isolate SMP code into own folder.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-14 22:31:16 -04:00
Anas Nashif
974dbbf2c0 kernel: move userspace kconfigs into own file
Move userspace Kconfig under userspace/.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-14 22:31:16 -04:00
Anas Nashif
d8a1960c8b kernel: reorg mem domain kconfig
Reorganize memory domain Kconfig and move it under userspace/.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-14 22:31:16 -04:00
Anas Nashif
eb294b7a1e kernel: move userspace code to own folder
Isolate userspace code into userspace/.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-14 22:31:16 -04:00
Anas Nashif
07fa9eabfe kernel: fix name of scheduler/wait queue: Dumb -> Simple
Rename leftover in kernel headers: Dumb -> Simple.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-13 11:09:25 -05:00
Peter Mitsis
083629e520 kernel: timer: Fix k_timer re-use in its handler
This fixes a subtle race-condition in the k_timer expiration
handler z_timer_expiration_handler(). There was a small window
of opportunity between when sys_clock_announce() unlocked
interrupts and that handler re-locked them that one or more
higher priority interrupts (or threads running on another CPU
if in an SMP environment) could not only abort the ktimer's
timeout, but restart it as well. Both of these situations are
now detectable in the handler (resulting in an immediate return
from the handler).

To make this work, every case where the ktimer internals either
adds or aborts its timeout is now encapsulated by the ktimer lock.
Thus, when the handler tests if the timeout handler has been
canceled with only the ktimer lock being held, we know that no
other thread or ISR can be modifying the ktimer's timeout.

Fixes #106654

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-04-11 10:17:20 -04:00
Nicolas Pitre
32b1399669 kernel/timeout: introduce sys_clock_lock() and sys_clock_announce_locked()
On SMP systems with tickless kernels, a race condition exists between
timer driver ISRs and the kernel's tick accounting. The driver updates
its hardware cycle baseline under a private lock, then calls
sys_clock_announce() which updates curr_tick under the separate
timeout_lock. In the gap between these two lock releases, any kernel
code calling sys_clock_elapsed() sees the new driver baseline but the
old curr_tick, producing inconsistent time values that can go backwards.

This affects every code path using the internal elapsed() helper:
uptime queries, timeout scheduling, timeout cancellation, remaining
time queries, and next-expiry calculations.

The root cause is two separate locks protecting state that must be
mutually consistent. Fix this by exposing the kernel's timeout_lock
to timer drivers via sys_clock_lock()/sys_clock_unlock(), and
providing sys_clock_announce_locked() which assumes the lock is
already held.

Timer drivers can now acquire the single lock, update their hardware
state, and announce ticks all under the same lock — eliminating the
race window entirely. The key is passed to sys_clock_announce_locked()
which consumes it (releasing the lock when it returns).

The existing sys_clock_announce() becomes a backward-compatible wrapper,
allowing incremental driver migration with no flag day.

Document that sys_clock_set_timeout(), sys_clock_elapsed(), and
sys_clock_idle_exit() are called by the kernel with the timer lock
held. Update the timer driver guide in clocks.rst accordingly.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-04-07 11:40:49 -05:00
Nicolas Pitre
184b5a3804 kernel: assert scheduler lock is held in z_unpend_all_locked()
Add a runtime assertion in z_unpend_all_locked() to verify that
_sched_spinlock is actually held by the caller. This catches misuse
early given the function call depth involved.

Extend the availability of z_spin_is_locked() from CONFIG_SMP &&
CONFIG_TEST to also include CONFIG_ASSERT, so the check can be
used in __ASSERT() outside of test builds.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-04-07 08:40:28 -05:00
Nicolas Pitre
9cef0da05c kernel: avoid recursive scheduler lock in k_heap_free path
When halt_thread() calls k_thread_perms_all_clear() under
_sched_spinlock, the permission cleanup can trigger k_free() on
dynamic objects. k_heap_free() then calls z_unpend_all() which
attempts to take _sched_spinlock again, causing a recursive lock.

Fix this by introducing k_heap_free_sched_locked() and
k_free_sched_locked() variants that use z_unpend_all_locked()
to operate on the wait queue without re-acquiring the scheduler
lock. The existing z_unpend_all() becomes a wrapper that takes
the lock and delegates to z_unpend_all_locked().

unref_check() gains a sched_locked parameter: the abort path
(clear_perms_cb) passes true to use the locked free variant,
while k_thread_perms_clear() passes false for the normal path.

Fixes #106659

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-04-07 08:40:28 -05:00
Andrew Bresticker
1666066082 kernel/sched: fix race in consuming self-directed IPIs
Move signal_pending_ipi() inside the K_SPINLOCK block in
z_get_next_switch_handle(). Calling it after the lock release creates a
window where a CPU can consume its own pending IPI bit via atomic_clear
in signal_pending_ipi(), then silently drop it in
arch_sched_directed_ipi() which skips the calling CPU (i == id).

In configurations where secondary CPUs have a single pinned thread and
take no timer or external interrupts, this can lead to a permanent hang:
the idle CPU can only be woken by IPIs, but no IPIs are pending and no
timeslicing IPIs will be generated since the idle thread is not sliceable.
This was reproduced when running under QEMU with the following sequence
of events observed:

  CPU 0                                  CPU 1
  ─────                                  ─────

                                         Thread calls k_poll(K_MSEC(1))
                                           z_pend_curr():
                                             mark thread PENDING
                                             z_add_timeout(1ms)
                                             do_swap() to idle thread
                                         WFI

  Timer tick fires
  sys_clock_announce():
    slice_timeout(cpu1):
      flag_ipi(BIT(1))
    signal_pending_ipi():
      MSIP[cpu1] = 1

                                         CPU1 wakes from WFI
                                         z_get_next_switch_handle():
                                           acquire _sched_spinlock
                                           next_up() → idle
                                             (thread still PENDING,
                                              timeout hasn't fired yet)
                                           release _sched_spinlock

  Timer tick fires
  sys_clock_announce():
    z_thread_timeout(thread):
      z_unpend_thread(thread)
      z_ready_thread(thread):
        flag_ipi(BIT(1))

                                         signal_pending_ipi():
                                           atomic_clear(pending_ipi)
                                             returns BIT(1)
                                           arch_sched_directed_ipi(BIT(1))
                                             skips self, IPI silently lost
                                         return to idle thread
                                           WFI
                                             thread still on ready queue

Such an interleaving of events is, of course, likely only reproducible in
practice in virtualized environments where (v)CPUs can be descheduled.

With signal_pending_ipi() inside the lock, next_up() and the IPI
dispatch are atomic. Either the concurrent flag_ipi lands before the
lock is acquired (and next_up sees the thread), or it lands after the
lock is released (and the caller dispatches the IPI). There is no
window where a CPU can consume its own bit for a thread it hasn't seen.

Similar races exist in reschedule() and z_reschedule_irqlock() as well.
Although they won't cause the same permanent hang described above, they
can result in unnecessary rescheduling latency. Fix reschedule(), and
add a TODO to z_reschedule_irqlock(); it does not currently take
the sched spinlock.

Signed-off-by: Andrew Bresticker <abrestic@meta.com>
2026-04-04 10:57:11 -05:00
Fengming Ye
f75db68d03 kernel: workq: do not yield when current workq is empty
The workqueue optionally yields after every work handler to avoid
starving other threads.
When the work queue is empty after a work handler completes, the thread
will go to sleep on the next loop iteration anyway, so there is no need
to yield; doing so only adds an extra scheduling cost.

Signed-off-by: Fengming Ye <frank.ye@nxp.com>
2026-04-03 23:15:04 +09:00
Peter Mitsis
df630e09ae kernel: Fix timeout handler for delayable work
Between the points in time when sys_clock_announce() calls the
timeout handler for delayable work and when that handler wins
the work queue spinlock another thread or ISR could have called
k_work_reschedule_for_queue(). Should this occur, the timeout
that the handler is trying to process becomes stale and the
handler should not proceed any further with it.

As the workqueue spinlock is the controlling lock (it is always
held before either aborting or adding a timeout), it is safe
for the handler to call z_is_timeout_handler_canceled() once
it holds the workqueue spinlock.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-04-03 23:13:23 +09:00
Peter Mitsis
f9376ddde5 kernel: Fix gap in workqueue work timeout
The workqueue work timeout feature is supposed to abort the work
queue thread if the time to execute a work item exceeds the work
queue's configured threshold. The work thread may race against the
timeout handler responsible for aborting the thread when the two
are running on separate CPUs--particularly since the timeout handler
only locks the workqueue spinlock for part of its duration.

To get around this, two separate flags must be checked: a 'finished'
flag indicating that the thread has finished processing the work
item, and the timeout's flag indicating whether it has been removed
while processing its timeout handler. Should either be found to be true
within the timeout handler, the thread is deemed to have completed
in time and the timeout handler proceeds no further.

Otherwise the timeout handler is deemed to have won the race and the
workqueue thread is aborted. Should the workqueue thread detect this,
it goes to sleep until it can be aborted to prevent it from handling
any more work items.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-04-03 23:13:23 +09:00
Nicolas Pitre
5a3c601e71 kernel: track announcing state in timeout dticks field
The routine sys_clock_announce() removes the timeout from the timeout
list and unlocks the timeout spinlock before invoking the timeout's
handler. This creates a window where another ISR (or a thread running
on another CPU) can abort or reuse the timeout before the handler
executes. When this happens, the timeout handler should bail early.

Use the dticks field to carry this state: set it to
TIMEOUT_DTICKS_ANNOUNCING after remove_timeout() (which needs
dticks = 0 to propagate remaining ticks) and before calling the
handler. In z_abort_timeout(), set TIMEOUT_DTICKS_ABORTED when the
timeout is either linked (existing behavior) or in the announcing
state (new). The z_add_timeout() path naturally overwrites dticks
with a real tick value, so re-use is also detected.

Provide z_is_timeout_handler_canceled() for handlers to check if
they should bail. This avoids adding a flags field to struct _timeout,
keeping the struct size unchanged.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-04-03 23:13:23 +09:00
Daniel Leung
23054a97f4 kernel: dynamic stack to cached area if coherence
With kernel coherence enabled, it is possible that the stack has
been allocated in an uncached area. This has performance
implications, as memory accesses are not cached.

This adds a Kconfig option to force the stack pointer of
the allocated thread stack object to be in the cached area.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-03-31 11:45:30 -04:00
Cheng-Yang Chou
10c974c9d5 kernel: futex: fix TOCTOU race in k_futex_wait
Move the futex value validation inside the spinlock critical section
in z_impl_k_futex_wait().

Previously, a time-of-check to time-of-use (TOCTOU) race condition
existed because the futex value was evaluated before acquiring
futex_data->lock. This created a vulnerability window:

    Thread A (waiter)                 Thread B (waker)
    ─────────────────────────         ────────────────────────
    atomic_get() == expected
                                      atomic_set(new_val)
                                      k_futex_wake() -> no waiters yet
    k_spin_lock()
    z_pend_curr()
    [waits forever, wake lost]

If the waker updates the futex value and signals between the waiter's
value check and lock acquisition, the wake signal is lost. This causes
the waiting thread to block indefinitely.

Holding the lock during the evaluation ensures the value check and the
subsequent wait-queue operations are atomic relative to concurrent
wakeups. A concurrent wake must now either complete before the waiter
acquires the lock (waiter sees the updated value and returns -EAGAIN)
or arrive after (waiter is safely in the wait queue and gets woken).

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
2026-03-23 12:34:58 -05:00
Joel Holdsworth
76def70bed arch: Added initial OpenRISC architecture port
This patch adds support for the OpenRISC 1000 (or1k) architecture: a
MIPS-like open hardware ISA which was first introduced in 2000.

The thread switching implementation uses the modern Zephyr thread "switch"
architecture.

Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com>
2026-03-21 07:50:57 -05:00
Christoph Busold
11f89f73eb kernel: userspace: Add k_object_access_check syscall
This allows user threads to test if they have permission to access
an object before attempting to perform an operation on it and fail
gracefully if not.

Signed-off-by: Christoph Busold <cbusold@qti.qualcomm.com>
2026-03-19 14:49:23 -05:00
Tharaka Jayasena
f6141e5ccf doc: kernel: fix incorrect Doxygen @retval/@return usage
Fix several incorrect uses of the Doxygen `@retval` and `@return`
commands in kernel sources.

- Convert @return to structured @retval where functions return
  discrete values.
- Replace incorrect @retval usage with @return for non-discrete
  return types.

Signed-off-by: Tharaka Jayasena <9dmpires2k17.tuj@gmail.com>
2026-03-17 18:24:33 -04:00
Cheng-Yang Chou
d67038c7e7 kernel: fix z_tick_sleep unsigned comparison when !CONFIG_TIMEOUT_64BIT
When CONFIG_TIMEOUT_64BIT is not set, k_ticks_t is uint32_t. The previous
code cast left_ticks through int32_t but then stored the result back in
k_ticks_t (uint32_t), losing the sign. The subsequent ticks > 0 check was
therefore an unsigned comparison, causing a past-due wakeup (where the
subtraction wraps to a large uint32_t) to be misread as a large positive
remainder and propagated up through k_sleep() as INT_MAX ms.

Fix by retaining the signed intermediate and comparing it directly as
int32_t so negative remainders (past-due) correctly fall through to
return 0.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
2026-03-17 18:18:28 -04:00
Andy Ross
a04a30c6d1 kernel/sched: Move reschedule under lock in k_sched_unlock()
This function was a little clumsy, taking the scheduler lock,
releasing it, and then calling z_reschedule_unlocked() instead of the
normal locked variant of reschedule.  Don't take the lock twice.

Mostly this is a code size and hygiene win.  Obviously the sched lock
is not normally a performance path, but I happened to have picked this
API for my own microbenchmark in tests/benchmarks/swap and so noticed
the double-lock while staring at disassembly.

Signed-off-by: Andy Ross <andyross@google.com>
2026-03-10 17:24:10 +01:00
Andy Ross
2bbcece6ee kernel/sched: Refactor reschedule to permit better code generation
z_reschedule() is the basic kernel entry point for context switch,
wrapping z_swap(), and thence arch_switch().  It's currently defined
as a first class function for entry from other files in the kernel and
elsewhere (e.g. IPC library code).

But in practice it's actually a very thin wrapper without a lot of
logic of its own, and the context switch layers of some of the more
obnoxiously clever architectures are designed to interoperate with the
compiler's own spill/fill logic to avoid double saving.  And with a
small z_reschedule() there's not a lot to work with.

Make reschedule() an inlinable static, so the compiler has more
options.

Signed-off-by: Andy Ross <andyross@google.com>
2026-03-10 17:24:10 +01:00
Andy Ross
8638ed12f5 kernel/sched: Add optimized next switch handle wrapper
z_get_next_switch_handle() is a clean API, but implementing it as a
(comparatively large) callable function requires significant
entry/exit boilerplate and hides the very common "no switch needed"
early exit condition from the enclosing C code that calls it.  (Most
architectures call this from assembly though and don't notice).

Provide an unwrapped version for the specific needs of non-SMP builds.
It's compatible in all other ways.

Slightly ugly, but the gains are significant (like a dozen cycles or
so).

Signed-off-by: Andy Ross <andyross@google.com>
2026-03-10 17:24:10 +01:00
Andy Ross
d535d17cbc kernel: Minor optimizations to z_swap()
Pick some low hanging fruit on non-SMP code paths:

+ The scheduler spinlock is always taken, but as we're already in an
  irqlocked state that's a noop.  But the optimizer can't tell, because
  arch_irq_lock() involves an asm block it can't see inside.  Elide
  the call when possible.

+ The z_swap_next_thread() function evaluates to just a single load of
  _kernel.ready_q.cache when !SMP, but wasn't being inlined because of
  function location.  Move that test up into do_swap() so it's always
  done correctly.

Signed-off-by: Andy Ross <andyross@google.com>
2026-03-10 17:24:10 +01:00
Andy Ross
e2e5542d14 arch/arm: Platform integration for new Cortex M arch_switch()
Integrate the new context layer, allowing it to be selected via the
pre-existing CONFIG_USE_SWITCH.  Not a lot of changes, but notable
ones:

+ There was code in the MPU layer to adjust PSP on exception exit at a
  stack overflow so that it remained inside the defined stack bounds.
  With the new context layer though, exception exit will rewrite the
  stack frame in a larger format, and needs PSP to be adjusted to make
  room.

+ There was no such treatment in the PSPLIM case (the hardware prevents
  the SP from going that low), so I had to add similar code to
  validate PSP at exit from fault handling.

+ The various return paths for fault/svc assembly handlers need to
  call out to the switch code to do the needed scheduler work.  Really
  almost all of these can be replaced with C now, only userspace
  syscall entry (which has to "return" into the privileged stack)
  needs special treatment.

+ There is a gcc bug that prevents the arch_switch() inline assembly
  from building when frame pointers are enabled (which they almost
  never are on ARM): it disallows you from touching r7 (the thumb
  frame pointer) entirely.  But it's a context switch, we need to!
  Worked around by enforcing -fomit-frame-pointer even in the two
  scheduler files that can swap when NO_OPTIMIZATIONS=y.

Signed-off-by: Andy Ross <andyross@google.com>
Signed-off-by: Sudan Landge <sudan.landge@arm.com>
2026-03-10 17:24:10 +01:00
Anas Nashif
b84319c460 kernel: drop deprecated options SCHED_DUMB and WAITQ_DUMB
Those kconfig options were deprecated in 4.2. Now they are removed.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-03-09 15:09:04 -05:00
Henrik Brix Andersen
fb350217cc kernel: Kconfig.device: fix CONFIG_DEVICE_DEINIT_SUPPORT help text
Drivers supporting device deinitialization should not select
CONFIG_DEVICE_DEINIT_SUPPORT. Enabling deinit should be left up to the
application configuration.

Signed-off-by: Henrik Brix Andersen <henrik@brixandersen.dk>
2026-03-08 16:36:39 +01:00
Peter Mitsis
fa558229af kernel: mem_slab: Change loop variable type
Changes the loop variable type from 'int' to 'uint32_t' in the
create_free_list() routine to match the type of the 'num_blocks'
field. Otherwise, if a very large number of blocks is specified,
the conversion from 'uint32_t' to 'int' could have resulted in
a negative number. The result of this improper conversion would
be an empty free list.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-03-05 04:41:02 +01:00
Flavio Ceolin
2517803d98 kernel: msgq: Check possible overflow in put/get
Not only check that write_ptr is smaller than buffer_end, but also
check that write_ptr + msg_size does not exceed buffer_end, to
avoid overflowing when copying data.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>
2026-02-27 07:59:02 +01:00
Flavio Ceolin
41fbeea3a8 kernel: msgq: Check overflow when initing queue
Check for possible overflow in k_msgq_init.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>
2026-02-27 07:59:02 +01:00
Flavio Ceolin
26bd97edbc kernel: mem_slab: Check block size equals 0
Borrowing Peter Mitsis' rationale from #104283:

If someone passes a 0 block_size, then the buffer size must also be 0.
However, we iterate through the loop below num_blocks times, writing a
pointer to the buffer address. If the buffer is truly zero-sized, then
we are overwriting something else. If it is not truly zero-sized, then
we are creating a corrupted linked list as the pointer never actually
changes. This can cause problems later on when attempting to allocate
a slab because k_mem_slab_alloc() will only ever "allocate" the first
zero-sized block and act as though it was never truly consumed because
of the corrupted linked list.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>
2026-02-27 07:59:02 +01:00
Flavio Ceolin
5688bcc10a kernel: mem_slab: Check possible overflow on init
Check possible overflow when initializing a memory block.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>
2026-02-27 07:59:02 +01:00
Peter Mitsis
143076008b kernel: msgq: Fix __ASSERT_NO_MSG() checks
The message queue 'buffer_end' field points to the next address AFTER
the end of the buffer. When the buffer extends to the last addressable
byte, the address of the next byte wraps to 0x0. To ensure proper
evaluation of the bounds, the __ASSERT_NO_MSG() checks must not use
"< buffer_end", but "<= buffer_end - 1".

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-02-24 15:36:04 +01:00
Daniel Leung
c1ff64599b kernel: mem_domain: support memory domain de-initialization
This adds the ability to de-initialize a memory domain.
This requires support in the architecture layer. One usage of
this is to release the resources associated with the domain.
For example, we can release allocated page tables so they can
go back to the pool of page tables to be allocated later.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-02-24 10:39:59 +01:00
Zhaoxiang Jin
cddedf9e3d kernel: busy_wait: handle runtime system timer frequency updates
When CONFIG_SYSTEM_CLOCK_HW_CYCLES_PER_SEC_RUNTIME_UPDATE is enabled,
the system timer frequency can change at runtime. Some timer drivers
(e.g. Cortex-M SysTick) rescale the cycle counter when the frequency
changes, which can break k_busy_wait() if the frequency changes during
the wait period.

Update k_busy_wait() to handle runtime frequency changes:
- Add busy_wait_us_to_cyc_ceil32() helper to convert microseconds to
  cycles with a given frequency (rounds up to avoid returning early)
- Implement a frequency-aware busy wait loop that:
  - Samples the frequency before and after reading the cycle counter to
    detect concurrent frequency changes
  - Rescales the start_cycles reference point when the frequency changes
    to keep it in the same scale as the cycle counter
  - Recomputes cycles_to_wait with the new frequency to preserve the
    requested duration
  - Retries sampling if a frequency change is detected mid-read

The original implementation is preserved when
CONFIG_SYSTEM_CLOCK_HW_CYCLES_PER_SEC_RUNTIME_UPDATE is not enabled.

Signed-off-by: Zhaoxiang Jin <Zhaoxiang.Jin_1@nxp.com>
2026-02-20 13:31:07 +01:00
Zhaoxiang Jin
60197bf514 kernel: refactor sys_clock_hw_cycles_per_sec runtime support
- Move the variable declaration and related code from kernel/timeout.c
  to a new kernel/sys_clock_hw_cycles.c file. The motivation is that
  both functions are part of the system clock frequency plumbing
  (runtime query / update) and don’t naturally fit the responsibilities
  of timeout.c, which is otherwise focused on timeout queue management
  and tick announcement logic.

- Make sys_clock_hw_cycles_per_sec_runtime_get() (and its
  z_impl_sys_clock_hw_cycles_per_sec_runtime_get() implementation)
  visible under CONFIG_SYSTEM_CLOCK_HW_CYCLES_PER_SEC_RUNTIME_UPDATE
  as well, not only under CONFIG_TIMER_READS_ITS_FREQUENCY_AT_RUNTIME.
  This allows callers and time unit conversion helpers to retrieve the
  current system timer frequency after runtime clock changes even when
  the timer driver does not discover the rate by querying hardware.

Signed-off-by: Zhaoxiang Jin <Zhaoxiang.Jin_1@nxp.com>
2026-02-20 13:31:07 +01:00
Mathieu Choplain
b669a39738 kernel: events: abort thread timeout using dedicated function
Use z_abort_thread_timeout() instead of the lower-level z_abort_timeout().
The thread-flavoured version also has a stub fallback when
CONFIG_SYS_CLOCK_EXISTS=n, removing the need for preprocessor checks.

Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
2026-02-18 14:43:10 +00:00
Mathieu Choplain
5a0f73f045 kernel: events: wake threads atomically using waitq post-walk callback
When CONFIG_WAITQ_SCALABLE=y, wake up all threads from a post-waitq-walk
callback which is invoked while the scheduler spinlock is still held. This
solves the race condition that was worked around via `no_wake_in_timeout`
flag in k_thread and `is_timeout` parameter of z_sched_wake_thread_locked()
which can now both be dropped.

Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
2026-02-18 14:43:10 +00:00
Mathieu Choplain
73feef8b20 kernel: sched: add post-walk callback argument to z_sched_waitq_walk()
Modify z_sched_waitq_walk() to accept an optional callback invoked after
the walk while still holding the scheduler spinlock. This can be used to
perform post-walk operations "atomically". Update all callers to work with
this new function signature.

While at it, create dedicated (private) typedefs for the callbacks and
clean up/improve the routine and callbacks' documentation.

Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
2026-02-18 14:43:10 +00:00
Mathieu Choplain
3ba7dcfe2c kernel: events: wake threads during wait queue walk if possible
When CONFIG_WAITQ_SCALABLE=n, the callback invoked by z_sched_waitq_walk()
is allowed to remove the thread provided as argument from the wait queue
(an operation implicitly performed when waking up a thread).

Use this to our advantage when waking threads pending on a k_event by
waking threads as part of the waitq walk callback instead of building
a list of threads to wake and performing the wake outside the callback.
When CONFIG_WAITQ_SCALABLE=n, this allows removing a pointer-sized field
from the thread structure which reduces the overhead of CONFIG_EVENTS=y.

The old implementation (build list in callback and wake outside callback)
is retained and used when CONFIG_WAITQ_SCALABLE=y since we can't modify
the wait queue as part of the walk callback in this situation. This is now
documented above the corresponding field in k_thread structure.

Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
2026-02-18 14:43:10 +00:00
Mathieu Choplain
e725225489 kernel: sched: perform safe waitq walk inside z_sched_waitq_walk()
z_sched_waitq_walk() used _WAIT_Q_FOR_EACH, a wrapper around the
"unsafe" SYS_DLIST_FOR_EACH_CONTAINER which does not allow detaching
elements from the list during the walk. As a result, attempting to
detach threads from the wait queue as part of the callback provided
to z_sched_waitq_walk() would result in breakage.

Introduce new _WAIT_Q_FOR_EACH_SAFE macro as wrapper around the "safe"
SYS_DLIST_FOR_EACH_CONTAINER_SAFE which allows detaching nodes from
the list during the walk, and use it inside z_sched_waitq_walk().
While at it:
- add documentation on the _WAIT_Q_FOR_EACH macro, including a warning
  about detaching elements as part of the loop not being allowed
- add note to documentation of z_sched_waitq_walk() indicating that
  the callback can safely remove the thread from wait queue as this
  will no longer break the FOR_EACH loop
- add _WAIT_Q_FOR_EACH_SAFE to the list of ForEachMacros in .clang-format

NOTE: this new "safe removal inside callback" behavior is only available
when CONFIG_WAITQ_SCALABLE=n. When the option is 'y', red-black trees are
used instead of doubly-linked lists which prevent mutation of the list
while it is being walked. This limitation is explicitly documented.

Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
2026-02-18 14:43:10 +00:00
Mathieu Choplain
eac6c7cb24 kernel: sched: don't acquire scheduler spinlock in z_sched_wake_thread()
Don't acquire the _sched_spinlock in z_sched_wake_thread(). This allows
calling the function from callbacks which already own the spinlock. The
function is renamed to z_sched_wake_thread_locked() to reflect this new
behavior, and all existing callers are updated to ensure they hold the
_sched_spinlock as is now required.

Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
2026-02-18 14:43:10 +00:00
Yong Cong Sin
e843931194 kernel: check if interrupt is disabled in k_can_yield
`k_yield()` can't be called when interrupts are disabled; update
`k_can_yield()` to reflect that.

Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
2026-02-16 00:13:38 +00:00
Peter Mitsis
ec2e20530c kernel: Disable k_mutex priority inheritance
There is no need to include the k_mutex priority inheritance code
when CONFIG_PRIORITY_CEILING is set to a priority level that is at
or below that of the idle thread.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-02-16 00:13:23 +00:00
Jordan Yates
cf866c1063 kernel: mempool: use Z_HEAP_MIN_SIZE_FOR for system heap
Use `Z_HEAP_MIN_SIZE_FOR` on the system heap. This fixes allocations
failing when there is only a single small user of the heap defining
a symbol like the following, even when only allocating 16 bytes.
```
config HEAP_MEM_POOL_ADD_SIZE_{X}
	int
	default 64
```

Signed-off-by: Jordan Yates <jordan@embeint.com>
2026-02-11 10:39:13 +01:00
Peter Mitsis
73cf293de6 kernel: Add NONZERO_SPINLOCK_SIZE Kconfig option
Embeds both an anonymous union and an anonymous structure within the
k_spinlock structure to ensure that the structure can easily have a
non-zero size.

This new option provides a cleaner way to specify that the
spinlock structure must have a non-zero size. A non-zero size
is necessary when C++ support is enabled, or when a library
or application wants to create an array of spinlocks.

Fixes #59922

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-02-09 11:16:03 +01:00
Benjamin Cabé
a87520bd04 kernel: use proper essential type to initialize boolean variables
As per Zephyr coding guideline #59, "operands shall not be of an
inappropriate essential type". This makes sure boolean variables are
initialized with true/false, not 1/0.

Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
2026-02-04 13:52:38 +01:00