Commit graph

3,398 commits

Author SHA1 Message Date
Lingutla Chandrasekhar
a5fbcbb12c arch: riscv: skip ATOMIC_OPERATIONS_C if arch-specific atomics are enabled
Previously, ATOMIC_OPERATIONS_C was selected for RISC-V whenever the
'A' (atomic) ISA extension (RISCV_ISA_EXT_A) was absent. This caused
a conflict on platforms that lack the 'A' extension but still provide
their own arch-level atomic implementation via ATOMIC_OPERATIONS_ARCH
(e.g. future RISC-V SoCs with custom atomic support).

Add !ATOMIC_OPERATIONS_ARCH to the select condition so that the
generic C fallback (interrupt-locking) is only chosen when neither
the ISA extension nor an arch-specific implementation is available.

This condition creates a Kconfig dependency cycle:

  RISCV selects ATOMIC_OPERATIONS_C if !ATOMIC_OPERATIONS_ARCH
  => ATOMIC_OPERATIONS_C depends on !ATOMIC_OPERATIONS_ARCH
  => ATOMIC_OPERATIONS_ARCH depends on SMP (fvp_base_revc_2xaem board)
  => SMP depends on !ATOMIC_OPERATIONS_C

Break the cycle by removing 'depends on !ATOMIC_OPERATIONS_C' from
SMP in kernel/smp/Kconfig. This is safe because ATOMIC_OPERATIONS_C
is now only selected when ATOMIC_OPERATIONS_ARCH is absent, so the
two symbols are mutually exclusive by construction. The existing
BUILD_ASSERT(!IS_ENABLED(CONFIG_SMP)) in lib/os/atomic_c.c provides
a compile-time backstop against any misconfiguration.
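
The compile-time backstop can be pictured with a plain C11 static assertion. This is only a sketch and not the contents of lib/os/atomic_c.c: the MODEL_* macros are hypothetical stand-ins for the Kconfig-generated symbols and for Zephyr's BUILD_ASSERT()/IS_ENABLED() helpers.

```c
#include <assert.h>

/* Hypothetical stand-ins for CONFIG_SMP and CONFIG_ATOMIC_OPERATIONS_C. */
#define MODEL_CONFIG_SMP 0
#define MODEL_CONFIG_ATOMIC_OPERATIONS_C 1

/* The interrupt-locking C fallback is only valid on uniprocessor builds. */
#define MODEL_ATOMIC_CONFIG_OK(smp, atomic_c) (!((smp) && (atomic_c)))

/* Fails the build if both symbols are ever enabled together. */
static_assert(MODEL_ATOMIC_CONFIG_OK(MODEL_CONFIG_SMP,
				     MODEL_CONFIG_ATOMIC_OPERATIONS_C),
	      "generic C atomics must not be combined with SMP");
```

Flipping MODEL_CONFIG_SMP to 1 in this sketch stops the build, which is the behaviour the message relies on if the Kconfig logic is ever misconfigured.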

Suggested-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Lingutla Chandrasekhar <lingutla@qti.qualcomm.com>
2026-05-22 10:44:26 +02:00
Holt Sun
15af50d202 pm: keep irq restore ownership in idle
Keep the original architecture IRQ key owned by idle across a
successful system PM transition.

Add architecture hooks and the PM_STATE_SET_IRQ_LOCKED migration
contract for SoCs that keep PM hooks from unmasking interrupts.

Signed-off-by: Holt Sun <holt.sun@nxp.com>
2026-05-21 17:02:03 -04:00
Srikanth Patchava
5c6c6837cc kernel: fix k_condvar_wait mutex re-acquisition on timeout
Always re-acquire the mutex before returning from k_condvar_wait(),
even when z_pend_curr() returns a non-zero status (timeout or error).
The previous code only re-locked the mutex on success (ret == 0),
violating POSIX semantics which require the mutex to always be held
by the calling thread when the function returns.

The K_NO_WAIT early-return path now also keeps the mutex locked
(returning -EAGAIN immediately without touching the mutex), preserving
the calling-thread-holds-mutex invariant on every exit path.
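
The POSIX rule referred to above can be observed with plain pthreads. The sketch below is not Zephyr code and does not call k_condvar_wait(); it is a host-side illustration, with a hypothetical helper name, showing that a timed-out pthread_cond_timedwait() still returns with the mutex owned by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Returns 1 if a timed-out wait hands the mutex back to the caller. */
static int posix_timed_out_wait_keeps_mutex(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t mutex;
	pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
	struct timespec deadline = { 0, 0 }; /* epoch: already in the past */
	int rc;
	int owned;

	pthread_mutexattr_init(&attr);
	/* An error-checking mutex refuses an unlock from a non-owner. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&mutex, &attr);
	pthread_mutexattr_destroy(&attr);

	rc = pthread_mutex_lock(&mutex);
	assert(rc == 0);

	/* Nobody signals the condition, so the wait ends by timeout. */
	rc = pthread_cond_timedwait(&cond, &mutex, &deadline);

	/* Unlock succeeds only if this thread owns the mutex again. */
	owned = (pthread_mutex_unlock(&mutex) == 0);

	pthread_mutex_destroy(&mutex);
	pthread_cond_destroy(&cond);
	return (rc == ETIMEDOUT) && owned;
}
```

With the fix described in this commit, a Zephyr caller can likewise unlock the mutex unconditionally after the wait, whatever status was returned.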

Signed-off-by: Srikanth Patchava <srikanth.patchava@outlook.com>
2026-05-21 17:00:29 -04:00
Peter Mitsis
8dc7a37bc7 kernel: poll: z_vrfy_k_poll() to free memory
Reworks the K_OOPS(K_SYSCALL_OBJ(...)) logic in z_vrfy_k_poll()
to ensure that the allocated 'events_copy' is freed before the
K_OOPS() is performed.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-05-20 20:08:10 -04:00
Daniel Leung
04c58a2ddb kernel: userspace: fix validate_kernel_object type/init
When z_object_validate() was renamed to k_object_validate(),
the call to it inside validate_kernel_object() was incorrectly
modified and reverted to the arguments of the old version.
Fix that.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-05-20 20:07:59 -04:00
Alberto Escolar Piedras
0b0b41ce9d kernel: Initialize timeout.dticks on k_timer_init()
Unlike Z_TIMER_INITIALIZER, k_timer_init() does not fully initialize the
provided timer.
This results in a valgrind warning about a conditional jump on an
uninitialised value when k_timer_start() is later called on that object
and dticks is checked.

Let's initialize it to avoid this warning.

Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
2026-05-20 14:12:52 +02:00
Nicolas Pitre
c3f2a6a07a kernel: timeslicing: rearm with slice_size-1 when slicer just fired
z_time_slice_reset() armed the slicer with K_TICKS(slice_size), expecting
z_add_timeout()'s "+1" round-up to be skipped on tick-edge entry by the
this_cpu_announcing() check introduced in commit 2e2202af61 ("kernel:
timeout: make z_add_timeout round-up conditional on announce"). In
practice that check is false at every reachable z_time_slice_reset()
call site:

  - z_time_slice() runs from sys_clock_announce_locked()'s post-loop
    epilogue, after announce_remaining has been zeroed and the timeout
    lock has been released. The locking discipline doesn't permit
    rearming earlier (i.e. from inside the firing loop), so the
    timeout-edge property is lost by the time the slicer is rearmed.

  - On SMP, the slicer can fire on another CPU and IPI ours to do the
    actual scheduler work; the receiving CPU is even further from the
    announce window when it rearms.

  - update_cache() (non-SMP) and z_get_next_switch_handle() (SMP) both
    invoke z_time_slice_reset() from the dispatch path, also outside
    any announce.

The result is that every slicer fire produces a slice that's one tick
longer than configured -- k_sched_time_slice_set(N ms) ends up firing at
roughly N+tick ms.

Compensate explicitly: when slice_expired[cpu] is set we know the slicer
just fired and the new arm is conceptually tick-aligned, so pass
slice_size-1. The +1 round-up in z_add_timeout() then lands the next
fire at exactly slice_size ticks. Other reset paths (voluntary yield,
higher-prio preempt, thread creation) leave slice_expired clear and keep
the full "at least N ticks" rounding.
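
The compensation can be summarised with a small arithmetic model. This is a sketch under stated assumptions, not the kernel's z_time_slice_reset(): the model_* names are hypothetical, and the "+1" is applied unconditionally because the message establishes that the announce check is false at every reachable call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ticks passed to the timeout layer when (re)arming the slicer. */
static int32_t model_slice_arm_ticks(int32_t slice_size, bool slice_expired)
{
	assert(slice_size >= 1);
	/* Slicer just fired: the new arm is conceptually tick-aligned. */
	return slice_expired ? slice_size - 1 : slice_size;
}

/* Ticks until the slicer fires once the timeout layer adds its +1. */
static int32_t model_slice_fire_ticks(int32_t slice_size, bool slice_expired)
{
	return model_slice_arm_ticks(slice_size, slice_expired) + 1;
}
```

In this model a slice of 10 ticks fires after exactly 10 ticks when re-armed from an expiry, and after 11 ticks (the "at least N" rounding) on the other reset paths.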

Also drop the redundant rearm in z_time_slice() when curr is being
swapped out: the dispatch path's z_time_slice_reset(new_thread) will
arm correctly with slice_expired still set, so the in-z_time_slice()
arm just installed a timeout that the dispatch path immediately
aborted and replaced. Keep it only when curr stays at the front of the
queue (single runnable thread at this priority), where the dispatch
path won't run.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-20 10:55:56 +02:00
Nicolas Pitre
2ec65238d6 kernel: timeout: make in-announce check CPU-aware
Several pieces of state in kernel/timeout.c are read by code paths on
both the announcing CPU and other CPUs: announce_remaining, the
"in-announce" lock that prevents parallel firing on SMP, and
elapsed()/the +1 round-up in z_add_timeout(). Originally all of these
were keyed off announce_remaining != 0, which conflates two distinct
roles -- a remaining-ticks counter and a we-are-announcing flag --
and breaks down on SMP and across same-tick groups.

Symptom 1 (issue #106317): on SMP, while CPU A is announcing, threads
running on CPU B see announce_remaining != 0 and incorrectly take the
in-announce branch -- elapsed() returns 0 so curr_tick + elapsed()
goes backwards relative to a reading taken before the announce
started, and a slice rearm or k_timer_start() coming from CPU B fires
one tick early because the +1 round-up is suppressed even though CPU B
is partway through a tick. The same condition was the root cause of
the test_slice_reset failure on fvp_base_revc_2xaem/v8a/smp/ns and
/a320 once the timer accounting cleanup in #107452 stopped masking it.

Fix: distinguish "this CPU is at a tick edge (announcing)" from "some
CPU is in the firing loop". Add announcing_cpu, set to the CPU id at
the top of sys_clock_announce_locked() and cleared at the bottom;
expose it via this_cpu_announcing() and any_cpu_announcing(). The
former drives elapsed()'s return-0 short-circuit and z_add_timeout()'s
+1 round-up suppression -- both correct only on the announcing CPU.
The latter drives the SMP early-return and z_add_timeout()'s
deferred sys_clock_set_timeout call -- both correct for any CPU
observing that an announce is in progress, and both robust to
announce_remaining hitting 0 mid-loop (which now becomes possible --
see symptom 2 below).

elapsed() on non-announcing CPUs now returns
sys_clock_elapsed() + announce_remaining. The driver bumped its
announced-cycles baseline to (curr_tick_initial + N) * CYC_PER_TICK
at ISR entry while the kernel has only advanced curr_tick by the
ticks already committed in the loop, so announce_remaining is the
residual that must be added to sys_clock_elapsed() to recover
T_real - curr_tick across the announce window. The new invariant is

    curr_tick + announce_remaining + sys_clock_elapsed() == T_real

on every CPU at every point inside or outside the announce window.

Symptom 2: that invariant relies on curr_tick and announce_remaining
moving in lock-step. The original loop advanced curr_tick at the top
of each outer iteration but only decremented announce_remaining after
the handler returned, with the lock dropped around the handler. A
non-announcing CPU calling sys_clock_tick_get() during a handler
therefore observed curr_tick + announce_remaining one batch (dt) into
the future, breaking the invariant for the duration of the handler.

Fix: pair the updates so they happen under the same lock acquisition
before the handler runs:

    curr_tick += t->dticks;
    announce_remaining -= t->dticks;
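
Why the paired update preserves the invariant can be checked with a tiny model. This is a sketch only: struct model_clock and the model_* helpers are hypothetical, the driver_elapsed parameter stands in for sys_clock_elapsed(), and locking is left out.

```c
#include <assert.h>
#include <stdint.h>

struct model_clock {
	int64_t curr_tick;          /* ticks already committed by the kernel */
	int64_t announce_remaining; /* announced ticks not yet committed */
};

/* Time seen by any CPU; driver_elapsed models sys_clock_elapsed(). */
static int64_t model_now(const struct model_clock *c, int64_t driver_elapsed)
{
	return c->curr_tick + c->announce_remaining + driver_elapsed;
}

/* Commit one batch of dticks before its handler would run. */
static void model_commit(struct model_clock *c, int64_t dticks)
{
	int64_t before = model_now(c, 0);

	assert(dticks >= 0 && dticks <= c->announce_remaining);
	c->curr_tick += dticks;
	c->announce_remaining -= dticks;
	/* Paired updates leave the observable time unchanged. */
	assert(model_now(c, 0) == before);
}
```

Because both fields move by the same amount in opposite directions, an observer reading them between two commits always computes the same T_real, including for dticks == 0 follow-on timeouts.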

Symptom 3: commit d157b3da19 ("kernel: timeout: keep announce_remaining
stable across same-tick group") kept announce_remaining > 0 across the
inner drain of a same-tick group so that (a) the SMP early-return
would still detect "another CPU is announcing" and (b) elapsed() in
same-tick handlers would still anchor to the firing tick. With the
lock-state role moved to announcing_cpu, neither property depends on
announce_remaining anymore: any_cpu_announcing() carries (a) and
this_cpu_announcing() carries (b), both throughout the loop on the
announcing CPU regardless of the budget. The inner drain do-while
loop is therefore no longer needed -- the outer loop's
t->dticks <= announce_remaining naturally accepts dticks == 0
follow-on timeouts even when announce_remaining has reached 0.

Fixes #106317

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-20 10:55:56 +02:00
Nicolas Pitre
90d1963745 sys: util: move lowercase min/max/clamp to a new minmax.h
Since commit 37717b229f ("sys: util: rename Z_MIN Z_MAX Z_CLAMP to min
max and clamp"), <zephyr/sys/util.h> unconditionally defines function-
like macros named `min`, `max`, and `clamp` in the global namespace (in
C mode). util.h gets pulled in transitively by very broad headers,
including the POSIX layer's <pthread.h>, so any third-party C code that
uses these names as ordinary identifiers (e.g. XNNPACK's static `clamp`
helper and its public `clamp` struct field) fails to build as soon as
<pthread.h> is included.

Following the approach used by Linux, move the lowercase `min`, `max`,
`min3`, `max3`, and `clamp` macros (and their helpers) into a new
<zephyr/sys/minmax.h> header that has to be included explicitly by
source files that want them. util.h keeps the uppercase MIN/MAX/CLAMP,
so most code is unaffected; only the (much smaller) set of files that
actually use the lowercase variants needs to pick up the new include.

Fixes #107853.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-19 17:49:24 -04:00
Peter Mitsis
4424aa681e kernel: pipe: user threads may not re-init pipe
Updates z_vrfy_k_pipe_init() to use K_SYSCALL_OBJ_NEVER_INIT()
instead of K_SYSCALL_OBJ() to prevent a user thread from
re-initializing a pipe. This aligns the pipe initialization
behavior to that of other kernel objects such as message queues.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-05-15 23:28:30 +02:00
Li Jie
b0ed3dde1d spinlock: Validate support for up to 8 CPUs on 64-bit systems
Extends SPIN_VALIDATE feature to support up to 8 CPUs on
64-bit systems while maintaining backward compatibility with 32-bit
systems (which still support up to 4 CPUs).

Many modern SoCs have more than 4 CPU cores, yet the SPIN_VALIDATE
feature was only available for systems with MP_MAX_NUM_CPUS <= 4.
This limitation becomes increasingly relevant as multi-core designs
with 6-8 cores become common in both embedded and server applications.

The implementation leverages pointer alignment guarantees:
- On 32-bit systems: pointers are 4-byte aligned → 2 free bits →
  up to 4 CPUs
- On 64-bit systems: pointers are 8-byte aligned → 3 free bits →
  up to 8 CPUs
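
The alignment argument can be sketched as follows. This is an illustrative model rather than the SPIN_VALIDATE code itself: the model_* names are hypothetical, and the mask assumes pointers are aligned to sizeof(void *), as the list above states.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits left free by pointer alignment: mask 3 (32-bit) or 7 (64-bit). */
#define MODEL_CPU_MASK ((uintptr_t)(sizeof(void *) - 1))

/* Store the owning CPU id in the unused low bits of the thread pointer. */
static uintptr_t model_pack_owner(const void *thread, unsigned int cpu)
{
	assert(((uintptr_t)thread & MODEL_CPU_MASK) == 0); /* aligned pointer */
	assert(cpu <= MODEL_CPU_MASK);                     /* id fits in bits */
	return (uintptr_t)thread | cpu;
}

/* Recover the CPU id from the packed word. */
static unsigned int model_owner_cpu(uintptr_t packed)
{
	return (unsigned int)(packed & MODEL_CPU_MASK);
}

/* Recover the thread pointer from the packed word. */
static const void *model_owner_thread(uintptr_t packed)
{
	return (const void *)(packed & ~MODEL_CPU_MASK);
}
```

With two free bits the largest id that fits is 3 (four CPUs); with three free bits it is 7 (eight CPUs), which matches the limits quoted in the message.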

Signed-off-by: Li Jie <lijie.1996@picoheart.com>
2026-05-15 23:27:26 +02:00
Daniel Leung
4915839510 kernel: add kobj NULL check in k_thread_name_copy()
Inside k_thread_name_copy(), we call k_object_find() to find
the kernel object associated with the incoming thread. However,
the finder can return NULL if the incoming pointer has no kobj
associated with it, so we need to check for NULL before
dereferencing the k_object. Since k_object_find() also returns
NULL if the input object is NULL, there is no need to test the
thread pointer for NULL separately; checking the return value of
k_object_find() is enough.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-05-15 12:38:52 -05:00
Mayur Salve
f94efe2607 arch: riscv: address review nits on TLS canary patches
Add copyright year, reflow comments, remove duplicate __tls_end,
and remove commented-out code.

Signed-off-by: Mayur Salve <msalve@qti.qualcomm.com>
2026-05-14 21:52:56 +02:00
Mayur Salve
2d6bdab5dc arch/riscv: fix early TLS setup and save callee-saved registers
Fix the early TLS initialization sequence on RISC-V to correctly set up
the thread pointer (tp) before any C code runs, and save/restore
callee-saved registers as required by the ABI.

Signed-off-by: Mayur Salve <msalve@qti.qualcomm.com>
2026-05-14 21:52:56 +02:00
Mayur Salve
729110c12f arch: riscv: use TLS-based stack canary guard
This change enables per-thread stack canaries for RISC-V.

RISC-V GCC accesses the stack canary via a fixed offset from the
thread pointer (tp) when -mstack-protector-guard=tls is used. The
compiler emits code equivalent to:

  lw t0, 0(tp)   # load canary from tp+0

Additionally, tp is zeroed in arch_kernel_init() when TLS is enabled,
which means any C function called before thread setup completes (such
as z_early_rand_get or data_copy_xip_relocation) would fault trying
to access the canary.

Introduce STACK_CANARIES_TLS_PREPEND, which places the
.stack_chk.guard section at offset 0 of the TLS block, before .tdata
and .tbss. The compiler flags -mstack-protector-guard-reg=tp and
-mstack-protector-guard-offset=0 are passed so GCC generates the
correct canary access.

With STACK_CANARIES_TLS_PREPEND the per-thread TLS block layout is:

  tp --> +------------------+  offset 0
         | .stack_chk.guard |  (__stack_chk_guard)
         +------------------+
         | .tdata           |  (initialized TLS data)
         +------------------+
         | .tbss            |  (zero-initialized TLS data)
         +------------------+
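
The layout above can be mirrored by a C struct to show why a load from offset 0 of tp reads the canary. This is a hypothetical model: struct model_tls_block and its section sizes are invented for illustration, and the real block is laid out by the linker script, not by a C type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of one thread's TLS block; sizes are arbitrary. */
struct model_tls_block {
	uintptr_t stack_chk_guard; /* .stack_chk.guard, at tp + 0 */
	unsigned char tdata[16];   /* .tdata */
	unsigned char tbss[16];    /* .tbss */
};

/* The guard must be the very first thing the thread pointer points at. */
static_assert(offsetof(struct model_tls_block, stack_chk_guard) == 0,
	      "canary must sit at offset 0 from tp");

/* What a load from 0(tp) observes when tp points at the block. */
static uintptr_t model_load_canary(const struct model_tls_block *tp)
{
	return tp->stack_chk_guard;
}
```

Since each thread has its own block, each thread can also have its own guard value, which is what makes the canary per-thread.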

The RISC-V reset path is extended to initialize tp before any C code
runs by allocating a TLS area on the boot stack and calling
arch_riscv_early_tls_stack_update(). Early boot functions that run
before tp is set up (z_early_rand_get, data_copy_xip_relocation) are
marked FUNC_NO_STACK_PROTECTOR to avoid canary access before tp is
valid.

Signed-off-by: Mayur Salve <msalve@qti.qualcomm.com>
2026-05-14 21:52:56 +02:00
Flavio Ceolin
fdc42fa256 kernel: userspace: fix SMP use-after-free
obj_list traversal held lists_lock, but removals held objfree_lock
(k_object_free) or obj_lock (unref_check). On SMP a concurrent
thread could free the node an iterator had saved as next.

Drop objfree_lock and require lists_lock for every obj_list
modification. k_object_free() now holds it across find+remove;
k_thread_perms_clear() takes it around unref_check().

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>
2026-05-13 09:14:14 +02:00
Jamie McCrae
1d935da700 kernel: Add support for dts RAM configuration
Allows using the chosen SRAM node for RAM configuration without
using Kconfig values.

Signed-off-by: Jamie McCrae <jamie.mccrae@nordicsemi.no>
2026-05-11 08:45:38 +02:00
Peter Mitsis
8509270a0e kernel: poll timeout: Fix race condition
This fixes a subtle race condition in the poll timeout expiration
handler triggered_work_expiration_handler(). There was a small
window between sys_clock_announce() unlocking interrupts and the
handler re-locking them, during which one or more higher priority
interrupts (or threads running on another CPU) could abort the
poll's timeout.

As each place where the timeout could be added or aborted already
locked the poll.c::lock spinlock, this commit updates the handler
to lock that spinlock upon entry and bail early if the timeout
has been detected to have been canceled.

It also changes the handler's call to k_work_submit_to_queue() to
z_work_submit_to_queue() since it is known that it will not be
rescheduling at that point within the timer ISR.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-05-11 04:04:14 +02:00
Peter Mitsis
1b8c7a3038 kernel: thread timeout: Fix race condition
This fixes a subtle race condition in the thread timeout expiration
handler z_thread_timeout(). There was a small window between
sys_clock_announce() unlocking interrupts and the handler re-locking
them, during which one or more higher priority interrupts (or threads
running on another CPU if in an SMP environment) could abort the
thread's timeout.

The fix has two parts. Part one ensures that _sched_spinlock is held
in every location before a thread's timeout can be canceled. Of the
various locations, only z_unpend_thread() was found to need updating.
Part two updates the timeout handler z_thread_timeout() to bail early
if the thread's timeout has been found to be canceled (or re-used)
during that aforementioned window.

Fixes #106653

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2026-05-11 04:04:02 +02:00
Christoph Busold
dad5096f85 drivers: entropy: Add support for architectural entropy drivers
Add new inline function entropy_get_default_device which returns
the "zephyr,entropy" device or the architectural entropy device,
if the former is not set, and use that in all places to query the
entropy device.

This allows using architectural drivers which do not have a DT
node.

Signed-off-by: Christoph Busold <cbusold@qti.qualcomm.com>
2026-05-06 07:05:12 +02:00
Nicolas Pitre
1296dc85e3 kernel: timer: make k_timer_start duration match documented semantics
The documentation for timer objects in
doc/kernel/services/timing/timers.rst has always stated:

    The timer's duration is a **minimum** delay relative to the time
    the timer was started.

However the implementation did not actually honour that contract. When
starting a relative duration, k_timer_start() was subtracting one from
duration.ticks before passing it to z_add_timeout(), cancelling out
z_add_timeout()'s conservative round-up. The net result was that
k_timer_start(K_TICKS(N), ...) could fire anywhere from just after 0
up to N ticks later -- strictly less than the documented minimum.

The in-tree comment acknowledged the mismatch ("i.e. k_timer_start()
doesn't treat its initial sleep argument the same way k_sleep() does,
but historical") and kept the subtraction for backwards compatibility.

This has lasted long enough. Drop the subtraction (and the companion
max(1, ...) whose only purpose was to keep the subtraction from
underflowing). k_timer_start() now honours its documented "minimum
delay" contract, matching the behaviour of k_sleep() for the same
tick count.
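
The difference between the old and new behaviour can be worked through numerically. The sketch below is a simplified model, not k_timer_start(): MODEL_CYC_PER_TICK and the function name are hypothetical, and it assumes the call is made outside any announce, so the timeout layer fires on the tick boundary that lies (ticks + 1) ticks after the last boundary before the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CYC_PER_TICK 100 /* hypothetical cycles per tick */

/* Cycles from the start call to the first expiry, outside any announce. */
static int64_t model_first_fire_delay(int64_t start_cyc, int64_t n_ticks,
				      bool legacy_minus_one)
{
	int64_t ticks = n_ticks;

	assert(start_cyc >= 0 && n_ticks >= 0);
	if (legacy_minus_one) {
		/* Old behaviour: max(1, N) - 1 before queueing. */
		ticks = (n_ticks > 1 ? n_ticks : 1) - 1;
	}

	/* Timeout layer: one tick of round-up from the last tick boundary. */
	int64_t fire_tick = start_cyc / MODEL_CYC_PER_TICK + ticks + 1;

	return fire_tick * MODEL_CYC_PER_TICK - start_cyc;
}
```

Starting halfway through a tick with N = 2, the legacy path in this model fires after 150 cycles, short of the requested 200, while the new path fires after 250 cycles: never early, and at most one tick late.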

Callers that relied on the old "approximately N ticks" timing will
see up to one extra tick of delay on the initial fire, when the
call happens partway through a tick. Subsequent periodic fires are
unaffected: they are rescheduled from the timer ISR at an exact
tick boundary and continue to honour the period as before.

Note that a timer manually re-armed from within its own expiry
callback (rather than via the periodic 'period' argument) does not
suffer from the extra tick either: the callback runs inside
sys_clock_announce_locked(), so z_add_timeout()'s round-up is skipped
and the new fire lands at an exact tick stride. This preserves the
behaviour that the original -1 on the duration was presumably trying
to achieve in the first place, now obtained via the proper mechanism.

A few in-tree tests were tuned too tightly against the old
"approximately N" timing. Widen their tolerances to match the new
"at least N" contract:

  - tests/kernel/timer/timer_api: add one tick of slack in
    interval_check() to absorb the round-up.
  - tests/kernel/context: widen idle-timer slop by one tick.
  - tests/kernel/workq/work: express the busy-wait margin in ticks
    in the "running cancel" tests.
  - tests/kernel/threads/no-multithreading: on tickful kernels the
    pending IRQ delivered after irq_unlock()/k_cpu_idle() only
    announces one tick; wait one extra tick or loop idling until
    the timer callback runs.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-04 21:55:33 +02:00
Nicolas Pitre
2e2202af61 kernel: timeout: make z_add_timeout round-up conditional on announce
z_add_timeout() has always added one tick to the incoming tick count,
as a conservative round-up so that a request issued partway through a
tick still waits for "at least N full ticks" before the fire. This
round-up is correct in the general case, but *wrong* when the call
happens from within sys_clock_announce_locked() -- i.e. from a timer
expiration callback running at the tick-processing boundary where
elapsed() already returns 0. In that context there is no fractional
tick to compensate for, and the round-up simply makes every scheduled
timeout one tick late.

Two in-tree callers were already compensating for this caller-side:

* The k_timer periodic reschedule path in z_timer_expiration_handler
  always runs from inside sys_clock_announce_locked() and subtracted
  1 from the period before calling z_add_timeout(). Under
  CONFIG_TIMEOUT_64BIT, the same path additionally added +1 inside
  K_TIMEOUT_ABS_TICKS() to undo a related round-down.

* z_time_slice_reset() armed the slice timer with K_TICKS(slice_size
  - 1) so the resulting fire would land at exactly slice_size ticks.
  This one is reachable from both thread context (the +1 cancels the
  -1) and from announce context via update_cache() during a
  ready-thread wakeup (the +1 isn't applied, leaving the slice short
  by one tick). The latter is what actually trips
  tests/kernel/tickless/tickless_concept on every platform once the
  conditional below lands without dropping these workarounds.

All three are symptoms of the same root cause.

Handle it at the source: make the +1 conditional on announce_remaining
== 0. When scheduling from the timer ISR, we are already at a tick
boundary by construction, so no round-up is needed and periodic timers
now reschedule at exact period intervals without any caller-side
compensation. Drop the -1 in z_timer_expiration_handler's period
path, the +1 in its 64-bit absolute-reschedule companion, and the -1
in z_time_slice_reset(), since all three existed solely to paper over
this mismatch.

This change only affects timeouts scheduled from announce context
(periodic k_timers rescheduling themselves, callbacks starting new
timers, slice-timer rearm during ready-thread wakeup). All other
callers -- k_sleep(), z_abort_timeout(), initial k_timer_start() from
a thread, k_sched_time_slice_set() -- continue to use the +1 round-up.
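
The effect on periodic timers can be illustrated with a short model. It is a sketch under assumptions, not z_add_timeout(): the model_* functions are hypothetical, and in_announce stands for "called from inside sys_clock_announce_locked()".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ticks actually queued for a request of 'requested' ticks. */
static int64_t model_queued_ticks(int64_t requested, bool in_announce)
{
	assert(requested >= 0);
	/* At a tick edge there is no fractional tick to round up for. */
	return in_announce ? requested : requested + 1;
}

/* Tick of the k-th reschedule when every reschedule runs in announce. */
static int64_t model_periodic_fire_tick(int64_t first_fire, int64_t period,
					int k)
{
	int64_t tick = first_fire;

	for (int i = 0; i < k; i++) {
		tick += model_queued_ticks(period, true);
	}
	return tick;
}
```

A timer whose first expiry lands on tick 11 with a period of 10 then fires on ticks 21, 31, 41 and so on, with no caller-side "-1" needed to keep the stride exact.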

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-04 21:55:33 +02:00
Nicolas Pitre
d157b3da19 kernel: timeout: keep announce_remaining stable across same-tick group
When sys_clock_announce_locked() processes a tick that has multiple
timeouts queued for it, the timeout queue stores the second and
subsequent ones with dticks == 0 (relative to the first). The original
loop fired each in turn and decremented announce_remaining at the
bottom of every iteration:

    announce_remaining -= dt;

For the first timeout in a same-tick group, dt is the cumulative tick
delta that brought us to that tick. After that subtraction
announce_remaining can drop to zero, even though there are still
same-tick callbacks queued for the loop to fire. Each subsequent
same-tick callback then runs while announce_remaining == 0, which
breaks two invariants the rest of the kernel relies on:

* The SMP early-return at the top of sys_clock_announce_locked():

      if (IS_ENABLED(CONFIG_SMP) && (announce_remaining != 0)) {
          announce_remaining += ticks;
          k_spin_unlock(&timeout_lock, key);
          return;
      }

  is meant to detect that another CPU is already inside the loop and
  fold the new ticks into the ongoing announce. The lock is released
  around each callback, so during a same-tick callback another CPU
  can grab the lock, see announce_remaining == 0, miss the early
  return, set announce_remaining = ticks of its own, and start
  walking the queue in parallel with the original announcer -- the
  exact race the early return is supposed to prevent.

* The elapsed() helper:

      return announce_remaining == 0 ? sys_clock_elapsed() : 0U;

  is meant to return 0 for any z_add_timeout() / z_abort_timeout()
  call that happens from inside a tick-processing callback, so that
  timeouts scheduled from such a callback are anchored to the
  currently-firing tick rather than to a fresh sys_clock_elapsed()
  reading. With announce_remaining == 0 mid-loop, two callbacks on
  the same tick observe inconsistent semantics: the first one (that
  saw announce_remaining > 0) gets dticks anchored to the firing
  tick, while subsequent ones (seeing 0) get dticks computed against
  a fresh elapsed() reading and end up off by one tick. Two periodic
  timers that happen to fire on the same tick will therefore
  permanently drift apart by one tick going forward.

Restructure the loop so the announce_remaining decrement happens once
per tick rather than once per timeout: the outer while drives forward
across distinct ticks, and an inner do-while drains all timeouts
queued on the current tick before announce_remaining is updated.
announce_remaining now stays at its pre-tick value for the entire
same-tick group, which both the SMP early return and elapsed()
correctly observe as non-zero.

remove_timeout()'s dticks propagation is also unnecessary in this
loop because curr_tick is advanced by t->dticks before the timeout
is unlinked, which keeps the next item's stored dticks valid relative
to the new curr_tick. sys_dlist_remove() on its own is sufficient.

Loop structure suggested by Peter Mitsis.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-04 21:55:33 +02:00
Nicolas Pitre
603fa4818e drivers: timer: assert sys_clock lock held where required
Add sys_clock_is_locked(), the analog of z_spin_is_locked() for the
timer lock exposed via sys_clock_lock(). Use it to assert lock
ownership in sys_clock_set_timeout() and sys_clock_elapsed() of the
six timer drivers that were migrated to sys_clock_lock() and
consequently no longer acquire anything internally in those callbacks
(arm_arch_timer, riscv_machine_timer, xtensa_sys_timer, hpet,
apic_tsc, intel_adsp_timer).

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-01 11:18:04 -05:00
Nicolas Pitre
1dc47344d0 kernel: use arch_cpu_irqs_are_enabled() for IRQ-state probes
Replace the lock/test/restore dance used to probe the current IRQ state
with a direct non-modifying read:

  - z_spin_is_locked() (UP path) simply negates
    arch_cpu_irqs_are_enabled().

  - k_can_yield() and z_smp_cpu_mobile() likewise drop their lock/unlock
    pair.

  - arch_spin_relax() asserts IRQs are disabled without the sneaky
    unpaired arch_irq_lock() it used to rely on.

  - tests/arch/arm/arm_no_multithreading: same simplification on a
    probe-only assertion.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-01 11:18:04 -05:00
Nicolas Pitre
01b3821fd9 kernel: spinlock: provide z_spin_is_locked() for UP builds
Extend z_spin_is_locked() to non-SMP configurations so assertions like
the one in z_unpend_all_locked() can validate lock ownership in UP
builds too. In UP a spinlock reduces to an IRQ lock, so the check
samples the current IRQ state via arch_irq_lock() / arch_irq_unlock().

Drop the now-unnecessary CONFIG_SMP guard around the sched spinlock
assertion in z_unpend_all_locked().

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-01 11:18:04 -05:00
Daniel Leung
c91e5e1390 kernel: move atomic_c.c to lib/os
This moves the atomic_c.c from kernel to lib/os as atomic
functions are not exactly kernel features.

This also moves all the atomic kconfigs from kernel to lib/os
as the atomic headers are already under include/zephyr/sys/.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-05-01 11:17:27 -05:00
Daniel Leung
cc61366283 kernel: move errno from kernel to lib/libc/common
errno is not exactly a kernel functionality but more of a C
library feature. So move errno from kernel into lib/libc/common.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-05-01 11:16:31 -05:00
Nicolas Pitre
81ccaca788 kernel: ensure kheap.c is linked for static heap initialization
k_free() bypasses k_heap_free() to avoid scheduler lock involvement,
going directly to sys_heap_free() instead. This means nothing in
kheap.c may have any callers, and since it is in a library linked
without --whole-archive, the linker may discard it entirely.

However, kheap.c contains a SYS_INIT handler that initializes all
statically defined k_heap objects (those created with K_HEAP_DEFINE).
Without it, heaps such as the system heap or those used as thread
resource pools via k_thread_heap_assign() are never initialized:
their internal sys_heap pointer remains NULL, causing a crash on
the first allocation.

This can be reproduced without this commit with e.g.:

  west build -b qemu_cortex_a53 tests/kernel/poll

Force kheap.o into the link by adding a __used reference to
k_heap_init in mempool.c. This is enough to pull in kheap.o and
its SYS_INIT registration. Unused functions from kheap.o are
still garbage collected.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-01 09:44:00 +02:00
Nicolas Pitre
39b23fa582 kernel: bypass k_heap_free() in k_free() to avoid scheduler locking
k_free() now goes directly to sys_heap_free() under heap->lock,
bypassing k_heap_free() and its z_unpend_all() call. This is
symmetric with z_alloc_helper() which already bypasses k_heap_alloc()
to go directly to sys_heap_*().

This avoids any scheduler lock involvement in the k_free() path,
eliminating the recursive _sched_spinlock issue when k_free() is
called from halt_thread() during thread abort with CONFIG_USERSPACE
and CONFIG_DYNAMIC_OBJECTS.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-01 09:44:00 +02:00
Nicolas Pitre
3ce52ac95d Revert "kernel: avoid recursive scheduler lock in k_heap_free path"
This partially reverts commit 9cef0da05c ("kernel: avoid recursive
scheduler lock in k_heap_free path"), keeping only the sched.c changes
(z_unpend_all_locked / z_unpend_all refactoring).

The _sched_locked variants of k_free, k_heap_free, k_msgq_cleanup,
k_stack_cleanup and the sched_locked parameter plumbing through
unref_check were an overcomplicated approach. A simpler fix follows.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2026-05-01 09:44:00 +02:00
Daniel Leung
db7a5e80a4 kernel: move bootargs out of kernel into lib/os.
This moves boot arguments from kernel into the lib/os.
This is not strictly a kernel function so this change provides
a separation between core kernel functionalities and others.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-04-29 06:23:14 -05:00
Daniel Leung
9fe4cc20a2 kernel: move boot banner into lib/os
The boot banner is not exactly a kernel feature. It is more of
an OS feature, so move it into lib/os.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2026-04-29 06:23:14 -05:00
Anas Nashif
26e88cee76 toolchain: iar: suppress Go004 via ALWAYS_INLINE override
IAR emits diagnostic Go004 ("function cannot be inlined") for
every ALWAYS_INLINE function when optimisation is disabled, e.g.
in debug builds.  The previous workaround wrapped each affected
function in per-function preprocessor guard pairs:

  #ifdef IAR_SUPPRESS_ALWAYS_INLINE_WARNING_FLAG
  TOOLCHAIN_DISABLE_WARNING(TOOLCHAIN_WARNING_ALWAYS_INLINE)
  #endif
  static ALWAYS_INLINE void foo(...) { ... }
  #ifdef IAR_SUPPRESS_ALWAYS_INLINE_WARNING_FLAG
  TOOLCHAIN_ENABLE_WARNING(TOOLCHAIN_WARNING_ALWAYS_INLINE)
  #endif

This pattern is highly intrusive, scatters toolchain-specific
knowledge across generic source files, and requires a guard pair
every time a new ALWAYS_INLINE function is added for IAR.

Replace it with a single override of ALWAYS_INLINE inside
iccarm.h, using the C99 _Pragma operator to embed the diagnostic
suppression in the macro itself.

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-29 10:00:10 +02:00
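
The macro-level override described in 26e88cee76 might look roughly like the sketch below. The commit message does not show the actual iccarm.h definition, so the pragma spelling "diag_suppress=Go004" and the attribute syntax in the IAR branch are assumptions, and clamp_to_byte() is a hypothetical helper used only for illustration. The non-IAR branches exist so the sketch also compiles with GCC or Clang.

```c
/*
 * Sketch only: a single ALWAYS_INLINE definition that carries its own
 * diagnostic suppression, so call sites need no per-function guards.
 */
#if defined(__ICCARM__)
/* Assumed IAR form: _Pragma embeds the suppression in the macro. */
#define ALWAYS_INLINE \
	_Pragma("diag_suppress=Go004") inline __attribute__((always_inline))
#elif defined(__GNUC__)
/* GCC and Clang: no suppression needed. */
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
/* Fallback for other compilers: a plain inline hint. */
#define ALWAYS_INLINE inline
#endif

/* Hypothetical helper; it is declared exactly as before the override. */
static ALWAYS_INLINE int clamp_to_byte(int value)
{
	if (value < 0) {
		return 0;
	}
	if (value > 255) {
		return 255;
	}
	return value;
}
```

Because _Pragma is an operator rather than a preprocessor directive, it can appear inside a macro expansion, which is what lets the suppression live in one header instead of around every function.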
Jason Yu
ad827a78ca lib: libc: iar: Fix build error when IAR_LIBC enabled.
Export the resolved K_HEAP_MEM_POOL_SIZE value to the
linker generator and evaluate it while generating the
IAR command file.

Fixes: #107234

Signed-off-by: Jason Yu <zejiang.yu@nxp.com>
2026-04-25 08:21:27 +02:00
Anas Nashif
7edd8834f6 kernel: sched.c: remove useless return on void function
Remove a useless return statement from a void function.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
ffea3d0062 kernel: sched: extract thread CPU-usage tracking to usage.h
Move thread CPU usage measurement helpers from ksched.h into a new
kernel/include/usage.h header.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
d94fbad890 kernel: sched: rename z_reset_time_slice() to z_time_slice_reset()
Align the function name with the z_<subsystem>_<verb> convention
used elsewhere in the kernel.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
9b1c89a601 kernel: move gen_offset.h to arch
gen_offset.h is an architecture-specific header, not a kernel one.
Move it under the arch tree where it belongs.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
d24cfd96f5 kernel: sched: move public scheduler API to scheduler.c/scheduler.h
Migrate scheduler API implementations (k_sched_lock/unlock,
z_reschedule, z_yield_current, etc.) and their private declarations
from ksched.h/sched.c into scheduler.c and scheduler.h.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
05cbc7c98a kernel: sched: extract timeslice declarations to timeslicing.h
Move time-slice related declarations from ksched.h into the
dedicated kernel/include/timeslicing.h header.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
28f157beae kernel: sched: move core schedule/deschedule functions to scheduler.c
Migrate z_add_thread_to_ready_q(), z_remove_thread_from_ready_q(),
and related helpers from sched.c to scheduler.c.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
bde2ba8901 kernel: sched: simplify z_sched_init using run_q.h helpers
Move z_sched_init to scheduler.c and simplify the implementation by
getting rid of the single-use init_ready_q.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
78993d7020 kernel: sched: reorder z_unready_thread before its callers in sched.c
Move z_unready_thread next to related functions.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
cf3f8331e5 kernel: sched: group z_ready_thread and z_unready_thread in sched.c
Reorder so that z_ready_thread and z_unready_thread are adjacent,
improving code locality for related queue operations.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
8e920c310f kernel: sched: remove z_requeue_current() indirection
Inline z_requeue_current() into its only call site in kswap.h and
remove the wrapper function.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
a564d82a04 kernel: sched: extract run-queue helpers to run_q.h
Move run-queue management functions (add/remove/peek thread,
choose_next_thread) from sched.c into the new
kernel/include/run_q.h header.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
4b525273bf kernel: sched: simplify thread_runq()
Simplify code and make it more readable.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
fc11a9166e kernel: sched: extract meta-IRQ handling to metairq.h
Move meta-IRQ (highest-priority cooperative queue) scheduling
functions from sched.c into a new kernel/include/metairq.h header
to reduce sched.c size and group related logic.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00
Anas Nashif
2ea5924943 kernel: k_yield: move code to thread.c
Move k_yield() from sched.c to thread.c alongside other thread
lifecycle calls.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2026-04-24 15:39:20 -04:00