This function was a little clumsy, taking the scheduler lock,
releasing it, and then calling z_reschedule_unlocked() instead of the
normal locked variant of reschedule. Don't take the lock twice.
Mostly this is a code size and hygiene win. Obviously the sched lock
is not normally a performance path, but I happened to have picked this
API for my own microbenchmark in tests/benchmarks/swap and so noticed
the double-lock while staring at disassembly.
Signed-off-by: Andy Ross <andyross@google.com>
z_reschedule() is the basic kernel entry point for context switch,
wrapping z_swap(), and thence arch_switch(). It's currently defined
as a first class function for entry from other files in the kernel and
elsewhere (e.g. IPC library code).
But in practice it's actually a very thin wrapper without a lot of
logic of its own, and the context switch layers of some of the more
obnoxiously clever architectures are designed to interoperate with the
compiler's own spill/fill logic to avoid double saving. And with a
small z_reschedule() there's not a lot to work with.
Make reschedule() an inlinable static, so the compiler has more
options.
Signed-off-by: Andy Ross <andyross@google.com>
z_get_next_switch_handle() is a clean API, but implementing it as a
(comparatively large) callable function requires significant
entry/exit boilerplate and hides the very common "no switch needed"
early exit condition from the enclosing C code that calls it. (Most
architectures call this from assembly though and don't notice).
Provide an unwrapped version for the specific needs of non-SMP builds.
It's compatible in all other ways.
Slightly ugly, but the gains are significant (like a dozen cycles or
so).
Signed-off-by: Andy Ross <andyross@google.com>
Pick some low hanging fruit on non-SMP code paths:
+ The scheduler spinlock is always taken, but as we're already in an
irq-locked state that's a no-op. But the optimizer can't tell, because
arch_irq_lock() involves an asm block it can't see inside. Elide
the call when possible.
+ The z_swap_next_thread() function evaluates to just a single load of
_kernel.ready_q.cache when !SMP, but wasn't being inlined because of
function location. Move that test up into do_swap() so it's always
inlined.
Signed-off-by: Andy Ross <andyross@google.com>
Integrate the new context layer, allowing it to be selected via the
pre-existing CONFIG_USE_SWITCH. Not a lot of changes, but notable
ones:
+ There was code in the MPU layer to adjust PSP on exception exit at a
stack overflow so that it remained inside the defined stack bounds.
With the new context layer though, exception exit will rewrite the
stack frame in a larger format, and needs PSP to be adjusted to make
room.
+ There was no such treatment in the PSPLIM case (the hardware prevents
the SP from going that low), so I had to add similar code to
validate PSP at exit from fault handling.
+ The various return paths for fault/svc assembly handlers need to
call out to the switch code to do the needed scheduler work. Really
almost all of these can be replaced with C now, only userspace
syscall entry (which has to "return" into the privileged stack)
needs special treatment.
+ There is a gcc bug that prevents the arch_switch() inline assembly
from building when frame pointers are enabled (which they almost
never are on ARM): it disallows you from touching r7 (the thumb
frame pointer) entirely. But it's a context switch, we need to!
Worked around by enforcing -fomit-frame-pointer even in the two
scheduler files that can swap when NO_OPTIMIZATIONS=y.
Signed-off-by: Andy Ross <andyross@google.com>
Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Drivers supporting device deinitialization should not select
CONFIG_DEVICE_DEINIT_SUPPORT. Enabling deinit should be left up to the
application configuration.
Signed-off-by: Henrik Brix Andersen <henrik@brixandersen.dk>
Changes the loop variable type from 'int' to 'uint32_t' in the
create_free_list() routine to match the type of the 'num_blocks'
field. Otherwise, if a very large number of blocks is specified,
the conversion from 'uint32_t' to 'int' could have resulted in
a negative number. The result of this improper conversion would
be an empty free list.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Not only checks if write_ptr is smaller than buffer_end, but also
checks that write_ptr + msg_size is smaller than buffer_end to
avoid overflow when copying data.
Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>
Borrowing Peter Mitsis' rationale in #104283:
If someone passes a 0 block_size, then the buffer size must also be 0.
However, we iterate through the loop below num_blocks times, writing a
pointer to the buffer address. If the buffer is truly zero-sized, then
we are overwriting something else. If it is not truly zero-sized, then
we are creating a corrupted linked list as the pointer never actually
changes. This can cause problems later on when attempting to allocate
a slab because k_mem_slab_alloc() will only ever "allocate" the first
zero-sized block and act as though it was never truly consumed because
of the corrupted linked list.
Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>
The message queue 'buffer_end' field points to the next address AFTER
the end of the buffer. When the buffer goes to the last addressable
byte, the next byte is 0x0. To ensure proper evaluation of the bounds
the __ASSERT_NO_MSG() checks must not use "< buffer_end", but
"<= buffer_end - 1".
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
This adds the ability to de-initialize a memory domain.
This requires support in the architecture layer. One usage of
this is to release the resources associated with the domain.
For example, we can release allocated page tables so they can
go back to the pool of page tables to be allocated later.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When CONFIG_SYSTEM_CLOCK_HW_CYCLES_PER_SEC_RUNTIME_UPDATE is enabled,
the system timer frequency can change at runtime. Some timer drivers
(e.g. Cortex-M SysTick) rescale the cycle counter when the frequency
changes, which can break k_busy_wait() if the frequency changes during
the wait period.
Update k_busy_wait() to handle runtime frequency changes:
- Add busy_wait_us_to_cyc_ceil32() helper to convert microseconds to
cycles with a given frequency (rounds up to avoid returning early)
- Implement a frequency-aware busy wait loop that:
- Samples the frequency before and after reading the cycle counter to
detect concurrent frequency changes
- Rescales the start_cycles reference point when the frequency changes
to keep it in the same scale as the cycle counter
- Recomputes cycles_to_wait with the new frequency to preserve the
requested duration
- Retries sampling if a frequency change is detected mid-read
The original implementation is preserved when
CONFIG_SYSTEM_CLOCK_HW_CYCLES_PER_SEC_RUNTIME_UPDATE is not enabled.
Signed-off-by: Zhaoxiang Jin <Zhaoxiang.Jin_1@nxp.com>
- Move the variable declaration and related code from kernel/timeout.c
to a new kernel/sys_clock_hw_cycles.c file. The motivation is that
both functions are part of the system clock frequency plumbing
(runtime query / update) and don’t naturally fit the responsibilities
of timeout.c, which is otherwise focused on timeout queue management
and tick announcement logic.
- Make sys_clock_hw_cycles_per_sec_runtime_get() (and its
z_impl_sys_clock_hw_cycles_per_sec_runtime_get() implementation)
visible under CONFIG_SYSTEM_CLOCK_HW_CYCLES_PER_SEC_RUNTIME_UPDATE
as well, not only under CONFIG_TIMER_READS_ITS_FREQUENCY_AT_RUNTIME.
This allows callers and time unit conversion helpers to retrieve the
current system timer frequency after runtime clock changes even when
the timer driver does not discover the rate by querying hardware.
Signed-off-by: Zhaoxiang Jin <Zhaoxiang.Jin_1@nxp.com>
Use z_abort_thread_timeout() instead of the lower-level z_abort_timeout().
The thread-flavoured version also has a stub fallback when
CONFIG_SYS_CLOCK_EXISTS=n, removing the need for preprocessor checks.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
When CONFIG_WAITQ_SCALABLE=y, wake up all threads from a post-waitq-walk
callback which is invoked while the scheduler spinlock is still held. This
solves the race condition that was previously worked around via the
`no_wake_in_timeout` flag in k_thread and the `is_timeout` parameter of
z_sched_wake_thread_locked(), both of which can now be dropped.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
Modify z_sched_waitq_walk() to accept an optional callback invoked after
the walk while still holding the scheduler spinlock. This can be used to
perform post-walk operations "atomically". Update all callers to work with
this new function signature.
While at it, create dedicated (private) typedefs for the callbacks and
clean up/improve the routine and callbacks' documentation.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
When CONFIG_WAITQ_SCALABLE=n, the callback invoked by z_sched_waitq_walk()
is allowed to remove the thread provided as argument from the wait queue
(an operation implicitly performed when waking up a thread).
Use this to our advantage when waking threads pending on a k_event by
waking threads as part of the waitq walk callback instead of building
a list of threads to wake and performing the wake outside the callback.
When CONFIG_WAITQ_SCALABLE=n, this allows removing a pointer-sized field
from the thread structure which reduces the overhead of CONFIG_EVENTS=y.
The old implementation (build list in callback and wake outside callback)
is retained and used when CONFIG_WAITQ_SCALABLE=y since we can't modify
the wait queue as part of the walk callback in this situation. This is now
documented above the corresponding field in the k_thread structure.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
z_sched_waitq_walk() used _WAIT_Q_FOR_EACH, a wrapper around the
"unsafe" SYS_DLIST_FOR_EACH_CONTAINER which does not allow detaching
elements from the list during the walk. As a result, attempting to
detach threads from the wait queue as part of the callback provided
to z_sched_waitq_walk() would result in breakage.
Introduce new _WAIT_Q_FOR_EACH_SAFE macro as wrapper around the "safe"
SYS_DLIST_FOR_EACH_CONTAINER_SAFE which allows detaching nodes from
the list during the walk, and use it inside z_sched_waitq_walk().
While at it:
- add documentation on the _WAIT_Q_FOR_EACH macro, including a warning
about detaching elements as part of the loop not being allowed
- add note to documentation of z_sched_waitq_walk() indicating that
the callback can safely remove the thread from wait queue as this
will no longer break the FOR_EACH loop
- add _WAIT_Q_FOR_EACH_SAFE to the list of ForEachMacros in .clang-format
NOTE: this new "safe removal inside callback" behavior is only available
when CONFIG_WAITQ_SCALABLE=n. When the option is 'y', red-black trees are
used instead of doubly-linked lists which prevent mutation of the list
while it is being walked. This limitation is explicitly documented.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
Don't acquire the _sched_spinlock in z_sched_wake_thread(). This allows
calling the function from callbacks which already own the spinlock. The
function is renamed to z_sched_wake_thread_locked() to reflect this new
behavior, and all existing callers are updated to ensure they hold the
_sched_spinlock as is now required.
Signed-off-by: Mathieu Choplain <mathieu.choplain-ext@st.com>
`k_yield()` can't be called when interrupts are disabled; update
`k_can_yield()` to reflect that.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
There is no need to include the k_mutex priority inheritance code
when CONFIG_PRIORITY_CEILING is set to a priority level that is at
or below that of the idle thread.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Use `Z_HEAP_MIN_SIZE_FOR` on the system heap. This fixes allocations
failing when there is only a single small user of the heap defining
a symbol like the following, even when only allocating 16 bytes.
```
config HEAP_MEM_POOL_ADD_SIZE_{X}
int
default 64
```
Signed-off-by: Jordan Yates <jordan@embeint.com>
Embeds both an anonymous union and an anonymous structure within the
k_spinlock structure to ensure that the structure can easily have a
non-zero size.
This new option provides a cleaner way to specify that the
spinlock structure must have a non-zero size. A non-zero size
is necessary when C++ support is enabled, or when a library
or application wants to create an array of spinlocks.
Fixes #59922
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
As per Zephyr coding guideline #59, "operands shall not be of an
inappropriate essential type". This makes sure boolean variables are
initialized with true/false, not 1/0.
Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
Upgrades the thread user_options field from 8 bits to 16 bits to
provide more space for future values.
Also, as the size of this field has changed, the values for the
existing architecture specific thread options have also shifted
from the upper end of the old 8-bit field, to the upper end of
the new 16-bit field.
Fixes #101034
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
1. When the timeout is K_NO_WAIT, the thread should not be added
to the wait queue as that would otherwise cause the thread
to wait until the next tick (which is not a no-wait situation).
2. Threads that were added to the wait queue AND did not receive
a signal before timing out should not lock the supplied mutex.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Some SoCs require kernel code to be placed in RAM, which makes
link-time optimization (LTO) unsuitable for these files.
Disabling LTO allows the affected code to be linked as separate
objects and placed in specific memory regions.
Running kernel code from RAM can improve execution performance,
especially for timing-critical routines or context switch paths.
Signed-off-by: Tim Lin <tim2.lin@ite.corp-partner.google.com>
The k_timer API requires CONFIG_SYS_CLOCK_EXISTS to be enabled,
as timer.c is only compiled when this config is set. Guard the
timer-based k_sleep() implementation and fall back to the previous
busy-wait approach when no system clock exists.
Signed-off-by: Sylvio Alves <sylvio.alves@espressif.com>
Adds board overlays for Intel ADSP platforms to use
CONFIG_LLEXT_TYPE_ELF_RELOCATABLE instead of SHAREDLIB
as xt-clang cannot link shared libs for Xtensa, exports
symbols used by Intel ADSP with Xtensa toolchain, and
adds XTENSA MPU / MMU to "no memory protection" config file.
Signed-off-by: Lauren Murphy <lauren.murphy@intel.com>
Instead of performing a linear search to determine if a given
thread is running on another CPU, or if it is marked as being
preempted by a metaIRQ on any CPU, do this in O(1) time.
On SMP systems, Zephyr already tracks the CPU on which a thread
executes (or last executed). This information is leveraged to
do the search in O(1) time.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The current implementation of k_sleep(), when multi-threading
is disabled, busy waits using k_busy_wait() until the sleep timeout
has expired.
This patch aims to improve power efficiency of k_sleep() for
single-threaded applications by starting a timer (k_timer) and idling
the CPU until the timer interrupt wakes it up, thus avoiding
busy-looping.
Signed-off-by: Emanuele Di Santo <emdi@nordicsemi.no>
In z_vrfy_k_poll, there is a memory access check
K_SYSCALL_MEMORY_WRITE which is wrapped in a spinlock, the same
spinlock used in z_handle_obj_poll_events which is called from
k_sem_give() for example.
The K_SYSCALL_MEMORY_WRITE() macro conditionally calls LOG_ERR()
which may call the UART console, which may call an API like
k_sem_give(). This will cause a deadlock, since the locked spinlock
will be relocked, and a recursive lock if SPINLOCK_VALIDATE and
ASSERTS are enabled: the validation will fail, causing a LOG_ERR,
causing a k_sem_give(), causing a relock... until the stack overflows.
To solve the issue, only protect the copy of events to events_copy
with the spinlock, the content of events is not actually checked, and
bound is not shared, so there is no need to do this validation in a
critical section. The contents of events is shared so that must be
copied in atomically.
Signed-off-by: Bjarki Arge Andreasen <bjarki.andreasen@nordicsemi.no>
The IAR compiler may emit "Error[Go004]: Could not inline function"
when handling functions marked as always_inline or inline=forced,
especially in complex kernel code.
Signed-off-by: Thinh Le Cong <thinh.le.xr@bp.renesas.com>
This patch modifies k_timer_status_sync() to idle the CPU when MT
is disabled, instead of busy-looping. For this purpose, the spinlock
in the MULTITHREADING=n case has been reduced to an irq_lock(),
which works in tandem with k_cpu_atomic_idle() to ensure the atomicity
of enabling the IRQs and idling the CPU.
Signed-off-by: Emanuele Di Santo <emdi@nordicsemi.no>
Instead of directly calling the current thread-specific time slice
handler in z_time_slice(), we must call a saved copy of the handler
that was made when _sched_spinlock was still held. Otherwise there
is a small window of time where another CPU could change the handler
to NULL just before we call it.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
When k_sched_time_slice_set() is called, the current time slice
should not be reset if the current thread is using thread-grained
time slicing. This is to maintain consistency with the already
established idea that thread-grained time slicing takes precedence
over the system-wide time slice size `slice_ticks`.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
This fixes several minor items related to the priority or importance
of checks in determining whether the thread can be time sliced.
A thread that is prevented from running cannot be time sliced
regardless of whether it was configured for thread-grained
time slicing or not. Nor can the idle thread be time sliced.
If the thread is configured for thread-grained time slicing, then
do not bother with the preemptible or priority threshold checks.
This maintains the same behavior, and just optimizes the checks.
If the thread is sliceable, we may as well return the size of the
time slice since we are checking that information anyway. Thus, a
return value of zero (0) means that the thread is not sliceable,
and a value greater than zero (0) means that it is sliceable.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Within z_sched_ipi() there is no need for the thread_is_sliceable()
test as z_time_slice() performs that check. As a result,
thread_is_sliceable() is now only used within timeslicing.c, so the
'static' keyword is applied to it.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Re-instate a z_is_thread_ready() check on the preempted metaIRQ
thread before selecting it as the preferred next thread to
schedule. This code exists because of a corner case where it is
possible for the thread that was recorded as being preempted
by a meta-IRQ thread to be marked as not 'ready to run' when
the meta-IRQ thread(s) complete.
Such a scenario may occur if an interrupt ...
1. suspends the interrupted thread, then
2. readies a meta-IRQ thread, then
3. exits
The resulting reschedule can leave the suspended interrupted
thread recorded as having been preempted by a meta-IRQ thread.
There may be other scenarios too.
Fixes #101296
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
If the thread being aborted or suspended was preempted by a metaIRQ
thread then clear the metairq_preempted record. In the case of
aborting a thread, this prevents a re-used thread from being
mistaken for a preempted thread. Furthermore, it removes the need
to test the recorded thread for readiness in next_up().
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
When a cooperative thread (temporary or otherwise) is preempted by a
metaIRQ thread on SMP, it is no longer re-inserted into the readyQ.
This prevents it from being scheduled by another CPU while the
preempting metaIRQ thread runs.
Fixes #95081
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Adjust the bounds for tracking metairq preemption to include the
case where the number of metairq threads matches the number of
cooperative threads. This is needed because a thread that is
schedule-locked through k_sched_lock() is documented to be treated as a
cooperative thread. This implies that if such a thread is preempted
by a metairq thread that execution control must return to that
thread after the metairq thread finishes its work.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Add tracing support for timer expiry and stop function callbacks,
enabling measurement of callback execution duration and facilitating
debugging of cases where callbacks take longer than expected.
Signed-off-by: Vijay Sharma <vijshar@qti.qualcomm.com>
arch_mem_coherent() is cache-related, so it is better to move it
under cache subsys. It is renamed to sys_cache_is_mem_coherent()
to reflect this change.
The only user of arch_mem_coherent() is Xtensa. However, it is
not an architecture feature. That's why it is moved to the cache
subsys.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>