Commit graph

2656 commits

Author SHA1 Message Date
Kumar Gala
c778eb2a56 smp: Move arrays to use CONFIG_MP_MAX_NUM_CPUS
Move to use CONFIG_MP_MAX_NUM_CPUS for array size declarations instead
of CONFIG_MP_NUM_CPUS.
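A hedged illustration of the pattern (the array name is hypothetical):

```c
#include <zephyr/kernel.h>

/* Before: sized by the symbol being phased out. */
/* static struct k_thread per_cpu_threads[CONFIG_MP_NUM_CPUS]; */

/* After: sized by the build-time maximum, which stays valid even
 * when the actual CPU count is only known at runtime.
 */
static struct k_thread per_cpu_threads[CONFIG_MP_MAX_NUM_CPUS];
```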

Signed-off-by: Kumar Gala <kumar.gala@intel.com>
2022-10-17 14:40:12 +09:00
Kumar Gala
2530ba5750 arch: smp: Allow for number of cpus to be determined at runtime
Introduce a Kconfig (MP_MAX_NUM_CPUS) and an API arch_num_cpus() to
allow for systems that might determine the number of CPUs available to
Zephyr at runtime.

CONFIG_MP_MAX_NUM_CPUS is intended to be used for any array initialization
and similar constructs that need to occur at build time.  For most systems
arch_num_cpus() will just report the value of CONFIG_MP_MAX_NUM_CPUS.

The intent is to phase out CONFIG_MP_NUM_CPUS.
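A minimal sketch of the relationship described above; the default implementation shown is an assumption, not the literal patch:

```c
#include <zephyr/kernel.h>

/* For most systems the runtime query simply reports the build-time
 * maximum; only platforms that discover CPUs at runtime need a
 * real implementation.
 */
unsigned int arch_num_cpus(void)
{
	return CONFIG_MP_MAX_NUM_CPUS;
}
```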

Signed-off-by: Kumar Gala <kumar.gala@intel.com>
2022-10-13 16:02:19 +09:00
Francois Ramu
e01bee5bcc kernel: idle: fix -Werror=misleading-indentation
Warnings are treated as errors when building:
error: this 'for' clause does not guard...
[-Werror=misleading-indentation]
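The warning fires on code shaped like the following (illustrative):

```c
int sum_and_count(const int *data, int n, int *count)
{
	int sum = 0;

	/* Warns: the indentation suggests both statements are
	 * guarded by the 'for' clause, but only the first one is:
	 *
	 *   for (int i = 0; i < n; i++)
	 *           sum += data[i];
	 *           (*count)++;
	 *
	 * Braces make the intended scope explicit:
	 */
	for (int i = 0; i < n; i++) {
		sum += data[i];
	}
	(*count)++;

	return sum;
}
```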

Signed-off-by: Francois Ramu <francois.ramu@st.com>
2022-10-12 18:43:15 +02:00
Gerard Marull-Paretas
495245a971 init: remove _SYS_INIT_LEVEL* definitions
The _SYS_INIT_LEVEL* definitions were used to indicate the index entry
into the levels array defined in init.c (z_sys_init_run_level). init.c
uses this information internally, so there is no point in exposing this
in a public header. It has been replaced with an enum inside init.c. The
device shell was re-using the same defines to index its own array. This
is a fragile design; the shell needs to be responsible for its own data
indexing. A similar situation happened with some unit tests.
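A sketch of the replacement, as a private enum in init.c (enumerator names assumed; a conditional SMP level, if present, is omitted):

```c
/* Private to init.c: indexes into the SYS_INIT levels table,
 * no longer exposed via _SYS_INIT_LEVEL* in a public header.
 */
enum init_level {
	INIT_LEVEL_EARLY = 0,
	INIT_LEVEL_PRE_KERNEL_1,
	INIT_LEVEL_PRE_KERNEL_2,
	INIT_LEVEL_POST_KERNEL,
	INIT_LEVEL_APPLICATION,
};
```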

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-10-12 18:49:12 +09:00
Gerard Marull-Paretas
831239300f kernel: move z_sys_init_run_level to init.c
The function in charge of calling all init functions was defined in
device.c, had a public prototype and was just used in init.c. Since this
is really an internal function tied to Kernel init code, move it to
init.c and make it static, there's no need to expose it publicly.

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-10-12 18:49:12 +09:00
Gerard Marull-Paretas
e42f58ec94 init: s/ARCH/EARLY, call it just before arch kernel init
The `ARCH` init level was added to solve a specific problem: calling init
code (SYS_INIT/devices) before `z_cstart` on the `intel_adsp` platform.
The documentation claims it runs before `z_cstart`, but this is only
true if the SoC/arch takes care of calling:

```c
z_sys_init_run_level(_SYS_INIT_LEVEL_ARCH);
```

which is only true for `intel_adsp` nowadays. So in practice, we now
have a platform-specific init level. This patch proposes to do things in
a slightly different way. First, the level name is renamed to `EARLY`, to
emphasize it runs in the early stage of the boot process. Then, it is
handled by the Kernel (inside `z_cstart()` before calling
`arch_kernel_init()`). This means that any platform can now use this
level. For `intel_adsp`, there should be no changes, other than
`gcov_static_init()` will be called earlier (I assume this will allow
obtaining coverage for code called in EARLY?).
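A simplified sketch of the resulting boot ordering (details of z_cstart() elided; the level enumerator name is assumed):

```c
/* kernel/init.c (simplified sketch) */
FUNC_NORETURN void z_cstart(void)
{
	/* ... */

	/* EARLY now runs for every platform, handled by the kernel
	 * itself rather than by SoC-specific code.
	 */
	z_sys_init_run_level(INIT_LEVEL_EARLY);

	arch_kernel_init();

	/* ... rest of kernel initialization, never returns ... */
}
```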

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-10-12 17:16:27 +09:00
Andy Ross
c32f376e99 kernel/sched: Fix SMP race on pend
For historical reasons[1] suspending threads would release the
scheduler lock between pend() (which places the current thread onto a
wait queue) and z_swap() (which effects the context switch).  This
process happens with the caller's lock held, so local interrupts are
masked.  But on SMP this opens a tiny race where another CPU could
grab the pended thread and switch to it while we were still executing
on its stack!

Fix this by elevating the "lock swap" code that already exists in the
(portable/switch-based) z_swap() code one level so that it happens in
z_pend_curr() also.  Now we hold the scheduler lock between pend and
the final context switch.

Note that this technique can't work for the older z_swap_irqlock()
implementation, which exists to vestigially support a few bits of arch
code (mostly direct interrupts) that don't work on SMP anyway.
Address with an assert to prevent future misuse.

[1] z_swap() is a historical API implemented in per-arch assembly for
    older architectures (like ARM32!).  It was designed to be called
    with what at the time was a global IRQ lock, so it doesn't
    understand the idea of a separate scheduler lock.  When we finally
    get all architectures on arch_switch() this design can be cleaned up
    quite a bit.
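A simplified sketch of the elevated lock handoff; pend_locked() is an illustrative name for the wait-queue insertion step, not necessarily the literal helper:

```c
int z_pend_curr(struct k_spinlock *lock, k_spinlock_key_t key,
		_wait_q_t *wait_q, k_timeout_t timeout)
{
	/* Queue _current onto wait_q while the scheduler lock is
	 * still held, so no other CPU can pick the thread up...
	 */
	pend_locked(_current, wait_q, timeout);

	/* ...and hand the caller's lock through to the context
	 * switch, which releases it only once the switch is done.
	 */
	return z_swap(lock, key);
}
```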

Signed-off-by: Andy Ross <andyross@google.com>
2022-10-11 12:16:38 -04:00
Gerard Marull-Paretas
27845886a1 kernel: device: add missing kobject.h include
The module references a function declared in kobject.h.

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-10-11 18:05:17 +02:00
Anas Nashif
e8395351e6 kernel: init: introduce a new init level: ARCH
We have cases where some devices need to be initialized very early, before
z_cstart is called, e.g. to set up a very early console or to set up
memory. Traditionally this would be hardcoded as part of the SoC layer,
without using the device model or the init levels.

This patch adds a new level ARCH, which will be called in early
architecture code and before we jump to the kernel code.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2022-10-11 08:28:25 -04:00
Jay Shoen
b4734d5397 kernel: kheap: fix k_heap_aligned_alloc handling of K_FOREVER
k_heap_aligned_alloc was not handling the K_FOREVER timeout
correctly due to an unsigned return value. Added explicit
K_FOREVER handling of the end time.

Fixes #50611.
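A sketch of the corrected shape, assuming the end time is derived from the timeout (helper usage approximate, not the literal patch):

```c
void *k_heap_aligned_alloc(struct k_heap *h, size_t align, size_t bytes,
			   k_timeout_t timeout)
{
	/* K_FOREVER must be special-cased: converting it through
	 * unsigned end-time arithmetic yields a bogus deadline.
	 */
	uint64_t end = K_TIMEOUT_EQ(timeout, K_FOREVER)
		       ? UINT64_MAX
		       : sys_clock_timeout_end_calc(timeout);
	void *ret = NULL;

	/* ... retry the allocation, blocking until 'end' passes ... */
	return ret;
}
```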

Signed-off-by: Jay Shoen <jay.shoen@perceive.io>
2022-09-29 10:39:12 -05:00
Nicolas Pitre
1f362a81f1 riscv: fix crash resulting from touching the initial stack's guard area
The interrupt stack is used as the system stack during kernel
initialization while IRQs are not yet enabled. The sp register is
set to z_interrupt_stacks + CONFIG_ISR_STACK_SIZE.

CONFIG_ISR_STACK_SIZE only represents the desired usable stack size.
This does not take into account the added guard area. The result is a
stack whose pointer is much closer to the trigger zone than expected when
CONFIG_PMP_STACK_GUARD=y, and the SMP configuration in particular pushes
it over the edge during many CI test cases.

Worse: during early init we're not quite ready to handle exceptions
yet and complete havoc ensues with no meaningful debugging output.

Make sure the early assembly code locates the actual top of the stack
by generating a constant with its true size.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2022-09-28 07:53:56 +00:00
Tom Burdick
9c0bf4b071 kernel: Obtain current cpu inside of locks for thread usage
Obtaining the current CPU outside of the spin locks on SMP would
result in an assert failing on __ASSERT(!z_smp_mobile()),
which makes sense, as the current CPU may change.
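The enforced pattern, roughly (the lock and the accumulated field are illustrative assumptions):

```c
static struct k_spinlock usage_lock;

void sched_thread_usage_update(uint32_t cycles)
{
	k_spinlock_key_t key = k_spin_lock(&usage_lock);

	/* Safe only here: with the lock held (interrupts masked),
	 * the executing CPU cannot change beneath us.
	 */
	struct _cpu *cpu = arch_curr_cpu();

	cpu->usage0 += cycles;  /* 'usage0' is an assumed field name */

	k_spin_unlock(&usage_lock, key);
}
```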

Signed-off-by: Tom Burdick <thomas.burdick@intel.com>
2022-09-26 07:55:33 +00:00
Andy Ross
0ca7150f90 kernel/idle: Fix !SCHED_IPI_SUPPORTED
The requirement for k_yield() to handle "yielding" in the idle thread
was removed a while back, but it missed a spot where we'd try to yield
in the fallback loop on bringup platforms that lack an IPI.  This now
crashes, because k_yield() now unconditionally tries to reschedule the
current thread, which doesn't work for idle threads that don't live in
the run queue.

Just make it a busy loop calling swap(), even simpler.

Fixes #50119

Signed-off-by: Andy Ross <andyross@google.com>
2022-09-19 09:19:02 +02:00
Kai Vehmanen
e81ccef613 kernel/sched: fix condition for CPU mask set
When building with CONFIG_SCHED_CPU_MASK_PIN_ONLY=y, the CPU mask
is fixed and cannot be changed while the thread is running.

The current code asserts if thread state is anything but PREPARED.

We do however have interfaces like k_work_queue_start() where a thread is
started as part of the queue start. To allow the user to set the pinned CPU
for the work queue thread, it needs to be possible to suspend the
thread, set the mask, and then call k_thread_resume(). This seems to be
a valid sequence, so relax the assert check to reflect this.
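The sequence this relaxation permits, as a usage sketch:

```c
K_THREAD_STACK_DEFINE(wq_stack, 1024);
static struct k_work_q wq;

void start_pinned_workqueue(void)
{
	k_work_queue_start(&wq, wq_stack, K_THREAD_STACK_SIZEOF(wq_stack),
			   K_PRIO_COOP(4), NULL);

	/* With CONFIG_SCHED_CPU_MASK_PIN_ONLY=y the mask may not be
	 * changed on a running thread, so suspend around the update.
	 */
	k_thread_suspend(&wq.thread);
	k_thread_cpu_mask_clear(&wq.thread);
	k_thread_cpu_mask_enable(&wq.thread, 1);   /* pin to CPU 1 */
	k_thread_resume(&wq.thread);
}
```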

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
2022-09-09 16:13:35 -04:00
Anas Nashif
6a9540a773 tracing: ctf: add timer support
Add k_timer tracing to CTF and other formats.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2022-08-31 16:04:01 -04:00
Jeremy Herbert
379ca18a93 kernel: allow k_poll to wait on pipes
k_poll does not currently allow polling on pipes. This adds support
for doing so on buffered pipes.
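Illustrative usage; the exact poll event type name is an assumption, not confirmed by the commit text:

```c
K_PIPE_DEFINE(my_pipe, 64, 4);

void wait_for_pipe_data(void)
{
	struct k_poll_event ev = K_POLL_EVENT_INITIALIZER(
		K_POLL_TYPE_PIPE_DATA_AVAILABLE,   /* name assumed */
		K_POLL_MODE_NOTIFY_ONLY,
		&my_pipe);

	(void)k_poll(&ev, 1, K_FOREVER);
	/* ... data can now be read from my_pipe ... */
}
```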

Signed-off-by: Jeremy Herbert <jeremy.006@gmail.com>
2022-08-24 17:49:20 +00:00
Carlo Caione
4806e1087e cache: Fix cache API calling from userspace
When a cache API function is called from userspace, this results in an
OOPS (bad syscall error) on ARM64. This is due to at least two
different factors:
different factors:

- the location of the cache handlers is preventing the linker from
  actually finding the handlers
- specifically for ARM64 and ARC some cache handling functions are not
  implemented (when userspace is not used the compiler simply optimizes
  out these calls)

Fix the problem by:

- moving the userspace cache handlers to their logical and proper
  location (in the drivers directory)
- adding the missing handlers for ARM64 and ARC

Signed-off-by: Carlo Caione <ccaione@baylibre.com>
2022-08-23 10:14:17 +02:00
Gerard Marull-Paretas
a202341958 devices: constify device pointers initialized at compile time
Many device pointers are initialized at compile time and never changed. This
means that the device pointer can be constified (made immutable).

Automated using:

```
perl -i -pe 's/const struct device \*(?!const)(.*)= DEVICE/const struct device *const $1= DEVICE/g' **/*.c
```
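The effect at a call site looks like this (node label hypothetical):

```c
/* before: const struct device *uart = DEVICE_DT_GET(DT_NODELABEL(uart0)); */
const struct device *const uart = DEVICE_DT_GET(DT_NODELABEL(uart0));
```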

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-08-22 17:08:26 +02:00
Andy Ross
5722da71ce kernel: Skip bss clear on native_posix
There's no point in doing this when the host OS clears all memory at
mapping time.  And as it turns out, the __bss_end symbol it was
relying on actually comes from the host toolchain's linker, not our
own linker scripts (making it semi-dangerous to rely on).  And it's
not present in clang/lld output anyway.

Signed-off-by: Andy Ross <andyross@google.com>
2022-08-19 08:30:01 +02:00
Peter Mitsis
f86027ffb7 kernel: pipes: rewrite pipes implementation
This new implementation of pipes has a number of advantages over the
previous one.
  1. The schedule locking is eliminated both making it safer for SMP
     and allowing for pipes to be used from ISR context.
  2. The code used to be structured to have separate code for copying
     to/from a waiting thread's buffer and the pipe buffer. This had
     unnecessary duplication that has been replaced with a simpler
     scatter-gather copy model.
  3. The manner in which the "working list" is generated has also been
     simplified. It no longer tries to use the thread's queuing node.
     Instead, the k_pipe_desc structure (whose instances are part of
     the k_thread structure) has been extended to contain
     additional fields including a node for use with a linked list. As
     this impacts the k_thread structure, pipes are now configurable
     in the kernel via CONFIG_PIPES.

Fixes #47061

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2022-08-17 19:31:25 +02:00
Qi Yang
89c4a074dc kernel: mutex: fix races when lock timeout
Say threadA holds a mutex and threadB tries
to lock it with a timeout. A race would occur
if threadA unlocks that mutex after threadB
got unpended by sys_clock and before it gets
scheduled and calls k_spin_lock.

This patch fixes the issue by checking the
mutex's status again after the k_spin_lock call.

Fixes #48056
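A simplified sketch of the re-check; lock placement and control flow are approximations of the real k_mutex_lock() internals:

```c
static struct k_spinlock lock;

static int lock_after_timeout(struct k_mutex *mutex)
{
	k_spinlock_key_t key = k_spin_lock(&lock);

	/* The owner may have released the mutex between our unpend
	 * and this point; check the state again under the lock.
	 */
	if (mutex->owner == NULL) {
		mutex->owner = _current;
		mutex->lock_count = 1;
		k_spin_unlock(&lock, key);
		return 0;
	}

	k_spin_unlock(&lock, key);
	return -EAGAIN;
}
```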

Signed-off-by: Qi Yang <qi.yang@cmind-semi.com>
2022-08-12 17:40:20 +02:00
Hu Zhenyu
57487622f5 kernel: Init the base.slice_ticks for dummy_thread
Fixes #46324
Set dummy_thread->base.slice_ticks to 0 when
CONFIG_TIMESLICE_PER_THREAD is set, to avoid
_current_cpu->slice_ticks being a large number.
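Essentially the shape of the fix, in the dummy-thread setup path:

```c
#ifdef CONFIG_TIMESLICE_PER_THREAD
	dummy_thread->base.slice_ticks = 0;
#endif
```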

Signed-off-by: Hu Zhenyu <zhenyu.hu@intel.com>
2022-08-04 19:44:24 -04:00
Peter Mitsis
71ef669ea4 kernel: Fixes sys_clock_tick_get()
Fixes an issue in sys_clock_tick_get() that could lead to drift in
a k_timer handler. The handler is invoked in the timer ISR as a
callback in sys_clock_announce().
  1. The handler invokes k_uptime_ticks().
  2. k_uptime_ticks() invokes sys_clock_tick_get().
  3. sys_clock_tick_get() must call elapsed() and not
     sys_clock_elapsed() as we do not want to count any
     unannounced ticks that may have elapsed while
     processing the timer ISR.

Fixes #46378

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2022-08-04 05:32:11 -04:00
Peter Mitsis
3e2f30a7ef kernel: fix race condition in sys_clock_announce()
Updates sys_clock_announce() such that the <announce_remaining> update
calculation is done after the callback. This prevents another core from
entering the timeout processing loop before the first core leaves it.
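A simplified sketch of the reordered loop body (helper names follow the kernel's timeout code, but this is not the literal patch):

```c
while (first() != NULL && first()->dticks <= announce_remaining) {
	struct _timeout *t = first();
	int dt = t->dticks;

	remove_timeout(t);
	t->fn(t);

	/* Subtract only *after* the callback: another core spinning
	 * on announce_remaining cannot enter the processing loop
	 * while this core is still inside it.
	 */
	announce_remaining -= dt;
}
```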

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2022-08-03 17:43:04 -04:00
Anas Nashif
13714a6742 kernel: clock: fix SYS_CLOCK_EXISTS help
SYS_CLOCK_EXISTS help was describing the 'do not exist' case, which is
confusing.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2022-08-02 13:57:17 -04:00
Andrew Jackson
e183671808 kernel: Add k_event_set_masked primitive
There is no easy way to clear event bits without
the potential for a race to exist between producer(s)
and consumer(s). The result of this race is that events
can be lost through the various resetting mechanisms
available (flag to k_event_wait(), or k_event_set()).

Add k_event_set_masked() which permits bits to be set or cleared.
This allows consumers to clear just the bits that they have read
without (accidentally) discarding any new bits.

Update unit tests to verify the functionality.

Partly Fixes #46117.
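Consumer-side usage of the new primitive, as a sketch; the signature k_event_set_masked(event, events, events_mask) is assumed from the description:

```c
K_EVENT_DEFINE(evt);

void consume_some_events(void)
{
	uint32_t got = k_event_wait(&evt, 0x0F, false, K_FOREVER);

	/* Clear exactly the bits we consumed; bits a producer set
	 * in the meantime are left intact.
	 */
	k_event_set_masked(&evt, 0, got);
}
```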

Signed-off-by: Andrew Jackson <andrew.jackson@amd.com>
2022-07-25 15:24:32 -04:00
Andrew Jackson
e7e827a0d2 kernel: Use mask rather than boolean to update events
Although there is nothing wrong with the existing code,
it doesn't permit individual bits to be set (or cleared).
This makes further changes slightly awkward.

Use a mask to restrict the bits set in an event.

Signed-off-by: Andrew Jackson <andrew.jackson@amd.com>
2022-07-25 15:24:32 -04:00
Simon Hein
02cfbfea51 kernel: comply to coding guidelines MISRA C:2012 Rule 14.4
MISRA C:2012 Rule 14.4 (The controlling expression of an if statement
and the controlling expression of an iteration-statement shall have
essentially Boolean type.)

Use `bool' instead of `int' to represent Boolean values.
Use `do { ... } while (false)' instead of `do { ... } while (0)'.
Use comparisons with zero instead of implicitly testing integers.

This commit is a subset of the original commit:
5d02614e34a86b549c7707d3d9f0984bc3a5f22a
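The flavor of rewrite involved, as an illustration:

```c
#include <stdbool.h>

void process(unsigned int flags)
{
	/* before: if (flags) { ... } */
	if (flags != 0U) {
		/* ... essentially Boolean controlling expression ... */
	}

	/* before: do { ... } while (0); */
	do {
		/* ... */
	} while (false);
}
```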

Signed-off-by: Simon Hein <SHein@baumer.com>
2022-07-21 06:16:16 -04:00
Johann Fischer
3c971307dc arch/kernel/soc/samples: use unsigned int for irq_lock()
irq_lock() returns an unsigned integer key.
Generated by spatch using semantic patch
scripts/coccinelle/irq_lock.cocci
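The resulting pattern at call sites:

```c
void critical_work(void)
{
	unsigned int key = irq_lock();   /* was: int key = irq_lock(); */

	/* ... critical section ... */

	irq_unlock(key);
}
```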

Signed-off-by: Johann Fischer <johann.fischer@nordicsemi.no>
2022-07-14 14:37:13 -05:00
Peter Mitsis
1244065abc kernel: Extend slabs memory usage stats
Adds memory usage runtime stats routines that parallel those used
by both the heap and mem_blocks. This helps maintain some level of
consistency across the different memory types.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2022-07-12 13:59:26 +00:00
Anas Nashif
efbadbb677 scripts: move gen_kobject_list.py to scripts/build/gen_kobject_list.py
Move scripts needed by the build system and not designed to be run
individually or standalone into the build subfolder.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2022-07-12 10:03:45 +02:00
Jordan Yates
6f41d52734 kernel: switch to SYS_INIT_NAMED
Update the two locations that use two `SYS_INIT` macros with the same
initialisation functions to use `SYS_INIT_NAMED`.

Signed-off-by: Jordan Yates <jordan.yates@data61.csiro.au>
2022-07-06 10:44:35 +02:00
Enjia Mai
89a9eab652 drivers: console: add a minimal EFI console driver to support printf
Add a minimal EFI console driver to support printf; this console driver
only supports console output. Without it, printf will not work.

Signed-off-by: Enjia Mai <enjia.mai@intel.com>
2022-07-05 16:52:32 -04:00
Abramo Bagnara
ad8778d019 coding guidelines: comply with MISRA C:2012 Rule 4.1
MISRA C:2012 Rule 4.1 (Octal and hexadecimal escape sequences shall be
terminated.)

Use string literal concatenation to properly terminate hexadecimal
escape sequences.
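For example:

```c
/* "\x12F" is parsed as a single escape (0x12F), not 0x12 followed
 * by 'F'; string literal concatenation terminates the escape:
 */
static const char msg[] = "\x12" "F";
```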

Signed-off-by: Abramo Bagnara <abramo.bagnara@bugseng.com>
Signed-off-by: Simon Hein <SHein@baumer.com>
2022-06-30 19:51:59 -04:00
Lauren Murphy
318e6db239 debug: coredump: add xtensa intel adsp, support toolchains
Adds compatibility with the Intel ADSP GDB from the Zephyr SDK and
from the Cadence toolchain to coredump_gdbserver.py.

Adds CAVS 15-25 (APL) register definitions. Implements
handle_register_single_read_packet to serve ADSP GDB
p packets.

Prevents BSA from changing between stack dump printout
and coredump by taking lock. Observed to be necessary for
accurate results on slower simulated platforms.

Signed-off-by: Lauren Murphy <lauren.murphy@intel.com>
2022-06-23 15:44:45 -04:00
Krzysztof Chruscinski
041f0e5379 all: logging: Remove log_strdup function
Logging v1 has been removed and the log_strdup wrapper function is no
longer needed. Remove the function and its uses in the tree.
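Call sites simply drop the wrapper:

```c
#include <zephyr/logging/log.h>
LOG_MODULE_REGISTER(app);

void report_name(const char *name)
{
	/* before: LOG_INF("name: %s", log_strdup(name)); */
	LOG_INF("name: %s", name);
}
```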

Signed-off-by: Krzysztof Chruscinski <krzysztof.chruscinski@nordicsemi.no>
2022-06-23 13:42:23 +02:00
Stephanos Ioannidis
360f810704 kernel: Migrate to K_KERNEL_PINNED_STACK_ARRAY_DECLARE
This commit updates all deprecated `K_KERNEL_PINNED_STACK_ARRAY_EXTERN`
macro usages to use the `K_KERNEL_PINNED_STACK_ARRAY_DECLARE` macro
instead.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
2022-06-20 10:25:52 +02:00
Gerard Marull-Paretas
afffc1006f kernel: remove redundant <zephyr/zephyr.h> includes
Files including <zephyr/kernel.h> do not have to include
<zephyr/zephyr.h>, a shim to <zephyr/kernel.h>.

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-06-15 09:13:11 +02:00
David Palchak
b4a7f0f2ca linker: ensure global constructors only run once
Rename the symbols used to denote the locations of the global
constructor lists and modify the Zephyr start-up code accordingly.
On POSIX systems this ensures that the native libc init code won't
find any constructors to run before Zephyr loads.

Fixes #39347, #36858

Signed-off-by: David Palchak <palchak@google.com>
2022-06-09 11:33:36 +02:00
Keith Packard
275b40ef25 kernel: Allow non-zephyr toolchains to advertise TLS support
Use a new environment variable,
ZEPHYR_TOOLCHAIN_SUPPORTS_THREAD_LOCAL_STORAGE, to set the value for
TOOLCHAIN_SUPPORTS_THREAD_LOCAL_STORAGE instead of setting it to 'n' for
all non-Zephyr toolchains. In particular, the Debian arm-none-eabi
toolchain has TLS support and with this option, can be used to build
Zephyr with thread local variables.

Signed-off-by: Keith Packard <keithp@keithp.com>
2022-06-05 14:29:12 +02:00
Andy Ross
fb613594c7 kernel/sched: Panic on aborting essential threads
Documentation specifies that aborting/terminating/exiting essential
threads is a system panic condition, but we didn't actually implement
that, and allowed it just as for other threads.  At least one app wants to
exploit this documented behavior as a "watchdog" kind of condition,
and that seems reasonable.  Do what we say we're supposed to do.

This also includes a small fix to a test, which seemed like it was
written to exercise exactly this condition.  Except that it failed to
detect whether or not a system fatal error was actually signaled and
was (incorrectly) indicating "success".  Check that we actually enter
the handler.

Fixes #45545

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2022-05-20 12:34:30 +02:00
Peter Mitsis
976e4087ec kernel: Fix gathering of runtime thread stats
The function k_thread_runtime_stats_all_get() now populates the
current_cycles field in the thread runtime stats structure.

Resets the number of cycles in the CPU's current usage window once
the idle thread is scheduled.

Fixes the average_cycles calculation.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2022-05-13 10:19:53 -05:00
Keith Packard
4fc00cae7a kernel: Allow Zephyr to use libc's internal errno
For a library which already provides a multi-thread aware errno, use
that instead of creating our own internal value.

Signed-off-by: Keith Packard <keithp@keithp.com>
2022-05-12 19:06:48 -04:00
Lucas Dietrich
9a848b3ad4 kernel: workq: Add internal function z_work_submit_to_queue()
This adds the internal function z_work_submit_to_queue(), which
submits the work item to the queue but doesn't force the thread to yield,
unlike the public function k_work_submit_to_queue().

When called from poll.c in the context of k_work_poll events, it ensures
that the thread does not yield in the context of the spinlock of the
object that became available.

Fixes #45267

Signed-off-by: Lucas Dietrich <ld.adecy@gmail.com>
2022-05-10 18:39:51 +02:00
Gerard Marull-Paretas
cffefc818d kernel: migrate includes to <zephyr/...>
In order to bring consistency in-tree, migrate all kernel code to the
new prefix <zephyr/...>. Note that the conversion has been scripted,
refer to zephyrproject-rtos#45388 for more details.

Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
2022-05-09 09:26:20 +02:00
Jordan Yates
1ef647f396 kernel: add k_can_yield helper function
Implements a function that application and driver code can use to check
whether it is valid to yield (or block) in the current context. This
check is required for functions that can feasibly be run from multiple
contexts. The primary intended use case is power management transition
functions, which can be run by application code explicitly or
automatically in the idle thread by system PM.
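An intended usage sketch; power_good() is a hypothetical platform query:

```c
bool power_good(void);   /* hypothetical platform query */

void wait_for_power_good(void)
{
	while (!power_good()) {
		if (k_can_yield()) {
			k_msleep(1);          /* thread context: block */
		} else {
			k_busy_wait(1000);    /* idle thread: spin (usec) */
		}
	}
}
```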

Signed-off-by: Jordan Yates <jordan.yates@data61.csiro.au>
2022-05-06 11:33:10 +02:00
Bradley Bolen
88ba97fea4 arch: arm: aarch32: cortex_a_r: Add shared FPU support
This adds lazy floating point context switching.  On svc/irq entrance,
the VFP is disabled and a pointer to the exception stack frame is saved
away.  If the esf pointer is still valid on exception exit, then no
other context used the VFP so the context is still valid and nothing
needs to be restored.  If the esf pointer is NULL on exception exit,
then some other context used the VFP and the floating point context is
restored from the esf.

The undefined instruction handler is responsible for saving away the
floating point context if needed.  If the handler is in the first
irq/svc context and the current thread uses the VFP, then the float
context needs to be saved.  Also, if the handler is in a nested context
and the previous context was using the VFP, save the float context.

Signed-off-by: Bradley Bolen <bbolen@lexmark.com>
2022-05-05 12:03:27 +09:00
Flavio Ceolin
551038e748 kernel: sched: Change cpu pin only for not executing threads
Do not allow changing the CPU to which a thread is pinned when it is
already being executed. This allows further optimizations on some
platforms with incoherent memory, since we can safely assume that the
thread will run on the same CPU and avoid invalidating / flushing the
cache during context switches.

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2022-05-04 13:46:48 -04:00
Andy Ross
7a59cebf12 kernel/k_timer: Robustify vs. late interrupts
The k_timer utility was written to assume that the kernel timeout
handler would never be delayed by more than a tick, so it can naively
reschedule the next interrupt with a simple delay.

Unfortunately real platforms have glitchy hardware and high tick
rates, and on intel_adsp we're seeing this promise being broken in
some circumstances.

It's probably not a good idea to try to plumb the timer driver
interface up into the IPC layer to do this correction, but thankfully
the existing absolute timeout API provides the tools we need (though
it does require that CONFIG_TIMEOUT_64BIT be enabled).
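A sketch of the correction inside the timer reschedule path (simplified; assumes CONFIG_TIMEOUT_64BIT, and the helper shape is approximate):

```c
static void reschedule_timer(struct k_timer *timer, int64_t *expiry,
			     int64_t period_ticks)
{
	/* Advance an absolute deadline instead of adding a relative
	 * delay to "now", so a late interrupt doesn't accumulate
	 * drift into every subsequent expiration.
	 */
	*expiry += period_ticks;
	z_add_timeout(&timer->timeout, z_timer_expiration_handler,
		      K_TIMEOUT_ABS_TICKS(*expiry));
}
```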

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2022-05-04 09:55:46 -05:00
Andy Ross
b4e9ef0691 kernel/sched: Defer IPI sending to schedule points
The original design intent with arch_sched_ipi() was that
interprocessor interrupts were fast and easily sent, so to reduce
latency the scheduler should notify other CPUs synchronously when
scheduler state changes.

This tends to result in "storms" of IPIs in some use cases, though.
For example, SOF will enumerate over all cores doing a k_sem_give() to
notify a worker thread pinned to each, each call causing a separate
IPI.  Add to that the fact that unlike x86's IO-APIC, the intel_adsp
architecture has targeted/non-broadcast IPIs that need to be repeated
for each core, and suddenly we have an O(N^2) scaling problem in the
number of CPUs.

Instead, batch the "pending" IPIs and send them only at known
scheduling points (end-of-interrupt and swap).  This semantically
matches the locations where application code will "expect" to see
other threads run, so arguably is a better choice anyway.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2022-05-02 10:23:13 -05:00