431 commits

Author SHA1 Message Date
Peter Mitsis
5c36567c56 tests: Add benchmark for IPI performance
Adds tests to better gauge IPI performance on SMP. In each case, one
CPU is used as the source of IPIs while the remaining CPUs are busy
doing "work". Every 30 seconds the benchmark reports on the amount
of "work" done by the busy CPUs and the amount of work done by the
CPU generating the IPIs.

This can be used to ...
 1. Show how enabling IPI optimization affects system performance
 2. Show the cost of spinlock contention as the number of CPUs increases
 3. Measure the relative performance of scheduler changes on SMP.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2025-04-04 21:15:14 +02:00
Nicolas Pitre
ea7a969204 tests: benchmarks: sys_kernel: add k_malloc() test
Useful to evaluate malloc performance changes.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2025-04-01 22:13:04 +02:00
Peter Mitsis
1c5072780f tests: thread_metric: Fix cooperative for SMP
The thread metric cooperative benchmark had a subtle flaw on
SMP enabled systems. The reporting/main thread was running at
a higher priority than the "cooperative" threads. When that
reporting thread woke up from its 30 second sleep, there was a
chance that it would change the ordering of the "cooperative"
threads before the test expected it.

This ordering change is not present on UP systems as the current
thread is always in the ready queue. However, on SMP systems the
current thread is not in the ready queue and is re-added to the
end of the list when it is preempted by a higher priority thread.

To work around this, we make the priority of the main/reporting
thread the same as that of the "cooperative" threads. Thus when
the reporting thread wakes, it is added to the end of the list and
no longer introduces an unexpected schedule point.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2025-03-17 07:05:36 +01:00
Anas Nashif
f29ae72d79 kernel: rename 'dumb' scheduler and simply call it 'simple'
Improve naming of the scheduler and call it what it is: simple. Using
'dumb' for the default scheduler algorithm in Zephyr is a bad idea.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2025-03-15 00:34:58 +01:00
Benjamin Cabé
fc68909d1b tests: latency_measure: fix spelling of "switch"
s/siwtch/switch/

Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
2025-02-21 11:41:46 +00:00
Benjamin Cabé
8d253b78a1 tests: show thread_metric Kconfig menu first
Put the thread_metric Kconfig menu first in the Kconfig "homepage", like
other samples and tests typically do as this makes it easier to quickly
get to the relevant options in menuconfig.

Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
2025-02-14 03:04:35 +01:00
Peter Mitsis
11f69fb686 tests: Fix thread_metric message processing
Changes the type of both the tm_message_sent and tm_message_received
arrays from 'unsigned long' to 'unsigned int'. The test expects those
arrays to be 16 bytes long. This was a problem on 64-bit platforms
as 'unsigned long' is 8 bytes, which made the arrays 32 bytes long.
On both our supported 32-bit and 64-bit platforms, 'unsigned int'
works out to be 4 bytes long, thereby giving us the requisite 16
byte buffer.

Although a case could be made for using 'uint32_t', this was not
chosen simply to keep the structure as close as practical to the
original thread_metric implementation.

Fixes #83864

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2025-02-05 01:12:52 +01:00
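
The size mismatch described in commit 11f69fb686 can be illustrated with
a minimal sketch. It assumes four elements per message (16 bytes divided
by 4-byte elements, as the commit message implies). The macro and
function names are hypothetical and do not come from the thread_metric
sources.

```c
#include <assert.h>
#include <stddef.h>

#define TM_MESSAGE_WORDS 4  /* hypothetical name: elements per message */
#define TM_MESSAGE_BYTES 16 /* buffer size the test expects */

/* Size of the message buffer with the original element type.
 * On LP64 platforms 'unsigned long' is 8 bytes, giving 32 bytes.
 */
static size_t bytes_with_unsigned_long(void)
{
	unsigned long message[TM_MESSAGE_WORDS];

	return sizeof(message);
}

/* Size of the message buffer with the element type used by the fix. */
static size_t bytes_with_unsigned_int(void)
{
	unsigned int message[TM_MESSAGE_WORDS];

	return sizeof(message);
}

/* Returns 1 when a buffer size matches what the test expects, else 0. */
static int message_size_ok(size_t actual)
{
	return actual == TM_MESSAGE_BYTES;
}

/* Holds on platforms where 'unsigned int' is 4 bytes wide. */
static void check_fixed_layout(void)
{
	assert(message_size_ok(bytes_with_unsigned_int()));
}
```

On a 64-bit LP64 build, bytes_with_unsigned_long() would be expected to
report 32, which is the mismatch the commit fixes.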
Nicolas Pitre
99c2057bb6 tests: app_kernel: restore the PIPE_NOBUFF variant
... now that the new pipe implementation supports it.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2025-01-21 19:44:57 +01:00
Måns Ansgariusson
0572f1f098 tests: Update tests to use new k_pipe API
Update tests to use the reworked k_pipe API.

Signed-off-by: Måns Ansgariusson <Mansgariusson@gmail.com>
2025-01-17 19:43:44 +01:00
Dmitrii Golovanov
ae90679f88 tests: benchmarks: thread_metric: Record measurements
Extend 'benchmark.thread_metric' (tests/benchmarks/thread_metric)
test suite to collect benchmark measurements in Twister reports
as recordings parsed from the test's output: time period values
as well as errors.

Additionally, each test is executed until it makes at least 3
measurements to estimate variance.

Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
2025-01-16 22:38:51 +01:00
Dmitrii Golovanov
e63a513a93 tests: Adjust to Twister changes in recording feature
Adjust testcase.yaml files to changes in Twister schema which
now allows multiple recording patterns ('record: regex:').

Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
2025-01-16 22:38:51 +01:00
Nicolas Pitre
46aa6717ff Revert "arch: deprecate _current"
Mostly a revert of commit b1def7145f ("arch: deprecate `_current`").

This commit was part of PR #80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.

The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.

Hence this revert.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2025-01-10 07:49:08 +01:00
Peter Mitsis
60b38d50db tests: latency_measure: Update README.txt
Updates the README.txt with more current sample output.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-12-20 23:51:02 +02:00
Peter Mitsis
b3b94731d5 tests: latency_measure: Remove timeslicing
Removes the timeslicing configuration from the latency
measure benchmark as its numbers are nearly identical
to the default. This makes sense as the benchmark was
not really designed for exercising the timeslicing
feature.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-12-20 23:51:02 +02:00
Dmitrii Golovanov
33a91daf3b tests: benchmarks: wait_queues: Record metrics
Enhance the wait_queue benchmark to output resulting summary
metrics as records, so when it runs with Twister the results
are parsed and saved into twister.json and recording.csv files
for further analysis.
Minor documentation edits and make ClangFormat happy.

Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
2024-12-20 08:33:14 +01:00
Dmitrii Golovanov
422c9d6c7b tests: benchmarks: wait_queues: Turn off verbose output
Turn off verbose output by default as described in the README.

Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
2024-12-20 08:33:14 +01:00
Dmitrii Golovanov
f3358259d6 tests: benchmarks: sched_queues: Record metrics
Enhance the sched_queue benchmark to output resulting summary
metrics as records, so when it runs with Twister the results
are parsed and saved into twister.json and recording.csv files
for further analysis.
Minor documentation edits and make ClangFormat happy.

Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
2024-12-20 08:33:14 +01:00
Dmitrii Golovanov
2fd8abef7f tests: benchmarks: sched_queues: Turn off verbose output
Turn off verbose output by default as described in the
README.

Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
2024-12-20 08:33:14 +01:00
Peter Mitsis
35435928c2 kernel: Decouple sleep from suspend
Sleeping and suspended are now orthogonal states. That is, a thread
may be both sleeping and suspended and the two do not interact. One
repercussion of this is that suspending a thread will no longer
abort its timeout.

Threads are now created in the 'sleeping' state instead of a
'suspended' state. This dovetails nicely with the start delay that
can be given to a newly created thread--it is as though the very
first operation that a thread with a start delay performs is a sleep.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-12-18 18:17:03 +01:00
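
The orthogonality described in commit 35435928c2 can be sketched with two
independent state bits. This is only an illustration: the flag names, the
structure, and the helper functions below are hypothetical and are not
Zephyr's kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state bits; each one is set and cleared independently. */
#define DEMO_SLEEPING  (1u << 0)
#define DEMO_SUSPENDED (1u << 1)

struct demo_thread {
	uint8_t state;
};

/* A new thread starts in the 'sleeping' state, as the commit describes. */
static void demo_create(struct demo_thread *t)
{
	t->state = DEMO_SLEEPING;
}

/* Suspending only touches the suspended bit; the sleep is left alone. */
static void demo_suspend(struct demo_thread *t)
{
	t->state |= DEMO_SUSPENDED;
}

static void demo_resume(struct demo_thread *t)
{
	t->state &= (uint8_t)~DEMO_SUSPENDED;
}

/* Models the sleep timeout expiring; the suspended bit is left alone. */
static void demo_wake(struct demo_thread *t)
{
	t->state &= (uint8_t)~DEMO_SLEEPING;
}

/* A thread may run only when it is neither sleeping nor suspended. */
static bool demo_runnable(const struct demo_thread *t)
{
	assert(t != NULL);
	return (t->state & (DEMO_SLEEPING | DEMO_SUSPENDED)) == 0;
}
```

In this model a thread that is suspended while sleeping stays blocked
after its timeout expires, and becomes runnable only once it is also
resumed.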
Peter Mitsis
e96626944b tests: thread_metric: Disable memory slab ptr validation
Disabling the memory slab pointer validation improves the performance
of the memory allocation sub-test by about 9%.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-12-14 01:03:28 +01:00
Benjamin Cabé
2d60d248e8 tests: benchmarks: increase timeout for sched_queues tests
Some tests can take close to 60 seconds to complete on the m2gl025_miv
platform (it runs at 4 MHz) so increase the timeout to 120 seconds
to be on the safe side.

Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
2024-12-10 15:29:11 +00:00
Chris Friedt
3f60489fae tests: benchmarks: move pthread_pressure to benchmarks/posix
The pthread_pressure test was not a typical test per se. It was
a benchmark in search of the proper home.

Let's move it to the correct place in the Zephyr tree, add a
doc, and provide some reporting.

Currently, k_threads outperform pthreads by almost a factor of
2. The theoretical maximum performance of pthreads would be at
parity with k_threads, since pthreads are a wrapper around kernel
threads. It would be great to reduce the gap.

Signed-off-by: Chris Friedt <cfriedt@tenstorrent.com>
2024-12-06 06:51:13 +01:00
Andy Ross
7cdf40541b kernel/sched: Eliminate PRESTART thread state
Traditionally threads have been initialized with a PRESTART flag set,
which gets cleared when the thread runs for the first time via either
its timeout or the k_thread_start() API.

But if you think about it, this is no different, semantically, than
SUSPENDED: the thread is prevented from running until the flag is
cleared.

So unify the two.  Start threads in the SUSPENDED state, point
everyone looking at the PRESTART bit to the SUSPENDED flag, and make
k_thread_start() be a synonym for k_thread_resume().

There are some mild code size savings from the eliminated duplication,
but the real win here is that we make space in the thread flags byte,
which had run out.

Signed-off-by: Andy Ross <andyross@google.com>
2024-11-27 10:38:05 -05:00
Anas Nashif
057ba5cf45 tests: benchmarks: optimize filters and use platform_key.
Benchmarks are not tests; we run them to verify they still work and do
not bitrot. Running them on each architecture should be sufficient.

This reduces the amount of churn in CI and still allows them to be run
individually on platforms.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2024-11-26 21:42:22 +01:00
Yong Cong Sin
b1def7145f arch: deprecate _current
`_current` is now functionally equal to `arch_curr_thread()`; remove
its usage in-tree and deprecate it instead of removing it outright,
as it has been with us since forever.

Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
2024-11-23 20:12:24 -05:00
Alberto Escolar Piedras
f79d879d50 tests: Thread-Metric: Fix filter description
Correct the description of why we exclude the POSIX arch

Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
2024-11-20 15:56:16 -05:00
Alberto Escolar Piedras
93c03214fa tests: Thread-Metric: Filter properly native targets
There are more native targets than native_sim and native_posix.
Let's exclude them all by architecture.

Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
2024-11-19 08:52:05 -05:00
Peter Mitsis
95a97fd287 tests: Port Thread-Metric benchmark from ThreadX
Ports the Thread-Metric suite of benchmarks from ThreadX to Zephyr.
This makes it easier for others to run these benchmarks with the
best set of configuration options for their board so that
they can better compare Zephyr performance to another RTOS.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-11-18 19:32:02 -05:00
Daniel Leung
a6c1f80f46 tests: benchmarks/latency: limit CPU to 1 for Intel ADSP ACE
Because busy threads now run on the other cores, and the simulator
runs in a single thread that bounces through all cores, we waste
quite a bit of time just busy waiting. This makes each simulator
run too long for CI, so limit the CPU count to 1.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2024-11-16 15:54:36 -05:00
Yong Cong Sin
e0ce096b91 tests: benchmarks/sched: limit CPU to 1 for Intel ADSP ACE
Because busy threads now run on the other cores, and the simulator
runs in a single thread that bounces through all cores, we waste
quite a bit of time just busy waiting. This makes each simulator
run too long for CI, so limit the CPU count to 1.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
2024-11-16 14:07:08 -05:00
Yong Cong Sin
927420a423 tests: sched: Add busy threads for SMP
The sched benchmark is designed for systems with a single
CPU. Otherwise, the timestamps would be wrong when the partner
thread is scheduled on another CPU, i.e. negative values:

```
unpend   63 ready   62 switch -16562 pend 18937 tot 2500 (avg  928)
```

When the system allows for multiple CPUs, spawn a non-preemptible
thread to keep the other CPUs busy.

Signed-off-by: Yong Cong Sin <ycsin@meta.com>
2024-11-16 14:07:08 -05:00
Yong Cong Sin
70abb6077d tests/benchmarks: latency_measure: add qemu_riscv_64_smp board
Add `qemu_riscv64/qemu_virt_riscv64/smp` to the list of
integration_platforms.

Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
2024-11-16 13:38:37 -05:00
Peter Mitsis
aeaddd70b7 tests: Fix IRQ locking in sched benchmark
Corrects an issue that was introduced when the interrupt
locking/unlocking was added to the 'sched' benchmark by
unlocking the interrupts before the context switch done by
k_yield(), but after the call to z_unpend_first_thread().

Fixes PR #81050

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-11-08 15:55:23 -06:00
Peter Mitsis
41064c8e1d tests: Add objcore to latency_measure testcase.yaml
Adds the object core configuration to the latency_measure benchmark's
testcase.yaml file as there have been requests for it.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-11-07 18:42:03 -08:00
Sudan Landge
3092d96e5b boards: mps3: Add support for corstone300/an552
What is changed?
 - Added a new mps3 board an552 for the soc corstone300.
   The qualifier to build/run application with board mps3/an552 is
   `mps3/corstone300/an552` for secure and
   `mps3/corstone300/an552/ns` for non-secure.
 - Added FVP variant to enable FVP testing with corstone300
   and it uses the ARM FVP `FVP_Corstone_SSE-300_Ethos-U55`.
   The qualifier to build/run application with FVP is
   `mps3/corstone300/fvp` for secure and
   `mps3/corstone300/fvp/ns` for non-secure.
 - Note: the qualifier to build/run application with board mps3/an547
   is now changed to
   `mps3/corstone300/an547` for secure and
   `mps3/corstone300/an547/ns` for non-secure.

How is it changed?
 - Moved common code from mps3/an547 to corstone300.
 - Renamed soc for an547 to corstone300 and added
   a new soc corstone300/an552.

Why do we need this change?
 - This enables FVP support and testing for corstone300.
 - SOC/qualifier for mps3/an547 was renamed to reduce code redundancy
 - A separate FVP variant was added for AN552 because, the TFM board
   used for non-secure variant differs for FPGA and FVP.
   TFM board `arm/mps3/corstone300/fvp` should be used when testing
   AN552 with FVP and `arm/mps3/corstone300/an552` should be used when
   testing with AN552 FPGA.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
2024-10-26 03:58:05 +01:00
Stephanos Ioannidis
a93095626b tests: benchmarks: wait_queues: Increase test timeout to 120s
This commit increases the test timeout for the wait queue benchmark tests
to 120 seconds because these tests frequently hit the default timeout of 60
seconds during execution.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
2024-10-22 19:04:37 -04:00
Peter Mitsis
cedd36106b kernel: Begin abstracting out _sched_spinlock
In a uniprocessor system, _sched_spinlock may not need to be
held in all the same cases that it does in a multiprocessor
system. Removing those unnecessary usages can lead to better
performance on UP systems. In the case of uncontested taking
and giving of a semaphore, this can be as much as a +14%
performance gain.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-10-21 18:38:00 -05:00
Yong Cong Sin
59e41ef830 tests: latency_measure: reduce the chance of cycles underflow
Sometimes an unusually large cycle count is reported for tests that
are known to complete in just a few cycles. After some testing,
I found that this happened because the measured overhead was larger
than the cycles taken by the test, causing the result to
underflow.

To work around that, make sure that the overhead measurement
thread runs uninterrupted, repeat the measurement a few
times, and take the minimum value.

Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
2024-10-08 18:10:11 -04:00
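
The underflow described in commit 59e41ef830, and the "take the minimum"
part of the workaround, can be sketched as follows. This is a minimal
illustration assuming 32-bit cycle counters. The function names are
hypothetical and are not taken from the latency_measure sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the smallest of several overhead samples. An interrupted
 * sample can only inflate the overhead, so the minimum is the best
 * estimate of the true cost.
 */
static uint32_t min_overhead(const uint32_t *samples, size_t count)
{
	assert(count > 0);

	uint32_t best = samples[0];

	for (size_t i = 1; i < count; i++) {
		if (samples[i] < best) {
			best = samples[i];
		}
	}

	return best;
}

/* Unsigned subtraction wraps around: if the overhead exceeds the
 * measured value, the result is a huge number rather than a negative one.
 */
static uint32_t net_cycles(uint32_t measured, uint32_t overhead)
{
	return measured - overhead;
}
```

For example, a test measured at 10 cycles with an inflated overhead of 12
yields 4294967294 instead of a small value, whereas using the minimum of
the samples {12, 7, 30} as the overhead yields 3.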
Peter Mitsis
318b49570a tests: scheduler queue benchmarks
Implements a set of tests designed to show how the performance of the
three scheduler queue implementations (DUMB, SCALABLE and MULTIQ)
varies with respect to the number of threads in the ready queue.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-10-07 20:16:20 -04:00
Peter Mitsis
2221ca82d4 tests: wait queue benchmarks
Implements a set of tests designed to show how the performance of the two
wait queue implementations (DUMB and SCALABLE) changes as the number of
threads in the wait queue varies.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-10-07 20:16:20 -04:00
Yong Cong Sin
def418b920 tests: latency_measure: fix stacks for the busy threads
The current `K_THREAD_STACK_DEFINE` only creates a single stack
shared by all the busy threads. This is causing the application
to crash when there are more than 2 cores in the system.

We should use `K_THREAD_STACK_ARRAY_DEFINE` to create an array
of stacks instead.

Updated the testcase to test up to 8 cores using
qemu_riscv64_smp.

Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
2024-09-19 18:28:16 +01:00
Peter Mitsis
ee7bbf55e0 tests: latency_measure: Add busy threads for SMP
The latency_measure benchmark is designed for systems with a single
CPU. When the system allows for multiple CPUs, instead of forcing
a single CPU to be used via 'prj.conf', spawn a non-preemptible
thread to keep the other CPUs busy.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-08-26 14:45:04 -04:00
Peter Mitsis
886098a6c8 tests: latency_measure: Restructure configurations
Restructures the latency_measure benchmark configurations. Most of them
do not need to be executed by default by CI. However, we still want to
make it easy to run the benchmark with various configurations so that
they can be easily compared against the default.

The updated documentation shows how this can be done.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-06-12 14:33:47 +03:00
Tomi Fontanilles
3efdbe6c0c modules: mbedtls: rename CONFIG_MBEDTLS_MAC_*_ENABLED and rm duplicates
Remove the `_MAC` part because those Kconfig options enable only hash
algorithms, nothing MAC-related, and the `_ENABLED` part to align the
naming to the Mbed TLS defines (plus we don't need such a part).

As a bonus, enabling SHA-256 does not automatically enable SHA-224
anymore.

See the migration guide entries for more details on the practical
changes.

Signed-off-by: Tomi Fontanilles <tomi.fontanilles@nordicsemi.no>
2024-05-29 08:39:26 +02:00
Yong Cong Sin
bbe5e1e6eb build: namespace the generated headers with zephyr/
Namespaced the generated headers with `zephyr` to prevent
potential conflict with other headers.

Introduce a temporary Kconfig `LEGACY_GENERATED_INCLUDE_PATH`
that is enabled by default. This allows the developers to
continue the use of the old include paths for the time being
until it is deprecated and eventually removed. The Kconfig will
generate a build-time warning message, similar to the
`CONFIG_TIMER_RANDOM_GENERATOR`.

Updated the includes path of in-tree sources accordingly.

Most of the changes here are scripted, check the PR for more
info.

Signed-off-by: Yong Cong Sin <ycsin@meta.com>
2024-05-28 22:03:55 +02:00
Fin Maaß
e354927895 tests: use appropriate sys_randX_get()
Use the appropriate sys_randX_get() instead
of always using sys_rand32_get().

Signed-off-by: Fin Maaß <f.maass@vogl-electronic.com>
2024-04-05 10:57:45 -05:00
Anas Nashif
f9932c578d tests: benchmark: adapt recording of benchmark results
We now have a different output, so capture it correctly.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2024-03-12 15:02:49 -04:00
Peter Mitsis
a55a078909 tests: Use z_pend_curr() in sched benchmark
Replaces the use of z_pend_curr_irqlock() with z_pend_curr()
as the former is not used anywhere else anymore.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-03-07 11:51:06 -05:00
Anas Nashif
5e591c38f1 kernel: do not export z_thread_priority_set
This function is only being used by a test, so instead of reimplementing
a syscall in the test, provide a Kconfig option to provide the
functionality that only works with tests and remove some of the
duplication and extra code.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2024-03-06 19:27:28 -05:00
Peter Mitsis
590e4f3a82 tests: latency_measure: Add k_stack object support
Updates the latency_measure test to add support for benchmarking
k_stack_push() and k_stack_pop().

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
2024-03-05 16:50:47 +00:00