Adds tests to better gauge IPI performance on SMP. In each case, one
CPU is used as the source of IPIs while the remaining CPUs are busy
doing "work". Every 30 seconds the benchmark reports on the amount
of "work" done by the busy CPUs and the amount of work done by the
CPU generating the IPIs.
This can be used to ...
1. Show how enabling IPI optimization affects system performance
2. Show the cost of spinlock contention as the number of CPUs increase
3. Measure the relative performance of scheduler changes on SMP.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The thread metric cooperative benchmark had a subtle flaw on
SMP enabled systems. The reporting/main thread was running at
a higher priority than the "cooperative" threads. When that
reporting thread woke up from its 30 second sleep, there was a
chance that it would change the ordering of the "cooperative"
threads before the test expected it.
This ordering change is not present on UP systems as the current
thread is always in the ready queue. However, on SMP systems the
current thread is not in the ready queue and is re-added to the
end of the list when it is preempted by a higher priority thread.
To work around this, we make the priority of the main/reporting
thread to be the same as the "cooperative" threads. Thus when
the reporting thread wakes, it is added to the end of list and
no longer introduces an unexpected schedule point.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Improve naming of the scheduler and call it what it is: simple. Using
'dumb' for the default scheduler algorithm in Zephyr is a bad idea.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Put the thread_metric Kconfig menu first in the Kconfig "homepage", like
other samples and tests typically do as this makes it easier to quickly
get to the relevant options in menuconfig.
Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
Changes the type of both the tm_message_sent and tm_message_received
arrays from 'unsigned long' to 'unsigned int'. The test expects those
arrays to be 16 bytes long. This was a problem on 64-bit platforms
as 'unsigned long' is 8 bytes, which made the arrays 32 bytes long.
On both our supported 32-bit and 64-bit platforms, 'unsigned int'
works out to be 4 bytes long, thereby giving us the requisite 16
byte buffer.
Although a case could be made for using 'uint32_t', this was not
chosen simply to keep the structure as close as practical to the
original thread_metric implementation.
Fixes#83864
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Extentd 'benchmark.thread_metric' (tests/benchmarks/thread_metric)
test suite to collect benchmark measurements in Twister reports
as recordings parsed from the test's output: time period values
as well as errors.
Additionally, each test is executed until it makes at least 3
measurements to estimate variance.
Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
Adjust testcase.yaml files to changes in Twister schema which
now allows multiple recording patterns ('record: regex:').
Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
Mostly a revert of commit b1def7145f ("arch: deprecate `_current`").
This commit was part of PR #80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.
The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.
Hence this revert.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Removes the timeslicing configuration from the latency
measure benchmark as its numbers are nearly identical
to the default. This makes sense as the benchmark was
not really designed for exercising the timeslicing
feature.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Enhance the wait_queue benchmark to output resulting summary
metrics as records, so when it runs with Twister the results
are parsed and saved into twister.json and recording.csv files
for further analysis.
Minor documentation edits and make ClangFormat happy.
Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
Enhance the sched_queue benchmark to output resulting summary
metrics as records, so when it runs with Twister the results
are parsed and saved into twister.json and recording.csv files
for further analysis.
Minor documentation edits and make ClangFormat happy.
Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
Sleeping and suspended are now orthogonal states. That is, a thread
may be both sleeping and suspended and the two do not interact. One
repercussion of this is that suspending a thread will no longer
abort its timeout.
Threads are now created in the 'sleeping' state instead of a
'suspended' state. This dovetails nicely with the start delay that
can be given to a newly created thread--it is as though the very
first operation that a thread with a start delay is a sleep.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Disabling the memory slab pointer validation improves the performance
of the memory allocation sub-test by about 9%.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Some tests can take close to 60 seconds to complete on m2gl025_miv
platform (it runs at 4 MHz) so increase the timeout to 120 seconds
to be on the safe side.
Signed-off-by: Benjamin Cabé <benjamin@zephyrproject.org>
The pthread_pressure test was not a typical test per se. It was
a benchmark in search of the proper home.
Let's move it to the correct place in the Zephyr tree, add a
doc, and provide some reporting.
Currently, k_threads out-perform pthreads by almost a factor of
2. The theoretical maximum performance of pthreads would be at
parity of k_threads, since pthreads are a wrapper around kernel
threads. It would be great to reduce the gap.
Signed-off-by: Chris Friedt <cfriedt@tenstorrent.com>
Traditionally threads have been initialized with a PRESTART flag set,
which gets cleared when the thread runs for the first time via either
its timeout or the k_thread_start() API.
But if you think about it, this is no different, semantically, than
SUSPENDED: the thread is prevented from running until the flag is
cleared.
So unify the two. Start threads in the SUSPENDED state, point
everyone looking at the PRESTART bit to the SUSPENDED flag, and make
k_thread_start() be a synonym for k_thread_resume().
There is some mild code size savings from the eliminated duplication,
but the real win here is that we make space in the thread flags byte,
which had run out.
Signed-off-by: Andy Ross <andyross@google.com>
Benchmarks are not tests, we run them to verify they still work and do
not bitrot. Running them on each architecture should be sufficient.
This reduces amount of churn in CI and still allows them to be run
individually on platforms.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
`_current` is now functionally equals to `arch_curr_thread()`, remove
its usage in-tree and deprecate it instead of removing it outright,
as it has been with us since forever.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
There is more native targets than native_sim and native_posix.
Let's exclude them all by architecture.
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
Ports the Thread-Metric suite of benchmarks from ThreadX to Zephyr.
This makes it easier for others to run these benchmarks with the
best set of configuration options for their board so that
they can better compare Zephyr performance to another RTOS.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Due to addition of busy threads running on other cores, and
the simulator runs in single thread bouncing through all cores,
we are wasting quite a bit of time just busy waiting. This makes
each simulator run too long for CI. So limit CPU number to 1.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Due to addition of busy threads running on other cores, and
the simulator runs in single thread bouncing through all cores,
we are wasting quite a bit of time just busy waiting. This makes
each simulator run too long for CI. So limit CPU number to 1.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
The sched benchmark is designed for systems with a single
CPU. Otherwise, the timestamps would be wrong when the partner
thread is scheduled on another CPU, i.e. negative values:
```
unpend 63 ready 62 switch -16562 pend 18937 tot 2500 (avg 928)
```
When the system allows for multiple CPUs, spawn a non-preemptible
thread to keep the other CPUs busy.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Corrects an issue that was introduced when the interrupt
locking/unlocking was added to the 'sched' benchmark by
unlocking the interrupts before the context switch done by
k_yield(), but after the call to z_unpend_first_thread().
Fixes PR #81050
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Adds the object core configuration to the latency_measure benchmark's
testcase.yaml file as there have been requests for it.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
What is changed?
- Added a new mps3 board an552 for the soc corstone300.
The qualifier to build/run application with board mps3/an552 is
`mps3/corstone300/an552` for secure and
`mps3/corstone300/an552/ns` for non-secure.
- Added FVP variant to enable FVP testing with corstone300
and it uses the ARM FVP `FVP_Corstone_SSE-300_Ethos-U55`.
The qualifier to build/run application with FVP is
`mps3/corstone300/fvp` for secure and
`mps3/corstone300/fvp/ns` for non-secure.
- Note: the qualifier to build/run application with board mps3/an547
is now changed to
`mps3/corstone300/an547` for secure and
`mps3/corstone300/an547/ns` for non-secure.
How is it changed?
- Moved common code from mps3/an547 to corstone300.
- Renamed soc for an547 to corstone300 and added
a new soc corstone300/an552.
Why do we need this change?
- This enables FVP support and testing for corstone300.
- SOC/qualifier for mps3/an547 was renamed to reduce code redundancy
- A separate FVP variant was added for AN552 because, the TFM board
used for non-secure variant differs for FPGA and FVP.
TFM board `arm/mps3/corstone300/fvp` should be used when testing
AN552 with FVP and `arm/mps3/corstone300/an552` should be used when
testing with AN552 FPGA.
Signed-off-by: Sudan Landge <sudan.landge@arm.com>
This commit increases the test timeout for the wait queue benchmark tests
to 120 seconds because these tests frequently hit the default timeout of 60
seconds during execution.
Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
In a uniprocessor system, _sched_spinlock may not need to be
held in all the same cases that it does in a multiprocessor
system. Removing those unnecessary usages can lead to better
performance on UP systems. In the case of uncontested taking
and giving of a semaphore, this can be as much as a +14%
performance gain.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Sometimes there's an unusually large cycles for tests that are
known to complete with just a few cycles. Upon some testing,
I found that it was because the overhead cycles was larger
than the cycles taken by tests, causing the cycles to
underflow.
To workaround that, make sure that the overhead measurement
thread runs uninterrupted, repeat the measurement for a few
times, and take the minimum value.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
Implements a set of tests designed to show how the performance of the
three scheduler queue implementations (DUMB, SCALABLE and MULTIQ)
varies with respect to the number of threads in the ready queue.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Implements a set of tests designed to show how the performance of the two
wait queue implementations (DUMB and SCALABLE) change as the number of
threads in the wait queue varies.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The current `K_THREAD_STACK_DEFINE` only create a single stack
shared by all the busy threads. This is causing the application
to crash when there are more than 2 cores in the system.
We should use `K_THREAD_STACK_ARRAY_DEFINE` to create an array
of stacks instead.
Updated the testcase to test up to 8 cores using
qemu_riscv64_smp
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
The latency_measure benchmark is designed for systems with a single
CPU. When the system allows for multiple CPUs, instead of forcing
a single CPU to be used via 'prj.conf', spawn a non-preemptible
thread to keep the other CPUs busy.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Restructures the latency_measure benchmark configurations. Most of them
do not need to be executed by default by CI. However, we still want to
make it easy to run the benchmark with various configurations so that
they can be easily compared against the default.
The updated documentation shows how this can be done.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Remove the `_MAC` part because those Kconfig options enable only hash
algorithms, nothing MAC-related, and the `_ENABLED` part to align the
naming to the Mbed TLS defines (plus we don't need such a part).
As a bonus, enabling SHA-256 does not automatically enable SHA-224
anymore.
See the migration guide entries for more details on the practical
changes.
Signed-off-by: Tomi Fontanilles <tomi.fontanilles@nordicsemi.no>
Namespaced the generated headers with `zephyr` to prevent
potential conflict with other headers.
Introduce a temporary Kconfig `LEGACY_GENERATED_INCLUDE_PATH`
that is enabled by default. This allows the developers to
continue the use of the old include paths for the time being
until it is deprecated and eventually removed. The Kconfig will
generate a build-time warning message, similar to the
`CONFIG_TIMER_RANDOM_GENERATOR`.
Updated the includes path of in-tree sources accordingly.
Most of the changes here are scripted, check the PR for more
info.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Replaces the use of z_pend_curr_irqlock() with z_pend_curr()
as the former is not used anywhere else anymore.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
This function is only being used by a test, so instead of reimplementing
a syscall in the test, provide a Kconfig option to provide the
functionality that only works with tests and remove some of the
duplication and extra code.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Updates the latency_measure test to add support for benchmarking
k_stack_push() and k_stack_pop().
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>