Latency Measurements
####################

This benchmark measures the average latency of selected kernel capabilities,
including the following (a sketch of the general measurement pattern is shown
after the list):

* Context switch time between preemptive threads using k_yield
* Context switch time between cooperative threads using k_yield
* Time to switch from ISR back to interrupted thread
* Time from ISR to executing a different thread (rescheduled)
* Time to signal a semaphore then test that semaphore
* Time to signal a semaphore then test that semaphore with a context switch
* Times to lock a mutex then unlock that mutex
* Time it takes to create a new thread (without starting it)
* Time it takes to start a newly created thread
* Time it takes to suspend a thread
* Time it takes to resume a suspended thread
* Time it takes to abort a thread
* Time it takes to add data to a FIFO/LIFO
* Time it takes to retrieve data from a FIFO/LIFO
* Time it takes to wait on a FIFO/LIFO (and context switch)
* Time it takes to wake and switch to a thread waiting on a FIFO/LIFO
* Time it takes to send and receive events
* Time it takes to wait for events (and context switch)
* Time it takes to wake and switch to a thread waiting for events
* Time it takes to push and pop to/from a k_stack
* Time it takes to allocate memory from the heap and then free that memory
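
The general pattern behind these measurements is illustrated below for a
single operation (giving a semaphore that has no waiters). This is only a
hedged sketch using Zephyr's timing API (it assumes CONFIG_TIMING_FUNCTIONS=y),
not the benchmark's actual implementation; names such as NUM_ITERATIONS and
bench_sem are illustrative::

   /*
    * Minimal sketch of the measurement pattern: timestamp before and after
    * the operation under test, accumulate the elapsed cycles over many
    * iterations, then report the average in cycles and nanoseconds.
    */
   #include <zephyr/kernel.h>
   #include <zephyr/timing/timing.h>
   #include <zephyr/sys/printk.h>

   #define NUM_ITERATIONS 1000                /* illustrative iteration count */

   K_SEM_DEFINE(bench_sem, 0, 1);

   void measure_sem_give(void)
   {
       timing_t start, end;
       uint64_t total_cycles = 0;

       timing_init();
       timing_start();

       for (int i = 0; i < NUM_ITERATIONS; i++) {
           start = timing_counter_get();
           k_sem_give(&bench_sem);            /* operation under test */
           end = timing_counter_get();

           total_cycles += timing_cycles_get(&start, &end);
           k_sem_take(&bench_sem, K_NO_WAIT); /* reset for the next pass */
       }

       timing_stop();

       uint64_t avg = total_cycles / NUM_ITERATIONS;

       printk("semaphore.give.immediate : %llu cycles , %llu ns\n",
              avg, timing_cycles_to_ns(avg));
   }
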
When userspace is enabled using the prj_user.conf configuration file, this
benchmark will, where possible, also test the above capabilities using various
configurations involving user threads (a build example is shown after the
list):

* Kernel thread to kernel thread
* Kernel thread to user thread
* User thread to kernel thread
* User thread to user thread
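
With userspace enabled, the benchmark is built against prj_user.conf. One
possible invocation, shown here with qemu_x86 as an illustrative target and
run from the zephyr repository root::

   west build -b qemu_x86 tests/benchmarks/latency_measure -- -DCONF_FILE=prj_user.conf
   west build -t run
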
Sample output of the benchmark (without userspace enabled)::
*** Booting Zephyr OS build zephyr-v3.5.0-4267-g6ccdc31233a3 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 329 cycles , 2741 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 329 cycles , 2741 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 363 cycles , 3033 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 404 cycles , 3367 ns :
thread.create.kernel.from.kernel - Create thread : 404 cycles , 3374 ns :
thread.start.kernel.from.kernel - Start thread : 423 cycles , 3533 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 428 cycles , 3574 ns :
thread.resume.kernel.from.kernel - Resume thread : 350 cycles , 2924 ns :
thread.abort.kernel.from.kernel - Abort thread : 339 cycles , 2826 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 269 cycles , 2242 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 128 cycles , 1074 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 945 cycles , 7875 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 575 cycles , 4792 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 551 cycles , 4592 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 660 cycles , 5500 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 553 cycles , 4608 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 655 cycles , 5458 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 280 cycles , 2341 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 133 cycles , 1116 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 945 cycles , 7875 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 580 cycles , 4833 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 553 cycles , 4608 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 655 cycles , 5458 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 550 cycles , 4583 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 655 cycles , 5458 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 225 cycles , 1875 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 225 cycles , 1875 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 130 cycles , 1083 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 135 cycles , 1125 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 573 cycles , 4783 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 784 cycles , 6534 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 589 cycles , 4916 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 795 cycles , 6626 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 125 cycles , 1041 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 69 cycles , 575 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 494 cycles , 4116 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 599 cycles , 4992 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 692 cycles , 5767 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 715 cycles , 5958 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 166 cycles , 1391 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 82 cycles , 691 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 499 cycles , 4166 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 645 cycles , 5375 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 100 cycles , 833 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 40 cycles , 333 ns :
heap.malloc.immediate - Average time for heap malloc : 627 cycles , 5225 ns :
heap.free.immediate - Average time for heap free : 432 cycles , 3600 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
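
Each line of the output reports the measured duration both in hardware cycles
and in nanoseconds; the nanosecond value is derived from the cycle count and
the platform's hardware cycle frequency, so it varies from board to board (for
example, at a 120 MHz cycle clock, 329 cycles corresponds to roughly 2741 ns).
A hedged sketch of the conversion using Zephyr's time-unit helpers::

   #include <zephyr/kernel.h>

   /* Convert a measured cycle count to nanoseconds using the kernel's
    * time-unit helper; the result depends on the board's hardware cycle
    * frequency (CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC).
    */
   static uint64_t cycles_to_ns(uint64_t cycles)
   {
       return k_cyc_to_ns_floor64(cycles);
   }
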
Sample output of the benchmark (with userspace enabled)::
*** Booting Zephyr OS build zephyr-v3.5.0-4268-g6af7a1230a08 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 970 cycles , 8083 ns :
thread.yield.preemptive.ctx.u_to_u - Context switch via k_yield : 1260 cycles , 10506 ns :
thread.yield.preemptive.ctx.k_to_u - Context switch via k_yield : 1155 cycles , 9632 ns :
thread.yield.preemptive.ctx.u_to_k - Context switch via k_yield : 1075 cycles , 8959 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 970 cycles , 8083 ns :
thread.yield.cooperative.ctx.u_to_u - Context switch via k_yield : 1260 cycles , 10506 ns :
thread.yield.cooperative.ctx.k_to_u - Context switch via k_yield : 1155 cycles , 9631 ns :
thread.yield.cooperative.ctx.u_to_k - Context switch via k_yield : 1075 cycles , 8959 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 415 cycles , 3458 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 985 cycles , 8208 ns :
isr.resume.different.thread.user - Return from ISR to another thread : 1180 cycles , 9833 ns :
thread.create.kernel.from.kernel - Create thread : 989 cycles , 8249 ns :
thread.start.kernel.from.kernel - Start thread : 1059 cycles , 8833 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 1030 cycles , 8583 ns :
thread.resume.kernel.from.kernel - Resume thread : 994 cycles , 8291 ns :
thread.abort.kernel.from.kernel - Abort thread : 2370 cycles , 19751 ns :
thread.create.user.from.kernel - Create thread : 860 cycles , 7167 ns :
thread.start.user.from.kernel - Start thread : 8965 cycles , 74713 ns :
thread.suspend.user.from.kernel - Suspend thread : 1400 cycles , 11666 ns :
thread.resume.user.from.kernel - Resume thread : 1174 cycles , 9791 ns :
thread.abort.user.from.kernel - Abort thread : 2240 cycles , 18666 ns :
thread.create.user.from.user - Create thread : 2105 cycles , 17542 ns :
thread.start.user.from.user - Start thread : 9345 cycles , 77878 ns :
thread.suspend.user.from.user - Suspend thread : 1590 cycles , 13250 ns :
thread.resume.user.from.user - Resume thread : 1534 cycles , 12791 ns :
thread.abort.user.from.user - Abort thread : 2850 cycles , 23750 ns :
thread.start.kernel.from.user - Start thread : 1440 cycles , 12000 ns :
thread.suspend.kernel.from.user - Suspend thread : 1219 cycles , 10166 ns :
thread.resume.kernel.from.user - Resume thread : 1355 cycles , 11292 ns :
thread.abort.kernel.from.user - Abort thread : 2980 cycles , 24834 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 315 cycles , 2625 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 209 cycles , 1749 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 1040 cycles , 8667 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 670 cycles , 5583 ns :
fifo.put.alloc.immediate.user - Allocate to add data to FIFO (no ctx switch) : 1765 cycles , 14709 ns :
fifo.get.free.immediate.user - Free when getting data from FIFO (no ctx switch) : 1410 cycles , 11750 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 1220 cycles , 10168 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 1285 cycles , 10708 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 1235 cycles , 10291 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 1340 cycles , 11167 ns :
fifo.get.free.blocking.u_to_k - Free when getting data from FIFO (w/ ctx switch) : 1715 cycles , 14292 ns :
fifo.put.alloc.wake+ctx.k_to_u - Allocate to add data to FIFO (w/ ctx switch) : 1665 cycles , 13876 ns :
fifo.get.free.blocking.k_to_u - Free when getting data from FIFO (w/ ctx switch) : 1565 cycles , 13042 ns :
fifo.put.alloc.wake+ctx.u_to_k - Allocate to add data to FIFO (w/ ctx switch) : 1815 cycles , 15126 ns :
fifo.get.free.blocking.u_to_u - Free when getting data from FIFO (w/ ctx switch) : 2045 cycles , 17042 ns :
fifo.put.alloc.wake+ctx.u_to_u - Allocate to add data to FIFO (w/ ctx switch) : 2140 cycles , 17834 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 309 cycles , 2583 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 219 cycles , 1833 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 1030 cycles , 8583 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 685 cycles , 5708 ns :
lifo.put.alloc.immediate.user - Allocate to add data to LIFO (no ctx switch) : 1755 cycles , 14625 ns :
lifo.get.free.immediate.user - Free when getting data from LIFO (no ctx switch) : 1405 cycles , 11709 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 1229 cycles , 10249 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 1290 cycles , 10751 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 1235 cycles , 10292 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 1310 cycles , 10917 ns :
lifo.get.free.blocking.u_to_k - Free when getting data from LIFO (w/ ctx switch) : 1715 cycles , 14293 ns :
lifo.put.alloc.wake+ctx.k_to_u - Allocate to add data to LIFO (w/ ctx switch) : 1630 cycles , 13583 ns :
lifo.get.free.blocking.k_to_u - Free when getting data from LIFO (w/ ctx switch) : 1554 cycles , 12958 ns :
lifo.put.alloc.wake+ctx.u_to_k - Allocate to add data to LIFO (w/ ctx switch) : 1805 cycles , 15043 ns :
lifo.get.free.blocking.u_to_u - Free when getting data from LIFO (w/ ctx switch) : 2035 cycles , 16959 ns :
lifo.put.alloc.wake+ctx.u_to_u - Allocate to add data to LIFO (w/ ctx switch) : 2125 cycles , 17709 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 295 cycles , 2458 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 300 cycles , 2500 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 220 cycles , 1833 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 215 cycles , 1791 ns :
events.post.immediate.user - Post events (nothing wakes) : 795 cycles , 6625 ns :
events.set.immediate.user - Set events (nothing wakes) : 790 cycles , 6584 ns :
events.wait.immediate.user - Wait for any events (no ctx switch) : 740 cycles , 6167 ns :
events.wait_all.immediate.user - Wait for all events (no ctx switch) : 740 cycles , 6166 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 1190 cycles , 9918 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 1464 cycles , 12208 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 1235 cycles , 10292 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 1500 cycles , 12500 ns :
events.wait.blocking.u_to_k - Wait for any events (w/ ctx switch) : 1580 cycles , 13167 ns :
events.set.wake+ctx.k_to_u - Set events (w/ ctx switch) : 1630 cycles , 13583 ns :
events.wait_all.blocking.u_to_k - Wait for all events (w/ ctx switch) : 1765 cycles , 14708 ns :
events.post.wake+ctx.k_to_u - Post events (w/ ctx switch) : 1795 cycles , 14960 ns :
events.wait.blocking.k_to_u - Wait for any events (w/ ctx switch) : 1375 cycles , 11459 ns :
events.set.wake+ctx.u_to_k - Set events (w/ ctx switch) : 1825 cycles , 15209 ns :
events.wait_all.blocking.k_to_u - Wait for all events (w/ ctx switch) : 1555 cycles , 12958 ns :
events.post.wake+ctx.u_to_k - Post events (w/ ctx switch) : 1995 cycles , 16625 ns :
events.wait.blocking.u_to_u - Wait for any events (w/ ctx switch) : 1765 cycles , 14708 ns :
events.set.wake+ctx.u_to_u - Set events (w/ ctx switch) : 1989 cycles , 16583 ns :
events.wait_all.blocking.u_to_u - Wait for all events (w/ ctx switch) : 2085 cycles , 17376 ns :
events.post.wake+ctx.u_to_u - Post events (w/ ctx switch) : 2290 cycles , 19084 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 220 cycles , 1833 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 130 cycles , 1083 ns :
semaphore.give.immediate.user - Give a semaphore (no waiters) : 710 cycles , 5917 ns :
semaphore.take.immediate.user - Take a semaphore (no blocking) : 655 cycles , 5458 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 1135 cycles , 9458 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 1244 cycles , 10374 ns :
semaphore.take.blocking.k_to_u - Take a semaphore (context switch) : 1325 cycles , 11048 ns :
semaphore.give.wake+ctx.u_to_k - Give a semaphore (context switch) : 1610 cycles , 13416 ns :
semaphore.take.blocking.u_to_k - Take a semaphore (context switch) : 1499 cycles , 12499 ns :
semaphore.give.wake+ctx.k_to_u - Give a semaphore (context switch) : 1434 cycles , 11957 ns :
semaphore.take.blocking.u_to_u - Take a semaphore (context switch) : 1690 cycles , 14090 ns :
semaphore.give.wake+ctx.u_to_u - Give a semaphore (context switch) : 1800 cycles , 15000 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 1385 cycles , 11542 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 1420 cycles , 11833 ns :
condvar.wait.blocking.k_to_u - Wait for a condvar (context switch) : 1537 cycles , 12815 ns :
condvar.signal.wake+ctx.u_to_k - Signal a condvar (context switch) : 1950 cycles , 16250 ns :
condvar.wait.blocking.u_to_k - Wait for a condvar (context switch) : 2025 cycles , 16875 ns :
condvar.signal.wake+ctx.k_to_u - Signal a condvar (context switch) : 1715 cycles , 14298 ns :
condvar.wait.blocking.u_to_u - Wait for a condvar (context switch) : 2313 cycles , 19279 ns :
condvar.signal.wake+ctx.u_to_u - Signal a condvar (context switch) : 2225 cycles , 18541 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 244 cycles , 2041 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 195 cycles , 1630 ns :
stack.push.immediate.user - Add data to k_stack (no ctx switch) : 714 cycles , 5956 ns :
stack.pop.immediate.user - Get data from k_stack (no ctx switch) : 1009 cycles , 8414 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 1234 cycles , 10291 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 1360 cycles , 11333 ns :
stack.pop.blocking.u_to_k - Get data from k_stack (w/ ctx switch) : 2084 cycles , 17374 ns :
stack.push.wake+ctx.k_to_u - Add data to k_stack (w/ ctx switch) : 1665 cycles , 13875 ns :
stack.pop.blocking.k_to_u - Get data from k_stack (w/ ctx switch) : 1544 cycles , 12874 ns :
stack.push.wake+ctx.u_to_k - Add data to k_stack (w/ ctx switch) : 1850 cycles , 15422 ns :
stack.pop.blocking.u_to_u - Get data from k_stack (w/ ctx switch) : 2394 cycles , 19958 ns :
stack.push.wake+ctx.u_to_u - Add data to k_stack (w/ ctx switch) : 2155 cycles , 17958 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 155 cycles , 1291 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 57 cycles , 475 ns :
mutex.lock.immediate.recursive.user - Lock a mutex : 665 cycles , 5541 ns :
mutex.unlock.immediate.recursive.user - Unlock a mutex : 585 cycles , 4875 ns :
heap.malloc.immediate - Average time for heap malloc : 640 cycles , 5341 ns :
heap.free.immediate - Average time for heap free : 436 cycles , 3633 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL