From 122c7be703ebf1d1f78f20fc7936ea263a82d345 Mon Sep 17 00:00:00 2001
From: Evgeniy Paltsev
Date: Thu, 12 Jan 2023 20:10:17 +0000
Subject: [PATCH] tests: smp: fix fatal on smp test case

After commit dbe3874079 ("tests: kernel/smp: wait for threads to exits
between tests") I've started seeing sporadic
kernel.multiprocessing.smp test failures on our platforms.

------------------------------->8---------------------------------
[*snip*]
===================================================================
START - test_fatal_on_smp
E: r0: 0x3 r1: 0x0 r2: 0x0 r3: 0x0
E: r4: 0x80000194 r5: 0x0 r6: 0x0 r7: 0x0
E: r8: 0x800079c4 r9: 0x82802 r10: 0x80008d8c r11: 0x8000dad8
E: r0: 0x3 r1: 0x2712 r2: 0x114 r3: 0x0
E: r4: 0xf4240000 r5: 0x0 r6: 0xf424 r7: 0xbe40
E: r8: 0x2540 r9: 0x0 r10: 0x80008d8c r11: 0x8000db8c
E: r12: 0x8000ddf0 r13: 0x0 pc: 0x80000aec
E: blink: 0x80000ae6 status32: 0x80082002
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
E: Current thread: 0x8000db8c (test_fatal_on_smp)
E: r12: 0x8000ddf0 r13: 0x0 pc: 0x8000019a
PASS - test_fatal_on_smp in 0.014 seconds
===================================================================
START - test_get_cpu
E: blink: 0x80001490 status32: 0x80082002
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 1
E: Current thread: 0x8000dad8 (unknown)
------------------------------->8---------------------------------

The root cause is that we don't properly clean up resources after the
test_fatal_on_smp test case. The child thread we start in
test_fatal_on_smp may continue running for some time after the test
case has finished. As the next test case (test_get_cpu) reuses the same
thread structures to create a new child thread, we may overwrite data
of a thread which is still running (or vice versa).

As we trigger the crash in test_fatal_on_smp, we can't simply join the
child thread at the end of the test case (as we never get there).
We can't simply join the child thread before we initiate the crash in
test_fatal_on_smp either, as we don't want to introduce a reschedule
point there, which may break the test logic.

So, to fix that, we just call k_busy_wait in the test_fatal_on_smp
thread after starting the child thread, to wait for the child thread to
trigger its exception and be terminated. To verify that, we also assert
that the child thread is dead by the time we stop busy waiting.

Signed-off-by: Evgeniy Paltsev
Signed-off-by: Eugeniy Paltsev
---
 tests/kernel/smp/src/main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/kernel/smp/src/main.c b/tests/kernel/smp/src/main.c
index 9cdf166f249..8bed52c9e18 100644
--- a/tests/kernel/smp/src/main.c
+++ b/tests/kernel/smp/src/main.c
@@ -716,8 +716,13 @@ ZTEST(smp, test_fatal_on_smp)
 			NULL, NULL, NULL,
 			K_PRIO_PREEMPT(2), 0, K_NO_WAIT);
 
-	/* hold cpu and wait for thread trigger exception */
-	k_busy_wait(2000);
+	/* hold cpu and wait for thread trigger exception and being terminated */
+	k_busy_wait(2 * DELAY_US);
+
+	/* Verify that child thread is no longer running. We can't simply use k_thread_join here
+	 * as we don't want to introduce reschedule point here.
+	 */
+	zassert_true(z_is_thread_state_set(&t2, _THREAD_DEAD));
 
 	/* Manually trigger the crash in mainthread */
 	entry_oops(NULL, NULL, NULL);
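
For readers without a Zephyr tree at hand, the idea behind the fix (spin
without blocking, then check that the child is gone before its thread
object can be reused) can be sketched on a host with POSIX threads. This
is only an illustration under stated assumptions: it does not use any
Zephyr API, and struct demo_thread, demo_child, demo_busy_wait and
demo_child_dead_after_busy_wait are hypothetical names invented for this
sketch. The "dead" flag merely models the _THREAD_DEAD state.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a reusable thread object (like t2 in the test). */
struct demo_thread {
	pthread_t handle;
	atomic_bool dead;	/* set by the child as its last action */
};

/* Child body: stands in for the thread that crashes and gets terminated. */
static void *demo_child(void *arg)
{
	struct demo_thread *t = arg;

	atomic_store(&t->dead, true);
	return NULL;
}

/* Spin for roughly `usec` microseconds without sleeping or yielding. */
static void demo_busy_wait(long usec)
{
	struct timespec start, now;
	long elapsed_us;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_us = (now.tv_sec - start.tv_sec) * 1000000L +
			     (now.tv_nsec - start.tv_nsec) / 1000L;
	} while (elapsed_us < usec);
}

/* Start the child, spin, then report whether the child was already dead. */
bool demo_child_dead_after_busy_wait(long usec)
{
	struct demo_thread t;
	bool dead;

	atomic_init(&t.dead, false);
	if (pthread_create(&t.handle, NULL, demo_child, &t) != 0) {
		return false;
	}

	demo_busy_wait(usec);
	dead = atomic_load(&t.dead);	/* non-blocking check, like the zassert */

	/* Only this demo joins, so that `t` can safely go out of scope. */
	pthread_join(t.handle, NULL);
	return dead;
}
```

Calling demo_child_dead_after_busy_wait(200000) is expected to return
true whenever the child gets scheduled within the 200 ms budget. A fixed
delay is only a heuristic, which is presumably why the patch pairs the
busy wait with an assertion instead of trusting the delay alone.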