From 122c7be703ebf1d1f78f20fc7936ea263a82d345 Mon Sep 17 00:00:00 2001
From: Evgeniy Paltsev <PaltsevEvgeniy@gmail.com>
Date: Thu, 12 Jan 2023 20:10:17 +0000
Subject: [PATCH] tests: smp: fix fatal on smp test case

After commit dbe3874079 ("tests: kernel/smp: wait for threads to exits
between tests"), I started seeing sporadic kernel.multiprocessing.smp
test failures on our platforms.

------------------------------->8---------------------------------
[*snip*]
===================================================================
START - test_fatal_on_smp
E:  r0: 0x3  r1: 0x0  r2: 0x0  r3: 0x0
E:  r4: 0x80000194  r5: 0x0  r6: 0x0  r7: 0x0
E:  r8: 0x800079c4  r9: 0x82802 r10: 0x80008d8c r11: 0x8000dad8
E:  r0: 0x3  r1: 0x2712  r2: 0x114  r3: 0x0
E:  r4: 0xf4240000  r5: 0x0  r6: 0xf424  r7: 0xbe40
E:  r8: 0x2540  r9: 0x0 r10: 0x80008d8c r11: 0x8000db8c
E: r12: 0x8000ddf0 r13: 0x0  pc: 0x80000aec
E:  blink: 0x80000ae6 status32: 0x80082002
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
E: Current thread: 0x8000db8c (test_fatal_on_smp)
E: r12: 0x8000ddf0 r13: 0x0  pc: 0x8000019a
 PASS - test_fatal_on_smp in 0.014 seconds
===================================================================
START - test_get_cpu
E:  blink: 0x80001490 status32: 0x80082002
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 1
E: Current thread: 0x8000dad8 (unknown)
------------------------------->8---------------------------------

The root cause is that we don't properly clean up resources after the
test_fatal_on_smp test case, so the child thread we start in
test_fatal_on_smp may continue running for some time after that test
case has finished.

As the next test case (test_get_cpu) reuses the same thread structures
to create a new child thread, we may actually overwrite some data of a
thread which is still running (or vice versa).

As we trigger the crash in test_fatal_on_smp, we can't simply join the
child thread at the end of the test case (as we never get there). We
can't simply join the child thread before we initiate the crash in
test_fatal_on_smp either, as we don't want to introduce a reschedule
point there which may break the test logic.

So, to fix that, we just call k_busy_wait in the test_fatal_on_smp
thread after we start the child thread, to wait for the child thread to
trigger the exception and be terminated.

To verify that, we also assert that the child thread is dead by the
time we stop busy waiting.

Signed-off-by: Evgeniy Paltsev <PaltsevEvgeniy@gmail.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
---
 tests/kernel/smp/src/main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/kernel/smp/src/main.c b/tests/kernel/smp/src/main.c
index 9cdf166f249..8bed52c9e18 100644
--- a/tests/kernel/smp/src/main.c
+++ b/tests/kernel/smp/src/main.c
@@ -716,8 +716,13 @@ ZTEST(smp, test_fatal_on_smp)
 				      NULL, NULL, NULL,
 				      K_PRIO_PREEMPT(2), 0, K_NO_WAIT);
 
-	/* hold cpu and wait for thread trigger exception */
-	k_busy_wait(2000);
+	/* Hold the CPU until the child thread triggers the exception and is terminated */
+	k_busy_wait(2 * DELAY_US);
+
+	/* Verify that the child thread is no longer running. We can't simply use k_thread_join()
+	 * here, as we don't want to introduce a reschedule point.
+	 */
+	zassert_true(z_is_thread_state_set(&t2, _THREAD_DEAD));
 
 	/* Manually trigger the crash in mainthread */
 	entry_oops(NULL, NULL, NULL);