From 7dee7a6139e55ab46b778d07814b9b2990f3e246 Mon Sep 17 00:00:00 2001
From: Andy Ross
Date: Fri, 22 Oct 2021 09:09:22 -0700
Subject: [PATCH] kernel/sched: Fix race with thread return values

There was a brief (but seen in practice on real apps on real
hardware!) race with the switch-based z_swap() implementation.  The
thread return value was being initialized to -EAGAIN after the
enclosing lock had been released.  But the release of that lock is
supposed to be atomic with the thread suspend.

This opened a window for another racing thread to come by and "wake
up" our pending thread (which is fine on its own), set its return
value (e.g. to 0 for success), and then have that value clobbered by
the thread continuing to suspend itself outside the lock.

Melodramatic aside: I continue to hate this
arch_thread_return_value_set() API; it needs to die.  At best it's a
mild optimization on a handful of architectures (e.g. x86 implements
it by writing to the EAX register save slot in the context block).
Asynchronous APIs are almost always worse than synchronous ones, and
in this case it's an async operation that races against literal
context switch code that can't use traditional locking strategies.

Fixes #39575

Signed-off-by: Andy Ross
---
 kernel/include/kswap.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/include/kswap.h b/kernel/include/kswap.h
index a00f33846f8..917315d5899 100644
--- a/kernel/include/kswap.h
+++ b/kernel/include/kswap.h
@@ -90,6 +90,8 @@ static ALWAYS_INLINE unsigned int do_swap(unsigned int key,
 
 	z_check_stack_sentinel();
 
+	old_thread->swap_retval = -EAGAIN;
+
 	/* We always take the scheduler spinlock if we don't already
 	 * have it. We "release" other spinlocks here. But we never
 	 * drop the interrupt lock.
@@ -108,8 +110,6 @@ static ALWAYS_INLINE unsigned int do_swap(unsigned int key,
 	z_reset_time_slice();
 #endif
 
-	old_thread->swap_retval = -EAGAIN;
-
 #ifdef CONFIG_SMP
 	_current_cpu->swap_ok = 0;
 	new_thread->base.cpu = arch_curr_cpu()->id;
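
The invariant the patch restores is that a pending thread's default
return value must be published inside the same critical section that
makes the thread visible to wakers, so a value written by a waker via
arch_thread_return_value_set() can never be overwritten afterwards.
Below is a minimal user-space sketch of that ordering using POSIX
threads rather than the Zephyr scheduler; it is an analogue for
illustration only, and the names sleeper, waker, swap_retval,
sleeping, and woken are invented for the example.

    /* Minimal user-space analogue of the corrected ordering.  Not
     * Zephyr code.  Build with: cc sketch.c -lpthread
     */
    #include <errno.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    static int swap_retval;  /* stand-in for old_thread->swap_retval */
    static int sleeping;     /* sleeper is visible to wakers */
    static int woken;

    /* Plays the suspending thread in do_swap().  The default return
     * value is stored in the same critical section that makes the
     * thread visible to wakers.  The fixed bug corresponds to
     * becoming visible, dropping the lock, and only then storing
     * -EAGAIN: a waker's 0 written in that window would be clobbered.
     */
    static void *sleeper(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        swap_retval = -EAGAIN;  /* default: nobody woke us */
        sleeping = 1;           /* a waker may now set our retval */
        while (!woken) {
            pthread_cond_wait(&cv, &lock);
        }
        printf("woke with swap_retval=%d\n", swap_retval); /* prints 0 */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    /* Plays a waker calling arch_thread_return_value_set(): it
     * overwrites the default under the same lock, so the two stores
     * are strictly ordered and the waker's value always wins.
     */
    static void *waker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            if (sleeping) {
                swap_retval = 0;  /* "success", the 0 from the commit text */
                woken = 1;
                pthread_cond_signal(&cv);
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            pthread_mutex_unlock(&lock);
            sched_yield();  /* sleeper isn't pended yet; retry */
        }
    }

    int main(void)
    {
        pthread_t ts, tw;

        pthread_create(&ts, NULL, sleeper, NULL);
        pthread_create(&tw, NULL, waker, NULL);
        pthread_join(ts, NULL);
        pthread_join(tw, NULL);
        return 0;
    }

The pthread_cond_wait() call is what makes the analogy work: it
releases the mutex atomically with the suspend, which is exactly the
atomicity do_swap() had broken by storing -EAGAIN only after the
enclosing lock had already been given up.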