kernel/sched: Address thread abort termination delay issue on SMP

It's possible for a thread to abort itself simultaneously with an
external abort from another thread.  In fact, in our test suite this
is common: ztest aborts its own spawned threads at the end of a test,
just as they tend to be exiting on their own.

When that happens, the thread marks itself DEAD and does all its
scheduler bookkeeping, but it is STILL RUNNING on its own stack until
it makes its way to its final swap.  The external context would see
that "dead" metadata and return from k_thread_abort(), allowing the
next test to reuse and respawn the same thread struct while the old
context was still running.  Obviously that's bad.

Unfortunately, this is impossible to address completely without
modifying every SMP architecture to add an API-visible hook to every
swap that signals completion.  In practice the best we can do is add a
delay.  But note the optimization: almost always, the scheduler IPI
catches the running thread and kills it from interrupt context
(i.e. on a different stack).  When that happens, we know that the
interrupted thread can never be resumed (because it's dead), so we can
elide the delay.  We only pay the cost when we actually detect a race.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
commit e06ba702d5
Author: Andy Ross, 2020-01-14 06:26:10 -08:00
Committed by: Andrew Boie
2 changed files with 37 additions and 2 deletions

@@ -59,8 +59,11 @@
 /* Thread is being aborted (SMP only) */
 #define _THREAD_ABORTING (BIT(5))
+
+/* Thread was aborted in interrupt context (SMP only) */
+#define _THREAD_ABORTED_IN_ISR (BIT(6))
 /* Thread is present in the ready queue */
-#define _THREAD_QUEUED (BIT(6))
+#define _THREAD_QUEUED (BIT(7))
 /* end - states */


@@ -15,6 +15,13 @@
 #include <stdbool.h>
 #include <kernel_internal.h>
+
+/* Maximum time between the time a self-aborting thread flags itself
+ * DEAD and the last read or write to its stack memory (i.e. the time
+ * of its next swap()). In theory this might be tuned per platform,
+ * but in practice this conservative value should be safe.
+ */
+#define THREAD_ABORT_DELAY_US 500
 #if defined(CONFIG_SCHED_DUMB)
 #define _priq_run_add z_priq_dumb_add
 #define _priq_run_remove z_priq_dumb_remove
@@ -436,7 +443,20 @@ void z_thread_single_abort(struct k_thread *thread)
 			thread->base.pended_on = NULL;
 		}
 	}
-	thread->base.thread_state |= _THREAD_DEAD;
+	u32_t mask = _THREAD_DEAD;
+
+	/* If the abort is happening in interrupt context,
+	 * that means that execution will never return to the
+	 * thread's stack and that the abort is known to be
+	 * complete. Otherwise the thread still runs a bit
+	 * until it can swap, requiring a delay.
+	 */
+	if (IS_ENABLED(CONFIG_SMP) && arch_is_in_isr()) {
+		mask |= _THREAD_ABORTED_IN_ISR;
+	}
+
+	thread->base.thread_state |= mask;
 }

 	sys_trace_thread_abort(thread);
@@ -1178,7 +1198,9 @@ void z_sched_abort(struct k_thread *thread)
 	 * it locally. Not all architectures support that, alas. If
	 * we don't have it, we need to wait for some other interrupt.
	 */
+	key = k_spin_lock(&sched_spinlock);
 	thread->base.thread_state |= _THREAD_ABORTING;
+	k_spin_unlock(&sched_spinlock, key);
 #ifdef CONFIG_SCHED_IPI_SUPPORTED
 	arch_sched_ipi();
 #endif
@@ -1202,6 +1224,16 @@ void z_sched_abort(struct k_thread *thread)
 			k_busy_wait(100);
 		}
 	}
+
+	/* If the thread self-aborted (e.g. its own exit raced with
+	 * this external abort) then even though it is flagged DEAD,
+	 * it's still running until its next swap and thus the thread
+	 * object is still in use. We have to resort to a fallback
+	 * delay in that circumstance.
+	 */
+	if ((thread->base.thread_state & _THREAD_ABORTED_IN_ISR) == 0U) {
+		k_busy_wait(THREAD_ABORT_DELAY_US);
+	}
 }
 #endif