doc: kernel: workqueue: add section on best practices

Refactor best practices from the API refactoring issue and integrate
them into the existing documentation.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
This commit is contained in:
Peter Bigot 2021-04-21 10:19:16 -05:00 committed by Anas Nashif
commit d8cf822d66

View file

@ -54,7 +54,8 @@ when no work items are available.
Using the return values of :c:func:`k_work_busy_get()` or
:c:func:`k_work_is_pending()`, or measurements of remaining time until
delayable work is scheduled, should be avoided to prevent race conditions
of the type observed with the previous implementation.
of the type observed with the previous implementation. See also `Workqueue
Best Practices`_.
Work Item Lifecycle
********************
@ -214,8 +215,8 @@ use of it.
for example, if the new work items perform blocking operations that
would delay other system workqueue processing to an unacceptable degree.
Implementation
**************
How to Use Workqueues
*********************
Defining and Controlling a Workqueue
====================================
@ -336,16 +337,22 @@ A delayable work item is defined using a variable of type
:c:struct:`k_work_delayable`. It must be initialized by calling
:c:func:`k_work_init_delayable`.
There are two APIs that submit work after a delay:
For delayed work there are two common use cases, depending on whether a
deadline should be extended if a new event occurs. An example is collecting
data that comes in asynchronously, e.g. characters from a UART associated with
a keyboard. There are two APIs that submit work after a delay:
* :c:func:`k_work_schedule()` (or :c:func:`k_work_schedule_for_queue()`)
schedules work to be executed at a specific time or after a delay. Further
attempts to schedule the same item with this API before the delay completes
will not change the time at which the item will be submitted to its queue.
Use this if the policy is to keep collecting data until a specified delay
since the **first** unprocessed data was received;
* :c:func:`k_work_reschedule()` (or :c:func:`k_work_reschedule_for_queue()`)
unconditionally sets the deadline for the work, replacing any previous
incomplete delay and changing the destination queue if necessary.
incomplete delay and changing the destination queue if necessary. Use this
if the policy is to keep collecting data until a specified delay since the
**last** unprocessed data was received.
If the work item is not scheduled both APIs behave the same. If
:c:macro:`K_NO_WAIT` is specified as the delay the behavior is as if the item
@ -390,6 +397,132 @@ synchronization objects. These objects should not be allocated on a stack if
the code is expected to work on architectures with
:option:`CONFIG_KERNEL_COHERENCE`.
Workqueue Best Practices
************************
Avoid Race Conditions
=====================
Sometimes the data a work item must process is naturally thread-safe, for
example when it's put into a :c:struct:`k_queue` by some thread and processed
in the work thread. More often external synchronization is required to avoid
data races: cases where the work thread might inspect or manipulate shared
state that's being accessed by another thread or interrupt. Such state might
be a flag indicating that work needs to be done, or a shared object that is
filled by an ISR or thread and read by the work handler.
For simple flags :ref:`atomic_v2` may be sufficient. In other cases spin
locks (:c:struct:`k_spinlock_t`) or thread-aware locks (:c:struct:`k_sem`,
:c:struct:`k_mutex` , ...) may be used to ensure data races don't occur.
If the selected lock mechanism can :ref:`api_term_sleep` then allowing the
work thread to sleep will starve other work queue items, which may need to
make progress in order to get the lock released. Work handlers should try to
take the lock with its no-wait path. For example:
.. code-block:: c
static void work_handler(struct work *work)
{
struct work_context *parent = CONTAINER_OF(work, struct work_context,
work_item);
if (k_mutex_lock(&parent->lock, K_NO_WAIT) != 0) {
/* NB: Submit will fail if the work item is being cancelled. */
(void)k_work_submit(work);
return;
}
/* do stuff under lock */
k_mutex_unlock(&parent->lock);
/* do stuff without lock */
}
Be aware that if the lock is held by a thread with a lower priority than the
work queue the resubmission may starve the thread that would release the lock,
causing the application to fail. Where the idiom above is required a
delayable work item is preferred, and the work should be (re-)scheduled with a
non-zero delay to allow the thread holding the lock to make progress.
Note that submitting from the work handler can fail if the work item had been
cancelled. Generally this is acceptable, since the cancellation will complete
once the handler finishes. If it is not, the code above must take other steps
to notify the application that the work could not be performed.
Work items in isolation are self-locking, so you don't need to hold an
external lock just to submit or schedule them. Even if you use external state
protected by such a lock to prevent further resubmission, it's safe to do the
resubmit as long as you're sure that eventually the item will take its lock
and check that state to determine whether it should do anything. Where a
delayable work item is being rescheduled in its handler due to inability to
take the lock some other self-locking state, such as an atomic flag set by the
application/driver when the cancel is initiated, would be required to detect
the cancellation and avoid the cancelled work item being submitted again after
the deadline.
Check Return Values
===================
All work API functions return status of the underlying operation, and in many
cases it is important to verify that the intended result was obtained.
* Submitting a work item (:c:func:`k_work_submit_to_queue`) can fail if the
work is being cancelled or the queue is not accepting new items. If this
happens the work will not be executed, which could cause a subsystem that is
animated by work handler activity to become non-responsive.
* Asynchronous cancellation (:c:func:`k_work_cancel` or
:c:func:`k_work_cancel_delayable`) can complete while the work item is still
being run by a handler. Proceeding to manipulate state shared with the work
handler will result in data races that can cause failures.
Many race conditions have been present in Zephyr code because the results of
an operation were not checked.
There may be good reason to believe that a return value indicating that the
operation did not complete as expected is not a problem. In thoses cases the
code should clearly document this, by (1) casting the return value to ``void``
to indicate that the result is intentionally ignored, and (2) documenting what
happens in the unexpected case. For example:
.. code-block:: c
/* If this fails, the work handler will check pub->active and
* exit without transmitting.
*/
(void)k_work_cancel_delayable(&pub->timer);
However in such a case the following code must still avoid data races, as it
cannot guarantee that the work thread is not accessing work-related state.
Don't Optimize Prematurely
==========================
The workqueue API is designed to be safe when invoked from multiple threads
and interrupts. Attempts to externally inspect a work item's state and make
decisions based on the result are likely to create new problems.
So when new work comes in, just submit it. Don't attempt to "optimize" by
checking whether the work item is already submitted by inspecting snapshot
state with :c:func:`k_work_is_pending` or :c:func:`k_work_busy_get`, or
checking for a non-zero delay from
:c:func:`k_work_delayable_remaining_get()`. Those checks are fragile: a "busy"
indication can be obsolete by the time the test is returned, and a "not-busy"
indication can also be wrong if work is submitted from multiple contexts, or
(for delayable work) if the deadline has completed but the work is still in
queued or running state.
A general best practice is to always maintain in shared state some condition
that can be checked by the handler to confirm whether there is work to be
done. This way you can use the work handler as the standard cleanup path:
rather than having to deal with cancellation and cleanup at points where items
are submitted, you may be able to have everything done in the work handler
itself.
A rare case where you could safely use :c:func:`k_work_is_pending` is as a
check to avoid invoking :c:func:`k_work_flush` or
:c:func:`k_work_cancel_sync`, if you are *certain* that nothing else might
submit the work while you're checking (generally because you're holding a lock
that prevents access to state used for submission).
Suggested Uses
**************