doc: kernel: workqueue: add section on best practices
Refactor best practices from the API refactoring issue and integrate them into the existing documentation. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
This commit is contained in:
parent
add4c1d8ef
commit
d8cf822d66
1 changed files with 139 additions and 6 deletions
|
@ -54,7 +54,8 @@ when no work items are available.
|
|||
Using the return values of :c:func:`k_work_busy_get()` or
|
||||
:c:func:`k_work_is_pending()`, or measurements of remaining time until
|
||||
delayable work is scheduled, should be avoided to prevent race conditions
|
||||
of the type observed with the previous implementation.
|
||||
of the type observed with the previous implementation. See also `Workqueue
|
||||
Best Practices`_.
|
||||
|
||||
Work Item Lifecycle
|
||||
********************
|
||||
|
@ -214,8 +215,8 @@ use of it.
|
|||
for example, if the new work items perform blocking operations that
|
||||
would delay other system workqueue processing to an unacceptable degree.
|
||||
|
||||
Implementation
|
||||
**************
|
||||
How to Use Workqueues
|
||||
*********************
|
||||
|
||||
Defining and Controlling a Workqueue
|
||||
====================================
|
||||
|
@ -336,16 +337,22 @@ A delayable work item is defined using a variable of type
|
|||
:c:struct:`k_work_delayable`. It must be initialized by calling
|
||||
:c:func:`k_work_init_delayable`.
|
||||
|
||||
There are two APIs that submit work after a delay:
|
||||
For delayed work there are two common use cases, depending on whether a
|
||||
deadline should be extended if a new event occurs. An example is collecting
|
||||
data that comes in asynchronously, e.g. characters from a UART associated with
|
||||
a keyboard. There are two APIs that submit work after a delay:
|
||||
|
||||
* :c:func:`k_work_schedule()` (or :c:func:`k_work_schedule_for_queue()`)
|
||||
schedules work to be executed at a specific time or after a delay. Further
|
||||
attempts to schedule the same item with this API before the delay completes
|
||||
will not change the time at which the item will be submitted to its queue.
|
||||
|
||||
Use this if the policy is to keep collecting data until a specified delay
|
||||
since the **first** unprocessed data was received;
|
||||
* :c:func:`k_work_reschedule()` (or :c:func:`k_work_reschedule_for_queue()`)
|
||||
unconditionally sets the deadline for the work, replacing any previous
|
||||
incomplete delay and changing the destination queue if necessary.
|
||||
incomplete delay and changing the destination queue if necessary. Use this
|
||||
if the policy is to keep collecting data until a specified delay since the
|
||||
**last** unprocessed data was received.
|
||||
|
||||
If the work item is not scheduled both APIs behave the same. If
|
||||
:c:macro:`K_NO_WAIT` is specified as the delay the behavior is as if the item
|
||||
|
@ -390,6 +397,132 @@ synchronization objects. These objects should not be allocated on a stack if
|
|||
the code is expected to work on architectures with
|
||||
:option:`CONFIG_KERNEL_COHERENCE`.
|
||||
|
||||
Workqueue Best Practices
|
||||
************************
|
||||
|
||||
Avoid Race Conditions
|
||||
=====================
|
||||
|
||||
Sometimes the data a work item must process is naturally thread-safe, for
|
||||
example when it's put into a :c:struct:`k_queue` by some thread and processed
|
||||
in the work thread. More often external synchronization is required to avoid
|
||||
data races: cases where the work thread might inspect or manipulate shared
|
||||
state that's being accessed by another thread or interrupt. Such state might
|
||||
be a flag indicating that work needs to be done, or a shared object that is
|
||||
filled by an ISR or thread and read by the work handler.
|
||||
|
||||
For simple flags :ref:`atomic_v2` may be sufficient. In other cases spin
|
||||
locks (:c:struct:`k_spinlock_t`) or thread-aware locks (:c:struct:`k_sem`,
|
||||
:c:struct:`k_mutex` , ...) may be used to ensure data races don't occur.
|
||||
|
||||
If the selected lock mechanism can :ref:`api_term_sleep` then allowing the
|
||||
work thread to sleep will starve other work queue items, which may need to
|
||||
make progress in order to get the lock released. Work handlers should try to
|
||||
take the lock with its no-wait path. For example:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
static void work_handler(struct work *work)
|
||||
{
|
||||
struct work_context *parent = CONTAINER_OF(work, struct work_context,
|
||||
work_item);
|
||||
|
||||
if (k_mutex_lock(&parent->lock, K_NO_WAIT) != 0) {
|
||||
/* NB: Submit will fail if the work item is being cancelled. */
|
||||
(void)k_work_submit(work);
|
||||
return;
|
||||
}
|
||||
|
||||
/* do stuff under lock */
|
||||
k_mutex_unlock(&parent->lock);
|
||||
/* do stuff without lock */
|
||||
}
|
||||
|
||||
Be aware that if the lock is held by a thread with a lower priority than the
|
||||
work queue the resubmission may starve the thread that would release the lock,
|
||||
causing the application to fail. Where the idiom above is required a
|
||||
delayable work item is preferred, and the work should be (re-)scheduled with a
|
||||
non-zero delay to allow the thread holding the lock to make progress.
|
||||
|
||||
Note that submitting from the work handler can fail if the work item had been
|
||||
cancelled. Generally this is acceptable, since the cancellation will complete
|
||||
once the handler finishes. If it is not, the code above must take other steps
|
||||
to notify the application that the work could not be performed.
|
||||
|
||||
Work items in isolation are self-locking, so you don't need to hold an
|
||||
external lock just to submit or schedule them. Even if you use external state
|
||||
protected by such a lock to prevent further resubmission, it's safe to do the
|
||||
resubmit as long as you're sure that eventually the item will take its lock
|
||||
and check that state to determine whether it should do anything. Where a
|
||||
delayable work item is being rescheduled in its handler due to inability to
|
||||
take the lock some other self-locking state, such as an atomic flag set by the
|
||||
application/driver when the cancel is initiated, would be required to detect
|
||||
the cancellation and avoid the cancelled work item being submitted again after
|
||||
the deadline.
|
||||
|
||||
Check Return Values
|
||||
===================
|
||||
|
||||
All work API functions return status of the underlying operation, and in many
|
||||
cases it is important to verify that the intended result was obtained.
|
||||
|
||||
* Submitting a work item (:c:func:`k_work_submit_to_queue`) can fail if the
|
||||
work is being cancelled or the queue is not accepting new items. If this
|
||||
happens the work will not be executed, which could cause a subsystem that is
|
||||
animated by work handler activity to become non-responsive.
|
||||
* Asynchronous cancellation (:c:func:`k_work_cancel` or
|
||||
:c:func:`k_work_cancel_delayable`) can complete while the work item is still
|
||||
being run by a handler. Proceeding to manipulate state shared with the work
|
||||
handler will result in data races that can cause failures.
|
||||
|
||||
Many race conditions have been present in Zephyr code because the results of
|
||||
an operation were not checked.
|
||||
|
||||
There may be good reason to believe that a return value indicating that the
|
||||
operation did not complete as expected is not a problem. In thoses cases the
|
||||
code should clearly document this, by (1) casting the return value to ``void``
|
||||
to indicate that the result is intentionally ignored, and (2) documenting what
|
||||
happens in the unexpected case. For example:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/* If this fails, the work handler will check pub->active and
|
||||
* exit without transmitting.
|
||||
*/
|
||||
(void)k_work_cancel_delayable(&pub->timer);
|
||||
|
||||
However in such a case the following code must still avoid data races, as it
|
||||
cannot guarantee that the work thread is not accessing work-related state.
|
||||
|
||||
Don't Optimize Prematurely
|
||||
==========================
|
||||
|
||||
The workqueue API is designed to be safe when invoked from multiple threads
|
||||
and interrupts. Attempts to externally inspect a work item's state and make
|
||||
decisions based on the result are likely to create new problems.
|
||||
|
||||
So when new work comes in, just submit it. Don't attempt to "optimize" by
|
||||
checking whether the work item is already submitted by inspecting snapshot
|
||||
state with :c:func:`k_work_is_pending` or :c:func:`k_work_busy_get`, or
|
||||
checking for a non-zero delay from
|
||||
:c:func:`k_work_delayable_remaining_get()`. Those checks are fragile: a "busy"
|
||||
indication can be obsolete by the time the test is returned, and a "not-busy"
|
||||
indication can also be wrong if work is submitted from multiple contexts, or
|
||||
(for delayable work) if the deadline has completed but the work is still in
|
||||
queued or running state.
|
||||
|
||||
A general best practice is to always maintain in shared state some condition
|
||||
that can be checked by the handler to confirm whether there is work to be
|
||||
done. This way you can use the work handler as the standard cleanup path:
|
||||
rather than having to deal with cancellation and cleanup at points where items
|
||||
are submitted, you may be able to have everything done in the work handler
|
||||
itself.
|
||||
|
||||
A rare case where you could safely use :c:func:`k_work_is_pending` is as a
|
||||
check to avoid invoking :c:func:`k_work_flush` or
|
||||
:c:func:`k_work_cancel_sync`, if you are *certain* that nothing else might
|
||||
submit the work while you're checking (generally because you're holding a lock
|
||||
that prevents access to state used for submission).
|
||||
|
||||
Suggested Uses
|
||||
**************
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue