Commit graph

3183 commits

Author SHA1 Message Date
Ioannis Glaropoulos
9d184286d3 kernel: fatal: allow tests to continue upon spurious ISR trigger
We want to be regression-testing the spurious ISR functionality.
Therefore, in z_fatal_error() we need to allow a test to continue
if an error has occured due to a spurious IRQ being triggered.
Only in test mode, wee allow the function to return without an
error. In normal mode the current thread will be aborted.

Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
2020-03-11 10:26:36 +02:00
Ioannis Glaropoulos
49fb5d0812 kernel: fatal: check for esf validity when inspecting nested IRQ
For architectures that support detection of nested interrupts,
we need to check the validity of the exception stack frame,
before we can supply it as a pointer to the function that
evaluates whether we are in a nested interrupt context. This
commits adds the required esf pointer checks in z_fatal_error().

Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
2020-03-11 10:26:36 +02:00
Andrew Boie
c4fbc08347 kernel: fixup thread monitor locking
The lock in kernel/thread.c was pulling double-duty, protecting
both the thread monitor linked list and also serializing access
to k_thread_suspend/resume functions.

The monitor list now has its own dedicated lock.

The object tracing test has been updated to use k_thread_foreach().

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-03-10 16:09:24 -04:00
Flavio Ceolin
8ae822c9fa kernel: random: ifdef z_early_boot_rand_get
This function had a to sys_rand_get() even without random source. As
Zephyr is built with linkage garbage collection and this function is
called only if either ENTROPY_HAS_DRIVER or TEST_RANDOM_GENERATOR is
enabled and these options automatically enable a random source.

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2020-03-10 21:12:28 +02:00
Andrew Boie
9f0acd44a4 kernel: add APIs for atomic os on pointers
The existing APIs are insufficient on 64-bit systems as atomic_t
is 32-bits wide.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-03-10 10:18:16 -04:00
Andrew Boie
60e0019751 kernel: fix return type for atomic_cas()
In some cases this was a bool, in some cases an integer value
of 1 or 0. Standardize on bool.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-03-10 10:18:16 -04:00
Andrew Boie
6cf496f324 kernel: use sched lock for k_thread_suspend/resume
This logic should be using the sched_lock and not its own
separate lock for these two functions.

Some simplications were made; z_thread_single_resume and
z_thread_single_suspend were only used in one place, and there was
some redundant logic for whether to reschedule in the suspend case.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-03-10 09:57:58 -04:00
Flavio Ceolin
fbeaa0a510 kernel: Stack pointer random depends on MULTITHREADING
Don't pretend with have stack randomization without multithreading.
When multithreading is disabled the "main" thread never starts. Zephyr
will run on the stack used for the z_cstart(), which on most
architectures is the interrupt stack.

Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
2020-03-02 22:49:37 +02:00
Andrew Boie
896e32b414 kernel: remove problematic pend() assertion
This assertion, if built in, allows users threads to crash
the kernel in a critical section by passing a negative timeout
value, creating a DoS attack vector.

Remove this assertion, immediately below it there's a check
which just resets it to 0 anyway.

Fixes: #22999

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-02-21 08:57:07 -08:00
Peter Bigot
d8146d6c6d kernel: work_q: fix return value in non-error case
A recent patch allowed an error code to be returned even though the
execution path treated it as a non-error condition.  Clear the code
before returning.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
2020-02-20 17:50:05 +02:00
Andy Ross
a2f6826f9c kernel/thread: Don't clobber arch initialization of switch_handle
The recent synchronization work required that the kernel guarantee
switch_handle is non-null, but it did it in a way that works for ARC
and x86_64 but would clobber the work xtensa had already done to
populate that field.

There's no point: just make this an assert, as it's always been the
arch layer's job.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-19 08:29:35 -05:00
Luiz Augusto von Dentz
038d727c18 kernel: work: Return error if timeout cannot be aborted
This is aligned with the documentation which states that an error shall
be returned if the work has been completed:

  '-EINVAL Work item is being processed or has completed its work.'

Though in order to be able to resubmit from the handler itself it needs
to be able to distinct when the work is already completed so instead of
-EINVAL it return -EALREADY when the work is considered to be completed.

Fixes #22803

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2020-02-17 22:37:26 +02:00
Ioannis Glaropoulos
3a3364eef8 kernel: fatal: unlock IRQs in early return points in z_fatal_error
We need to unlock IRQs in early return points of
z_fatal_error() functions; not only at the normal
return point.

Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
2020-02-14 20:47:37 +02:00
Andy Ross
7353c7f95d kernel/userspace: Move syscall_frame field to thread struct
The syscall exception frame was stored on the CPU struct during
syscall execution, but that's not right.  System calls might "feel
like" exceptions, but they're actually perfectly normal kernel mode
code and can be preempted and migrated between CPUs at any time.

Put the field on the thread struct.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-08 08:51:04 -05:00
Andy Ross
8153144de0 kernel/fatal: Fatal errors must not be preempted
The code underneath z_fatal_error() (which is usually run in an
exception context, but is not required to be) was running with
interrupts enabled, which is a little surprising.

The only bug present currently is that the CPU ID extracted for
logging is subject to a race (i.e. it's possible but very unlikely
that such a handler might migrate to another CPU after the error is
flagged and log the wrong CPU ID), but in general users with custom
error handlers are likely to be surprised when their dying threads
gets preempted by other code before they can abort.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-08 08:51:04 -05:00
Andy Ross
eefd3daa81 kernel/smp: arch/x86_64: Address race with CPU migration
Use of the _current_cpu pointer cannot be done safely in a preemptible
context.  If a thread is preempted and migrates to another CPU, the
old CPU record will be wrong.

Add a validation assert to the expression that catches incorrect
usages, and fix up the spots where it was wrong (most important being
a few uses of _current outside of locks, and the arch_is_in_isr()
implementation).

Note that the resulting _current expression now requires locking and
is going to be somewhat slower.  Longer term it's going to be better
to augment the arch API to allow SMP architectures to implement a
faster "get current thread pointer" action than this default.

Note also that this change means that "_current" is no longer
expressible as an lvalue (long ago, it was just a static variable), so
the places where it gets assigned now assign to _current_cpu->current
instead.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-08 08:51:04 -05:00
Andrew Boie
efc5fe07a2 kernel: overhaul unused stack measurement
The existing stack_analyze APIs had some problems:

1. Not properly namespaced
2. Accepted the stack object as a parameter, yet the stack object
   does not contain the necessary information to get the associated
   buffer region, the thread object is needed for this
3. Caused a crash on certain platforms that do not allow inspection
   of unused stack space for the currently running thread
4. No user mode access
5. Separately passed in thread name

We deprecate these functions and add a new API
k_thread_stack_space_get() which addresses all of these issues.

A helper API log_stack_usage() also added which resembles
STACK_ANALYZE() in functionality.

Fixes: #17852

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-02-08 10:02:35 +02:00
Anas Nashif
73008b427c tracing: move headers under include/tracing
Move tracing.h to include/tracing/ to align with subsystem reorg.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-02-07 15:58:05 -05:00
Andrew Boie
d1f50122f9 kernel: move timing externs to public header
These arch_timing_ defines get used in certain timer
drivers and need to be in the public include space,
and not the private kernel headers.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-02-06 23:07:37 -05:00
Andy Ross
5737b5c843 kernel/sched: Re-add IPI calls on k_wakeup() and k_thread_priority_set()
These got dropped by an earlier patch, but are required on SMP systems
so synchronously notify other CPUs of changed scheduler state.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-04 17:50:11 -05:00
Andy Ross
c44d566aee kernel/sched: Re-fix SMP wait-for-switch on interrupt exit
This got clobbered by commit adac4cbafa in what I think was a rebase
mistake.  Without it, on SMP systems it's possible to select a new
_current thread and try to return into it before another CPU has
actually finished switching away from it.

Interestingly: the frequency with which this bug got caught once it
was reintroduced was much, much higher than it was when it was fixed
the first time due to the instruction pointer poisoning introduced in
the interrim.  Incompletely saved threads now have deliberately broken
state when assertions are enabled and will panic synchronously.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-04 17:50:11 -05:00
Andy Ross
96ccc46e03 kernel/sched: Put k_thread_start() under a single lock
Similar to the suspend refactoring earlier, this really nees to be
done in an atomic block.  There were two confirmable races here,
though it's not completely clear either was being hit in practice:

1. The bit operations in z_mark_thread_as_started() aren't atomic so
   it needs to be protected.

2. The intermediate state in z_ready_thread() could result in a dead
   or suspended thread being added to the ready queue if another
   context tried a simultaneous abort or suspend.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-03 09:31:56 -05:00
Andy Ross
ed6b4fb21c kernel/sched: Properly synchronize pend()
Kernel wait_q's and the thread pended_on backpointer are scheduler
state and need to be modified under the scheduler lock.  There was one
spot in pend() where they were not.

Also unpack z_remove_thread_from_ready_q() into an unsynchronized
utility so that it can be called by this process in a single lock
block.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-03 09:31:56 -05:00
Andy Ross
b8ff63e3c7 kernel/sem: Fix SMP race
This had the same race that queue did: you have to be 100% done with
state management before calling z_ready_thread(), because another CPU
can pick up the thread before the return value was set.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-03 09:31:56 -05:00
Daniel Leung
adac4cbafa sched: smp: fix thread marked dead but still running
Under SMP, when a thread is marked aborting, this thread may still
be running on another CPU. However, if there is only one thread
available to run, this thread may be selected to run again due to
next_up() not checking for the aborting state. Moreover, when
there is no IPI to signal to others k_thread_abort() being called,
the k_thread_abort() target thread is marked dead after a new
thread is selected to run. This causes the original thread calling
k_thread_abort() to mistaken that target thread is no longer
running and returns.

Note that, with working IPI, z_sched_ipi() is called as an ISR
to mark the target thread dead. A new thread is then selected to
run, so that the target thread would not be selected due to it
being dead.

This moves the code to mark thread dead into next_up(), where
the next best thread is selected, and the current thread being
swapped out. z_sched_ipi() now becomes an empty function, and
calls to it are removed.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-01-31 11:46:35 -05:00
Anas Nashif
471ffbe77d coverage: do not dump coverage data by default
Only dump data when we are interested in the analysing coverage. By
default just collect the data.

CONFIG_COVERAGE_DUMP is used to control this behaviour.

This will help speed up sanitycheck and will avoid lots of noise in the
log when some tests with coverage enabled failed. Dumping data to
console is also suspected to be one of the reason why qemu hangs in CI.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-30 16:04:03 -05:00
Anas Nashif
7605ac2c4c kernel: thread: fix string for _THREAD_PRESTART
_THREAD_PRESTART means the thread was not started yet and is being
setup, for example this is the case when starting a thread with a
timeout. We do not have a 'restart' thread state.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-29 13:17:19 -08:00
Sebastian Bøe
fdac7b3319 cmake: Add target for generating header files
Before C sources can be compiled any generated header that they
include must be generated. Currently, the target 'offsets_h' happens
to depend directly or indirectly on all generated headers.

This means that to compile safely, one can simply depend on
'offsets_h'. But this is coincidental and might not be true in the
future.

To be able to safely depend on a target that represents all generated
headers being ready we introduce the target
'zephyr_generated_headers'.

Any third-party build scripts can now safely depend on
'zephyr_generated_headers' and be protected from any internal changes
to the build system, like the removal of offsets_h.

Signed-off-by: Sebastian Bøe <sebastian.boe@nordicsemi.no>
2020-01-29 11:44:57 -06:00
Andrew Boie
6f654bbafd mempool: use k_malloc heap for ISR allocations
Fixes an issue where calling z_thread_malloc() would
borrow the resource pool of whatever thread happened
to be interrupted at the time.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-24 09:27:59 -08:00
Krzysztof Chruscinski
a8b5a2e65e kernel: device: Add const qualifier to device_config
Device config structure is placed in rom section but there was
no const prefix used. Lack of prefix suggested that structure
is in ram (ram_report is also fooled). Added const prefix to
explicitly inform that it goes to rom.

Signed-off-by: Krzysztof Chruscinski <krzysztof.chruscinski@nordicsemi.no>
2020-01-22 06:32:36 -06:00
Andy Ross
4c2fc2aed7 kernel/queue: Fix SMP race
Calling z_ready_thread() means the thread is now ready and can wake up
at any moment on another CPU.  But we weren't finished setting the
return value!  So the other side could wake up with a spurious "error"
condition if it ran too soon.  Note that on systems with a working
IPI, that wakeup can happen much faster than you might think.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-01-21 14:47:52 -08:00
Andy Ross
3235451880 kernel/swap: Add SMP "wait for switch" synchronization
On SMP, there is an inherent race when swapping: the old thread adds
itself back to the run queue before calling into the arch layer to do
the context switch.  The former is properly synchronized under the
scheduler lock, and the later operates with interrupts locally
disabled.  But until somewhere in the middle of arch_switch(), the old
thread (that is in the run queue!) does not have complete saved state
that can be restored.

So it's possible for another CPU to grab a thread before it is saved
and try to restore its unsaved register contents (which are garbage --
typically whatever state it had at the last interrupt).

Fix this by leveraging the "swapped_from" pointer already passed to
arch_switch() as a synchronization primitive.  When the switch
implementation writes the new handle value, we know the switch is
complete.  Then we can wait for that in z_swap() and at interrupt
exit.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-01-21 14:47:52 -08:00
Andy Ross
e06ba702d5 kernel/sched: Address thread abort termination delay issue on SMP
It's possible for a thread to abort itself simultaneously with an
external abort from another thread.  In fact in our test suite this is
a common thing, as ztest will abort its own spawend threads at the end
of a test, as they tend to be exiting on their own.

When that happens, the thread marks itself DEAD and does all its
scheduler bookeeping, but it is STILL RUNNING on its own stack until
it makes its way to its final swap.  The external context would see
that "dead" metadata and return from k_thread_abort(), allowing the
next test to reuse and spawn the same thread struct while the old
context was still running.  Obviously that's bad.

Unfortunately, this is impossible to address completely without
modifying every SMP architecture to add a API-visible hook to every
swap that signals completion.  In practice the best we can do is add a
delay.  But note the optimization: almost always, the scheduler IPI
catches the running thread and kills it from interrupt context
(i.e. on a different stack).  When that happens, we know that the
interrupted thread will never be resumed (because it's dead) and can
elide the delay.  We only pay the cost when we actually detect a race.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-01-21 14:47:52 -08:00
Andy Ross
60247ca149 kernel/sched: Correct IPI usage
These two spots were calling z_sched_ipi() (the IPI handler run under
the ISR, which is a noop here because obviously the current thread
isn't DEAD) and not arch_sched_ipi() (which triggers an IPI on other
CPUs to inform them of scheduling state changes), presumably because
of a typo.

Apparently we don't have tests for k_wakeup() and
k_thread_priority_set() that are sensitive to latency in SMP
contexts...

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-01-21 14:47:52 -08:00
Andy Ross
86430d8d46 kernel: arch: Clarify output switch handle requirements in arch_switch
The original intent was that the output handle be written through the
pointer in the second argument, though not all architectures used that
scheme.  As it turns out, that write is becoming a synchronization
signal, so it's no longer optional.

Clarify the documentation in arch_switch() about this requirement, and
add an instruction to the x86_64 context switch to implement it as
original envisioned.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-01-21 14:47:52 -08:00
Ulf Magnusson
40b49e22ea kernel: kconfig: Make SCHED_IPI_SUPPORTED invisible
Toggling this symbol probably doesn't make sense, because the
architecture is already known when Kconfig runs.

SCHED_IPI_SUPPORTED is enabled through being selected by the ARC_CONNECT
(maybe that one shouldn't be configurable either) and X86_64 symbols.

Note that it's not possible to disable the symbol when it's being
selected, so trying to turn it off on e.g. X86_64 won't work either.

Signed-off-by: Ulf Magnusson <Ulf.Magnusson@nordicsemi.no>
2020-01-20 18:38:10 -05:00
Anas Nashif
756d8b03e2 kernel: queue: runtime error handling
Runtime error handling for k_queue_append_list and k_queue_merge_slist.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
1ed67d1d51 kernel: stack: error handling
Add runtime error checking for both k_stack_push and k_stack_cleanup.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
11b9365542 kernel: msgq: error handling
Add runtime error handling for k_msgq_cleanup. We return 0 on success
now and -EAGAIN when cleanup is not possible.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
dfc2bbcd3c kernel: mem_slab: error handling
Add runtime error checking for k_mem_slab_init and replace asserts with
returning error codes.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
154af912e8 kernel: work_q: error handling
When trying to cancel a NULL work queue return -EAGAIN.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
361a84d07f kernel: pipe: runtime error checking
Add runtime error checking to k_pipe_cleanup and k_pipe_get and remove
asserts.
Adapted test which was expecting a fault to handle errors instead.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
5076a83ef5 kernel: semaphore: optimize code
Remove static helper functions used only once and integrate them into
calling functions.
In k_sem_take, return at the end.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
928af3ce09 kernel: semaphore: k_sem_init error checking
Check for errors at runtime and stop depending on ASSERTs.
This changes the API for
- k_sem_init

k_sem_init now returns -EINVAL on invalid data.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Anas Nashif
86bb2d06d7 kernel: mutex: add error checking
k_mutex_unlock will now perform error checking and return on failures.

If the current thread does not own the mutex, we will now return -EPERM.
In the unlikely situation where we own a lock and the lock count is
zero, we assert. This is considered an undefined bahviour and should not
happen.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-20 17:19:54 -05:00
Andrew Boie
a594ca7c8f kernel: cleanup and formally define CPU start fn
The "key" parameter is legacy, remove it.

Add a typedef for the expected function pointer type.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Anas Nashif
0ad67650f2 tracing: better positioning of tracing points
Improve positioning of tracing calls. Avoid multiple calls and missing
events because of complex logix. Trace the event where things happen
really.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-09 11:21:19 -05:00
Anas Nashif
1530819e12 tracing: remove duplicate tracing of thread creation
This is already being called in z_setup_new_thread.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-01-09 11:21:19 -05:00
Andy Ross
8bdabcc46b kernel/sched: Move thread suspend and abort under the scheduler lock
Historically, these routines were placed in thread.c and would use the
scheduler via exported, synchronized functions (e.g. "remove from
ready queue").  But those steps were very fine grained, and there were
races where the thread could be seen by other contexts (in particular
under SMP) in an intermediate state.  It's not completely clear to me
that any of these were fatal bugs, but it's very hard to prove they
weren't.

At best, this is fragile.  Move the z_thread_single_suspend/abort()
functions into the scheduler and do the scheduler logic in a single
critical section.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-01-08 14:21:10 +01:00
Steven Wang
9b909f2d69 kernel/mutex: code improvement
Line "new_prio = mutex->owner_orig_prio" is unnecessary. So
remove it.

Signed-off-by: Steven Wang <steven.l.wang@linux.intel.com>
2020-01-07 17:13:37 +01:00