Commit graph

166 commits

Author SHA1 Message Date
Andy Ross cce5ff1510 arch/x86: Fix stack alignment for user threads
The x86_64 SysV ABI requires 16 byte alignment for the stack pointer
during execution of normal code.  That means that on entry to an
ABI-compatible C function (which is reached via a CALL instruction
that pushes the return address) the RSP register must be MISaligned by
exactly 8 bytes.  The kernel mode thread setup got this right, but we
missed the equivalent condition in userspace entry.

The end result was a misaligned stack, which is surprisingly robust
for most use.  But recent toolchains have started doing more
elaborate vectorization, and the resulting SSE instructions started
failing in userspace on the misaligned loads.

Note that there's a comment about optimization: we're doing the stack
alignment in the "wrong place" and are needlessly wasting bytes in
some cases.  The alignment should be done where we can see the raw
stack boundaries as we set up the RSP values.  Add a FIXME to this
effect, but don't touch anything, as this patch is a targeted bugfix.

Also fix a somewhat embarrassing 32-bit-ism that would have truncated
the address of a userspace stack that we tried to put above 4G.
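
For illustration, a minimal sketch of the required arithmetic
(ROUND_DOWN per sys/util.h; stack_high is a hypothetical stack top):

    /* Align the initial RSP down to 16 bytes, then subtract 8 so
     * that entry to an ABI-compatible C function (i.e. after the
     * CALL-style push of a return address) leaves RSP 16-byte
     * aligned again.  Using uintptr_t also avoids the 32-bit
     * truncation mentioned above.
     */
    uintptr_t sp = (uintptr_t)stack_high;

    sp = ROUND_DOWN(sp, 16) - 8;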

Fixes #31018

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-03 18:45:48 -05:00
Daniel Leung dea8fccfb3 x86: clear GS at boot for x86_64
On Intel processors, if GS is not zero and is then set to zero,
GS_BASE is also set to zero. This would interfere with the actual
use of GS_BASE for userspace. To avoid accidentally clearing
GS_BASE, simply set GS to 0 at boot, so any subsequent clearing of
GS will not clear GS_BASE.

The clearing of GS_BASE was discovered while trying to figure out
why the mem_protect test would hang within 10-20 repeated runs.
GDB revealed that both GS and GS_BASE were set to zero when the
tests hung. After setting GS to zero at boot, the mem_protect tests
ran repeatedly 5,000+ times without hanging.
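
A minimal sketch of the boot-time fix in inline-assembly form (the
actual change lives in the startup assembly):

    /* Load the null selector into GS once at boot; later writes of
     * zero to GS are then harmless with respect to GS_BASE.
     */
    uint16_t zero = 0;

    __asm__ volatile("movw %0, %%gs" :: "rm"(zero));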

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2021-02-01 21:38:28 -05:00
Daniel Leung d3218ca515 debug: coredump: remove z_ prefix for stuff used outside subsys
This removes the z_ prefix from those items (functions, enums, etc.)
that are used outside the coredump subsys. This aligns better with
the naming convention.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2021-01-21 22:08:59 -05:00
Andrew Boie 0a791b7a09 x86: mmu: clarify physical/virtual conversions
The page table implementation requires conversion between virtual
and physical addresses when creating and walking page tables. Add
phys_addr() and virt_addr() functions instead of hard-casting
these values, plus a macro for doing the same in ASM code.

Currently, all pages are identity mapped, so VIRT_OFFSET = 0, but
the code will now still work if virtual and physical addresses
differ.
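
A sketch of the helpers under these assumptions (the offset macro
name is illustrative):

    /* virtual = physical + offset; with identity mapping the offset
     * is 0 and both helpers collapse to casts.
     */
    static inline uintptr_t phys_addr(void *virt)
    {
            return (uintptr_t)virt - Z_X86_VIRT_OFFSET;
    }

    static inline void *virt_addr(uintptr_t phys)
    {
            return (void *)(phys + Z_X86_VIRT_OFFSET);
    }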

ASM language was also updated for 32-bit. Comments were left in
64-bit, as long mode semantics don't allow use of Z_X86_PHYS_ADDR
macro; this can be revisited later.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-12-15 14:16:51 -05:00
Andrei Emeltchenko 34573803a8 arch: x86_64: Rename _exception_stack to z_x86_exception_stack
Rename stack name according to MISRA-C standard.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
2020-12-10 17:06:17 +02:00
Andrei Emeltchenko a4dbb51e74 arch: x86_64: Rename _nmi_stack to z_x86_nmi_stack
Rename stack name according to MISRA-C standard.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
2020-12-10 17:06:17 +02:00
Andrei Emeltchenko ddc5139a6e arch: x86_64: Trivial correction
Correct register name in comment.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
2020-12-10 17:06:17 +02:00
Andrei Emeltchenko e7d3dd1362 arch: x86_64: Using right exception stack with KPTI
With kernel page table isolation (KPTI) we cannot use the right
exception stack, since after using the trampoline stack there was
always a switch to the 7th IST stack (__x86_tss64_t_ist7_OFFSET).
Make the stack configurable as a parameter in EXCEPT(nr, ist) and
EXCEPT_CODE(nr, ist). For the NMI we use ist6 (_nmi_stack).

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
2020-12-10 17:06:17 +02:00
Andrei Emeltchenko 85db42883f arch/x86: Use NMI stack for NMIs
NMI can be triggered at any time, even when in the process of
switching stacks. Use a dedicated stack for it.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
2020-12-10 17:06:17 +02:00
Andrei Emeltchenko 8db06aee69 arch/x86: Add NMI registration API
Add simple NMI registration API.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
2020-12-10 17:06:17 +02:00
Tomasz Bursztyka abf65b5e65 arch/x86: Generalize dynamic irq connection on given vector
This will be used by MSI multi-vector implementation to connect the irq
and the vector prior to allocation.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
2020-12-08 09:29:20 -05:00
Tomasz Bursztyka bd57e4cf12 arch/x86: Generalize the vector allocator
This will be used by MSI/MSI-X when multi-vector is requested.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
2020-12-08 09:29:20 -05:00
Krzysztof Chruscinski 3ed8083dc1 kernel: Cleanup logger setup in kernel files
Most kernel files were declaring the os log module without
providing a log level. Because of that, the default log level was
used instead of CONFIG_KERNEL_LOG_LEVEL.
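
The fix follows the standard logging pattern, roughly:

    #include <logging/log.h>

    /* Declare the module with an explicit level so
     * CONFIG_KERNEL_LOG_LEVEL takes effect instead of the default.
     */
    LOG_MODULE_DECLARE(os, CONFIG_KERNEL_LOG_LEVEL);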

Signed-off-by: Krzysztof Chruscinski <krzysztof.chruscinski@nordicsemi.no>
2020-11-27 09:56:34 -05:00
Daniel Leung 11e6b43090 tracing: roll thread switch in/out into thread stats functions
Since tracing of threads being switched in/out uses the same
instrumentation points, we can roll the tracing function calls into
the thread stats gathering functions. This avoids duplicating code
to call another function.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-11-11 23:55:49 -05:00
Andrew Boie b8242bff64 x86: move page tables to the memory domain level
- z_x86_userspace_enter() for both 32-bit and 64-bit now
  call into C code to clear the stack buffer and set the
  US bits in the page tables for the memory range.

- Page tables are now associated with memory domains,
  instead of having separate page tables per thread.
  A spinlock protects write access to these page tables,
  and read/write access to the list of active page
  tables.

- arch_mem_domain_init() implemented, allocating and
  copying page tables from the boot page tables.

- struct arch_mem_domain defined for x86. It has
  a page table link and also a list node for iterating
  over them; see the sketch below.
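
A sketch of the shape described above (field types/names assumed):

    struct arch_mem_domain {
            pentry_t *ptables;  /* page table root for this domain */
            sys_snode_t node;   /* entry in the list of active page tables */
    };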

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-11-05 09:33:40 -05:00
Andrew Boie d31f62a955 x86: smp: add TLB shootdown logic
This will be needed when we support memory un-mapping, or
the same user mode page tables on multiple CPUs. Neither
is implemented yet.
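
Conceptually, the local half of a shootdown is just a CR3 reload on
each CPU reached by the IPI (sketch, ignoring global pages):

    /* Reloading CR3 invalidates this CPU's non-global TLB entries;
     * the shootdown IPIs the other CPUs so each one does the same.
     */
    static inline void tlb_flush_local(void)
    {
            __asm__ volatile("movq %%cr3, %%rax\n\t"
                             "movq %%rax, %%cr3"
                             ::: "rax", "memory");
    }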

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-11-05 09:33:40 -05:00
Andrew Boie edc5e31d6b x86_64: fix RBX clobber in nested IRQ case
In the code path for nested interrupts, we are not saving
RBX, yet the assembly code is using it as a storage location
for the ISR.

Use RAX. It is backed up in both the nested and non-nested
cases, and the ASM code is not currently using it at that
point.

Fixes: #29594

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-10-28 10:29:32 -07:00
Daniel Leung 53ac1ee6fa x86_64: add support for thread local storage
Adds the necessary bits to initialize TLS in the stack
area and sets up CPU registers during context switch.
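
A sketch of the context-switch side, assuming the usual x86_64
convention of addressing TLS through FS (the MSR number is
architectural; helper and field names are illustrative):

    #define X86_FS_BASE 0xC0000100

    /* Point FS at the incoming thread's TLS area. */
    z_x86_msr_write(X86_FS_BASE, incoming->tls);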

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-10-24 10:52:00 -07:00
Tomasz Bursztyka 2b6c574499 arch/x86: If ACPI is enabled, get the CPU APIC ID from it
The hardcoded APIC ID will be kept as default if the CPU is not found in
ACPI MADT.

Note that ACPI may expose more "CPUs" than there actually are
physically. Thus, make the logic aware of this possibility by checking
the enabled flas. (Non-enabled CPU are ignored).

This fixes up_squared board made of Celeron CPU.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
2020-10-01 11:16:40 -07:00
Andrew Boie a696f8ba27 x86: rename CONFIG_EXCEPTION_STACKS_SIZE
This is an x86-specific definition.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-09-28 13:38:41 -07:00
Andrew Boie 61d42cf42b x86-64: fix thread tracing
The current instrumentation point for CONFIG_TRACING added in
PR #28512 had two problems:

- If userspace and KPTI are enabled, the tracing point is simply
  never run if we are resuming a user thread as the
  z_x86_trampoline_to_user function is jumped to and calls
  'iret' from there

- Only %rdi is being saved. However, at that location, *all*
  caller-saved registers are in use as they contain the
  resumed thread's context

Simplest solution is to move this up near where we update page
tables. The #ifdefs are used to make sure we don't push/pop
%rdi more than once. At that point in the code only %rdi
is in use among the volatile registers.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-09-22 20:47:48 -04:00
Anas Nashif 75d159bf0d tracing: x86_64: move switched_in to switch function
Tracing switched-in threads in C code does not work; it needs to happen
in the arch_switch code. See also Xtensa and ARC.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-09-20 21:27:55 -04:00
Kumar Gala 283a057279 x86_64: Fix memory access size for locore EOI
Newer QEMU (5.1) hangs or times out on a number of tests on x86_64.  In
debugging the issue this is related to a fix in QEMU 5.1 that
validates memory region access.  QEMU has the APIC region only allowing
1 to 4 byte access.  64-bit access is treated as an error.

Change the APIC EOI access in locore.S back to just doing a 32-bit
access.
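
In C terms the fix amounts to the following (LOAPIC_BASE assumed;
the actual change is a mov width in locore.S):

    /* EOI is a 32-bit register at offset 0xB0; 1-4 byte access is
     * all QEMU 5.1 permits, so use a 32-bit store.
     */
    volatile uint32_t *eoi = (volatile uint32_t *)(LOAPIC_BASE + 0xB0);

    *eoi = 0U;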

Fixes #28453

Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
2020-09-18 13:29:00 -05:00
Daniel Leung 43b2e291db x86_64: fix size to init stack at boot
The boot code of x86_64 initializes the stack (if enabled)
with a hard-coded size for the ISR stack. However,
the stack being used does not have to be the ISR stack,
and can be any defined stack. So pass in the actual size
of the stack so it can be initialized properly.

Fixes #21843

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-09-17 21:05:45 -04:00
Anas Nashif 6e27478c3d benchmarking: remove execution benchmarking code
This code had one purpose only: feeding timing information into a
test; it was not used by anything else. The custom trace points
unfortunately were not accurate, and this test was delivering
information that conflicted with other tests we have, due to the
placement of such trace points in the architecture and kernel code.

For such measurements we are planning to use the tracing functionality
in a special mode that would be used for metrics without polluting the
architecture and kernel code with additional tracing and timing code.

Furthermore, much of the assembly code used had issues.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-09-05 13:28:38 -05:00
Andrew Boie 7d32e9f9a5 mmu: support only identity RAM mapping
We no longer plan to support a split address space with
the kernel in high memory and per-process address spaces.
Because of this, we can simplify some things. System RAM
is now always identity mapped at boot.

We no longer require any virtual-to-physical translation
for page tables, and can remove the dual-mapping logic
from the page table generation script since we won't need
to transition the instruction pointer off of physical
addresses.

CONFIG_KERNEL_VM_BASE and CONFIG_KERNEL_VM_LIMIT
have been removed. The kernel's address space always
starts at CONFIG_SRAM_BASE_ADDRESS, with a fixed size
specified by CONFIG_KERNEL_VM_SIZE.

Driver MMIOs and other uses of k_mem_map() are still
virtually mapped, and the later introduction of demand
paging will result in only a subset of system RAM being
a fixed identity mapping instead of all of it.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-09-03 14:24:38 -04:00
Tomasz Bursztyka 93cd336204 arch: Apply dynamic IRQ API change
Switching to a constant parameter.

Fixes #27399

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
2020-09-02 13:48:13 +02:00
Tomasz Bursztyka 7def6eeaee arch: Apply IRQ offload API change
Switching to a constant parameter.

Fixes #27399

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
2020-09-02 13:48:13 +02:00
Andrew Boie 38e17b68e3 x86: paging code rewrite
The x86 paging code has been rewritten to support another paging mode
and non-identity virtual mappings.

 - Paging code now uses an array of paging level characteristics and
   walks tables using for loops, as opposed to having different
   functions for every paging level and lots of #ifdefs. The code is
   now more concise, and adding new paging modes should be trivial;
   see the sketch after this list.

 - We now support 32-bit, PAE, and IA-32e page tables.

 - The page tables created by gen_mmu.py are now installed at early
   boot. There are no longer separate "flat" page tables. These tables
   are mutable at any time.

 - The x86_mmu code now has a private header. Many definitions that did
   not need to be in public scope have been moved out of mmustructs.h
   and either placed in the C file or in the private header.

 - Improvements to dumping page table information, with the physical
   mapping and flags all shown

 - arch_mem_map() implemented

 - x86 userspace/memory domain code ported to use the new
   infrastructure.

 - add logic for physical -> virtual instruction pointer transition,
   including cleaning up identity mappings after this takes place.
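
A sketch of the table-driven walk from the first bullet (counts and
shifts shown for IA-32e; names assumed):

    struct paging_level {
            unsigned int entries; /* entries per table at this level */
            unsigned int shift;   /* address bits consumed below it */
    };

    static const struct paging_level ia32e_levels[] = {
            { 512, 39 }, /* PML4 */
            { 512, 30 }, /* PDPT */
            { 512, 21 }, /* page directory */
            { 512, 12 }, /* page table */
    };

    /* walk: index = (addr >> level->shift) % level->entries */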

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-08-25 15:49:59 -04:00
Andrew Boie ddb63c404f x86_64: fix sending locore EOI
The address was being truncated because we were using
32-bit registers. CONFIG_MMU is always enabled on 64-bit,
remove the #ifdef.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-08-25 15:49:59 -04:00
Daniel Leung 8fbb14ef50 coredump: add support for x86 and x86_64
This adds the necessary bits to enable coredump for x86
and x86_64.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-08-24 20:28:24 -04:00
Andrew Boie 8b4b0d6264 kernel: z_interrupt_stacks are now kernel stacks
This will save memory on many platforms that enable
user mode.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-07-30 21:11:14 -04:00
Andrew Boie 8ce260d8df kernel: introduce supervisor-only stacks
These stacks are appropriate for threads that run purely in
supervisor mode, and also as stacks for interrupt and exception
handling.

Two new arch defines are introduced:

- ARCH_KERNEL_STACK_GUARD_SIZE
- ARCH_KERNEL_STACK_OBJ_ALIGN

New public declaration macros:

- K_KERNEL_STACK_RESERVED
- K_KERNEL_STACK_EXTERN
- K_KERNEL_STACK_DEFINE
- K_KERNEL_STACK_ARRAY_DEFINE
- K_KERNEL_STACK_MEMBER
- K_KERNEL_STACK_SIZEOF

If user mode is not enabled, K_KERNEL_STACK_* and K_THREAD_STACK_*
are equivalent.

Separately generated privilege elevation stacks are now declared
like kernel stacks, removing the need for K_PRIVILEGE_STACK_ALIGN.
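
Usage sketch with the new declarations (entry function, size and
priority are placeholders):

    K_KERNEL_STACK_DEFINE(worker_stack, 1024);
    static struct k_thread worker;

    k_thread_create(&worker, worker_stack,
                    K_KERNEL_STACK_SIZEOF(worker_stack),
                    worker_entry, NULL, NULL, NULL,
                    K_PRIO_PREEMPT(1), 0, K_NO_WAIT);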

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-07-30 21:11:14 -04:00
Andrew Boie b0c155f3ca kernel: overhaul stack specification
The core kernel computes the initial stack pointer
for a thread, properly aligning it and subtracting out
any random offsets or thread-local storage areas.
arch_new_thread() no longer needs to make any calculations,
an initial stack frame may be placed at the bounds of
the new 'stack_ptr' parameter passed in. This parameter
replaces 'stack_size'.

thread->stack_info is now set before arch_new_thread()
is invoked, z_new_thread_init() has been removed.
The values populated may need to be adjusted on arches
which carve-out MPU guard space from the actual stack
buffer.

thread->stack_info now has a new member 'delta' which
indicates any offset applied for TLS or random offset.
It's used so the calculations don't need to be repeated
if the thread later drops to user mode.

CONFIG_INIT_STACKS logic is now performed inside
z_setup_new_thread(), before arch_new_thread() is called.

thread->stack_info is now defined as the canonical
user-accessible area within the stack object, including
random offsets and TLS. It will never include any
carved-out memory for MPU guards and must be updated at
runtime if guards are removed.

Available stack space is now optimized. Some arches may
need to significantly round up the buffer size to account
for page-level granularity or MPU power-of-two requirements.
This space is now accounted for and used by virtue of
the Z_THREAD_STACK_SIZE_ADJUST() call in z_setup_new_thread.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-07-30 21:11:14 -04:00
Andrew Boie 24825c8667 arches: fix arch_new_thread param names
MISRA-C wants the parameter names in a function implementation
to match the names used by the header prototype.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-07-30 21:11:14 -04:00
Andrew Boie 62eb7d99dc arch_interface: remove unnecessary params
arch_new_thread() passes along the thread priority and option
flags, but these are already initialized in thread->base and
can be accessed there if needed.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-07-30 21:11:14 -04:00
Andrew Boie ee3c50ba6d x86: apic: use device MMIO APIs
A hack was required for the loapic code due to the address
range not being in DTS. A bug was filed.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-07-17 11:38:18 +02:00
Andy Ross c7d76cbe58 arch/x86: Add a spurious interrupt handler to x86_64
Right now x86_64 doesn't install handlers for vectors that aren't
populated by Zephyr code.  Add a tiny spurious interrupt handler that
logs the error and triggers a fatal error, like other platforms do.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-07-08 12:34:09 +02:00
Andy Ross 36b8db0129 arch/x86: Map 512G by default when booting x86_64
The default page table (the architecturally required one used for
entrance to long mode, before the OS page tables get assembled) was
mapping the first 4G of memory.

Extend this to 512G by fully populating the second level page table.
We have devices now (up_squared) which have real RAM mapped above 4G.
There's really no good reason not to do this; the page is always
present anyway.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-07-08 12:34:09 +02:00
Andy Ross 4d5e67ed13 arch/x86: Unbreak SMP startup on x86_64
A last minute "cleanup" to the EFI startup path (on a system where I
had SMP disabled) moved the load of the x86_cpuboot[0] entry into RBP
into the main startup code, which is wrong because on auxiliary CPUs
that's already set up by the 16/32 bit entry code to point to the
OTHER entries.

Put it back where it belongs.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-07-07 12:59:33 -04:00
Andy Ross 928d31125f arch/x86: Add zefi, an EFI stub/packer/wrapper/loader
This is a first cut on a tool that will convert a built Zephyr ELF
file into an EFI application suitable for launching directly from the
firmware of a UEFI-capable device, without the need for an external
bootloader.

It works by including the Zephyr sections into the EFI binary as
blobs, then copying them into place on startup.

Currently, it is not integrated in the build.  Right now you have to
build an image for your target (up_squared has been tested) and then
pass the resulting zephyr.elf file as an argument to the
arch/x86/zefi/zefi.py script.  It will produce a "zephyr.efi" file in
the current directory.

This involved a little surgery in x86_64 to copy over some setup that
was previously being done in 32 bit mode to a new EFI entry point.
There is no support for 32 bit UEFI targets for toolchain reasons.

See the README for more details.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-07-02 09:10:01 -04:00
Kumar Gala a1b77fd589 zephyr: replace zephyr integer types with C99 types
    git grep -l 'u\(8\|16\|32\|64\)_t' | \
        xargs sed -i "s/u\(8\|16\|32\|64\)_t/uint\1_t/g"
    git grep -l 's\(8\|16\|32\|64\)_t' | \
        xargs sed -i "s/s\(8\|16\|32\|64\)_t/int\1_t/g"

Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
2020-06-08 08:23:57 -05:00
Daniel Leung 251cb61e20 x86_64: instrument code for timing information
On x86_64, the arch_timing_* variables are not set, which
results in incorrect values being used in the timing_info
benchmarks. So instrument the code for those values.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-05-20 22:36:04 +02:00
Zide Chen d27f6cb5eb interrupt_controller: program local APIC LDR register for xAPIC
If IO APIC is in logical destination mode, local APICs compare their
logical APIC ID defined in LDR (Logical Destination Register) with
the destination code sent with the interrupt to determine whether or not
to accept the incoming interrupt.

This patch programs LDR in xAPIC mode to support IO APIC logical mode.

The local APIC ID from the local APIC ID register can't be used as
the 'logical APIC ID', because LAPIC IDs may not be consecutive
numbers, which makes it impossible for LDR to encode 8 IDs within
8 bits.

This patch chooses 0 for the BSP, and for APs, cpu_number, which is
the index into x86_cpuboot[] and is ultimately assigned in
z_smp_init().
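
One plausible encoding, sketched (LDR bits 31:24 hold the logical
ID in xAPIC mode; helper name assumed):

    /* Give each CPU its own bit so logical mode can address up to
     * 8 CPUs: logical ID n -> bit n.
     */
    x86_write_loapic(LOAPIC_LDR, BIT(cpu_number) << 24);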

Signed-off-by: Zide Chen <zide.chen@intel.com>
2020-05-08 22:32:39 -04:00
Andrew Boie 0091a700d3 x86_64: fix crash on nested interrupts
x86_64 supports 4 levels of interrupt nesting, with
the interrupt stack divided up into sub-stacks for
each nesting level.

Unfortunately, the initial interrupt stack pointer
on the first CPU was not taking into account reserved
space for guard areas, causing a stack overflow exception
when attempting to use the last interrupt nesting level,
as that page had been set up as a stack guard.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-05-01 11:44:05 -07:00
Andrew Boie 1f6f977f05 kernel: centralize new thread priority check
This was being done inconsistently in arch_new_thread(), just
move to the core kernel.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-04-21 18:45:45 -04:00
Andrew Boie c0df99cc77 kernel: reduce scope of z_new_thread_init()
The core kernel z_setup_new_thread() calls into arch_new_thread(),
which calls back into the core kernel via z_new_thread_init().

Move everything that doesn't have to be in z_new_thread_init() to
z_setup_new_thread() and convert to an inline function.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-04-21 18:45:45 -04:00
Andrew Boie 80a0d9d16b kernel: interrupt/idle stacks/threads as array
The set of interrupt stacks is now expressed as an array. We
also define the idle threads and their associated stacks this
way. This allows for iteration in cases where we have multiple
CPUs.

There is now a centralized declaration in kernel_internal.h.

On uniprocessor systems, z_interrupt_stacks has one element
and can be used in the same way as _interrupt_stack.

The IRQ stack for CPU 0 is now set in init.c instead of in
arch code.

The extern definition of the main thread stack is now removed,
this doesn't need to be in a header.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-03-16 23:17:36 +02:00
Andrew Boie 9062a5ee91 revert: "program local APIC LDR register for..."
This reverts commit 87b65c5ac2.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-02-19 14:40:19 -08:00
Zide Chen 87b65c5ac2 interrupt_controller: program local APIC LDR register for xAPIC
If IO APIC is in logical destination mode, local APICs compare their
logical APIC ID defined in LDR (Logical Destination Register) with
the destination code sent with the interrupt to determine whether or not
to accept the incoming interrupt.

This patch programs LDR in xAPIC mode to support IO APIC logical mode.

The local APIC ID from the local APIC ID register can't be used as
the 'logical APIC ID', because LAPIC IDs may not be consecutive
numbers, which makes it impossible for LDR to encode 8 IDs within
8 bits.

This patch chooses 0 for the BSP, and for APs, cpu_number, which is
the index into x86_cpuboot[] and is ultimately assigned in
z_smp_init().

Signed-off-by: Zide Chen <zide.chen@intel.com>
2020-02-19 10:25:10 -08:00
Andrew Boie 768a30c14f x86: organize 64-bit ESF
The callee-saved registers have been separated out and will not
be saved/restored if exception debugging is shut off.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-02-08 08:51:43 -05:00
Andy Ross 0e32f4dab0 arch/x86_64: Save RFLAGS during arch_switch()
The context switch implementation forgot to save the current flag
state of the old thread, so on resume the flags would be restored to
whatever value they had at the last interrupt preemption or thread
initialization.  In practice this guaranteed that the interrupt enable
bit would always be wrong, because obviously new threads and preempted
ones have interrupts enabled, while arch_switch() is always called
with them masked.  This opened up a race between exit from
arch_switch() and the final exit path in z_swap().

The other state bits weren't relevant -- the oddball ones aren't used
by Zephyr, and as arch_switch() on this architecture is a function
call the compiler would have spilled the (caller-save) comparison
result flags anyway.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-08 08:51:04 -05:00
Andy Ross 5b85d6da6a arch/x86_64: Poison instruction pointer of running threads
There was a bug where double-dispatch of a single thread on multiple
SMP CPUs was possible.  This can be mind-bending to diagnose, so when
CONFIG_ASSERT is enabled add an extra instruction to __resume (the
shared code path for both interrupt return and context switch) that
poisons the shared RIP of the now-running thread with a recognizable
invalid value.

Now attempts to run the thread again will crash instantly with a
discoverable cookie in their instruction pointer, and this will remain
true until it gets a new RIP at the next interrupt or switch.

This is under CONFIG_ASSERT because it meets the same design goals of
"a cheap test for impossible situations", not because it's part of the
assertion framework.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-02-03 09:31:56 -05:00
Andy Ross 86430d8d46 kernel: arch: Clarify output switch handle requirements in arch_switch
The original intent was that the output handle be written through the
pointer in the second argument, though not all architectures used that
scheme.  As it turns out, that write is becoming a synchronization
signal, so it's no longer optional.

Clarify the documentation in arch_switch() about this requirement, and
add an instruction to the x86_64 context switch to implement it as
originally envisioned.
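
The contract, sketched in pseudo-C (the real implementation is
assembly; names illustrative):

    void arch_switch(void *switch_to, void **switched_from)
    {
            /* ... save the outgoing thread's registers ... */

            /* Mandatory: publish the outgoing thread's handle.  The
             * scheduler treats this write as the signal that the old
             * context is fully saved and may be resumed elsewhere.
             */
            *switched_from = outgoing_handle;

            /* ... restore the incoming context from switch_to ... */
    }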

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2020-01-21 14:47:52 -08:00
Andrew Boie e34f1cee06 x86: implement kernel page table isolation
Implement a set of per-cpu trampoline stacks which all
interrupts and exceptions will initially land on, and also
as an intermediate stack for privilege changes as we need
some stack space to swap page tables.

Set up the special trampoline page which contains all the
trampoline stacks, TSS, and GDT. This page needs to be
present in the user page tables or interrupts don't work.

CPU exceptions, with KPTI turned on, are treated as interrupts
and not traps so that we have IRQs locked on exception entry.

Add some additional macros for defining IDT entries.

Add special handling of locore text/rodata sections when
creating user mode page tables on x86-64.

Restore qemu_x86_64 to use KPTI, and remove restrictions on
enabling user mode on x86-64.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-17 16:17:39 -05:00
Andrew Boie 2690c9e550 x86: move some per-cpu initialization to C
No reason we need to stay in assembly domain once we have
GS and a stack set up.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie a594ca7c8f kernel: cleanup and formally define CPU start fn
The "key" parameter is legacy, remove it.

Add a typedef for the expected function pointer type.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie 4fcf28ef25 x86: mitigate swapgs Spectre V1 attacks
See CVE-2019-1125. We mitigate this by adding an 'lfence'
upon interrupt/exception entry after the decision has been
made whether it's necessary to invoke 'swapgs' or not.

Only applies to x86_64, 32-bit doesn't use swapgs.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie 3d80208025 x86: implement user mode on 64-bit
- In early boot, enable the syscall instruction and set up
  necessary MSRs
- Add a hook to update page tables on context switch
- Properly initialize thread based on whether it will
  start in user or supervisor mode
- Add landing function for system calls to execute the
  desired handler
- Implement arch_user_string_nlen()
- Implement logic for dropping a thread down to user mode
- Reserve per-CPU storage space for user and privilege
  elevation stack pointers, necessary for handling syscalls
  when no free registers are available
- Proper handling of gs register considerations when
  transitioning privilege levels

Kernel page table isolation (KPTI) is not yet implemented.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie 077b587447 x86: implement hw-based oops for both variants
We use a fixed value of 32, as the way interrupts/exceptions
are set up in x86_64's locore.S does not lend itself to
Kconfig configuration of the vector to use.

HW-based kernel oops is now permanently on, there's no reason
to make it optional that I can see.

Default vectors for IPI and irq offload adjusted to not
collide.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie ded0185eb8 x86: add GDT descriptors for user mode
These are arranged in the particular order required
by the syscall/sysret instructions.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie 692fda47fc x86: use MSRs for %gs
We don't need to set up GDT data descriptors for setting
%gs. Instead, we use the x86 MSRs to set GS_BASE and
KERNEL_GS_BASE.

We don't currently allow user mode to set %gs on its own,
but later on if we do, we have everything set up to issue
'swapgs' instructions on syscall or IRQ.
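
Sketch of the mechanism (MSR numbers are architectural; helper and
pointer names are illustrative):

    #define X86_GS_BASE        0xC0000101
    #define X86_KERNEL_GS_BASE 0xC0000102

    /* GS base points at per-CPU data; 'swapgs' exchanges the two
     * MSRs on privilege transitions.
     */
    z_x86_msr_write(X86_GS_BASE, (uintptr_t)percpu_data);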

Unused entries in the GDT have been removed.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie 10d033ebf0 x86: enable recoverable exceptions on 64-bit
These were previously assumed to always be fatal.
We can't have the faulting thread's XMM registers
clobbered, so put the SIMD/FPU state onto the stack
as well. This is fairly large (512 bytes) and the
exception stack is already uncomfortably small, so
increase to 2K.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-01-13 16:35:10 -05:00
Andrew Boie 2b67ca8ac9 x86: improve exception debugging
We now dump more information for less common cases,
and this is now centralized code for 32-bit/64-bit.
All of this code is now correctly wrapped around
CONFIG_EXCEPTION_DEBUG. Some cruft and unused defines
removed.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-12-17 11:39:22 -08:00
Andrew Boie 4f77c2ad53 kernel: rename z_arch_ to arch_
Promote the private z_arch_* namespace, which specifies
the interface between the core kernel and the
architecture code, to a new top-level namespace named
arch_*.

This allows our documentation generation to create
online documentation for this set of interfaces,
and this set of interfaces is worth treating in a
more formal way anyway.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-11-07 15:21:46 -08:00
Andrew Boie 0b27cacabd x86: up-level page fault handling
Now in common code for 32-bit and 64-bit.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-11-07 11:19:38 -05:00
Andrew Boie 7d1ae023f8 x86: enable stack overflow detection on 64-bit
CONFIG_HW_STACK_PROTECTION is now available.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-11-06 17:50:34 -08:00
Andrew Boie 9fb89ccf32 x86: consolidate some fatal error code
Some code for unwinding stacks and z_x86_fatal_error()
now in a common C file, suitable for both modes.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-11-06 17:50:34 -08:00
Andrew Boie e1f2292895 x86: intel64: set page tables for every CPU
The page tables to use are now stored in the cpuboot struct.
For the first CPU, we set to the flat page tables, and then
update later in z_x86_prep_c() once the runtime tables have
been generated.

For other CPUs, by the time we get to z_arch_start_cpu()
the runtime tables are ready to go, and so we just install
them directly.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-24 12:48:45 -07:00
Andrew Boie 46540f4bb1 intel64: don't set global flag in ptables
May cause issues when runtime page tables are installed.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-24 12:48:45 -07:00
Andrew Boie cdd721db3b locore: organize data by type
Program text, rodata, and data need different MMU
permissions. Split out rodata and data from the program
text, updating the linker script appropriately.

Region size symbols added to the linker script, so these
can later be used with MMU_BOOT_REGION().

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-24 12:48:45 -07:00
Andrew Boie 979b17f243 kernel: activate arch interface headers
Duplicate definitions elsewhere have been removed.

A couple functions which are defined by the arch interface
to be non-inline, but were implemented inline by native_posix
and intel64, have been moved to non-inline.

Some missing conditional compilation for z_arch_irq_offload()
has been fixed, as this is an optional feature.

Some massaging of native_posix headers to get everything
in the right scope.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-21 10:13:38 -07:00
Andy Ross 8bc3b6f673 arch/x86/intel64: Fix assumption with dummy threads
The intel64 switch implementation doesn't actually use a switch handle
per se, just the raw thread struct pointers which get stored into the
handle field.  This works fine for normally initialized threads, but
when switching out of a dummy thread at initialization, nothing has
initialized that field and the code was dumping registers into the
bottom of memory through the resulting NULL pointer.

Fix this by skipping the load of the field value and just using an
offset instead to get the struct address, which is actually slightly
faster anyway (a SUB immediate instruction vs. the load).
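
In C terms the trick is equivalent to (names assumed):

    /* Recover the thread from the *address* of its switch_handle
     * field, instead of loading the (possibly uninitialized) value
     * stored in that field.
     */
    struct k_thread *thread =
            CONTAINER_OF(handle_field, struct k_thread, switch_handle);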

Actually for extra credit we could even move the switch_handle field
to the top of the thread struct and eliminate the instruction
entirely, though if we did that it's probably worth adding some
conditional code to make the switch_handle field disappear entirely.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2019-10-19 12:09:32 -07:00
Andrew Boie 9c76d28bee x86: intel64: fatal: minor formatting improvements
Line up everything nicely, add leading '0x' to hex
addresses, and remove redundant newlines. Add
whitespace between the register name and contents
so the contents can be easily selected from a terminal.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-15 09:00:49 -07:00
Andrew Boie fdd3ba896a x86: intel64: set the WP bit for paging
Otherwise, supervisor mode can write to read-only areas,
failing tests that check this.
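
Conceptually (the real change is in the startup assembly):

    /* Set CR0.WP (bit 16) so supervisor-mode writes respect
     * read-only page permissions.
     */
    __asm__ volatile("movq %%cr0, %%rax\n\t"
                     "orq  $0x10000, %%rax\n\t"
                     "movq %%rax, %%cr0"
                     ::: "rax");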

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-15 09:00:49 -07:00
Andrew Boie e340f8d22e x86: intel64: enable no-execute
Set the NXE bit in the EFER MSR so that the NX bit can
be set in page tables. Otherwise, the NX bit is treated
as reserved and leads to a fault if set.
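
Sketch, assuming z_x86_msr_read()/z_x86_msr_write()-style helpers
(the MSR number and bit position are architectural):

    #define X86_EFER_MSR 0xC0000080
    #define EFER_NXE     (1ULL << 11)

    z_x86_msr_write(X86_EFER_MSR,
                    z_x86_msr_read(X86_EFER_MSR) | EFER_NXE);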

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-10 13:41:02 -07:00
Andrew Boie 7f715bd113 intel64: add inline definition of z_arch_switch()
This just calls the assembly code. Needed to be
inline per the architecture API.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-09 09:14:18 -04:00
Charles E. Youse f7cfb4303b arch/x86: do not assume MP means SMP
It's possible to have multiple processors configured without using the
SMP scheduler, so don't make definitions dependent on CONFIG_SMP.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse e6a31a9e89 arch/x86: (Intel64) initialize TSS interrupt stack from cpuboot[]
In non-SMP MP situations, the interrupt stacks might not exist, so
do not assume they do. Instead, initialize the TSS IST1 from the
cpuboot[] vector (meaning, on APs, the stack from z_arch_start_cpu).
Eliminates redundancy at the same time.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 643661cb44 arch/x86: declare z_x86_prep_c() in kernel_arch_func.h
And remove the ad hoc prototype in cpu.c for Intel64.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 5abab591c2 arch/x86: (Intel64) make z_arch_start_cpu() synchronous
Don't leave z_arch_start_cpu() until the target CPU has been started.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 3b145c0d4b arch/x86: (Intel64) do not lock interrupts around irq_offload()
This is the Wrong Thing(tm) with SMP enabled. Previously this
worked because interrupts would be re-enabled in the interrupt
entry sequence, but this is no longer the case.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 66510db98c arch/x86: (Intel64) add scheduler IPI support
Add z_arch_sched_ipi() and such to enable scheduler IPIs when SMP.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 74e3717af6 arch/x86: (Intel64) fix conditional assembly in locore.S
The conditional assembly was ignoring the rest of the expression,
though the effect was harmless (including unreachable code in some
builds).

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 3eb1a8b59a arch/x86: (Intel64) implement SMP support
Add duplicate per-CPU data structures (x86_cpuboot, tss, stacks, etc.)
for up to 4 total CPUs, add code in locore and z_arch_start_cpu().

The test board, qemu_x86_long, now defaults to 2 CPUs.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 2808908816 arch/x86: alter signature of z_x86_prep_c() function
Take a dummy first argument, so that the BSP entry point (z_x86_prep_c)
has the same signature as the AP entry point (smp_init_top).

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse f9eaee35b8 arch/x86: (Intel64) use per-CPU parameter struct for CPU startup
A new 'struct x86_cpuboot' is created as well as an instance called
'x86_cpuboot[]' which contains per-CPU boot data (initial stack,
entry function/arg, selectors, etc.). The locore now consults this
table to set up per-CPU registers, etc. during early boot.

Also, rename tss.c to cpu.c as its scope is growing.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse edf5761c83 arch/x86: (Intel64) rename kernel segment constants
There's no need to qualify the 64-bit CS/DS selectors, and the GS and
TR selectors are renamed CPU0_GS and CPU0_TR as they are CPU-specific.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 90bf0da332 arch/x86: (Intel64) optimize and re-order startup assembly sequence
In some places the code was being overly pedantic; e.g., there is no
need to load our own 32-bit descriptors because the loader's are fine
for our purposes. We can defer loading our own segments until 64-bit.

The sequence is re-ordered to facilitate code sharing between the BSP
and APs when SMP is enabled (all BSP-specific operations occur before
the per-CPU initialization).

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 17e135bc41 arch/x86: (Intel64) clear BSS before entering long mode
This is really just to facilitate sharing CPU bootstrap code between
the BSP and the APs, moving the clear operation out of the way.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse a981f51fe6 arch/x86: drivers/loapic_intr.c: move local APIC initialization
In the general case, the local APIC can't be treated as a normal device
with a single boot-time initialization - on SMP systems, each CPU must
initialize its own. Hence the initialization proper is separated from
the device-driver initialization, and said initialization is called
from the early startup-assembly code when appropriate.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 418e5c1b38 arch/x86: factor out common assembly startup code
The 32-bit and 64-bit assembly startup sequences share quite a
bit of common code, so it's factored out into one file to avoid
repeating ourselves (and potentially falling out of sync).

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse 8279af76c8 arch/x86: elevate prep_c.c to common code
Elevate the previously 32-bit-only z_x86_prep_c() function to common
code, so both 32-bit and 64-bit arches now enter the kernel this way.
Minor changes to prep_c.c to make it build with the SMP scheduler on.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Charles E. Youse cc9be2e982 arch/x86: (Intel64) start up on _interrupt_stack, not _exception_stack
Simply for consistency with other platforms.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-10-07 19:46:55 -04:00
Andrew Boie 89d4c6928e kernel: add arch abstraction for irq_offload()
This makes it clearer that this is an API that is expected
to be implemented at the architecture level.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-10-01 11:11:42 +02:00
Andrew Boie 61901ccb4c kernel: rename z_new_thread()
This is part of the core kernel -> architecture interface
and should have a leading prefix z_arch_.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2019-09-30 15:25:55 -04:00
Charles E. Youse 200056df2f arch/x86: rename CONFIG_X86_MULTIBOOT and related to CONFIG_MULTIBOOT
Simple naming change, since MULTIBOOT is clear enough by itself and
"namespacing" it to X86 is unnecessary and/or inappropriate.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-09-29 12:30:34 -07:00
Charles E. Youse 66a2ed2360 arch/x86: (Intel64) move RAX to volatile register set
This used to be part of the "restore always" set of registers because
__swap was expected to return a value.  No longer required, so RAX is
moved to the volatile registers and we save a few cycles occasionally.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-09-23 17:50:09 -07:00
Charles E. Youse 074ce889fb arch/x86: (Intel64) migrate from __swap to z_arch_switch()
The latter primitive is required for SMP.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-09-23 17:50:09 -07:00
Charles E. Youse 32fc239aa2 arch/x86: (Intel64) use TSS for per-CPU variables
A space is allocated in the TSS for per-CPU variables. At present,
this is only a 'struct _cpu *' to find the _kernel CPU struct. The
locore routines are rewritten to find _current and _nested via this
pointer rather than referencing the _kernel global directly.

This is obviously in preparation for SMP support.

Signed-off-by: Charles E. Youse <charles.youse@intel.com>
2019-09-23 17:50:09 -07:00