The boot code of x86_64 initializes the stack (if enabled)
with a hard-coded size for the ISR stack. However,
the stack being used does not have to be the ISR stack,
and can be any defined stacks. So pass in the actual size
of the stack so the stack can be initialized properly.
Fixes#21843
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Changes to paging code ensured that the NULL virtual page is
never mapped. Since RAM is identity mapped, on a PC-like
system accessing the BIOS Data Area in the first 4K requires
a memory mapping. We need to read this to probe the ACPI RSDP.
Additionally check that the BDA has something in it as well
and not a bunch of zeroes.
It is unclear whether this function is truly safe on UEFI
systems, but that is for another day.
Fixes: #27867
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
When probing for PCI-E device resources, it is possible that
configuration via MMIO is not available. This may caused by
BIOS or its settings. So when CONFIG_PCIE_MMIO_CFG=y, have
a fallback path to config devices via PIO. The inability to
config via MMIO has been observed on a couple UP Squared
boards.
Fixes#27339
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This code had one purpose only, feed timing information into a test and
was not used by anything else. The custom trace points unfortunatly were
not accurate and this test was delivering informatin that conflicted
with other tests we have due to placement of such trace points in the
architecture and kernel code.
For such measurements we are planning to use the tracing functionality
in a special mode that would be used for metrics without polluting the
architecture and kernel code with additional tracing and timing code.
Furthermore, much of the assembly code used had issues.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
We no longer plan to support a split address space with
the kernel in high memory and per-process address spaces.
Because of this, we can simplify some things. System RAM
is now always identity mapped at boot.
We no longer require any virtual-to-physical translation
for page tables, and can remove the dual-mapping logic
from the page table generation script since we won't need
to transition the instruction point off of physical
addresses.
CONFIG_KERNEL_VM_BASE and CONFIG_KERNEL_VM_LIMIT
have been removed. The kernel's address space always
starts at CONFIG_SRAM_BASE_ADDRESS, of a fixed size
specified by CONFIG_KERNEL_VM_SIZE.
Driver MMIOs and other uses of k_mem_map() are still
virtually mapped, and the later introduction of demand
paging will result in only a subset of system RAM being
a fixed identity mapping instead of all of it.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
In order to be possible to debug usermode threads need to be able
issue breakpoint and debug exceptions. To do this it is necessary to
set DPL bits to, at least, the same CPL level.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
It implements gdb remote protocol to talk with a host gdb during the
debug session. The implementation is divided in three layers:
1 - The top layer that is responsible for the gdb remote protocol.
2 - An architecture specific layer responsible to write/read registers,
set breakpoints, handle exceptions, ...
3 - A transport layer to be used to communicate with the host
The communication with GDB in the host is synchronous and the systems
stops execution waiting for instructions and return its execution after
a "continue" or "step" command. The protocol has an exception that is
when the host sends a packet to cause an interruption, usually triggered
by a Ctrl-C. This implementation ignores this instruction though.
This initial work supports only X86 using uart as backend.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Now that device_api attribute is unmodified at runtime, as well as all
the other attributes, it is possible to switch all device driver
instance to be constant.
A coccinelle rule is used for this:
@r_const_dev_1
disable optional_qualifier
@
@@
-struct device *
+const struct device *
@r_const_dev_2
disable optional_qualifier
@
@@
-struct device * const
+const struct device *
Fixes#27399
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
The x86 paging code has been rewritten to support another paging mode
and non-identity virtual mappings.
- Paging code now uses an array of paging level characteristics and
walks tables using for loops. This is opposed to having different
functions for every paging level and lots of #ifdefs. The code is
now more concise and adding new paging modes should be trivial.
- We now support 32-bit, PAE, and IA-32e page tables.
- The page tables created by gen_mmu.py are now installed at early
boot. There are no longer separate "flat" page tables. These tables
are mutable at any time.
- The x86_mmu code now has a private header. Many definitions that did
not need to be in public scope have been moved out of mmustructs.h
and either placed in the C file or in the private header.
- Improvements to dumping page table information, with the physical
mapping and flags all shown
- arch_mem_map() implemented
- x86 userspace/memory domain code ported to use the new
infrastructure.
- add logic for physical -> virtual instruction pointer transition,
including cleaning up identity mappings after this takes place.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The address was being truncated because we were using
32-bit registers. CONFIG_MMU is always enabled on 64-bit,
remove the #ifdef.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Move tracing switched_in and switched_out to the architecture code and
remove duplications. This changes swap tracing for x86, xtensa.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
This set of functions seem to be there just because of historical
reasons, stemming from Kbuild. They are non-obvious and prone to errors,
so remove them in favor of the `_ifdef()` ones with an explicit
`CONFIG_` condition.
Script used:
git grep -l _if_kconfig | xargs sed -E -i
"s/_if_kconfig\(\s*(\w*)/_ifdef(CONFIG_\U\1\E \1/g"
Signed-off-by: Carles Cufi <carles.cufi@nordicsemi.no>
These stacks are appropriate for threads that run purely in
supervisor mode, and also as stacks for interrupt and exception
handling.
Two new arch defines are introduced:
- ARCH_KERNEL_STACK_GUARD_SIZE
- ARCH_KERNEL_STACK_OBJ_ALIGN
New public declaration macros:
- K_KERNEL_STACK_RESERVED
- K_KERNEL_STACK_EXTERN
- K_KERNEL_STACK_DEFINE
- K_KERNEL_STACK_ARRAY_DEFINE
- K_KERNEL_STACK_MEMBER
- K_KERNEL_STACK_SIZEOF
If user mode is not enabled, K_KERNEL_STACK_* and K_THREAD_STACK_*
are equivalent.
Separately generated privilege elevation stacks are now declared
like kernel stacks, removing the need for K_PRIVILEGE_STACK_ALIGN.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This now takes a stack pointer as an argument with TLS
and random offsets accounted for properly.
Based on #24467 authored by Flavio Ceolin.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The core kernel computes the initial stack pointer
for a thread, properly aligning it and subtracting out
any random offsets or thread-local storage areas.
arch_new_thread() no longer needs to make any calculations,
an initial stack frame may be placed at the bounds of
the new 'stack_ptr' parameter passed in. This parameter
replaces 'stack_size'.
thread->stack_info is now set before arch_new_thread()
is invoked, z_new_thread_init() has been removed.
The values populated may need to be adjusted on arches
which carve-out MPU guard space from the actual stack
buffer.
thread->stack_info now has a new member 'delta' which
indicates any offset applied for TLS or random offset.
It's used so the calculations don't need to be repeated
if the thread later drops to user mode.
CONFIG_INIT_STACKS logic is now performed inside
z_setup_new_thread(), before arch_new_thread() is called.
thread->stack_info is now defined as the canonical
user-accessible area within the stack object, including
random offsets and TLS. It will never include any
carved-out memory for MPU guards and must be updated at
runtime if guards are removed.
Available stack space is now optimized. Some arches may
need to significantly round up the buffer size to account
for page-level granularity or MPU power-of-two requirements.
This space is now accounted for and used by virtue of
the Z_THREAD_STACK_SIZE_ADJUST() call in z_setup_new_thread.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
MISRA-C wants the parameter names in a function implementaion
to match the names used by the header prototype.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
arch_new_thread() passes along the thread priority and option
flags, but these are already initialized in thread->base and
can be accessed there if needed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
MISRA-C directive 4.10 requires that files being included must
prevent itself from being included more than once. So add
include guards to the offset files, even though they are C
source files.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The hardware stack overflow feature requires
CONFIG_THREAD_STACK_INFO enabled in order to distingush
stack overflows from other causes when we get an exception.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
A hack was required for the loapic code due to the address
range not being in DTS. A bug was filed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This driver code uses PCIe and doesn't use Zephyr's
device model, so we can't use the nice DEVICE_MMIO macros.
Set stuff up manually instead using device_map().
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This currently only supports identity paging; there's just
enough here for device_map() calls to work.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Adding just the cache flush function for x86. The name
arch_cache_flush comply with API names in include/cache.h
Signed-off-by: Aastha Grover <aastha.grover@intel.com>
The page table initialization needs a populated PCI MMIO
configuration, and that is lazy-evaluated. We aren't guaranteed that
a driver already hit that path, so be sure to call it explicitly.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Right now x86_64 doesn't install handlers for vectors that aren't
populated by Zephyr code. Add a tiny spurious interrupt handler that
logs the error and triggers a fatal error, like other platforms do.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This patch is almost entirely aesthetics, designed to isolate the
variant configurations to a simple macro API (just IN/OUT), reduce
complexity derived from code pasted out of the larger ns16550 driver,
and keep the complexity out of the (very simple!) core code. Useful
when hacking on the driver in contexts where it isn't working yet.
The sole behavioral change here is that I've removed the runtime
printk hook installation in favor of defining an
arch_printk_char_out() function which overrides the weak-linked
default (that is, we don't need to install a hook, we can be the
default hook at startup).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Various cleanups to the x86 early serial driver, mostly with the goal
of simplifying its deployment during board bringup (which is really
the only reason it exists in the first place):
+ Configure it =y by default. While there are surely constrained
environments that will want to disable it, this is a TINY driver,
and it serves a very important role for niche tasks. It should be
built always to make sure it works everywhere.
+ Decouple from devicetree as much as possible. This code HAS to work
during board bringup, often with configurations cribbed from other
machines, before proper configuration gets written. Experimentally,
devicetree errors tend to be easy to make, and without a working
console impossible to diagnose. Specify the device via integer
constants in soc.h (in the case of IOPORT access, we already had
such a symbol) so that the path from what the developer intends to
what the code executes is as short and obvious as possible.
Unfortunately I'm not allowed to remove devicetree entirely here,
but at least a developer adding a new platform will be able to
override it in an obvious way instead of banging blindly on the
other side of a DTS compiler.
+ Don't try to probe the PCI device by ID to "verify". While this
sounds like a good idea, in practice it's just an extra thing to get
wrong. If we bail on our early console because someone (yes, that's
me) got the bus/device/function right but typoed the VID/DID
numbers, we're doing no one any favors.
+ Remove the word-sized-I/O feature. This is a x86 driver for a PCI
device. No known PC hardware requires that UART register access be
done in dword units (in fact doing so would be a violation of the
PCI specifciation as I understand it). It looks to have been cut
and pasted from the ns16550 driver, remove.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The default page table (the architecturally required one used for
entrance to long mode, before the OS page tables get assembled) was
mapping the first 4G of memory.
Extend this to 512G by fully populating the second level page table.
We have devices now (up_squared) which have real RAM mapped above 4G.
There's really no good reason not to do this, the page is present
always anyway.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
A last minute "cleanup" to the EFI startup path (on a system where I
had SMP disabled) moved the load of the x86_cpuboot[0] entry into RBP
into the main startup code, which is wrong because on auxiliary CPUs
that's already set up by the 16/32 bit entry code to point to the
OTHER entries.
Put it back where it belongs.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is a first cut on a tool that will convert a built Zephyr ELF
file into an EFI applciation suitable for launching directly from the
firmware of a UEFI-capable device, without the need for an external
bootloader.
It works by including the Zephyr sections into the EFI binary as
blobs, then copying them into place on startup.
Currently, it is not integrated in the build. Right now you have to
build an image for your target (up_squared has been tested) and then
pass the resulting zephyr.elf file as an argument to the
arch/x86/zefi/zefi.py script. It will produce a "zephyr.efi" file in
the current directory.
This involved a little surgery in x86_64 to copy over some setup that
was previously being done in 32 bit mode to a new EFI entry point.
There is no support for 32 bit UEFI targets for toolchain reasons.
See the README for more details.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The traditional IO Port configuration mechanism was technically
deprecated about 15 years ago when PCI Express started shipping.
While frankly the MMIO support is significantly more complicated and
no more performant in practice, Zephyr should have support for current
standards. And (particularly complicated) devices do exist in the
wild whose extended capability pointers spill beyond the 256 byte area
allowed by the legacy mechanism. Zephyr will want drivers for those
some day.
Also, Windows and Linux use MMIO access, which means that's what
system vendors validate.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The existing minimal ACPI implementation was enough to find the MADT
table for dumping CPU info. Enhance it with a slightly less minimal
implementation that can fetch any table, supports the ACPI 2.0 XSDT
directory (technically required on 64 bit systems so tables can live
>4G) and provides definitions for the MCFG table with the PCI
configuration pointers.
Note that there is no use case right now for high performance table
searching, so the "init" step has been removed and tables are probed
independently from scratch for each one requested (there are only
two).
Note also that the memory to which these tables point is not
understood by the Zephyr MMU configuration, so in long mode all ACPI
calls have to be done very early, before z_x86_paging_init() (or on a
build with the MMU initialization disabled).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
If we get a page fault in early boot context, before
main thread is started, page faults were being
incorrectly reported as stack overflows.
z_x86_check_stack_bounds() needs to consider the
interrupt stack as the correct stack for this context.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This helps distingush between fatal errors if logging isn't
enabled.
As detailed in comments, pass a reason code which controls
the QEMU process' return value.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
x86_64's __resume path 'poisons' the incoming thread's
saved RIP value with a special 0xB9 value, to catch
re-use of thread objects across CPUs in SMP. Add a check
and printout for this when handling fatal errors, and
treat as a kernel panic.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
If KPTI is not enabled, the current value of CR3 is the correct
page tables when the exception happened in all cases.
If KPTI is enabled, and the excepting thread was in user mode,
then a page table switch happened and the current value of CR3
is not the page tables when the fault happened. Get it out of the
thread object instead.
Fixes two problems:
- Divergent exception loop if we crash when _current is a dummy
thread or its page table pointer stored in the thread object is
NULL or uninitialized
- Printing the wrong CR3 value on exceptions from user mode in
the register dump
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
In one of the ASSERT() statement, the PHYS_RAM_ADDR (alias
of DT_REG_ADDR()) may be interpreted by the compiler as
long long int when it's large than 0x7FFFFFFF, but is
paired with %x, resulting in compiler warning. Fix this
by type casting it to uintptr_t and use %lx instead.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>