This function should be pinned in memory instead of simply
putting it in the boot section, as this function will be
used when new threads are created at runtime.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
If generic section is not present at boot, the thread stack
may not be in physical memory. Unconditionally page in the stack
instead of relying on page fault to speed up a little bit
on starting the thread.
Also, this prevents a double fault during thread setup when
setting up stack permission in z_x86_userspace_enter().
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
If the BSS section is not present in physical memory at boot,
do not zero the section, or else page faults would occur.
The zeroing of BSS will be done once the paging mechanism
has been initialized.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
add .S file extension suffix into CMAKE_ASM_SOURCE_FILE_EXTENSIONS,
because clang from OneApi can't recongnize them as asm files on
windows, then they won't be added into build system.
Signed-off-by: Chen Peng1 <peng1.chen@intel.com>
Correct the wrong operand of clflush instruction. The old operand
points to a location inside stack and doesn't work. The new one
works well by taking linux kernel code as reference.
End address instead of size should get round up
Add Kconfig option to disable the usage of mfence intruction for
SoC that has clfulsh but no mfence supported.
Signed-off-by: Dong Wang <dong.d.wang@intel.com>
The code depends on the order of evaluation 'z_x86_check_stack_bounds'
function arguments.
The solution is to assign these values to variables and then pass
them in.
The fix would be to make 2 local variables, assign them the values
of _df_esf.esp and .cs, and then call the function with those 2 local
variables as arguments.
Found as a coding guideline violation (MISRA R13.2) by static
coding scanning tool.
Change "int reason" to "unsigned reason" like in other functions.
Signed-off-by: Maksim Masalski <maksim.masalski@intel.com>
According to the Zephyr Coding Guideline all switch statements
shall be well-formed.
Add a comment to the empty default case.
Add a LOG_ERR to the default case.
Found as a coding guideline violation (MISRA R16.1) by static
coding scanning tool.
Signed-off-by: Maksim Masalski <maksim.masalski@intel.com>
Instead of accessing ACPI tables through physical address, do
memory mapping/unmapping so they can be accessed via virtual
addresses. This allows us to avoid identity mapping all
physical memory, and thus no need for a page table large enough
to map everything.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This limits the search for Extended BIOS Data Area (EBDA) to
0x80000 to 0x100000 as this is usually the area for it.
If 0000:040e has an address not pointing to this area, it is
probably an invalid address, and should not be de-referenced
to avoid segfault.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Most arch's CMakeLists.txt contain rules to add compiler and linker
flags for coverage if CONFIG_COVERAGE is enabled, but 4 of them were
missing this.
Instead, set the coverage flags in arch/common/CMakeLists.txt which
affects all archs.
Signed-off-by: Jeremy Bettis <jbettis@chromium.org>
Essential type of RHS operand (64 bit) is wider than essential
type of composite expression in LHS operand (32 bit).
LHS entry_val is 32 bit, and RHS (phys+offset) is 64 bit.
Cast RHS composite expression to the (pentry_t) type.
Found as a coding guideline violation (MISRA R10.7) by static
coding scanning tool.
Signed-off-by: Maksim Masalski <maksim.masalski@intel.com>
The 16 bit bootstrap code for SMP CPUs was using the 286-era "lmsw"
instruction (load machine status word) to set the protected bit in CR0
(which is the modern evolution of the same register), presumably
because this is 16 bit code and we can't move a dword into CR0.
But that's wrong, because the full instruction set *is* available in
real mode on a 386, you just have to use a operand size prefix to get
to it, which the assembler emits for you automatically when you use
the .code16 directive.
Write this conventionally and use modern (e.g. 1986-era) instructions.
It also has the advantage of not confusing much more modern
hypervisors like ACRN by issuing instructions they (and I!) never knew
existed.
Fixes#35076
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Because of a historical misunderstanding, by default the ACRN
hypervisor wants to load Zephyr at address 0x1000 and enter the binary
at that same address. This entry point corresponds to the __start
symbol of the build they were given, which is a 1-cpu non-SMP
configuration. Unfortunately, when we build with
CONFIG_MP_NUM_CPUS=1, the code in locore.S #if's out the 16 bit entry
point for the auxiliary CPUs at the start of the section. So in the
build ACRN received, the start address happened to be 0x7000, the same
address we need to launch the AP processors from.
That's right: under ACRN, the SAME ADDRESS used to enter the OS in 32
bit mode needs to be used later to boot CPUs running in 16 bit real
mode!
The solution, such as it is, is to put a 32 bit jump at the entry
address which hops to the 32 bit OS entry code, and then scribble NOP
instructions over that jump once we get there so that the next time we
reach that address (in real mode) we fall through to the correct
entry.
This patch should be considered a temporary workaround. While it
works on all x86 hardware, it's not really needed. A much better
solution would be to eliminate the locore linker region entirely
(which causes other headaches) and enter the Zephyr binary in a 32 bit
address somewhere in the contiguous high memory area. All that locore
is needed for is the 16 bit bootstrap code for SMP processors, which
is ~6 instructions and can be copied in from the kernel at runtime.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
From the point of checking the info pointer value all code in the
z_multiboot_init() function depends on it being non-NULL. Therefore,
simply return from the function if it's NULL.
Fixes#33084
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When loaded via EFI, we obviously don't have a multiboot info pointer
available (we might have an EFI system table, but zefi doesn't pass
that through yet). Don't try to parse the "whatever garbage was in
%rbp" as a multiboot table.
The configuration is a little clumsy, as strictly our EFI kconfig just
says we're "building for" EFI but not that we'll boot that way. And
tests like arch/x86/info are trying to set CONFIG_MULTIBOOT=n
unconditionally, when it really should be something they detect from
devicetree or wherever.
Fixes#33545
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This marks code and data within x86/ia32 so they are going to
reside in boot and pinned regions. This is a step to enable
demand paging for whole kernel.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds both boot and pinned sections to the linker
script for ia32. This is required for enabling demand
paging for kernel and data.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There is exactly one function being defined with TEXT_START
macro so the x86-32 __start can appear at the beginning of
text section. Since no one else is using it, better remove
TEXT_START to simplify things.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This implements arch_page_phys_get() to translate mapped
virtual addresses back to physical addresses.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
In z_mem_manage_init(), z_free_page_count is only manipulated
after all reserved pages are marked, and will reflect
the actual number of page frames being added to the free page
frame list. Manipulating z_free_page_count before this is
going to mess up the accounting, so remove the code to
decrement z_free_page_count in arch_reserved_pages_update()
under x86.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There was a restriction that KERNEL_VM_OFFSET must equal to
SRAM_OFFSET so that page directory pointer (PDP) or page
directory (PD) can be reused. This is not very practical in
real world due to various hardware designs, especially those
where SRAM is not aligned to PDP or PD. So rework those bits.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Remove the config BOOT_TIME_MEASUREMENT and corresponding #ifdef'd code
throughout (kernel/init.c, idle.c, core/common.S , reset.S, ... ) which
hold the extern hooks for z_timestamp_main and z_timestamp_idle in the
removed boot_time test suite.
Signed-off-by: Jennifer Williams <jennifer.m.williams@intel.com>
Rephrasing away from ain't, which is informal, uncommon, and can
be viewed as substandard or 'slang'.
Signed-off-by: Jennifer Williams <jennifer.m.williams@intel.com>
Reboot functionality has nothing to do with PM, so move it out to the
subsys/os folder.
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
Both operands of an operator in which the usual arithmetic
conversions are performed shall have the same essential
type category.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Tests of a value against zero should be made explicit, unless the
operand is effectively Boolean. This is based on MISRA rule 14.4.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
This changes the assert when a large page is encountered to
copying the page directory entry to the new page directory.
This is needed when a large page entry is generated by
gen_mmu.py. Note that this still asserts when there are entries
of large page at higher level.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The whole page table is pre-allocated at build time and is
dependent on the range of address space. This kconfig allows
reserving extra pages (of size CONFIG_MMU_PAGE_SIZE) to
the page table so that gen_mmu.py can make use of these
extra pages.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This patch replaces ENOSYS into ENOTSUP to keep consistency with
the return value specification of k_float_enable().
Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
This patch introduce new API to enable FPU of thread. This is pair of
existed k_float_disable() API. And also add empty arch_float_enable()
into each architectures that have arch_float_disable(). The arc and
riscv already implemented arch_float_enable() so I do not touch
these implementations.
Motivation: Current Zephyr implementation does not allow to use FPU
on main and other system threads like as work queue. Users need to
create an other thread with K_FP_REGS for floating point programs.
Users can use FPU more easily if they can enable FPU on running
threads.
Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
The only user of arch_mem_domain_destroy was the deprecated
k_mem_domain_destroy function which has now been removed. So remove
arch_mem_domain_destroy as well.
Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
After page table is load, we should be executing in virtual
address space. Therefore we need to set ESP to the virtual
address of interrupt stack for the boot process.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This reverts commit d40e8ede8e.
This fixes triple faults after wiping the identity mapping of
physical memory when running entering userspace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This reverts commit 7d32e9f9a5.
We now allow the kernel to be linked virtually. This patch:
- Properly converts between virtual/physical addresses
- Handles early boot instruction pointer transition
- Double-maps SRAM to both virtual and physical locations
in boot page tables to facilitate instruction pointer
transition, with logic to clean this up after completed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This reuses the page directory pointer table (PAE=y) or page
directory (PAE=n) to point to next level page directory table
(PAE=y) or page tables (PAE=n) to identity map the physical
memory. This gets rid of the extra memory needed to host
the extra mappings which are only used at boot. Following
patches will have code to actual unmap physical memory
during the boot process, so this avoids some wasting of
memory.
Since no extra memory needs to be reserved, this also reverts
commit ee3d345c09
("x86: mmu: reserve more space for page table if linking in virt").
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There is no need to use this kconfig, as the phys-to-virt
offset is enough to figure out if the kernel is linked in
virtual address space in gen_mmu.py.
For code, use Z_VM_KERNEL instead.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
With the introduction of Z_MEM_*_ADDR for physical<->virtual
address translation, there is no need to have x86 specific
versions.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
For some unknown reason, the pagetable address for _df_tss.cr3
did not get translated from virtual to physical. However,
the translation is done if the pointer to pagetable is obtained
through reference to the first array element (instead of simply
through the name of array). Without CR3 pointing to the page
table via physical address, double fault does not work. So
fixing this by being explicit with the page table pointer.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When adding a new thread to memory domain, there is a NULL check
to figure out if a thread is being migrated to another memory
domain. However, the NULL check is AFTER physical-to-virtual
address translation which means (NULL + offset) != NULL anymore.
This results in calling reset_region() with an invalid page table
pointer. Fix this by doing the NULL check before address
translation.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When linking in virtual address space, we still need physical
addresses in SRAM to be mapped so platform can boot from physical
memory and to access structure necessary for boot (e.g. GDT and
IDT). So we need to enlarge the reserved space for page table
to accommodate this.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
We have been having the assumption that the physical memory
is identity-mapped to virtual address space. However, with
the ability to set CONFIG_KERNEL_VM_BASE separately from
CONFIG_SRAM_BASE_ADDRESS, the assumption is no longer valid.
This changes the boot code in x86 32-bit, so that once
the page table is loaded, we can proceed with executing in
the virtual address space. So do a long jump to virtual
address just before calling z_x86_prep_c. From this point on,
code execution is in virtual address space.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds virtual address translation to a few variables
used in crt0.S. This is needed as they are linked at
virtual addresses but before page table is loaded,
they are not available at virtual addresses and must be
referred via physical addresses.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When feeding &z_shared_kernel_page_start directly to
Z_X86_PHYS_ADDR(), the compiler would complain array subscript
out of bound if linking in virtual address space. So cast it
into uintptr_t first before translation.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Add a newer, much smaller and simpler implementation of abort and
join. No need to involve the idle thread. No need for a special code
path for self-abort. Joining a thread and waiting for an aborting one
to terminate elsewhere share an implementation. All work in both
calls happens under a single locked path with no unexpected
synchronization points.
This fixes a bug with the current implementation where the action of
z_sched_single_abort() was nonatomic, releasing the lock internally at
a point where the thread to be aborted could self-abort and confuse
the state such that it failed to abort at all.
Note that the arm32 and native_posix architectures, which have their
own thread abort implementations, now see a much simplified
"z_thread_abort()" internal API.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Instead of doing these in assembly, use the common z_bss_zero()
and z_data_copy() C functions instead. This simplifies code
a bit and we won't miss any additions to these two functions
(if any) under x86 in the future (as x86_64 was actually not
clearing gcov bss area).
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This moves calling z_loapic_enable() from crt0.S into
z_x86_prep_c(). This is done so we can move BSS clearing
and data section copying inside z_x86_prep_c() as
these are needed before calling z_loapic_enable().
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds a new kconfig to enable the use of memory map.
This map can be populated automatically if
CONFIG_MULTIBOOT_MEMMAP=y or can be manually defined
via x86_memmap[].
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is an hidden option to indicate we are building for
PC-compatible devices (where there are BIOS, ACPI, etc.
which are standard on such devices).
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Before accessing the multiboot data passed by the bootloader,
we need to map the memory first. This adds the code to map
the memory if necessary.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
We assume that all x86 CPUs do have clflush instructions.
And the cache line size is now provided through DTS.
So detecting clflush instruction as well as the cache line size is no
longer required at runtime and thus removed.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
This adds X86 keyword to the kconfigs to indicate these are
for x86. The old options are still there marked as
deprecated.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
With x86, there are usually memory regions that are reserved
for firmware and device MMIOs. We don't want to use these
pages for memory mapping so mark them as reserved at boot.
The weakly defined x86_memmap contains the list of memory
regions which can be overriden by SoC or board configurations.
Also, with CONFIG_MULTIBOOT_MEMMAP=y, the memory regions
are populated from multiboot provided data.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The x86_64 SysV ABI requires 16 byte alignment for the stack pointer
during execution of normal code. That means that on entry to an
ABI-compatible C function (which is reached via a CALL instruction
that pushes the return address) the RSP register must be MISaligned by
exactly 8 bytes. The kernel mode thread setup got this right, but we
missed the equivalent condition in userspace entry.
The end result was a misaligned stack, which is surprisingly robust
for most use. But recent toolchains have starting doing some more
elaborate vectorization, and the resulting SSE instructions started
failing in userspace on the misaliged loads.
Note that there's a comment about optimization: we're doing the stack
alignment in the "wrong place" and are needlessly wasting bytes in
some cases. We should see the raw stack boundaries where we are
setting up RSP values. Add a FIXME to this effect, but don't touch
anything as this patch is a targeted bugfix.
Also fix a somewhat embarassing 32-bit-ism that would have truncated
the address of a userspace stack that we tried to put above 4G.
Fixes#31018
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
On Intel processors, if GS is not zero and is being set to
zero, GS_BASE is also being set to zero. This would interfere
with the actual use of GS_BASE for usespace. To avoid accidentally
clearing GS_BASE, simply set GS to 0 at boot, so any subsequent
clearing of GS will not clear GS_BASE.
The clearing of GS_BASE was discovered while trying to figure out
why the mem_protect test would hang within 10-20 repeated runs.
GDB revealed that both GS and GS_BASE was set to zero when the tests
hanged. After setting GS to zero at boot, the mem_protect tests
were running repeated for 5,000+ times without hanging.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
All arch_ APIs and macros are implemented, and the page fault
handling code will call into the core kernel.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Pre-allocation of paging structures is now required, such that
no allocations are ever needed when mapping memory.
Instantiation of new memory domains may still require allocations
unless a common page table is used.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We no longer use a page pool to draw memory pages when doing
memory map operations. We now preallocate the entire virtual
address space so no allocations are ever necessary when mapping
memory.
We still need memory to clone page tables, but this is now
expressed by a new Kconfig X86_MAX_ADDITIONAL_MEM_DOMAINS
which has much clearer semantics than specifying the number
of pages in the pool.
The default address space size is now 8MB, but this can be
tuned by the application.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
A more comprehensive solution would use E820 enumeration, but we
are unlikely to ever care that much, as we intend to use demand
paging on microcontrollers and not PC-like hardware. This is
really to just prevent QEMU from crashing.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This removes the z_ prefix those (functions, enums, etc.) that
are being used outside the coredump subsys. This aligns better
with the naming convention.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
All arch_ APIs and macros are implemented, and the page fault
handling code will call into the core kernel.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Pre-allocation of paging structures is now required, such that
no allocations are ever needed when mapping memory.
Instantiation of new memory domains may still require allocations
unless a common page table is used.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We no longer use a page pool to draw memory pages when doing
memory map operations. We now preallocate the entire virtual
address space so no allocations are ever necessary when mapping
memory.
We still need memory to clone page tables, but this is now
expressed by a new Kconfig X86_MAX_ADDITIONAL_MEM_DOMAINS
which has much clearer semantics than specifying the number
of pages in the pool.
The default address space size is now 8MB, but this can be
tuned by the application.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
A more comprehensive solution would use E820 enumeration, but we
are unlikely to ever care that much, as we intend to use demand
paging on microcontrollers and not PC-like hardware. This is
really to just prevent QEMU from crashing.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This adds the correct compiler and linker flags to
support software floating point operations. The flags
need to be added to TOOLCHAIN_*_FLAGS for GCC to find
the correct library (when calling GCC with
--print-libgcc-file-name).
Note that software floating point needs to be turned
on for Newlib. This is due to Newlib having floating
point numbers in its various printf() functions which
results in floating point instructions being emitted
from toolchain. These instructions are placed very
early in the functions which results in them being
executed even though the format string contains
no floating point conversions. Without using CONFIG_FPU
to enable hardware floating point support, any calls to
printf() like functions will result in exceptions
complaining FPU is not available. Although forcing
CONFIG_FPU=y with newlib is an option, and because
the OS doesn't know which threads would call these
printf() functions, Zephyr has to assume all threads
are using FPU and thus incurring performance penalty as
every context switching now needs to save FPU registers.
A compromise here is to use soft float instead. Newlib
with soft float enabled does not have floating point
instructions and yet can still support its printf()
like functions.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The new APIs are not only dealing with cache flushing. Rename the
Kconfig symbol to CACHE_MANAGEMENT to better reflect this change.
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
The kconfig options to configure the cache flushing framework are
currently living in the arch-specific kconfigs of ARC and X86 (32-bit)
architectures even though these are defining the same things.
Move the common symbols in one place accessible by all the architectures
and create a menu for those.
Leave the default values in the arch-specific locations.
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Until now, any attempts to call printk prior to early serial init has
caused page faults due to the device not being mapped yet. Add static
variable to track the pre-init status, and instead of page faulting
just suppress the characters and log a warning right after init to
give an indication that output characters have been lost.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The original idea of using a custom switch to main thread
function is to make sure the buffer to save floating point
registers are aligned correctly or else exception would be
raised when saving/restoring those registers. Since
the struct of the buffer is defined with alignment hint
to toolchain, the alignment will be enforced by toolchain
as long as the k_thread struct variable is a dedicated,
declared variable. So there is no need for the custom
switch to main thread function anymore.
This also allows the stack usage calculation of
the interrupt stack to function properly as the end of
the interrupt stack is not being used for the dummy
thread anymore.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Convert device to DEVICE_DEFINE instead of DEVICE_AND_API_INIT
so we can deprecate DEVICE_AND_API_INIT in the future.
Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
Fix the following complilation error that happens when specifying a
fixed MMIO address for the UART through X86_SOC_EARLY_SERIAL_MMIO8_ADDR:
arch/x86/core/early_serial.c:30:26: error: #if with no expression
30 | #if DEVICE_MMIO_IS_IN_RAM
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Renamed to make its semantics clearer; this function maps
*physical* memory addresses and is not equivalent to
posix mmap(), which might confuse people.
mem_map test case remains the same name as other memory
mapping scenarios will be added in the fullness of time.
Parameter names to z_phys_map adjusted slightly to be more
consistent with names used in other memory mapping functions.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The page table implementation requires conversion between virtual
and physical addresses when creating and walking page tables. Add
a phys_addr() and virt_addr() functions instead of hard-casting
these values, plus a macro for doing the same in ASM code.
Currently, all pages are identity mapped so VIRT_OFFSET = 0, but
this will now still work if they are not the same.
ASM language was also updated for 32-bit. Comments were left in
64-bit, as long mode semantics don't allow use of Z_X86_PHYS_ADDR
macro; this can be revisited later.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Fix compiler warnings associated with 'level' and 'entry' variables
'may be used uninitialized in this function'
Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
We now show:
- Data pages that are paged out in red
- Pages that are mapped but non-present due to KPTI,
respectively in cyan or blue if they are identity mapped
or not.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
With kernel page table isolation (KPTI) we cannot use right exception
stack since after using trampoline stack there was always switch to
7th IST stack (__x86_tss64_t_ist7_OFFSET). Make this configurable as a
parameter in EXCEPT(nr, ist) and EXCEPT_CODE(nr, ist). For the NMI we
would use ist6 (_nmi_stack).
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
NMI can be triggered at any time, even when in the process of
switching stacks. Use special stack for it.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
range_map() now doesn't implicitly hold x86_mmu_lock, allowing
callers to use it if the lock is already held.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
- Remove SYS_ prefix
- shorten POWER_MANAGEMENT to just PM
- DEVICE_POWER_MANAGEMENT -> PM_DEVICE
and use PM_ as the prefix for all PM related Kconfigs
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Provide the necessary adjustments to get MSI-X working (with or without
Intel VT-D).
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
This is part of Intel VT-D and how to discover capabilities, base
addresses and so on in order to start taking advantage from it.
There is a lot to get from there, but currently we are interested only
by getting the remapping hardware base address. And more specifically
for interrupt remapping usage.
There might be more than one of such hardware so the exposed function is
made to retrieve all of them.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
This will be used by MSI multi-vector implementation to connect the irq
and the vector prior to allocation.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
This is important for when we will need to atomically
un-map a page and get its dirty state before the un-mapping
completed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Most of kernel files where declaring os module without providing
log level. Because of that default log level was used instead of
CONFIG_KERNEL_LOG_LEVEL.
Signed-off-by: Krzysztof Chruscinski <krzysztof.chruscinski@nordicsemi.no>
currently pcie_get_mbar only returns the physical address.
This changes the function to return the size of the mbar and
the flags (IO Bar vs MEM BAR).
Signed-off-by: Maximilian Bachmann <m.bachmann@acontis.com>
Since the tracing of thread being switched in/out has the same
instrumentation points, we can roll the tracing function calls
into the one for thread stats gathering functions.
This avoids duplicating code to call another function.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
We provide an option for low-memory systems to use a single set
of page tables for all threads. This is only supported if
KPTI and SMP are disabled. This configuration saves a considerable
amount of RAM, especially if multiple memory domains are used,
at a cost of context switching overhead.
Some caching techniques are used to reduce the amount of context
switch updates; the page tables aren't updated if switching to
a supervisor thread, and the page table configuration of the last
user thread switched in is cached.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This will do until we can set up a proper page pool using
all unused ram for paging structures, heaps, and anonymous
mappings.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We don't need this for stacks any more and only use this
for pre-calculating the boot page tables size. Move to C
code, this doesn't need to be in headers anywhere.
Names adjusted for conciseness.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
- z_x86_userspace_enter() for both 32-bit and 64-bit now
call into C code to clear the stack buffer and set the
US bits in the page tables for the memory range.
- Page tables are now associated with memory domains,
instead of having separate page tables per thread.
A spinlock protects write access to these page tables,
and read/write access to the list of active page
tables.
- arch_mem_domain_init() implemented, allocating and
copying page tables from the boot page tables.
- struct arch_mem_domain defined for x86. It has
a page table link and also a list node for iterating
over them.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Page table management for x86 is being revised such that there
will not in many cases be a pristine, master set of page tables.
Instead, when mapping memory, use unused PTE bits to store the
original RW, US, and XD settings when the mapping was made.
This will allow memory domains to alter page tables while still
being able to restore the original mapping permissions.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This will be needed when we support memory un-mapping, or
the same user mode page tables on multiple CPUs. Neither
are implemented yet.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
In the code path for nested interrupts, we are not saving
RBX, yet the assembly code is using it as a storage location
for the ISR.
Use RAX. It is backed up in both the nested and non-nested
cases, and the ASM code is not currently using it at that
point.
Fixes: #29594
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Adds the necessary bits to initialize TLS in the stack
area and sets up CPU registers during context switch.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Adds the necessary bits to initialize TLS in the stack
area and sets up CPU registers during context switch.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This seems like a typo since all other places accessing bus_segs in
this context use i as the index.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
These days all threads are always a member of a memory domain,
remove this NULL check as it won't ever be false.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This function iterates over the thread's memory domain
and updates page tables based on it. We need to be holding
z_mem_domain_lock while this happens.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
fixes the following compilation errors
- sys_cache_line_size was undeclared at first use
- there was an assignment to an rvalue in arch_dcache_flush
Signed-off-by: Maximilian Bachmann <m.bachmann@acontis.com>
The hardcoded APIC ID will be kept as default if the CPU is not found in
ACPI MADT.
Note that ACPI may expose more "CPUs" than there actually are
physically. Thus, make the logic aware of this possibility by checking
the enabled flas. (Non-enabled CPU are ignored).
This fixes up_squared board made of Celeron CPU.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
No need to mix super short version of names with other structures
having full name. Let's follow a more relevant naming where each and
every attribute name is self-documenting then. (such as s/id/apic_id
etc...)
Also make CONFIG_ACPI usable through IS_ENABLED by enclosing exposed
functions with ifdef CONFIG_ACPI.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
We are not RAM-constrained and there is an open issue where
exception stack overflows are not caught. Increase this size
so that options like CONFIG_NO_OPTIMIZATIONS work without
incident.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Commit 5632ee26f3 introduced an issue where in order to use MMIO
configuration:
- do_pcie_mmio_cfg is required to be true
- Only set to true in pcie_mm_init()
- Which is only called from pcie_mm_conf()
- Which is only called from pcie_conf() if do_pcie_mmio_cfg is
already true!
The end result is that MMIO configuration will never be used.
Fix the situation by moving the initialization check to pcie_conf().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The current instrumentation point for CONFIG_TRACING added in
PR #28512 had two problems:
- If userspace and KPTI are enabled, the tracing point is simply
never run if we are resuming a user thread as the
z_x86_trampoline_to_user function is jumped to and calls
'iret' from there
- Only %rdi is being saved. However, at that location, *all*
caller-saved registers are in use as they contain the
resumed thread's context
Simplest solution is to move this up near where we update page
tables. The #ifdefs are used to make sure we don't push/pop
%rdi more than once. At that point in the code only %rdi
is in use among the volatile registers.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Tracing switched in threads in C code does not work, it needs to happen
in the arch_switch code. See also Xtensa and ARC.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Newer QEMU (5.1) hangs / timeouts on a number of tests on x86_64. In
debugging the issue this is related to a fix in QEMU 5.1 that
validates memory region access. QEMU has the APIC region only allowing
1 to 4 byte access. 64-bit access is treated as an error.
Change the APIC EOI access in locore.S back to just doing a 32-bit
access.
Fixes # 28453
Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
The boot code of x86_64 initializes the stack (if enabled)
with a hard-coded size for the ISR stack. However,
the stack being used does not have to be the ISR stack,
and can be any defined stacks. So pass in the actual size
of the stack so the stack can be initialized properly.
Fixes#21843
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Changes to paging code ensured that the NULL virtual page is
never mapped. Since RAM is identity mapped, on a PC-like
system accessing the BIOS Data Area in the first 4K requires
a memory mapping. We need to read this to probe the ACPI RSDP.
Additionally check that the BDA has something in it as well
and not a bunch of zeroes.
It is unclear whether this function is truly safe on UEFI
systems, but that is for another day.
Fixes: #27867
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
When probing for PCI-E device resources, it is possible that
configuration via MMIO is not available. This may caused by
BIOS or its settings. So when CONFIG_PCIE_MMIO_CFG=y, have
a fallback path to config devices via PIO. The inability to
config via MMIO has been observed on a couple UP Squared
boards.
Fixes#27339
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This code had one purpose only, feed timing information into a test and
was not used by anything else. The custom trace points unfortunatly were
not accurate and this test was delivering informatin that conflicted
with other tests we have due to placement of such trace points in the
architecture and kernel code.
For such measurements we are planning to use the tracing functionality
in a special mode that would be used for metrics without polluting the
architecture and kernel code with additional tracing and timing code.
Furthermore, much of the assembly code used had issues.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
We no longer plan to support a split address space with
the kernel in high memory and per-process address spaces.
Because of this, we can simplify some things. System RAM
is now always identity mapped at boot.
We no longer require any virtual-to-physical translation
for page tables, and can remove the dual-mapping logic
from the page table generation script since we won't need
to transition the instruction point off of physical
addresses.
CONFIG_KERNEL_VM_BASE and CONFIG_KERNEL_VM_LIMIT
have been removed. The kernel's address space always
starts at CONFIG_SRAM_BASE_ADDRESS, of a fixed size
specified by CONFIG_KERNEL_VM_SIZE.
Driver MMIOs and other uses of k_mem_map() are still
virtually mapped, and the later introduction of demand
paging will result in only a subset of system RAM being
a fixed identity mapping instead of all of it.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
In order to be possible to debug usermode threads need to be able
issue breakpoint and debug exceptions. To do this it is necessary to
set DPL bits to, at least, the same CPL level.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
It implements gdb remote protocol to talk with a host gdb during the
debug session. The implementation is divided in three layers:
1 - The top layer that is responsible for the gdb remote protocol.
2 - An architecture specific layer responsible to write/read registers,
set breakpoints, handle exceptions, ...
3 - A transport layer to be used to communicate with the host
The communication with GDB in the host is synchronous and the systems
stops execution waiting for instructions and return its execution after
a "continue" or "step" command. The protocol has an exception that is
when the host sends a packet to cause an interruption, usually triggered
by a Ctrl-C. This implementation ignores this instruction though.
This initial work supports only X86 using uart as backend.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Now that device_api attribute is unmodified at runtime, as well as all
the other attributes, it is possible to switch all device driver
instance to be constant.
A coccinelle rule is used for this:
@r_const_dev_1
disable optional_qualifier
@
@@
-struct device *
+const struct device *
@r_const_dev_2
disable optional_qualifier
@
@@
-struct device * const
+const struct device *
Fixes#27399
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
The x86 paging code has been rewritten to support another paging mode
and non-identity virtual mappings.
- Paging code now uses an array of paging level characteristics and
walks tables using for loops. This is opposed to having different
functions for every paging level and lots of #ifdefs. The code is
now more concise and adding new paging modes should be trivial.
- We now support 32-bit, PAE, and IA-32e page tables.
- The page tables created by gen_mmu.py are now installed at early
boot. There are no longer separate "flat" page tables. These tables
are mutable at any time.
- The x86_mmu code now has a private header. Many definitions that did
not need to be in public scope have been moved out of mmustructs.h
and either placed in the C file or in the private header.
- Improvements to dumping page table information, with the physical
mapping and flags all shown
- arch_mem_map() implemented
- x86 userspace/memory domain code ported to use the new
infrastructure.
- add logic for physical -> virtual instruction pointer transition,
including cleaning up identity mappings after this takes place.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The address was being truncated because we were using
32-bit registers. CONFIG_MMU is always enabled on 64-bit,
remove the #ifdef.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Move tracing switched_in and switched_out to the architecture code and
remove duplications. This changes swap tracing for x86, xtensa.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
This set of functions seem to be there just because of historical
reasons, stemming from Kbuild. They are non-obvious and prone to errors,
so remove them in favor of the `_ifdef()` ones with an explicit
`CONFIG_` condition.
Script used:
git grep -l _if_kconfig | xargs sed -E -i
"s/_if_kconfig\(\s*(\w*)/_ifdef(CONFIG_\U\1\E \1/g"
Signed-off-by: Carles Cufi <carles.cufi@nordicsemi.no>
These stacks are appropriate for threads that run purely in
supervisor mode, and also as stacks for interrupt and exception
handling.
Two new arch defines are introduced:
- ARCH_KERNEL_STACK_GUARD_SIZE
- ARCH_KERNEL_STACK_OBJ_ALIGN
New public declaration macros:
- K_KERNEL_STACK_RESERVED
- K_KERNEL_STACK_EXTERN
- K_KERNEL_STACK_DEFINE
- K_KERNEL_STACK_ARRAY_DEFINE
- K_KERNEL_STACK_MEMBER
- K_KERNEL_STACK_SIZEOF
If user mode is not enabled, K_KERNEL_STACK_* and K_THREAD_STACK_*
are equivalent.
Separately generated privilege elevation stacks are now declared
like kernel stacks, removing the need for K_PRIVILEGE_STACK_ALIGN.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This now takes a stack pointer as an argument with TLS
and random offsets accounted for properly.
Based on #24467 authored by Flavio Ceolin.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The core kernel computes the initial stack pointer
for a thread, properly aligning it and subtracting out
any random offsets or thread-local storage areas.
arch_new_thread() no longer needs to make any calculations,
an initial stack frame may be placed at the bounds of
the new 'stack_ptr' parameter passed in. This parameter
replaces 'stack_size'.
thread->stack_info is now set before arch_new_thread()
is invoked, z_new_thread_init() has been removed.
The values populated may need to be adjusted on arches
which carve-out MPU guard space from the actual stack
buffer.
thread->stack_info now has a new member 'delta' which
indicates any offset applied for TLS or random offset.
It's used so the calculations don't need to be repeated
if the thread later drops to user mode.
CONFIG_INIT_STACKS logic is now performed inside
z_setup_new_thread(), before arch_new_thread() is called.
thread->stack_info is now defined as the canonical
user-accessible area within the stack object, including
random offsets and TLS. It will never include any
carved-out memory for MPU guards and must be updated at
runtime if guards are removed.
Available stack space is now optimized. Some arches may
need to significantly round up the buffer size to account
for page-level granularity or MPU power-of-two requirements.
This space is now accounted for and used by virtue of
the Z_THREAD_STACK_SIZE_ADJUST() call in z_setup_new_thread.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
MISRA-C wants the parameter names in a function implementaion
to match the names used by the header prototype.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
arch_new_thread() passes along the thread priority and option
flags, but these are already initialized in thread->base and
can be accessed there if needed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
MISRA-C directive 4.10 requires that files being included must
prevent itself from being included more than once. So add
include guards to the offset files, even though they are C
source files.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The hardware stack overflow feature requires
CONFIG_THREAD_STACK_INFO enabled in order to distingush
stack overflows from other causes when we get an exception.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
A hack was required for the loapic code due to the address
range not being in DTS. A bug was filed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This driver code uses PCIe and doesn't use Zephyr's
device model, so we can't use the nice DEVICE_MMIO macros.
Set stuff up manually instead using device_map().
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This currently only supports identity paging; there's just
enough here for device_map() calls to work.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Adding just the cache flush function for x86. The name
arch_cache_flush comply with API names in include/cache.h
Signed-off-by: Aastha Grover <aastha.grover@intel.com>
The page table initialization needs a populated PCI MMIO
configuration, and that is lazy-evaluated. We aren't guaranteed that
a driver already hit that path, so be sure to call it explicitly.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Right now x86_64 doesn't install handlers for vectors that aren't
populated by Zephyr code. Add a tiny spurious interrupt handler that
logs the error and triggers a fatal error, like other platforms do.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This patch is almost entirely aesthetics, designed to isolate the
variant configurations to a simple macro API (just IN/OUT), reduce
complexity derived from code pasted out of the larger ns16550 driver,
and keep the complexity out of the (very simple!) core code. Useful
when hacking on the driver in contexts where it isn't working yet.
The sole behavioral change here is that I've removed the runtime
printk hook installation in favor of defining an
arch_printk_char_out() function which overrides the weak-linked
default (that is, we don't need to install a hook, we can be the
default hook at startup).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Various cleanups to the x86 early serial driver, mostly with the goal
of simplifying its deployment during board bringup (which is really
the only reason it exists in the first place):
+ Configure it =y by default. While there are surely constrained
environments that will want to disable it, this is a TINY driver,
and it serves a very important role for niche tasks. It should be
built always to make sure it works everywhere.
+ Decouple from devicetree as much as possible. This code HAS to work
during board bringup, often with configurations cribbed from other
machines, before proper configuration gets written. Experimentally,
devicetree errors tend to be easy to make, and without a working
console impossible to diagnose. Specify the device via integer
constants in soc.h (in the case of IOPORT access, we already had
such a symbol) so that the path from what the developer intends to
what the code executes is as short and obvious as possible.
Unfortunately I'm not allowed to remove devicetree entirely here,
but at least a developer adding a new platform will be able to
override it in an obvious way instead of banging blindly on the
other side of a DTS compiler.
+ Don't try to probe the PCI device by ID to "verify". While this
sounds like a good idea, in practice it's just an extra thing to get
wrong. If we bail on our early console because someone (yes, that's
me) got the bus/device/function right but typoed the VID/DID
numbers, we're doing no one any favors.
+ Remove the word-sized-I/O feature. This is a x86 driver for a PCI
device. No known PC hardware requires that UART register access be
done in dword units (in fact doing so would be a violation of the
PCI specifciation as I understand it). It looks to have been cut
and pasted from the ns16550 driver, remove.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The default page table (the architecturally required one used for
entrance to long mode, before the OS page tables get assembled) was
mapping the first 4G of memory.
Extend this to 512G by fully populating the second level page table.
We have devices now (up_squared) which have real RAM mapped above 4G.
There's really no good reason not to do this, the page is present
always anyway.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
A last minute "cleanup" to the EFI startup path (on a system where I
had SMP disabled) moved the load of the x86_cpuboot[0] entry into RBP
into the main startup code, which is wrong because on auxiliary CPUs
that's already set up by the 16/32 bit entry code to point to the
OTHER entries.
Put it back where it belongs.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is a first cut on a tool that will convert a built Zephyr ELF
file into an EFI applciation suitable for launching directly from the
firmware of a UEFI-capable device, without the need for an external
bootloader.
It works by including the Zephyr sections into the EFI binary as
blobs, then copying them into place on startup.
Currently, it is not integrated in the build. Right now you have to
build an image for your target (up_squared has been tested) and then
pass the resulting zephyr.elf file as an argument to the
arch/x86/zefi/zefi.py script. It will produce a "zephyr.efi" file in
the current directory.
This involved a little surgery in x86_64 to copy over some setup that
was previously being done in 32 bit mode to a new EFI entry point.
There is no support for 32 bit UEFI targets for toolchain reasons.
See the README for more details.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The traditional IO Port configuration mechanism was technically
deprecated about 15 years ago when PCI Express started shipping.
While frankly the MMIO support is significantly more complicated and
no more performant in practice, Zephyr should have support for current
standards. And (particularly complicated) devices do exist in the
wild whose extended capability pointers spill beyond the 256 byte area
allowed by the legacy mechanism. Zephyr will want drivers for those
some day.
Also, Windows and Linux use MMIO access, which means that's what
system vendors validate.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The existing minimal ACPI implementation was enough to find the MADT
table for dumping CPU info. Enhance it with a slightly less minimal
implementation that can fetch any table, supports the ACPI 2.0 XSDT
directory (technically required on 64 bit systems so tables can live
>4G) and provides definitions for the MCFG table with the PCI
configuration pointers.
Note that there is no use case right now for high performance table
searching, so the "init" step has been removed and tables are probed
independently from scratch for each one requested (there are only
two).
Note also that the memory to which these tables point is not
understood by the Zephyr MMU configuration, so in long mode all ACPI
calls have to be done very early, before z_x86_paging_init() (or on a
build with the MMU initialization disabled).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
If we get a page fault in early boot context, before
main thread is started, page faults were being
incorrectly reported as stack overflows.
z_x86_check_stack_bounds() needs to consider the
interrupt stack as the correct stack for this context.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This helps distingush between fatal errors if logging isn't
enabled.
As detailed in comments, pass a reason code which controls
the QEMU process' return value.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
x86_64's __resume path 'poisons' the incoming thread's
saved RIP value with a special 0xB9 value, to catch
re-use of thread objects across CPUs in SMP. Add a check
and printout for this when handling fatal errors, and
treat as a kernel panic.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
If KPTI is not enabled, the current value of CR3 is the correct
page tables when the exception happened in all cases.
If KPTI is enabled, and the excepting thread was in user mode,
then a page table switch happened and the current value of CR3
is not the page tables when the fault happened. Get it out of the
thread object instead.
Fixes two problems:
- Divergent exception loop if we crash when _current is a dummy
thread or its page table pointer stored in the thread object is
NULL or uninitialized
- Printing the wrong CR3 value on exceptions from user mode in
the register dump
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
In one of the ASSERT() statement, the PHYS_RAM_ADDR (alias
of DT_REG_ADDR()) may be interpreted by the compiler as
long long int when it's large than 0x7FFFFFFF, but is
paired with %x, resulting in compiler warning. Fix this
by type casting it to uintptr_t and use %lx instead.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
On x86_64, the arch_timing_* variables are not set which
results in incorrect values being used in the timing_info
benchmarks. So instrument the code for those values.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The SoCs usually have devices that are accessed through MMIO.
This requires the corresponding regions to be marked readable
and writable in the MMU or else accesses will result in page
faults.
This adds a function which can be implemented in the SoC code to
specify those pages to be added to MMU.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The integers used for pointer calculation were u32_t.
Change them to uintptr_t to be compatible with 64-bit.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
x86-32 thread objects require special alignment since they
contain a buffer that is passed to fxsave/fxrstor instructions.
This fell over if the dummy thread is created in a stack frame.
Implement a custom swap to main for x86 which still uses a
dummy thread, but in an unused part of the interrupt stack
with proper alignment.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
If IO APIC is in logical destination mode, local APICs compare their
logical APIC ID defined in LDR (Logical Destination Register) with
the destination code sent with the interrupt to determine whether or not
to accept the incoming interrupt.
This patch programs LDR in xAPIC mode to support IO APIC logical mode.
The local APIC ID from local APIC ID register can't be used as the
'logical APIC ID' because LAPIC ID may not be consecutive numbers hence
it makes it impossible for LDR to encode 8 IDs within 8 bits.
This patch chooses 0 for BSP, and for APs, cpu_number which is the index
to x86_cpuboot[], which ultimately assigned in z_smp_init[].
Signed-off-by: Zide Chen <zide.chen@intel.com>
This commit renames the x86 Kconfig `CONFIG_{EAGER,LAZY}_FP_SHARING`
symbol to `CONFIG_{EAGER,LAZY}_FPU_SHARING`, in order to align with the
recent `CONFIG_FP_SHARING` to `CONFIG_FPU_SHARING` renaming.
Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
This commit renames the Kconfig `FP_SHARING` symbol to `FPU_SHARING`,
since this symbol specifically refers to the hardware FPU sharing
support by means of FPU context preservation, and the "FP" prefix is
not fully descriptive of that; leaving room for ambiguity.
Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
This expands the early_serial to support MMIO UART, in addition to
port I/O, by duplicating part of the hardware initialization from
the NS16550 UART driver. This allows enabling of early console on
hardware with MMIO-based UARTs.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
x86_64 supports 4 levels of interrupt nesting, with
the interrupt stack divided up into sub-stacks for
each nesting level.
Unfortunately, the initial interrupt stack pointer
on the first CPU was not taking into account reserved
space for guard areas, causing a stack overflow exception
when attempting to use the last interrupt nesting level,
as that page had been set up as a stack guard.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We need to lock interrupts before setting the thread's
stack pointer to the trampoline stack. Otherwise, we
could unexpectedly take an interrupt on this stack
instead of the thread stack as intended.
The specific problem happens at the end of the interrupt,
when we switch back to the thread stack and call swap.
Doing this on a per-cpu trampoline stack instead of the
thread stack causes data corruption.
Fixes: #24869
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>