acpi_madt_entry_get() and acpi_dmar_entry_get() advance their loop
offset by subtable->Length with no check for zero. A malformed (or
malicious) firmware table with a subtable whose Length field is 0
would cause offset to never advance while the loop condition
(offset < table_length) remains true, hanging the boot indefinitely.
acpi_get_subtable_entry_num() already carries the fix: after
advancing the pointer it checks if the next subtable's Length is 0
and breaks. Apply the identical guard to both entry-get loops so
that a zero-length subtable is handled consistently across all three
ACPI iteration sites.
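
As an illustration only, the guard can be sketched on a simplified subtable walk. The header layout, the function name, and the byte-buffer interface below are assumptions made for the sketch and are not the actual ACPI code; the sketch also checks Length before using an entry, whereas the existing fix checks the next entry after advancing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal subtable header: a type byte followed by a length byte. */
struct sk_subtable_hdr {
	uint8_t type;
	uint8_t length;
};

/*
 * Return a pointer to the first subtable of the requested type, or NULL.
 * `buf` holds the subtables back to back and `len` is their total size.
 */
static const uint8_t *sk_find_subtable(const uint8_t *buf, size_t len, uint8_t type)
{
	size_t offset = 0;

	while (offset + sizeof(struct sk_subtable_hdr) <= len) {
		struct sk_subtable_hdr hdr;

		memcpy(&hdr, buf + offset, sizeof(hdr));

		/* A zero Length would never advance offset, so stop here. */
		if (hdr.length == 0) {
			break;
		}
		if (hdr.type == type) {
			return buf + offset;
		}
		offset += hdr.length;
	}

	return NULL;
}
```

With a well-formed buffer the walk reaches the second entry; with a zero Length in the first entry it stops and returns NULL instead of spinning.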
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Previously, ATOMIC_OPERATIONS_C was selected for RISC-V whenever the
'A' (atomic) ISA extension (RISCV_ISA_EXT_A) was absent. This caused
a conflict on platforms that lack the 'A' extension but still provide
their own arch-level atomic implementation via ATOMIC_OPERATIONS_ARCH
(e.g. future RISC-V SoCs with custom atomic support).
Add !ATOMIC_OPERATIONS_ARCH to the select condition so that the
generic C fallback (interrupt-locking) is only chosen when neither
the ISA extension nor an arch-specific implementation is available.
This condition creates a Kconfig dependency cycle:
    RISCV selects ATOMIC_OPERATIONS_C if !ATOMIC_OPERATIONS_ARCH
      => ATOMIC_OPERATIONS_C depends on !ATOMIC_OPERATIONS_ARCH
      => ATOMIC_OPERATIONS_ARCH depends on SMP (fvp_base_revc_2xaem board)
      => SMP depends on !ATOMIC_OPERATIONS_C
Break the cycle by removing 'depends on !ATOMIC_OPERATIONS_C' from
SMP in kernel/smp/Kconfig. This is safe because ATOMIC_OPERATIONS_C
is now only selected when ATOMIC_OPERATIONS_ARCH is absent, so the
two symbols are mutually exclusive by construction. The existing
BUILD_ASSERT(!IS_ENABLED(CONFIG_SMP)) in lib/os/atomic_c.c provides
a compile-time backstop against any misconfiguration.
Suggested-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Lingutla Chandrasekhar <lingutla@qti.qualcomm.com>
Restrict ARCH_HAS_LLEXT_VENEERS to ARMV7_M_ARMV8_M_MAINLINE instead
of all CPU_CORTEX_M targets, since the veneer implementation relies
on a Mainline-only Thumb-2 instruction sequence not supported on
Baseline cores.
Signed-off-by: Ibrahim Abdalkader <i.abdalkader@gmail.com>
Keep the original architecture IRQ key owned by idle across a
successful system PM transition.
Add architecture hooks and the PM_STATE_SET_IRQ_LOCKED migration
contract for SoCs that keep PM hooks from unmasking interrupts.
Signed-off-by: Holt Sun <holt.sun@nxp.com>
Add the Kconfig option
CONFIG_XTENSA_EMULATE_UNSUPPORTED_UNSIGNED_LOAD_STORE
to enable an exception handler for unsupported narrow
and/or unaligned unsigned loads and stores. The handler
reads the triggering instruction and performs the
operation manually using supported word-sized, aligned
accesses.
Signed-off-by: Lauren Murphy <lauren.murphy@intel.com>
Co-authored-by: Anthony Giardina <88748592+agiardin@users.noreply.github.com>
If (addr + size) overflows the address space, the inner
loop may not run at all, so no permission check is done.
Since the default return value was 0 (meaning permitted),
the function would incorrectly report the memory access as
allowed. Fix this by changing the default return value to
-EINVAL and setting it to 0 only after the loop has
validated the whole input address range. Also check for
addition overflow.
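
The deny-by-default pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: the page size, the function names, and the per-page callback are hypothetical and do not reflect the actual architecture code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical per-page permission check. */
typedef bool (*sk_page_ok_fn)(uintptr_t page_addr);

/* Example policy used for illustration: only the first 1 MiB is accessible. */
static bool sk_below_1mib(uintptr_t page_addr)
{
	return page_addr < 0x100000u;
}

static int sk_validate_range(uintptr_t addr, size_t size, sk_page_ok_fn page_ok)
{
	int ret = -EINVAL; /* deny by default */
	uintptr_t end;
	uintptr_t page;

	/* Reject ranges whose end would wrap around the address space. */
	if (size > UINTPTR_MAX - addr) {
		return ret;
	}
	end = addr + size;

	page = addr & ~((uintptr_t)SK_PAGE_SIZE - 1);
	for (;;) {
		if (!page_ok(page)) {
			return ret; /* still -EINVAL */
		}
		/* Stop once the range ends inside (or exactly at the end of) this page. */
		if (end - page <= SK_PAGE_SIZE) {
			break;
		}
		page += SK_PAGE_SIZE; /* cannot wrap: page + SK_PAGE_SIZE < end */
	}

	/* Only a fully validated range is reported as permitted. */
	ret = 0;
	return ret;
}
```

The loop exit test is written as a subtraction so that the page cursor itself never wraps when the range ends near the top of the address space.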
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Fix the early TLS initialization sequence on RISC-V to correctly set up
the thread pointer (tp) before any C code runs, and save/restore
callee-saved registers as required by the ABI.
Signed-off-by: Mayur Salve <msalve@qti.qualcomm.com>
Enable per-thread stack canaries for RISC-V.
RISC-V GCC accesses the stack canary via a fixed offset from the
thread pointer (tp) when -mstack-protector-guard=tls is used. The
compiler emits code equivalent to:
    lw t0, 0(tp)    # load canary from tp+0
Additionally, tp is zeroed in arch_kernel_init() when TLS is enabled,
which means any C function called before thread setup completes (such
as z_early_rand_get or data_copy_xip_relocation) would fault trying
to access the canary.
Introduce STACK_CANARIES_TLS_PREPEND, which places the
.stack_chk.guard section at offset 0 of the TLS block, before .tdata
and .tbss. The compiler flags -mstack-protector-guard-reg=tp and
-mstack-protector-guard-offset=0 are passed so GCC generates the
correct canary access.
With STACK_CANARIES_TLS_PREPEND the per-thread TLS block layout is:
    tp --> +------------------+  offset 0
           | .stack_chk.guard |  (__stack_chk_guard)
           +------------------+
           | .tdata           |  (initialized TLS data)
           +------------------+
           | .tbss            |  (zero-initialized TLS data)
           +------------------+
The RISC-V reset path is extended to initialize tp before any C code
runs by allocating a TLS area on the boot stack and calling
arch_riscv_early_tls_stack_update(). Early boot functions that run
before tp is set up (z_early_rand_get, data_copy_xip_relocation) are
marked FUNC_NO_STACK_PROTECTOR to avoid canary access before tp is
valid.
Signed-off-by: Mayur Salve <msalve@qti.qualcomm.com>
When an LLEXT is loaded in memory and calls kernel or libc symbols
located in flash, the target may fall outside the range of branches
on some architectures, such as Thumb-2 BL, causing relocations to
fail.
Add CONFIG_LLEXT_VENEERS which, when enabled, generates trampoline
stubs for such out-of-range relocations.
Stubs are allocated from the LLEXT heap into a new LLEXT_MEM_VENEER
region, instead of using new state variables to track the memory, to
leverage the existing llext machinery that flushes cache, frees memory
on unload, etc.
A test extension is added, compiled with -mno-long-calls to force
direct branches on Arm and exercise the veneer path against multiple
libc symbols.
Signed-off-by: Ibrahim Abdalkader <i.abdalkader@gmail.com>
Two QEMU bugs were worked around in this file:
1. NAPOT range computation overflow (the catch-all entry covering
the full address range was decoded as an empty range).
2. Bad transient PMP representations (negative-sized TOR ranges
during read-modify-write updates causing spurious access faults).
Both are fixed in upstream QEMU 7.1 (released August 2022) by commits
6248a8fe4d8a ("fix NAPOT range computation overflow") and 2e983399186b
("guard against PMP ranges with a negative size"), respectively.
The minimum Zephyr SDK version for this branch is v1.0.0, which ships
QEMU 10.0.2 -- well past 7.1 -- so both fixes are present and the
workarounds are no longer needed.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
The LRU eviction algorithm needs to catch the first access to a loaded
page in order to call k_mem_paging_eviction_accessed() and move that
page to the tail of the queue. On ARM64 this is done with the MMU's
Access Flag: clearing AF causes a distinct fault on the next access.
On x86 there is no access-flag fault. The Accessed bit (PTE bit 5) is
set by hardware on access but never traps. The only way to force a
fault is to clear the Present bit, which already encodes the
"paged out" state — so a new state is needed:
    PTE == 0                  -> unmapped
    P=0, A=1, upper=location  -> paged out
    P=0, G=1, upper=PFN       -> LRU-tracked (new)
    P=1                       -> normally mapped
Bit G (Global, bit 8) is never set by Zephyr on x86 (CR4.PGE is not
used), so it is free to use as a private marker when P=0. No existing
PTE state needs to be displaced. This stays out of the way of the
KPTI path (which uses the PAT bit) and of the permission-backup bits
(IGNORED0..2) used for memory domain handling.
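
The four PTE states listed above can be sketched as a small classifier. The bit positions (P = bit 0, A = bit 5, G = bit 8) follow the x86 PTE format; the names, the order in which G and A are tested, and the catch-all "unknown" state are assumptions of this sketch, not the actual implementation.

```c
#include <stdint.h>

#define SK_PTE_P (1ULL << 0) /* Present */
#define SK_PTE_A (1ULL << 5) /* Accessed */
#define SK_PTE_G (1ULL << 8) /* Global, reused as a private marker when P=0 */

enum sk_pte_state {
	SK_PTE_UNMAPPED,
	SK_PTE_PAGED_OUT,
	SK_PTE_LRU_TRACKED,
	SK_PTE_MAPPED,
	SK_PTE_UNKNOWN,
};

static enum sk_pte_state sk_pte_classify(uint64_t pte)
{
	if (pte == 0) {
		return SK_PTE_UNMAPPED;
	}
	if (pte & SK_PTE_P) {
		return SK_PTE_MAPPED; /* G and A carry their usual meaning here */
	}
	if (pte & SK_PTE_G) {
		return SK_PTE_LRU_TRACKED; /* upper bits still hold the PFN */
	}
	if (pte & SK_PTE_A) {
		return SK_PTE_PAGED_OUT; /* upper bits hold the backing-store location */
	}
	return SK_PTE_UNKNOWN; /* not one of the four documented encodings */
}
```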
arch_page_info_get(addr, NULL, clear_accessed=true) is overloaded
under CONFIG_EVICTION_LRU to both query the prior flags and transition
the page to the LRU-tracked state via a new helper that updates all
domain ptables. arch_page_location_get() recognizes the tracked state
as paged-in so the core demand-paging code treats the page as resident.
The page fault handler intercepts LRU-tracking faults in-line before
k_mem_page_fault() dispatch: restore P, clear the tracking bit, and
call k_mem_paging_eviction_accessed() directly. This avoids the risk
of recursing through do_page_fault() with z_mm_lock held.
KPTI co-exists with demand paging but its PTE encoding is not yet
wired up to the LRU state, so tracking is gated on !X86_KPTI for now.
Fixes: #75132
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Deprecate these Kconfigs and emit a deprecation warning when
either of them is changed from its default (the warning is attached
to a different symbol, due to Kconfig limitations).
Signed-off-by: Jamie McCrae <jamie.mccrae@nordicsemi.no>
Add a Kconfig option that selects where the source of truth for a
board target's RAM configuration lives, to allow moving to a pure
DTS approach.
Signed-off-by: Jamie McCrae <jamie.mccrae@nordicsemi.no>
Tuning CONFIG_MAX_XLAT_TABLES is currently trial-and-error: the only
feedback on overflow is a "too small" panic with no hint of how much
to bump.
Track the high-water mark of allocated translation tables and use it
to emit two signals from new_table(): a one-shot LOG_WRN when the
pool drops below 12.5 % free (always compiled in, advance notice
before the panic), and an opt-in LOG_INF on every new peak (under
CONFIG_ARM64_MMU_REPORT_XLAT_TABLES_USAGE) so the last logged "peak
N of M allocated" line gives a concrete lower bound on MAX for the
workload that ran.
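
A minimal sketch of the accounting described above, assuming a hypothetical stats structure and signal flags; the real new_table() logs directly instead of returning flags, and the names here are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for the translation table pool. */
struct sk_xlat_stats {
	unsigned int total;  /* pool size, i.e. CONFIG_MAX_XLAT_TABLES */
	unsigned int in_use; /* tables currently allocated */
	unsigned int peak;   /* high-water mark */
	bool low_warned;     /* one-shot latch for the low-pool warning */
};

#define SK_SIG_NEW_PEAK 0x1u /* would map to the opt-in LOG_INF */
#define SK_SIG_LOW_POOL 0x2u /* would map to the one-shot LOG_WRN */

/*
 * Account for one allocation and report which signals should fire.
 * Assumes the caller has already verified that a free table exists.
 */
static unsigned int sk_xlat_account_alloc(struct sk_xlat_stats *s)
{
	unsigned int signals = 0;
	unsigned int free_tables;

	s->in_use++;
	if (s->in_use > s->peak) {
		s->peak = s->in_use;
		signals |= SK_SIG_NEW_PEAK;
	}

	/* "Below 12.5 % free" is free < total / 8, written without division. */
	free_tables = s->total - s->in_use;
	if (!s->low_warned && free_tables * 8u < s->total) {
		s->low_warned = true;
		signals |= SK_SIG_LOW_POOL;
	}

	return signals;
}
```

For a pool of 16 tables the low-pool signal fires once, on the 15th allocation, when only one table remains free.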
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Implement arch_pm_s2ram_suspend() and arch_pm_s2ram_resume() for
RISC-V, mirroring the ARM Cortex-M implementation. The assembly
saves and restores callee-saved GPRs, FP registers (when enabled),
and critical CSRs (mstatus, mtvec, mscratch, mie/mtvt).
Both CLIC (mtvt) and non-CLIC (mie) interrupt controller
configurations are handled via conditional compilation.
The CSR_MTVT define is placed in the shared csr.h header for
reuse across the architecture.
Signed-off-by: William Markezana <william.markezana@gmail.com>
Add new implementations for entropy driver and random subsystem
based on ARM64 RNDRRS and RNDR instructions.
Signed-off-by: Christoph Busold <cbusold@qti.qualcomm.com>
Introduce CONFIG_RISCV_S_MODE to select Supervisor-mode execution.
Add depends on !RISCV_S_MODE to RISCV_PMP since PMP CSRs are
inaccessible from S-mode.
Add an M-mode SBI shim (reset.S + sbi.S) that configures exception
delegation, PMP, and counter access before dropping to S-mode via mret.
The shim handles SBI_SET_TIMER ecalls from S-mode and forwards MTIP to
STIP so the supervisor timer driver works without a full SBI firmware.
Introduce privilege-level abstractions in isr.S (RV_CAUSE, RV_EPC,
RV_STATUS, RV_TVAL, etc.) and update all runtime code that previously
accessed M-mode-only CSRs (mcause, mtval, mstatus, mie, mip) to use the
S-mode equivalents when CONFIG_RISCV_S_MODE is set.
ARCH_EXCEPT in kernel context uses ebreak (cause=3, Breakpoint) instead
of a direct z_riscv_fatal_error() call. In S-mode, ecall (cause=9) is
kept in M-mode for SBI and never reaches the S-mode exception handler;
a direct call with NULL esf caused the stack unwinder to crash into an
infinite fault loop. ebreak is delegated to S-mode by our medeleg
configuration; isr.S treats ebreak with t0=RV_ECALL_RUNTIME_EXCEPT the
same way M-mode treats ecall-based ARCH_EXCEPT.
Signed-off-by: Alexios Lyrakis <alexios.lyrakis@gmail.com>
Modern ARM64 SoCs with SMP and 36-bit virtual addressing require more
translation tables than the current default of 8. This is particularly
evident on TI K3 SoCs (AM62X, AM62LX), which have more SoC
peripherals located beyond the 2MB boundary.
Add a new default: 16 tables for SMP && (ARM64_VA_BITS >= 36)
Signed-off-by: Soumya Tripathy <s-tripathy@ti.com>
arch_mem_page_in() used to clear the access flag on paged-in pages so
the first access after page-in would fault and feed the tracking hook.
This predates the LRU eviction algorithm and is no longer carrying its
weight: the demand-paging core adds a freshly paged-in frame at the
tail of the LRU queue via k_mem_paging_eviction_add(), so the first-
access AF fault ends up moving the frame from tail to tail — wasted
work.
Measured on qemu_cortex_a53 with the mem_map demand_paging test:
total page faults, eviction counts, and new-head mark counts are
byte-identical before and after this change. Only the tracking-hit
count drops (308 -> 136), which exactly corresponds to the redundant
first-access minor faults we no longer take.
The new-head path in lru_pf_remove() still clears AF on the page at
the head of the queue to detect whether it is actually stale before
eviction — that mechanism is unchanged.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
When a Data Abort or Instruction Abort exception includes a valid FAR,
walk the current TTBR0 page tables and display the PTE at each level.
This shows the actual page table content at the time of the fault,
which is invaluable for diagnosing permission faults where the PTE
may be correct (pointing to a TLB coherency issue) versus cases
where the PTE itself has wrong permissions.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
It makes sense that userspace threads shouldn't cause system level
exceptions, but there is no real dependency on that choice. Moreover,
userspace applications can cause exceptions by other means anyway.
Leave the decision to the system configuration instead of making it a
hard requirement.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
ACPI is not really library code under the new definition of what
should go into lib/. Move ACPI into arch/common/, as it is a
cross-architecture feature.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Add support for using L32EX/S32EX to implement atomic CAS operations
in the architecture layer, as an alternative to S32C1I on SoCs that
support it.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
gen_offset.h is an architecture-specific header, not a kernel one.
Move it under the arch tree where it belongs.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Include the terminating NUL character in the path length passed to the
semihosting open call, as required by some debuggers.
Signed-off-by: Tahsin Mutlugun <Tahsin.Mutlugun@analog.com>
Add semihosting support for Xtensa QEMU targets. Although QEMU uses the
same simcall instruction, it relies on different flag values for file
open operations compared to the Xtensa Instruction Set Simulator.
Prioritize QEMU-specific compatibility if both SIMULATOR_XTENSA and
QEMU_TARGET are enabled.
Signed-off-by: Tahsin Mutlugun <Tahsin.Mutlugun@analog.com>
Add compile-time macros to auto-generate ARM64 MMU region entries
from devicetree nodes with zephyr,memory-attr property:
- DT_MEM_ATTR_TO_MT(): converts DT memory attributes to ARM64 MT_*
flags at compile time using the composable-bitmask design — the
generic DT_MEM_CACHEABLE bit selects cacheable vs non-cacheable,
and the arch-specific ATTR_ARM64_CACHE_WB sub-bit selects
write-back vs write-through when cacheable
- MMU_REGION_DT_FLAT_ENTRY_FROM_DT(): generates a single MMU region
entry from a DT node, skipping nodes without zephyr,memory-attr
- MMU_REGION_DT_COMPAT_FOREACH_FLAT_ENTRY_FROM_DT(): iterates over
all status="okay" nodes of a given compatible
The convenience macros in memory-attr-arm64.h are defined as
composable combinations of generic and arch-specific attributes
rather than standalone one-hot enumerations:
DT_MEM_ARM64_MMU_NORMAL_NC = (0)
DT_MEM_ARM64_MMU_NORMAL_WT = (DT_MEM_CACHEABLE)
DT_MEM_ARM64_MMU_NORMAL = (DT_MEM_CACHEABLE | ATTR_ARM64_CACHE_WB)
Only Normal memory types are supported by the DT auto-mapping
mechanism. Device memory must be mapped through the MMIO device
API instead. Unsupported zephyr,memory-attr values are rejected
at compile time via BUILD_ASSERT.
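
The composable-bitmask idea can be sketched as a plain function. The bit positions, enum, and function name below are hypothetical stand-ins (the real values live in the dt-bindings headers and the real conversion is a compile-time macro), and treating the write-back sub-bit without the cacheable bit as unsupported is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for illustration. */
#define SK_DT_MEM_CACHEABLE    (1u << 0)  /* generic attribute bit */
#define SK_ATTR_ARM64_CACHE_WB (1u << 20) /* arch-specific sub-bit */

enum sk_mt_type {
	SK_MT_NORMAL_NC,   /* non-cacheable */
	SK_MT_NORMAL_WT,   /* cacheable, write-through */
	SK_MT_NORMAL,      /* cacheable, write-back */
	SK_MT_UNSUPPORTED, /* would be a BUILD_ASSERT failure in the real macro */
};

static enum sk_mt_type sk_attr_to_mt(uint32_t attr)
{
	/* Anything outside the two known bits is not a supported Normal type. */
	if (attr & ~(SK_DT_MEM_CACHEABLE | SK_ATTR_ARM64_CACHE_WB)) {
		return SK_MT_UNSUPPORTED;
	}
	if (!(attr & SK_DT_MEM_CACHEABLE)) {
		/* Assumption: the WB sub-bit is only meaningful when cacheable. */
		return (attr & SK_ATTR_ARM64_CACHE_WB) ? SK_MT_UNSUPPORTED
						       : SK_MT_NORMAL_NC;
	}
	return (attr & SK_ATTR_ARM64_CACHE_WB) ? SK_MT_NORMAL : SK_MT_NORMAL_WT;
}
```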
The arch-level mmu.c automatically picks up all zephyr,memory-region
nodes with zephyr,memory-attr at boot, so individual SoC
mmu_regions.c files do not need any changes.
The DT region iteration is factored into max_region_bounds() and
map_mmu_regions() helpers to avoid code duplication in
setup_page_tables().
Introduces include/zephyr/dt-bindings/memory-attr/memory-attr-arm64.h
with ARM64-specific DT memory attribute sub-bits.
Add an ARM64 Developer Guide (doc/hardware/arch/arm64.rst)
documenting the DT-based MMU region mapping feature.
Signed-off-by: Appana Durga Kedareswara rao <appana.durga.kedareswara.rao@amd.com>
Enclose default LLEXT heap section placements with
!CONFIG_LLEXT_CUSTOM_HEAP_PLACEMENT. Users who want to
write a custom linker script can include files like
snippets-noinit.ld, common-noinit.ld or the SOC linker
scripts without including the default LLEXT heap
section placements.
Signed-off-by: Lauren Murphy <lauren.murphy@intel.com>
Allow using a switch-case instead of an array to hold ISR entries.
When most IRQs are unused, they all share the same default entry,
so most of the ISR array entries are identical duplicates.
This change allows using a dynamically generated function (created
after the first linker pass) that uses a switch-case instead of a
full array. Default entries are handled only once, in the default
section, and each used IRQ has its own case section.
This can help reduce binary size.
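
To illustrate the shape of such a generated function, here is a hand-written sketch. The IRQ numbers, handler names, and counters are invented for the example; the real function is generated by the build after the first linker pass.

```c
#include <stddef.h>

/* Illustrative handlers that only count invocations. */
static int sk_timer_hits, sk_uart_hits, sk_spurious_hits;

static void sk_timer_isr(const void *arg) { (void)arg; sk_timer_hits++; }
static void sk_uart_isr(const void *arg) { (void)arg; sk_uart_hits++; }
static void sk_spurious_isr(const void *arg) { (void)arg; sk_spurious_hits++; }

/*
 * Switch-based dispatch: only the IRQs actually in use get a case,
 * and every unused line shares the single default branch instead of
 * occupying its own slot in a table.
 */
static void sk_isr_dispatch(unsigned int irq)
{
	switch (irq) {
	case 3:
		sk_timer_isr(NULL);
		break;
	case 17:
		sk_uart_isr(NULL);
		break;
	default:
		sk_spurious_isr(NULL);
		break;
	}
}
```

An array-based table for the same configuration would need one entry per interrupt line, nearly all of them pointing at the same default handler.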
Signed-off-by: Adam Szczygieł <adam.szczygiel@nordicsemi.no>
This allows the fcntl middle layer to be reused outside NSOS,
for example by the planned Native Simulator host FS mounting.
Signed-off-by: Marcin Niestroj <m.niestroj@emb.dev>
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
Since commit 0026a5610ac ("arm64: mm: use identity mapping for device
MMIO"), device_map() creates identity mappings (VA = PA) instead of
allocating virtual addresses from a contiguous pool. Each device at a
distinct 2MB-aligned physical address now requires its own L3 page
table, increasing the total number of translation tables needed.
Bump the USERSPACE && TEST default from 24 to 28 to accommodate the
additional tables required by identity-mapped device MMIO.
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
When a memory domain is freed on Xtensa, it also has to be removed
from the global domain list. Leaving it on the list can cause
use-after-free exceptions.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This relocation is used by the ARM TLS code to access thread local
variables. It is a simple absolute relocation that adds the symbol's
offset to the value at the location. This allows the code to access
thread local variables using a fixed offset from the thread pointer,
which is determined at runtime.
Signed-off-by: Luca Burelli <l.burelli@arduino.cc>
The official version of the RISC-V privileged architecture
specification extends the number of supported PMP registers to 64.
Signed-off-by: Christoph Busold <cbusold@qti.qualcomm.com>
The ARM64 coredump arch block did not include FP (x29) and SP,
making GDB unable to unwind the stack. The GDB stub also
misinterpreted SPSR as SP (tu[20] mapped to SP_EL0), producing
corrupted stack pointer values and broken backtraces.
Bump the arch block version to v2 (24 registers, 192 bytes)
adding FP and SP after the existing 22 registers. Update the
GDB stub to auto-detect v1 vs v2 blocks by payload size and
correctly map SPSR (skip), ELR (PC), FP (x29), and SP.
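
Detection by payload size can be sketched as below. The 192-byte v2 size is stated above; the 176-byte v1 size is derived from 22 registers at the same 8 bytes each. The function name is hypothetical, and the sketch is written in C purely for illustration; it is not the GDB stub code itself.

```c
#include <stddef.h>

#define SK_REG_SIZE 8u  /* bytes per saved register */
#define SK_V1_REGS  22u /* original arch block */
#define SK_V2_REGS  24u /* v1 registers plus FP and SP */

/* Return the arch block version implied by the payload size, or -1. */
static int sk_arch_block_version(size_t payload_len)
{
	if (payload_len == SK_V2_REGS * SK_REG_SIZE) {
		return 2;
	}
	if (payload_len == SK_V1_REGS * SK_REG_SIZE) {
		return 1;
	}
	return -1; /* unknown layout: do not guess at register positions */
}
```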
When CONFIG_ARM64_SAFE_EXCEPTION_STACK is enabled and the
exception originated from EL0, use the saved esf->sp (original
sp_el0 stored by the exception entry code) instead of computing
it from the ESF address, since the exception handler may be
running on a separate safe stack.
Fixes #99054
Signed-off-by: Anirudha Sarangi <anirudha.sarangi@amd.com>
Signed-off-by: Appana Durga Kedareswara rao <appana.durga.kedareswara.rao@amd.com>
On ARM64, Zephyr uses identity mappings (VA = PA) for kernel code, data and
boot-time device regions. The MMU fully supports address translation but
Zephyr uses it primarily for access permission enforcement.
There are currently two independent paths for mapping device MMIO regions:
1. SoC-level mmu_regions.c files use MMU_REGION_FLAT_ENTRY() to create
identity mappings (VA = PA) directly in the page tables at boot. This
bypasses the kernel's virtual memory tracking entirely. SoC maintainers
must manually list peripherals in mmu_regions.c for drivers that do not
use the device MMIO API (e.g. most existing drivers) or cannot use it
(e.g. the GIC, which is not a regular driver).
2. The device MMIO API (device_map()) goes through k_mem_map_phys_bare(),
which allocates a virtual address from the SRAM range and maps (VA !=
PA) device registers there. Mapping device MMIO into the SRAM virtual
address space is nonsensical: it conflates device registers with memory,
wastes virtual address pool space, and produces addresses that bear no
relation to the hardware.
The CONFIG_KERNEL_DIRECT_MAP mechanism already supports identity mapping
through k_mem_map_phys_bare() with the K_MEM_DIRECT_MAP flag, but it
requires each board defconfig to enable the Kconfig and each driver to
explicitly pass the flag.
Make identity-mapped device MMIO automatic on ARM64:
1. ARM64 CPU_CORTEX_A selects KERNEL_DIRECT_MAP when MMU is enabled. This
eliminates the need for per-board defconfig opt-in.
2. device_map() automatically injects K_MEM_DIRECT_MAP when
CONFIG_KERNEL_DIRECT_MAP is enabled. This is transparent to
drivers, so no per-driver changes are needed. The flag is gated on
CONFIG_KERNEL_DIRECT_MAP rather than CONFIG_ARM64, keeping it
architecture-agnostic.
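
The flag injection in step 2 can be sketched as follows. The flag value and function name are hypothetical, and the boolean parameter stands in for what would be a compile-time IS_ENABLED(CONFIG_KERNEL_DIRECT_MAP) check in the real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for K_MEM_DIRECT_MAP; the bit position is hypothetical. */
#define SK_MEM_DIRECT_MAP (1u << 20)

/*
 * Compute the flags that would be passed on to the mapping call.
 * Drivers keep passing their usual flags; the direct-map request is
 * added centrally, keyed on the Kconfig rather than on the architecture.
 */
static uint32_t sk_device_map_flags(uint32_t driver_flags, bool direct_map_enabled)
{
	if (direct_map_enabled) {
		driver_flags |= SK_MEM_DIRECT_MAP;
	}
	return driver_flags;
}
```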
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Implement thread-based unwinding to support the 'kernel thread unwind'
shell command on ARM64. 'thread' now defaults to '_current' when it is
NULL, complying with the arch_stack_walk() API contract.
To enhance security, validate stack bounds using the stack_info and
TLS pointers of the target thread. If these are not available, fall
back to is_address_mapped() to keep the unwinding process robust.
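
The bounds check on each frame pointer can be sketched like this. It is a simplified illustration: the function name, the 16-byte frame record size, and the 8-byte alignment requirement are assumptions of the sketch, and the real code additionally consults TLS pointers and falls back to is_address_mapped().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_FRAME_RECORD_SIZE 16u /* saved FP plus saved LR, 8 bytes each */

/*
 * Return true if a frame record starting at `fp` lies entirely inside
 * the stack described by `stack_start` and `stack_size`.
 */
static bool sk_fp_in_stack(uintptr_t fp, uintptr_t stack_start, size_t stack_size)
{
	if (stack_size < SK_FRAME_RECORD_SIZE) {
		return false;
	}
	if ((fp % 8u) != 0) {
		return false; /* a misaligned FP cannot be a valid frame record */
	}
	if (fp < stack_start) {
		return false;
	}
	/* Subtraction form avoids overflow in stack_start + stack_size. */
	return (fp - stack_start) <= (stack_size - SK_FRAME_RECORD_SIZE);
}
```

An unwinder would stop walking as soon as this check fails, rather than dereferencing a frame pointer that points outside the target thread's stack.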
Signed-off-by: Archilis Wang <awm02289@gmail.com>