arch_mem_map() takes in some flags to describe the to-be mapped
memory regions' permissions and cache status. When the flags are
translated, they become attributes in PTEs. So for functions
being called by arch_mem_map() and beyond, rename flags to
attrs to better describe its purpose.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
In functions which manipulate both L1 and L2 tables, make
the variable names obvious by prefixing them with l1_ or l2_.
This is mainly done to avoid confusion when reading through
those functions.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The assert error messages when l2_page_table_map() fails are not
correct. It returns false when it cannot allocate new L2 table,
and it is not able the address having already mapped. So correct
the error message.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds a function to handle DTLB multihit exception when
userspace is enabled, as this exception may be raised due to
the same page being able to accessed by both kernel and user
threads. The auto-refill DTLBs may contain entries for same
page, one for kernel and one for user under some situations.
We need to invalidate those existing DTLB entries so that
hardware can reload from the page table.
This is an alternative to the kconfig invalidating DTLBs on
every swap: CONFIG_XTENSA_MMU_FLUSH_AUTOREFILL_DTLBS_ON_SWAP.
Although this does not have extra processing per context
switching, exception handling itself has a high cost. So
care needs to be taken on whether to enable that kconfig.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When a page can be accessed by both kernel and user threads,
the autofill DTLB may contain an entry for kernel thread.
This will result in load/store ring exception when it is
accessed by user thread later. In this case, we need to
invalidate all associated TLBs related to kernel access so
hardware can reload the page table the correct permission
for user thread.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The software bits inside PTE are used to store original PTE
attributes and ring value, and those bits are used to
restore PTE to previous state.
This modifies reset_region() to properly restore attributes
and ring value when resetting memory regions to the same as
when the system boots.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Several static functions in ptables.c always return 0, make them void
to improve readability.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This adds a new kconfig and corresponding code to allow flushing
auto-refill data TLBs when page tables are swapped (e.g. during
context switching). This is mainly used to avoid multi-hit TLB
exception raised by certain memory access pattern. If memory is
only marked for user mode access but not inside a memory domain,
accessing that page in kernel mode would result in a TLB being
filled with kernel ASID. When going back into user mode, access
to the memory would result in another TLB being filled with
the user mode ASID. Now there are two entries on the same memory
page, and the multi-hit TLB exception will be raised if that
memory page is accessed. This type of access is better served
using memory partition and memory domain to share data. However,
this type of access is not prohibited but highly discouraged.
Wrapping the code in kconfig is simply because of the execution
penalty as there will be unnecessary TLB refilling being done.
So only enable this if necessary.
Fixes#88772
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Since the necessary register values are now pre-computed and
stored in the memory domain struct, we can use them directly
in various assembly locations, thus replacing the function
call to xtensa_swap_update_page_tables().
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Instead of computing all the needed register values when
swapping page tables, we can pre-compute those values when
the memory domain is first initialized. Should save some
time during context switching.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Remove CONFIG_XTENSA_INVALIDATE_MEM_DOMAIN_TLB_ON_SWAP as it is
remnant from early MMU enabling work which is not needed as
the page table code is different from early version where
the PTEVADDR would be the same for all memory domains.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
thread_page_tables_get() is only used when userspace is
enabled. So move it with userspace #ifdef, or else
compiler would complain about it being unused.
Fixes#88421
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Inside map_memory() with double mapping enabled, we should not
be mapping the memory with the incoming attributes as-is since
the incoming address may be on un-cached region but with
caching attribute. So we need to sanitize the attributes
according to the incoming address.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Mostly a revert of commit b1def7145f ("arch: deprecate `_current`").
This commit was part of PR #80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.
The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.
Hence this revert.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
`_current` is now functionally equals to `arch_curr_thread()`, remove
its usage in-tree and deprecate it instead of removing it outright,
as it has been with us since forever.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
xtensa_mmu_init() is called really early in the boot process
where the _kernel struct has not yet been initialized, and
thus we cannot use it to determine if the current CPU is
the boot CPU. In some cases, this may skip the call to
initialize the page tables which leaves us with incorrect
page table entries. Fix it by using a static variable to
determine whether the page tables have been initialized so
we only do it once per boot.
Fixes#76909
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
ZSR_DEPC_SAVE is being used to determine whether we are faulting
inside double exception if this is not zero. It is possible that
the boot ROM or custom startup code leaves this non-zero, which
would result in a fake triple fault. So clear it at boot. Note
that the zeroing is done in MMU init code as these triple
faults are not actual hardware ones but only semantics, and will
occur once MMU is enabled.
Fixes#75194
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds a new function xtensa_mem_kernel_has_access() to
determine if a memory region can be accessed by kernel threads.
This allows checking for valid mapped memory before accessing
them to avoid relying on page faults to detect invalid access.
Also fixed an issue with arch_buffer_validate() on MPU where
it may return okay even if the incoming memory region has no
corresponding entry in the MPU table.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
arch_buffer_validate() is only to verify that user threads have
access to the memory region. It should not be used to verify
if kernel thread has access (which they should anyway). So
change the logic.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Also any demand paging and page frame related bits are
renamed.
This is part of a series to move memory management related
stuff out of the Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
With power managment is enabled, depending on the SoC power state
used when idle, the MMU may lose context and may need to be re-initialized.
When re-initializing the MMU, we must not re-create the page table
because it may overwrite changes done during the execution, but we still
need to set the asid and page table for the current context.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The only page table duplicated is the kernel page table. This function
does not need a parameter.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
We can use some extra bits available for SW implementation to
save original permissions and avoid duplicating the kernel page tables
for the default memory domain.
Whe duplicating the page table to a new domain we just ensure
to restore the original map.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Simplify the logic around the shared attribute. Checks if a memory
region should be shared only in the function that actually maps the
memory.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Introduce z_page_frame_set() and z_page_frame_clear() to manipulate
flags. Obtain the virtual address using the existing
z_page_frame_to_virt(). This will make changes to the page frame
structure easier.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
When duplicating a page table, we don't need to copy
the mapping to the kernel l1 page table virtual address.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The #pragma to ignore array bounds for xtensa_soc_mmu_ranges[]
was a remnant before code refactoring during review. Since
this array is no longer declared __weak and as a zero length
array, the #pragma to ignore array bounds is no longer needed.
So remove them.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds a kconfig to enable invalidating the TLBs related to
the incoming thread's memory domain during page table swaps.
It provides a workaround, if needed, to clear out stale TLB
entries used by the thread being swapped out. Those stale
entries may contain incorrect permissions and rings.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There are several subsystems and boards which require a relatively large
system heap (used by k_malloc()) to function properly. This became even
more notable with the recent introduction of the ACPICA library, which
causes ACPI-using boards to require a system heap of up to several
megabytes in size.
Until now, subsystems and boards have tried to solve this by having
Kconfig overlays which modify the default value of HEAP_MEM_POOL_SIZE.
This works ok, except when applications start explicitly setting values
in their prj.conf files:
$ git grep CONFIG_HEAP_MEM_POOL_SIZE= tests samples|wc -l
157
The vast majority of values set by current sample or test applications
is much too small for subsystems like ACPI, which results in the
application not being able to run on such boards.
To solve this situation, we introduce support for subsystems to specify
their own custom system heap size requirement. Subsystems do
this by defining Kconfig options with the prefix HEAP_MEM_POOL_ADD_SIZE_.
The final value of the system heap is the sum of the custom
minimum requirements, or the value existing HEAP_MEM_POOL_SIZE option,
whichever is greater.
We also introduce a new HEAP_MEM_POOL_IGNORE_MIN Kconfig option which
applications can use to force a lower value than what subsystems have
specficied, however this behavior is disabled by default.
Whenever the minimum is greater than the requested value a CMake warning
will be issued in the build output.
This patch ends up modifying several places outside of kernel code,
since the presence of the system heap is no longer detected using a
non-zero CONFIG_HEAP_MEM_POOL_SIZE value, rather it's now detected using
a new K_HEAP_MEM_POOL_SIZE value that's evaluated at build.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This follows the idea to remove any z_ prefix. Since MMU has
a large number of these, separate out these changes into one
commit to ease review effort.
Since these are no longer have z_, these need proper doxygen
doc. So add them too.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Fix the way we read the current l1 page table set in the mmu.
We use it check if the current page table is different from the
running thread.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The ring field in the pte mapping a memory partition should
be based in the partition attribute and not in the domain
asid that is used only to set the ASID (in RASID) position for
user ring.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Andy Ross re-implementation of MMU layer with some subtle changes,
like re-using existent macros, fix page table cache property when
direct mapping it in TLB.
From Andy's original commit message:
This is a reworked MMU layer, sitting cleanly below the page table
handling in the OS. Notable differences from the original work:
+ Significantly smaller code and simpler API (just three functions to
be called from the OS/userspace/ptable layer).
+ Big README-MMU document containing my learnings over the process, so
hopefully fewer people need to go through this in the future.
+ No TLB flushing needed. Clean separation of ASIDs, just requires
that the upper levels match the ASID to the L1 page table page
consistently.
+ Vector mapping is done with a 4k page and not a 4M page, leading to
much more flexibility with hardware memory layout. The original
scheme required that the 4M region containing vecbase be mapped
virtually to a location other than the hardware address, which makes
confusing linkage with call0 and difficult initialization
constraints where the exception vectors run at different addresses
before and after MMU setup (effectively forcing them to be PIC
code).
+ More provably correct initialization, all MMU changes happen in a
single asm block with no memory accesses which would generate a
refill.
Signed-off-by: Andy Ross <andyross@google.com>
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>