zephyr/kernel/mempool.c

/*
 * Copyright (c) 2017 Intel Corporation
 *
 * SPDX-License-Identifier: Apache-2.0
 */
#include <zephyr/kernel.h>
#include <string.h>
#include <zephyr/sys/math_extras.h>
#include <zephyr/sys/util.h>
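
/*
 * Allocate 'size' bytes from 'heap' at the requested alignment, and
 * record which heap the block came from in the sizeof(void *) bytes
 * immediately below the returned pointer so k_free() can find it:
 *
 *	... | struct k_heap * | user data ...
 *	                      ^ address returned to the caller
 */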
static void *z_heap_aligned_alloc(struct k_heap *heap, size_t align, size_t size)
{
	void *mem;
	struct k_heap **heap_ref;
	size_t __align;
	/*
	 * Adjust the size to make room for our heap reference.
	 * Merge a rewind bit with align value (see sys_heap_aligned_alloc()).
	 * This allows for storing the heap pointer right below the aligned
	 * boundary without wasting any memory.
	 */
	if (size_add_overflow(size, sizeof(heap_ref), &size)) {
		return NULL;
	}
	__align = align | sizeof(heap_ref);

	mem = k_heap_aligned_alloc(heap, __align, size, K_NO_WAIT);
	if (mem == NULL) {
		return NULL;
	}

	heap_ref = mem;
	*heap_ref = heap;
	mem = ++heap_ref;
	__ASSERT(align == 0 || ((uintptr_t)mem & (align - 1)) == 0,
		 "misaligned memory at %p (align = %zu)", mem, align);
	return mem;
}
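
/*
 * Free a pointer obtained from k_malloc(), k_aligned_alloc() or
 * k_calloc(). The owning heap is read back from the word stored just
 * below 'ptr' by z_heap_aligned_alloc(); passing any other pointer
 * here is undefined. k_free(NULL) is a no-op.
 */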
void k_free(void *ptr)
{
	struct k_heap **heap_ref;
	if (ptr != NULL) {
		heap_ref = ptr;
		ptr = --heap_ref;

		SYS_PORT_TRACING_OBJ_FUNC_ENTER(k_heap_sys, k_free, *heap_ref, heap_ref);

		k_heap_free(*heap_ref, ptr);

		SYS_PORT_TRACING_OBJ_FUNC_EXIT(k_heap_sys, k_free, *heap_ref, heap_ref);
	}
}
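
/*
 * The system heap backing k_malloc() and friends only exists when a
 * non-zero pool size is configured; otherwise _SYSTEM_HEAP is NULL and
 * allocations that would fall back to it simply return NULL.
 */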
#if (K_HEAP_MEM_POOL_SIZE > 0)
K_HEAP_DEFINE(_system_heap, K_HEAP_MEM_POOL_SIZE);
#define _SYSTEM_HEAP (&_system_heap)
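
/*
 * Illustrative usage sketch (not part of this file): the alignment must
 * be a power of two and a multiple of sizeof(void *), and the result is
 * released with k_free(), e.g.
 *
 *	void *buf = k_aligned_alloc(32, 256);	// hypothetical sizes
 *
 *	if (buf != NULL) {
 *		// ... use buf ...
 *		k_free(buf);
 *	}
 */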
void *k_aligned_alloc(size_t align, size_t size)
{
	__ASSERT(align / sizeof(void *) >= 1
		 && (align % sizeof(void *)) == 0,
		 "align must be a multiple of sizeof(void *)");
	__ASSERT((align & (align - 1)) == 0,
		 "align must be a power of 2");

	SYS_PORT_TRACING_OBJ_FUNC_ENTER(k_heap_sys, k_aligned_alloc, _SYSTEM_HEAP);

	void *ret = z_heap_aligned_alloc(_SYSTEM_HEAP, align, size);

	SYS_PORT_TRACING_OBJ_FUNC_EXIT(k_heap_sys, k_aligned_alloc, _SYSTEM_HEAP, ret);

	return ret;
}
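
/*
 * k_malloc() forwards to k_aligned_alloc() with sizeof(void *)
 * alignment, so every allocation is at least pointer-aligned.
 */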
void *k_malloc(size_t size)
{
	SYS_PORT_TRACING_OBJ_FUNC_ENTER(k_heap_sys, k_malloc, _SYSTEM_HEAP);

	void *ret = k_aligned_alloc(sizeof(void *), size);

	SYS_PORT_TRACING_OBJ_FUNC_EXIT(k_heap_sys, k_malloc, _SYSTEM_HEAP, ret);

	return ret;
}
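
/*
 * k_calloc() mirrors calloc(): the nmemb * size product is checked for
 * overflow before allocating and the memory is zero-filled, e.g.
 *
 *	int *tbl = k_calloc(16, sizeof(int));	// illustrative: 16 zeroed ints
 */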
void *k_calloc(size_t nmemb, size_t size)
{
	void *ret;
	size_t bounds;

	SYS_PORT_TRACING_OBJ_FUNC_ENTER(k_heap_sys, k_calloc, _SYSTEM_HEAP);

	if (size_mul_overflow(nmemb, size, &bounds)) {
		SYS_PORT_TRACING_OBJ_FUNC_EXIT(k_heap_sys, k_calloc, _SYSTEM_HEAP, NULL);
		return NULL;
	}

	ret = k_malloc(bounds);
	if (ret != NULL) {
		(void)memset(ret, 0, bounds);
	}

	SYS_PORT_TRACING_OBJ_FUNC_EXIT(k_heap_sys, k_calloc, _SYSTEM_HEAP, ret);

	return ret;
}
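
/*
 * Point the thread's resource pool at the system heap so that
 * allocations routed through z_thread_aligned_alloc() on that thread's
 * behalf are served from it.
 */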
void k_thread_system_pool_assign(struct k_thread *thread)
{
	thread->resource_pool = _SYSTEM_HEAP;
}
#else
#define _SYSTEM_HEAP NULL
#endif /* K_HEAP_MEM_POOL_SIZE */
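
/*
 * Pick the heap to allocate from: ISRs always use the system heap
 * (which may be NULL when none is configured), while threads use their
 * assigned resource pool. With no usable heap the allocation fails and
 * returns NULL.
 */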
void *z_thread_aligned_alloc(size_t align, size_t size)
{
	void *ret;
	struct k_heap *heap;

	if (k_is_in_isr()) {
		heap = _SYSTEM_HEAP;
	} else {
		heap = _current->resource_pool;
	}

	if (heap != NULL) {
		ret = z_heap_aligned_alloc(heap, align, size);
	} else {
		ret = NULL;
	}

	return ret;
}