doc: llext: add extension debugging guide
Add a new section to the llext documentation that explains how to debug extensions and how to address the issues that may arise when doing so. Signed-off-by: Luca Burelli <l.burelli@arduino.cc>
This commit is contained in:
parent
8660020205
commit
00ccbce2c4
3 changed files with 361 additions and 7 deletions
359
doc/services/llext/debug.rst
Normal file
359
doc/services/llext/debug.rst
Normal file
|
@ -0,0 +1,359 @@
|
|||
.. _llext_debug:
|
||||
|
||||
Debugging extensions
|
||||
####################
|
||||
|
||||
Debugging extensions is a complex task. Since the extension code is by
|
||||
definition not built with the Zephyr application, the final Zephyr ELF file
|
||||
does not contain the symbols for extension code. Furthermore, the extension is
|
||||
dynamically relocated by :c:func:`llext_load` at runtime, so even if the
|
||||
symbols were available, it would be impossible for the debugger to know the
|
||||
final locations of the symbols in the extension code.
|
||||
|
||||
Setting up the debugger session properly in this case requires a few manual
|
||||
steps. The following sections will provide some tips on how to do it with the
|
||||
Zephyr SDK and the debug features provided by ``west``, but the instructions
|
||||
can be adapted to any GDB-based debugging environment.
|
||||
|
||||
Extension debugging process
|
||||
===========================
|
||||
|
||||
1. Make sure the project is set up to display the verbose LLEXT debug output
|
||||
(:kconfig:option:`CONFIG_LOG` and :kconfig:option:`CONFIG_LLEXT_LOG_LEVEL_DBG`
|
||||
are set).
|
||||
|
||||
2. Build the Zephyr application and the extensions.
|
||||
|
||||
For each target ``name`` included in the current build, two files will be
|
||||
generated into the ``llext`` subdirectory of the build root:
|
||||
|
||||
``name_ext_debug.elf``
|
||||
|
||||
An intermediate ELF file with full debugging information.
|
||||
|
||||
``name.llext``
|
||||
|
||||
The final extension binary, stripped to the essential data required for
|
||||
loading into the Zephyr application.
|
||||
|
||||
Other files may be present, depending on the target architecture and the
|
||||
build configuration.
|
||||
|
||||
3. Start a debugging session of the main Zephyr application. This is described
|
||||
in the :ref:`Debugging <west-debugging>` section of the documentation; on
|
||||
supported boards it is as easy as running ``west debug``, perhaps with some
|
||||
additional arguments.
|
||||
|
||||
4. Set a breakpoint just after the :c:func:`llext_load` function in your code
|
||||
and let it run. This will load the extension into memory and relocate it.
|
||||
The output logs will contain a line with ``gdb add-symbol-file flags:``,
|
||||
followed by lines all starting with ``-s``.
|
||||
|
||||
5. Type this command in the GDB console to load this extension's symbols:
|
||||
|
||||
.. code-block::
|
||||
|
||||
add-symbol-file <path-to-debug.elf> <load-addresses>
|
||||
|
||||
where ``<path-to-debug.elf>`` is the full path of the ELF file with debug
|
||||
information identified in step 2, and ``<load-addresses>`` is a space
|
||||
separated list of all the ``-s`` lines collected from the log in the
|
||||
previous step.
|
||||
|
||||
6. The extension symbols are now available to the debugger. You can set
|
||||
breakpoints, inspect variables, and step through the code as usual.
|
||||
|
||||
Steps 4-6 can be repeated for every extension that is loaded by the
|
||||
application, if there are several.
|
||||
|
||||
Symbol lookup issues
|
||||
====================
|
||||
|
||||
.. warning::
|
||||
|
||||
It is almost certain that the loaded symbols will be shadowed by others in
|
||||
the main application; for example, they may be located inside the memory
|
||||
area of the ELF buffer or the LLEXT heap.
|
||||
|
||||
In this case GDB chooses the first known symbol and therefore associates the
|
||||
addresses to some ``elf_buffer+0x123`` instead of an expected ``ext_fn``.
|
||||
This further confuses its high-level operations like source stepping or
|
||||
inspecting locals, since they are meaningless in that context.
|
||||
|
||||
Two possible solutions to this problem are discussed in the following
|
||||
paragraphs.
|
||||
|
||||
Discard all Zephyr symbols
|
||||
--------------------------
|
||||
|
||||
The simplest option is to drop all the Zephyr application symbols from GDB by
|
||||
invoking ``add-symbol-file`` with no arguments, before step 5. This will
|
||||
however focus the debugging session to the llext only, as all information about
|
||||
the Zephyr application will be lost. For example, the debugger may not be able to
|
||||
properly follow stack traces outside the extension code.
|
||||
|
||||
It is possible to use the same technique multiple times in the same session to
|
||||
switch between the main and extension symbol tables as required, but it rapidly
|
||||
becomes cumbersome.
|
||||
|
||||
Edit the ELF file
|
||||
-----------------
|
||||
|
||||
This alternative is more complex but allows for a more seamless debugging
|
||||
experience. The idea is to edit the main Zephyr ELF file to remove information
|
||||
about the symbols that overlap with the extension that is to be debugged, so
|
||||
that when the extension symbols are loaded, GDB will not have any ambiguity.
|
||||
This can be done by using ``objcopy`` with the ``-N <symbol>`` option.
|
||||
|
||||
Identifying the offending symbols is however an iterative trial-and-error
|
||||
procedure, as there can be many different layers; for example, the ELF buffer
|
||||
may be itself contained in a symbol for the data segment. Fortunately, this
|
||||
knowledge can then be used several times as the list is unlikely to change for
|
||||
a given project.
|
||||
|
||||
Example debugging session
|
||||
=========================
|
||||
|
||||
This example demonstrates how to debug the ``detached_fn`` extension in the
|
||||
``tests/subsys/llext`` project (specifically, the ``writable`` case), on an
|
||||
emulated ``mps2/an385`` board which is based on an ARM Cortex-M3.
|
||||
|
||||
.. note::
|
||||
|
||||
The logs below have been obtained using Zephyr version 4.1 and the Zephyr
|
||||
SDK version 0.17.0. However, the exact addresses may still vary between
|
||||
runs even when using the same versions. Adjust the commands below to
|
||||
match the results of your own session.
|
||||
|
||||
The following command will build the project and start the emulator in
|
||||
debugging mode:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 1 (build, QEMU emulator, GDB server)
|
||||
|
||||
zephyr$ west build -p -b mps2/an385 tests/subsys/llext/ -T llext.writable -t debugserver_qemu
|
||||
-- west build: generating a build system
|
||||
[...]
|
||||
-- west build: running target debugserver_qemu
|
||||
[...]
|
||||
[186/187] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: cortex-m3
|
||||
|
||||
On a separate terminal, set ``ZEPHYR_SDK_INSTALL_DIR`` to the directory for the
|
||||
Zephyr SDK on your installation, then start the GDB client for the target:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
zephyr$ export LLEXT_SDK_INSTALL_DIR=/opt/zephyr-sdk-0.17.0
|
||||
zephyr$ ${LLEXT_SDK_INSTALL_DIR}/arm-zephyr-eabi/bin/arm-zephyr-eabi-gdb build/zephyr/zephyr.elf
|
||||
GNU gdb (Zephyr SDK 0.17.0) 12.1
|
||||
[...]
|
||||
Reading symbols from build/zephyr/zephyr.elf...
|
||||
(gdb)
|
||||
|
||||
Connect, set a breakpoint on the ``llext_load`` function and run until it
|
||||
finishes:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
(gdb) target extended-remote :1234
|
||||
Remote debugging using :1234
|
||||
z_arm_reset () at zephyr/arch/arm/core/cortex_m/reset.S:124
|
||||
124 movs.n r0, #_EXC_IRQ_DEFAULT_PRIO
|
||||
(gdb) break llext_load
|
||||
Breakpoint 1 at 0x236c: file zephyr/subsys/llext/llext.c, line 168.
|
||||
(gdb) continue
|
||||
Continuing.
|
||||
|
||||
Breakpoint 1, llext_load (ldr=ldr@entry=0x2000bef0 <ztest_thread_stack+3488>,
|
||||
name=name@entry=0x9d98 "test_detached",
|
||||
ext=ext@entry=0x2000abb8 <detached_llext>,
|
||||
ldr_parm=ldr_parm@entry=0x2000bee8 <ztest_thread_stack+3480>)
|
||||
at zephyr/subsys/llext/llext.c:168
|
||||
168 *ext = llext_by_name(name);
|
||||
(gdb) finish
|
||||
Run till exit from #0 llext_load ([...])
|
||||
at zephyr/subsys/llext/llext.c:168
|
||||
llext_test_detached () at zephyr/tests/subsys/llext/src/test_llext.c:481
|
||||
481 zassert_ok(res, "load should succeed");
|
||||
|
||||
The first terminal will have printed lots of debugging information related to
|
||||
the extension loading. Find the section with the addresses:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 1 (build, QEMU emulator, GDB server)
|
||||
|
||||
[...]
|
||||
D: Allocate and copy regions...
|
||||
[...]
|
||||
D: gdb add-symbol-file flags:
|
||||
D: -s .text 0x20000034
|
||||
D: -s .data 0x200000b4
|
||||
D: -s .bss 0x2000c2e0
|
||||
D: -s .rodata 0x200000b8
|
||||
D: -s .detach 0x200001d0
|
||||
D: Counting exported symbols...
|
||||
[...]
|
||||
|
||||
Use these addresses to load the symbols into GDB:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
(gdb) add-symbol-file build/llext/detached_fn_ext_debug.elf -s .text 0x20000034 -s .data 0x200000b4 -s .bss 0x2000c2e0 -s .rodata 0x200000b8 -s .detach 0x200001d0
|
||||
add symbol table from file "build/llext/detached_fn_ext_debug.elf" at
|
||||
.text_addr = 0x20000034
|
||||
.data_addr = 0x200000b4
|
||||
.bss_addr = 0x2000c2e0
|
||||
.rodata_addr = 0x200000b8
|
||||
.detach_addr = 0x200001d0
|
||||
(y or n) y
|
||||
Reading symbols from build/llext/detached_fn_ext_debug.elf...
|
||||
(gdb) break detached_entry
|
||||
Breakpoint 2 at 0x200001d0 (2 locations)
|
||||
(gdb) continue
|
||||
Continuing.
|
||||
|
||||
Breakpoint 2, 0x200001d0 in test_detached_ext ()
|
||||
(gdb) backtrace
|
||||
#0 0x200001d0 in test_detached_ext ()
|
||||
#1 0x200000ac in test_detached_ext ()
|
||||
#2 0x00000706 in llext_test_detached () at zephyr/tests/subsys/llext/src/test_llext.c:496
|
||||
#3 0x00001a36 in run_test_functions (suite=0x92bc <z_ztest_test_node_llext>, data=0x0 <cbvprintf_package>, test=0x92d8 <z_ztest_unit_test.llext.test_detached>) at zephyr/subsys/testsuite/ztest/src/ztest.c:328
|
||||
#4 test_cb (a=0x92bc <z_ztest_test_node_llext>, b=0x92d8 <z_ztest_unit_test.llext.test_detached>, c=0x0 <cbvprintf_package>) at zephyr/subsys/testsuite/ztest/src/ztest.c:662
|
||||
#5 0x00000e96 in z_thread_entry (entry=0x1a05 <test_cb>, p1=0x92bc <z_ztest_test_node_llext>, p2=0x92d8 <z_ztest_unit_test.llext.test_detached>, p3=0x0 <cbvprintf_package>) at zephyr/lib/os/thread_entry.c:48
|
||||
#6 0x00000000 in ?? ()
|
||||
|
||||
The symbol associated with the breakpoint location and the last stack frames
|
||||
mistakenly reference the ELF buffer in the Zephyr application instead of the
|
||||
extension symbols. Note that GDB however knows both:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
(gdb) info sym 0x200001d0
|
||||
test_detached_ext + 464 in section datas of zephyr/build/zephyr/zephyr.elf
|
||||
detached_entry in section .detach of zephyr/build/llext/detached_fn_ext_debug.elf
|
||||
(gdb) info sym 0x200000ac
|
||||
test_detached_ext + 172 in section datas of zephyr/build/zephyr/zephyr.elf
|
||||
test_entry + 8 in section .text of zephyr/build/llext/detached_fn_ext_debug.elf
|
||||
|
||||
It is also impossible to inspect the variables in the extension or step through
|
||||
code properly:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
(gdb) print bss_cnt
|
||||
No symbol "bss_cnt" in current context.
|
||||
(gdb) print data_cnt
|
||||
No symbol "data_cnt" in current context.
|
||||
(gdb) next
|
||||
Single stepping until exit from function test_detached_ext,
|
||||
which has no line number information.
|
||||
|
||||
Breakpoint 2, 0x200001ea in test_detached_ext ()
|
||||
(gdb)
|
||||
|
||||
Discarding symbols
|
||||
------------------
|
||||
|
||||
Discarding the Zephyr symbols and only focusing on the extension restores full
|
||||
debugging functionality at the cost of losing the global context (note the
|
||||
backtrace stops outside the extension):
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
(gdb) symbol-file
|
||||
Discard symbol table from `zephyr/build/zephyr/zephyr.elf'? (y or n) y
|
||||
Error in re-setting breakpoint 1: No symbol table is loaded. Use the "file" command.
|
||||
No symbol file now.
|
||||
(gdb) add-symbol-file build/llext/detached_fn_ext_debug.elf -s .text 0x20000034 -s .data 0x200000b4 -s .bss 0x2000c2e0 -s .rodata 0x200000b8 -s .detach 0x200001d0
|
||||
add symbol table from file "build/llext/detached_fn_ext_debug.elf" at
|
||||
.text_addr = 0x20000034
|
||||
.data_addr = 0x200000b4
|
||||
.bss_addr = 0x2000c2e0
|
||||
.rodata_addr = 0x200000b8
|
||||
.detach_addr = 0x200001d0
|
||||
(y or n) y
|
||||
Reading symbols from build/llext/detached_fn_ext_debug.elf...
|
||||
(gdb) backtrace
|
||||
#0 detached_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:18
|
||||
#1 0x200000ac in test_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:26
|
||||
#2 0x00000706 in ?? ()
|
||||
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
|
||||
(gdb) next
|
||||
19 zassert_true(data_cnt < 0);
|
||||
(gdb) print bss_cnt
|
||||
$1 = 1
|
||||
(gdb) print data_cnt
|
||||
$2 = -2
|
||||
(gdb)
|
||||
|
||||
|
||||
Editing the ELF file
|
||||
--------------------
|
||||
|
||||
In this alternative approach, the patches to the Zephyr ELF file must be
|
||||
performed after building the Zephyr binary and starting the emulator on
|
||||
Terminal 1, but before starting the GDB client on Terminal 2.
|
||||
|
||||
The above debugging session already identified ``test_detached_ext``, the char
|
||||
array that holds the ELF file, as an offending symbol, so that will be removed
|
||||
in a first pass. Performing the same steps multiple times, ``__data_start`` and
|
||||
``__data_region_start`` can also be found to overlap the memory area of
|
||||
interest.
|
||||
|
||||
The following commands will remove all of these from the Zephyr ELF file, then
|
||||
start a debugging session on the modified file:
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
zephyr$ export LLEXT_SDK_INSTALL_DIR=/opt/zephyr-sdk-0.17.0
|
||||
zephyr$ ${LLEXT_SDK_INSTALL_DIR}/arm-zephyr-eabi/bin/arm-zephyr-eabi-objcopy -N test_detached_ext -N __data_start -N __data_region_start build/zephyr/zephyr.elf build/zephyr/zephyr-edit.elf
|
||||
zephyr$ ${LLEXT_SDK_INSTALL_DIR}/arm-zephyr-eabi/bin/arm-zephyr-eabi-gdb build/zephyr/zephyr-edit.elf
|
||||
GNU gdb (Zephyr SDK 0.17.0) 12.1
|
||||
[...]
|
||||
Reading symbols from build/zephyr/zephyr-edit.elf...
|
||||
(gdb)
|
||||
|
||||
The same steps used in the previous run can be performed again to attach to the
|
||||
GDB server and load both the extension and its debug symbols. This time, however,
|
||||
the result is rather different:
|
||||
|
||||
* the ``break`` command includes line number information;
|
||||
|
||||
* the output from ``backtrace`` contains functions from both the extension and
|
||||
the Zephyr application;
|
||||
|
||||
* the local variables can be properly inspected.
|
||||
|
||||
.. code-block::
|
||||
:caption: Terminal 2 (GDB client)
|
||||
|
||||
(gdb) add-symbol-file build/llext/detached_fn_ext_debug.elf [...]
|
||||
[...]
|
||||
Reading symbols from build/llext/detached_fn_ext_debug.elf...
|
||||
(gdb) break detached_entry
|
||||
Breakpoint 2 at 0x200001d6: file zephyr/tests/subsys/llext/src/detached_fn_ext.c, line 17.
|
||||
(gdb) continue
|
||||
Continuing.
|
||||
|
||||
Breakpoint 2, detached_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:17
|
||||
17 printk("bss %u @ %p\n", bss_cnt++, &bss_cnt);
|
||||
(gdb) backtrace
|
||||
#0 detached_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:17
|
||||
#1 0x200000ac in test_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:26
|
||||
#2 0x00000706 in llext_test_detached () at zephyr/tests/subsys/llext/src/test_llext.c:496
|
||||
#3 0x00001a36 in run_test_functions (suite=0x92bc <z_ztest_test_node_llext>, data=0x0 <cbvprintf_package>, test=0x92d8 <z_ztest_unit_test.llext.test_detached>) at zephyr/subsys/testsuite/ztest/src/ztest.c:328
|
||||
#4 test_cb (a=0x92bc <z_ztest_test_node_llext>, b=0x92d8 <z_ztest_unit_test.llext.test_detached>, c=0x0 <cbvprintf_package>) at zephyr/subsys/testsuite/ztest/src/ztest.c:662
|
||||
#5 0x00000e96 in z_thread_entry (entry=0x1a05 <test_cb>, p1=0x92bc <z_ztest_test_node_llext>, p2=0x92d8 <z_ztest_unit_test.llext.test_detached>, p3=0x0 <cbvprintf_package>) at zephyr/lib/os/thread_entry.c:48
|
||||
#6 0x00000000 in ?? ()
|
||||
(gdb) print bss_cnt
|
||||
$1 = 0
|
||||
(gdb) print data_cnt
|
||||
$2 = -3
|
||||
(gdb)
|
|
@ -16,6 +16,7 @@ and introspected to some degree, as well as unloaded when no longer needed.
|
|||
config
|
||||
build
|
||||
load
|
||||
debug
|
||||
api
|
||||
|
||||
.. note::
|
||||
|
|
|
@ -94,13 +94,7 @@ If any of this happens, the following tips may help understand the issue:
|
|||
the issue.
|
||||
|
||||
* Use a debugger to inspect the memory and registers to try to understand what
|
||||
is happening.
|
||||
|
||||
.. note::
|
||||
When using GDB, the ``add_symbol_file`` command may be used to load the
|
||||
debugging information and symbols from the ELF file. Make sure to specify
|
||||
the proper offset (usually the start of the ``.text`` section, reported
|
||||
as ``region 0`` in the debug logs.)
|
||||
is happening. See :ref:`Debugging extensions <llext_debug>` for more details.
|
||||
|
||||
If the issue persists, please open an issue in the GitHub repository, including
|
||||
all the above information.
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue