Reorganize the initialization code to cleanly separate the platforms
and clarify which code is common. The #if'ery was sort of a mess.
This is in preparation for an incoming patch that unifies the shim
register definitions across platform variants.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Misc cleanup, no non-trivial logic changes.
Swap in new ("rsr <reg>, REGISTER_NAME") syntax for Xtensa SR's in
place of inconsistent usage of the older one ("rsr.REGISTER_NAME
<reg>").
Remove the legacy handling of !KERNEL_COHERENCE cases for allocating
the cpu start record. That has long been a requirement of
multiprocessor code on this platform.
Remove the synchronous testing of the "alive" flag in
arch_start_cpu(). Nothign about that API is intended to be
synchronous, and in fact the Zephyr SMP layer is already doing the
same trick.
Remove some vestigial dead code at the end of z_mp_entry(). It was
apparently intended to handle the case where a CPU function returned,
but that's not legal anyway. And it was only enabled in the case
where there was only one CPU anyway, which was an impossible situation
(you can't evercall arch_start_cpu() successfully on a system with
only one core, for obvious reasons -- the only core is already
running!). Replace with an assertion.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The cAVS bootloader code (for... somewhat silly reasons) doesn't build
via the same paths as regular Zephyr object files, so it wasn't
getting the _ASMLANGUAGE define. That meant that Zephyr headers
defining BIT() were using syntax incompatible with some assemblers
(specifically the Cadence xcc assembly; current gas versions were
fine).
Not 100% sure this is the best spot to put this, but the root fix is
to get the bootloader building into the same link as the rest of
Zephyr anyway.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The mismatch between the slot number and the sequence ("id") made me
suspect a bug for too long. Fix one related comment and add two more. No
code change.
Signed-off-by: Marc Herbert <marc.herbert@intel.com>
When built with XCC for Apollolake, Zephyr fails to boot with the
default multi-core option enabled. Invalidate cache before reading
the firmware image to fix that.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This feature got written twice for two different purposes (to inform
the SOF app of which CPUs are running, and to predicate the delivery
of IPIs to the cores ready to receive the interrupt). Use only one.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
On pre-2.5 cAVS, the initial IDC interrupt to start the other core is
handled by software in the firmware ROM. That means that it has to be
unmasked for the mechanism to work (with 2.5, the interrupt is handled
by hardware regardless of what the masking state in the interrupt
controller is).
Similarly, the Xtensa Region Protection Option entries have already
been set by ROM code when we arrive in enable_l1_cache(), so we can
skip that part on older machines. Also removed because trying to
rewrite those entries was causing inexplicable hangs on cAVS 1.5,
plausibly because the region had active cache lines.
(This patch is separate for easier review in a long evolving PR.
Technically it represents a bisection problem as the "New IDC Driver"
patch before this was a regression. Seems like a safe enough thing to
handle if you land on this.)
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Fix various bugs with the new IDC layer that show up in edge cases
where code relies on correct timing of IPIs (unsurprisingly there is a
lot of code that recovers anyway even if the IPI doesn't arrive
promptly). Leaving this as a separate patch because the prior code in
the PR has already been reviewed and it "mostly" worked:
The unmasking of the L2 interrupt bit (remember there are three layers
of masking of the IDC interrupt) was always operating on CPU0 at CPU
startup because the code had been copied blindly. Unmask the CPU
we're actually launching. It turns out cAVS 2.x re-masks this on CPU
launch automatically.
The global init code to unmask all these interrupts at startup had the
same bug, even though it turned out to be needless (the initialization
state has it unmasked until it turns it back off). Do it right
anyway. Similarly add code to clear out existing interrupt latch
state by ACKing all IDC interrupts at startup. Seems needless, but
behavior isn't documented so let's be safe.
Flag CPU0 as always "active" for the purposes of IPIs. Forgot to do
this earlier, oops.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Add a SOC API to allow for application control over deep idle power
states. Note that the hardware idle entry happens out of the WAITI
instruction, so the application has to be responsibile for ensuring
the CPU to be halted actually reaches idle deterministically. Lots of
warnings in the docs to this effect.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
There is a hardware startup state where power gating can be "enabled"
even though the core is actually launchable via an IDC interrupt (in
fact that's the hardware default). In that state, the CPU will launch
correctly but then unexpectedly shut itself off then it enters the
idle thread.
Don't rely on initialization state, always set the power and clock
gating bits (to disable gating) immediately before CPU launch.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Add a struct-based interrupt masking API to match the existing shim
and IDC register interfaces. The existing interrupt controller code
isn't using it yet.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
On cAVS 2.5, there is an inherent race with the IDC interrupt. It's
used for routine IPIs during OS operation, but also for launching a
power-gated core. Recent changes moved the unmasking of the IDC
interrupt earlier, which made it possible for early OS scheduler
behavior (e.g. adding the main thread to the run queue) to
accidentally launch the other cores into LP-SRAM that had not been
initialized.
Instead of treating this with initialization ordering, keep and
maintain a list of active CPUs and check them at runtime to be sure we
never try to IPI a CPU that isn't running yet. We're going to need
this feature when we add live core offlining anyway.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The original interface for the intra-DSP communication hardware on
these devices was buried inside a Zephyr IPM implementation.
Unfortunately IPM is a two-endpoint point-to-point communication
layer, it can't represent the idea of devices with more than 2 cores.
And our usage (to push a no-argument/no-response scheduler IPI) was
sort of an abuse of that metaphor anyway.
Add a new IDC interface at the SOC layer, borrowing the C struct
convention already used for the DSP shim registers.
Augment with extensive documentation, extracted via a ton of
experimentation on cAVS 2.5 hardware.
Note that this leaves the previous driver in place for the cavs_v15
and intel_s1000 devices. In principle they should use it too (the
hardware registers are identical), but this hasn't been validated yet.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Earlier platforms were relying on the system ROM to have done this
correctly, but with CAVS 2.5 we launch the CPU into our own code
directly. So we need to do those steps manually. And there's also a
new one on this hardware, which has software power control over the
cache SRAM.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Hardware defaults for the secondary CPUs have the S32C1I instruction
set to be atomic only with respect to the local L1 cache, which is
basically useless on a multiprocessor platform. The CPU0 boot path
sets this manually, so we need to duplicate that here.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
On MP cores that don't come through the core entry point
(e.g. TGL/v2.5) we reach C code with hardware defaults for the RPO/TLB
settings. Set these up correctly on entry.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This hardware works a little differently. The cores will start up
immediately on receipt of an IDC interrupt (they don't need the host
to be involved), but they don't have a ROM. They start executing at
the start of the LP-SRAM block always. Copy over a tiny trampoline
for them that jumps to the existing multiprocessor startup path.
Also set the PS WOE bit to enable register windows in the startup
path. This isn't the hardware default, and where the ROM would do
that for us before here we need to make sure it's on.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This aligns the SoC initialization with the one in SOF,
especially the manipulation of clock control and power control
registers. These registers are not entirely the same across
CAVS versions, so we need to deal with them according to
which version we are building for. This also consolidates
the macros for these registers to the one provided by SOF
(soc/shim.h) to avoid duplication. Another note is that
the usage of clock gating bit was not correct. In SOF,
clock gating of SoC cores should be allowed but the old code
in Zephyr prevented clock gating, which has the potential to
prevent the whole DSP from going into low power mode.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The wall clock timer is not (per documentation) part of the
"timestamping" register set on the DSP. And its counter and
comparator registers work fine always. But if the DSP isn't set as
the "owner" of the timestamp hardware, wall clock interrupts never
arrive.
Also grab the PLL ownership too, because SOF already does anyway.
While we don't have a dynamic clock driver yet, we will surely want
one soon and will needt this.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
These windows control host visibility of the trace output buffer. The
buffer itself is writable memory always, but until we get to the
register init the host can't see them. Since they contain
printk/logging output, they REALLY need to be initialized earlier than
anything else.
Also remove a rogue memset of the trace buffer. That buffer is
already being initialized in a lazy-evaluated way by the trace output
code, and blowing it away here has the effect of forgetting anything
earlier code was trying to log!
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
XCC's own xt-objcopy does not recognize the "--dump-section"
command. So simply copy the objecy file into binary so it can
be included in the final bootloader binary.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
With SOF secondary cores are booted later at run-time instead
of the traditional simultaneous booting of all the cores.
Adjust arch_start_cpu() to make that possible.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
To disable the IDC interrupt on the interrupt controller a bit
must be set in the MSD register instead of clearing the bit in
the MCD register, which has no effect.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
irq_enable() should be called with the composite IRQ code as its
argument, not just the Xtensa proper part of it.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Turns out that the user can configure the "zephyr.elf" name via
kconfig to be "something_else.elf" instead. And there's a test the
does this. Use the right variable; don't hardcode.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The Xtensa L1 cache layer has straightforward semantics accessible via
single-instructions that operate on cache lines via physical
addresses. These are very amenable to inlining.
Unfortunately the Xtensa HAL layer requires function calls to do this,
leading to significant code waste at the calling site, an extra frame
on the stack and needless runtime instructions for situations where
the call is over a constant region that could elide the loop. This is
made even worse because the HAL library is not built with
-ffunction-sections, so pulling in even one of these tiny cache
functions has the effect of importing a 1500-byte object file into the
link!
Add our own tiny cache layer to include/arch/xtensa/cache.h and use
that instead.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Instead of passing the crt1 _start function as the entry code for
auxiliary CPUs, use a tiny assembly stub instead which can avoid the
runtime testing needed to skip the work in _start. All the crt1 code
was doing was clearing BSS (which must not happen on a second CPU) and
setting the stack pointer (which is wrong on the second CPU).
This allows us to clean out the SMP code in crt1.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The kernel passes the CPU's interrupt stack expected that it will
start on that, so do it. Pass the initial stack pointer from the SOC
layer in the variable "z_mp_stack_top" and set it in the assembly
startup before calling z_mp_entry().
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
There's no need to muck with the cache directly as long as we're
careful about addressing the shared start record through an uncached
volatile pointer.
Correct a theoretical bug with the initial cache invalidate on the
second CPU which was actually doing a flush (and thus potentially
pushing things the boot ROM wrote into RAM now owned by the OS).
Optimize memory layout a bit when using KERNEL_COHERENCE; we don't
need a full cache line for the start record there as it's already in
uncached memory.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The multiprocessor entry code here had some bits that look to have
been copied from esp32, including a clumsy stack switch that's needed
there. But it wasn't actually switching the stack at all, which on
this device is pointed at the top of HP-SRAM and can stay there until
the second CPU swaps away into a real thread (this will need to change
once we support >2 CPUS though).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The trace output layer was using this transformation already, make it
an official API. There are other places doing similar logic that can
benefit.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
There was a bunch of dead historical cruft floating around in the
arch/xtensa tree, left over from older code versions. It's time to do
a cleanup pass. This is entirely refactoring and size optimization,
no behavior changes on any in-tree devices should be present.
Among the more notable changes:
+ xtensa_context.h offered an elaborate API to deal with a stack frame
and context layout that we no longer use.
+ xtensa_rtos.h was entirely dead code
+ xtensa_timer.h was a parallel abstraction layer implementing in the
architecture layer what we're already doing in our timer driver.
+ The architecture thread structs (_callee_saved and _thread_arch)
aren't used by current code, and had dead fields that were removed.
Unfortunately for standards compliance and C++ compatibility it's
not possible to leave an empty struct here, so they have a single
byte field.
+ xtensa_api.h was really just some interrupt management inlines used
by irq.h, so fold that code into the outer header.
+ Remove the stale assembly offsets. This architecture doesn't use
that facility.
All told, more than a thousand lines have been removed. Not bad.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Fix kernel.common and kernel.common test cases fail due to build error
in platform intel_adsp_cavs15, 18, 20, 25.
Signed-off-by: Enjia Mai <enjiax.mai@intel.com>
Currently Zephyr links reset-vector.S twice in xtensa builds:
into the bootloader and the main image. It is run at the end
of the boot loader execution and immediately after that again
in the beginning of the main code. This patch adds a
configuration option to select whether to link the file to the
bootloader or to the application. The default is to the
application, as needed e.g. for QEMU, SOF links it to the
bootloader like in native builds.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
On cAVS 1.8, 2.0 and 2.5 LSPGISTS and LSPGCTL are located in a
different shim register range, they cannot be accessed, using the
usual SHIM_BASE offset.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
A configuration with CONFIG_MP_NUM_CPUS > 1 and CONFIG_IPM_CAVS_IDC not
defined is valid if COMFIG_SMP is disabled.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Shim register location on cAVS 1.5 is different than on 1.8 and up,
fix it.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
The current unused memory calculation is broken because it doesn't
take into account the stack area, allocated at the top of HP SRAM.
Until this is fixed disable powering down unused RAM.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
1. SOF doesn't have to be built in .bin format
2. don't include soc.c and soc_mp.c twice in cmake
3. remove an unused mailbox.h header
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
The existing implementation of the adsplog.py script worked fine for
individual runs (e.g. when running specific code) but had no support
for detecting system reset events and thus could not be used for
monitoring applications like test automation. It also could not
handle the case where a rapid log burst would overflow the buffer
before being noticed at the client. Also, the protocol here was also
rife with opportunities for race conditions. Fix all that up via what
is mostly a rewrite of the script. The protocol itself hasn't
changed, just the handling.
Also includes some changes to the trace_out.c code on the device side.
These are required to get ordering correct to make race conditions
tractably handleable on the reader side.
Some of the specific cases that are managed:
* There is a 0.4s backoff when a reset is detected. Continuing to
poll the buffer has been observed to hang the device (I'm fairly
sure this is actually a hardware bug, reads aren't visible to the
DSP software).
* The "no magic number" case needs to be reserved for detecting system
reset.
* Slot data must be read BETWEEN two reads of the ID value to detect
the case where the slot gets clobbered while being read.
* The "currently being filled" slot needs to always have an ID value
that does not appear in sequence from the prior slot.
* We need to check the full history in the buffer at each poll to
detect resets, which opens up a race between the read of the "next
slot" (which is absent) and the full history retrieval (when it can
now be present!). Detect that.
* A null termination bug in the current output slot got fixed.
Broadly: this was a huge bear to make work. It sounds like this
should be a simple protocol, but it's not in practice.
Also: clean up the error reporting in the script so it can handle new
PCI IDs being added, and reports permissions failures on the required
sysfs file as a human-readable error.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The fix_elf_addrs script runs objcopy over a binary that (due to some
legacy section definitions in a mildly complicated linker file) has a
few zero-length sections at address zero. Objcopy considers this a
warning condition (though oddly the linker from the same version of
binutils which produced that binary does not!), which will be detected
as a CI failure.
Just eat the warnings. Long term we should rework linkage to remove
the legacy stuff that is getting tripped over.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This platform had separate backends for the log subsystem and printk
handler, which was silly. Unify them to use the same backend so they
don't clobber each other.
This patch appears to be a lot of lines, but it's really mostly code
motion and renaming.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>