diff --git a/arch/xtensa/core/README-WINDOWS.rst b/arch/xtensa/core/README-WINDOWS.rst new file mode 100644 index 00000000000..eaf9156e195 --- /dev/null +++ b/arch/xtensa/core/README-WINDOWS.rst @@ -0,0 +1,108 @@ +# How Xtensa register windows work + +There is a paucity of introductory material on this subject, and +Zephyr plays some tricks here that require understanding the base +layer. + +## Hardware + +When register windows are configured in the CPU, there are either 32 +or 64 "real" registers in hardware, with 16 visible at one time. +Registers are grouped and rotated in units of 4, so there are 8 or 16 +such "quads" (my term, not Tensilica's) in hardware of which 4 are +visible as A0-A15. + +The first quad (A0-A3) is pointed to by a special register called +WINDOWBASE. The register file is cyclic, so for example if NREGS==64 +and WINDOWBASE is 15, quads 15, 0, 1, and 2 will be visible as +(respectively) A0-A3, A4-A7, A8-A11, and A12-A15. + +There is a ROTW instruction that can be used to manually rotate the +window by a immediate number of quads that are added to WINDOWBASE. +Positive rotations "move" high registers into low registers +(i.e. after "ROTW 1" the register that used to be called A4 is now +A0). + +There are CALL4/CALL8/CALL12 instructions to effect rotated calls +which rotate registers upward (i.e. "hiding" low registers from the +callee) by 1, 2 or 3 quads. These do not rotate the window +themselves. Instead they place the rotation amount in two places +(yes, two; see below): the 2-bit CALLINC field of the PS register, and +the top two bits of the return address placed in A0. + +There is an ENTRY instruction that does the rotation. It adds CALLINC +to WINDOWBASE, at the same time copying the old (now hidden) stack +pointer in A1 into the "new" A1 in the rotated frame, subtracting an +immediate offset from it to make space for the new frame. + +There is a RETW instruction that undoes the rotation. It reads the +top two bits from the return address in A0 and subtracts that value +from WINDOWBASE before returning. This is why the CALLINC bits went +in two places. They have to be stored on the stack across potentially +many calls, so they need to be GPR data that lives in registers and +can be spilled. But ENTRY isn't specified to assume a particular +return value format and is used immediately, so it makes more sense +for it to use processor state instead. + +Note that we still don't know how to detect when the register file has +wrapped around and needs to be spilled or filled. To do this there is +a WINDOWSTART register used to detect which register quads are in use. +The name "start" is somewhat confusing, this is not a pointer. +WINDOWSTART stores a bitmask with one bit per hardware quad (so it's 8 +or 16 bits wide). The bit in windowstart corresponding to WINDOWBASE +will be set by the ENTRY instruction, and remain set after rotations +until cleared by a function return (by RETW, see below). Other bits +stay zero. So there is one set bit in WINDOWSTART corresponding to +each call frame that is live in hardware registers, and it will be +followed by 0, 1 or 2 zero bits that tell you how "big" (how many +quads of registers) that frame is. + +So the CPU executing RETW checks to make sure that the register quad +being brought into A0-A3 (i.e. the new WINDOWBASE) has a set bit +indicating it's valid. If it does not, the registers must have been +spilled and the CPU traps to an exception handler to fill them. + +Likewise, the processor can tell if a high register is "owned" by +another call by seeing if there is a one in WINDOWSTART between that +register's quad and WINDOWBASE. If there is, the CPU traps to a spill +handler to spill one frame. Note that a frame might be only four +registers, but it's possible to hit registers 12 out from WINDOWBASE, +so it's actually possible to trap again when the instruction restarts +to spill a second quad, and even a third time at maximum. + +Finally: note that hardware checks the two bits of WINDOWSTART after +the frame bit to detect how many quads are represented by the one +frame. So there are six separate exception handlers to spill/fill +1/2/3 quads of registers. + +## Software & ABI + +The advantage of the scheme above is that it allows the registers to +be spilled naturally into the stack by using the stack pointers +embedded in the register file. But the hardware design assumes and to +some extent enforces a fairly complicated stack layout to make that +work: + +The spill area for a single frame's A0-A3 registers is not in its own +stack frame. It lies in the 16 bytes below its CALLEE's stack +pointer. This is so that the callee (and exception handlers invoked +on its behalf) can see its caller's potentially-spilled stack pointer +register (A1) on the stack and be able to walk back up on return. +Other architectures do this too by e.g. pushing the incoming stack +pointer onto the stack as a standard "frame pointer" defined in the +platform ABI. Xtensa wraps this together with the natural spill area +for register windows. + +By convention spill regions always store the lowest numbered register +in the lowest address. + +The spill area for a frame's A4-A11 registers may or may not exist +depending on whether the call was made with CALL8/CALL12. It is legal +to write a function using only A0-A3 and CALL4 calls and ignore higher +registers. But if those 0-2 register quads are in use, they appear at +the top of the stack frame, immediately below the parent call's A0-A3 +spill area. + +There is no spill area for A12-A15. Those registers are always +caller-save. When using CALLn, you always need to overlap 4 registers +to provide arguments and take a return value.