diff --git a/arch/xtensa/core/README-WINDOWS.rst b/arch/xtensa/core/README-WINDOWS.rst
new file mode 100644
index 00000000000..eaf9156e195
--- /dev/null
+++ b/arch/xtensa/core/README-WINDOWS.rst
@@ -0,0 +1,108 @@
+# How Xtensa register windows work
+
+There is a paucity of introductory material on this subject, and
+Zephyr plays some tricks here that require understanding the base
+layer.
+
+## Hardware
+
+When register windows are configured in the CPU, there are either 32
+or 64 "real" registers in hardware, with 16 visible at one time.
+Registers are grouped and rotated in units of 4, so there are 8 or 16
+such "quads" (my term, not Tensilica's) in hardware of which 4 are
+visible as A0-A15.
+
+The first quad (A0-A3) is pointed to by a special register called
+WINDOWBASE.  The register file is cyclic, so for example if NREGS==64
+and WINDOWBASE is 15, quads 15, 0, 1, and 2 will be visible as
+(respectively) A0-A3, A4-A7, A8-A11, and A12-A15.
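+
+The mapping can be written down as a small model.  This is only a
+sketch for illustration: `phys_reg` and its parameters are
+hypothetical names, not part of Zephyr or any Tensilica API.
+
+```c
+/* Physical register index behind the architectural name A<n>.
+ * windowbase counts quads of 4 registers; nregs is 32 or 64.
+ */
+static inline unsigned int phys_reg(unsigned int windowbase,
+                                    unsigned int n,
+                                    unsigned int nregs)
+{
+    /* The register file is cyclic, so wrap at nregs */
+    return (windowbase * 4U + n) % nregs;
+}
+```
+
+With NREGS==64 and WINDOWBASE==15, A0 maps to physical register 60
+and A4 wraps around to physical register 0.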
+
+There is a ROTW instruction that can be used to manually rotate the
+window by an immediate number of quads that is added to WINDOWBASE.
+Positive rotations "move" high registers into low registers
+(i.e. after "ROTW 1" the register that used to be called A4 is now
+A0).
+
+There are CALL4/CALL8/CALL12 instructions to effect rotated calls
+which rotate registers upward (i.e. "hiding" low registers from the
+callee) by 1, 2 or 3 quads.  These do not rotate the window
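+themselves.  Instead they place the rotation amount in two places
+(yes, two; see below): the 2-bit CALLINC field of the PS register, and
+the top two bits of the return address, which is written to the
+register that will become the callee's A0 after rotation (the
+caller's A4, A8 or A12).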
+
+There is an ENTRY instruction that does the rotation.  It adds CALLINC
+to WINDOWBASE, at the same time copying the old (now hidden) stack
+pointer in A1 into the "new" A1 in the rotated frame, subtracting an
+immediate offset from it to make space for the new frame.
+
+There is a RETW instruction that undoes the rotation.  It reads the
+top two bits from the return address in A0 and subtracts that value
+from WINDOWBASE before returning.  This is why the CALLINC bits went
+in two places.  They have to be stored on the stack across potentially
+many calls, so they need to be GPR data that lives in registers and
+can be spilled.  But ENTRY is not specified to assume a particular
+return address format, and it runs immediately after the call, so it
+makes more sense for it to read processor state instead.
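+
+A minimal sketch of the two-bit encoding, assuming a 32-bit return
+address with the rotation amount in bits 31:30 as described above.
+The function names are hypothetical and exist only for this
+illustration.
+
+```c
+#include <stdint.h>
+
+/* Value CALLn leaves for the callee's A0: callinc (1, 2 or 3) in the
+ * top two bits, low 30 bits of the return PC below it.
+ */
+static inline uint32_t encode_retaddr(uint32_t ret_pc, unsigned int callinc)
+{
+    return ((uint32_t)callinc << 30) | (ret_pc & 0x3fffffffU);
+}
+
+/* Rotation amount RETW recovers from A0 */
+static inline unsigned int retaddr_callinc(uint32_t a0)
+{
+    return a0 >> 30;
+}
+
+/* WINDOWBASE after RETW, wrapping modulo the number of quads */
+static inline unsigned int retw_windowbase(unsigned int windowbase,
+                                           uint32_t a0,
+                                           unsigned int nquads)
+{
+    return (windowbase + nquads - retaddr_callinc(a0)) % nquads;
+}
+```
+
+Because the top two bits of A0 hold the rotation amount, they do not
+hold address bits; this model simply discards them when encoding.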
+
+Note that we still don't know how to detect when the register file has
+wrapped around and needs to be spilled or filled.  To do this there is
+a WINDOWSTART register used to detect which register quads are in use.
+The name "start" is somewhat confusing: this is not a pointer.
+WINDOWSTART stores a bitmask with one bit per hardware quad (so it's 8
+or 16 bits wide).  The bit in WINDOWSTART corresponding to WINDOWBASE
+will be set by the ENTRY instruction, and remain set after rotations
+until cleared by a function return (by RETW, see below).  Other bits
+stay zero.  So there is one set bit in WINDOWSTART corresponding to
+each call frame that is live in hardware registers, and it will be
+followed by 0, 1 or 2 zero bits that tell you how "big" (how many
+quads of registers) that frame is.
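+
+As an illustration, the frame size can be read back from the bitmask
+like this.  It is a sketch of the rule just described, not hardware
+or Zephyr code; `frame_quads` is a hypothetical name, and the caller
+is assumed to pass a `base` whose bit is set in `windowstart`.
+
+```c
+/* Number of quads (1..3) owned by the frame whose start bit is at
+ * quad index `base`.  nquads is 8 or 16.
+ */
+static inline unsigned int frame_quads(unsigned int windowstart,
+                                       unsigned int base,
+                                       unsigned int nquads)
+{
+    unsigned int n = 1;
+
+    /* Count the zero bits after the frame bit, wrapping around the
+     * register file; a frame never spans more than 3 quads.
+     */
+    while (n < 3 && !(windowstart & (1U << ((base + n) % nquads)))) {
+        n++;
+    }
+    return n;
+}
+```
+
+For example, with bits 0, 3 and 5 set, the frame at quad 0 spans
+three quads and the frame at quad 3 spans two.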
+
+So the CPU executing RETW checks to make sure that the register quad
+being brought into A0-A3 (i.e. the new WINDOWBASE) has a set bit
+indicating it's valid. If it does not, the registers must have been
+spilled and the CPU traps to an exception handler to fill them.
+
+Likewise, the processor can tell if a high register is "owned" by
+another call by seeing if there is a one in WINDOWSTART between that
+register's quad and WINDOWBASE.  If there is, the CPU traps to a spill
+handler to spill one frame.  Note that a frame might be only four
+registers, but it's possible to hit registers 12 out from WINDOWBASE,
+so it's actually possible to trap again when the instruction restarts
+to spill a second frame, and even a third time at maximum.
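+
+The ownership check can be modeled as a scan of the WINDOWSTART bits
+between WINDOWBASE and the quad being touched.  This is a sketch
+under the description above; `would_overflow` is a hypothetical
+helper, not a real API.
+
+```c
+/* Returns 1 if an access to A<n> (n = 0..15) would trap to a window
+ * overflow (spill) handler, 0 otherwise.  nquads is 8 or 16.
+ */
+static inline int would_overflow(unsigned int windowstart,
+                                 unsigned int windowbase,
+                                 unsigned int n,
+                                 unsigned int nquads)
+{
+    unsigned int q;
+
+    /* A0-A3 (quad offset 0) always belong to the current frame, so
+     * only offsets 1..n/4 are checked for another frame's start bit.
+     */
+    for (q = 1; q <= n / 4U; q++) {
+        if (windowstart & (1U << ((windowbase + q) % nquads))) {
+            return 1;
+        }
+    }
+    return 0;
+}
+```
+
+With frames starting at quads 2 and 5 and WINDOWBASE==2, touching
+A8 is safe but touching A12 reaches quad 5 and traps.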
+
+Finally: note that hardware checks the two bits of WINDOWSTART after
+the frame bit to detect how many quads are represented by the one
+frame.  So there are six separate exception handlers to spill/fill
+1/2/3 quads of registers.
+
+## Software & ABI
+
+The advantage of the scheme above is that it allows the registers to
+be spilled naturally into the stack by using the stack pointers
+embedded in the register file.  But the hardware design assumes and to
+some extent enforces a fairly complicated stack layout to make that
+work:
+
+The spill area for a single frame's A0-A3 registers is not in its own
+stack frame.  It lies in the 16 bytes below its CALLEE's stack
+pointer.  This is so that the callee (and exception handlers invoked
+on its behalf) can see its caller's potentially-spilled stack pointer
+register (A1) on the stack and be able to walk back up on return.
+Other architectures do this too by e.g. pushing the incoming stack
+pointer onto the stack as a standard "frame pointer" defined in the
+platform ABI.  Xtensa wraps this together with the natural spill area
+for register windows.
+
+By convention spill regions always store the lowest numbered register
+in the lowest address.
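+
+Put together, the location of a caller's spilled A0-A3 follows from
+the callee's stack pointer alone.  A minimal sketch, assuming 4-byte
+registers and a 32-bit address space; `base_spill_addr` is a
+hypothetical name used only here.
+
+```c
+#include <stdint.h>
+
+/* Address where the caller's A<i> (i = 0..3) is spilled, given the
+ * stack pointer of the function it called.  The 16-byte area sits
+ * directly below callee_sp, lowest-numbered register lowest.
+ */
+static inline uint32_t base_spill_addr(uint32_t callee_sp, unsigned int i)
+{
+    return callee_sp - 16U + 4U * i;
+}
+```
+
+In this model the caller's stack pointer (its A1) lands at
+`callee_sp - 12`, which is the slot a stack walker would read to
+step up one frame.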
+
+The spill area for a frame's A4-A11 registers may or may not exist
+depending on whether the call was made with CALL8/CALL12.  It is legal
+to write a function using only A0-A3 and CALL4 calls and ignore higher
+registers.  But if those 0-2 register quads are in use, they appear at
+the top of the stack frame, immediately below the parent call's A0-A3
+spill area.
+
+There is no spill area for A12-A15.  Those registers are always
+caller-save.  When using CALLn, you always need to overlap 4 registers
+to provide arguments and take a return value.