fxos: cleaner function abstraction + analysis of delay slots

* Update documentation about functions API
* Clean up rough edges, including instruction iterators, basic block ending types, and calls that were misleading about the structure
* Fix the static analysis code not accounting for delay slots
* Improve/enrich the program diff structure
2023-12-26 14:49:23 +01:00 · 2023-12-26 14:49:23 +01:00 · eacdf9da99
parent 944745d0e3
commit eacdf9da99
6 changed files with 555 additions and 230 deletions
--- a/doc/functions.md
+++ b/doc/functions.md
@@ -24,61 +24,132 @@ for(BasicBlock &bb: function) {
 }
 ```

-Blocks are numbered from 0 to `.blockCount()` and can be accessed individually with `.basicBlockByIndex()`. The function's entry block can be found with `.entryBlock()`.
+Blocks are numbered from 0 to `.blockCount() - 1` and can be accessed individually with `.basicBlockByIndex()` or, when you know the address, `.basicBlockByAddress()`. The function's entry block index and the block itself can be found with `.entryBlockIndex()` and `.entryBlock()`.

 #### The `BasicBlock` structure

 The `BasicBlock` structure represents a node in the CFG. It always exists in the context of a function, which can be found with `.parentFunction()`. The binary that owns the function is also available as `.parentBinary()`.

-The block has its own `.address()` and `.instructionCount()`. Its main attraction is the list of instructions that it contains, which can be iterated over with `.begin()`/`.end()` or in reverse order with `.rbegin()`/`.rend()`:
+The block has its own `.address()` which is the address of its first instruction, and it knows its own block number within the parent function, which is available as `.blockIndex()`.
+
+##### Accessing instructions
+
+Blocks have a `.instructionCount()` and the instructions can be iterated over in increasing address order with the view `.instructionsInAddressOrder()`:

 ```cpp
-for(Instruction &insn: bb) {
-    /* ... */
-}
-for(auto it = bb.rbegin(); it != bb.rend(); it++) {
-    Instruction &insn = *it;
-    /* ... */
+/* random_access_range */
+for(Instruction &insn: bb.instructionsInAddressOrder()) {
+    /* ... eg. mov #0, r0; rts; nop */
 }
 ```

 Individual instructions can also be found with `.instructionAtIndex()`.

-Basic blocks usually end with a jump instruction; however, in some cases the next block follows the current one in memory, so there is no "jump"; control just keeps going forward. For instance, in an `if/else` statement the `.false` block might fall through to whatever code follows the condition:
+A major subtlety of the SuperH ISA is that most branch and function call instructions have delay slots, meaning that the CPU fills in a pipeline bubble by executing the instruction following the branch/call while it's fetching the first few instructions at the branch target. **This makes jump-and-delay-slot pairs unique instructions** that are not equivalent to any two-instruction sequence.

-```
-         ╒═══════════╕
-         │ condition │     true
-         │ bf .false │───────────────╮
-         ╘═══════════╛         ╒═══════════╕
-         false │        .true: │ ...       │
-               ↓               │ bra .end  │
-         ╒═══════════╕         ╘═══════════╛
- .false: │ ...       │               │
-         ╘═══════════╛               │
-  fall-through ↓                     │
-         ╒═══════════╕               │
-   .end: │ ...       │←──────────────╯
-         ╘═══════════╛
+The process of executing a jump-and-delay-slot pair is as follows:
+
+1. Compute the branch target using the initial state.
+2. Set PC to that target (and start fetching the code there).
+3. Run the delay slot instruction.
+
+This means that the delay slot instruction runs "before" the jump but does not affect the jump target. It also runs with the target PC, so PC-relative instructions don't behave in the natural way. (This is not much of a problem in practice because most are illegal as delay slots anyway and I don't think any compiler abuses this for optimization.)
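
The three steps above can be sketched with a toy model. This is only an illustration: `ToyCpu` and `runJumpWithDelaySlot` are hypothetical names that are not part of FxOS, and the jump modeled is a register-indirect one in the spirit of `jmp @rn`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

/* Toy machine state (hypothetical, not FxOS's ProgramState): PC and r0..r15. */
struct ToyCpu
{
    uint32_t pc = 0;
    uint32_t r[16] = {};
};

/* Runs a register-indirect jump together with its delay slot. `delaySlot`
   stands for whatever instruction follows the jump in memory. */
inline void runJumpWithDelaySlot(
    ToyCpu &cpu, int n, std::function<void(ToyCpu &)> const &delaySlot)
{
    assert(n >= 0 && n < 16);
    /* 1. Compute the branch target using the initial state. */
    uint32_t target = cpu.r[n];
    /* 2. Set PC to that target. */
    cpu.pc = target;
    /* 3. Run the delay slot instruction; it can no longer change the target. */
    delaySlot(cpu);
}
```

In this model, a delay slot that overwrites the register used by the jump changes the register but not where execution continues, which is why the pair cannot be treated as two independent instructions.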
+
+Instructions that possess delay slots can be identified with `.opcode().hasDelaySlot()` (see `Instruction` below). Additionally, the basic block provides `.instructionsAndDelaySlots()` which iterates over instructions as pairs. Instructions without a delay slot are returned as `{ins, nullptr}`, while instructions with delay slots are returned as `{ins, &delaySlotIns}`. As a result, pairs are returned in execution order; the user simply has to handle `delaySlotIns` before `ins` when the former is not null.
+
+```cpp
+/* input_range */
+for(auto [ins, delaySlotIns]: bb.instructionsAndDelaySlots()) {
+    // ins : Instruction &
+    // delaySlotIns : Instruction *
+    /* ... eg. {(mov #0, r0), nullptr}; {rts, nop} */
+}
 ```
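
As a rough, self-contained illustration of the pairing logic, the sketch below walks a plain vector instead of a real `BasicBlock`. `ToyInsn`, `forEachInsnAndDelaySlot` and `executionOrder` are hypothetical names; unlike the view in `function.h`, this version also guards against a block whose last instruction claims a delay slot that is not there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

/* Hypothetical stand-in for Instruction. */
struct ToyInsn
{
    std::string text;
    bool hasDelaySlot;
};

/* Visits a block as (instruction, delay slot or nullptr) pairs. Delay slot
   instructions are skipped so they never appear as the first member. */
template<typename F>
void forEachInsnAndDelaySlot(std::vector<ToyInsn> const &block, F &&visit)
{
    for(std::size_t i = 0; i < block.size();) {
        ToyInsn const &ins = block[i];
        bool paired = ins.hasDelaySlot && i + 1 < block.size();
        visit(ins, paired ? &block[i + 1] : nullptr);
        i += paired ? 2 : 1;
    }
}

/* Flattens a block into execution order: delay slot first, then the jump. */
inline std::vector<std::string> executionOrder(
    std::vector<ToyInsn> const &block)
{
    std::vector<std::string> out;
    forEachInsnAndDelaySlot(block, [&](ToyInsn const &ins, ToyInsn const *ds) {
        if(ds)
            out.push_back(ds->text);
        out.push_back(ins.text);
    });
    return out;
}
```

For the block `mov #0, r0; rts; nop`, this yields `mov #0, r0`, then `nop`, then `rts`.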

-In this case `.false` has no jump. (It still wouldn't be possible to merge `.false` and `.end` because then `.true` would jump into the _middle_ of a block, which is forbidden.)
+TODO: Other iteration methods

-The function `.hasFallthrough()` will indicate whether the block falls through. If it doesn't, `.terminatorInstruction()` will return a pointer to the jump that terminates it (if it does, `.terminatorInstruction()` will return `nullptr`).
+##### Block endings

-A major detail of the SuperH ISA is that most branch instructions have delay slots. This means that even though a basic block conceptually ends when a jump is executed, typically the instruction following the jump instruction (which the CPU executes on-the-fly during the jump) is also part of the block. Hence, the block terminator is either the last or the second-to-last instruction in the block. The function `.hasDelaySlot()` will indicate whether the block has a delay slot.
+Basic blocks can end either explicitly because of a general jump/branch/return instruction, or implicitly by falling through to the next block. As an illustration of both, consider the following CFG for an if/else statement:

-Navigation in the CFG can be done by querying the block's `.successors()` and `.predecessors()` (both functions return read-only vectors of pointers to other blocks). Additional, hopefully self-explaining information, is available through `.successorCount()`, `.predecessorCount()`, `.isEntryBlock()` and `.isTerminator()`.
+```
+         ╒═══════════════╕
+ .entry: │ cmp/eq r4, r5 │     true
+         │ bf .false     │─────────────────╮
+         ╘═══════════════╛         ╒═══════════════╕
+           false │          .true: │ mov #4, r0    │
+                 ↓                 │ bra .end      │
+         ╒═══════════════╕         ╘═══════════════╛
+ .false: │ mov #7, r0    │                │
+         ╘═══════════════╛                │
+     fallthrough ↓                        │
+         ╒═══════════════╕                │
+   .end: │ rts           │←───────────────╯
+         │ nop           │
+         ╘═══════════════╛
+```
+
+The `.entry` block is terminated by a `bf` and the `.true` block is terminated by a `bra`, so both end explicitly. By contrast, the `.false` block doesn't have a terminator; when executed, it merely flows into (“falls through” to) `.end`. It is not possible to merge `.false` and `.end` into a single block because the jump at the end of `.true` jumps to `.end`, and in a CFG jump targets must be at the beginning of blocks. Note that `.entry` might also fall through to `.true` when `r4 == r5` (`bf` only branches when the comparison is false), which shows that blocks with conditional terminators can end in multiple ways.
+
+The function `.terminatorInstruction()` returns a pointer to the terminator of a block, or `nullptr` if there is none; the latter case is also indicated by the function `.hasNoTerminator()`.
+
+When accounting for the nature of the terminator instructions, after a basic block ends one of four things can happen:
+
+1. Static branch: branch to another block in the same function, whose address is statically known.
+   - `.mayStaticBranch()` and `.mustStaticBranch()` indicate this ending
+   - `.staticBranchTarget()` returns the address of the next block
+2. Fallthrough to the next block.
+   - `.mayFallthrough()` and `.mustFallthrough()` indicate this ending
+   - `.fallthroughTarget()` returns the address of the next block
+3. Return from the function.
+   - `.mustReturn()` indicates this ending (there are no conditional returns)
+4. Perform a tail call: jump to a dynamically-computed address, typically in another function.
+   - `.mustDynamicBranch()` indicates this ending (there are no conditional dynamic branches)
+   - This also covers, as a default, things like jump tables where we don't know if we leave the function or not. This is due to imperfect function reconstruction.
+
+Depending on the terminator instruction (or absence thereof), one or more endings are possible for a block. The table below lists all types of terminators and how they are identified in terms of `AsmInstruction`.
+
+| Block ending         | Corresponding terminators  | Terminator identification | `.mayStaticBranch()` | `.mayFallthrough()` | `.mustReturn()`   | `.mustDynamicBranch()` |
+| -------------------- | -------------------------- | ------------------------- | -------------------- | ------------------- | ----------------- | ---------------------- |
+| Unconditional branch | `bra`                      | `.isUnconditionalJump()`  | `true`               | `false`             | `false`           | `false`                |
+| Conditional branch   | `bf.s`, `bt.s`, `bf`, `bt` | `.isConditionalJump()`    | `true`               | `true`              | `false`           | `false`                |
+| Tail call            | `jmp @rn`, `braf rm`       | `.isDynamicJump()`        | `false`              | `false`             | `false`           | `true`                 |
+| Function return      | `rte`, `rts`               | `.isReturn()`             | `false`              | `false`             | `true`            | `false`                |
+| Fallthrough          | No terminator              | No terminator             | `false`              | `true`              | `false`           | `false`                |
+
+`.mustStaticBranch()` and `.mustFallthrough()` return true for a given basic block if the only may-function which returns true for that block is `.mayStaticBranch()` and `.mayFallthrough()`, respectively.
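
The relationship between the may- and must-functions for the four endings listed above can be sketched as follows. This is a minimal model, assuming the terminator has already been classified; `ToyTerminator` and `ToyEnding` are hypothetical names and do not reflect how `BasicBlock` stores this information.

```cpp
#include <cassert>

/* Hypothetical classification of a block's terminator. */
enum class ToyTerminator {
    None,
    UnconditionalJump,
    ConditionalJump,
    DynamicJump,
    Return,
};

struct ToyEnding
{
    ToyTerminator term;

    bool mayStaticBranch() const
    {
        return term == ToyTerminator::UnconditionalJump
               || term == ToyTerminator::ConditionalJump;
    }
    bool mayFallthrough() const
    {
        return term == ToyTerminator::None
               || term == ToyTerminator::ConditionalJump;
    }
    /* Returns and dynamic jumps are never conditional, so they are "must". */
    bool mustReturn() const
    {
        return term == ToyTerminator::Return;
    }
    bool mustDynamicBranch() const
    {
        return term == ToyTerminator::DynamicJump;
    }

    /* A "must" holds when the matching "may" is the only possible ending. */
    bool mustStaticBranch() const
    {
        return mayStaticBranch() && !mayFallthrough();
    }
    bool mustFallthrough() const
    {
        return mayFallthrough() && !mayStaticBranch();
    }
};
```

With this model a conditional branch satisfies both may-functions and neither must-function, while `bra` satisfies `mustStaticBranch()` and a block without terminator satisfies `mustFallthrough()`.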
+
+##### Accessing related blocks in the CFG
+
+Navigation in the CFG mostly consists in querying the block's successors (ie. which blocks this block can statically branch to) and its predecessors (which blocks statically branch to it). The number of successors and predecessors can be found with `.successorCount()` and `.predecessorCount()`, and the blocks themselves can be obtained with a few different views:
+
+```cpp
+/* Get blocks by reference: .successors(), .predecessors() */
+for(BasicBlock &succ: bb.successors()) {
+    /* ... */
+}
+/* Get their addresses: .successorsByAddress(), .predecessorsByAddress() */
+for(u32 address: bb.successorsByAddress()) {
+    /* ... */
+}
+/* Get their indices within the shared parent function: .successorsByIndex(),
+   .predecessorsByIndex() */
+for(uint index: bb.successorsByIndex()) {
+    /* ... */
+}
+```
+
+The methods `.isEntryBlock()` and `.isTerminator()` identify, respectively, the function's entry block and the blocks that end with a return (the latter is equivalent to `.mustReturn()`).

 ### The `Instruction` structure

 The `Instruction` structure represents a single instruction, within the context of a function. The basic block, function and binary owning it can be queried with `.parentBlock()`, `.parentFunction()` and `.parentBinary()`.

-This structure is instantiated in RAM for every single instruction registered as part of a function (an order of magnitude is several millions for a standard OS binary) so this structure keeps a minimal number of attributes. In particular, analysis results are not stored here, and instead queried from the binary as annotations.
+This structure is instantiated in RAM for every single instruction registered as part of a function (on the order of several million for a standard OS binary), so it keeps a minimal number of attributes. In particular, analysis results are not stored here; they are stored at the binary/function level instead (as appropriate) using compact formats when needed.

 The instruction's opcode can be accessed with `.opcode()`, and this can be used to check if the instruction is a branch, a memory access, what its operands are, etc. `Instruction` only tracks the context and analysis results. Along with the opcode, `.size()` will give the instruction's size in bytes (which is usually 2 but can be 4 for DSP instructions).

-The instruction has its own `.address()` and its relationship to other instructions in its block can be found with `.indexInBlock()`, `.isFirstInBlock()`, `.isLastInBlock()` and `.isInDelaySlot()`. Note that again, due to delay slots, being last and being a jump are not the same thing. However, since only jumps have delay slots and jumps are always block terminators, being in a delay slot does imply being the last instruction in a block.
+The instruction has its own `.address()` and its relationship to other instructions in its block can be found with `.indexInBlock()`, `.isFirstInBlock()`, `.isLastInBlock()` and `.isInDelaySlot()`. Note that again, due to delay slots, being last and being a jump are not the same thing.

 ### Function analysis

@@ -87,3 +158,5 @@ TODO:
 - References
 - Cross-references
 - Dominators and post-dominators
+
+TODO: Abstract interpretation info
--- a/include/fxos/analysis.h
+++ b/include/fxos/analysis.h
@@ -41,6 +41,8 @@ struct ProgramState
    void setFunctionInit();
    /* Set to initial non-entry-block state at entry of function (all bot). */
    void setBottom();
+    /* Set to completely unknown. */
+    void setTop();
    /* Apply a diff. */
    void applyDiff(ProgramStateDiff const &diff);

@@ -54,56 +56,123 @@ private:
    RelConst m_regs[16];
 };

-/* Change in program state over a single (contextually known) instruction. */
+/* Change in program state. This describes the effect of either:
+   - A single instruction;
+   - A branch combined with its delay slot;
+   - A branch combined with a no-op.
+   The single instruction or delay slot is referred to as the "base
+   instruction".  */
 struct ProgramStateDiff
 {
-    enum class Target : int { None = -1, Unknown = -2, CallStandard = -3 };
+    enum class BaseType : u8 {
+        /* Instruction can do anything, makes the entire state top. */
+        Anything,
+        /* Instruction has no effect on tracked state. */
+        NoOp,
+        /* Instruction updates a register. */
+        RegisterUpdate,
+    };
+    enum class BranchType : u8 {
+        /* No branch. */
+        None,
+        /* Branch to a different PC with no other effect on state. */
+        Branch,
+        /* Branch to SPC and restore SR to SSR. */
+        Rte,
+        /* Branch is a function call with standard calling convention. */
+        CallStandard,
+    };

-    /* Number of the register that changes, or Target::*. */
-    int target() const
+    /* Type of effect for the base instruction. */
+    BaseType baseType() const
    {
-        return m_target;
+        return m_baseType;
    }
-    /* New value for that register. */
-    RelConst value() const
+    /* Type of effect for the branch (None if there's no branch). */
+    BranchType branchType() const
    {
+        return m_branchType;
+    }
+
+    /* For BaseType::RegisterUpdate, which register is targeted. */
+    CpuRegister registerName() const
+    {
+        assert(baseType() == BaseType::RegisterUpdate && "wrong accessor");
+        return m_register;
+    }
+    /* For BaseType::RegisterUpdate, new value of the register. */
+    RelConst registerValue() const
+    {
+        assert(baseType() == BaseType::RegisterUpdate && "wrong accessor");
        return m_value;
    }

-    // TODO: Needs way more flexibility
+    /*** Pseudo-constructors ***/

-    /* Set the diff to changing register rn to new value v. */
-    void setRegisterUpdate(int n, RelConst v)
+    /* Clear the base diff. */
+    void clearBase()
    {
-        m_target = n;
+        m_baseType = BaseType::NoOp;
+    }
+    /* Set the base diff to changing register r to new value v. */
+    void setRegisterUpdate(CpuRegister reg, RelConst v)
+    {
+        m_baseType = BaseType::RegisterUpdate;
+        m_register = reg;
        m_value = v;
    }
-    /* Set the diff to changing register rn to an unknown value. */
-    void setRegisterTouched(int n)
+    /* Set the base diff to changing register r to an unknown value. */
+    void setRegisterTouched(CpuRegister reg)
    {
-        setRegisterUpdate(n, RelConstDomain().top());
+        setRegisterUpdate(reg, RelConstDomain().top());
    }
-    /* Set the diff to changing no register state. */
-    void setNoop()
+    /* Set the base diff to changing no register state. */
+    void setNoOp()
    {
-        m_target = static_cast<int>(Target::None);
+        m_baseType = BaseType::NoOp;
    }
-    /* Set the diff to modifyin register states as allowed by the standard
-       function calling convention. */
+    /* Set the base diff to an unknown effect. */
+    void setAnything()
+    {
+        m_baseType = BaseType::Anything;
+    }
+
+    /* Clear the branch diff. */
+    void clearBranch()
+    {
+        m_branchType = BranchType::None;
+    }
+    /* Set the branch diff to a normal branch. */
+    void setBranch()
+    {
+        m_branchType = BranchType::Branch;
+    }
+    /* Set the branch diff to the rte instruction. */
+    void setRte()
+    {
+        m_branchType = BranchType::Rte;
+    }
+    /* Set the branch diff to a call with standard calling convention. */
    void setCallStandard()
    {
-        m_target = static_cast<int>(Target::CallStandard);
-    }
-    /* Set the diff to unknown effect on registers. */
-    void setUnknown()
-    {
-        m_target = static_cast<int>(Target::Unknown);
+        m_branchType = BranchType::CallStandard;
    }

-    std::string str() const;
+    /* Merge a branch diff with a delay slot diff. The bound instance should be
+       the branch instruction diff and it can't have a base. The other one
+       should not have a branch. */
+    void mergeWithDelaySlot(ProgramStateDiff const &delaySlotDiff);
+
+    std::string baseStr() const;
+    std::string branchStr() const;
+    /* String representation. If optional is true, returns an empty string for
+       no-op diffs. Otherwise always returns an unambiguous string. */
+    std::string str(bool optional = false) const;

 private:
-    int m_target;
+    BaseType m_baseType = BaseType::NoOp;
+    BranchType m_branchType = BranchType::None;
+    CpuRegister m_register;
    RelConst m_value;
 };

@@ -122,7 +191,7 @@ struct StaticFunctionAnalysis

 /* Analyze a function; returns analysis results if successful, a null pointer
   on error. Does not store the results in f itself. */
-std::unique_ptr<StaticFunctionAnalysis> analyzeFunction(Function const &f);
+std::unique_ptr<StaticFunctionAnalysis> interpretFunction(Function const &f);

 }  // namespace FxOS
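
The base/branch split of the diff can be illustrated with a small stand-alone model. Everything here is an assumption made for illustration: `ToyValue`, `ToyState` and `ToyDiff` are hypothetical, "unknown" is modeled with `std::nullopt` rather than `RelConstDomain().top()`, and the body of `mergeWithDelaySlot()` is a guess based only on the comment in the header (the branch diff adopts the base effect of the delay slot).

```cpp
#include <array>
#include <cassert>
#include <optional>

/* Toy abstract value: a known constant, or nullopt for "unknown" (top). */
using ToyValue = std::optional<int>;

struct ToyState
{
    std::array<ToyValue, 16> regs {};
};

struct ToyDiff
{
    enum class Base { Anything, NoOp, RegisterUpdate };
    enum class Branch { None, Branch, Rte, CallStandard };

    Base base = Base::NoOp;
    Branch branch = Branch::None;
    int reg = 0;
    ToyValue value;

    /* Assumed behavior: fold the delay slot's base effect into the branch. */
    void mergeWithDelaySlot(ToyDiff const &slot)
    {
        assert(base == Base::NoOp && "branch diff can't have a base");
        assert(slot.branch == Branch::None && "delay slot can't branch");
        base = slot.base;
        reg = slot.reg;
        value = slot.value;
    }
};

/* Base effect first, then branch effect, mirroring the two switches in
   ProgramState::applyDiff(). */
inline void applyDiff(ToyState &s, ToyDiff const &d)
{
    if(d.base == ToyDiff::Base::Anything)
        s.regs.fill(std::nullopt);
    else if(d.base == ToyDiff::Base::RegisterUpdate)
        s.regs[d.reg] = d.value;

    /* A standard call clobbers r0..r6, as in the real applyDiff(). */
    if(d.branch == ToyDiff::Branch::CallStandard)
        for(int i = 0; i < 7; i++)
            s.regs[i] = std::nullopt;
}
```

In this model, merging a call diff with a delay slot that sets r8 gives a single diff that both updates r8 and clobbers r0..r6 when applied.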

--- a/include/fxos/function.h
+++ b/include/fxos/function.h
@@ -130,35 +130,7 @@ private:
 /* Basic block within a function. */
 struct BasicBlock
 {
-    enum Flags {
-        IsEntryBlock = 0x01,
-        IsTerminator = 0x02,
-        HasDelaySlot = 0x04,
-        NoTerminator = 0x08,
-
-        Last,
-        ValidFlags = (Last - 2) * 2 + 1,
-    };
-
-    // Basic blocks can exit in four ways:
-    // 1. Fall through
-    // 2. Jump to static destination
-    // 3. Jump to dynamic destination
-    // 4. Function return
-    // A given block might have multiple options (typically 1/2)
-
-    BasicBlock(Function &function, u32 address, bool isEntryBlock);
-
-    /* Block's address (address of first instruction). */
-    u32 address() const
-    {
-        return m_address;
-    }
-    /* Number of instructions. */
-    uint instructionCount() const
-    {
-        return m_instructions.size();
-    }
+    /*** General properties ***/

    /* Binary and function that own the basic block. */
    Binary &parentBinary()
@@ -178,10 +150,24 @@ struct BasicBlock
        return m_function.get();
    }

+    /* Block's address (address of first instruction). */
+    u32 address() const
+    {
+        return m_address;
+    }
    /* Block's index within function. */
    uint blockIndex() const;

-    /* Instruction at a given index in the block (index < size()). */
+    /*** Access to instructions ***/
+
+    /* Number of instructions. */
+    uint instructionCount() const
+    {
+        return m_instructions.size();
+    }
+
+    /* Instruction at a given index in the block (index < instructionCount()).
+       This function returns instructions in increasing order of address. */
    Instruction &instructionAtIndex(uint index)
    {
        assert(index < instructionCount()
@@ -195,75 +181,202 @@ struct BasicBlock
        return m_instructions[index];
    }

+    /* View over instructions in storage/address order */
+    auto instructionsInAddressOrder()  // -> [Instruction &]
+    {
+        return std::views::all(m_instructions);
+    }
+    auto instructionsInAddressOrder() const  // -> [Instruction const &]
+    {
+        return std::views::all(m_instructions);
+    }
+
+    /* View over instructions as pairs [instruction, delaySlotInstruction]. The
+       second member is null unless the first member has a delay slot in which
+       case the second member is a pointer to the instruction in that delay
+       slot, and that instruction never appears as the first member. */
+
+    template<typename Ins, typename Vec>
+    struct InsnPairIterator
+    {
+        /* This type is an input iterator. It satisfies __LegacyIterator so it
+           gets iterator_traits<> automatically. Ins is Instruction (maybe
+           const), Vec is std::vector<Instruction> (maybe const). */
+        InsnPairIterator(Vec *v, uint i): m_v {v}, m_i {i}
+        {
+        }
+
+        friend bool operator==(InsnPairIterator<Ins, Vec> const &left,
+            InsnPairIterator<Ins, Vec> const &right)
+        {
+            return left.m_v == right.m_v && left.m_i == right.m_i;
+        }
+
+        std::pair<Ins &, Ins *> operator*() const
+        {
+            Ins &ins = (*m_v)[m_i];
+            return {
+                ins, ins.opcode().hasDelaySlot() ? &(*m_v)[m_i + 1] : nullptr};
+        }
+
+        InsnPairIterator &operator++()
+        {
+            Ins &ins = (*m_v)[m_i];
+            m_i += 1 + ins.opcode().hasDelaySlot();
+            return *this;
+        }
+
+    private:
+        Vec *m_v;
+        uint m_i;
+    };
+
+    template<typename Ins, typename Vec>
+    struct InsnPairView: std::ranges::view_interface<InsnPairView<Ins, Vec>>
+    {
+        InsnPairView(Vec &v): m_v {v}
+        {
+        }
+
+        InsnPairIterator<Ins, Vec> begin() const
+        {
+            return InsnPairIterator<Ins, Vec>(&m_v, 0);
+        }
+        InsnPairIterator<Ins, Vec> end() const
+        {
+            return InsnPairIterator<Ins, Vec>(&m_v, m_v.size());
+        }
+
+    private:
+        Vec &m_v;
+    };
+
+    /* Input range for instructions with their delay slots. */
+    auto instructionsAndDelaySlots()  // -> [[Instruction &, Instruction *]]
+    {
+        return InsnPairView<Instruction, std::vector<Instruction>>(
+            m_instructions);
+    }
+    auto instructionsAndDelaySlots() const
+    // -> [[Instruction const &, Instruction const *]]
+    {
+        return InsnPairView<Instruction const, std::vector<Instruction> const>(
+            m_instructions);
+    }
+
    /* Terminator instruction. */
    Instruction *terminatorInstruction()
    {
+        bool hasDelaySlot = (m_flags & Flags::HasDelaySlot) != 0;
        return hasNoTerminator()
                   ? nullptr
-                   : &m_instructions[instructionCount() - hasDelaySlot() - 1];
+                   : &m_instructions[instructionCount() - hasDelaySlot - 1];
    }
    Instruction const *terminatorInstruction() const
    {
+        bool hasDelaySlot = (m_flags & Flags::HasDelaySlot) != 0;
        return hasNoTerminator()
                   ? nullptr
-                   : &m_instructions[instructionCount() - hasDelaySlot() - 1];
+                   : &m_instructions[instructionCount() - hasDelaySlot - 1];
    }

-    /* Instruction in terminator's delay slot. */
-    Instruction *delaySlotInstruction()
-    {
-        return hasDelaySlot() ? &m_instructions[instructionCount() - 1]
-                              : nullptr;
-    }
-    Instruction const *delaySlotInstruction() const
-    {
-        return hasDelaySlot() ? &m_instructions[instructionCount() - 1]
-                              : nullptr;
-    }
-
-    /* Iterators over instructions. */
-
-    auto const begin() const
-    {
-        return m_instructions.begin();
-    }
-    auto begin()
-    {
-        return m_instructions.begin();
-    }
-    auto const end() const
-    {
-        return m_instructions.end();
-    }
-    auto end()
-    {
-        return m_instructions.end();
-    }
-    auto const rbegin() const
-    {
-        return m_instructions.rbegin();
-    }
-    auto rbegin()
-    {
-        return m_instructions.rbegin();
-    }
-    auto const rend() const
-    {
-        return m_instructions.rend();
-    }
-    auto rend()
-    {
-        return m_instructions.rend();
-    }
+    /*** Analysis results ***/

    /* Entry state after analysis, if analysis was performed. */
    ProgramState const *initialState() const;

-    /* TODO: Iterator over instructions that also give the program state at the
-       point of the instruction. If no analysis was performed, the pointer will
-       be null. */
+    /* Iteration over instructions in address order with static value info.
+       This iteration is not quite in execution order but the change of state
+       due to delay slot instructions is merged into the associated branch and
+       the delay slot is a nop-diff so the sequencing is still correct. */

-    /* Functions for checking and setting flags */
+    template<typename It, typename State>
+    struct InsnStateIterator
+    {
+        /* This type is an input iterator. It satisfies __LegacyIterator so it
+           gets iterator_traits<> automatically. It is the iterator over the
+           vector of instructions. State is ProgramState. */
+        InsnStateIterator(It it, State const *initPS): m_it {it}
+        {
+            if(initPS)
+                m_PS = *initPS;
+            else
+                m_PS.setTop();
+        }
+
+        friend bool operator==(InsnStateIterator<It, State> const &left,
+            InsnStateIterator<It, State> const &right)
+        {
+            return left.m_it == right.m_it;
+        }
+
+        auto operator*() const
+        {
+            return std::make_pair(*m_it, m_PS);
+        }
+
+        InsnStateIterator &operator++()
+        {
+            auto *diff = (*m_it).stateDiff();
+            if(diff)
+                m_PS.applyDiff(*diff);
+            ++m_it;
+            return *this;
+        }
+
+    private:
+        It m_it;
+        State m_PS;
+    };
+
+    template<typename Vec>
+    struct InsnStateView: std::ranges::view_interface<InsnStateView<Vec>>
+    {
+        InsnStateView(Vec &v, ProgramState const *PS): m_v {v}, m_PS {PS}
+        {
+        }
+
+        auto begin() const
+        {
+            return InsnStateIterator(m_v.begin(), m_PS);
+        }
+        auto end() const
+        {
+            /* Final state does not matter */
+            return InsnStateIterator(m_v.end(), m_PS);
+        }
+
+    private:
+        Vec &m_v;
+        ProgramState const *m_PS;
+    };
+
+    /* Input range for instructions with the program state before each. */
+    auto instructionsWithState()  // -> [[Instruction &, ProgramState const &]]
+    {
+        return InsnStateView<std::vector<Instruction>>(
+            m_instructions, initialState());
+    }
+    auto instructionsWithState() const
+    // -> [[Instruction const &, ProgramState const &]]
+    {
+        return InsnStateView<std::vector<Instruction> const>(
+            m_instructions, initialState());
+    }
+
+    /*** Flags ***/
+
+    /* The following flags should be considered private, they're exposed only
+       for construction and debugging. Use the associated functions. */
+    enum Flags {
+        IsEntryBlock = 0x01,
+        IsTerminator = 0x02,
+        HasDelaySlot = 0x04,
+        NoTerminator = 0x08,
+
+        Last,
+        ValidFlags = (Last - 2) * 2 + 1,
+    };

    u32 getFlags() const
    {
@@ -276,24 +389,30 @@ struct BasicBlock
        m_flags = flags;
    }

+    /* Whether this block is the parent function's entry block. */
    bool isEntryBlock() const
    {
        return (m_flags & Flags::IsEntryBlock) != 0;
    }
+    /* Same as .mustReturn(). */
    bool isTerminator() const
    {
        return (m_flags & Flags::IsTerminator) != 0;
    }
-    bool hasDelaySlot() const
-    {
-        return (m_flags & Flags::HasDelaySlot) != 0;
-    }
+    /* Whether this block lacks a terminator. */
    bool hasNoTerminator() const
    {
        return (m_flags & Flags::NoTerminator) != 0;
    }

-    /* Block exit information. */
+    /*** Block ending information ***/
+
+    /* Whether the block might end with a branch to a static target. */
+    bool mayStaticBranch() const;
+    /* Whether the block always ends with a branch to a static target. */
+    bool mustStaticBranch() const;
+    /* Target of the static jump, -1 if there is none. */
+    u32 staticBranchTarget() const;

    /* Whether the block might fall through (conditional or no jump). */
    bool mayFallthrough() const;
@@ -303,17 +422,17 @@ struct BasicBlock
        return hasNoTerminator();
    }

-    /* Whether the block has a statically-known jump target. The jump might be
-       conditional, so this doesn't guarantee the target will be followed. */
-    bool hasStaticTarget() const;
-    /* Get said target, -1 if there is none. */
-    u32 staticTarget() const;
+    /* Whether the block ends with a function return. */
+    bool mustReturn() const
+    {
+        /* Same as terminatorInstruction()->opcode().isReturn() */
+        return (m_flags & Flags::IsTerminator) != 0;
+    }

-    /* Whether the block ends with a dynamically-known jump target. In SuperH
-       none of these are conditional so that makes it the only option. */
-    bool hasDynamicTarget() const;
+    /* Whether the block ends with a dynamically-known jump target. */
+    bool mustDynamicBranch() const;

-    /* CFG navigation. */
+    /*** CFG navigation ***/

    auto successors()  // -> [BasicBlock &]
    {
@@ -329,7 +448,7 @@ struct BasicBlock
                     return parentFunction().basicBlockByIndex(index);
                 });
    }
-    std::vector<int> const &successorsByIndex() const
+    std::vector<uint> const &successorsByIndex() const
    {
        return m_successors;
    }
@@ -353,7 +472,7 @@ struct BasicBlock
                     return parentFunction().basicBlockByIndex(index);
                 });
    }
-    std::vector<int> const &predecessorsByIndex() const
+    std::vector<uint> const &predecessorsByIndex() const
    {
        return m_predecessors;
    }
@@ -372,7 +491,9 @@ struct BasicBlock
        return m_predecessors.size();
    }

-    /* Construction functions to be used only by the cfg pass. */
+    /*** Construction functions (semi-private) ***/
+
+    BasicBlock(Function &function, u32 address, bool isEntryBlock);
    void addInstruction(Instruction &&insn);
    void finalizeBlock();
    void addSuccessor(BasicBlock *succ);
@@ -382,8 +503,8 @@ private:
    std::reference_wrapper<Function> m_function;
    std::vector<Instruction> m_instructions;
    /* TODO: More compact storage for CFG edges, especially successors (≤ 2) */
-    std::vector<int> m_successors;
-    std::vector<int> m_predecessors;
+    std::vector<uint> m_successors;
+    std::vector<uint> m_predecessors;
    u32 m_address;
    u32 m_flags;
 };
--- a/lib/analysis.cpp
+++ b/lib/analysis.cpp
@ -25,25 +25,42 @@ void ProgramState::setBottom()
        m_regs[i] = RelConstDomain().bottom();
 }

+void ProgramState::setTop()
+{
+    for(int i = 0; i < 16; i++)
+        m_regs[i] = RelConstDomain().top();
+}
+
 void ProgramState::applyDiff(ProgramStateDiff const &diff)
 {
    RelConstDomain RCD;
-    int t = diff.target();

-    if(t == static_cast<int>(ProgramStateDiff::Target::None)) {
-        /* Nothing */
+    switch(diff.baseType()) {
+    case ProgramStateDiff::BaseType::Anything:
+        setTop();
+        break;
+
+    case ProgramStateDiff::BaseType::NoOp:
+        break;
+
+    case ProgramStateDiff::BaseType::RegisterUpdate: {
+        CpuRegister reg = diff.registerName();
+        if(reg.getR() >= 0)
+            m_regs[reg.getR()] = diff.registerValue();
+        break;
    }
-    else if(t == static_cast<int>(ProgramStateDiff::Target::Unknown)) {
-        for(int i = 0; i < 16; i++)
-            m_regs[i] = RCD.top();
    }
-    else if(t == static_cast<int>(ProgramStateDiff::Target::CallStandard)) {
+
+    switch(diff.branchType()) {
+    case ProgramStateDiff::BranchType::None:
+    case ProgramStateDiff::BranchType::Branch:
+    case ProgramStateDiff::BranchType::Rte:
+        break;
+
+    case ProgramStateDiff::BranchType::CallStandard:
        for(int i = 0; i < 7; i++)
            m_regs[i] = RCD.top();
-    }
-    else {
-        assert((unsigned)t < 16 && "invalid register target");
-        m_regs[t] = diff.value();
+        break;
    }
 }

@ -67,16 +84,54 @@ bool ProgramState::le(ProgramState const &other) const
    return true;
 }

-std::string ProgramStateDiff::str() const
+void ProgramStateDiff::mergeWithDelaySlot(ProgramStateDiff const &slotDiff)
 {
-    if(m_target == static_cast<int>(Target::None))
-        return "()";
-    if(m_target == static_cast<int>(Target::Unknown))
-        return "⊤";
-    if(m_target == static_cast<int>(Target::CallStandard))
-        return "call(std)";
+    assert(baseType() == BaseType::NoOp && "merge diff from not a branch");
+    assert(slotDiff.branchType() == BranchType::None
+           && "merge diff from not a delay slot");

-    return fmt::format("r{} ← {}", m_target, m_value.str(false));
+    auto bt = branchType();
+    *this = slotDiff;
+    m_branchType = bt;
+}
+
+std::string ProgramStateDiff::baseStr() const
+{
+    switch(baseType()) {
+    case BaseType::Anything:
+        return "⊤";
+    case BaseType::NoOp:
+        return "";
+    case BaseType::RegisterUpdate:
+        return fmt::format("{} ← {}", m_register.str(), m_value.str(false));
+    }
+    return "???";
+}
+
+std::string ProgramStateDiff::branchStr() const
+{
+    switch(branchType()) {
+    case BranchType::None:
+    case BranchType::Branch:
+        return "";
+    case BranchType::Rte:
+        return "rte";
+    case BranchType::CallStandard:
+        return "stdcall";
+    }
+    return "???";
+}
+
+std::string ProgramStateDiff::str(bool optional) const
+{
+    std::string base = baseStr();
+    std::string branch = branchStr();
+
+    if(!base.empty() && !branch.empty())
+        return base + " | " + branch;
+    if(base.empty() && branch.empty() && !optional)
+        return "()";
+    return base.empty() ? branch : base;
 }

 /* Information stored for each block during the fixpoint iteration */
@ -131,7 +186,7 @@ static ProgramStateDiff interpretInstruction(
 {
    RelConstDomain RCD;
    ProgramStateDiff diff;
-    diff.setUnknown();
+    diff.setAnything();

    AsmInstruction asmins = ins.opcode();

@ -148,13 +203,13 @@ static ProgramStateDiff interpretInstruction(
        AsmOperand dst = asmins.operand(1);

        if(!dst.isReg())
-            diff.setNoop();
+            diff.setNoOp();
        else if(src.isConstant()) {
            RelConst c = RCD.constant(computeConstantOperand(ins, src));
-            diff.setRegisterUpdate(dst.base().getR(), c);
+            diff.setRegisterUpdate(dst.base(), c);
        }
        else
-            diff.setRegisterTouched(dst.base().getR());
+            diff.setRegisterTouched(dst.base());
        break;
    }

@ -177,11 +232,7 @@ static ProgramStateDiff interpretInstruction(
    case AsmInstruction::SH_shlr16: {
        AsmOperand op = asmins.operand(0);
        assert(op.isReg());
-
-        if(op.base().getR() >= 0)
-            diff.setRegisterTouched(op.base().getR());
-        else
-            diff.setNoop();
+        diff.setRegisterTouched(op.base());
        break;
    }

@ -206,11 +257,7 @@ static ProgramStateDiff interpretInstruction(
    case AsmInstruction::SH_xor:
    case AsmInstruction::SH_xtrct: {
        AsmOperand op = asmins.operand(1);
-
-        if(op.isReg() && op.base().getR() >= 0)
-            diff.setRegisterTouched(op.base().getR());
-        else
-            diff.setNoop();
+        diff.setRegisterTouched(op.base());
        break;
    }

@ -258,12 +305,13 @@ static ProgramStateDiff interpretInstruction(
    case AsmInstruction::SH_ocbwb:
    case AsmInstruction::SH_prefi:
    case AsmInstruction::SH_synco:
-        diff.setNoop();
+        diff.setNoOp();
        break;

    case AsmInstruction::SH_bsr:
    case AsmInstruction::SH_bsrf:
    case AsmInstruction::SH_jsr:
+        diff.setNoOp();
        diff.setCallStandard();
        break;

@ -271,14 +319,14 @@ static ProgramStateDiff interpretInstruction(
    case AsmInstruction::SH_movli:
    case AsmInstruction::SH_movua:
    case AsmInstruction::SH_movca:
-        diff.setUnknown();
+        diff.setAnything();
        break;
    }

    for(auto op: ins.opcode().operands()) {
        /* TODO: Properly handle pre-dec/post-inc */
        if(op.kind() == AsmOperand::PreDec || op.kind() == AsmOperand::PostInc)
-            diff.setUnknown();
+            diff.setAnything();
    }

    return diff;
@ -289,16 +337,27 @@ static void interpretBlock(BasicBlock const &bb, BlockStates &states)
    ProgramState PS {states.entry};
    states.diffs.clear();

-    for(Instruction const &i: bb) {
-        ProgramStateDiff diff = interpretInstruction(i, PS);
-        states.diffs.push_back(diff);
-        PS.applyDiff(diff);
+    // TODO: Fix that, use delay slots
+    for(auto const &[ins, delaySlot]: bb.instructionsAndDelaySlots()) {
+        ProgramStateDiff diff = interpretInstruction(ins, PS);
+
+        if(delaySlot) {
+            ProgramStateDiff diff2 = interpretInstruction(*delaySlot, PS);
+            diff.mergeWithDelaySlot(diff2);
+            states.diffs.push_back(diff);
+            states.diffs.emplace_back();
+            PS.applyDiff(diff);
+        }
+        else {
+            states.diffs.push_back(diff);
+            PS.applyDiff(diff);
+        }
    }

    states.exit = PS;
 }

-std::unique_ptr<StaticFunctionAnalysis> analyzeFunction(Function const &f)
+std::unique_ptr<StaticFunctionAnalysis> interpretFunction(Function const &f)
 {
    std::vector<BlockStates> VBS;

--- a/lib/function.cpp
+++ b/lib/function.cpp
@ -73,7 +73,7 @@ void Function::sortBasicBlocks()

    /* Update instructions' parent block numbers */
    for(uint i = 0; i < m_blocks.size(); i++) {
-        for(Instruction &ins: m_blocks[i])
+        for(Instruction &ins: m_blocks[i].instructionsInAddressOrder())
            ins.setBlockContext(i, ins.indexInBlock());
    }
 }
@ -102,7 +102,7 @@ void Function::setAnalysisVersion(int version)

 void Function::runAnalysis()
 {
-    m_analysisResult = analyzeFunction(*this);
+    m_analysisResult = interpretFunction(*this);
 }

 /* The first step in building function CFGs is delimiting the blocks. Starting
@ -339,26 +339,33 @@ ProgramState const *BasicBlock::initialState() const
    return &SFA->blocks[blockIndex()].entry;
 }

-bool BasicBlock::mayFallthrough() const
-{
-    Instruction const *ins = terminatorInstruction();
-    return !ins || ins->opcode().isConditionalJump();
-}
-
-bool BasicBlock::hasStaticTarget() const
+bool BasicBlock::mayStaticBranch() const
 {
    Instruction const *ins = terminatorInstruction();
    return ins && ins->opcode().isAnyStaticJump();
 }

-u32 BasicBlock::staticTarget() const
+bool BasicBlock::mustStaticBranch() const
+{
+    Instruction const *ins = terminatorInstruction();
+    return ins && ins->opcode().isUnconditionalJump();
+}
+
+u32 BasicBlock::staticBranchTarget() const
 {
    Instruction const *ins = terminatorInstruction();
    if(!ins || !ins->opcode().isAnyStaticJump())
        return 0xffffffff;
    return ins->opcode().getPCRelativeTarget(ins->address());
 }
-bool BasicBlock::hasDynamicTarget() const
+
+bool BasicBlock::mayFallthrough() const
+{
+    Instruction const *ins = terminatorInstruction();
+    return !ins || ins->opcode().isConditionalJump();
+}
+
+bool BasicBlock::mustDynamicBranch() const
 {
    Instruction const *ins = terminatorInstruction();
    return ins && ins->opcode().isDynamicJump();
@ -376,14 +383,14 @@ void BasicBlock::finalizeBlock()

    /* Instructions must be sequential. */
    u32 pc = this->address();
-    for(Instruction &insn: *this) {
+    for(Instruction &insn: instructionsInAddressOrder()) {
        assert(insn.address() == pc && "non-sequential instructions in bb");
        pc += insn.encodingSize();
    }

    /* The block must have no more than one terminator. */
    Instruction *term = nullptr;
-    for(Instruction &insn: *this) {
+    for(Instruction &insn: instructionsInAddressOrder()) {
        bool isReturn = insn.opcode().isBlockTerminator();
        assert(!(term && isReturn) && "bb with multiple terminators");
        if(isReturn)
@ -408,9 +415,12 @@ void BasicBlock::finalizeBlock()
    if(term && (term->opcode().isReturn() || term->opcode().isDynamicJump()))
        m_flags |= Flags::IsTerminator;

-    if(hasDelaySlot) {
-        Instruction *DSI = delaySlotInstruction();
-        DSI->setFlags(DSI->flags() | Instruction::Flags::InDelaySlot);
+    /* TODO: Check that insns with delay slots are valid/not in last place. */
+    for(uint i = 0; i < instructionCount(); i++) {
+        if(instructionAtIndex(i).opcode().hasDelaySlot()) {
+            Instruction &DSI = instructionAtIndex(i + 1);
+            DSI.setFlags(DSI.flags() | Instruction::Flags::InDelaySlot);
+        }
    }
 }

--- a/lib/view/assembly.cpp
+++ b/lib/view/assembly.cpp
@ -297,9 +297,9 @@ void viewAssemblyInstruction(
    if(opts->dumpFunctionAnalysis && an) {
        auto &block = an->blocks[ins.parentBlock().blockIndex()];
        ProgramStateDiff const &diff = block.diffs[ins.indexInBlock()];
-        if(diff.target() != static_cast<int>(ProgramStateDiff::Target::None))
-            comments.emplace_back(
-                diff.str(), fmt::fg(fmt::terminal_color::cyan));
+        auto diffStr = diff.str(true);
+        if(diffStr.size())
+            comments.emplace_back(diffStr, fmt::fg(fmt::terminal_color::cyan));
    }

    switch(ins.opcode().operation()) {
@ -397,7 +397,7 @@ void viewAssemblyBasicBlock(BasicBlock const &bb, ViewAssemblyOptions *opts)
            printf(" IsEntryBlock");
        if(bb.isTerminator())
            printf(" IsTerminator");
-        if(bb.hasDelaySlot())
+        if(bb.getFlags() & BasicBlock::HasDelaySlot)
            printf(" HasDelaySlot");
        if(bb.hasNoTerminator())
            printf(" NoTerminator");
@ -412,20 +412,13 @@ void viewAssemblyBasicBlock(BasicBlock const &bb, ViewAssemblyOptions *opts)
        viewProgramState(block.entry, fmt_rgb("    | ", fmt::color::gray));
    }

-    if(bb.parentFunction().hasAnalysis()) {
-        ProgramState PS = *bb.initialState();
-        for(Instruction const &ins: bb) {
-            viewAssemblyInstruction(ins, &PS, opts);
-            PS.applyDiff(*ins.stateDiff());
-        }
+    bool hasAnalysis = bb.parentFunction().hasAnalysis();
+    for(auto const &[ins, PS]: bb.instructionsWithState()) {
+        viewAssemblyInstruction(ins, hasAnalysis ? &PS : nullptr, opts);

-        if(opts->showAllProgramStates)
+        if(hasAnalysis && opts->showAllProgramStates)
            viewProgramState(PS, fmt_rgb("    | ", fmt::color::gray));
    }
-    else {
-        for(Instruction const &ins: bb)
-            viewAssemblyInstruction(ins, nullptr, opts);
-    }

    printf("\n");
 }