fxos: new function interface + cfg construction, to be tested

This commit is contained in:
Lephenixnoir 2023-10-07 22:53:51 +02:00
parent ede0a79b33
commit 6b4a122866
Signed by: Lephenixnoir
GPG Key ID: 1BBA026E13FC0495
10 changed files with 907 additions and 23 deletions

View File

@ -38,6 +38,7 @@ set(fxos_core_SOURCES
lib/binary.cpp
lib/disassembly.cpp
lib/lang.cpp
lib/function.cpp
lib/memory.cpp
lib/os.cpp
lib/passes/cfg.cpp

View File

@ -12,7 +12,7 @@
# @(disp,rn) [d,n] @(disp,rm) [d,m] @(r0,rn) [n] @(r0,rm) [m]
# @(disp,gbr) [d] @(r0, gbr] []
#
# The possible tags are %ret, %uncondjump, %condjump, %call, %delay, %islot.
# The tags are %ret, %uncondjump, %condjump, %dynjump, %call, %delay, %islot.
0000000001001000 clrs
0000000000001000 clrt
@ -115,7 +115,7 @@
0000nnnn00011010 sts macl, rn
0000nnnn00101010 sts pr, rn
0100nnnn00101011 jmp @rn %ret %delay
0100nnnn00101011 jmp @rn %dynjump %delay
0100nnnn00001011 jsr @rn %call %delay
0000nnnn10000011 pref @rn
0100nnnn00011011 tas.b @rn
@ -198,7 +198,7 @@
1101nnnndddddddd mov.l @(disp,pc), rn
11000111dddddddd mova.l @(disp,pc), r0 %islot
0000mmmm00100011 braf rm %ret %delay
0000mmmm00100011 braf rm %dynjump %delay
0000mmmm00000011 bsrf rm %call %delay
10001011dddddddd bf jump8 %condjump
10001111dddddddd bf.s jump8 %condjump %delay

89
doc/functions.md Normal file
View File

@ -0,0 +1,89 @@
## Functions
Probably the most common object of interest is code. In fxos, the “proper” way to deal with code is through functions. Random instructions not tied to functions have much less support in terms of tooling and analysis.
This document describes structures defined in
```cpp
#include <fxos/function.h>
```
### Navigating functions, basic blocks and instructions
Functions in fxos are stored as [Control Flow Graphs](https://en.wikipedia.org/wiki/Control-flow_graph) (CFG), which is a friendly format for analysis. The function itself is split into _basic blocks_, each consisting of a straight series of instructions terminated by an explicit or implicit jump to one or two other blocks. In essence, a basic block is the largest unit of sequential code that you can find in a function.
#### The `Function` structure
The `Function` structure is a `BinaryObject`, so it always lives in a binary that can be found with `.parentBinary()`. It also has the usual `.address()`, `.size()`, and name/comments.
When iterated with `.begin()`/`.end()`, it produces references to its basic blocks in an arbitrary order:
```cpp
for(BasicBlock &bb: function) {
/* ... */
}
```
Blocks are numbered from 0 to `.blockCount()` and can be accessed individually with `.basicBlockByIndex()`. The function's entry block can be found with `.entryBlock()`.
#### The `BasicBlock` structure
The `BasicBlock` structure represents a node in the CFG. It always exists in the context of a function, which can be found with `.parentFunction()`. The binary that owns the function is also available as `.parentBinary()`.
The block has its own `.address()` and `.instructionCount()`. Its main attraction is the list of instructions that it contains, which can be iterated over with `.begin()`/`.end()` or in reverse order with `.rbegin()`/`.rend()`:
```cpp
for(Instruction &insn: bb) {
/* ... */
}
for(auto it = bb.rbegin(); it != bb.rend(); it++) {
Instruction &insn = *it;
/* ... */
}
```
Individual instructions can also be found with `.instructionAtIndex()`.
Basic blocks usually end with a jump instruction; however, in some cases the next block follows the current one in memory, so there is no "jump"; control just keeps going forward. For instance, in an `if/else` statement the `.false` block might fall through to whatever code follows the condition:
```
╒═══════════╕
│ condition │ true
│ bf .false │───────────────╮
╘═══════════╛ ╒═══════════╕
false │ .true: │ ... │
↓ │ bra .end │
╒═══════════╕ ╘═══════════╛
.false: │ ... │ │
╘═══════════╛ │
fall-through ↓ │
╒═══════════╕ │
.end: │ ... │←──────────────╯
╘═══════════╛
```
In this case `.false` has no jump. (It still wouldn't be possible to merge `.false` and `.end` because then `.true` would jump into the _middle_ of a block, which is forbidden.)
The function `.hasFallthrough()` will indicate whether the block falls through. If it doesn't, `.terminatorInstruction()` will return a pointer to the jump that terminates it (if it does, `.terminatorInstruction()` will return `nullptr`).
A major detail of the SuperH ISA is that most branch instructions have delay slots. This means that even though a basic block conceptually ends when a jump is executed, typically the instruction following the jump instruction (which the CPU executes on-the-fly during the jump) is also part of the block. Hence, the block terminator is either the last or the second-to-last instruction in the block. The function `.hasDelaySlot()` will indicate whether the block has a delay slot.
Navigation in the CFG can be done by querying the block's `.successors()` and `.predecessors()` (both functions return read-only vectors of pointers to other blocks). Additional, hopefully self-explaining information, is available through `.successorCount()`, `.predecessorCount()`, `.isEntryBlock()` and `.isTerminator()`.
### The `Instruction` structure
The `Instruction` structure represents a single instruction, within the context of a function. The basic block, function and binary owning it can be queried with `.parentBlock()`, `.parentFunction()` and `.parentBinary()`.
This structure is instantiated in RAM for every single instruction registered as part of a function (an order of magnitude is several millions for a standard OS binary) so this structure keeps a minimal number of attributes. In particular, analysis results are not stored here, and instead queried from the binary as annotations.
The instruction's opcode can be accessed with `.opcode()`, and this can be used to check if the instruction is a branch, a memory access, what its operands are, etc. `Instruction` only tracks the context and analysis results. Along with the opcode, `.size()` will give the instruction's size in bytes (which is usually 2 but can be 4 for DSP instructions).
The instruction has its own `.address()` and its relationship to other instructions in its block can be found with `.indexInBlock()`, `.isFirstInBlock()`, `.isLastInBlock()` and `.isInDelaySlot()`. Note that again, due to delay slots, being last and being a jump are not the same thing. However, since only jumps have delay slots and jumps are always block terminators, being in a delay slot does imply being the last instruction in a block.
### Function analysis
TODO:
- Function prototypes
- References
- Cross-references
- Dominators and post-dominators

437
include/fxos/function.h Normal file
View File

@ -0,0 +1,437 @@
//---------------------------------------------------------------------------//
// 1100101 |_ mov #0, r4 __ //
// 11 |_ <0xb380 %5c4> / _|_ _____ ___ //
// 0110 |_ 3.50 -> 3.60 | _\ \ / _ (_-< //
// |_ base# + offset |_| /_\_\___/__/ //
//---------------------------------------------------------------------------//
// fxos/function: Functions and their component blocks and instructions
#ifndef FXOS_FUNCTION_H
#define FXOS_FUNCTION_H
#include <fxos/util/types.h>
#include <fxos/binary.h>
#include <fxos/lang.h>
#include <array>
#include <vector>
#include <optional>
#include <cassert>
namespace FxOS {
class Function;
class BasicBlock;
class Instruction;
// TODO: move this extern declaration of FxOS::insmap
extern std::array<std::optional<AsmInstruction>, 65536> insmap;
/* Binary object representing a function. */
struct Function: public BinaryObject
{
Function(Binary &binary, u32 address);
/* Number of basic blocks. */
uint blockCount() const
{
return m_blocks.size();
}
/* Get basic block by its index. */
BasicBlock &basicBlockByIndex(uint index)
{
assert(index < blockCount() && "out-of-bounds block number");
return m_blocks[index];
}
BasicBlock const &basicBlockByIndex(uint index) const
{
assert(index < blockCount() && "out-of-bounds block number");
return m_blocks[index];
}
/* Get the entry block. */
BasicBlock &entryBlock()
{
return basicBlockByIndex(0);
}
BasicBlock const &entryBlock() const
{
return basicBlockByIndex(0);
}
/* Iterators over basic blocks. */
auto const begin() const
{
return m_blocks.begin();
}
auto begin()
{
return m_blocks.begin();
}
auto const end() const
{
return m_blocks.end();
}
auto end()
{
return m_blocks.end();
}
/* Construction functions to be used only by the cfg pass. */
void exploreFunctionAt(u32 address);
void addBasicBlock(BasicBlock &&bb);
void updateFunctionSize();
private:
/* List of basic blocks (entry block is always number 0) */
std::vector<BasicBlock> m_blocks;
};
/* Basic block within a function. */
struct BasicBlock
{
enum Flags {
IsEntryBlock = 0x01,
IsTerminator = 0x02,
HasDelaySlot = 0x04,
NoTerminator = 0x08,
Last,
ValidFlags = (Last - 2) * 2 + 1,
};
// Basic blocks can exit in four ways:
// 1. Fall through
// 2. Jump to static destination
// 3. Jump to dynamic destination
// 4. Function return
// A given block might have multiple options (typically 1/2)
BasicBlock(Function &function, u32 address, bool isEntryBlock);
/* Block's address (address of first instruction). */
u32 address() const
{
return m_address;
}
/* Number of instructions. */
uint instructionCount() const
{
return m_instructions.size();
}
/* Binary and function that own the basic block. */
Binary &parentBinary()
{
return m_function.parentBinary();
}
Binary const &parentBinary() const
{
return m_function.parentBinary();
}
Function &parentFunction()
{
return m_function;
}
Function const &parentFunction() const
{
return m_function;
}
/* Block's index within function. */
uint blockIndex() const;
/* Instruction at a given index in the block (index < size()). */
Instruction &instructionAtIndex(uint index)
{
assert(index < instructionCount()
&& "out-of-bounds access to basic block");
return m_instructions[index];
}
Instruction const &instructionAtIndex(uint index) const
{
assert(index < instructionCount()
&& "out-of-bounds access to basic block");
return m_instructions[index];
}
/* Terminator instruction. */
Instruction *terminatorInstruction()
{
return hasNoTerminator()
? nullptr
: &m_instructions[instructionCount() - hasDelaySlot() - 1];
}
Instruction const *terminatorInstruction() const
{
return hasNoTerminator()
? nullptr
: &m_instructions[instructionCount() - hasDelaySlot() - 1];
}
/* Instruction in terminator's delay slot. */
Instruction *delaySlotInstruction()
{
return hasDelaySlot() ? &m_instructions[instructionCount() - 1]
: nullptr;
}
Instruction const *delaySlotInstruction() const
{
return hasDelaySlot() ? &m_instructions[instructionCount() - 1]
: nullptr;
}
/* Iterators over instructions. */
auto const begin() const
{
return m_instructions.begin();
}
auto begin()
{
return m_instructions.begin();
}
auto const end() const
{
return m_instructions.end();
}
auto end()
{
return m_instructions.end();
}
auto const rbegin() const
{
return m_instructions.rbegin();
}
auto rbegin()
{
return m_instructions.rbegin();
}
auto const rend() const
{
return m_instructions.rend();
}
auto rend()
{
return m_instructions.rend();
}
/* Functions for checking and setting flags */
u32 getFlags() const
{
return m_flags;
}
void setFlags(u32 flags)
{
assert(!(flags & ~Flags::ValidFlags)
&& "setting invalid basic block flags");
m_flags = flags;
}
bool isEntryBlock() const
{
return (m_flags & Flags::IsEntryBlock) != 0;
}
bool isTerminator() const
{
return (m_flags & Flags::IsTerminator) != 0;
}
bool hasDelaySlot() const
{
return (m_flags & Flags::HasDelaySlot) != 0;
}
bool hasNoTerminator() const
{
return (m_flags & Flags::NoTerminator) != 0;
}
/* Block exit information. */
/* Whether the block might fall through (conditional or no jump). */
bool mayFallthrough() const;
/* Whether the block always falls through. */
bool mustFallthrough() const
{
return hasNoTerminator();
}
/* Whether the block has a statically-known jump target. The jump might be
conditional, so this doesn't guarantee the target will be followed. */
bool hasStaticTarget() const;
/* Get said target, -1 if there is none. */
u32 staticTarget() const;
/* Whether the block ends with a dynamically-known jump target. In SuperH
none of these are conditional so that makes it the only option. */
bool hasDynamicTarget() const;
/* CFG navigation. */
std::vector<BasicBlock *> const &successors()
{
return m_successors;
}
std::vector<BasicBlock const *> successors() const
{
std::vector<BasicBlock const *> succ(m_successors.size());
for(auto *bb: m_successors)
succ.push_back(bb);
return succ;
}
std::vector<BasicBlock *> const &predecessors()
{
return m_predecessors;
}
std::vector<BasicBlock const *> predecessors() const
{
std::vector<BasicBlock const *> pred(m_predecessors.size());
for(auto *bb: m_predecessors)
pred.push_back(bb);
return pred;
}
uint successorCount() const
{
return m_successors.size();
}
uint predecessorCount() const
{
return m_predecessors.size();
}
/* Construction functions to be used only by the cfg pass. */
void addInstruction(Instruction &&insn);
void finalizeBlock();
// TODO: Set successors and predecessors
private:
Function &m_function;
std::vector<Instruction> m_instructions;
/* TODO: More compact storage for CFG edges, especially successors (≤ 2) */
std::vector<BasicBlock *> m_successors;
std::vector<BasicBlock *> m_predecessors;
u32 m_address;
u32 m_flags;
};
/* Concrete instruction in a basic block. This class only contains a minimal
amount of data, and most analysis results provided by its methods are
instead queried from the appropriate Binary. */
struct Instruction
{
enum Flags {
InDelaySlot = 0x01,
Last,
ValidFlags = (Last - 2) * 2 + 1,
};
Instruction(Function &function, u32 address, u32 opcode);
// TODO: Rename AsmInstruction -> Opcode
// TODO: Get opcode from Instruction
AsmInstruction const &opcode() const
{
assert(insmap[m_opcode] && "use of Instruction with invalid opcode");
return *insmap[m_opcode];
}
/* Instruction's size in bytes. */
uint size() const
{
return (m_opcode >> 16) ? 4 : 2;
}
/* Binary, function and basic block that own the instruction. */
Binary &parentBinary()
{
return m_function.parentBinary();
}
Binary const &parentBinary() const
{
return m_function.parentBinary();
}
Function &parentFunction()
{
return m_function;
}
Function const &parentFunction() const
{
return m_function;
}
BasicBlock &parentBlock()
{
return m_function.basicBlockByIndex(m_blockIndex);
}
BasicBlock const &parentBlock() const
{
return m_function.basicBlockByIndex(m_blockIndex);
}
/* Instruction's address. */
u32 address() const
{
return m_address;
}
/* Index of instruction within its basic block. */
uint indexInBlock() const
{
return m_insnIndex;
}
/* Whether this instruction is the first instruction in its block. */
bool isFirstInBlock() const
{
return m_insnIndex == 0;
}
/* Whether this instruction is the last in its block. This does *not* imply
that it's a jump, because delay slots are a thing. */
bool isLastInBlock() const
{
return (uint)m_insnIndex + 1 == parentBlock().instructionCount();
}
/* Whether this instruction is in a delay slot. Since only jumps have delay
slots, this implies isLastInBlock(). */
bool isInDelaySlot() const
{
return m_flags & Flags::InDelaySlot;
}
/* Properties about parameters. This is tailored to the SuperH ISA. */
// TODO: Extract parameter info
// - Get branch target if any, immediate if any, memory access address if
// any (harder: dynamic)...
// - All successors (+ user specifiable for dynamic cases)
// - All constants
/* Functions to access and modify flags */
u32 flags() const
{
return m_flags;
}
void setFlags(u32 flags)
{
m_flags = flags;
}
/* Construction functions to be used only by the cfg pass */
void setBlockContext(uint blockIndex, uint insnIndex);
private:
/* The following members are instantiated for every instruction mapped out
in the Binary - keep it reasonably small. */
Function &m_function;
u32 m_address;
u32 m_opcode;
u32 m_flags;
u16 m_blockIndex;
u16 m_insnIndex;
};
} /* namespace FxOS */
#endif /* FXOS_FUNCTION_H */

View File

@ -30,7 +30,7 @@
#define FXOS_LANG_H
#include <string>
#include <cstdint>
#include <fxos/util/types.h>
namespace FxOS {
@ -39,7 +39,7 @@ class CpuRegister
{
public:
// clang-format off
enum CpuRegisterName: int8_t {
enum CpuRegisterName: i8 {
/* Value 0 is reserved for special purposes such as "no register" */
UNDEFINED = 0,
/* Caller-saved general-purpose registers */
@ -93,7 +93,7 @@ private:
struct AsmArgument
{
/* Various addressing modes in the language */
enum Kind : int8_t {
enum Kind : i8 {
Reg, /* rn */
Deref, /* @rn */
PostInc, /* @rn+ */
@ -118,7 +118,7 @@ struct AsmArgument
/* Index register. Valid for ArrayDeref */
CpuRegister index;
/* Operation size (0, 1, 2 or 4). Generally a multiplier for disp */
int8_t opsize;
i8 opsize;
union
{
@ -152,6 +152,7 @@ struct AsmInstruction
IsCall = 0x08,
HasDelaySlot = 0x10,
IsInvalidDelaySlot = 0x20,
IsDynamicJump = 0x40,
};
AsmInstruction() = default;
@ -163,29 +164,27 @@ struct AsmInstruction
/* Original opcode. Initialized to 0 when unset, which is an invalid
instruction by design. */
uint32_t opcode;
u32 opcode;
/* Operation size (0, 1, 2 or 4) */
int8_t opsize;
i8 opsize;
/* Number of arguments */
uint8_t arg_count;
u8 arg_count;
/* Instruction tags */
uint16_t tags;
u16 tags;
/* Mnemonic **without the size indicator** */
char mnemonic[12];
/* Arguments (up to 2) */
AsmArgument args[2];
//---
// Instruction classes
//---
//=== Instruction classes ===//
/* Whether the instruction terminates the function it's in. */
bool isReturn() const
{
return (this->tags & Tag::IsReturn) != 0;
}
/* Whether the instruction is a conditional/unconditional jump. */
/* Whether the instruction is a conditional/unconditional static jump. */
bool isConditionalJump() const
{
return (this->tags & Tag::IsConditionalJump) != 0;
@ -194,11 +193,17 @@ struct AsmInstruction
{
return (this->tags & Tag::IsUnconditionalJump) != 0;
}
bool isAnyJump() const
bool isAnyStaticJump() const
{
int IsJump = Tag::IsConditionalJump | Tag::IsUnconditionalJump;
return (this->tags & IsJump) != 0;
}
/* Whether the instruction jumps to a dynamic target. This does not include
*calls* to dynamic targets. These jumps are always unconditional. */
bool isDynamicJump() const
{
return (this->tags & Tag::IsDynamicJump) != 0;
}
/* Whether the instruction is a function call. */
bool isCall() const
{
@ -212,7 +217,7 @@ struct AsmInstruction
/* Wheher the instruction terminates its basic block. */
bool isBlockTerminator() const
{
return isAnyJump() || isReturn();
return isAnyStaticJump() || isDynamicJump() || isReturn();
}
/* Whether the instruction can be used in a delay slot. */
bool isValidDelaySlot() const
@ -220,6 +225,12 @@ struct AsmInstruction
return !isBlockTerminator() && !hasDelaySlot()
&& (this->tags & Tag::IsInvalidDelaySlot) == 0;
}
//=== Instruction info ===//
/* Get the PC-relative target, assuming the instruction is at the provided
address, for instructions with PC-relative offsets. */
u32 getPCRelativeTarget(u32 pc) const;
};
} /* namespace FxOS */

310
lib/function.cpp Normal file
View File

@ -0,0 +1,310 @@
//---------------------------------------------------------------------------//
// 1100101 |_ mov #0, r4 __ //
// 11 |_ <0xb380 %5c4> / _|_ _____ ___ //
// 0110 |_ 3.50 -> 3.60 | _\ \ / _ (_-< //
// |_ base# + offset |_| /_\_\___/__/ //
//---------------------------------------------------------------------------//
#include <fxos/function.h>
#include <fxos/util/format.h>
#include <fxos/util/log.h>
namespace FxOS {
//=== Function ===//
Function::Function(Binary &binary, u32 address):
BinaryObject(binary, BinaryObject::Function, address, 0)
{
/* Size is not determined at first. */
/* Default unambiguous name */
setName(format("FUN_%08x", address));
}
/* Add a basic block to the function. The entry block must be added first. */
void Function::addBasicBlock(BasicBlock &&bb)
{
m_blocks.push_back(bb);
}
/* Update the function's BinaryObject size by finding the last address covered
by any instruction in the function. */
void Function::updateFunctionSize()
{
u32 max_address = this->address();
for(BasicBlock &bb: *this) {
if(bb.instructionCount() == 0)
continue;
Instruction &insn = bb.instructionAtIndex(bb.instructionCount() - 1);
max_address = std::max(max_address, insn.address() + insn.size());
}
this->setSize(max_address - this->address());
}
/* The first step in building function CFGs is delimiting the blocks. Starting
from the entry point, we generate "superblocks" by reading instructions
linearly until we find a terminator.
In general, a superblock will be split into multiple basic blocks, with a
cut at every target of a jump inside the superblock. We record these as we
explore, and generate basic blocks at the end. */
struct Superblock
{
/* Addresses of all instructions in the superblock. */
std::vector<u32> addresses;
/* Addresses of all basic block leaders in the superblock. */
std::set<u32> leaders;
/* Whether the superblock ends with a dynamic jump */
bool mustDynamicJump = false;
/* Whether the superblock may end with a jump to a static target */
bool mayStaticJump = false;
/* Whether the superblock may end by a fallthrough */
bool mayFallthrough = false;
/* Whether the superblock ends with a return */
bool mustReturn = false;
/* If mayStaticJump is set, target address */
u32 staticTarget = 0xffffffff;
/* If mayFallthrough is set, fallthrough address */
u32 fallthroughTarget = 0xffffffff;
};
// TODO: Unclear what the exit status of the superblock is in case of error
static Superblock exploreSuperblock(Function &function, u32 entry)
{
Superblock sb;
sb.leaders.insert(entry);
VirtualSpace &vspace = function.parentBinary().vspace();
bool inDelaySlot = false;
bool terminatorFound = false;
u32 pc = entry;
while(!terminatorFound || inDelaySlot) {
sb.addresses.push_back(pc);
/* Read the next instruction from memory */
// TODO: Handle 32-bit DSP instructions
if(!vspace.covers(pc, 2)) {
FxOS_log(ERR, "superblock %08x exits vspace at %08x", entry, pc);
break;
}
u32 opcodeBits = vspace.read_u16(pc);
Instruction ins(function, pc, opcodeBits);
AsmInstruction opcode = ins.opcode();
if(inDelaySlot && !opcode.isValidDelaySlot()) {
FxOS_log(ERR, "superblock %08x has invalid delay slot at %08x",
entry, pc);
break;
}
/* Set exit properties when finding the terminator */
if(opcode.isBlockTerminator()) {
sb.mustDynamicJump = opcode.isDynamicJump();
sb.mayStaticJump = opcode.isAnyStaticJump();
sb.mayFallthrough = opcode.isConditionalJump();
sb.mustReturn = opcode.isReturn();
if(sb.mayStaticJump)
sb.staticTarget = opcode.getPCRelativeTarget(pc);
}
terminatorFound = terminatorFound || opcode.isBlockTerminator();
inDelaySlot = !inDelaySlot && opcode.hasDelaySlot();
pc += 2;
}
if(sb.mayFallthrough)
sb.fallthroughTarget = pc;
return sb;
}
/* Cut a superblock in the list and returns true if one contains provided
address, otherwise returns false. */
static bool cutSuperblockAt(std::vector<Superblock> &blocks, u32 address)
{
for(auto &b: blocks) {
auto const &a = b.addresses;
if(std::find(a.begin(), a.end(), address) != a.end()) {
b.leaders.insert(address);
return true;
}
}
return false;
}
void Function::exploreFunctionAt(u32 functionAddress)
{
assert(!(functionAddress & 1) && "function starts at unaligned address");
std::vector<Superblock> blocks;
std::queue<u32> queue;
queue.push(functionAddress);
while(!queue.empty()) {
u32 entry = queue.front();
queue.pop();
/* If this address was found by another superblock that was explored
while [entry] was in the queue, perform the cut now */
if(cutSuperblockAt(blocks, entry))
continue;
Superblock sb = exploreSuperblock(*this, entry);
/* Process static jump targets and fallthrough targets to queue new
superblocks or cut existing ones */
if(sb.mayFallthrough) {
if(!cutSuperblockAt(blocks, sb.fallthroughTarget))
queue.push(sb.fallthroughTarget);
}
if(sb.mayStaticJump) {
if(!cutSuperblockAt(blocks, sb.staticTarget))
queue.push(sb.staticTarget);
}
blocks.push_back(std::move(sb));
}
/* Cut superblocks. The loop on b.leaders schedules the construction of new
BasicBlock objects but the iteration is really the multi-part do loop
using the iterator on b.addresses. */
for(auto &b: blocks) {
auto it = b.addresses.begin();
for(u32 _: b.leaders) {
(void)_;
BasicBlock bb(*this, *it, *it == functionAddress);
do {
// TODO: Support 32-bit instructions
u32 opcode = parentBinary().vspace().read_u16(opcode);
Instruction ins(*this, *it, opcode);
bb.addInstruction(std::move(ins));
it++;
}
while(it != b.addresses.end() && !b.leaders.count(*it));
bb.finalizeBlock();
addBasicBlock(std::move(bb));
}
}
// TODO: Set successors and predecessors
}
//=== BasicBlock ===//
BasicBlock::BasicBlock(Function &function, u32 address, bool isEntryBlock):
m_function {function}, m_address {address}, m_flags {0}
{
if(isEntryBlock)
m_flags |= Flags::IsEntryBlock;
}
uint BasicBlock::blockIndex() const
{
for(uint i = 0; i < m_function.blockCount(); i++) {
BasicBlock &bb = m_function.basicBlockByIndex(i);
if(&bb == this)
return i;
}
assert(false && "blockIndex of block not in its own parent");
}
bool BasicBlock::mayFallthrough() const
{
Instruction const *ins = terminatorInstruction();
return !ins || ins->opcode().isConditionalJump();
}
bool BasicBlock::hasStaticTarget() const
{
Instruction const *ins = terminatorInstruction();
return ins && ins->opcode().isAnyStaticJump();
}
u32 BasicBlock::staticTarget() const
{
Instruction const *ins = terminatorInstruction();
if(!ins || !ins->opcode().isAnyStaticJump())
return 0xffffffff;
return ins->opcode().getPCRelativeTarget(ins->address());
}
bool BasicBlock::hasDynamicTarget() const
{
Instruction const *ins = terminatorInstruction();
return ins && ins->opcode().isDynamicJump();
}
void BasicBlock::addInstruction(Instruction &&insn)
{
insn.setBlockContext(this->blockIndex(), m_instructions.size());
m_instructions.push_back(std::move(insn));
}
void BasicBlock::finalizeBlock()
{
/* Ensure a bunch of invariants. */
/* Instruction must be sequential. */
u32 pc = this->address();
for(Instruction &insn: *this) {
assert(insn.address() == pc && "non-sequential instructions in bb");
pc += insn.size();
}
/* The block must have no more than one terminator. */
Instruction *term = nullptr;
for(Instruction &insn: *this) {
bool isReturn = insn.opcode().isBlockTerminator();
assert(!(term && isReturn) && "bb with multiple terminators");
}
/* The block must have a delay slot iff the terminator has one. */
bool hasDelaySlot = false;
if(term) {
hasDelaySlot = term->opcode().hasDelaySlot();
assert(
term->indexInBlock() == this->instructionCount() - hasDelaySlot - 1
&& "incorrectly placed bb terminator");
}
/* Set structural flags. */
if(hasDelaySlot)
m_flags |= Flags::HasDelaySlot;
if(!term)
m_flags |= Flags::NoTerminator;
if(term && term->opcode().isReturn())
m_flags |= Flags::IsTerminator;
if(hasDelaySlot) {
Instruction *DSI = delaySlotInstruction();
DSI->setFlags(DSI->flags() | Instruction::Flags::InDelaySlot);
}
}
//=== Instruction ===//
Instruction::Instruction(Function &function, u32 address, u32 opcode):
m_function {function}, m_address {address}, m_opcode {opcode}
{
/* Start with no flags; they will be set as needed */
m_flags = 0;
}
void Instruction::setBlockContext(uint blockIndex, uint insnIndex)
{
m_blockIndex = blockIndex;
m_insnIndex = insnIndex;
}
} /* namespace FxOS */

View File

@ -217,4 +217,24 @@ AsmInstruction::AsmInstruction(
arg_count = 2;
}
u32 AsmInstruction::getPCRelativeTarget(u32 pc) const
{
for(int i = 0; i < arg_count; i++) {
AsmArgument const &arg = args[i];
int size = opsize + (opsize == 0);
if(arg.kind == AsmArgument::PcRel)
return (pc & -size) + 4 + arg.disp;
if(arg.kind == AsmArgument::PcJump)
return pc + 4 + arg.disp;
if(arg.kind == AsmArgument::PcAddr)
return (pc & -4) + 4 + arg.disp;
/* SH3 manual says that mova uses the target address of the jump when
in a delay slot. SH4AL-DSP makes it invalid. Supporting this would
be very tricky since the target PC is often dynamic (eg. rts). */
}
return 0xffffffff;
}
} /* namespace FxOS */

View File

@ -40,7 +40,8 @@ enum Token {
/* Array dereferencing */
AT_R0RN, AT_R0RM, AT_R0GBR,
/* Tags */
TAG_RET, TAG_UNCONDJUMP, TAG_CONDJUMP, TAG_CALL, TAG_DELAY, TAG_ISLOT,
TAG_RET, TAG_UNCONDJUMP, TAG_CONDJUMP, TAG_DYNJUMP, TAG_CALL, TAG_DELAY,
TAG_ISLOT,
};
/* Instruction opcode pattern */
@ -121,6 +122,7 @@ space [ \t]+
"%ret" { return TAG_RET; }
"%uncondjump" { return TAG_UNCONDJUMP; }
"%condjump" { return TAG_CONDJUMP; }
"%dynjump" { return TAG_DYNJUMP; }
"%call" { return TAG_CALL; }
"%delay" { return TAG_DELAY; }
"%islot" { return TAG_ISLOT; }
@ -294,6 +296,8 @@ int get_tag(int t)
return AsmInstruction::Tag::IsUnconditionalJump;
if(t == TAG_CONDJUMP)
return AsmInstruction::Tag::IsConditionalJump;
if(t == TAG_DYNJUMP)
return AsmInstruction::Tag::IsDynamicJump;
if(t == TAG_CALL)
return AsmInstruction::Tag::IsCall;
if(t == TAG_DELAY)

View File

@ -35,7 +35,7 @@ bool CfgPass::analyzeInstruction(uint32_t pc, OldInstruction &i)
"terminal" to avoid the computation!) */
uint32_t jmptarget = 0xffffffff;
if(i.inst->isAnyJump()) {
if(i.inst->isAnyStaticJump()) {
auto &args = i.inst->args;
if(i.inst->arg_count != 1 || args[0].kind != AsmArgument::PcJump) {
@ -86,14 +86,14 @@ bool CfgPass::analyzeInstruction(uint32_t pc, OldInstruction &i)
}
slot.delayslot = true;
slot.terminal = i.inst->isReturn();
slot.terminal = i.inst->isReturn() || i.inst->isDynamicJump();
slot.jump = i.inst->isUnconditionalJump();
slot.condjump = i.inst->isConditionalJump();
slot.jmptarget = jmptarget;
}
/* Otherwise, use standard properties */
else if(!i.inst->hasDelaySlot()) {
i.terminal = i.inst->isReturn();
i.terminal = i.inst->isReturn() || i.inst->isDynamicJump();
i.jump = i.inst->isUnconditionalJump();
i.condjump = i.inst->isConditionalJump();
i.jmptarget = jmptarget;

View File

@ -10,6 +10,7 @@
#include <fxos/passes/pcrel.h>
#include <fxos/passes/syscall.h>
#include <fxos/view/assembly.h>
#include <fxos/function.h>
#include <fxos/util/Timer.h>
#include <fxos/util/log.h>
@ -37,7 +38,7 @@ static void disassemble(
ok = p.analyzeAllInstructions();
}
}
else if(pass == "print") {
else if(pass == "print" && address + 1) {
viewAssemblyLegacyAddress(binary, address);
}
else {
@ -53,6 +54,12 @@ static void disassemble(
break;
}
}
if(address + 1) {
printf("&<<<< function test >>>>&\n");
Function f(binary, address);
f.exploreFunctionAt(address);
}
}
//---
@ -102,7 +109,12 @@ void _d(Session &session, std::variant<long, Range> location)
for(uint32_t pc = range.start; pc < range.end; pc += 2)
b->vspace().disasm.getInstructionAt(pc, true);
disassemble(*b, {"pcrel", /*"constprop",*/ "syscall", "print"}, -1);
disassemble(*b, {"pcrel", /*"constprop",*/ "syscall"}, -1);
MemoryRegion r;
r.start = range.start;
r.end = range.end - 1;
viewAssemblyLegacyRegion(*b, r);
}
else {
uint32_t address = std::get<long>(location);