README for the shell interface

Lephenixnoir 2022-03-06 23:39:44 +00:00
parent 2394725074
commit 54721cac93
Signed by: Lephenixnoir
GPG Key ID: 1BBA026E13FC0495
3 changed files with 88 additions and 149 deletions

README.md

@@ -1,16 +1,30 @@
# fxos
fxos is an extended disassembler specifically used to reverse-engineer the OS,
the bootcode, and syscalls. It used to be part of the
bootcode, and syscalls of CASIO fx and fx-CG series. It used to be part of the
[fxSDK](/Lephenixnoir/fxsdk). If you have a use for fxos, then be sure to also
check the [Planète Casio bible](https://bible.planet-casio.com/), which gathers
most of the reverse-engineering knowledge and research of the community.
If you're familiar with IDA, Ghidra, or other industry-grade
reverse-engineering tools, then fxos won't be able to compete. This is more of
a scripting playground with very OS-centric features for me. Some of the things
it can do that usual tools might not do directly include:
* Finding OS-specific data like bootcode/OS headers/footers, dates, versions
* Computing and checking checksums
* Analyzing syscall tables and consistently identifying syscall table entries
* (TODO) Comparing functions across OS versions to find changes
On the other hand, there is no call graph, cross-reference, or function type
analysis (yet). I have plans for a simple abstract interpreter to bridge some
of the gap between pure disassembly and decompilation.
fxos runs on Linux and should build successfully on macOS. If there are
compatibility issues with your favorite system, let me know.
fxos is not currently complete; it's definitely good enough for many practical
uses, but the overly broken analysis tools are not there yet. Hang on.
**Note**: The [fxdoc repository](/Lephenixnoir/fxdoc) is not up-to-date with
this version of fxos (yet).
## Building
@@ -19,7 +33,7 @@ versions indicated are the ones I use, and clearly not the minimum
requirements.
* g++ (9.2.0)
* flex (2.6.4) and bison (3.5)
* flex (2.6.4)
* CMake (3.15) and make (eg. 4.2.1)
The only real configure option is the install path. CMake's default is
@@ -33,110 +47,78 @@ The only real configure option is the install path. CMake's default is
## Setting up the library
fxos works with a library of files ranging from OS binaries to assembler
instruction tables to lists of named syscalls. These resources are usually
public for the most part, but some of the reverse-engineering results of the
community are kept private.
instruction tables to scripts. The library is formed of one or more folders
defined in the `FXOS_PATH` environment variable.
A set of base files for a working library can be found in the
[`base-library` folder](base-library) of this repository, which includes a
suitable configuration file (but not the actual OS files because Git would not
appreciate it). But unless you want to redo the research by yourself, I suggest
using shared community data from the [fxdoc repository](/Lephenixnoir/fxdoc).
Folders in the path serve two purposes:
* Any `fxosrc` file at the root of a folder is executed at startup.
* All paths are interpreted relative to the `FXOS_PATH`.
Next, fxos should be told where to find these files. A small configuration file
should be added at `$HOME/.config/fxos/config` to do this. The configuration
file specifies two types of information:
Unless you want to redo the research by yourself, I suggest using shared
community data from the [fxdoc repository](/Lephenixnoir/fxdoc). New folders
could be created easily on the same model; read the `fxosrc` script to see how
it is structured.
* Where are the library folders; this is used to resolve relative paths.
* Which folders in the library contain fxos data files.
**TODO**: fxdoc is still using an older version of fxos.
With the default library, the configuration file should look like this:
## Main concepts
fxos has a command-line interface *kind of* like rizin. Type `?` to get a list
of commands, and any command name followed by `?` to get help on a particular
command (eg. `vc?`).
The dot command `.` is used to run a script, which is a file with a series of
fxos commands. This is used at startup to run every `fxosrc` script found in
the `FXOS_PATH`.
**Notations**
* Identifiers/names are C identifiers but dots (`.`) are allowed.
* Usual decimal, hex (`0x`), binary (`0b`) values.
* Syscalls are identified with `%<hex>`, such as `%01e`.
* `$` is the current position in the selected virtual space.
* Commands accept arithmetic but only within parentheses; you can write
`e (1+2)` but not `e 1+2`.
* Ranges can be specified as `<start>:<length>` or `<start>..<end>`.
* Paths should use quotes: `"/os/fx/3.10/3.10.bin"`. Only identifiers/names can
be written without quotes in commands.
* Commands can be chained with `;`.
* Anything from a `#` to end of line is a comment.
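As an illustration of the value and range notations above, here is a minimal C++ sketch of a parser for `<start>:<length>` and `<start>..<end>`. It is not taken from the fxos sources: `parse_value`, `parse_range` and `Range` are hypothetical names, arithmetic in parentheses and `$` are not handled, and treating `<end>` as exclusive is an assumption made only for this example.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical range type; `end` is treated as exclusive (assumption).
struct Range {
    uint32_t start;
    uint32_t end;
};

// Parse a decimal, hex (0x) or binary (0b) literal.
std::optional<uint32_t> parse_value(std::string_view s)
{
    int base = 10;
    if(s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'b')) {
        base = (s[1] == 'x') ? 16 : 2;
        s.remove_prefix(2);
    }
    if(s.empty())
        return std::nullopt;

    uint32_t v = 0;
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), v, base);
    // Reject overflow, invalid digits, and trailing characters
    if(ec != std::errc() || ptr != s.data() + s.size())
        return std::nullopt;
    return v;
}

// Parse "<start>:<length>" or "<start>..<end>".
std::optional<Range> parse_range(std::string_view s)
{
    if(auto pos = s.find(".."); pos != std::string_view::npos) {
        auto a = parse_value(s.substr(0, pos));
        auto b = parse_value(s.substr(pos + 2));
        if(!a || !b || *b < *a)
            return std::nullopt;
        return Range{*a, *b};
    }
    if(auto pos = s.find(':'); pos != std::string_view::npos) {
        auto a = parse_value(s.substr(0, pos));
        auto len = parse_value(s.substr(pos + 1));
        if(!a || !len)
            return std::nullopt;
        // Overflow of start + length is not checked in this sketch
        return Range{*a, *a + *len};
    }
    return std::nullopt;
}
```

With these assumptions, `0x80000000:0x100` and `0x80000000..0x80000100` describe the same range.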
**Virtual spaces**
A *virtual space* is an emulation of the calculator's virtual memory. Usually
there is one for each OS being studied. Each virtual space has a number of
*bindings*, each of which is a mapping from a virtual address to a file
(usually a dump of the calculator's ROM or RAM). Use `vl` to show the virtual
spaces and their bindings. The name of the current virtual space is shown in
the prompt along with the current position, for instance:
```
library: /path/to/base-library
load: /path/to/base-library/asmtables
load: /path/to/base-library/targets
load: /path/to/base-library/symbols
cg_3.60 @ 0x80000000>
```
This means that fxos data files will be automatically loaded at startup from
the `asmtables`, `targets` and `symbols` directories. Targets refer to OS files
and RAM dumps by path, and these paths will be interpreted relative to the
`base-library` folder.
A new empty space can be created with `vc`, and then files can be mapped
manually with `vm`. File paths are interpreted relative to `FXOS_PATH` folders
even if they start with `/`. Alternatively, a new virtual space can be created
and initialized by running a script with the `vct` command.
## Working with fxos data files
Finally, the `vs` command is used to switch between different virtual spaces.
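The rule that file paths are looked up under the `FXOS_PATH` folders even when they start with `/` can be sketched as follows. This is only an illustration, not the fxos implementation: `split_fxos_path` and `candidate_files` are hypothetical names, and the colon separator (as in the shell's `PATH`) is an assumption, since the README only says the variable holds one or more folders.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Split an FXOS_PATH-like value into folders.
// Assumption: folders are separated by ':' as in PATH.
std::vector<std::string> split_fxos_path(std::string_view fxos_path)
{
    std::vector<std::string> folders;
    while(!fxos_path.empty()) {
        std::size_t colon = fxos_path.find(':');
        std::string_view folder = fxos_path.substr(0, colon);
        if(!folder.empty())
            folders.emplace_back(folder);
        if(colon == std::string_view::npos)
            break;
        fxos_path.remove_prefix(colon + 1);
    }
    return folders;
}

// List the files that a path such as "/os/fx/3.10/3.10.bin" could refer to,
// one per library folder, in the order of the folders.
std::vector<std::string> candidate_files(std::string_view fxos_path,
    std::string_view path)
{
    // A leading '/' does not make the path absolute; drop it
    while(!path.empty() && path.front() == '/')
        path.remove_prefix(1);

    std::vector<std::string> result;
    for(std::string folder: split_fxos_path(fxos_path)) {
        if(folder.back() != '/')
            folder += '/';
        folder += path;
        result.push_back(std::move(folder));
    }
    return result;
}
```

A caller would then try each candidate in turn and map the first one that exists; that lookup order is also an assumption of this sketch.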
fxos data files are used to input documentation into fxos. There are currently
three types of data files:
**Symbols**
* Assembler decoding tables (`type: assembly`);
* Target descriptions (`type: target`);
* Symbol definitions to name registers and syscalls (`type: symbols`).
Each virtual space can have symbols defined, which are names associated with
either addresses or syscall numbers. `sa` will define a new symbol at an
explicit address, `ss` will define a new symbol at a syscall entry (which is
kept symbolic, ie. it will work across different OS versions) and `sl` lists
all symbols for the current virtual space.
They all consist of a short dictionary-like header ended with three dashes, and
a body whose syntax varies depending on the type of file. Here is the data file
`targets/fx@3.10.txt`:
## File formats
```
type: target
name: fx@3.10
---
Besides fxos scripts and the actual binary files being used, there is currently
only one other type of data file: assembly instruction listings. See
`asm/sh3.txt` for an explanation of the syntax; essentially each line has:
ROM: os/fx/3.10/3.10.bin
ROM_P2: os/fx/3.10/3.10.bin
RAM: os/fx/3.10/RAM.bin
RAM_P2: os/fx/3.10/RAM.bin
RS: os/fx/3.10/RS.bin
```
The header indicates the type (needed to select the proper parser to read the
body!) and the name of the target. The concept of target is detailed below.
This file references other files from the `os` folder of the library.
At startup, directories mentioned as `load:` in the configuration file are
traversed recursively and all files there are loaded as data files.
## Targets
A target is the system that you want to study. Usually, it's an OS file, but it
occurs at several places in memory (namely at the start of P1 and P2), and it
can use data in RAM and RS memory. A target keeps all these memory regions
together.
The header of a target must contain:
* `type: target`
* A value for the `name` property, which is used to refer to that target.
The body of a target consists of a list of *bindings*, which are mappings of
files into areas of the virtual memory. The syntax to specify a binding is
`<region>: <file>`, where:
* The region can be a named region such as `ROM` or `RAM_P2`. The names and
definitions of defined memory regions can be found in
[`lib/memory.cpp`](lib/memory.cpp).
* The region can be `<address>(<size>)`, where both address and size are
specified in hexadecimal without prefix. For example, `fd800000(800)` is
equivalent to `RS`.
* The file path must be relative to one of the library directories.
An example is shown above.
The target can then be referred to by name on the command-line. For instance,
general information about version 3.10 of the fx-9860G III OS can be queried by
running `fxos info fx@3.10`.
## Assembly tables
Assembly tables describe the binary instruction set of the processor. It is
unlikely that they will need to be modified any time soon.
The header of an assembly table consists of:
* `type: assembly`
* Optionally, a name, used to track files in case an opcode conflict occurs
(when two instructions can be instantiated into the same 16-bit opcode).
The body is a list of instructions. Each line consists of:
* The opcode pattern, a 16-character string using `01nmdi`.
* A mnemonic.
* Zero, one or two arguments among a finite set.
@@ -155,65 +137,20 @@ name: sh-4a-extensions
0000nnnn11000011 movca.l r0, @rn
```
Internally, fxos keeps a table with all 65k opcodes and fills it with instances
of instructions described in assembly tables.
Internally, fxos keeps a table with all 65536 opcodes and fills it with
instances of instructions described in assembly tables.
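To make the pattern format more concrete, here is a minimal C++ sketch of how a 16-character pattern such as `0000nnnn11000011` could be turned into a mask/value pair and matched against opcodes. It is an illustration only, not the fxos implementation: `OpcodePattern`, `compile_pattern`, `matches`, `field_n` and `count_instances` are hypothetical names, and only the `n` field is extracted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical compiled form of a pattern line
struct OpcodePattern {
    uint16_t mask = 0;   // bits fixed by '0' or '1' in the pattern
    uint16_t value = 0;  // expected value of the fixed bits
    uint16_t n_mask = 0; // bits that belong to the 'n' field
};

// Compile a 16-character pattern over the alphabet "01nmdi".
std::optional<OpcodePattern> compile_pattern(std::string_view text)
{
    if(text.size() != 16)
        return std::nullopt;

    OpcodePattern p;
    for(char c: text) {
        // The first character ends up in bit 15, the last in bit 0
        p.mask <<= 1;
        p.value <<= 1;
        p.n_mask <<= 1;
        if(c == '0' || c == '1') {
            p.mask |= 1;
            p.value |= (c == '1');
        }
        else if(c == 'n')
            p.n_mask |= 1;
        else if(c != 'm' && c != 'd' && c != 'i')
            return std::nullopt;
    }
    return p;
}

// An opcode is an instance of the pattern if all fixed bits agree.
bool matches(OpcodePattern const &p, uint16_t opcode)
{
    return (opcode & p.mask) == p.value;
}

// Gather the bits of the 'n' field, most significant first.
uint16_t field_n(OpcodePattern const &p, uint16_t opcode)
{
    uint16_t result = 0;
    for(int bit = 15; bit >= 0; bit--) {
        if(p.n_mask & (1 << bit))
            result = static_cast<uint16_t>((result << 1) | ((opcode >> bit) & 1));
    }
    return result;
}

// Count how many of the 65536 opcodes are instances of the pattern.
int count_instances(OpcodePattern const &p)
{
    int count = 0;
    for(uint32_t opcode = 0; opcode < 0x10000; opcode++)
        count += matches(p, static_cast<uint16_t>(opcode));
    return count;
}
```

Under these assumptions, filling the full opcode table amounts to running the loop in `count_instances` for every pattern and storing the instruction at each matching index. An opcode conflict would then be a slot that two patterns both match.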
## Symbol tables
Symbol tables help keep things symbolic by giving names to objects that arise
during disassembly. Currently it tracks syscalls and raw addresses (typically
of peripheral modules).
The header of a symbol table consists of:
* `type: symbols`
* Optionally, a name for the table.
The body is a list of symbols described as `<source> <name>`, where:
* The source can be a raw hexadecimal address, for example `ff2f0004`.
* The source can be a syscall number, written in hexadecimal with a leading
percent sign, for example `%03b`.
* The name should be vaguely C-compliant. Dots are allowed.
Here is a mixed example with both syscalls and addresses.
(TODO) Disassembly listings are intended to be produced and maintained by fxos
while still being edited by hand. In order for this to work properly, manual
edits should only use `#`-comments, either at the start of a line or with a `#`
symbol followed by a space (to distinguish from constants like `#3`):
```
type: symbols
name: mixed-example
---
ff000020 TRA
ff000024 EXPEVT
ff000028 INTEVT
ff2f0004 EXPMASK
%42c Bfile_OpenFile_OS
%42d Bfile_CloseFile_OS
%42e Bfile_GetMediaFree_OS
%42f Bfile_GetFileSize_OS
# Set SR.BL = 1 (block interrupt) and SR.IMASK = 0x00*0 (error ?)
4143a: 04 02 stc sr,r4 # get SR register.
4143c: e5 10 mov #16,r5 # r5 = 0x00000010
```
## Command-line interface
The command-line interface (currently) has three commands, which are detailed
in the interactive help.
* The `library` command shows the targets and assembly tables found in the
library, with minimal information. There is a lot of room to make it more
versatile.
* The `info` command shows a summary of an OS target. This includes versions,
checksums, and basic syscall autodetection.
* The `disasm` command is the main powerhouse of the tool. It disassembles
functions with smart function end detection, resolves references to jumps,
computes PC-relative loads, and identifies syscalls and peripheral
registers.
Some of the advertised interface is not yet implemented:
* The `analyze` command is conceived as a way to dig deep into a particular
object to understand what it is used for. An example would be: given a 32-bit
value, find all places in the code where it is loaded from memory, and match
these places with the known OS structure to see what kind of code uses it.
## Reporting issues and results
Any bug reports, issues and improvement suggestions are welcome. See the


@@ -63,6 +63,8 @@ static void disassemble(Session &session, Disassembly &disasm,
static uint32_t parse_d(Session &session, Parser &parser)
{
if(!session.current_space)
return 0;
uint32_t address = session.current_space->cursor;
if(!parser.at_end())


@@ -158,7 +158,7 @@ void _dot(Session &s, std::vector<std::string> const &files, bool absolute)
static std::string read_interactive(Session const &s, bool &leave)
{
std::string prompt = "(empty)> ";
std::string prompt = "no_vspace> ";
if(s.current_space) {
std::string name = "(none)";