From b8faddce5b99a47e94a8c5e2d2caae0a54fd8ac8 Mon Sep 17 00:00:00 2001
From: Lephenixnoir
Date: Sun, 16 Feb 2020 00:22:05 +0100
Subject: [PATCH] add a detailed README

---
 Makefile  |   1 -
 README.md | 230 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 230 insertions(+), 1 deletion(-)
 create mode 100644 README.md

diff --git a/Makefile b/Makefile
index 4dc68b7..cf5984c 100644
--- a/Makefile
+++ b/Makefile
@@ -105,7 +105,6 @@ install: $(TARGETS)
 
 uninstall:
 	rm -f $(TARGETS:%=$(PREFIX)/%)
-	rm -rf $(PREFIX)/share/fxos
 
 #
 #  Cleaning
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0b45352
--- /dev/null
+++ b/README.md
@@ -0,0 +1,230 @@
+# fxos
+
+fxos is an extended disassembler specifically used to reverse-engineer the OS,
+the bootcode, and syscalls. It used to be part of the
+[fxSDK](/Lephenixnoir/fxsdk). If you have a use for fxos, then be sure to also
+check the [Planète Casio bible](https://bible.planet-casio.com/), which gathers
+most of the reverse-engineering knowledge and research of the community.
+
+fxos runs on Linux and should build successfully on macOS. If there are
+compatibility issues with your favorite system, let me know.
+
+fxos is not currently complete; it's definitely good enough for many practical
+uses, but the more advanced analysis tools are not there yet. Hang on.
+
+## Building
+
+fxos is mainly standalone; to build, you will need the following tools. The
+versions indicated are the ones I use, and clearly not the minimum
+requirements.
+
+* g++ (9.2.0)
+* flex (2.6.4) and bison (3.5)
+* make (e.g. 4.2.1)
+
+The only configure option is the install path; it is specified on the
+command-line to make. By default the only installed file is the fxos binary,
+which goes to `$PREFIX/bin`. The default prefix is `$HOME/.local`.
+
+```sh
+% make
+% make install
+# or, for instance:
+% make PREFIX=/usr
+% make install PREFIX=/usr
+```
+
+## Setting up the library
+
+fxos works with a library of files ranging from OS binaries to assembler
+instruction tables to lists of named syscalls. These resources are mostly
+public, but some of the reverse-engineering results of the
+community are kept private.
+
+A set of base files for a working library can be found [on my section of the
+Planète Casio bible](https://bible.planet-casio.com/lephenixnoir/fxos-library/).
+You can use your own files, but you probably want the assembler tables anyway.
+
+Next, fxos should be told where to find these files. A small configuration file
+should be added at `$HOME/.config/fxos/config` to do this. The configuration
+file specifies two types of information:
+
+* Where the library folders are; this is used to resolve relative paths.
+* Which folders in the library contain fxos data files.
+
+With the default library, the configuration file should look like this:
+
+```
+library: /path/to/fxos-library
+load: /path/to/fxos-library/asm
+load: /path/to/fxos-library/targets
+load: /path/to/fxos-library/symbols
+```
+
+This means that fxos data files will be automatically loaded at startup from
+the `asm`, `targets` and `symbols` directories. Targets refer to OS files and
+RAM dumps by path, and these paths will be interpreted relative to the
+`fxos-library` folder. If you create `$PREFIX/share/fxos`, it will also be used
+as if mentioned on a `library:` line.
+
+## Working with fxos data files
+
+fxos data files are used to input documentation into fxos. There are currently
+three types of data files:
+
+* Assembler decoding tables (`type: assembly`);
+* Target descriptions (`type: target`);
+* Symbol definitions to name registers and syscalls (`type: symbols`).
+
+They all consist of a short dictionary-like header ended with three dashes, and
+a body whose syntax varies depending on the type of file. Here is the data file
+`targets/fx@3.10.txt`:
+
+```
+type: target
+name: fx@3.10
+---
+
+ROM: os/fx/3.10/3.10.bin
+ROM_P2: os/fx/3.10/3.10.bin
+
+RAM: os/fx/3.10/RAM.bin
+RAM_P2: os/fx/3.10/RAM.bin
+
+RS: os/fx/3.10/RS.bin
+```
+
+The header indicates the type (needed to select the proper parser to read the
+body!) and the name of the target. The concept of target is detailed below.
+This file references other files from the `os` folder of the library.
+
+At startup, directories mentioned as `load:` in the configuration file are
+traversed recursively and all files there are loaded as data files.
+
+## Targets
+
+A target is the system that you want to study. Usually, it's an OS file, but it
+occurs at several places in memory (namely at the start of P1 and P2), and it
+can use data in RAM and RS memory. A target keeps all these memory regions
+together.
+
+The header of a target must contain:
+* `type: target`
+* A value for the `name` property, which is used to refer to that target.
+
+The body of a target consists of a list of *bindings*, which are mappings of
+files into areas of the virtual memory. The syntax to specify a binding is
+`<region>: <file>`, where:
+* The region can be a named region such as `ROM` or `RAM_P2`. The names and
+  definitions of the available memory regions can be found in
+  [`lib/memory.cpp`](lib/memory.cpp).
+* The region can be `<address>(<size>)`, where both address and size are
+  specified in hexadecimal without prefix. For example, `fd800000(800)` is
+  equivalent to `RS`.
+* The file path must be relative to one of the library directories.
+
+An example is shown above.
+
+The target can then be referred to by name on the command-line. For instance,
+general information about version 3.10 of the fx-9860G III OS can be queried by
+running `fxos info fx@3.10`.
+
+## Assembly tables
+
+Assembly tables describe the binary instruction set of the processor. It is
+unlikely that they will need to be modified any time soon.
+
+The header of an assembly table consists of:
+* `type: assembly`
+* Optionally, a name, used to track files in case an opcode conflict occurs
+  (when two instructions can be instantiated into the same 16-bit opcode).
+
+The body is a list of instructions. Each line consists of:
+* The opcode pattern, a 16-character string using `01nmdi`.
+* A mnemonic.
+* Zero, one or two arguments among a finite set.
+
+Here is an excerpt from the SH-4A extensions table.
+
+```
+type: assembly
+name: sh-4a-extensions
+---
+
+0000nnnn01110011 movco.l r0, @rn
+0000mmmm01100011 movli.l @rm, r0
+0100mmmm10101001 movua.l @rm, r0
+0100mmmm11101001 movua.l @rm+, r0
+0000nnnn11000011 movca.l r0, @rn
+```
+
+Internally, fxos keeps a table with all 65k opcodes and fills it with instances
+of instructions described in assembly tables.
+
+## Symbol tables
+
+Symbol tables help keep things symbolic by giving names to objects that arise
+during disassembly. Currently they track syscalls and raw addresses (typically
+of peripheral modules).
+
+The header of a symbol table consists of:
+* `type: symbols`
+* Optionally, a name for the table.
+
+The body is a list of symbols described as `<source> <name>`, where:
+* The source can be a raw hexadecimal address, for example `ff2f0004`.
+* The source can be a syscall number, written in hexadecimal with a leading
+  percent sign, for example `%03b`.
+* The name should be vaguely C-compliant. Dots are allowed.
+
+Here is a mixed example with both syscalls and addresses.
+
+```
+type: symbols
+name: mixed-example
+---
+
+ff000020 TRA
+ff000024 EXPEVT
+ff000028 INTEVT
+ff2f0004 EXPMASK
+
+%42c Bfile_OpenFile_OS
+%42d Bfile_CloseFile_OS
+%42e Bfile_GetMediaFree_OS
+%42f Bfile_GetFileSize_OS
+```
+
+## Command-line interface
+
+The command-line interface (currently) has three commands, which are detailed
+in the interactive help.
+
+* The `library` command shows the targets and assembly tables found in the
+  library, with minimal information. There is a lot of room to make it more
+  versatile.
+* The `info` command shows a summary of an OS target. This includes versions,
+  checksums, and basic syscall autodetection.
+* The `disasm` command is the main powerhouse of the tool. It disassembles
+  functions with smart function end detection, resolves references to jumps,
+  computes PC-relative loads, and identifies syscalls and peripheral
+  registers.
+
+Some of the advertised interface is not yet implemented:
+
+* The `analyze` command is conceived as a way to dig deep into a particular
+  object to understand what it is used for. An example would be: given a 32-bit
+  value, find all places in the code where it is loaded from memory, and match
+  these places with the known OS structure to see what kind of code uses it.
+* The location specified `:` is not supported right now, though I
+  don't know how long I'll last without it.
+
+## Reporting issues and results
+
+Any bug reports, issues and improvement suggestions are welcome. See the
+[bug tracker](/Lephenixnoir/fxos/issues).
+
+If you have reverse-engineering results to share, the best place to do so is on
+the [Planète Casio bible](https://bible.planet-casio.com). Ping me or
+Breizh_craft on the Planète Casio shoutbox to have SSH access set up for
+you.