Advanced OS disassembler and reverse-engineering tool.
Lephenixnoir ca1217af1b
lib/load-asm: greatly improve loading time with less strings
2021-03-16 13:40:36 +01:00

fxos

fxos is an extended disassembler specifically used to reverse-engineer the OS, the bootcode, and syscalls. It used to be part of the fxSDK. If you have a use for fxos, then be sure to also check the Planète Casio bible, which gathers most of the reverse-engineering knowledge and research of the community.

fxos runs on Linux and should build successfully on macOS. If there are compatibility issues with your favorite system, let me know.

fxos is not currently complete; it's definitely good enough for many practical uses, but the more advanced analysis tools are not there yet. Hang on.

Building

fxos is mainly standalone; to build, you will need the following tools. The versions indicated are the ones I use, and clearly not the minimum requirements.

  • g++ (9.2.0)
  • flex (2.6.4) and bison (3.5)
  • make (e.g. 4.2.1)

The only build option is the install prefix, which is passed to make on the command line. By default the only installed file is the fxos binary, which goes to $PREFIX/bin. The default prefix is $HOME/.local.

% make
% make install
# or, for instance:
% make PREFIX=/usr
% make install PREFIX=/usr

Setting up the library

fxos works with a library of files ranging from OS binaries to assembler instruction tables to lists of named syscalls. Most of these resources are public, but some of the reverse-engineering results of the community are kept private.

A set of base files for a working library can be found in the base-library folder of this repository, which includes a suitable configuration file (but not the actual OS files because Git would not appreciate it). But unless you want to redo the research by yourself, I suggest using shared community data from the fxdoc repository.

Next, fxos should be told where to find these files. A small configuration file should be added at $HOME/.config/fxos/config to do this. The configuration file specifies two types of information:

  • Where the library folders are; this is used to resolve relative paths.
  • Which folders in the library contain fxos data files.

With the default library, the configuration file should look like this:

library: /path/to/base-library
load: /path/to/base-library/asmtables
load: /path/to/base-library/targets
load: /path/to/base-library/symbols

This means that fxos data files will be automatically loaded at startup from the asmtables, targets and symbols directories. Targets refer to OS files and RAM dumps by path, and these paths will be interpreted relative to the base-library folder.
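
To make the two roles of the configuration file concrete, here is a minimal Python sketch that reads the library: and load: lines and resolves a library-relative path. It is not how fxos itself reads the file; the function names are hypothetical, and rejecting unknown keys is an assumption.

```python
from pathlib import PurePosixPath

def parse_config(text):
    """Collect the `library:` and `load:` folders from a config file."""
    config = {"library": [], "load": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        # Assumption: any key other than library/load is an error.
        if not sep or key not in config:
            raise ValueError(f"unrecognized config line: {raw!r}")
        config[key].append(PurePosixPath(value.strip()))
    return config

def candidates(config, relative):
    """List where a library-relative path could live, one per library folder."""
    return [folder / relative for folder in config["library"]]

example = """\
library: /path/to/base-library
load: /path/to/base-library/asmtables
load: /path/to/base-library/targets
load: /path/to/base-library/symbols
"""
cfg = parse_config(example)
```

With this configuration, a path such as os/fx/3.10/3.10.bin written in a target file would be looked up under /path/to/base-library.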

Working with fxos data files

fxos data files are used to input documentation into fxos. There are currently three types of data files:

  • Assembler decoding tables (type: assembly);
  • Target descriptions (type: target);
  • Symbol definitions to name registers and syscalls (type: symbols).

They all consist of a short dictionary-like header ended with three dashes, and a body whose syntax varies depending on the type of file. Here is the data file targets/fx@3.10.txt:

type: target
name: fx@3.10
---

ROM: os/fx/3.10/3.10.bin
ROM_P2: os/fx/3.10/3.10.bin

RAM: os/fx/3.10/RAM.bin
RAM_P2: os/fx/3.10/RAM.bin

RS: os/fx/3.10/RS.bin

The header indicates the type (needed to select the proper parser to read the body!) and the name of the target. The concept of target is detailed below. This file references other files from the os folder of the library.
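
The header/body split can be sketched in a few lines of Python. This is only an illustration of the format described above, not the parser fxos uses; split_data_file is a hypothetical name, and treating a missing separator as an error is an assumption.

```python
def split_data_file(text):
    """Split a data file into its header dictionary and its raw body."""
    header = {}
    lines = text.splitlines()
    for index, raw in enumerate(lines):
        line = raw.strip()
        if line == "---":
            # Everything after the three dashes belongs to the body.
            return header, "\n".join(lines[index + 1:])
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {raw!r}")
        header[key.strip()] = value.strip()
    raise ValueError("missing '---' header terminator")

sample = """\
type: target
name: fx@3.10
---

ROM: os/fx/3.10/3.10.bin
"""
header, body = split_data_file(sample)
```

The type entry of the returned header is what a loader would inspect to choose the parser for the body.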

At startup, directories mentioned as load: in the configuration file are traversed recursively and all files there are loaded as data files.

Targets

A target is the system that you want to study. Usually, it's an OS file, but it occurs at several places in memory (namely at the start of P1 and P2), and it can use data in RAM and RS memory. A target keeps all these memory regions together.

The header of a target must contain:

  • type: target
  • A value for the name property, which is used to refer to that target.

The body of a target consists of a list of bindings, which are mappings of files into areas of the virtual memory. The syntax to specify a binding is <region>: <file>, where:

  • The region can be a named region such as ROM or RAM_P2. The names and definitions of defined memory regions can be found in lib/memory.cpp.
  • The region can be <address>(<size>), where both address and size are specified in hexadecimal without prefix. For example, fd800000(800) is equivalent to RS.
  • The file path must be relative to one of the library directories.

An example is shown above.
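
As an illustration of the two region forms, the following Python sketch parses a single binding line. The function name and return shape are hypothetical, and named regions are returned as-is rather than being looked up, since their definitions live in lib/memory.cpp.

```python
import re

# <address>(<size>), both in hexadecimal without prefix, e.g. fd800000(800)
EXPLICIT_REGION = re.compile(r"^([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)$")

def parse_binding(line):
    """Parse `<region>: <file>` into (region, path).

    The region is either a (start, size) tuple of integers for the
    explicit form, or the region name as a string.
    """
    region, sep, path = line.partition(":")
    if not sep:
        raise ValueError(f"not a binding: {line!r}")
    region = region.strip()
    path = path.strip()
    match = EXPLICIT_REGION.match(region)
    if match:
        start = int(match.group(1), 16)
        size = int(match.group(2), 16)
        return (start, size), path
    return region, path

explicit = parse_binding("fd800000(800): os/fx/3.10/RS.bin")
named = parse_binding("ROM_P2: os/fx/3.10/3.10.bin")
```

Here explicit holds the start address 0xfd800000 and the size 0x800 as integers, while named keeps the region name ROM_P2 for a later lookup.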

The target can then be referred to by name on the command-line. For instance, general information about version 3.10 of the fx-9860G III OS can be queried by running fxos info fx@3.10.

Assembly tables

Assembly tables describe the binary instruction set of the processor. It is unlikely that they will need to be modified any time soon.

The header of an assembly table consists of:

  • type: assembly
  • Optionally, a name, used to track files in case an opcode conflict occurs (when two instructions can be instantiated into the same 16-bit opcode).

The body is a list of instructions. Each line consists of:

  • The opcode pattern, a 16-character string using 01nmdi.
  • A mnemonic.
  • Zero, one or two arguments among a finite set.

Here is an excerpt from the SH-4A extensions table.

type: assembly
name: sh-4a-extensions
---

0000nnnn01110011  movco.l r0, @rn
0000mmmm01100011  movli.l @rm, r0
0100mmmm10101001  movua.l @rm, r0
0100mmmm11101001  movua.l @rm+, r0
0000nnnn11000011  movca.l r0, @rn

Internally, fxos keeps a table with all 65536 possible 16-bit opcodes and fills it with instances of instructions described in assembly tables.
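
The idea of instantiating patterns into a full opcode table can be sketched in Python as follows. This is a simplified model, not fxos's actual data structure: the function names are hypothetical, the table stores plain strings, and raising on a conflict is an assumption.

```python
def compile_pattern(pattern):
    """Turn a 16-character pattern into (mask, value, fields)."""
    if len(pattern) != 16:
        raise ValueError("opcode patterns must be 16 characters long")
    mask = value = 0
    fields = {}  # letter -> bit positions, most significant first
    for position, char in enumerate(pattern):
        bit = 15 - position
        if char in "01":
            mask |= 1 << bit
            value |= int(char) << bit
        elif char in "nmdi":
            fields.setdefault(char, []).append(bit)
        else:
            raise ValueError(f"unexpected character {char!r} in pattern")
    return mask, value, fields

def decode(opcode, compiled):
    """Return the field values if the opcode matches, else None."""
    mask, value, fields = compiled
    if (opcode & mask) != value:
        return None
    result = {}
    for letter, bits in fields.items():
        number = 0
        for bit in bits:
            number = (number << 1) | ((opcode >> bit) & 1)
        result[letter] = number
    return result

def fill_table(instructions):
    """Build a 65536-entry table mapping each opcode to its instruction."""
    table = [None] * 0x10000
    for pattern, text in instructions:
        mask, value, _ = compile_pattern(pattern)
        for opcode in range(0x10000):
            if (opcode & mask) == value:
                if table[opcode] is not None:
                    raise ValueError(f"opcode conflict at {opcode:04x}")
                table[opcode] = text
    return table

excerpt = [
    ("0000nnnn01110011", "movco.l r0, @rn"),
    ("0000mmmm01100011", "movli.l @rm, r0"),
    ("0100mmmm10101001", "movua.l @rm, r0"),
    ("0100mmmm11101001", "movua.l @rm+, r0"),
    ("0000nnnn11000011", "movca.l r0, @rn"),
]
table = fill_table(excerpt)
fields = decode(0x0573, compile_pattern("0000nnnn01110011"))
```

Each of the five patterns in the excerpt leaves four bits free, so each one fills 16 entries of the table.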

Symbol tables

Symbol tables help keep things symbolic by giving names to objects that arise during disassembly. Currently they track syscalls and raw addresses (typically of peripheral modules).

The header of a symbol table consists of:

  • type: symbols
  • Optionally, a name for the table.

The body is a list of symbols described as <source> <name>, where:

  • The source can be a raw hexadecimal address, for example ff2f0004.
  • The source can be a syscall number, written in hexadecimal with a leading percent sign, for example %03b.
  • The name should be vaguely C-compliant. Dots are allowed.

Here is a mixed example with both syscalls and addresses.

type: symbols
name: mixed-example
---

ff000020  TRA
ff000024  EXPEVT
ff000028  INTEVT
ff2f0004  EXPMASK

%42c  Bfile_OpenFile_OS
%42d  Bfile_CloseFile_OS
%42e  Bfile_GetMediaFree_OS
%42f  Bfile_GetFileSize_OS
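
A short Python sketch shows how the two kinds of sources can be told apart when reading such a body. The function names and the tuple layout are hypothetical and only illustrate the syntax described above.

```python
def parse_symbol(line):
    """Parse `<source> <name>` into (kind, number, name)."""
    source, name = line.split(None, 1)
    name = name.strip()
    if source.startswith("%"):
        # Syscall number: hexadecimal after the percent sign.
        return ("syscall", int(source[1:], 16), name)
    # Otherwise a raw hexadecimal address.
    return ("address", int(source, 16), name)

def parse_symbols(body):
    """Parse every non-empty line of a symbol table body."""
    return [parse_symbol(line) for line in body.splitlines() if line.strip()]

body = """\
ff2f0004  EXPMASK

%42c  Bfile_OpenFile_OS
"""
symbols = parse_symbols(body)
```

The result keeps addresses and syscall numbers as integers, so they can be compared directly with values found during disassembly.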

Command-line interface

The command-line interface (currently) has three commands, which are detailed in the interactive help.

  • The library command shows the targets and assembly tables found in the library, with minimal information. There is a lot of room to make it more versatile.
  • The info command shows a summary of an OS target. This includes versions, checksums, and basic syscall autodetection.
  • The disasm command is the main powerhouse of the tool. It disassembles functions with smart function end detection, resolves references to jumps, computes PC-relative loads, and identifies syscalls and peripheral registers.
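
To illustrate what computing a PC-relative load involves, here is a Python sketch based on the SuperH addressing rules for mov.w @(disp,PC) and mov.l @(disp,PC). It is not taken from fxos's source: the function name is hypothetical, and instructions located in delay slots are deliberately not handled.

```python
def pc_relative_target(pc, opcode):
    """Return (size, address) for a PC-relative load, or None.

    `pc` is the address of the instruction itself.
    """
    disp = opcode & 0xFF
    top = opcode >> 12
    if top == 0x9:
        # mov.w @(disp,PC),Rn: 2-byte load at PC + 4 + disp * 2
        return 2, (pc + 4 + disp * 2) & 0xFFFFFFFF
    if top == 0xD:
        # mov.l @(disp,PC),Rn: 4-byte load, PC is aligned down to 4 first
        return 4, ((pc & ~3) + 4 + disp * 4) & 0xFFFFFFFF
    return None

word_load = pc_relative_target(0x80010002, 0x9103)
long_load = pc_relative_target(0x80010002, 0xD201)
```

Once the address is known, a disassembler can read the value stored there and show it next to the instruction, or match it against a symbol table.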

Some of the advertised interface is not yet implemented:

  • The analyze command is conceived as a way to dig deep into a particular object to understand what it is used for. An example would be: given a 32-bit value, find all places in the code where it is loaded from memory, and match these places with the known OS structure to see what kind of code uses it.

Reporting issues and results

Any bug reports, issues and improvement suggestions are welcome. See the bug tracker.

If you have reverse-engineering results to share, the best place to do so is on the Planète Casio bible. Ping me or Breizh_craft on the Planète Casio shoutbox to have an SSH access set up for you.