From 54721cac93e6b5a47029bce57c714b990e300341 Mon Sep 17 00:00:00 2001 From: Lephenixnoir Date: Sun, 6 Mar 2022 23:39:44 +0000 Subject: [PATCH] README for the shell interface --- README.md | 233 ++++++++++++++++++------------------------------- shell/d.cpp | 2 + shell/main.cpp | 2 +- 3 files changed, 88 insertions(+), 149 deletions(-) diff --git a/README.md b/README.md index fad1a68..c3a174a 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,30 @@ # fxos fxos is an extended disassembler specifically used to reverse-engineer the OS, -the bootcode, and syscalls. It used to be part of the +bootcode, and syscalls of CASIO fx and fx-CG series. It used to be part of the [fxSDK](/Lephenixnoir/fxsdk). If you have a use for fxos, then be sure to also check the [Planète Casio bible](https://bible.planet-casio.com/), which gathers most of the reverse-engineering knowledge and research of the community. +If you're familiar with IDA, Ghidra, or other industry-grade +reverse-engineering tools, then fxos won't be able to compete. This is more of +a scripting playground with very OS-centric features for me. Some of the things +it can do that usual tools might not do directly include: + +* Finding OS-specific data like bootcode/OS headers/footers, dates, versions +* Computing and checking checksums +* Analyzing syscall tables and consistently identifying syscall table entries +* (TODO) Comparing functions across OS versions to find changes + +On the other hand, there is no call graph, cross-reference, or function type +analysis (yet). I have plans for a simple abstract interpreter to bridge some +of the gap between pure disassembly and decompilation. + fxos runs on Linux and should build successfully on MacOS. If there are compatibility issues with your favorite system, let me know. -fxos is not currently complete; it's definitely good enough for many practical -uses, but the overly broken analysis tools are not there yet. Hang on. 
+**Note**: The [fxdoc repository](/Lephenixnoir/fxdoc) is not up-to-date with +this version of fxos (yet). ## Building @@ -19,7 +33,7 @@ versions indicated are the ones I use, and clearly not the minimum requirements. * g++ (9.2.0) -* flex (2.6.4) and bison (3.5) +* flex (2.6.4) * CMake (3.15) and make (eg. 4.2.1) The only real configure option is the install path. CMake's default is @@ -33,110 +47,78 @@ The only real configure option is the install path. CMake's default is ## Setting up the library fxos works with a library of files ranging from OS binaries to assembler -instruction tables to lists of named syscalls. These resources are usually -public for the most part, but some of the reverse-engineering results of the -community are kept private. +instruction tables to scripts. The library is formed of one or more folders +defined in the `FXOS_PATH` environment variable. -A set of base files for a working library can be found in the -[`base-library` folder](base-library) of this repository, which includes a -suitable configuration file (but not the actual OS files because Git would not -appreciate it). But unless you want to redo the research by yourself, I suggest -using shared community data from the [fxdoc repository](/Lephenixnoir/fxdoc). +Folders in the path serve two purposes: +* Any `fxosrc` file at the root of a folder is executed at startup. +* All paths are interpreted relative to the `FXOS_PATH`. -Next, fxos should be told where to find these files. A small configuration file -should be added at `$HOME/.config/fxos/config` to do this. The configuration -file specifies two types of information: +Unless you want to redo the research by yourself, I suggest using shared +community data from the [fxdoc repository](/Lephenixnoir/fxdoc). New folders +could be created easily on the same model; read the `fxosrc` script to see how +it is structured. -* Where are the library folders; this is used to resolve relative paths. 
-* Which folders in the library contain fxos data files. +**TODO**: fxdoc is still using an older version of fxos. -With the default library, the configuration file should look like this: +## Main concepts + +fxos has a command-line interface *kind of* like rizin. Type `?` to get a list +of commands, and any command name followed by `?` to get help on a particular +command (eg. `vc?`). + +The dot command `.` is used to run a script, which is a file with a series of +fxos commands. This is used at startup to run every `fxosrc` script found in +the `FXOS_PATH`. + +**Notations** + +* Identifiers/names are C identifiers but dots (`.`) are allowed. +* Usual decimal, hex (`0x`), binary (`0b`) values. +* Syscalls are identified with `%`, such as `%01e`. +* `$` is the current position in the selected virtual space. +* Commands accept arithmetic but only within parentheses; you can write + `e (1+2)` but not `e 1+2`. +* Ranges can be specified as `:` or `..`. +* Paths should use quotes: `"/os/fx/3.10/3.10.bin"`. Only identifiers/names can + be written without quotes in commands. +* Commands can be chained with `;`. +* Anything from a `#` to end of line is a comment. + +**Virtual spaces** + +A *virtual space* is an emulation of the calculator's virtual memory. Usually +there is one for each OS being studied. Each virtual space has a number of +*bindings*, each of which maps a virtual address to a file (usually a dump +of the calculator's ROM or RAM). Use `vl` to show the virtual spaces and their +bindings. The name of the current virtual space is shown in the prompt along +with the current position, for instance: ``` -library: /path/to/base-library -load: /path/to/base-library/asmtables -load: /path/to/base-library/targets -load: /path/to/base-library/symbols +cg_3.60 @ 0x80000000> ``` -This means that fxos data files will be automatically loaded at startup from -the `asmtables`, `targets` and `symbols` directories. 
Targets refer to OS files -and RAM dumps by path, and these paths will be interpreted relatively to the -`base-library` folder. +A new empty space can be created with `vc`, and then files can be mapped +manually with `vm`. File paths are interpreted relative to `FXOS_PATH` folders +even if they start with `/`. Alternatively, a new virtual space can be created +and initialized by running a script with the `vct` command. -## Working with fxos data files +Finally, the `vs` command is used to switch between different virtual spaces. -fxos data files are used to input documentation into fxos. There are currently -three types of data files: +**Symbols** -* Assembler decoding tables (`type: assembly`); -* Target descriptions (`type: target`); -* Symbol definitions to name registers and syscalls (`type: symbols`). +Each virtual space can have symbols defined, which are names associated to +either addresses or syscall numbers. `sa` will define a new symbol at an +explicit address, `ss` will define a new symbol at a syscall entry (which is +kept symbolic, ie. it will work across different OS versions) and `sl` lists +all symbols for the current virtual space. -They all consist of a short dictionary-like header ended with three dashes, and -a body whose syntax varies depending on the type of file. Here is the data file -`targets/fx@3.10.txt`: +## File formats -``` -type: target -name: fx@3.10 ---- +Besides fxos scripts and the actual binary files being used, there is currently +only one other type of data file: assembly instruction listings. See +`asm/sh3.txt` for an explanation of the syntax; essentially each line has: -ROM: os/fx/3.10/3.10.bin -ROM_P2: os/fx/3.10/3.10.bin - -RAM: os/fx/3.10/RAM.bin -RAM_P2: os/fx/3.10/RAM.bin - -RS: os/fx/3.10/RS.bin -``` - -The header indicates the type (needed to select the proper parser to read the -body!) and the name of the target. The concept of target is detailed below. 
-This file references other files from the `os` folder of the library. - -At startup, directories mentioned as `load:` in the configuration file are -traversed recursively and all files there are loaded as data files. - -## Targets - -A target is the system that you want to study. Usually, it's an OS file, but it -occurs at several places in memory (namely at the start of P1 and P2), and it -can use data in RAM and RS memory. A target keeps all these memory regions -together. - -The header of a target must contain: -* `type: target` -* A value for the `name` property, which is used to refer to that target. - -The body of target consists of a list of *bindings*, which are mappings of -files into areas of the virtual memory. The syntax to specify a binding is -`: `, where: -* The region can be a named region such as `ROM` or `RAM_P2`. The names and - definitions of defined memory regions can be found in - [`lib/memory.cpp`](lib/memory.cpp). -* The region can be `
()`, where both address and size are - specified in hexadecimal without prefix. For example, `fd800000(800)` is - equivalent to `RS`. -* The file path must be relative to one of the library directories. - -An example is shown above. - -The target can then be referred to by name on the command-line. For instance, -general information about version 3.10 of the fx-9860G III OS can be queried by -running `fxos info fx@3.10`. - -## Assembly tables - -Assembly tables describe the binary instruction set of the processor. It is -unlikely that they will need to be modified any time soon. - -The header of an assembly table consists of: -* `type: assembly` -* Optionally, a name, used to track files in case an opcode conflict occurs - (when two instructions can be instantiated into the same 16-bit opcode). - -The body is a list of instructions. Each line consists of: * The opcode pattern, a 16-character string using `01nmdi`. * A mnemonic. * Zero, one or two arguments among a finite set. @@ -155,65 +137,20 @@ name: sh-4a-extensions 0000nnnn11000011 movca.l r0, @rn ``` -Internally, fxos keeps a table with all 65k opcodes and fills it with instances -of instructions described in assembly tables. +Internally, fxos keeps a table with all 65536 opcodes and fills it with +instances of instructions described in assembly tables. -## Symbol tables - -Symbol tables help keep things symbolic by giving names to objects that arise -during disassembly. Currently it tracks syscalls and raw addresses (typically -of peripheral modules). - -The header of a symbol table consists of: -* `type: symbols` -* Optionally, a name for the table. - -The body is a list of symbols described as ` `, where: -* The source can be a raw hexadecimal address, for example `ff2f0004`. -* The source can be a syscall number, written in hexadecimal with a leading - percent sign, for example `%03b`. -* The name should be vaguely C-compliant. Dots are allowed. - -Here is a mixed example with both syscalls and address. 
+(TODO) Disassembly listings are intended to be produced and maintained by fxos +while still being edited by hand. In order for this to work properly, manual +edits should only use `#`-comments, either at the start of a line or with a `#` +symbol followed by a space (to distinguish from constants like `#3`): ``` -type: symbols -name: mixed-example ---- - -ff000020 TRA -ff000024 EXPEVT -ff000028 INTEVT -ff2f0004 EXPMASK - -%42c Bfile_OpenFile_OS -%42d Bfile_CloseFile_OS -%42e Bfile_GetMediaFree_OS -%42f Bfile_GetFileSize_OS +# Set SR.BL = 1 (block interrupt) and SR.IMASK = 0x00*0 (error ?) + 4143a: 04 02 stc sr,r4 # get SR register. + 4143c: e5 10 mov #16,r5 # r5 = 0x00000010 ``` -## Command-line interface - -The command-line interface (currently) has three commands, which are detailed -in the interactive help. - -* The `library` command show the targets and assembly tables found in the - library, with minimal information. There is a lot of room to make it more - versatile. -* The `info` command shows a summary of an OS target. This includes versions, - checksums, and basic syscall autodetection. -* The `disasm` command is the main powerhouse of the tool. It disassembles - functions with smart function end detection, resolves references to jumps, - computes PC-relative loads, and identifies syscalls and peripheral - registers. - -Some of the advertised interface is not yet implemented: - -* The `analyze` command is conceived as a way to dig deep into a particular - object to understand what it is used for. An example would be: given a 32-bit - value, find all places in the code where it is loaded from memory, and match - these places with the known OS structure to see what kind of code uses it. - ## Reporting issues and results Any bug reports, issues and improvement suggestions are welcome. 
See the diff --git a/shell/d.cpp b/shell/d.cpp index c142af0..b1728fd 100644 --- a/shell/d.cpp +++ b/shell/d.cpp @@ -63,6 +63,8 @@ static void disassemble(Session &session, Disassembly &disasm, static uint32_t parse_d(Session &session, Parser &parser) { + if(!session.current_space) + return 0; uint32_t address = session.current_space->cursor; if(!parser.at_end()) diff --git a/shell/main.cpp b/shell/main.cpp index 3d52751..41f75c5 100644 --- a/shell/main.cpp +++ b/shell/main.cpp @@ -158,7 +158,7 @@ void _dot(Session &s, std::vector const &files, bool absolute) static std::string read_interactive(Session const &s, bool &leave) { - std::string prompt = "(empty)> "; + std::string prompt = "no_vspace> "; if(s.current_space) { std::string name = "(none)";