# Binary format (Version 1)
**This format is a draft. It will be completed when version 1.0 of the
reference is released.**

The binary file is divided into four zones:

- the overall header;
- the leading character pool;
- the character pool;
- the data pool.

Multi-byte integer fields are encoded as **big endian**.

## Overall header
The file starts with an overall header describing the structure of the rest
of the file. Like every file in any binary format representing
a FONTCHARACTER reference set, the file starts with:

- Magic string (8 bytes): "CASIOFC\x7F";
- Version byte (1 byte) -- in this version of the format, it is 0x01.

If the magic string does not match, the file is either corrupted or not a
FONTCHARACTER reference set. If the version byte does not match, the file uses
a different version of the format, and you should tell the user that they need
an upgraded version of your utility (because you'll keep updating it... right?)
or a more recent utility.
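
The two checks above can be sketched as follows. This is only an illustration: the names `check_prefix` and `FormatError` are hypothetical and are not part of any reference implementation.

```python
MAGIC = b"CASIOFC\x7f"      # 8-byte magic string from the format description
SUPPORTED_VERSION = 0x01    # version byte for this version of the format


class FormatError(Exception):
    """Hypothetical error type for files that cannot be read."""


def check_prefix(data: bytes) -> int:
    """Validate the magic string and version byte, and return the version."""
    if len(data) < len(MAGIC) + 1:
        raise FormatError("file too short for the magic string and version byte")
    if data[:len(MAGIC)] != MAGIC:
        raise FormatError("bad magic string: corrupted or not a reference set")
    version = data[len(MAGIC)]
    if version != SUPPORTED_VERSION:
        raise FormatError(f"unsupported version {version:#04x}: upgrade needed")
    return version
```

Raising a distinct message for the version mismatch lets a caller tell the user to upgrade instead of reporting a corrupted file.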

From there, the file follows this specific version of the format.
The overall header continues with the following fields:

- Number of majors (1 byte): the number of entries in the leading
  character pool (second zone of the file);
- Number of characters (2 bytes): the total number of characters;
- Flags (1 byte): a bit field made of the following flags:

  - 0x01: Unicode is enabled (see character entry);
  - 0x02: CAT tokens are enabled (see character entry);
  - 0x04: Newcat tokens are enabled (see character entry);
  - 0x08: CTF tokens are enabled (see character entry);
  - 0x10: Casemul tokens are enabled (see character entry);
- Picture height (1 byte): the picture height;
- Picture format (2 bytes): the picture format used to represent the
  characters, taken from libcasio's [picture.h][picture.h].
  Allowed formats are:

  - 0x0100: monochrome with fill bits;
- Reserved (4 bytes): should be zero;
- Checksum (4 bytes): basic checksum for the leading character pool,
  character pool, and data pool (if zero, do not check the checksum);
- File size (4 bytes): the file size;
- Data zone size (4 bytes): the data zone size.
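
As a rough sketch, the whole overall header can be unpacked in one call with Python's `struct` module. The `Header` type and its field names are hypothetical, chosen here for illustration. The layout string simply follows the field list above in order, big endian, which adds up to 32 bytes.

```python
import struct
from typing import NamedTuple

# Big-endian layout in field-list order: magic, version, majors, characters,
# flags, picture height, picture format, reserved, checksum, file size,
# data zone size.
HEADER = struct.Struct(">8sBBHBBHIIII")


class Header(NamedTuple):
    """Hypothetical container for the overall header fields."""
    magic: bytes
    version: int
    num_majors: int
    num_characters: int
    flags: int
    picture_height: int
    picture_format: int
    reserved: int
    checksum: int
    file_size: int
    data_size: int


def parse_header(data: bytes) -> Header:
    """Unpack the overall header from the first bytes of the file."""
    return Header._make(HEADER.unpack_from(data, 0))
```

The `>` prefix selects big-endian byte order without padding, so `HEADER.size` is exactly the sum of the listed field sizes.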

The checksumming technique is simple: add all of the data, byte by byte,
into a 32-bit unsigned variable. For example, the checksum of
\[0xFF, 0x02, 0x03\] is 0x00000104. Overflow is allowed and wraps around
(0xFFFFFFFF + 2 = 0x00000001).
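
A minimal sketch of this checksum is shown below. The function names are hypothetical, and the mask emulates the 32-bit wraparound since Python integers do not overflow.

```python
def checksum(data: bytes) -> int:
    """Sum all bytes into a 32-bit unsigned value, wrapping on overflow."""
    return sum(data) & 0xFFFFFFFF


def checksum_ok(stored: int, pools: bytes) -> bool:
    """Return True if the stored checksum matches, or is zero (check disabled)."""
    return stored == 0 or checksum(pools) == stored
```

Here `pools` stands for the bytes of the leading character pool, character pool, and data pool taken together.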

## Leading character pool
This pool provides quick access to the characters sharing a leading character.
Each entry is made of the following:

- Leading character byte (1 byte), e.g. 0x00 or 0xE5;
- Reserved (1 byte), always zero;
- Starting entry ID in the character pool (2 bytes).

To obtain the byte offset of that entry within the character pool, multiply
the entry ID by the size of a character entry (which is constant).
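
The following sketch reads the leading character pool into a lookup table and converts an entry ID into a byte offset. The function names are hypothetical, and the caller is assumed to know where the pool starts and how large a character entry is.

```python
import struct
from typing import Dict

# Leading character byte, one reserved pad byte, 2-byte starting entry ID.
LEAD_ENTRY = struct.Struct(">BxH")


def parse_leading_pool(data: bytes, offset: int, num_majors: int) -> Dict[int, int]:
    """Map each leading character byte to its starting entry ID."""
    table = {}
    for i in range(num_majors):
        lead, start_id = LEAD_ENTRY.unpack_from(data, offset + i * LEAD_ENTRY.size)
        table[lead] = start_id
    return table


def entry_offset(start_id: int, entry_size: int) -> int:
    """Byte offset of an entry, relative to the start of the character pool."""
    return start_id * entry_size
```

For instance, with 20-byte character entries, a starting entry ID of 16 corresponds to byte 320 of the character pool.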

## Character pool
This pool provides the character entries. For quick access, all entries have
the same size, the variable-length data being stored in the data pool.
Each entry has the following format:

- Leading character (1 byte), e.g. 0x00 or 0xE5;
- Main character (1 byte), e.g. 0x45 for 0xE545;
- FONTCHARACTER sequence size (1 byte), 0 if not a FONTCHARACTER sequence;
- Unicode string size (1 byte), 0 if no Unicode string;
- CAT token size (1 byte), 0 if no CAT token;
- Newcat token size (1 byte), 0 if no Newcat token;
- CTF token size (1 byte), 0 if no CTF token;
- Casemul token size (1 byte), 0 if no Casemul token;
- FONTCHARACTER sequence offset in data pool (4 bytes);
- (only if Unicode is enabled in the flags)
  Unicode string offset in data pool (4 bytes);
- (only if CAT tokens are enabled in the flags)
  CAT token offset in data pool (4 bytes);
- (only if Newcat tokens are enabled in the flags)
  Newcat token offset in data pool (4 bytes);
- (only if CTF tokens are enabled in the flags)
  CTF token offset in data pool (4 bytes);
- (only if Casemul tokens are enabled in the flags)
  Casemul token offset in data pool (4 bytes).

The entry size may differ from one file to another, but is constant within
a given file, as it depends only on the overall header flags.
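
Because the optional offsets depend on the header flags, a reader can build the entry layout once per file. The sketch below assumes the fields appear exactly in the order listed above. All names (`entry_struct`, `parse_entry`, the dictionary keys) are hypothetical.

```python
import struct

# (flag bit, field name) pairs, in the order the optional offsets appear.
OPTIONAL_OFFSETS = (
    (0x01, "unicode_offset"),
    (0x02, "cat_offset"),
    (0x04, "newcat_offset"),
    (0x08, "ctf_offset"),
    (0x10, "casemul_offset"),
)
# Fields present in every entry: 8 single bytes, then one 4-byte offset.
FIXED_NAMES = ("lead", "main", "seq_size", "unicode_size", "cat_size",
               "newcat_size", "ctf_size", "casemul_size", "seq_offset")


def entry_struct(flags: int) -> struct.Struct:
    """Build the big-endian entry layout for the given header flags."""
    enabled = sum(1 for bit, _ in OPTIONAL_OFFSETS if flags & bit)
    return struct.Struct(">8BI" + "I" * enabled)


def parse_entry(data: bytes, offset: int, flags: int) -> dict:
    """Decode one character entry located at the given byte offset."""
    names = FIXED_NAMES + tuple(n for bit, n in OPTIONAL_OFFSETS if flags & bit)
    values = entry_struct(flags).unpack_from(data, offset)
    return dict(zip(names, values))
```

Under this reading, an entry is 12 bytes with no flags set and grows by 4 bytes per enabled flag, up to 32 bytes when all five are set.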

## Data pool
Raw data is stored here. To get the size of this zone, take the file size
from the overall header and subtract the sizes of the three previous zones.
This size is also stored in the header, and the computed and stored sizes
should be checked against each other.
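
A sketch of that consistency check follows. It assumes a 32-byte overall header and 4-byte leading character entries (the sums of the field sizes listed in this document), and an entry size of 12 bytes plus 4 bytes per enabled flag. The function names are hypothetical.

```python
HEADER_SIZE = 32       # sum of the overall header field sizes
LEAD_ENTRY_SIZE = 4    # leading byte + reserved byte + 2-byte entry ID


def character_entry_size(flags: int) -> int:
    """12 fixed bytes, plus one 4-byte offset per enabled flag (bits 0 to 4)."""
    return 12 + 4 * bin(flags & 0x1F).count("1")


def data_zone_size(file_size: int, num_majors: int,
                   num_characters: int, flags: int) -> int:
    """File size minus the sizes of the three zones before the data pool."""
    before = (HEADER_SIZE
              + num_majors * LEAD_ENTRY_SIZE
              + num_characters * character_entry_size(flags))
    return file_size - before


def check_data_zone(declared: int, file_size: int, num_majors: int,
                    num_characters: int, flags: int) -> None:
    """Raise ValueError if the declared and computed sizes disagree."""
    computed = data_zone_size(file_size, num_majors, num_characters, flags)
    if computed != declared:
        raise ValueError(f"data zone size mismatch: {computed} != {declared}")
```

A negative computed size is also a sign of a truncated or inconsistent file, and would be caught by the same comparison.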

Notice that the data doesn't need to be stored in the same order as the
characters, and that the same bytes can be referenced several times, which
allows space optimizations.
For example, if one character points to [0x02, 0x03] and another
one points to [0x01, 0x02] (in any order), you can put [0x01, 0x02, 0x03] in
the data pool, then make the first character point to the offset + 1 of
this array, and the second one point to the offset of this array.
This system allows space optimizations to be done on this zone at build time,
although finding the optimal layout amounts to solving the
[shortest common superstring][superstr] problem.
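
As an illustration only, here is a simple greedy packer that reuses bytes already present in the pool. It is a heuristic, not an optimal solver for the superstring problem, and its result depends on the order of the input. The name `pack_sequences` is hypothetical.

```python
from typing import List, Sequence, Tuple


def pack_sequences(sequences: Sequence[bytes]) -> Tuple[bytes, List[int]]:
    """Greedily pack byte strings into one pool and return (pool, offsets)."""
    pool = bytearray()
    offsets = []
    for seq in sequences:
        pos = pool.find(seq)
        if pos < 0:
            # Not in the pool yet: reuse the longest suffix of the pool that
            # is also a prefix of seq, then append only the missing tail.
            overlap = next(k for k in range(min(len(pool), len(seq)), -1, -1)
                           if pool.endswith(seq[:k]))
            pos = len(pool) - overlap
            pool += seq[overlap:]
        offsets.append(pos)
    return bytes(pool), offsets
```

With the example above, packing [0x01, 0x02] and then [0x02, 0x03] yields the pool [0x01, 0x02, 0x03] with offsets 0 and 1. Packing them in the opposite order gives a 4-byte pool, which shows why a build tool may want a smarter ordering.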

[picture.h]: https://github.com/PlaneteCasio/libcasio/blob/master/include/libcasio/picture.h
[superstr]: https://en.wikipedia.org/wiki/Shortest_common_supersequence_problem