
# Binary format (Version 1)
**This format is a draft. It will be finalized when version 1.0 of the**
**reference is released.**

The binary file is divided into four zones:

- the overall header;
- the leading character pool;
- the character pool;
- the data pool.

Multi-byte integer fields are encoded as **big endian**.

## Overall header
The file starts with an overall header describing the structure of the rest
of the file. Like every file in any binary format representing a
FONTCHARACTER reference set, it starts with:

- Magic string (8 bytes): "CASIOFC\x7F";
- Version byte (1 byte) -- in this version of the format, it is 0x01.

If the magic string does not match, the file is either corrupted or not a
FONTCHARACTER binary file at all. If the version byte does not match, the file
uses a different version of the format, and you should tell the user that they
need an upgraded version of your utility (because you'll keep updating it...
right?) or a more recent utility.
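As an illustration, the two checks above can be sketched in Python. The names `check_preamble`, `FormatError` and `SUPPORTED_VERSION` are hypothetical and not part of any existing library.

```python
MAGIC = b"CASIOFC\x7f"    # 8-byte magic string
SUPPORTED_VERSION = 0x01  # version byte of this revision of the format


class FormatError(Exception):
    """Raised when the data is not a file this utility can read."""


def check_preamble(data: bytes) -> None:
    """Validate the magic string and the version byte at the start of `data`."""
    if len(data) < 9 or data[:8] != MAGIC:
        raise FormatError("corrupted file, or not a FONTCHARACTER binary file")
    if data[8] != SUPPORTED_VERSION:
        raise FormatError(
            f"format version 0x{data[8]:02X} is not supported: "
            "please upgrade to a more recent utility"
        )
```

Keeping the two failures distinct lets a utility tell "this is not our file" apart from "this is our file, but newer than we understand".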

From this point on, the file follows this specific version of the format.
The overall header continues with the following fields:

- Number of majors (1 byte): the number of entries in the leading
  character pool (second zone of the file);
- Number of characters (2 bytes): the total number of characters;
- Flags (1 byte): the flags:
  
  - 0x01: Unicode is enabled (see character entry);
  - 0x02: CAT tokens are enabled (see character entry);
  - 0x04: Newcat tokens are enabled (see character entry);
  - 0x08: CTF tokens are enabled (see character entry);
  - 0x10: Casemul tokens are enabled (see character entry);
- Picture height (1 byte): the height of the character pictures;
- Picture format (2 bytes): the picture format used to represent the
  characters, taken from libcasio's [picture.h][picture.h].
  Allowed formats are:

  - 0x0100: monochrome with fill bits;
- Reserved (4 bytes): should be zero;
- Checksum (4 bytes): basic checksum for the leading character pool,
  character pool, and data pool (if zero, do not check the checksum);
- File size (4 bytes): the total file size, in bytes;
- Data zone size (4 bytes): the size of the data pool, in bytes.
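A minimal Python sketch of reading the whole overall header with the standard `struct` module follows. It assumes the fields are packed back to back with no padding, in the order listed above, which makes the header 32 bytes long. `parse_header`, `Header` and `enabled_fields` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Big-endian (">", no padding) layout of the overall header:
# magic(8) version(1) majors(1) count(2) flags(1) height(1)
# picture format(2) reserved(4) checksum(4) file size(4) data zone size(4)
HEADER = struct.Struct(">8sBBHBBHIIII")

FLAG_NAMES = {0x01: "unicode", 0x02: "cat", 0x04: "newcat",
              0x08: "ctf", 0x10: "casemul"}


class Header(NamedTuple):
    majors: int
    count: int
    flags: int
    height: int
    picture_format: int
    checksum: int
    file_size: int
    data_size: int


def parse_header(data: bytes) -> Header:
    """Decode the overall header at the start of `data`."""
    (magic, version, majors, count, flags, height, picture_format,
     _reserved, checksum, file_size, data_size) = HEADER.unpack_from(data, 0)
    if magic != b"CASIOFC\x7f" or version != 0x01:
        raise ValueError("bad magic string or unsupported version")
    if picture_format != 0x0100:  # only allowed format in this version
        raise ValueError(f"unsupported picture format 0x{picture_format:04X}")
    return Header(majors, count, flags, height, picture_format,
                  checksum, file_size, data_size)


def enabled_fields(flags: int) -> list:
    """Names of the optional per-character fields enabled by `flags`."""
    return [name for bit, name in FLAG_NAMES.items() if flags & bit]
```

For example, a header built with `HEADER.pack(b"CASIOFC\x7f", 1, 2, 300, 0x05, 8, 0x0100, 0, 0, 1234, 56)` decodes to 2 majors and 300 characters, with Unicode and Newcat fields enabled.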

The checksumming technique is simple: add every byte, one at a time, into a
32-bit unsigned accumulator. For example, the checksum of \[0xFF, 0x02, 0x03\]
is 0x00000104. Overflow wraps around (0xFFFFFFFF + 2 = 0x00000001).
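The checksum can be sketched in Python as below. The `start` parameter is an addition for illustration: it lets the sum be continued across several buffers, and also reproduces the overflow example. `checksum_ok` assumes the three pools immediately follow a 32-byte header (the sum of the header field sizes) and run to the end of the file; both names are hypothetical.

```python
def checksum(data: bytes, start: int = 0) -> int:
    """Add every byte of `data` to `start` in a 32-bit unsigned accumulator."""
    total = start
    for byte in data:
        total = (total + byte) & 0xFFFFFFFF  # wrap around on overflow
    return total


HEADER_SIZE = 32  # assumption: sum of the overall header field sizes


def checksum_ok(file_data: bytes, stored: int) -> bool:
    """Check the stored checksum against the three pools after the header."""
    if stored == 0:  # a zero checksum means "do not check"
        return True
    return checksum(file_data[HEADER_SIZE:]) == stored
```

Both examples from the text hold: `checksum(bytes([0xFF, 0x02, 0x03]))` is `0x104`, and `checksum(bytes([0x02]), start=0xFFFFFFFF)` is `1`.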

## Leading character pool
This pool provides quick access to the characters sharing a given leading
character.
Each entry is made of the following:

- Leading character byte (1 byte), e.g. 0x00 or 0xE5;
- Reserved (1 byte), always zero;
- Starting entry ID in the character pool (2 bytes).

To get the byte offset of that entry within the character pool, multiply the
starting entry ID by the size of a character entry (which is constant within
a file).
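A sketch of reading this pool and turning an entry ID into an absolute file offset is shown below. It assumes a 32-byte overall header and that the character pool starts right after the leading character pool; `read_leading_pool` and `entry_offset` are hypothetical helpers.

```python
import struct
from typing import Dict

HEADER_SIZE = 32                       # assumption: overall header size
LEADING_ENTRY = struct.Struct(">BBH")  # leading byte, reserved, starting entry ID


def read_leading_pool(data: bytes, majors: int) -> Dict[int, int]:
    """Map each leading character byte to the ID of its first character entry."""
    table = {}
    for i in range(majors):
        lead, _reserved, start_id = LEADING_ENTRY.unpack_from(
            data, HEADER_SIZE + i * LEADING_ENTRY.size
        )
        table[lead] = start_id
    return table


def entry_offset(entry_id: int, majors: int, entry_size: int) -> int:
    """Absolute file offset of a character entry, given its ID."""
    char_pool_start = HEADER_SIZE + majors * LEADING_ENTRY.size
    return char_pool_start + entry_id * entry_size
```

With 2 majors and 12-byte entries, entry 200 starts at 32 + 2 × 4 + 200 × 12 = 2440.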

## Character pool
This pool provides the character entries. For quick access, all entries have
the same size; variable-length data is stored in the data pool.
Each entry has the following format:

- Leading character (1 byte), e.g. 0x00 or 0xE5;
- Main character (1 byte), e.g. 0x45 for 0xE545;
- FONTCHARACTER sequence size (1 byte), 0 if not a FONTCHARACTER sequence;
- Unicode string size (1 byte), 0 if no Unicode string;
- CAT token size (1 byte), 0 if no CAT token;
- Newcat token size (1 byte), 0 if no Newcat token;
- CTF token size (1 byte), 0 if no CTF token;
- Casemul token size (1 byte), 0 if no Casemul token;
- FONTCHARACTER sequence offset in data pool (4 bytes);
- (only if Unicode is enabled in the flags)
  Unicode string offset in data pool (4 bytes);
- (only if CAT tokens are enabled in the flags)
  CAT token offset in data pool (4 bytes);
- (only if Newcat tokens are enabled in the flags)
  Newcat token offset in data pool (4 bytes);
- (only if CTF tokens are enabled in the flags)
  CTF token offset in data pool (4 bytes);
- (only if Casemul tokens are enabled in the flags)
  Casemul token offset in data pool (4 bytes).

The entry size may therefore differ from one file to another, but it is
constant within a given file, since the optional fields depend only on the
flags of the overall header.
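The entry layout above can be sketched as follows: 2 character bytes, 6 size bytes (always present), the FONTCHARACTER offset, then one 4-byte offset per enabled flag, in flag order. Combining the leading and main bytes into one 16-bit code is an assumption for illustration, and `entry_size` and `parse_entry` are hypothetical names.

```python
import struct
from typing import Dict, Tuple

# Optional offset fields and their flag bits, in entry order.
OPTIONAL_FIELDS = ((0x01, "unicode"), (0x02, "cat"), (0x04, "newcat"),
                   (0x08, "ctf"), (0x10, "casemul"))
# The six size bytes, in entry order (always present).
SIZE_FIELDS = ("fontcharacter", "unicode", "cat", "newcat", "ctf", "casemul")


def entry_size(flags: int) -> int:
    """2 character bytes + 6 size bytes + 4-byte offsets (FONTCHARACTER + enabled)."""
    enabled = sum(1 for bit, _ in OPTIONAL_FIELDS if flags & bit)
    return 2 + 6 + 4 * (1 + enabled)


def parse_entry(raw: bytes, flags: int) -> Tuple[int, Dict[str, Tuple[int, int]]]:
    """Return the character code and {field: (data pool offset, size)}."""
    code = (raw[0] << 8) | raw[1]  # e.g. 0xE5, 0x45 -> 0xE545
    sizes = dict(zip(SIZE_FIELDS, raw[2:8]))
    names = ["fontcharacter"] + [n for bit, n in OPTIONAL_FIELDS if flags & bit]
    offsets = struct.unpack_from(">" + "I" * len(names), raw, 8)
    fields = {}
    for name, offset in zip(names, offsets):
        if sizes[name]:  # a size of 0 means the field is absent
            fields[name] = (offset, sizes[name])
    return code, fields
```

Entries are therefore 12 bytes with no flag set and 32 bytes with all five flags set.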

## Data pool
Raw data is stored here. To get the size of this zone, take the file size
from the overall header and subtract the sizes of the three previous zones.
This size is also stored in the header, and the computed and stored sizes
should be checked for consistency.
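The size computation and consistency check can be sketched as below, assuming a 32-byte overall header, 4-byte leading character entries, and data pool offsets that are relative to the start of the data pool. All function names are hypothetical.

```python
HEADER_SIZE = 32        # assumption: overall header size
LEADING_ENTRY_SIZE = 4  # leading byte + reserved byte + 2-byte entry ID


def data_pool_start(majors: int, count: int, entry_size: int) -> int:
    """Offset of the data pool: it follows the three previous zones."""
    return HEADER_SIZE + majors * LEADING_ENTRY_SIZE + count * entry_size


def check_data_pool(file_size: int, declared_size: int,
                    majors: int, count: int, entry_size: int) -> int:
    """Compare the computed data pool size with the one stored in the header."""
    start = data_pool_start(majors, count, entry_size)
    computed = file_size - start
    if computed < 0 or computed != declared_size:
        raise ValueError(f"data pool size mismatch: computed {computed}, "
                         f"header says {declared_size}")
    return start


def read_field(data: bytes, pool_start: int, offset: int, size: int) -> bytes:
    """Read `size` bytes at `offset`, relative to the start of the data pool."""
    return data[pool_start + offset:pool_start + offset + size]
```

For a 1000-byte file with 2 majors and 10 characters of 12 bytes each, the data pool starts at offset 160 and should be 840 bytes long.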

Note that the data does not need to be stored in the same order as the
characters, and the same bytes can be referenced several times, which allows
space optimizations. For example, if one character points to [0x02, 0x03] and
another one points to [0x01, 0x02] (in any order), you can put
[0x01, 0x02, 0x03] in the data pool, then make the first character point to
offset + 1 of this array, and the second one point to the offset of this
array. These optimizations can be done on this zone at build time, although
finding an optimal layout amounts to solving the
[shortest common superstring][superstr] problem, which is NP-hard.
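The format does not mandate any packing strategy. As one possible approach, here is a sketch of the classic greedy heuristic for the shortest common superstring: drop duplicates and strings contained in others, then repeatedly merge the pair with the largest suffix/prefix overlap. It is an approximation, not guaranteed optimal, and its naive pair search makes it an illustration rather than a tuned build tool. `overlap` and `pack_data_pool` are hypothetical names.

```python
from typing import List, Tuple


def overlap(a: bytes, b: bytes) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def pack_data_pool(strings: List[bytes]) -> Tuple[bytes, List[int]]:
    """Greedily merge byte strings into one pool; return it and each offset."""
    unique = list(dict.fromkeys(strings))  # drop exact duplicates
    # Strings contained in another one need no space of their own.
    pieces = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(pieces) > 1:
        # Find the ordered pair (i, j) with the largest overlap and merge it.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = pieces[best_i] + pieces[best_j][best_k:]
        pieces = [p for n, p in enumerate(pieces) if n not in (best_i, best_j)]
        pieces.append(merged)
    pool = pieces[0] if pieces else b""
    # Every input is a substring of the pool, so find() never returns -1.
    return pool, [pool.find(s) for s in strings]
```

On the example from the text, `pack_data_pool([b"\x02\x03", b"\x01\x02"])` yields the pool `b"\x01\x02\x03"` with offsets `[1, 0]`.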

[picture.h]: https://github.com/PlaneteCasio/libcasio/blob/master/include/libcasio/picture.h
[superstr]: https://en.wikipedia.org/wiki/Shortest_common_supersequence_problem