# Source format
This format is yet to stabilize. If you just want to use the reference for
conversions between FONTCHARACTER and other character sets (which should
be managed by [libcasio][libcasio] anyway), check out the latest binary
format (`BINARYx.md`).

YAML has been chosen to store the information, as it's a storage format that
a machine and a human can read and write quite easily.

## Main file
`main.yml` is the file containing the main information about the source
reference. It only contains two fields for now:

- `version` is the version of the source reference (`0.1` corresponds to this
  version);
- `source` is the link to the FONTCHARACTER reference's source repository,
  managed through a VCS (Git, in this case).

## Sets
A set is a group of characters that appeared at the same time on CASIO
calculators, or in an extension (such as alternative CASIO Basic
interpreters/compilers).

`sets.yml` is the sets file. For each set:

- the `description` field is the description of the set;
- if the `default` field is there, then it is the default set to use
  (generally the most recent set made by CASIO);
- if the `leading` field is there, the list of leading characters is in it,
  separated by commas;
- if the `parent` field is there, then the set inherits all of the characters
  of its parent, and, if the child has no `leading` field, its parent's
  leading characters.
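
The inheritance rule for leading characters can be sketched as follows. This
is only an illustration: it assumes the loaded `sets.yml` is a mapping keyed
by a set identifier, and the set names and leading values are hypothetical
sample data, not taken from the reference.

```python
def leading_characters(sets, set_id):
    """Return the leading characters of a set, walking up the parent chain.

    `sets` is assumed to map a set identifier to its fields (hypothetical
    layout); `leading` is a comma-separated list as described above.
    """
    seen = set()
    while set_id is not None and set_id not in seen:
        seen.add(set_id)  # guard against accidental parent cycles
        entry = sets[set_id]
        if "leading" in entry:
            # str() covers the case where a single value was loaded as a number
            return [int(c.strip(), 0) for c in str(entry["leading"]).split(",")]
        set_id = entry.get("parent")
    return []


# Hypothetical sample data
sets = {
    "base": {"description": "Base set", "leading": "0x7F, 0xF7"},
    "ext": {"description": "Extension", "parent": "base"},
}
print(leading_characters(sets, "ext"))
```

The child set has no `leading` field here, so the lookup falls back to the
parent's list.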

## Categories
`categories.yml` is the categories file. Each category has an `id` field, which
is its identification string, an optional `prefix` field and an optional `sub`
list of subcategories, each with its own `id` and `prefix` fields.
Subcategories are referenced by path: for example, to access the "Mini"
subcategory of "Latin Capital" in the category "Letter", the `category` field
in the character information has to be `Letter/Latin Capital/Mini`. The name
of the character is then prefixed by `Mini Latin Capital Letter ` (with a
space between prefixes and a trailing space); subcategory prefixes go before
the category prefix. If there is a suffix, a space followed by the suffix is
appended to the character name, for example, ` Digit`.
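
As an illustration of the prefix rule, here is a minimal sketch. It assumes
subcategories nest through their own `sub` lists and that each prefix equals
the text shown in the example above; the sample tree is hypothetical.

```python
def name_prefix(categories, path):
    """Build the name prefix for a `category` path such as 'Letter/Latin Capital/Mini'."""
    prefixes = []
    level = categories
    for part in path.split("/"):
        node = next(c for c in level if c["id"] == part)
        if "prefix" in node:
            prefixes.append(node["prefix"])
        level = node.get("sub", [])  # descend into the subcategories, if any
    # Deepest subcategory first, each prefix followed by a space.
    return "".join(p + " " for p in reversed(prefixes))


# Hypothetical sample tree
categories = [
    {"id": "Letter", "prefix": "Letter", "sub": [
        {"id": "Latin Capital", "prefix": "Latin Capital", "sub": [
            {"id": "Mini", "prefix": "Mini"},
        ]},
    ]},
]
print(repr(name_prefix(categories, "Letter/Latin Capital/Mini")))
```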

There are some more fields -- see the _Embedded CASIO BASIC documentation_
section.

## Characters
There are two systems of characters on CASIO calculators: Simon Lothar calls
them the "characters" and the "opcodes". The "characters" are simple characters
with a display, and the "opcodes" are defined by a sequence of characters
(e.g. "Locate "). The two are described in two different tables on the
calculator, but both tables describe the same encoding, which is why this
reference considers all "characters" and "opcodes" as characters ("opcodes"
are here called multi-characters).

`characters.yml` is the file containing data about the characters. For each
character, the `code` field is its `FONTCHARACTER` code, the `name` field is
the complete description of the character, the `flags` field holds the
character flags and the `category` field is the category(/subcategory) ID (see
the _Categories_ section). If there is no `category` field, the category is
"Other", with no prefix.

`flags` is a list of flag strings. The current flags are:

* `nl`: the character should be followed by a newline;
* `esc`: the character's CTF token is escaped with a reverse solidus;
* `sep`: the character is a Basic separator;
* `base`: only accessible in BASE programs.

Some characters have an ASCII token representation, mostly for the *cat*,
*newcat*, *ctf* and *casemul* formats. If the `tokens` field exists, then
it is a dictionary of the tokens in the different formats.  
- If the `cat` field of the dictionary doesn't exist, its value is deduced
  recursively using the `multi` field if it is there, or from the `unicode`
  field (if all-`ASCII`), and prefixed by a reverse solidus '\\';
- If the `newcat` field of the dictionary doesn't exist, it takes its
  value from the `cat` field;
- If the `ctf` field of the dictionary doesn't exist, it takes its value from
  the `cat` field if that one was not deduced; otherwise, it is deduced the
  same way as the `cat` field, but it is not prefixed with a reverse solidus
  '\\';
- If the `casemul` field of the dictionary doesn't exist, it is deduced the
  same way as the `ctf` field;
- If the `ref` field of the dictionary doesn't exist, it takes the
  (first) value of the `ctf` field.

There can be multiple tokens for one format; in this case, the value of the
format field is a list.
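
The fallback rules above can be sketched as follows. This is one possible
reading, not a reference implementation: it assumes `multi` and `unicode` have
already been loaded as lists of integers, that the reverse solidus is added
once to the whole deduced text, and the helper names and sample table are
hypothetical.

```python
def raw_text(char, table):
    """Plain ASCII text of a character, or None when it cannot be deduced."""
    if "multi" in char:
        parts = [raw_text(table[code], table) for code in char["multi"]]
        return None if None in parts else "".join(parts)
    codepoints = char.get("unicode", [])
    if codepoints and all(cp < 0x80 for cp in codepoints):
        return "".join(map(chr, codepoints))
    return None


def fill_tokens(char, table):
    """Return the tokens dictionary with the missing formats filled in."""
    tokens = dict(char.get("tokens", {}))
    text = raw_text(char, table)
    explicit_cat = tokens.get("cat")
    if explicit_cat is None and text is not None:
        tokens["cat"] = "\\" + text  # deduced `cat` gets the reverse solidus
    if "cat" in tokens:
        tokens.setdefault("newcat", tokens["cat"])
    # `ctf` and `casemul` reuse an explicit `cat`, else the unprefixed text
    fallback = explicit_cat if explicit_cat is not None else text
    if fallback is not None:
        tokens.setdefault("ctf", fallback)
        tokens.setdefault("casemul", fallback)
    if "ctf" in tokens:
        ctf = tokens["ctf"]
        tokens.setdefault("ref", ctf[0] if isinstance(ctf, list) else ctf)
    return tokens


# Hypothetical sample table: four simple characters and the `For` entry
table = {c: {"code": c, "unicode": [c]} for c in (0x46, 0x6F, 0x72, 0x20)}
table[0xF704] = {"code": 0xF704, "name": "For", "multi": [0x46, 0x6F, 0x72, 0x20]}
print(fill_tokens(table[0xF704], table))
```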

It is possible to obtain an ASCII/HTML representation of most characters:  
- If tokens exist, take the `ref` token;
- Otherwise, if the `multi` field is specified, then the representation can be
  obtained recursively by querying this field's elements;
- Otherwise, no ASCII representation is available.

The `id` field is an identifier for the character, composed of letters,
numbers and underscores. It can be used for C defines.
If there is no `id` field, the identifier is the value of the `ascii` field if
it can be deduced (or of the `name` field if it can't), with hyphens turned
into underscores and other invalid characters removed (spaces, parentheses,
...).
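
A minimal sketch of this fallback, assuming the character is a dictionary in
which `ascii` is present only when it could be deduced; the sample names are
hypothetical.

```python
import re


def derive_id(char):
    """Return the `id` of a character, deriving it when the field is missing."""
    if "id" in char:
        return char["id"]
    source = char.get("ascii") or char["name"]
    # Hyphens become underscores; anything else that is not a letter,
    # a digit or an underscore is dropped.
    return re.sub(r"[^A-Za-z0-9_]", "", source.replace("-", "_"))


print(derive_id({"name": "Latin Capital Letter A"}))
print(derive_id({"name": "Mini-Arrow (Right)"}))
```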

Multi-characters (multi-character opcodes) are characters that are simply a
sequence of simple characters. They can be distinguished from simple
characters by the presence of a `multi` field, which holds the `FONTCHARACTER`
codes of the characters in the sequence, separated by commas. Be careful: a
multi-character can be made of a single character, in which case YAML won't
interpret the value as a string, but directly as a number!
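
One way to cope with this caveat is to normalize the loaded value before using
it. The sketch below assumes a YAML loader may hand back a number, a
comma-separated string or a list for `multi`; the function name is
hypothetical.

```python
def normalize_multi(value):
    """Turn a loaded `multi` value into a list of FONTCHARACTER codes."""
    if isinstance(value, int):
        # Single code that the loader parsed as a number
        return [value]
    if isinstance(value, str):
        # Comma-separated codes, e.g. "0x46, 0x6F"
        return [int(part.strip(), 0) for part in value.split(",")]
    # Already a sequence, e.g. [0x46, 0x6F, 0x72, 0x20]
    return [int(v, 0) if isinstance(v, str) else v for v in value]


print(normalize_multi(0x46))
print(normalize_multi("0x46, 0x6F"))
```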

If the character is simple and has an equivalent Unicode sequence, the
Unicode code points of that sequence, separated by commas, are in the
`unicode` field; otherwise, the field doesn't exist.

If the character data has a `set` field, then the character is in a set;
otherwise, it should be considered as part of the default set.

### Embedded CASIO BASIC documentation
Some characters have a `type` field, which means they have a special
meaning in CASIO Basic. There are two types: `function` and `object`. Each has
an associated syntax, which is either `<name>(arg1, arg2)` when `par` is
`true`, or `<name> arg1,arg2` when it is `false`.
Note that for the first syntax, the closing parenthesis is not mandatory.

If `par` is `false` (or non-existent), then the `fix` field can be
set to `infix`, which means the function will be used with either
`arg1 <name>` or `arg1 <name> arg2`.
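
The three syntaxes can be summed up with a short sketch. The function name
and the generic `f`, `a`, `b` placeholders are hypothetical and only serve the
illustration.

```python
def usage(name, args, par=False, fix=None):
    """Render the usage string implied by the `par` and `fix` fields."""
    if par:
        return f"{name}({', '.join(args)})"
    if fix == "infix":
        # `arg1 <name>` or `arg1 <name> arg2`
        return " ".join([args[0], name] + list(args[1:2]))
    return f"{name} {','.join(args)}"


print(usage("f", ["a", "b"], par=True))
print(usage("f", ["a", "b"]))
print(usage("f", ["a", "b"], fix="infix"))
```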

If the function/object should receive arguments, they can be documented using
the `args` field, and if optional arguments follow them, these can be
documented with the `optn` field. Both fields receive a list of argument
strings. An argument type can be imposed by adding `:<code>` at the end of the
argument string; for example, here are the `For` and `To` entries:

	-
	 code: 0xF704
	 name: For
	 category: Statement
	 args: ["to:0xF705"]
	 action: ...
	 multi: [0x46, 0x6F, 0x72, 0x20]
	-
	 code: 0xF705
	 name: To
	 category: Operator
	 args: ["assign:0x0E"]
	 optn: ["step:0xF706"]
	 action: ...
	 multi: [0x20, 0x54, 0x6F, 0x20]
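
Reading such an argument string could look like the sketch below, which
assumes the part after the colon is always a hexadecimal `FONTCHARACTER` code;
the function name is hypothetical.

```python
def parse_arg(arg):
    """Split an argument string into its name and optional imposed code."""
    name, sep, code = arg.partition(":")
    return name, (int(code, 16) if sep else None)


print(parse_arg("to:0xF705"))
print(parse_arg("x"))
```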

If the function is supposed to perform an action, this action can be
documented using the `action` field. If it is supposed to return something,
this can be documented using the `return` field.

## Fonts
`fonts.yml` is the file containing the fonts information. For each font,
`id` is the ID string, `name` is the complete name, `author` is the complete
author name, `width` and `height` are the dimensions of each character in
the font.

For each font, there is a corresponding folder, named after the font ID.
This folder contains the character images, organized by leading multi-byte
character: characters without a leading character are in the file `0xXX.pbm`,
and the others are in the file `0xLLXX.pbm`, where `0xLL` is the leading
character. If the file doesn't exist, its characters are to be considered
blank.

Each existing file is a grid of 256 tiles of `width * height` pixels each.
Each row holds the tiles from `0xR0` to `0xRF`, where `0xR` is the row number
(0x0 to 0xF).
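
Locating the tile of a given character can be sketched as follows. Several
points are assumptions: that `XX` appears literally in the file name, that the
leading byte is written in uppercase hexadecimal, and that tiles are packed
with no padding between them. The function name and the 6x8 font size are
hypothetical.

```python
def tile_location(code, width, height):
    """Return (file name, x, y) of the top-left pixel of a character's tile."""
    lead, low = code >> 8, code & 0xFF
    filename = "0xXX.pbm" if lead == 0 else f"0x{lead:02X}XX.pbm"
    row, col = low >> 4, low & 0x0F  # 16 rows of 16 tiles
    return filename, col * width, row * height


print(tile_location(0x41, 6, 8))
print(tile_location(0x7F41, 6, 8))
```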

[libcasio]: https://libcasio.planet-casio.com/