# Source format

This format has yet to stabilize. If you just want to use the reference for
conversions between FONTCHARACTER and other character sets (which should
be managed by [libcasio][libcasio] anyway), check out the latest binary
format (`BINARYx.md`).

YAML has been chosen to store the information, as it is a storage format that
both machines and humans can read and write quite easily.

## Main file

`main.yml` is the file containing the main information about the source
reference. It only contains two fields for now:

- `version` is the version of the source reference (`0.1` corresponds to this
  version);
- `source` is the link to the FONTCHARACTER reference's source repository,
  managed through a VCS (Git, in this case).

## Sets

A set is a pack of characters that appeared at the same time on CASIO
calculators, or in an extension (alternative CASIO Basic
interpreters/compilers).

`sets.yml` is the sets file. For each set:

- the `description` field is the description of the set;
- if the `default` field is present, then this set is the default set to use
  (generally the most recent set made by CASIO);
- if the `leading` field is present, it contains the list of leading
  characters, separated by commas;
- if the `parent` field is present, then the set inherits all of the
  characters of its parent, and, if the child has no `leading` field, its
  parent's leading characters.

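
As a rough illustration of the inheritance rule above, the following Python sketch resolves the leading characters of a set. It assumes that `sets.yml` has already been loaded into a dictionary keyed by set ID and that `leading` is a comma-separated string; the function name, the set IDs and the values are made up for the example.

```python
def leading_characters(sets, set_id):
    """Return the leading characters of a set, following `parent` links."""
    visited = set()
    while set_id is not None and set_id not in visited:
        visited.add(set_id)  # guards against accidental parent loops
        entry = sets[set_id]
        if "leading" in entry:
            raw = entry["leading"]
            if isinstance(raw, int):  # a lone value may be loaded as a number
                return [raw]
            return [int(part.strip(), 0) for part in str(raw).split(",")]
        set_id = entry.get("parent")
    return []


# Hypothetical data, not taken from the actual reference.
sets = {
    "base": {"description": "Example base set", "leading": "0x7F, 0xF7"},
    "child": {"description": "Example child set", "parent": "base"},
}
print(leading_characters(sets, "child"))  # [127, 247]
```

The child set has no `leading` field of its own, so the lookup falls back to its parent.
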
## Categories

`categories.yml` is the categories file. Each category has an `id` field, which
is the identification string, an optional `prefix` field and an optional `sub`
list of subcategories, each with an `id` field and a `prefix` field.

To access the subcategory "Mini" of the subcategory "Latin Capital" in the
category "Letter", the `category` field in the character information has to be
`Letter/Latin Capital/Mini`. The name of the character will then be prefixed by
`Mini Latin Capital Letter ` (with spaces between the prefixes and a trailing
space); subcategory prefixes go first, the innermost one leading. If there is a
suffix, a space followed by the suffix is appended to the character name, for
example ` Digit`.

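
The prefix rule can be sketched in Python as follows. This is only an illustration: it assumes that `categories.yml` has been loaded into a list of dictionaries, that `sub` lists may nest, and that a missing `prefix` contributes nothing. Suffix handling is left out, and the helper names and sample data are hypothetical.

```python
def find_prefixes(categories, path):
    """Collect the prefixes along a `Category/Sub/...` path, outermost first."""
    prefixes = []
    level = categories
    for part in path.split("/"):
        node = next(c for c in level if c["id"] == part)
        if "prefix" in node:
            prefixes.append(node["prefix"])
        level = node.get("sub", [])
    return prefixes


def full_name(categories, char):
    """Build the complete character name from its category prefixes."""
    path = char.get("category")
    if path is None:  # no category: "Other", which has no prefix
        return char["name"]
    prefixes = find_prefixes(categories, path)
    # The innermost subcategory prefix goes first.
    return " ".join(list(reversed(prefixes)) + [char["name"]])


# Hypothetical category tree matching the example path above.
categories = [
    {"id": "Letter", "prefix": "Letter", "sub": [
        {"id": "Latin Capital", "prefix": "Latin Capital", "sub": [
            {"id": "Mini", "prefix": "Mini"},
        ]},
    ]},
]
char = {"name": "A", "category": "Letter/Latin Capital/Mini"}
print(full_name(categories, char))  # Mini Latin Capital Letter A
```
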
There are some more fields -- see the _Embedded CASIO BASIC documentation_
section.
## Characters

There are two systems of characters on CASIO calculators: Simon Lothar calls
them the "characters" and the "opcodes". The "characters" are simple characters
with a display, whereas the "opcodes" are defined by a sequence of characters
(e.g. "Locate "). The two are described in two different tables on the
calculator, but both tables describe the same encoding, which is why this
reference considers all "characters" and "opcodes" as characters ("opcodes"
are called multi-characters here).

`characters.yml` is the file containing data about the characters. For each
character, the `code` field is its `FONTCHARACTER` code, the `name` field is
the complete description of the character, the `flags` field holds the
character flags and the `category` field is the category(/subcategory) ID (see
the _Categories_ section). If there is no `category` field, the category is
"Other", with no prefix.


`flags` is a list of flag strings. The current flags are:

* `nl`: the character should be followed by a newline;
* `esc`: the character's CTF token is escaped with a reverse solidus;
* `sep`: the character is a Basic separator;
* `base`: the character is only accessible in BASE programs.


Some characters have an ASCII token representation, mostly for the *cat*,
*newcat*, *ctf* and *casemul* formats. If the `tokens` field exists, then
it is a dictionary of the tokens in the different formats.

- If the `cat` field of the dictionary doesn't exist, its value is deduced
  recursively using the `multi` field if it is there, or from the `unicode`
  field (if all-`ASCII`), and prefixed by a reverse solidus '\\';
- If the `newcat` field of the dictionary doesn't exist, it takes its
  value from the `cat` field;
- If the `ctf` field of the dictionary doesn't exist, it takes its value from
  the `cat` field if that one was not deduced; otherwise, it is deduced the
  same way as the `cat` field, but without the reverse solidus '\\' prefix;
- If the `casemul` field of the dictionary doesn't exist, it is deduced the
  same way as the `ctf` field;
- If the `ref` field of the dictionary doesn't exist, it takes the
  (first) value of the `ctf` field.

There can be multiple tokens for one format; in this case, the value of the
format field is a list.

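
The fallback rules above leave some room for interpretation, so the following Python sketch is only one possible reading. It assumes that the characters are stored in a dictionary keyed by code, that `multi` and `unicode` are already lists of integers, that a multi-character's text is the concatenation of its components' texts, and that `casemul` falls back exactly like `ctf`. The function names are hypothetical.

```python
def plain_text(chars, code):
    """Deduce a bare ASCII text for a character, or None if there is none."""
    entry = chars[code]
    if "multi" in entry:
        parts = [plain_text(chars, c) for c in entry["multi"]]
        return None if None in parts else "".join(parts)
    points = entry.get("unicode", [])
    if points and all(p < 0x80 for p in points):
        return "".join(chr(p) for p in points)
    return None


def resolve_tokens(chars, code):
    """Fill in the missing token fields following the fallback rules."""
    explicit = chars[code].get("tokens", {})
    tokens = dict(explicit)
    text = plain_text(chars, code)
    if "cat" not in tokens and text is not None:
        tokens["cat"] = "\\" + text
    if "newcat" not in tokens and "cat" in tokens:
        tokens["newcat"] = tokens["cat"]
    for fmt in ("ctf", "casemul"):
        if fmt in tokens:
            continue
        if "cat" in explicit:   # `cat` was given, not deduced
            tokens[fmt] = explicit["cat"]
        elif text is not None:  # deduced like `cat`, without the backslash
            tokens[fmt] = text
    if "ref" not in tokens and "ctf" in tokens:
        ctf = tokens["ctf"]
        tokens["ref"] = ctf[0] if isinstance(ctf, list) else ctf
    return tokens


# Minimal sample: four simple characters and one multi-character.
chars = {
    0x46: {"unicode": [0x46]},
    0x6F: {"unicode": [0x6F]},
    0x72: {"unicode": [0x72]},
    0x20: {"unicode": [0x20]},
    0xF704: {"multi": [0x46, 0x6F, 0x72, 0x20]},
}
print(resolve_tokens(chars, 0xF704))
```
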

It is possible to obtain an ASCII/HTML representation of most characters:

- If tokens exist, take the `ref` token;
- Otherwise, if the `multi` field is specified, then the representation can be
  obtained recursively by querying this field's elements;
- Otherwise, no ASCII representation is available.

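
A minimal Python sketch of this lookup could look like the following. It assumes that the characters are stored in a dictionary keyed by code, that `multi` is a list of integers, and that the `ref` token has already been resolved whenever `tokens` exists; the function name and sample data are hypothetical.

```python
def representation(chars, code):
    """Return an ASCII representation of a character, or None."""
    entry = chars[code]
    if "tokens" in entry:
        ref = entry["tokens"]["ref"]  # assumed to be resolved beforehand
        return ref[0] if isinstance(ref, list) else ref
    if "multi" in entry:
        parts = [representation(chars, c) for c in entry["multi"]]
        return None if None in parts else "".join(parts)
    return None


# Illustrative entries only.
chars = {
    0x46: {"tokens": {"ref": "F"}},
    0x6F: {"tokens": {"ref": "o"}},
    0x72: {"tokens": {"ref": "r"}},
    0x20: {"tokens": {"ref": " "}},
    0xF704: {"multi": [0x46, 0x6F, 0x72, 0x20]},
    0x01: {},
}
print(repr(representation(chars, 0xF704)))  # 'For '
print(representation(chars, 0x01))          # None
```
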

The `id` field is an identifier for the character, composed of letters,
numbers and underscores. It can be used for C defines.
If there is no `id` field, the identifier is the value in the `ascii` field if
it can be deduced (or the `name` field if it can't), with hyphens turned into
underscores, and other invalid characters removed (spaces, parentheses, ...).

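
The identifier fallback can be sketched in Python as shown below. The deduced ASCII value is passed in as a separate argument, since how it is obtained is outside the scope of this snippet; the function name and the sample entries are hypothetical.

```python
import re


def deduce_id(entry, ascii_repr=None):
    """Return the `id` of a character, deducing it when it is missing."""
    if "id" in entry:
        return entry["id"]
    # Prefer the ASCII value when it could be deduced, else use the name.
    base = ascii_repr if ascii_repr is not None else entry["name"]
    base = base.replace("-", "_")
    # Keep only letters, digits and underscores.
    return re.sub(r"[^A-Za-z0-9_]", "", base)


print(deduce_id({"name": "Latin Capital Letter A"}))  # LatinCapitalLetterA
print(deduce_id({"name": "Semi-colon (alt)"}))        # Semi_colonalt
```
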

You have to distinguish multi-characters from simple characters.
Multi-characters are characters that simply are a sequence of simple
characters. You can distinguish them from simple characters by checking the
presence of a `multi` field, which then holds the `FONTCHARACTER` codes of
the characters in the sequence, separated by commas. Be careful: a
multi-character can be composed of a single character, in which case YAML
won't interpret the value as a string, but directly as a number!

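
Because of this pitfall, a reader may want to normalize the `multi` value before using it. The Python sketch below accepts a lone number, a comma-separated string or a list, and always returns a list of integer codes; the function name is hypothetical and the accepted shapes are assumptions about what a YAML loader may hand back.

```python
def multi_codes(entry):
    """Return the component codes of a multi-character, or None if simple."""
    raw = entry.get("multi")
    if raw is None:
        return None  # simple character
    if isinstance(raw, int):
        return [raw]  # single component, loaded as a number
    if isinstance(raw, (list, tuple)):
        return [c if isinstance(c, int) else int(str(c).strip(), 0) for c in raw]
    # Comma-separated string such as "0x46, 0x6F"
    return [int(part.strip(), 0) for part in str(raw).split(",")]


print(multi_codes({"multi": 0x46}))          # [70]
print(multi_codes({"multi": "0x46, 0x6F"}))  # [70, 111]
```
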

If the character is simple and has an equivalent Unicode sequence, the
Unicode code points of the sequence, separated by commas, are in the `unicode`
field; otherwise, the field doesn't exist.

If the character data has a `set` field, then the character is in that set;
otherwise, it should be considered as part of the default set.

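
Combined with the set inheritance described in the _Sets_ section, the rule above can be sketched in Python as follows. The snippet assumes that the sets are stored in a dictionary keyed by set ID, that the characters are a list of dictionaries, and that exactly one set carries a `default` field; all names and values are hypothetical.

```python
def default_set(sets):
    """Return the ID of the set that carries a `default` field."""
    return next(sid for sid, data in sets.items() if "default" in data)


def ancestry(sets, set_id):
    """Return the set ID followed by its parents, nearest first."""
    chain = []
    while set_id is not None and set_id not in chain:
        chain.append(set_id)
        set_id = sets[set_id].get("parent")
    return chain


def codes_in_set(chars, sets, set_id):
    """List the codes available in a set, inherited characters included."""
    wanted = set(ancestry(sets, set_id))
    fallback = default_set(sets)
    return sorted(c["code"] for c in chars if c.get("set", fallback) in wanted)


# Hypothetical data, not taken from the actual reference.
sets = {
    "old": {},
    "new": {"parent": "old", "default": True},
    "ext": {"parent": "new"},
}
chars = [{"code": 1, "set": "old"}, {"code": 2}, {"code": 3, "set": "ext"}]
print(codes_in_set(chars, sets, "new"))  # [1, 2]
```
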
### Embedded CASIO BASIC documentation

Some characters have a `type` field, which means they have a special
meaning in CASIO Basic. There are two types: `function` and `object`. The
associated syntax is either `<name>(arg1, arg2)`, when `par` is `true`, or
`<name> arg1,arg2`, when `par` is `false`.
Note that for the first syntax, the closing parenthesis is not mandatory.

If `par` is `false` (or non-existent), then the `fix` field can be
set to `infix`, which means the function will be used with either
`arg1 <name>` or `arg1 <name> arg2`.

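
To make the three syntaxes concrete, here is a small Python sketch that renders a usage string from the `par` and `fix` values. The function name and the sample names `Func` and `Op` are hypothetical, and spacing details of real entries (such as spaces embedded in the character itself) are ignored.

```python
def usage(name, args, par=False, fix=None):
    """Render a usage string for a documented function or object."""
    if par:
        return f"{name}({', '.join(args)})"
    if fix == "infix" and args:
        # `arg1 <name>` or `arg1 <name> arg2`
        return " ".join([args[0], name] + args[1:2])
    return f"{name} {','.join(args)}" if args else name


print(usage("Func", ["arg1", "arg2"], par=True))     # Func(arg1, arg2)
print(usage("Func", ["arg1", "arg2"]))               # Func arg1,arg2
print(usage("Op", ["arg1", "arg2"], fix="infix"))    # arg1 Op arg2
```
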

If the function/object should receive arguments, they can be documented using
the `args` field, and if it accepts optional arguments after these, they can
be documented with the `optn` field. These fields receive a list of argument
strings. An argument type can be imposed by adding `:<code>` at the end of the
argument string; for example, here are the `For` and `To` entries:


    -
      code: 0xF704
      name: For
      category: Statement
      args: ["to:0xF705"]
      action: ...
      multi: [0x46, 0x6F, 0x72, 0x20]
    -
      code: 0xF705
      name: To
      category: Operator
      args: ["assign:0x0E"]
      optn: ["step:0xF706"]
      action: ...
      multi: [0x20, 0x54, 0x6F, 0x20]

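
An argument string such as `to:0xF705` can be split into its name and its imposed code with a few lines of Python. This is a sketch with a hypothetical function name; it assumes the code is written as a number literal that `int(..., 0)` accepts.

```python
def parse_argument(spec):
    """Split an argument string into (name, imposed code or None)."""
    name, sep, code = spec.rpartition(":")
    if not sep:
        return spec, None  # no `:<code>` suffix
    return name, int(code, 0)


print(parse_argument("to:0xF705"))  # ('to', 63237)
print(parse_argument("value"))      # ('value', None)
```
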

If the function is supposed to perform an action, this action can be documented
using the `action` field. If it is supposed to return something, this can
be documented using the `return` field.

## Fonts

`fonts.yml` is the file containing the font information. For each font,
`id` is the ID string, `name` is the complete name, `author` is the complete
author name, and `width` and `height` are the dimensions of each character in
the font.

For each font, there is a corresponding folder, named with the font ID.
This folder contains the character images, organized by leading multi-byte
character: if the character has none, the file `0xXX.pbm` is used; otherwise,
the file `0xLLXX.pbm` is used, where `0xLL` is the leading character.
If the file doesn't exist, the character is to be considered blank.

Each existing file is a set of 256 tiles of `width * height` pixels each. Each
row holds the tiles going from `0xR0` to `0xRF`, where `0xR` is the row number
(0x0 to 0xF).

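
The layout above can be turned into a small lookup, sketched here in Python. It assumes that the `XX` in the file names is literal, that a character without a leading character has a code below `0x100`, and that the tiles are packed in a 16 by 16 grid without any padding; the function name and the 6x8 dimensions are hypothetical.

```python
def tile_location(code, width, height):
    """Return (file name, x, y) of the top-left pixel of a character tile."""
    lead, low = code >> 8, code & 0xFF
    filename = f"0x{lead:02X}XX.pbm" if lead else "0xXX.pbm"
    row, col = low >> 4, low & 0x0F  # row 0xR holds tiles 0xR0 to 0xRF
    return filename, col * width, row * height


print(tile_location(0xF704, 6, 8))  # ('0xF7XX.pbm', 24, 0)
print(tile_location(0x41, 6, 8))    # ('0xXX.pbm', 6, 32)
```
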
[libcasio]: https://libcasio.planet-casio.com/