# Source format

This format has yet to stabilize. If you only want to use the reference for
conversions between FONTCHARACTER and other character sets (which should
be handled by [libcasio][libcasio] anyway), check out the latest binary
format (`BINARYx.md`).

YAML was chosen to store the information, as it is a storage format that
both a machine and a human can read and write quite easily.

## Main file
`main.yml` is the file containing the main information about the source
reference. It only contains two fields for now:

- `version` is the version of the source reference (`0.1` corresponds to this
  version);
- `source` is the link to the FONTCHARACTER reference's source repository,
  managed through a VCS (Git, in this case).

## Sets
A set is a group of characters that appeared at the same time on CASIO
calculators, or in an extension (alternative CASIO Basic
interpreters/compilers).

`sets.yml` is the sets file. For each set:

- the `description` field is the description of the set;
- if the `default` field is present, then this set is the default set to use
  (generally the most recent set made by CASIO);
- if the `leading` field is present, it holds the list of leading characters,
  separated by commas;
- if the `parent` field is present, then the set inherits all of the characters
  of its parent and, if the child has no `leading` field, its parent's
  leading characters.

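The inheritance rule for leading characters can be sketched in a few lines of Python. This is only an illustration: it assumes `sets.yml` has already been loaded into a dict mapping a set name to its fields, and the set names and values below are made up rather than taken from the reference.

```python
def parse_codes(value):
    """Turn a `leading` value into a list of integer codes."""
    if isinstance(value, int):
        # A single code may be loaded by the YAML parser as a number.
        return [value]
    return [int(part.strip(), 0) for part in str(value).split(",") if part.strip()]


def leading_characters(sets, name):
    """Return the leading characters that apply to the set called `name`."""
    seen = set()
    while name is not None and name not in seen:
        seen.add(name)  # guard against accidental parent cycles
        entry = sets[name]
        if "leading" in entry:
            return parse_codes(entry["leading"])
        name = entry.get("parent")  # fall back to the parent's list
    return []


# Hypothetical data, shaped like what a YAML loader could return.
sets = {
    "base": {"description": "Original set", "leading": "0x7F, 0xF7"},
    "newer": {"description": "Later set", "parent": "base", "default": True},
    "extension": {"description": "Alternative interpreter", "parent": "newer",
                  "leading": "0x7F, 0xF7, 0xE5"},
}

print(leading_characters(sets, "newer"))
print(leading_characters(sets, "extension"))
```

The lookup stops at the first ancestor that defines `leading`, so a child that defines its own list never sees its parent's.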
## Categories
`categories.yml` is the categories file. Each category has an `id` field, which
is the identification string, an optional `prefix` field and an optional `sub`
list of subcategories, each with its own `id` and `prefix` fields.
To access the subcategory "Latin Capital" in the category "Letter", the
`category` field in the character information has to be
`Letter/Latin Capital/Mini`. The name of the character is then prefixed by
`Mini Latin Capital Letter ` (with spaces between the prefixes and a trailing
space); the subcategory prefix goes first. If there is a suffix, a space
followed by the suffix is appended to the character name, for example
` Digit`.

There are some more fields; see the _Embedded CASIO BASIC documentation_
section.

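As a rough sketch of the prefix rule described above, the following Python walks a category path and builds the prefixed name. It assumes `categories.yml` has been loaded into a list of dicts, that `sub` lists may nest (as the three-level example path suggests), and that a missing `prefix` contributes nothing. The sample tree is hypothetical, and suffixes are left out because the field holding them is not named here.

```python
def find_prefixes(categories, path):
    """Collect the prefixes met while walking `path` down the category tree."""
    prefixes = []
    level = categories
    for part in path.split("/"):
        node = next((c for c in level if c["id"] == part), None)
        if node is None:
            raise KeyError(f"unknown category: {part!r}")
        if node.get("prefix"):
            prefixes.append(node["prefix"])
        level = node.get("sub", [])
    return prefixes


def full_name(categories, path, name):
    """Prefix `name` with the category prefixes, deepest subcategory first."""
    prefixes = reversed(find_prefixes(categories, path))
    return "".join(prefix + " " for prefix in prefixes) + name


# Hypothetical category tree mirroring the example path in the text.
categories = [
    {"id": "Letter", "prefix": "Letter", "sub": [
        {"id": "Latin Capital", "prefix": "Latin Capital", "sub": [
            {"id": "Mini", "prefix": "Mini"},
        ]},
    ]},
]

print(full_name(categories, "Letter/Latin Capital/Mini", "A"))
```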
## Characters
There are two systems of characters on CASIO calculators: Simon Lothar calls
them the "characters" and the "opcodes". The "characters" are simple characters
with a display, while the "opcodes" are defined by a sequence of characters
(e.g. "Locate "). The two are described in two different tables on the
calculator, but both tables describe the same encoding, which is why this
reference considers all "characters" and "opcodes" as characters ("opcodes"
are called multi-characters here).

`characters.yml` is the file containing data about the characters. For each
character, the `code` field is its `FONTCHARACTER` code, the `name` field is
the complete description of the character, the `flags` field holds the
character flags and the `category` field is the category(/subcategory) ID (see
the previous section). If there is no `category` field, the category is
"Other", with no prefix.

`flags` is a list of flag strings. Current flags are:

* `nl`: the character should be followed by a newline;
* `esc`: the character's CTF token is escaped with a reverse solidus;
* `sep`: the character is a Basic separator;
* `base`: the character is only accessible in BASE programs.

Some characters have an ASCII token representation, mostly for the *cat*,
*newcat*, *ctf* and *casemul* formats. If the `tokens` field exists, then
it is a dictionary of the tokens in the different formats.
- If the `cat` field of the dictionary doesn't exist, its value is deduced
  recursively using the `multi` field if it is there, or from the `unicode`
  field (if all-`ASCII`), and prefixed by a reverse solidus '\\';
- If the `newcat` field of the dictionary doesn't exist, it takes its
  value from the `cat` field;
- If the `ctf` field of the dictionary doesn't exist, it takes its value from
  the `cat` field if that field was not deduced; otherwise, it is deduced the
  same way as the `cat` field, but without the reverse solidus '\\' prefix;
- If the `casemul` field of the dictionary doesn't exist, it is deduced the
  same way as the `ctf` field;
- If the `ref` field of the dictionary doesn't exist, it takes the
  (first) value of the `ctf` field.

There can be multiple tokens for one format; in this case, the value of the
format field is a list.

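One possible reading of the token deduction rules above is sketched below in Python. Several points are assumptions rather than statements of the reference: characters are plain dicts looked up by code in a `table`, `multi` and `unicode` are already lists of integers, the "recursive" deduction concatenates the text deduced for each component, `casemul` follows exactly the same rule as `ctf`, and nothing is deduced when a character has no `tokens` field at all. The sample tokens are illustrative only.

```python
def deduced_text(char, table):
    """Text deduced from `multi` (recursively) or from an all-ASCII `unicode`."""
    if "multi" in char:
        parts = [deduced_text(table[code], table) for code in char["multi"]]
        return None if None in parts else "".join(parts)
    codes = char.get("unicode", [])
    if codes and all(code < 0x80 for code in codes):
        return "".join(chr(code) for code in codes)
    return None


def as_list(value):
    """A format may hold one token or a list of tokens."""
    return value if isinstance(value, list) else [value]


def resolve_tokens(char, table):
    """Fill in the missing entries of a character's `tokens` dictionary."""
    if "tokens" not in char:
        return {}
    tokens = dict(char["tokens"])
    text = deduced_text(char, table)
    cat_is_explicit = "cat" in tokens

    def ctf_style():
        # An explicit `cat` is reused as-is; otherwise use the bare deduced text.
        return tokens["cat"] if cat_is_explicit else text

    if not cat_is_explicit and text is not None:
        tokens["cat"] = "\\" + text
    if "newcat" not in tokens and "cat" in tokens:
        tokens["newcat"] = tokens["cat"]
    for fmt in ("ctf", "casemul"):
        if fmt not in tokens and ctf_style() is not None:
            tokens[fmt] = ctf_style()
    if "ref" not in tokens and "ctf" in tokens:
        tokens["ref"] = as_list(tokens["ctf"])[0]
    return tokens


# Simple characters for 'F', 'o', 'r' and ' ', plus two illustrative entries.
table = {code: {"code": code, "unicode": [code]} for code in (0x46, 0x6F, 0x72, 0x20)}
for_entry = {"code": 0xF704, "name": "For", "multi": [0x46, 0x6F, 0x72, 0x20],
             "tokens": {"casemul": "for "}}
arrow = {"code": 0x0E, "unicode": [0x2192], "tokens": {"cat": ["\\->", "\\to"]}}

print(resolve_tokens(for_entry, table))
print(resolve_tokens(arrow, table)["ref"])
```

In this sketch a deduced `cat` never leaks into `ctf`: the backslash-prefixed form is only reused when it was written explicitly.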
It is possible to obtain an ASCII/HTML representation of most characters:
- If tokens exist, take the `ref` token;
- Otherwise, if the `multi` field is specified, then the representation can be
  obtained recursively by querying this field's elements;
- Otherwise, no ASCII representation is available.

The `id` field is an identifier for the character, composed of letters,
numbers and underscores. It can be used for C defines.
If there is no `id` field, the identifier is the value in the `ascii` field if
it can be deduced (or the `name` field if it can't), with hyphens turned into
underscores and other invalid characters removed (spaces, parentheses, ...).

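The fallback for a missing `id` can be illustrated with a small Python helper. It is a sketch under assumptions: the character is a plain dict, and the ASCII representation (when one could be deduced) is passed in separately as the hypothetical `ascii_repr` argument.

```python
import re


def derive_id(char, ascii_repr=None):
    """Return the character's identifier, deriving it when `id` is missing."""
    if "id" in char:
        return char["id"]
    # Prefer the ASCII representation, fall back to the full name.
    source = ascii_repr if ascii_repr is not None else char["name"]
    # Hyphens become underscores, then anything else that is invalid is dropped.
    return re.sub(r"[^A-Za-z0-9_]", "", source.replace("-", "_"))


print(derive_id({"name": "Latin Capital Letter A"}))
print(derive_id({"name": "Locate"}, ascii_repr="Locate "))
```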
You have to distinguish multi-character opcodes from simple opcodes.
Multi-character opcodes (multi-characters) are characters that are simply a
sequence of simple characters. You can tell them apart from simple characters
by checking for the presence of a `multi` field, which holds the
`FONTCHARACTER` codes of the characters in the sequence, separated by commas.
Be careful: a multi-character can be composed of only one character, in which
case YAML won't interpret the field as a string, but directly as a number!

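Because a one-element `multi` value comes back from the YAML parser as a number, consumers may want to normalize the field before using it. The helper below is a minimal sketch, assuming the value is either an integer, a comma-separated string, or an already-parsed sequence; the function name is made up for the example.

```python
def multi_codes(value):
    """Return the `multi` field as a list of integer codes."""
    if isinstance(value, int):
        # A lone code such as `multi: 0x46` is loaded as a number, not a string.
        return [value]
    if isinstance(value, str):
        # Base 0 lets int() accept both "0x46" and "70".
        return [int(part.strip(), 0) for part in value.split(",") if part.strip()]
    return [int(code) for code in value]  # already a YAML sequence


print(multi_codes(0x46))
print(multi_codes("0x46, 0x6F, 0x72, 0x20"))
```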
If the character is simple and has an equivalent Unicode sequence, the
Unicode code points of that sequence, separated by commas, are in the
`unicode` field; otherwise, the field doesn't exist.

If the character data has a `set` field, then the character belongs to that
set; otherwise, it should be considered part of the default set.

### Embedded CASIO BASIC documentation
Some characters have a `type` field. This type means they have a special
meaning in CASIO Basic. There are two types: `function` and `object`. There is
an associated syntax, which is either `<name>(arg1, arg2)` or
`<name> arg1,arg2`: the first syntax is used when `par` is `true` and the
second one when it is `false`.
Note that for the first syntax, the closing parenthesis is not mandatory.

If `par` is `false` (or non-existent), then the `fix` field can be
set to `infix`, which means the function is used as either
`arg1 <name>` or `arg1 <name> arg2`.

If the function/object takes arguments, they can be documented using the
`args` field, and any optional arguments that come after them can be
documented with the `optn` field. These fields receive a list of argument
strings. An argument type can be imposed by adding `:<code>` at the end of the
argument string; for example, here are the `For` and `To` entries:

    -
      code: 0xF704
      name: For
      category: Statement
      args: ["to:0xF705"]
      action: ...
      multi: [0x46, 0x6F, 0x72, 0x20]
    -
      code: 0xF705
      name: To
      category: Operator
      args: ["assign:0x0E"]
      optn: ["step:0xF706"]
      action: ...
      multi: [0x20, 0x54, 0x6F, 0x20]

If the function is supposed to perform an action, this action can be
documented using the `action` field. If it is supposed to return something,
this can be documented using the `return` field.

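To make the argument strings and the three syntaxes more concrete, here is a small Python sketch. It assumes entries are plain dicts as a YAML loader would produce them; the helper names and the `Locate`, `Int` and `And` sample entries are hypothetical, and optional (`optn`) arguments are not rendered.

```python
def split_arg(arg):
    """Split an argument string such as "to:0xF705" into (name, code)."""
    name, sep, code = arg.rpartition(":")
    if not sep:
        return arg, None  # no type imposed on this argument
    return name, int(code, 0)


def syntax(entry):
    """Render the usage syntax implied by the `par` and `fix` fields."""
    name = entry["name"]
    args = [split_arg(arg)[0] for arg in entry.get("args", [])]
    if entry.get("par", False):
        return f"{name}({', '.join(args)})"
    if entry.get("fix") == "infix":
        # Either `arg1 <name>` or `arg1 <name> arg2`.
        return " ".join(args[:1] + [name] + args[1:2])
    return f"{name} {','.join(args)}".rstrip()


print(split_arg("to:0xF705"))
print(syntax({"name": "Locate", "args": ["x", "y", "text"]}))
print(syntax({"name": "Int", "par": True, "args": ["value"]}))
print(syntax({"name": "And", "fix": "infix", "args": ["a", "b"]}))
```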
## Fonts
`fonts.yml` is the file containing the font information. For each font,
`id` is the ID string, `name` is the complete name, `author` is the complete
author name, and `width` and `height` are the dimensions of each character in
the font.

For each font, there is a corresponding folder, named after the font ID.
This folder contains the character images, organized by leading multi-byte
character: if there is none, the file `0xXX.pbm` is used; otherwise,
the file `0xLLXX.pbm` is used, where `0xLL` is the leading character.
If the file doesn't exist, the character is to be considered blank.

Each existing file is a set of 256 tiles of `width * height` each. Each row
holds the tiles going from `0xR0` to `0xRF`, where `0xR` is the row number
(`0x0` to `0xF`).

[libcasio]: https://libcasio.planet-casio.com/
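The font layout described in the Fonts section can be turned into a lookup that gives the image file and the pixel position of a character's tile. This Python sketch rests on two assumptions: the `XX` in the file names is literal (each file covers all 256 trailing bytes), and the tiles are packed in a 16 by 16 grid without margins. The function name and the 6x8 dimensions are made up for the example.

```python
def tile_location(code, width, height):
    """Return (file name, x, y) of the tile for a FONTCHARACTER `code`."""
    lead, low = code >> 8, code & 0xFF
    # No leading character: `0xXX.pbm`; otherwise `0xLLXX.pbm`.
    name = f"0x{lead:02X}XX.pbm" if lead else "0xXX.pbm"
    row, col = low >> 4, low & 0xF  # row 0xR holds tiles 0xR0 to 0xRF
    return name, col * width, row * height


print(tile_location(0x41, 6, 8))
print(tile_location(0xF704, 6, 8))
```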