libc/doc/casio_encoding_tutorial.md


2018-09-19 11:58:25 +02:00
# Casio Encoding
One of the most important aspects of creating add-ins is writing text to the display. The Casio SDK provides us with a rather inflexible way of doing that:
```c
Print("Hello World");
```
As you might already know, the standard C library offers a more elegant and much more flexible solution:
```c
#include <stdio.h>
printf("Hello World");
```
No matter which way we choose to print letters, numbers or special symbols, we need to understand how these characters are represented internally.
## Basics
### Strings
A *string* is a group of *characters*, e.g. "Hello World" contains the characters 'H', 'e', 'l', etc. As you might have noticed, strings are enclosed in "..." while we use '...' for characters.
More technically speaking, a string is an array of `char`s that ends with a terminating zero byte (`'\0'`).
```c
char message[] = "Hello World";
printf("%s", message); // pass the text as an argument, not as the format string
```
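To make the "array of `char`s" idea concrete, here is a minimal sketch that walks through such an array until it reaches the terminating zero byte. The helper name `count_chars` is made up for this tutorial; the standard library already offers `strlen` for the same job.

```c
#include <stddef.h>

/* Hypothetical helper: counts the chars of a string by stepping
 * through the array until the terminating zero byte is found. */
size_t count_chars(const char *text)
{
    size_t n = 0;
    while (text[n] != '\0') {
        n++;
    }
    return n;
}
```

Calling `count_chars("Hello World")` visits every `char` of the string once and stops at the `'\0'` that the compiler appends to each string literal.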
### Characters
Actually, the type `char` is a number that indicates which letter or symbol is meant. E.g. 97 means 'a', 98 means 'b', 99 means 'c' etc.
There is a whole table of 128 number/character pairs (0-127), which is called the ASCII table.
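Because a `char` is just a small number, it can be converted to an `int` and used in arithmetic. The following sketch illustrates this; the function names are invented for this example, and the results assume an ASCII-compatible compiler.

```c
/* Hypothetical helper: returns the number stored in a char. */
int char_to_number(char c)
{
    return (int)c;
}

/* Hypothetical helper: returns the character whose number is one
 * higher than that of c, e.g. the next letter of the alphabet. */
char next_char(char c)
{
    return (char)(c + 1);
}
```

With these helpers, `char_to_number('X')` yields the number of `'X'` in the table, and `next_char('X')` yields the character in the following table row.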
## ASCII
The way of mapping numbers to characters is called *encoding*. The most basic encoding is called ASCII. Note that the C compiler and the underlying operating system do not have to comply with ASCII, but generally they do (except CASIO of course). I'll come to that in a second.
Here you see a part of the ASCII table. On the Internet, you can also find the [whole ASCII table](https://www.rapidtables.com/code/text/ascii-table.html).
| Number | Character |
| ------ | --------- |
| ...|...|
| 88 | X |
| 89 | Y |
| 90 | Z |
| 91 | [ |
| 92 | \ |
| 93 | ] |
| 94 | ^ |
| 95 | _ |
| 96 | ` |
| 97 | a |
| 98 | b |
| 99 | c |
| ...|...|
To print an ASCII character using its number, you must convert this number into hexadecimal first. For example, `'o'` = `111` = `0x6F` = `'\x6F'`:
```c
printf("Hell\x6F World");
```
In fact, `'o'`, `111`, `0x6F` and `'\x6F'` are *really* the same for the C compiler. As a consequence, `'o' == 111` will always be true (as long as the C compiler complies with ASCII).
Note that ASCII only covers the characters 0-127 and is the de facto standard on all platforms. The characters 128-255 are sometimes referred to as *extended ASCII*. However, extended ASCII is non-standard and should be avoided if possible.
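One pitfall with hexadecimal escapes such as `\x6F` is worth a short sketch: in C, a `\x` escape consumes *all* hexadecimal digits that follow it. If the next character of the text happens to be `0`-`9` or `a`-`f`, the literal has to be split in two; the compiler joins adjacent string literals again. The function name `spell_loaf` is made up for illustration, and ASCII is assumed.

```c
/* Hypothetical example: builds the word "loaf" with 'o' written as \x6F.
 * Writing "l\x6Faf" would be read as the single escape \x6FAF,
 * so the literal is split right after the escape. */
const char *spell_loaf(void)
{
    return "l\x6F" "af";
}
```

In `"Hell\x6F World"` no split is needed, because the space after `\x6F` is not a hexadecimal digit and therefore ends the escape.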
## Unicode
Of course, there are many more than 256 symbols if we think about Chinese characters, Greek letters etc. To be able to use them in C, we need another encoding, which is called *Unicode*. While each character in ASCII requires one byte of memory, Unicode needs more.
Since the data type `char` is no longer sufficient for Unicode, `char16_t`, `char32_t` or `wchar_t` are used. I do not want to go into too much detail here, because Casio does **not** support Unicode anyway. If you are interested, however, you will find many good [examples](http://doc.bccnsoft.com/docs/cppreference2015/en/c/language/string_literal.html) on the Internet.
## Casio encoding
The Casio encoding is partly ASCII-compliant and not Unicode-compliant at all. All printable characters (32-126) of ASCII have the same value in the Casio encoding and can be used without special care:
```
!"#$%&'()*+,-./
0123456789
:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\]^_`
abcdefghijklmnopqrstuvwxyz
{|}~
```
However, Casio decided to replace the non-printable characters with their own symbols. E.g. what is the form feed character `'\f'` = `12` = `0x0C` = `'\x0C'` in ASCII was replaced by the character `'◢'`. That means that `Print("\x0C")` will now print `◢`.
Casio also added characters to the extended ASCII section 128-255. Details on all characters can be looked up in the [Casio charset table](casio_charset_table.md).
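Since only the values 32-126 mean the same thing in ASCII and in the Casio encoding, it can be handy to check whether a string stays inside that range. The following is a minimal sketch; the function name `is_shared_ascii` is an invention of this tutorial and not part of the Casio SDK.

```c
/* Hypothetical helper: returns 1 if every byte of the string lies in
 * the range 32-126, i.e. it looks the same in ASCII and on the Casio.
 * Returns 0 as soon as a byte outside that range is found. */
int is_shared_ascii(const char *text)
{
    /* Read the bytes as unsigned so that values above 127 are not
     * mistaken for negative numbers. */
    const unsigned char *p = (const unsigned char *)text;

    for (; *p != '\0'; p++) {
        if (*p < 32 || *p > 126) {
            return 0;
        }
    }
    return 1;
}
```

A string such as `"Hello World"` passes this check, while `"\x0C"` does not, because the value 12 belongs to the range that Casio has redefined.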
### Multi-byte Characters
Additionally, certain combinations of values print special characters, e.g. `Print("\xE5\x51")` will print the Greek capital *sigma* (also known as the sum sign) `'Σ'`. The two bytes `\xE5` and `\x51` will be evaluated as *one character* here. In the [Casio charset table](casio_charset_table.md), this character can be found as `0xE551`.
Note that the first byte of a two-byte character can be one of `0x7F`, `0xF7`, `0xF9`, `0xE5`, `0xE6` or `0xE7`. The second byte can be anything from `0x00` to `0xFF`.
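The rule for two-byte characters can be sketched in code as follows. Both function names are made up for this tutorial and are not part of the Casio SDK. The sketch also makes one simplifying assumption: because a C string ends at the first zero byte, a lead byte directly followed by `0x00` is counted as a single one-byte character.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the byte is one of the six values
 * that start a two-byte character, otherwise 0. */
int casio_is_lead_byte(unsigned char byte)
{
    switch (byte) {
    case 0x7F: case 0xF7: case 0xF9:
    case 0xE5: case 0xE6: case 0xE7:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: counts the characters (not the bytes) of a
 * Casio-encoded string. A lead byte and the byte after it count as
 * one character. */
size_t casio_char_count(const char *text)
{
    const unsigned char *p = (const unsigned char *)text;
    size_t count = 0;

    while (*p != '\0') {
        if (casio_is_lead_byte(*p) && p[1] != '\0') {
            p += 2; /* two-byte character */
        } else {
            p += 1; /* one-byte character */
        }
        count++;
    }
    return count;
}
```

For `"\xE5\x51"`, the byte count is 2, but `casio_char_count` reports one character. The bytes are read through an `unsigned char` pointer because plain `char` may be signed, in which case a value like `0xE5` would compare as negative.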
### Casio charset table
Please have a look at the [Casio charset table](casio_charset_table.md) for details on all characters Casio is able to handle. Some of them are not printable, so we do not need to care about them.