Casio Encoding

One of the most important aspects of creating Add-ins is how to write text on the display. The Casio SDK provides us with a very inflexible way of doing that:

Print("Hello World");

As you might already know, the standard C library offers a more elegant and more flexible solution:

#include <stdio.h>

printf("Hello World");

No matter which way we want to use to print letters, numbers or special symbols, we need to understand how all these characters are represented internally.

Basics

Strings

A string is a group of characters, e.g. "Hello World" contains the characters 'H', 'e', 'l', etc. As you might have noticed, strings are enclosed by "..." while we use '...' for characters.

More technically speaking, a string is an array of multiple chars.

char message[] = "Hello World";
printf("%s", message); /* pass the text as an argument, never as the format string */
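To make the "array of chars" idea concrete, here is a minimal sketch that walks a string one element at a time. It relies on a detail not mentioned above: the compiler appends a terminating '\0' byte to every string literal. The name count_chars is hypothetical; the standard strlen() does the same job.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters of a C string by walking
   the array until the terminating '\0' byte is reached. */
size_t count_chars(const char *text)
{
    size_t count = 0;
    while (text[count] != '\0') {
        count++;
    }
    return count;
}
```

Called as count_chars("Hello World"), it visits 'H', 'e', 'l', ... in order and stops at the terminator, which is not counted.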

Characters

Actually, the type char is a number that indicates which letter or symbol is meant. E.g. 97 means 'a', 98 means 'b', 99 means 'c', etc.

There is a whole table of 128 number/character pairs (numbered 0 to 127), which is called the ASCII table.
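Because a char is just a number, it can be converted to an int and used in arithmetic. The following sketch illustrates this; char_code and next_char are hypothetical names, and next_char assumes an encoding such as ASCII in which letters are numbered consecutively.

```c
/* Hypothetical helper: returns the number stored for a character.
   The cast avoids negative results for values above 127. */
int char_code(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: returns the character whose number is one higher.
   Assumes letters are numbered consecutively, as they are in ASCII. */
char next_char(char c)
{
    return (char)(c + 1);
}
```

On an ASCII-compliant compiler, char_code('a') yields 97 and next_char('a') yields 'b'.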

ASCII

A scheme for mapping numbers to characters is called an encoding. The most basic encoding is ASCII. Note that the C compiler and the underlying operating system are not required to use ASCII, but generally they do (except Casio, of course). I'll come to that in a second.

Here you see a part of the ASCII table. The complete ASCII table is easy to find on the Internet.

Number Character
... ...
88 X
89 Y
90 Z
91 [
92 \
93 ]
94 ^
95 _
96 `
97 a
98 b
99 c
... ...

To print an ASCII character using its number, you can write the number in hexadecimal as an escape sequence. For example, 'o' = 111 = 0x6F = '\x6F':


printf("Hell\x6F World");

In fact, 'o', 111, 0x6F and '\x6F' are really the same for the C compiler. As a consequence, 'o' == 111 will always be true (as long as the C compiler complies with ASCII).
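The decimal-to-hexadecimal step can also be left to the computer. The sketch below builds the escape sequence text for a given character number; format_hex_escape is a hypothetical helper, and it assumes a C99-style snprintf is available in your C library.

```c
#include <stdio.h>

/* Hypothetical helper: writes the escape sequence for a character
   number into out, e.g. 111 becomes the four characters \x6F.
   Returns the number of characters needed, as snprintf does. */
int format_hex_escape(unsigned char code, char *out, size_t out_size)
{
    return snprintf(out, out_size, "\\x%02X", (unsigned int)code);
}
```

One pitfall when pasting such a sequence into a string literal: a hexadecimal escape absorbs every hex digit that follows it, so "\x6Fb" is not 'o' followed by 'b'. Splitting the literal, as in "\x6F" "b", avoids the problem.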

Note that ASCII only covers the characters 0-127 and is the de facto standard on all platforms. The characters 128-255 are sometimes referred to as extended ASCII. However, extended ASCII is non-standard and should be avoided if possible.

Unicode

Of course, there are many more than 256 symbols if we think about Chinese characters, Greek letters, etc. To be able to use them in C, we need another encoding, which is called Unicode. While each character in ASCII requires one byte of memory, Unicode needs more.

Since the data type char is no longer sufficient for Unicode, char16_t, char32_t or wchar_t are used instead. I do not want to go into too much detail here, because Casio does not support Unicode anyway. If you are interested, however, you will find many good examples on the Internet.

Casio encoding

The Casio encoding is partly ASCII-compliant and not Unicode-compliant at all. All printable characters of ASCII (32-126) have the same value in the Casio encoding and can be used without special care:

 !"#$%&'()*+,-./
0123456789
:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\]^_`
abcdefghijklmnopqrstuvwxyz
{|}~

However, Casio decided to replace the non-printable characters with its own symbols. For example, the ASCII form feed character '\f' = 12 = 0x0C = '\x0C' was replaced by the character '◢'. That means that Print("\x0C") will print ◢.

Casio also added characters to the extended ASCII section 128-255. Details on all characters can be looked up in the Casio charset table.
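The rule above can be turned into a small check that tells you whether a byte is safe to use without consulting the charset table. This is only a sketch; is_shared_with_ascii is a hypothetical name, and the range 32-126 is taken from the text above.

```c
/* Hypothetical helper: returns 1 if the byte stands for the same
   character in ASCII and in the Casio encoding (printable range
   32-126), and 0 if the Casio charset table must be consulted. */
int is_shared_with_ascii(unsigned char byte)
{
    return byte >= 32 && byte <= 126;
}
```

For example, 'A' passes the check, while 0x0C does not, because Casio shows '◢' there instead of treating it as a form feed.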

Multi-byte Characters

Additionally, certain combinations of values print special characters, e.g. Print("\xE5\x51") will print the Greek capital sigma (also known as the sum sign) 'Σ'. The two bytes \xE5 and \x51 are evaluated as one character here. In the Casio charset table, this character can be found as 0xE551.

Note that the first byte of a two-byte character is always one of 0x7F, 0xF7, 0xF9, 0xE5, 0xE6 or 0xE7. The second byte can be anything from 0x00 to 0xFF.
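A consequence of two-byte characters is that the number of bytes in a string no longer equals the number of characters shown on the display. The following sketch counts displayed characters using the lead bytes listed above. Both function names are hypothetical, and the code assumes a '\0'-terminated string, so a lead byte directly followed by the terminator is counted as a single character.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the byte starts a two-byte
   character in the Casio encoding, 0 otherwise. */
int is_lead_byte(unsigned char byte)
{
    switch (byte) {
    case 0x7F: case 0xF7: case 0xF9:
    case 0xE5: case 0xE6: case 0xE7:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: counts the characters (not bytes) of a
   '\0'-terminated string in the Casio encoding. */
size_t count_casio_chars(const unsigned char *text)
{
    size_t count = 0;
    size_t i = 0;
    while (text[i] != '\0') {
        if (is_lead_byte(text[i]) && text[i + 1] != '\0') {
            i += 2; /* lead byte plus second byte form one character */
        } else {
            i += 1; /* ordinary single-byte character */
        }
        count++;
    }
    return count;
}
```

For the string "\xE5\x51" this yields one character although the string occupies two bytes.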

Casio charset table

Please have a look at the Casio charset table for details on all characters Casio can handle. Some of them are not printable, so we do not need to care about them.