comments to document struct caseconv_entry

explain design of compact (packed) struct caseconv_entry,
in case it needs to be modified for future Unicode versions
Thomas Wolff 2018-03-26 11:46:40 +02:00 committed by Corinna Vinschen
parent b49ce5af1b
commit fc59da00c8
1 changed files with 33 additions and 3 deletions

@@ -4,8 +4,36 @@
//#include <errno.h>
#include "local.h"
enum {EVENCAP, ODDCAP};
/*
   struct caseconv_entry describes the case conversion behaviour
   of a range of Unicode characters.
   It was designed to be compact for a minimal table size.
   The range is first...first + diff.
   Conversion behaviour for a character c in the respective range:
     mode == TOLO       towlower (c) = c + delta
     mode == TOUP       towupper (c) = c + delta
     mode == TOBOTH     (titling case characters)
                        towlower (c) = c + 1
                        towupper (c) = c - 1
     mode == TO1        capital/small letters are alternating
        delta == EVENCAP        even codes are capital
        delta == ODDCAP         odd codes are capital
                        (this correlates with an even/odd first range value
                        as of Unicode 10.0 but we do not rely on this)
   As of Unicode 10.0, the following field lengths are sufficient
        first: 17 bits
        diff: 8 bits
        delta: 17 bits
        mode: 2 bits
   The reserve of 4 bits (to limit the struct to 6 bytes)
   is currently added to the 'first' field;
   should a future Unicode version make it necessary to expand the others,
   the 'first' field could be reduced as needed, or larger ranges could
   be split up (reduce limit max=255 e.g. to max=127 or max=63 in
   script mkcaseconv, check increasing table size).
 */
enum {TO1, TOLO, TOUP, TOBOTH};
enum {EVENCAP, ODDCAP};
static struct caseconv_entry {
unsigned int first: 21;
unsigned short diff: 8;
@@ -71,6 +99,7 @@ toulower (wint_t c)
default:
break;
}
return c;
}
@@ -102,9 +131,10 @@ touupper (wint_t c)
default:
break;
}
default:
break;
default:
break;
}
return c;
}
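
The conversion rules described in the new comment can be illustrated with a small standalone sketch. This is not the code from this commit: `struct entry_sketch`, `in_range`, `is_capital`, `lower_sketch` and `upper_sketch` are hypothetical names, the struct uses plain integer fields instead of the packed bit-fields, and the TO1 branch assumes that the small letter always directly follows its capital, which is how "alternating" is read here.

```c
#include <stdint.h>

/* Hypothetical stand-ins mirroring the names used in the commit. */
enum { TO1, TOLO, TOUP, TOBOTH };
enum { EVENCAP, ODDCAP };

/* Unpacked illustration of one table entry (assumption: the real
   struct packs these into bit-fields of 21, 8, 2 and 17 bits). */
struct entry_sketch {
  uint32_t first;  /* first code point of the range */
  uint32_t diff;   /* the range is first ... first + diff */
  int      mode;   /* TO1, TOLO, TOUP or TOBOTH */
  int32_t  delta;  /* offset, or EVENCAP/ODDCAP when mode == TO1 */
};

static int in_range(const struct entry_sketch *e, uint32_t c)
{
  return c >= e->first && c - e->first <= e->diff;
}

/* For TO1 ranges: EVENCAP means even codes are capital,
   ODDCAP means odd codes are capital. */
static int is_capital(const struct entry_sketch *e, uint32_t c)
{
  return (c & 1) == (uint32_t) (e->delta == ODDCAP);
}

static uint32_t lower_sketch(const struct entry_sketch *e, uint32_t c)
{
  if (!in_range(e, c))
    return c;
  switch (e->mode) {
  case TOLO:   return c + (uint32_t) e->delta;  /* delta may be negative */
  case TOBOTH: return c + 1;                    /* titling case character */
  case TO1:    return is_capital(e, c) ? c + 1 : c;
  default:     return c;                        /* TOUP: no lower mapping here */
  }
}

static uint32_t upper_sketch(const struct entry_sketch *e, uint32_t c)
{
  if (!in_range(e, c))
    return c;
  switch (e->mode) {
  case TOUP:   return c + (uint32_t) e->delta;
  case TOBOTH: return c - 1;
  case TO1:    return is_capital(e, c) ? c : c - 1;
  default:     return c;                        /* TOLO: no upper mapping here */
  }
}
```

With an illustrative entry such as `{0x41, 25, TOLO, 32}` for A to Z, `lower_sketch` maps `0x41` to `0x61`; with `{0x100, 0x2F, TO1, EVENCAP}`, the even code `0x100` maps down to `0x101` and the odd code `0x101` maps back up to `0x100`. These entries are made up for the example and are not copied from the generated table.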