Update page 'bopti on fx 9860G'

Lephenixnoir 2019-07-28 13:56:30 +02:00
commit 5e6ec0b5a3
1 changed files with 250 additions and 0 deletions

bopti-on-fx-9860G.md Normal file

@ -0,0 +1,250 @@
*This is version 2 of bopti, included in the first fx-CG 50-compatible version
of gint, version 2.0.*
## *bopti* on fx-9860G
The bitmap drawing module, *bopti*, renders images using direct bitwise
operations on video RAM (vram) longwords. This method makes extensive use of
the 4-alignment of gint's vram to operate on 32 pixels at a time and avoid
costly single-bit operations.
In gint's development workflow, images in usual formats are first converted to
the *bopti* format at compile-time. The *bopti* format is designed for fast
rendering: it consists of one or several monochrome bitmaps called *layers*,
arranged in a fixed combination called a *profile*. Each profile has a
corresponding assembler routine designed to render the image quickly.
## Performance
(TODO)
Probably about 15 times as fast as MonochromeLib.
## Color profiles
When converting an image, *fxconv* first quantizes the colors by mapping
transparent pixels to `alpha` and mapping other pixels to the closest color in
these four:
| Color name | Hexadecimal |
| ---------- | ----------- |
| `black` | `#000000` |
| `dark` | `#555555` |
| `light` | `#aaaaaa` |
| `white` | `#ffffff` |
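
As a rough illustration of the quantization step, here is a minimal sketch in C. It is not fxconv's code: the enum, the function name `quantize_gray`, and the simplification of each pixel to one 8-bit gray level plus a transparency flag are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical color codes for this sketch (not gint's actual values) */
enum color { C_BLACK, C_DARK, C_LIGHT, C_WHITE, C_ALPHA };

/* Map an 8-bit gray level to the closest of the four reference levels;
   transparent pixels map to C_ALPHA regardless of their level. */
static enum color quantize_gray(uint8_t level, int transparent)
{
    /* Reference levels, in the same order as the enum */
    static const uint8_t ref[4] = { 0x00, 0x55, 0xaa, 0xff };
    if(transparent) return C_ALPHA;

    int best = 0;
    for(int i = 1; i < 4; i++)
    {
        if(abs(level - ref[i]) < abs(level - ref[best])) best = i;
    }
    return (enum color)best;
}
```

For instance, a level of `0x60` is closer to `#555555` than to `#aaaaaa`, so it maps to `dark`.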
Then the image is assigned the smallest profile that can represent all of its
colors:
| Profile | Supported colors |
| ------------ | ------------------------------------------ |
| `mono` | `black`, `white` |
| `mono_alpha` | `black`, `white`, `alpha` |
| `gray` | `black`, `white`, `light`, `dark` |
| `gray_alpha` | `black`, `white`, `light`, `dark`, `alpha` |
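
The profile choice follows directly from the table above. A minimal sketch, assuming the converter has already recorded whether any gray or transparent pixel occurs (the enum values and the function name are hypothetical, not the identifiers stored in the image):

```c
#include <stdbool.h>

/* Hypothetical profile identifiers for this sketch */
enum profile { P_MONO, P_MONO_ALPHA, P_GRAY, P_GRAY_ALPHA };

/* Pick the smallest profile able to represent the colors of an image.
   uses_gray:  at least one pixel is `light` or `dark`
   uses_alpha: at least one pixel is transparent */
static enum profile select_profile(bool uses_gray, bool uses_alpha)
{
    if(uses_gray) return uses_alpha ? P_GRAY_ALPHA : P_GRAY;
    return uses_alpha ? P_MONO_ALPHA : P_MONO;
}
```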
## Layers
Each profile has a fixed number of *layers* with a predefined meaning. During
rendering, all of the layers are blitted, in order, to produce the image. The
number of layers in a profile is always minimal: it is
$`\lceil \log_2 n \rceil`$ where $`n`$ is the number of colors in that profile
(counting `alpha`).
On fx-9860G, the vram is either monochrome or 4-color gray, so pixel colors can
only take 2 or 4 different values. This makes logical operations the natural
way to implement blitting, because they extend effortlessly to several pixels
at once.
The current version of *bopti* uses the following types of layers:
| Layer name | Category | Effect for 0-bits | Effect for 1-bits |
| ----------- | ---------- | ------------------- | ---------------------- |
| `fill` | Monochrome | Paints white | Paints black |
| `white` | Monochrome | - | Paints white |
| `black` | Monochrome | - | Paints black |
| `lfill` | Gray | Clears light vram | Paints light vram |
| `dfill` | Gray | Clears dark vram | Paints dark vram |
| `light` | Gray | - | Paints light gray |
| `dark` | Gray | - | Paints dark gray |
When performing an operation, *bopti* takes data from the encoded image and
applies bitwise operations for all layers. It then moves to a different part of
the image. The previous version of *bopti* applied each layer independently,
but the current version applies them all at once, saving even more time.
Note that most layers do nothing on 0-bits; this is an optimization related
to *rectangle masks*. When a VRAM longword is loaded to a register, often the
blitted image will not cover it entirely. The pixels that must be preserved are
represented in a structure called a rectangle mask. Having this neutral 0-bit
makes it simple to preserve relevant pixels while drawing the image. When
layers don't have this preserving 0-bit, masks must instead be applied to the
VRAM itself. See later for more details.
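
The difference between the two cases can be sketched on a single longword. This is only an illustration: the function names are made up, and the mask convention (a 1-bit marks a pixel that the image is allowed to modify) is an assumption that may be the opposite of the one used internally.

```c
#include <stdint.h>

/* mask: 1-bits mark the pixels of this longword that the image may modify
   (assumed convention for this sketch) */

/* `black` layer: 0-bits are neutral, so masking the layer data is enough */
static uint32_t blit_black(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return vram | (layer & mask);
}

/* `fill` layer: 0-bits paint white, so the mask must also be applied to the
   VRAM itself to keep the pixels outside the rectangle */
static uint32_t blit_fill(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return (vram & ~mask) | (layer & mask);
}
```

With a neutral 0-bit, one AND on the layer data protects the surrounding pixels; without it, the VRAM longword needs its own masking step.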
Here is the relationship between color profiles and their layers:
* The `mono` profile only has a `fill` layer.
* The `mono_alpha` profile starts with a `white` layer to clear the
non-transparent region of the image, then blits a `black` layer to render
the content.
* The `gray` profile has an `lfill` and a `dfill` layer. These two types of
layer act on different VRAMs.
* The `gray_alpha` profile starts by blitting a `white` layer on both VRAMs,
then adds a `light` layer and a `dark` layer.
## Logical operations on pixels
As a reference, here are the logical operations used to blit layers on past and
present versions of bopti. The $`x`$ parameter is a boolean; the transformation
must happen iff $`x=1`$. The significance of $`x`$ appears when extending the
logical operations to a longword: it allows controlling 32 pixels individually
while still using only a couple of logical instructions.
```c
black (data, x) = data | x
white (data, x) = data & ~x
invert (data, x) = data ^ x
```
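
Written as actual C on 32-bit longwords, these definitions could look like the sketch below. The `op_` names are chosen for the example and are not gint identifiers.

```c
#include <stdint.h>

/* Each function transforms the pixels selected by the 1-bits of x and
   leaves the others untouched. A set bit in data is a black pixel. */
static inline uint32_t op_black(uint32_t data, uint32_t x)
{
    return data | x;
}
static inline uint32_t op_white(uint32_t data, uint32_t x)
{
    return data & ~x;
}
static inline uint32_t op_invert(uint32_t data, uint32_t x)
{
    return data ^ x;
}
```

Passing `x = 0` leaves `data` unchanged in all three cases, which is what makes `x` usable as a per-pixel selector.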
For gray images, we need to know that the gray engine produces an illusion of
intermediate color by quickly alternating two buffers on the screen, with a
different duration for each. This way, the proportion of time each pixel is
black is one of four different values. Assuming `long` and `short` represent
the value of a pixel in the buffer that stays longer and shorter on the screen,
we have the following encoding:

    white     = 0  (long=0 short=0)
    lightgray = 1  (long=0 short=1)
    darkgray  = 2  (long=1 short=0)
    black     = 3  (long=1 short=1)

So operations on gray pixels will modify two VRAMs at once.
Among interesting operations, we have `lighten`, which shifts all values
towards white (and white remains white), as if decrementing them, and `darken`
that shifts all values towards black (and black remains black), as if
incrementing them.
```c
black (light, dark, x) = (light | x, dark | x)
dark (light, dark, x) = (light & ~x, dark | x)
light (light, dark, x) = (light | x, dark & ~x)
white (light, dark, x) = (light & ~x, dark & ~x)
inverse (light, dark, x) = (light ^ x, dark ^ x)
lighten (light, dark, x) = ((light ^ x) & (dark | ~x), dark & (light | ~x))
darken (light, dark, x) = ((light ^ x) | (dark & x), dark | (light & x))
```
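
The `lighten` and `darken` formulas are the least obvious, so here is a small C sketch that checks them exhaustively against the encoding. It assumes that the `dark` buffer is the long one and the `light` buffer the short one, which is what the `dark` and `light` operations above suggest; the type and function names are invented for the example.

```c
#include <stdint.h>

/* A pair of longwords, one from each VRAM */
typedef struct { uint32_t light, dark; } gray_pair;

static gray_pair op_lighten(gray_pair v, uint32_t x)
{
    gray_pair r = {
        (v.light ^ x) & (v.dark | ~x),
        v.dark & (v.light | ~x),
    };
    return r;
}

static gray_pair op_darken(gray_pair v, uint32_t x)
{
    gray_pair r = {
        (v.light ^ x) | (v.dark & x),
        v.dark | (v.light & x),
    };
    return r;
}

/* Gray level of bit 0 under the encoding white=0 ... black=3, reading
   `dark` as the long buffer and `light` as the short one (assumption) */
static int level(gray_pair v)
{
    return (int)(((v.dark & 1) << 1) | (v.light & 1));
}

/* Returns 1 if both operations act as a saturating decrement/increment on
   every level when x=1, and leave the pixel untouched when x=0 */
static int check_gray_ops(void)
{
    for(int n = 0; n < 4; n++)
    {
        gray_pair v = { (uint32_t)(n & 1), (uint32_t)(n >> 1) };
        if(level(op_lighten(v, 1)) != (n > 0 ? n - 1 : 0)) return 0;
        if(level(op_darken(v, 1))  != (n < 3 ? n + 1 : 3)) return 0;
        if(level(op_lighten(v, 0)) != n) return 0;
        if(level(op_darken(v, 0))  != n) return 0;
    }
    return 1;
}
```

Only bit 0 is checked, but since every operation is purely bitwise, the same result holds independently for each of the 32 pixels of a longword.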
These functions are obtained by looking intensely at a truth table, then adding
a linear number of $`x`$'s to neutralize some operands when $`x=0`$.
## Assembler-driven rendering
The previous implementation of bopti was already fast, usually about 8 times
as fast as MonochromeLib. Half of that speedup came from vram alignment; the
other half came from the implementation and the image format. It had, however,
two limiting factors:
1. The operation function was a generic function taking the color as argument,
and it used a switch to decide which operation to apply;
2. Each layer was drawn independently, so the 2D structure of the image was
unnecessarily traversed several times.
These two limitations are related, and both can be overcome by specializing the
rendering code at the innermost level of the critical loop. The current version
of *bopti* has one specialized rendering function per color profile,
implemented in assembler.
## Image format
The conversion is performed by *fxconv* at compile-time and outputs a
big-endian data structure that can be efficiently traversed from the add-in.
The image is first extended to make its width a multiple of 32 pixels, then
stored in row-major order:

        32       32       32
    +--------+--------+--------+
    |   1    |   2    |   3    |  1
    +--------+--------+--------+
    |   4    |   5    |   6    |  1
    +--------+--------+--------+

A set of 32 pixels as numbered on the diagram above is called a *position*.
This is an important concept for the rendering algorithm. For each position,
the data of all layers is stored in rendering order, so the layers are
interwoven in the storage. It also means that the data for a position will
consist of several longwords, not just one.
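
Under this layout, locating the data of a position is a matter of simple arithmetic. The helper below is a sketch with a hypothetical name; it counts in longwords and assumes one longword per layer and per position, as described above.

```c
#include <stddef.h>

/* Index, in longwords, of the first layer of the position at row `y` and
   column `col`, for an image stored with `columns` positions per row and
   `layers` layers per position */
static size_t position_offset(int y, int col, int columns, int layers)
{
    return ((size_t)y * columns + col) * layers;
}
```

In the diagram above, with two layers per position, the position numbered 4 (row 1, column 0) starts at longword 6.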
Note that extending the image to a multiple of 32 in width is not a hard
requirement; it could be avoided by defining and implementing 16-bit and 8-bit
positions, but this is currently not done.
Along with this data, the image object contains some metadata:
```c
typedef struct
{
    /* Image can only be rendered with the gray engine */
    uint gray       :1;
    /* Left for future use */
    uint            :3;
    /* Image profile (uniquely identifies a rendering function) */
    uint profile    :4;
    /* Full width, in pixels */
    uint width      :12;
    /* Full height, in pixels */
    uint height     :12;

    /* Raw layer data */
    uint8_t data[];

} GPACKED(4) image_t;
```
The first byte indicates the color profile and whether this profile is
gray-only. `width` and `height` are the natural dimensions of the image, before
width extension (which is only relevant for storage). The number of columns is
deduced from the width.
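
As a sketch of that deduction (the function names are hypothetical, and the size formula assumes one 4-byte longword per layer and per position, as in the storage description):

```c
#include <stddef.h>

/* Number of 32-pixel columns needed to store `width` pixels */
static int image_columns(int width)
{
    return (width + 31) >> 5;
}

/* Size in bytes of the layer data for an image with `layers` layers */
static size_t image_data_size(int width, int height, int layers)
{
    return (size_t)image_columns(width) * height * layers * 4;
}
```

A 33-pixel-wide image therefore uses two columns, the second one being almost entirely padding.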
## Rendering algorithm
The rendering algorithm takes as parameters a subrectangle of an image and a
target position on the VRAM. Drawing a subrectangle instead of the whole image
makes it trivial to do clipping by just removing whatever goes beyond the
screen.
Two functions are available at this level:
* `bopti_render_clip()` clips the provided subrectangle to the image
dimensions, then clips that to the screen, and renders. This is the default
but all the checks take some time to perform.
* `bopti_render_noclip()` directly renders by assuming that the subrectangle is
valid and that the render fully fits into the VRAM. In many situations these
assumptions are known to hold, so time can be saved by passing
`DIMAGE_NOCLIP` to `dsubimage()`.
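
The two clipping steps can be sketched as follows. This is not the code of `bopti_render_clip()`: the structure, its field names, and the function name are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical description of a render: subrectangle (left, top, w, h) of
   the image, drawn with its top-left corner at (x, y) on the screen */
struct render { int x, y, left, top, w, h; };

/* Clip to the image (img_w x img_h), then to the screen (scr_w x scr_h).
   Returns false if nothing remains to draw. */
static bool clip_render(struct render *r, int img_w, int img_h,
    int scr_w, int scr_h)
{
    /* Clip the subrectangle to the image bounds */
    if(r->left < 0) { r->w += r->left; r->x -= r->left; r->left = 0; }
    if(r->top  < 0) { r->h += r->top;  r->y -= r->top;  r->top  = 0; }
    if(r->left + r->w > img_w) r->w = img_w - r->left;
    if(r->top  + r->h > img_h) r->h = img_h - r->top;

    /* Clip the target area to the screen */
    if(r->x < 0) { r->w += r->x; r->left -= r->x; r->x = 0; }
    if(r->y < 0) { r->h += r->y; r->top  -= r->y; r->y = 0; }
    if(r->x + r->w > scr_w) r->w = scr_w - r->x;
    if(r->y + r->h > scr_h) r->h = scr_h - r->y;

    return r->w > 0 && r->h > 0;
}
```

On the fx-9860G, `scr_w` and `scr_h` would be 128 and 64. Whenever pixels are dropped on the left or top, the subrectangle origin and the screen position move together so that the remaining pixels stay in place.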
After adjusting (or not) coordinates, both of these functions fall to the next
level. Rectangle masks are computed to indicate which parts of the VRAM must
or must not be affected. (This is because everything will be manipulated with
longwords from now on, and rendering boundaries will fall in the middle of
them.)
Since the masks safeguard what we're going to draw, we can overestimate the
subrectangle to render with a larger set of positions that contains it. Each
position is rendered on two VRAM longwords using the color profile function,
then the next position is loaded until the image is complete.
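
To make the notion of rectangle mask concrete, here is a sketch that computes the masks of one 128-pixel row (four longwords) for a horizontal pixel range. The function name is hypothetical, and two conventions are assumed: a 1-bit means that the pixel may be drawn, and the leftmost pixel of a longword is its most significant bit.

```c
#include <stdint.h>

/* Compute the four masks of a 128-pixel row for the pixel range
   [xmin, xmax] (inclusive, with 0 <= xmin <= xmax <= 127) */
static void row_masks(int xmin, int xmax, uint32_t masks[4])
{
    for(int i = 0; i < 4; i++)
    {
        /* Pixel range covered by longword i */
        int lo = 32 * i, hi = 32 * i + 31;
        if(xmax < lo || xmin > hi) { masks[i] = 0; continue; }

        /* Start from a full longword and trim both sides as needed */
        uint32_t m = 0xffffffffu;
        if(xmin > lo) m &= 0xffffffffu >> (xmin - lo);
        if(xmax < hi) m &= 0xffffffffu << (hi - xmax);
        masks[i] = m;
    }
}
```

For the range 4 to 35, the first longword keeps its 28 rightmost pixels, the second its 4 leftmost pixels, and the last two are left untouched.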
Two functions are used for this task:
* `bopti_render()` does the prep work and parameter computation.
* `bopti_grid()` iterates over positions and calls the profile's renderer.
The last level is the profile renderer, which is implemented in assembler.
These functions take as parameters the current VRAM values, a pointer to image
data, a pointer to rectangle masks, and the x-position of the blit; they
return the new VRAM values.
There are two types of such functions:
* `bopti_asm_*` for the `mono` and `mono_alpha` profiles, on a single VRAM.
* `bopti_gasm_*` for all four profiles, on two VRAMs.
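
As a rough C model of what such a renderer computes for one position of a `mono_alpha` image, consider the sketch below. The real routines are written in assembler and their exact signature is not reproduced here; the function name, the argument order, and the mask convention (1-bit means drawable) are assumptions.

```c
#include <stdint.h>

/* C model of a `mono_alpha`-style renderer for one position.
   v:      two adjacent VRAM longwords, updated in place
   white:  `white` layer data for the position
   black:  `black` layer data for the position
   masks:  rectangle masks for the two longwords
   x:      horizontal offset (0..31) of the position within v[0] */
static void model_mono_alpha(uint32_t v[2], uint32_t white, uint32_t black,
    uint32_t const masks[2], int x)
{
    /* Spread the 32 pixels of each layer over the two longwords; the shift
       by 32 is avoided because it is undefined in C */
    uint32_t w0 = white >> x, w1 = x ? white << (32 - x) : 0;
    uint32_t b0 = black >> x, b1 = x ? black << (32 - x) : 0;

    /* Clear the opaque pixels, then paint the black ones, under the mask */
    v[0] = (v[0] & ~(w0 & masks[0])) | (b0 & masks[0]);
    v[1] = (v[1] & ~(w1 & masks[1])) | (b1 & masks[1]);
}
```

This shows why each position touches two VRAM longwords: unless `x` is 0, the 32 pixels of a position straddle a longword boundary.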
TODO: Could add more detail.