commit 5e6ec0b5a3c6eea5055a182abf69bf8a9d264d3e
Author: Lephenixnoir
Date:   Sun Jul 28 13:56:30 2019 +0200

    Update page 'bopti on fx 9860G'

diff --git a/bopti-on-fx-9860G.md b/bopti-on-fx-9860G.md
new file mode 100644
index 0000000..bb55836
--- /dev/null
+++ b/bopti-on-fx-9860G.md
@@ -0,0 +1,250 @@
+*This is version 2 of bopti, included in the first fx-CG 50-compatible version
+of gint, version 2.0.*
+
+## *bopti* on fx-9860G
+
+The bitmap drawing module, *bopti*, renders images using direct bitwise
+operations on video RAM (VRAM) longwords. This method makes extensive use of
+the 4-byte alignment of gint's VRAM to operate on 32 pixels at a time and
+avoid costly single-bit operations.
+
+In gint's development workflow, images in usual formats are first converted to
+the *bopti* format at compile-time. The *bopti* format is designed for fast
+rendering: it consists of one or several monochrome bitmaps called *layers*,
+arranged in a fixed combination called a *profile*. Each profile has a
+corresponding assembler routine designed to render the image quickly.
+
+## Performance
+
+(TODO)
+
+Probably about 15 times as fast as MonochromeLib.
+
+## Color profiles
+
+When converting an image, *fxconv* first quantizes the colors by mapping
+transparent pixels to `alpha` and mapping other pixels to the closest of
+these four colors:
+
+| Color name | Hexadecimal |
+| ---------- | ----------- |
+| `black`    | `#000000`   |
+| `dark`     | `#555555`   |
+| `light`    | `#aaaaaa`   |
+| `white`    | `#ffffff`   |
+
+Then the image is assigned the smallest profile that can represent all of its
+colors:
+
+| Profile      | Supported colors                           |
+| ------------ | ------------------------------------------ |
+| `mono`       | `black`, `white`                           |
+| `mono_alpha` | `black`, `white`, `alpha`                  |
+| `gray`       | `black`, `white`, `light`, `dark`          |
+| `gray_alpha` | `black`, `white`, `light`, `dark`, `alpha` |
+
+## Layers
+
+Each profile has a fixed number of *layers* with a predefined meaning. During
+rendering, all of the layers are blitted in order to produce the image. The
+number of layers in a profile is always minimal: it is
+$`\lceil \log_2 n \rceil`$ where $`n`$ is the number of colors in that profile
+(1 layer for the 2 colors of `mono`, 3 layers for the 5 colors of
+`gray_alpha`).
+
+On fx-9860G, the VRAM is either monochrome or 4-color gray, so pixel colors can
+only take 2 or 4 different values. This makes logical operations a natural way
+to implement blitting, because logical operations extend effortlessly to
+multiple pixels at once.
+
+The current version of *bopti* uses the following types of layers:
+
+| Layer name | Category   | Effect for 0-bits | Effect for 1-bits |
+| ---------- | ---------- | ----------------- | ----------------- |
+| `fill`     | Monochrome | Paints white      | Paints black      |
+| `white`    | Monochrome | -                 | Paints white      |
+| `black`    | Monochrome | -                 | Paints black      |
+| `lfill`    | Gray       | Clears light VRAM | Paints light VRAM |
+| `dfill`    | Gray       | Clears dark VRAM  | Paints dark VRAM  |
+| `light`    | Gray       | -                 | Paints light gray |
+| `dark`     | Gray       | -                 | Paints dark gray  |
+
+When performing an operation, *bopti* takes data from the encoded image and
+applies bitwise operations for all layers. It then moves to a different part of
+the image. The previous version of *bopti* applied each layer independently,
+but the current version applies them all at once, saving even more time.
+
+Note that most layers do nothing on 0-bits; this is an optimization related
+to *rectangle masks*. When a VRAM longword is loaded into a register, the
+blitted image often does not cover it entirely. The pixels that must be
+preserved are represented in a structure called a rectangle mask. Having this
+neutral 0-bit makes it simple to preserve relevant pixels while drawing the
+image. When layers don't have this preserving 0-bit, masks must instead be
+applied to the VRAM itself. See the rendering algorithm section below for more
+details.
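The difference between layers with a neutral 0-bit and layers without one can be sketched in C as follows. This is only an illustration under stated assumptions: the names `rect_mask()`, `blit_black()`, `blit_white()` and `blit_fill()` are hypothetical and not part of gint, and the sketch assumes that the leftmost pixel of a longword is its most significant bit.

```c
#include <stdint.h>

/* Hypothetical helper: mask with 1-bits for pixels left..right (inclusive,
   both in 0..31) of a 32-pixel longword. Assumes pixel 0 is the MSB. */
static uint32_t rect_mask(int left, int right)
{
    uint32_t from_left = 0xffffffffu >> left;          /* pixels left..31 */
    uint32_t to_right  = 0xffffffffu << (31 - right);  /* pixels 0..right */
    return from_left & to_right;
}

/* `black` layer: 0-bits are neutral, so masking the layer data is enough */
static uint32_t blit_black(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return vram | (layer & mask);
}

/* `white` layer: same idea, masked-out bits leave the VRAM untouched */
static uint32_t blit_white(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return vram & ~(layer & mask);
}

/* `fill` layer: 0-bits paint white, so the mask must be applied to the VRAM
   itself: keep pixels outside the mask, replace pixels inside it */
static uint32_t blit_fill(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return (vram & ~mask) | (layer & mask);
}
```

With `rect_mask(8, 15)`, only the second byte of the longword can change, whichever of the three blits is used; the `fill` variant simply needs one extra operation on the VRAM value to get there.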
+
+Here is the relationship between color profiles and their layers:
+
+* The `mono` profile only has a `fill` layer.
+* The `mono_alpha` profile starts with a `white` layer to clear the
+  non-transparent region of the image, then blits a `black` layer to render
+  the content.
+* The `gray` profile has an `lfill` and a `dfill` layer. These two types of
+  layer act on different VRAMs.
+* The `gray_alpha` profile starts by blitting a `white` layer on both VRAMs,
+  then adds a `light` layer and a `dark` layer.
+
+## Logical operations on pixels
+
+As a reference, here are the logical operations used to blit layers in past and
+present versions of bopti. The $`x`$ parameter is a boolean; the transformation
+must happen iff $`x=1`$. The significance of $`x`$ appears when extending the
+logical operations to a longword: it allows controlling 32 pixels individually
+while still using only a couple of logical instructions.
+
+```c
+black  (data, x) = data |  x
+white  (data, x) = data & ~x
+invert (data, x) = data ^  x
+```
+
+For gray images, recall that the gray engine produces an illusion of
+intermediate colors by quickly alternating two buffers on the screen, with a
+different duration for each. This way, the proportion of time each pixel is
+black is one of four different values. Assuming `long` and `short` represent
+the value of a pixel in the buffer that stays longer and shorter on the screen,
+we have the following encoding:
+
+    white     = 0   (long=0 short=0)
+    lightgray = 1   (long=0 short=1)
+    darkgray  = 2   (long=1 short=0)
+    black     = 3   (long=1 short=1)
+
+So operations on gray pixels will modify two VRAMs at once.
+
+Among interesting operations, we have `lighten`, which shifts all values
+towards white (and white remains white), as if decrementing them, and `darken`,
+which shifts all values towards black (and black remains black), as if
+incrementing them.
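To make the "decrementing" and "incrementing" description concrete, here is a slow per-pixel reference model in C. It is a sketch under assumptions, not bopti code: the function names are hypothetical, the leftmost pixel is assumed to be the most significant bit, and the light VRAM is assumed to hold the `short` bit while the dark VRAM holds the `long` bit (which is what the `light` and `dark` operations in the listing that follows suggest).

```c
#include <stdint.h>

/* Add delta to the 2-bit value of every pixel selected by x, saturating at
   white (0) and black (3). Value = 2 * dark bit + light bit (assumption). */
static void gray_shift_ref(uint32_t *light, uint32_t *dark, uint32_t x,
    int delta)
{
    for(int i = 0; i < 32; i++)
    {
        uint32_t bit = 1u << (31 - i);
        if(!(x & bit)) continue;

        /* Decode the pixel, shift it, and clamp to the 0..3 range */
        int v = 2 * !!(*dark & bit) + !!(*light & bit);
        v += delta;
        if(v < 0) v = 0;
        if(v > 3) v = 3;

        /* Encode the new value back into both buffers */
        *light = (*light & ~bit) | ((v & 1) ? bit : 0);
        *dark  = (*dark  & ~bit) | ((v & 2) ? bit : 0);
    }
}

/* Reference lighten: one step towards white for the selected pixels */
static void lighten_ref(uint32_t *light, uint32_t *dark, uint32_t x)
{
    gray_shift_ref(light, dark, x, -1);
}

/* Reference darken: one step towards black for the selected pixels */
static void darken_ref(uint32_t *light, uint32_t *dark, uint32_t x)
{
    gray_shift_ref(light, dark, x, +1);
}
```

This loop touches pixels one by one, which is exactly what the bitwise forms avoid: they compute the same result for all 32 pixels with a handful of logical instructions.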
+
+```c
+black   (light, dark, x) = (light |  x, dark |  x)
+dark    (light, dark, x) = (light & ~x, dark |  x)
+light   (light, dark, x) = (light |  x, dark & ~x)
+white   (light, dark, x) = (light & ~x, dark & ~x)
+inverse (light, dark, x) = (light ^  x, dark ^  x)
+lighten (light, dark, x) = ((light ^ x) & (dark | ~x), dark & (light | ~x))
+darken  (light, dark, x) = ((light ^ x) | (dark &  x), dark | (light &  x))
+```
+
+These functions are obtained by looking intensely at a truth table, then adding
+a linear number of $`x`$'s to neutralize some operands when $`x=0`$.
+
+## Assembler-driven rendering
+
+The previous implementation of bopti was already fast, usually about 8 times
+as fast as MonochromeLib. Half of that speedup was due to VRAM alignment; the
+other half came from the implementation and format. It had, however, two
+limiting factors:
+
+1. The operation function was a generic function taking the color as argument,
+   and it used a switch to decide which operation to apply;
+2. Each layer was drawn independently, so the 2D structure of the image was
+   unnecessarily traversed several times.
+
+These two limitations are related and can be overcome by specializing the
+rendering code at the innermost level of the critical loop. The current version
+of *bopti* has one specialized rendering function per color profile,
+implemented in assembler.
+
+## Image format
+
+The conversion is performed by *fxconv* at compile-time and outputs a
+big-endian data structure that can be efficiently traversed from the add-in.
+
+The image is first extended to make its width a multiple of 32 pixels, then
+stored in row-major order:
+
+        32       32       32
+    +--------+--------+--------+
+    |   1    |   2    |   3    |  1
+    +--------+--------+--------+
+    |   4    |   5    |   6    |  1
+    +--------+--------+--------+
+
+A set of 32 pixels as numbered on the diagram above is called a *position*.
+This is an important concept for the rendering algorithm. For each position,
+the data of all layers is stored in rendering order, so the layers are
+interwoven in the storage. It also means that the data for a position will
+consist of several longwords, not just one.
+
+Note that extending the image to a multiple of 32 in width is not a hard
+requirement; it could be avoided by defining and implementing 16-bit and 8-bit
+positions, but this is currently not done.
+
+Along with this data, the image object contains some metadata:
+
+```c
+typedef struct
+{
+    /* Image can only be rendered with the gray engine */
+    uint gray       :1;
+    /* Left for future use */
+    uint            :3;
+    /* Image profile (uniquely identifies a rendering function) */
+    uint profile    :4;
+    /* Full width, in pixels */
+    uint width      :12;
+    /* Full height, in pixels */
+    uint height     :12;
+
+    /* Raw layer data */
+    uint8_t data[];
+
+} GPACKED(4) image_t;
+```
+
+The first byte indicates the color profile and whether this profile is
+gray-only. `width` and `height` are the natural dimensions of the image, before
+width extension (which is only relevant for storage). The number of columns is
+deduced from the width.
+
+## Rendering algorithm
+
+The rendering algorithm takes as parameters a subrectangle of an image and a
+target position on the VRAM. Drawing a subrectangle instead of the whole image
+makes it trivial to do clipping by just removing whatever goes beyond the
+screen.
+
+Two functions are available at this level:
+
+* `bopti_render_clip()` clips the provided subrectangle to the image
+  dimensions, then clips that to the screen, and renders. This is the default,
+  but all the checks take some time to perform.
+* `bopti_render_noclip()` renders directly, assuming that the subrectangle is
+  valid and that the render fully fits into the VRAM. In many situations these
+  assumptions are known to hold, so it can be selected by passing
+  `DIMAGE_NOCLIP` to `dsubimage()` to save time.
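The two clipping steps performed by `bopti_render_clip()` can be sketched as follows. This is a minimal illustration, not gint's implementation: `struct box` and `clip_box()` are hypothetical names, and the screen size is passed as a parameter (128x64 on fx-9860G).

```c
#include <stdbool.h>

/* Hypothetical description of a render: a subrectangle of the image and the
   position of its top-left corner on the screen */
struct box
{
    int left, top, width, height;   /* Subrectangle, in image coordinates */
    int x, y;                       /* Target position, in screen coordinates */
};

/* Clips the subrectangle to the image, then to the screen. Returns false if
   nothing is left to draw. */
static bool clip_box(struct box *b, int img_w, int img_h, int scr_w,
    int scr_h)
{
    /* Step 1: remove whatever lies outside the image. Moving the left or
       top edge also moves the target position by the same amount. */
    if(b->left < 0) { b->width  += b->left; b->x -= b->left; b->left = 0; }
    if(b->top  < 0) { b->height += b->top;  b->y -= b->top;  b->top  = 0; }
    if(b->left + b->width  > img_w) b->width  = img_w - b->left;
    if(b->top  + b->height > img_h) b->height = img_h - b->top;

    /* Step 2: remove whatever would be drawn outside the screen */
    if(b->x < 0) { b->left -= b->x; b->width  += b->x; b->x = 0; }
    if(b->y < 0) { b->top  -= b->y; b->height += b->y; b->y = 0; }
    if(b->x + b->width  > scr_w) b->width  = scr_w - b->x;
    if(b->y + b->height > scr_h) b->height = scr_h - b->y;

    return b->width > 0 && b->height > 0;
}
```

For example, drawing a full 32x16 image at (-8, 60) on a 128x64 screen leaves a 24x4 subrectangle starting at column 8 of the image, drawn at (0, 60). A no-clip path would skip all of these comparisons, which is where the time is saved.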
+
+After adjusting (or not) coordinates, both of these functions fall through to
+the next level. Rectangle masks are computed to indicate which parts of the
+VRAM must or must not be affected. (This is because everything will be
+manipulated with longwords from now on, and rendering boundaries will fall in
+the middle of them.)
+
+Since the masks safeguard what we're going to draw, we can overestimate the
+subrectangle to render with a larger set of positions that contains it. Each
+position is rendered on two VRAM longwords using the color profile function,
+then the next position is loaded until the image is complete.
+
+Two functions are used for this task:
+
+* `bopti_render()` does the prep work and parameter computation.
+* `bopti_grid()` iterates over positions and calls the profile's renderer.
+
+The last level is the profile renderer, which is implemented in assembler.
+These are functions that take as parameters the current VRAM values, a pointer
+to image data, a pointer to rectangle masks, and the x-position of the blit,
+and return new VRAM values.
+
+There are two types of such functions:
+
+* `bopti_asm_*` for the `mono` and `mono_alpha` profiles, on a single VRAM.
+* `bopti_gasm_*` for all four profiles, on two VRAMs.
+
+TODO: Could add more detail.
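As an illustration of what a profile renderer has to do, here is a C sketch of rendering one position of a `mono` image (a single `fill` layer) onto two adjacent VRAM longwords. The real renderers are written in assembler and their exact signatures are not shown here, so `render_mono_position()` and its parameters are assumptions; the sketch updates the VRAM in place instead of returning new values, and assumes the leftmost pixel is the most significant bit.

```c
#include <stdint.h>

/* Hypothetical C equivalent of a `mono` profile renderer.
   vram:   the two adjacent VRAM longwords covered by the position
   data:   the 32 pixels of the `fill` layer for this position
   masks:  rectangle masks for both longwords (1 = pixel may be modified)
   x:      x-position of the blit; only its offset within a longword matters */
static void render_mono_position(uint32_t vram[2], uint32_t data,
    const uint32_t masks[2], int x)
{
    int shift = x & 31;

    /* Split the position across both longwords. Shifting a 32-bit value by
       32 is undefined in C, so the aligned case is handled separately. */
    uint32_t left  = data >> shift;
    uint32_t right = shift ? data << (32 - shift) : 0;

    /* `fill` has no neutral 0-bit, so the masks are applied to the VRAM */
    vram[0] = (vram[0] & ~masks[0]) | (left  & masks[0]);
    vram[1] = (vram[1] & ~masks[1]) | (right & masks[1]);
}
```

In this sketch the caller is expected to pass a zero mask for a longword that must not be touched, for instance the second longword when the blit is aligned.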