*This is version 2 of bopti, included in the first fx-CG 50-compatible version
of gint, version 2.0.*

## *bopti* on fx-9860G

The bitmap drawing module, *bopti*, renders images using direct bitwise
operations on video RAM (VRAM) longwords. This method relies on the 4-byte
alignment of gint's VRAM to operate on 32 pixels at a time and avoid costly
single-bit operations.

In gint's development workflow, images in usual formats are first converted to
the *bopti* format at compile-time. The *bopti* format is designed for fast
rendering: it consists of one or several monochrome bitmaps called *layers*,
arranged in a fixed combination called a *profile*. Each profile has a
corresponding assembler routine designed to render the image quickly.

## Performance

(TODO)

Probably about 15 times as fast as MonochromeLib.

## Color profiles

When converting an image, *fxconv* first quantizes the colors by mapping
transparent pixels to `alpha` and mapping every other pixel to the closest of
these four colors:

| Color name | Hexadecimal |
| ---------- | ----------- |
| `black`    | `#000000`   |
| `dark`     | `#555555`   |
| `light`    | `#aaaaaa`   |
| `white`    | `#ffffff`   |

Then the image is assigned the smallest profile that can represent all of its
colors:

| Profile      | Supported colors                           |
| ------------ | ------------------------------------------ |
| `mono`       | `black`, `white`                           |
| `mono_alpha` | `black`, `white`, `alpha`                  |
| `gray`       | `black`, `white`, `light`, `dark`          |
| `gray_alpha` | `black`, `white`, `light`, `dark`, `alpha` |
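
As a rough illustration of the two steps above, here is a minimal C sketch of
quantization and profile selection. It is not *fxconv*'s actual code: the
`quantize()` and `smallest_profile()` names, the enum values, the use of the
channel average as gray level and the `a == 0` test for transparency are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Quantized colors, ordered from darkest to lightest (arbitrary numbering) */
enum color { C_BLACK, C_DARK, C_LIGHT, C_WHITE, C_ALPHA };
enum profile { P_MONO, P_MONO_ALPHA, P_GRAY, P_GRAY_ALPHA };

/* Map an RGBA pixel to `alpha` or to the closest of the four grays.
   Assumption: a pixel is transparent iff a == 0. */
enum color quantize(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    if(a == 0) return C_ALPHA;

    /* Assumption: the gray level is the average of the three channels */
    int level = (r + g + b) / 3;

    /* The reference levels 0x00, 0x55, 0xaa, 0xff are the multiples of 85,
       so rounding level / 85 to the nearest integer gives the color index */
    return (enum color)((level + 42) / 85);
}

/* Pick the smallest profile that supports every color in the image */
enum profile smallest_profile(const enum color *pixels, int count)
{
    assert(count >= 0);
    int gray = 0, alpha = 0;

    for(int i = 0; i < count; i++)
    {
        if(pixels[i] == C_DARK || pixels[i] == C_LIGHT) gray = 1;
        if(pixels[i] == C_ALPHA) alpha = 1;
    }

    if(gray) return alpha ? P_GRAY_ALPHA : P_GRAY;
    return alpha ? P_MONO_ALPHA : P_MONO;
}
```

With these definitions, an image containing only `black`, `white` and `alpha`
pixels gets `P_MONO_ALPHA`, while a single `dark` or `light` pixel is enough to
require a gray profile.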

## Layers

Each profile has a fixed number of *layers* with a predefined meaning. During
rendering, all of the layers are blitted in order to produce the image. The
number of layers in a profile is always minimal: it is
$`\lceil \log_2 n \rceil`$ where $`n`$ is the number of colors in that profile,
counting `alpha` (that is, 1, 2, 2 and 3 layers for the four profiles).

On fx-9860G, the VRAM is either monochrome or 4-color gray, so pixel colors can
only take 2 or 4 different values. This makes logical operations a natural way
to implement blitting, because logical operations can effortlessly be extended
to apply to multiple pixels at once.

The current version of *bopti* uses the following types of layers:

| Layer name  | Category   | Effect for 0-bits   | Effect for 1-bits      |
| ----------- | ---------- | ------------------- | ---------------------- |
| `fill`      | Monochrome | Paints white        | Paints black           |
| `white`     | Monochrome | -                   | Paints white           |
| `black`     | Monochrome | -                   | Paints black           |
| `lfill`     | Gray       | Clears light VRAM   | Paints light VRAM      |
| `dfill`     | Gray       | Clears dark VRAM    | Paints dark VRAM       |
| `light`     | Gray       | -                   | Paints light gray      |
| `dark`      | Gray       | -                   | Paints dark gray       |

When rendering, *bopti* loads the data for one part of the encoded image,
applies the bitwise operations of all layers, then moves on to the next part.
The previous version of *bopti* applied each layer independently, but the
current version applies them all at once, saving even more time.

Note that most layers do nothing on 0-bits; this is an optimization related to
*rectangle masks*. When a VRAM longword is loaded into a register, the blitted
image often does not cover it entirely. The pixels that must be preserved are
represented in a structure called a rectangle mask. Having this neutral 0-bit
makes it simple to preserve the relevant pixels while drawing the image. When
layers don't have this preserving 0-bit, masks must instead be applied to the
VRAM itself. See the rendering algorithm section for more details.

Here is the relationship between color profiles and their layers:

* The `mono` profile only has a `fill` layer.
* The `mono_alpha` profile starts with a `white` layer to clear the
  non-transparent region of the image, then blits a `black` layer to render
  the content.
* The `gray` profile has an `lfill` and a `dfill` layer. These two types of
  layer act on different VRAMs.
* The `gray_alpha` profile starts by blitting a `white` layer on both VRAMs,
  then adds a `light` layer and a `dark` layer.
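
The role of the neutral 0-bit can be sketched on a single longword as follows.
This is an illustration only, not *bopti*'s assembler code: the function names
are hypothetical, the image is assumed to be already aligned with the VRAM
longword, and the mask is assumed to have 1-bits on the pixels that may be
modified.

```c
#include <stdint.h>

/* mono_alpha: both layers are neutral on 0-bits, so masking the layer data
   is enough to preserve the pixels outside of the rectangle. */
uint32_t blit_mono_alpha(uint32_t vram, uint32_t white, uint32_t black,
    uint32_t mask)
{
    vram &= ~(white & mask);  /* `white` layer: 1-bits paint white (0) */
    vram |= (black & mask);   /* `black` layer: 1-bits paint black (1) */
    return vram;
}

/* mono: the `fill` layer paints on 0-bits too, so the mask has to be applied
   to the VRAM itself before merging the layer data. */
uint32_t blit_mono(uint32_t vram, uint32_t fill, uint32_t mask)
{
    return (vram & ~mask) | (fill & mask);
}
```

In both cases a mask of `0` leaves the VRAM longword untouched, but only the
first function gets that property from the layer data alone.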

## Logical operations on pixels

As a reference, here are the logical operations used to blit layers in past and
present versions of bopti. The $`x`$ parameter is a boolean; the transformation
must happen iff $`x=1`$. The significance of $`x`$ appears when extending the
logical operations to a longword: it allows controlling 32 pixels individually
while still using only a couple of logical instructions.

```c
black  (data, x) = data | x
white  (data, x) = data & ~x
invert (data, x) = data ^ x
```

For gray images, recall that the gray engine produces the illusion of
intermediate shades by quickly alternating two buffers on the screen, each
displayed for a different duration. This way, the proportion of time each pixel
is black takes one of four different values. Assuming `long` and `short`
represent the value of a pixel in the buffer that stays longer and shorter on
the screen, we have the following encoding:

    white     = 0    (long=0 short=0)
    lightgray = 1    (long=0 short=1)
    darkgray  = 2    (long=1 short=0)
    black     = 3    (long=1 short=1)

So operations on gray pixels will modify two VRAMs at once. In the gray
operations below, the `light` buffer corresponds to `short` and the `dark`
buffer to `long`.

Among interesting operations, we have `lighten`, which shifts all values
towards white (and white remains white), as if decrementing them, and `darken`,
which shifts all values towards black (and black remains black), as if
incrementing them.

```c
black   (light, dark, x) = (light | x, dark | x)
dark    (light, dark, x) = (light & ~x, dark | x)
light   (light, dark, x) = (light | x, dark & ~x)
white   (light, dark, x) = (light & ~x, dark & ~x)
invert  (light, dark, x) = (light ^ x, dark ^ x)
lighten (light, dark, x) = ((light ^ x) & (dark | ~x), dark & (light | ~x))
darken  (light, dark, x) = ((light ^ x) | (dark & x), dark | (light & x))
```

These functions are obtained by looking intensely at a truth table, then adding
a linear number of $`x`$'s to neutralize some operands when $`x=0`$.
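
The formulas above extend directly to longwords. The following sketch
transcribes `lighten` and `darken` to C so that they can be checked on 32
pixels at once. The `struct gray` type and the function names are hypothetical,
and the gray level is computed as `2 * dark + light`, which assumes that the
`light` buffer plays the role of `short` in the encoding table.

```c
#include <assert.h>
#include <stdint.h>

/* One longword from each gray buffer, i.e. 32 gray pixels */
struct gray { uint32_t light, dark; };

/* Shift the pixels selected by x one step towards white */
struct gray gray_lighten(struct gray v, uint32_t x)
{
    struct gray r;
    r.light = (v.light ^ x) & (v.dark | ~x);
    r.dark  = v.dark & (v.light | ~x);
    return r;
}

/* Shift the pixels selected by x one step towards black */
struct gray gray_darken(struct gray v, uint32_t x)
{
    struct gray r;
    r.light = (v.light ^ x) | (v.dark & x);
    r.dark  = v.dark | (v.light & x);
    return r;
}

/* Gray level of one pixel: 0 = white, 1 = light, 2 = dark, 3 = black */
int gray_level(struct gray v, int bit)
{
    assert(bit >= 0 && bit < 32);
    return (int)((((v.dark >> bit) & 1u) << 1) | ((v.light >> bit) & 1u));
}
```

Starting from four pixels holding levels 0, 1, 2 and 3 (`light = 0xA`,
`dark = 0xC`), `gray_lighten()` with `x = 0xF` yields levels 0, 0, 1, 2 and
`gray_darken()` yields levels 1, 2, 3, 3, while `x = 0` leaves both buffers
unchanged.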

## Assembler-driven rendering

The previous implementation of bopti was already fast, usually about 8 times
as fast as MonochromeLib. Half of this was due to VRAM alignment, the other
half to implementation and format. It had, however, two limiting factors:

1. The operation function was a generic function taking the color as argument,
   and it used a switch to decide which operation to apply;
2. Each layer was drawn independently, so the 2D structure of the image was
   unnecessarily traversed several times.

These two limitations are related and can be overcome by specializing the
rendering code that sits deepest in the critical loop. The current version
of *bopti* has one specialized rendering function per color profile,
implemented in assembler.
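
The difference between the two approaches can be sketched in C on the
`mono_alpha` profile. Neither function is taken from *bopti*; the names and the
layout of `layers` (white layer first, then black layer) are assumptions made
for the example.

```c
#include <stdint.h>

enum op { OP_WHITE, OP_BLACK, OP_INVERT };

/* Generic style: one call and one switch for each layer of each longword */
uint32_t apply_generic(uint32_t vram, uint32_t x, enum op op)
{
    switch(op)
    {
    case OP_WHITE:  return vram & ~x;
    case OP_BLACK:  return vram | x;
    case OP_INVERT: return vram ^ x;
    }
    return vram;
}

/* Specialized style: the profile is known, so both layers of a position are
   applied in a single pass without any dispatch.
   Assumed layout: layers[0] is the white layer, layers[1] the black layer. */
uint32_t render_mono_alpha(uint32_t vram, const uint32_t *layers)
{
    return (vram & ~layers[0]) | layers[1];
}
```

Both versions compute the same longword; the specialized one simply removes
the per-layer dispatch and the repeated traversal of the image.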

## Image format

The conversion is performed by *fxconv* at compile-time and outputs a
big-endian data structure that can be efficiently traversed from the add-in.

The image is first extended to make its width a multiple of 32 pixels, then
stored in row-major order:

            32       32       32
        +--------+--------+--------+
        |   1    |   2    |   3    |  1
        +--------+--------+--------+
        |   4    |   5    |   6    |  1
        +--------+--------+--------+

A set of 32 pixels as numbered on the diagram above is called a *position*.
This is an important concept for the rendering algorithm. For each position,
the data of all layers is stored in rendering order, so the layers are
interwoven in the storage. It also means that the data for a position will
consist of several longwords, not just one.

Note that extending the image to a multiple of 32 in width is not a hard
requirement; it could be avoided by defining and implementing 16-bit and 8-bit
positions, but this is currently not done.

Along with this data, the image object contains some metadata:

```c
typedef struct
{
    /* Image can only be rendered with the gray engine */
    uint gray :1;
    /* Left for future use */
    uint :3;
    /* Image profile (uniquely identifies a rendering function) */
    uint profile :4;
    /* Full width, in pixels */
    uint width :12;
    /* Full height, in pixels */
    uint height :12;

    /* Raw layer data */
    uint8_t data[];

} GPACKED(4) image_t;
```

The first byte indicates the color profile and whether this profile is
gray-only. `width` and `height` are the natural dimensions of the image, before
width extension (which is only relevant for storage). The number of columns is
deduced from the width.
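
As an illustration of this layout, the data of the position containing a given
pixel could be located as follows. This is a sketch under assumptions that the
text suggests but does not state outright: positions are one pixel high, each
layer contributes exactly one longword per position, and `data` is viewed as an
array of longwords. The helper names are hypothetical.

```c
#include <assert.h>

/* Number of 32-pixel columns needed to store an image of the given width */
int image_columns(int width)
{
    assert(width > 0);
    return (width + 31) / 32;
}

/* Index, in longwords, of the first layer of the position that contains
   pixel (x, y), for a profile with the given number of layers. */
int position_offset(int width, int layers, int x, int y)
{
    assert(layers > 0 && x >= 0 && x < width && y >= 0);
    int position = y * image_columns(width) + (x / 32);
    return position * layers;
}
```

With the 96-pixel-wide image of the diagram and a two-layer profile, position 4
(second row, first column) would start at longword 6 under these assumptions.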

## Rendering algorithm

The rendering algorithm takes as parameters a subrectangle of an image and a
target position in VRAM. Drawing a subrectangle instead of the whole image
makes clipping trivial: whatever goes beyond the screen is simply removed from
the subrectangle.

Two functions are available at this level:

* `bopti_render_clip()` clips the provided subrectangle to the image
  dimensions, then clips that to the screen, and renders. This is the default,
  but all the checks take some time to perform.
* `bopti_render_noclip()` renders directly, assuming that the subrectangle is
  valid and that the render fully fits into the VRAM. In many situations these
  assumptions are known to hold, so this function can be selected by passing
  `DIMAGE_NOCLIP` to `dsubimage()` to save time.

After adjusting (or not) the coordinates, both of these functions fall through
to the next level. Rectangle masks are computed to indicate which parts of the
VRAM may or may not be affected. (This is because everything will be
manipulated as longwords from now on, and rendering boundaries will fall in
the middle of them.)

Since the masks safeguard what we're going to draw, we can overestimate the
subrectangle to render with a larger set of positions that contains it. Each
position is rendered on two VRAM longwords using the color profile function
(a position that is not aligned with the VRAM straddles two longwords), then
the next position is loaded until the image is complete.

Two functions are used for this task:

* `bopti_render()` does the prep work and parameter computation.
* `bopti_grid()` iterates over positions and calls the profile's renderer.

The last level is the profile renderer, which is implemented in assembler.
These functions take as parameters the current VRAM values, a pointer to image
data, a pointer to rectangle masks and the x-position of the blit, and return
the new VRAM values.

There are two types of such functions:

* `bopti_asm_*` for the `mono` and `mono_alpha` profiles, on a single VRAM.
* `bopti_gasm_*` for all four profiles, on two VRAMs.
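
A possible way to compute the rectangle masks of one row is sketched below.
This is not gint's implementation: the `rect_masks()` name is hypothetical, and
the sketch assumes a 128-pixel-wide screen (four longwords per row), the
leftmost pixel of each longword stored in the most significant bit, and masks
whose 1-bits mark the pixels that may be modified.

```c
#include <assert.h>
#include <stdint.h>

/* Fill masks[0..3] for a render covering columns left..right (inclusive) */
void rect_masks(int left, int right, uint32_t masks[4])
{
    assert(0 <= left && left <= right && right < 128);

    for(int i = 0; i < 4; i++)
    {
        /* Range of screen columns covered by longword i */
        int lo = i * 32, hi = lo + 31;

        if(right < lo || left > hi)
        {
            /* No overlap: this longword must be fully preserved */
            masks[i] = 0;
            continue;
        }

        /* Overlap, as bit positions counted from the left of the longword */
        int l = (left  > lo ? left  : lo) - lo;
        int r = (right < hi ? right : hi) - lo;

        uint32_t from_l = 0xffffffffu >> l;                     /* l..31 */
        uint32_t upto_r = (uint32_t)(0xffffffffu << (31 - r));  /* 0..r  */
        masks[i] = from_l & upto_r;
    }
}
```

For example, a render spanning columns 30 to 33 gives the masks `0x00000003`,
`0xc0000000`, `0` and `0`: the boundary falls in the middle of the first two
longwords and the last two are left untouched.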

TODO: Could add more detail.