*This is version 2 of bopti, included in the first fx-CG 50-compatible version
of gint, version 2.0.*

## *bopti* on fx-9860G

The bitmap drawing module, *bopti*, renders images using direct bitwise
operations on video RAM (VRAM) longwords. This method makes extensive use of
the 4-byte alignment of gint's VRAM to operate on 32 pixels at a time and avoid
costly single-bit operations.

In gint's development workflow, images in usual formats are first converted to

Probably about 15 times as fast as MonochromeLib.

## Color profiles

When converting an image, the fxconv tool of the [fxSDK](/Lephenixnoir/fxsdk)
first quantizes the colors by mapping transparent pixels to `alpha` and every
other pixel to the closest color in these four:

| Color name | Hexadecimal |
| ---------- | ----------- |

Each profile has a fixed number of *layers* with a predefined meaning. During
rendering, all of the layers are blitted in order to produce the image. The
number of layers in a profile is always minimal: it is `⌈ log₂ n ⌉` where `n`
is the number of colors in that profile (for instance, the 5 colors of
`gray_alpha`, counting `alpha`, need 3 layers).

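As a quick check of this formula, the helper below counts the bits needed to tell `n` colors apart, using only integer arithmetic. It is a hypothetical illustration: the name `layer_count` is not part of gint or fxconv.

```c
#include <assert.h>

/* Hypothetical helper: minimal number of layers (bits) needed to encode
   n distinct colors, i.e. ceil(log2(n)), computed without floating point. */
static int layer_count(int n)
{
    assert(n >= 2);
    int layers = 0;
    /* Find the smallest k such that 2^k >= n */
    while((1 << layers) < n)
        layers++;
    return layers;
}
```

For 2, 3, 4 and 5 colors it returns 1, 2, 2 and 3; the values for 4 and 5 colors match the two layers of `gray` and the three layers of `gray_alpha` described below.
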
On fx-9860G, the VRAM is either monochrome or 4-color gray, so pixel colors can
only take 2 or 4 different values. This makes logical operations the preferred
way to implement blitting, because they can effortlessly be extended to apply
to multiple pixels at once.

The current version of *bopti* uses the following types of layers:

| Layer | Mode | 0-bit | 1-bit |
| ----- | ---- | ----- | ----- |
| `fill` | Monochrome | Paints white | Paints black |
| `white` | Monochrome | - | Paints white |
| `black` | Monochrome | - | Paints black |
| `lfill` | Gray | Clears light VRAM | Paints light VRAM |
| `dfill` | Gray | Clears dark VRAM | Paints dark VRAM |
| `light` | Gray | - | Paints light gray |
| `dark` | Gray | - | Paints dark gray |

Note that most functions do nothing on 0-bits; this is an optimization related
to *rectangle masks*. When a VRAM longword is loaded into a register, the
blitted image will often not cover it entirely. The pixels that must be
preserved are represented in a structure called a rectangle mask. Having this
neutral 0-bit makes it simple to preserve relevant pixels while drawing the
image: setting the corresponding rectangle mask bits to 0 ensures that only
neutral 0-bits are drawn there. When layers don't have this preserving 0-bit,
masks must instead be applied manually. See later for more details.

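To make the table and the note above concrete, here is a minimal C sketch of what the monochrome layer types could do to one 32-pixel VRAM longword. It assumes that a 1-bit in monochrome VRAM is black and that `mask` has 1-bits on the pixels the blit may modify. The function names are hypothetical; the real *bopti* implements these operations in assembler.

```c
#include <stdint.h>

/* Hypothetical layer operations on one 32-pixel VRAM longword.
   Assumed conventions: 1-bit = black in monochrome VRAM; `mask` has
   1-bits on the pixels that the blit is allowed to modify. */

/* fill: both bit values have an effect (no neutral 0-bit), so the mask
   must be applied to the VRAM explicitly. lfill and dfill would do the
   same on the light or the dark VRAM. */
static uint32_t layer_fill(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return (vram & ~mask) | (layer & mask);
}

/* white: 0-bits are neutral, so masking the layer data is enough. */
static uint32_t layer_white(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return vram & ~(layer & mask);
}

/* black: 0-bits are neutral as well. */
static uint32_t layer_black(uint32_t vram, uint32_t layer, uint32_t mask)
{
    return vram | (layer & mask);
}
```

The difference between `layer_fill` and the two other functions is exactly the one described above: only the layers with a neutral 0-bit can get away with masking their own data instead of the VRAM.
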
Here is the relationship between color profiles and their layers:

the content.
* The `gray` profile has an `lfill` and a `dfill` layer. These two types of
layer act on different VRAMs (the light and the dark one, respectively).
* The `gray_alpha` profile starts by blitting a `white` layer on both VRAMs,
then adds a `light` layer and a `dark` layer.

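The `gray_alpha` layering can be checked with a small round trip: split 32 pixels into the three layers, blit them, and read the colors back. This C sketch rests on assumptions that the text does not state: the leftmost pixel of a longword is its most significant bit, the `light` and `dark` layers only set bits in their own VRAM (which is enough to produce any gray once the `white` layer has cleared the pixel), and colors are numbered with the light VRAM as the low bit, following the encoding given in the next section. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical color codes: 0 white, 1 light gray, 2 dark gray, 3 black
   (bit 0 = light VRAM, bit 1 = dark VRAM), plus 4 for transparency. */
enum { C_WHITE, C_LIGHT, C_DARK, C_BLACK, C_ALPHA };

typedef struct { uint32_t white, light, dark; } gray_alpha_layers_t;

/* Split 32 pixels (pixel 0 = most significant bit, assumed) into the
   three layers of the gray_alpha profile. */
static gray_alpha_layers_t encode_gray_alpha(const int px[32])
{
    gray_alpha_layers_t l = { 0, 0, 0 };
    for(int i = 0; i < 32; i++) {
        assert(px[i] >= C_WHITE && px[i] <= C_ALPHA);
        uint32_t bit = UINT32_C(1) << (31 - i);
        if(px[i] == C_ALPHA) continue;  /* transparent: 0 in every layer */
        l.white |= bit;                 /* opaque: cleared to white first */
        if(px[i] & 1) l.light |= bit;   /* light gray or black */
        if(px[i] & 2) l.dark  |= bit;   /* dark gray or black */
    }
    return l;
}

/* Blit the layers in order: white on both VRAMs, then light, then dark.
   `mask` selects the pixels that may be modified. */
static void blit_gray_alpha(uint32_t *light, uint32_t *dark,
                            gray_alpha_layers_t l, uint32_t mask)
{
    *light &= ~(l.white & mask);
    *dark  &= ~(l.white & mask);
    *light |= l.light & mask;
    *dark  |= l.dark & mask;
}
```

Drawn over an all-black background, transparent pixels stay black and every other pixel takes its own color, which is why three layers suffice for five colors.
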
## Logical operations on pixels

For gray images, we need to know that the gray engine produces an illusion of
intermediate color by quickly alternating two buffers on the screen, with a
different duration for each. This way, the proportion of time each pixel is
black is one of four different values. Assuming `long` and `short` represent
the value of a pixel in the VRAMs that respectively stay longer and shorter on
the screen, we have the following encoding:

    white     = 0 (long=0 short=0)
    lightgray = 1 (long=0 short=1)
    darkgray  = 2 (long=1 short=0)
    black     = 3 (long=1 short=1)

So operations on gray pixels will modify two VRAMs.

Among interesting operations, we have `lighten`, which shifts all values
towards white (and white remains white), as if decrementing them, and `darken`

```
lighten (light, dark, x) = ((light ^ x) & (dark | ~x), dark & (light | ~x))
darken  (light, dark, x) = ((light ^ x) | (dark & x), dark | (light & x))
```

These functions are obtained by staring at a truth table, then adding a linear
number of `x`'s to neutralize some operands when `x=0`.

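The two formulas translate directly to C and process 32 pixels per call. In this sketch, `x` selects the pixels to modify, and the hypothetical helper `color_of` reads a pixel back as `2 * long + short`, with the dark VRAM holding the `long` bit and the light VRAM the `short` bit; this is the correspondence under which the formulas behave as described. The struct and function names are illustrative, not gint's.

```c
#include <assert.h>
#include <stdint.h>

/* 32 pixels of both gray VRAMs; bit i of `light` and `dark` is pixel i. */
typedef struct { uint32_t light, dark; } gray32_t;

/* lighten: decrement every pixel selected by x (white stays white). */
static gray32_t lighten(gray32_t v, uint32_t x)
{
    gray32_t r;
    r.light = (v.light ^ x) & (v.dark | ~x);
    r.dark  = v.dark & (v.light | ~x);
    return r;
}

/* darken: increment every pixel selected by x (black stays black). */
static gray32_t darken(gray32_t v, uint32_t x)
{
    gray32_t r;
    r.light = (v.light ^ x) | (v.dark & x);
    r.dark  = v.dark | (v.light & x);
    return r;
}

/* Color of pixel i: 0 = white, 1 = light gray, 2 = dark gray, 3 = black. */
static int color_of(gray32_t v, int i)
{
    assert(i >= 0 && i < 32);
    return (int)(2 * ((v.dark >> i) & 1) + ((v.light >> i) & 1));
}
```

Checking all four colors with `x = 1` confirms that bit 0 moves one step while bit 1, where `x` is 0, is left untouched.
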
## Assembler-driven rendering

The previous implementation of bopti was already fast, usually about 8 times
as fast as MonochromeLib. Half of the speedup was due to VRAM alignment, and
the other half was related to implementation and format. It had, however, two
limiting factors:

1. The operation function was a generic function taking the color as argument,
and it used a switch to decide which operation to apply;

unnecessarily traversed several times.

These two limitations are related and can be overcome by specializing the
rendering code, which is the deepest in the critical loop. The current version
of *bopti* has one specialized rendering function per color profile,
implemented in assembler, which performs both the loop and the rendering.

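The benefit of specializing can be illustrated in C, even though *bopti* does it in assembler. Below, `render_generic` re-dispatches on the layer type for every longword, much like the old switch-based operation function, while `render_mono_alpha` hard-codes the layer sequence of one profile so that its loop contains no dispatch at all. Both functions, the `white`-then-`black` layer order, and the omission of masks and shifting are simplifications assumed for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum layer_kind { L_WHITE, L_BLACK };

/* Generic style: a switch is evaluated for every layer of every position. */
static void render_generic(uint32_t *vram, const uint32_t *data, size_t n,
                           const enum layer_kind *kinds, int layers)
{
    assert(layers > 0);
    for(size_t p = 0; p < n; p++)
    for(int k = 0; k < layers; k++) {
        uint32_t bits = *data++;
        switch(kinds[k]) {
        case L_WHITE: vram[p] &= ~bits; break;
        case L_BLACK: vram[p] |= bits;  break;
        }
    }
}

/* Specialized style: one function per profile (here a white layer then a
   black layer), with the layer sequence fixed at compile time. */
static void render_mono_alpha(uint32_t *vram, const uint32_t *data, size_t n)
{
    for(size_t p = 0; p < n; p++) {
        vram[p] &= ~data[0];  /* white layer */
        vram[p] |= data[1];   /* black layer */
        data += 2;            /* layers of a position are interwoven */
    }
}
```

Both produce the same VRAM contents; the specialized one simply removes the per-longword decision from the critical loop.
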
## Image format

big-endian data structure that can be efficiently traversed from the add-in.

The image is first extended to make its width a multiple of 32 pixels, then
stored in row-major order:

       (32)     (32)     (32)
    +--------+--------+--------+
    |   1    |   2    |   3    | (1)
    +--------+--------+--------+
    |   4    |   5    |   6    | (1)
    +--------+--------+--------+

Each numbered block of 32 pixels in the diagram above is called a *position*.
This is an important concept for the rendering algorithm. For each position,
the data of all layers is stored in rendering order, so the layers are
interwoven in the storage. It also means that the data for a position will
consist of several longwords (one per layer), not just one.

Note that extending the image to a multiple of 32 in width is not a hard
requirement; it could be avoided by defining and implementing 16-bit and 8-bit
positions, but this is currently not done.

Along with this data, the image object contains a number of attributes:

```c
typedef struct
} GPACKED(4) image_t;
```

The first byte indicates the color profile and whether this profile is
gray-only. `width` and `height` are the natural dimensions of the image, before
width extension (which is only relevant for storage). The number of columns
(positions per row) is deduced from the width: it is the width rounded up to a
multiple of 32, divided by 32.

The rendering algorithm takes as parameters a subrectangle of an image and a
target position on the VRAM. Drawing a subrectangle instead of the whole image
makes clipping trivial: whatever goes beyond the screen is simply cut out of
the source area.

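Clipping at this level can be sketched as intersecting the destination rectangle with the screen, then moving the source rectangle by the same amounts. The `struct blit` type and `clip_blit` function are hypothetical; the 128×64 dimensions in the usage below are those of the fx-9860G screen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical blit request: source subrectangle (left, top, w, h) of the
   image, drawn at (x, y) on the VRAM. */
struct blit {
    int left, top, w, h;
    int x, y;
};

/* Cut whatever goes beyond a screen_w x screen_h screen out of the source
   area. Returns false when nothing is left to draw. */
static bool clip_blit(struct blit *b, int screen_w, int screen_h)
{
    assert(b->w >= 0 && b->h >= 0);
    /* Left and top edges: skip source pixels that would land off-screen */
    if(b->x < 0) { b->left -= b->x; b->w += b->x; b->x = 0; }
    if(b->y < 0) { b->top  -= b->y; b->h += b->y; b->y = 0; }
    /* Right and bottom edges: shorten the source area */
    if(b->x + b->w > screen_w) b->w = screen_w - b->x;
    if(b->y + b->h > screen_h) b->h = screen_h - b->y;
    return b->w > 0 && b->h > 0;
}
```

A 40×30 image drawn at (-10, 50) on a 128×64 screen becomes a 30×14 source area starting at column 10 of the image, drawn at (0, 50).
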
Two functions are available at this level:

After adjusting coordinates (if needed), both of these functions fall through
to the next level. Rectangle masks are computed to indicate which parts of the
VRAM must or must not be affected. (This is because everything will be
manipulated with longwords from now on, and rendering boundaries will fall in
the middle of them.)

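A rectangle mask for one VRAM row can be computed from the first and last x-coordinates of the target area. The sketch below builds one mask per longword of a 128-pixel fx-9860G row (4 longwords), assuming that the leftmost pixel of a longword is its most significant bit and that a 1-bit marks a pixel the blit may modify; `rect_masks` is a hypothetical name, not gint's code.

```c
#include <assert.h>
#include <stdint.h>

/* Fill masks[0..3] for a 128-pixel row: bit set = pixel inside [x1, x2]. */
static void rect_masks(int x1, int x2, uint32_t masks[4])
{
    assert(0 <= x1 && x1 <= x2 && x2 < 128);
    for(int i = 0; i < 4; i++) {
        int lo = i * 32, hi = lo + 31;      /* pixels covered by longword i */
        if(x2 < lo || x1 > hi) { masks[i] = 0; continue; }
        int a = (x1 > lo ? x1 : lo) - lo;   /* first covered pixel, 0..31 */
        int b = (x2 < hi ? x2 : hi) - lo;   /* last covered pixel, 0..31 */
        /* Keep bits (31 - a) down to (31 - b); shifts stay below 32 */
        uint32_t left  = UINT32_C(0xFFFFFFFF) >> a;
        uint32_t right = UINT32_C(0xFFFFFFFF) << (31 - b);
        masks[i] = left & right;
    }
}
```

With `x1 = 20` and `x2 = 51`, the first longword gets its 12 rightmost bits set, the second its 20 leftmost bits, and the last two are left at 0.
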
Since the masks prevent us from painting outside of the target area, we can now
relax our source rectangle: instead of individual pixels, we can consider full
32-bit positions. We'll render each of them on the VRAM using the color profile
function, then move on until the image is complete.

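Relaxing the source rectangle to whole positions amounts to rounding its horizontal bounds outwards to multiples of 32. The hypothetical helper below computes the range of position columns to render for a source area that starts at pixel `left` of the image and is `w` pixels wide; any extra pixels rendered this way are discarded by the rectangle masks.

```c
#include <assert.h>

/* First and last position columns (inclusive) covering the source pixels
   left .. left + w - 1 of an image row. */
static void position_columns(int left, int w, int *first, int *last)
{
    assert(left >= 0 && w > 0);
    *first = left >> 5;            /* round down to a multiple of 32 */
    *last  = (left + w - 1) >> 5;  /* column holding the last pixel */
}
```

A source area covering pixels 10 to 39 spans two positions (columns 0 and 1), while pixels 32 to 63 fit exactly in column 1.
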
Two functions are used for this task:

These are functions that take as parameters the current VRAM values, a pointer
to image data, a pointer to rectangle masks, the x-position of the blit, and
return new VRAM values.

Note that a single position will generally intersect two VRAM longwords because
the x-coordinate supplied by the user can be arbitrary. A fair amount of
shifting is involved to position the position (hence the name) along the proper
x-coordinate, then render. Rectangle masks are aligned on the same x-coordinate
as the VRAM so we don't have to shift them. In general, the situation looks
like this:

    <- Preserved area -><------------- Rendered area ------------->

    +----------- VRAM 1 ------------+----------- VRAM 2 ------------+
    | ################### # # # # # | # # # # # # # # # ########### |
    +-------------------------------+-------------------------------+
                        |                               |
    +----------- Mask 1 ------------+----------- Mask 2 ------------+
    |                    ########## | ############################# |
    +-------------------------------+-------------------------------+
                        |                               |
                        +---------- Position -----------+
    <---- x offset ---->| # # # # # # # # # # # # # # # |
                        +-------------------------------+

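The shifting step can be sketched in C for a single `black`-type layer in monochrome mode: the position is split between two VRAM longwords according to the x offset within a longword, and the unshifted rectangle masks protect the preserved pixels. This simplified illustration assumes that the leftmost pixel is the most significant bit; `blit_position_black` is a made-up name, and the real work is done by the assembler functions.

```c
#include <assert.h>
#include <stdint.h>

/* Blit one 32-pixel position `pos` of a black layer at pixel offset `shift`
   (0..31) inside VRAM longword v[0]; the part that overflows goes into v[1].
   mask[0] and mask[1] are aligned on the VRAM, so they are not shifted. */
static void blit_position_black(uint32_t v[2], uint32_t pos, int shift,
                                const uint32_t mask[2])
{
    assert(shift >= 0 && shift < 32);
    uint32_t part1 = pos >> shift;
    /* Shifting a 32-bit value by 32 is undefined in C, hence the test */
    uint32_t part2 = shift ? pos << (32 - shift) : 0;
    v[0] |= part1 & mask[0];
    v[1] |= part2 & mask[1];
}
```

With an offset of 20, the first 12 pixels of the position land at the end of the first VRAM longword and the remaining 20 at the start of the second one, as in the diagram.
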
There are two types of such functions:

* `bopti_asm_*` for the `mono` and `mono_alpha` profiles, on a single VRAM (but
still with two VRAM longwords because of positioning).
* `bopti_gasm_*` for all four profiles, on two VRAMs (for a total of four VRAM
longwords).