commit 5e6ec0b5a3c6eea5055a182abf69bf8a9d264d3e
Author: Lephenixnoir <sebastien.mld@numericable.fr>
Date:   Sun Jul 28 13:56:30 2019 +0200

    Update page 'bopti on fx 9860G'

diff --git a/bopti-on-fx-9860G.md b/bopti-on-fx-9860G.md
new file mode 100644
index 0000000..bb55836
--- /dev/null
+++ b/bopti-on-fx-9860G.md
@@ -0,0 +1,250 @@
+*This is version 2 of bopti, included in the first fx-CG 50-compatible version
+of gint, version 2.0.*
+
+## *bopti* on fx-9860G
+
+The bitmap drawing module, *bopti*, renders images using direct bitwise
+operations on video RAM (vram) longwords. This method makes extensive use of
+the 4-alignment of gint's vram to operate on 32 pixels at a time and avoid
+costly single-bit operations.
+
+In gint's development workflow, images in usual formats are first converted to
+the *bopti* format at compile-time. The *bopti* format is designed for fast
+rendering: it consists of one or several monochrome bitmaps called *layers*,
+arranged in a fixed combination called a *profile*. To each profile corresponds
+an assembler routine designed to quickly render the image.
+
+## Performance
+
+(TODO)
+
+Probably about 15 times as fast as MonochromeLib.
+
+## Color profiles
+
+When converting an image, *fxconv* first quantizes the colors by mapping
+transparent pixels to `alpha` and every other pixel to the closest of these
+four colors:
+
+| Color name | Hexadecimal |
+| ---------- | ----------- |
+| `black`    | `#000000`   |
+| `dark`     | `#555555`   |
+| `light`    | `#aaaaaa`   |
+| `white`    | `#ffffff`   |
+
+Then the image is assigned the smallest profile that can represent all of its
+colors:
+
+| Profile      | Supported colors                           |
+| ------------ | ------------------------------------------ |
+| `mono`       | `black`, `white`                           |
+| `mono_alpha` | `black`, `white`, `alpha`                  |
+| `gray`       | `black`, `white`, `light`, `dark`          |
+| `gray_alpha` | `black`, `white`, `light`, `dark`, `alpha` |
+
+## Layers
+
+Each profile has a fixed number of *layers* with a predefined meaning. During
+rendering, all of the layers are blitted in order to produce the image. The
+number of layers in a profile is always minimal: $`\lceil \log_2 n \rceil`$,
+where $`n`$ is the number of colors in that profile.
+
+On fx-9860G, the vram is either monochrome or 4-color gray, so pixel colors can
+only take 2 or 4 different values. This makes logical operations the natural
+way to implement blitting, because they extend effortlessly to many pixels at
+once.
+
+The current version of *bopti* uses the following types of layers:
+
+| Layer name  | Category   | Effect for 0-bits   | Effect for 1-bits      |
+| ----------- | ---------- | ------------------- | ---------------------- |
+| `fill`      | Monochrome | Paints white        | Paints black           |
+| `white`     | Monochrome | -                   | Paints white           |
+| `black`     | Monochrome | -                   | Paints black           |
+| `lfill`     | Gray       | Clears light vram   | Paints light vram      |
+| `dfill`     | Gray       | Clears dark vram    | Paints dark vram       |
+| `light`     | Gray       | -                   | Paints light gray      |
+| `dark`      | Gray       | -                   | Paints dark gray       |
+
+When performing an operation, *bopti* takes data from the encoded image and
+applies bitwise operations for all layers. It then moves to a different part of
+the image. The previous version of *bopti* applied each layer independently,
+but the current version applies them all at once, saving even more time.
+
+Note that most layers do nothing on 0-bits; this is an optimization related
+to *rectangle masks*. When a VRAM longword is loaded to a register, often the
+blitted image will not cover it entirely. The pixels that must be preserved are
+represented in a structure called a rectangle mask. Having this neutral 0-bit
+makes it simple to preserve relevant pixels while drawing the image. When
+layers don't have this preserving 0-bit, masks must instead be applied to the
+VRAM itself. See later for more details.
+
+Here is the relationship between color profiles and their layers:
+
+* The `mono` profile only has a `fill` layer.
+* The `mono_alpha` profile starts with a `white` layer to clear the
+  non-transparent region of the image, then blits a `black` layer to render
+  the content.
+* The `gray` profile has an `lfill` and a `dfill` layer. These two types of
+  layer act on different VRAMs.
+* The `gray_alpha` profile starts by blitting a `white` layer on both VRAMs,
+  then adds a `light` layer and a `dark` layer.
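+
+To illustrate how the neutral 0-bit interacts with rectangle masks, here is a
+minimal C sketch of the `mono_alpha` combination on one VRAM longword. The
+function name is hypothetical, and the mask convention (1-bits mark the pixels
+that may be modified) is an assumption; gint's actual convention may differ.
+
+```c
+#include <stdint.h>
+
+/* Blit one position of a mono_alpha image onto one VRAM longword.
+   mask: 1-bits mark the pixels that may be modified (assumed convention). */
+static uint32_t blit_mono_alpha(uint32_t vram, uint32_t white, uint32_t black,
+    uint32_t mask)
+{
+    vram &= ~(white & mask);  /* white layer: clear the opaque region */
+    vram |=  (black & mask);  /* black layer: paint the content */
+    return vram;
+}
+```
+
+Masking the layer data rather than the VRAM is enough here, because a 0-bit in
+either layer leaves the corresponding pixel untouched.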
+
+## Logical operations on pixels
+
+As a reference, here are the logical operations used to blit layers on past and
+present versions of bopti. The $`x`$ parameter is a boolean; the transformation
+must happen iff $`x=1`$. The significance of $`x`$ appears when extending the
+logical operations to a longword: it allows controlling 32 pixels individually
+while still using only a couple of logical instructions.
+
+```c
+black  (data, x) = data | x
+white  (data, x) = data & ~x
+invert (data, x) = data ^ x
+```
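+
+As an illustration, the three operations above can be written as C functions
+on 32-bit longwords. This is only a sketch: the function names are
+hypothetical and are not taken from gint's sources.
+
+```c
+#include <stdint.h>
+
+/* Paint black every pixel selected by a 1-bit in x. */
+static uint32_t op_black(uint32_t data, uint32_t x)  { return data | x; }
+/* Paint white every pixel selected by a 1-bit in x. */
+static uint32_t op_white(uint32_t data, uint32_t x)  { return data & ~x; }
+/* Invert every pixel selected by a 1-bit in x. */
+static uint32_t op_invert(uint32_t data, uint32_t x) { return data ^ x; }
+```
+
+With `x = 0x0000ffff`, for instance, only the 16 pixels stored in the low half
+of the longword are affected; the other 16 keep their value.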
+
+For gray images, we need to know that the gray engine produces an illusion of
+intermediate color by quickly alternating two buffers on the screen, with a
+different duration for each. This way, the proportion of time each pixel is
+black is one of four different values. Assuming `long` and `short` represent
+the value of a pixel in the buffer that stays longer and shorter on the screen,
+we have the following encoding:
+
+	white     = 0 (long=0 short=0)
+	lightgray = 1 (long=0 short=1)
+	darkgray  = 2 (long=1 short=0)
+	black     = 3 (long=1 short=1)
+
+So operations on gray pixels will modify two VRAMs at once.
+
+Among interesting operations, we have `lighten`, which shifts all values
+towards white (and white remains white), as if decrementing them, and
+`darken`, which shifts all values towards black (and black remains black), as
+if incrementing them.
+
+```c
+black   (light, dark, x) = (light | x, dark | x)
+dark    (light, dark, x) = (light & ~x, dark | x)
+light   (light, dark, x) = (light | x, dark & ~x)
+white   (light, dark, x) = (light & ~x, dark & ~x)
+inverse (light, dark, x) = (light ^ x, dark ^ x)
+lighten (light, dark, x) = ((light ^ x) & (dark | ~x), dark & (light | ~x))
+darken  (light, dark, x) = ((light ^ x) | (dark & x), dark | (light & x))
+```
+
+These functions are obtained by looking intensely at a truth table, then adding
+a linear number of $`x`$'s to neutralize some operands when $`x=0`$.
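+
+The `lighten` and `darken` formulas can be checked with a small C sketch on
+longword pairs. The type and function names are hypothetical. It assumes, as
+the formulas above imply, that the `light` longword belongs to the
+short-lived buffer (low bit of the color) and `dark` to the long-lived one.
+
+```c
+#include <stdint.h>
+
+/* One longword from each VRAM, describing 32 gray pixels. */
+typedef struct { uint32_t light, dark; } gray_t;
+
+/* Decrement (towards white) the pixels selected by 1-bits in x. */
+static gray_t op_lighten(gray_t v, uint32_t x)
+{
+    gray_t r;
+    r.light = (v.light ^ x) & (v.dark | ~x);
+    r.dark  = v.dark & (v.light | ~x);
+    return r;
+}
+
+/* Increment (towards black) the pixels selected by 1-bits in x. */
+static gray_t op_darken(gray_t v, uint32_t x)
+{
+    gray_t r;
+    r.light = (v.light ^ x) | (v.dark & x);
+    r.dark  = v.dark | (v.light & x);
+    return r;
+}
+```
+
+Both functions return their input unchanged when `x` is 0, which is the
+neutral behavior that the added $`x`$ terms are there to provide.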
+
+## Assembler-driven rendering
+
+The previous implementation of bopti was already fast, usually about 8 times
+as fast as MonochromeLib. Half of that was due to vram alignment, the other
+half to implementation and format. It had, however, two limiting factors:
+
+1. The operation function was a generic function taking the color as argument,
+   and it used a switch to decide which operation to apply;
+2. Each layer was drawn independently, so the 2D structure of the image was
+   unnecessarily traversed several times.
+
+These two limitations are related and can be overcome by specializing the
+rendering code that sits deepest in the critical loop. The current version
+of *bopti* has one specialized rendering function per color profile,
+implemented in assembler.
+
+## Image format
+
+The conversion is performed by *fxconv* at compile-time and outputs a
+big-endian data structure that can be efficiently traversed from the add-in.
+
+The image is first extended to make its width a multiple of 32 pixels, then
+stored in row-major order:
+
+        32       32       32
+    +--------+--------+--------+
+    |    1   |    2   |    3   |  1
+    +--------+--------+--------+
+    |    4   |    5   |    6   |  1
+    +--------+--------+--------+
+
+A set of 32 pixels as numbered on the diagram above is called a *position*.
+This is an important concept for the rendering algorithm. For each position,
+the data of all layers is stored in rendering order, so the layers are
+interwoven in the storage. It also means that the data for a position will
+consist of several longwords, not just one.
+
+Note that extending the image to a multiple of 32 in width is not a hard
+requirement; it could be avoided by defining and implementing 16-bit and 8-bit
+positions, but this is currently not done.
+
+Along with this data, the image object contains some metadata:
+
+```c
+typedef struct
+{
+  /* Image can only be rendered with the gray engine */
+  uint gray     :1;
+  /* Left for future use */
+  uint          :3;
+  /* Image profile (uniquely identifies a rendering function) */
+  uint profile  :4;
+  /* Full width, in pixels */
+  uint width    :12;
+  /* Full height, in pixels */
+  uint height   :12;
+
+  /* Raw layer data */
+  uint8_t data[];
+
+} GPACKED(4) image_t;
+```
+
+The first byte indicates the color profile and whether this profile is
+gray-only. `width` and `height` are the natural dimensions of the image, before
+width extension (which is only relevant for storage). The number of columns is
+deduced from the width.
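+
+As a sketch of how the layout can be traversed, the C helpers below compute
+the number of columns and the location of a position's data. The names are
+hypothetical, and the indexing assumes that each layer contributes exactly one
+longword per position with no padding, which this page does not state
+explicitly.
+
+```c
+#include <stdint.h>
+
+/* Number of 32-pixel columns needed to store a row of the given width. */
+static int column_count(int width)
+{
+    return (width + 31) >> 5;
+}
+
+/* Index, in longwords, of the first layer of position (row, col) for a
+   profile with the given number of layers (assumed layout, see above). */
+static int position_index(int width, int layers, int row, int col)
+{
+    return (row * column_count(width) + col) * layers;
+}
+```
+
+For a 96-pixel-wide image with 2 layers, the position at row 1, column 1
+would then start at longword index 8.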
+
+## Rendering algorithm
+
+The rendering algorithm takes as parameter a subrectangle of an image and a
+target position on the VRAM. Drawing a subrectangle instead of the whole image
+makes it trivial to do clipping by just removing whatever goes beyond the
+screen.
+
+Two functions are available at this level:
+
+* `bopti_render_clip()` clips the provided subrectangle to the image
+  dimensions, then clips that to the screen, and renders. This is the default
+  but all the checks take some time to perform.
+* `bopti_render_noclip()` directly renders by assuming that the subrectangle is
+  valid and that the render fully fits into the VRAM. In many situations these
+  assumptions are known to hold, so it can be selected by passing
+  `DIMAGE_NOCLIP` to `dsubimage()` to save time.
+
+After adjusting (or not) coordinates, both of these functions fall to the next
+level. Rectangle masks are computed to indicate which part of the VRAM must or
+must not be affected. (This is because everything is manipulated as longwords
+from now on, and rendering boundaries will fall in the middle of them.)
+
+Since the masks protect pixels outside the drawn area, we can overestimate the
+subrectangle to render with a larger set of positions that contains it. Each
+position is rendered on two VRAM longwords using the color profile function,
+then the next position is loaded until the image is complete.
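+
+The reason a position touches two VRAM longwords is that the target x
+coordinate is generally not a multiple of 32. A minimal C sketch of the split
+follows; the function name is hypothetical, and it assumes that the most
+significant bit of a longword is the leftmost pixel.
+
+```c
+#include <stdint.h>
+
+/* Split the 32 pixels of one layer of a position between the two VRAM
+   longwords it overlaps when drawn at horizontal coordinate x (x >= 0). */
+static void split_position(uint32_t data, int x, uint32_t *left,
+    uint32_t *right)
+{
+    int shift = x & 31;
+    *left = data >> shift;
+    /* Shifting a 32-bit value by 32 is undefined in C, so an aligned
+       blit is handled separately: nothing spills to the right. */
+    *right = shift ? data << (32 - shift) : 0;
+}
+```
+
+Each half can then be combined with its VRAM longword through the layer's
+logical operation and the rectangle mask of that longword.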
+
+Two functions are used for this task:
+
+* `bopti_render()` does the prep work and parameter computation.
+* `bopti_grid()` iterates over positions and calls the profile's renderer.
+
+The last level is the profile renderer, which is implemented in assembler.
+These are functions that take as parameter the current VRAM values, a pointer
+to image data, a pointer to rectangle masks, the x-position of the blit, and
+return new VRAM values.
+
+There are two types of such functions:
+
+* `bopti_asm_*` for the `mono` and `mono_alpha` profiles, on a single VRAM.
+* `bopti_gasm_*` for all four profiles, on two VRAMs.
+
+TODO: Could add more detail.