From f4a0f4ba766c81c65583c1635609d1669ba6fdbc Mon Sep 17 00:00:00 2001
From: Lephenixnoir <sebastien.michelland@protonmail.com>
Date: Thu, 27 Feb 2020 10:58:55 +0100
Subject: [PATCH] Details about the work of the profile functions.

---
 bopti-on-fx-9860G.md | 106 +++++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 43 deletions(-)

diff --git a/bopti-on-fx-9860G.md b/bopti-on-fx-9860G.md
index ec09ba1..fcfe58b 100644
--- a/bopti-on-fx-9860G.md
+++ b/bopti-on-fx-9860G.md
@@ -1,11 +1,8 @@
-*This is version 2 of bopti, included in the first fx-CG 50-compatible version
-of gint, version 2.0.*
-
-## *bopti* on fx-9860G
+## *bopti* on fx-9860G
 
 The bitmap drawing module, *bopti*, renders images using direct bitwise
-operations on video RAM (vram) longwords. This method makes extensive use of
-the 4-alignment of gint's vram to operate on 32 pixels at a time and avoid
+operations on video RAM (VRAM) longwords. This method makes extensive use of
+the 4-alignment of gint's VRAM to operate on 32 pixels at a time and avoid
 costly single-bit operations.
 
 In gint's development workflow, images in usual formats are first converted to
@@ -22,9 +19,9 @@ Probably about 15 times as fast as MonochromeLib.
 
 ## Color profiles
 
-When converting an image, *fxconv* first quantizes the colors by mapping
-transparent pixels to `alpha` and mapping other pixels to the closest color in
-these four:
+When converting an image, the fxconv tool of the [fxSDK](/Lephenixnoir/fxsdk)
+first quantizes the colors by mapping transparent pixels to `alpha` and every
+other pixel to the closest color in these four:
 
 | Color name | Hexadecimal |
 | ---------- | ----------- |
@@ -48,9 +45,9 @@ colors:
 Each profile has a fixed number of *layers* with a predefined meaning. During
 rendering, all of the layers are blit in order to produce the image. The number
 of layers in a profile is always minimal: it is `⌊ 1 + log n ⌋` where `n` is
-the number of colors in that profile.
+the number of colors.
 
-On fx-9860G, the vram is either monochrome or 4-color gray, so pixel colors can
+On fx-9860G, the VRAM is either monochrome or 4-color gray, so pixel colors can
 only take 2 or 4 different values. This makes logical operations a privileged
 method to implement blitting methods, because logical operations can
 effortlessly be extended to apply on multiple pixels at once.
@@ -62,8 +59,8 @@ The current version of *bopti* uses the following types of layers:
 | `fill`      | Monochrome | Paints white        | Paints black           |
 | `white`     | Monochrome | -                   | Paints white           |
 | `black`     | Monochrome | -                   | Paints black           |
-| `lfill`     | Gray       | Clears light vram   | Paints light vram      |
-| `dfill`     | Gray       | Clears dark vram    | Paints dark vram       |
+| `lfill`     | Gray       | Clears light VRAM   | Paints light VRAM      |
+| `dfill`     | Gray       | Clears dark VRAM    | Paints dark VRAM       |
 | `light`     | Gray       | -                   | Paints light gray      |
 | `dark`      | Gray       | -                   | Paints dark gray       |
 
@@ -76,9 +73,10 @@ Note that most functions do nothing on 0-bits; this is an optimization related
 to *rectangle masks*. When a VRAM longword is loaded to a register, often the
 blitted image will not cover it entirely. The pixels that must be preserved are
 represented in a structure called a rectangle mask. Having this neutral 0-bit
-makes it simple to preserve relevant pixels while drawing the image. When
-layers don't have this preserving 0-bit, masks must instead be applied to the
-VRAM itself. See later for more details.
+makes it simple to preserve relevant pixels while drawing the image by setting
+the corresponding rectangle mask bits to 0. When layers don't have this
+preserving 0-bit, masks must instead be applied manually. See later for more
+details.
 
 Here is the relationship between color profiles and their layers:
 
@@ -88,7 +86,7 @@ Here is the relationship between color profiles and their layers:
   the content.
 * The `gray` profile has an `lfill` and a `dfill` layer. These two types of
   layer act on different VRAMs.
-* The `gray_alpha` profile start by blitting a `white` layer on both VRAMs,
+* The `gray_alpha` profile starts by blitting a `white` layer on both VRAMs,
   then adds a `light` layer and a `dark` layer.
 
 ## Logical operations on pixels
@@ -109,15 +107,15 @@ For gray images, we need to know that the gray engine produces an illusion of
 intermediate color by quickly alternating two buffers on the screen, with a
 different duration for each. This way, the proportion of time each pixel is
 black is one of four different values. Assuming `long` and `short` represent
-the value of a pixel in the buffer that stays longer and shorter on the screen,
-we have the following encoding:
+the value of a pixel in the VRAMs that respectively stay longer and shorter on
+the screen, we have the following encoding:
 
 	white     = 0 (long=0 short=0)
 	lightgray = 1 (long=0 short=1)
 	darkgray  = 2 (long=1 short=0)
 	black     = 3 (long=1 short=1)
 
-So operations on gray pixels will modify two VRAMs at once.
+So operations on gray pixels will modify two VRAMs.
 
 Among interesting operations, we have `ligthen`, which shifts all values
 towards white (and white remains white), as if decrementing them, and `darken`
@@ -134,14 +132,15 @@ lighten (light, dark, x) = ((light ^ x) & (dark | ~x), dark & (light | ~x))
 darken  (light, dark, x) = ((light ^ x) | (dark & x), dark | (light & x))
 ```
 
-These functions are obtained by looking intensely at a truth table, then adding
-a linear number of `x`'s to neutralize some operands when `x=0`.
+These functions are obtained by staring at a truth table, then adding a linear
+number of `x`'s to neutralize some operands when `x=0`.
 
 ## Assembler-driven rendering
 
 The previous implementation of bopti was already fast, usually about 8 times
-as fast as MonochromeLib. Half of it was due to vram alignment, the other was
-related to implementation and format. It had, however, two limiting factors:
+as fast as MonochromeLib. Half of the speedup was due to VRAM alignment, and
+the other half was related to implementation and format. It had, however, two
+limiting factors:
 
 1. The operation function was a generic function taking the color as argument,
    and it used a switch to decide which operation to apply;
@@ -149,9 +148,9 @@ related to implementation and format. It had, however, two limiting factors:
    unnecessarily traversed several times.
 
 These two limitations are related and can be overcome by specializing the
-rendering code which is the deepest in the critical loop. The current version
+rendering code, which is the deepest in the critical loop. The current version
 of *bopti* has one specialized rendering function per color profile,
-implemented in assembler.
+implemented in assembler, which handles both the looping and the rendering.
 
 ## Image format
 
@@ -161,15 +160,15 @@ big-endian data structure that can be efficiently traversed from the add-in.
 The image is first extended to make its width a multiple of 32 pixels, then
 stored in row-major order:
 
-        32       32       32
+       (32)     (32)     (32)
     +--------+--------+--------+
-    |    1   |    2   |    3   |  1
+    |    1   |    2   |    3   |  (1)
     +--------+--------+--------+
-    |    4   |    5   |    6   |  1
+    |    4   |    5   |    6   |  (1)
     +--------+--------+--------+
 
 A set of 32 pixels as numbered on the diagram above is called a *position*.
-This in an important concept for the rendering algorithm. For each position,
+This is an important concept for the rendering algorithm. For each position,
 the data of all layers is stored in rendering order, so the layers are
 interwoven in the storage. It also means that the data for a position will
 consist of several longwords, not just one.
@@ -178,7 +177,7 @@ Note that extending the image to a multiple of 32 in width is not a hard
 requirement, it can be avoided by defining and implementing 16-bit and 8-bit
 positions, but this is currently not done.
 
-Along with this data, the image object contains a number of metadata:
+Along with this data, the image object contains a number of attributes:
 
 ```c
 typedef struct
@@ -200,7 +199,7 @@ typedef struct
 } GPACKED(4) image_t;
 ```
 
-The first byte indicate the color profile and whether this profile is
+The first byte indicates the color profile and whether this profile is
 gray-only. `width` and `height` are the natural dimensions of the image, before
 width extension (which is only relevant for storage). The number of columns is
 deduced from the width.
@@ -209,8 +208,8 @@ deduced from the width.
 
 The rendering algorithm takes as parameter a subrectangle of an image and a
 target position on the VRAM. Drawing a subrectangle instead of the whole image
-makes it trivial to do clipping by just removing whatever goes beyond the
-screen.
+makes it trivial to do clipping by just cutting whatever goes beyond the
+screen out of the source area.
 
 Two functions are available at this level:
 
@@ -224,13 +223,13 @@ Two functions are available at this level:
 
 After adjusting (or not) coordinates, both of these functions fall to the next
 level. Rectangle masks are computed to indicate which part of the VRAM must or
-not be affected. (This is because everything will be manipuled with longwords
+not be affected. (This is because everything will be manipulated with longwords
 from now on, and rendering boundaries will fall in the middle of them.)
 
-Since the masks safeguards what we're going to draw, we can overestimate the
-subrectangle to render with a larger set of positions that contains it. Each
-position is rendered on two VRAM longwords using the color profile function,
-then the next position is loaded until the image is complete.
+Since the masks prevent us from painting outside of the target area, we can now
+relax our source rectangle: instead of individual pixels, we consider full
+32-pixel positions. We'll render each of them on the VRAM using the color
+profile function, then move on to the next until the image is complete.
 
 Two functions are used for this task:
 
@@ -242,9 +241,30 @@ These are functions that take as parameter the current VRAM values, a pointer
 to image data, a pointer to rectangle masks, the x-position of the blit, and
 return new VRAM values.
 
+Note that a single position will generally intersect two VRAM longwords because
+the x-coordinate supplied by the user can be arbitrary. A fair amount of
+shifting is involved to position the position (hence the name) along the proper
+x-coordinate, then render. Rectangle masks are aligned on the same x-coordinate
+as the VRAM, so we don't have to shift them. In general, the layout looks like
+this:
+
+     <- Preserved area -><------------- Rendered area ------------->
+
+    +----------- VRAM 1 ------------+----------- VRAM 2 ------------+
+    | ################### # # # # # | # # # # # # # # # ########### |
+    +-------------------------------+-------------------------------+
+                        |                               |
+    +----------- Mask 1 ------------+----------- Mask 2 ------------+
+    |                    ########## | ############################# |
+    +-------------------------------+-------------------------------+
+                        |                               |
+                        +---------- Position -----------+
+    <---- x offset ---->| # # # # # # # # # # # # # # # |
+                        +-------------------------------+
+
 There are two types of such functions:
 
-* `bopti_asm_*` for the `mono` and `mono_alpha` profiles, on a single VRAM.
-* `bopti_gasm_*` for all four profiles, on two VRAMs.
-
-TODO: Could add more detail.
+* `bopti_asm_*` for the `mono` and `mono_alpha` profiles, on a single VRAM (but
+  still with two VRAM longwords because of positioning).
+* `bopti_gasm_*` for all four profiles, on two VRAMs (for a total of four VRAM
+  longwords).