This change adds proper key control by querying the KEYSC directly
instead of using PRGM_GetKey(). The main advantage is that multiple
keys can now be pressed at once.
Controls are still quite hard to use; I'll think of an alternative
keymap.
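For reference, here is a minimal sketch of what decoding a key matrix
snapshot can look like. The word count, the bit layout (two rows per
16-bit word, 8 columns per row) and the names keys_snapshot() and
key_down() are assumptions for illustration, not the actual code; on
hardware the source pointer would be the KEYSC data registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of 16-bit words in one snapshot (assumed for this sketch). */
#define KEY_WORDS 6

/* Copy the whole key matrix at once so all keys are sampled together.
   `src` is any readable array here; on the calculator it would point at
   the KEYSC data registers (address deliberately not given). */
static void keys_snapshot(const volatile uint16_t *src,
                          uint16_t dst[KEY_WORDS])
{
    for (int i = 0; i < KEY_WORDS; i++)
        dst[i] = src[i];
}

/* Assumed layout: two rows per word, 8 columns per row. */
static bool key_down(const uint16_t state[KEY_WORDS], int row, int col)
{
    int word = row >> 1;
    int bit = col + 8 * (row & 1);
    return (state[word] >> bit) & 1;
}
```

Since every key has its own bit in the snapshot, testing two keys is
just two calls to key_down() on the same state.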
lumpinfo is now allocated with Z_Malloc() because it's needed for some
larger WADs.
More heap is needed to compensate and to support larger WADs fully, so
the unused part of the user stack is added as a second zone.
This makes at least the start of the DOOM Ultimate WAD playable.
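The idea of falling back to a second zone can be sketched as follows.
This is a simplified bump allocator, not the real Z_Malloc(); zone_t,
zone_alloc() and zones_alloc() are hypothetical names, and the real zone
allocator also supports freeing and purging, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous memory region (hypothetical, bump-allocated). */
typedef struct {
    uint8_t *base;
    size_t size;
    size_t used;
} zone_t;

/* Allocate n bytes from one zone, or return NULL if it does not fit. */
static void *zone_alloc(zone_t *z, size_t n)
{
    n = (n + 3) & ~(size_t)3;          /* round the size up to 4 bytes */
    if (n > z->size - z->used)
        return NULL;
    void *p = z->base + z->used;
    z->used += n;
    return p;
}

/* Try each zone in order; later zones only serve what earlier ones
   could not fit (e.g. main heap first, spare stack area second). */
static void *zones_alloc(zone_t *zones, int count, size_t n)
{
    for (int i = 0; i < count; i++) {
        void *p = zone_alloc(&zones[i], n);
        if (p)
            return p;
    }
    return NULL;
}
```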
The bar takes up a little bit of time too, but I think it's a plus.
Currently it's limited to ~20 frames, which is normally < 0.3s. Drawing
a frame for every fragment is disastrous in comparison (loading time x3
lol).
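A minimal sketch of the throttling, assuming the bar is split into 20
steps and only redrawn when the step changes. progress_t,
progress_init() and progress_update() are hypothetical names, not the
functions used in the code.

```c
#include <stdbool.h>

#define BAR_STEPS 20

typedef struct {
    int total;      /* number of fragments to load */
    int last_step;  /* last bar step that was drawn, -1 if none */
} progress_t;

static void progress_init(progress_t *p, int total)
{
    p->total = total;
    p->last_step = -1;
}

/* Returns true only when the bar would look different, so the caller
   draws a frame a bounded number of times instead of once per fragment. */
static bool progress_update(progress_t *p, int done)
{
    int step = (p->total > 0) ? done * BAR_STEPS / p->total : BAR_STEPS;
    if (step == p->last_step)
        return false;
    p->last_step = step;
    return true;
}
```

With 129 fragments this redraws 21 times (steps 0 through 20) rather
than once per fragment.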
This was using screens[1] which I had deallocated when fixing the status
bar (I incorrectly assumed it was used only for that).
While the CGDOOM technique to share screens[1] to avoid allocating the
320x20 buffer for the status bar makes clear sense with that new
information, I think I'll keep this 6.4 kB buffer there and rather
search for ways to use more memory zones.
Loading is measured by RTC_GetTicks().
* Initial version: 9.8s
This was a regression due to using 512-byte sectors instead of 4 kiB
clusters as previously.
* Do BFile reads of 4 kiB: 5.2s (-47%)
Feels similar to original code, I'll take this as my baseline.
* Test second half of Flash first: 3.6s (-31%)
By reading from FLASH_FS_HINT to FLASH_END first, many OS sectors can
be skipped (the other sectors are still searched afterwards, just in
case).
* Load to XRAM instead of RAM with BFile
The DMA is 10% slower to XRAM than to RAM, but this benefits memcmp()
because of faster memory accesses through the operand bus. No effect
at this point, but ends up saving 8% after memcmp is optimized.
* Optimize memcmp for sectors: 3376 ms (-8%)
The optimized memcmp uses word accesses for ROM (which is fastest),
and weaves loop iterations to exploit superscalar parallelism.
* Search sectors most likely to contain data first: 2744 ms (-19%)
File fragments almost always start on 4-kiB boundaries between
FLASH_FS_HINT and FLASH_END, so these are tested first.
* Index most likely sectors, improve FLASH_FS_HINT: 2096 ms (-24%)
Most likely sectors are indexed by first 4 bytes and binary searched,
and a slightly larger region is considered for hints. The cache hits
119/129 fragments in my case.
* Use optimized memcmp for consecutive fragments: 1408 ms (-33%)
I only set it for the search of the first sector in each fragment and
forgot to use it where it is really needed. x)
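The indexing and word-wise comparison steps above can be sketched
roughly as below. This is an illustration under assumptions, not the
actual code: index_entry_t, load32(), sector_equal(), index_build() and
index_find() are hypothetical names, sector sizes are assumed to be
multiples of 8 bytes, and the loads go through memcpy() so the sketch
stays portable off-device.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t key;   /* first 4 bytes of the sector */
    int sector;     /* sector number within the indexed region */
} index_entry_t;

static uint32_t load32(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, 4);
    return w;
}

/* Compare two words per iteration; size must be a multiple of 8. */
static int sector_equal(const uint8_t *a, const uint8_t *b, size_t size)
{
    for (size_t i = 0; i + 8 <= size; i += 8) {
        uint32_t a0 = load32(a + i), a1 = load32(a + i + 4);
        uint32_t b0 = load32(b + i), b1 = load32(b + i + 4);
        if (a0 != b0 || a1 != b1)
            return 0;
    }
    return 1;
}

static int cmp_entry(const void *a, const void *b)
{
    uint32_t ka = ((const index_entry_t *)a)->key;
    uint32_t kb = ((const index_entry_t *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Index every candidate sector by its first 4 bytes, sorted by key. */
static void index_build(index_entry_t *idx, const uint8_t *flash,
                        int count, size_t sector_size)
{
    for (int i = 0; i < count; i++) {
        idx[i].key = load32(flash + (size_t)i * sector_size);
        idx[i].sector = i;
    }
    qsort(idx, (size_t)count, sizeof *idx, cmp_entry);
}

/* Binary search for the key, then confirm with a full comparison.
   Returns the matching sector number, or -1 on a miss. */
static int index_find(const index_entry_t *idx, int count,
                      const uint8_t *flash, size_t sector_size,
                      const uint8_t *data)
{
    uint32_t key = load32(data);
    int lo = 0, hi = count;
    while (lo < hi) {                   /* find first entry >= key */
        int mid = lo + (hi - lo) / 2;
        if (idx[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    for (; lo < count && idx[lo].key == key; lo++) {
        const uint8_t *s = flash + (size_t)idx[lo].sector * sector_size;
        if (sector_equal(s, data, sector_size))
            return idx[lo].sector;
    }
    return -1;
}
```

On a miss the caller would fall back to the slower linear search over
the remaining sectors.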
Game speed is measured roughly by the time it takes to hit a wall by
walking straight after spawning in Hangar.
* Initial value: 4.4s
* Use cached ROM when loading data from the WAD: 2.9s (-35%)
Cached accesses are quite detrimental for sector search, I assume
because everything is aligned like crazy, but it's still a major help
when reading sequential data in real-time.
* Restore screen numbers; BG is 4, at least in the ST module.
* Let ST module allocate BG, which is just 32 pixels high and not a full
VRAM (huge memory gain!)
* Fix V_CopyRect() not working because memcpy is still broken (this will
be changed later with a proper memcpy)
BFile can now be selected in <platform.h> by defining CGDOOM_WAD_BFILE
instead of CGDOOM_WAD_MAPPING. The DMA option is not implemented yet.
BFile works as expected - a lot of stuttering due to reads during
gameplay. But the status bar texture still doesn't load properly!
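As an illustration of the compile-time selection, here is a small
sketch. Only the two macro names come from this change; the sanity check
and wad_backend_name() are hypothetical and stand in for the real code
guarded by these macros.

```c
#include <string.h>

/* Stand-in for <platform.h>: define exactly one backend. */
#define CGDOOM_WAD_BFILE
/* #define CGDOOM_WAD_MAPPING */

/* Hypothetical sanity check: reject zero or two backends. */
#if defined(CGDOOM_WAD_BFILE) == defined(CGDOOM_WAD_MAPPING)
#error "Define exactly one of CGDOOM_WAD_BFILE or CGDOOM_WAD_MAPPING"
#endif

/* Hypothetical helper showing where backend-specific code would go. */
static const char *wad_backend_name(void)
{
#ifdef CGDOOM_WAD_BFILE
    return "bfile";
#else
    return "mapping";
#endif
}
```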
* Use sh-elf-gcc (as used on Planète Casio)
* Link with libfxcg -DFXCG_MINI_COMPAT
* Disable LTO as it caused problems (hopefully could be reenabled later)