This repository is a fork of CGDOOM, which was originally ported by MPoupe.
Credit goes to:
* Mrakoplaz for the original TI-Nspire port from MS-DOS sources.
* Critor for the Nspire CX (now CX II) port (which includes support for a
number of WAD files).
* MPoupe for the original fx-CG 10/20 port of DOOM.
* ComputerNerd for the first attempts at an fx-CG 50 port.
* Lephenixnoir for the final fixes and fx-CG 50 version.
TODO:
-> Fix screen not cleared when changing resolution
-> Shareware WAD crashes at the end of E1M4 (and in E1M9)
-> Ultimate DOOM WAD runs out of memory at the end of E1M2
-> Some bad textures here and there
-> Supply more VRAM to the internal allocator
-> Level selector
-> Rate-limit the game when overclocking
-> Run key?
-> FPS counter on-screen
-> Try and support more WADs
-> Reenable LTO if possible
-> Built-in overclocking?
CGDOOM used to be compiled with the mini-SDK. However, it's become quite
difficult to get a copy of that. Instead, this port is built with a slightly
modified PrizmSDK from Jonimoose/libfxcg [1].
The differences are (I might push them later):
* TOOLCHAIN_PREFIX=sh-elf- (in libc/ and libfxcg/)
* Syscall 0x1B0B, getSecondaryVramAddress() is added in libfxcg/
* abort() is removed from libc/ (CGDOOM has its own)
* calloc() defined in libc/ (just a call to sys_calloc)
* sys_calloc() fixed in libfxcg/ to use memset (memsetZero is broken)
* Linker script outputs in elf32-sh format
* Linker script sets 500k of RAM instead of 64k
* LTO disabled (hopefully it could be reenabled later)
* Syscall memcpy() (apparently broken) replaced by fxlibc memcpy()
* fxlibc qsort() is added in libc/
* Linker script provides addresses to unused section of user RAM
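
The calloc()/sys_calloc() items above can be pictured with the minimal sketch
below. It is not the actual libfxcg code: the names sketch_sys_calloc() and
sketch_calloc() are hypothetical, and standard malloc() stands in for whatever
allocation syscall the SDK really uses. It only shows the shape of the change,
i.e. clearing the block with memset() and having calloc() forward to it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fixed sys_calloc(): allocate the block,
   then clear it with memset() rather than a dedicated zeroing routine.
   Assumption: malloc() replaces the SDK's real allocation call. */
static void *sketch_sys_calloc(size_t nmemb, size_t size)
{
    /* Reject requests whose total byte count would overflow size_t. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;

    size_t total = nmemb * size;
    void *ptr = malloc(total);
    if (ptr != NULL)
        memset(ptr, 0, total);
    return ptr;
}

/* Hypothetical libc-side wrapper: calloc() is just a call to the
   syscall-layer function, as the list above describes. */
static void *sketch_calloc(size_t nmemb, size_t size)
{
    return sketch_sys_calloc(nmemb, size);
}
```

The overflow check is an addition of this sketch and is not claimed to exist
in the modified SDK.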
[1] https://github.com/Jonimoose/libfxcg/