CGDoom/src-cg/platform.h

/* <platform.h> file for CASIO Graph 90+E / fx-CG 50 hardware */
#ifndef PLATFORM_H
#define PLATFORM_H
/* Blame note for the Flash settings below (commit of 2021-07-28 22:51:03
   +02:00): "Optimize loading speed (x2.7) and game speed (+35%)".

   Loading is measured by RTC_GetTicks().

   * Initial version: 9.8s
     This was a regression due to using 512-byte sectors instead of 4 kiB
     clusters as previously.
   * Do BFile reads of 4 kiB: 5.2s (-47%)
     Feels similar to original code, I'll take this as my baseline.
   * Test second half of Flash first: 3.6s (-31%)
     By reading from FLASH_FS_HINT to FLASH_END first many OS sectors can be
     skipped (without missing on other sectors just in case).
   * Load to XRAM instead of RAM with BFile
     The DMA is 10% slower to XRAM than to RAM, but this benefits memcmp()
     because of faster memory accesses through the operand bus. No effect at
     this point, but ends up saving 8% after memcmp is optimized.
   * Optimize memcmp for sectors: 3376 ms (-8%)
     The optimized memcmp uses word accesses for ROM (which is fastest), and
     weaves loop iterations to exploit superscalar parallelism.
   * Search sectors most likely to contain data first: 2744 ms (-19%)
     File fragments almost always start on 4-kiB boundaries between
     FLASH_FS_HINT and FLASH_END, so these are tested first.
   * Index most likely sectors, improve FLASH_FS_HINT: 2096 ms (-24%)
     Most likely sectors are indexed by first 4 bytes and binary searched, and
     a slightly larger region is considered for hints. The cache hits 119/129
     fragments in my case.
   * Use optimized memcmp for consecutive fragments: 1408 ms (-33%)
     I only set it for the search of the first sector in each fragment and
     forgot to use it where it is really needed. x)

   Game speed is measured roughly by the time it takes to hit a wall by
   walking straight after spawning in Hangar.

   * Initial value: 4.4s
   * Use cached ROM when loading data from the WAD: 2.9s (-35%)
     Cached accesses are quite detrimental for sector search, I assume because
     everything is aligned like crazy, but it's still a major help when
     reading sequential data in real-time. */
//---
// WAD file access in Flash
//---
/* Settings for file mappings: traverse the whole 32-MiB Flash */
#define FLASH_START ((const void *)0xA0000000)
#define FLASH_END ((const void *)0xA2000000)
/* Where we expect the file system to start, approximately (this region is
searched first to hit sectors more quickly) */
#define FLASH_FS_HINT ((const void *)0xA0C00000)
/* Flash too, but cached; slower for sector searches but much faster for actual
data loads while in-game */
#define FLASH_CACHED_START ((const void *)0x80000000)
#define FLASH_CACHED_END ((const void *)0x82000000)
/* When loading from Flash, non-fragmented lumps are returned directly via
pointers to Flash in order to save heap. Pointers to Flash can't be freed
and are detected with the following macro. */
#define PTR_TO_FLASH(x) ( \
((x) >= FLASH_START && (x) < FLASH_END) || \
((x) >= FLASH_CACHED_START && (x) < FLASH_CACHED_END))
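/* Sketch, not part of the original header: one way a caller might use the
   PTR_TO_FLASH() test before freeing a lump. The sk_ names and the wrapper
   sk_release_lump() are hypothetical; addresses are handled as integers so
   that the example also runs on a 64-bit host. */

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Address ranges copied from the header, as integers (assumption: the sketch
   compares integer addresses instead of const void * pointers). */
#define SK_FLASH_START        UINT32_C(0xA0000000)
#define SK_FLASH_END          UINT32_C(0xA2000000)
#define SK_FLASH_CACHED_START UINT32_C(0x80000000)
#define SK_FLASH_CACHED_END   UINT32_C(0x82000000)

/* Same range test as PTR_TO_FLASH(), applied to an integer address. */
static bool sk_addr_in_flash(uintptr_t a)
{
    return (a >= SK_FLASH_START && a < SK_FLASH_END) ||
           (a >= SK_FLASH_CACHED_START && a < SK_FLASH_CACHED_END);
}

/* Hypothetical wrapper: free a lump only if it lives on the heap.
   Returns true if free() was actually called. */
static bool sk_release_lump(void *p)
{
    if (p == NULL || sk_addr_in_flash((uintptr_t)p))
        return false;   /* NULL, or a direct pointer into Flash: nothing to free */
    free(p);
    return true;
}
```

/* Both the uncached and the cached window are tested because a lump may be
   returned through either mapping. */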
/* Storage unit is a sector of 512 bytes; Fugue tries to use clusters of 4 kiB
(8 sectors) but in exceptional circumstances cluster alignment can be lost
(such as when sectors are dead) */
#define FLASH_PAGE_SIZE 512
#define FLASH_PAGE_SIZE_LOG2 9
#define FLASH_PAGE_COUNT ((FLASH_END-FLASH_START) / FLASH_PAGE_SIZE)
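/* Sketch, not part of the original header: the sector arithmetic implied by
   FLASH_PAGE_SIZE, FLASH_PAGE_SIZE_LOG2 and FLASH_PAGE_COUNT, written with
   integer addresses. The sk_ names are hypothetical. */

```c
#include <stdint.h>

#define SK_FLASH_START    UINT32_C(0xA0000000)
#define SK_FLASH_END      UINT32_C(0xA2000000)
#define SK_PAGE_SIZE      512u
#define SK_PAGE_SIZE_LOG2 9
/* 32 MiB of Flash divided into 512-byte sectors gives 65536 sectors. */
#define SK_PAGE_COUNT     ((SK_FLASH_END - SK_FLASH_START) / SK_PAGE_SIZE)

/* Index of the 512-byte sector containing an uncached Flash address. */
static uint32_t sk_page_index(uint32_t addr)
{
    return (addr - SK_FLASH_START) >> SK_PAGE_SIZE_LOG2;
}

/* Address of the first byte of the sector with the given index. */
static uint32_t sk_page_address(uint32_t index)
{
    return SK_FLASH_START + (index << SK_PAGE_SIZE_LOG2);
}
```

/* Keeping the log2 next to the size lets divisions by the sector size become
   shifts; the two constants must be changed together. */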
/* Size of BFile reads; performance is good when it's at least a cluster */
#define FLASH_BFILE_UNIT 4096
/* Whether to index ROM sectors most likely to have data to use in sector
searches (comment out to disable) */
#define FLASH_INDEX
/* The index covers the 4 kiB clusters from FLASH_FS_HINT to FLASH_END;
fragments are almost always 4 kiB-aligned, and only occasionally not */
#define FLASH_INDEX_SIZE ((FLASH_END-FLASH_FS_HINT) / 4096)
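/* Sketch, not part of the original header: a minimal illustration of an index
   keyed by the first 4 bytes of each cluster and searched by bisection, as
   the commit note describes. The entry layout and all sk_ names are
   assumptions; the real index in CGDoom may be organized differently. */

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same formula as FLASH_INDEX_SIZE with integer addresses: 5120 entries. */
#define SK_INDEX_SIZE \
    ((UINT32_C(0xA2000000) - UINT32_C(0xA0C00000)) / 4096u)

/* Hypothetical entry: first 4 bytes of a cluster, and the cluster number. */
struct sk_entry { uint32_t key; uint32_t cluster; };

static int sk_cmp(const void *a, const void *b)
{
    uint32_t ka = ((const struct sk_entry *)a)->key;
    uint32_t kb = ((const struct sk_entry *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Sort the index by key so that it can be bisected. */
static void sk_index_sort(struct sk_entry *idx, size_t n)
{
    qsort(idx, n, sizeof *idx, sk_cmp);
}

/* Lower bound: position of the first entry whose key is >= key, or n. */
static size_t sk_index_find(const struct sk_entry *idx, size_t n, uint32_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

/* Several clusters can share their first 4 bytes, so a caller would walk
   forward from the returned position while the key matches and confirm each
   candidate with a full comparison of the sector contents. */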
//---
// Display driver access
//---
/* Whether to use direct-DD access rather than intermediate VRAM. Enabling this
option will provide the unused VRAM as additional heap. (Comment out to
disable.) */
#define CGDOOM_DIRECT_R61524
//---
// Memory layout
//---
/* PRAM0 is an area of SPU2 memory that supports only 32-bit access. It is used
for the file mapping and some allocations; see <cgdoom-alloc.h>. */
#define PRAM0_START ((void *)0xfe200000)
#define PRAM0_END ((void *)0xfe228000)
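/* Sketch, not part of the original header: since PRAM0 only supports 32-bit
   accesses, byte-oriented routines such as memcpy() are not safe on it. A
   word-by-word copy could look like this; sk_copy32() is a hypothetical
   helper, and on a host the destination is just an ordinary array. */

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the PRAM0 window from the header: 0x28000 bytes = 160 KiB. */
#define SK_PRAM0_SIZE (UINT32_C(0xfe228000) - UINT32_C(0xfe200000))

/* Copy `bytes` bytes using only aligned 32-bit stores to the destination.
   Assumption: `bytes` is a multiple of 4 and both pointers are 4-aligned.
   The volatile qualifier discourages the compiler from merging or splitting
   the stores into other access sizes. */
static void sk_copy32(volatile uint32_t *dst, const uint32_t *src, size_t bytes)
{
    size_t words = bytes / 4;
    for (size_t i = 0; i < words; i++)
        dst[i] = src[i];
}
```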
//---
// Memory distribution
//---
/* When direct-display access is disabled, put the screens in secondary VRAM
and use VRAM for rendering. When it's enabled, put the screens in VRAM and
use the secondary VRAM as heap. We swap because using the VRAM as heap would
cause problems with error screens (stepping through errors would crash). */
#ifdef CGDOOM_DIRECT_R61524
# define CGDOOM_SCREENS_BASE ((void *)( \
((uint32_t)GetVRAMAddress() & 0x1fffffff) | 0x80000000 \
))
#else
# define CGDOOM_SCREENS_BASE SecondaryVRAM
#endif
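/* Sketch, not part of the original header: the mask-and-or used for
   CGDOOM_SCREENS_BASE keeps the low 29 bits of an address and selects the
   cached window at 0x80000000, the same way FLASH_CACHED_START mirrors
   FLASH_START. The sk_ helpers are hypothetical and work on integers. */

```c
#include <stdint.h>

/* Cached alias of an address: keep the 29-bit physical part, select the
   0x80000000 window. */
static uint32_t sk_to_cached(uint32_t addr)
{
    return (addr & UINT32_C(0x1fffffff)) | UINT32_C(0x80000000);
}

/* Uncached alias of an address: same physical part, 0xA0000000 window. */
static uint32_t sk_to_uncached(uint32_t addr)
{
    return (addr & UINT32_C(0x1fffffff)) | UINT32_C(0xA0000000);
}
```

/* For example, FLASH_FS_HINT at 0xA0C00000 corresponds to 0x80C00000 in the
   cached window. */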
/* Amount of extra memory past the 2 MB line that we deem safe to access. */
extern int CGD_2MBLineMemory;
/* Maximum amount of memory beyond the 2 MB line */
#define CGDOOM_2MBLINEMEMORY_MAX (6 << 20)
//---
#include "keyboard.hpp"
#include "keyboard_syscalls.h"
#include "APP_syscalls.h"
#include "CONVERT_syscalls.h"
#include "SYSTEM_syscalls.h"
#include "RTC_syscalls.h"
#include "MCS_syscalls.h"
#include "fxcg/display.h"
#include "fxcg/misc.h"
#include "fxcg/file.h"
#include "fxcg/serial.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
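/* Assertions are compiled out */
#define ASSERT(x)
/* OS version string, read from a fixed address in the cached ROM window */
#define GetOSVersion() ((char *)0x80020020)
/* printf() calls are compiled out; this must stay after <stdio.h> */
#define printf(...)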
#endif //#ifndef PLATFORM_H