SH4compatibilityTool/DOCUMENTATION.md

Problem solved by this tool

sh3/sh4 compatible input function: _GetKeyState

Most of the problems solved by this tool are related to keyboard input. So the first step was to create an sh3/sh4 compatible function that reads the state of a keyboard key. This function is injected at the end of the program, so the header must also be rewritten, but that's easy thanks to Simon Lothar and his documentation.

You can find the fully commented assembly source of the function in _getKeyState.asm.

This function is faster than most input functions (except KeyDown). I first measured the speed of each function with the following code:

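//Prototypes of the helpers defined below, so they are declared before use
void intToHex(unsigned int in, char* string);
char nibbleToHex(unsigned char in);

int AddIn_main(int isAppli, unsigned short OptionNum)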
{
   unsigned int timeBegin;
   unsigned int duration;
   char string[9];
   int i;
   while(1)
   {
      timeBegin = RTC_GetTicks();//RTC_GetTicks is a syscall documented in FxReverse
      for(i=0;i<5000;i++)
      {
         key_down(K_EXE);//Change this function here
      }
      duration = RTC_GetTicks()-timeBegin;
      intToHex(duration, string);
      
      Bdisp_AllClr_DDVRAM();
      locate(1,1);
      Print((unsigned char*)string);
      Bdisp_PutDisp_DD();
   }

    return 1;
}

void intToHex(unsigned int in, char* string)
{
   //Shift first, then mask: casting "in" before shifting would drop the upper bits
   string[0] = nibbleToHex((unsigned char)((in>>28)&0xF));
   string[1] = nibbleToHex((unsigned char)((in>>24)&0xF));
   string[2] = nibbleToHex((unsigned char)((in>>20)&0xF));
   string[3] = nibbleToHex((unsigned char)((in>>16)&0xF));
   string[4] = nibbleToHex((unsigned char)((in>>12)&0xF));
   string[5] = nibbleToHex((unsigned char)((in>>8)&0xF));
   string[6] = nibbleToHex((unsigned char)((in>>4)&0xF));
   string[7] = nibbleToHex((unsigned char)(in&0xF));
   string[8] = 0;
}


char nibbleToHex(unsigned char in)
{
   in &= 0xF;//keep only the low nibble so the result is always defined
   if(in <= 9)
      return '0' + in;//digits 0-9
   return 'a' + (in - 10);//letters a-f
}

And here are the results:

It shows the number of ticks taken to execute the function; "miss" appears to be the difference in ticks from the original function.

| Function | Original | _GetKeyState SH3 | _GetKeyState SH4 |
|---|---|---|---|
| IsKeyDown | 0xb2 | 0x17 (miss 155) | 0x11 (miss 161) |
| IsKeyDown with 0x1000 waitloop | --- | 0x17b | 0x1cd |
| IsKeyDown ; sh3:0x12D1F ; sh4:0xF1A8 | --- | 0xb2 | 0xb2 |
| IsKeyUp | 0x1a43 | 0x17 (miss 6700) | 0x11 (miss 6706) |
| IsKeyUp ; sh3:0x12D1F ; sh4:0xF1A8 | --- | 0x1a42 | 0x1a4a |
| KeyDown | 0x9 | 0x11 | 0xd |

  • A 0x1000-iteration loop takes 0x164 ticks on SH3 => 1024/89 loops per tick
  • A 0x1000-iteration loop takes 0x1BC ticks on SH4 => 1024/111 loops per tick
  • KeyDown is faster than _GetKeyState
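
The wait-loop counts that appear in the table (0x12D1F and 0xF1A8) seem to follow from these rates: the number of missing ticks is multiplied by the loops-per-tick ratio. A minimal sketch of that arithmetic, using a hypothetical helper name `wait_loops` that is not part of the tool:

```c
/* Hypothetical helper: number of wait-loop iterations needed to spend
   `missing_ticks`, given that 0x1000 iterations take `ticks_per_0x1000`
   ticks on the target CPU. Integer division rounds down. */
unsigned int wait_loops(unsigned int missing_ticks, unsigned int ticks_per_0x1000)
{
    return (missing_ticks * 0x1000u) / ticks_per_0x1000;
}
```

With the IsKeyUp figures, `wait_loops(6700, 0x164)` gives 0x12D1F for SH3 and `wait_loops(6706, 0x1BC)` gives 0xF1A8 for SH4.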

IsKeyDown function from FxLib

The IsKeyDown function reads the keyboard input state directly, without using a syscall. But with Power Graphic 2, the keyboard connections to the CPU changed, and it is no longer possible to read the keyboard the same way.

As Casimo showed us with his solution in C, Power Graphic 2 calculators have a specific keyboard register that we can read easily. The solution used by this tool is to look up the binary code of the IsKeyDown function and replace it with new code that jumps to _GetKeyState.

First, we look for the IsKeyDown function:

4f227ffc63f3bee52f367f0463f0633c43118b0384f1600c401189037f044f26000be000bf5664f37f044f26000b0009

Then we replace its content starting at offset 0xc of the function. (The first bytes replaced are 0x63f0).

_IsKeyDownReplacement
    ; before this injection, there is a call to _KeyCodeConvert that puts a two-byte array on the stack, containing the column and the row respectively
    ; So we pass it in registers to the _GetKeyState function that we call below
    mov r15,r4 ; first param of the _GetKeyState function
    mov #1,r5 ; set slowmode of the _GetKeyState function
    ; Jump to _GetKeyState
    mov.l GetKeyStateAddress,r0
    jsr @r0 ;call _GetKeyState
    nop
    nop
    ; after _GetKeyState
    ; return to the calling position
    add #4,r15
    lds.l @r15+,pr
    rts
    nop
GetKeyStateAddress:
    .data.l h'xxxxxxxx ; address of the compatible _GetKeyState function
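
The lookup step described above can be sketched as a plain byte search over the add-in file, followed by computing the patch position 0xc bytes into the match. The function names `find_signature` and `patch_offset` are hypothetical and only illustrate the idea; the tool's real implementation may differ.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of `pat` in `buf`,
   or -1 if the signature is not present. */
long find_signature(const unsigned char *buf, size_t len,
                    const unsigned char *pat, size_t pat_len)
{
    size_t i;
    if (pat_len == 0 || pat_len > len)
        return -1;
    for (i = 0; i + pat_len <= len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    }
    return -1;
}

/* Offset where the replacement code is written: 0xc bytes after the
   start of the matched function. Propagates -1 when nothing matched. */
long patch_offset(long function_offset)
{
    return function_offset < 0 ? -1 : function_offset + 0xc;
}
```

Here `pat` would hold the bytes of the hexadecimal signature shown above (4f 22 7f fc ...), and the replacement code would be copied to the returned patch offset.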

IsKeyUp function from FxLib

This function doesn't work like IsKeyDown at all. It uses syscall 0x24C, called "Chattering" by the fxLib.

The prototype seems to be:

int Chattering(unsigned char* coord);

with the key coordinates in an array of two cells: the first is the column, the second is the row of the key. This syscall returns 1 if the key is pressed.

Syscalls are implemented in the OS code, so when there is an OS update, they are generally updated to work on the new hardware (this is the case for a lot of useful syscalls). But this one does not seem to work on SH4 calculators: it always returns 0. So maybe Casio chose to disable this syscall.

Anyway, no matter the original implementation, we will also use _GetKeyState and just put a "not" at the end.

_IsKeyUpReplacement
    ; before this injection, there is a call to _KeyCodeConvert that puts a two-byte array on the stack, containing the column and the row respectively
    ; So we pass it in registers to the _GetKeyState function that we call below
    mov r15,r4 ; first param of the _GetKeyState function
    mov #2,r5 ; set slowmode of the _GetKeyState function
    ; Jump to _GetKeyState
    mov.l GetKeyStateAddress2,r0
    jsr @r0 ;call _GetKeyState
    nop
    nop
    ; after _GetKeyState
    ; Negate _GetKeyState output
    not r0,r0
    and #1,r0
    ; return to the calling position
    add #4,r15
    lds.l @r15+,pr
    rts
    nop
GetKeyStateAddress2:
    .data.l h'xxxxxxxx ; address of the compatible _GetKeyState function
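
The `not` instruction inverts all 32 bits of r0, which is why it is followed by `and #1`: without the mask, a result of 0 would become 0xFFFFFFFF instead of 1. A small C model of those two instructions, assuming _GetKeyState returns 0 or 1 (the function name `negate_key_state` is hypothetical):

```c
/* C equivalent of "not r0,r0" followed by "and #1,r0":
   invert every bit, then keep only bit 0. */
int negate_key_state(int pressed)
{
    return ~pressed & 1;
}
```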

KeyDown function

This is the same problem as with the IsKeyDown function: the I/O registers changed, so we cannot read the keyboard input the same way on the SH4 CPU. Identifying this function was a little harder, because it is not precompiled like the FxLib. I found 2 binary implementations of this function in asm code:

  • when the first line offset is equal to 0 modulo 4
  • when it's equal to 2 modulo 4.

Some asm code only works when it sits at an offset equal to 0 modulo 4, like

mov.l @(h'4,pc),r0

because it can read a longword (4 bytes) only from an offset equal to 0 modulo 4, and the displacement given as parameter must be divisible by 4. That explains the difference between the two implementations.
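
As an illustration of that alignment rule, here is a sketch of how the address read by a PC-relative `mov.l` is computed on SH CPUs: the PC value used is the instruction address plus 4, its two low bits are cleared, and the byte displacement is added. The helper name `pc_relative_target` is hypothetical.

```c
/* Address read by "mov.l @(disp,pc),Rn" located at insn_addr.
   disp is the byte displacement as written in the assembly source
   and must be a multiple of 4. */
unsigned long pc_relative_target(unsigned long insn_addr, unsigned long disp)
{
    unsigned long pc = insn_addr + 4; /* PC as seen by the instruction */
    return (pc & ~3UL) + disp;        /* low two bits of PC are cleared */
}
```

With a displacement of 4, an instruction at 0x100 and one at 0x102 both read the longword at 0x108. The same source line therefore lands at a different distance from its data depending on where the function starts, so the compiler has to adjust the displacement or the padding.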

So we replace the KeyDown function with this code

_KeyDownReplacement ; put this at the beginning of the KeyDown function. (the first four bytes replaced: 2FE6634C)
   ;before : keycode in r4 ; keycode=(col<<4) + row
   sts.l pr,@-r15
   mov.l r1,@-r15
   mov.l r5,@-r15
   ; add #-2,r15 ;r15 must always contain a number divisible by 4 (because a 4-byte longword can only be pushed on the stack at an address that is a multiple of 4)
   ;get the col
   mov #-4,r0
   mov r4,r1
   shld r0,r1
   ;get the row
   mov r4,r0
   and #h'f,r0
   ; mov.b r0,@-r15
   ; mov.b r1,@-r15
   ;preparation of the array content
   shll8 r1
   add r0,r1
   shll16 r1
   mov.l r1,@-r15
   ;prepare _GetKeyState call
   mov r15,r4 ; get array address
   mov #3,r5 ; set slowmode of _GetKeyState function
   mov.l GetKeyStateAddress3,r0
   jsr @r0 ;call _GetKeyState
   nop
   ;after _GetKeyState
   add #4,r15
   mov.l @r15+,r5
   mov.l @r15+,r1
   lds.l @r15+,pr
   rts
   nop
GetKeyStateAddress3:
    .data.l h'xxxxxxxx ; address of the compatible _GetKeyState function
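
The register arithmetic in the middle of this routine builds the two-byte {col, row} array inside a single longword before pushing it. A C sketch of the same computation, with a hypothetical function name `pack_key_coords`:

```c
/* Mirrors the register arithmetic of _KeyDownReplacement:
   split the keycode, then place col and row in the two upper bytes. */
unsigned long pack_key_coords(unsigned int keycode)
{
    unsigned long col = keycode >> 4;  /* shld with r0 = -4: logical shift right by 4 */
    unsigned long row = keycode & 0xF; /* and #h'f,r0 */
    return ((col << 8) + row) << 16;   /* shll8, add, shll16 */
}
```

For a keycode of 0x31 the result is 0x03010000. Assuming the CPU runs big-endian, as the array layout in the comments suggests, the bytes stored at r15 are then 03 01 00 00, so r15 points to an array whose first cell is the column and whose second cell is the row.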

monochromeLib and syscall call method

MonochromeLib calls OS syscalls with this C code

static int SysCallCode[] = {0xD201422B,0x60F20000,0x80010070};
static int (*SysCall)( int R4, int R5, int R6, int R7, int FNo ) = (void*)&SysCallCode;
char* ML_vram_adress()
{
    return (char*)((*SysCall)(0, 0, 0, 0, 309));
}

The array SysCallCode is written in memory at an address above 0x08100000. Once the array has been written there, the program jumps to it. The content of this array is binary code that runs a syscall. This means the array needs to be stored in memory that can be read, written and executed. This was possible on SH3, but on the SH4 CPU this memory cannot be executed.
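
To make the "binary code" concrete, the three words of SysCallCode can be split into 16-bit SH opcodes. The decoding in the comments below follows the standard SH instruction encodings; the names `syscall_stub` and `opcode_rn` are hypothetical and used only for this illustration.

```c
/* The three 32-bit words of SysCallCode, split into 16-bit opcodes. */
static const unsigned short syscall_stub[] = {
    0xD201,        /* mov.l @(4,pc),r2 : load the literal 0x80010070 below */
    0x422B,        /* jmp @r2          : jump to the syscall entry point */
    0x60F2,        /* mov.l @r15,r0    : delay slot, r0 = syscall number from the stack */
    0x0000,        /* never executed, keeps the literal 4-byte aligned */
    0x8001, 0x0070 /* literal 0x80010070 */
};

/* Register number stored in bits 8-11 of an opcode (the "n" field). */
unsigned int opcode_rn(unsigned short op)
{
    return (op >> 8) & 0xF;
}
```

These are the same three instructions as the _SyscallFix routine shown further down, which is why that routine can stand in for the array.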

An easy solution is to avoid writing this array to that memory by making it "const". As a const, it stays in the program's "instruction list" and is not copied anywhere. And the program's instruction list is obviously still readable and executable.

static const int SysCallCode[] = {0xD201422B,0x60F20000,0x80010070};
static int (*SysCall)( int R4, int R5, int R6, int R7, int FNo ) = (void*)&SysCallCode;
char* ML_vram_adress()
{
    return (char*)((*SysCall)(0, 0, 0, 0, 309));
}

The code above is not a solution for us because we cannot edit the C without the sources. Another problem is that there are many binary implementations of this call, because it depends on the parameters. And it is too small to look up like we do for IsKeyDown.

Here is an assembly listing that can be generated from the C code

mov.l @(H'114,pc),r3 ; Gets the address where the address of the array is written, here 0x08100014
mov.l @(H'10c,pc),r2 ; Gets the syscall number from the parameter, here 0x135
mov.l @r3,r0 ; Gets the address of the array, here 0x08100008
jsr @r0 ; Jumps to the array
mov.l r2,@-r15 ; Executed just before the jump (delay slot): pushes the syscall number on the stack

So to solve the problem, I put this code at the end of the file

_SyscallFix
   mov.l #h'80010070,r2 ; the syscall table (where we jump to execute a syscall)
   jmp @r2 ; Jump to the syscall table
   mov.l @r15,r0 ; Executed just before the jump (delay slot): loads the value on the stack, the syscall number, into r0

And I edit the original code a little:

mov.l @(H'114,pc),r3 ; I change the pointed value to be the address of my function _SyscallFix (added at the end of the file)
mov.l @(H'10c,pc),r2
mov r3,r0 ; changed to copy the address loaded by the first line into r0
jsr @r0 ; Jump to my added code
mov.l r2,@-r15

And this solution works well.

The hardest part is finding the original implementation, because it changes every time. All of these lines are present every time, but they are sometimes separated by other instructions.
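
One way to search for such a sequence is to match the opcodes in order while tolerating a few unrelated instructions between them. The sketch below is an assumption about how this could be done, not the tool's actual code. The opcode values follow the standard SH encodings for the registers used in the listing above, and other binaries may use different registers. The names `op_pattern`, `syscall_call` and `find_with_gaps` are hypothetical.

```c
#include <stddef.h>

struct op_pattern { unsigned short value, mask; };

/* Patterns for the call sequence shown above (registers r3, r2, r0). */
static const struct op_pattern syscall_call[] = {
    { 0xD300, 0xFF00 }, /* mov.l @(disp,pc),r3 : any displacement */
    { 0xD200, 0xFF00 }, /* mov.l @(disp,pc),r2 : any displacement */
    { 0x6032, 0xFFFF }, /* mov.l @r3,r0 */
    { 0x400B, 0xFFFF }, /* jsr @r0 */
    { 0x2F26, 0xFFFF }  /* mov.l r2,@-r15 */
};

/* Looks for the patterns in order, allowing at most max_gap unrelated
   opcodes between two consecutive matches. Greedy: each pattern is bound
   to its first occurrence. Returns the index of the first matched opcode,
   or -1 if the sequence is not found. */
long find_with_gaps(const unsigned short *code, size_t count,
                    const struct op_pattern *pat, size_t pat_count,
                    size_t max_gap)
{
    size_t start;
    if (pat_count == 0)
        return -1;
    for (start = 0; start < count; start++) {
        size_t pos = start, matched = 0, gap = 0;
        if ((code[start] & pat[0].mask) != pat[0].value)
            continue;
        while (pos < count && matched < pat_count && gap <= max_gap) {
            if ((code[pos] & pat[matched].mask) == pat[matched].value) {
                matched++;
                gap = 0;
            } else {
                gap++;
            }
            pos++;
        }
        if (matched == pat_count)
            return (long)start;
    }
    return -1;
}
```

The opcodes in `code` are expected to be already read from the file as 16-bit values in the CPU's byte order.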