Reverse engineering an arcade game with Ghidra
A guide to using Ghidra to reverse engineer arcade games
About Ghidra #
- Released by the NSA in March 2019, with the source code published as Open Source in April 2019
- Written in Java, so it runs anywhere a JVM does
- Not on par with IDA Pro, but supports many CPU architectures
- Including m68k and Z80
- Very extensible, using Java or Jython
Prerequisites #
- Ghidra
- MAME
- A basic understanding of low-level languages
ROMs #
Assemble the program ROMs into a single binary file:
Hi/Lo bytes #
If your ROMs are 8 bits wide and your CPU has a 16-bit data bus, you need to interleave the even ROMs with the odd ones. If your ROMs are the same width as the host CPU's data bus, you can concatenate them, but you may need to byte-swap them, depending on how the ROMs are wired to the CPU.
Show cps1.c driver code example.
Link to program to assemble these: [https://gist.github.com/sf2platinum/19adb572afe948c3e51f24727dc44a38]
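As a rough illustration of the interleaving and byte-swapping described above, here is a minimal Python sketch. It is not the linked program; the function names `interleave` and `byte_swap16` are made up for this example. Which dump supplies the even addresses and which the odd ones is an assumption you need to confirm from the load offsets in the MAME driver for your game.

```python
def interleave(even: bytes, odd: bytes) -> bytes:
    """Merge two 8-bit ROM dumps into one 16-bit-wide image.

    `even` supplies the bytes at even addresses (the high byte of each
    big-endian 68000 word); `odd` supplies the bytes at odd addresses.
    Which file is which is an assumption to check against the driver.
    """
    if len(even) != len(odd):
        raise ValueError("ROM halves must be the same size")
    out = bytearray(len(even) * 2)
    out[0::2] = even  # even addresses: 0, 2, 4, ...
    out[1::2] = odd   # odd addresses: 1, 3, 5, ...
    return bytes(out)


def byte_swap16(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word in a 16-bit-wide dump."""
    if len(data) % 2:
        raise ValueError("length must be a multiple of 2")
    out = bytearray(len(data))
    out[0::2] = data[1::2]
    out[1::2] = data[0::2]
    return bytes(out)
```

For a game with several ROM pairs, you would interleave each pair and then concatenate the results in address order before importing the file into Ghidra.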
CPS2 Encryption #
Makes things very tricky:
- only the program opcodes are encrypted, not the data
- no straightforward way to tell one from the other
- try to find a 'clean' dump of your game
Process #
Build a memory map #
Look at the cps1.c driver entry for your game to figure out how the ROMs are mapped, and where the I/O addresses of the custom chips are.
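It can help to write the memory map down as data before entering it into Ghidra. The sketch below is one way to do that in Python. The region names and address ranges are assumptions recalled from MAME's CPS1 address map and are incomplete; verify every range against the driver source for your game before relying on it. `REGIONS` and `region_for` are hypothetical names for this example.

```python
# (start, end inclusive, name). ASSUMED values, to be checked against
# the address map in the MAME driver for your particular game.
REGIONS = [
    (0x000000, 0x3FFFFF, "rom"),         # program ROM
    (0x800100, 0x80013F, "cps_a_regs"),  # custom chip registers
    (0x800140, 0x80017F, "cps_b_regs"),  # custom chip registers
    (0x900000, 0x92FFFF, "gfxram"),      # graphics RAM
    (0xFF0000, 0xFFFFFF, "mainram"),     # work RAM
]


def region_for(addr):
    """Return the name of the region containing addr, or None if unmapped."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return None
```

Calling `region_for` on an address that appears in the disassembly tells you whether it falls in ROM, RAM, a register block, or nowhere in the map as you currently understand it.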
Code ROMs #
- main CPU (we'll concentrate on this one first)
- sound CPU
Set up a new Ghidra project #
Import the main code ROM #
Import the assembled ROM
Set the type to MC68000
Open 'Options', and give it a block name, otherwise it will default to 'ram'
Explain that code doesn't always get decoded correctly straight away; we're going to have to help the disassembler by giving it context.
Explain the 68k vector table, and how the CPU uses it to find the address to jump to when it starts up
Demo setting a location to a pointer with 'p' / Data -> Pointer, convert several addresses into pointers to reveal the Vector Table
Some games set their stack pointer up here; on this game it's done manually later. The address 0x40e is loaded into the Program Counter. The CPU jumps there and begins executing instructions.
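To check the reset vectors outside Ghidra, you can read them straight from the assembled ROM. On the 68000, the first two big-endian 32-bit values in the vector table are the initial supervisor stack pointer and the initial program counter. The sketch below assumes the ROM image is mapped at address 0 and has already been interleaved; `read_reset_vectors` is a hypothetical helper name.

```python
import struct


def read_reset_vectors(rom: bytes):
    """Return (initial_ssp, initial_pc) from the start of a 68000 image.

    Assumes `rom` is the assembled program image mapped at address 0,
    so vector 0 (initial SSP) is at offset 0 and vector 1 (initial PC)
    is at offset 4, both stored as big-endian 32-bit values.
    """
    if len(rom) < 8:
        raise ValueError("image too short to contain the reset vectors")
    initial_ssp, initial_pc = struct.unpack_from(">II", rom, 0)
    return initial_ssp, initial_pc
```

For the game in this guide, the second value should come back as 0x40e. The first value may not be a meaningful stack address, since this game sets its stack pointer up later in code.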
Demo manually disassembling a bunch of instructions with 'd' / Disassemble
Explain labels and x-refs, and how some of these will be red herrings due to the disassembler not having enough context.
Assign a label for the first instruction: 'initial_pc'
Explain the red references:
- the disassembler doesn't know what these mean yet
- they're red because they represent addresses that aren't mapped yet
Demo the memory map dialog
Demo clearing erroneously decoded code with 'c' / Clear Code Bytes