All “managed FLASH” devices, such as SD, microSD, and SSD, contain an embedded controller to assist with the complex tasks necessary to create an abstraction of reliable, contiguous storage out of FLASH silicon that is fundamentally unreliable and unpredictably fragmented. This controller is an attack surface of interest. First, the ability to modify the block allocation and erasure algorithms introduces the opportunity to perform various MITM attacks in a virtually undetectable fashion. Second, the controller itself is typically powerful, with performance around 50MIPS, yet with a cost of mere pennies, making it an interesting and possibly useful development target for other non-storage related purposes. Finally, understanding the inner workings of the controller enables opportunities for data recovery in cards that are thought to have been erased, or have been partially damaged.
This talk demonstrates a method for reverse engineering and loading code into the microcontroller within an SD memory card.
TECHNICAL APPROACH
Publicly available documentation on SD card controllers is scarce. However, based upon tear-down and decap analysis as well as a survey of the publicly available product briefs, most controllers are believed to be either an enhanced 8051, or an ARM derivative.
A further challenge to overcome is the fact that SD card manufacturers typically reserve the right to change the controller IC within a card without updating the external markings to reflect the change. This policy favors the SD card manufacturers, as it allows them to swap out existing controllers for lower-cost devices as new controllers are introduced. However, it is problematic for users as it means that two otherwise identical looking cards can have different performance and/or bugs with which to contend.
To kick off the effort, a survey of available cards was made at an SD card gray market in Shenzhen, China. Each card was dissected and visually inspected for cues, such as the layout of the traces going to the controller glob-top, that would indicate the type of controller within. About a dozen different controller types were identified, of which one was picked for further investigation due to its use of SLC FLASH memory. SLC is a good starting point for reverse engineering because no storage-level scrambling is required to prevent the read and write-disturb issues typical of MLC and TLC FLASH.
A simple binary dump of the FLASH memory within the card revealed structure within the first erase block consistent with what we might expect for code storage. Since FLASH memory is inherently unreliable, four CRC + ECC protected copies of code are located within the first sector. This crude duplication scheme allows the card to boot even if bit errors creep into the code storage sector. We also noted the existence of the string “BUILDWIN” within the code storage sector, which indicates that the controller is likely from a series made by a company called Appotech. Product briefs from the Appotech/Buildwin websites indicated that the architecture of the code is likely an 8051-derivative, and the model of the controller is probably an AX211.
At this point, our effort to reverse engineer the controller split into two paths. One path was static analysis, where extracted binaries and manufacturing-related tools were analyzed with IDA to determine key entry points, storage locations, and most importantly a method for injecting code into the card via the SD interface. The other path was dynamic analysis, where the signals going to the SD card bus and to the NAND FLASH were instrumented with logging and stimulus facilities, and the controller's operation could be observed with exquisite resolution, enabling a broad class of fuzzing and other brute-force analysis attacks, as well as the rapid confirmation or rejection of hypotheses generated by static analysis. Dynamic analysis was key in determining features such as the location of the GPIO control registers and the function and format of otherwise undocumented extended instruction opcodes.
The static analysis path was assisted by the availability of official firmware burning routines, scavenged from Chinese language file-sharing websites. These are tools normally used during the production of SD cards, but made available on the gray market to enable (and correct) card capacity expansion fraud. Typically, these tools are used to flash an incorrect version of the firmware onto the SD cards, which would identify the card as having a much higher capacity than the physically available storage. This would allow unscrupulous dealers to sell, for example, aging 128MiB silicon as devices identifying themselves with 2GiB capacity. These bootleg tools would come with a collection of firmware blobs for loading onto the card, as well as a routine that communicates to a proprietary USB-based burning device. We did not have access to the burner, but static analysis of the communication protocol via code reverse engineering revealed that firmware loading is initiated through the application of a specific “secret knock” sequence during card boot.
The dynamic analysis path was implemented using an embedded platform of our making, known publicly as Novena. It is a quad-core ARM CPU running Linux mated to a Spartan 6 FPGA with a high speed expansion port. The FPGA also has a 256 MiB DDR3 buffer slaved off of it. This buffer + FPGA was used to implement a combination logic analyzer + ROM emulator used to wrap the controller on the SD card. The ROM emulator presents a virtual, dual-ported NAND FLASH device to the SD card controller. The dual-port design allows us to modify and read the contents of the NAND FLASH on the fly, thereby allowing us to stimulate the controller IC with various fault conditions or code loops and measure its effect on the FLASH device. This tight coupling enabled us to rapidly discover, for example, which SFRs (Special Function Registers) are responsible for reading or writing to the Flash controller hardware. Furthermore, each port of the SD card controller is connected to a virtual logic analyzer within the FPGA that can store up to a 16MiB trace of transactions going to and from the controller from both the SD and FLASH buses. This very deep buffer length allowed us to observe behavior of the device from power-on to full operation, as well as observe stimulus-response loops to compound SD bus commands.
SD INTERFACE OVERVIEW
Before diving into the details of our approach, we review a few important aspects of the SD protocol. When an SD card first boots, it is running in SD/MMC mode. This is a 9-wire protocol, using the following pinout (note: pin 9 is located below and to the side of pin 1):
 12345678
9|||||||\- DAT1
|||||||\-- DAT0
||||||\--- GND
|||||\---- Clock
||||\----- VDD +3.3V
|||\------ GND
||\------- Command
|\-------- DAT3
\--------- DAT2
SD commands consist of a start bit of 0, a transmission bit of 1 (marking a host-to-card transfer), a 6-bit command number (referred to as CMDn, where n is between 0 and 63), four 8-bit arguments, a CRC7 checksum, and a stop bit of 1. This yields a total of 48 bits (or 6 bytes) per command.
While the card is processing the request, it keeps the Command line high. It begins its response with a start bit, bringing the Command line low to indicate "0". This is followed by 39 bits of data (a transmission bit of 0, a 6-bit command index, and 32 bits of payload), 7 bits of CRC7, and a stop bit of "1", for a total of 48 bits.
Bit values are sampled when the clock line is high, and may change when the clock line is low. Each byte is sent most significant bit first. For example, hex value 0x80 would be written out as 0b10000000. CMD0 must be sent before any other commands are processed. To send such a command, transmit the following six bytes.
{0x40 0x00 0x00 0x00 0x00 0x95}
   |    |    |    |    |    \----- (CRC7 of {0x40, 0x00, 0x00, 0x00, 0x00}) << 1 | 1
   |    \----\----\----\---------- Four arguments of "0"
   \------------------------------ Start bit of 0, transmission bit of 1, CMD0
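The six-byte frame above can be reproduced with a few lines of host-side code. The following is a minimal sketch, not the code used in this work; the function names crc7 and build_command are our own for illustration. It uses the standard SD CRC7 polynomial (x^7 + x^3 + 1) and packs the start bit, transmission bit, and command number into the first byte.

```python
def crc7(data: bytes) -> int:
    """CRC7 over `data`, polynomial x^7 + x^3 + 1, initial value 0."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):          # most significant bit first
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if top ^ ((byte >> bit) & 1):
                crc ^= 0x09
    return crc


def build_command(index: int, argument: int) -> bytes:
    """Build a 48-bit SD command frame for CMD<index> with a 32-bit argument."""
    if not 0 <= index <= 63:
        raise ValueError("command index must be between 0 and 63")
    # 0x40 supplies the start bit (0) and the transmission bit (1).
    body = bytes([0x40 | index]) + argument.to_bytes(4, "big")
    # The CRC7 covers the first five bytes; the stop bit (1) fills bit 0.
    return body + bytes([(crc7(body) << 1) | 1])


if __name__ == "__main__":
    print(build_command(0, 0).hex(" "))  # 40 00 00 00 00 95
```

The same helper builds any other command frame by changing the index and argument.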
After sending the command, continue to wiggle the clock, and look for the 48-bit response. Data is transmitted either on the DAT0 line (for 1-bit mode), or striped in parallel across DAT0-DAT3. As with the Command line, bit 7 is sent first, and bit 0 is sent last. In four-bit mode, DAT3 will send bit 7, DAT2 will send bit 6, DAT1 will send bit 5, and DAT0 will send bit 4. Bits 3-0 will be sent using DAT3-DAT0 on the next clock cycle.
Data is transmitted with a start bit of 0, followed by the data (almost always 512 bytes), followed by a CRC16 of the data, and then a stop bit of 1. In the case of 4-bit mode, all four DAT lines send a start bit of 0, and each individual line sends its own CRC16 of the transmitted data. All four lines send a stop bit of 1, as well.
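To make the four-bit striping concrete, the sketch below splits a byte string into the four per-line bit streams and computes a CRC16 over a bit stream. It is illustrative only: the function names are ours, and we assume the CRC16 is the CCITT polynomial (x^16 + x^12 + x^5 + 1) with an initial value of 0, which is what the SD specification calls for.

```python
def stripe_4bit(data: bytes) -> dict:
    """Split bytes into per-line bit lists; two clock cycles per byte."""
    lines = {"DAT3": [], "DAT2": [], "DAT1": [], "DAT0": []}
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):   # bits 7-4 first, then 3-0
            lines["DAT3"].append((nibble >> 3) & 1)
            lines["DAT2"].append((nibble >> 2) & 1)
            lines["DAT1"].append((nibble >> 1) & 1)
            lines["DAT0"].append(nibble & 1)
    return lines


def bytes_to_bits(data: bytes) -> list:
    """Flatten bytes into a bit list, most significant bit first (1-bit mode)."""
    return [(byte >> bit) & 1 for byte in data for bit in range(7, -1, -1)]


def crc16_bits(bits) -> int:
    """CRC16-CCITT (polynomial 0x1021, initial value 0) over a bit sequence."""
    crc = 0
    for bit in bits:
        top = (crc >> 15) & 1
        crc = (crc << 1) & 0xFFFF
        if top ^ bit:
            crc ^= 0x1021
    return crc


if __name__ == "__main__":
    block = bytes(range(256)) * 2                 # one 512-byte data block
    for name, bits in stripe_4bit(block).items():
        print(name, hex(crc16_bits(bits)))        # each line has its own CRC16
```

In 1-bit mode the same crc16_bits routine is applied once to bytes_to_bits(block); in 4-bit mode it is applied separately to each of the four lines.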
CONTROLLER-SPECIFIC NOTES
The AX211 is based on the 8051 architecture, which is an eight-bit processor core that is very common among embedded parts. The original 8051 ran at around 1 MIPS, but improvements in architecture and clock speed allow the 8051 present in the AX211 to run at 50 MIPS. The 8051 instruction set had a single undefined opcode of 0xa5, which is used by Appotech to implement 32-bit instruction set extensions.
The hardware SD protocol engine only handles transferring bytes (OSI layers 1 & 2). The higher-level details of the SD protocol set are implemented entirely in software, making for a compelling attack surface. When the processor first starts up it attempts to load a valid flash image off of the NAND. While it is in this state, it can respond to a limited set of SD commands, including "RESET", "Set CRC7 required", and "Set voltage". It will not respond to the "Go Ready" command until it has loaded its firmware. There are four copies of the AX211 firmware, located at NAND offset 0x0000, 0x10000, 0x20000, and 0x30000. Each copy is protected with its own error correcting code. If all four copies are sufficiently damaged, or if the NAND is missing or blank, then the card will never fully boot, and will never be able to go ready.
FACTORY PROGRAMMER
As noted previously, we had found a copy of the Windows-based AX211 factory programming tool on Chinese-language file sharing websites. The full AX211 factory programming suite is a combination of this software and a proprietary (and unknown to us) hardware tool. We suspect the hardware is based on the AppoTech AX2005 processor, which also uses an 8051-compatible instruction set. The device connects to the main PC through USB, and has the USB ID string of "appotechcksd". This allows the software to identify the programmer regardless of its product/vendor ID. We believe the device may have a small display, or else it prints debugging information out a serial port. The programming software is almost entirely in Chinese. It does not contain a codepage identifier, so a system must be set to run non-Unicode programs as "Chinese (PRC)", which selects an encoding of GB-18030. If you do not do this, the software will display gibberish in place of Chinese text.
The programmer control software appears to be fully automated. Without the programmer hardware, we can only configure the software and see where, presumably, individual hardware programmers would appear in the user interface. Therefore, we must resort to disassembling the binary to determine how it works. Code flow analysis indicates that when the software initially detects the attachment of factory programming hardware via USB, it opens the device and sends a binary file called "2005FM.BIN". 2005FM.BIN is a raw opcode stream that is loaded into the programmer hardware's 8051 (not the SD target) at address 0. The program begins running from address 0, and begins doing some initialization. It then displays the string "RUN" through the putative logging interface on the programming hardware. After this, it awaits further control instructions from the host.
The microcontroller used in the programmer hardware does not contain an SD/MMC host controller. This is fortunate for us, as it means the SD programming interface is implemented entirely using GPIOs. This makes it easy to trace all signal changes through static code analysis.
When programming begins, the programmer places the SD card into a special mode where it will accept and execute programs. It does this by sending SD CMD0 (card reset) followed within 20 msec by CMD63 with the arguments 'A' 'P' 'P' 'O' and the appropriate crc7 checksum. The card then responds with the following sequence:
{0x3f 0x00 0x9d 0x20 0x0b 0x35}
   |    |    |    |    |    \-- CRC7 checksum, plus stop bit
   |    \----\----\----\------- Unknown values
   \--------------------------- The command we sent (CMD63)
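The knock and its response can be modeled on the host as follows. This is a sketch under stated assumptions rather than the factory tool's actual code: the names build_knock, parse_response, and KNOCK_WINDOW_S are ours, and the transport that actually clocks the frames out to the card (and enforces the 20 msec window) is left out.

```python
KNOCK_WINDOW_S = 0.020  # CMD63 must follow CMD0 within 20 msec


def crc7(data: bytes) -> int:
    """CRC7 over `data`, polynomial x^7 + x^3 + 1, initial value 0."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if top ^ ((byte >> bit) & 1):
                crc ^= 0x09
    return crc


def frame(index: int, payload: bytes) -> bytes:
    """48-bit command frame: header byte, four argument bytes, CRC7 + stop bit."""
    if len(payload) != 4:
        raise ValueError("payload must be exactly four bytes")
    body = bytes([0x40 | (index & 0x3F)]) + payload
    return body + bytes([(crc7(body) << 1) | 1])


def build_knock() -> list:
    """CMD0 (card reset), then CMD63 carrying the ASCII arguments 'APPO'."""
    return [frame(0, bytes(4)), frame(63, b"APPO")]


def parse_response(resp: bytes) -> dict:
    """Split a 48-bit response into command index, payload, and CRC status."""
    if len(resp) != 6:
        raise ValueError("a response is exactly six bytes")
    return {
        "index": resp[0] & 0x3F,
        "payload": resp[1:5],
        "crc_ok": ((crc7(resp[:5]) << 1) | 1) == resp[5],
    }


if __name__ == "__main__":
    for f in build_knock():
        print(f.hex(" "))
    print(parse_response(bytes([0x3F, 0x00, 0x9D, 0x20, 0x0B, 0x35])))
```

Checking crc_ok on the response is a cheap way to distinguish a genuine factory-mode acknowledgement from bus noise.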
The program to be executed on the SD card is then loaded to offset 0x2900, and will begin executing immediately from within an interrupt context. Therefore, the first thing the routine should do is escape the context by changing the return address, then issuing a "reti" instruction.
When the host software detects a new card to be programmed, it informs the programmer to put the card into this special factory mode, and then begins uploading firmware and communicating with the card. The first program to get loaded is a small piece of test code called TestBoot.bin. The program performs a few sanity checks, then returns pass or fail by sending the result out the Command line. Once the program finishes, control returns to the AX211 ROM.
After TestBoot.bin runs, the programmer loads a file called communication.bin. This program contains a full interrupt vector table, and is therefore able to replace functions such as SD communication and NAND access. It uses this to load a new, expanded set of communication options, including the ability to load and execute another program at memory offset 0x800. Because it has its own interrupt vector table, it loads at offset 0, and takes control of the entire card.
Once the communications program is loaded, the programmer uses it to load more advanced programs that actually deal with the NAND flash component. Programs are loaded at offset 0x800 in order to allow the communications program (and its interrupt vector table) to stay resident.
The first program loaded is FLASH_SCAN.BIN, which is responsible for scanning the flash to determine its properties. This can be useful when operators are not familiar with the type of NAND they're working with. This program is able to inform the host of properties such as the page size, flash size, and ECC data. The host can then use this information to construct a Card ID block.
It's important to note that the host controller can choose to ignore the information coming from the flash, and can program invalid flash sizes. This can occur either deliberately or accidentally. Thus, some "counterfeit" cards may simply be a result of an operator forgetting to change the settings when programming a new batch of NAND.
After the flash has been scanned, a program called FLASH_PRO.BIN is loaded. This is responsible for the actual programming of the NAND with the requisite firmware file. With FLASH_PRO.BIN running, the programmer must feed the card a firmware file. A wide variety of firmwares are bundled with the programming software, and vary between the vendor, page size, block size, number of planes, and the number of chip enables. The host picks an appropriate firmware file to send, and writes it out to the card.
After programming is complete, the programmer resets the card and formats it. Depending on the size of the card, the programmer places either a FAT16 or a FAT32 header on the card, and creates the actual file allocation table. It is interesting to note that there are two DOS MBRs stored in the 2005FM.BIN programmer firmware, complete with an NTLDR boot sector. This gets placed at the head of the card, with the actual file allocation table following shortly after.
The upshot of reversing the factory programmer binaries is the development of our own routines, which are capable of knocking the AX211 and loading our own code onto the device.
AX211 FIRMWARE REVERSING
The contents of the programs run on the AX211 have also been analyzed. In addition to confirming the host-side initialization protocols documented above, the firmware reversing effort has revealed more details on the allocation of instruction set enhancements and the Function Specific Registers (FSRs), referred to elsewhere in this document as Special Function Registers (SFRs). FSRs are memory-mapped registers that are used to control and configure the state of the hardware. For example, the registers required to turn NAND I/Os into GPIOs and toggle their values are contained in the FSR region; therefore discovering the FSR map is an important component of achieving maximum utility from the SD card controller.
Opcode extensions on the 8051 are accomplished by using the escape opcode 0xA5. This is the only opcode value in the 8051 that does not have a pre-defined function. Typically, the escape opcode is followed by one or more bytes that specify which escape sequence to use. In our investigation of the firmware, we have only seen one and two-byte escape sequences. Two-byte sequences require the lower nibble of the first escape opcode to be all 1's. We have modified the 8051 processor module in IDA to handle these escape sequences gracefully. Our current theory is that the opcode extensions are used to trigger various NAND interface functions, such as transmitting a command cycle or computing ECC on a loaded page of memory.
The FSR map was determined in part through static binary code analysis in IDA, and in part through a dynamic fuzzing infrastructure. For example, to determine which FSR was used as a GPIO, we wrote a small assembly stub that would set a pseudo-random (based upon a seed value, so we could reproduce experiments later on) selection of four FSR registers within the 128-byte window of valid FSRs to 0xFF, then 0x00, with a few microseconds' delay between. We had connected an oscilloscope to one of the NAND I/Os on the AX211, and monitored for changes in I/O status while running this code. We decided to write four bytes at once under the theory that we had to configure not only a data register, but also direction & configuration registers as well to be outputs for us to measure a change using the oscilloscope. With some luck, we were able to discover very quickly an FSR which, when written to, would cause some of the NAND I/O pins to flip on and off.
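The fuzzing stub described above is small enough to generate on the host. The sketch below is a reconstruction of the idea, not the stub we actually used: it picks four registers from a seeded generator and emits raw 8051 machine code that writes 0xFF and then 0x00 to each. The skip list of core registers (SP, DPL, DPH, IE, ACC, B) and the delay loop count are assumptions added for illustration.

```python
import random

# Assumption: avoid clobbering core 8051 registers so the stub keeps running.
CORE_SFRS = {0x81, 0x82, 0x83, 0xA8, 0xE0, 0xF0}


def pick_sfrs(seed: int, count: int = 4) -> list:
    """Reproducibly choose `count` distinct addresses in the 0x80-0xFF window."""
    rng = random.Random(seed)
    candidates = [a for a in range(0x80, 0x100) if a not in CORE_SFRS]
    return sorted(rng.sample(candidates, count))


def build_stub(sfrs, delay: int = 0x40) -> bytes:
    """Emit 8051 code: write 0xFF to each register, delay, write 0x00, delay."""
    code = bytearray()
    for value in (0xFF, 0x00):
        for addr in sfrs:
            code += bytes([0x75, addr, value])      # mov direct, #imm
        code += bytes([0x7F, delay & 0xFF])         # mov R7, #delay
        code += bytes([0xDF, 0xFE])                 # djnz R7, $ (spin in place)
    code.append(0x22)                               # ret
    return bytes(code)


if __name__ == "__main__":
    chosen = pick_sfrs(seed=1234)
    print([hex(a) for a in chosen])
    print(build_stub(chosen).hex(" "))
```

Because the selection depends only on the seed, any run that produces a visible transition on the oscilloscope can be replayed and then narrowed down by shrinking the candidate set.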
Through this combination of static and dynamic analysis, we have confirmed the function of much of the FSR table, which looks like this:
     |   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   |
     |   8   |   9   |   A   |   B   |   C   |   D   |   E   |   F   |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
80   | SDMOD | SP    | DPL   | DPH   |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
88   | SDOS  | SDI4  | SDI3  | SDI2  | SDI1  | SDCMD | IACK  |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
90   | SDSM  |       |       | SDBL  | SDBH  |       | SDDL  | SDDH  |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
98   |       |       |       |       |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
A0   | NTYPE | NCMD  | NRAML | NRAMH |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
A8   | IE    | NCMD1 | NCMD2 | NADD0 | NADD1 | NADD2 | NADD3 | NADD4 |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
B0   |       | RAND  |       |       |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
B8   | ER8   |       |       |       |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
C0   | ER00  | ER01  | ER02  | ER03  |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
C8   | ER10  | ER11  | ER12  | ER13  |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
D0   |       |       |       |       |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
D8   | ER20  | ER21  | ER22  | ER23  |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
E0   | ACC   |       |       |       |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
E8   |       |       | NFMT  | SDDIR |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
F0   | B     |       | NPRE1 | NPRE2 |       |       | PORT1 |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
F8   | ER30  | ER31  | ER32  | ER33  |       |       |       |       |
-----+-------+-------+-------+-------+-------+-------+-------+-------+
Our presumption is many of the spots reading back as 00 are probably unused FSRs, so we have probably uncovered the function of most of the implemented FSRs.
WORKING IT: A REPL DEBUGGER
In investigating the actual runtime firmware of the AX211 card, it has become apparent that the entire SD engine at the protocol level is implemented in 8051 code. Only the SD physical layer (including both SPI and MMC mode) is handled in hardware. When the host controller sends a command such as CMD0, the card's firmware actually treats the command number as an offset in a jump table. Application commands are handled through a separate jump table. This means that it is a trivial matter to add an additional command. All one needs to do is co-opt one of the unused commands, which currently point to a default handler. There is plenty of space at the end of the firmware to add any additional commands.
With the SD controller's functions now well understood, we were able to implement a REPL-mode debugger firmware for the SD controller:
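The jump-table arrangement can be modeled in a few lines. The sketch below is a host-side analogy in Python, not a transcription of the AX211 firmware; the handler names and return values are hypothetical. It shows why co-opting an unused command is trivial: every unused slot already points at the same default handler, so installing a new command is a single table write.

```python
def default_handler(argument: int):
    """Stand-in for the firmware's catch-all for unused command numbers."""
    return ("illegal command", argument)


def go_idle(argument: int):
    """Stand-in for a real command handler such as CMD0."""
    return ("idle", 0)


class CommandTable:
    """64-entry table indexed directly by the 6-bit command number."""

    def __init__(self):
        self.handlers = [default_handler] * 64

    def install(self, index: int, handler, replace: bool = False):
        in_use = self.handlers[index] is not default_handler
        if in_use and not replace:
            raise ValueError("CMD%d is already implemented" % index)
        self.handlers[index] = handler

    def dispatch(self, index: int, argument: int):
        return self.handlers[index & 0x3F](argument)


if __name__ == "__main__":
    table = CommandTable()
    table.install(0, go_idle)
    # Co-opt an unused slot for a custom (hypothetical) debug command.
    table.install(60, lambda argument: ("debug", argument ^ 0xFFFFFFFF))
    print(table.dispatch(0, 0))
    print(table.dispatch(60, 0))
    print(table.dispatch(42, 7))
```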
root@bunnie-novena:~/ax211-code# ./ax211 -d debugger.bin
FPGA hardware v1.26
Debug mode APPO response [6]: {0x3f 0x00 0xc1 0x04 0x17 0xab}
Result of factory mode: 0
00000000 0f 41 1f 0f 0f 0f ff ff |.A......|
Expected 0x00 0x00, got 0x0f 0x41
Loaded debugger
Locating fixup hooks... Done
AX211> help
List of available commands:
hello Make sure the card is there
peek Read an area of memory
poke Write to an area of memory
jump Jump to an area of memory
dumprom Dump all of ROM to a file
memset Set a range of memory to a single value
null Do nothing and return all zeroes
disasm Disassemble an area of memory
ram Manipulate internal RAM
sfr Manipulate special function registers
nand Operate on the NAND in some fashion
extop Execute an extended opcode on the chip
reset Reset the AX211 card
help Print this help
For more information on a specific command, type 'help [command]'
AX211>
As one can see, the debugger offers generic features such as memory manipulation (peek/poke), more powerful features such as disassembly, and chip-specific features such as NAND manipulation via the internal SFRs.
DEBUGGER IMPLEMENTATION
The debugger relies upon two key mechanisms that were reverse-engineered in the course of this study: (1) the ability to load small stubs of arbitrary code to bootstrap the system and (2) the ability to pass messages back and forth from the AX211 using the SD PHY layer.
The footprint of the native 8051 code for the debugger is limited to 512 bytes; this is the stub size afforded by the mask ROM bootloader on the AX211. Therefore, in order to implement a rich command feature set, the debugger functions are partitioned between a host CPU (the ARM i.MX6) and the target 8051-based CPU. For example, disassembling a particular region of memory breaks down to a script executed on the host side that drives the native AX211 stub code to dump the requested portion of memory to disassemble, followed up with the disassembly algorithm running on the host ARM CPU.
The only way to talk to the AX211 is via the SD interface. Therefore, the interactive REPL environment must run on the host ARM system, which is equipped with the requisite terminal interface features such as an SSH console. The REPL environment then translates the user's requests into bundles of SD commands. We take advantage of the command/response token protocol built into the SD-PHY spec (see page 8 of SD Card Physical Layer Simplified Specification v2.00 for more information). The command/response protocol runs bidirectionally on a single wire, the CMD line of the SD-PHY interface, and is timed by the CLK line. Command tokens are 48 bits long, and contain a CRC along with some static header bits. Response tokens are also 48 bits long, and similarly feature CRC protection along with different static header bits. The SD commands originating from the REPL environment are transmitted to the AX211 via the FPGA built into the host system. The FPGA presents a “gpio-like” API for the SD host PHY: one register for data output, one register for data input, and one register to bit-wise set the data direction.
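A command token can be bit-banged through such a “gpio-like” interface roughly as shown below. This is a sketch only: the bit positions of CLK and CMD, the FakePhy class, and the send_frame function are hypothetical stand-ins for the real FPGA registers, which are not documented here. The fake PHY samples CMD on each rising clock edge, as a card would.

```python
CLK = 1 << 0   # hypothetical bit position of the clock line
CMD = 1 << 1   # hypothetical bit position of the command line


class FakePhy:
    """Stand-in for the FPGA registers; records what a card would sample."""

    def __init__(self):
        self.direction = 0      # 1 = pin driven by the host
        self._out = 0
        self.sampled = []       # CMD level seen at each rising CLK edge

    @property
    def out(self) -> int:
        return self._out

    @out.setter
    def out(self, value: int):
        rising = (value & CLK) and not (self._out & CLK)
        if rising and (self.direction & CMD):
            self.sampled.append(1 if value & CMD else 0)
        self._out = value


def send_frame(phy, frame: bytes):
    """Clock a command token out on CMD, most significant bit first."""
    phy.direction |= CLK | CMD
    for byte in frame:
        for bit in range(7, -1, -1):
            level = CMD if (byte >> bit) & 1 else 0
            phy.out = level            # clock low: data is allowed to change
            phy.out = level | CLK      # clock high: the card samples CMD
    phy.out = CMD                      # leave the clock low and CMD idle-high
    phy.direction &= ~CMD              # release CMD so the card can respond


if __name__ == "__main__":
    phy = FakePhy()
    send_frame(phy, bytes([0x40, 0x00, 0x00, 0x00, 0x00, 0x95]))
    print(len(phy.sampled), "bits clocked out")
```

Reading the response follows the same pattern in reverse: with CMD set as an input, the host keeps toggling CLK and reads the data input register until it sees the start bit.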
The SD commands are received on the AX211 and processed by the hardware SD engine attached to the embedded 8051 CPU. The state machine handles receiving the data and computing/checking the CRC. Once a complete packet is received by the state machine, the 8051 is notified of the packet's arrival via an interrupt.
Therefore, the 512 bytes of native 8051 code for our debugger contain the following routines:
* An entry sequence that resets the execution environment and redirects ongoing code execution into the debug.bin patch
* An interrupt handler for receiving SD command packets
* Code to set the ISR handler to our new interrupt handler
* A jump table to parse incoming SD commands
* Helper routines, such as the response transmission and wait loops
* Implementation of all the basic stub commands:
    * hello - inform the host we're running
    * echo - loop back the request
    * peek - request contents of XRAM
    * poke - set contents of XRAM
    * jump - run code at an address
    * nand - manipulate the NAND hardware registers
    * SFR/IRAM get/set routines - more on this below
    * extended opcode - execute an extended opcode
    * error - return an error code
CROSS-MEMORY AREA ACCESS
The 8051 has two different kinds of memory, each of which is further subdivided:
IRAM - located on-chip. 256 bytes in total. Accessed using "mov" instruction. The IRAM map is as follows:
* IRAM address 0-7 corresponds to CPU registers R0-R7
* IRAM address 8-0x7f are on-chip RAM
* IRAM address 0x80-0xff are Special Function Registers (SFRs). SFRs are locations that are mapped to hardware functions, such as the NAND command engine or GPIO
XRAM - 16 kilobytes of "External" RAM (this memory was a physically separate chip in the original non-embedded 8051 implementation). Accessed by storing an address in DPTR (which is really made up of SFR 0x82 and 0x83, called DPL and DPH respectively) and either loading to the accumulator using "movx A, @DPTR" or storing from the accumulator using "movx @DPTR, A". The XRAM map is as follows:
* 0x0000 - 0x0006 is reserved. Contains 0x51 0x00 0x00 0x00 ...
* 0x0007 - 0x01ff is protected and returns 0xff. The CPU and NAND block can't read this range, but the SD block can.
* 0x0200 - 0x0202 is interrupt vector 0 (SPI)
* 0x0203 - 0x0205 is interrupt vector 1 (other SPI)
* 0x0206 - 0x0208 is interrupt vector 2 (NAND)
* 0x0209 - 0x020b is interrupt vector 3 (unknown)
* 0x020c - 0x2b9f is general-purpose RAM
* Code execution for APPO factory mode begins at offset 0x2900
* 0x2ba0 - 0x2bff contains something interesting, and I'm not sure what
* 0x2c00 - 0x3fff is read-only and contains zeroes
* 0x0000 - 0x3fff is mirrored at 0x4000, 0x8000, and 0xc000
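When driving peek and poke from the host, it helps to fold mirrored addresses back into the canonical 16-kilobyte window and to compute where each interrupt vector lives. The helper below is a small sketch based only on the map above; the function names are ours.

```python
XRAM_SIZE = 0x4000          # 16 kilobytes, mirrored four times in 64 KiB
VECTOR_BASE = 0x0200        # interrupt vector 0 starts here
VECTOR_STRIDE = 3           # each vector slot is three bytes
VECTOR_NAMES = ("SPI", "other SPI", "NAND", "unknown")


def canonical(addr: int) -> int:
    """Fold any 16-bit XRAM address onto its copy in 0x0000-0x3fff."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("XRAM addresses are 16 bits wide")
    return addr % XRAM_SIZE


def aliases(addr: int) -> list:
    """All four addresses at which the same XRAM byte is visible."""
    base = canonical(addr)
    return [base + mirror for mirror in (0x0000, 0x4000, 0x8000, 0xC000)]


def vector_slot(index: int) -> range:
    """Address range of the three-byte slot for interrupt vector `index`."""
    if not 0 <= index < len(VECTOR_NAMES):
        raise ValueError("only vectors 0-3 appear in the map")
    start = VECTOR_BASE + index * VECTOR_STRIDE
    return range(start, start + VECTOR_STRIDE)


if __name__ == "__main__":
    print(hex(canonical(0xA900)))
    print([hex(a) for a in aliases(0x0200)])
    for i, name in enumerate(VECTOR_NAMES):
        slot = vector_slot(i)
        print(name, hex(slot[0]), "-", hex(slot[-1]))
```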
The division of 8051 memory space into IRAM and XRAM presents a unique challenge for debugger implementation. Not being able to index into IRAM means that one cannot dereference a variable pointer into IRAM. In the 8051 ISA, only constant values can be dereferenced into IRAM. This pseudo code draws a loose analogy of the situation in C-like syntax:
iram_value = *iram_pointer; // ERROR: variable values not allowed
iram_value = *(0x42); // ACCEPTABLE: only hard-coded constant values allowed
This limitation of the 8051 architecture presents a challenge in implementing a key feature of the debugger, namely enabling the dynamic exploration of the IRAM and associated SFR space.
Fortunately, on the AX211, the debugger code is loaded into XRAM, which can be indexed via the DPTR register using the MOVX instruction. This allows the debugger to manipulate its own code, thereby allowing us to change, for example, the offset constant to a MOV instruction inside the debugger.
Therefore, in order to implement peek and poke in the IRAM space (and likewise enable convenient exploration of the SFRs), we reserve three-byte slots in debug.bin using a sentinel sequence of invalid opcodes that are unique and searchable.
Once debug.bin is loaded, one of the first tasks the host program does is scan the loaded program for the unique sequences and records their offsets. These sequences are then replaced on-demand with the appropriate instructions to implement the required MOV opcodes to explore arbitrary locations in IRAM.
Below is a concrete example of this technique as implemented in the debugger. The following code snippet is from debugger.asm:
cmd_sfr_get:
; This will get replaced by "mov 0x20, [SFR]" at runtime
.db 0xa5, 0x60, 0x61
mov 0x21, #0
mov 0x22, #0
mov 0x23, #0
sjmp xmit_response
Here, we are looking at the “SFR get” stub. The arguments to return to the host are loaded into IRAM locations 0x20-0x23. For example,
mov 0x21, #0
puts the constant value 0 into IRAM 0x21; so in this stub, locations 0x21-0x23 are set to 0, and 0x20 is intended to have the value of the queried SFR.
A sequence of three bytes, “0xA5 0x60 0x61” is used as a placeholder for a different “mov” instruction that will be slotted in. 0xA5 is the sole invalid opcode in the 8051 instruction set, and therefore it is an “almost safe” opcode (bar collisions with other opcode extensions) to use as the starting marker for a sentinel sequence. The remaining bytes “0x60 0x61” are simply chosen to be unique and non-colliding with other opcode extensions.
When the user requests the value of an SFR via the REPL interface on the host by typing, for example,
sfr -r 0xA0
the host issues a command to poke the cached location of the sentinel sequence “0xA5 0x60 0x61” with the sequence “0x85 0xA0 0x20” (in this form of mov, the 8051 encodes the source address before the destination address), which represents the instruction
mov 0x20, 0xA0
Recall that the only difference between IRAM and SFR is that SFRs have an address of 0x80 or above; furthermore, all arguments to the mov are constant. Therefore, this is a valid 8051 instruction that implements the command requested via the REPL interface.
Now that the correct instruction has been installed in the cmd_sfr_get routine, the actual command to run this callback is issued on the SD interface and the requested SFR is returned to the host. This enhanced ability to interrogate the SFR space dynamically allowed us to greatly expand our map of the special function registers.
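The host-side half of this technique amounts to a byte search followed by a three-byte overwrite. The sketch below operates on an in-memory copy of the debugger image and is illustrative only: the function names are ours, and the real tool would issue a poke over the SD interface rather than edit a local buffer. Note that the standard 8051 encoding of "mov direct, direct" is opcode 0x85, then the source address, then the destination address.

```python
SENTINEL = bytes([0xA5, 0x60, 0x61])   # placeholder from cmd_sfr_get
RESULT_SLOT = 0x20                      # IRAM location returned to the host


def find_sentinels(image: bytes, sentinel: bytes = SENTINEL) -> list:
    """Return every offset at which the sentinel sequence occurs."""
    offsets = []
    pos = image.find(sentinel)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(sentinel, pos + 1)
    return offsets


def mov_direct(dest: int, src: int) -> bytes:
    """Encode 'mov dest, src' for two direct addresses (source byte first)."""
    return bytes([0x85, src & 0xFF, dest & 0xFF])


def patch_sfr_get(image: bytearray, offset: int, sfr: int) -> None:
    """Overwrite the cached slot so the stub reads `sfr` into RESULT_SLOT."""
    image[offset:offset + 3] = mov_direct(RESULT_SLOT, sfr)


if __name__ == "__main__":
    # Tiny stand-in image: the sentinel followed by 'mov 0x21, #0'.
    image = bytearray(SENTINEL + bytes([0x75, 0x21, 0x00]))
    slot = find_sentinels(image)[0]     # cache the offset once, at load time
    patch_sfr_get(image, slot, 0xA0)
    print(image.hex(" "))
```

The offset has to be cached at load time: once the first patch has been applied the sentinel bytes are gone, so later requests overwrite the same slot by offset rather than by searching again.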
EXTENDED OPCODES
A similar method was used to also explore the extended opcode space for the 8051. As noted in the previous section, “0xA5” is the sole illegal instruction in the native 8051 instruction set, and in enhancements such as the one found in the AX211, it is used as an escape sequence to specify an extended instruction set.
As an (entertaining) side-note, the sparse literature available on the AX211 claims that the on-board 8051 is a 32-bit processor. We originally thought this was an amusing “lost in translation” moment, but in fact, the AX211 implements a set of instructions that operate on 32-bit data types using extended opcodes, thereby lending credibility to the “32-bit” label.
SFR locations 0xC0-3, 0xC8-B, 0xD8-B, and 0xF8-B were identified as functioning 32-bit registers, and opcodes were discovered that could reverse the order of bits in these registers, invert the contents of these registers, and clear the contents of these registers to zero. There may be other opcodes present as well; code has been found implementing CRC16 checksums using these 32-bit enhancements, and most likely these enhancements also enable the implementation of MLC/TLC scrambling algorithms. The BCH ECC computations, however, are handled by a dedicated hardware coprocessor.
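For reference when comparing captured register contents against expectations, the three observed operations can be modeled on the host as below. This is a behavioral sketch only; it says nothing about the opcode encodings or about the byte order in which the four SFRs map onto a 32-bit value, neither of which is specified here.

```python
MASK32 = 0xFFFFFFFF


def reverse_bits32(value: int) -> int:
    """Reverse the order of all 32 bits (bit 0 swaps with bit 31, and so on)."""
    value &= MASK32
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def invert32(value: int) -> int:
    """Bitwise complement, truncated to 32 bits."""
    return ~value & MASK32


def clear32(_value: int) -> int:
    """Clear the register to zero regardless of its previous contents."""
    return 0


if __name__ == "__main__":
    sample = 0x12345678
    print(hex(reverse_bits32(sample)))
    print(hex(invert32(sample)))
    print(hex(clear32(sample)))
```

Bit reversal and inversion are common building blocks of reflected CRC implementations, which fits with the CRC16 code found using these registers.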
SUMMARY
An approach has been disclosed for the exploration and exploitation of the embedded controller found within a particular type of SD card. The approach consists of a combination of static code analysis and dynamic fuzzing analysis. A "secret knock" for uploading code into the controller was found, and through this mechanism we explored the register map and extended opcodes of the microcontroller. Significantly, in this particular device all of the SD protocol-layer commands are implemented in software. This allowed us to redefine the SD protocol set and implement, as a demonstration, a REPL-mode debugger for the SD card.