A CTF Challenge using the Nintendo 3DS' unique features

Or, how to make your CTF challenge much harder than intended. This long-form article written by Swiftloke describes an usual CTF challenge he wrote last year.

If you attended a certain hackathon late last year, you might have noticed a challenge listed like this:

Image of the CTF Challenge

You're given an ELF file and a mysterious "file.bin". What wonders lie within?

Apparently, it's a stripped 32-bit ARM binary. Let's open it up in our favorite decompiler and...

View of the initial decompiled view in the Ghidra decompiler, intended to be viewed as gibberish.

Oh my.

Come along with me as we partake in a journey to discover the flag hidden inside. Along the way I'll show you my thought process of the intended solution, and how a few rookie mistakes along the way made it significantly harder than intended to solve. While reading, keep in mind that this was intended to be a moderately difficult CTF challenge, maybe 6/10, with one interesting twist.

For those following along at home, you can download the challenge here.

An image of the Nintendo 3DS.

First Steps

Let's revisit those hints:

Looks like he left some logs at key intersections... Installing devkitARM may prove beneficial.

The challenges target platform has already been spoiled by the title, but if one looked it up they'd find that devkitARM is a complete toolchain for Nintendo handheld consoles, including the GBA, DS, and 3DS.

As for those logs... It's probably meant that "key intersections" means critical sections of the program flow. For the first of these, we can immediately see that "romfs:/file.bin" is an argument to a function. That means it's fopen, and that the return value is a file pointer... We'll come back to that shortly.

GX_DisplayTransfer sounds like the name of the previous function. Plug that into our search engine and we get linked directly to some documentation. Looks like the function is from libctru, which is the user-mode library for writing useful homebrew on the 3DS. Now what's that function do...?

Initiates a display transfer.

Well, that's not very helpful, but this is a "key intersection", so it's probably really important to understand this function.

By finding an example online (I'm linking to one I found randomly), and searching 3dbrew, a knowledge base of 3DS reverse engineering, then reading between the lines, one can determine that this is a DMA copy function.

DMA on the 3DS

Those familiar with computer architecture will know that a DMA engine is, in brief terms, an auxilliary device that can transfer large amounts of data without the CPUs involvement, freeing it up to perform more useful work. On the 3DS, the DMA engine is treated as part of the GPU, and is commonly used to upload textures to VRAM quickly.

With that in mind, we can revisit the documentation and example, and this "key intersection" makes a little more sense. The code leading up to this block opens romfs:/file.bin, which we can safely assume is the same file.bin we were provided. We can see that a value is read out of the file object, passed into another function, then that value is passed into another function twice. The results of those functions are then the input and output addresses of the DMA transfer, implying that everything just mentioned was memory allocation. Let's make some changes to the decompiled function with that in mind...

    uStack_14 = FUN_0010d4ec("romfs:/file.bin",&DAT_00125008);
    FUN_0010dc10(uStack_14,0,2);
    iStack_18 = FUN_0010e1d4(uStack_14);
    FUN_0010dc10(uStack_14,0,0);
    uVar3 = FUN_00101610(iStack_18);
    uStack_1c = FUN_00101aac(uVar3);
    uVar3 = FUN_00101610(iStack_18);
    uStack_20 = FUN_00101aac(uVar3);
    FUN_0010d7dc(uStack_1c,iStack_18,1,uStack_14);
    FUN_0010cd88(uStack_14);
    uVar3 = FUN_00101610(iStack_18);
    FUN_0010612c(uStack_1c,uVar3);
    uStack_24 = FUN_00104dac(uStack_1c,FUN_00100020,uStack_20,FUN_00100020,2);
    FUN_0010f40c("GX_DisplayTransfer returned %d\n",uStack_24);
    fp = fopen("romfs:/file.bin",&DAT_00125008);
    FUN_0010dc10(fp,0,2);
    local_18 = FUN_0010e1d4(fp);
    FUN_0010dc10(fp,0,0);
    uVar3 = FUN_00101610(local_18);
    local_1c = FUN_00101aac(uVar3);
    uVar3 = FUN_00101610(local_18);
    local_20 = FUN_00101aac(uVar3);
    FUN_0010d7dc(local_1c,local_18,1,fp);
    FUN_0010cd88(fp);
    uVar3 = FUN_00101610(local_18);
    FUN_0010612c(local_1c,uVar3);
    local_24 = FUN_00104dac(local_1c,FUN_00100020,local_20,FUN_00100020,2);
    FUN_0010f40c("GX_DisplayTransfer returned %d\n",local_24);
    fp = fopen("romfs:/file.bin",&DAT_00125008);
    FUN_0010dc10(fp,0,2);
    local_18 = FUN_0010e1d4(fp);
    FUN_0010dc10(fp,0,0);
    size_of_alloc = some_size_fxn(local_18);
    local_1c = some_alloc_fxn(size_of_alloc);
    size_of_alloc = some_size_fxn(local_18);
    local_20 = some_alloc_fxn(size_of_alloc);
    FUN_0010d7dc(local_1c,local_18,1,fp);
    FUN_0010cd88(fp);
    size_of_alloc = some_size_fxn(local_18);
    FUN_0010612c(local_1c,size_of_alloc);
    local_24 = FUN_00104dac(local_1c,FUN_00100020,local_20,FUN_00100020,2);
    FUN_0010f40c("GX_DisplayTransfer returned %d\n",local_24);
    fp = fopen("romfs:/file.bin",&DAT_00125008);
    FUN_0010dc10(fp,0,2);
    local_18 = FUN_0010e1d4(fp);
    FUN_0010dc10(fp,0,0);
    size_of_alloc = some_size_fxn(local_18);
    input_buf = some_alloc_fxn(size_of_alloc);
    size_of_alloc = some_size_fxn(local_18);
    output_buf = some_alloc_fxn(size_of_alloc);
    fread(input_buf,local_18,1,fp);
    FUN_0010cd88(fp);
    size_of_alloc = some_size_fxn(local_18);
    FUN_0010612c(input_buf,size_of_alloc);
    dt_result = GX_DisplayTransfer(input_buf,FUN_00100020,output_buf,FUN_00100020,2);
    printf("GX_DisplayTransfer returned %d\n",dt_result);
One noteworthy aspect of this, at least here, is that Ghidra mistakenly types the "dimension" flag of the display transfer as a function pointer. It's necessary to correct this issue; once it does, we can see that the dimension is composed of two 16-bit integers packed together as is seen in the documentation and example with the macro GX_BUFFER_DIM. We need not worry too much about this, it's effectively a glorified size parameter with a display transfer. (As an aside, this parameter gets much more interesting with the GX_TextureCopy counterpart.) We also see a flags value of 2, which probably isn't important, right?

    fp = fopen("romfs:/file.bin",&DAT_00125008);
    FUN_0010dc10(fp,0,2);
    local_18 = FUN_0010e1d4(fp);
    FUN_0010dc10(fp,0,0);
    size_of_alloc = some_size_fxn(local_18);
    input_buf = some_alloc_fxn(size_of_alloc);
    size_of_alloc = some_size_fxn(local_18);
    output_buf = some_alloc_fxn(size_of_alloc);
    fread(input_buf,local_18,1,fp);
    FUN_0010cd88(fp);
    size_of_alloc = some_size_fxn(local_18);
    FUN_0010612c(input_buf,size_of_alloc);
    dt_result = GX_DisplayTransfer(input_buf,0x100020,output_buf,0x100020,2);
    printf("GX_DisplayTransfer returned %d\n",dt_result);

Executable mapping?

The other "key intersection" referred to in the logs is a bit further down:

Executable mirroring of read-write page returned page %p\n

I intended for this segment of the code to be completely ignored, because it's a lot of function calls towards something that's meant to be largely overlooked. However, I believe that this was not made clear enough.

Here's what that block looks like in the source code:

	ctrdl_initCodeAllocator();
	char* codeAlloc = ctrdl_allocateAligned(nextPow2(fsize));
	memcpy(codeAlloc, display_transfer_result, nextPow2(fsize));

	void* executable_addr = ctrdl_allocateCode(codeAlloc, fsize);

	ctrdl_changePermission(
            CUR_PROCESS_HANDLE, executable_addr,
            ctrdl_alignSize(fsize),
            MEMPERM_READEXECUTE);
	printf("Executable mirroring of read-write page returned page %p\n", executable_addr);

It's now time to explain what this does. The 3DS, like all contemporary operating systems, employs W^X (write or execute, but never both) memory protection, meaning that you can't execute a page that you've written to and vice versa. One can get around this using Luma3DS, which relaxes enforcement of this policy; you can then have an allocated block of memory mapped as RW and a mirror of that memory mapped as RX. Ultimately, though, those technical details are unnecessary to understand what's important- this segment takes the result of the DMA operation and allows it to be executed.

Image of a modified Nintendo 3DS logo which adds the word "Lima" and an image of a lima bean.
It's Luma3DS, not Lima3DS.

It's important to remember while reading the rest of this article that code is data. For example, each single ARM instruction (add, subtract, multiply, compare, branch...) is a series of four bytes, decoded and executed by the processor. As Wikipedia puts it:

Code-as-data is also a principle of the Von Neumann architecture, since stored programs and data are both represented as bits in the same memory device. This architecture offers the ability to write self-modifying code. It also opens the security risk of disguising a malicious program as user data and then using an exploit to direct execution to the malicious program.

The Main Loop

The main loop of the program is another subtle mistake, meant to be trivial to understand but in reality frustrating.

    do {
      iVar2 = FUN_001078e0();
      if (iVar2 == 0) goto LAB_00101914;
      FUN_00105e5c(2,1);
      FUN_00104d08();
      FUN_00107e68();
      local_30 = FUN_0010810c();
      param_3 = local_5c;
      local_c = (*local_2c)(local_30,local_c);
    } while (local_5c[0] == 0);
    FUN_0010f520("Congratulations! Your flag is:");
    printf(local_5c);

One thing we are able to see is that somehow, if the loop exits correctly, local_5c must contain the output flag. But what's all the rest?

If you can figure out that FUN_0010810c is hidKeysDown(), the 3DS function which gets the buttons that were just pressed, this would have instantly become clear as day. However, ascertaining this is not a trivial task, which I'll discuss in more detail shortly. The main loop passes through the buttons pressed this frame, and somehow expects the executable code to spit out the flag if a certain button combination was pressed. Ghidra is actually mis-decompiling this code pointer call; the program does pass flag_buffer as a parameter, so I had to override the function signature to produce the correct output.

    do {
      iVar2 = FUN_001078e0();
      if (iVar2 == 0) goto LAB_00101914;
      FUN_00105e5c(2,1);
      FUN_00104d08();
      FUN_00107e68();
      uVar8 = hidKeysDown();
      some_input_and_output = (*code_pointer)(uVar8,some_input_and_output);
    } while (flag_buffer[0] == 0);
    FUN_0010f520("Congratulations! Your flag is:");
    printf(flag_buffer);
    do {
      iVar2 = FUN_001078e0();
      if (iVar2 == 0) goto LAB_00101914;
      FUN_00105e5c(2,1);
      FUN_00104d08();
      FUN_00107e68();
      uVar8 = hidKeysDown();
      some_input_and_output = (*code_pointer)(uVar8,some_input_and_output,flag_buffer);
    } while (flag_buffer[0] == 0);
    FUN_0010f520("Congratulations! Your flag is:");
    printf(flag_buffer);

Let's review what we've learned so far:

So our flag is inside executable code in file.bin! Let's disassemble that file...

str fp, [sp, #-4]!
add fp, sp, #0
str r1, [fp, #-0x14]
str r2, [fp, #-0x18]
str r3, [fp, #-0x14]
b #0x450
bne #0x3c
ldr r3, [fp, #-0x10]
ldr r3, [fp, #-0x14]
cmp r3, #2
bic r3, r3, #0x80000000
bic r3, r3, #0x80
and r3, r3, r2
cmp r3, #0
add r3, r3, #1
str r3, [fp, #-0x14]
beq #0x54
ldr r3, [fp, #-0x14]
b #0x2b8
ldr r3, [fp, #-0x14]
ldr r3, [fp, #-0x14]
cmp r3, #6
ldr r3, [pc, #0x244]
and r3, r3, r2
bne #0x84
ldr r3, [fp, #-0x10]
beq #0x7c
ldr r3, [fp, #-0x14]
mov r3, #0
str r3, [fp, #-0x14]
cmp r3, #9
bne #0xa0
sub sp, sp, #0x1c
str r0, [fp, #-0x10]
ldr r3, [fp, #-0x14]
cmp r3, #0
ldr r3, [fp, #-0x14]
cmp r3, #0
bic r3, r3, #0x40000000
bic r3, r3, #0x40
bne #0xc4
ldr r3, [fp, #-0x10]
cmp r3, #0
beq #0xbc
beq #0xc4
ldr r3, [fp, #-0x14]
b #0x3e0
ldr r3, [fp, #-0x14]
add r3, r3, #1
str r3, [fp, #-0x14]
cmp r3, #5
bne #0xf0
bne #0xf8
ldr r2, [fp, #-0x10]
cmp r3, #0
beq #0xf0
and r3, r3, #2
cmp r3, #0
add r3, r3, #1
str r3, [fp, #-0x14]
b #0x228
ldr r3, [fp, #-0x14]
ldr r3, [fp, #-0x10]
and r3, r3, #1
bne #0x128
ldr r2, [fp, #-0x10]
cmp r3, #0
beq #0x120
cmp r3, #0
beq #0x124
b #0x528
ldr r3, [fp, #-0x14]
mov r3, #0
str r3, [fp, #-0x14]
cmp r3, #3
bne #0x154
cmp r3, #3
bne #0x158
bic r3, r3, #0x80
cmp r3, #0
ldr r3, [fp, #-0x10]
bic r3, r3, #0x10000000
beq #0x158
mov r3, #0
ldr r3, [fp, #-0x14]
add r3, r3, #1
ldr r3, [fp, #-0x14]
cmp r3, #6
b #0x2c0
ldr r3, [fp, #-0x14]
ldr r3, [fp, #-0x10]
bic r3, r3, #2
cmp r3, #0
beq #0x188
str r3, [fp, #-0x14]
b #0x288
ldr r3, [pc, #0x464]
and r3, r3, r2
ldr r3, [fp, #-0x14]
add r3, r3, #1
mov r3, #0
str r3, [fp, #-0x14]
cmp r3, #1
bne #0x1c4
b #0x4f8
ldr r3, [fp, #-0x14]
ldr r2, [fp, #-0x10]
ldr r3, [pc, #0x354]
ldr r3, [fp, #-0x10]
bic r3, r3, #0x80000000
beq #0x1c8
mov r3, #0
bic r3, r3, #0x10
cmp r3, #0
str r3, [fp, #-0x14]
b #0x410
str r3, [fp, #-0x14]
b #0x3e8
bne #0x1fc
ldr r3, [fp, #-0x10]
cmp r3, #8
bne #0x204
cmp r3, #0
beq #0x1fc
ldr r3, [fp, #-0x14]
add r3, r3, #1
ldr r3, [fp, #-0x14]
cmp r3, #9
ldr r2, [fp, #-0x10]
ldr r3, [pc, #0x408]
beq #0x21c
ldr r3, [fp, #-0x14]
beq #0x220
mov r3, #0
ldr r3, [fp, #-0x14]
cmp r3, #2
str r3, [fp, #-0x14]
b #0x520
bne #0x250
ldr r2, [fp, #-0x10]
bne #0x254
ldr r3, [fp, #-0x10]
cmp r3, #0
beq #0x24c
bic r3, r3, #0x20000000
bic r3, r3, #0x20
mov r3, #0
str r3, [fp, #-0x14]
add r3, r3, #1
str r3, [fp, #-0x14]
cmp r3, #7
bne #0x280
bne #0x27c
ldr r3, [fp, #-0x10]
mov r3, #0
str r3, [fp, #-0x14]
ldr r3, [fp, #-0x18]
ldr r2, [pc, #0xd8]
add r3, r3, #4
ldr r2, [pc, #0xcc]
and r3, r3, r2
cmp r3, #0
add r3, r3, #1
str r3, [fp, #-0x14]
str r3, [fp, #-0x14]
b #0x648
bne #0x2c0
ldr r2, [fp, #-0x10]
ldr r3, [fp, #-0x14]
cmp r3, #4
ldr r3, [pc, #0x2fc]
and r3, r3, r2
bic r3, r3, #0x20000000
bic r3, r3, #0x20
mov r3, #0
str r3, [fp, #-0x14]
cmp r3, #0
beq #0x2d4
b #0x4b0
ldr r3, [fp, #-0x14]
b #0x488
ldr r3, [fp, #-0x14]
ldr r3, [fp, #-0x10]
bic r3, r3, #0x10000000
cmp r3, #1
bls #0x2f4
b #0x3d0
ldr r3, [fp, #-0x14]
str r2, [r3]
ldr r3, [fp, #-0x18]
str r2, [r3]
ldr r3, [fp, #-0x18]
b #0x6e0
ldr r3, [fp, #-0x14]
ldr r3, [fp, #-0x10]
bic r3, r3, #0x40000000
ldr r3, [pc, #0x3b0]
and r3, r3, r2
ldr r3, [fp, #-0x14]
add r3, r3, #1
cmp r3, #0
beq #0x338
str r3, [fp, #-0x14]
b #0x5f8
b #0x5d0
ldr r3, [fp, #-0x14]
ldr r2, [fp, #-0x10]
ldr r3, [pc, #0x2a4]
cmp r3, #7
bne #0x36c
and r3, r3, r2
cmp r3, #0
bic r3, r3, #0x10
cmp r3, #0
str r3, [fp, #-0x14]
b #0x4e8
cmp r3, #0xa
bne #0x440
cmp r3, #0
beq #0x438
add r3, r3, #8
ldr r2, [pc, #0xc0]
add r3, r3, #0xc
ldr r2, [pc, #0xb4]
cmp r3, #1
bne #0x3a8
bic r3, r3, #0x40
cmp r3, #0
cmp r3, #0
beq #0x3a8
str r3, [fp, #-0x14]
b #0x720
ldr r3, [fp, #-0x14]
add r3, r3, #1
ldr r3, [fp, #-0x14]
cmp r3, #4
cmp r3, #5
bne #0x3dc
and r3, r3, r2
cmp r3, #0
ldr r2, [fp, #-0x10]
ldr r3, [pc, #0x1ec]
beq #0x3dc
ldr r3, [fp, #-0x14]
beq #0x3e0
mov r3, #0
ldr r3, [fp, #-0x14]
cmp r3, #8
ldr r3, [fp, #-0x10]
and r3, r3, #8
mov r3, #0
str r3, [fp, #-0x14]
str r2, [r3]
ldr r3, [fp, #-0x18]
str r2, [r3]
ldr r3, [fp, #-0x18]
add r3, r3, #0x10
ldr r2, [pc, #0xa8]
add r3, r3, #0x14
ldr r2, [pc, #0x9c]
ldr r2, [fp, #-0x18]
add r3, r2, r3
lsl r3, r3, #2
ldr r2, [fp, #-0x18]
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
str r2, [r3]
ldr r3, [fp, #-0x18]
str r2, [r3]
ldr r3, [fp, #-0x18]
ldr r1, [r3]
ldr r3, [fp, #-8]
add r2, r2, r3
ldr r3, [pc, #0x5c]
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
add r3, r3, #0x18
ldr r2, [pc, #0x90]
str r3, [fp, #-8]
b #0x54c
eor r3, r3, r1
str r3, [r2]
str r3, [fp, #-8]
ldr r3, [fp, #-8]
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
str r2, [r3]
mov r3, #0
ldr r3, [fp, #-8]
lsl r3, r3, #2
ldr r3, [fp, #-8]
add r3, r3, #1
cmp r3, #6
ble #0x558
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
ldr r3, [fp, #-0x14]
mov r0, r3
bx lr
andmi r0, r0, r0, asr #32
svcpl #0x3759df
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
add sp, fp, #0
pop {fp}
andhi r0, r0, r0, lsl #1
andhs r0, r0, r0, lsr #32
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andne r0, r0, r0, lsl r0
ldrblo r3, [r4], #-0x8b7
ldmdalo sb, {r0, r1, r4, r5, r7, fp, sp, lr} ^
subeq r6, r4, r0, lsl #17
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
mcrreq p1, #0xb, r2, ip, c6
strbhs r3, [sp, #-0xa8]!
mrcvc p12, #2, r0, c9, c9, #5
svcpl #0x3753a2
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0
andeq r0, r0, r0

Aw, nuts.

It couldn't be that easy, right? After the accidentally substantial reverse engineering effort, we're confronted with a bizarre puzzle. Somewhere along the line, this apparently executable code was transformed from gibberish into properly functioning code that will spit out our flag.

Intermission- An Accidental Odyssey

This challenge was designed to have almost all of the information above available to you from the beginning. You were intended to learn that the file was opened, read in, fed to GX_DisplayTransfer (you were still intended to learn the meaning of that function on your own) and finally mapped as executable and run. Instead, it took careful, painful reverse engineering, with a lot of insight and guesswork. What went wrong?

The first major hurdle is that the meaning of GX_DisplayTransfer, which is not a mere search away. I spent quite a long time tinkering with 3DS homebrew, writing my own competent 2D renderer using the custom graphics API exposed with citro3d, so of course I know what GX_DisplayTransfer does, but it takes a fair bit of reading to ascertain its meaning for yourself, and even then it might not be clear what the function is. I mentioned 3dbrew earlier; the average reverse engineer might not come across that treasure trove of information.

The next problem was with the RW->RX remapping. As aforementioned, I intended for that segment to be entirely ignored by the reverse engineer, but the way the logs were actually written meant that the hint was almost certainly not clear enough. Another hint along the lines of Beginning to map DisplayTransfer result as executable... may have significantly improved the situation.

Another major problem was failing to properly explain that the unknown function in the main loop was hidKeysDown(). My challenge design revolved around people knowing that this was the function in question. I completely failed to elaborate on this, and the challenge was made much less clear as a result.

Finally, this challenge stripped symbols from the binary, the strings that identify the meanings of a function. While designing this challenge, I was running up against the deadline to the CTF and needed a way to prevent the challenge from simply being run on a real 3DS. After all, since the code is clearly valid by the time it's executed, one could simply attach a debugger to the running process and dump the code at that time.

Without any better ideas, I provided a stripped .ELF file as the binary. Without symbols, the devkitARM toolchain can't build it into an executable image. One of the more saavy people who attempted this challenge made many attempts to get it to run without progress; I figure at that rate, anyone who did manage to execute it deserves their flag... But at what cost? Recall that this was meant to be a "6/10" challenge, run as a side event during a hackathon, with one neat twist to discover; instead, it required a mountain of reverse engineering to even get far enough to discover the twist. Only one person got far enough to even consider uncovering the secret, and only long after the CTF had ended when they continued grinding away at it.

This One Weird Trick Will Scramble Your Data At The Hardware Level!

With my lament about accidentally creating a CTF challenge raised from hell, let's come back to how that nonsense code gets transformed into being executable.

Let's review the process of how the data flows:

Where does that magical transformation happen?

No, seriously, engage your brain. Use some reading comprehension.

Perhaps the name of the challenge would help guide the way?

...

If you guessed "It's the DMA function", congratulations! You've been paying attention. If you skipped straight to the answer, well, I get it.

The PICA200 GPU in the 3DS employs swizzling in its texturing pipeline. This is fairly confusing terminology, as this phrase has multiple other uses in the realm of graphics (see vertex swizzling, i.e. vertex.xyz = vertex.zxy). In this context, swizzling means that texture memory is physically stored differently than would appear logical to the human eye. This is done for the purposes of spatial locality- if you re-arrange the data, you're able increase your cache hit rate and therefore make fewer (and costlier) trips to main memory.

View of a normal texture versus its swizzled representation, showing how the image is garbled to the human eye

So what does this have to do with a DMA transfer? Well, earlier it was stated that

We also see a flags value of 2, which probably isn't important, right?

Maybe it is important. Let's check that documentation a little more closely... We can see a list of transfer flags here (which you'd be able to understand were transfer flags by looking at the example). That GX_TRANSFER_OUT_TILED(x) flag is in fact referring to the very same form of tiling! The DMA engine on the 3DS has some functionality that allows it to, during the copy process, re-arrange data that is laid out in the way one expects into the way the GPU expects it. This can be used to load textures at runtime; think ports of games in which ahead-of-time translation isn't possible, or very bad code which decodes and loads PNG files at runtime instead of AoT conversion. (The latter was the de-facto standard while writing 3DS homebrew for a few years before the tex3ds project came out.)

GX_TRANSFER_OUT_TILED(1) is equal to (1<<1) which is equal to 2. So that means that this DMA transfer is applying that tiling flag, and the bytes are re-arranged to fit the tiled format. That's how this data winds up being real code instead of garbage!

At this point, there are two solutions to the problem. You'll need to tile this data to get the normal code back out. You can either:

I'll go over both solutions, as well as some trivia about how the tiling system works that made development a lot easier.

Understand how the tiling function works

The PICA200 follows the common Morton Order tiling curve, also known as Z-Curve. As aforementioned, the goal of tiling is to increase spatial locality of texture accesses by re-arranging blocks. Morton Order is not particularly intuitive to me, and in fact I had little knowledge of it before writing this article; understanding that the 3DS has swizzling hardware without comprehending its inner workings was enough to write the challenge. I'll leave the Wikipedia link for those who seek to learn the idea for themselves, as they do a better job explaining it than I could.

Treat the tiling system as a black box

This approach effectively means that you'll rewrite the challenge yourself, compile it, and execute it, either on real 3DS hardware or on an emulator. Luckily, you don't need to actually execute the code, you just need to dump it; that'll save you some headache.

Let's find another, more concise example of how display transfers work. I'm going to dig up an ancient piece of history which inspired this challenge in the first place- the aforementioned piece of code that would decode and tile PNG files at runtime. This was copied around the internet many, many times over the course of 2016 through 2018 (when citro2d and tex3ds were released, superseding it). I too used this code, and we can see it in an early branch of my renderer project.

If you are interested in writing homebrew, do not use this code! It's dead for a reason! We have citro2d and tex3ds which do the same thing but faster and easier! See the current examples for a modern lesson!

As you can see, it's fairly straightforward; one needs to allocate enough memory using linearAlloc (a block of memory that's guaranteed to be readable by the GPU) first, then shove your data into it. Flush the cache so the GPU can read your data, then initiate the transfer and wait for the GPU to signal that it's done. (C3D_SafeDisplayTransfer is a citro3d function that allows usage of display transfers within a fully initialized C3D environment, such that the two don't trip over each other.) What's more important are the TEXTURE_TRANSFER_FLAGS:

// Used to convert textures to 3DS tiled format
// Note: vertical flip flag set so 0,0 is top left of texture
#define TEXTURE_TRANSFER_FLAGS \
	(GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) | \
	GX_TRANSFER_IN_FORMAT(GX_TRANSFER_FMT_RGBA8) | GX_TRANSFER_OUT_FORMAT(GX_TRANSFER_FMT_RGBA8) | \
	GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO))

In this code, the vertical flip flag is set to enable a texturing convention; obviously don't do this, but do use the tiling flag.

With a little bit of elbow grease, and an installation of devkitARM as the challenge description recommends, you can use this tutorial to dump the original code.

Aside: How to un-swizzle data

Writing this challenge requiring that I convert the code using the inverse of the swizzling function first, since F(F'(x)) = x. I started out by attempting to use tex3ds' inverse swizzler, but I couldn't get it quite right. While bashing my head against the wall, I inadventently nerd-sniped DeltaV into performing a detailed, graphical series of tests on the GPU to see how flags affected the display transfer engine, and the results were quite interesting...

Results of the experiments. Many images are shown, showcasing the results of the Display Transfer and Texture Copy operations with each permutation of the RAW and TILED flags.

This image showcases the results of the outputs of each combination of flags with GX_DisplayTransfer and its close cousin GX_TextureCopy (which is another DMA function for slightly different purposes.) Input into each is a swizzled texture, now let's take a look right-to-left at the results...

That's right, in addition to supporting tiling, the 3DS hardware supports inverse tiling! This is implemented to make final output to a framebuffer easy. Thus, scrambling the challenge code was as simple as changing a one to a zero and dumping the results to a file.

Climbing the Summit

The reverse engineering nightmare overcome, the inscrutable DMA function understood and the obfuscated code un-scrambled, now at last the end is in sight...

int with_post_jump(uint param_1,int param_2,undefined4 *param_3)

{
  int local_18;
  int local_c;
  
  if ((param_2 == 0) && ((param_1 & 0x40000040) != 0)) {
    local_18 = 1;
  }
  else if ((param_2 == 0) && ((param_1 & 0xbfffffbf) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 1) && ((param_1 & 0x40000040) != 0)) {
    local_18 = 2;
  }
  else if ((param_2 == 1) && ((param_1 & 0xbfffffbf) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 2) && ((param_1 & 0x80000080) != 0)) {
    local_18 = 3;
  }
  else if ((param_2 == 2) && ((param_1 & 0x7fffff7f) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 3) && ((param_1 & 0x80000080) != 0)) {
    local_18 = 4;
  }
  else if ((param_2 == 3) && ((param_1 & 0x7fffff7f) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 4) && ((param_1 & 0x20000020) != 0)) {
    local_18 = 5;
  }
  else if ((param_2 == 4) && ((param_1 & 0xdfffffdf) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 5) && ((param_1 & 0x10000010) != 0)) {
    local_18 = 6;
  }
  else if ((param_2 == 5) && ((param_1 & 0xefffffef) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 6) && ((param_1 & 0x20000020) != 0)) {
    local_18 = 7;
  }
  else if ((param_2 == 6) && ((param_1 & 0xdfffffdf) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 7) && ((param_1 & 0x10000010) != 0)) {
    local_18 = 8;
  }
  else if ((param_2 == 7) && ((param_1 & 0xefffffef) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 8) && ((param_1 & 2) != 0)) {
    local_18 = 9;
  }
  else if ((param_2 == 8) && ((param_1 & 0xfffffffd) != 0)) {
    local_18 = 0;
  }
  else if ((param_2 == 9) && ((param_1 & 1) != 0)) {
    local_18 = 10;
  }
  else if ((param_2 == 9) && (1 < param_1)) {
    local_18 = 0;
  }
  else {
    local_18 = param_2;
    if ((param_2 == 10) && ((param_1 & 8) != 0)) {
      local_18 = 0;
      *param_3 = 0x345438b7;
      param_3[1] = 0xc4c21b6;
      param_3[2] = 0x256d30a8;
      param_3[3] = 0x385968b3;
      param_3[4] = 0x446880;
      param_3[5] = 0x7e590cb9;
      param_3[6] = 0x5f3753a2;
      for (local_c = 0; local_c < 7; local_c = local_c + 1) {
         param_3[local_c] = param_3[local_c] ^ 0x5f3759df;
      }
    }
  }
  return local_18;
}

Well, this is really not that complicated... param_2 is obviously a state which is stored by the calling function, we've already established that param_1 is hidKeysDown() and param_3 is the output flag. Let's update with that...

int with_post_jump(uint keysDown,int state,undefined4 *out_flag)

{
  int output_state;
  int i;
  
  if ((state == 0) && ((keysDown & 0x40000040) != 0)) {
    output_state = 1;
  }
  else if ((state == 0) && ((keysDown & 0xbfffffbf) != 0)) {
    output_state = 0;
  }
  else if ((state == 1) && ((keysDown & 0x40000040) != 0)) {
    output_state = 2;
  }
  else if ((state == 1) && ((keysDown & 0xbfffffbf) != 0)) {
    output_state = 0;
  }
  else if ((state == 2) && ((keysDown & 0x80000080) != 0)) {
    output_state = 3;
  }
  else if ((state == 2) && ((keysDown & 0x7fffff7f) != 0)) {
    output_state = 0;
  }
  else if ((state == 3) && ((keysDown & 0x80000080) != 0)) {
    output_state = 4;
  }
  else if ((state == 3) && ((keysDown & 0x7fffff7f) != 0)) {
    output_state = 0;
  }
  else if ((state == 4) && ((keysDown & 0x20000020) != 0)) {
    output_state = 5;
  }
  else if ((state == 4) && ((keysDown & 0xdfffffdf) != 0)) {
    output_state = 0;
  }
  else if ((state == 5) && ((keysDown & 0x10000010) != 0)) {
    output_state = 6;
  }
  else if ((state == 5) && ((keysDown & 0xefffffef) != 0)) {
    output_state = 0;
  }
  else if ((state == 6) && ((keysDown & 0x20000020) != 0)) {
    output_state = 7;
  }
  else if ((state == 6) && ((keysDown & 0xdfffffdf) != 0)) {
    output_state = 0;
  }
  else if ((state == 7) && ((keysDown & 0x10000010) != 0)) {
    output_state = 8;
  }
  else if ((state == 7) && ((keysDown & 0xefffffef) != 0)) {
    output_state = 0;
  }
  else if ((state == 8) && ((keysDown & 2) != 0)) {
    output_state = 9;
  }
  else if ((state == 8) && ((keysDown & 0xfffffffd) != 0)) {
    output_state = 0;
  }
  else if ((state == 9) && ((keysDown & 1) != 0)) {
    output_state = 10;
  }
  else if ((state == 9) && (1 < keysDown)) {
    output_state = 0;
  }
  else {
    output_state = state;
    if ((state == 10) && ((keysDown & 8) != 0)) {
      output_state = 0;
      *out_flag = 0x345438b7;
      out_flag[1] = 0xc4c21b6;
      out_flag[2] = 0x256d30a8;
      out_flag[3] = 0x385968b3;
      out_flag[4] = 0x446880;
      out_flag[5] = 0x7e590cb9;
      out_flag[6] = 0x5f3753a2;
      for (i = 0; i < 7; i = i + 1) {
         out_flag[i] = out_flag[i] ^ 0x5f3759df;
      }
    }
  }
  return output_state;
}

So that state value seems to rely on the current button inputs. If you click the correct sequence of buttons, it'll spit out the flag. If you went really far and checked the bitmask of each button, you'd find that the function is actually searching for the Konami Code...

Anyway, let's move on to that flag. If the Konami Code is successfully input, the function will write some interesting values into the output buffer and then XOR those values with a constant (which is, as an Easter Egg, the Quake inverse square root constant) and spit that out. Do that XOR yourself and you'll have the flag.

A last-minute disaster

Having spent two Saturdays building this challenge, the morning before the CTF was set to start I got a message from a colleague:

Them:

Oh shit
What's the flag
Bruh is it $THE_FLAG
I just did ``strings file.bin``

Me:

Is my challenge actually that broken
Yes that's the flag

And sure enough:

$ strings file.bin
l1ng_1s_
fUn!}
hackD
xi{SwiZz

This is how I learned not to leave your strings in plaintext. All of this incredibly complicated obfuscation couldn't stop people from discovering the flag, as four-byte segments had to still be present in the obfuscated file. I quickly added the XORing function and spent the next hour rebuilding the challenge (it shouldn't be too surprising that it's a major pain in the neck to build).

Conclusion

Although not my first CTF challenge to be widely distributed, this was by far my most ambitious, and it shows along the way with the stacking mistakes making it inaccessible to all but the most talented reverse engineers. This was also my first long-form article since 2016, in which I wrote about some concepts of 3DS security without having formed a grasp on programming.

Reflecting on the value of writing, it's interesting that in terms of pure time investment writing this article probably took me about as long as making the challenge in the first place. Grabbing citations, checking notes, reverse engineering the challenge myself and making diagrams... It's tough work, but by doing it I get to share everything I learned.

If you've made it to the end of this article, I applaud your interest (or possible insanity). There will be more articles to come soon on some challenges I wrote earlier in 2022. In order to whet your appetite, I'll share the following teaser:

Teaser