An extensive step by step reverse engineering analysis of a Linux CTF binary

...or, in other words, when failing to reverse a CTF binary makes you lose that job.

During a past job interview, I was tasked to reverse four linux binaries of increasing difficulties as proof of my ability into the reverse engineering field. I solved the first two in a matter of an hour, the third one required me an entire day of work but sadly, I was not able to solve the last one. I don't know if I wasn't selected because of this fail, but it proved me one sure thing: I wasn't prepared enough or, at least, as much as I wanted. Flash forward, I successfully ended up with another job, but that challenge kept staying there, like a small needle, in my head. During the following months, I studied and practiced a lot, mainly into firmware reversing field and, every now and then, I've tried to solve that sneaky challenge.

This is my extensive and detailed description of my fails and success.

Important note

Please note that as this analysis started some months ago and this post was reviewed a huge number of times, you won't find same memory addresses or function names across the screenshots and code snippets.

Running the binary

With what are we dealing?

root@kali:/opt/ctf# file original 
original: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=d0d5b9a34a4fe4c52a3939c75bd71cfa0dc23825, stripped
root@kali:/opt/ctf# checksec -f ./original
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH	Symbols		FORTIFY	Fortified	Fortifiable  FILE
Partial RELRO   No canary found   NX enabled    PIE enabled     No RPATH   No RUNPATH   No Symbols       No	0		2	./original

A standard, stripped, Linux 32bit binary with no fancy protection active. We're not aiming to exploit it but only to find the flag. A picture is worth a thousand words, they say:

root@kali:/opt/ctf# ./original
[-] No vm please ;)
root@kali:/opt/ctf# ./original AAAA
[-] No vm please ;)
root@kali:/opt/ctf# ./original -h
[-] No vm please ;)
root@kali:/opt/ctf# ./original AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[-] No vm please ;)
root@kali:/opt/ctf# 

It doesn't run inside a virtual machine and I definitely don't want to build a physical linux box. Would you tell me some of your internals, please?

root@kali:/opt/ctf# strace ./original 
execve("./original", ["./original"], 0x7fff2a7dc4f0 /* 48 vars */) = 0
strace: [ Process PID=121645 runs in 32 bit mode. ]
brk(NULL)                               = 0x572fb000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7f05000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=133840, ...}) = 0
mmap2(NULL, 133840, PROT_READ, MAP_PRIVATE, 3, 0) = 0xf7ee4000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\254\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1947056, ...}) = 0
mmap2(NULL, 1955712, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf7d06000
mprotect(0xf7d1f000, 1830912, PROT_NONE) = 0
mmap2(0xf7d1f000, 1368064, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0xf7d1f000
mmap2(0xf7e6d000, 458752, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x167000) = 0xf7e6d000
mmap2(0xf7ede000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d7000) = 0xf7ede000
mmap2(0xf7ee1000, 10112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf7ee1000
close(3)                                = 0
set_thread_area({entry_number=-1, base_addr=0xf7f060c0, limit=0x0fffff, seg_32bit=1, contents=0, read_exec_only=0, limit_in_pages=1, seg_not_present=0, useable=1}) = 0 (entry_number=12)
mprotect(0xf7ede000, 8192, PROT_READ)   = 0
mprotect(0x565b8000, 4096, PROT_READ)   = 0
mprotect(0xf7f34000, 4096, PROT_READ)   = 0
munmap(0xf7ee4000, 133840)              = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf7f06128) = 121646
waitpid(121646, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0) = 121646
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=121646, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}) = 0
brk(NULL)                               = 0x572fb000
brk(0x5731c000)                         = 0x5731c000
brk(0x5731d000)                         = 0x5731d000
write(1, "**[-] You fool, nobody debugs me!!**"..., 34[-] You fool, nobody debugs me!!!
) = 34
write(1, "1\n", 21
)                      = 2
exit_group(-1)                          = ?
+++ exited with 255 +++

"You fool, nobody debugs me!!!"

Great, after a few couples of runs, we know that there are some anti-VM and anti-debug code in place. Let's look inside.

First thing, I searched and found the strings pretty quickly, and I noticed also two other interesting strings: one for a fail, one for a success.

Digging a little more, we can find where are placed the strings and from where they're used for.

It's clear that the subroutine placed at address 0x566429DC has something to do with them and with the anti-VM/anti-debug tricks.

Analyzing the anti-debug and anti-vm routine

Once I have identified where are the strings involved in this anti-debug and anti-vm tricks, it's easy to find them and visualize the blocks in IDA. Please note that sub_566429DC was here renamed in AntiDebugAntiVM.

This is the graph of the AntiDebugAntiVM functions. In the first block of code, we can see the standard function call convention that setup the stack frame. After that, a bunch of  NOPS and a call to fork(). Let's understand the fork call, what's its purpose?

fork() creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent. On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately. (ref)

Basically, right after the fork call, its return value is saved into the EAX register and then moved into a local variable that is compared with the zero value. The first branch is important: if the JNZ is true, we're into the parent process so we're going into the right path. Vice versa, if the instruction is false we're heading to the left or into the child process.

Into the child process

If EAX is zero, or in other terms, we're into the child process, we can see a call to getppid()function that returns the process ID of the parent of the calling process.  But the important call is the next one, the call to the ptrace() function. The standard definition of this function is:

The ptrace() system call provides a means by which one process (the"tracer") may observe and control the execution of another process(the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.

And is defined as:

long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);

In assembly, the call is built with these lines of code:

push    0             ; *data  
push    0             ; *addr  
push    [ebp+var_1C]  ; Parent PID
push    10h           ; _ptrace_request          
call    _ptrace

Basically, the child retrieve its PPID and tries to attach a debugger [1], if it fails, it's the evidence that it is being debugged so sleep 5 seconds, detach and returns [2] . Otherwise returns anyway [3]. Going up a level, if the fork() return -1 so returns with the status code 1 [5]

Into the parent process

If EAX is not zero, we're in the right path, so in the parent process. As you can remember, we have the PID of the child into the EAX register. After the check with -1 into the block [1], it goes into the block [2]. Here, the parent performs a call to waitpid():

push    0                     ; options
lea     eax, [ebp+stat_loc]
push    eax                   ; stat_loc
push    [ebp+pid]             ; child PID
call    _waitpid
The waitpid() system call is used to wait for state changes in a child of the calling process, and obtain information about the child whose state has changed. A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal. In the case of a terminated child, performing a wait allows the system to release the resources associated with the child; if a wait is not performed, then the terminated child remains in a "zombie" state. (ref)

On success, waitpid() returns the process ID of the child whose state has changed; On error, -1 is returned. In the next blocks 2, 3, 4 and 5 what happens is described in this answer I got on ReverseEngineering. There's no need to add anything more.

Anti-VM code

This is where things become fun and interesting. We can observe a bunch of mov instructions into the stack, a loop and inside of it an interesting xor instruction: xor eax, 75h. It seems to be a loop that cycle 0x32 times (50in decimal) and starting from [ebp+command] it xors one byte at a time to a fixed value equal to \x75. Pretty standard XOR decryption routine, right? We can try to replicate this routine in python:

#!/usr/bin/python
 
hexdata = "19061605005509551207100555523D0C051007031C061A07525509550107555811555255525509551600015558114F55581347"
binary = hexdata.decode("hex")
 
def xor_strings(data):
    return "".join(chr(ord(data[i]) ^ 0x75) for i in range(len(data)))
 
xored = xor_strings(binary)
print "Your decrypted string is: " + xored
root@kali:/opt/ctf# ./script.py 
Your decrypted string is: lscpu | grep 'Hypervisor' | tr -d ' ' | cut -d: -f2

Basically, it decrypts in memory a shell command and execute it via the next popen syscall that verifies, using the lscpu command, if the CPU name contains a string Hypervisor. This syscall looks pretty interesting:

The  popen() function opens a process by creating a pipe, forking, and invoking the shell. Since a pipe is by definition unidirectional, the type argument may specify only reading or writing, not both; the resulting stream is correspondingly read-only or write-only. The command argument is a pointer to a null-terminated string containing a shell command line.  This command is passed to /bin/sh using the -c flag; interpretation, if any, is performed by the shell. The type argument is a pointer to a null-terminated string which must contain either the letter 'r' for reading or the letter 'w' for writing. popen(): on success, returns a pointer to an open stream that can be used to read or write to the pipe; if the fork(2) or pipe(2) calls fail, or if the function cannot allocate memory, NULL is returned.

After the stream is opened, another syscall fgetc() is executed.

fgetc() reads the next character from stream and returns it as an        unsigned char cast to an int, or EOF on end of file or error.

What happens is simple: it opens a stream, in read-only mode, and executes the command 'lscpu | grep 'Hypervisor' | tr -d ' ' | cut -d: -f2' . If it returns something, so the grep returns something, we're in a virtual machine, prints the string: [-] No vm please ;) and exit. If the stream fails or it does not return anything, it closes the stream via the fclose() syscall and returns.

Everything becomes clear if we look now into the pseudo-code, with important variables renamed as their role.

int AntiDebugAntiVM()
{
  char command; // [esp+4h] [ebp-54h]
  char v2; // [esp+5h] [ebp-53h]
  char v3; // [esp+6h] [ebp-52h]
  char v4; // [esp+7h] [ebp-51h]
  char v5; // [esp+8h] [ebp-50h]
  char v6; // [esp+9h] [ebp-4Fh]
  char v7; // [esp+Ah] [ebp-4Eh]
  char v8; // [esp+Bh] [ebp-4Dh]
  char v9; // [esp+Ch] [ebp-4Ch]
  char v10; // [esp+Dh] [ebp-4Bh]
  char v11; // [esp+Eh] [ebp-4Ah]
  char v12; // [esp+Fh] [ebp-49h]
  char v13; // [esp+10h] [ebp-48h]
  char v14; // [esp+11h] [ebp-47h]
  char v15; // [esp+12h] [ebp-46h]
  char v16; // [esp+13h] [ebp-45h]
  char v17; // [esp+14h] [ebp-44h]
  char v18; // [esp+15h] [ebp-43h]
  char v19; // [esp+16h] [ebp-42h]
  char v20; // [esp+17h] [ebp-41h]
  char v21; // [esp+18h] [ebp-40h]
  char v22; // [esp+19h] [ebp-3Fh]
  char v23; // [esp+1Ah] [ebp-3Eh]
  char v24; // [esp+1Bh] [ebp-3Dh]
  char v25; // [esp+1Ch] [ebp-3Ch]
  char v26; // [esp+1Dh] [ebp-3Bh]
  char v27; // [esp+1Eh] [ebp-3Ah]
  char v28; // [esp+1Fh] [ebp-39h]
  char v29; // [esp+20h] [ebp-38h]
  char v30; // [esp+21h] [ebp-37h]
  char v31; // [esp+22h] [ebp-36h]
  char v32; // [esp+23h] [ebp-35h]
  char v33; // [esp+24h] [ebp-34h]
  char v34; // [esp+25h] [ebp-33h]
  char v35; // [esp+26h] [ebp-32h]
  char v36; // [esp+27h] [ebp-31h]
  char v37; // [esp+28h] [ebp-30h]
  char v38; // [esp+29h] [ebp-2Fh]
  char v39; // [esp+2Ah] [ebp-2Eh]
  char v40; // [esp+2Bh] [ebp-2Dh]
  char v41; // [esp+2Ch] [ebp-2Ch]
  char v42; // [esp+2Dh] [ebp-2Bh]
  char v43; // [esp+2Eh] [ebp-2Ah]
  char v44; // [esp+2Fh] [ebp-29h]
  char v45; // [esp+30h] [ebp-28h]
  char v46; // [esp+31h] [ebp-27h]
  char v47; // [esp+32h] [ebp-26h]
  char v48; // [esp+33h] [ebp-25h]
  char v49; // [esp+34h] [ebp-24h]
  char v50; // [esp+35h] [ebp-23h]
  char v51; // [esp+36h] [ebp-22h]
  char v52; // [esp+37h] [ebp-21h]
  int stat_loc; // [esp+38h] [ebp-20h]
  __pid_t ParentPID; // [esp+3Ch] [ebp-1Ch]
  FILE *stream; // [esp+40h] [ebp-18h]
  __pid_t ChangedStateChildPID; // [esp+44h] [ebp-14h]
  __pid_t ChildPID; // [esp+48h] [ebp-10h]
  unsigned int i; // [esp+4Ch] [ebp-Ch]

  ChildPID = fork();
  if ( !ChildPID )
  {
    ParentPID = getppid();
    if ( ptrace(PTRACE_ATTACH, ParentPID, 0, 0) )
    {
      stat_loc = 1;
      exit(1);
    }
    sleep(5u);
    ptrace(PTRACE_DETACH, ParentPID, 0, 0);
    exit(0);
  }
  if ( ChildPID == -1 )
    exit(1);
  do
    ChangedStateChildPID = waitpid(ChildPID, &stat_loc, 0);
  while ( ChangedStateChildPID == -1 && *__errno_location() == 4 );
  if ( BYTE1(stat_loc) )
  {
    printf("[-] You fool, nobody debugs me!!!\n%d\n", BYTE1(stat_loc));
    exit(-1);
  }
  command = 0x19;
  v2 = 6;
  v3 = 0x16;
  v4 = 5;
  v5 = 0;
  v6 = 0x55;
  v7 = 9;
  v8 = 0x55;
  v9 = 0x12;
  v10 = 7;
  v11 = 0x10;
  v12 = 5;
  v13 = 0x55;
  v14 = 0x52;
  v15 = 0x3D;
  v16 = 0xC;
  v17 = 5;
  v18 = 0x10;
  v19 = 7;
  v20 = 3;
  v21 = 0x1C;
  v22 = 6;
  v23 = 0x1A;
  v24 = 7;
  v25 = 0x52;
  v26 = 0x55;
  v27 = 9;
  v28 = 0x55;
  v29 = 1;
  v30 = 7;
  v31 = 0x55;
  v32 = 0x58;
  v33 = 0x11;
  v34 = 0x55;
  v35 = 0x52;
  v36 = 0x55;
  v37 = 0x52;
  v38 = 0x55;
  v39 = 9;
  v40 = 0x55;
  v41 = 0x16;
  v42 = 0;
  v43 = 1;
  v44 = 0x55;
  v45 = 0x58;
  v46 = 0x11;
  v47 = 0x4F;
  v48 = 0x55;
  v49 = 0x58;
  v50 = 0x13;
  v51 = 0x47;
  v52 = 0;
  for ( i = 0; i <= 50; ++i )
    *(&command + i) ^= 0x75u;
  stream = popen(&command, "r");
  if ( stream && fgetc(stream) != -1 )
  {
    puts("[-] No vm please ;)");
    exit(-1);
  }
  return fclose(stream);
}

First round of conclusions

Right now it may seem pretty easy, but for me at that time, this was impossible to understand and represented the first big fail: I was not prepared with interpreting assembly XOR instruction, decryption loops and Linux syscalls. I spent almost an entire weekend on this and failed so hard. Because of the time constraints of the job selection, I sent my results without this last exercise and maybe this influenced my performance into the selection. How to bypass all these checks? We need to find from where this function is called and maybe we could modify the code flow to avoid this calling.

Jumping away

With the IDA basic functionalities, we can find where this function is called and, luckily for us, it's called from a single location:

The instruction that calls the function is located inside this sub_E00 and, in particular, IDA shows that's the instruction: call ds:(off_2EF0-3000h) [ebx+edi*4]. Looking around this code we can patch the jz short loc_E55 into a jmp, so we would be able to circumvent all of the above protections.

Cheating with the shell

If you don't want to patch the binary, there's another way to fool this VM check, but not the anti-debug. If you notice, the command passed as an argument to the popen syscall is a normal shell command but with a relative path. So quick and dirty trick would be to create a fake lscpu like this:

#!/bin/bash
echo "I will run you anyway in this VM"

Be sure to export the directory inside the PATH variable and, basically, you're done: when the binary will try to execute the lscpu command, it will run the fake one, it won't return anything containing Hypervisor string, the grep would return nothing and the fgetc consequently will read nothing. Basically, all checks are positive. Easy as it seems.

Analyzing the self decrypting and injecting routine

We can take advantages of the debugging capabilities of IDA and playing with breakpoints. Single stepping into the program flow, after the above routines, we land into this interesting piece of code:  

I spent a lot of days trying to understand this routine: but it was worth it because I learned a lot: I learned about linux syscalls like mprotect, calloc and also memcpy. I learned about how the code could auto-decrypt and auto-inject inside the binary itself. Moreover, how can be possible to change memory protections back and forth. Indeed, it was very helpful to look around this code, side by side, with its decompiled version:

int sub_CB5()
{
  char v0; // si
  size_t v1; // eax
  char s; // [esp+8h] [ebp-30h]
  char v4; // [esp+9h] [ebp-2Fh]
  char v5; // [esp+Ah] [ebp-2Eh]
  char v6; // [esp+Bh] [ebp-2Dh]
  char v7; // [esp+Ch] [ebp-2Ch]
  char v8; // [esp+Dh] [ebp-2Bh]
  char v9; // [esp+Eh] [ebp-2Ah]
  char v10; // [esp+Fh] [ebp-29h]
  char v11; // [esp+10h] [ebp-28h]
  char v12; // [esp+11h] [ebp-27h]
  char v13; // [esp+12h] [ebp-26h]
  char v14; // [esp+13h] [ebp-25h]
  char v15; // [esp+14h] [ebp-24h]
  char v16; // [esp+15h] [ebp-23h]
  char v17; // [esp+16h] [ebp-22h]
  char v18; // [esp+17h] [ebp-21h]
  char v19; // [esp+18h] [ebp-20h]
  char v20; // [esp+19h] [ebp-1Fh]
  char v21; // [esp+1Ah] [ebp-1Eh]
  char v22; // [esp+1Bh] [ebp-1Dh]
  void *src; // [esp+1Ch] [ebp-1Ch]
  _BYTE *v24; // [esp+20h] [ebp-18h]
  void *addr; // [esp+24h] [ebp-14h]
  size_t n; // [esp+28h] [ebp-10h]
  size_t i; // [esp+2Ch] [ebp-Ch]

  n = 320;
  addr = 0;
  v24 = &unk_E78;
  mprotect(0, (size_t)((char *)&unk_E78 - 0xFFFFD000 - 12288), 6);
  s = 0xF9u;
  v4 = 0xFCu;
  v5 = 0xFFu;
  v6 = 0xE6u;
  v7 = 0xF5u;
  v8 = 0xE0u;
  v9 = 0xF1u;
  v10 = 0xF3u;
  v11 = 0xFBu;
  v12 = 0xF9u;
  v13 = 0xFEu;
  v14 = 0xF7u;
  v15 = 0xFDu;
  v16 = 0xE9u;
  v17 = 0xF3u;
  v18 = 0xFFu;
  v19 = 0xF4u;
  v20 = 0xF5u;
  v21 = 0;
  src = calloc(0x141u, 1u);
  for ( i = 0; i < n; ++i )
  {
    v22 = *((_BYTE *)sub_89B + i);
    v0 = v22 ^ 0x90;
    v1 = strlen(&s);
    *((_BYTE *)src + i) = *(&s + i % v1) ^ v0;
  }
  memcpy(sub_89B, src, n);
  return mprotect(addr, v24 - (_BYTE *)addr, 4);
}

TL;DR

Before we go deep into the details of the single blocks of code, giving a general overview of what its final purpose is, may help its comprehension. First thing, the code changes via mprotect function the memory protections, adding the write permission, of a specific part of its .text section. After that, it copies, into the stack, some bytes that will be revealed as a key for an afterward decryption. Before entering into the main loop, it allocates an array of bytes into the heap via calloc. Specifically, the length of the array is 0x140 bytes; this value is saved into a local variable placed into the stack at [ebp+n] offset. The main loop is somehow complicated because it xors byte per byte some of its code, placed at sub_89B+i offset, with a fixed constant 0x90 and after, it xors it again with the aforementioned key on the stack. After that, it overwrites the code placed at sub_89B offset, with these new values via the memcpy call and returns after changing again the memory protections of that code section back to read-execute. Let's break in line by line, considering only the useful ones.

Here, it setups the length of the future array in the variable placed on the stack at [ebp+n] with the size of 0x140 or 320 elements of 1 byte. After that, it prepares the arguments of the next call to mprotect, that will change the protection, enabling write permission, on the the address 0x5657D000. Looking up the stack:

Having ESP pointing at 0xFFC344F0, the calling convention dictate that the arguments of a function must be pushed into the stack in reverse order. The mprotect call is defined as: int mprotect(void *addr, size_t len, int prot); with

  • prot = 6
  • len = 0xE78
  • *addr = 0x5657D000

In other words: change the permission of the memory area of 3704 bytes starting from address 0x5657D000, granting the writability via the PROT_WRITE constant. More info of this syscall here. But what's inside this address? We're inside the ELF header, basically the start of the entire binary.

Going further, we can see the moving into the stack of some bytes, a call to calloc to allocate an array of 320+1 null bytes into the heap and the setup of a loop counter variable, placed at [ebp+var_C], with the same size of the array. We're setting up a loop that will scan, byte per byte, a specific area of the binary located at 0x5657D89B - that is a fixed value - and xor every byte, first with 0x90 and after with those bytes that were moved into the stack. For better understand this loop, I suggest to read the answer I got here. When this decryption loop ends, we have the decrypted code inside the heap, into the allocated array. Code can now be replaced with the decrypted one via the memcpy syscall. Finally, write permission can now be disabled and the routine can finish and return.

Second round of conclusions

Many days and months passed staring at me failing so hard into the understanding of this routine. But the feeling was still the same: I wanted to have that "[+] Good job! ;)" string and I've always had the Try Harder approach. Understanding this loop wasn't easy, not even close. I asked for help and, luckily, I got plenty. This is what I got: don't be afraid to ask for help but don't blindly ask for a solution. Work on that, demonstrate that you studied that thing and failed; People, eventually, will get that and will help you.

Towards the victory

After executing the decryption function we land into the code below. First it verifies that the user submitted a password of the length of exactly 0x27, that is a fixed value coming from this instruction: mov eax, (dword_56561058 - 56561000h) [ebx].

Only if the password is exactly 39 characters, it moves on into the DecryptedFunction, passing the user's password as the argument. The previous experience helped a lot to understand this function and the pseudo code generated by IDA is pretty nice.

int __cdecl DecryptedFunction(int UserSubmittedPassword)
{
  int result; // eax
  char v2; // [esp+0h] [ebp-38h]
  char v3; // [esp+1h] [ebp-37h]
  char v4; // [esp+2h] [ebp-36h]
  char v5; // [esp+3h] [ebp-35h]
  char v6; // [esp+4h] [ebp-34h]
  char v7; // [esp+5h] [ebp-33h]
  char v8; // [esp+6h] [ebp-32h]
  char v9; // [esp+7h] [ebp-31h]
  char v10; // [esp+8h] [ebp-30h]
  char v11; // [esp+9h] [ebp-2Fh]
  char v12; // [esp+Ah] [ebp-2Eh]
  char v13; // [esp+Bh] [ebp-2Dh]
  char v14; // [esp+Ch] [ebp-2Ch]
  char v15; // [esp+Dh] [ebp-2Bh]
  char v16; // [esp+Eh] [ebp-2Ah]
  char v17; // [esp+Fh] [ebp-29h]
  char v18; // [esp+10h] [ebp-28h]
  char v19; // [esp+11h] [ebp-27h]
  char v20; // [esp+12h] [ebp-26h]
  char v21; // [esp+13h] [ebp-25h]
  char v22; // [esp+14h] [ebp-24h]
  char v23; // [esp+15h] [ebp-23h]
  char v24; // [esp+16h] [ebp-22h]
  char v25; // [esp+17h] [ebp-21h]
  char v26; // [esp+18h] [ebp-20h]
  char v27; // [esp+19h] [ebp-1Fh]
  char v28; // [esp+1Ah] [ebp-1Eh]
  char v29; // [esp+1Bh] [ebp-1Dh]
  char v30; // [esp+1Ch] [ebp-1Ch]
  char v31; // [esp+1Dh] [ebp-1Bh]
  char v32; // [esp+1Eh] [ebp-1Ah]
  char v33; // [esp+1Fh] [ebp-19h]
  char v34; // [esp+20h] [ebp-18h]
  char v35; // [esp+21h] [ebp-17h]
  char v36; // [esp+22h] [ebp-16h]
  char v37; // [esp+23h] [ebp-15h]
  char v38; // [esp+24h] [ebp-14h]
  char v39; // [esp+25h] [ebp-13h]
  char v40; // [esp+26h] [ebp-12h]
  unsigned __int8 v41; // [esp+27h] [ebp-11h]
  int counter; // [esp+28h] [ebp-10h]
  int v43; // [esp+2Ch] [ebp-Ch]

  v43 = 0;
  v2 = 0x93u;
  v3 = 0x5E;
  v4 = 0xB0u;
  v5 = 0xB8u;
  v6 = 0xC5u;
  v7 = 0xD7u;
  v8 = 0xACu;
  v9 = 0x23;
  v10 = 0xC3u;
  v11 = 0xF0u;
  v12 = 6;
  v13 = 0x72;
  v14 = 0xF4u;
  v15 = 0x74;
  v16 = 0x93u;
  v17 = 0x52;
  v18 = 0x74;
  v19 = 0x72;
  v20 = 0x30;
  v21 = 0xEDu;
  v22 = 0x8Bu;
  v23 = 0x3D;
  v24 = 4;
  v25 = 0x58;
  v26 = 0xD8u;
  v27 = 0xE5u;
  v28 = 0xA2u;
  v29 = 0xCFu;
  v30 = 0x8Au;
  v31 = 0xEDu;
  v32 = 0x8Bu;
  v33 = 0x5C;
  v34 = 0x5E;
  v35 = 0x61;
  v36 = 0xDCu;
  v37 = 0x31;
  v38 = 0xCFu;
  v39 = 0x91u;
  v40 = 0x82u;
  for ( counter = 0; counter < PasswordLength; ++counter )
  {
    v41 = *((_BYTE *)AntiAnalysisFunction + counter + 0xC7);
    if ( (v41 ^ *(_BYTE *)(counter + UserSubmittedPassword)) != *(&v2 + counter) )
    {
      v43 = 1;
      break;
    }
  }
  if ( v43 )
    result = puts("[-] Nope!");
  else
    result = puts("[+] Good job! ;)");
  return result;
}

It scans the user's password, character by character, xoring it with a string retrieved from the binary itself. If every character matches it goes on and continue in the loop, otherwise it breaks. In the end, if everything is correct, it prints the beloved success string. How can we retrieve the correct flag? If we dump the 39 bytes from the binary, from the correct addresses, and xor them with the hardcoded string, we can take advantage of the xor bidirectional nature. Although you can find more details here, we're basically telling this:

A xor B = C
A xor C = B
B xor C = A

My first approach was to bruteforce the routine: if the string submitted is, eventually, \x41\x41\x41\x41\x41\x41\x41... we can step by step into the code and go into the final cmp instruction, retrieve the byte that it compares to and change the ZERO flag to force the loop to continue and not to stop. Otherwise we can dump the contents of the memory and xor with the hardcoded string, as result we get the flag that needs to be submitted to the binary.

We know that we need to get 39 bytes from address *((_BYTE *)AntiAnalysisFunction + 0 + 0xC7) to *((_BYTE *)AntiAnalysisFunction + 0x27 + 0xC7). Or from  (0x5662A9DC + 0 + 0xC7) = 0x5662AAA3 to 0x5662AACA = (0x5662A9DC + 0x27 + 0xC7). We can apply the xor operation with the known string and we're able retrieve the flag, finally.

Hardcoded: 93 5E B0 B8 C5 D7 AC 23 C3 F0 06 72 F4 74 93 52 74 72 30 ED 8B 3D 04 58 D8 E5 A2 CF 8A ED 8B 5C 5E 61 DC 31 CF 91 82 
             
Memory dump: E8 18 FC FF FF 83 C4 10 85 C0 74 11 C7 45 E0 01 00 00 00 83 EC 0C 6A 01 E8 90 FB FF FF 83 EC 0C 6A 05 E8 46 FB FF FF

Flag hex: 7B 46 4C 47 3A 54 68 33 46 30 72 63 33 31 73 53 74 72 30 6E 67 31 6E 59 30 75 59 30 75 6E 67 50 34 64 34 77 34 6E 7D
Flag ascii:  {  F  L  G  :  T  h  3  F  0  r  c  3  1  s  S  t  r  0  n  g  1  n  Y  0  u  Y  0  u  n  g  P  4  d  4  w  4  n  }

Conclusions

This was a long journey that required a lot of effort and countless sleepless nights. It was worth it? Every single minute, without any doubt. I hope this post will help you in your studies and if you spot any errors or want to help me in my journey into the reverse engineering world please leave a comment, tweet or e-mail.

As always, Try Harder.