Disassembling Hello, World! in GNU Project Debugger
Compiling C code and disassembling the program with GDB
Last updated
Before enrolling in the software security module for my Master's program, malware analysis seemed like a pretty daunting subject to me. That is not to say it isn't daunting anymore, but it is certainly more fun than I expected.
As I have very little knowledge of C and x86 assembly language, it gets a little confusing when our professor talks about static analysis and disassembling malware binaries. So, I decided the best way to understand it is to do it myself, the way you unfold a piece of origami to see how it was made.
I turned to the wonderful book by Jon Erickson, Hacking: The Art of Exploitation. The title seems a little funny at first, but trust me, it is an amazing book for learning C and assembly, especially with a focus on binary exploitation and malware development. The book also contains chapters on networking and cryptology, so grab your copy today! (I was not paid to say this; I just love the book.)
Anyways, let's dive into my first disassembly: Printing "Hello, World!" in C.
Note: The code below is taken from Jon Erickson's book, but the actual disassembly is different because the book works with a 32-bit architecture and my system is 64-bit.
The code is fairly simple: a for loop prints "Hello, World!" 10 times. Now, let's compile this and run the program.
Great, the program is working as intended. Now, let's get to the fun part. For an in-depth understanding of the disassembled program, Jon recommends reading the source code alongside the assembly instructions.
I will use the GNU Project Debugger, aka GDB, to disassemble and step through the program. GDB comes pre-installed on many distributions, but if you don't have it, you can download it from the official GDB website.
Next, I will disassemble the `main` function and set a breakpoint on the very first line of the function.
Now, I will run the program and then look at the contents of the `rip` register. The `rip` register is the instruction pointer, which does what it says on the label: it points at the instruction that is about to be executed.
Let's disassemble the `main` function again to understand where `rip` is actually pointing.
Bingo! `rip` is currently pointing at the instruction `mov DWORD PTR [rbp-0x4],0x0` (we also have an arrow `=>` for extra readability, thanks GDB!). There are 3 instructions before that, but those form the function prologue and we are not interested in them right now.
Now, what the instruction at `rip` does is store the value 0 at the memory address `rbp-4`. This initialises the variable `i`, which must be 0 before the loop begins. After we execute this instruction, the value at `rbp-4` should be 0. Let's see if that is really the case.
Okay, that looks like gibberish. Actually, it is not! The `print` command evaluates whatever expression we give it and saves the result in GDB's value history. First, we compute the address `rbp-4`, which is saved as `$1`. Then, we execute the next instruction with `nexti`. Finally, we examine the memory at `$1` again with `x/4b $1` (`x` is shorthand for the examine command, and `4b` tells the debugger to print 4 units of one byte each). So, the instruction did what we thought it would: the value at `rbp-4` is now 0. Let's move on.
The instruction above has been executed, so `rip` should now be pointing at the next instruction. I will look at the `rip` register again, but this time I will also look at the 10 instructions that follow.
Now, the program is jumping to another address. This address is `0x55555555515d`, or `<main+36>`. We can look at `<main+36>` and see that this address contains a `cmp`, or compare, instruction. Let's look at a group of instructions at once, because that gives a better understanding of the program.
The `cmp` instruction compares the value at `rbp-4` with 9. The next instruction, `jle` (jump if less than or equal to), tells the program to jump to the address `<main+17>` if the value at `rbp-4` is less than or equal to 9. The three instructions after that are basically an else block, telling the program to exit if the condition is false.
As we already know, the value at `rbp-4` is 0, so the program should follow the jump instruction. Let's execute the next few instructions.
Great, now `rip` is pointing at the address `<main+17>`. What is this instruction doing? `lea` is short for Load Effective Address. This instruction computes the address `rip+0xeb3` and stores it in the `rax` register. It does not read the memory at that address; it only works out where it is. Note that `rip` in this expression is the address of the instruction that follows the `lea`.
There's something interesting here: the debugger also prints the address that this expression resolves to as a comment on the right. Why is this address so important? Let's find out what is stored there.
OK, we get some hexadecimal values. But if you have worked with hexadecimal values before, you can see a pattern here: these byte-sized values look like ASCII character codes. You can use any online hex-to-ASCII converter to turn them into a human-readable format.
Et voilà! The value at this address is the string "Hello, World!". We can also let GDB do the conversion for us by examining the address in string format with `x/s`.
Great, we found the string. Let's move on to the next instruction.
The next few instructions just print the "Hello, World!" string to the console. First, the value in `rax` is copied into `rdi`, the destination index register, which carries the first argument of a function call on 64-bit Linux. Then, the `<puts@plt>` function is called.
One thing to notice here: if I used the `printf` function in my code, why is the program calling `puts`? This is because the `printf` call in the code is trivial and does not need any `printf`-specific features (e.g. the `%s` format specifier or referencing a variable). The compiler therefore replaces it with the cheaper `puts`. A more detailed explanation is in the following Stack Overflow answer and another article from 2005.
Now, we have reached the end of the loop body. The next instruction adds 1 to the value at `rbp-4`. Then, the program returns to the check we saw earlier: jump back if less than or equal to 9, else exit. Naturally, this loop goes on until the value reaches 10. Then, the program exits. If we wanted, we could step through the remaining 9 iterations inside the debugger, but that is unnecessary. We have already understood the inner workings of the program.
This disassembly was extremely fun! Even though I already had the source code, it was enlightening to see how high-level C statements are compiled into assembly instructions. I hope this was an interesting read. Thank you for reading, and I will see you in the next post!