Segmentation fault while typecasting unsigned int to pointer: Solved
Solving the segmentation fault error while typecasting pointers in 64-bit systems
TL;DR: On 64-bit systems, casting a pointer to an unsigned int and back can cause a segmentation fault, because an int is typically 4 bytes long while a pointer is 8 bytes long, so the address gets truncated. To fix this, store the address in a uintptr_t (from stdint.h) instead of an unsigned or signed int.
While going through the typecasting section in the wonderful book by Jon Erickson, I ran into a segmentation fault. The odd thing is that I used the same code as the book, yet my compiler gave me multiple warnings, while Erickson's code ran successfully without warnings or errors. So why was this happening? The answer lies in the way 64-bit systems differ from 32-bit systems. Let's look at the code.
hacky_pointer.c
#include <stdio.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    unsigned int hacky_pointer;

    hacky_pointer = (unsigned int) char_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the character %c\n", hacky_pointer, *((char *) hacky_pointer));
        hacky_pointer = hacky_pointer + sizeof(char);
    }

    hacky_pointer = (unsigned int) int_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the integer %d\n", hacky_pointer, *((int *) hacky_pointer));
        hacky_pointer = hacky_pointer + sizeof(int);
    }
}
One thing you might notice, and which might feel off about this code, is that I am using an unsigned int for the hacky_pointer variable instead of simply declaring it as a pointer with int *hacky_pointer. This is because we are learning typecasting. Typecasting is the feature in C that lets us instruct the program to temporarily treat a value as a different datatype. For example, look at the following code from the same book.
#include <stdio.h>

int main() {
    int a, b;
    float c;

    a = 13;
    b = 5;
    c = (float) a / (float) b;
    printf("%f", c); // prints 2.600000
}
In the above code, we are temporarily typecasting the integers a and b to the float datatype. Without the casts, a / b would be an integer division and yield 2. With them, we get the more accurate result of 2.600000 for the division of 13 by 5.
Now, back to the original problem. What we are doing in this code is declaring an unsigned int variable hacky_pointer. Next, we typecast the address of the character array char_array into an unsigned int and assign it to hacky_pointer. Then we loop five times, each time printing the address held in hacky_pointer with %p and the character stored at that address with %c.
We read the character by casting hacky_pointer back to a pointer and dereferencing it with *((char *) hacky_pointer), thus accessing the actual data stored at this address. After each print, we increment hacky_pointer by sizeof(char) so the address advances by 1 byte and the next iteration prints the next address and its data. Ideally, a sample result of this code should be something like this:
[hacky_pointer] points to 0x7fff3b63dc1b, which contains the character a
However, this is where the program runs into an error. When I compiled the above code, I received the following warnings:
$ gcc -g -o hacky_pointer hacky_pointer.c
hacky_pointer.c: In function ‘main’:
hacky_pointer.c:11:21: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   11 |     hacky_pointer = (unsigned int) char_array;
      |                     ^
hacky_pointer.c:14:100: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   14 |         printf("[hacky_pointer] points to %p, which contains the character %c\n", hacky_pointer, *((char *) hacky_pointer));
      |                                                                                                    ^
hacky_pointer.c:18:21: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   18 |     hacky_pointer = (unsigned int) int_array;
      |                     ^
hacky_pointer.c:21:98: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   21 |         printf("[hacky_pointer] points to %p, which contains the integer %d\n", hacky_pointer, *((int *) hacky_pointer));
      |                                                                                                  ^
These warnings tell me that a pointer is being cast to an integer of a different size, and that this integer is later cast back to a pointer. Although the program compiles without errors, the warnings hint that it may misbehave at run time. Sure enough, I ran into a segmentation fault when executing the program.
Let's debug the binary and see what the problem is.
I will allow the program to run until the instruction at <+70>. Before this instruction, the program is assigning variables and initialising the for-loop. The actual problem should start after the for-loop is initialised and we reach the printf call. As expected, the program runs into an error at the following instruction.
(gdb) nexti

Program received signal SIGSEGV, Segmentation fault.
main() at hacky_pointer.c:14
14          printf("[hacky_pointer] points to %p, which contains the character %c\n", hacky_pointer, *((char *) hacky_pointer));
To understand why this is happening, we need to start at the very beginning and go through the instructions step-by-step.
In the first few instructions, the program stores the array values on the stack. The characters 'a', 'b', 'c', 'd' are written together as a single DWORD at the stack offset 0xd. These values take up 4 bytes of memory, after which the program writes the final byte 'e' at the offset 0x9. Next, the program stores the integers 1, 2, 3, 4, 5. As each integer takes up 4 bytes of memory, we can see the offsets moving in steps of 4 bytes: 0x30, 0x2c, 0x28 and so on.
Now, the program loads the address of char_array (the 0xd location above) into the register rax. This is where the problem begins. On 64-bit systems, the general-purpose registers are 64 bits wide, so the register rax can hold a value up to 64 bits long. At this point rax holds the full address of char_array, which we can confirm by examining the register.
So far so good? However, in the next few instructions we are setting ourselves up for a disaster. Right after loading the array address into rax, the program handles the pointer through eax, which is only the lower 32 bits of rax.
Great! We have our data and the pointer set up. But in the very next instruction, the upper 4 bytes of this value are zeroed out (on x86-64, writing to a 32-bit register such as eax clears the upper 32 bits of the full 64-bit register). Now the register still contains part of the address of char_array, but this address is incomplete. If we try to access the memory location now, we will get an error.
This is the problem our program encounters. It tries to dereference a pointer built from an incomplete address and runs into an error.
Okay, but why is this even happening? The answer lies in the 64-bit architecture. We are declaring hacky_pointer as an unsigned int, and an int is 4 bytes long here. On a 64-bit architecture, however, a pointer is 8 bytes long. Therefore, when the program casts the pointer into an int, the value is truncated to 4 bytes and the upper half of the address is lost. When the program finally reaches the point where it casts the int back into a pointer and dereferences it, the truncated address does not point to valid memory and we get a segmentation fault. A detailed explanation can be found in the following resources.
A simple way to fix this error in our scenario is to use a datatype that is large enough to hold the pointer. In C, we can use uintptr_t from stdint.h, an unsigned integer type that is wide enough to hold a pointer.
Let's make the desired changes and see if our program runs successfully now.
#include <stdio.h>
#include <stdint.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    uintptr_t hacky_pointer; // change from unsigned int to uintptr_t

    hacky_pointer = (uintptr_t) char_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the character %c\n", hacky_pointer, *((char *) hacky_pointer));
        hacky_pointer = hacky_pointer + sizeof(char);
    }

    hacky_pointer = (uintptr_t) int_array;

    for(i=0; i < 5; i++) {
        printf("[hacky_pointer] points to %p, which contains the integer %d\n", hacky_pointer, *((int *) hacky_pointer));
        hacky_pointer = hacky_pointer + sizeof(int);
    }
}
Let's compile and run this.
$ gcc -g -o hacky_pointer hacky_pointer.c
$ ./hacky_pointer
[hacky_pointer] points to 0x7fffa19e4b2b, which contains the character a
[hacky_pointer] points to 0x7fffa19e4b2c, which contains the character b
[hacky_pointer] points to 0x7fffa19e4b2d, which contains the character c
[hacky_pointer] points to 0x7fffa19e4b2e, which contains the character d
[hacky_pointer] points to 0x7fffa19e4b2f, which contains the character e
[hacky_pointer] points to 0x7fffa19e4b10, which contains the integer 1
[hacky_pointer] points to 0x7fffa19e4b14, which contains the integer 2
[hacky_pointer] points to 0x7fffa19e4b18, which contains the integer 3
[hacky_pointer] points to 0x7fffa19e4b1c, which contains the integer 4
[hacky_pointer] points to 0x7fffa19e4b20, which contains the integer 5
Wonderful! The error is fixed and our program is working fine. Let's also debug this and look at the difference in instructions.
The solution was pretty simple but it taught me a lot about the size of data types and the difference between 32-bit and 64-bit architecture. Thank you for reading, I will see you in the next post!