Type punning in C and the strict aliasing rule
Sunday, June 2, 2024
A common idiom in C for reinterpreting the bits of one type as another type is as follows:
int32_t get_mantissa(float f) {
    int32_t i = *(int *)&f;
    return i & ((1 << 23) - 1);
}
Reading this code may raise some flags relating to implementation-defined behavior. For one thing, we are assuming that the float type conforms to IEEE-754, which is not strictly required by any ISO C standard. In addition, the code assumes that the endianness of floats matches that of ints.
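These assumptions can be checked rather than taken on faith. The following is a minimal sketch, not a complete IEEE-754 conformance test; the helper name float_layout_matches_assumptions is hypothetical. It checks the width and mantissa size of float, then compares the bits of 1.0f against the value they would have if float is binary32 and shares the byte order of 32-bit integers.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if float appears to be a 32-bit IEEE-754
   binary32 value stored in the same byte order as a 32-bit integer,
   and 0 otherwise. */
static int float_layout_matches_assumptions(void)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "float is assumed to be 32 bits wide");

    /* binary32 has a base-2 significand with 24 digits (23 stored). */
    if (FLT_RADIX != 2 || FLT_MANT_DIG != 24) {
        return 0;
    }

    const float one = 1.0f;
    uint32_t bits;
    memcpy(&bits, &one, sizeof bits); /* byte copy, so no aliasing concern */

    /* In binary32, 1.0f is sign 0, biased exponent 127, mantissa 0. */
    return bits == UINT32_C(0x3F800000);
}
```

If the function returns 0, the bit mask used in get_mantissa would not select the mantissa on that platform.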
While this implementation-defined behavior can be justified, there is a much more serious issue with the code: the function invokes undefined behavior.
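In short, the C standard forbids accessing an object through an lvalue whose type is not compatible with the object's effective type, so the same area of memory generally cannot be accessed as two distinct types. There are exceptions to this rule; for instance, any readable memory can be read as a sequence of bytes through a character type such as char or unsigned char.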
This “strict aliasing” rule gives compilers more room for optimization, and in many cases this can have a significant effect on performance.
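To make the optimization argument concrete, here is a small sketch; the function fill is a hypothetical example, and what a given compiler actually does with it is not guaranteed. Because a store to a float is not allowed to modify an int, the compiler may read *count once before the loop. If the two pointers were allowed to alias, it would have to reload *count after every store.

```c
/* Hypothetical example: writes x into values[0] .. values[*count - 1]. */
void fill(float *values, const int *count, float x)
{
    for (int i = 0; i < *count; i++) {
        /* Under strict aliasing, this store cannot change *count,
           so the loop bound may be kept in a register. */
        values[i] = x;
    }
}
```

The function itself is well defined as long as values and count point to objects of the declared types; the rule only affects what the compiler is entitled to assume.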
Unfortunately, it is quite easy for programmers to make mistakes around this, especially considering that C programmers are often inclined to think of memory as a very large array of generic bytes.
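Treating memory as bytes is legitimate when it is done through a character type, which is the exception mentioned earlier. A minimal sketch, with the hypothetical helper name float_bytes:

```c
#include <stddef.h>

/* Hypothetical helper: copies the object representation of *f into out,
   one byte at a time. */
void float_bytes(const float *f, unsigned char out[sizeof(float)])
{
    /* Reading any object through unsigned char is permitted. */
    const unsigned char *p = (const unsigned char *)f;

    for (size_t i = 0; i < sizeof(float); i++) {
        out[i] = p[i];
    }
}
```

The order of the bytes in out depends on the platform's endianness, so the result is only meaningful relative to the machine that produced it.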
One correct way to reinterpret the bits of one type as the bits of another, according to the standard, is to write one member of a union and read another:
int32_t get_mantissa(float f) {
    union {
        float f;
        int32_t i;
    } u = {.f = f};
    return u.i & ((1 << 23) - 1);
}
But this is limited by the fact that you have to copy the value before you can reinterpret it. Under the strict aliasing rule, if you have an array of one type and want to convert it into a byte-for-byte identical array of a different type, you have to copy the entire array.
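For the array case, the copy can be written with memcpy, which is defined in terms of bytes and so does not run into the aliasing rule. This is a sketch under the assumption that float is 32 bits wide; the helper name floats_to_bits is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the bit patterns of n floats into n
   uint32_t values. dst and src must not overlap. */
void floats_to_bits(uint32_t *dst, const float *src, size_t n)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "float is assumed to be 32 bits wide");
    memcpy(dst, src, n * sizeof *dst);
}
```

The cost is the same as with the union: a second buffer of the same size has to exist, and every element is copied into it.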
The strict aliasing rule also makes it impossible to implement malloc in pure C. Since separate invocations of the function need to be able to return pointers to different sections of the same buffer of memory, and that buffer has a single declared type, you cannot safely dereference the returned pointers as different types.
#include <stdalign.h>
#include <stddef.h>

void *my_malloc(size_t size) {
    /* A fixed pool, aligned for any object type. */
    static alignas(max_align_t) char buffer[256];
    static size_t offset;

    /* Round the request up to a multiple of the maximum alignment. */
    size_t remainder = size % alignof(max_align_t);
    if (remainder != 0) {
        size += alignof(max_align_t) - remainder;
    }
    if (size > sizeof(buffer) - offset) {
        return NULL;
    }
    void *ptr = &buffer[offset];
    offset += size;
    return ptr;
}

int main(void) {
    int *a = my_malloc(sizeof(int));
    float *b = my_malloc(sizeof(float));
    *a = 0;
    *b = 0.5f; // Strict aliasing is violated on this line
}
To get behavior more like what you might expect, GCC supports the -fno-strict-aliasing flag, which prevents optimizations that depend on the strict aliasing rule. Many free software projects, including the Linux kernel, are built with this flag and so explicitly ignore the strict aliasing rule.