LastSecondsToLive February 2016

Endianness macro in C

I recently saw this post about endianness macros in C and I can't really wrap my head around the first answer.

Code supporting arbitrary byte orders, ready to be put into a file called order32.h:

#ifndef ORDER32_H
#define ORDER32_H

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "unsupported char size"
#endif

enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN = 0x00010203ul,
    O32_PDP_ENDIAN = 0x01000302ul
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };

#define O32_HOST_ORDER (o32_host_order.value)

#endif

You would check for little endian systems via

O32_HOST_ORDER == O32_LITTLE_ENDIAN

I do understand endianness in general. This is how I understand the code:

  1. Create examples of little-, middle-, and big-endian byte orders.
  2. Compare the test case against those examples and decide which byte order the host machine uses.

What I don't understand are the following aspects:

  1. Why is a union needed to store the test case? Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed? And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns the value to the union, but why the odd notation with two sets of braces?
  2. Why the check for CHAR_BIT? One comment mentions that it would be more useful to check UINT8_MAX. Why is char even used here, when it's not guaranteed to be 8 bits wide? Why not just use uint8_t? I found this link to Google-Devs github. They don't rely on this check... Could someone please elaborate?

Answers


DigitalRoss February 2016

Why is a union needed to store the test case?

    The entire point of the test is to alias the array with the magic value the array will create.

Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed?

    Well, more or less. It will, but other than being exactly 32 bits there are no further guarantees. It would fail only on some really fringe architecture you will never encounter.

And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns the value to the union, but why the strange markup with two braces?

    The inner brace is for the array.

Why the check for CHAR_BIT?

    Because that's the actual guarantee. If that doesn't blow up, everything will work.

One comment mentions that it would be more useful to check UINT8_MAX? Why is char even used here, when it's not guaranteed to be 8 bits wide?

    Because in fact it always is, these days.

Why not just use uint8_t? I found this link to Google-Devs github. They don't rely on this check... Could someone please elaborate?

    Lots of other choices would work also.


dbush February 2016

The initialization has two sets of braces because the inner braces initialize the bytes array. So bytes[0] is 0, bytes[1] is 1, etc.

The union allows a uint32_t to lie on the same bytes as the char array and be interpreted in whatever the machine's endianness is. So if the machine is little endian, 0 is in the low order byte and 3 is in the high order byte of value. Conversely, if the machine is big endian, 0 is in the high order byte and 3 is in the low order byte of value.


viraptor February 2016

{{0, 1, 2, 3}} is the initializer for the union, which results in the bytes component being filled with [0, 1, 2, 3].

Now, since the bytes array and the uint32_t occupy the same storage, you can read the same bytes back as a native 32-bit integer. The value of that integer shows you how the array was shuffled, which really means which endianness your system uses.

There are only three popular possibilities here: O32_LITTLE_ENDIAN, O32_BIG_ENDIAN, and O32_PDP_ENDIAN.

As for the char / uint8_t question - I don't know. I think it makes more sense to just use uint8_t with no checks.

Post Status

Asked in February 2016
Viewed 1,536 times
Voted 14
Answered 3 times
