How I do (type-safe) container types in C
Recently, after seeing two articles on how to achieve container-types in C, I decided I'd also write one.
Martin Uecker's article | HN thread
Daniel Hooper's article | lobste.rs thread
Why am I not satisfied with these two articles
The only correct reason is that I suffer from the not-invented-here syndrome, but I have some complaints that would make me not want to use these implementations.
Uecker's way
#define vec(T) struct vec_##T { ssize_t N; T data[/* .N */]; }
-- Martin Uecker, Generic Containers in C: vec
This is how I started doing "generics" in C and I quickly ran into issues with having the macro define the name of the Vec.
For "simple types" this works great, vec(int)
would expand to:
struct vec_int { ssize_t N, int data[] }
But for more complex types this would break down pretty quickly.
struct MyValue {int a, int b}
vec(struct MyValue)
Would expand to:
struct vec_struct MyValue { ssize_t N, struct MyValue data[] }
And it would result in invalid C. This can be worked around by typedef
-ing the
struct
instead but it would force me to typedef
pointers to values and I don't
like how it "pollutes" my namespace. I'm also not that imaginative so if I can
not name something I usually go that route.
Overall it's not a bad way of doing things, but my real gripe comes with the way that the logic is implemented.
#define vec_push(T, v, x) \ ({ \ vec(T) **_vp = (v); \ ssize_t _N = (*_vp)->N + 1; \ ssize_t _S = _N * (ssize_t)sizeof((*_vp)->data[0]) \ + (ssize_t)sizeof(vec(T)); \ if (!(*_vp = realloc(*_vp, _S))) abort(); \ (*_vp)->N++; \ (*_vp)->data[_N - 1] = (x); \ })
-- Martin Uecker, Generic Containers in C: vec
Having done that in the past, I now tend to avoid including too much logic
inside my macros because I find that they lead to cryptic error messages and
sometimes variable name clashes. (Again I suck I naming things)
I've spent too much time grep-ing through the output of cpp
and I've now switched
to doing something else.
Hooper's way
I find that Hooper does container types in a very similar way to myself.
#define List(type) union { \ ListNode *head; \ type *payload; \ }
-- Daniel Hooper, Type Safe Generic Data Structures in C
Defining an unnamed union avoids the complex type problem we ran into with the other implementation, but as Hooper points out, without doing anything else, this would result in type errors when expanding the macro more than once.
From Hooper's article:
List(Foo) a; List(Foo) b = a; // error void my_function(List(Foo) list); my_function(a); // error: incompatible type
Even though the variables have identical type definitions, the compiler still errors because they are two distinct definitions. A
typedef
avoids the issue:typedef List(Foo) ListFoo; // this makes it all work ListFoo a; ListFoo b = a; // ok void my_function(ListFoo list); my_function(a); // ok List(Foo) local_foo_list; // still works
-- Daniel Hooper, Type Safe Generic Data Structures in C
I personally don't like how expansions of the same macro won't point back to the same type. C23 "fixed" this behaviour with it's named record equivalence rule, but for it to work we would need to make the name of the type part of the macro and we would run in the same issue with complex types.
My way
I do it much in the same way as Hooper. I declare a "base implementation" of my datastructure that every generic version will wrap.
// aligned on eights (or fours on 32 bit machines)
struct Vec {
size_t len;
size_t cap;
void *data;
};
And a macro to define a new type of that datastructure.
#define VecDef(_type) \
typedef struct { \
struct Vec inner; \
_type *phantom[0]; \
}
Inserting the typedef directly in the macro allows me to define this type and
to export it rather than re-expanding the same macro every time. The main
drawback of this approach is that typedefs cannot be forward declared but
structs
can.
Take note that the phantom
field is a zero sized array of pointers to _type
,
this way I can forward declare _type
. Also as a bonus, the zero size array is
a zero-sized type (duh) and, in this case, adds no additional padding.
VecDef(int) IntVec;
VecDef(struct Pos) PosVec;
To then get some type safety, I make use of C11's _Generic
keyword.
#define vecPush(vec, data) _Generic((data), typeof(**((vec)->phantom)): \
vec_push(&(vec)->inner, sizeof(**((vec)->phantom)), &(data)))
By using _Generic
here I'm able to check that the type passed in matches
exactly the type expected.
VecDef(int) IntVec;
IntVec a = {};
char b = 10;
// Controlling expression type 'char' not compatible
// with any generic association type
vecPush(&a, b);
Hooper's way of type checking might be superior since the compiler will tell you which type it expected instead of just saying your the type is incompatible.
He uses the ternary operator to assert that both the parameter and the inner type match.
1 ? (param) : *(vec)->type
Sadly when it comes to reading a value out, I haven't found a way to have as much control, I instead cast the pointer, which works great for pointer types, but I open myself to C type casting rules if I try to dereference that pointer.
#define vecGetPtr(vec, idx) ((typeof(*(vec)->phantom))vec_get_ptr(\
&(vec)->inner, sizeof(**((vec)->phantom)), idx))
IntVec a = {};
// incompatible pointer types
double *r1 = vecGetPtr(&a, 1);
// C silent type casting, meh
double r2 = *vecGetPtr(&a, 1);
This dovetails pretty nicely with C23's auto
keyword were I basically never
have to worry about type mismatch.
// r3 will always be the correct type
auto r3 = *vecGetPtr(&a, 1);
I've found that this technique works pretty well and I've been able to build all the reusable data-structure I've needed with it:
An Hashmap
#define HMapDef(type) \
typedef struct { \
struct HMap inner; \
type *phantom[0]; \
}
A Queue
#define QueueDef(type) \
typedef struct { \
struct Queue inner; \
type *phantom[0]; \
}
And many more.
The only generic data-structure I use not written in this way is my implementation of a primary queue, and I'm planning to rewrite it this way in order to make it type-safe, I just haven't taken the time to do it yet.