SciTech.blog

Automatic memory management in C

26 Sep 2017, 09:06 • c, memory management

While not part of the language, it's perfectly possible to automatically manage memory in C. Let's look at how to implement a simple object system with garbage collection.

C is a low-level language without most of the fancy stuff modern languages offer, but it's still possible to manage memory automatically in it. Automatic memory management is generally desirable, since it makes dynamically allocated objects much easier to use safely. Let's implement a simple object system (which we might call COX: “C Object eXtensions”) with automatic memory management and polymorphic behavior. To achieve polymorphism, every object needs a pointer to its type, which in turn contains pointers to polymorphic functions, such as the finaliser (destructor) and a descriptor that converts the object into its textual representation (mostly for debugging):

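typedef struct cox_string* cox_string_t; /* forward declaration; struct cox_string is defined below */
struct cox_type {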
	void(*finaliser)(void*);
	cox_string_t(*descriptor)(void*);
};

Every object has a header with a pointer to its type and a reference counter:

#include <stdatomic.h> /* atomic_int (C11) */
struct cox_base {
	struct cox_type* type;
	atomic_int refcount;
};

Note that the reference counter has to be atomic, since an object may be retained and released on several threads simultaneously.

A string object can be declared as follows:

struct cox_string {
	struct cox_base base;
	char* cstr;
	unsigned long len;
};
typedef struct cox_string* cox_string_t;

Types are defined like so:

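static void cox_string_destroy(void* obj);           /* prototypes: the functions are defined below */
static cox_string_t cox_string_describe(void* obj);
static struct cox_type cox_string_type = {.finaliser = &cox_string_destroy, .descriptor = &cox_string_describe };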

Destroying a string instance involves freeing the memory of the raw string and subsequently the object itself:

static void cox_string_destroy(void* obj) {
	cox_string_t str = obj;
	free(str->cstr);
	free(str);
}

The textual representation of a string is the string itself:

static cox_string_t cox_string_describe(void* obj) {
	return obj;
}
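The post doesn't show the generic functions that dispatch through the type pointer. Here is a minimal sketch, assuming a hypothetical helper `cox_describe` that reads the header and calls the type's descriptor. It works for any object because `struct cox_base` is the first member, so a pointer to the object can be converted to a `struct cox_base*`. Plain `malloc` plus `abort` stands in for `panicking_malloc`:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct cox_string* cox_string_t;

struct cox_type {
	void (*finaliser)(void*);
	cox_string_t (*descriptor)(void*);
};

struct cox_base {
	struct cox_type* type;
	atomic_int refcount;
};

struct cox_string {
	struct cox_base base; /* must come first so the header can be found */
	char* cstr;
	unsigned long len;
};

static void cox_string_destroy(void* obj) {
	cox_string_t str = obj;
	free(str->cstr);
	free(str);
}

static cox_string_t cox_string_describe(void* obj) {
	return obj; /* a string describes itself */
}

static struct cox_type cox_string_type = {
	.finaliser = &cox_string_destroy,
	.descriptor = &cox_string_describe,
};

cox_string_t cox_string_create(const char* s) {
	cox_string_t str = malloc(sizeof *str);
	if (str == NULL) abort();
	str->base.type = &cox_string_type;
	atomic_init(&str->base.refcount, 1);
	str->len = strlen(s);
	str->cstr = malloc(str->len + 1);
	if (str->cstr == NULL) abort();
	memcpy(str->cstr, s, str->len + 1); /* copies the terminating NUL too */
	return str;
}

/* Hypothetical generic entry point: dispatches via the object's type. */
cox_string_t cox_describe(void* obj) {
	struct cox_base* base = obj;
	return base->type->descriptor(obj);
}

void cox_release(void* obj) {
	struct cox_base* base = obj;
	int n = atomic_fetch_add(&base->refcount, -1);
	assert(n > 0); /* releasing more often than retaining is a bug */
	if (n == 1) base->type->finaliser(obj);
}
```

A new type only needs its own struct with `struct cox_base` as the first member plus a `struct cox_type` instance; `cox_describe` and `cox_release` then work on it unchanged.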

To create an instance of a string we have to allocate a chunk of memory for the object, assign the appropriate type to the header, initialise the reference counter and allocate a chunk of memory for the raw string so we can copy the provided data:

cox_string_t cox_string_create(const char* s) {
	cox_string_t str = panicking_malloc(sizeof(struct cox_string));
	str->base.type = &cox_string_type;
	atomic_init(&str->base.refcount, 1); /* ATOMIC_VAR_INIT is meant for initialisers, not assignment */
	str->len = strlen(s);
	str->cstr = panicking_malloc(str->len + 1);
	strcpy(str->cstr, s);
	return str;
}
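`panicking_malloc` (and `panicking_realloc`, used further down) are not shown in the post. Judging by the name, they terminate the program instead of returning NULL, which is why the object code never checks for allocation failure. A minimal sketch under that assumption:

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed behaviour: allocate or abort. NULL is only returned for zero-byte requests. */
void* panicking_malloc(size_t size) {
	void* p = malloc(size);
	if (p == NULL && size != 0) {
		fprintf(stderr, "out of memory allocating %zu bytes\n", size);
		abort();
	}
	return p;
}

void* panicking_realloc(void* ptr, size_t size) {
	void* p = realloc(ptr, size);
	if (p == NULL && size != 0) {
		fprintf(stderr, "out of memory reallocating to %zu bytes\n", size);
		abort();
	}
	return p;
}
```

Aborting keeps every constructor free of error paths, at the cost of never recovering from an out-of-memory condition.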

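To release an object, we atomically decrement its reference counter and finalise the object if that was the last reference. Because atomic_fetch_add returns the value held before the addition, n == 1 means the counter has just dropped to zero: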

void cox_release(void* obj) {
	struct cox_base* base = obj;
	int n = atomic_fetch_add(&base->refcount, -1);
	if (n == 1) {
		base->type->finaliser(obj);
	}
}

NB: atomic_int, atomic_init and atomic_fetch_add are part of C11 and are declared in <stdatomic.h>.

The general pattern is to instantiate an object, use it and then release it once it's no longer needed:

cox_string_t str = cox_string_create("Hello, world!");
...
cox_release(str);

Note that if we add an object to a collection, the collection increments its reference counter, so releasing our own reference won't free the object: the collection still holds one.
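The post relies on a retain operation without showing it. A minimal sketch, assuming a hypothetical `cox_retain` that mirrors `cox_release`, and a toy two-slot `holder` standing in for a collection; the `finalised` counter and the `counted` type are illustrative only, there to make the effect observable:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct cox_type {
	void (*finaliser)(void*);
};

struct cox_base {
	struct cox_type* type;
	atomic_int refcount;
};

/* Counts finaliser calls so we can see when objects actually die. */
static int finalised = 0;

static void counted_destroy(void* obj) {
	finalised++;
	free(obj);
}

static struct cox_type counted_type = { .finaliser = &counted_destroy };

void* counted_create(void) {
	struct cox_base* obj = malloc(sizeof *obj);
	if (obj == NULL) abort();
	obj->type = &counted_type;
	atomic_init(&obj->refcount, 1);
	return obj;
}

/* Hypothetical counterpart to cox_release: take an extra reference. */
void* cox_retain(void* obj) {
	struct cox_base* base = obj;
	int old = atomic_fetch_add(&base->refcount, 1);
	assert(old > 0); /* retaining an already finalised object is a bug */
	return obj;
}

void cox_release(void* obj) {
	struct cox_base* base = obj;
	int old = atomic_fetch_add(&base->refcount, -1);
	assert(old > 0);
	if (old == 1) base->type->finaliser(obj);
}

/* Toy collection: owns one reference to every object it stores. */
struct holder {
	void* items[2];
	int count;
};

void holder_add(struct holder* h, void* obj) {
	assert(h->count < 2);
	h->items[h->count++] = cox_retain(obj);
}

void holder_clear(struct holder* h) {
	for (int i = 0; i < h->count; i++) cox_release(h->items[i]);
	h->count = 0;
}
```

Since retain and release both go through atomic_fetch_add, the collection's reference and a local reference can be dropped in either order, even on different threads; whichever drop takes the counter to zero runs the finaliser.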

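To avoid releasing objects explicitly, we can define an __auto macro on top of the cleanup variable attribute, which GCC and Clang support as an extension (it isn't standard C). Strictly speaking, identifiers starting with a double underscore are reserved for the implementation, so a library meant for wider use would pick another name:

#define __auto __attribute__((cleanup(cox_release_indirect)))
/* cleanup passes a pointer to the variable, hence the extra indirection */
void cox_release_indirect(void* p) {
	void* obj = *(void**)p;
	if (obj != NULL) cox_release(obj); /* tolerate variables that were left NULL */
}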

We can now create objects as follows

__auto cox_string_t str = cox_string_create("Hello, world!");

without calling cox_release ourselves: the object is released automatically when the variable goes out of scope.
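A small self-contained check that the release really happens at scope exit. It needs GCC or Clang because of the cleanup attribute; the reduced header, the `counted` type, the `finalised` counter and `scope_demo` are illustrative, not part of the post:

```c
#include <stdatomic.h>
#include <stdlib.h>

struct cox_type {
	void (*finaliser)(void*);
};

struct cox_base {
	struct cox_type* type;
	atomic_int refcount;
};

static int finalised = 0;

static void counted_destroy(void* obj) {
	finalised++;
	free(obj);
}

static struct cox_type counted_type = { .finaliser = &counted_destroy };

static struct cox_base* counted_create(void) {
	struct cox_base* obj = malloc(sizeof *obj);
	if (obj == NULL) abort();
	obj->type = &counted_type;
	atomic_init(&obj->refcount, 1);
	return obj;
}

void cox_release(void* obj) {
	struct cox_base* base = obj;
	if (base != NULL && atomic_fetch_add(&base->refcount, -1) == 1)
		base->type->finaliser(obj);
}

/* cleanup hands us the address of the variable, not its value. */
void cox_release_indirect(void* p) {
	cox_release(*(void**)p);
}

#define __auto __attribute__((cleanup(cox_release_indirect)))

/* Returns the finalisation count observed while obj is still in scope. */
static int scope_demo(void) {
	__auto struct cox_base* obj = counted_create();
	return finalised; /* evaluated before obj's cleanup runs */
}
```

After `scope_demo()` returns 0, `finalised` is 1: the object lived exactly as long as the variable.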

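The __auto attribute works well for local variables, but not for an object we want to return from a function: the object would be released as soon as the local variable goes out of scope, before the caller gets to use it. In such cases we need an autorelease pool, which defers the release to a later point: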

struct cox_autoreleasepool {
	void** objs;
	unsigned int count;
	unsigned int capacity;
	struct cox_autoreleasepool* next;
	struct cox_autoreleasepool* prev;
};
typedef struct cox_autoreleasepool* cox_autoreleasepool_t;

Every thread has its own stack of autorelease pools temporarily holding references to objects we want to release later.

Autorelease pools are usually managed automatically by the runtime, typically in conjunction with an event loop: a pool is created at the start of each iteration and destroyed at its end. The following code defines the relevant functions:

static _Thread_local cox_autoreleasepool_t autopool = NULL; /* C11 thread-local: one pool stack per thread */

cox_autoreleasepool_t cox_autoreleasepool_create(void) {
	cox_autoreleasepool_t pool = panicking_malloc(sizeof(struct cox_autoreleasepool));
	pool->count = 0;
	pool->capacity = 100;
	pool->objs = panicking_malloc(sizeof(void*) * pool->capacity);
	pool->next = NULL;
	if (autopool != NULL) autopool->next = pool;
	pool->prev = autopool;
	autopool = pool;
	return pool;
}

void cox_autoreleasepool_destroy(cox_autoreleasepool_t pool) {
	if (pool->next != NULL) cox_autoreleasepool_destroy(pool->next);
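	for (unsigned int i = 0; i < pool->count; i++) cox_release(pool->objs[i]);
	free(pool->objs);
	if (pool->prev != NULL) pool->prev->next = NULL; /* unlink, so the parent never destroys freed memory */
	autopool = pool->prev;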
	free(pool);
}

void* cox_autorelease(void* obj) {
	if (autopool == NULL) fprintf(stderr, "no autorelease pool in place, leaking memory\n");
	else {
		if (autopool->count == autopool->capacity) {
			autopool->capacity *= 2;
			autopool->objs = panicking_realloc(autopool->objs, sizeof(void*) * autopool->capacity);
		}
		autopool->objs[autopool->count++] = obj;
	}
	return obj;
}

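Thus, to return an object from a function, we autorelease it first. The caller may use the result until the current pool is destroyed, and must take its own reference to keep it any longer: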

cox_my_object_t my_function() {
	cox_my_object_t obj = ...;
	...
	return cox_autorelease(obj);  
}
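Putting the pieces together, here is a self-contained sketch with a hypothetical `make_object` that returns an autoreleased object and a toy `run_loop` that wraps each iteration in its own pool, the way a runtime's event loop typically would. The pool functions follow the code above, with destroy also clearing the parent's next pointer; `checked` is an illustrative stand-in for the panicking allocators, and `finalised` only counts finaliser calls:

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct cox_type {
	void (*finaliser)(void*);
};

struct cox_base {
	struct cox_type* type;
	atomic_int refcount;
};

/* Illustrative object type: the finaliser just counts how often it runs. */
static int finalised = 0;

static void counted_destroy(void* obj) {
	finalised++;
	free(obj);
}

static struct cox_type counted_type = { .finaliser = &counted_destroy };

/* Stand-in for panicking_malloc/panicking_realloc: abort on failure. */
static void* checked(void* p) {
	if (p == NULL) abort();
	return p;
}

void cox_release(void* obj) {
	struct cox_base* base = obj;
	if (atomic_fetch_add(&base->refcount, -1) == 1) base->type->finaliser(obj);
}

struct cox_autoreleasepool {
	void** objs;
	unsigned int count;
	unsigned int capacity;
	struct cox_autoreleasepool* next;
	struct cox_autoreleasepool* prev;
};
typedef struct cox_autoreleasepool* cox_autoreleasepool_t;

static _Thread_local cox_autoreleasepool_t autopool = NULL;

cox_autoreleasepool_t cox_autoreleasepool_create(void) {
	cox_autoreleasepool_t pool = checked(malloc(sizeof *pool));
	pool->count = 0;
	pool->capacity = 4; /* small, so the growth path gets exercised */
	pool->objs = checked(malloc(sizeof(void*) * pool->capacity));
	pool->next = NULL;
	pool->prev = autopool;
	if (autopool != NULL) autopool->next = pool;
	autopool = pool;
	return pool;
}

void cox_autoreleasepool_destroy(cox_autoreleasepool_t pool) {
	if (pool->next != NULL) cox_autoreleasepool_destroy(pool->next); /* inner pools first */
	for (unsigned int i = 0; i < pool->count; i++) cox_release(pool->objs[i]);
	free(pool->objs);
	if (pool->prev != NULL) pool->prev->next = NULL; /* unlink from the parent */
	autopool = pool->prev;
	free(pool);
}

void* cox_autorelease(void* obj) {
	if (autopool == NULL) {
		fprintf(stderr, "no autorelease pool in place, leaking memory\n");
		return obj;
	}
	if (autopool->count == autopool->capacity) {
		autopool->capacity *= 2;
		autopool->objs = checked(realloc(autopool->objs, sizeof(void*) * autopool->capacity));
	}
	autopool->objs[autopool->count++] = obj;
	return obj;
}

/* Hypothetical function: returns an object the caller does not own. */
void* make_object(void) {
	struct cox_base* obj = checked(malloc(sizeof *obj));
	obj->type = &counted_type;
	atomic_init(&obj->refcount, 1);
	return cox_autorelease(obj);
}

/* Toy event loop: every iteration runs inside its own pool. */
void run_loop(int iterations, int objects_per_iteration) {
	for (int i = 0; i < iterations; i++) {
		cox_autoreleasepool_t pool = cox_autoreleasepool_create();
		for (int j = 0; j < objects_per_iteration; j++) make_object();
		cox_autoreleasepool_destroy(pool);
	}
}
```

Objects created during an iteration live exactly until that iteration's pool is drained; anything that has to outlive it must be retained by its new owner.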

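Reference counting along these lines is used, for example, in libraries such as CoreFoundation and Grand Central Dispatch, in languages such as Objective-C and Swift, and in runtimes such as the Windows Runtime (WinRT); autorelease pools in particular are a long-standing part of Objective-C and Apple's Foundation framework.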
