chood.net/blog/

First impressions of Emscripten

Thursday, June 6, 2024

I have recently been porting an SDL application to the web. It has been surprisingly painless, and I am very impressed by how easy it is to run arbitrary applications in the web browser nowadays.

Emscripten is basically a C and C++ compiler plus a POSIX runtime environment for the web. It uses Clang and LLVM internally, and LLVM has a WebAssembly backend.

Getting a C or C++ program to run in the browser is as simple as compiling it with emcc. And with emmake, which runs make with the compiler variables pointed at Emscripten's tools, existing makefiles can build for the web, sometimes without any modification.

One thing I noticed is that, by default, Emscripten splits the web application into separate files: an HTML shell page, a JavaScript loader, and a WebAssembly file containing all of your real code. Because of most browsers' security policies, you can't open the application directly from your local filesystem: a page loaded from disk isn't allowed to fetch other files on your computer. One way to work around this is to run a local HTTP server (e.g. python3 -m http.server), but I have found that a simpler solution is to pass the -sSINGLE_FILE=1 flag to emcc in the linking step, so that everything ends up embedded in one file.

Overall, for SDL applications, things work fairly smoothly. I am targeting OpenGL ES 3.0 on desktop because it closely corresponds to WebGL 2.0 (which is supported by Emscripten), and so far I have not had to deal with any differences between them.

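LLVM supports WebAssembly very well. C++ exceptions work like normal once enabled (Emscripten disables exception catching by default, so you need to build with -fexceptions or -fwasm-exceptions), and you can even use longjmp.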

I haven’t looked much into how WebAssembly works under the hood or how it differs from “real” machine languages, but one interesting detail I learned about is how C function pointers are implemented. Bear in mind that I’m not a WebAssembly expert, so I could be mistaken about some of this.

On x86 (and all other “real” machine languages I am familiar with), function pointers are no different from ordinary pointers: they just represent a memory address that the CPU can read, write, or jump to. On WebAssembly, at least in the convention used by LLVM, function pointers are indices into a table of functions. An indirect call goes through the call_indirect instruction, which also names the function signature the caller expects, and the runtime checks that the function stored at that index really has that signature before calling it.

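Implementing function pointers this way allows for better type safety: an indirect call can only land at the start of a real function, and only one with the expected signature. That is probably a good idea if you're running random code off the internet.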

As a consequence of these unusual function pointers, we cannot cast them to a different signature and call them, which a lot of software relies on. For example, in GLib and GTK, we use function pointers to register signal handlers. We can respond to a mouse click event like so:

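#include <gtk/gtk.h>
#include <stdio.h>

void handle_button_clicked(GtkButton *button, gpointer user_data);
void show_message(void);

int
main(int argc, char **argv)
{
	// ...
	g_signal_connect(button, "clicked", G_CALLBACK(handle_button_clicked), NULL);
	// ...
}

void
handle_button_clicked(GtkButton *button, gpointer user_data)
{
	// Matches the "clicked" handler signature; the arguments are unused here.
	show_message();
}

void
show_message(void)
{
	printf("Clicked!\n");
}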

The G_CALLBACK macro can't validate the type of handle_button_clicked: it simply casts its argument to GCallback, a generic void (*)(void) function pointer, because different signals expect handlers with different signatures.

For convenience, we might want to skip the handle_button_clicked function and connect the event directly to show_message, e.g.:

void show_message(void);

int
main(int argc, char **argv)
{
	// ...
	g_signal_connect(button, "clicked", G_CALLBACK(show_message), NULL);
	// ...
}

void
show_message(void)
{
	printf("Clicked!\n");
}

On x86-64, this works: assuming the System V calling convention, the two arguments are passed in the rdi and rsi registers, and show_message simply never reads them. No problem.

On WebAssembly, however, GLib makes the indirect call with the signature of the handler type it expects, which takes arguments, while show_message takes none. The runtime's signature check fails, so instead of the extra arguments being silently ignored, the program traps with a runtime error.

As you could probably guess, calling a function through a pointer of the wrong type is undefined behavior according to the ISO C standard. I don't think native compilers generally have a problem with this in practice, though, at least until you port your program to WebAssembly.