First impressions of Emscripten
Thursday, June 6, 2024
I have recently been porting an SDL application to the web. It has been surprisingly painless, and I am very impressed by how easy it is to run arbitrary applications in the web browser nowadays.
Emscripten is basically a C and C++ compiler plus a POSIX runtime environment for the web. It uses LLVM (and Clang) internally, which has a WebAssembly backend.
Getting C or C++ programs to run in the browser is as simple as running emcc. And with emmake, Emscripten can take makefiles and get them to compile for the web, sometimes without any modification.
One thing I noticed is that by default Emscripten will generate the web application in two files: an HTML shell page and a WebAssembly file containing all of your real code. Due to most browsers’ security policies, you can’t open the application directly from your local filesystem, since HTML pages shouldn’t be able to access other files on your computer. One way to work around this is to run a local HTTP server (e.g. python3 -m http.server), but I have found that a simpler solution is to pass the -sSINGLE_FILE=1 flag to emcc in the linking step.
Overall, for SDL applications, things work fairly smoothly. I am targeting OpenGL ES 3.0 on desktop because it closely corresponds to WebGL 2.0 (which is supported by Emscripten), and so far I have not had to deal with any differences between them.
LLVM supports WebAssembly very well. C++ exceptions work like normal, and you can even use longjmp.
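For reference, here is a minimal sketch of the kind of setjmp/longjmp code this refers to. It is plain standard C with nothing Emscripten-specific in it, and the names run_with_recovery and deep_work are made up for the example.

```c
#include <setjmp.h>

/* Where longjmp() will return to. */
static jmp_buf recovery_point;

/* Hypothetical helper: bails out of the call stack when asked to fail. */
static void deep_work(int fail) {
    if (fail) {
        longjmp(recovery_point, 1);
    }
}

/* Returns 0 on the normal path and -1 if deep_work() jumped back. */
int run_with_recovery(int fail) {
    if (setjmp(recovery_point) != 0) {
        return -1; /* we arrived here via longjmp() */
    }
    deep_work(fail);
    return 0;
}
```

Calling run_with_recovery(1) takes the longjmp path and returns -1, while run_with_recovery(0) returns 0.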
I haven’t looked much into how WebAssembly works under the hood or how it differs from “real” machine languages, but one interesting detail I learned about is how C function pointers are implemented. Bear in mind that I’m not a WebAssembly expert, so I could be mistaken about some of this.
On x86 (and all other “real” machine languages I am familiar with), function pointers are no different from ordinary pointers: they just represent a memory address that the CPU can read/write or execute. On WebAssembly, at least in the convention used by LLVM, function pointers are indices in lookup tables. The runtime then uses a set of “indirect call” functions, one for each distinct function signature, that contain these lookup tables.
Implementing function pointers this way allows for better type safety, which is probably a good idea if you’re running random code off the internet.
As a consequence of these unusual function pointers, we cannot “downcast” them, which is something a lot of software relies on. For example, in GLib and GTK, we use function pointers to register signal handlers. We can respond to a mouse click event like so:
    #include <gtk/gtk.h>
    #include <stdio.h>

    void show_message(void) {
        printf("Clicked!\n");
    }

    void handle_button_clicked(GtkButton *button, gpointer user_data) {
        show_message();
    }

    int main(int argc, char **argv) {
        // ...
        // The last argument is the user_data pointer passed to the handler.
        g_signal_connect(button, "clicked", G_CALLBACK(handle_button_clicked), NULL);
        // ...
    }
The G_CALLBACK macro can’t validate the type of handle_button_clicked because different events have different event handler signatures.
Out of convenience, we might want to skip the handle_button_clicked function and just directly connect the event to show_message. E.g.:
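    #include <gtk/gtk.h>
    #include <stdio.h>

    void show_message(void) {
        printf("Clicked!\n");
    }

    int main(int argc, char **argv) {
        // ...
        // show_message takes no arguments, but it will be called with two.
        g_signal_connect(button, "clicked", G_CALLBACK(show_message), NULL);
        // ...
    }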
On x86, this works because the arguments are passed in rdi and rsi (assuming the x86-64 System V calling convention), and these values are safely ignored by show_message. No problem.
On WebAssembly, however, we would be calling a different “call indirect” function because the signature of show_message is different from the signature of the function pointer that gets called. As a result, we might call some random function whose function pointer index happens to be the same as that of show_message, or we might go out of bounds of the lookup table.
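To make that failure mode concrete, here is a toy model in plain C of the scheme described above: one lookup table per signature, with a "function pointer" being nothing more than an index. This is only a sketch of the description in this post, not how Emscripten's runtime is actually written, and every name in it (void_table, click_table, call_indirect_void, call_indirect_click, SHOW_MESSAGE_INDEX) is hypothetical. The real WebAssembly call_indirect instruction also checks the callee's signature at run time and traps on a mismatch, so an actual build may stop with a trap rather than silently run the wrong function.

```c
#include <stddef.h>

/* Toy model only: one lookup table per function signature, where a
 * "function pointer" is just an index into the table for its signature. */
typedef void (*void_fn)(void);
typedef void (*click_fn)(void *button, void *user_data);

/* Counters stand in for visible side effects such as printf(). */
int show_message_calls = 0;
int unrelated_calls = 0;

static void show_message(void) { show_message_calls++; }

static void unrelated_click_handler(void *button, void *user_data) {
    (void)button;
    (void)user_data;
    unrelated_calls++;
}

/* Slot 0 stays empty so that index 0 can play the role of a null pointer. */
static const void_fn void_table[] = { NULL, show_message };
static const click_fn click_table[] = { NULL, unrelated_click_handler };

/* The "function pointer" value of show_message in this model. */
enum { SHOW_MESSAGE_INDEX = 1 };

/* Indirect call for signature void(void); returns -1 instead of trapping. */
int call_indirect_void(size_t index) {
    if (index >= sizeof void_table / sizeof void_table[0] ||
        void_table[index] == NULL) {
        return -1;
    }
    void_table[index]();
    return 0;
}

/* Indirect call for signature void(void *, void *). */
int call_indirect_click(size_t index, void *button, void *user_data) {
    if (index >= sizeof click_table / sizeof click_table[0] ||
        click_table[index] == NULL) {
        return -1;
    }
    click_table[index](button, user_data);
    return 0;
}
```

In this model, call_indirect_void(SHOW_MESSAGE_INDEX) runs show_message as intended. Passing the same index to call_indirect_click, which is what the GTK example effectively does, runs unrelated_click_handler instead, and an index past the end of click_table is reported as an error.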
As you could probably guess, calling a function pointer using the wrong signature is undefined behavior, according to the ISO C standards. I don’t think compilers generally have a problem with this, though—at least until you port your program to WebAssembly.
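The portable way out is the wrapper from the first GTK example: give the callback exactly the signature it will be called with. Below is a GTK-free sketch of that pattern. The names clicked_handler, emit_clicked and show_message_adapter are hypothetical stand-ins for the real signal machinery, and a counter replaces printf so the effect is easy to check.

```c
#include <stddef.h>

/* The signature the caller will actually use when invoking the handler. */
typedef void (*clicked_handler)(void *button, void *user_data);

int message_count = 0;

/* The function we really want to run; it takes no arguments. */
void show_message(void) { message_count++; }

/* Adapter with the exact expected signature; it drops both arguments. */
void show_message_adapter(void *button, void *user_data) {
    (void)button;
    (void)user_data;
    show_message();
}

/* Hypothetical stand-in for emitting a "clicked" signal. */
void emit_clicked(clicked_handler handler, void *button, void *user_data) {
    if (handler != NULL) {
        handler(button, user_data);
    }
}
```

Registering show_message_adapter instead of a cast show_message costs one extra function, but the call then has defined behavior on every target, WebAssembly included.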