It’s been more than a decade since I worked in games, but for my entire games career, any and all use of C++ exceptions was strictly disallowed due to other horror stories. I was still bitten in a very similar way by someone’s C++ copy constructor trickery - a crash that only happened in a release build after playing the game for a while, with a stack corruption. Like the author, for me this was one of the hardest bugs I ever had to track down, and I ended up writing a tiny release-mode debugger that logged call stacks in order to do it. Once I was able to catch the corruption (after several days of debugging during a crunch weekend), someone on my team noticed the stomped values looked like floating point numbers, and pretty quickly we figured out it was coming from the matrix class trying to be too clever with its reference counting, IIRC. There’d been a team of around a dozen people trying to track this down during overtime, so it suddenly hit me once we fixed it that someone’s cute idea that took maybe 10 seconds to write cost several tens of thousands of dollars to fix.
intelVISA 18 days ago [-]
Clever code is always expensive: either you're paying for somebody smart to work at their cognitive peak, which is less productive for them than 'simpler' code, or more likely you'll instead pay multiples more down the line for someone's hubris.
I think this is the rare direction where more langs should follow Rust, in that 'clever' code can be more easily quarantined for scrutiny via unsafe.
hinkley 18 days ago [-]
The time when people see this the clearest is during an outage. You’re in a stressful situation and it’s being made worse and longer by clever code. Do you see now why maybe clever code isn’t a good idea?
Cortisol actually reduces the effectiveness of working memory. Write your code like someone looking at it is already having a bad day, because not only are odds very good they will be, but that’s the costliest time for them to be looking at it. Probability x consequences.
ziml77 18 days ago [-]
This could still happen without exceptions though, right? The flow is more explicit without exceptions, but returning an error code up the stack would have the same effect of causing the memory that select() is referencing to possibly be used for a different purpose when it writes its result.
mark_undoio 18 days ago [-]
My reading is that the problem was specifically that they were injecting an exception to recover control from the C library back to their code.
It seems like the select() was within its rights to have passed a stack allocated buffer to be written asynchronously by the kernel since it, presumably, knew it couldn't encounter any exceptions. But injecting one has broken that assumption.
If the select() implementation had returned normally with an error, or had been written expecting exceptions, then I'd assume this wouldn't have happened.
ryao 17 days ago [-]
According to the documentation, the wait is terminated by APC calls, so there is no need to trigger an exception in the APC call to return control to C++:
If they had implemented this to use a C++ exception to return control to C++, they should have encountered this stack corruption issue immediately upon implementing it, rather than have a customer at some point in the future hit it before they did.
My read of this is that they had the callback function do something and someone eventually got it to throw an exception. This is undefined behavior because there is no correct way to unwind a C stack frame. That is not obvious, though, especially if you test it: if the C function does nothing special and needs no special cleanup, everything appears to be fine. However, WaitForSingleObjectEx() does something extremely special, and skipping that by unwinding the stack bit them.
I filed bugs against both GCC and LLVM requesting warnings to protect people from doing this:
There are no error returns from an APC? The return type is void and the system expects the routine (whatever it is) to return: https://learn.microsoft.com/en-us/windows/win32/api/winnt/nc... - whichever call put the process in the alertable wait state then ends up returning early. This is a little bit like the Win32 analogue of POSIX signal handlers and EINTR, I suppose.
ziml77 17 days ago [-]
Oh I see my confusion. I misunderstood how the APC fit in. Now it makes sense why specifically an exception was problematic.
hinkley 18 days ago [-]
I appreciate the notion of Grug Brained development but at the end of the day it’s just a slightly sardonic restatement of Kernighan’s Law, which is easier to get people to buy into in negotiations.
Grug brain is maybe good for 1:1 interactions or over coffee with the people you vent to or are vented to.
bjornsing 18 days ago [-]
Back in the day I used to consult for Sony Ericsson. We had like 5000 engineers writing C code that ran as a single executable in a single address space(!). Memory corruption was rampant. So rampant in fact that when we finally got an MMU it took months before we could turn it on in release builds, because there were so many memory corruption bugs even in the released product. The software just wouldn’t work unless it could overwrite memory here and there.
ryao 17 days ago [-]
The Linux kernel has thousands of engineers writing C code that runs as a single executable in a single address space. Memory corruption is not rampant, since they are careful to write good code, although it can still happen. I am going to guess that Sony Ericsson had a ton of use after free bugs and array out of bounds bugs, as that is the main way that developers can step on each others' code when it shares a process space.
Speaking of which, I once did consulting for a company that insisted that they had hit a bug in ZFS because there were backtraces showing ZFS. Using KASAN, I found that the bug was in fact in another third party kernel module that they used that appeared to be the result of a hobbyist project on github. It had a string function write 1 past the end of a memory allocation, which often would corrupt memory allocations used in ZFS' AVL trees. ZFS' AVL tree code did not like this, so ZFS appeared in backtraces.
That said, the bug described by the Unity blog is really an undefined behavior bug rather than a memory corruption bug. Throwing an exception in a C++ callback function called by a C function is undefined, since C does not support stack unwinding. If the C function does not need any special cleanup when the stack is unwound, it will probably work fine, but in this particular case the function needed very special cleanup, and invoking undefined behavior bit them.
bjornsing 17 days ago [-]
> The Linux kernel has thousands of engineers writing C code that runs as a single executable in a single address space. Memory corruption is not rampant, since they are careful to write good code, although it can still happen.
Sure, but I’ve rarely seen quality comparable to the Linux kernel in a commercial context.
> I am going to guess that Sony Ericsson had a ton of use after free bugs and array out of bounds bugs, as that is the main way that developers can step on each others' code when it shares a process space.
Mostly yes. But there were some interesting ones too. I debugged one that turned out to be a stack overflow, for example. There was basically this giant function with ridiculously many local variables. When that function called another function the stack pointer was bumped far out of the allotted stack space and the called function overwrote stuff when writing to local variables.
farmdve 18 days ago [-]
I remember back in the day of the Sony Satio U1, the last Symbian v5 phone, the software was horrendous (screen tearing, random OS freezes) and later, the phone was abandoned. I think it was afterwards that Sony and Ericsson split?
> What is despair? I have known it—hear my song. Despair is when you’re debugging a kernel driver and you look at a memory dump and you see that a pointer has a value of 7. THERE IS NO HARDWARE ARCHITECTURE THAT IS ALIGNED ON 7. Furthermore, 7 IS TOO SMALL AND ONLY EVIL CODE WOULD TRY TO ACCESS SMALL NUMBER MEMORY. Misaligned, small-number memory accesses have stolen decades from my life.
I don’t know if it’s still a thing, but there used to be debugging tools that would put a page of memory, marked either read-only or unreadable, in front of every malloc call, so that any pointer arithmetic with a math error would trigger a page fault which could be debugged. It worked in apps that didn’t use too much of total memory or too many fine-grained allocations. Obviously, turning every 8-byte allocation into a whole memory page could consume all of memory very quickly. But in front of arrays or large data structures it could work.
saagarjha 17 days ago [-]
In this case the write bypassed page protections
dwattttt 17 days ago [-]
It shouldn't bypass page protections, that would be a kernel bug. And quite a bit harder to achieve too, since the kernel would still be using the same virtual address mapping as user space there.
saagarjha 17 days ago [-]
They made the pages read-only for themselves; the kernel has the ability to write through that.
dwattttt 17 days ago [-]
Sure, but only by either changing the page permissions on the page that virtual address is on, or by remapping it elsewhere as writable; both of those are heavy handed operations, and neither would be in the "report the result of an IO operation" path.
The user mapping may be read only, but the kernel will likely use other writable mappings to the same page.
Linux for example maintains a big writable linear mapping of all RAM at all times (on 64-bit), you can corrupt read-only user pages through it all day and never fault. Code running in the kernel generally uses virtual addresses from that mapping.
dwattttt 16 days ago [-]
By policy, asking the Windows kernel to write to an address happens within an SEH try/catch, and a failure will result in the kernel returning an error code.
The kernel has no reason to be trying to bypass page protections; what if you asked it to write to a shared read-only page?
jcalvinowens 16 days ago [-]
Most kernel code will be dealing with a different set of virtual addresses, which map the entire address space read/write. Those addresses necessarily alias the physical pages backing any userspace mapping, and thus allow corrupting them e.g. via buffer overflows beyond page boundaries.
The try-catch dance you're describing is only necessary for accessing the userspace mapping (because it might fault). Kernel code which dereferences kernel pointers doesn't do that.
Pages don't get remapped into userspace: userspace gets an additional aliased mapping with restricted permissions. The kernel's writable mapping still exists. There's nothing to "bypass": whatever permissions userspace applies to its mapping of the same physical page have no effect on the kernel mapping.
dwattttt 16 days ago [-]
I fear we may have gotten away from the Windows kernel writing an output value asynchronously to a user space address on IO completion.
lupire 18 days ago [-]
I don't understand. Pointers aren't numbers, and can only be compared when inside a common array. What is small number memory?
:-)
MobiusHorizons 17 days ago [-]
I realize you are probably referring to UB in c/c++, but of course in hardware memory addresses are numbers. And when debugging, it’s really the hardware version of events that matters, since the compiler has already done whatever optimizations it wants.
ryao 17 days ago [-]
Pointers are numbers representing memory addresses. This is very obvious if you look at the definition of NULL in C. It is:
#define NULL ((void *)0)
As of C99, C also has uintptr_t, which lets you treat pointers as integers.
moomin 17 days ago [-]
I mean, that’s horribly misleading. There’s no guarantee that the null pointer is actually represented as an all-zero bit pattern. (Although I’m pretty sure it is on Intel and ARM.)
jart 18 days ago [-]
This kind of error is a rite of passage with WIN32 programming. For example, to do nontrivial i/o on Windows you have to create an OVERLAPPED object and give it to ReadFile() and WriteFile(), which will return a pending status code and write back to your OVERLAPPED object once the i/o has completed. Usually it makes the most sense to put that object on the stack. So if you return from your function without making sure WIN32 is done with that object, you're going to end up with bugs like this one. You have to call GetOverlappedResult() to do that. That means no throwing or returning until you do. Even if you call CancelIoEx() beforehand, you still need to call the result function. When you mix all that up with your WaitForMultipleObjects() call, it ends up being a whole lot of if statements you could easily get wrong if the ritual isn't burned into your brain.
UNIX system calls never do this. The kernel won't keep references to pointers you pass them and write to them later. It just isn't in the DNA. The only exceptions I can think of would be clone(), which is abstracted by the POSIX threads runtime, and Windows-inspired non-standard event i/o system calls like epoll.
dataflow 18 days ago [-]
> UNIX system calls never do this. The kernel won't keep references to pointers you pass them and write to them later. It just isn't in the DNA. The only exceptions I can think of would be clone(), which is abstracted by the POSIX threads runtime, and Windows-inspired non-standard system calls like epoll.
I mean, this is because the UNIX model was based on readiness rather than completion. Which is slower. Hence the newer I/O models.
cryptonector 17 days ago [-]
There are evented I/O APIs for Unix, though anything other than select(2) and poll(2) is non-standard, and while some do let you use pointer-sized user cookies to identify the events I've never seen a case where a programmer used a stack address as a cookie. I have seen cases where the address used as the cookie was freed before the event registration was deleted, or before it fired, leading to use-after-free bugs.
jart 18 days ago [-]
System calls take 10x longer on Windows than they do on UNIX.
That's what i/o completion ports are working around.
They solve a problem UNIX doesn't have.
dataflow 17 days ago [-]
No, IOCP is not a workaround for syscall overhead. In fact this performance hit has nothing to do with syscall overhead. The overhead is O(1). The penalty of the readiness design is O(n). Because if the system can't copy into your n-byte buffer in the background, then you gotta block for O(n) time to memcpy it yourself.
dataflow 16 days ago [-]
What is the downvote here supposed to mean? Do people think this is wrong, or just not like it?
saagarjha 17 days ago [-]
You do realize io_uring exists for a reason, right?
That said, it would be insane to pass it pointers to the stack, as the only safe way to make that work would be to do `aio_suspend()` or busy wait and you might as well just use a synchronous read() function in that case.
dataflow 17 days ago [-]
> as the only safe way to make that work would be to do `aio_suspend()` or busy wait and you might as well just use a synchronous read() function in that case.
Wait but read() wouldn't allow overlapping operations. Whereas if you suspend or busy wait you can do that for multiple operations executing concurrently.
Also if the buffer is from a caller frame then you could also return safely no?
alexvitkov 18 days ago [-]
> The project was quite big (although far from the largest ones); it took 40 minutes to build on my machine.
A bit tangential, but I've been crying about the insane Unity project build times for years now, and about how they've taken zero steps to fix them and are instead trying their hardest to sell you cloud builds. Glad to see them having to suffer through what they're inflicting on us for once!
Regardless, very good writeup, and yet another reason to never ever under any conditions use exceptions.
yard2010 18 days ago [-]
This poor human being doesn't deserve to pay the price for the shitty middle management actions though :(
voidUpdate 17 days ago [-]
hey, at least it's not Unreal, where when you build it sits there compiling tens of thousands of shaders
noitpmeder 18 days ago [-]
Would a ccache or similar help alleviate the pain?
rectang 18 days ago [-]
After trying and failing over several days to track down a squirrely segfault in a C project about 15 years ago, I taught myself Valgrind in order to debug the issue.
Valgrind flagged an "invalid write", which I eventually hunted down as a fencepost error in a dependency which overwrote their allocated stack array by one byte. I recall that it wrote "1" rather than "2", though, haha.
> Lesson learnt, folks: do not throw exceptions out of asynchronous procedures if you’re inside a system call!
The author's debugging skills are impressive and significantly better than mine, but I find this an unsatisfying takeaway. I yearn for a systemic approach to either prevent such issues altogether or to make them less difficult to troubleshoot. The general solution is to move away from C/C++ to memory safe languages whenever possible, but such choices are of course not always realistic.
With my project, I started running most of the test suite under Valgrind periodically. That took half an hour to finish rather than a few seconds, but it caught many similar memory corruption issues over the next few years.
ryao 17 days ago [-]
Switching to memory safe languages would not necessarily prevent this issue because this is an undefined behavior issue, not a memory safety issue.
Pointers to C++ functions that can throw exceptions should not be passed to C functions as callback pointers. Executing the exception in the callback context is undefined behavior, since C does not support stack unwinding.
Presumably, any language that has exception handling would have an issue on Windows when doing select() in one thread and QueueUserAPC() in another thread to interrupt it with a callback function that throws an exception. What happens then depends on how the stack unwinding works.
No programming language can avoid this because they need to use the underlying operating system's system calls in order to function. They also need to use the C functions in ntdll to do system calls since Microsoft does not support doing system calls outside of calling the corresponding functions in ntdll and friends.
pjmlp 18 days ago [-]
Similar experience: spending one week debugging memory corruption issues in production back in 2000, with customer service pinging our team every couple of hours due to it affecting a high profile customer, was my lesson.
ch33zer 18 days ago [-]
Doesn't C++ already support everything you need here? It supports the noexcept keyword which should have been used in the interface to this syscall. That would have prevented throwing callbacks from being used at compile time. My guess is that this is a much older syscall than noexcept though.
masspro 17 days ago [-]
noexcept doesn’t prevent any throws at compile-time, it basically just wraps the function in a `catch(...)` block that will call std::terminate, like a failed assert. IMHO it is a stupid feature for this very confusion.
ch33zer 17 days ago [-]
This was true until c++17. It was changed in 17 to make noexcept part of the function type meaning a noexcept(false) function can't be used in a context where a noexcept is needed as they're unrelated types. I don't know if compilers actually implement this but according to the standard it should be usable.
Yes this helps specifically when passing functions as pointers or something like std::function (edit: or overriding methods), it will at least inform the developer that they need to add noexcept to the function declaration if they want to use it there, and hopefully due to that they recursively audit the function body and anything it calls for exceptions. And hopefully all future developers also notice the noexcept and keep up the practice. But it changes nothing about checking plain function calls. So I think adding this to the function type helps some cases but still does not move noexcept toward the behavior most people want/expect.
This just feels important to point out because this feature is 15 years old and still commonly misunderstood, and each time people are wanting the same thing (actual compile-time prevention of `throw`) which it is not.
Edit: OK I finally just went and tried it on godbolt.org. C++17 GCC, Clang, and MSVC all give 1 warning on this code for `bar` and that's all.
I think the PAPCFUNC type needs to have noexcept. Wrapping a function typedef in extern "C" does not make it imply noexcept IIRC.
It would also help if the APC docs documented that APCs must not throw.
ryao 17 days ago [-]
It is a C interface. It is implicitly noexcept. I filed bugs against both GCC and LLVM requesting warnings when someone passes a non-noexcept C++ function pointer to either a C function or to a C++ noexcept function:
Perhaps the C++ standards committee should specify that doing this should cause a compiler failure, rather than a warning.
cryptonector 17 days ago [-]
Thanks for checking, clarifying, and filing those bugs!
ch33zer 17 days ago [-]
This would actually be a nice change (but probably very breaking) for c interfaces called from a c++ context to be implicitly noexcept.
saagarjha 17 days ago [-]
Valgrind is neat but it wouldn’t help here, unfortunately
jart 18 days ago [-]
No, the solution isn't to rewrite it in Rust. The solution is to have the option of compiling your C/C++ program with memory safety whenever things go loopy. ASAN, MSAN, and UBSAN are one great way to do that. Another up-and-coming solution that promises even more memory safety is Fil-C, which is being made by Epic Games. https://github.com/pizlonator/llvm-project-deluge/blob/delug...
AlotOfReading 18 days ago [-]
Ubsan is fantastic, but ASAN and the rest have serious caveats. They're not suitable for production use and they have a tendency to break in mysterious, intermittent ways. For example, Ubuntu 24.04 unknowingly broke Clang <=15ish when it increased mmap_rnd_bits. ASAN on Windows will actually check if you have ASLR enabled, disable it, and restart at entry. They interact in fun ways with LD_PRELOAD too.
james_promoted 17 days ago [-]
I'm on Clang 19 and still have a bunch of those sysctl commands sitting around.
AlotOfReading 17 days ago [-]
I'm not in a position to look up exactly when it was merged, but I'm pretty confident that shouldn't be needed anymore. The entry point on 19 should do the same restart juggling it does on Windows if the environment isn't correct for some other reason. I can double check later if you want to provide details.
I encountered the issue when our (not Ubuntu, not 24.04) LTS upstream backported security fixes that included the mmap changes without updating universe to include a clang version with the fixes. Any developers diligent enough to update and run sanitisers locally started seeing intermittent crashes.
kelnos 18 days ago [-]
The solution is usually not to do a rewrite, but I think for greenfield projects we should stop using C or C++ unless there is a compelling reason to do so. Memory-safe systems languages are available today; IMO it's professionally irresponsible to not use them, without a good reason.
MSAN, ASAN, and UBSAN are great tools that have saved me a lot of time and headaches, but they don't catch everything that the compiler of a memory safe language can, at least not today.
jart 18 days ago [-]
Rust isn't standardized. Last time I checked, everyone who uses it depends on its nightly build. Their toolchain is enormous and isn't vendorable. The binaries it builds are quite large. Programs take a very long time to compile. You need to depend on web servers to do your development and use lots of third party libraries maintained by people you've never heard of, because Rust adopted NodeJS' quadratic dependency model. Choosing Rust will greatly limit your audience if you're doing an open source project, since your users need to install Rust to build your program, and there are many platforms Rust doesn't support.
Rust programs use unsafe a lot in practice. One of the greatest difficulties I've had in supporting Rust with Cosmopolitan Libc is that Rust libraries all try to be clever by using raw assembly system calls rather than using libc. So our Rust binaries will break mysteriously when I run them on other OSes. Everyone who does AI or scientific computing with Rust, if you profile their programs, I guarantee you 99% of the time it's going to be inside C/C++ code. If better C/C++ tools can give us memory safety, then how much difference does it really make if it's baked into the language syntax. Rust can't prove everything at compile time.
Some of the Rust programs I've used like Alacritty will runtime panic all the time. Because the language doesn't actually save you. What saves you is smart people spending 5+ years hammering out all the bugs. That's why old tools we depend on every day like GNU programs never crash and their memory bugs are rare enough to be newsworthy. The Rust community has a reputation for toxic behavior that raises questions about the reliability of its governance. Rust evangelizes its ideas by attacking other languages and socially ostracizing the developers who use them. Software development is the process of manipulating memory, so do you really want to be relinquishing control over your memory to these kinds of people?
fc417fc802 17 days ago [-]
> Not standardized.
> Dependency soup.
Exactly why I don't use it. I don't really feel like including the source for the entire toolchain as part of my project and building it all myself. At least if I write standards conforming C++ there are multiple compiler implementations that can all handle it. I also have a reasonable expectation that a few decades from now I will be able to `apt get somecompiler` and the code will still just work (aside from any API changes at the OS level, for which compatibility shims will almost certainly exist).
If I can't build something starting from a repo in a network isolated environment then I want absolutely nothing to do with it. (Emscripten I am looking at you. I will not be downloading sketchy binary blobs from cloud storage to "build from source" that is not a source build that is binary distribution you liars.)
saagarjha 17 days ago [-]
> One of the greatest difficulties I've had in supporting Rust with Cosmopolitan Libc is that Rust libraries all try to be clever by using raw assembly system calls rather than using libc.
I’m sorry, this is coming from Justine “the magic syscall numbers are my god given right to use” Tunney?
jart 17 days ago [-]
It's the stated mission of my project. Rust's stated mission is memory safety.
So using raw system calls is a distraction for them.
fc417fc802 17 days ago [-]
Seems like it depends entirely on context. I'd expect code which intends to be portable to use some sort of dynamically linked wrapper, even if that wrapper isn't libc.
vvanders 17 days ago [-]
You may want to refresh your familiarity with Rust, I haven't touched nightly in ages and much of what you mention doesn't really resonate with what I've seen in practice. Not saying the language doesn't have issues and things that aren't frustrating but in my experience unless you're going to go to the nines in testing/validation/etc (which is the first thing that's cut when schedules/etc are in peril) I've seen Rust code scale better than C++ ever did.
More tools in the C/C++ realm are always welcome but I've yet to see more than 50% of projects I've worked on be able to successfully use ASAN(assuming you've got the time to burn to configure them and all their dependencies properly). I've used ASAN, CBMC and other tools to good effect but find Rust more productive overall.
ryao 17 days ago [-]
That would not catch this. This is technically undefined behavior, but UBSAN does not have a check for this:
The problem boils down to usage of stack memory after the memory is given to somebody else.
musjleman 18 days ago [-]
> The problem boils down to usage of stack memory after the memory is given to somebody else.
While this isn't incorrect in this case, the problem seems to be caused by stack unwinding without the consent of lower frames, rather than a willful bug where the callee forgets about the ownership.
layer8 18 days ago [-]
Yes, it’s the consequence of throwing exceptions through exception-unaware code, which is a problem when said code needs to perform some cleanup logic before returning, like releasing resources.
bialpio 17 days ago [-]
WDYM? The root cause is "you passed ownership of stack-based memory to the kernel and didn't ensure it was valid when it called you back", why would "consent of lower frames" matter here? Exceptions (where lower frames matter) hid the control flow here, but that's only one way to reach this situation (early return is another, as shown by Raymond Chen's post).
musjleman 17 days ago [-]
> WDYM? The root cause is "you passed ownership of stack-based memory to the kernel and didn't ensure it was valid when it called you back", why would "consent of lower frames" matter here?
There is no "called back" in this case. The APC was executed by the sleep and corrupted the stack by unwinding across the C winsock code without any cleanup. It never returned.
The user-mode code enters an "alertable" wait, which allows an asynchronous procedure call (APC) to interrupt it and execute code. Instead of returning, the APC raises an exception, unwinds the stack across the APC delivery, and ends up executing some random code instead of returning to the winapi code that called wait(alertable: true) in a loop. So the code that was supposed to be synchronous because of while(!completed) wait(); is suddenly broken out of the loop without actually being completed.
> Exceptions (where lower frames matter) hid the control flow here, but that's one way to reach this situation (early return is another way, as shown by Raymond Chen's post).
This isn't just hiding the control flow here. It's control flow that shouldn't have existed in the first place. It walks across the boundary of the Windows APC dispatcher. The Unity folks needed to go out of their way to make this "work" in the first place, because using plain C++ exceptions and standard library threads this wouldn't work.
saagarjha 18 days ago [-]
Scary. I assume standard memory corruption detection tools would also have trouble finding this, as the write is coming from outside the application itself…
loeg 18 days ago [-]
Yeah. Not tripping a page fault on modifying readonly (userspace) pages both makes it hard for userspace tools but also paints a pretty specific picture of where the write is coming from.
I'm actually not sure if Linux would handle this in the same way or what. Plausibly it sees the same leaf page tables as user space, trips a fault, and doesn't scribble the pages anyway. Maybe Windows translates the user-provided virtual address to a physical address (or other kernel mapping that happens to have write permission) upon registration.
cryptonector 17 days ago [-]
> The fix was pretty straightforward: instead of using QueueUserAPC(), we now create a loopback socket to which we send a byte any time we need to interrupt select().
This is an absolutely standard trick that is known as "the self-pipe trick". I believe DJB created and named it. It is used for turning APIs not based on file handles/descriptors into events for an event loop based on file handles/descriptors, especially for turning signals into events for select(2)/poll(2)/epoll/kqueue/...
ptsneves 17 days ago [-]
Agree. Maybe this is obvious for devs of OSes that are file (descriptor) centric like Linux and POSIX.
Also that pattern is a bit too sweet as it can be a nice way to create condition variables or queues, the problem being one is now paying syscall overhead.
cryptonector 17 days ago [-]
It works with file handles too.
> Also that pattern is a bit too sweet as it can be a nice way to create condition variables or queues, the problem being one is now paying syscall overhead.
Sure but you don't need to, and even if you did, the system call overhead is probably not the hill you're dying on, or if it is then you want io_uring or similar.
hrtk 18 days ago [-]
I don’t know windows programming and this was a very interesting (nightmare-ish) post.
Very interesting insights about low level programming in general
rramadass 18 days ago [-]
That's a pretty good use case for ChatGPT. Do you do this often? And if so, are your results specific to debugging consistently good?
hrtk 15 days ago [-]
Yes, I complement GPT answers with Google searches for the original documentation just to be sure. Results do contain hallucinations at times, but they can be easily verified.
lionkor 18 days ago [-]
Very wild bug. I feel like this is some kind of a worst-case "exceptions bad" lesson, but I've only been doing systems level programming for a couple of years so I'm probably talking out my ass.
wat10000 18 days ago [-]
This experienced systems programmer agrees with you 100%. This is an exceptionally bad case, but even in more normal circumstances, C++ exceptions are outrageously dangerous. Correct behavior requires the cooperation of not just the thrower and catcher, but everything in between. And there are basically no guardrails to enforce that. If you throw through C++ code that hasn’t been made exception safe, it just goes. You can throw through code written in plain C, which doesn’t even have exceptions.
It’s probably feasible to use them if you draw a tight boundary around the exception-using code, use RAII without fail inside that boundary to ensure everything cleans up properly, and make sure all code that might be called from the outside has try/catch blocks. (And, obviously, don’t trigger async calls to throw from the middle of someone else’s function!).
I find it a lot easier to avoid them entirely. Error handling is a little more annoying, but it’s worth it.
ryao 17 days ago [-]
To this day, I still do not understand why the C++ language designers are infatuated with exceptions. They are more trouble than they are worth. It is as if they looked at people doing horrific things to implement exception handling with C's setjmp/longjmp and said "Let's make this a language feature with more structure".
usrnm 18 days ago [-]
I don't think that the lesson here is "exceptions are bad", the same kind of bug can be easily made without using exceptions.
IX-103 18 days ago [-]
I'm not so sure. The bug was that when an exception occurred while select() was blocked, select() did not properly clean up after itself. But no code in select() actually dealt with exceptions at all, so handling them there doesn't really make sense.
Without exceptions the possible control flow is entirely explicit. It would have at least been obvious that cleanup wasn't properly handled in the select function for all cases.
fc417fc802 17 days ago [-]
> an exception occurred
An exception was effectively injected from outside of the code via low level shenanigans. That's not "exceptions bad" that's "low level monkeying with program control flow can blow up in your face".
usrnm 18 days ago [-]
Another thing to note is that exactly the same bug can be made in Rust or Go, both of which officially don't have exceptions. They both, of course, do have exceptions and just call them by a different name.
LegionMammal978 18 days ago [-]
As it happens, Rust 1.81 recently added a feature where it aborts by default when you attempt to unwind from a function with the "C" ABI [0], which mitigates this issue for most FFI use cases.
Of course, it's not uncommon to find unsafe Rust code that misbehaves badly when something panics, which is yet another of its hazards that I wish were better documented.
In this case, I'd put down the lesson as "exceptions in C++ are very dangerous if you're coming out of C code", since they can cause undocumented and unexpected behavior that ultimately led to the use-after-return issue here.
It can also result in logic errors if objects are used after their methods panic, but such usage is generally not expected to work in the first place.
samatman 18 days ago [-]
Exceptions aren't bad because of the name. Exceptions are bad because lower stack frames unwind higher ones, without giving those frames any opportunity to do something sane with any captured resources. Calling it panic/recover doesn't help anything.
danaris 18 days ago [-]
WSPSelect: 'Twas I who wrote "2" to your stack! And I would've gotten away with it, too, if it weren't for that meddling kernel debugger!
greenbit 18 days ago [-]
.. and their hardware breakpoint!
cyberax 18 days ago [-]
A small request: please stop using automatic translation for blog posts or documentation.
Especially when I still have English set as the second priority language.
AshleysBrain 18 days ago [-]
Would memory safe languages avoid these kinds of problems? It seems like a good example of a nightmare bug from memory corruption - 5 days to fix, and the author's allusion to it keeping them up at night, is pretty strong motivation to avoid memory unsafety IMO.
layer8 18 days ago [-]
Depends. The underlying issue for this bug is that the code involved crosses language boundaries (the Windows kernel and win32 libraries written in C and the application in C++). The code where the lifetime failure occurs is Windows code, not application code. However, the Windows code is correct in the context of the C language. The error is caused by an APC that calls exception-throwing C++ code, being pushed onto the waiting-in-C thread. This is a case of language-agnostic OS mechanisms conflicting with language-specific stack unwinding mechanisms.
This could only be made safe by the OS somehow imposing safety mechanisms on the binary level, or by wrapping all OS APIs into APIs of the safe language, where the wrappers have to take care to ensure both the guarantees implied by the language and the assumptions made by the OS APIs. (Writing the OS itself in a memory-safe language isn’t sufficient, for one because it very likely will still require some amount of “unsafe” code, and furthermore because one would still want to allow applications written in a different language, which even if it also is memory-safe, would need memory-correct wrappers/adapters.)
This is similar to the distinction between memory-safe languages like Rust where the safety is established primarily on the source level, not on the binary level, and memory-safe runtimes like the CLR (.NET) and the JVM.
jpc0 18 days ago [-]
> the Windows kernel and win32 libraries written in C and the application in C++
To my knowledge the kernel and win32 are in fact written in C++, and only the interface has C linkage and follows C norms.
So this error occurred going C++ > C > C++ never mind languages with different memory protection mechanisms like Rust > C > C++.
layer8 18 days ago [-]
It’s an unholy combination of C, C++, and Microsoft extensions at worst. But apart possibly from some COM-related DLLs, the spirit is clearly C, and C++ exceptions are generally not expected. (There may be use of SEH in some parts.)
Of course, you can write C++ without exception safety too, but “C++ as a better C” and exception-safe C++ are effectively like two different languages.
ryao 17 days ago [-]
I filed bugs against both GCC and LLVM asking for compiler warnings that would inform developers of the risk:
I believe it's C++, but not allowed to use exceptions.
rurban 17 days ago [-]
We know that it's pure C, because it leaked.
cryptonector 17 days ago [-]
All of it is C?
protomolecule 17 days ago [-]
>The error is caused by an APC that calls exception-throwing C++ code
The article doesn't say it was a C++ exception. Could've been a SEH exception.
saagarjha 18 days ago [-]
No*. This is one of the bugs that traditional memory safety would not fix, because the issue crosses privilege boundaries in a way that the language can't protect against.
*This could, in theory, be caught by fancy hardware strategies like capabilities. But those are somewhat more esoteric.
kibwen 18 days ago [-]
To elaborate, the problem here is that it looks like the OS API itself is fundamentally unsafe: it's taking a pointer to a memory location and then blindly writing into it, expecting that it's still valid without actually doing any sort of verification. You could imagine an OS providing a safe API instead (with possible performance implications depending on the exact approach used), and if your OS API was written in e.g. Rust then this unsafe version of the API would be marked as `unsafe` with the documented invariant "the caller must ensure that the pointer remains valid".
ryao 17 days ago [-]
They passed a function that throws an exception to a C ABI function. C ABI functions cannot tolerate exceptions because C does not support stack unwinding. It might work anyway, but it is technically undefined behavior and it will only ever work when simply deallocating what is on the stack does not require any cleanup elsewhere.
The exception caused the stack frame to disappear before the OS kernel was done with it. Presumably, the timeout would have been properly handled had the stack not been unwound by the exception. If it had not, that would be a bug in Windows.
There is a conceptually simple solution to this issue, which is to have the C++ compiler issue a warning when a programmer does this. I filed bug reports against both GCC and LLVM asking for one:
Seeing as Rust has no stable ABI and likely never will, how would you provide the API in Rust (or Go, .NET, Swift, Java, or whatever other language you add) without doing exactly what Win32 does and going through C, which has a stable ABI to tie into all those other languages?
pornel 18 days ago [-]
Rust ecosystem solves that by providing packages that are thin wrappers around underlying APIs. It's very similar to providing an .h file with extra type information, except it's an .rs file.
Correctness of the Rust wrapper can't be checked by the compiler, just like correctness of C headers is unchecked, and it just has to match the actual underlying ABI.
The task of making a safe API wrapper can be relatively simple, because you don't have to take into consideration the safety of an application as a whole; you only need to translate the requirements of individual APIs to Rust's safety requirements, function by function. In this case you would need to be aware that the function call may unwind, so whether someone making a dedicated safe API for it would have thought of that is only speculation.
jpc0 18 days ago [-]
I seem to remember a Linux kernel dev quitting, with the inability to specify exactly what you say this wrapper should abide by being a contributing factor.
If those specifications had been written down clearly enough, this dev wouldn't have needed to spend 5 days debugging, since he spent a significant amount of time reading the documentation looking for any mistake on their part that it mentioned.
And don't say that they can actually just read the Rust code and check, since, well, I can't read low-level Rust code or know how all of the annotations can interact with each other.
A single line of Rust code could easily need several paragraphs of written documentation so that someone not familiar with what Rust is specifying will actually understand what that entails.
This is part of why Rust is difficult: you have to nail down the specification, and a small change to the specification causes broad changes to the codebase. The same might need to happen in C, but many times it doesn't.
pornel 16 days ago [-]
That Linux drama was due to "nontechnical nonsense" of maintainers refusing to document their APIs requirements.
In C you can have a function that returns a pointer and gives you no information about how long that pointer is valid for, what is responsible for freeing it, or whether it's safe to use from another thread.
That's not only an obstacle for making a safe Rust API for it, that's also a problem for C programmers who don't want to just wing it and hope it won't crash.
The benefit of safe wrappers is that as a downstream user you don't need to manually check their requirements. They're encoded in the type system that the compiler checks for you. If it compiles, it's safe by Rust's definition. The safety rules are universal for all of Rust, which also makes it easier to understand the requirements, because they're not custom for each library or syscall. The wrappers boil it down to Rust's references, lifetimes, guards, markers, etc. that work the same everywhere.
jpc0 16 days ago [-]
> ... you only need to translate requirements of individual APIs to Rust's safety requirements...
> That Linux drama was due to "nontechnical nonsense" of maintainers refusing to document their APIs requirements.
> If those specifications were written down clearly enough then this dev wouldn't have needed to spend 5 days debugging...
wat10000 18 days ago [-]
What would this safe API look like? The only thing I can think of would be to have the kernel allocate memory in the process and return that pointer, rather than having the caller provide a buffer. Performance would be painful. Is there a faster way that preserves safety?
LorenPechtel 18 days ago [-]
No allocation--it returns the address of a buffer in a pool. Of course this permits a resource leak. It's a problem with no real solution.
quotemstr 18 days ago [-]
Safe code definitely won't have this sort of problem. Any code that could invoke a system call to scribble on arbitrary memory is by definition unsafe.
saagarjha 18 days ago [-]
That's basically all code
quotemstr 18 days ago [-]
No it isn't. You can write safe file IO in Rust despite the read and write system calls being unsafe.
saagarjha 18 days ago [-]
I take it you are not familiar with the classic Rust meme of opening /proc/self/mem and using it to completely wreck your program?
IshKebab 18 days ago [-]
That's obviously outside the scope of the language's safety model, and it would be quite hard to do that accidentally.
saagarjha 18 days ago [-]
That is exactly my point, though: system calls are completely outside the scope of a language's safety model. You can say, well /proc/self/mem is stupid (it is) and our file wrappers for read and write are safe (…most languages have at least one), but the fundamental problem remains that you can't just expect to make system calls without that being implicitly unsafe. In the extreme the syscall itself cannot be done safely, with no possible safe wrapper around it. My point is that if you are calling these Windows APIs you can't do it safely from any language; Rust won't magically start yelling at you that the kernel still expects you to keep the buffer alive. You can design your own wrapper around it and try to match the kernel's requirements but you can do that in a lot of languages, and that's kind of missing the point.
loeg 18 days ago [-]
Right. And of course, it's not just Windows. For example the Linux syscall aio_read() similarly registers a user address with the kernel for later, asynchronous writing (by the kernel). (And I'm sure you get similar lifetime issues with io_uring operations.)
ryao 17 days ago [-]
While I am not aware of a Linux syscall that would be equivalent to QueueUserAPC() to allow this to happen, the kernel writing to stack memory is not the problem here. The problem is that a C++ exception was invoked and it unwound a C stack frame. C++ exceptions that unwind C stack frames invoke undefined behavior, so the real solution is to avoid passing function pointers to C++ functions not marked noexcept to C functions as callbacks. It is rather unusual that Windows permits execution on the thread while the kernel is supposed to give it a return value. Writing to the stack is not how I would expect a return value to be passed. Presumably, had the stack frame not been unwound, things would have been fine, unless there is a horrific bug in Windows that should have been obvious when QueueUserAPC() was first implemented.
Anyway, it is a shame that the compiler does not issue a warning when you do this. I filed bug reports with both GCC and LLVM requesting that they issue warnings, which should be able to avoid this mess if the compilers issue them and developers heed them:
The bug was not because a system call was involved. It was a multi-threaded lifetime issue, which is completely within Rust's safety model.
To put it another way, you can design a safe wrapper around this in Rust, but you can't in C++.
saagarjha 18 days ago [-]
No. The kernel has no idea what your lifetimes are. There’s nothing stopping a buggy Rust implementation from handing out a pointer for the syscall (…an unsafe operation!) and then accidentally dropping the owner. To userspace there are no more references and this code is fine. The problem is the kernel doesn’t care what you think, and it has a blank check to write where it wants.
IshKebab 18 days ago [-]
That's no different to FFI with any C code. There's nothing unique to this being a kernel or a syscall. There are plenty of C libraries that behave in a similar way and can be safely wrapped with Rust by adding the lifetime requirements.
fc417fc802 17 days ago [-]
> can be safely wrapped with Rust
They can't. Rust can't verify the safety of the called code once you cross the language boundary. Handing out the pointer is inherently unsafe.
In the user space FFI case at least you might be able to switch to an implementation written in the same (memory safe) language that you are already using. Not so for a syscall.
IshKebab 17 days ago [-]
Rust can't verify the correctness of the kernel code, but the problem here wasn't incorrect kernel code!
The problem was that the C API exposed by the kernel did not encode lifetime requirements, so they were accidentally violated. Rust APIs (including ones that wrap C interfaces) can encode lifetime requirements, so you get compile time errors if you screw it up.
I don't think you can win this argument by saying "but you have to use `unsafe` to write the Rust wrapper". That's obviously unavoidable.
ryao 17 days ago [-]
There was no problem with lifetime requirements. The problem was that a pointer to a C++ function that could throw exceptions was passed to a C function. This is undefined behavior because C does not support stack unwinding. If the C function's stack frame has no special requirements for how it is deallocated, then simply deallocating the stack frame will work fine, despite this being undefined behavior. In this case, the C function had very special requirements for its deallocation, so the undefined behavior became stack corruption.
As others have mentioned, this same issue could happen in Rust until very recently. As of Rust 1.81.0, Rust will abort instead of unwinding C stack frames:
Once the compilers begin emitting warnings, this should not be an issue anymore as long as developers heed the warnings.
muststopmyths 18 days ago [-]
In this specific type of Win32 API case, I can think of a way to make this safe.
It would involve looking at the function pointer in QueueUserAPC and making sure the function being called doesn't mess with the stack frame being executed on.
This function will run in the context of the called thread, in that thread's stack. NOT in the calling thread.
It's a weird execution mode where you're allowed to hijack a blocked thread and run some code in its context.
Don't know enough about Rust or the like to say if that's something that could be done in the language with attributes/annotations for a function, but it seems plausible.
loeg 18 days ago [-]
Perhaps simpler would be to just not unwind C++ exceptions through non-C++ stack frames and abort instead. (You'd run into these crashes at development time, debugging them would be pretty obvious, and it'd never release like this.) This might not be viable on Windows, though, where there is a lot of both C++ and legacy C code.
charrondev 18 days ago [-]
As I understand it, this was recently stabilized in Rust and is now the default behaviour.
You have to explicitly opt into unwinding like this now otherwise the program will abort.
ryao 17 days ago [-]
Another possibility is to avoid it in the first place by not allowing C++ function pointers that are not marked noexcept to be passed to C functions. I filed bugs against both GCC and LLVM requesting warnings:
If/when they are implemented, they will become errors with -Werror.
loeg 17 days ago [-]
Doesn't seem all that useful unless C++ compilers will start warning about noexcept functions calling exception-throwing functions -- they don't today: https://godbolt.org/z/4qbcbxaET .
ryao 17 days ago [-]
That is supposed to be handled at runtime:
> Whenever an exception is thrown and the search for a handler ([except.handle]) encounters the outermost block of a function with a non-throwing exception specification, the function std :: terminate is invoked ([except.terminate])
If it is not, then there is a bug in the C++ implementation.
loeg 17 days ago [-]
Catching it at runtime somewhat defeats the benefit of your approach upthread:
> Another possibility is to avoid it in the first place by not allowing C++ function pointers that are not marked noexcept to be passed to C functions.
ryao 16 days ago [-]
The two would combine to avoid situations where people spend 5 days debugging like Unity did.
That said, my personal preference is to use C instead of C++, which avoids the issues of exceptions breaking kernel expectations entirely.
LegionMammal978 18 days ago [-]
Nothing in C can prevent your function from being abnormally unwound through (whether it's via C++ exceptions or via C longjmp()). The only real fix is "don't use C++ exceptions unless you're 100% sure that the code in between is exception-safe (and don't use C longjmp() at all outside of controlled scenarios)".
ryao 17 days ago [-]
A better fix is to avoid passing pointers to C++ functions that can throw exceptions to C functions. This theoretically can be enforced by the compiler by requiring the C++ function pointers be marked noexcept.
I filed bugs against both GCC and LLVM requesting warnings:
No. The problem was in the architecture of the asynchronous API w.r.t. the kernel. The last line of the article states: "Lesson learnt, folks: do not throw exceptions out of asynchronous procedures if you're inside a system call!"
LorenPechtel 18 days ago [-]
More generally:
1) The top level of an async routine should have a handler that catches all exceptions and dies if it catches one.
2) If you have a resource you have a cleanup routine for it.
rramadass 17 days ago [-]
It is even more fundamental. People are focusing wrongly on the mention of exceptions here (the most obvious part), but what is crucial is to understand how async callbacks registered with a kernel work on all OSes. The limitations/caveats imposed on these routines (they are akin to interrupts) are given in their respective documentation, and one has to be careful to understand and use them appropriately; e.g., what is the stack used by these handlers? The article, though detailed in the beginning, sort of glosses over all this in the final paragraphs, and hence we have to connect the dots ourselves.
LegionMammal978 17 days ago [-]
It's not really about asynchronous callbacks or their equivalents. (In this case, the thread running it is otherwise meant to be blocked in a safe state, so that there's none of the usual dangers of interrupting arbitrary code.) Instead, it's about any callbacks coming out of C code, even something as trivial as qsort(). If you pass a C library your C++ callback, and your callback runs back through it with an exception, then 9 times out of 10, the C library will leak some resources at best, or reach an unstable state at worst. C just doesn't have any portable 'try/finally' construct that can help deal with it.
So I'd say it's more about the basic expectations of a function called from C, which includes a million other trivial things like "don't write beyond the bounds of buffers you're given" and "don't clobber your caller's stack frame" and "don't spawn another thread just to write to output pointers after your function returns" (not that any of these is the issue here).
rramadass 17 days ago [-]
No, you (and most folks here) have not understood the full picture. Only the C ABI is relevant here and not the language (C/C++/whatever) itself.
You have to know how exactly asynchronous callbacks registered with the kernel get called, how their stack frames get setup, how kernel writes to local variables within a stack frame of a user thread, how stack frames are adjusted when a blocking system call returns to user space and finally, how and when exceptions (in any language) mess up the above when they implement a different flow of control than that expected by the above "async callback kernel api architecture". All of these are at play here and once you put them together you understand the scenario.
IshKebab 18 days ago [-]
Yes memory safe languages would absolutely help here. In Rust you would get a compile time error about the destination variable not living long enough.
This sort of stuff is why any productivity arguments for C++ over Rust are bullshit. Sure you spend a little more time writing lifetime annotations, but in return you avoid spending 5 days debugging one memory corruption bug.
ryao 17 days ago [-]
This is not a memory corruption bug. It is an undefined behavior bug and it also affected Rust until 1.81.0 as per comments from others:
A story for the ages. That is some hardcore debugging, involving everything: user land, system calls, the kernel, disassembly, etc.
glandium 18 days ago [-]
I wonder if Time Travel Debugging would have helped narrow it down.
loeg 18 days ago [-]
I don't think any reverse debugging system can step the kernel backwards to this degree, unless they're doing something really clever (slow) with virtual machines and snapshots.
dzaima 18 days ago [-]
While not allowing stepping in the kernel, a large part of rr is indeed intercepting everything the kernel may do and re-implementing its actions, writing down all the changes it makes to memory, etc. (for Linux, of course, not Windows). With that, an asynchronous kernel write would necessarily end up in the recording as a statement of what the kernel writes at a given point in time, which a debugger could deterministically reason about. (Of course this relies on the recording system reimplementing things accurately enough, but that's at least possible.)
Veserv 18 days ago [-]
You are correct. A time travel debugging solution that supports recording the relevant system call side effects would handle this. In fact, this system call is likely just rewriting the program counter register and maybe a few others, so it would likely be very easy to support if you could hook the relevant kernel operations which may or may not be possible in Windows.
The replay system would also be unlikely to pose a problem. Replay systems usually just encode and replay the side effects, so there is no need to "reimplement" the operations. So, if you did some wacky system call, but all it did is write 0x2 to a memory location, M, you effectively just record: "at time T we issued a system call that wrote 0x2 to M". Then, when you get to simulated time T in the replay, you do not reissue the wacky system call, you just write 0x2 to M and call it a day.
loeg 18 days ago [-]
This system call returned and then asynchronously wrote to memory some time later. How does the replay system even know the write happened, without scanning all memory? It can't generally. With knowledge of the specific call it could put just that address on a to-be-scanned list to wait for completion, but it still needs to periodically poll the memory. It is far more complicated to record than a synchronous syscall.
Veserv 18 days ago [-]
You hook the kernel write. That is why I said hook the relevant kernel operations.
The primary complexity is actually in creating a consistent timeline with respect to parallel asynchronous writes. Record-Replay systems like rr usually just serialize multithreaded execution during recording to avoid such problems. You could also do so by just serializing the executing thread and the parallel asynchronous write by stopping execution of the thread while the write occurs.
Again, not really sure if that would be possible in Windows, but there is nothing particularly mechanically hard about doing this. It is just a question of whether it matches the abstractions and hooks Windows uses and supports.
dzaima 18 days ago [-]
I don't think rr hooks actual kernel writes, but rather just has hard-coded information on each syscall of how to compute what regions of memory it may modify, reading those on recording and writing on replay.
As such, for an asynchronous kernel write you'd want to set up the kernel to never mutate recordee memory, instead having it modify recorder-local memory, which the recorder can then copy over to the real process whenever, and get to record when it happens while at it (see https://github.com/rr-debugger/rr/issues/2613). But such can introduce large delays, thereby changing execution characteristics (if not make things appear to happen in a different order than the kernel would, if done improperly). And you still need the recording system to have accurately implemented the forwarding of whatever edge-case of the asynchronous operation you hit.
And, if done as just that, you'd still hit the problem encountered in the article of it looking like unrelated code changes the memory (whereas with synchronous syscalls you'd at least see the mutation happening on a syscall instruction). So you'd want some extra injected recordee instruction(s) to present separation of recordee actions from asynchronous kernel ones. As a sibling comment notes, rr as-is doesn't handle any asynchronous kernel write cases (though it's certainly not entirely impossible to).
loeg 18 days ago [-]
rr does not record AIO or io_uring operations today, because recording syscalls with async behavior is challenging.
Maybe Windows TTD records async NtDeviceIoControlFile accurately, maybe it doesn't; I don't know.
mark_undoio 18 days ago [-]
It looks like they didn't actually need to step the kernel in the end - it just helped understand the bug (which I'd say was in user space - injecting an exception into select() and thus preventing it from exiting normally - even though a kernel behaviour was involved in how the bug manifested).
The time travel debugging available with WinDbg should be able to wind back to the point of corruption - that'd probably have taken a few days off the initial realisation that an async change to the stack was causing the problem.
There'd still be another reasoning step required to understand why that happened - but you would be able to step back in time e.g. to when this buffer was previously used on the stack to see how select () was submitting it to the kernel.
In fact, a data breakpoint / watchpoint could likely have taken you back from the corruption to the previous valid use, which may have been the missing piece.
chris_wot 18 days ago [-]
How? Throwing an exception would prevent this wouldn’t it?
ben-schaaf 18 days ago [-]
When the assertion on the stack sentinel was reached they could have watched the value and then reverse continued, which in theory would reveal the APC causing the issue - or at least the instruction writing the value. Not sure how well reverse debugging works on Windows though, I'm only familiar with rr.
ryao 17 days ago [-]
It would have only helped if it were applied to the OS kernel.
tomsmeding 18 days ago [-]
Everyone here is going on about exceptions bad, but let's talk about QueueUserAPC(). Yeah, let's throw an asynchronous interrupt to some other thread that might be doing, you know, anything!
In the Unix world we have this too, and it's called signals, but every documentation about signals is sure to say "in a signal handler, almost nothing is safe!". You aren't supposed to call printf() in a signal handler. Throwing exceptions is unthinkable.
I skimmed the linked QueueUserAPC() documentation page and it says none of this. Exceptions aren't the hand grenade here (though sure, they're nasty) — QueueUserAPC() is.
Veserv 18 days ago [-]
That does not seem to be correct. The documentation indicates APC [1] can only occur from a waiting/blocking state. So, the program is in a consistent state and can only be on a few known instructions, unlike signals. As such, most functions should be safe to call.
This is more like select() sometimes calling a user-supplied function in addition to checking for I/O.
> A thread enters an alertable state when it calls the SleepEx, SignalObjectAndWait, MsgWaitForMultipleObjectsEx, WaitForMultipleObjectsEx, or WaitForSingleObjectEx function.
So this is a lot less like Unix signals. It only really works if the thread you're doing the async procedure call to is one that's likely to use those.
So APCs are safe enough -- a lot safer than Unix signal handlers.
tomsmeding 18 days ago [-]
I see, I read the docs slightly too quickly. Still, though, I would have expected a conspicuous warning about exceptions in those calls, because MS is in on C++ (so they can't hide behind "but we expected only C") and apparently(?) the APC machinery doesn't catch and block exceptions in user code.
cryptonector 17 days ago [-]
The problem is that select() is a wrapper around WaitForMultipleObjectsEx() (or whatever it is that select() actually uses), and those functions (the ones that can run APCs because entering them enters an "alertable state") are extern "C" functions, which means they cannot throw exceptions. But here we have an APC throwing an exception, which is not allowed!
Now, MSFT does NOT document that APCs can't throw, but since the functions that enter alertable states (hence which can call APCs) are extern "C" functions, it follows that APCs cannot throw.
So part of the problem here is that the PAPCFUNC function type is not noexcept, therefore the compiler can't stop you from using a function that is not noexcept as an APC, thus APCs can -but must not!- throw exceptions.
ryao 17 days ago [-]
The C++ standard really should be amended to state that C ABI functions are noexcept.
That being said, I filed bugs against both GCC and LLVM requesting warnings, which is the next best thing:
There is nothing inherently wrong with throwing an exception from an APC. Windows supports it and will unwind the stack correctly. If you wrote all the code yourself, absolutely nothing would go wrong.
The issue is more about the actual code that calls alertable waits not expecting an exception to unwind through it, which is most likely all WinAPI code, since in that code an exception == a crash.
The second piece of code I wrote for pay was an FFI around a C library, which had callbacks to send incremental data back to the caller. I didn't understand why the documented examples would re-acquire the object handles every iteration through the loop, so I dropped them. And everything seemed to work until I got to the larger problems, and then I was getting mutually exclusive states in data that was marked as immutable in some of the objects. I pulled my hair out over this for days.
What ended up happening is that if the GC ran inside the callback then the objects the native code could see could move, and so the next block of code was smashing the heap by writing to the wrong spots. All the small inputs finished before a GC was called and looked fine but larger ones went into undefined behavior. So dumb.
pantalaimon 17 days ago [-]
> Jemand hatte das Gemächt meines Wächters angefasst - und es war definitiv kein Freund.
That auto-translation is something else
explosion-s 17 days ago [-]
> Somebody had been touching my sentinel’s privates - and it definitely wasn’t a friend
Gotta love programmers out of context
diekhans 18 days ago [-]
Nicely written (and executed). Worse than my worst memory corruption.
hun3 18 days ago [-]
(2016)
saagarjha 18 days ago [-]
Added
18 days ago [-]
Dwedit 17 days ago [-]
TLDR: Kernel wrote memory back to a pointer provided by the user-mode program, as it was supposed to do. Unfortunately, it was a dangling pointer (Use-after-free)
When the Kernel does the memory write, user-mode memory debuggers don't see it happen.
mgaunard 18 days ago [-]
Set a hardware breakpoint and you'll know immediately. That's what he eventually did, but he should have done so sooner.
Then obviously, cancelling an operation is always tricky business with lifetime due to asynchronicity. My approach is to always design my APIs with synchronous cancel semantics, which is sometimes tricky to implement. Many common libraries don't do it right.
alexvitkov 18 days ago [-]
He mentioned in the article that the corruption happens at a seemingly random spot in the middle of a large buffer, and you can only set hardware breakpoints on 4 addresses on x86-64.
mgaunard 15 days ago [-]
it's only random when using ASLR.
quotemstr 18 days ago [-]
Reproduce the corruption under rr. Replay the rr trace. Replay is totally deterministic, so you can just seek to the end of the trace, set a hardware breakpoint on the damaged stack location, and reverse-continue until you find the culprit.
zorgmonkey 18 days ago [-]
rr only works on Linux, and the release of Windows TTD came after this blog post was published. Also the huge slowdown from time travel debuggers can sometimes make tricky bugs like this much harder to reproduce.
pm215 18 days ago [-]
I would certainly try with a reverse debugger if I had one, but where the repro instructions are "run this big complex interactive program for 10 minutes" I wouldn't be super confident about successfully recording a repro. At least in my experience with rr the slowdown is enough to make that painful, especially if you need to do multiple "chaos mode" runs to get a timing sensitive bug to trigger. It might still be worth spending time trying to get a faster repro case to make reverse debug a bit more tractable.
IshKebab 18 days ago [-]
Sure let me just run `rr` on Windows...
saagarjha 18 days ago [-]
Hardware breakpoints don't work if the kernel is doing the writes, because the kernel won't let you enable them globally so they trigger outside of your program.
mgaunard 14 days ago [-]
If you use a decent kernel like Linux, there is an API to do that from userspace without requiring you to reboot your kernel under a debugger.
saagarjha 14 days ago [-]
I don't think I'm familiar with that API. What is it?
mgaunard 14 days ago [-]
It's part of perf_event, available since 2.6.33.
machine_coffee 18 days ago [-]
Also surprised an async completion was writing to the stack. You should normally pass a heap buffer to these functions and keep it alive e.g for the lifetime of the object being watched.
muststopmyths 18 days ago [-]
It's not an async completion. The call is synchronous.
Windows allows some synchronous calls to be interrupted by another thread to run an APC if the called thread is in an "alertable wait" state. The interrupted thread then returns to the blocking call, so the pointers in the call are expected to be valid.
Edit 2: I should clarify that the thread returns to the blocking call, which then exits with WAIT_IO_COMPLETION status. So you have to retry it again, but the stack context is expected to be safe.
APC is an "Asynchronous procedure call", which is asynchronous to the calling thread in that it may or may not get run.
Edit: May or may not run at a future time.
There are very limited things you are supposed to do in an APC, but these are poorly documented and need one to think carefully about what is happening when a thread is executing in a stack frame and you interrupt it with this horrorshow.
Win32 API is a plethora of footguns. For the uninitiated it can be like playing Minesweeper with code. Or like that scene in Galaxy Quest where the hammers are coming at you at random times as you try to cross a hallway.
A lot of it was designed by people who, I think, would call one stupid for holding it wrong.
I suppose it's a relic of the late 80s and 90s when you crawled on broken glass because there was no other way to get to the other side.
You learn a lot of the underlying systems this way, but these days people need to get shit done and move on with their lives.
Us olds are left behind, staring nostalgically at our mangled feet while we yell at people to get off our lawns.
Fulgen 17 days ago [-]
> There are very limited things you are supposed to do in an APC, but these are poorly documented and need one to think carefully about what is happening when a thread is executing in a stack frame and you interrupt it with this horrorshow.
One must not throw a C++ exception across stack frames that don't participate in C++ stack unwinding, whether it's a Win32 APC, another Win32 callback, a POSIX signal or `qsort` (for the people that believe qsort still has a place in this decade). How the Win32 API is designed is absolutely irrelevant for the bug in this code.
muststopmyths 13 days ago [-]
I was talking about APCs and win32 api in general not this bug.
loeg 18 days ago [-]
select() (written in C, a language without exceptions) is synchronous, its authors just (reasonably) did not expect an exception to be thrown in the middle of it invoking a blocking syscall. The algorithm was correct in the absence of a language feature C simply does not have and that is relatively surprising (you don't expect syscalls to throw in C++ either).
PhiSchle 18 days ago [-]
You state this like an obvious fact, but it is only obvious if you either heard of something like this, or you've been through it.
From that point on I am sure he knew to do that. What's obvious to you can also just be your experience.
It seems like the select() was within its rights to have passed a stack allocated buffer to be written asynchronously by the kernel since it, presumably, knew it couldn't encounter any exceptions. But injecting one has broken that assumption.
If the select() implementation had returned normally with an error, or had been expecting the exception, then I'd assume this wouldn't have happened.
https://learn.microsoft.com/en-us/windows/win32/api/winsock2...
https://learn.microsoft.com/en-us/windows/win32/api/synchapi...
If they had implemented this to use a C++ exception to return control to C++, they should have encountered this stack corruption issue immediately upon implementing it, rather than having a customer hit it at some point in the future before they did.
My read of this is that they had the callback function do something and someone eventually got it to throw an exception. This is undefined behavior because there is no correct way to unwind a C stack frame. However, that is not obvious, especially if you test it since if the C function does nothing special such that it needs no special clean up, everything should be fine. However, WaitForSingleObjectEx() does something extremely special. Skipping that by unwinding the stack bit them.
I filed bugs against both GCC and LLVM requesting warnings to protect people from doing this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
Grug brain is maybe good for 1:1 interactions or over coffee with the people you vent to or are vented to.
Speaking of which, I once did consulting for a company that insisted that they had hit a bug in ZFS because there were backtraces showing ZFS. Using KASAN, I found that the bug was in fact in another third party kernel module that they used that appeared to be the result of a hobbyist project on github. It had a string function write 1 past the end of a memory allocation, which often would corrupt memory allocations used in ZFS' AVL trees. ZFS' AVL tree code did not like this, so ZFS appeared in backtraces.
That said, the bug described by the unity blog is really an undefined behavior bug rather than a memory corruption bug. Doing an exception in a C++ callback function called by a C function is undefined since C does not support stack unwinding. If the C function does not need any special cleanup when stack unwinding is done, it will probably work fine, but in this particular case, the function needed very special cleanup and invoking undefined behavior bit them.
Sure, but I’ve rarely seen quality comparable to the Linux kernel in a commercial context.
> I am going to guess that Sony Ericsson had a ton of use after free bugs and array out of bounds bugs, as that is the main way that developers can step on each others' code when it shares a process space.
Mostly yes. But there were some interesting ones too. I debugged one that turned out to be a stack overflow, for example. There was basically this giant function with ridiculously many local variables. When that function called another function the stack pointer was bumped far out of the allotted stack space and the called function overwrote stuff when writing to local variables.
> What is despair? I have known it—hear my song. Despair is when you’re debugging a kernel driver and you look at a memory dump and you see that a pointer has a value of 7. THERE IS NO HARDWARE ARCHITECTURE THAT IS ALIGNED ON 7. Furthermore, 7 IS TOO SMALL AND ONLY EVIL CODE WOULD TRY TO ACCESS SMALL NUMBER MEMORY. Misaligned, small-number memory accesses have stolen decades from my life.
All James Mickens' USENIX articles are fun (for a very specific subset of computer scientist - the kind that would comment on this thread). https://mickens.seas.harvard.edu/wisdom-james-mickens
x86 page table entries can't express "user read only, kernel read/write"
Linux for example maintains a big writable linear mapping of all RAM at all times (on 64-bit), so you can corrupt read-only user pages through it all day and never fault. Code running in the kernel generally uses virtual addresses from that mapping.
The kernel has no reason to be trying to bypass page protections; what if you asked it to write to a shared read-only page?
The try-catch dance you're describing is only necessary for accessing the userspace mapping (because it might fault). Kernel code which dereferences kernel pointers doesn't do that.
Pages don't get remapped into userspace: userspace gets an additional aliased mapping with restricted permissions. The kernel's writable mapping still exists. There's nothing to "bypass", what permissions userspace applies its user mapping of the same physical page has no effect on the kernel mapping.
:-)
UNIX system calls never do this. The kernel won't keep references to pointers you pass them and write to them later. It just isn't in the DNA. The only exceptions I can think of would be clone(), which is abstracted by the POSIX threads runtime, and Windows-inspired non-standard event i/o system calls like epoll.
I mean, this is because the UNIX model was based on readiness rather than completion. Which is slower. Hence the newer I/O models.
That's what i/o completion ports are working around.
They solve a problem UNIX doesn't have.
https://www.man7.org/linux/man-pages/man7/aio.7.html
That said, it would be insane to pass it pointers to the stack, as the only safe way to make that work would be to do `aio_suspend()` or busy wait and you might as well just use a synchronous read() function in that case.
Wait but read() wouldn't allow overlapping operations. Whereas if you suspend or busy wait you can do that for multiple operations executing concurrently.
Also if the buffer is from a caller frame then you could also return safely no?
A bit tangential, but I've been crying about the insane Unity project build times for years now, and about how they've taken zero steps to fix them and are instead trying their hardest to sell you cloud builds. Glad to see them having to suffer through what they're inflicting on us for once!
Regardless, very good writeup, and yet another reason to never ever under any conditions use exceptions.
Valgrind flagged an "invalid write", which I eventually hunted down as a fencepost error in a dependency which overwrote their allocated stack array by one byte. I recall that it wrote "1" rather than "2", though, haha.
> Lesson learnt, folks: do not throw exceptions out of asynchronous procedures if you’re inside a system call!
The author's debugging skills are impressive and significantly better than mine, but I find this an unsatisfying takeaway. I yearn for a systemic approach to either prevent such issues altogether or to make them less difficult to troubleshoot. The general solution is to move away from C/C++ to memory safe languages whenever possible, but such choices are of course not always realistic.
With my project, I started running most of the test suite under Valgrind periodically. That took half an hour to finish rather than a few seconds, but it caught many similar memory corruption issues over the next few years.
Pointers to C++ functions that can throw exceptions should not be passed to C functions as callback pointers. Executing the exception in the callback context is undefined behavior, since C does not support stack unwinding.
Presumably, any language that has exception handling would have an issue on Windows when doing select() in one thread and QueueUserAPC() in another thread to interrupt it with a callback function that throws an exception. What happens then depends on how the stack unwinding works.
No programming language can avoid this because they need to use the underlying operating system's system calls in order to function. They also need to use the C functions in ntdll to do system calls since Microsoft does not support doing system calls outside of calling the corresponding functions in ntdll and friends.
https://en.cppreference.com/w/cpp/language/noexcept_spec
This just feels important to point out because this feature is 15 years old and still commonly misunderstood, and each time people are wanting the same thing (actual compile-time prevention of `throw`) which it is not.
Edit: OK I finally just went and tried it on godbolt.org. C++17 GCC, Clang, and MSVC all give 1 warning on this code for `bar` and that's all.
https://godbolt.org/z/rjPfYjnzf
https://godbolt.org/z/14ocshsE5
I filed bugs against both GCC and LLVM requesting warnings:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
It would also help if the APC docs documented that APCs must not throw.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
Perhaps the C++ standards committee should specify that doing this should cause a compiler failure, rather than a warning.
I encountered the issue when our (not Ubuntu, not 24.04) LTS upstream backported security fixes that included the mmap changes without updating universe to include a clang version with the fixes. Any developers diligent enough to update and run sanitisers locally started seeing intermittent crashes.
MSAN, ASAN, and UBSAN are great tools that have saved me a lot of time and headaches, but they don't catch everything that the compiler of a memory safe language can, at least not today.
Rust programs use unsafe a lot in practice. One of the greatest difficulties I've had in supporting Rust with Cosmopolitan Libc is that Rust libraries all try to be clever by using raw assembly system calls rather than using libc. So our Rust binaries will break mysteriously when I run them on other OSes. Everyone who does AI or scientific computing with Rust, if you profile their programs, I guarantee you 99% of the time it's going to be inside C/C++ code. If better C/C++ tools can give us memory safety, then how much difference does it really make if it's baked into the language syntax. Rust can't prove everything at compile time.
Some of the Rust programs I've used like Alacritty will runtime panic all the time. Because the language doesn't actually save you. What saves you is smart people spending 5+ years hammering out all the bugs. That's why old tools we depend on every day like GNU programs never crash and their memory bugs are rare enough to be newsworthy. The Rust community has a reputation for toxic behavior that raises questions about the reliability of its governance. Rust evangelizes its ideas by attacking other languages and socially ostracizing the developers who use them. Software development is the process of manipulating memory, so do you really want to be relinquishing control over your memory to these kinds of people?
> Dependency soup.
Exactly why I don't use it. I don't really feel like including the source for the entire toolchain as part of my project and building it all myself. At least if I write standards conforming C++ there are multiple compiler implementations that can all handle it. I also have a reasonable expectation that a few decades from now I will be able to `apt get somecompiler` and the code will still just work (aside from any API changes at the OS level, for which compatibility shims will almost certainly exist).
If I can't build something starting from a repo in a network isolated environment then I want absolutely nothing to do with it. (Emscripten I am looking at you. I will not be downloading sketchy binary blobs from cloud storage to "build from source" that is not a source build that is binary distribution you liars.)
I’m sorry, this is coming from Justine “the magic syscall numbers are my god given right to use” Tunney?
So using raw system calls is a distraction for them.
More tools in the C/C++ realm are always welcome but I've yet to see more than 50% of projects I've worked on be able to successfully use ASAN(assuming you've got the time to burn to configure them and all their dependencies properly). I've used ASAN, CBMC and other tools to good effect but find Rust more productive overall.
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#...
What would catch this would be a C++ compiler warning. I filed bug reports against both GCC and LLVM asking for one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
The problem boils down to usage of stack memory after the memory is given to somebody else.
While this isn't incorrect, in this case the problem seems to be caused by stack unwinding without the consent of lower frames, rather than a willful bug where the callee forgets about the ownership.
There is no "called back" in this case. The APC was executed by the sleep and corrupted the stack by unwinding across the C winsock code without any cleanup. It never returned.
The user-mode enters an "alertable" wait which allows an asynchronous procedure (APC) to interrupt it and execute code. Instead of returning the APC causes an exception, unwinds the stack across the APC delivery and ends up executing some random code instead of returning to the winapi code that called wait(alertable: true) in a loop. So the code that was supposed to be synchronous because of while(!completed) wait(); suddenly is broken out of the loop without actually being completed.
> Exceptions (where lower frames matter) hid the control flow here, but that's one way to reach this situation (early return is another way, as shown by Raymond Chen's post).
This isn't just hiding the control flow here. It's control flow that shouldn't have existed in the first place. It walks across the boundary of the windows APC dispatcher. Unity folks needed to go out of their way to make this "work" in the first place because using c++ exceptions and standard library threads this wouldn't work.
I'm actually not sure if Linux would handle this in the same way or what. Plausibly it sees the same leaf page tables as user space, trips a fault, and doesn't scribble the pages anyway. Maybe Windows translates the user-provided virtual address to a physical address (or other kernel mapping that happens to have write permission) upon registration.
This is an absolutely standard trick that is known as "the self-pipe trick". I believe DJB created and named it. It is used for turning APIs not based on file handles/descriptors into events for an event loop based on file handles/descriptors, especially for turning signals into events for select(2)/poll(2)/epoll/kqueue/...
> Also that pattern is a bit too sweet as it can be a nice way to create condition variables or queues, the problem being one is now paying syscall overhead.
Sure but you don't need to, and even if you did, the system call overhead is probably not the hill you're dying on, or if it is then you want io_uring or similar.
I had a few questions I asked ChatGPT to understand better: https://chatgpt.com/share/677411f9-b8a0-8013-8724-8cdff8dc4d...
Very interesting insights about low level programming in general
It’s probably feasible to use them if you draw a tight boundary around the exception-using code, use RAII without fail inside that boundary to ensure everything cleans up properly, and make sure all code that might be called from the outside has try/catch blocks. (And, obviously, don’t trigger async calls to throw from the middle of someone else’s function!).
I find it a lot easier to avoid them entirely. Error handling is a little more annoying, but it’s worth it.
Without exceptions the possible control flow is entirely explicit. It would have at least been obvious that cleanup wasn't properly handled in the select function for all cases.
An exception was effectively injected from outside of the code via low level shenanigans. That's not "exceptions bad" that's "low level monkeying with program control flow can blow up in your face".
Of course, it's not uncommon to find unsafe Rust code that misbehaves badly when something panics, which is yet another of its hazards that I wish were better documented.
In this case, I'd put down the lesson as "exceptions in C++ are very dangerous if you're coming out of C code", since they can cause undocumented and unexpected behavior that ultimately led to the use-after-return issue here.
[0] https://blog.rust-lang.org/2024/09/05/Rust-1.81.0.html#abort...
It can also result in logic errors if objects are used after their methods panic, but such usage is generally not expected to work in the first place.
Especially when I still have English set as the second priority language.
This could only be made safe by the OS somehow imposing safety mechanisms on the binary level, or by wrapping all OS APIs into APIs of the safe language, where the wrappers have to take care to ensure both the guarantees implied by the language and the assumptions made by the OS APIs. (Writing the OS itself in a memory-safe language isn’t sufficient, for one because it very likely will still require some amount of “unsafe” code, and furthermore because one would still want to allow applications written in a different language, which even if it also is memory-safe, would need memory-correct wrappers/adapters.)
This is similar to the distinction between memory-safe languages like Rust where the safety is established primarily on the source level, not on the binary level, and memory-safe runtimes like the CLR (.NET) and the JVM.
To my knowledge the kernel and win32 is in fact written in C++ and only the interface has C linkage and follows C norms.
So this error occurred going C++ > C > C++ never mind languages with different memory protection mechanisms like Rust > C > C++.
Of course, you can write C++ without exception safety too, but “C++ as a better C” and exception-safe C++ are effectively like two different languages.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
The article doesn't say it was a C++ exception. Could've been a SEH exception.
*This could, in theory, be caught by fancy hardware strategies like capabilities. But those are somewhat more esoteric.
The exception caused the stack frame to disappear before the OS kernel was done with it. Presumably, the timeout would have been properly handled had the stack not been unwound by the exception. If it had not, that would be a bug in Windows.
There is a conceptually simple solution to this issue, which is to have the C++ compiler issue a warning when a programmer does this. I filed bug reports against both GCC and LLVM asking for one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
Correctness of the Rust wrapper can't be checked by the compiler, just like correctness of C headers is unchecked, and it just has to match the actual underlying ABI.
The task of making a safe API wrapper can be relatively simple, because you don't have to take into consideration safety an application as a whole, you only need to translate requirements of individual APIs to Rust's safety requirements, function by function. In this case you would need to be aware that the function call may unwind, so whether someone making a dedicated safe API for it would think of it or not, is only a speculation.
If those specifications were written down clearly enough then this dev wouldn't have needed to spend 5 days debugging this since he spent a significant amount of time reading the documentation to find any errors they are making that is mentioned in the documentation.
And don't say that they can actually just read the Rust code and check that, since, well, I can't read low level Rust code or know how the annotations can interact with each other.
A single line of Rust code could easily need several paragraphs of written documentation so that someone not familiar with what Rust is specifying will actually understand what it entails.
This is part of why Rust is difficult, you have to nail down the specification and a small change to the specification causes broad changes to the codebase. The same might need to happen in C, but many times it doesn't.
In C you can have a function that returns a pointer, and get no information how long that pointer is valid for, what is responsible for freeing it, whether it's safe to use from another thread. That's not only an obstacle for making a safe Rust API for it, that's also a problem for C programmers who don't want to just wing it and hope it won't crash.
The benefit of safe wrappers is that as a downstream user you don't need to manually check their requirements. They're encoded in the type system that the compiler checks for you. If it compiles, it's safe by Rust's definition. The safety rules are universal for all of Rust, which also makes it easier to understand the requirements, because they're not custom for each library or syscall. The wrappers boil it down to Rust's references, lifetimes, guards, markers, etc. that work the same everywhere.
> That Linux drama was due to "nontechnical nonsense" of maintainers refusing to document their APIs requirements.
> If those specifications were written down clearly enough then this dev wouldn't have needed to spend 5 days debugging...
Anyway, it is a shame that the compiler does not issue a warning when you do this. I filed bug reports with both GCC and LLVM requesting that they issue warnings, which should be able to avoid this mess if the compilers issue them and developers heed them:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
To put it another way, you can design a safe wrapper around this in Rust, but you can't in C++.
They can't. Rust can't verify the safety of the called code once you cross the language boundary. Handing out the pointer is inherently unsafe.
In the user space FFI case at least you might be able to switch to an implementation written in the same (memory safe) language that you are already using. Not so for a syscall.
The problem was that the C API exposed by the kernel did not encode lifetime requirements, so they were accidentally violated. Rust APIs (including ones that wrap C interfaces) can encode lifetime requirements, so you get compile time errors if you screw it up.
I don't think you can win this argument by saying "but you have to use `unsafe` to write the Rust wrapper". That's obviously unavoidable.
As others have mentioned, this same issue could happen in Rust until very recently. As of Rust 1.81.0, Rust will abort instead of unwinding through C stack frames:
https://blog.rust-lang.org/2024/09/05/Rust-1.81.0.html#abort...
That avoids this issue in Rust. As for avoiding it in C++ code, I have filed bugs against both GCC and LLVM requesting warnings:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
Once the compilers begin emitting warnings, this should not be an issue anymore as long as developers heed the warnings.
It would involve looking at the function pointer passed to QueueUserAPC and making sure the function being called doesn't mess with the stack frame it runs on.
This function will run in the context of the called thread, in that thread's stack. NOT in the calling thread.
It's a weird execution mode where you're allowed to hijack a blocked thread and run some code in its context.
Don't know enough about Rust or the like to say if that's something that could be done in the language with attributes/annotations for a function, but it seems plausible.
https://blog.rust-lang.org/2024/09/05/Rust-1.81.0.html#abort...
You have to explicitly opt into unwinding like this now; otherwise the program will abort.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
If/when they are implemented, they will become errors with -Werror.
If it is not, then there is a bug in the C++ implementation.
> Another possibility is to avoid it in the first place by not allowing C++ function pointers that are not marked noexcept to be passed to C functions.
That said, my personal preference is to use C instead of C++, which avoids the issues of exceptions breaking kernel expectations entirely.
I filed bugs against both GCC and LLVM requesting warnings:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
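For reference, the quoted idea already maps onto current C++: since C++17, noexcept is part of a function's type, so a callback type declared noexcept rejects potentially-throwing functions at compile time. A sketch with hypothetical names (not a real Win32 or POSIX API):

```cpp
#include <type_traits>

// A callback type that only admits non-throwing functions.
using SafeCallback = void (*)(void *) noexcept;

bool callback_ran = false;

void good(void *) noexcept { callback_ran = true; }  // convertible to SafeCallback
void bad(void *) {}                                  // potentially-throwing

// A C-facing registration function that only accepts non-throwing callbacks.
void register_callback(SafeCallback cb) { cb(nullptr); }

// The type system enforces the rule: register_callback(bad) won't compile.
static_assert(!std::is_convertible<decltype(&bad), SafeCallback>::value,
              "potentially-throwing functions must not become SafeCallbacks");
```

Had PAPCFUNC been declared this way, the bug in the article would have been a compile error rather than a five-day hunt.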
1) The top level of an async routine should have a handler that catches all exceptions and dies if it catches one.
2) If you have a resource you have a cleanup routine for it.
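Rule 1 can be sketched as a trampoline (hypothetical names, a portable stand-in for the real async entry point):

```cpp
#include <cstdio>
#include <exception>

bool work_ran = false;

void risky_work() { work_ran = true; }  // stand-in for real work that may throw

// The outermost frame of an async routine catches everything, because
// letting an exception unwind past this point would cross frames that
// C or kernel code set up.
extern "C" void async_entry() noexcept {
    try {
        risky_work();
    } catch (...) {
        std::fputs("fatal: exception escaped async routine\n", stderr);
        std::terminate();  // die loudly instead of corrupting foreign frames
    }
}
```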
So I'd say it's more about the basic expectations of a function called from C, which includes a million other trivial things like "don't write beyond the bounds of buffers you're given" and "don't clobber your caller's stack frame" and "don't spawn another thread just to write to output pointers after your function returns" (not that any of these is the issue here).
You have to know exactly how asynchronous callbacks registered with the kernel get called, how their stack frames get set up, how the kernel writes to local variables within a stack frame of a user thread, how stack frames are adjusted when a blocking system call returns to user space, and finally how and when exceptions (in any language) break all of the above by implementing a different flow of control than the one this "async callback kernel API architecture" expects. All of these are at play here, and once you put them together you understand the scenario.
This sort of stuff is why any productivity arguments for C++ over Rust are bullshit. Sure you spend a little more time writing lifetime annotations, but in return you avoid spending 5 days debugging one memory corruption bug.
https://blog.rust-lang.org/2024/09/05/Rust-1.81.0.html#abort...
The replay system would also be unlikely to pose a problem. Replay systems usually just encode and replay the side effects, so there is no need to "reimplement" the operations. So, if you did some wacky system call, but all it did is write 0x2 to a memory location, M, you effectively just record: "at time T we issued a system call that wrote 0x2 to M". Then, when you get to simulated time T in the replay, you do not reissue the wacky system call, you just write 0x2 to M and call it a day.
The primary complexity is actually in creating a consistent timeline with respect to parallel asynchronous writes. Record-Replay systems like rr usually just serialize multithreaded execution during recording to avoid such problems. You could also do so by just serializing the executing thread and the parallel asynchronous write by stopping execution of the thread while the write occurs.
Again, not really sure if that would be possible in Windows, but there is nothing particularly mechanically hard about doing this. It is just a question of whether it matches the abstractions and hooks Windows uses and supports.
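A toy version of that recording scheme, with hypothetical types, just to make the "record the write, not the syscall" idea concrete:

```cpp
#include <cstddef>
#include <vector>

// Effect-based record/replay: during recording, log the memory writes a
// syscall performed; during replay, reapply the writes instead of
// reissuing the syscall.
struct Effect {
    int time;             // simulated time T
    std::size_t addr;     // memory location M
    unsigned char value;  // value written
};

struct Recorder {
    std::vector<Effect> log;
    void record_write(int t, std::size_t a, unsigned char v) {
        log.push_back({t, a, v});
    }
};

// Replay never touches the real syscall: it reapplies the effects in
// recorded order, giving a deterministic timeline.
void replay(const Recorder &r, std::vector<unsigned char> &memory) {
    for (const Effect &e : r.log)
        memory[e.addr] = e.value;
}
```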
As such, for an asynchronous kernel write you'd want to set up the kernel to never mutate recordee memory, instead having it modify recorder-local memory, which the recorder can then copy over to the real process whenever, and get to record when it happens while at it (see https://github.com/rr-debugger/rr/issues/2613). But such can introduce large delays, thereby changing execution characteristics (if not make things appear to happen in a different order than the kernel would, if done improperly). And you still need the recording system to have accurately implemented the forwarding of whatever edge-case of the asynchronous operation you hit.
And, if done as just that, you'd still hit the problem encountered in the article of it looking like unrelated code changes the memory (whereas with synchronous syscalls you'd at least see the mutation happening on a syscall instruction). So you'd want some extra injected recordee instruction(s) to present separation of recordee actions from asynchronous kernel ones. As a sibling comment notes, rr as-is doesn't handle any asynchronous kernel write cases (though it's certainly not entirely impossible to).
Maybe Windows TTD records async NtDeviceIoControlFile accurately, maybe it doesn't; I don't know.
The time travel debugging available with WinDbg should be able to wind back to the point of corruption - that'd probably have taken a few days off the initial realisation that an async change to the stack was causing the problem.
There'd still be another reasoning step required to understand why that happened - but you would be able to step back in time, e.g. to when this buffer was previously used on the stack, to see how select() was submitting it to the kernel.
In fact, a data breakpoint / watchpoint could likely have taken you back from the corruption to the previous valid use, which may have been the missing piece.
In the Unix world we have this too, and it's called signals, but every documentation about signals is sure to say "in a signal handler, almost nothing is safe!". You aren't supposed to call printf() in a signal handler. Throwing exceptions is unthinkable.
I skimmed the linked QueueUserAPC() documentation page and it says none of this. Exceptions aren't the handgrenade here (though sure, they're nasty) — QueueUserAPC() is.
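For comparison, the safe pattern Unix signal documentation pushes you toward is roughly this (a minimal sketch):

```cpp
#include <csignal>

// A handler may only do async-signal-safe things; the portable pattern
// is to set a flag of type volatile sig_atomic_t and return. printf,
// allocation, and anything that could throw must wait for normal context.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;  // the one portably-safe kind of side effect
}
```

The main loop then polls `got_signal` and does the real work outside the handler.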
This is more like select() sometimes calling a user-supplied function in addition to checking for I/O.
[1] https://learn.microsoft.com/en-us/windows/win32/sync/asynchr...
> A thread enters an alertable state when it calls the SleepEx, SignalObjectAndWait, MsgWaitForMultipleObjectsEx, WaitForMultipleObjectsEx, or WaitForSingleObjectEx function.
So this is a lot less like Unix signals. It only really works if the thread you're doing the async procedure call to is one that's likely to use those.
So APCs are safe enough -- a lot safer than Unix signal handlers.
Now, MSFT does NOT document that APCs can't throw, but since the functions that enter alertable states (and hence can call APCs) are extern "C" functions, it follows that APCs cannot throw.
So part of the problem here is that the PAPCFUNC function type is not noexcept, therefore the compiler can't stop you from using a function that is not noexcept as an APC, thus APCs can -but must not!- throw exceptions.
That being said, I filed bugs against both GCC and LLVM requesting warnings, which is the next best thing:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118263
https://github.com/llvm/llvm-project/issues/121427
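In the meantime there's a discipline you can apply on your own side: mark your own APC functions noexcept, so that if anything throws inside one, the runtime calls std::terminate instead of unwinding through the kernel-established frames. A sketch (`ApcParam` is a stand-in for ULONG_PTR so this compiles without windows.h):

```cpp
using ApcParam = unsigned long long;  // stand-in for ULONG_PTR

// An APC body that is statically guaranteed not to leak exceptions:
// any throw inside becomes std::terminate, never a cross-frame unwind.
void my_apc(ApcParam) noexcept {
    // work that must not let exceptions escape
}

// Verified at compile time: calls to my_apc can never throw.
static_assert(noexcept(my_apc(0)), "APCs must be non-throwing");
```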
The issue is more that the actual code calling alertable waits doesn't expect exceptions to unwind through it, and that will most likely be all WinAPI code, because in it exception == crash.
What ended up happening is that if the GC ran inside the callback then the objects the native code could see could move, and so the next block of code was smashing the heap by writing to the wrong spots. All the small inputs finished before a GC was called and looked fine but larger ones went into undefined behavior. So dumb.
That auto-translation is something else
Gotta love programmers out of context
When the Kernel does the memory write, user-mode memory debuggers don't see it happen.
Then obviously, cancelling an operation is always tricky business lifetime-wise, due to asynchronicity. My approach is to always design my APIs with synchronous cancel semantics, which is sometimes tricky to implement. Many common libraries don't get it right.
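One way to sketch synchronous cancel semantics (hypothetical `Operation` type): cancel() returns only once the callback is guaranteed not to run, or has finished running, so the caller may free associated resources immediately afterwards.

```cpp
#include <condition_variable>
#include <mutex>

class Operation {
    std::mutex m;
    std::condition_variable cv;
    bool running = false, cancelled = false;
public:
    bool try_begin() {  // called by the async machinery before the callback
        std::lock_guard<std::mutex> lk(m);
        if (cancelled) return false;  // cancelled operations never start
        running = true;
        return true;
    }
    void end() {  // called by the async machinery after the callback
        std::lock_guard<std::mutex> lk(m);
        running = false;
        cv.notify_all();
    }
    void cancel() {  // blocks until no callback is in flight
        std::unique_lock<std::mutex> lk(m);
        cancelled = true;
        cv.wait(lk, [&] { return !running; });
    }
};
```

After cancel() returns, the mutex guarantees the callback either already finished or will be refused at try_begin(), so there is no window where it touches freed state.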
Windows allows some synchronous calls to be interrupted by another thread to run an APC if the called thread is in an "alertable wait" state. The interrupted thread then returns to the blocking call, so the pointers in the call are expected to be valid.
Edit 2: I should clarify that the thread returns to the blocking call, which then exits with WAIT_IO_COMPLETION status, so you have to retry it. But the stack context is expected to be safe.
APC is an "Asynchronous Procedure Call", which is asynchronous to the calling thread in that it may or may not get run. Edit: may or may not run at a future time.
(https://learn.microsoft.com/en-us/windows/win32/sync/asynchr...)
There are very limited things you are supposed to do in an APC, but these are poorly documented and need one to think carefully about what is happening when a thread is executing in a stack frame and you interrupt it with this horrorshow.
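The resulting calling pattern, simulated portably (the constants and the wait function are stand-ins for the Win32 ones; the point is the shape of the retry loop):

```cpp
// Stand-ins for the Win32 constants; not the real values from windows.h.
constexpr int WAIT_IO_COMPLETION_SIM = 0xC0;
constexpr int WAIT_OBJECT_0_SIM = 0;

int interruptions = 2;  // simulate two APC deliveries before completion

int alertable_wait_sim() {
    // Real code would call e.g. WaitForSingleObjectEx(h, INFINITE, TRUE).
    return interruptions-- > 0 ? WAIT_IO_COMPLETION_SIM : WAIT_OBJECT_0_SIM;
}

int wait_retrying() {
    int status;
    do {
        status = alertable_wait_sim();
        // WAIT_IO_COMPLETION means an APC ran on this thread's stack and
        // returned; the wait wasn't satisfied, so retry with the same
        // (still-valid) arguments.
    } while (status == WAIT_IO_COMPLETION_SIM);
    return status;
}
```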
Win32 API is a plethora of footguns. For the uninitiated it can be like playing Minesweeper with code. Or like that scene in Galaxy Quest where the hammers are coming at you at random times as you try to cross a hallway.
A lot of it was designed by people who, I think, would call one stupid for holding it wrong.
I suppose it's a relic of the late 80s and 90s when you crawled on broken glass because there was no other way to get to the other side.
You learn a lot of the underlying systems this way, but these days people need to get shit done and move on with their lives.
Us olds are left behind, staring nostalgically at our mangled feet while we yell at people to get off our lawns.
One must not throw a C++ exception across stack frames that don't participate in C++ stack unwinding, whether it's a Win32 APC, another Win32 callback, a POSIX signal or `qsort` (for the people that believe qsort still has a place in this decade). How the Win32 API is designed is absolutely irrelevant for the bug in this code.
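The qsort case makes the rule concrete: the comparator runs inside C frames with no C++ unwind participation, so it must never throw, and marking it noexcept turns any escaping exception into std::terminate instead of a corrupted unwind. A sketch:

```cpp
#include <cstddef>
#include <cstdlib>

// qsort invokes this from C frames, so it must never throw; noexcept
// makes an escaping exception terminate instead of unwinding into C.
extern "C" int cmp_int(const void *a, const void *b) noexcept {
    int x = *static_cast<const int *>(a);
    int y = *static_cast<const int *>(b);
    return (x > y) - (x < y);  // avoids overflow that x - y could cause
}

void sort_ints(int *data, std::size_t n) {
    std::qsort(data, n, sizeof(int), cmp_int);
}
```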
From that point on I am sure he knew to do that. What's obvious to you can also just be your experience.