I don't think you have to have systems in the same thread/process if you bake in an API for controlling time and ingress/egress for each component (depending on what you're trying to test).
You can have the communication channels between components under the control of the simulation environment rather than have them happen in their 'normal' manner. This allows you to inject latency between components, 'fiddle' with the inputs/outputs as suggested, as well as record messaging (assuming you're working with systems that exchange messages/events) with appropriate event times.
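To make that concrete, here's a minimal sketch (Python, with invented names like SimChannel - not any real library) of a channel owned by the simulation environment that injects latency and records every message with its event time:

    import heapq

    class SimChannel:
        """Message channel owned by the simulation rather than the real network."""
        def __init__(self, latency_fn):
            self.latency_fn = latency_fn   # e.g. a seeded RNG picking a per-message delay
            self.pending = []              # (deliver_time, seq, msg) min-heap
            self.log = []                  # (send_time, deliver_time, msg) kept for replay
            self.seq = 0

        def send(self, now, msg):
            deliver_at = now + self.latency_fn(msg)
            heapq.heappush(self.pending, (deliver_at, self.seq, msg))
            self.log.append((now, deliver_at, msg))
            self.seq += 1

        def deliveries_up_to(self, now):
            """Pop every message whose (simulated) delivery time has been reached."""
            out = []
            while self.pending and self.pending[0][0] <= now:
                t, _, msg = heapq.heappop(self.pending)
                out.append((t, msg))
            return out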
Another important point around clocks: you'll want to include a scheduling API in your clock components so that components can schedule events for themselves in the future. If you move to event time, being able to fast-forward through it is another advantage.
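A sketch of the clock side (again, invented names): the clock exposes schedule(), and advancing time just means jumping straight to the next due event, which is what makes fast-forwarding in event time cheap:

    import heapq

    class SimClock:
        def __init__(self, start=0.0):
            self.now = start
            self.events = []          # (fire_time, seq, callback) min-heap
            self.seq = 0

        def schedule(self, delay, callback):
            """Let a component ask to be woken up `delay` time units from now."""
            heapq.heappush(self.events, (self.now + delay, self.seq, callback))
            self.seq += 1

        def run_until(self, end_time):
            """Fast-forward: jump from event to event instead of sleeping."""
            while self.events and self.events[0][0] <= end_time:
                fire_time, _, callback = heapq.heappop(self.events)
                self.now = fire_time
                callback(self.now)    # callbacks may schedule further events
            self.now = max(self.now, end_time)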
Overall it's worth considering the class of issue you're trying to detect with this approach. It's great to be able to run things in event time and debug ad nauseam, but you're not going to catch race conditions without the smarts mentioned around native scheduling.
There are some parallels as well between this style of testing and production replication, so that's good to have at the back of your mind if you're looking to collect production telemetry/ingress + egress data. If you build a good system for this type of testing, producing reproducible code will be part of your development process and you're more likely to be able to produce tooling for investigation/reproduction of production issues.
(The context in which I do work that's kind of in this area is robust backtesting of trading systems)
10000truths 27 days ago
The reason for using a single underlying thread/process is to prevent the OS scheduler from interfering with deterministic execution. You can't control how and when the OS scheduler kicks in, nor can you perfectly reproduce the clock drift/jitter between multiple cores. If the program under test spawns threads, then you'll have to emulate the execution of those threads by writing your own scheduler whose time slicing and scheduling policies are done deterministically.
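As a toy illustration of that last point (Python, cooperative generators standing in for threads, all names invented): if every would-be context switch is an explicit yield and a seeded PRNG makes the interleaving choices, the same seed always reproduces the same schedule:

    import random

    def run_deterministic(tasks, seed):
        """Interleave cooperative tasks with a reproducible schedule."""
        rng = random.Random(seed)
        runnable = list(tasks)               # each task is a generator
        while runnable:
            task = rng.choice(runnable)      # deterministic given the seed
            try:
                next(task)                   # run until its next yield point
            except StopIteration:
                runnable.remove(task)

    def worker(counter):
        for _ in range(3):
            counter[0] += 1                  # a "shared memory" access
            yield                            # explicit preemption point

    counter = [0]
    run_deterministic([worker(counter), worker(counter)], seed=42)
    assert counter[0] == 6                   # same seed => same interleaving every run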
Veserv 27 days ago
You can control scheduling; that is how many record-replay based time-traveling debuggers do it.
Also, scheduling is independent of deterministic execution unless you are doing inherently non-deterministic things like multithreaded shared-memory accesses, which you cannot simulate faithfully anyway. The only thing that matters in a deterministic execution model is runs of deterministic execution interrupted by non-deterministic events injected at precise points in the execution trace.
When serializing onto a single thread you already need to define some sort of correspondence between "simulated scheduler state" and the number of instructions to execute, since you are already giving up on the actual scheduler (unless you do not care about correspondence to the actual schedule configuration). You just do that, but you get to execute with all of your cores until you reach the injection point (which is how replay systems can already work). Now you can execute in parallel (multiprocessing only, though, no multithreading) and use blocking I/O.
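A rough sketch of that model (not any particular replay tool, names invented): the run is deterministic except at explicit input points; recording logs the value seen at each point, and replay injects the same value at the same point, so the rest of the execution follows for free:

    import os

    def run(get_input):
        """Deterministic apart from the explicit input points."""
        total, trace = 0, []
        for step in range(5):
            if step in (1, 3):                 # injection points in the execution
                total += get_input(step)
            trace.append(total)
        return trace

    log = {}

    def record(step):
        log[step] = os.urandom(1)[0]           # genuinely non-deterministic value
        return log[step]

    def replay(step):
        return log[step]                       # same value, same point => same run

    assert run(record) == run(replay)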
joncrocks 26 days ago
This depends on what you're testing. If you're collapsing threads/processes into a single thread, you're making decisions about the order of execution anyway, so you're not going to catch errors introduced by unexpected preemption or by inter-core process/thread timing issues.
If you're not looking for that, then you can build your software/system with known sync points where a given process can stop/wait and allow other processing to occur. These can be in-process, out-of-process, or remote.
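One way to picture such a sync point (a sketch with invented names, not a specific framework): the component blocks on a control channel at each named point and only proceeds when the test harness says so, so the harness decides the global order. In-process this is just a queue pair; the same handshake works over a pipe or socket for the out-of-process/remote cases:

    import queue

    class SyncPoint:
        """Component blocks at named points; the harness releases them in whatever
        order the test wants to explore."""
        def __init__(self):
            self.reached = queue.Queue()   # component -> harness: "I'm at point X"
            self.resume = queue.Queue()    # harness -> component: "go ahead"

        def wait(self, name):              # called by the component under test
            self.reached.put(name)
            self.resume.get()

        def release(self):                 # called by the test harness
            name = self.reached.get()      # observe which point was reached
            self.resume.put(None)
            return name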
As you say, this ends up being a scheduler, and you have to apply knowledge of the execution/communication channels in order to coordinate correctly.
a_t48 27 days ago
This is actually an area I'm solving right now in the robotics space - you've got it exactly right. You need to restrict calls that query the system time, take deep control over the message-passing layer, and use a custom scheduler for executing message handlers.
Edit: trading systems aren't an area I'd considered for this work. A lot of parallels, though.
RandomThoughts3 27 days ago
Considering you actually have to design around DST, I'm still wildly unconvinced that the time and effort spent setting up DST and fuzzing in the hope that it finds your bugs wouldn't be better spent actually proving that your design is bug-free using tools like TLA+, before intelligently using static analysis and formal proof during implementation.
I believe DST to be the wrong solution to the actual problem. Its main advantage is that it doesn't require that people used to designing distributed systems actually acquire a new skill set, and it doesn't challenge the status quo too much (after all, it's pretty much just fuzzing on paths you have chosen to make fuzzable).
hwayne 27 days ago
One of the big problems with using TLA+ is that it verifies your design, not your code. People are looking for ways to link the two. Formal proof works but is too expensive for most businesses.
The most promising approach I've seen so far is... DST! First we simulate a system, we generate a bunch of timelines, then we see if those timelines are valid behaviors in the TLA+ design. I've heard of a few success stories and it's definitely cheaper than formal proof!
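In case it helps, the checking side can be pictured roughly like this (Python sketch, with the Init/Next predicates hand-translated from the spec purely for illustration; the heavyweight version hands the recorded trace to TLC instead):

    import json

    def check_trace(path, init_ok, next_ok):
        """Replay a recorded timeline and check every step against predicates
        derived from the design."""
        with open(path) as f:
            states = [json.loads(line) for line in f]   # one simulated state per line
        assert init_ok(states[0]), "initial state not allowed by the design"
        for before, after in zip(states, states[1:]):
            assert next_ok(before, after), f"illegal transition: {before} -> {after}"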
RandomThoughts3 27 days ago
> The most promising approach I've seen so far is... DST! First we simulate a system, we generate a bunch of timelines, then we see if those timelines are valid behaviors in the TLA+ design.
That’s just testing again. That’s not linking the code to the design.
DST feels to me like the way people treated memory access before Rust came around. Doing things properly was also seen as too costly then, but now that it's trendy everyone is fully behind lifetime tracking. Same here: formal proof is “too costly” but a spray-and-pray approach like DST is somehow acceptable. Anyway, keeping with the Rust example, I guess I just have to wait two decades and the next generation might finally see the light.
I haven't dealt with it directly on Firezone but I wrote one or two games this way for game jams years ago, and I keep wishing it would catch on. It was harder with the games because floating-point math doesn't like to be deterministic across platforms.
gguergabo 27 days ago
Thanks for this, Phil! Blog posts like this one that break down complex topics into digestible pieces are a big help for the space and are some of my favorites.
Antithesis employee here. Happy to jump in and answer any burning questions people might have about Deterministic Simulation Testing (DST).
rhplus 27 days ago
For anyone writing services in C# there's a project from MSR called Coyote that does similar deterministic simulation testing by systematically testing interleavings of async code: https://microsoft.github.io/coyote/
JavaScript is single-threaded, but I/O (events) introduce nondeterminism. I’m wondering if there are tools that let you control how events get scheduled when testing async code?