A tested foundation
fdyno's correctness argument is short, and most of it is not about fdyno. The server is a thin, stateless translator: it maps the DynamoDB wire protocol onto FoundationDB transactions and holds no durable state of its own. Isolation, durability, atomic commit, and crash recovery are not properties fdyno implements; they are properties it inherits. So the interesting question is why FoundationDB's guarantees can be trusted, and the answer is an unusually rigorous methodology: deterministic simulation.
What fdyno must get right, and what it inherits
It helps to draw the line precisely.
fdyno is responsible for the translation:
- encoding items, keys, and indexes into FoundationDB's keyspace so the ordering and uniqueness the DynamoDB API promises are preserved;
- bundling a base-item write, its index entries, the change-stream record, and the idempotency token into one transaction;
- declaring the right read- and write-conflict ranges, so a conditional write or a transaction conflicts exactly when DynamoDB semantics say it should;
- surfacing FoundationDB's outcomes as the DynamoDB errors a client expects.
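As an illustration of the first item, here is a minimal sketch of an order-preserving composite-key encoding. It is not fdyno's actual layout: `encode_part` and `encode_key` are hypothetical names, and the scheme (escape embedded null bytes, terminate each part with a null byte) is only loosely modelled on how FoundationDB's tuple layer treats byte strings.

```python
def encode_part(s: str) -> bytes:
    """Encode one string so byte order matches the order of its UTF-8 bytes.

    Embedded 0x00 bytes are escaped as 0x00 0xFF and the part ends with a
    single 0x00, so a shorter part always sorts before any extension of it.
    """
    raw = s.encode("utf-8")
    return raw.replace(b"\x00", b"\x00\xff") + b"\x00"


def encode_key(table: str, partition_key: str, sort_key: str) -> bytes:
    """Concatenate the encoded parts (hypothetical layout, for illustration)."""
    return encode_part(table) + encode_part(partition_key) + encode_part(sort_key)


items = [
    ("t", "user#2", "a"),
    ("t", "user#10", "b"),
    ("t", "user#1", "z"),
    ("t", "user#1", ""),
    ("t", "user", "zz"),
]

# Sorting by encoded bytes gives the same order as sorting the tuples.
by_bytes = sorted(items, key=lambda k: encode_key(*k))
by_tuple = sorted(items)
```

The terminator is what preserves uniqueness: without it, the parts `("ab", "c")` and `("a", "bc")` would concatenate to the same bytes and collide.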
FoundationDB is responsible for the hard part:
- serializable isolation and a strict, real-time order over those transactions;
- durability and atomic commit across the cluster;
- recovery after process and machine failure, with no lost or torn writes.
A bug in the first list is an fdyno bug, the kind that the DynamoDB conformance corpora fdyno runs against are built to catch. A bug in the second list would be a FoundationDB bug, and that is the class of bug deterministic simulation exists to eliminate.
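To make the conflict-range responsibility concrete, the following is a toy model of optimistic concurrency control: a transaction records a read version, and its commit is rejected if any key it read was written by a transaction that committed after that version. All names here (`ToyStore`, `Txn`, `ConflictError`, `put_if_absent`) are hypothetical. This is neither FoundationDB's resolver nor fdyno's code, and real conflict sets are key ranges rather than single keys.

```python
class ConflictError(Exception):
    """Raised when a transaction's reads were invalidated before commit."""


class ToyStore:
    def __init__(self):
        self.version = 0        # last committed version
        self.data = {}          # key -> value
        self.last_write = {}    # key -> version of the most recent write

    def begin(self):
        return Txn(self, read_version=self.version)


class Txn:
    def __init__(self, store, read_version):
        self.store = store
        self.read_version = read_version
        self.read_conflicts = set()   # keys this transaction depends on
        self.writes = {}              # buffered writes, applied at commit

    def get(self, key):
        # Toy simplification: reads see the latest committed value rather
        # than a true snapshot; the commit check rejects stale readers.
        self.read_conflicts.add(key)
        return self.writes.get(key, self.store.data.get(key))

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        s = self.store
        # Conflict if anything we read was written after our read version.
        for key in self.read_conflicts:
            if s.last_write.get(key, 0) > self.read_version:
                raise ConflictError(key)
        s.version += 1
        for key, value in self.writes.items():
            s.data[key] = value
            s.last_write[key] = s.version
        return s.version


def put_if_absent(store, key, value):
    """A conditional write: the read of `key` is what makes it conflict."""
    txn = store.begin()
    if txn.get(key) is not None:
        return False
    txn.set(key, value)
    txn.commit()
    return True
```

In this model a blind write that reads nothing never conflicts, while two concurrent `put_if_absent` calls on the same key cannot both succeed. Declaring too small a read-conflict set would let a conditional write commit when it should fail, which is exactly the kind of translation bug that belongs to the first list.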
Deterministic simulation
FoundationDB is written in Flow, a C++ dialect that compiles asynchronous, actor-style code into state machines. One consequence of that design is that an entire cluster (coordinators, proxies, resolvers, transaction logs, storage servers, and the network between them) can run inside a single process, as a discrete-event simulation driven by one pseudo-random seed.
In that simulation, every source of nondeterminism is controlled:
- network message ordering and latency,
- disk timing, and disk faults such as partial writes and corruption,
- process and machine restarts, reboots, and swaps,
- clock skew,
- the scheduling of every concurrent task.
Because the seed determines all of it, a run is perfectly reproducible: a test that fails on seed N fails the same way every time it is replayed. That turns a one-in-a-billion concurrency bug from an undebuggable flake into a deterministic, steppable failure.
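The reproducibility property can be shown with a toy discrete-event loop. This sketch is written in Python rather than Flow and models nothing of FoundationDB itself; the function name `simulate`, the two-node setup, and the latency and drop rates are all assumptions chosen for illustration. The point is only that when one seeded generator is the sole source of randomness and time is simulated, the whole trace is a function of the seed.

```python
import heapq
import random


def simulate(seed: int, messages: int = 20) -> list:
    """Run a toy two-node message exchange and return its event trace."""
    rng = random.Random(seed)   # the only source of randomness
    clock = 0.0                 # simulated time, never wall-clock time
    queue = []                  # (deliver_at, sequence, sender, payload)
    trace = []

    for seq in range(messages):
        sender = rng.choice(["A", "B"])
        latency = rng.uniform(0.001, 0.050)    # simulated network delay
        if rng.random() < 0.1:                 # simulated fault: drop it
            trace.append(("dropped", seq, sender))
            continue
        heapq.heappush(queue, (clock + latency, seq, sender, f"msg-{seq}"))
        clock += rng.uniform(0.0, 0.010)       # time until the next send

    # Deliver in simulated-time order, which may differ from send order.
    while queue:
        deliver_at, seq, sender, payload = heapq.heappop(queue)
        trace.append(("delivered", round(deliver_at, 6), sender, payload))
    return trace


# Replaying a seed reproduces the run exactly, reordering and drops included.
first = simulate(42)
replay = simulate(42)
```

Anything that reads the real clock, the real network, or an unseeded generator would break this property, which is why the simulation has to control every item in the list above.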
Searching for the worst case
A faithful simulation of normal operation would not find much. The value comes from injecting faults far more aggressively than reality would:
- Buggify: a mechanism that, at thousands of points in the code, randomly takes a rare-but-legal path: add a delay here, flush early there, return the maximum batch size, reorder a queue. It deliberately makes unlikely interleavings likely.
- Aggressive fault injection: the simulator partitions the network, kills and restarts machines, fills disks, and corrupts files at rates a real cluster would never survive, then checks that the database still upholds its contract once the faults clear.
- Swarm testing: randomizing the cluster configuration and fault parameters from run to run, so the search explores a wide space rather than one fixed scenario, around the clock.
The oracle is the database's own contract: after any sequence of faults, committed transactions must remain durable and the history must remain serializable. A great deal of CPU time spent failing to violate that contract is what stands behind the words "strict serializable."
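A Buggify-style switch can be sketched in a few lines. The class below is a hypothetical Python analogue, not FoundationDB's macro: the name `Buggify`, the site label `"tiny_batch"`, and both probabilities are illustrative assumptions. It shows the two ideas described above: perturbations fire only inside simulation, and every decision comes from the seeded generator so a failing run can be replayed.

```python
import random


class Buggify:
    """Toy Buggify-style switch (hypothetical API, illustrative rates)."""

    def __init__(self, seed, in_simulation=True,
                 site_enabled_p=0.25, fire_p=0.25):
        self.rng = random.Random(seed)
        self.in_simulation = in_simulation
        self.site_enabled_p = site_enabled_p
        self.fire_p = fire_p
        self.sites = {}     # site name -> enabled for this whole run?

    def __call__(self, site: str) -> bool:
        if not self.in_simulation:
            return False    # never perturb a production process
        if site not in self.sites:
            # Decide once per run whether this site participates at all.
            self.sites[site] = self.rng.random() < self.site_enabled_p
        return self.sites[site] and self.rng.random() < self.fire_p


def choose_batch_size(buggify, default=1000):
    # Rare-but-legal path: a batch of one is valid, just unusual.
    if buggify("tiny_batch"):
        return 1
    return default


def run(seed):
    b = Buggify(seed)
    return [choose_batch_size(b) for _ in range(50)]
```

Because the perturbed value is still a legal one, any contract violation it exposes is a real bug in the surrounding code rather than an artifact of the test.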
Why this matters for a layer on top
This is the property that lets fdyno's consistency model make strong claims without re-deriving them. A DynamoDB-compatible layer is only as correct as the engine beneath it; build the same API over a store with weaker or less-tested isolation, and the wire compatibility becomes a façade over a different set of guarantees.
It also sets a clear boundary of trust. fdyno does not claim to have re-verified distributed consistency from first principles; that would be a far larger and more dubious claim. It claims something narrower and more defensible: that it translates DynamoDB semantics onto FoundationDB transactions faithfully, and that FoundationDB upholds those transactions. The first half is what the conformance suites check on every commit; the second half is what deterministic simulation has been checking, on FoundationDB's side, for far longer.
The external reputation matches the method: FoundationDB's simulation testing is often cited as a high-water mark for distributed-systems correctness, to the point that practitioners who specialize in breaking databases have pointed to it as more thorough than anything an outside analysis would add. fdyno's design takes that at face value and is built to remain a thin, faithful translation on top of it.
Further reading
- FoundationDB documentation: Testing and fault injection. apple.github.io/foundationdb
- Will Wilson. Testing Distributed Systems w/ Deterministic Simulation. Strange Loop, 2014.
- Jingyu Zhou et al. FoundationDB: A Distributed Unbundled Transactional Key Value Store. SIGMOD, 2021. See the simulation-testing section.