Model-based Testing
Quint is a specification language that shines in building complex distributed systems. In such projects there are two questions one needs to address:
- Am I building the right system?
- Am I building the system the right way?
When writing a specification in Quint and using the tools `quint run --invariant` and `quint verify --invariant`, we can check that the design is right, that is, that the interactions of the components ensure that system-wide invariants are satisfied. At this point, big progress has been made: we can be sure that, if we implement each of the components in a way that is consistent with the specification, then the system-wide invariants will also hold. We have reduced system-level correctness (potentially involving distributed and concurrent components) to making sure that each component in isolation is implemented according to the specification.
So, naturally, one of the most asked questions when it comes to writing specifications is “what about the code?”. Or, in a more expressive problem statement:
How to bridge the gap between the specification and the implementation?
…which is just another way of asking Question 2 from above. That is, even if we verify that some model is correct, there could still be problems in the code that runs in production, which is what actually matters. This is especially likely to happen if the specification is buried deep in some project sub-sub-sub-folder and no one dares to open it, let alone update it as the code evolves.
The perfect solution would be: verify your code! But that is far from easy, and can most likely only be accomplished with a fat budget (or some really smart volunteers). Other people might say: write the spec in several layers of abstraction until you are really close to the implementation, proving the refinement between each of them - but that is also hard, and doesn’t even address the problem of keeping code and spec in sync. Finally, there’s code generation (or extracting code from the formal spec), which is great for some sequential algorithms, but gets weird in many ways when you start having multiple processes - and that discussion doesn’t fit in this page.
We have found that the following two pragmatic approaches strike a balance between investing work and increasing confidence in the code:
- Model-based Testing (MBT)
- Trace validation
Neither of these will give you a proof that your code is correct, but they will make you more confident that the behavior of the model (specification) and the code (implementation) match, even as the code base changes (or the spec, which tends to change much less).
MBT in one paragraph
Produce a bunch of traces (executions) from your model, which should be valid traces in your system. For example: Bob deposits $100 and then transfers $60 to Alice, and then Alice withdraws $20. A trace is just a JSON file with all the data you need. You then write a test driver to call the respective functions (`deposit("Bob", 100)`, `transfer("Bob", "Alice", 60)`, `withdraw("Alice", 20)`) in your implementation, and optionally check things between each call or at the very end (such as asserting a property or comparing the state). That’s it: you use the model to generate test data (inputs and expectations) for you in a very efficient way.
| Data  | Expectations |
|-------|--------------|
| Model | Model        |
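To make this concrete, here is a minimal sketch of what such a test driver could look like for the scenario above, with the steps hard-coded rather than read from a trace file. The `deposit`, `transfer` and `withdraw` functions here are hypothetical stand-ins for an implementation's API, not code from any real project:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the implementation under test.
fn deposit(balances: &mut HashMap<String, i64>, who: &str, amount: i64) {
    *balances.entry(who.to_string()).or_insert(0) += amount;
}

fn transfer(balances: &mut HashMap<String, i64>, from: &str, to: &str, amount: i64) {
    deposit(balances, from, -amount);
    deposit(balances, to, amount);
}

fn withdraw(balances: &mut HashMap<String, i64>, who: &str, amount: i64) {
    deposit(balances, who, -amount);
}

#[test]
fn replay_one_scenario() {
    // In MBT, both the steps and the expected balances come from a model-generated trace.
    let mut balances = HashMap::new();
    deposit(&mut balances, "Bob", 100);
    transfer(&mut balances, "Bob", "Alice", 60);
    withdraw(&mut balances, "Alice", 20);

    // Expectations taken from the final state of the trace.
    assert_eq!(balances["Bob"], 40);
    assert_eq!(balances["Alice"], 40);
}
```

In a real setup, a loop over trace files and a dispatch on the action name replace the hard-coded calls, as the concrete example below shows.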
Observe the potential here: in many software projects, a test is a function written in your programming language that captures one specific scenario (Bob sends to Alice, who has 20 tokens at the end). In our experience, it is very unlikely that someone will write more than a handful of tests of this type. In contrast, with MBT you write one test driver, and the scenarios are generated by a tool. You can easily test your software against thousands of test scenarios. So instead of saying “my code behaves like the specification in the case where Bob sends 20 tokens to Alice”, you will be able to say “I have a list of X (put a huge number here) scenarios where the code behaves like the specification”.
Trace validation in one paragraph
Trace validation can be seen as MBT the other way around. Instead of using the model to generate test data, you use the data from your production (or staging, etc.) environments, but the expectations of what should happen with that data come from the model. You do this by adding some sort of logging to your code that spits out information about the trace: either the transitions, telling what happened, like `deposit("Bob", 100)`, and/or the states, with sequential snapshots of the relevant balances, like `"Bob" -> 40, "Alice" -> 60`. You then capture the logged data and ask Quint: is this sequence of transitions/states possible in my model? If Quint tells you “no”, it means that your code allows something that the model doesn’t. In other words, there is a bug in your code.
| Data       | Expectations |
|------------|--------------|
| Production | Model        |
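As a rough illustration of the logging part (this is not part of the Quint tooling, and the record layout and field names here are just an assumption), instrumentation could look something like this:

```rust
use std::collections::HashMap;

use serde_json::{json, Value};

// Illustrative instrumentation: after each operation, append one JSON line with
// the transition taken and a snapshot of the relevant state. A separate step can
// later assemble these lines into a trace and ask Quint whether the model allows it.
fn log_transition(action: &str, args: Value, balances: &HashMap<String, i64>) {
    let record = json!({
        "action": action,     // e.g. "deposit"
        "args": args,         // e.g. {"who": "Bob", "amount": 100}
        "balances": balances, // state snapshot after the action
    });
    println!("{record}");
}

// Example call site inside the implementation:
// log_transition("deposit", json!({ "who": "Bob", "amount": 100 }), &balances);
```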
We plan on writing a full page on trace validation in the near future.
MBT in a concrete example
Let’s use a toy bank application written in Rust to go through the process. Everything is available here. Assuming we already have the implementation (under `src`) and the model (`bank.qnt`), this is how we’d set up model-based testing for the code.
Generate traces from the model
We can create a folder called `traces` to hold all of our trace files and then generate as many as we want using `--n-traces`:
```sh
quint run bank.qnt --mbt --n-traces=10000 --out-itf=traces/out.itf.json
```
Explaining the command:
- `quint run` is the subcommand for the Quint simulator
- `bank.qnt` is the file holding the model for which we want to generate traces
- `--mbt` says we want extra info in the traces that is very useful for model-based testing:
  - `mbt::actionTaken`, which will hold the name of the action that was used to reach the given state
  - `mbt::nondetPicks`, which will hold what value was chosen for each relevant non-deterministic choice leading to that state
- `--n-traces` determines how many traces the simulator will hold in memory. It will hold the `N` longest traces in case there is no violation, or the shortest ones if they represent violations of the given properties (not the case in this example).
  - Note: if you want more than 10k traces, you also need to increase `--max-samples`, as 10k is its default value.
  - Troubleshooting: if you run out of Node.js memory while running this, you can increase it with `NODE_OPTIONS="--max-old-space-size=8192"`. This shouldn’t be necessary for 10k traces on modern computers.
- `--out-itf` specifies that we want to dump all those traces to JSON, and where to put them.
This will create files from `traces/out0.itf.json` to `traces/out9999.itf.json` with a bunch of executions that our model allows.
Parse the traces in the test driver
Now we need to get all that data into our implementation language’s structures, so we can use them in tests. In this example, that language is Rust, which makes things super easy, as we can use the `itf-rs` library, which takes care of all the boring parts for us. It should not be hard to set this up in other languages as well - people have done it in Go, C++ and Java, for instance.
In Rust, all we have to do is define the types we are parsing. Some type definitions can be re-used from the implementation (under `src`) as they match the Quint types. Usually this is not the case, but this is a toy example:
```rust
use std::collections::HashMap;

use num_bigint::BigInt; // arbitrary-precision integers, matching Quint's `int`
use serde::Deserialize;

#[derive(Clone, Debug, Deserialize)]
pub struct Investment {
    pub owner: String,
    pub amount: BigInt,
}

#[derive(Clone, Debug, Deserialize)]
pub struct BankState {
    pub balances: HashMap<String, BigInt>,
    pub investments: HashMap<BigInt, Investment>,
    pub next_id: BigInt,
}
```
For the rest, we also only define their types, and add some `serde` annotations on the `Option` types and on the `mbt::*` metadata:
```rust
use itf::de::{self, As};

#[derive(Clone, Debug, Deserialize)]
pub struct NondetPicks {
    #[serde(with = "As::<de::Option::<_>>")]
    pub depositor: Option<String>,
    #[serde(with = "As::<de::Option::<_>>")]
    pub withdrawer: Option<String>,
    #[serde(with = "As::<de::Option::<_>>")]
    pub sender: Option<String>,
    #[serde(with = "As::<de::Option::<_>>")]
    pub receiver: Option<String>,
    #[serde(with = "As::<de::Option::<_>>")]
    pub amount: Option<BigInt>,
    #[serde(with = "As::<de::Option::<_>>")]
    pub buyer: Option<String>,
    #[serde(with = "As::<de::Option::<_>>")]
    pub seller: Option<String>,
    #[serde(with = "As::<de::Option::<_>>")]
    pub id: Option<BigInt>,
}

#[derive(Clone, Debug, Deserialize)]
pub struct State {
    pub bank_state: BankState,
    #[serde(with = "As::<de::Option::<_>>")]
    pub error: Option<String>,
    #[serde(rename = "mbt::actionTaken")]
    pub action_taken: String,
    #[serde(rename = "mbt::nondetPicks")]
    pub nondet_picks: NondetPicks,
}
```
Inspecting one of the JSON files should be helpful in understanding which fields are there. You can use the ITF Trace Viewer VSCode extension to view a formatted version of them in VSCode, or use some JSON tool like `jq`.
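If you prefer to explore a trace programmatically, a small helper along these lines (reusing the `State` struct defined above and the same itf crate) prints each step’s action and resulting bank state; the file name is just an example:

```rust
use std::fs;

// Load one generated trace and print each step's action and resulting bank state.
fn inspect_trace() {
    let data = fs::read_to_string("traces/out0.itf.json").unwrap();
    let trace: itf::Trace<State> = itf::trace_from_str(&data).unwrap();

    for (i, state) in trace.states.iter().enumerate() {
        println!(
            "step {}: action = {}, bank_state = {:?}",
            i, state.value.action_taken, state.value.bank_state
        );
    }
}
```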
Write code to replay the trace
We then have to write the core of our driver. For each trace we generated, we try to replay it in the implementation. This means calling the function or series of functions that correspond to the action we took in the model (`action_taken`) with the proper parameters, which will either be derived from the state or registered in `nondet_picks`.
The `"init"` action is a bit special, as it is always (and, in this case, only) called at the first state. So instead of initializing things inside the match statement, it is easier to do it before that, so we don’t have to handle the possibility of `bank_state` being undefined (`None`). In this example, we don’t need to call anything from the implementation, as both the model and the implementation always start in the same fixed state (where all balances are zero and there are no investments).
It’s also useful to add print statements (or any other logging) here so we can have enough information if something goes wrong.
```rust
#[test]
fn model_test() {
    for i in 0..10000 {
        println!("Trace #{}", i);
        let data = fs::read_to_string(format!("traces/out{}.itf.json", i)).unwrap();
        let trace: itf::Trace<State> = trace_from_str(data.as_str()).unwrap();

        let mut bank_state = trace.states[0].value.bank_state.clone();

        for state in trace.states {
            let action_taken = state.value.action_taken;
            let nondet_picks = state.value.nondet_picks;

            match action_taken.as_str() {
                "init" => {
                    println!("initializing");
                }
                "deposit_action" => {
                    let depositor = nondet_picks.depositor.clone().unwrap();
                    let amount = nondet_picks.amount.clone().unwrap();
                    println!("deposit({}, {})", depositor, amount);

                    let res = deposit(&mut bank_state, depositor, amount);
                    // TODO: check things
                }
                // ... other actions
            }
        }
    }
}
```
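Since the driver logs each action with `println!`, running the tests with `cargo test -- --nocapture` shows that output as the traces are replayed, which makes it easier to see how far a failing trace got.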
Add assertions
Now we have many inputs to our implementation, which accumulate over several steps. This already brings some assurance, as we could be hitting some `panic!` statements or failures in code assertions. But we’ll have much more confidence by checking the outputs.
We can assert different things, for example:
- That the state (current balances and investments) in the implementation matches the one in the trace
- That the implementation function returns an error if and only if the trace also has an error registered for that state (e.g. “Could not withdraw because you don’t have enough balance”)
- That some property holds (e.g. no balances are negative)
For the bank example, we check the second of these by defining this auxiliary function:
```rust
fn compare_error(trace_error: Option<String>, app_error: Option<String>) {
    if trace_error.is_some() {
        assert!(
            app_error.is_some(),
            "Expected action to fail with error: {:?}, but it succeeded",
            trace_error
        );
        println!("Action failed as expected");
    } else {
        assert!(
            app_error.is_none(),
            "Expected action to succeed, but it failed with error: {:?}",
            app_error
        );
        println!("Action successful as expected");
    }
}
```
And then call it after each function invocation (all of them return an `Option<String>`, like `deposit` does):

```rust
"deposit_action" => {
    // ...
    let res = deposit(&mut bank_state, depositor, amount);
    compare_error(state.value.error.clone(), res)
}
```
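The other two kinds of checks can be added in the same place. The sketch below is only an illustration: the state comparison assumes `BankState` also derives `PartialEq` (not included in the definition above), and `check_no_negative_balances` is a hypothetical property check rather than something from the example repository:

```rust
use num_bigint::BigInt;

// First check from the list above: the implementation state must match the state
// recorded in the trace. Requires `BankState` to also derive `PartialEq`.
fn compare_state(trace_state: &BankState, app_state: &BankState) {
    assert_eq!(
        trace_state, app_state,
        "Implementation state diverged from the model state"
    );
}

// Third check from the list above: a property that should hold in every state,
// here that no balance is ever negative.
fn check_no_negative_balances(app_state: &BankState) {
    for (owner, balance) in &app_state.balances {
        assert!(
            *balance >= BigInt::from(0),
            "Negative balance for {}: {}",
            owner, balance
        );
    }
}
```

Both could be called right after `compare_error` in each match arm, e.g. something like `compare_state(&state.value.bank_state, &bank_state)`.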
Running the complete example
The repository includes a shell script that will both generate the traces for you and run `cargo test`. The example includes a mismatch between the model and the implementation: the implementation has a bug that allows a user to sell the same investment more than once, potentially draining funds from the bank. The model-based test we wrote identifies this issue. Assuming that the specification is thoroughly analyzed and/or verified, this constitutes a bug in the implementation and not a protocol bug.
Read also: Informal Systems’ page on Model-Based Techniques: mbt.informal.systems.