Pinboard (jm)
https://pinboard.in/u:jm/public/
recent bookmarks from jmCombining static model checking with dynamic enforcement using the Statecall Policy Language2015-03-24T12:55:54+00:00
http://blog.acolyer.org/2015/03/23/combining-static-model-checking-with-dynamic-enforcement-using-the-statecall-policy-language/
jm01 automaton ping (int max_count, int count, bool can_timeout) {
02 Initialize;
03 during {
04 count = 0;
05 do {
06 Transmit_Ping;
07 either {
08 Receive_Ping;
09 } or (can_timeout) {
10 Timeout_Ping;
11 };
12 count = count + 1;
13 } until (count >= max_count);
14 } handle {
15 SIGINFO;
16 Print_Summary;
17 };
]]>ping model-checking models formal-methods verification static dynamic coding debugging testing distcomp papershttps://pinboard.in/https://pinboard.in/u:jm/b:9eb26fa27811/Use of Formal Methods at Amazon Web Services2014-06-17T13:33:47+00:00
http://research.microsoft.com/en-us/um/people/lamport/tla/formal-methods-amazon.pdf
jm
The success with DynamoDB gave us enough evidence to present TLA+ to the broader engineering community at Amazon. This raised a challenge; how to convey the purpose and benefits of formal methods to an audience of software engineers? Engineers think in terms of debugging rather than ‘verification’, so we called the presentation “Debugging Designs”.
Continuing that metaphor, we have found that software engineers more readily grasp the concept and practical value of TLA+ if we dub it 'Exhaustively-testable pseudo-code'.
We initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the widespread view that formal methods are impractical. We also initially avoid mentioning what the acronym ‘TLA’ stands for, as doing so would give an incorrect impression of complexity.
More slides at http://tla2012.loria.fr/contributed/newcombe-slides.pdf ; proggit discussion at http://www.reddit.com/r/programming/comments/277fbh/use_of_formal_methods_at_amazon_web_services/]]>formal-methods model-checking tla tla+ programming distsys distcomp ebs s3 dynamodb aws ec2 marc-brooker chris-newcombehttps://pinboard.in/https://pinboard.in/u:jm/b:8758e4d5f996/Model checking for highly concurrent code2013-09-30T09:22:04+00:00
http://pl.atyp.us/2013-09-model-checking.html
jm
I'll use an example from my own recent experience. I'm developing a new kind of replication for GlusterFS. To make sure the protocol behaves correctly even across multiple failures, I developed a Murphi model for it. [...]
I added a third failure [to the simulated model]. I didn't expect a three-node system to continue working if more than one of those were concurrent (the model allows the failures to be any mix of sequential and concurrent), but I expected it to fail cleanly without reaching an invalid state. Surprise! It managed to produce a case where a reader can observe values that go back in time. This might not make much sense without knowing the protocol involved, but it might give some idea of the crazy conditions a model checker will find that you couldn't possibly have considered. [...]
So now I have a bug to fix, and that's a good thing. Clearly, it involves a very specific set of ill-timed reads, writes, and failures. Could I have found it by inspection or ad-hoc analysis? Hell, no. Could I have found it by testing on live systems? Maybe, eventually, but it probably would have taken months for this particular combination to occur on its own. Forcing it to occur would require a lot of extra code, plus an exerciser that would amount to a model checker running 100x slower across machines than Murphi does. With enough real deployments over enough time it would have happened, but the only feasible way to prevent that was with model checking. These are exactly the kinds of bugs that are hardest to fix in the field, and that make users distrust distributed systems, so those of us who build such systems should use every tool at our disposal to avoid them.
]]>model-checking formal-methods modelling murphi distcomp distributed-systems glusterfs testing protocolshttps://pinboard.in/https://pinboard.in/u:jm/b:e2d8ef692aa0/