<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>
myOffice Email Message
</title>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="date" content="2002-11-01">
<style type="text/css">
</style>
</head>
<body>
<span style=
"color:#FF0000 "><b><span style=
"font-family:MS Sans Serif ">[Reply]</span></b></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">Hi all,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">This reminds me of the industrial microcontrollers we designed for highly unstable power environments: an external watchdog timer reset them continually, every 500 ms or so. A slightly different approach, but very reliable.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">cheers</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">Gary</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#FF0000 "><b>At 15:25 on 24/07/2006 you wrote </b></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>To : delphi@ns3.123.co.nz</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>CC : </span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>From: Neven MacEwan, neven@mwk.co.nz</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Content Type: text/plain</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Attached: neven.vcf</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>With all this outpouring of Try Finally/Except angst I thought this </span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>pearl might be apropos</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Crash-only software: More than meets the eye</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>July 12, 2006</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>This article was contributed by Valerie Henson</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Next time your Linux laptop crashes, pull out your watch (or your cell</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>phone) and time how long it takes to boot up. More than likely, you're</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>running a journaling file system, and not only did your system boot up</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>quickly, but it didn't lose any data that you cared about. (Maybe you</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>lost the last few bytes of your DHCP client's log file, darn.) Now, keep</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>your timekeeping device of choice handy and execute a normal shutdown</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>and reboot. More than likely, you will find that it took longer to</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>reboot "normally" than it did to crash your system and recover it - and</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>for no perceivable benefit.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>George Candea and Armando Fox noticed that, counter-intuitively, many</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software systems can crash and recover more quickly than they can be</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>shut down and restarted. They reported the following measurements in</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>their paper, Crash-only Software</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">><http://www.usenix.org/events/hotos03/tech/candea.html> (published in</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Hot Topics in Operating Systems IX</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">><http://www.usenix.org/events/hotos03/> in 2003):</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<pre style="color:#008000 ">
>System                 Clean reboot   Crash reboot   Speedup
>RedHat 8 (ext3)        104 sec        75 sec         1.4x
>JBoss 3.0 app server   47 sec         39 sec         1.2x
>Windows XP             61 sec         48 sec         1.3x
</pre>
<p>
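<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">The speedup column is simply the clean reboot time divided by the crash reboot time. A quick Python check of that arithmetic, using only the figures quoted above (the variable and function names are mine, not the paper's):</span></span>
<pre>
```python
# Timings in seconds, copied from the table quoted above.
timings = {
    "RedHat 8 (ext3)": (104, 75),
    "JBoss 3.0 app server": (47, 39),
    "Windows XP": (61, 48),
}


def speedup(clean_seconds, crash_seconds):
    """Ratio of clean-reboot time to crash-reboot time, rounded to one decimal."""
    return round(clean_seconds / crash_seconds, 1)


speedups = {name: speedup(clean, crash) for name, (clean, crash) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio}x")
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">The rounded ratios come out as 1.4, 1.2 and 1.3, matching the quoted column.</span></span>
<p>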
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>In their experiments, no important data was lost. This is not surprising</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>as, after all, good software is designed to safely handle crashes.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Software that loses or ruins your data when it crashes isn't very</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>popular in today's computing environment - remember how frustrating it</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>was to use word processors without an auto-save feature? What is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>surprising is that most systems have two methods of shutting down -</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>cleanly or by crashing - and two methods of starting up - normal start</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>up or recovery - and that frequently the crash/recover method is, by all</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>objective measures, a better choice. Given this, why support the extra</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>code (and associated bugs) to do a clean start up and shutdown? In other</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>words, why should I ever type "halt" instead of hitting the power button?</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>The main reason to support explicit shutdown and start-up is simple:</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>performance. Often, designers must trade off higher steady state</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>performance (when the application is running normally) against performance</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>during a restart - and with acceptable data loss. File systems are a</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>good example of this trade-off: ext2 runs very quickly while in use but</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>takes a long time to recover and makes no guarantees about when data</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>hits disk, while ext3 has somewhat lower performance while in use but is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>very quick to recover and makes explicit guarantees about when data hits</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>disk. When overall system availability and acceptable data loss in the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>event of a crash are factored into the performance equation, ext3 or any</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>other journaling file system is the winner for many systems, including,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>more than likely, the laptop you are using to read this article.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Crash-only software is software that crashes safely and recovers</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>quickly. The only way to stop it is to crash it, and the only way to</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>start it is to recover. A crash-only system is composed of crash-only</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>components which communicate with retryable requests; faults are handled</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>by crashing and restarting the faulty component and retrying any</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>requests which have timed out. The resulting system is often more robust</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>and reliable because crash recovery is a first-class citizen in the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>development process, rather than an afterthought, and you no longer need</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>the extra code (and associated interfaces and bugs) for explicit</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>shutdown. All software ought to be able to crash safely and recover</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>quickly, but crash-only software must have these qualities, or their</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>lack becomes quickly evident.</span></span>
<p>
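<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">A minimal Python sketch of the retry-and-restart loop described above, assuming requests are safe to repeat. FlakyComponent and call_with_retry are made-up names for illustration, and restart() merely stands in for killing and recovering a real component:</span></span>
<pre>
```python
class FlakyComponent:
    """Hypothetical component whose first incarnation is faulty."""

    def __init__(self):
        self.starts = 0
        self.healthy = False
        self.restart()

    def restart(self):
        # Stand-in for "crash it and run recovery": come back in a known state.
        self.starts += 1
        self.healthy = self.starts != 1  # only the first incarnation misbehaves

    def handle(self, request):
        if not self.healthy:
            raise TimeoutError("request timed out")
        return f"done: {request}"


def call_with_retry(component, request, max_attempts=3):
    """Retry a repeatable request, crash/restarting the component on timeout."""
    for _ in range(max_attempts):
        try:
            return component.handle(request)
        except TimeoutError:
            component.restart()
    raise RuntimeError("component did not recover")


component = FlakyComponent()
result = call_with_retry(component, "save order 17")
print(result, "after", component.starts, "starts")
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">The caller never needs to know why the first attempt failed; it only needs the request to be retryable.</span></span>
<p>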
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>The concept of crash-only software has received quite a lot of attention</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>since its publication. Besides several well-received research papers</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>demonstrating useful implementations of crash-only software, crash-only</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software has been covered in several popular articles in publications as</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>diverse as Scientific American, Salon.com, and CIO Today. It was cited</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>as one of the reasons Armando Fox was named to Scientific American's</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>list of top 50 scientists for 2003 and George Candea as one of MIT</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Technology Review's Top 35 Young Innovators for 2005. Crash-only</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software has made its mark outside the press room as well; for example,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Google's distributed file system, GoogleFS, is implemented as crash-only</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software, all the way through to the metadata server. The term</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>"crash-only" is now regularly bandied about in design discussions for</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>production software. I myself wrote a blog entry on crash-only software</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">><http://blogs.sun.com/roller/page/val?entry=is_b_your_b_software> back</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>in 2004. Why bother writing about it again? Quite simply, the crash-only</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software meme became so popular that, inevitably, mutations arose and</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>flourished, sometimes to the detriment of allegedly crash-only software</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>systems. In this article, we will review some of the more common</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>misunderstandings about designing and implementing crash-only software.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">> Misconceptions about crash-only software</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>The first major misunderstanding is that crash-only software is a form</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>of free lunch: you can be lazy and not write shutdown code, not handle</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>errors (just crash it! whee!), or not save state. Just pull up your</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>favorite application in an editor, delete the code for normal start up</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>and shutdown, and voila! instant crash-only software. In fact,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>crash-only software involves greater discipline and more careful design,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>because if your checkpointing and recovery code doesn't work, you will</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>find out right away. Crash-only design helps you produce more robust,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>reliable software; it doesn't exempt you from writing robust, reliable</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software in the first place.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Another mistake is overuse of the crash/restart "hammer." One of the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>ideas in crash-only software is that if a component is behaving</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>strangely or suffering some bug, you can just crash it and restart it,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>and more than likely it will start functioning again. This will often be</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>faster than diagnosing and fixing the problem by hand, and so a good</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>technique for high-availability services. Some programmers overuse the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>technique by deliberately writing code to crash the program whenever</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>something goes wrong, when the correct solution is to handle all the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>errors you can think of correctly, and then rely on crash/restart for</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>unforeseen error conditions. Another overuse of crash/restart is the idea that</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>when things go wrong, you should crash and restart the whole system. One</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>tenet of crash-only /system/ design is the idea that crash/restart is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>cheap - because you are only crashing and recovering small,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>self-contained parts of the system (see the paper on microreboots)</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">><http://www.usenix.org/events/osdi04/tech/candea.html>. Try telling your</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>users that your whole web browser crashes and restarts every 2 minutes</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>because it is crash-only software and see how well that goes over. If</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>instead the browser quietly crashes and recovers only the thread that is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>misbehaving you will have much happier users.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>On the face of it, the simplest part of crash-only software would be</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>implementing the "crash" part. How hard is it to hit the power button?</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>There is a subtle implementation point that is easy to miss, though: the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>crash mechanism has to be entirely outside and independent of the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>crash-only system - hardware power switch, kill -9, shutting down the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>virtual machine. If it is implemented through internal code, it takes</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>away a valuable part of crash-only software: that you have an</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>all-powerful, reliable method to take any misbehaving component of the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>system and crash/restart it into a known state.</span></span>
<p>
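<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">To illustrate why the crash mechanism has to live outside the component, here is a rough Python sketch, assuming a POSIX system. The child process is a hypothetical misbehaving component that ignores SIGTERM; the supervisor still stops it with SIGKILL (the signal kill -9 sends), which a process cannot catch or ignore:</span></span>
<pre>
```python
import signal
import subprocess
import sys

# Hypothetical misbehaving component: it ignores polite shutdown requests.
CHILD_SOURCE = """
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)
print("ready", flush=True)
time.sleep(60)
"""


def crash_externally():
    """Ask the child to stop, then crash it from outside when it refuses."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child has installed its handler
    proc.terminate()        # SIGTERM: only works if the child cooperates
    try:
        proc.wait(timeout=0.5)
        ignored_polite_request = False
    except subprocess.TimeoutExpired:
        ignored_polite_request = True
    proc.kill()             # SIGKILL: needs no cooperation from the child
    proc.wait()
    proc.stdout.close()
    return ignored_polite_request, proc.returncode


ignored, returncode = crash_externally()
killed_by_sigkill = returncode == -signal.SIGKILL
print("ignored SIGTERM:", ignored, "killed by SIGKILL:", killed_by_sigkill)
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">Much the same idea as the external watchdog on those microcontrollers: the thing doing the resetting does not share any code path with the thing being reset.</span></span>
<p>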
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>I heard of one "crash-only" system in which the shutdown code was</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>replaced with an abort() call as part of a "crash-only" design.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>There were two problems with this approach. One, it relied on the system</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>to not have any bugs in the code path leading to the abort() call</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>or any deadlocks which would prevent it being executed. Two, shutting</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>down the system in this manner only exercised a subset of the total</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>possible crash space, since it was only testing what happened when the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>system successfully received and handled a request to shut down. For</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>example, a single-threaded program that handled requests in an event</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>loop would never be crashed in the middle of handling another request,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>and so the recovery code would not be tested for this case. One more</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>example of a badly implemented "crash" is a database that, when it ran</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>out of disk space for its event logging, could not be safely shut down</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>because it wanted to write a log entry before shutting down, but it was</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>out of disk space, so...</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Another common pattern is to ignore the trade-offs of performance vs.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>recovery time vs. reliability and take an absolutist approach to</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>optimizing for one quality while maintaining superficial allegiance to</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>crash-only design. The major trade-off is that checkpointing your</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>application's state improves recovery time and reliability but reduces</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>steady state performance. The two extremes are checkpointing or saving</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>state far too often and checkpointing not at all; like Goldilocks, you</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>need to find the checkpoint frequency that is Just Right for your</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>application.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>What frequency of checkpointing will give you acceptable recovery time,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>acceptable performance, and acceptable data loss? I once used a web</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>browser which only saved preferences and browsing history on a clean</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>shutdown of the browser. Saving the history every millisecond is clearly</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>overkill, but saving changed items every minute would be quite</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>reasonable. The chosen strategy, "save only on shutdown," turned out to</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>be equivalent to "save never" - how often do people close their</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>browsers, compared to how often they crash? I ended up solving this</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>problem by explicitly starting up the browser for the sole purpose of</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>changing the settings and immediately closing it again after the third</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>or fourth time I lost my settings. (This is a good example of how all</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software should be written to crash safely but does not.) Most</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>implementations of bash I have used take the same approach to saving the</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>command history; as a result I now explicitly "exit" out of running</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>shells (all 13 or so of them) whenever I shut down my computer so I</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>don't lose my command history.</span></span>
<p>
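<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">A minimal Python sketch of the "save changed items every minute" idea, under my own assumptions: the History class, the JSON file format and the example.org URLs are all invented for illustration. Each checkpoint goes to a temporary file that is then renamed over the old one, so a crash mid-write leaves the previous checkpoint intact:</span></span>
<pre>
```python
import json
import os
import tempfile


def checkpoint(path, state):
    """Write state to a temp file, flush it to disk, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_path, path)  # readers see the old file or the new one


def recover(path):
    """Load the last checkpoint, or start empty if none was ever written."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"history": []}


class History:
    def __init__(self, path, interval_seconds=60.0):
        self.path = path
        self.interval = interval_seconds
        self.state = recover(path)  # starting up is always a recovery
        self.last_saved = 0.0

    def visit(self, url, now):
        self.state["history"].append(url)
        if now - self.last_saved >= self.interval:
            checkpoint(self.path, self.state)
            self.last_saved = now


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "history.json")
    browser = History(path)
    browser.visit("http://example.org/a", now=100.0)  # checkpointed
    browser.visit("http://example.org/b", now=110.0)  # still only in memory
    del browser                                       # simulated crash
    recovered = recover(path)

print(recovered)
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">The second visit is lost in the simulated crash, so the interval is exactly the knob the article describes: it bounds how much you can lose against how often you pay for a write.</span></span>
<p>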
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Shutdown code should be viewed as, fundamentally, only of use to</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>optimize the next start up sequence and should not be used to do</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>anything required for correctness. One way to approach shutdown code is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>to add a big comment at the top of the code saying "WISHFUL THINKING:</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>This code may never be executed. But it sure would be nice."</span></span>
<p>
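<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">One way to read that in code, as a hedged Python sketch: start-up always knows how to rebuild from the authoritative data, and the shutdown code only leaves behind a cache that start-up may use if it still looks valid. The file names, the word-count "index" and the size comparison used as a validity check are all assumptions of mine, and a real system would want a stronger check:</span></span>
<pre>
```python
import json
import os
import tempfile


def rebuild_index(log_path):
    """Slow path, always correct: recount every entry in the authoritative log."""
    index = {}
    with open(log_path) as log:
        for line in log:
            word = line.strip()
            index[word] = index.get(word, 0) + 1
    return index


def start(log_path, cache_path):
    """Use the shutdown cache only if it matches the log size; else recover."""
    try:
        with open(cache_path) as f:
            cache = json.load(f)
        if cache["log_size"] == os.path.getsize(log_path):
            return cache["index"], "fast"
    except (FileNotFoundError, ValueError, KeyError):
        pass  # missing or damaged cache: fall through to recovery
    return rebuild_index(log_path), "recovered"


def shutdown(log_path, cache_path, index):
    # WISHFUL THINKING: This code may never be executed. But it sure would be nice.
    with open(cache_path, "w") as f:
        json.dump({"log_size": os.path.getsize(log_path), "index": index}, f)


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "events.log")
    cache_path = os.path.join(workdir, "index.cache")
    with open(log_path, "w") as log:
        log.write("a\nb\na\n")

    index, first_mode = start(log_path, cache_path)       # no cache yet
    shutdown(log_path, cache_path, index)                 # clean shutdown
    _, second_mode = start(log_path, cache_path)          # cache is usable

    with open(log_path, "a") as log:
        log.write("c\n")                                  # then a "crash"
    final_index, third_mode = start(log_path, cache_path)

print(first_mode, second_mode, third_mode, final_index)
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">Delete the shutdown function entirely and the program is still correct, just slower to start.</span></span>
<p>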
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Another class of misunderstanding is about what kind of systems are</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>suitable for crash-only design. Some people think crash-only software</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>must be stateless, since any part of the system might crash and restart,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>and lose any uncommitted state in the process. While this means you must</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>carefully distinguish between volatile and non-volatile state, it</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>certainly doesn't mean your system must be stateless! Crash-only</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>software only says that any non-volatile state your system needs must</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>itself be stored in a crash-only system, such as a database or session</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>state store. Usually, it is far easier to use a special purpose system</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>to store state, rather than rolling your own. Writing a crash-safe,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>quick-recovery state store is an extremely difficult task and should be</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>left to the experts (and will make your system easier to implement).</span></span>
<p>
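<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">As a small illustration of leaving non-volatile state to a purpose-built store, here is a Python sketch using the standard sqlite3 module. The settings table and its contents are invented for the example, and closing the connection without a commit is only a stand-in for a crash; the real protection against power loss comes from SQLite's own journaling, not from anything in this snippet:</span></span>
<pre>
```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open the state store, creating the (hypothetical) settings table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.commit()
    return conn


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "state.db")

    conn = open_store(path)
    conn.execute("INSERT INTO settings VALUES ('homepage', 'about:blank')")
    conn.commit()   # committed: this is non-volatile state from here on
    conn.execute("INSERT INTO settings VALUES ('theme', 'dark')")
    conn.close()    # simulated crash: 'theme' was never committed

    conn = open_store(path)  # starting up again is just recovery
    surviving = dict(conn.execute("SELECT key, value FROM settings"))
    conn.close()

print(surviving)
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">Only the committed row comes back, which is the volatile versus non-volatile distinction in miniature.</span></span>
<p>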
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Crash-only software makes explicit the trade-off between optimizing for</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>steady-state performance and optimizing for recovery. Sometimes this is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>taken to mean that you can't use crash-only design for high performance</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>systems. As usual, it depends on your system, but many systems suffer</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>bugs and crashes often enough that crash-only design is a win when you</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>consider overall uptime and performance, rather than performance only</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>when the system is up and running. Perhaps your system is robust enough</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>that you can optimize for steady state performance and disregard</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>recovery time... but it's unlikely.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
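<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">To put rough numbers on that trade-off, here is a small sketch using the standard availability formula MTTF / (MTTF + MTTR). All of the figures are hypothetical, picked only to show the shape of the argument.</span></span>
<pre>
```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def effective_throughput(peak_rate, mttf_hours, mttr_hours):
    """Long-run average rate once downtime is counted."""
    return peak_rate * availability(mttf_hours, mttr_hours)


# Hypothetical figures: both designs fail every 10 hours on average.
# The tuned design is 10% faster while up but needs 2 hours to recover;
# the crash-only design is slower while up but recovers in 36 seconds.
tuned = effective_throughput(peak_rate=1100.0, mttf_hours=10.0, mttr_hours=2.0)
crash_only = effective_throughput(peak_rate=1000.0, mttf_hours=10.0, mttr_hours=0.01)
print(round(tuned), round(crash_only))
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">With these made-up numbers the tuned design averages about 917 requests per second against about 999 for the crash-only one. Make the failures rare enough and the result flips, which is the "it depends on your system" part.</span></span>
<p>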
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Because it must be possible to crash and restart components, some people</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>think that a multi-threaded system using locks can't be crash-only -</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>after all, what happens if you crash while holding a lock? The answer is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>that locks can be used inside a crash-only component, but all interfaces</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>between components need to allow for the unexpected crash of components.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Interfaces between components need to strongly enforce fault boundaries,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>put timeouts on all requests, and carefully formulate requests so that</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>they don't rely on uncommitted state that could be lost. As an example,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>consider how the recently-merged robust futex facility</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">><http://lwn.net/Articles/172149/> makes crash recovery explicit.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
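<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">A rough sketch of a component interface with a fault boundary and a timeout on every request, assuming each component runs as its own operating system process. The helper names call_component and call_with_restart are hypothetical. Any locks the component holds internally die with the process, so the caller never waits on them.</span></span>
<pre>
```python
import subprocess
import sys


def call_component(code, timeout_s):
    """Run a component in its own process with a hard deadline.

    Returns its output on success, or None if it crashed or hung.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed the child
    if done.returncode != 0:
        return None  # the component crashed
    return done.stdout.strip()


def call_with_restart(code, timeout_s, attempts):
    """Restart the component from scratch until one attempt succeeds."""
    for _ in range(attempts):
        result = call_component(code, timeout_s)
        if result is not None:
            return result
    raise RuntimeError("component failed after %d attempts" % attempts)
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">Retrying like this is only safe when the request is idempotent, that is, when it does not depend on uncommitted state left behind by the attempt that died.</span></span>
<p>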
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Some people end up with the impression that crash-only software is less</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>reliable and unsuitable for important "mission-critical" applications</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>because the design explicitly admits that crashes are inevitable.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Crash-only software is actually more reliable because it takes into</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>account from the beginning an unavoidable fact of computing - unexpected</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>crashes.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>A criticism often leveled at systems designed to improve reliability by</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>handling errors in some way other than complete system crash is that</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>they will hide or encourage software bugs by masking their effects.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>First, crash-only software in many ways exposes previously hidden bugs,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>by explicitly testing recovery code in normal use. Second, explicitly</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>crashing and restarting components as a workaround for bugs does not</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>preclude taking a crash dump or otherwise recording data that can be</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>used to solve the bug.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
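<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">On the second point, a minimal sketch of recording the evidence before the restart happens. The function name and the plain-text dump file are my own assumptions; a real system might take a full core dump instead.</span></span>
<pre>
```python
import traceback


def record_and_reraise(component, dump_path):
    """Run component(); on failure, save the traceback, then crash anyway.

    The exception is re-raised so that whatever supervises this code
    can still restart the component.
    """
    try:
        return component()
    except Exception:
        with open(dump_path, "a", encoding="utf-8") as dump:
            dump.write(traceback.format_exc())
            dump.write("\n")
        raise
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">The restart hides the failure from users, but the dump file still holds what a developer needs to track the bug down later.</span></span>
<p>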
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>How can we apply crash-only design to operating systems? One example is</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>file systems, and the design of chunkfs (discussed in last week's LWN</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>article on the 2006 Linux file systems workshop</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">><http://lwn.net/Articles/190222/> and in more detail here</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">><http://www.fenrus.org/chunkfs.txt>). We are trying to improve</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>reliability and data availability by separating the on-disk data into</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>individually checkable components with strong fault isolation. Each</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>chunk must be able to be individually "crashed" - unmounted - and</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>recovered - fsck'd - without bringing down the other chunks. The code</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>itself must be designed to allow the failure of individual chunks</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>without holding locks or other resources indefinitely, which could cause</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>system-wide deadlocks and unavailability. Updates within each chunk must</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>be crash-safe and quickly recoverable. Splitting the file system up into</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>smaller, restartable, crash-only components creates a more reliable,</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>easier to repair crash-only system.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
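<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">A toy, in-memory illustration of the fault isolation idea: each chunk carries its own checksum, so a damaged chunk can be taken offline and repaired while the rest stay readable. This is not how chunkfs itself is implemented; the classes and the CRC32 check are assumptions made purely for the example.</span></span>
<pre>
```python
import zlib


class Chunk:
    """One independently checkable unit: a payload plus its CRC32."""

    def __init__(self, data):
        self.data = data
        self.crc = zlib.crc32(data)
        self.online = True

    def check(self):
        return zlib.crc32(self.data) == self.crc


class ChunkedStore:
    def __init__(self, payloads):
        self.chunks = [Chunk(p) for p in payloads]

    def read(self, index):
        chunk = self.chunks[index]
        if not chunk.online:
            raise IOError("chunk %d is offline for repair" % index)
        return chunk.data

    def check_all(self):
        """Take only the damaged chunks offline; return their indexes."""
        bad = []
        for i, chunk in enumerate(self.chunks):
            if not chunk.check():
                chunk.online = False
                bad.append(i)
        return bad

    def repair(self, index, good_copy):
        """Stand-in for fsck: replace one chunk and bring it back online."""
        self.chunks[index] = Chunk(good_copy)
```
</pre>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#000000 ">Corrupting one chunk and calling check_all takes just that chunk offline; reads of every other chunk keep working until repair brings it back.</span></span>
<p>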
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">> The conclusion</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Properly implemented, crash-only software produces higher quality, more</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>reliable code; poorly understood, it results in lazy programming.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Probably the most common misconception is the idea that writing</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>crash-only software allows you to take shortcuts when writing</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>and designing your code. Wake up, Sleeping Beauty, there ain't no such</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>thing as a free lunch. But you can get a more reliable, easier to debug</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>system if you rigorously apply the principles of crash-only design.</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>_______________________________________________</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Delphi mailing list</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>Delphi@ns3.123.co.nz</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">>http://ns3.123.co.nz/mailman/listinfo/delphi</span></span>
<p>
<span style=
"font-family:MS Sans Serif "><span style=
"color:#008000 ">></span></span><p>
<font face=arial size = 1 color = Navy><DIV style="WIDTH: 260px; HEIGHT: 50px"><MARQUEE id=marquee1 style="WIDTH: 260px; HEIGHT: 200px" trueSpeed scrollAmount=5 scrollDelay=20 direction=right behavior=slide loop=1 border="2"><hr><table><tr><td><FONT color=black size=4 face = "helvetica,verdana,arial">Gary Benner </FONT></td></tr><tr><td><FONT face="arial, arial, helvetica, sans-serif" color=black size=2>e-Engineer, Lecturer, and Software Developer</FONT></td></tr><br>
<tr><td bgcolor=><FONT face="arial, arial, helvetica, sans-serif" color=#000099 size=2><B><A HREF="http://www.123.co.nz" style="text-decoration:none; color:blue">123 Internet Limited</A></B></FONT></td></tr><tr><td bgcolor=><FONT face="arial, arial, helvetica, sans-serif" color=#000099 size=2><B><A HREF="http://www.waiariki.ac.nz" style="text-decoration:none; color:#993333">Waiariki Institute of Technology</A></B></FONT></td></tr><tr><td bgcolor=><FONT face="arial, arial, helvetica, sans-serif" color=#CECE00 size=2><B><A HREF="http://www.sunshinebags.co.nz" style="text-decoration:none; color:#CECE00">Sunshine Garden Bag Co.</A></B></FONT></td></tr><tr><td bgcolor=><FONT face="arial, arial, helvetica, sans-serif" color=red size=2><B><A HREF="http://www.sommnet.com" style="text-decoration:none; color:red" >Sommnet.com Limited</A></B></FONT></td></tr><tr><td><font face = "helvetica,verdana,arial" size = 1>Mob: 021 966 992</font></td></tr><tr><td><font face = "helvetica,verdana,arial" size = 1>Email: <A href="mailto:gary@123.co.nz" >gary@123.co.nz</A> </font></td></tr></table></MARQUEE></DIV><br>
<br>
Ref#: 41006<br>
<br>
</body>
</html>