From d7db2e3f8a90b1e6f3410037c778b50796c1395c Mon Sep 17 00:00:00 2001 From: Tim Peters <tim.one@comcast.net> Date: Sat, 1 Jun 2002 19:02:53 +0000 Subject: [PATCH] Clarified (I hope <wink>) BTREE_SEARCH's correctness proof. --- trunk/src/BTrees/Maintainer.txt | 56 ++++++++++++++++++--------------- 1 file changed, 31 insertions(+), 25 deletions(-) diff --git a/trunk/src/BTrees/Maintainer.txt b/trunk/src/BTrees/Maintainer.txt index 34bfc612..db81c855 100644 --- a/trunk/src/BTrees/Maintainer.txt +++ b/trunk/src/BTrees/Maintainer.txt @@ -192,48 +192,54 @@ BTree nodes.) When searching for a key k, then, the child pointer we want to follow is the one at index i such that K(i) <= k < K(i+1). There can be -only one such i, since the keys are strictly increasing. And there is -at *least* one such i provided the tree isn't empty. For the moment, -assume the tree isn't empty (we'll get back to that later). +at most one such i, since the K(i) are strictly increasing. And there +is at least one such i provided the tree isn't empty (so that 0 < len). +For the moment, assume the tree isn't empty (we'll get back to that +later). The macro's chief loop invariant is K(lo) < k < K(hi) -This holds trivially at the start, since lo is set to 0 ahd hi to +This holds trivially at the start, since lo is set to 0, and hi to x->len, and we pretend K(0) is minus infinity and K(len) is plus infinity. Inside the loop, if K(i) < k we set lo to i, and if K(i) > k we set hi to i. These obviously preserve the invariant. If K(i) == k, the loop breaks and sets the result to i, and since K(i) == k in that case i is obviously the correct result. -What if the key isn't present? lo and hi move toward each other, -narrowing the range, until eventually lo+1 == hi. At that point, -i = (lo+hi)/2 = (lo+lo+1)/2 = lo + 1/2 = lo, so that: +Other cases depend on how i = floor((lo + hi)/2) works, exactly. +Suppose lo + d = hi for some d >= 0. Then i = floor((lo + lo + d)/2) = +floor(lo + d/2) = lo + floor(d/2). So: -1. The loop's "i > lo" test is false, so the loop ends then. +a. [d == 0] (lo == i == hi) if and only if (lo == hi). +b. [d == 1] (lo == i < hi) if and only if (lo+1 == hi). +c. [d > 1] (lo < i < hi) if and only if (lo+1 < hi). -and +If the node is empty (x->len == 0), then lo==i==hi==0 at the start, +and the loop exits immediately (the first "i > lo" test fails), +without entering the body. -2. The invariant still holds, so K(i) < k < K(i+1), and i is again - the correct answer. +Else lo < hi at the start, and the invariant K(lo) < k < K(hi) holds. -Can we get out of the loop too early? No: if hi = lo + d for some d -greater than 1, then i = (lo+lo+d)/2 = lo + d/2, and d/2 is at least 1 -since d is at least 2: i is strictly greater than lo then, and the -loop continues. +If lo+1 < hi, we're in case #c: i is strictly between lo and hi, +so the loop body is entered, and regardless of whether the body sets +the new lo or the new hi to i, the new lo is strictly less than the +new hi, and the difference between the new lo and new hi is strictly +less than the difference between the old lo and old hi. So long as +the new lo + 1 remains < the new hi, we stay in this case. We can't +stay in this case forever, though: because hi-lo decreases on each +trip but remains > 0, lo+1 == hi must eventually become true. (In +fact, it becomes true quickly, in about log2(x->len) trips; the +point is more that lo doesn't equal hi when the loop ends, it has to +end with lo+1==hi and i==lo). -Can lo==hi? Yes, but only if the node is empty. Then i, lo and hi -all start out as 0, and the loop exits immediately. If the loop -isn't empty, then lo and hi start out with different values. Whenever -lo and hi have different values, lo <= (lo + hi)/2 < hi, so i and lo -are strictly smaller than hi, so setting either lo or hi to i leaves -the new lo strictly smaller than the new hi. +Then we're in case #b: i==lo==hi-1 then, and the loop exits. The +invariant still holds, with lo==i and hi==lo+1==i+1: -Can the loop fail to terminate? No: by the above, when lo < hi-1, -lo < i=(lo+hi)/2 < hi, so setting either lo or hi to i leaves the -new lo and hi strictly closer to each other than were the old lo and -hi. + K(i) < k < K(i+1) + +so i is again the correct answer. Optimization points: -- 2.30.9