Commit 80153aa5 authored by Kirill Smelkov

wcfs: xbtree: BTree-diff algorithm

This algorithm will be internally used by ΔBtail in the next patch.

The algorithm would be simple if we needed to diff two trees completely.
However, in ΔBtail only a subset of BTree nodes is tracked(*), and the
diff has to work modulo that tracking set.

No tests now because ΔBtail tests will cover treediff functionality as well.

Some preliminary history:

78f2f88b    X wcfs/xbtree: Fix treediff(a, ø)
5324547c    X wcfs/xbtree: root(a) must stay in trackSet even after treediff(a,ø)
f65f775b    X wcfs/xbtree: treediff(ø, b)
c75b1c6f    X wcfs/xbtree: Start killing holeIdx
ef5e5183    X treediff ret += δtkeycov
9d20f8e8    X treediff: Fix BUG while computing AB coverage
ddb28043    X rebuild: Don't return nil for empty ΔPPTreeSubSet - that leads to SIGSEGV
f68398c9    X wcfs: Move treediff into its own file

(*) because full BTree scan is needed to discover all of its nodes.

Quoting treediff documentation:

---- 8< ----

treediff provides diff for BTrees

Use δZConnectTracked + treediff to compute BTree-diff caused by δZ:

    δZConnectTracked(δZ, trackSet)                         -> δZTC, δtopsByRoot
    treediff(root, δtops, δZTC, trackSet, zconn{Old,New})  -> δT, δtrack, δtkeycov

δZConnectTracked computes BTree-connected closure of δZ modulo tracked set
and also returns δtopsByRoot to indicate which tree objects were changed and
in which subtree parts. With that information one can call treediff for each
changed root to compute BTree-diff and δ for trackSet itself.
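
The connected closure described above can be illustrated with a minimal sketch. It is not the code from this patch: `Oid`, `NoParent`, `connectTracked` and the flat `parentOf` map are hypothetical stand-ins for zodb.Oid, zodb.InvalidOid, δZConnectTracked and the tracking set, chosen so that the example runs on its own.

```go
package main

import (
	"fmt"
	"sort"
)

// Oid is a hypothetical stand-in for zodb.Oid.
type Oid int

// NoParent plays the role of zodb.InvalidOid: it is the parent of a root node.
const NoParent Oid = -1

// connectTracked returns the connected closure of the changed nodes that are
// tracked and, for every affected root, the changed nodes reached from it
// without passing through another changed node.
// parentOf maps every tracked node to its parent (assumed flat model of trackSet).
func connectTracked(changed []Oid, parentOf map[Oid]Oid) (closure map[Oid]bool, topsByRoot map[Oid]map[Oid]bool) {
	isChanged := map[Oid]bool{}
	for _, oid := range changed {
		isChanged[oid] = true
	}
	closure = map[Oid]bool{}
	topsByRoot = map[Oid]map[Oid]bool{}

	for _, oid := range changed {
		parent, tracked := parentOf[oid]
		if !tracked {
			continue // not tracked at all
		}
		closure[oid] = true

		var path []Oid // unchanged nodes between oid and the next changed ancestor
		node := oid
		for {
			if parent == NoParent { // reached the root
				if topsByRoot[node] == nil {
					topsByRoot[node] = map[Oid]bool{}
				}
				topsByRoot[node][oid] = true
				break
			}
			if isChanged[parent] { // reached another changed node: connect via path
				for _, p := range path {
					closure[p] = true
				}
				break
			}
			path = append(path, parent)
			up, ok := parentOf[parent]
			if !ok {
				panic(fmt.Sprintf("parent %d of %d is not tracked", parent, node))
			}
			node, parent = parent, up
		}
	}
	return closure, topsByRoot
}

func main() {
	// t0 -> t1 -> b2, all tracked; t0 and b2 changed.
	parentOf := map[Oid]Oid{0: NoParent, 1: 0, 2: 1}
	closure, tops := connectTracked([]Oid{0, 2}, parentOf)

	var oids []int
	for oid := range closure {
		oids = append(oids, int(oid))
	}
	sort.Ints(oids)
	fmt.Println("closure:", oids)
	fmt.Println("tops by root:", tops)
}
```

In this toy tree the unchanged node t1 is pulled into the closure because it lies on the path between the two changed nodes, and t0 is the only top node reported for root t0.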

BTree diff algorithm

diffT, diffB and δMerge constitute the diff algorithm implementation.
diff(A,B) works on pair of A and B whole key ranges split into regions
covered by tree nodes. The splitting represents current state of recursion
into corresponding tree. If a node in particular key range is Bucket, that
bucket contributes to δ- in case of A, and to δ+ in case of B. If a node in
particular key range is Tree, the algorithm may want to expand that tree
node into its children and to recurse into some of the children.

There are two phases:

- Phase 1 expands A top->down driven by δZTC, adds reached buckets to δ-,
  and queues key regions of those buckets to be processed on B.

- Phase 2 starts processing from queued key regions, expands them on B and
  adds reached buckets to δ+. Then it iterates to reach consistency in between
  A and B because processing buckets on B side may increase δ key coverage,
  and so corresponding key ranges have to be again processed on A. Which in
  turn may increase δ key coverage again, and needs to be processed on B side,
  etc...

The final δ is merge of δ- and δ+.

diffT has more detailed explanation of phase 1 and phase 2 logic.
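
The back-and-forth of phase 2 can be shown on a toy model. The sketch below is only an illustration under strong simplifications: buckets are flat lists with finite inclusive key ranges, queues hold individual keys rather than key ranges, and no tree expansion happens. The names `bucket`, `find`, `visit` and `reach` are hypothetical and do not exist in the patch.

```go
package main

import "fmt"

// bucket is a hypothetical leaf covering keys lo..hi inclusive.
type bucket struct {
	name   string
	lo, hi int
	done   bool // whether the bucket was already taken into account
}

// find returns the bucket covering key k, or nil if no bucket covers it.
func find(bv []*bucket, k int) *bucket {
	for _, b := range bv {
		if b.lo <= k && k <= b.hi {
			return b
		}
	}
	return nil
}

// visit marks every not yet visited bucket of side that covers a queued key
// and returns the keys covered by those buckets: the queue for the other side.
func visit(side []*bucket, queue []int, visited *[]string) (next []int) {
	for _, k := range queue {
		b := find(side, k)
		if b == nil || b.done {
			continue
		}
		b.done = true
		*visited = append(*visited, b.name)
		for key := b.lo; key <= b.hi; key++ {
			next = append(next, key)
		}
	}
	return next
}

// reach starts from key start on side A and alternates between B and A until
// no new bucket is reached on either side.
func reach(av, bv []*bucket, start int) (aVisited, bVisited []string) {
	bq := visit(av, []int{start}, &aVisited) // seed: bucket of A that goes to δ-
	for len(bq) > 0 {
		aq := visit(bv, bq, &bVisited) // buckets of B that go to δ+
		bq = visit(av, aq, &aVisited)  // buckets of A that go to δ-
	}
	return aVisited, bVisited
}

func main() {
	av := []*bucket{{name: "A1", lo: 1, hi: 1}, {name: "A23", lo: 2, hi: 3}, {name: "A45", lo: 4, hi: 5}}
	bv := []*bucket{{name: "B12", lo: 1, hi: 2}, {name: "B34", lo: 3, hi: 4}, {name: "B56", lo: 5, hi: 6}}
	a, b := reach(av, bv, 1)
	fmt.Println("δ- buckets:", a)
	fmt.Println("δ+ buckets:", b)
}
```

Starting from the single tracked key 1, each visited bucket widens the key coverage on the other side, so in this layout every bucket of both trees ends up contributing to the diff.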
parent 27df5a3b
// Copyright (C) 2018-2021 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.
package xbtree
// treediff provides diff for BTrees
//
// Use δZConnectTracked + treediff to compute BTree-diff caused by δZ:
//
// δZConnectTracked(δZ, trackSet) -> δZTC, δtopsByRoot
// treediff(root, δtops, δZTC, trackSet, zconn{Old,New}) -> δT, δtrack, δtkeycov
//
// δZConnectTracked computes BTree-connected closure of δZ modulo tracked set
// and also returns δtopsByRoot to indicate which tree objects were changed and
// in which subtree parts. With that information one can call treediff for each
// changed root to compute BTree-diff and δ for trackSet itself.
//
//
// BTree diff algorithm
//
// diffT, diffB and δMerge constitute the diff algorithm implementation.
// diff(A,B) works on pair of A and B whole key ranges split into regions
// covered by tree nodes. The splitting represents current state of recursion
// into corresponding tree. If a node in particular key range is Bucket, that
// bucket contributes to δ- in case of A, and to δ+ in case of B. If a node in
// particular key range is Tree, the algorithm may want to expand that tree
// node into its children and to recurse into some of the children.
//
// There are two phases:
//
// - Phase 1 expands A top->down driven by δZTC, adds reached buckets to δ-,
// and queues key regions of those buckets to be processed on B.
//
// - Phase 2 starts processing from queued key regions, expands them on B and
// adds reached buckets to δ+. Then it iterates to reach consistency in between
// A and B because processing buckets on B side may increase δ key coverage,
// and so corresponding key ranges have to be again processed on A. Which in
// turn may increase δ key coverage again, and needs to be processed on B side,
// etc...
//
// The final δ is merge of δ- and δ+.
//
// diffT has more detailed explanation of phase 1 and phase 2 logic.
import (
"context"
"fmt"
"reflect"
"sort"
"lab.nexedi.com/kirr/go123/xerr"
"lab.nexedi.com/kirr/neo/go/zodb"
"lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/xbtree/blib"
"lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/xzodb"
)
const traceDiff = false
const debugDiff = false // topoview from xbtree.py is also handy
// ΔValue represents change in value.
type ΔValue struct {
Old Value
New Value
}
// String is like default %v, but uses ø for VDEL.
func (δv ΔValue) String() string {
old, new := "ø", "ø"
if δv.Old != VDEL {
old = δv.Old.String()
}
if δv.New != VDEL {
new = δv.New.String()
}
return fmt.Sprintf("{%s %s}", old, new)
}
// δZConnectTracked computes connected closure of δZ/T.
//
// δZ - all changes in a ZODB transaction.
// δZ/T - subset of those changes intersecting with tracking set.
// δZ/TC - connected closure for δZ/T
//
// for example, for t₀->t₁->b₂, if δZ/T={t₀ b₂} -> δZ/TC=δZ/T+{t₁}
//
// δtopsByRoot = {} root -> {top changed nodes in that tree}
func δZConnectTracked(δZv []zodb.Oid, T blib.PPTreeSubSet) (δZTC setOid, δtopsByRoot map[zodb.Oid]setOid) {
δZ := setOid{}; for _, δ := range δZv { δZ.Add(δ) }
δZTC = setOid{}
δtopsByRoot = map[zodb.Oid]setOid{}
for δ := range δZ {
track, ok := T[δ]
if !ok {
continue // not tracked at all
}
δZTC.Add(δ)
// go up by .parent till root or till another tracked node in the tree
// if root -> δtopsByRoot[root] += δ
// if !root -> δZTC += path through which we reached another node (forming connection)
path := []zodb.Oid{}
node := δ
parent := track.Parent()
for {
// reached root
if parent == zodb.InvalidOid {
root := node
δtops, ok := δtopsByRoot[root]
if !ok {
δtops = setOid{}
δtopsByRoot[root] = δtops
}
δtops.Add(δ)
break
}
// reached another tracked node
if δZ.Has(parent) {
for _, δp := range path {
δZTC.Add(δp)
}
break
}
path = append(path, parent)
trackUp, ok := T[parent]
if !ok {
panicf("BUG: .p%s -> %s, but %s is not tracked", node, parent, parent)
}
node = parent
parent = trackUp.Parent()
}
}
return δZTC, δtopsByRoot
}
// treediff computes δT/δtrack/δtkeycov for tree/trackSet specified by root in between old..new.
//
// δtops is set of top nodes for changed subtrees.
// δZTC is connected(δZ/T) - connected closure for subset of δZ(old..new) that
// touches tracked nodes of T.
//
// Use δZConnectTracked to prepare δtops and δZTC.
func treediff(ctx context.Context, root zodb.Oid, δtops setOid, δZTC setOid, trackSet blib.PPTreeSubSet, zconnOld, zconnNew *zodb.Connection) (δT map[Key]ΔValue, δtrack *blib.ΔPPTreeSubSet, δtkeycov *blib.RangedKeySet, err error) {
defer xerr.Contextf(&err, "treediff %s..%s %s", zconnOld.At(), zconnNew.At(), root)
δT = map[Key]ΔValue{}
δtrack = blib.NewΔPPTreeSubSet()
δtkeycov = &blib.RangedKeySet{}
tracefDiff("\ntreediff %s δtops: %v δZTC: %v\n", root, δtops, δZTC)
tracefDiff(" trackSet: %v\n", trackSet)
defer tracefDiff("\n-> δT: %v\nδtrack: %v\nδtkeycov: %v\n", δT, δtrack, δtkeycov)
δtrackv := []*blib.ΔPPTreeSubSet{}
for top := range δtops {
a, err1 := zgetNodeOrNil(ctx, zconnOld, top)
b, err2 := zgetNodeOrNil(ctx, zconnNew, top)
err := xerr.Merge(err1, err2)
if err != nil {
return nil, nil, nil, err
}
δtop, δtrackTop, δtkeycovTop, err := diffX(ctx, a, b, δZTC, trackSet)
if err != nil {
return nil, nil, nil, err
}
debugfDiff("-> δtop: %v\n", δtop)
debugfDiff("-> δtrackTop: %v\n", δtrackTop)
debugfDiff("-> δtkeycovTop: %v\n", δtkeycovTop)
for k,δv := range δtop {
// NOTE keys cannot migrate in between two disconnected subtrees
δv_, kdup := δT[k]
if kdup {
panicf("BUG: key %v present in two disconnected subtrees; δv1: %s δv2: %s", k, δv_, δv)
}
δT[k] = δv
}
δtrackv = append(δtrackv, δtrackTop)
δtkeycov.UnionInplace(δtkeycovTop)
}
// trackSet should be adjusted by merge(δtrackTops)
for _, δ := range δtrackv {
δtrack.Update(δ)
}
return δT, δtrack, δtkeycov, nil
}
// diffX computes difference in between two revisions of a tree's subtree.
//
// a, b point to top of the subtree @old and @new revisions and must be of the
// same type - tree or bucket.
//
// δZTC is connected set of objects covering δZT (objects changed in this tree in old..new).
//
// a/b can be nil; a=nil means addition, b=nil means deletion.
//
//
// δtrack is trackSet δ that needs to be applied to trackSet to keep it
// consistent with b (= a + δ).
//
// δtkeycov represents how δtrack grows (always grows) tracking set key coverage.
func diffX(ctx context.Context, a, b Node, δZTC setOid, trackSet blib.PPTreeSubSet) (δ map[Key]ΔValue, δtrack *blib.ΔPPTreeSubSet, δtkeycov *blib.RangedKeySet, err error) {
if a==nil && b==nil {
// DEL..DEL -> ø diff
return map[Key]ΔValue{}, blib.NewΔPPTreeSubSet(), &blib.RangedKeySet{}, nil
}
var aT, bT *Tree
var aB, bB *Bucket
isT := false
if a != nil {
aT, isT = a.(*Tree)
aB, _ = a.(*Bucket)
if aT == nil && aB == nil {
panicf("a: bad type %T", a)
}
}
if b != nil {
bT, isT = b.(*Tree)
bB, _ = b.(*Bucket)
if bT == nil && bB == nil {
panicf("b: bad type %T", b)
}
}
if a != nil && b != nil {
if a.POid() != b.POid() {
panicf("BUG: a.oid != b.oid ; a: %s b: %s", a.POid(), b.POid())
}
if !((aT != nil && bT != nil) || (aB != nil && bB != nil)) {
return nil, nil, nil, fmt.Errorf("object %s: type mutated %s -> %s", a.POid(),
zodb.ClassOf(a), zodb.ClassOf(b))
}
}
if isT {
return diffT(ctx, aT, bT, δZTC, trackSet)
} else {
var δtrack *blib.ΔPPTreeSubSet
δ, err := diffB(ctx, aB, bB)
if δ != nil {
δtrack = blib.NewΔPPTreeSubSet()
δtkeycov = &blib.RangedKeySet{}
}
return δ, δtrack, δtkeycov, err
}
}
// ---- diff algorithm ----
// nodeInRange represents a Node coming under [lo, hi_] key range in its tree.
//
// The following operations are provided:
//
// Path() -> []oid - get full path to this node.
type nodeInRange struct {
prefix []zodb.Oid // path to this node goes via these objects
keycov blib.KeyRange // key coverage
node Node
done bool // whether this node was already taken into account while computing diff
}
// rangeSplit represents set of nodes covering a range.
// nodes come with key↑ and no intersection in between their [lo,hi)
//
// The following operations are provided:
//
// Get(key) -> node - get node covering key
// Expand(node) -> children - replace node with its children
// GetToLeaf(key) -> leaf - get/expand to leaf node that covers key
type rangeSplit []*nodeInRange // key↑
// diffT computes difference in between two subtrees.
//
// a, b point to top of subtrees @old and @new revisions.
// δZTC is connected set of objects covering δZT (objects changed in this tree in old..new).
func diffT(ctx context.Context, A, B *Tree, δZTC setOid, trackSet blib.PPTreeSubSet) (δ map[Key]ΔValue, δtrack *blib.ΔPPTreeSubSet, δtkeycov *blib.RangedKeySet, err error) {
tracefDiff(" diffT %s %s\n", xzodb.XidOf(A), xzodb.XidOf(B))
defer xerr.Contextf(&err, "diffT %s %s", xzodb.XidOf(A), xzodb.XidOf(B))
δ = map[Key]ΔValue{}
δtrack = blib.NewΔPPTreeSubSet()
δtkeycov = &blib.RangedKeySet{}
defer func() {
tracefDiff(" -> δ: %v\n", δ)
tracefDiff(" -> δtrack: %v\n", δtrack)
tracefDiff(" -> δtkeycov: %v\n", δtkeycov)
}()
if A == nil && B == nil {
return δ, δtrack, δtkeycov, nil // ø changes
}
// assert A.oid == B.oid
if A != nil && B != nil {
Aoid := A.POid()
Boid := B.POid()
if Aoid != Boid {
panicf("A.oid (%s) != B.oid (%s)", Aoid, Boid)
}
}
var ABoid zodb.Oid
var AB *Tree
if A != nil {
ABoid = A.POid()
AB = A
}
if B != nil {
ABoid = B.POid()
AB = B
}
// path prefix to A and B
ABpath := trackSet.Path(ABoid)
// key coverage for A and B
ABlo := KeyMin
ABhi_ := KeyMax
node := AB
ABcov:
for i := len(ABpath)-2; i >= 0; i-- {
xparent, err := node.PJar().Get(ctx, ABpath[i]); /*X*/if err != nil { return nil,nil,nil, err }
err = xparent.PActivate(ctx); /*X*/if err != nil { return nil,nil,nil, err}
defer xparent.PDeactivate()
parent := xparent.(*Tree) // must succeed
// find node in parent children and constrain ABlo/ABhi accordingly
entryv := parent.Entryv()
for j, entry := range entryv {
if entry.Child() == node {
// parent.entry[j] points to node
// [i].Key ≤ [i].Child.*.Key < [i+1].Key
klo := entryv[j].Key()
khi_ := KeyMax
if j+1 < len(entryv) {
khi_ = entryv[j+1].Key() - 1
}
if klo > ABlo {
ABlo = klo
}
if khi_ < ABhi_ {
ABhi_ = khi_
}
node = parent
continue ABcov
}
}
emsg := fmt.Sprintf("BUG: T%s points to T%s as parent in trackSet, but not found in T%s children\n", node.POid(), parent.POid(), parent.POid())
children := []string{}
for _, entry := range entryv {
children = append(children, vnode(entry.Child()))
}
emsg += fmt.Sprintf("T%s children: %v\n", parent.POid(), children)
emsg += fmt.Sprintf("trackSet: %s\n", trackSet)
panic(emsg)
}
if A == nil || B == nil {
// top of the subtree must stay in the tracking set even if the subtree is removed;
// this way, if the subtree is later recreated, that change won't be missed
δtrack.Del.AddPath(ABpath)
δtrack.Add.AddPath(ABpath)
// δtkeycov stays ø
}
// A|B == nil -> artificial empty tree
if A == nil {
A = zodb.NewPersistent(reflect.TypeOf(Tree{}), /*jar*/nil).(*Tree)
}
Bempty := false
if B == nil {
B = zodb.NewPersistent(reflect.TypeOf(Tree{}), /*jar*/nil).(*Tree)
Bempty = true
}
// initial split ranges for A and B
ABcov := blib.KeyRange{ABlo, ABhi_}
prefix := ABpath[:len(ABpath)-1]
atop := &nodeInRange{prefix: prefix, keycov: ABcov, node: A}
btop := &nodeInRange{prefix: prefix, keycov: ABcov, node: B}
Av := rangeSplit{atop} // nodes expanded from A
Bv := rangeSplit{btop} // nodes expanded from B
// for phase 2:
Akqueue := &blib.RangedKeySet{} // queue for keys in A to be processed for δ-
Bkqueue := &blib.RangedKeySet{} // ----//---- in B for δ+
Akdone := &blib.RangedKeySet{} // already processed keys in A
Bkdone := &blib.RangedKeySet{} // ----//---- in B
Aktodo := func(r blib.KeyRange) {
if !Akdone.HasRange(r) {
δtodo := &blib.RangedKeySet{}
δtodo.AddRange(r)
δtodo.DifferenceInplace(Akdone)
debugfDiff(" Akq <- %s\n", δtodo)
Akqueue.UnionInplace(δtodo)
}
}
Bktodo := func(r blib.KeyRange) {
if !Bkdone.HasRange(r) {
δtodo := &blib.RangedKeySet{}
δtodo.AddRange(r)
δtodo.DifferenceInplace(Bkdone)
debugfDiff(" Bkq <- %s\n", δtodo)
Bkqueue.UnionInplace(δtodo)
}
}
// {} oid -> nodeInRange for all nodes we've come through in Bv:
// current and previously expanded - up till top B.
BnodeIdx := map[zodb.Oid]*nodeInRange{}
if !Bempty {
BnodeIdx[ABoid] = btop
}
// δtkeycov will be = BAdd \ ADel
δtkeycovADel := &blib.RangedKeySet{}
δtkeycovBAdd := &blib.RangedKeySet{}
// phase 1: expand A top->down driven by δZTC.
// by default a node contributes to δ-
// a node ac does not contribute to δ- and can be skipped, if:
// - ac is not tracked, or
// - ac ∉ δZTC && ∃ bc from B: ac.oid == bc.oid && ac.keycov == bc.keycov
// (ac+ac.children were not changed, ac stays in the tree with the same key range coverage)
Aq := []*nodeInRange{atop} // queue for A nodes that contribute to δ-
for len(Aq) > 0 {
debugfDiff("\n")
debugfDiff(" aq: %v\n", Aq)
debugfDiff(" av: %s\n", Av)
debugfDiff(" bv: %s\n", Bv)
ra := pop(&Aq)
err = ra.node.PActivate(ctx); /*X*/if err != nil { return nil,nil,nil, err }
defer ra.node.PDeactivate()
debugfDiff(" a: %s\n", ra)
switch a := ra.node.(type) {
case *Bucket:
// a is bucket -> δ-
δA, err := diffB(ctx, a, nil); /*X*/if err != nil { return nil,nil,nil, err }
err = δMerge(δ, δA); /*X*/if err != nil { return nil,nil,nil, err }
δtrack.Del.AddPath(ra.Path())
δtkeycovADel.AddRange(ra.keycov)
debugfDiff(" δtrack - %s %v\n", ra.keycov, ra.Path())
// Bkqueue <- ra.range
Bktodo(ra.keycov)
ra.done = true
case *Tree:
// empty tree - queue holes covered by it
if len(a.Entryv()) == 0 {
δtrack.Del.AddPath(ra.Path())
δtkeycovADel.AddRange(ra.keycov)
debugfDiff(" δtrack - %s %v\n", ra.keycov, ra.Path())
Bktodo(ra.keycov)
continue
}
// a is !empty tree - expand it and queue children
// check for each child whether it can be skipped
achildren := Av.Expand(ra)
for _, ac := range achildren {
acOid := ac.node.POid()
at, tracked := trackSet[acOid]
if !tracked && /*cannot skip embedded bucket:*/acOid != zodb.InvalidOid {
continue
}
if !δZTC.Has(acOid) && /*cannot skip embedded bucket:*/acOid != zodb.InvalidOid {
// check B children for node with ac.oid
// while checking expand Bv till ac.lo and ac.hi_ point to the same node
// ( this does not give exact answer but should be a reasonable heuristic;
// the diff is the same if heuristic does not work and we
// look into and load more nodes to compute δ )
bc, found := BnodeIdx[acOid]
if !found {
for {
blo := Bv.Get(ac.keycov.Lo)
bhi_ := Bv.Get(ac.keycov.Hi_)
if blo != bhi_ {
break
}
bloT, ok := blo.node.(*Tree)
if !ok {
break // bucket
}
err = bloT.PActivate(ctx); /*X*/if err != nil { return nil,nil,nil, err }
defer bloT.PDeactivate()
if len(bloT.Entryv()) == 0 {
break // empty tree
}
bchildren := Bv.Expand(blo)
for _, bc_ := range bchildren {
bc_Oid := bc_.node.POid()
BnodeIdx[bc_Oid] = bc_
if acOid == bc_Oid {
found = true
bc = bc_
}
}
if found {
break
}
}
}
if found {
// ac can be skipped if key coverage stays the same
if ac.keycov == bc.keycov {
// adjust trackSet since path to the node could have changed
apath := ac.Path()
bpath := bc.Path()
if !pathEqual(apath, bpath) {
δtrack.Del.AddPath(apath)
δtrack.Add.AddPath(bpath)
if nc := at.NChild(); nc != 0 {
δtrack.ΔnchildNonLeafs[acOid] = nc
}
}
continue
}
}
}
// ac cannot be skipped
push(&Aq, ac)
}
}
}
// phase 2: reach consistency in between A and B.
// Every key removed in A has to be checked for whether it is present
// in B and contribute to δ+. In B, in turn, adding that key can add
// other keys to δ+. Those keys, in turn, have to be checked for
// whether they were present in A and contribute to δ-. For example:
//
//      [ 2 4 ]            [ 3 5 ]
//      ↓  ↓  ↓            ↓  ↓  ↓
//    |1| |23| |45|      |12| |34| |56|
//
// if values for all keys change, tracked={1}, change to 1 adds
// * -B1, which queues B.1 and leads to
// * +B12, which queues A.2 and leads to
// * -B23, which queues B.3 and leads to
// * +B34, ...
debugfDiff("\nphase 2:\n")
for {
debugfDiff("\n")
debugfDiff(" av: %s\n", Av)
debugfDiff(" bv: %s\n", Bv)
debugfDiff("\n")
debugfDiff(" Bkq: %s\n", Bkqueue)
if Bkqueue.Empty() {
break
}
for _, r := range Bkqueue.AllRanges() {
lo := r.Lo
for {
b, err := Bv.GetToLeaf(ctx, lo); /*X*/if err != nil { return nil,nil,nil, err }
debugfDiff(" B k%d -> %s\n", lo, b)
// +bucket if that bucket is reached for the first time
if !b.done {
var δB map[Key]ΔValue
bbucket, ok := b.node.(*Bucket)
if ok { // !ok means ø tree
δB, err = diffB(ctx, nil, bbucket); /*X*/if err != nil { return nil,nil,nil, err }
}
// δ <- δB
err = δMerge(δ, δB); /*X*/if err != nil { return nil,nil,nil, err }
δtrack.Add.AddPath(b.Path())
δtkeycovBAdd.AddRange(b.keycov)
debugfDiff(" δtrack + %s %v\n", b.keycov, b.Path())
// Akqueue <- δB
Bkdone.AddRange(b.keycov)
Aktodo(b.keycov)
b.done = true
}
// continue with next right bucket until r coverage is complete
if r.Hi_ <= b.keycov.Hi_ {
break
}
lo = b.keycov.Hi_ + 1
}
}
Bkqueue.Clear()
debugfDiff("\n")
debugfDiff(" Akq: %s\n", Akqueue)
for _, r := range Akqueue.AllRanges() {
lo := r.Lo
for {
a, err := Av.GetToLeaf(ctx, lo); /*X*/if err != nil { return nil,nil,nil, err }
debugfDiff(" A k%d -> %s\n", lo, a)
// -bucket if that bucket is reached for the first time
if !a.done {
var δA map[Key]ΔValue
abucket, ok := a.node.(*Bucket)
if ok { // !ok means ø tree
δA, err = diffB(ctx, abucket, nil); /*X*/if err != nil { return nil,nil,nil, err }
}
// δ <- δA
err = δMerge(δ, δA); /*X*/if err != nil { return nil,nil,nil, err }
δtrack.Del.AddPath(a.Path())
// NOTE adjust δtkeycovADel only if a was originally tracked
_, tracked := trackSet[a.node.POid()]
if tracked {
δtkeycovADel.AddRange(a.keycov)
debugfDiff(" δtrack - %s %v\n", a.keycov, a.Path())
} else {
debugfDiff(" δtrack - [) %v\n", a.Path())
}
// Bkqueue <- a.range
Akdone.AddRange(a.keycov)
Bktodo(a.keycov)
a.done = true
}
// continue with next right bucket until r coverage is complete
if r.Hi_ <= a.keycov.Hi_ {
break
}
lo = a.keycov.Hi_ + 1
}
}
Akqueue.Clear()
}
δtkeycov = δtkeycovBAdd.Difference(δtkeycovADel)
return δ, δtrack, δtkeycov, nil
}
// δMerge merges changes from δ2 into δ.
// δ is total-building diff, while δ2 is diff from comparing some subnodes.
func δMerge(δ, δ2 map[Key]ΔValue) error {
debugfDiff(" δmerge %v <- %v\n", δ, δ2)
defer debugfDiff(" -> %v\n", δ)
// merge δ <- δ2
for k, δv2 := range δ2 {
δv1, already := δ[k]
if !already {
δ[k] = δv2
continue
}
// both δ and δ2 have [k] - it can be that key
// entry migrated from one bucket into another.
if !( (δv1.New == VDEL && δv2.Old == VDEL) ||
(δv1.Old == VDEL && δv2.New == VDEL) ) {
return fmt.Errorf("BUG or btree corrupt: [%v] has " +
"duplicate entries: %v, %v", k, δv1, δv2)
}
δv := ΔValue{}
switch {
// x -> ø | ø -> x
// ø -> ø
case δv2.Old == VDEL && δv2.New == VDEL: // δv2 == hole
δv = δv1
// ø -> ø
// y -> ø | ø -> y
case δv1.Old == VDEL && δv1.New == VDEL: // δv1 == hole
δv = δv2
// ø -> x -> y->x
// y -> ø
case δv2.New == VDEL:
δv.Old = δv2.Old
δv.New = δv1.New
// x -> ø -> x->y
// ø -> y
default:
δv.Old = δv1.Old
δv.New = δv2.New
}
debugfDiff(" [%v] merge %s %s -> %s\n", k, δv1, δv2, δv)
if δv.Old != δv.New {
δ[k] = δv
} else {
delete(δ, k) // NOTE also annihilates hole migration
}
}
return nil
}
// diffB computes difference in between two buckets.
// see diffX for details.
func diffB(ctx context.Context, a, b *Bucket) (δ map[Key]ΔValue, err error) {
tracefDiff(" diffB %s %s\n", xzodb.XidOf(a), xzodb.XidOf(b))
defer xerr.Contextf(&err, "diffB %s %s", xzodb.XidOf(a), xzodb.XidOf(b))
var av []BucketEntry
var bv []BucketEntry
if a != nil {
err = a.PActivate(ctx); if err != nil { return nil, err }
defer a.PDeactivate()
av = a.Entryv() // key↑
}
if b != nil {
err = b.PActivate(ctx); if err != nil { return nil, err }
defer b.PDeactivate()
bv = b.Entryv() // key↑
}
δ = map[Key]ΔValue{}
defer tracefDiff(" -> δb: %v\n", δ)
//debugfDiff(" av: %v", av)
//debugfDiff(" bv: %v", bv)
for len(av) > 0 || len(bv) > 0 {
ka, va := KeyMax, VDEL
kb, vb := KeyMax, VDEL
if len(av) > 0 {
ka = av[0].Key()
va, err = vOid(av[0].Value())
if err != nil {
return nil, fmt.Errorf("a[%v]: %s", ka, err)
}
}
if len(bv) > 0 {
kb = bv[0].Key()
vb, err = vOid(bv[0].Value())
if err != nil {
return nil, fmt.Errorf("b[%v]: %s", kb, err)
}
}
switch {
case ka < kb: // -a[0]
δ[ka] = ΔValue{va, VDEL}
av = av[1:]
case ka > kb: // +b[0]
δ[kb] = ΔValue{VDEL, vb}
bv = bv[1:]
// ka == kb // va->vb
default:
if va != vb {
δ[ka] = ΔValue{va, vb}
}
av = av[1:]
bv = bv[1:]
}
}
return δ, nil
}
// vOid returns OID of a value object.
// it is an error if value is not persistent object.
func vOid(xvalue interface{}) (zodb.Oid, error) {
value, ok := xvalue.(zodb.IPersistent)
if !ok {
return zodb.InvalidOid, fmt.Errorf("%T is not a persistent object", xvalue)
}
return value.POid(), nil
}
// ---- nodeInRange + rangeSplit ----
func (rn *nodeInRange) String() string {
done := " "; if rn.done { done = "*" }
return fmt.Sprintf("%s%s%s", done, rn.keycov, vnode(rn.node))
}
// Path returns full path to this node.
func (n *nodeInRange) Path() []zodb.Oid {
// return full copy - else .prefix can become aliased in between children of a node
return append([]zodb.Oid{}, append(n.prefix, n.node.POid())...)
}
func (rs rangeSplit) String() string {
if len(rs) == 0 {
return "ø"
}
s := ""
for _, rn := range rs {
if s != "" {
s += " "
}
s += fmt.Sprintf("%s", rn)
}
return s
}
// Get returns node covering key k.
// Get panics if k is not covered.
func (rs rangeSplit) Get(k Key) *nodeInRange {
rnode, ok := rs.Get_(k)
if !ok {
panicf("key %v not covered; coverage: %s", k, rs)
}
return rnode
}
// Get_ returns node covering key k.
func (rs rangeSplit) Get_(k Key) (rnode *nodeInRange, ok bool) {
i := sort.Search(len(rs), func(i int) bool {
return k <= rs[i].keycov.Hi_
})
if i == len(rs) {
return nil, false // key not covered
}
rn := rs[i]
if !rn.keycov.Has(k) {
panicf("BUG: get(%v) -> %s; coverage: %s", k, rn, rs)
}
return rn, true
}
// Expand replaces rnode with its children.
//
// rnode must be initially in *prs.
// rnode.node must be tree.
// rnode.node must be already activated.
//
// inserted children are returned for convenience.
func (prs *rangeSplit) Expand(rnode *nodeInRange) (children rangeSplit) {
rs := *prs
i := sort.Search(len(rs), func(i int) bool {
return rnode.keycov.Hi_ <= rs[i].keycov.Hi_
})
if i == len(rs) || rs[i] != rnode {
panicf("%s not in rangeSplit; coverage: %s", rnode, rs)
}
// [i].Key ≤ [i].Child.*.Key < [i+1].Key i ∈ [0, len([]))
//
// [0].Key = -∞ ; always returned so
// [len(ev)].Key = +∞ ; should be assumed so
tree := rnode.node.(*Tree)
treev := tree.Entryv()
children = make(rangeSplit, 0, len(treev)+1)
for i := range treev {
lo := rnode.keycov.Lo
if i > 0 {
lo = treev[i].Key()
}
hi_ := rnode.keycov.Hi_
if i < len(treev)-1 {
hi_ = treev[i+1].Key()-1 // NOTE -1 because it is hi_] not hi)
}
children = append(children, &nodeInRange{
prefix: rnode.Path(),
keycov: blib.KeyRange{lo, hi_},
node: treev[i].Child(),
})
}
// del[i]; insert(@i, children)
*prs = append(rs[:i], append(children, rs[i+1:]...)...)
return children
}
// GetToLeaf returns leaf node corresponding to key k.
//
// Leaf is usually a bucket node, but, in the sole case of an empty tree, it can be that root tree node.
// GetToLeaf expands step-by-step every tree through which it has to traverse to next depth level.
//
// GetToLeaf panics if k is not covered.
func (prs *rangeSplit) GetToLeaf(ctx context.Context, k Key) (*nodeInRange, error) {
rnode, ok, err := prs.GetToLeaf_(ctx, k)
if err == nil && !ok {
panicf("key %v not covered; coverage: %s", k, *prs)
}
return rnode, err
}
// GetToLeaf_ is comma-ok version of GetToLeaf.
func (prs *rangeSplit) GetToLeaf_(ctx context.Context, k Key) (rnode *nodeInRange, ok bool, err error) {
rnode, ok = prs.Get_(k)
if !ok {
return nil, false, nil // key not covered
}
for {
switch rnode.node.(type) {
// bucket = leaf
case *Bucket:
return rnode, true, nil
}
// it is a tree -> activate to expand; check for ø case
tree := rnode.node.(*Tree)
err = tree.PActivate(ctx)
if err != nil {
return nil, false, err
}
defer tree.PDeactivate()
// empty tree -> don't expand - it is already leaf
if len(tree.Entryv()) == 0 {
return rnode, true, nil
}
// expand tree children
children := prs.Expand(rnode)
rnode = children.Get(k) // k must be there
}
}
// ---- stack of nodeInRange ----
// push pushes element to node stack.
func push(nodeStk *[]*nodeInRange, top *nodeInRange) {
*nodeStk = append(*nodeStk, top)
}
// pop pops top element from node stack.
func pop(nodeStk *[]*nodeInRange) *nodeInRange {
stk := *nodeStk
l := len(stk)
top := stk[l-1]
*nodeStk = stk[:l-1]
return top
}
func tracefDiff(format string, argv ...interface{}) {
if traceDiff {
fmt.Printf(format, argv...)
}
}
func debugfDiff(format string, argv ...interface{}) {
if debugDiff {
fmt.Printf(format, argv...)
}
}
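
The two bucket-level building blocks of treediff.go, diffB and δMerge, can be illustrated in isolation. The sketch below is a simplified model rather than the code above: `Entry`, `Delta`, `diffBucket` and `merge` are hypothetical names, values are plain strings, and the empty string plays the role of VDEL.

```go
package main

import "fmt"

// Entry is a hypothetical key/value pair of a bucket; entries are sorted by key.
type Entry struct {
	Key   int
	Value string
}

// Delta records the change of one key; "" stands for "deleted" (VDEL above).
type Delta struct{ Old, New string }

// diffBucket walks two key-sorted entry lists in parallel, like a merge step.
// Passing nil for a or b models addition or deletion of a whole bucket.
func diffBucket(a, b []Entry) map[int]Delta {
	d := map[int]Delta{}
	for len(a) > 0 || len(b) > 0 {
		switch {
		case len(b) == 0 || (len(a) > 0 && a[0].Key < b[0].Key): // -a[0]
			d[a[0].Key] = Delta{a[0].Value, ""}
			a = a[1:]
		case len(a) == 0 || b[0].Key < a[0].Key: // +b[0]
			d[b[0].Key] = Delta{"", b[0].Value}
			b = b[1:]
		default: // same key: va -> vb
			if a[0].Value != b[0].Value {
				d[a[0].Key] = Delta{a[0].Value, b[0].Value}
			}
			a, b = a[1:], b[1:]
		}
	}
	return d
}

// merge folds d2 into d. A key present in both must have been deleted on one
// side and added on the other, i.e. it migrated between buckets.
func merge(d, d2 map[int]Delta) error {
	for k, v2 := range d2 {
		v1, already := d[k]
		if !already {
			d[k] = v2
			continue
		}
		var v Delta
		switch {
		case v1.New == "" && v2.Old == "": // x -> ø, then ø -> y
			v = Delta{v1.Old, v2.New}
		case v1.Old == "" && v2.New == "": // ø -> y, then x -> ø
			v = Delta{v2.Old, v1.New}
		default:
			return fmt.Errorf("key %d has duplicate entries: %v, %v", k, v1, v2)
		}
		if v.Old != v.New {
			d[k] = v
		} else {
			delete(d, k) // the key only moved; its value did not change
		}
	}
	return nil
}

func main() {
	a23 := []Entry{{2, "x"}, {3, "y"}} // bucket reached only in the old tree
	b12 := []Entry{{1, "w"}, {2, "x"}} // bucket reached only in the new tree

	d := diffBucket(a23, nil) // δ-
	if err := merge(d, diffBucket(nil, b12)); err != nil { // δ+
		fmt.Println("merge:", err)
		return
	}
	fmt.Println(d)
}
```

Key 2 keeps its value and merely moves from one bucket to another, so its deletion and addition cancel out and only keys 1 and 3 remain in the merged result.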
// Copyright (C) 2021 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.
// Package xbtree complements package lab.nexedi.com/kirr/neo/go/zodb/btree.
//
// It provides the following amendments:
//
// - ΔBtail (tail of revisional changes to BTrees).
package xbtree
// this file contains only tree types and utilities.
// main code lives in δbtail.go and treediff.go .
import (
"context"
"fmt"
"lab.nexedi.com/kirr/go123/xerr"
"lab.nexedi.com/kirr/neo/go/zodb"
"lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/set"
"lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/xbtree/blib"
"lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/xzodb"
)
// XXX instead of generics
type Tree = blib.Tree
type Bucket = blib.Bucket
type Node = blib.Node
type TreeEntry = blib.TreeEntry
type BucketEntry = blib.BucketEntry
type Key = blib.Key
type KeyRange = blib.KeyRange
const KeyMax = blib.KeyMax
const KeyMin = blib.KeyMin
// value is assumed to be persistent reference.
// deletion is represented as VDEL.
type Value = zodb.Oid
const VDEL = zodb.InvalidOid
type setOid = set.Oid
// pathEqual returns whether two paths are the same.
func pathEqual(patha, pathb []zodb.Oid) bool {
if len(patha) != len(pathb) {
return false
}
for i, a := range patha {
if pathb[i] != a {
return false
}
}
return true
}
// vnode returns brief human-readable representation of node.
func vnode(node Node) string {
kind := "?"
switch node.(type) {
case *Tree: kind = "T"
case *Bucket: kind = "B"
}
return kind + node.POid().String()
}
// zgetNodeOrNil returns btree node corresponding to zconn.Get(oid) .
// if the node does not exist, (nil, nil) is returned.
func zgetNodeOrNil(ctx context.Context, zconn *zodb.Connection, oid zodb.Oid) (node Node, err error) {
defer xerr.Contextf(&err, "getnode %s@%s", oid, zconn.At())
xnode, err := xzodb.ZGetOrNil(ctx, zconn, oid)
if xnode == nil || err != nil {
return nil, err
}
node, ok := xnode.(Node)
if !ok {
return nil, fmt.Errorf("unexpected type: %s", zodb.ClassOf(xnode))
}
return node, nil
}
func panicf(format string, argv ...interface{}) {
panic(fmt.Sprintf(format, argv...))
}
@@ -43,6 +43,7 @@ const KeyMin = blib.KeyMin
type setKey = set.I64
// XXX dup from xbtree (to avoid import cycle)
const VDEL = zodb.InvalidOid
...
@@ -22,9 +22,12 @@ package xzodb
import (
"context"
"errors"
"fmt"
"reflect"
"lab.nexedi.com/kirr/go123/xcontext"
"lab.nexedi.com/kirr/go123/xerr"
"lab.nexedi.com/kirr/neo/go/transaction"
"lab.nexedi.com/kirr/neo/go/zodb"
@@ -80,3 +83,54 @@ func ZOpen(ctx context.Context, zdb *zodb.DB, zopt *zodb.ConnOptions) (_ *ZConn,
TxnCtx: txnCtx,
}, nil
}
// ZGetOrNil returns zconn.Get(oid), or (nil, nil) if the object does not exist.
func ZGetOrNil(ctx context.Context, zconn *zodb.Connection, oid zodb.Oid) (_ zodb.IPersistent, err error) {
defer xerr.Contextf(&err, "zget %s@%s", oid, zconn.At())
obj, err := zconn.Get(ctx, oid)
if err != nil {
if IsErrNoData(err) {
err = nil
}
return nil, err
}
// activate the object to find out it really exists
// after removal on storage, the object might have stayed in Connection
// cache due to e.g. PCachePinObject, and it will be PActivate that
// will return "deleted" error.
err = obj.PActivate(ctx)
if err != nil {
if IsErrNoData(err) {
return nil, nil
}
return nil, err
}
obj.PDeactivate()
return obj, nil
}
// IsErrNoData returns whether err is due to NoDataError or NoObjectError.
func IsErrNoData(err error) bool {
var eNoData *zodb.NoDataError
var eNoObject *zodb.NoObjectError
switch {
case errors.As(err, &eNoData):
return true
case errors.As(err, &eNoObject):
return true
default:
return false
}
}
// XidOf returns string representation of object xid.
func XidOf(obj zodb.IPersistent) string {
if obj == nil || reflect.ValueOf(obj).IsNil() {
return "ø"
}
xid := zodb.Xid{At: obj.PJar().At(), Oid: obj.POid()}
return xid.String()
}
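
The double nil check in XidOf is there because an interface value that holds a nil pointer is itself not equal to nil. A minimal sketch of that Go behaviour follows; `Named`, `node` and `isNil` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// Named is a hypothetical interface standing in for zodb.IPersistent.
type Named interface{ Name() string }

// node is a hypothetical pointer-receiver implementation of Named.
type node struct{ name string }

func (n *node) Name() string { return n.name }

// isNil reports whether obj is a nil interface or an interface that wraps a
// nil pointer. reflect.Value.IsNil is only valid for nilable kinds, which is
// the case here because Named is implemented by a pointer type.
func isNil(obj Named) bool {
	return obj == nil || reflect.ValueOf(obj).IsNil()
}

func main() {
	var p *node         // typed nil pointer
	var iface Named = p // interface now carries (*node, nil)

	fmt.Println("iface == nil:", iface == nil)
	fmt.Println("isNil(iface):", isNil(iface))
	fmt.Println("isNil(nil):", isNil(nil))
	fmt.Println("isNil(&node{}):", isNil(&node{name: "t"}))
}
```

This is how a nil *Tree or *Bucket handed to XidOf as zodb.IPersistent can still be reported as "ø" instead of causing a nil dereference.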