wcfs/internal/xbtree/treediff.go · 80153aa5913de19d294da20deb30086b13719e75 · Kirill Smelkov / wendelin.core

wcfs: xbtree: BTree-diff algorithm · 80153aa5

Kirill Smelkov authored Oct 26, 2021

This algorithm will be internally used by ΔBtail in the next patch.

The algorithm would be simple, if we would need to diff two trees
completely. However in ΔBtail only subpart of BTree nodes are tracked(*)
and the diff has to work modulo that tracking set.

No tests now because ΔBtail tests will cover treediff functionality as well.

Some preliminary history:

kirr/wendelin.core@78f2f88b X wcfs/xbtree: Fix treediff(a, ø)
kirr/wendelin.core@5324547c X wcfs/xbtree: root(a) must stay in trackSet even after treediff(a,ø)
kirr/wendelin.core@f65f775b X wcfs/xbtree: treediff(ø, b)
kirr/wendelin.core@c75b1c6f X wcfs/xbtree: Start killing holeIdx
kirr/wendelin.core@ef5e5183 X treediff ret += δtkeycov
kirr/wendelin.core@9d20f8e8 X treediff: Fix BUG while computing AB coverage
kirr/wendelin.core@ddb28043 X rebuild: Don't return nil for empty ΔPPTreeSubSet - that leads to SIGSEGV
kirr/wendelin.core@f68398c9 X wcfs: Move treediff into its own file

(*) because full BTree scan is needed to discover all of its nodes.

Quoting treediff documentation:

---- 8< ----

treediff provides diff for BTrees

Use δZConnectTracked + treediff to compute BTree-diff caused by δZ:

δZConnectTracked(δZ, trackSet) -> δZTC, δtopsByRoot
treediff(root, δtops, δZTC, trackSet, zconn{Old,New}) -> δT, δtrack, δtkeycov

δZConnectTracked computes BTree-connected closure of δZ modulo tracked set
and also returns δtopsByRoot to indicate which tree objects were changed and
in which subtree parts. With that information one can call treediff for each
changed root to compute BTree-diff and δ for trackSet itself.

BTree diff algorithm

diffT, diffB and δMerge constitute the diff algorithm implementation.
diff(A,B) works on pair of A and B whole key ranges splitted into regions
covered by tree nodes. The splitting represents current state of recursion
into corresponding tree. If a node in particular key range is Bucket, that
bucket contributes to δ- in case of A, and to δ+ in case of B. If a node in
particular key range is Tree, the algorithm may want to expand that tree
node into its children and to recourse into some of the children.

There are two phases:

- Phase 1 expands A top->down driven by δZTC, adds reached buckets to δ-,
and queues key regions of those buckets to be processed on B.

- Phase 2 starts processing from queued key regions, expands them on B and
adds reached buckets to δ+. Then it iterates to reach consistency in between
A and B because processing buckets on B side may increase δ key coverage,
and so corresponding key ranges has to be again processed on A. Which in
turn may increase δ key coverage again, and needs to be processed on B side,
etc...

The final δ is merge of δ- and δ+.

diffT has more detailed explanation of phase 1 and phase 2 logic.

80153aa5

treediff.go 27.4 KB

Replace treediff.go