Readings for Balanced Trees: Handout from Weiss, pp 135-149 for AVL
trees, and 165-170 for B-trees
The obvious idea is that a binary tree can have linear search time and
insertion time properties if it, for instance, only has Right
offspring... it degenerates into a linked list. No logarithmic
advantage. Further trees mutate in shape depending on the policy for
inserts and removes (figs. p. 136). So to preserve the logarithmic
worst-case times one goes to a fair amount of bother to keep the trees
balanced. None of these techniques is very easy: all call for fairly
complex algorithms for insertion and deletion.
In AVL trees, an operation called ROTATION is used to reorder nodes in
the tree for balance. In B-trees (very useful in databases), the same
goal is achieved by SPLITTING. AVL trees are binary, B-trees are not.
Note also there are Red-Black trees, which are pretty popular.
AVL tree is Binary search tree with balance condition. You can
imagine several balance conditions, and the goal is to find one that
has OK performance and OK costs to maintain.
-- Root's Left and Right subtree have same height? nope, fig. 4.29.
-- Every node's Left and Right subtree have same height? nope, can
only guarantee if have exactly 2**n -1 nodes.
AVL Balance Condition:
-- Every node's Left and Right subtree can differ in height by at
most one (1).
By convention, the height of an empty tree is -1.
Examples in Fig. 4.30.
Worst case height is 1.44 logn, but in practice, height is about
log (n+1) + .25. Checkout Fig. 4.31, and note that you have a
Fibonacci-like recursion (which gives the 1.44 log bound):
The left subtree is minimum size of height
7 and the right of minimum size of height 8. Thus we know
N(h) = N(h-1)+ N(h-2) +1. For h=0, N(h) = 1; For h=1, N(h) = 2.
Trick for binary search trees and other data structures: if there
aren't going to be many deletions, use *lazy deletion*, which just
marks a node as deleted and leaves it in place.
Using lazy deletion, all the operations on AVL trees look quick except
possibly insertion, which keeps tree balanced.
Height information is kept in the node structure for each node and is
used to keep tree balanced. One has to update this upon insertion,
but major problem is that insertion can unbalance the tree. Luckily, a
single local readjustment of nodes preserves binary search property
and rebalances...it's a *rotation* (Fig. 4.32). See also 4.30, 4.33
Start at insertion, head up toward root of tree looking at balance
info. If any out-of-balance node is found, rotate and quit.
Example pp. 140-141. This is all very well but it's not good enough
because sometimes we need a *double rotation*. Inserting into the
``middle'' subtree (Y in figure 4.32) can lead to problem not fixed by
rotation, so need double rotation (fig. 4.34, 4.35).
Programming information and tricks p. 145.
Deletion even worse (as usual).
B-Trees (p. 165ff)
A special case of B-trees is the popular 2-3 trees (B-tree of order 3).
binary in general, bigger branching factor, good for organizing files
on disks, directories, data bases, etc.
B-trees (of order m) have following properties:
Root is either a leaf or has between 2 and m children
All nonleaf nodes but the root have between ceil(m/2) and m children.
All leaves are at same depth.
All data is stored in leaves, either keys or pointers to structures
containing the keys.
Interior nodes just have *pointers* to children and *values* giving
the smallest values to be found in the correspoinding subtrees.
Keys increase in value across subtrees: All keys in subtree p1 are < in p2.
Example in fig. 4.63.
Find: start at root, branch down to one of at most 3 children
depending on the relation of the key we seek to the values stored in
the root. etc.
Insert: for a new key x, follow path as if for Find, and x goes in
the leaf we find.
No problem unless our addition causes too many occupants of a node
(four for a 2-3 tree). In that case the node must be split: turn it
into two nodes with 2 keys each and adjust the parent. BUT that might
actually make too many children, like four, for the parent, so that
means the parent must be split and each half gets 2 children. More
examples p. 168,169. Only changes happen along the access path, but
Weiss warns that it isn't easy and there are lots of cases.
There are various other ways of rebalancing a B-tree evidently....like
you could search for underfull sibs and skoosh keys over into them
when you add to a node. This works for internal nodes too. Keeps
nodes fuller but more complex routines.
Actual (not lazy) Removal isn't too hard...combine singleton remaining
keys with siblings, or if sib has 3 steal one and give it to the
singleton. If sib has 2 keys, combine both into 3-key node, parent
loses a child, which could cause repercussions upward.
Generalizing this 2-3 tree algorithm to order m isn't hard...only have
to take action on insertion when node has m keys already, when it
splits into two of size ceil((m+1)/2), floor((m+1)/2) (recursively).
Depth of B-tree is at most ceil(log to the base ceil(m/2) of n).
In general at each node we have to figure out which branch to take,
which with binary search over the possibly m entries takes O(log m).
Insert, remove can require O(log m) to fix up node information.
So worst-case for insert and remove is O(m logtothebase m n) =
O((m/logm)logn), but Find is only O(log n).
B-trees good for databases if whole Btree stored on disk, then you pay
lots for every disk access. The number of disk accesses is
O(logbasem n), and the O(log m) in-core search is quick by comparison,
so maximize m to be largest that allows internal node to fit in a disk
block. Likewise max. elements in leaf should just fill one disk
Using insertion strategy here (not skooshing keys to sibs) yields
predicted 69% occupancy (this is ln 2...can you explain why?).
Exercises:( these are Weiss: 4.15, 4.16, 4.21, 4.36, 4.38)
1. a) Give a precise expression for the minimum number of nodes in an
AVL tree of height h.
b) What is the minimum number of nodes in an AVL tree of height 15?
2. Show the result of inserting 2,1,4,5,9,3,6,7 into an empty AVL
3. a) How many bits are required per node to store the height of a
node in an n-node AVL tree?
b) What is the smallest AVL tree that overflows an 8-bit height
4. a) Show the result of inserting the following keys into an
initially-empty 2-3 tree: 3,1,4,5,9,2,6,8,7,0.
b) Show the result of deleting 0 and then 9 from the 2-3 created in
5. A B* tree of order m is a B-tree in which each interior node has
between 2m/3 and m children. Describe a method to perform insertion
into a B* tree.