Readings for Balanced Trees: Handout from Weiss, pp 135-149 for AVL

trees, and 165-170 for B-trees

Balanced Trees

The obvious idea is that a binary tree can have linear search time and

insertion time properties if it, for instance, only has Right

offspring... it degenerates into a linked list. No logarithmic

advantage. Further trees mutate in shape depending on the policy for

inserts and removes (figs. p. 136). So to preserve the logarithmic

worst-case times one goes to a fair amount of bother to keep the trees

balanced. None of these techniques is very easy: all call for fairly

complex algorithms for insertion and deletion.

In AVL trees, an operation called ROTATION is used to reorder nodes in

the tree for balance. In B-trees (very useful in databases), the same

goal is achieved by SPLITTING. AVL trees are binary, B-trees are not.

Note also there are Red-Black trees, which are pretty popular.

AVL trees:

AVL tree is Binary search tree with balance condition. You can

imagine several balance conditions, and the goal is to find one that

has OK performance and OK costs to maintain.

-- Root's Left and Right subtree have same height? nope, fig. 4.29.

-- Every node's Left and Right subtree have same height? nope, can

only guarantee if have exactly 2**n -1 nodes.

AVL Balance Condition:

-- Every node's Left and Right subtree can differ in height by at

most one (1).

By convention, the height of an empty tree is -1.

Examples in Fig. 4.30.

Worst case height is 1.44 logn, but in practice, height is about

log (n+1) + .25. Checkout Fig. 4.31, and note that you have a

Fibonacci-like recursion (which gives the 1.44 log bound):

The left subtree is minimum size of height

7 and the right of minimum size of height 8. Thus we know

N(h) = N(h-1)+ N(h-2) +1. For h=0, N(h) = 1; For h=1, N(h) = 2.

Trick for binary search trees and other data structures: if there

aren't going to be many deletions, use *lazy deletion*, which just

marks a node as deleted and leaves it in place.

Using lazy deletion, all the operations on AVL trees look quick except

possibly insertion, which keeps tree balanced.

Height information is kept in the node structure for each node and is

used to keep tree balanced. One has to update this upon insertion,

but major problem is that insertion can unbalance the tree. Luckily, a

single local readjustment of nodes preserves binary search property

and rebalances...it's a *rotation* (Fig. 4.32). See also 4.30, 4.33

Start at insertion, head up toward root of tree looking at balance

info. If any out-of-balance node is found, rotate and quit.

Example pp. 140-141. This is all very well but it's not good enough

because sometimes we need a *double rotation*. Inserting into the

``middle'' subtree (Y in figure 4.32) can lead to problem not fixed by

rotation, so need double rotation (fig. 4.34, 4.35).

Programming information and tricks p. 145.

Deletion even worse (as usual).

B-Trees (p. 165ff)

A special case of B-trees is the popular 2-3 trees (B-tree of order 3).

B-trees not

binary in general, bigger branching factor, good for organizing files

on disks, directories, data bases, etc.

B-trees (of order m) have following properties:

Root is either a leaf or has between 2 and m children

All nonleaf nodes but the root have between ceil(m/2) and m children.

All leaves are at same depth.

All data is stored in leaves, either keys or pointers to structures

containing the keys.

Interior nodes just have *pointers* to children and *values* giving

the smallest values to be found in the correspoinding subtrees.

Keys increase in value across subtrees: All keys in subtree p1 are < in p2.

Example in fig. 4.63.

Operations:

Find: start at root, branch down to one of at most 3 children

depending on the relation of the key we seek to the values stored in

the root. etc.

Insert: for a new key x, follow path as if for Find, and x goes in

the leaf we find.

No problem unless our addition causes too many occupants of a node

(four for a 2-3 tree). In that case the node must be split: turn it

into two nodes with 2 keys each and adjust the parent. BUT that might

actually make too many children, like four, for the parent, so that

means the parent must be split and each half gets 2 children. More

examples p. 168,169. Only changes happen along the access path, but

Weiss warns that it isn't easy and there are lots of cases.

There are various other ways of rebalancing a B-tree evidently....like

you could search for underfull sibs and skoosh keys over into them

when you add to a node. This works for internal nodes too. Keeps

nodes fuller but more complex routines.

Actual (not lazy) Removal isn't too hard...combine singleton remaining

keys with siblings, or if sib has 3 steal one and give it to the

singleton. If sib has 2 keys, combine both into 3-key node, parent

loses a child, which could cause repercussions upward.

Generalizing this 2-3 tree algorithm to order m isn't hard...only have

to take action on insertion when node has m keys already, when it

splits into two of size ceil((m+1)/2), floor((m+1)/2) (recursively).

Depth of B-tree is at most ceil(log to the base ceil(m/2) of n).

In general at each node we have to figure out which branch to take,

which with binary search over the possibly m entries takes O(log m).

Insert, remove can require O(log m) to fix up node information.

So worst-case for insert and remove is O(m logtothebase m n) =

O((m/logm)logn), but Find is only O(log n).

B-trees good for databases if whole Btree stored on disk, then you pay

lots for every disk access. The number of disk accesses is

O(logbasem n), and the O(log m) in-core search is quick by comparison,

so maximize m to be largest that allows internal node to fit in a disk

block. Likewise max. elements in leaf should just fill one disk

block.

Using insertion strategy here (not skooshing keys to sibs) yields

predicted 69% occupancy (this is ln 2...can you explain why?).

Exercises:( these are Weiss: 4.15, 4.16, 4.21, 4.36, 4.38)

1. a) Give a precise expression for the minimum number of nodes in an

AVL tree of height h.

b) What is the minimum number of nodes in an AVL tree of height 15?

2. Show the result of inserting 2,1,4,5,9,3,6,7 into an empty AVL

tree.

3. a) How many bits are required per node to store the height of a

node in an n-node AVL tree?

b) What is the smallest AVL tree that overflows an 8-bit height

counter?

4. a) Show the result of inserting the following keys into an

initially-empty 2-3 tree: 3,1,4,5,9,2,6,8,7,0.

b) Show the result of deleting 0 and then 9 from the 2-3 created in

part (a).

5. A B* tree of order m is a B-tree in which each interior node has

between 2m/3 and m children. Describe a method to perform insertion

into a B* tree.

## 2010年4月11日星期日

### Readings for Balanced Trees

订阅：
博文评论 (Atom)

## 没有评论:

## 发表评论