## 2010年4月11日星期日

Readings for Balanced Trees: handout from Weiss, pp. 135-149 for AVL trees, and pp. 165-170 for B-trees.

Balanced Trees

The obvious problem is that a binary tree can have linear search-time and insertion-time properties if, for instance, it only has right offspring... it degenerates into a linked list. No logarithmic advantage. Further, trees mutate in shape depending on the policy for inserts and removes (figs. p. 136). So to preserve the logarithmic worst-case times, one goes to a fair amount of bother to keep the trees balanced. None of these techniques is very easy: all call for fairly complex algorithms for insertion and deletion.

In AVL trees, an operation called ROTATION is used to reorder nodes in the tree for balance. In B-trees (very useful in databases), the same goal is achieved by SPLITTING. AVL trees are binary; B-trees in general are not. Note also there are Red-Black trees, which are pretty popular.

AVL trees:

An AVL tree is a binary search tree with a balance condition. You can imagine several balance conditions, and the goal is to find one that has OK performance and OK costs to maintain.

-- Root's left and right subtrees have the same height? Nope, fig. 4.29.

-- Every node's left and right subtrees have the same height? Nope: that can only be guaranteed if there are exactly 2^n - 1 nodes.

AVL Balance Condition:

-- Every node's left and right subtrees can differ in height by at most one (1).

By convention, the height of an empty tree is -1.

Examples in fig. 4.30.

The worst-case height is 1.44 log n, but in practice the height is about log(n+1) + 0.25. Check out fig. 4.31, and note that you have a Fibonacci-like recursion (which gives the 1.44 log bound): in a minimal AVL tree of height h, one subtree is a minimal tree of height h-1 and the other a minimal tree of height h-2 (in the figure, heights 8 and 7). Thus N(h) = N(h-1) + N(h-2) + 1, with N(0) = 1 and N(1) = 2.
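As a quick check of that recurrence, here is a minimal Python sketch (the function name `min_avl_nodes` is mine, not Weiss's) that computes N(h) and verifies the 1.44 log height bound numerically:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def min_avl_nodes(h):
    """Minimum number of nodes in an AVL tree of height h.

    A minimal tree of height h is a root plus minimal subtrees of
    heights h-1 and h-2, so N(h) = N(h-1) + N(h-2) + 1.
    """
    if h == 0:
        return 1
    if h == 1:
        return 2
    return min_avl_nodes(h - 1) + min_avl_nodes(h - 2) + 1

# Sanity check against the base cases and first few values:
assert [min_avl_nodes(h) for h in range(4)] == [1, 2, 4, 7]

# The worst-case height of an n-node AVL tree is roughly 1.44 log2 n
# (the standard bound is h < 1.44 log2(n + 2)):
for h in range(2, 30):
    n = min_avl_nodes(h)
    assert h < 1.44 * math.log2(n + 2)
```

Note that N(h) + 1 follows the Fibonacci sequence exactly, which is where the 1.44 = 1/log2(phi) constant comes from.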
A trick for binary search trees and other data structures: if there aren't going to be many deletions, use *lazy deletion*, which just marks a node as deleted and leaves it in place.

Using lazy deletion, all the operations on AVL trees look quick except possibly insertion, which must keep the tree balanced.

Height information is kept in the node structure for each node and is used to keep the tree balanced. One has to update it upon insertion, but the major problem is that insertion can unbalance the tree. Luckily, a single local readjustment of nodes preserves the binary-search property and rebalances... it's a *rotation* (fig. 4.32). See also figs. 4.30, 4.33.

Start at the insertion point and head up toward the root of the tree, looking at balance info. If an out-of-balance node is found, rotate and quit. Example on pp. 140-141. This is all very well but it's not good enough, because sometimes we need a *double rotation*. Inserting into the "middle" subtree (Y in fig. 4.32) can lead to a problem not fixed by a single rotation, so we need a double rotation (figs. 4.34, 4.35).

Programming information and tricks on p. 145.

Deletion is even worse (as usual).

B-Trees (p. 165ff)

A special case of B-trees is the popular 2-3 tree (a B-tree of order 3). B-trees are not binary in general; they have a bigger branching factor, which is good for organizing files on disks, directories, databases, etc.

B-trees (of order m) have the following properties:

-- The root is either a leaf or has between 2 and m children.

-- All nonleaf nodes except the root have between ceil(m/2) and m children.

-- All leaves are at the same depth.

All data is stored in the leaves, either keys or pointers to structures containing the keys. Interior nodes have just *pointers* to children and *values* giving the smallest values to be found in the corresponding subtrees.

Keys increase in value across subtrees: all keys in subtree p1 are less than all keys in subtree p2, and so on.

Example in fig. 4.63.
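The order-m child-count rules above can be captured as a tiny invariant check. This is a sketch of my own (the name `children_ok` is not from Weiss), just to make the bounds concrete:

```python
import math

def children_ok(num_children, m, is_root):
    """Check the order-m child-count rule for a nonleaf B-tree node:
    the root has between 2 and m children; every other interior
    node has between ceil(m/2) and m children."""
    lo = 2 if is_root else math.ceil(m / 2)
    return lo <= num_children <= m

# For a 2-3 tree (order m = 3): interior nodes have 2 or 3 children.
assert children_ok(2, 3, is_root=False)
assert children_ok(3, 3, is_root=False)
assert not children_ok(4, 3, is_root=False)  # too many: node must split
assert children_ok(2, 3, is_root=True)
```

The ceil(m/2) lower bound is what guarantees every node stays at least half full, which is what the depth and disk-access bounds below depend on.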
Operations:

Find: start at the root and branch down to one of at most 3 children, depending on the relation of the key we seek to the values stored in the root; repeat at each level.

Insert: for a new key x, follow the path as if for Find, and x goes into the leaf we find.

No problem unless our addition causes too many occupants of a node (four for a 2-3 tree). In that case the node must be split: turn it into two nodes with 2 keys each and adjust the parent. BUT that might actually give the parent too many children, like four, so the parent must be split too, each half getting 2 children. More examples on pp. 168-169. Only changes happen along the access path, but Weiss warns that it isn't easy and there are lots of cases.

There are various other ways of rebalancing a B-tree, evidently... like you could search for underfull sibs and skoosh keys over into them when you add to a node. This works for internal nodes too. It keeps nodes fuller but needs more complex routines.

Actual (not lazy) removal isn't too hard... combine singleton remaining keys with siblings, or if a sib has 3 keys, steal one and give it to the singleton. If the sib has 2 keys, combine both into a 3-key node; the parent loses a child, which could cause repercussions upward.

Generalizing this 2-3 tree algorithm to order m isn't hard... we only have to take action on insertion when a node has m keys already, in which case it splits into two of sizes ceil((m+1)/2) and floor((m+1)/2) (recursively).

The depth of a B-tree is at most ceil(log base ceil(m/2) of n). In general, at each node we have to figure out which branch to take, which with binary search over the possibly m entries takes O(log m). Insert and remove can require O(m) work to fix up node information.

So the worst case for insert and remove is O(m log_m n) = O((m / log m) log n), but Find is only O(log n).

B-trees are good for databases: if the whole B-tree is stored on disk, then you pay a lot for every disk access.
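The depth bound above is exactly why disk accesses are counted the way they are in the next paragraph. A short sketch (the function name `btree_depth_bound` is my own) shows how much a large branching factor buys:

```python
import math

def btree_depth_bound(n, m):
    """Worst-case depth of an order-m B-tree holding n items.

    Every interior node (except the root) has at least ceil(m/2)
    children, so the depth is at most ceil(log base ceil(m/2) of n).
    """
    fanout = math.ceil(m / 2)
    return math.ceil(math.log(n) / math.log(fanout))

# A million items: a 2-3 tree may need ~20 levels, but an order-128
# B-tree (each node sized to fill a disk block) needs only ~4.
assert btree_depth_bound(10**6, 3) == 20
assert btree_depth_bound(10**6, 128) == 4
```

Each level of depth costs one disk access in the database setting, so shrinking 20 levels to 4 is the whole point of the large order m.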
The number of disk accesses is O(log_m n), and the O(log m) in-core search is quick by comparison, so maximize m: choose the largest m that allows an internal node to fit in a disk block. Likewise, the maximum number of elements in a leaf should just fill one disk block.

Using the insertion strategy here (not skooshing keys to sibs) yields a predicted 69% occupancy (this is ln 2... can you explain why?).

Exercises (these are Weiss 4.15, 4.16, 4.21, 4.36, 4.38):

1. a) Give a precise expression for the minimum number of nodes in an AVL tree of height h.
   b) What is the minimum number of nodes in an AVL tree of height 15?

2. Show the result of inserting 2, 1, 4, 5, 9, 3, 6, 7 into an empty AVL tree.

3. a) How many bits are required per node to store the height of a node in an n-node AVL tree?
   b) What is the smallest AVL tree that overflows an 8-bit height counter?

4. a) Show the result of inserting the following keys into an initially empty 2-3 tree: 3, 1, 4, 5, 9, 2, 6, 8, 7, 0.
   b) Show the result of deleting 0 and then 9 from the 2-3 tree created in part (a).

5. A B* tree of order m is a B-tree in which each interior node has between 2m/3 and m children. Describe a method to perform insertion into a B* tree.