B+ Trees

Consider our textbook as an example:

How many indexes? Where?
What are keys? (keywords) What are records? (sections) What are the blocks? (page)
Clustered? (No, data is not stored in order) Dense?  (Yes, there is a link to every mention of the key)

Index Limitations

Sargable (sargability?)

Remember the % opeartor?
                Wildcard
                Can indexes search on wildcard queries?
                                Yes and no

Sargable operators: =,>,<,>=,<=,BETWEEN,LIKE without leading %

Non-sargable operators <>,IN,OR,NOT IN, NOT EXISTS, NOT LIKE, LIKE with leading %

Typically non-sargable queries include a function in the left part of a where clause condition

Non sargable:
                “…where phone like %7020”

How can we make this work?

B-Trees

How is this index created? What data structure is used?

B-Trees has a variant called B+ Trees (book describes a B+ tree but calls it simply B-Trees (as most people do)).

Why is it named B-Trees?

We don’t know. Balanced, bushy, broad might apply. Most likely stands for Boeing or Bayer (the guy who invented it was named Bayer and worked at Boeing)

B-Tree (no +)

Idea:
                Avoid duplicate keys
                Have record pointers in non-leaf nodes

B-Tree Example:

B+ Tree Basics

Parameter $d$ = the degree

Each node has [$d$, $2d$] keys (except root)
Internal node:

Leaf:

Example with $d$=2:

How large should we make d?

Example:
                Key size = 4 bytes
                Pointer size = 8 bytes
                Block size = 4096 byes

$2d \times 4  + (2d+1) \times 8  \le 4096$

$d = 170$

Searching a B+ Tree

Exact key values:  Select name From people where age = 25

Start at the root
Proceed down, to the leaf

Range Queries: Select name From people where age > 25 or age <10

Start at root
Proceed down to leaf
Then sequential traversal (B+ > B-Trees)

In Practice

Typical order: 100.  Typical fill-factor: 67%.

average fanout = 133

Typical capacities:

Height 4: 1334 = 312,900,700 records

Height 3: 1333 =     2,352,637 records

Can often hold top levels in buffer pool:

Level 1 =           1 page  =     8 Kbytes

Level 2 =      133 pages =     1 Mbyte Level 3 = 17,689 pages = 133 MBytes

Insertions on a B+ Tree

Insert (K, P)

Find leaf where K belongs, insert
If no overflow (2d keys or less), halt
If overflow (2d+1 keys), split node, insert in parent:

Internal node example:

If leaf, keep K3 too in right node. When root splits, new root has 1 key only. (That’s why root is special for degree satisfaction)

Example:

Insert 19:

After Insertion:

Insert 25:

Need to split:

After the split:

Delete 30:

Delete 25:

After deleting 25 – we need to rebalance

Number of keys < d

Now let’s delete 40:

After deleting 40 a rotation is not possible

We need to merge

Finally: