Now we need to start talking about how the database is stored and accessed from the disk.
types of indexes
We can learn how to build an index, but before we do that:
IN CLASS – Question: What is an index? Give examples of an index in real life:
An index on a file speeds up selections on the search key field(s)
Search key = any subset of the fields of a relation
Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation).
Entries in an index: (k, r), where:
$k$ = the key
$r$ = the record OR record id OR record ids
Types of Indexes
Clustered = records sorted in the key order
Unclustered = no
Dense = each record has an entry in the index
Sparse = only some records have
Primary = on the primary key
Secondary = on any key
Some textbooks interpret these differently
B+ tree / Hash table / quad tree….
What types of indexes have you learned in data structures?
Files are stored in blocks on disk
Records and blocks –
Clustered: File is sorted on the index attribute
Dense: sequence of (key,pointer) pairs
***How much space on disk? N * sizeof(type) * sizeof(ptr). Pointer is 64 bits = 8 bytes.
Clustered, Sparse index: one key per data block
***How much space on disk? N * sizeof(type) + sizeof(ptr)*#blocks_with_data
Duplicate keys? Then how do we do it?
Duplicate, Clustered, Sparse index: Pointer to lowest new search key in each block:
Often for indexing other attributes than primary key
Always dense (why ?)
…Because primary key is sorted, and because every table should have a primary key.
Clustered vs Unclustered:
Composite Search Keys
Composite Search Keys: Search on a combination of fields.
Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:
age=20 and sal =75
Range query: Some field value is not a constant. E.g.:
age =20; or age=20 and sal > 10
Column order of an index is important
Index age, sal
What would happen if we searched for a column $y$ on a multi-key index $<x,y>$