Now we need to start talking about how the database is stored and accessed from the disk.
Indexing
types of indexes
B+ trees
hash tables
We can learn how to build an index, but before we do that:
IN CLASS – Question: What is an index? Give examples of an index in real life:
An index on a file speeds up selections on the search key field(s)
Search key = any subset of the fields of a relation
Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation).
Entries in an index: (k, r), where:
$k$ = the key
$r$ = the record OR record id OR record ids
Types of Indexes
Clustered/unclustered
Clustered = records sorted in the key order
Unclustered = no
Dense/sparse
Dense = each record has an entry in the index
Sparse = only some records have
Primary/secondary
Primary = on the primary key
Secondary = on any key
Some textbooks interpret these differently
B+ tree / Hash table / quad tree….
What types of indexes have you learned in data structures?
Files are stored in blocks on disk
Records and blocks –
Examples
Clustered Indexes
data:image/s3,"s3://crabby-images/ffe66/ffe6611bfa0eb105e204316ecf22b79340a019f0" alt=""
Clustered: File is sorted on the index attribute
Dense: sequence of (key,pointer) pairs
***How much space on disk? N * sizeof(type) * sizeof(ptr). Pointer is 64 bits = 8 bytes.
data:image/s3,"s3://crabby-images/a3071/a307132ffe8b1ee20758642f905c5b388ee248bc" alt=""
Clustered, Sparse index: one key per data block
***How much space on disk? N * sizeof(type) + sizeof(ptr)*#blocks_with_data
Duplicate keys? Then how do we do it?
data:image/s3,"s3://crabby-images/ca09c/ca09cf8832b8af459fef423eecaa20aa7be2eabb" alt=""
Duplicate, Clustered, Sparse index: Pointer to lowest new search key in each block:
data:image/s3,"s3://crabby-images/d9667/d9667006cbd4c26d07327c1f7d83a0dd789137cc" alt=""
Unclustered Indexes
Often for indexing other attributes than primary key
Always dense (why ?)
data:image/s3,"s3://crabby-images/f7284/f72841e62d5b729ebfe8448f776555b75464a766" alt=""
…Because primary key is sorted, and because every table should have a primary key.
Clustered vs Unclustered:
data:image/s3,"s3://crabby-images/8c91a/8c91a8b5edcf00a0958558c45ea38ce7e6fa370d" alt=""
Composite Search Keys
Composite Search Keys: Search on a combination of fields.
Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:
age=20 and sal =75
Range query: Some field value is not a constant. E.g.:
age =20; or age=20 and sal > 10
data:image/s3,"s3://crabby-images/585b6/585b6bb16cd749eac8332371734f4d80be0b6e26" alt=""
Column Order
Column order of an index is important
Data name
Index age, sal
What would happen if we searched for a column $y$ on a multi-key index $<x,y>$