Secondary storage hash tables are much like main memory ones
Recall basics:
There are buckets
A hash function f(k) maps a key k to {0, 1, β¦, n-1}
Store in bucket f(k) a pointer to record with key k
Secondary storage: bucket = block, use overflow blocks when needed
Assume 1 bucket (block) stores 2 keys + pointers
data:image/s3,"s3://crabby-images/d3fca/d3fcaa4d15563f2e4e3fcc99fd2b968a44e158bf" alt=""
h(e)=0
h(b)=h(f)=1
h(g)=2
h(a)=h(c)=3
Search for a:
Compute h(a)=3
Read bucket 3
1 disk access
Insertion:
data:image/s3,"s3://crabby-images/f1d5c/f1d5c9a7a61247d89de0c665d476d3ade55c4ded" alt=""
place in right bucket, if space
E.g. h(d)=2
Overflow blocks
data:image/s3,"s3://crabby-images/35583/3558331a70b45f72577a7d2faf032b987ee0b4ee" alt=""
Create overflow block, if no space
E.g. h(k)=1
Performance
Excellent, if no overflow blocks
Degrades considerably when number of keys exceeds the number of buckets (I.e. many overflow blocks).
Extensible Hash Table:
Allows hash table to grow, to avoid performance degradation
Assume a hash function h that returns numbers in {0,β¦,2^k β 1}
Start with = 2^iβͺ2^k , only look at first i most significant bits π=1, π=2^π, π=4 bit int (or whatever)
data:image/s3,"s3://crabby-images/1fcf2/1fcf2271d29d4eb1359eda70056db938e26f1818" alt=""
Note β we only look at the first bit.
Insert 1010
data:image/s3,"s3://crabby-images/a6741/a6741b253c663b67f9431fe9f56762c7467866c6" alt=""
Need to extend table, split blocks:
Noww insert 0000,
then 0101
What if we
inserted 0001?
This would
βextendβ the hashtable again i=3
data:image/s3,"s3://crabby-images/259de/259debe4f69a9deec55c1dab793bc447a2f08202" alt=""
Extensible Hash Tables performance
No overflow blocks: access always one read
BUT:
Extensions can be costly and disruptive (Power of 2)