Exam 2 Review


How is data stored on disk?

Types of indexes

                Clustered vs Unclustered, Sparse vs Dense
                Composite Indexes, Why attribute order matters
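
One way to see why attribute order matters: a composite index on (a, b) keeps its entries sorted by a first and then by b, so a lookup on a alone can binary-search to a contiguous run, while a lookup on b alone has no sort order to exploit. A minimal sketch, assuming a sorted Python list as a stand-in for the index; the data and the names `index`, `lookup_by_prefix`, and `lookup_by_second` are illustrative, not from the course material.

```python
from bisect import bisect_left

# Stand-in for a composite index on (last_name, first_name): entries are
# kept sorted by the first attribute, then by the second.
index = sorted([
    ("Lee", "Ann"), ("Ng", "Bo"), ("Lee", "Cy"), ("Diaz", "Ann"), ("Ng", "Al"),
])

def lookup_by_prefix(index, last_name):
    """Binary-search to the first matching entry, then read the contiguous run."""
    start = bisect_left(index, (last_name,))
    matches = []
    for entry in index[start:]:
        if entry[0] != last_name:
            break
        matches.append(entry)
    return matches

def lookup_by_second(index, first_name):
    """The index is not ordered by the second attribute alone: full scan."""
    return [entry for entry in index if entry[1] == first_name]
```

The same reasoning suggests why an index on (a, b) can serve a query on a, or on a and b together, but is of little help for a query on b only.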

B+ Trees (How are they different from/better than B-Trees?)


Hash Tables

                Buckets vs disk blocks
                Extensible Hash Tables
                Linear Hash Tables

When and how do indexes help with table joins?

What is UTF-8, and how is it different from ASCII?
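
A quick way to check the UTF-8/ASCII relationship is to encode a few characters and count bytes. The sketch below uses only Python's built-in `str.encode`; the helper name `utf8_byte_length` is illustrative.

```python
def utf8_byte_length(ch):
    """Number of bytes UTF-8 uses to encode the given character."""
    return len(ch.encode("utf-8"))

# ASCII characters take one byte in UTF-8, with the same byte value as in ASCII.
same_bytes = "DB".encode("utf-8") == "DB".encode("ascii")

# Characters outside ASCII take two to four bytes in UTF-8.
samples = {
    "A": utf8_byte_length("A"),                    # U+0041
    "e-acute": utf8_byte_length("\u00e9"),         # U+00E9
    "euro sign": utf8_byte_length("\u20ac"),       # U+20AC
    "emoji": utf8_byte_length("\U0001F600"),       # U+1F600
}
```

ASCII itself covers only 128 code points, so calling `"\u00e9".encode("ascii")` raises `UnicodeEncodeError`.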

Query Optimization

How does a DB decide what physical algorithms to run?

Cost parameters: M, T, B, V

When can sorting, hashing, indexing help? How do we estimate this?

Rule-based optimization

Push-down optimization
Pull-up, push-down optimization

Cost-based optimization

                Calculating a query plan cost
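
As a hedged illustration of how the cost parameters feed a plan cost, the sketch below computes two commonly used textbook size estimates. It assumes T(R) is the number of tuples in R and V(R, a) is the number of distinct values of attribute a in R, and that values are uniformly distributed; the course may use different formulas, and the function names are illustrative.

```python
def estimate_selection_size(t_r, v_r_a):
    """Estimated tuples in an equality selection on a: T(R) / V(R, a)."""
    return t_r / v_r_a

def estimate_join_size(t_r, t_s, v_r_a, v_s_a):
    """Estimated tuples in R join S on a: T(R) * T(S) / max(V(R, a), V(S, a))."""
    return t_r * t_s / max(v_r_a, v_s_a)

# Example: R has 10,000 tuples and 50 distinct values of a.
selection_estimate = estimate_selection_size(10_000, 50)

# Example: T(R) = 1,000, T(S) = 2,000, V(R, a) = 20, V(S, a) = 50.
join_estimate = estimate_join_size(1_000, 2_000, 20, 50)
```

Estimates like these give the sizes of intermediate results, which can then be combined with B (blocks) and M (available memory) to compare candidate plans.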

Transaction Management

ACID Properties

                What are they, why are they important?

Serial Schedules vs Serializable Schedules
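
One standard test for serializability is conflict-serializability: build a precedence graph with an edge Ti -> Tj whenever an action of Ti conflicts with a later action of Tj, and check that the graph has no cycle. The sketch below assumes a schedule is a list of `(transaction, op, item)` tuples with `op` being `"r"` or `"w"`; this representation and the function names are illustrative. Conflict-serializability is a sufficient condition, so a schedule that fails this test may still be serializable in a broader sense.

```python
def precedence_edges(schedule):
    """Edges (Ti, Tj): Ti has an action that conflicts with a later action of Tj."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two actions conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def is_conflict_serializable(schedule):
    """True if the precedence graph is acyclic."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove nodes with no incoming edge from a remaining node.
    # If no such node exists while nodes remain, there is a cycle.
    while nodes:
        free = {n for n in nodes
                if not any(dst == n and src in nodes for src, dst in edges)}
        if not free:
            return False
        nodes -= free
    return True
```

For example, `r1(A) w1(A) r2(A) w2(A)` yields only the edge T1 -> T2 and passes, while `r1(A) w2(A) w1(A)` yields both T1 -> T2 and T2 -> T1 and fails.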

Two-phase locking (2PL) to enforce isolation and serializability
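
The defining rule of two-phase locking is that within a transaction every lock request precedes every unlock request. A minimal sketch of a checker for that rule, assuming one transaction's actions are given as a list of `("lock" | "unlock", item)` tuples; the representation and function name are illustrative.

```python
def follows_two_phase_locking(actions):
    """True if no lock is requested after the transaction's first unlock."""
    unlocked_any = False
    for kind, _item in actions:
        if kind == "unlock":
            unlocked_any = True          # the shrinking phase has started
        elif kind == "lock" and unlocked_any:
            return False                 # a lock after an unlock violates 2PL
    return True

# Growing phase followed by shrinking phase: obeys 2PL.
ok = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]

# Releases A, then acquires B: violates 2PL.
not_ok = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
```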


How is the log written under various regimes:


What are the differences in recovery?

How does checkpointing work in the log?

What is (non)quiescent checkpointing? How does it work?

Distributed Storage Systems

How does Hadoop split apart large files?

Why is replication important? Be able to perform an example.
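
A worked example of splitting and replication can be sketched as simple arithmetic. The code below assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are assumptions here, as is the function name `hdfs_block_plan`; it also assumes the file size is greater than zero.

```python
import math

def hdfs_block_plan(file_size_mb, block_size_mb=128, replication=3):
    """Block count and storage footprint for one file (file_size_mb > 0)."""
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block holds only the leftover data; it is not padded.
    last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb
    return {
        "blocks": num_blocks,
        "last_block_mb": last_block_mb,
        "block_replicas": num_blocks * replication,
        "raw_storage_mb": file_size_mb * replication,
    }

# A 1,000 MB file: 7 full blocks plus one 104 MB block, each stored 3 times.
plan = hdfs_block_plan(1000)
```

With three replicas of every block, losing a single node still leaves two copies of each block it held, which is the basis for re-replication after a failure.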

What happens when a node in HDFS fails? What happens when an entire rack goes down?

MapReduce

                What are the inputs and outputs of the mapper?
                What are the inputs and outputs of the reducer?

                What does the MapReduce subsystem do in between map and reduce?

How can MapReduce be used to answer large SQL queries?

Be able to design a MapReduce program (pseudocode) that performs some SQL function.
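
As practice, here is a hedged sketch of a MapReduce-style program for `SELECT dept, COUNT(*) FROM emp GROUP BY dept`. It runs in a single Python process: `run_mapreduce` is an illustrative stand-in for the framework, including the grouping of mapper output by key that happens between map and reduce. The table, column, and function names are assumptions.

```python
from collections import defaultdict

def mapper(record):
    # Input: one row (here a dict). Output: (key, value) pairs.
    yield record["dept"], 1

def reducer(key, values):
    # Input: a key and all values grouped under it. Output: (key, aggregate).
    yield key, sum(values)

def run_mapreduce(records, mapper, reducer):
    # Stand-in for shuffle/sort: group every mapper output value by its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Call the reducer once per key, in sorted key order.
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results

emp = [{"name": "a", "dept": "cs"}, {"name": "b", "dept": "ee"}, {"name": "c", "dept": "cs"}]
counts = run_mapreduce(emp, mapper, reducer)
```

Other aggregates follow the same pattern: the mapper emits the grouping attribute as the key, and the reducer applies the aggregate (SUM, MAX, AVG) to the grouped values.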


Describe the CAP theorem. Name some databases that might fall under the different regimes.

Why do columnar databases store their data in columns, and why is that congruent with HDFS?

What is Spark? How is it different from MapReduce/HDFS systems?