Desirable Properties of a Schema
- Eliminate anomalies
- Preserve dependencies
- Ensure good query performance
Armstrongs Axioms
wlog: $X=\{x_1, x_2,\ldots,x_n\}$
Reflexivity:
$X\rightarrow $ some subset of $X$
Augmentation
if $X\rightarrow Y$, then $X,Z\rightarrow Y,Z$
Transitivity
if $X\rightarrow Y\wedge Y\rightarrow Z$, then $X\rightarrow Z$
Union
if $X\rightarrow Y\wedge X\rightarrow Z$, then $X\rightarrow Y,Z$
Decomposition
if $X\rightarrow Y,Z$, then $X\rightarrow Y \wedge X\rightarrow Z$
Pseudo-Transitivity
if $X\rightarrow Y\wedge YZ\rightarrow U$, then $XZ\rightarrow U$
Incomplete List of Normal Forms
Normal Forms:
- First Normal Form – 1NF
- Second Normal Form – 2NF
- Third Normal Form – 3NF
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
there are others…
First Normal Form
All attributes must be atomic. No lists of things in a cell.
Second Normal Form
No partial dependencies.
Given a relation: $R(abcd)$, $S=\{ab\rightarrow c, b\rightarrow d\}$
Compute the Keys:
$X$ | a | b | c | d | a,b | a,c | a,d | b,c | b,d | c,d | a,b,c | a,c,d | a,b,d | b,c,d | a,b,c,d |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
$X^+$ | a | b | c | d | abcd | ac | ad | bcd | bd | cd | abcd | acd | abcd | bcd | abcd |
– | – | – | – | Key | – | – | – | – | – | SK | – | SK | – | SK |
Candidate Key: a minimal superkey. If more than one, then you’ve got to pick one to be the key (later).
Prime attributes: attributes that are in a candidate key. a and b are prime
Non-prime attributes: attributes that are not in a candidate key: c, d are non-prime
Partial dependency: a non-prime attribute that depends on part of a candidate key
2NF means no partial dependency.
Is this in 2NF? No. $b\rightarrow d$ is a partial dependency.
So what do we do?
Decomposition
There are lots of ways to decompose – we will cover only the easy one.
To decompose we need to find the dependency $X\rightarrow Y$ that violates the normal form condition. Then we split as follows:
Create $R_1=X\cup$(others) $R_2=X\cup Y$
Note: Try to make the split as large as possible (heuristic to minimize number of times you have to split)
Then we check $R_1$ and $R_2$ for further violations and re-decompose if necessary.
Back to the example:
Violation is $b\rightarrow d$
So we decompose into:
$R_1(abc)$ $S=\{ab\rightarrow c\}$
$R_2(bd)$ $S_2=\{b\rightarrow d\}$.
Note that its important that $X$ is in both $R_1$ and $R_2$ so we may rejoin later. But this can lose information sometimes.
We check again for for 2NF; they both pass so we’re done.
Third Normal Form
In 2NF and a non-prime attribute cannot determine another non-prime attribute: i.e., no transitive dependencies.
Definition:
Whenever there is a nontrivial FD $X\rightarrow Y$ in $R$, then $X$ is a superkey for $R$ or $Y$ is a part of a candidate key (i.e., is prime).
In English: “Everything is determined by the key or is in the key.”
3NF is not a hard condition to achieve.
Example:
$R(abcd)$ $S=\{ab\rightarrow c,c\rightarrow d\}$
Is it in 3NF? Nope
What is the violating dependency? $c\rightarrow d$.
So we decompose:
$R_1(c,d)$ $S_1(c\rightarrow d)$
$R_2(c,a,b)$ $S_1(a,b\rightarrow c)$
Is $R_1$ and $R_2$ in 3NF? Yes
Are dependencies preserved? Yes
Example:
$R(abcde)$ $S=\{ab\rightarrow c, c\rightarrow b, a\rightarrow d\}$
Compute the keys:
a | b | c | d | e | ab | ac | ad | ae | bc | bd | be | cd | ce | abc | abd | abe | acd | ace | ade |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ad | b | cb | d | e | abcd | abcd | ad | ade | bc | bd | be | cbd | cbe | abcd | abcd | abcde | abcd | abcde | ade |
– | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | key | – | key | – |
Is it in 3NF? No, not in 2NF either
What is the violating dependency? $a\rightarrow d$ Because $a$ is not a superkey
So we need to decompose $R$ based on $a\rightarrow d$
$R_1(abce)$ $S=\{ab\rightarrow c, c\rightarrow b\}$
$R_2(ad)$ $S=\{a\rightarrow d\}$
Are $R_1$ and $R_2$ this in 3NF?
For $R_1$ what are the keys? abe, ace (others too)
$ab$ is not a superkey, but $c$ is prime
$c$ is not a superkey, but $b$ is prime
Are $R_1$ and $R_2$ in 2NF? Yes
Are $R_1$ and $R_2$ in 3NF? Yes