The Normal Forms

Desirable Properties of a Schema

  1. Eliminate anomalies
  2. Preserve dependencies
  3. Ensure good query performance

Armstrongs Axioms

wlog: $X=\{x_1, x_2,\ldots,x_n\}$

Reflexivity:

$X\rightarrow $ some subset of $X$

Augmentation

if $X\rightarrow Y$, then $X,Z\rightarrow Y,Z$

Transitivity

if $X\rightarrow Y\wedge Y\rightarrow Z$, then $X\rightarrow Z$

Union

if $X\rightarrow Y\wedge X\rightarrow Z$, then $X\rightarrow Y,Z$

Decomposition

if $X\rightarrow Y,Z$, then $X\rightarrow Y \wedge X\rightarrow Z$

Pseudo-Transitivity

if $X\rightarrow Y\wedge YZ\rightarrow U$, then $XZ\rightarrow U$

Incomplete List of Normal Forms

Normal Forms:

  1. First Normal Form – 1NF
  2. Second Normal Form – 2NF
  3. Third Normal Form – 3NF
  4. Boyce-Codd Normal Form (BCNF)
  5. Fourth Normal Form (4NF)
    there are others…

First Normal Form

All attributes must be atomic. No lists of things in a cell.

Second Normal Form

No partial dependencies.

Given a relation: $R(abcd)$, $S=\{ab\rightarrow c, b\rightarrow d\}$

Compute the Keys:

$X$abcda,ba,ca,db,cb,dc,da,b,ca,c,da,b,db,c,da,b,c,d
$X^+$ abcdabcdacadbcdbdcdabcdacdabcdbcd abcd
 KeySKSKSK

Candidate Key: a minimal superkey. If more than one, then you’ve got to pick one to be the key (later).
Prime attributes
: attributes that are in a candidate key. a and b are prime
Non-prime attributes: attributes that are not in a candidate key: c, d are non-prime

Partial dependency: a non-prime attribute that depends on part of a candidate key

2NF means no partial dependency.

Is this in 2NF? No. $b\rightarrow d$ is a partial dependency.

So what do we do?

Decomposition

There are lots of ways to decompose – we will cover only the easy one.

To decompose we need to find the dependency $X\rightarrow Y$ that violates the normal form condition. Then we split as follows:

Create $R_1=X\cup$(others) $R_2=X\cup Y$

Note: Try to make the split as large as possible (heuristic to minimize number of times you have to split)

Then we check $R_1$ and $R_2$ for further violations and re-decompose if necessary.

Back to the example:

Violation is $b\rightarrow d$

So we decompose into:

$R_1(abc)$ $S=\{ab\rightarrow c\}$
$R_2(bd)$ $S_2=\{b\rightarrow d\}$.

Note that its important that $X$ is in both $R_1$ and $R_2$ so we may rejoin later. But this can lose information sometimes.

We check again for for 2NF; they both pass so we’re done.

Third Normal Form

In 2NF and a non-prime attribute cannot determine another non-prime attribute: i.e., no transitive dependencies.

Definition:
Whenever there is a nontrivial FD $X\rightarrow Y$ in $R$, then $X$ is a superkey for $R$ or $Y$ is a part of a candidate key (i.e., is prime).

In English: “Everything is determined by the key or is in the key.”

3NF is not a hard condition to achieve.

Example:

$R(abcd)$ $S=\{ab\rightarrow c,c\rightarrow d\}$

Is it in 3NF? Nope

What is the violating dependency? $c\rightarrow d$.

So we decompose:

$R_1(c,d)$ $S_1(c\rightarrow d)$
$R_2(c,a,b)$ $S_1(a,b\rightarrow c)$

Is $R_1$ and $R_2$ in 3NF? Yes

Are dependencies preserved? Yes

Example:

$R(abcde)$ $S=\{ab\rightarrow c, c\rightarrow b, a\rightarrow d\}$

Compute the keys:

abcdeabacadaebcbdbecdceabcabdabeacdaceade
adbcbdeabcdabcdadadebcbdbecbdcbeabcdabcdabcdeabcdabcdeade
keykey

Is it in 3NF? No, not in 2NF either

What is the violating dependency? $a\rightarrow d$ Because $a$ is not a superkey

So we need to decompose $R$ based on $a\rightarrow d$

$R_1(abce)$ $S=\{ab\rightarrow c, c\rightarrow b\}$
$R_2(ad)$ $S=\{a\rightarrow d\}$

Are $R_1$ and $R_2$ this in 3NF?

For $R_1$ what are the keys? abe, ace (others too)

$ab$ is not a superkey, but $c$ is prime
$c$ is not a superkey, but $b$ is prime

Are $R_1$ and $R_2$ in 2NF? Yes
Are $R_1$ and $R_2$ in 3NF? Yes