# The Normal Forms

#### Desirable Properties of a Schema

1. Eliminate anomalies
2. Preserve dependencies
3. Ensure good query performance

#### Armstrongs Axioms

wlog: $X=\{x_1, x_2,\ldots,x_n\}$

Reflexivity:

$X\rightarrow$ some subset of $X$

Augmentation

if $X\rightarrow Y$, then $X,Z\rightarrow Y,Z$

Transitivity

if $X\rightarrow Y\wedge Y\rightarrow Z$, then $X\rightarrow Z$

Union

if $X\rightarrow Y\wedge X\rightarrow Z$, then $X\rightarrow Y,Z$

Decomposition

if $X\rightarrow Y,Z$, then $X\rightarrow Y \wedge X\rightarrow Z$

Pseudo-Transitivity

if $X\rightarrow Y\wedge YZ\rightarrow U$, then $XZ\rightarrow U$

## Normal Forms:

1. First Normal Form – 1NF
2. Second Normal Form – 2NF
3. Third Normal Form – 3NF
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
there are others…

#### First Normal Form

All attributes must be atomic. No lists of things in a cell.

#### Second Normal Form

No partial dependencies.

Given a relation: $R(abcd)$, $S=\{ab\rightarrow c, b\rightarrow d\}$

Compute the Keys:

Candidate Key: a minimal superkey. If more than one, then you’ve got to pick one to be the key (later).
Prime attributes
: attributes that are in a candidate key. a and b are prime
Non-prime attributes: attributes that are not in a candidate key: c, d are non-prime

Partial dependency: a non-prime attribute that depends on part of a candidate key

2NF means no partial dependency.

Is this in 2NF? No. $b\rightarrow d$ is a partial dependency.

So what do we do?

## Decomposition

There are lots of ways to decompose – we will cover only the easy one.

To decompose we need to find the dependency $X\rightarrow Y$ that violates the normal form condition. Then we split as follows:

Create $R_1=X\cup$(others) $R_2=X\cup Y$

Note: Try to make the split as large as possible (heuristic to minimize number of times you have to split)

Then we check $R_1$ and $R_2$ for further violations and re-decompose if necessary.

#### Back to the example:

Violation is $b\rightarrow d$

So we decompose into:

$R_1(abc)$ $S=\{ab\rightarrow c\}$
$R_2(bd)$ $S_2=\{b\rightarrow d\}$.

Note that its important that $X$ is in both $R_1$ and $R_2$ so we may rejoin later. But this can lose information sometimes.

We check again for for 2NF; they both pass so we’re done.

## Third Normal Form

In 2NF and a non-prime attribute cannot determine another non-prime attribute: i.e., no transitive dependencies.

Definition:
Whenever there is a nontrivial FD $X\rightarrow Y$ in $R$, then $X$ is a superkey for $R$ or $Y$ is a part of a candidate key (i.e., is prime).

In English: “Everything is determined by the key or is in the key.”

3NF is not a hard condition to achieve.

#### Example:

$R(abcd)$ $S=\{ab\rightarrow c,c\rightarrow d\}$

Is it in 3NF? Nope

What is the violating dependency? $c\rightarrow d$.

So we decompose:

$R_1(c,d)$ $S_1(c\rightarrow d)$
$R_2(c,a,b)$ $S_1(a,b\rightarrow c)$

Is $R_1$ and $R_2$ in 3NF? Yes

Are dependencies preserved? Yes

#### Example:

$R(abcde)$ $S=\{ab\rightarrow c, c\rightarrow b, a\rightarrow d\}$

Compute the keys:

Is it in 3NF? No, not in 2NF either

What is the violating dependency? $a\rightarrow d$ Because $a$ is not a superkey

So we need to decompose $R$ based on $a\rightarrow d$

$R_1(abce)$ $S=\{ab\rightarrow c, c\rightarrow b\}$
$R_2(ad)$ $S=\{a\rightarrow d\}$

Are $R_1$ and $R_2$ this in 3NF?

For $R_1$ what are the keys? abe, ace (others too)

$ab$ is not a superkey, but $c$ is prime
$c$ is not a superkey, but $b$ is prime

Are $R_1$ and $R_2$ in 2NF? Yes
Are $R_1$ and $R_2$ in 3NF? Yes