# Functional Dependencies

So far we can:

• Design an ER Diagram
• Translate it to a relational DB $R$
• Implement $R$ in SQL
• Specify constraints over $R$

However $R$ might not be well designed. And this will cause problems. We call these anomalies.

Question? Is this a good design?

Why or why not?

Potential Problems:

• Redundancy
• Update Anomalies
• Deletion Anomalies

How do we recognize a good design? What does good mean?

## Normal Forms

There are several important normal forms: 1NF, 2NF, 3NF, BCNF, 4NF etc.

If $R^*$ is in one of these forms, then $R^*$ is guaranteed to achieve certain good properties, e.g., if $R^*$  is in Boyce Codd NF then it is guaranteed to not have certain type of redundancy.

There exist algorithms to transform $R$ into $R^*$, where $R^*$ is in one of the normal forms.

Each type of normal form is different, there are trade offs. NFs are all based on the idea of functional dependencies.

## Functional Dependencies and Keys

Let’s look at an example: Is this a good design?

How can we make it better?

Why is this better?

## Functional dependencies

A form of constraint (part pf the schema), finding them is part of the database design.

Definition:     If two tuples agree on the attributes $a_1,a_2,a_3,\ldots,a_n$, then they must also agree on the attributes $b_1,b_2,b_3,\ldots,b_m$.

Formally: $a_1,a_2,a_3,\ldots,a_n \rightarrow b_1,b_2,b_3,\ldots,b_m$ pronounced $X$ determines $Y$.

Let’s try an example:

What FDs can we find?

Salpers_ID $\rightarrow$ Name, Manager_ID, Office, Comm
Name $\rightarrow$ Salpers_ID
Office $\xrightarrow{?}$ Comm

In general, we check if $X\rightarrow Y$ is violated by erasing all other columns and checking many to one, called functional in mathematics.

Name $\rightarrow$ Salpers_ID. We say that name determines salpers_ID in this case. If we know the name, then we can known the ID.

Salpers_ID $\rightarrow$ Name is likewise true in this case.

Consider the following instance of $R(a,b,c,d,e)$. What FDs can you observe in $R$?

$a$ is unique: $a\rightarrow b, a\rightarrow c, a\rightarrow d, a\rightarrow e \equiv a\rightarrow a,b,c,d$
$e$ are all the same: $a\rightarrow e, b\rightarrow e, c\rightarrow e, d\rightarrow e$

Others:

Combinations of $b,c$ are unique: $b,c\rightarrow a,d,e$
Combinations of $b,d$ are unique: $b,d \rightarrow a,c,e$
if $c$ values match, so do $d$ values: $c\rightarrow d$
$d$ values do not determine $c$: $d\nrightarrow c$
lots of othersâ€¦ $a,e\rightarrow b,c$, etc.

## Reasoning with FDs

#### Relation Keys

Now that we know FDs we can define keys mathematically.

Key of a relation $R$ is a set of attributes that:

• Functionally determines all attributes of $R$
• None of its subsets determines all attributes of $R$.

Superkey

• A set of attributes that contains a key.

A key is the minimal set.

We need to make sure our keys are well defined.

#### Closure of FD Sets

Given a relation schema $R$ and a set of FDs $S$, we ask:

Is FD $f$ logically implied by $S$?

Example:

$R = \{a,b,c,g,h,i\}$
$S=\{a\rightarrow b, a\rightarrow c, c,g\rightarrow h, c,g\rightarrow i, b\rightarrow h\}$
Does $a\rightarrow h$ follow? Yes, and we can prove this.

Closure of $S : S^+$ contains all of the FDs that follow. We use Armstrongs axioms to find these.

#### Armstrongs Axioms

wlog: $X=\{x_1, x_2,\ldots,x_n\}$

Reflexivity:

$X\rightarrow$ some subset of $X$

Augmentation

if $X\rightarrow Y$, then $X,Z\rightarrow Y,Z$

Transitivity

if $X\rightarrow Y\wedge Y\rightarrow Z$, then $X\rightarrow Z$

Union

if $X\rightarrow Y\wedge X\rightarrow Z$, then $X\rightarrow Y,Z$

Decomposition

if $X\rightarrow Y,Z$, then $X\rightarrow Y \wedge X\rightarrow Z$

Pseudo-Transitivity

if $X\rightarrow Y\wedge YZ\rightarrow U$, then $XZ\rightarrow U$

Example:

$R(a,b,c), S=\{a\rightarrow b, c\rightarrow a\}$

What is the FD closure $S^+$?

1. $c\rightarrow b$ (transitivity)
2. $c,a \rightarrow c,a,b$ (augmentation on 1)
3. $b,c\rightarrow a,b,c$ (augmentation on $c\rightarrow a$)
4. $a,b,c \rightarrow a,b,c$ (reflexivity)

Say $R(a,b,c,d), S=\{a,b\rightarrow c, b,c\rightarrow a,d\}$. Does $a,c\rightarrow d$ follow from $S$?

Solve by checking $d\in \{a,c\}^+$

In this case, $\{a,c\}^+ = \{a,c\}$, and since $d \notin \{a,c\}^+$, then $a,c\rightarrow d$ does not follow from $S$

#### Closure of Attribute Sets

Given a set of attributes $A={a_1,a_2,\ldots,a_n}$ and a set of dependencies $S$. We want to find all the attributes $B$ such that any relation which satisfies $S$ also satisfies $\{a_1,\ldots,a_n\}\rightarrow B$.

This closure is the set of all such attributes $B$ and is denoted $\{a_1,a_2,\ldots,a_n\}^+$. We use this to test if $A$ is a superkey.

We first compute $X^+$ and check of $X^+$ contains all attributes of $R$. We can also check if $X\rightarrow Y$ holds by checking if $Y\in X^+$

Example: finding keys.

$R(abc), S=\{a\rightarrow b, c\rightarrow a\}$

Consider all attribute subsets $X$

Compute the closure of each subset $X^+$

Determine if the subset X is a superkey or key

Another Example:

$R(abcd),S=\{a \rightarrow b,d \rightarrow c,c\rightarrow a\}$

Find the keys and superkeys

#### Desirable properties of schema refinement:

1. Eliminate anomalies (BCNF)
2. Avoid information loss (BCNF)
3. Preserve dependencies (3NF, not BCNF)
4. Ensure good query performance (??)