Functional Dependencies

So far we can:

Design an ER Diagram
Translate it to a relational DB $R$
Implement $R$ in SQL
- Specify constraints over $R$

However $R$ might not be well designed. And this will cause problems. We call these anomalies.

Question? Is this a good design?

Address	Name	Children
10 Green	Bob	Caleb
10 Green	Bob	Rose
14 Purple	Alice	Gavin
14 Purple	Alice	Sophia

Why or why not?

Potential Problems:

Redundancy
Update Anomalies
Deletion Anomalies

How do we recognize a good design? What does good mean?

Normal Forms

There are several important normal forms: 1NF, 2NF, 3NF, BCNF, 4NF etc.

If $R^*$ is in one of these forms, then $R^*$ is guaranteed to achieve certain good properties, e.g., if $R^*$ is in Boyce Codd NF then it is guaranteed to not have certain type of redundancy.

There exist algorithms to transform $R$ into $R^*$, where $R^*$ is in one of the normal forms.

Each type of normal form is different, there are trade offs. NFs are all based on the idea of functional dependencies.

Functional Dependencies and Keys

Let’s look at an example: Is this a good design?

Address	Name	Children
10 Green	Bob	Caleb
10 Green	Bob	Rose
14 Purple	Alice	Gavin
14 Purple	Alice	Sophia

How can we make it better?

Name	Children
Bob	Caleb
Bob	Rose
Alice	Gavin
Alice	Sophia

Name	Address
Bob	10 Green
Alice	14 Purple

Why is this better?

A form of constraint (part pf the schema), finding them is part of the database design.

Definition: If two tuples agree on the attributes $a_1,a_2,a_3,\ldots,a_n$, then they must also agree on the attributes $b_1,b_2,b_3,\ldots,b_m$.

Formally: $a_1,a_2,a_3,\ldots,a_n \rightarrow b_1,b_2,b_3,\ldots,b_m$ pronounced $X$ determines $Y$.

Let’s try an example:

SALPERS_ID	SALPERS_NAME	MANAGER_ID	OFFICE	COMM
10	Rodney Jones	27	Chicago	10
14	Masaji Matsu	44	Tokyo	11
23	Francois Moire	35	Brussels	9
37	Elena Hermana	12	B.A.	13
39	Goro Azuma	44	Tokyo	10
27	Terry Cardon		Chicago	15
44	Albert Ige	27	Tokyo	12
35	Brigit Bovary	27	Brussels	11
12	Buster Sanchez	27	B.A.	10

What FDs can we find?

Salpers_ID $\rightarrow$ Name, Manager_ID, Office, Comm
Name $\rightarrow$ Salpers_ID
Office $\xrightarrow{?}$ Comm

In general, we check if $X\rightarrow Y$ is violated by erasing all other columns and checking many to one, called functional in mathematics.

Name $\rightarrow$ Salpers_ID. We say that name determines salpers_ID in this case. If we know the name, then we can known the ID.

Salpers_ID $\rightarrow$ Name is likewise true in this case.

Consider the following instance of $R(a,b,c,d,e)$. What FDs can you observe in $R$?

a	b	c	d	e
1	1	1	1	1
2	1	2	2	1
3	2	1	1	1
4	2	2	2	1
5	3	3	1	1

$a$ is unique: $a\rightarrow b, a\rightarrow c, a\rightarrow d, a\rightarrow e \equiv a\rightarrow a,b,c,d$
$e$ are all the same: $a\rightarrow e, b\rightarrow e, c\rightarrow e, d\rightarrow e$

Others:

Combinations of $b,c$ are unique: $b,c\rightarrow a,d,e$
Combinations of $b,d$ are unique: $b,d \rightarrow a,c,e$
if $c$ values match, so do $d$ values: $c\rightarrow d$
$d$ values do not determine $c$: $d\nrightarrow c$
lots of others… $a,e\rightarrow b,c$, etc.

Reasoning with FDs

Relation Keys

Now that we know FDs we can define keys mathematically.

Key of a relation $R$ is a set of attributes that:

Functionally determines all attributes of $R$
None of its subsets determines all attributes of $R$.

Superkey

A set of attributes that contains a key.

A key is the minimal set.

We need to make sure our keys are well defined.

Closure of FD Sets

Given a relation schema $R$ and a set of FDs $S$, we ask:

Is FD $f$ logically implied by $S$?

Example:

                $R = \{a,b,c,g,h,i\}$
                $S=\{a\rightarrow b, a\rightarrow c, c,g\rightarrow h, c,g\rightarrow i, b\rightarrow h\}$
                Does $a\rightarrow h$ follow? Yes, and we can prove this.

Closure of $S : S^+$ contains all of the FDs that follow. We use Armstrongs axioms to find these.

Armstrongs Axioms

wlog: $X=\{x_1, x_2,\ldots,x_n\}$

Reflexivity:

$X\rightarrow $ some subset of $X$

Augmentation

if $X\rightarrow Y$, then $X,Z\rightarrow Y,Z$

Transitivity

if $X\rightarrow Y\wedge Y\rightarrow Z$, then $X\rightarrow Z$

Union

if $X\rightarrow Y\wedge X\rightarrow Z$, then $X\rightarrow Y,Z$

Decomposition

if $X\rightarrow Y,Z$, then $X\rightarrow Y \wedge X\rightarrow Z$

Pseudo-Transitivity

if $X\rightarrow Y\wedge YZ\rightarrow U$, then $XZ\rightarrow U$

Example:

$R(a,b,c), S=\{a\rightarrow b, c\rightarrow a\}$

What is the FD closure $S^+$?

$c\rightarrow b$ (transitivity)
$c,a \rightarrow c,a,b$ (augmentation on 1)
$b,c\rightarrow a,b,c$ (augmentation on $c\rightarrow a$)
$a,b,c \rightarrow a,b,c$ (reflexivity)

Say $R(a,b,c,d), S=\{a,b\rightarrow c, b,c\rightarrow a,d\}$. Does $a,c\rightarrow d$ follow from $S$?

Solve by checking $d\in \{a,c\}^+$

In this case, $\{a,c\}^+ = \{a,c\}$, and since $d \notin \{a,c\}^+ $, then $a,c\rightarrow d$ does not follow from $S$

Closure of Attribute Sets

Given a set of attributes $A={a_1,a_2,\ldots,a_n}$ and a set of dependencies $S$. We want to find all the attributes $B$ such that any relation which satisfies $S$ also satisfies $\{a_1,\ldots,a_n\}\rightarrow B$.

This closure is the set of all such attributes $B$ and is denoted $\{a_1,a_2,\ldots,a_n\}^+$. We use this to test if $A$ is a superkey.

We first compute $X^+$ and check of $X^+$ contains all attributes of $R$. We can also check if $X\rightarrow Y$ holds by checking if $Y\in X^+$

Example: finding keys.

$R(abc), S=\{a\rightarrow b, c\rightarrow a\}$

Consider all attribute subsets $X$

Compute the closure of each subset $X^+$

Determine if the subset X is a superkey or key

$X$	a	b	c	a,b	a,c	a,c	a,b,c
$X^+$	a,b	b	a,b,c	a,b	a,b,c	a,b,c	a,b,c
	–	–	key	–	superkey	superkey	superkey

Another Example:

$R(abcd),S=\{a \rightarrow b,d \rightarrow c,c\rightarrow a\}$

Find the keys and superkeys

$X$	a	b	c	d	a,b	a,c	a,d	b,c	b,d	c,d	a,b,c	a,c,d	a,b,d	b,c,d	a,b,c,d
$X^+$	a,b	b	a,b,c	abcd	ab	abc	abcd	abc	abcd	abcd	abc	abcd	abcd	abcd	abcd
	–	–	–	Key	–	–	SK	–	SK	SK	–	SK	SK	SK	SK

Desirable properties of schema refinement:

Eliminate anomalies (BCNF)
Avoid information loss (BCNF)
Preserve dependencies (3NF, not BCNF)
Ensure good query performance (??)