Functional Dependencies

So far we can:

  • Design an ER Diagram
  • Translate it to a relational DB $R$
  • Implement $R$ in SQL
    • Specify constraints over $R$

However $R$ might not be well designed. And this will cause problems. We call these anomalies.

Question? Is this a good design?

10 Green123-45-6789521-632-6859
10 Green123-45-6789587-658-9362
14 Purple909-64-5874857-965-6951
14 Purple909-64-5874547-658-7854

Why or why not?

Potential Problems:

  • Redundancy
  • Update Anomalies
  • Deletion Anomalies

How do we recognize a good design? What does good mean?

Normal Forms

There are several important normal forms: 1NF, 2NF, 3NF, BCNF, 4NF etc.

If $R^*$ is in one of these forms, then $R^*$ is guaranteed to achieve certain good properties, e.g., if $R^*$  is in Boyce Codd NF then it is guaranteed to not have certain type of redundancy.

There exist algorithms to transform $R$ into $R^*$, where $R^*$ is in one of the normal forms.

Each type of normal form is different, there are trade offs. NFs are all based on the idea of functional dependencies.

Functional Dependencies and Keys

Let’s look at an example: Is this a good design?

10 Green123-45-6789521-632-6859
10 Green123-45-6789587-658-9362
14 Purple909-64-5874857-965-6951
14 Purple909-64-5874547-658-7854

How can we make it better?

SSN Phone
123-45-678910 Green
909-64-587414 Purple

Why is this better?

Functional dependencies

A form of constraint (part pf the schema), finding them is part of the database design.

Definition:     If two tuples agree on the attributes $a_1,a_2,a_3,\ldots,a_n$, then they must also agree on the attributes $b_1,b_2,b_3,\ldots,b_m$.

Formally: $a_1,a_2,a_3,\ldots,a_n \rightarrow b_1,b_2,b_3,\ldots,b_m$ pronounced $X$ determines $Y$.

Let’s try an example:

        10Rodney Jones           27 Chicago           10
        14Masaji Matsu           44 Tokyo             11
        23Francois Moire         35 Brussels         9
        37Elena Hermana           12 B.A.             13
        39Goro Azuma             44 Tokyo             10
        27Terry Cardon              Chicago           15
        44Albert Ige             27 Tokyo             12
        35Brigit Bovary           27 Brussels         11
        12Buster Sanchez         27B.A.             10

What FDs can we find?

Salpers_ID $\rightarrow$ Name, Manager_ID, Office, Comm
Name $\rightarrow$ Salpers_ID
Office $\xrightarrow{?}$ Comm

In general, we check if $X\rightarrow Y$ is violated by erasing all other columns and checking many to one, called functional in mathematics.

Name $\rightarrow$ Salpers_ID. We say that name determines salpers_ID in this case. If we know the name, then we can known the ID.

Salpers_ID $\rightarrow$ Name is likewise true in this case.

Consider the following instance of $R(a,b,c,d,e)$. What FDs can you observe in $R$?


$a$ is unique: $a\rightarrow b, a\rightarrow c, a\rightarrow d, a\rightarrow e \equiv a\rightarrow a,b,c,d$
$e$ are all the same: $a\rightarrow e, b\rightarrow e, c\rightarrow e, d\rightarrow e$


Combinations of $b,c$ are unique: $b,c\rightarrow a,d,e$
Combinations of $b,d$ are unique: $b,d \rightarrow a,c,e$
if $c$ values match, so do $d$ values: $c\rightarrow d$
$d$ values do not determine $c$: $d\nrightarrow c$
lots of others… $a,e\rightarrow b,c$, etc.

Reasoning with FDs

Relation Keys

Now that we know FDs we can define keys mathematically.

Key of a relation $R$ is a set of attributes that:

  • Functionally determines all attributes of $R$
  • None of its subsets determines all attributes of $R$.


  • A set of attributes that contains a key.

A key is the minimal set.

We need to make sure our keys are well defined.

Closure of FD Sets

Given a relation schema $R$ and a set of FDs $S$, we ask:

                Is FD $f$ logically implied by $S$?


                $R = \{a,b,c,g,h,i\}$
                $S=\{a\rightarrow b, a\rightarrow c, c,g\rightarrow h, c,g\rightarrow i, b\rightarrow h\}$
                Does $a\rightarrow h$ follow? Yes, and we can prove this.

Closure of $S : S^+$ contains all of the FDs that follow. We use Armstrongs axioms to find these.

Armstrongs Axioms

wlog: $X=\{x_1, x_2,\ldots,x_n\}$


$X\rightarrow $ some subset of $X$


if $X\rightarrow Y$, then $X,Z\rightarrow Y,Z$


if $X\rightarrow Y\wedge Y\rightarrow Z$, then $X\rightarrow Z$


if $X\rightarrow Y\wedge X\rightarrow Z$, then $X\rightarrow Y,Z$


if $X\rightarrow Y,Z$, then $X\rightarrow Y \wedge X\rightarrow Z$


if $X\rightarrow Y\wedge YZ\rightarrow U$, then $XZ\rightarrow U$


$R(a,b,c), S=\{a\rightarrow b, c\rightarrow a\}$

What is the FD closure $S^+$?

  1. $c\rightarrow b$ (transitivity)
  2. $c,a \rightarrow c,a,b$ (augmentation on 1)
  3. $b,c\rightarrow a,b,c$ (augmentation on $c\rightarrow a$)
  4. $a,b,c \rightarrow a,b,c$ (reflexivity)

Say $R(a,b,c,d), S=\{a,b\rightarrow c, b,c\rightarrow a,d\}$. Does $a,c\rightarrow d$ follow from $S$?

Solve by checking $d\in \{a,c\}^+$

In this case, $\{a,c\}^+ = \{a,c\}$, and since $d \notin \{a,c\}^+ $, then $a,c\rightarrow d$ does not follow from $S$

Closure of Attribute Sets

Given a set of attributes $A={a_1,a_2,\ldots,a_n}$ and a set of dependencies $S$. We want to find all the attributes $B$ such that any relation which satisfies $S$ also satisfies $\{a_1,\ldots,a_n\}\rightarrow B$.

This closure is the set of all such attributes $B$ and is denoted $\{a_1,a_2,\ldots,a_n\}^+$. We use this to test if $A$ is a superkey.

We first compute $X^+$ and check of $X^+$ contains all attributes of $R$. We can also check if $X\rightarrow Y$ holds by checking if $Y\in X^+$

Example: finding keys.

$R(abc), S=\{a\rightarrow b, c\rightarrow a\}$

Consider all attribute subsets $X$

Compute the closure of each subset $X^+$

Determine if the subset X is a superkey or key

$X$ abca,ba,ca,ca,b,c
$X^+$ a,bb a,b,c a,b a,b,c a,b,c a,b,c
 key  – superkeysuperkeysuperkey

Another Example:

$R(abcd),S=\{a \rightarrow b,d \rightarrow c,c\rightarrow a\}$

Find the keys and superkeys

$X^+$ a,bba,b,cabcdababcabcdabcabcdabcdabcabcdabcdabcd abcd

Desirable properties of schema refinement:

  1. Eliminate anomalies (BCNF)
  2. Avoid information loss (BCNF)
  3. Preserve dependencies (3NF, not BCNF)
  4. Ensure good query performance (??)