Boyce and Codd’s Normal Form

First a recap:

Normal Forms:

  1. First Normal Form – 1NF
  2. Second Normal Form – 2NF
  3. Third Normal Form – 3NF
  4. Boyce-Codd Normal Form (BCNF)
  5. Fourth Normal Form (4NF)
    there are others…

First Normal Form

All attributes must be atomic. No lists of things in a cell.

Second Normal Form

No Partial dependency: a non-prime attribute cannot depend on a prime attribute that is a subset of a candidate key.

Third Normal Form

No transitive dependencies. Whenever there is a nontrivial FD $X\rightarrow Y$ in $R$, then $X$ is a superkey for $R$ or $Y$ is a part of a candidate key (prime attribute).

Desirable Properties of a Schema

  1. Eliminate anomalies (2NF, 3NF)
  2. Preserve dependencies (2NF, 3NF) — except we didn’t discuss the algorithm that does this.
  3. Ensure good query performance (??)

We need a slightly better normal form.

Boyce Codd Normal Form

First Who are Boyce and Codd:

Raymond Boyce

Invented a language called SEQUEL alongside Chamberlain.

PhD at Purdue, but died very young.

Read the paper

Edgar F Codd.jpg

Ted Codd

Invented the Relational Model, OLAP, System R.

PhD at Michigan, Turing Award.

Definition: Whenever there is a nontrivial FD $X\rightarrow Y$ in $R$, then $X$ is a superkey for $R$.

In English (although a bit vague): Whenever a set of attributes of $R$ determines another attribute, it should determine all of the attributes of $R$.

Let’s work through some examples:

Address SSN Phone
10 Green123-45-6789521-632-6859
10 Green123-45-6789587-658-9362
14 Purple909-64-5874857-965-6951
14 Purple909-64-5874547-658-7854

Dependencies: SSN$\rightarrow$Address, and Phone$\rightarrow$SSN, Phone.

What are the keys?

XSSNAddressPhoneSNSPAPSAP
X$^+$SAAPSASASPAAPSSAP
 keysuperkeysuperkeysuperkey

Is it in BCNF?

No: Because SSN$\rightarrow$Address but SSN is not a Superkey

So what do we do?

Decompose:

To decompose we need to find the dependency $X\rightarrow Y$ that violates the BCNF condition. Try to make the split as large as possible (heuristic to minimize number of times you have to split)

Create $R_1=X\cup Y$ $R_2=X\cup$(others), Keep splitting until all relations are in BCNF.

SSNAddress
123-45-678910 Green
909-64-587414 Purple
SSNPhone
123-45-6789521-632-6859
123-45-6789587-658-9362
909-64-5874857-965-6951
909-64-5874547-658-7854

Are these in BCNF now?

Yes: SSN $\rightarrow$ Address, and SSN,Phone $\rightarrow$ SSN,Phone is trivial

Another Example:

$R$(Name, Price, Category) $S=\{$Name$\rightarrow$Price,Category$\}$
Is this Relation in BCNF? Yes.

Preserves Dependencies

Recall that the original dependencies were SSN$\rightarrow$Address, and Phone$\rightarrow$SSN,Phone

But now $R_1$ has $S_1=\{$SSN$\rightarrow$Address$\}$ but $S_2=\{\}$; The Phone$\rightarrow$SSN,Phone dependency was lost.

This is a caveat with BCNF, but not with 3NF. These dependencies meant something to the designer, but the strictness of BCNF may break them.

Lossless Decomposition

A decomposition is lossless if we can recover the initial relation by rejoining.

For example: $R(a,b,c) \Rightarrow R_1(a,b), R_2(a,c) \Rightarrow R^{‘}(a,b,c)=R(a,b,c)$

Turns out that R^’ is can easily be larger than R, we need to make sure they are the same.

Consider this example:

$R(a,b,c), S={a\rightarrow b,a\rightarrow c}$

Key: $a$

Decompositions will lose important information about how $a$ and $c$ are related.

Another example:

$R(abcd), S={a\rightarrow b, bc\rightarrow d, d\rightarrow a}$

Is this in good form?

What are the keys?

Xabcdabacadbcbdcdabcacdbcdabcd
abbcdabababcdadbabcdbdacdababcdacdbabcdabcd
      CK CK CKSK SK SK SK

We have three candidate keys and 4 additional superkeys (candidate keys are technically superkeys too)

Is it in BCNF? No

So lets Decompose $R$ into $R_1(a,b), S_1=\{a\rightarrow b\}$ and $R_2(acd), S_2=\{d\rightarrow a\}$.

But we’re not done.

Check: is $R_1$ in BCNF? Yes, $a\rightarrow b$ and $a$ is superkey/key in $R_1$.
Check: is $R_2$ in BCNF? No, $d\rightarrow a$ and $d$ is not a superkey/key in $R_2$.

So we need to further decompose $R_2$

Decompose $R_2$ into $R_3(ad)$ and $R_4(cd)$

Check: is $R_3$ in BCNF? Yes, $d$ is a key and $S_3=\{d\rightarrow a\}$.
Check: is $R_4$ in BCNF? Yes, they key would just be $\{cd\}$, and no dependencies are violated because there are none.

Thus, $R_1$,$R_3$, and $R_4$ are in BCNF.

So why not just go straight to BCNF? Why bother with 2NF and 3NF, etc?

It turns out that BCNF might be too strict sometimes, some decompositions are certain to eliminate anomalies, but don’t preserve dependencies.

Practically – “Every non-key attribute is determined by the key, the whole key and nothing but the key, so help me Codd.” (This is slightly stronger version than BCNF)

And we can keep going too $5NF \subset 4NF \subset BCNF \subset 3NF \subset 2NF \subset 1NF$

Advice: aim for BCNF, settle for 3NF if too many important FDs are broken.