# Relational Algebra

Goal: specify what we want from our database

#### Example:

Products:

$\sigma_{(\textrm{price}>22)} (\textrm{Products})$

Projection: Unary Operation that returns certain columns
Notation: $\Pi_{(a_1,a_2,\ldots a_k)}(R)$
Input Schema: $R(b_1,\ldots b_m)$
Condition: $(a_1,\ldots,a_k )\subseteq (b_1,\ldots,b_m)$
Ouput schema: $S(a_1,\ldots,a_k)$
Example: Project prod_desc and price from Products: $\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

#### Example

Products:

$\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

How are they the same, how are they different, why do we need both?

Cartesian Product: Binary Operation that combines two tables
Each tuple in $R_1$ is attached with each tuple in $R_2$
Notation: $R_1\times R_2$
Input Schema: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$
Condition: $(a_1,\ldots,a_n)\cap(b_1,\ldots,b_m )=\emptyset$
Ouput schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Example: Manufacturers $\times$ Products

Manufacturers

Products:

Manufacturers $\times$ Products

Let’s try an example:

PhoneNumbers

Persons

PhoneNumbers $\times$ Persons:

Renaming:
Does not change the relational instance
Changes the relational schema only
Notation: $\rho_{S(b_1,\ldots,b_n )}(R)$
Input schema: $R(a_1,\ldots,a_n)$
Output schema: $S(b_1,\ldots,b_n)$
Example: change price to MSRP $\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

$\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

#### Derived Relational Algebra Operators

Intersection
all tuples both in $R_1$ and in $R_2$
Notation: $R_1\cap R_2$
$R_1$, $R_2$ must have the same schema
$R_1\cap R_2$ has the same schema as $R_1$, $R_2$
Example UnionizedEmployees$\cap$RetiredEmployees
Intersection is derived: $R_1\cap R_2=R_1-(R_1-R_2)$

## Joins

Theta join, Natural join, Equi-join, Semi-join, Inner join, Outer join, etc.

Theta Join
A join that involves a predicate
Notation: $R_1\bowtie_\theta R_2$ where $\theta$ is a condition
Input schemas: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$ s.t. $(a_1,\ldots,a_n)\cap (b_1,\ldots,b_m)=\emptyset$ Output schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Derived operator: $R_1\bowtie_\theta R_2=\sigma_\theta(R_1\times R_2)$

Manufacturers

Products:

$\textrm{Manufacturers}\bowtie_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

Theta Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{Phone}>\textrm{6699586})} \textrm{Persons}$

Equi-join
Notation: $R_1 \bowtie_{(a=b)} R_2$ where $\theta$ is an equality. Natural join is a special case of equi-join.
There is a LOT of research on how to do this efficiently.

Equi-Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{PID}=\textrm{ID})} \textrm{Persons}$

Natural Join
Notation: $R_1\bowtie R_2$ where $\theta$ is assumed
Input schemas: $R_1 (a_1,\ldots,a_n )$ and $R_2 (b_1,\ldots,b_m)$
Output schema: $S(c_1,\ldots,c_p )$ where $c_1,\ldots,c_p=(a_1,\ldots,a_n )\cup(b_1,\ldots,b_m )$
Meaning we combine all pairs of tuples in R_1 and R_2 that agree on attributes
$(a_1,…,a_n )\cap(b_1,…b_m )$ we call the join attributes
Derived operator:
$R_1\bowtie R_2=\sigma(R_1\times R_2)$

$\textrm{Manufacturers}\bowtie \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

Equi-Join: $\textrm{PhoneNumbers}\bowtie \textrm{Persons}$

Given the schemas $R(a,b,c,d)$, $S(a,c,e)$, what is the schema of $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$

Given $R(a,b,c)$, $S(d,e)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$ (but result looks a lot different)

Given $R(a,b)$, $S(a,b)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow (a,b)$