Relational Algebra – Tim Weninger, PhD

Goal: specify what we want from our database

Find all the products that are Sweaters whose price is greater than $20.

Could write in C++/Java, but bad idea

Instead use high-level query languages:

Theoretical: Relational Algebra, Datalog
Practical: SQL
Relational algebra: a basic set of operations on relations that provide the basic principles.

Motivation – The Stack Data Structure

To use the “stack” data structure in my program, I need to know

what a stack looks like
what (useful) operations I can perform on a stack
- PUSH and POP

Next, we look for an implementation of stack

browse the Web
find many of them
choose one, say LEDA

LEDA already implement PUSH and POP

It also gives me a simple language L, in which to define a stack and call PUSH and POP

S = init_stack(int); S.push(3); S.push(5); int x = S.pop();

Can also define an expression of operations on stacks

T = init_stack(int); T.push(S.pop());

To summarize, I know:

definition of stack (relations)
its operations (PUSH, POP): (algebra)
an implementation called LEDA, which tells me how to call PUSH and POP in a language $L$ (SQL)
I can use these implementations to manipulate stacks
- LEDA hides the implementation details
- LEDA optimizes implementation of PUSH and POP

What is an algebra?

Mathematical system consisting of:

Operands — variables or values from which new values can be constructed.
Operators — symbols denoting procedures that construct new values from given values.

In Arithmetic algebra what are the operands? Operators?

Operands -> = Numbers
Operators -> +/*-

What is Relational Algebra?

An algebra whose operands are relations or variables that represent relations.
Operators are designed to do common things that we need to do with relations in a database.
The result is an algebra that can be used as a query language for relations.

Basic Operations

Union: all tuples in $R_1$ or $R_2$
Notation: $R_1\cup R_2$
$R_1$,$R_2$ must have the same schema
$R_1\cup R_2$ has the same schema as $R_1$,$R_2$
Example: ActiveProducts $\cup$ DiscontinuedProducts

Difference: all tuples in $R_1$ and not in $R_2$
Notation: $R_1-R_2$
$R_1$,$R_2$ must have the same schema
$R_1-R_2$ has the same schema as $R_1$,$R_2$
Example: AllProducts – DiscontinuedProducts

Selection: Returns all tuples which satisfy a condition
Notation: $\sigma_C (R)$
$C$ is a condition =,>,<, and, or, not
Ouput schema: same as input schema
Example: Find all Products with price more than \$22: $\sigma_{(\textrm{price} > 22)} (\textrm{Products})$

Example:

Products:

prod_id	prod_desc	manufactr_id	cost	price
1035	Sweater	210	11.25	22.00
2241	Table Lamp	317	22.25	33.25
2249	Desk Lamp	317	13.55	24.80
2518	Brass Sculpture	253	13.60	21.20

$\sigma_{(\textrm{price}>22)} (\textrm{Products})$

prod_id	prod_desc	manufactr_id	cost	price
2241	Table Lamp	317	22.25	33.25
2249	Desk Lamp	317	13.55	24.80

Projection: Unary Operation that returns certain columns
Notation: $\Pi_{(a_1,a_2,\ldots a_k)}(R)$
Input Schema: $R(b_1,\ldots b_m)$
Condition: $(a_1,\ldots,a_k )\subseteq (b_1,\ldots,b_m)$
Ouput schema: $S(a_1,\ldots,a_k)$
Example: Project prod_desc and price from Products: $\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

Example

Products:

prod_id	prod_desc	manufactr_id	cost	price
1035	Sweater	210	11.25	22.00
2241	Table Lamp	317	22.25	33.25
2249	Desk Lamp	317	13.55	24.80
2518	Brass Sculpture	253	13.60	21.20

$\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

prod_desc	price
Sweater	22.00
Table Lamp	33.25
Desk Lamp	24.80
Brass Sculpture	21.20

Think about projection and selection:

How are they the same, how are they different, why do we need both?

Cartesian Product: Binary Operation that combines two tables
Each tuple in $R_1$ is attached with each tuple in $R_2$
Notation: $R_1\times R_2$
Input Schema: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$
Condition: $(a_1,\ldots,a_n)\cap(b_1,\ldots,b_m )=\emptyset$
Ouput schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Example: Manufacturers $\times$ Products

Manufacturers

manufactr_id	manufactr_name	city	country
210	Kiwi Klothes	Auckland	New Zealand
253	Brass Works	Lagos	Nigeria
317	Llama Lamps	Lima	Peru

Products:

prod_id	prod_desc	manufactr_id	cost	price
1035	Sweater	210	11.25	22.00
2241	Table Lamp	317	22.25	33.25
2249	Desk Lamp	317	13.55	24.80
2518	Brass Sculpture	253	13.60	21.20

Manufacturers $\times$ Products

manufactr_id	manufactr_name	city	country	prod_id	prod_desc	manufactr_id	cost	price
210	Kiwi Klothes	Auckland	New Zealand	1035	Sweater	210	11.25	22.00
210	Kiwi Klothes	Auckland	New Zealand	2241	Table Lamp	317	22.25	33.25
210	Kiwi Klothes	Auckland	New Zealand	2249	Desk Lamp	317	13.55	24.80
210	Kiwi Klothes	Auckland	New Zealand	2518	Brass Sculpture	253	13.60	21.20
253	Brass Works	Lagos	Nigeria	1035	Sweater	210	11.25	22.00
253	Brass Works	Lagos	Nigeria	2241	Table Lamp	317	22.25	33.25
253	Brass Works	Lagos	Nigeria	2249	Desk Lamp	317	13.55	24.80
253	Brass Works	Lagos	Nigeria	2518	Brass Sculpture	253	13.60	21.20
317	Llama Lamps	Lima	Peru	1035	Sweater	210	11.25	22.00
317	Llama Lamps	Lima	Peru	2241	Table Lamp	317	22.25	33.25
317	Llama Lamps	Lima	Peru	2249	Desk Lamp	317	13.55	24.80
317	Llama Lamps	Lima	Peru	2518	Brass Sculpture	253	13.60	21.20

Let’s try an example:

PhoneNumbers

PID	ID	Number
1	1	8675309
2	2	6316770
3	2	6699586

Persons

ID	Name
1	Peter Bui
2	Tim Weninger

PhoneNumbers $\times$ Persons:

PID	ID	Number	ID	Name
1	1	8675309	1	Peter Bui
2	2	6316770	1	Peter Bui
3	2	6699586	1	Peter Bui
1	1	8675309	2	Tim Weninger
2	2	6316770	2	Tim Weninger
3	2	6699586	2	Tim Weninger

Renaming:
Does not change the relational instance
Changes the relational schema only
Notation: $\rho_{S(b_1,\ldots,b_n )}(R)$
Input schema: $R(a_1,\ldots,a_n)$
Output schema: $S(b_1,\ldots,b_n)$
Example: change price to MSRP $\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

prod_id	prod_desc	manufactr_id	cost	price
1035	Sweater	210	11.25	22.00
2241	Table Lamp	317	22.25	33.25
2249	Desk Lamp	317	13.55	24.80
2518	Brass Sculpture	253	13.60	21.20

$\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

prod_id	prod_desc	manufactr_id	cost	msrp
1035	Sweater	210	11.25	22.00
2241	Table Lamp	317	22.25	33.25
2249	Desk Lamp	317	13.55	24.80
2518	Brass Sculpture	253	13.60	21.20

Derived Relational Algebra Operators

Intersection
all tuples both in $R_1$ and in $R_2$
Notation: $R_1\cap R_2$
$R_1$, $R_2$ must have the same schema
$R_1\cap R_2$ has the same schema as $R_1$, $R_2$
Example UnionizedEmployees$\cap$RetiredEmployees
Intersection is derived: $R_1\cap R_2=R_1-(R_1-R_2)$

Joins

Theta join, Natural join, Equi-join, Semi-join, Inner join, Outer join, etc.

Theta Join
A join that involves a predicate
Notation: $R_1\bowtie_\theta R_2$ where $\theta$ is a condition
Input schemas: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$ s.t. $(a_1,\ldots,a_n)\cap (b_1,\ldots,b_m)=\emptyset$ Output schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Derived operator: $R_1\bowtie_\theta R_2=\sigma_\theta(R_1\times R_2)$

Manufacturers

manufactr_id	manufactr_name	city	country
210	Kiwi Klothes	Auckland	New Zealand
253	Brass Works	Lagos	Nigeria
317	Llama Lamps	Lima	Peru

Products:

prod_id	prod_desc	manufactr_id	cost	price
1035	Sweater	210	11.25	22.00
2241	Table Lamp	317	22.25	33.25
2249	Desk Lamp	317	13.55	24.80
2518	Brass Sculpture	253	13.60	21.20

$\textrm{Manufacturers}\bowtie_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

manufactr_id	manufactr_name	city	country	prod_id	prod_desc	manufactr_id	cost	price
210	Kiwi Klothes	Auckland	New Zealand	1035	Sweater	210	11.25	22.00
253	Brass Works	Lagos	Nigeria	2518	Brass Sculpture	253	13.60	21.20
317	Llama Lamps	Lima	Peru	2241	Table Lamp	317	22.25	33.25
317	Llama Lamps	Lima	Peru	2249	Desk Lamp	317	13.55	24.80

Theta Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{Phone}>\textrm{6699586})} \textrm{Persons}$

PID	ID	Number	ID	Name
1	1	8675309	1	Peter Bui
2	2	~~6316770~~	1	~~Peter Bui~~
3	2	~~6699586~~	1	~~Peter Bui~~
1	1	8675309	2	Tim Weninger
2	2	~~6316770~~	2	~~Tim Weninger~~
3	2	~~6699586~~	2	~~Tim Weninger~~

Equi-join
Notation: $R_1 \bowtie_{(a=b)} R_2$ where $\theta$ is an equality. Natural join is a special case of equi-join.
There is a LOT of research on how to do this efficiently.

Equi-Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{PID}=\textrm{ID})} \textrm{Persons}$

PID	ID	Number	ID	Name
1	1	8675309	1	Peter Bui
2	2	~~6316770~~	1	~~Peter Bui~~
3	2	~~6699586~~	1	~~Peter Bui~~
1	1	~~8675309~~	2	~~Tim Weninger~~
2	2	6316770	2	Tim Weninger
3	2	~~6699586~~	2	~~Tim Weninger~~

Natural Join
Notation: $R_1\bowtie R_2$ where $\theta$ is assumed
Input schemas: $R_1 (a_1,\ldots,a_n )$ and $R_2 (b_1,\ldots,b_m)$
Output schema: $S(c_1,\ldots,c_p )$ where $c_1,\ldots,c_p=(a_1,\ldots,a_n )\cup(b_1,\ldots,b_m )$
Meaning we combine all pairs of tuples in R_1 and R_2 that agree on attributes
$(a_1,…,a_n )\cap(b_1,…b_m )$ we call the join attributes
Derived operator:
$R_1\bowtie R_2=\sigma(R_1\times R_2)$

$\textrm{Manufacturers}\bowtie \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

manufactr_id	manufactr_name	city	country	prod_id	prod_desc	manufactr_id	cost	price
210	Kiwi Klothes	Auckland	New Zealand	1035	Sweater	210	11.25	22.00
253	Brass Works	Lagos	Nigeria	2518	Brass Sculpture	253	13.60	21.20
317	Llama Lamps	Lima	Peru	2241	Table Lamp	317	22.25	33.25
317	Llama Lamps	Lima	Peru	2249	Desk Lamp	317	13.55	24.80

Equi-Join: $\textrm{PhoneNumbers}\bowtie \textrm{Persons}$

PID	ID	Number	ID	Name
1	1	8675309	1	Peter Bui
2	2	~~6316770~~	1	~~Peter Bui~~
3	2	~~6699586~~	1	~~Peter Bui~~
1	1	~~8675309~~	2	~~Tim Weninger~~
2	2	6316770	2	Tim Weninger
3	2	6699586	2	Tim Weninger

Given the schemas $R(a,b,c,d)$, $S(a,c,e)$, what is the schema of $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$

Given $R(a,b,c)$, $S(d,e)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$ (but result looks a lot different)

Given $R(a,b)$, $S(a,b)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow (a,b)$