Relational Algebra

Goal: specify what we want from our database

    Find all the products that are Sweaters whose price is greater than $20.

Could write in C++/Java, but bad idea

Instead use high-level query languages:

  • Theoretical: Relational Algebra, Datalog
  • Practical: SQL
  • Relational algebra: a basic set of operations on relations that provide the basic principles.

Motivation – The Stack Data Structure

To use the “stack” data structure in my program, I need to know

  • what a stack looks like
  • what (useful) operations I can perform on a stack
    • PUSH and POP

Next, we look for an implementation of stack

  • browse the Web
  • find many of them
  • choose one, say LEDA

LEDA already implement PUSH and POP

  • It also gives me a simple language L, in which to  define a stack and call PUSH and POP

S = init_stack(int);
S.push(3); S.push(5);
int x = S.pop();

  • Can also define an expression of operations on stacks

T = init_stack(int);
T.push(S.pop());

To summarize, I know:

  • definition of stack (relations)
  • its operations (PUSH, POP): (algebra)
  • an implementation called LEDA, which tells me how to call PUSH and POP in a language $L$ (SQL)
  • I can use these implementations to manipulate stacks
    • LEDA hides the implementation details
    • LEDA optimizes implementation of PUSH and POP

What is an algebra?

Mathematical system consisting of:

  • Operands — variables or values from which new values can be constructed.
  • Operators — symbols denoting procedures that construct new values from given values.

In Arithmetic algebra what are the operands? Operators?

Operands -> = Numbers
Operators -> +/*-

What is Relational Algebra?

  • An algebra whose operands are relations or variables that represent relations.
  • Operators are designed to do common things that we need to do with relations in a database.
  • The result is an algebra that can be used as a query language for relations.

Basic Operations

Union: all tuples in $R_1$ or $R_2$
Notation: $R_1\cup R_2$
$R_1$,$R_2$ must have the same schema
$R_1\cup R_2$ has the same schema as $R_1$,$R_2$
Example: ActiveProducts $\cup$ DiscontinuedProducts

Difference: all tuples in $R_1$ and not in $R_2$
Notation: $R_1-R_2$
$R_1$,$R_2$ must have the same schema
$R_1-R_2$ has the same schema as $R_1$,$R_2$
Example: AllProducts – DiscontinuedProducts

Selection: Returns all tuples which satisfy a condition
Notation: $\sigma_C (R)$
$C$ is a condition =,>,<, and, or, not
Ouput schema: same as input schema
Example: Find all Products with price more than \$22: $\sigma_{(\textrm{price} > 22)} (\textrm{Products})$

Example:

Products:

prod_idprod_descmanufactr_idcostprice
1035Sweater21011.2522.00
2241Table Lamp31722.2533.25
2249Desk Lamp31713.5524.80
2518Brass Sculpture25313.6021.20

$\sigma_{(\textrm{price}>22)} (\textrm{Products})$

prod_idprod_descmanufactr_idcostprice
2241Table Lamp31722.2533.25
2249Desk Lamp31713.5524.80

Projection: Unary Operation that returns certain columns
Notation: $\Pi_{(a_1,a_2,\ldots a_k)}(R)$
Input Schema: $R(b_1,\ldots b_m)$
Condition: $(a_1,\ldots,a_k )\subseteq (b_1,\ldots,b_m)$
Ouput schema: $S(a_1,\ldots,a_k)$
Example: Project prod_desc and price from Products: $\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

Example

Products:

prod_idprod_descmanufactr_idcostprice
1035Sweater21011.2522.00
2241Table Lamp31722.2533.25
2249Desk Lamp31713.5524.80
2518Brass Sculpture25313.6021.20

$\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

prod_descprice
Sweater22.00
Table Lamp33.25
Desk Lamp24.80
Brass Sculpture21.20

Think about projection and selection:

How are they the same, how are they different, why do we need both?

Cartesian Product: Binary Operation that combines two tables
Each tuple in $R_1$ is attached with each tuple in $R_2$
Notation: $R_1\times R_2$
Input Schema: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$
Condition: $(a_1,\ldots,a_n)\cap(b_1,\ldots,b_m )=\emptyset$
Ouput schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Example: Manufacturers $\times$ Products

Manufacturers

manufactr_idmanufactr_namecitycountry
210Kiwi KlothesAucklandNew Zealand
253Brass WorksLagosNigeria
317Llama LampsLimaPeru

Products:

prod_idprod_descmanufactr_idcostprice
1035Sweater21011.2522.00
2241Table Lamp31722.2533.25
2249Desk Lamp31713.5524.80
2518Brass Sculpture25313.6021.20

Manufacturers $\times$ Products

manufactr_idmanufactr_namecitycountryprod_idprod_descmanufactr_idcostprice
210Kiwi KlothesAucklandNew Zealand 1035 Sweater 210 11.25 22.00
210 Kiwi Klothes Auckland New Zealand 2241 Table Lamp 31722.2533.25
210 Kiwi Klothes Auckland New Zealand 2249Desk Lamp 31713.5524.80
210 Kiwi Klothes Auckland New Zealand 2518Brass Sculpture 25313.6021.20
253Brass WorksLagosNigeria1035 Sweater 210 11.25 22.00
253 Brass Works Lagos Nigeria 2241 Table Lamp 31722.25 33.25
253 Brass Works Lagos Nigeria 2249Desk Lamp 31713.5524.80
253 Brass Works Lagos Nigeria 2518Brass Sculpture 25313.6021.20
317Llama LampsLimaPeru1035 Sweater 210 11.25 22.00
317 Llama Lamps LimaPeru 2241 Table Lamp 31722.25 33.25
317 Llama Lamps Lima Peru 2249Desk Lamp 31713.5524.80
317 Llama Lamps Lima Peru 2518Brass Sculpture 25313.6021.20

Let’s try an example:

PhoneNumbers

PIDIDNumber
118675309
226316770
326699586

Persons

IDName
1Peter Bui
2Tim Weninger

PhoneNumbers $\times$ Persons:

PIDIDNumberIDName
118675309 1Peter Bui
226316770 1Peter Bui
3266995861Peter Bui
1186753092Tim Weninger
2263167702Tim Weninger
3266995862Tim Weninger

Renaming:
Does not change the relational instance
Changes the relational schema only
Notation: $\rho_{S(b_1,\ldots,b_n )}(R)$
Input schema: $R(a_1,\ldots,a_n)$
Output schema: $S(b_1,\ldots,b_n)$
Example: change price to MSRP $\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

prod_idprod_descmanufactr_idcostprice
1035Sweater21011.2522.00
2241Table Lamp31722.2533.25
2249Desk Lamp31713.5524.80
2518Brass Sculpture25313.6021.20

$\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

prod_idprod_descmanufactr_idcostmsrp
1035Sweater21011.2522.00
2241Table Lamp31722.2533.25
2249Desk Lamp31713.5524.80
2518Brass Sculpture25313.6021.20

Derived Relational Algebra Operators

Intersection
all tuples both in $R_1$ and in $R_2$
Notation: $R_1\cap R_2$
$R_1$, $R_2$ must have the same schema
$R_1\cap R_2$ has the same schema as $R_1$, $R_2$
Example UnionizedEmployees$\cap$RetiredEmployees
Intersection is derived: $R_1\cap R_2=R_1-(R_1-R_2)$

Joins

Theta join, Natural join, Equi-join, Semi-join, Inner join, Outer join, etc.

Theta Join
A join that involves a predicate
Notation: $R_1\bowtie_\theta R_2$ where $\theta$ is a condition
Input schemas: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$ s.t. $(a_1,\ldots,a_n)\cap (b_1,\ldots,b_m)=\emptyset$ Output schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Derived operator: $R_1\bowtie_\theta R_2=\sigma_\theta(R_1\times R_2)$

Manufacturers

manufactr_idmanufactr_namecitycountry
210Kiwi KlothesAucklandNew Zealand
253Brass WorksLagosNigeria
317Llama LampsLimaPeru

Products:

prod_idprod_descmanufactr_idcostprice
1035Sweater21011.2522.00
2241Table Lamp31722.2533.25
2249Desk Lamp31713.5524.80
2518Brass Sculpture25313.6021.20

$\textrm{Manufacturers}\bowtie_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

manufactr_idmanufactr_namecitycountryprod_idprod_descmanufactr_idcostprice
210Kiwi KlothesAucklandNew Zealand 1035 Sweater 210 11.25 22.00
253 Brass Works Lagos Nigeria 2518Brass Sculpture 25313.6021.20
317 Llama Lamps LimaPeru 2241 Table Lamp 31722.25 33.25
317 Llama Lamps Lima Peru 2249Desk Lamp 31713.5524.80

Theta Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{Phone}>\textrm{6699586})} \textrm{Persons}$

PIDIDNumberIDName
118675309 1Peter Bui
226316770 1Peter Bui
3266995861Peter Bui
1186753092Tim Weninger
2263167702Tim Weninger
3266995862Tim Weninger

Equi-join
Notation: $R_1 \bowtie_{(a=b)} R_2$ where $\theta$ is an equality. Natural join is a special case of equi-join.
There is a LOT of research on how to do this efficiently.

Equi-Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{PID}=\textrm{ID})} \textrm{Persons}$

PIDIDNumberIDName
118675309 1Peter Bui
226316770 1Peter Bui
3266995861Peter Bui
1186753092Tim Weninger
2263167702Tim Weninger
3266995862Tim Weninger

Natural Join
Notation: $R_1\bowtie R_2$ where $\theta$ is assumed
Input schemas: $R_1 (a_1,\ldots,a_n )$ and $R_2 (b_1,\ldots,b_m)$
Output schema: $S(c_1,\ldots,c_p )$ where $c_1,\ldots,c_p=(a_1,\ldots,a_n )\cup(b_1,\ldots,b_m )$
Meaning we combine all pairs of tuples in R_1 and R_2 that agree on attributes
$(a_1,…,a_n )\cap(b_1,…b_m )$ we call the join attributes
Derived operator:
$R_1\bowtie R_2=\sigma(R_1\times R_2)$

$\textrm{Manufacturers}\bowtie \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

manufactr_idmanufactr_namecitycountryprod_idprod_descmanufactr_idcostprice
210Kiwi KlothesAucklandNew Zealand 1035 Sweater 210 11.25 22.00
253 Brass Works Lagos Nigeria 2518Brass Sculpture 25313.6021.20
317 Llama Lamps LimaPeru 2241 Table Lamp 31722.25 33.25
317 Llama Lamps Lima Peru 2249Desk Lamp 31713.5524.80

Equi-Join: $\textrm{PhoneNumbers}\bowtie \textrm{Persons}$

PIDIDNumberIDName
118675309 1Peter Bui
226316770 1Peter Bui
3266995861Peter Bui
1186753092Tim Weninger
2263167702Tim Weninger
3266995862Tim Weninger

Given the schemas $R(a,b,c,d)$, $S(a,c,e)$, what is the schema of $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$

Given $R(a,b,c)$, $S(d,e)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$ (but result looks a lot different)

Given $R(a,b)$, $S(a,b)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow (a,b)$