Goal: specify what we want from our database

*Find all the products that are Sweaters whose price is greater than $20.*

Could write in C++/Java, but bad idea

Instead use *high-level query languages:*

- Theoretical: Relational Algebra, Datalog
- Practical: SQL
- Relational algebra: a basic set of operations on relations that provide the
*basic principles*.

#### Motivation – The Stack Data Structure

To use the “stack” data structure in my program, I need to know

- what a stack looks like
- what (useful) operations I can perform on a stack
- PUSH and POP

Next, we look for an implementation of stack

- browse the Web
- find many of them
- choose one, say LEDA

LEDA already implement PUSH and POP

- It also gives me a simple language L, in which to define a stack and call PUSH and POP

`S = init_stack(int);`

S.push(3); S.push(5);

int x = S.pop();

- Can also define an expression of operations on stacks

`T = init_stack(int);`

T.push(S.pop());

To summarize, I know:

- definition of stack (
*relations*) - its operations (PUSH, POP):
*(algebra*) - an implementation called LEDA, which tells me how to call PUSH and POP in a language $L$ (
*SQL*) - I can use these implementations to manipulate stacks
- LEDA hides the implementation details
- LEDA optimizes implementation of PUSH and POP

## What is an algebra?

Mathematical system consisting of:

*Operands*— variables or values from which new values can be constructed.*Operators*— symbols denoting procedures that construct new values from given values.

In Arithmetic algebra what are the operands? Operators?

Operands -> = Numbers

Operators -> +/*-

## What is Relational Algebra?

- An algebra whose
**operands are relations**or variables that represent relations. - Operators are designed to do common things that we need to do with relations in a database.
- The result is an algebra that can be used as a
*query language*for relations.

#### Basic Operations

**Union**: all tuples in $R_1$ or $R_2$

Notation: $R_1\cup R_2$

$R_1$,$R_2$ must have the same schema

$R_1\cup R_2$ has the same schema as $R_1$,$R_2$

Example: ActiveProducts $\cup$ DiscontinuedProducts

**Difference**: all tuples in $R_1$ and not in $R_2$

Notation: $R_1-R_2$

$R_1$,$R_2$ must have the same schema

$R_1-R_2$ has the same schema as $R_1$,$R_2$

Example: AllProducts – DiscontinuedProducts

**Selection**: Returns all tuples which satisfy a condition

Notation: $\sigma_C (R)$

$C$ is a condition =,>,<, and, or, not

Ouput schema: same as input schema

Example: Find all Products with price more than \$22: $\sigma_{(\textrm{price} > 22)} (\textrm{Products})$

#### Example:

**Product**s:

prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|

1035 | Sweater | 210 | 11.25 | 22.00 |

2241 | Table Lamp | 317 | 22.25 | 33.25 |

2249 | Desk Lamp | 317 | 13.55 | 24.80 |

2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

$\sigma_{(\textrm{price}>22)} (\textrm{Products})$

prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|

2241 | Table Lamp | 317 | 22.25 | 33.25 |

2249 | Desk Lamp | 317 | 13.55 | 24.80 |

**Projection**: Unary Operation that returns certain columns

Notation: $\Pi_{(a_1,a_2,\ldots a_k)}(R)$

Input Schema: $R(b_1,\ldots b_m)$

Condition: $(a_1,\ldots,a_k )\subseteq (b_1,\ldots,b_m)$

Ouput schema: $S(a_1,\ldots,a_k)$

Example: Project prod_desc and price from Products: $\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

#### Example

**Product**s:

prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|

1035 | Sweater | 210 | 11.25 | 22.00 |

2241 | Table Lamp | 317 | 22.25 | 33.25 |

2249 | Desk Lamp | 317 | 13.55 | 24.80 |

2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

$\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$

prod_desc | price |
---|---|

Sweater | 22.00 |

Table Lamp | 33.25 |

Desk Lamp | 24.80 |

Brass Sculpture | 21.20 |

Think about projection and selection:

How are they the same, how are they different, why do we need both?

**Cartesian Product**: Binary Operation that combines two tables

Each tuple in $R_1$ is attached with each tuple in $R_2$

Notation: $R_1\times R_2$

Input Schema: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$

Condition: $(a_1,\ldots,a_n)\cap(b_1,\ldots,b_m )=\emptyset$

Ouput schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$

Example: Manufacturers $\times$ Products

**Manufacturer**s

manufactr_id | manufactr_name | city | country |
---|---|---|---|

210 | Kiwi Klothes | Auckland | New Zealand |

253 | Brass Works | Lagos | Nigeria |

317 | Llama Lamps | Lima | Peru |

**Products**:

prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|

1035 | Sweater | 210 | 11.25 | 22.00 |

2241 | Table Lamp | 317 | 22.25 | 33.25 |

2249 | Desk Lamp | 317 | 13.55 | 24.80 |

2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

Manufacturers $\times$ Products

manufactr_id | manufactr_name | city | country | prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|---|---|---|---|

210 | Kiwi Klothes | Auckland | New Zealand | 1035 | Sweater | 210 | 11.25 | 22.00 |

210 | Kiwi Klothes | Auckland | New Zealand | 2241 | Table Lamp | 317 | 22.25 | 33.25 |

210 | Kiwi Klothes | Auckland | New Zealand | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |

210 | Kiwi Klothes | Auckland | New Zealand | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

253 | Brass Works | Lagos | Nigeria | 1035 | Sweater | 210 | 11.25 | 22.00 |

253 | Brass Works | Lagos | Nigeria | 2241 | Table Lamp | 317 | 22.25 | 33.25 |

253 | Brass Works | Lagos | Nigeria | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |

253 | Brass Works | Lagos | Nigeria | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

317 | Llama Lamps | Lima | Peru | 1035 | Sweater | 210 | 11.25 | 22.00 |

317 | Llama Lamps | Lima | Peru | 2241 | Table Lamp | 317 | 22.25 | 33.25 |

317 | Llama Lamps | Lima | Peru | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |

317 | Llama Lamps | Lima | Peru | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

Let’s try an example:

PhoneNumbers

PID | ID | Number |
---|---|---|

1 | 1 | 8675309 |

2 | 2 | 6316770 |

3 | 2 | 6699586 |

Persons

ID | Name |
---|---|

1 | Peter Bui |

2 | Tim Weninger |

PhoneNumbers $\times$ Persons:

PID | ID | Number | ID | Name |
---|---|---|---|---|

1 | 1 | 8675309 | 1 | Peter Bui |

2 | 2 | 6316770 | 1 | Peter Bui |

3 | 2 | 6699586 | 1 | Peter Bui |

1 | 1 | 8675309 | 2 | Tim Weninger |

2 | 2 | 6316770 | 2 | Tim Weninger |

3 | 2 | 6699586 | 2 | Tim Weninger |

**Renaming**:

Does not change the relational instance

Changes the relational schema only

Notation: $\rho_{S(b_1,\ldots,b_n )}(R)$

Input schema: $R(a_1,\ldots,a_n)$

Output schema: $S(b_1,\ldots,b_n)$

Example: change price to MSRP $\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|

1035 | Sweater | 210 | 11.25 | 22.00 |

2241 | Table Lamp | 317 | 22.25 | 33.25 |

2249 | Desk Lamp | 317 | 13.55 | 24.80 |

2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

$\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$

prod_id | prod_desc | manufactr_id | cost | msrp |
---|---|---|---|---|

1035 | Sweater | 210 | 11.25 | 22.00 |

2241 | Table Lamp | 317 | 22.25 | 33.25 |

2249 | Desk Lamp | 317 | 13.55 | 24.80 |

2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

#### Derived Relational Algebra Operators

**Intersection**

all tuples both in $R_1$ and in $R_2$

Notation: $R_1\cap R_2$

$R_1$, $R_2$ must have the same schema

$R_1\cap R_2$ has the same schema as $R_1$, $R_2$

Example UnionizedEmployees$\cap$RetiredEmployees

Intersection is derived: $R_1\cap R_2=R_1-(R_1-R_2)$

## Joins

Theta join, Natural join, Equi-join, Semi-join, Inner join, Outer join, etc.

**Theta Join**

A join that involves a predicate

Notation: $R_1\bowtie_\theta R_2$ where $\theta$ is a condition

Input schemas: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$ s.t. $(a_1,\ldots,a_n)\cap (b_1,\ldots,b_m)=\emptyset$ Output schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$

Derived operator: $R_1\bowtie_\theta R_2=\sigma_\theta(R_1\times R_2)$

**Manufacturer**s

manufactr_id | manufactr_name | city | country |
---|---|---|---|

210 | Kiwi Klothes | Auckland | New Zealand |

253 | Brass Works | Lagos | Nigeria |

317 | Llama Lamps | Lima | Peru |

**Products**:

prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|

1035 | Sweater | 210 | 11.25 | 22.00 |

2241 | Table Lamp | 317 | 22.25 | 33.25 |

2249 | Desk Lamp | 317 | 13.55 | 24.80 |

2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

$\textrm{Manufacturers}\bowtie_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

manufactr_id | manufactr_name | city | country | prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|---|---|---|---|

210 | Kiwi Klothes | Auckland | New Zealand | 1035 | Sweater | 210 | 11.25 | 22.00 |

253 | Brass Works | Lagos | Nigeria | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

317 | Llama Lamps | Lima | Peru | 2241 | Table Lamp | 317 | 22.25 | 33.25 |

317 | Llama Lamps | Lima | Peru | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |

Theta Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{Phone}>\textrm{6699586})} \textrm{Persons}$

PID | ID | Number | ID | Name |
---|---|---|---|---|

1 | 1 | 8675309 | 1 | Peter Bui |

1 | 1 | 8675309 | 2 | Tim Weninger |

Equi-join

Notation: $R_1 \bowtie_{(a=b)} R_2$ where $\theta$ is an equality. Natural join is a special case of equi-join.

There is a LOT of research on how to do this efficiently.

Equi-Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{PID}=\textrm{ID})} \textrm{Persons}$

PID | ID | Number | ID | Name |
---|---|---|---|---|

1 | 1 | 8675309 | 1 | Peter Bui |

2 | 2 | 6316770 | 2 | Tim Weninger |

**Natural Join**

Notation: $R_1\bowtie R_2$ where $\theta$ is assumed

Input schemas: $R_1 (a_1,\ldots,a_n )$ and $R_2 (b_1,\ldots,b_m)$

Output schema: $S(c_1,\ldots,c_p )$ where $c_1,\ldots,c_p=(a_1,\ldots,a_n )\cup(b_1,\ldots,b_m )$

Meaning we combine all pairs of tuples in R_1 and R_2 that agree on attributes

$(a_1,…,a_n )\cap(b_1,…b_m )$ we call the join attributes

Derived operator:

$R_1\bowtie R_2=\sigma(R_1\times R_2)$

$\textrm{Manufacturers}\bowtie \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$

manufactr_id | manufactr_name | city | country | prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|---|---|---|---|

210 | Kiwi Klothes | Auckland | New Zealand | 1035 | Sweater | 210 | 11.25 | 22.00 |

253 | Brass Works | Lagos | Nigeria | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |

317 | Llama Lamps | Lima | Peru | 2241 | Table Lamp | 317 | 22.25 | 33.25 |

317 | Llama Lamps | Lima | Peru | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |

Equi-Join: $\textrm{PhoneNumbers}\bowtie \textrm{Persons}$

PID | ID | Number | Name | |
---|---|---|---|---|

1 | 1 | 8675309 | Peter Bui | |

2 | 2 | 6316770 | Tim Weninger | |

3 | 2 | 6699586 | Tim Weninger |

Given the schemas $R(a,b,c,d)$, $S(a,c,e)$, what is the schema of $R\bowtie S$ ?

$R\bowtie S\rightarrow(a,b,c,d,e)$

Given $R(a,b,c)$, $S(d,e)$, what is $R\bowtie S$ ?

$R\bowtie S\rightarrow(a,b,c,d,e)$ (but result looks a lot different)

Given $R(a,b)$, $S(a,b)$, what is $R\bowtie S$ ?

$R\bowtie S\rightarrow (a,b)$