Goal: specify what we want from our database
Find all the products that are Sweaters whose price is greater than $20.
Could write in C++/Java, but bad idea
Instead use high-level query languages:
- Theoretical: Relational Algebra, Datalog
- Practical: SQL
- Relational algebra: a basic set of operations on relations that provide the basic principles.
Motivation – The Stack Data Structure
To use the “stack” data structure in my program, I need to know
- what a stack looks like
- what (useful) operations I can perform on a stack
- PUSH and POP
Next, we look for an implementation of stack
- browse the Web
- find many of them
- choose one, say LEDA
LEDA already implement PUSH and POP
- It also gives me a simple language L, in which to define a stack and call PUSH and POP
S = init_stack(int);
S.push(3); S.push(5);
int x = S.pop();
- Can also define an expression of operations on stacks
T = init_stack(int);
T.push(S.pop());
To summarize, I know:
- definition of stack (relations)
- its operations (PUSH, POP): (algebra)
- an implementation called LEDA, which tells me how to call PUSH and POP in a language $L$ (SQL)
- I can use these implementations to manipulate stacks
- LEDA hides the implementation details
- LEDA optimizes implementation of PUSH and POP
What is an algebra?
Mathematical system consisting of:
- Operands — variables or values from which new values can be constructed.
- Operators — symbols denoting procedures that construct new values from given values.
In Arithmetic algebra what are the operands? Operators?
Operands -> = Numbers
Operators -> +/*-
What is Relational Algebra?
- An algebra whose operands are relations or variables that represent relations.
- Operators are designed to do common things that we need to do with relations in a database.
- The result is an algebra that can be used as a query language for relations.
Basic Operations
Union: all tuples in $R_1$ or $R_2$
Notation: $R_1\cup R_2$
$R_1$,$R_2$ must have the same schema
$R_1\cup R_2$ has the same schema as $R_1$,$R_2$
Example: ActiveProducts $\cup$ DiscontinuedProducts
Difference: all tuples in $R_1$ and not in $R_2$
Notation: $R_1-R_2$
$R_1$,$R_2$ must have the same schema
$R_1-R_2$ has the same schema as $R_1$,$R_2$
Example: AllProducts – DiscontinuedProducts
Selection: Returns all tuples which satisfy a condition
Notation: $\sigma_C (R)$
$C$ is a condition =,>,<, and, or, not
Ouput schema: same as input schema
Example: Find all Products with price more than \$22: $\sigma_{(\textrm{price} > 22)} (\textrm{Products})$
Example:
Products:
prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|
1035 | Sweater | 210 | 11.25 | 22.00 |
2241 | Table Lamp | 317 | 22.25 | 33.25 |
2249 | Desk Lamp | 317 | 13.55 | 24.80 |
2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
$\sigma_{(\textrm{price}>22)} (\textrm{Products})$
prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|
2241 | Table Lamp | 317 | 22.25 | 33.25 |
2249 | Desk Lamp | 317 | 13.55 | 24.80 |
Projection: Unary Operation that returns certain columns
Notation: $\Pi_{(a_1,a_2,\ldots a_k)}(R)$
Input Schema: $R(b_1,\ldots b_m)$
Condition: $(a_1,\ldots,a_k )\subseteq (b_1,\ldots,b_m)$
Ouput schema: $S(a_1,\ldots,a_k)$
Example: Project prod_desc and price from Products: $\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$
Example
Products:
prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|
1035 | Sweater | 210 | 11.25 | 22.00 |
2241 | Table Lamp | 317 | 22.25 | 33.25 |
2249 | Desk Lamp | 317 | 13.55 | 24.80 |
2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
$\Pi_{(\textrm{prod_desc,price})}(\textrm{Products})$
prod_desc | price |
---|---|
Sweater | 22.00 |
Table Lamp | 33.25 |
Desk Lamp | 24.80 |
Brass Sculpture | 21.20 |
Think about projection and selection:
How are they the same, how are they different, why do we need both?
Cartesian Product: Binary Operation that combines two tables
Each tuple in $R_1$ is attached with each tuple in $R_2$
Notation: $R_1\times R_2$
Input Schema: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$
Condition: $(a_1,\ldots,a_n)\cap(b_1,\ldots,b_m )=\emptyset$
Ouput schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Example: Manufacturers $\times$ Products
Manufacturers
manufactr_id | manufactr_name | city | country |
---|---|---|---|
210 | Kiwi Klothes | Auckland | New Zealand |
253 | Brass Works | Lagos | Nigeria |
317 | Llama Lamps | Lima | Peru |
Products:
prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|
1035 | Sweater | 210 | 11.25 | 22.00 |
2241 | Table Lamp | 317 | 22.25 | 33.25 |
2249 | Desk Lamp | 317 | 13.55 | 24.80 |
2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
Manufacturers $\times$ Products
manufactr_id | manufactr_name | city | country | prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|---|---|---|---|
210 | Kiwi Klothes | Auckland | New Zealand | 1035 | Sweater | 210 | 11.25 | 22.00 |
210 | Kiwi Klothes | Auckland | New Zealand | 2241 | Table Lamp | 317 | 22.25 | 33.25 |
210 | Kiwi Klothes | Auckland | New Zealand | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |
210 | Kiwi Klothes | Auckland | New Zealand | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
253 | Brass Works | Lagos | Nigeria | 1035 | Sweater | 210 | 11.25 | 22.00 |
253 | Brass Works | Lagos | Nigeria | 2241 | Table Lamp | 317 | 22.25 | 33.25 |
253 | Brass Works | Lagos | Nigeria | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |
253 | Brass Works | Lagos | Nigeria | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
317 | Llama Lamps | Lima | Peru | 1035 | Sweater | 210 | 11.25 | 22.00 |
317 | Llama Lamps | Lima | Peru | 2241 | Table Lamp | 317 | 22.25 | 33.25 |
317 | Llama Lamps | Lima | Peru | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |
317 | Llama Lamps | Lima | Peru | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
Let’s try an example:
PhoneNumbers
PID | ID | Number |
---|---|---|
1 | 1 | 8675309 |
2 | 2 | 6316770 |
3 | 2 | 6699586 |
Persons
ID | Name |
---|---|
1 | Peter Bui |
2 | Tim Weninger |
PhoneNumbers $\times$ Persons:
PID | ID | Number | ID | Name |
---|---|---|---|---|
1 | 1 | 8675309 | 1 | Peter Bui |
2 | 2 | 6316770 | 1 | Peter Bui |
3 | 2 | 6699586 | 1 | Peter Bui |
1 | 1 | 8675309 | 2 | Tim Weninger |
2 | 2 | 6316770 | 2 | Tim Weninger |
3 | 2 | 6699586 | 2 | Tim Weninger |
Renaming:
Does not change the relational instance
Changes the relational schema only
Notation: $\rho_{S(b_1,\ldots,b_n )}(R)$
Input schema: $R(a_1,\ldots,a_n)$
Output schema: $S(b_1,\ldots,b_n)$
Example: change price to MSRP $\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$
prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|
1035 | Sweater | 210 | 11.25 | 22.00 |
2241 | Table Lamp | 317 | 22.25 | 33.25 |
2249 | Desk Lamp | 317 | 13.55 | 24.80 |
2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
$\rho_{(\textrm{id},\textrm{desc},\textrm{manf},\textrm{cost},\textrm{msrp})}(\textrm{Product})$
prod_id | prod_desc | manufactr_id | cost | msrp |
---|---|---|---|---|
1035 | Sweater | 210 | 11.25 | 22.00 |
2241 | Table Lamp | 317 | 22.25 | 33.25 |
2249 | Desk Lamp | 317 | 13.55 | 24.80 |
2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
Derived Relational Algebra Operators
Intersection
all tuples both in $R_1$ and in $R_2$
Notation: $R_1\cap R_2$
$R_1$, $R_2$ must have the same schema
$R_1\cap R_2$ has the same schema as $R_1$, $R_2$
Example UnionizedEmployees$\cap$RetiredEmployees
Intersection is derived: $R_1\cap R_2=R_1-(R_1-R_2)$
Joins
Theta join, Natural join, Equi-join, Semi-join, Inner join, Outer join, etc.
Theta Join
A join that involves a predicate
Notation: $R_1\bowtie_\theta R_2$ where $\theta$ is a condition
Input schemas: $R_1(a_1,\ldots,a_n)$ and $R_2(b_1,\ldots,b_m)$ s.t. $(a_1,\ldots,a_n)\cap (b_1,\ldots,b_m)=\emptyset$ Output schema: $S(a_1,\ldots,a_n,b_1,\ldots,b_m)$
Derived operator: $R_1\bowtie_\theta R_2=\sigma_\theta(R_1\times R_2)$
Manufacturers
manufactr_id | manufactr_name | city | country |
---|---|---|---|
210 | Kiwi Klothes | Auckland | New Zealand |
253 | Brass Works | Lagos | Nigeria |
317 | Llama Lamps | Lima | Peru |
Products:
prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|
1035 | Sweater | 210 | 11.25 | 22.00 |
2241 | Table Lamp | 317 | 22.25 | 33.25 |
2249 | Desk Lamp | 317 | 13.55 | 24.80 |
2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
$\textrm{Manufacturers}\bowtie_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$
manufactr_id | manufactr_name | city | country | prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|---|---|---|---|
210 | Kiwi Klothes | Auckland | New Zealand | 1035 | Sweater | 210 | 11.25 | 22.00 |
253 | Brass Works | Lagos | Nigeria | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
317 | Llama Lamps | Lima | Peru | 2241 | Table Lamp | 317 | 22.25 | 33.25 |
317 | Llama Lamps | Lima | Peru | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |
Theta Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{Phone}>\textrm{6699586})} \textrm{Persons}$
PID | ID | Number | ID | Name |
---|---|---|---|---|
1 | 1 | 8675309 | 1 | Peter Bui |
1 | 1 | 8675309 | 2 | Tim Weninger |
Equi-join
Notation: $R_1 \bowtie_{(a=b)} R_2$ where $\theta$ is an equality. Natural join is a special case of equi-join.
There is a LOT of research on how to do this efficiently.
Equi-Join: $\textrm{PhoneNumbers}\bowtie_{(\textrm{PID}=\textrm{ID})} \textrm{Persons}$
PID | ID | Number | ID | Name |
---|---|---|---|---|
1 | 1 | 8675309 | 1 | Peter Bui |
2 | 2 | 6316770 | 2 | Tim Weninger |
Natural Join
Notation: $R_1\bowtie R_2$ where $\theta$ is assumed
Input schemas: $R_1 (a_1,\ldots,a_n )$ and $R_2 (b_1,\ldots,b_m)$
Output schema: $S(c_1,\ldots,c_p )$ where $c_1,\ldots,c_p=(a_1,\ldots,a_n )\cup(b_1,\ldots,b_m )$
Meaning we combine all pairs of tuples in R_1 and R_2 that agree on attributes
$(a_1,…,a_n )\cap(b_1,…b_m )$ we call the join attributes
Derived operator:
$R_1\bowtie R_2=\sigma(R_1\times R_2)$
$\textrm{Manufacturers}\bowtie \textrm{Product} = \sigma_{(\textrm{Manufacturer.manufactr_id}=\textrm{Product.manufactr_id})} (\textrm{Manufacturer}\times \textrm{Product})$
manufactr_id | manufactr_name | city | country | prod_id | prod_desc | manufactr_id | cost | price |
---|---|---|---|---|---|---|---|---|
210 | Kiwi Klothes | Auckland | New Zealand | 1035 | Sweater | 210 | 11.25 | 22.00 |
253 | Brass Works | Lagos | Nigeria | 2518 | Brass Sculpture | 253 | 13.60 | 21.20 |
317 | Llama Lamps | Lima | Peru | 2241 | Table Lamp | 317 | 22.25 | 33.25 |
317 | Llama Lamps | Lima | Peru | 2249 | Desk Lamp | 317 | 13.55 | 24.80 |
Equi-Join: $\textrm{PhoneNumbers}\bowtie \textrm{Persons}$
PID | ID | Number | Name | |
---|---|---|---|---|
1 | 1 | 8675309 | Peter Bui | |
2 | 2 | 6316770 | Tim Weninger | |
3 | 2 | 6699586 | Tim Weninger |
Given the schemas $R(a,b,c,d)$, $S(a,c,e)$, what is the schema of $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$
Given $R(a,b,c)$, $S(d,e)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow(a,b,c,d,e)$ (but result looks a lot different)
Given $R(a,b)$, $S(a,b)$, what is $R\bowtie S$ ?
$R\bowtie S\rightarrow (a,b)$