## Relational Algebra

Basic:

$R$ $\sigma_{C}(R)$

$\Pi_{(a_1\ldots,a_k )}(R)$

$R_1\times R_2$

$R_1\cup R_2$

$R_1-R_2$

$\rho_{S(a_1,\ldots,a_n )}(R)$

Derived:

$R_1\bowtie R_2$

$R_1 \bowtie_\theta R_2$

$R_1\cap R_2$

Additional:

$\gamma_{(a,f(b))}(R)$

$\delta(R)$

#### Building Complex Expressions

Algebras allow us to express sequences of operations in a natural way.

**Example**:

- in arithmetic algebra: (x + 4)*(y – 3)
- in stack “algebra”: T.push(S.pop())

Relational algebra allows the same.

Three notations, just as in arithmetic:

- Sequences of assignment statements.
- Expressions with several operators.
- Expression trees.

#### Sequences of Assignments

Create temporary relation names. Renaming can be implied by giving relations a list of attributes.

Example: $R_3 = R_1 \bowtie_\theta R_2$ can be written:

$R_4=R_1\times R_2$

$R_3 =\sigma_\theta (R_4)$

Find Salespersons who have a commission less than or equal to 11 and are managed by manager with id of 44.

$R=\sigma_{(\textrm{“comm”}<11\wedge\textrm{“manager_id”}=44)} (\textrm{Salesperson})$

Find Names of Customers who bought an item on February 12th.

$R = \textrm{Sale}\bowtie\textrm{Customer}$

$S= \sigma_{(\textrm{date=02/12})} (R)$

$T =\Pi_{(\textrm{cust_name})} (S)$

#### Expressions with several operators

Example: the theta-join $R_3 = R_1 \bowtie_\theta R_2$ can be written $R_3 =\sigma_\theta (R_1\times R_2)$

**Precedence of relational operators:**

- Unary operators — select, project, rename — have highest precedence, bind first.
- Then come products and joins.
- Then intersection.
- Finally, union and set difference bind last.

But you can always insert parentheses to force the order you desire.

Find Products made in Peru

$R=\Pi_{(\textrm{prod_desc})} (\textrm{Product}\bowtie(\sigma_{(\textrm{country=’Peru’})} \textrm{Manufacturer}))$

#### Expression Trees

- Leaves are operands — either variables standing for relations or particular, constant relations.
- Interior nodes are operators, applied to their child or children.

Find Salespersons who sold something to Customers with less than 50000 in the bank.

$\Pi_{(\textrm{salpers_name})} (\textrm{Salesperson}\bowtie(\textrm{Sale}\bowtie\sigma_{(\textrm{curbal}<50000)} \textrm{Customers}))$

Example: Find all manufacturer ids that sell a product for less than $25 made in Peru.

Find Customers or Salespersons who live in Tokyo

$\rho_{\textrm{name}} (\Pi_{\textrm{cust_name}} (\sigma_{(\textrm{city =’Tokyo’})}(\textrm{Customer}))) \cup \rho_\textrm{name} (\Pi_{\textrm{salpers_name}} (\sigma_{(\textrm{office=’Tokyo’})}(\textrm{Salesperson})))$

#### Operations on Bags

(and why we care)

Relational Algebra uses set-semantics, which is why $\Pi$ eliminates duplicate tuples.

Databases actually use bag-semantics:

**Union**: $\{a,b,b,c\}\cup\{a,b,b,b,e,f,f\} = \{a,a,b,b,b,b,b,c,e,f,f\}$

(add the number of occurrences)

**Difference**: $\{a,b,b,b,c,c\}-\{b,c,c,c,d\} = \{a,b,b\}$

(subtract the number of occurrences)

**Intersection**: $\{a,b,b,b,c,c\}\cap\{b,b,c,c,c,c,d\} = \{b,b,c,c\}$

(minimum of the two numbers of occurrences)

**Selection**: preserve the number of occurrences

**Projection**: preserve the number of occurrences (no duplicate elimination in real databases)

**Cartesian product**, join: no duplicate elimination

## Summary of Relational Algebra

Why bother? Can write any RA expression directly in C++/Java, seems easy.

**Two reasons**:

- Each operator admits sophisticated implementations (think of $\bowtie$,$\sigma_theta$)
- Expressions in relational algebra can be rewritten: optimized