# SQL Aggregation

UNION, INTERSECTION, EXCEPT

R UNION S
R INTERSECT S
R EXCEPT S

UNION

SQL uses bag semantics (not set semantics)

Find salespersons in Chicago that sold at least one table lamp.

(SELECT salpers_idFROM SalespersonWHERE office=’Chicago’)UNION/Except/Intersect(SELECT salpers_idFROM Sale, ProductWHERE Sale.prod_id = Product.prod_id            and Product.prod_desc = ‘Table Lamp’);

Except, intersect doesn’t work in MySQL. Why?
We already have an intersect operator… called IN, except is NOT IN

SELECT salpers_idFROM SalespersonWHERE office=’Chicago’ and salpers_ID IN      (SELECT salpers_id      FROM Sale, Product      WHERE Sale.prod_id = Product.prod_id            and Product.prod_desc = ‘Table Lamp’);

#### Controlling duplicates

SELECT does not automatically eliminate duplicates like Projection in Relational Algebra.

SELECT DISTINCT … does

Example:

SELECT DISTINCT qtyFROM Sale;

There are many functions in SQL (and its dialects) that take in a bag of tuples and returns a result.

sum, avg, count, min, max

These can be applied to a column to produce an aggregation on that column.

Example:

SELECT avg(qty)FROM Sale

Remember this uses bag semantics, we can apply distinct to get set semantics in SQL:

SELECT avg(distinct(qty))FROM Sale

Nulls are ignored in aggregation, but an aggregation of only null values is null.

How many salespersons are there?

SELECT count(*)FROM Salesperson

How many manager’s are there?

SELECT count(distinct(manager_id))FROM Salesperson

## Grouping

SELECTFROMWHEREGROUP BY

The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.

What is the avg commission per office?

SELECT office, avg(comm)FROM salespersonGROUP BY office;

Compute gross revenue for each product:

SELECT product.prod_desc, sum(price*qty)FROM product, saleWHERE product.prod_ID = sale.prod_idGROUP BY product.prod_desc;

Whenever aggregation is used each element in SELECT must be either

1. Aggregated or
2. Grouped
SELECT salpers_id, avg(qty)FROM Sale;

Does this work? No

SELECT salpers_id, avg(qty)FROM Sale;

or

SELECT salpers_id, avg(qty)FROM SaleGROUP BY salpers_id;

## HAVING

The having clause is like a where clause from groups. It follows the GROUP BY clause

SELECTFROMWHEREGROUP BYHAVING

Condition applies to each group, and is used on aggregate value or grouped attribute after the grouping has occurred.

Find salespersons’ average sale quantity for those salespersons who have at least three sales.

SELECT salpers_id, avg(qty)FROM saleGROUP BY salpers_idHAVING count(salpers_id) > 3

This can get very very complex.

What is the total revenue of the offices that made at least 3 sales?

SELECT office, SUM(price*qty)FROM salesperson, sale, productWHERE salesperson.salpers_id = sale.salpers_id and sale.prod_id = product.prod_idGROUP BY officeHAVING count(*) >= 3;

SELECT $S$
FROM $R_1,\ldots,R_n$
WHERE $C_1$
GROUP BY $a_1,\ldots,a_k$
HAVING $C_2$

1. Compute From-WHERE part, obtain table with all attributes in $R_1,\ldots,R_n$
2. Group by the attributes $a_1,\ldots,a_k$
3. Compute the aggregates $C_2$ and keep only groups satisfying $C_2$.
4. Compute aggregates in $S$ and return the result.