SQL Aggregation

UNION, INTERSECTION, EXCEPT

R UNION S
R INTERSECT S
R EXCEPT S

UNION

SQL uses bag semantics (not set semantics)

Find salespersons in Chicago that sold at least one table lamp.

(SELECT salpers_id
FROM Salesperson
WHERE office=’Chicago’)
UNION/Except/Intersect
(SELECT salpers_id
FROM Sale, Product
WHERE Sale.prod_id = Product.prod_id
            and Product.prod_desc = ‘Table Lamp’);

Except, intersect doesn’t work in MySQL. Why?
We already have an intersect operator… called IN, except is NOT IN

SELECT salpers_id
FROM Salesperson
WHERE office=’Chicago’ and salpers_ID IN
      (SELECT salpers_id
      FROM Sale, Product
      WHERE Sale.prod_id = Product.prod_id
            and Product.prod_desc = ‘Table Lamp’);

Controlling duplicates

SELECT does not automatically eliminate duplicates like Projection in Relational Algebra.

SELECT DISTINCT … does

Example:

SELECT DISTINCT qty
FROM Sale;

There are many functions in SQL (and its dialects) that take in a bag of tuples and returns a result.

sum, avg, count, min, max

These can be applied to a column to produce an aggregation on that column.

Example:

SELECT avg(qty)
FROM Sale

Remember this uses bag semantics, we can apply distinct to get set semantics in SQL:

SELECT avg(distinct(qty))
FROM Sale

Nulls are ignored in aggregation, but an aggregation of only null values is null.

How many salespersons are there?

SELECT count(*)
FROM Salesperson

How many manager’s are there?

SELECT count(distinct(manager_id))
FROM Salesperson

Grouping

SELECT
FROM
WHERE
GROUP BY

The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.

What is the avg commission per office?

SELECT office, avg(comm)
FROM salesperson
GROUP BY office;

Compute gross revenue for each product:

SELECT product.prod_desc, sum(price*qty)
FROM product, sale
WHERE product.prod_ID = sale.prod_id
GROUP BY product.prod_desc;

Whenever aggregation is used each element in SELECT must be either

  1. Aggregated or
  2. Grouped
SELECT salpers_id, avg(qty)
FROM Sale;

Does this work? No        

SELECT salpers_id, avg(qty)
FROM Sale; 

or

SELECT salpers_id, avg(qty)
FROM Sale
GROUP BY salpers_id;

HAVING

The having clause is like a where clause from groups. It follows the GROUP BY clause

SELECT
FROM
WHERE
GROUP BY
HAVING

Condition applies to each group, and is used on aggregate value or grouped attribute after the grouping has occurred.

Find salespersons’ average sale quantity for those salespersons who have at least three sales.

SELECT salpers_id, avg(qty)
FROM sale
GROUP BY salpers_id
HAVING count(salpers_id) > 3

This can get very very complex.

What is the total revenue of the offices that made at least 3 sales?

SELECT office, SUM(price*qty)
FROM salesperson, sale, product
WHERE salesperson.salpers_id = sale.salpers_id and sale.prod_id = product.prod_id
GROUP BY office
HAVING count(*) >= 3;

SELECT $S$                      
FROM $R_1,\ldots,R_n$              
WHERE $C_1$                      
GROUP BY $a_1,\ldots,a_k$     
HAVING $C_2$

  1. Compute From-WHERE part, obtain table with all attributes in $R_1,\ldots,R_n$
  2. Group by the attributes $a_1,\ldots,a_k$
  3. Compute the aggregates $C_2$ and keep only groups satisfying $C_2$.
  4. Compute aggregates in $S$ and return the result.