# SQL Aggregation Operators

Remember SELECT FROM WHERE GROUP BY HAVING

What is the total revenue of the offices that made at least 3 sales?

SELECT office, SUM(price*qty)FROM salesperson, sale, productWHERE salesperson.salpers_id = sale.salpers_id and sale.prod_id = product.prod_idGROUP BY officeHAVING count(*) >= 3;

SELECT $S$
FROM $R_1,\ldots,R_n$
WHERE $C_1$
GROUP BY $a_1,\ldots,a_k$
HAVING $C_2$

1. Compute From-WHERE part, obtain table with all attributes in $R_1,\ldots,R_n$
2. Group by the attributes $a_1,\ldots,a_k$
3. Compute the aggregates $C_2$ and keep only groups satisfying $C_2$.
4. Compute aggregates in $S$ and return the result.

## User Defined Functions (UDFs)

You can define your own UDF because some functions are not available.

Median, NTile (e.g., quartile, quintile, decile)

Why?

Because this takes too much memory.

Three categories of functions:

1. Distributive (sum, count, min, max) <- these functions can be distributed easily/trivially,
2. Algebratic (avg, stdev) <- these functions are simple arithmetic on distributive functions
3. Holistic (median, mode, ntile) <- these functions require the viewing of the whole relation/set

Each database system has its own ways of dealing with this, one way for median is to order and select the middle number via LIMIT.

#### LIMIT/TOP

Say we have a million rows in a table, but we only want the top 5 results

LIMIT/TOP is the keyword to use:

SELECT TOP 5 salpers_nameFROM Salesperson;

^ This is oracle, TOP doesn’t work in MySQL, instead we use LIMIT

SELECT salpers_nameFROM salespersonLIMIT 5

But, which 5? Without any ordering tuples are returns how they’re stored.

#### ORDER BY

So we need to order the items based on some attribute

Top 2 salespersons who have made the most sales:

SELECT salpers_name, count(*) as cntFROM salesperson, saleWHERE salesperson.salpers_id = sale.salpers_idGROUP BY salpers_nameORDER BY cnt DESC                     (ASC for ascending, which is also the default)LIMIT 2;                            (LIMIT X,Y to skip X and print Y tuples)