Homework 3

I’ve created a database of college football games and plays from 2005 to 2013. I got this data from Kaggle. You will use this database to complete this assignment and submit this homework to Canvas by the due date as a file named netid_cse30246_hw3.sql. We will test run your queries on a modified version of the database.

Log in to db8.cse.nd.edu, log in to mysql and use the cse30246 database. The class-lecture tables are in this database as are the college football tables, prefixed by cfb_

These are the tables:

mysql> use cse30246;
Database changed
mysql> show tables;
+--------------------+
| Tables_in_cse30246 |
+--------------------+
| cfb_conference     |
| cfb_drive          |
| cfb_game           |
| cfb_game_stats     |
| cfb_kickoff        |
| cfb_kickoff_return |
| cfb_pass           |
| cfb_play           |
| cfb_player         |
| cfb_punt           |
| cfb_punt_return    |
| cfb_reception      |
| cfb_rush           |
| cfb_stadium        |
| cfb_team           |
+--------------------+

We’re going to use these tables to answer interesting questions and hone our SQL skills.

For example, if I want to know the size of Notre Dame Stadium I can ask the following query:

mysql> select capacity from cfb_stadium where name = 'Notre Dame Stadium';
+----------+
| capacity |
+----------+
|    80795 |
+----------+

For each question, write a single query to return the answer to the following questions.

1. 5pts How many total spectators have seen a game in this dataset?

+------------------+
| total_attendance |
+------------------+
| 332382121        |
+------------------+

2. 15pts On which down do most penalties occur?

+---------+
| down |
+---------+
| no hint |
+---------+

3. 15pts The Red zone is when the ball is within the 20 yards of the goal line. A red zone attempt is when the team moves inside the red zone, and a successful red zone attempt is when a touchdown is scored. Which 5 schools with over 10 red zone attempts have the best red zone success rate (%) and what was their success rate?

+---------------+--------+
| name | rate |
+---------------+--------+
| Towson | 0.7647 |
| South Dakota | 0.7500 |
| New Hampshire | 0.7407 |
| Indiana State | 0.7143 |
| Liberty | 0.7143 |
+---------------+--------+

4. 20pts Which conference has the shortest average game duration?

+-----------------+----------+
| conference_name | duration |
+-----------------+----------+
| no hint | no hint |
+-----------------+----------+

5. 15pts Which player has gained the most yards on 3rd down in October?

+------------+-----------+-------------+
| first_name | last_name | total_yards |
+------------+-----------+-------------+
| Colt | McCoy | 1328 |
+------------+-----------+-------------+

6. 15pts Find the top 3 states from which Notre Dame receives the most players? How many does that state send, and what percentage of ND players come from that state?

+-----------+-----+------------+
| homestate | cnt | proportion |
+-----------+-----+------------+
| CA        |  31 |     0.1047 |
| IL        |  30 |     0.1014 |
| FL        |  28 |     0.0946 |
+-----------+-----+------------+

7. 15pts How many players have played for three or more different teams?

+----------------+
| num_players |
+----------------+
| <no hint> |
+----------------+

8. 20pts Which stadium has sees the longest punt returns on average, and how long is that average?

+---------------+---------+
| name | yds |
+---------------+---------+
| Wrigley Field | 19.3333 |
+---------------+---------+

9. 20pts Which stadium has witnessed the home team lose the most times?

+-------------------------------------------+-------------+
| name                                      | home_losses |
+-------------------------------------------+-------------+

Hint: #2 is Aggie Stadium, Texas A&M New Mexico with 39 losses, which is #1?

10. 20pts Which player(s) had the longest reception in the 4th quarter. What team where they on and what year was it? (breaking non-trivial ties is 5 of the 15 points, (i.e 10 points with LIMIT, 20 points without LIMIT)

+------------+-----------+------------------+------+
| first_name | last_name | name             | year |
+------------+-----------+------------------+------+
| Amari      | Cooper    | Alabama          | 2013 |
| Johnny     | Forzani   | Washington State | 2009 |
| Jamison    | Crowder   | Duke             | 2012 |
+------------+-----------+------------------+------+

<HINT: Order is not important>

11. 20pts Which 5 teams (and seasons) had the most attendance at their games?

+-----------+------+--------+
| name | y | att |
+-----------+------+--------+
| Michigan | 2011 | 897431 |
| Michigan | 2007 | 882115 |
| Michigan | 2009 | 871464 |
| Tennessee | 2009 | 867537 |
| Michigan | 2006 | 864035 |
+-----------+------+--------+

This is an individual assignment. You may discuss strategies in groups, but you may not work together when writing the solutions. SQL-Plagiarism is easy to detect.

Submissions must follow the naming convention netid_cse30246_hw3.sql and be submitted to Canvas by midnight on the due date

We will test your solution on reduced data using the command:
mysql -p -Dcse30246 < netid_cse30246_hw3.sql