Welcome!
Please look over the Syllabus. We will be using Canvas and Gradescope to manage the class. Also homework assignments will be posted to this site.
What is a Database?
A database is a system that stores and manages data. The key aspect of a database is its ability to organize, store, and retrieve data in an efficient way. But there is more to it than just storing information. Databases play a crucial role in managing the integrity, consistency, and accessibility of data in any system, from business operations to scientific research.
What Databases Can You Name?
Take a moment to think about the databases you’ve used or heard of. Here are a few examples:
- Relational databases (e.g., MySQL, PostgreSQL, Oracle)
- NoSQL databases (e.g., MongoDB, Cassandra)
- Cloud databases (e.g., AWS RDS, Google Firebase)
- File-based databases (e.g., SQLite)
Each of these has its own structure and uses, but they all have one thing in common: they store data that needs to be organized and managed efficiently. We will think about these principles in this class
Thinking About Data Organization
Smart people think carefully about how data is organized. Data doesn’t just exist in isolation: it needs structure to be useful. In this class, we’ll explore the principles and best practices behind organizing data. This includes learning about normalization, table design, and query optimization.
We will also discuss how databases can be used to improve data integrity, prevent redundancy, and minimize errors.
Let’s consider a simple table:
Student Name | SID | Major | Course | CRN | Term | Instructor |
---|---|---|---|---|---|---|
Alice | 9019958 | CS | CSE30246 | 12028 | F24 | Weninger |
Robert | 9025547 | CPEG | CSE40497 | 27364 | S23 | Weninger |
Felicity | 9015863 | CS | CSE30246 | 12028 | F22 | Weninger |
James | 9018654 | EE | CSE60246 | 12029 | F24 | Weninger |
What happens if James takes another course?
Student Name | SID | Major | Course | CRN | Term | Instructor |
---|---|---|---|---|---|---|
Alice | 9019958 | CS | CSE30246 | 12028 | F24 | Weninger |
Robert | 9025547 | CPEG | CSE40497 | 27364 | S23 | Weninger |
Felicity | 9015863 | CS | CSE30246 | 12028 | F22 | Weninger |
James | 9018654 | EE | CSE60246 | 12029 | F24 | Weninger |
James | 9018654 | EE | CSE30332 | 11013 | F24 | Morrison |
Notice that James‘s course is duplicated, and now there are inconsistencies like his instructor changing mid-term.
What happens if James switches from EE to CPEG?
Student Name | SID | Major | Course | CRN | Term | Instructor |
---|---|---|---|---|---|---|
Alice | 9019958 | CS | CSE30246 | 12028 | F24 | Weninger |
Robert | 9025547 | CPEG | CSE40497 | 27364 | S23 | Weninger |
Felicity | 9015863 | CS | CSE30246 | 12028 | F22 | Weninger |
James | 9018654 | CPEG | CSE60246 | 12029 | F24 | Weninger |
James | 9018654 | CPEG | CSE30332 | 11013 | F24 | Morrison |
Now we have to go through all instances of James’ data and manually update them. Imagine the consequences if there were hundreds or thousands of records like this!
Do we update earlier semesters? If not then now apparently EE no longer exists.
What happens if James drops CSE30332?
Student Name | SID | Major | Course | CRN | Term | Instructor |
---|---|---|---|---|---|---|
Alice | 9019958 | CS | CSE30246 | 12028 | F24 | Weninger |
Robert | 9025547 | CPEG | CSE40497 | 27364 | S23 | Weninger |
Felicity | 9015863 | CS | CSE30246 | 12028 | F22 | Weninger |
James | 9018654 | CPEG | CSE60246 | 12029 | F24 | Weninger |
In this case, Prof. Morrison is also removed from the system since the course he was teaching is now deleted.
What Are Data Anomalies?
These issues (duplicate data, unnecessary updates, and deletion of important information) are all examples of data anomalies. There are three types of data anomalies:
- Update Anomalies: These happen when some records are updated and others are not. For example, if James’ major changes from EE to CPEG, but we forget to update it in some instances, this will lead to inconsistency.
- Insertion Anomalies: These occur when we cannot insert data unless other data already exists. For instance, when James enrolled in CSE30332, we had to invent a course and a professor.
- Deletion Anomalies: These occur when deleting one piece of information leads to unintended consequences. For example, when James drops CSE30332, Prof. McMillan is also deleted from the system, even though he is still teaching other courses.
These principles will form the backbone of our discussions throughout this course.
We will explore how database design, normalization, and proper schema management help prevent anomalies and make the data management process more efficient.