Entity Relationship Diagrams

When designing a database, the first question is simple but profound:

How do we model our universe?

We want to represent what exists, what properties those things have, and how they are related. ER modeling provides a formal, systematic way to do this.


Three Key Components

All ER models revolve around three types of elements:

  1. Entity Sets – collections of real-world objects or concepts
  2. Attributes – properties that describe entities or relationships
  3. Relationships – associations between entities

Steps in Database Design

When working on a project, we typically follow a multi-step process:

Step 0: Pick an application domain

  • What is the world we are modeling? A library, a university, a movie studio, etc.

Step 1: Conceptual design

  • Discuss with your team what to model in this domain
  • Use a modeling language to express what information the database must hold
  • ER Model is the most popular language for conceptual design
  • Output: an ER diagram representing your domain

Step 2: Pick a DBMS type

  • Most common: Relational DBMS, which will be the focus of this course

Step 3: Translate ER design to a relational schema

  • Use standard rules to convert entities, relationships, and attributes into tables
  • Refine the schema to reduce redundancy and avoid anomalies

Subsequent steps:

  • Implement the schema in a database using SQL
  • Write an application program (C++, Java, Python, etc.) to handle user interaction and logic that the database cannot enforce

The ER Model in Detail

Entities

  • A real-world object that is distinguishable from other objects
  • Represented as a rectangle in an ER diagram
  • Described using a set of attributes

Examples:

  • Student, Course, Instructor, Movie, Studio

Attributes

  • Each attribute has an atomic domain: integer, string, real, date, etc.
  • Attributes describe entities or relationships
  • Keys: an attribute (or combination) that uniquely identifies an entity

Examples of attributes for Movie:

  • Title, ReleaseYear, Genre, Rating

Entity Sets

  • A collection of similar entities
  • Example: the set of all Students or all Movies

Relationships

  • Define associations between entities
  • Mathematically: if $A$ and $B$ are sets, a relationship $R \subseteq A \times B$
  • Example:
    • $A = {1,2,3}$ (students), $B = {a,b,c,d}$ (courses)
    • $R = {(1,a),(1,c),(3,b)}$
  • Relationships can have attributes: e.g., Enrollment has Grade and Term

Makes is a subset of Product $\times$ Company:

Relationship Cardinality

  • Cardinality expresses the number of instances of one entity that can be associated with another

Types:

Important Note: “one” means at most one, could be zero

What are some examples of each case:

  • One-to-one
  • Many-to-one/one-to-many
  • Many-to-many

Multiway relationship

There can be attributes on a relationship

Decision: Product — as an entity set or an attribute?

Can we model this as a mathematical set? How?

Convert to Binary:

Roles in Relationships

What if we need an entity set twice in one relationship?

Arrows in multi-way relationships?

What does the arrow mean?

Only one salesperson can be on a purchase. But many buyers can participate in a purchase.

Constraints

Constraints restrict the data in the database and give meaning to the schema. They are essential for correct design.

Common constraints include:

  • Keys: uniquely identify entities
    • E.g., SSN uniquely identifies a person
  • Single-value constraints: an entity attribute can have only one value
    • E.g., a person has only one mother
  • Referential integrity constraints: ensure referenced entities exist
    • E.g., a Product must be made by an existing Company
  • Domain constraints: restrict values of attributes
    • E.g., Age $\in [0, 150]$
  • General constraints: other rules
    • E.g., a course may have at most 50 students

Constraints improve:

  • Data integrity
  • Query efficiency
  • Clarity of the domain

Keys

  • Primary key: uniquely identifies an entity
  • Candidate key: multiple attributes that could serve as the key
  • Multi-attribute keys: keys composed of more than one attribute

Underline to refer to keys:

What are the keys?

Every entity set must have a key

Why?

A key can consist of more than one attribute, we call these multi attribute keys.

There can be more than one key for an entity set, called candidate keys: one key will be designated as primary key

Referential Integrity

  • Ensures references in relationships are valid
  • Example: Each Product is produced by exactly one Company
  • Prevents “dangling pointers” like in traditional programming

Many to one, but one can also be zero.

Referential integrity makes sure that a product is made by one and only one company, eg.

Draw ER diagram for the entities: Movies, Studios, CEOs

Assumptions: Every CEO runs a Studio, if a studio ceases to exist, then the CEO ceases to be CEO. If a CEO is fired or resigns, then the Studio still exists.