📖 MySQL Database Design

Database design is a formal process with specific critical rules to ensure a well-formed data structure that can be maintained without causing data errors and anomalies.

Steps for Designing a Data Structure

Identify the Data Elements
  • List the information you need to accomplish the task (e.g., contact information, product descriptions, or inventory).
  • Interview users, analyze existing systems, and review documents, reports, and forms.
  • Associate elements with entities (a person, place, or thing).
Subdivide Elements Into Smallest Useful Form
  • Identify meaningful descriptive attributes for each entity (e.g., name, address, phone, email, etc.).
  • Break attributes into the smallest useful form (e.g., name into first and last name, address into street, city, state, and zip).
Identify Tables and Assign Columns
  • Group data elements by entities.
  • If an element relates to more than one entity, include it in all relevant entities.
  • Add and remove elements as appropriate according to the entities.
  • Data elements (entity attributes) will become columns in the database.
Identify Primary Keys
  • Each entity must have one or more data elements that uniquely identify each row.
  • Primary key values should never change.
  • If a suitable element does not exist, create an ID column to uniquely identify it.
  • The book talks about foreign keys at this point, but I usually include this in the normalization process.
Normalize the Data
  • More on this below.
Identify Indexes
  • Indexes are used frequently in searches or joins.
  • Indexes should contain a large number of distinct values.
  • Indexes should be updated infrequently to maintain performance.

Data Normalization

Normalization is the process of organizing data into tables in a way that reduces redundancy, eliminates errors, and minimizes maintenance.

First Normal Form (1NF)
Ensure that each column contains only atomic (indivisible) values, and eliminate repeating groups.
  • Each column must contain a single, atomic value.
  • The table must not contain repeating columns or groups to be in First Normal Form.
  • Move related fields that have more than one value to a new table.
  • Determine the primary key for the new table.

Let's consider we are working with a database that tracks orders for a company. We create a database table to store the information for those orders as shown below. Immediately, we see that the ItemNumber, Description, Quantity, and Price fields contain multiple values in the same row. This violates the principle of atomicity, causing difficulty in maintaining our database and violates First Normal Form: Remove repeating groups.

Sample Orders Table
orderIDOrderDateItemNumberDescriptionQuantityPrice
14892 3/4/2021 WS42 Washer 1 289.99
14893 3/18/2021 WS42
DR89
Washer
Dryer
1
1
289.99
259.99
14894 4/6/2021 RF14 Refrigerator 2 2299.99
14895 4/17/2021 MW66
RF14
Mower
Refrigerator
1
1
349.99
2299.99

The most straightforward way to remove the repeating groups is to create separate rows for each item being ordered, as shown below. What we are left with is a table that correctly adheres to First Normal Form by ensuring atomic values and no repeating groups. However, the table now contains repeated orderID values, which will need to be addressed in Second Normal Form.

Sample Orders Table: First Normal Form
orderIDOrderDateItemNumberDescriptionQuantityPrice
14892 3/4/2021 WS42 Washer 1 289.99
14893 3/18/2021 WS42 Washer 1 289.99
14893 3/18/2021 DR89 Dryer 1 259.99
14894 4/6/2021 RF14 Refrigerator 2 2299.99
14895 4/17/2021 MW66 Mower 1 349.99
14895 4/17/2021 RF14 Refrigerator 1 2299.99

Important Consideration: The structure and configuration of your primary key are critical when normalizing a database. If you use a single field like orderID as the primary key, it must be unique across all rows. This means that to manage multiple items within a single order, you'll need to create a separate table (such as OrderItems) where each item can be listed individually, with orderID serving as a foreign key rather than the primary key. Alternatively, if your primary key is a composite key, combining orderID with another field like ItemNumber, the combination must be unique. This allows orderID to repeat, but ensures that each item in an order is uniquely identified by the composite key. Choosing the right primary key structure is essential for maintaining the integrity of your database while achieving First Normal Form (1NF).

Second Normal Form (2NF)
Eliminate redundancy and update anomalies by ensuring that all non-key attributes are fully dependent on the entire primary key.
  • Applies to tables that have a composite or compound primary key.
  • Move fields that are dependent on only a portion of the primary key to a new table.
  • Identify the primary key for the new table.
  • Identify a foreign key to relate the two tables.

The problem with the table above is that we now have the orderID repeated in several rows. While this does not violate 1NF, it introduces redundancy because each order appears multiple times for different items. To address this, we need to consider the functional dependencies of each field. For example, the Description is dependent on the ItemNumber, not the orderID. This suggests that the Description should be moved to a separate Items table. We also need to create a linking table to manage the many-to-many relationship between orders and items.

Looking closely at the information in this table, we see it contains attributes for three different entities: Order, Item, and Line Item (which tracks the details of each transaction). To bring the table into Second Normal Form, we should restructure it as follows:

ORDERS (orderID(pk), OrderDate)

ITEMS (itemID(pk), Description)

LINE_ITEMS (lineID(pk), orderID(fk), itemID(fk), Quantity, Price)

The LINE_ITEMS table uses the primary keys from the ORDERS and ITEMS tables as foreign keys, defining their relationships. This structure creates a many-to-many relationship between ORDERS and ITEMS, with the LINE_ITEMS table acting as the linking table.

Orders Table: Second Normal Form
orderIDOrderDate
14892 3/4/2021
14893 3/18/2021
14894 4/6/2021
14895 4/17/2021
Items Table: Second Normal Form
itemIDDescription
WS42 Washer
DR89 Dryer
RF14 Refrigerator
MW66 Mower
MW108 Mower
Line Item Table: Second Normal Form
 lineID  orderID  itemID  Quantity  Price 
1 14892 WS42 1 289.99
2 14893 WS42 1 289.99
3 14893 DR89 1 259.99
4 14894 RF14 2 2299.99
5 14895 MW66 1 349.99
6 14895 RF14 1 2299.99
Third Normal Form (3NF)
Remove fields that are dependent on a non-key attribute.
  • Move related fields that are dependent on a non-key attribute to a new table.
  • Add a primary key to the new table.

Third Normal Form is sometimes easier to achieve than Second Normal Form. In this case, you are looking for fields that are dependent on a non-key attribute. These are fields that do not depend on the primary key for identification, rather they depend on another non-key field. The table below has a list of customers with their sales reps. The reps are identified by a repID and a RepName. The RepName is dependent on the repID and not the customerID(pk) in this table. This violates Third Normal Form because RepName is not dependent on the table's primary key, but on another non-key attribute.

Sample Customers Table
customerIdCompanyBalanceCreditLimitrepIdRepName
489 Lowest Appliance 4285.15 10000 1 Brad Norton
492 Best Bargains 1632.48 5000 2 Boe Regard
506 Home Improvements 985.14 2500 4 LC DeCow
512 The Building Co. 6587.56 15000 2 Boe Regard

To fix this, the RepName should be removed from this table and placed in a separate Reps table, which lists the reps by name and ID.

Customers Table: Third Normal Form
customerIdCompanyBalanceCreditLimitrepId
489 Lowest Appliance 4285.15 10000 1
492 Best Bargains 1632.48 5000 2
506 Home Improvements 985.14 2500 4
512 The Building Co. 6587.56 15000 2
Reps Table: Third Normal Form
repIdRepName
1 Brad Norton
2 Boe Regard
3 Shive Gency
4 LC DeCow

Enhanced Entity Relationship (EER) Diagram

  • Representation of the entities in the database, including tables, views, and stored programs.
  • Shows the relationships between entities.
  • In MySQL Workbench, you can create an EER model of a data structure.
  • You can also create an EER model from an existing database.
  • After creating an EER model, you can forward engineer a SQL Create Script to create a database and tables.
Sample EER Diagram showing 5 tables: Customers, Reps, Orders, Line Items and Items
Sample EER Diagram showing 5 tables: Customers, Reps, Orders, Line Items and Items

Tables and relationships in an RDBMS (Relational Database Management System) can be illustrated using an EER (Enhanced Entity Relationship) diagram. This type of diagram, as shown above, is used to show the relationships between the database tables using field lists by table with connecting lines from primary key fields to foreign key fields in related tables.

Connections with endpoints using the numeral 1 and the infinity symbol (∞) indicate a one-to-many relationship where the one-side primary key can be used many times as a foreign key in the many-side table. One-to-many relationships are the most common type.

Two tables that have a one-to-many relationship to a common third table are said to have a many-to-many relationship where each table can have many records in the linking third table. An example of this is with the Orders and Items tables through the LineItems table.