📖 MySQL Joining Tables

In most real-world database scenarios, data is distributed across multiple related tables. Multi-table queries allow you to extract and combine this data to produce meaningful results. Whether you're retrieving a customer’s order history, tracking inventory, or generating business reports, multi-table queries are essential for bringing data together.

In this article, we will explore how to use different types of multi-table queries, such as JOIN and UNION, and explain when and why each method is useful in practice. We’ll also demonstrate how to apply these queries using MySQL with examples drawn from common business scenarios.

Why Use Multi-Table Queries?

Data in relational databases is typically divided into multiple tables to minimize redundancy and ensure data integrity. Multi-table queries allow you to combine related data, producing comprehensive reports that provide deeper insights. By joining or combining tables, you can generate insights that would otherwise be difficult to achieve if you only used single-table queries.

Cartesian Joins

A Cartesian Join (or Cross Join) returns all possible combinations of rows between two or more tables. It usually occurs when a JOIN condition is missing, leading to a multiplication of rows. While rarely used intentionally, it's important to recognize this type of join to avoid unintended results.

Example of Unintended Cartesian Join
SELECT customer.Name, product.ProductName
FROM customer, product;

If the JOIN condition is omitted, every row from the customer table is paired with every row from the product table. If there are 10 customers and 20 products, the result will include 200 rows—regardless of actual relationships.

Example with JOIN keyword
SELECT customer.Name, product.ProductName
FROM customer
JOIN orders ON customer.CustomerId = orders.CustomerId;

Using JOIN Clauses to Combine Data

The JOIN clause is used to combine rows from two or more tables based on a related column. MySQL supports several types of joins, each of which serves a different purpose depending on how the data is related.

Syntax
SELECT columns
specifies the columns to be selected.
column may be in either of the 2 tables that are being joined
if both tables contain a column with the same name, you must precede the column name with the tablename (table.column)
FROM table1
the name of the first table to be joined
the order of tables listed doesn't matter.
[INNER] JOIN table 2
the name of the second table to be joined.
INNER is optional. INNER JOIN and JOIN do the same thing
ON table1.column = table2.column;
specifies how the tables are linked together.
usually one column is a primary key and the other one is a foreign key
[WHERE condition]
OPTIONAL: use if you want to restrict the rows being selected.
same as for single table queries
[ORDER BY column]
OPTIONAL: Use if you want to sort the results
same as for single table queries
[LIMIT n]
OPTIONAL: Use if you want to limit the number of rows returned
same as for single table queries
Example
SELECT 
    invoice_number, vendor_name
FROM
    vendors
        INNER JOIN
    invoices ON vendors.vendor_id = invoices.vendor_id
ORDER BY invoice_number;
-- This query joins the vendors table to the invoices table by 
-- matching the vendor_id in the vendors table (primary key) 
-- to the vendor_id in the invoices table (foreign key).

Inner Join

An INNER JOIN retrieves only the rows that have matching values in both tables. It's the most common type of join and is used when you want to return data where there is a relationship between the tables. The INNER keyword is optional as it is the default type of JOIN.

SELECT 
    InvoiceId, CONCAT(FirstName, ' ', LastName) AS Customer
FROM
    customer
        JOIN
    invoice ON customer.CustomerId = invoice.CustomerId;

Real-world use case: This query combines customer and order information, showing only those customers who have placed orders.

Fully Qualified Column Names

A fully qualified column name refers to the practice of specifying a column in an SQL query by explicitly including the table (and sometimes the database) it belongs to (customer.CustomerId), ensuring clarity and avoiding ambiguity, especially when multiple tables in a query have columns with the same name.

The USING Keyword

Short hand keyword that may be used when the two column names used to join the tables have the same name.

SELECT 
    InvoiceId, FirstName, LastName
FROM
    customer
        JOIN
    invoice USING (CustomerId);
-- Only works when join columns are the same name.
Table Aliases

For readability and ease of coding table aliases may be specified. This makes qualifying column names easier and often makes a query simpler to follow.

SELECT 
    CONCAT(c.FirstName, ' ', c.LastName) AS Customer, CONCAT(e.FirstName, ' ', e.LastName) AS Rep
FROM
    customer c
        JOIN
    employee e ON SupportRepId = EmployeeId;
-- Can be used anywhere in the query
-- Replaces the original table name as reference

Outer Joins

Lists all the rows from one table whether or not it matches the second table based on the condition of the WHERE clause. There are three types. 

LEFT JOIN
includes all rows from the table on the left (i.e. first table listed), but only rows from the table on the right (second table listed) that match the first table.
RIGHT JOIN
includes all rows from the table on the right (i.e. second table listed), but only rows from the table on the left (first table listed) that match the second table.
FULL OUTER JOIN or PRODUCT
includes all rows from both tables whether or not they match based on the condition in the WHERE clause. Rarely used!

An example of an outer join would be if you required a list of all customers and if the customers have an open order list that too.

To specify an outer join requires a combination of two key words. LEFT or RIGHT and JOIN.

Left Join

A LEFT JOIN returns all rows from the left table (the first table), and the matched rows from the right table. If there is no match, the result is NULL for the right table.

SELECT 
    FirstName, InvoiceId
FROM
    customer c
        LEFT JOIN
    invoice i USING (CustomerId)
ORDER BY InvoiceId;

Real-world use case: This query is useful when you need to display a list of all customers, including those who haven't placed an order yet, for a targeted marketing campaign.

Self Join

A SELF JOIN allows a table to be joined with itself. This is useful for hierarchical data, such as employee-manager relationships, where one row references another row in the same table.

SELECT e1.employee_name AS employee, e2.employee_name AS manager
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.employee_id;

Real-world use case: This query lists employees alongside their respective managers by joining the employee table to itself.

Joining a Table to Itself
  • Self-joins are useful for tables with a self-referencing relationship, or when one wants to determine data inconsistencies.
  • When joining a table to itself, aliases must be used to treat the table as two separate tables, that is, giving the same table two different aliases.
  • Use != or <> operators to avoid repetitive rows in the result set.
  • Use two separate table aliases and qualify the column names with the aliases.
Example: Show all the customers who live in the same city and state.
SELECT DISTINCT
    c1.FirstName, c1.City, c1.State
FROM
    customer c1
        JOIN
    customer c2 ON c1.City = c2.City
        AND c1.State = c2.State
        AND c1.FirstName <> c2.FirstName
ORDER BY c1.State , c1.City;
Useful for Analysis
  • Find customers located in the same zip code.
  • Find employees who share the same manager.
  • Identify potential data integrity issues, like duplicate customer entries.
Using a Self-Join on a Primary Key

A self-join can involve the primary key of a table. This occurs when a foreign key in the table is related to the primary key of the same table. For example, in an employee table, EMPLOYEE_NUM may be the primary key and MGR_EMPLOYEE_NUM could be another column, which represents the employee number of the employee's manager. Because the manager is also an employee, MGR_EMPLOYEE_NUM is a foreign key whose value will match the value of the primary key of the same table.

SELECT 
    e1.EmployeeId,
    e1.FirstName,
    e1.LastName,
    e1.ReportsTo,
    e2.EmployeeId,
    e2.FirstName,
    e2.LastName
FROM
    employee e1
        LEFT JOIN
    employee e2 ON e1.ReportsTo = e2.EmployeeId;

Joining Tables from Different Databases

An accounts payable (vendor) dataset may exist in another database than a customer order (invoices) dataset. To join tables from separate databases qualify the table names with the database name.

SELECT 
    *
FROM
    invoice i
        JOIN
    om.customers c ON i.CustomerId = c.customer_id;

Note that just as columns may always be qualified, but not always necessary, most often a query is working with tables within the same database and the database qualification is not necessary.

Joining More than Two Tables

Joining multiple tables in a relational database requires separate test conditions for each table. Join one table that has a direct one-to-many relationship with the next table, join the second table to third, and the third to fourth. It's common in a database to join many tables in a query.

Joining 4 tables to produce a result set would require at least three join conditions.

The best way to approach a complex multiple table join is to familiarize yourself with the tables in a relational database, and structure the query's join conditions based on the primary and foreign keys of each table. In other words what does table one have in common with table two? Table two to table three? Table three to table four?

Suppose the need is to generate a report that provides the name of the tracks that companies have ordered in the past.

By inspecting the database we can determine the following.

  • The Company names are listed in the customer table.
  • Company orders are listed in the invoice table and related to the company using the CustomerId.
  • Each order contains one or more line items from the invoiceline table.
  • The invoice line items are related to the tracks from the track table using the TrackId.

The invoiceline table connects the invoice table to the track table using the IDs from each of the outter tables as foreign keys. These foreign keys could have been used as a compound primary key, but in this case they were not. The invoiceline table has it's own primary key field and servers as a bridge between the outter invoice and track tables.

Example Joining 4 tables.
SELECT 
    c.Company, t.Name
FROM
    customer c
        JOIN
    invoice i USING (CustomerId)
        JOIN
    invoiceline USING (InvoiceId)
        JOIN
    track t USING (TrackId)
WHERE
    c.Company IS NOT NULL;
Steps in this process.
  1. Determine columns and, if the column names are repeated in different tables, qualify them.
  2. Join table 1 (customer) to table 2 (invoice) based on the tables' relationship.
  3. Join table 2 (invoice) to table 3 (invoiceline) based on the tables' relationship.
  4. Join table 3 (invoiceline) to table 4 (track) based on the tables' relationship.
  5. Qualify which records you want (in this example only customers that have a company name).

UNION Statements

While JOINs are used to combine related data from multiple tables, UNION allows you to merge results from separate SELECT queries.

Similar to a join but works with two result sets generated by two or more SELECT statements.

  • The number of columns selected from each table must be the same.
  • The data types (numeric, character, date) must be compatible in the columns.
  • Column selection order must be the same left to right although column name need not be.
    For example don’t select DATE, AMOUNT in one select statement and BALANCE, DATE In the second SELECT statement.
  • To sort the combined result set you must use the column name(s) from the first SELECT statement.
  • Optionally use the AS keyword to define an additional column to reference which table a row is selected from. IE: 'Table1' AS source.
SELECT 'Table1' AS source, column1, column2
FROM table1
UNION
SELECT 'Table2' AS source, column1, column2
FROM table2;

Differs from an outer join in that an outer join combines all records of one table and some records from a second table, whereas a UNION can limit the result sets of both SELECT statements.

SELECT 
    InvoiceId, 'Archive' AS Status, BillingCountry, InvoiceDate, Total
FROM
    invoice
WHERE
    InvoiceDate < '2022-01-01'
        AND Total > 10 
UNION SELECT 
    InvoiceId, 'Active' AS Status, BillingCountry, InvoiceDate, Total
FROM
    invoice
WHERE
    InvoiceDate >= '2022-01-01'
        AND Total > 10
ORDER BY BillingCountry;

Troubleshooting Multiple Table Queries

Often joins will execute without errors but the result set is wrong. Here are some tips to assist with ensuring your queries display the data requested accurately.

  • If using calculations on a table value use a calculator to test your result set on one or more records.
  • Know the row counts of the tables being read, usually a row count of a result set that is equal to the row count of one of the tables means something is wrong.
  • If possible test your result sets against a small data set before executing over a large amount of data.
  • Always remember this general rule: Number of tables – 1 = number of comparison operators required.
  • Review your result set.

In the workbench you may sort the columns of a result set simply by clicking on the column. If your query is West Coast Customers and you’re result set contains rows with a state value of TX, something is probably wrong.