AM

Database Joins in MySQL

Database Joins in MySQL

Introduction to Joins

  • In real-world databases, data is often spread across multiple tables for efficiency.
  • The JOIN keyword allows you to pull data from multiple tables using a single SELECT query.
  • Joins match related data across tables, often using primary key and foreign key relationships.

Types of Joins

  • Inner Join (Simple Join): Returns rows where the specified data appears in both tables.
    • If you imagine the data in one table as a set and the data in a second table as a set, then pick a type of value like a name or an ID number, and an inner join will return rows where that same data appears in both tables.
  • Outer Join: Selects all data from one table and only matching data from a second table.
    • If there is no matching data in the second table, NULL values are returned for the columns from the second table.
    • In an inner join, these rows would not appear in the results at all.

Left vs. Right Outer Joins

  • Outer joins can be either left or right.
  • Left Outer Join: Returns all rows from the first table (left table) in the JOIN clause and matching rows from the second table.
  • Right Outer Join: Returns all rows from the second table (right table) in the JOIN clause and matching rows from the first table.
  • The difference lies in which table returns all rows versus only matching rows.
  • Left outer joins are more common than right outer joins.

Join Conditions

  • Most JOIN clauses require a condition that specifies which columns will be matched across the tables (e.g., table1.ID = table2.ID).
  • If an inner join is used without a condition, it will match every row from the first table with every row from the second table, creating a very large dataset. This is generally not desired unless one of the datasets is very small.
  • In MySQL, JOIN, INNER JOIN, and CROSS JOIN are often used interchangeably.
    • This differs from standard SQL, where inner joins require a condition, and cross joins do not.

Inner Joins in Detail

  • Inner joins are the most common type of join in MySQL.
  • If you simply use the JOIN keyword without specifying the type, it defaults to an inner join.

Example: Joining Movies Data

  • The movies database is split into multiple tables (e.g., titles, studio).
  • Suppose you want to match the title of each film from the titles table to the city it was filmed in, available in the studio table.
  • The titles table has a studioID column (foreign key) that refers to the ID column (primary key) of the studio table.

Query Example:

  • To select the film title and the city where it was filmed, you can use the following query:
    sql SELECT titles.title, studio.city FROM titles INNER JOIN studio ON titles.studioID = studio.ID ORDER BY titles.title;
  • It is best practice to explicitly specify the table name for each column (e.g., titles.title) when joining multiple tables.

The Importance of the ON Clause

  • The ON clause is crucial because it tells MySQL which data is the same in both tables.
  • Using foreign keys enforces the integrity of the link between tables.
  • Joining on a primary key/foreign key relationship ensures data consistency.
  • An incorrect join condition can return invalid data that may appear correct at first glance (e.g., joining on directorID instead of studioID).

Using Multiple Joins

  • It is possible to use multiple joins in a single query.
  • Example: Linking a director to the critic score for their movies.

Query Example:

  • To select the director name, movie title, and critic rating, you can use the following query:
    sql SELECT director.dir_name, titles.title, criticrating.criticsRating FROM titles INNER JOIN director ON titles.directorID = director.ID INNER JOIN criticrating ON criticrating.titlesID = titles.ID ORDER BY director.dir_name;
  • The order in which you join the tables becomes less meaningful as you add more joins.
  • The equality in the ON clause is reversible (e.g., titles.ID = criticrating.titlesID is equivalent to criticrating.titlesID = titles.ID).

Selecting Columns from Joined Tables

  • Even if a table is required to join data, you do not necessarily need to select any columns from it.
  • You can link data without including all columns in the final result.

Outer Joins in Detail

  • An outer join is used to select all data from one table and only matching data from a second table.
  • This is useful for identifying missing data or when data in one table is optional.

Example: Finding Titles Without Poster Art

  • Create a new table named posters that contains filenames and resolutions for film posters.
    sql CREATE TABLE posters ( ID INT PRIMARY KEY AUTO_INCREMENT, titlesID INT, poster_filename VARCHAR(30), resolution VARCHAR(10), FOREIGN KEY (titlesID) REFERENCES titles(ID) );
  • Populate the posters table with data.
  • Note that the posters table has fewer rows than the titles table, indicating that some titles do not have poster art.

Query Example:

  • To find titles without poster art, use a left outer join:
    sql SELECT titles.title, director.dir_name, posters.filename, posters.resolution FROM titles INNER JOIN director ON titles.directorID = director.ID LEFT OUTER JOIN posters ON posters.titlesID = titles.ID ORDER BY titles.title;
  • Rows with NULL values in the filename and resolution columns indicate titles without corresponding poster art.
  • If you change the left outer join to an inner join, only rows with matching data in the posters table will be returned.
  • A right outer join would return all rows from the posters table and matching rows from the titles table. Because of the foreign key relationship, this will produce a result very similar to an inner join due to no posters existing without a corresponding title.

Challenge: Finding the Best Film

  • Use inner and outer joins to find information about the best film in the database.
  • The result should include the title, director name, critic's rating, and poster filename (if available).
  • The query should return only one row (the best film).

Solution:

  • Select the required columns:
    sql SELECT titles.title, director.dir_name, criticrating.criticsRating, posters.filename FROM titles INNER JOIN director ON titles.directorID = director.ID INNER JOIN criticrating ON criticrating.titlesID = titles.ID LEFT OUTER JOIN posters ON titles.ID = posters.titlesID ORDER BY criticrating.criticsRating DESC LIMIT 1;
  • The best film is determined by the highest critic's rating.
  • The query returns the film with the highest rating, its director, rating, and poster filename (if available).