SQL Joins
When you’re querying most data sets, you’re most likely going to need to use a JOIN clause. This article will take you through when you want to use a join, the requirements of using a join, and the types of joins.
What is a SQL Join?
A SQL join is an instruction used within the SQL (Structured Query Language) to combine data from two table on a key. Let’s take you through an example of using a join in practice.
Example of when you want to use a SQL join
Say you have two tables relating to menu performance at a restaurant. The first table, daily_menu, provides you with information what’s on the menu for the day, the price of the item on the menu, and the number of unit sold. The primary key for this table is a combination of the date and item_id (these are the two fields that uniquely identify each record in the table). The second table, menu_items, provides you with information regarding the menu item name, if it’s vegetarian, and the average price of the item. The primary key for this table is the item_id. This is a simple example of a relational database.
Let us pretend you are an analyst for the restaurant chain, and you want to know how well the vegetarian items on the menu are selling doing compared to the non-vegetarian items. Given these two tables, you realize you cannot determine the performance based on daily_menu items alone, you also need to pull in the is_veg field from menu_items. You’ve identified a relationship between menu_items and daily_menu that can help you pull is_veg from menu_items and map it onto daily_menu. See graphic below.
How can you join daily_menu with menu_items in SQL? A perfect opportunity for a join! The next dilemma you run into is trying to figure out what type of join to use.
#We know we have 2 sets of data daily_menu and menu_items, we can pretend that daily_menu is Table A and menu_items is Table B. In a venn diagram, we essentially want to bring the two sets of data together.
There are 4 basic ways we can join this data together: inner join, left join, right join, or full join. We’ll go into each on in detail below.
INNER JOIN
An inner join lets us output records referenced in both Table A and Table B. If there is a mismatch of records in Table A and Table B, those records will not show up in the output of the query.
To do this in practice, if we take our example of daily_menu and menu_items here’s an example query of an inner join and the corresponding output.
SELECT
d.date,
i.name,
i.is_veg,
d.price,
d.units_sold
FROM daily_menu d
INNER JOIN menu_items i
ON d.item_id = i.item_id
FULL OUTER JOIN
FULL JOIN or FULL OUTER JOIN, lists all records from both table, regardless of whether the records have a match in the opposite table. Again, let's use the daily_menu and menu_item tables from above to practice. If we do a full outer join on item_id, we’re keeping all primary ids of both tables as well as the corresponding columns we select. The SQL and output table are below.
SELECT
d.date,
i.name,
i.is_veg,
d.price,
d.units_sold
FROM daily_menu d
FULL OUTER JOIN menu_items i
ON d.item_id = i.item_id
LEFT JOIN
A left join, as illustrated in our SQL joins picture above, will return all records from Table A, and any matching records from Table B. Using the daily_menu and menu_item tables, below is an example query for a left join and the resulting output.
SELECT
d.date,
i.name,
i.is_veg,
d.price,
d.units_sold
FROM daily_menu d
LEFT JOIN menu_items i
ON d.item_id = i.item_id
RIGHT JOIN
A right join, contrast to a left join, will return all records from Table B and any matching records in Table A. Again, we’re going to use daily_menu and menu_item tables to write an example query for a right join and the resulting output.
SELECT
d.date,
i.name,
i.is_veg,
d.price,
d.units_sold
FROM daily_menu d
RIGHT JOIN menu_items i
ON d.item_id = i.item_id
What’s next?
The basic SQL joins described above let you tie different pieces of data together, and allow you to start asking and answering more challenging questions! You’ll be using these often, especially if you have multiple data sources, or data that lives in different places.