SQL Nested Subquery Example Interview Question
Overview of Nested Queries
A SQL nested query, also known as a subquery or subselect, is a SELECT query method embedded within another query. It’s an efficient way to get the desired output that may require multiple query steps or to handle queries that are interdependent.
When to use Nested Subqueries
You can use nested subqueries in a couple different ways:
- You can query over subqueries (e.g. selecting from a subquery)
- You can replace 1D arrays (e.g. a typical list of items) and single field joins with a single subquery in the WHERE or HAVING clause
In order to use a subquery, there are a couple of syntactical rules that need to be followed:
- The subquery needs to be enclosed within parentheses
- Depending on the SQL engine you are using, you might need to alias a given subquery
- If using in a WHERE or HAVING clause, the SELECT statement of an subquery can only return the single field being evaluated
Example SQL interview question using a nested subquery
Suppose you are given the following table showing company sales:
Table: sales_info
date | sale_id | sale_usd |
---|---|---|
2020-01-05 | 1111 | 93695 |
2020-01-07 | 1112 | 879617 |
2020-01-07 | 1113 | 752878 |
... | ... | ... |
Calculate the cumulative percent of total sales on a given day. The output of the table should look like the example table below.
date | pct_total_sales |
---|---|
2020-01-05 | X% |
2020-01-07 | Y% |
You can work through this example using the interactive SQL fiddle here.
Before we start writing SQL, we'll break the question into steps:
- Calculate the daily sales total
- Calculate the cumulative sum of the daily sales total and total sales for all days
- Divide the daily total sales by the cumulative sum
1. Calculate the daily sales total
First, we'll write the base query, which will become a subquery in the next step. The below query calculates the daily sales total and you can interact with the query with this SQL fiddle.
SELECT
#we'll need to sum sale_usd by the date
date,
sum(sale_usd) as total_usd
FROM sales_info
#because we're aggregating sale_usd by date, we need
#need to group by date
GROUP BY date
2. Calculate the cumulative sum of the daily sales total and total sales for all days
The query below calculates the cumulative sum of the daily sales total and total sales for all days. You can interact with the query below with this SQL fiddle.
You'll notice in this step that we build a query around the previous query from the step above. In order to successfully build the subquery, we needed to enclose the subquery in parentheses and ensure that we alias the table (since we're using MySQL, note that this isn't a requirement for some SQL variants). We also need to ensure that we include all fields needed for the outer query (e.g. date and total sales).
SELECT
#in this query, we do not group by since we're using a windowing function
date,
SUM(total_usd) OVER (
ORDER BY date ASC rows
BETWEEN unbounded preceding and current row)
as cum_total, # this is a windowing function to
# calculate the cumulative sum
SUM(total_usd) OVER () as total # this is a windowing function to
# calculate the total
FROM(
SELECT
#we'll need to sum sale_usd by the date
date,
sum(sale_usd) as total_usd
FROM sales_info
#because we're aggregating sale_usd by date, we need
#need to group by date
GROUP BY date
) as q1 #we create an alias for this table as required by MySQL
3. Divide the cumulative total sales by the cumulative sum
The last step is to divide the cum_total
by the total. We can accomplish this in the same step as above (just dividing the two windowing functions), or we can build a subquery over the previous step. The query below uses another subquery resulting in the end query having two nested subqueries. You can interact with the query below with this SQL fiddle.
SELECT
date,
100 * cum_total / total as
FROM(
#in this query, we do not group by
#since we're using a windowing function
SELECT
date,
SUM(total_usd) OVER (ORDER BY date ASC
rows BETWEEN unbounded preceding and current row)
as cum_total, # this is a windowing function to
# calculate the cumulative sum
SUM(total_usd) OVER () as total # this is a windowing function
# to calculate the total
FROM(
SELECT
#we'll need to sum sale_usd by the date
date,
sum(sale_usd) as total_usd
FROM sales_info
#because we're aggregating sale_usd by date, we need
#need to group by date
GROUP BY date
) as q1 #we create an alias for this table as required by MySQL
) as q2