SQL Window Function Example Interview Question - Aggregation
When you're interviewing for a data scientist or data analyst role, it's highly likely you'll encounter SQL questions in your interview. Additionally, it's likely one or more of those SQL questions will require a window function to be solved. Window functions are a core concept in intermediate/advanced SQL, and mastering them will put you one step closer to landing your analytics role. In this post we'll give a quick overview of what window functions are, and then we'll dive into an example interview question along with our solution.
What are SQL window functions?
A window function defines a frame or ‘window’ of rows with a given length around the current row, and performs a calculation across the set of data in the window.
If you’re struggling with the definition above, don’t worry, it should become more clear as you put it into practice. Below we'll step through an example window function interview question that uses a simple aggregation.
Example question - Aggregation in SQL window function
Supose you're given the following table that shows spend by keyword, advertiser, and unique ad ID:
Table: keyword_summary
keyword | ad_id | advertiser_id | total_spend |
---|---|---|---|
bicycle | 1243213 | 1234 | 95 |
bike | 1243213 | 1234 | 71 |
bike tires | 1243213 | 1234 | 22 |
bike brakes | 1243213 | 1234 | 95 |
bike accessories | 1243213 | 1234 | 28 |
... | ... | ... | ... |
Using the table above, write a SQL query that returns each keyword along with the advertiser_id of the top spending advertiser, and total spend on the keyword (agnostic of advertiser). Sort the results in descending order by the total spend on the keyword. You can view/query the data in an interactive SQL fiddle here.
Solution
Click here to view this solution in an interactive SQL fiddle.
## First we separate the base query to get (1) the total spend on the
## keywords and (2) the advertiser spend ranking by keyword.
## Next, we get the total spend by keyword and join to the subquery
## that will provide the top spend by advertiser
## Last, we sort by the total spend on the keyword
WITH
base AS (
# separating the base sub-query to get the baseline calculations that we can
# use in our subqueries
SELECT
keyword,
advertiser_id,
## calculating the total spend on each keyword
SUM(total_spend) OVER(PARTITION BY keyword) AS spend_keyword,
## ranking the advertisers based on how much they spend on a given keyword
RANK() OVER (PARTITION BY keyword ORDER BY total_spend DESC) as advertiser_rank
FROM keyword_summary
)
SELECT
q1.keyword,
q2.advertiser_id,
SUM(q1.spend_keyword) AS spend_keyword
# we need to query the "base" table 2 times,
# 1. to get the total spend on each keyword
# 2. to get the top advertiser spend on each keyword
FROM base AS q1
LEFT JOIN (
SELECT
keyword,
advertiser_id
FROM base
WHERE advertiser_rank = 1
) AS q2
ON q1.keyword = q2.keyword
GROUP BY
q1.keyword,
q2.advertiser_id
# sorting the results by the total spend
ORDER BY spend_keyword DESC
Output:
keyword | advertiser_id | spend_keyword |
---|---|---|
bike | 21781 | 3608 |
bicycle | 1234 | 2520 |
bike brakes | 1234 | 266 |
bike accessories | 3829 | 196 |
bike tires | 3829 | 170 |
mountain bike | 3829 | 59 |