Case Study Example 1: An eCommerce Company Evaluation
Why did we do this? Many interviews include a take-home case component. It differs from most of the prep material we provide because it challenges you to think end-to-end about what is probably a real business problem facing the organization you're interviewing with.
How can I use this? We have provided the data (downloaded from Kaggle, thank you Kaggle!), a mock prompt (similar to ones experienced at top-tier companies), a potential presentation (which could be thought of as a solution), and the work we did to get to the solution. We hope that if you run into a similar scenario or want extra practice, you can use our work to guide you in the right direction!
The data: It was really hard to find a real dataset that we could use as a "practice case." We ended up finding data on Kaggle! You can download the data there, or you can visit the links below. We didn't use all of the data Kaggle provides in this practice set, so below we list only the files we used to get to the solution.
- Orders dataset: Provides information for each order placed.
- Order items dataset: Provides information for the items within each order, with price and shipping cost broken out per item.
- Order payments dataset: Provides information regarding payments made on each order. Make sure you sum the payments for each order to get a single total price per order.
- Product dataset: Provides information about the product.
- Product category name translated dataset: This dataset is Brazilian, so all categories are in Portuguese. Join on category name against this table to get the translated category.
- Order reviews dataset: This table has review information for each order.
- Customers dataset: This dataset has information regarding customer_id, which links directly to order_id in the orders dataset. Because customer_id is tied to the order, you need to join to this table to get the unique customer ID behind each order.
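The preparation steps described in the list above (summing payments per order, translating categories, and linking orders to unique customers) can be sketched in pandas roughly as follows. This is a minimal sketch, not the code from our notebook: the rows are made-up toy data, and column names such as `payment_value`, `product_category_name_english`, and `customer_unique_id` are assumptions about the Kaggle files that you should check against the actual CSV headers.

```python
import pandas as pd

# Toy stand-ins for the Kaggle files. All rows are made up and the
# column names are assumptions; verify them against the real CSVs.
payments = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "payment_value": [50.0, 25.0, 40.0],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2"],
    "product_category_name": ["beleza_saude", "informatica_acessorios"],
})
translation = pd.DataFrame({
    "product_category_name": ["beleza_saude", "informatica_acessorios"],
    "product_category_name_english": ["health_beauty", "computers_accessories"],
})
orders = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "customer_id": ["c1", "c2"],
})
customers = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "customer_unique_id": ["u1", "u1"],  # same person, two orders
})

# One row per order: sum every payment row that belongs to the order.
order_totals = (
    payments.groupby("order_id", as_index=False)["payment_value"]
    .sum()
    .rename(columns={"payment_value": "order_total"})
)

# A left join keeps products even if their category has no translation.
products_en = products.merge(translation, on="product_category_name", how="left")

# Attach the unique customer identifier to each order.
orders_with_customer = orders.merge(customers, on="customer_id", how="left")

print(order_totals)
print(products_en)
print(orders_with_customer)
```

Left joins are used so that a missing translation or customer row shows up as a blank to investigate rather than silently dropping the record.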
The prompt
You’re a Data Scientist / Business Analyst working for a new eCommerce company called A&B Co. (similar to Amazon) and you’ve been asked to prepare a presentation for the Vice President of Sales and the Vice President of Operations that summarizes sales and operations thus far. The summary should include (at a minimum) a summary of the current state of the business, current customer satisfaction, and a proposal of 2-3 areas where the company can improve. Here are some facts:
- It’s currently September 2018 (i.e., you can ignore all data after September 2018)
- The company’s inception was January 2017 (so you can ignore all data before January 2017)
- The company is US-based, but launched in Brazil (which is why some information is in Portuguese)
- You can assume all orders are delivered (so ignore the order state field)
Your presentation should not have more than 10 slides of content, and the presentation itself should only take ~15 minutes.
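When working the case, the two date facts in the prompt translate into a simple filter on the order date. The sketch below is illustrative only: the rows are made up, `order_purchase_timestamp` is an assumed column name, and we read "ignore all data after September 2018" as keeping orders through the end of September 2018.

```python
import pandas as pd

# Made-up orders; the timestamp column name is an assumption.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "order_purchase_timestamp": [
        "2016-12-30 10:00:00",  # before company inception
        "2017-01-05 09:30:00",
        "2018-08-20 14:00:00",
        "2018-10-02 08:00:00",  # after the "current" month
    ],
})
orders["order_purchase_timestamp"] = pd.to_datetime(orders["order_purchase_timestamp"])

# Window from the prompt: January 2017 through September 2018.
start = pd.Timestamp("2017-01-01")
end = pd.Timestamp("2018-10-01")  # exclusive upper bound

ts = orders["order_purchase_timestamp"]
in_window = orders[(ts >= start) & (ts < end)]

print(in_window["order_id"].tolist())
```

Using an exclusive upper bound avoids having to spell out the last second of the final day.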
The data
Note that all data was provided by Kaggle. Feel free to read about each file on Kaggle (or by clicking the "Click for more information" button above); you can download the data by clicking each link below.
- Orders dataset
- Order items dataset
- Order payments dataset
- Product dataset
- Product category name translated dataset
- Order reviews dataset
- Customers dataset
A solution
This is a potential solution to address the problems outlined in the prompt. Note this is not the only solution, nor is it necessarily the best. The goal of providing this is to give you an idea of how we would go about solving this problem and how we would present it to an interview panel.
The work!
Below are the code from our IPython notebook, the work we did in Google Sheets (how we created most of the charts), and some thoughts we had on solving the case.
Python notebook: You can view the Jupyter notebook we used here.
Google Sheets: A link to the spreadsheet.
I did most of the charts and quick side analyses in Google Sheets. Python charts are nice, but I find Google Sheets generally quicker and more flexible for basic charts. For a case, you typically get 24-48 hours, and you might end up spending a lot of that time trying to make charts look "pretty". I found it's best to use Python for data aggregation/manipulation and a spreadsheet for the simpler charts (unless otherwise specified in the prompt).
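One way to hand an aggregated table from Python to a spreadsheet is to write it out as CSV and import that file into Google Sheets. The snippet below is a small sketch of that hand-off; the numbers are made up and the file name in the comment is hypothetical.

```python
import pandas as pd

# Made-up aggregated table standing in for the output of the Python step.
monthly = pd.DataFrame({
    "month": ["2017-01", "2017-02"],
    "revenue": [1200.50, 1850.00],
})

# With no path given, to_csv returns the CSV text as a string.
csv_text = monthly.to_csv(index=False)
print(csv_text)

# To save a file for import into a spreadsheet instead (hypothetical name):
# monthly.to_csv("monthly_summary.csv", index=False)
```

Passing `index=False` keeps the pandas row index out of the file, so the spreadsheet only sees the columns you intend to chart.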
Our thought process
Build a framework to answer the questions. If you’re not sure what the questions are, create questions for yourself to answer; it makes digging through the data much easier, because it’s hard to come up with an answer if you don’t know what the questions are. This point seems like a no-brainer, but it’s good to make sure you’ve created a structure that covers the key points. If the question at hand isn’t clear, you might need to play with the data a bit to understand what the problem seems to be. For this particular case the ask is really clear: we need to create a summary that includes the current state of the business, current customer satisfaction, and a proposal of 2-3 areas where the company can improve. Given the ask, the questions we came up with (and some answers) are listed below:
- What should be shown to reflect the current state of the business? (Probably something to do with product growth and amount of money)
  - Revenue → how much money are we making?
  - Volume of sales → how many orders are we getting?
  - Customer summary based on spend + behavior
- How should we present customer satisfaction?
  - Is customer satisfaction a problem? Do an initial analysis to figure it out.
  - Does customer satisfaction reflect our current business state? (e.g., if business is down, does customer satisfaction also go down? If business is boomin’, is customer satisfaction higher?)
- Should I focus on a sales area of improvement or a customer satisfaction area of improvement?
  - This depends on what bullet points 1 + 2 yield; it also depends on customer churn
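As a rough illustration of how the first two groups of questions could be turned into numbers, the sketch below builds one monthly table with revenue, order volume, and average review score. It is a sketch under assumptions rather than our notebook code: the rows are made up, and the column names (`order_purchase_timestamp`, `order_total`, `review_score`) are assumed.

```python
import pandas as pd

# Made-up inputs; column names are assumptions about the prepared data.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-02-03"]
    ),
})
order_totals = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "order_total": [75.0, 40.0, 60.0],
})
reviews = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "review_score": [5, 3, 4],
})

# Combine into one row per order. If an order can have several reviews,
# deduplicate reviews first, or the join will double-count revenue.
df = (
    orders.merge(order_totals, on="order_id", how="left")
    .merge(reviews, on="order_id", how="left")
)
df["month"] = df["order_purchase_timestamp"].dt.to_period("M").astype(str)

# Revenue, number of orders, and average review score per month.
monthly = df.groupby("month", as_index=False).agg(
    revenue=("order_total", "sum"),
    orders=("order_id", "nunique"),
    avg_review=("review_score", "mean"),
)

print(monthly)
```

Putting the satisfaction metric next to the sales metrics in one table makes it easy to eyeball whether the two move together over time.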
Questions don’t always have analytical answers; sometimes questions lead to more questions. This is of course a high-level framework, but the above should give you an idea of how we went about solving this! (The details are, of course, provided in the code/presentation material.)