This text file was sent from a SQL assignment for a data analyst role. The text file includes an explanation of the sqlite database that was sent and the questions that need to be answered.
This question will test some basic knowledge of SQL. The first four questions should not take more than an hour to complete.
There are three tables involved in this question: transactions
, segments
and
products
, which simulate a simplified retail data schema. Here is a semantic
description of the tables:
transactions
: contains detailed information about each product a customer has purchased. A transaction consists of one or more products purchased by a customer indexed by a unique transaction id.trans_id
: the transaction idcust_id
: the customer idprod_id
: the product iditem_qty
: the quantity of the product that is being purchaseditem_price
: the per unit price of the product (NOTE: the total revenue for a product isitem_qty * item_price
)
products
: contains detailed attributes about each product.prod_id
: the product id (same meaning as intransactions
)prod_name
: the product namebrand
: the brand of the productcategory
: the category of the product
segments
: contains a history of customer segmentation labelling for each customer. Segments are computed periodically for all current customers and appended to the table after each computation. The current (most up to date) active segment for each customer is specified byactive_flag = 'Y'
column.cust_id
: the customer id (same meaning as intransactions
)seg_name
: the segment of this customerupdate_dt
: the date when this segment was updatedactive_flag
: whether or not this segment is the active segment for this customer
Included is a sample SQLite3 database for the above tables named sample.db
.
Please refer to https://www.sqlite.org/download.html to download command-line
tools so you can run your queries against the sample data.
Please provide the SQL for each one of these questions. If there is not enough information given, please make a reasonable assumption and we can discuss it in the in-person interview. Please document any assumptions you’ve made in the comments of each solution and include any additional queries you ran to validate that they are reasonable.
-
Find the current active segment for each customer sorted by the segment update date. The output should contain three columns:
cust_id
,seg_name
,updated_at
. Here is some sample output:cust_id seg_name updated_at 4402 LAPSED 2014-06-01 00:00:00 11248 ONE-OFFS 2015-10-01 00:00:00
-
For each product purchased between Jan 2016 and May 2016 (inclusive), find the number of distinct transactions. The output should contain
prod_id
,prod_name
and distinct transaction columns. Here is some sample output:prod_id prod_name count 199922 Product 199922 1 207344 Product 207344 1 209732 Product 209732 1
-
Find the most recent segment of each customer as of 2016-03-01. Hint: You cannot simply use
active_flag
since that is as of the current date not 2016-03-01. The output should contain thecust_id
,seg_name
andupdate_at
columns and should have at most one row per customer. Here is some sample output:cust_id seg_name update_at 4402 ONE-OFFS 2016-02-01 00:00:00 11248 LOYAL 2016-02-01 00:00:00 126169 ONE-OFFS 2015-03-01 00:00:00
-
Find the most popular category (by revenue) for each active segment. Hint: The current (most up to date) active segment is specified by
active_flag = 'Y'
column in the segments table. Here is the some sample output:seg_name category revenue INFREQUENT Women 20264
-
Use the current sample database to find insights. Please document your steps, include any code/documents you used (Excel, SQL, Python, etc), and have the final results in a Powerpoint format (limit 3 slides) Hint: This is open ended, but you could look at sales trend, category insights, customers insights