solved #17

Open: wants to merge 2 commits into `main`
`.ipynb_checkpoints/readme-checkpoint.md`: 97 additions, 0 deletions
@@ -0,0 +1,97 @@
![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)

# Lab | Window Functions

<details>
<summary>
<h2>Learning Goals</h2>
</summary>

This lab allows you to practice and apply the concepts and techniques taught in class.

Upon completion of this lab, you will be able to:

- Use window functions to perform complex analytical queries and gain insights into data, including computing rolling calculations, ranking data, and performing aggregations over subsets of data.

<br>
<hr>

</details>
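
As an illustration of the "rolling calculations" goal above, here is a minimal sketch, assuming MySQL 8.0+ with the Sakila database loaded. It computes a running total and a three-row moving average of monthly payment revenue; the aliases `pay_month`, `running_revenue`, and `moving_avg_3_rows` are made up for this example.

```sql
-- Sketch: assumes MySQL 8.0+ with the Sakila sample database loaded.
WITH monthly_revenue AS (
    -- total payments received in each calendar month
    SELECT
        DATE_FORMAT(payment_date, '%Y-%m') AS pay_month,
        SUM(amount) AS revenue
    FROM payment
    GROUP BY pay_month
)
SELECT
    pay_month,
    revenue,
    -- running total: all months up to and including the current one
    SUM(revenue) OVER (ORDER BY pay_month) AS running_revenue,
    -- moving average over the current row and the two rows before it
    ROUND(AVG(revenue) OVER (
        ORDER BY pay_month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS moving_avg_3_rows
FROM monthly_revenue
ORDER BY pay_month;
```

Note that `ROWS BETWEEN` counts rows, not calendar months: a month with no payments is simply absent rather than treated as zero.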

<details>
<summary>
<h2>Prerequisites</h2>
</summary>

Before starting this lab, you should have learned about:

- SELECT, FROM, ORDER BY, LIMIT, WHERE, GROUP BY, and HAVING clauses; the DISTINCT and AS keywords.
- Built-in SQL functions such as COUNT, MAX, MIN, AVG, ROUND, DATEDIFF, or DATE_FORMAT.
- JOIN to combine data from multiple tables.
- Subqueries, Temporary Tables, Views, CTEs.
- Window Functions: RANK() OVER with PARTITION BY, LAG().

<br>
<hr>

</details>


## Introduction

Welcome to the Window Functions lab!

In this lab, you will be working with the [Sakila](https://dev.mysql.com/doc/sakila/en/) database on movie rentals. The goal of this lab is to help you practice and gain proficiency in using window functions in SQL queries.

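Window functions are a powerful tool for data analysis in SQL. A window function computes a value for each row from a set of related rows (its *window*), without collapsing those rows into one the way GROUP BY does. Calculations such as rankings, running totals, or a value from the previous row, which would otherwise require correlated subqueries or self-joins, can often be written as a single expression. This can make your SQL code shorter and easier to understand and maintain.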
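
To make the contrast with GROUP BY concrete, the sketch below (assuming MySQL 8.0+ and Sakila's `film` table) keeps every film row while attaching the average length of all films with the same rating. With GROUP BY you would need a grouped subquery joined back to `film` to get the same output. The window name `w` and the output aliases are illustrative.

```sql
-- Sketch: assumes MySQL 8.0+ with the Sakila sample database loaded.
SELECT
    title,
    rating,
    length,
    -- average over all films sharing this row's rating; the row itself is kept
    ROUND(AVG(length) OVER w, 1) AS avg_length_for_rating,
    -- how much longer or shorter this film is than its rating's average
    ROUND(length - AVG(length) OVER w, 1) AS diff_from_rating_avg
FROM film
WINDOW w AS (PARTITION BY rating)   -- named window reused by both expressions
ORDER BY rating, length DESC;
```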

By the end of this lab, you will have a better understanding of how to use window functions in SQL to perform complex data analysis, assign rankings, and retrieve previous row values. These skills will be useful in a variety of real-world scenarios, such as sales analysis, financial reporting, and trend analysis.

## Challenge 1

This challenge consists of three exercises that test your ability to use the SQL RANK() function: ranking films by length, ranking films by length within their rating category, and finding, for each film, the actor or actress in its cast who has acted in the greatest number of films.
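
Films often share the same length, so it helps to know how RANK() treats ties before starting. The sketch below, assuming MySQL 8.0+ with Sakila loaded, puts RANK(), DENSE_RANK(), and ROW_NUMBER() side by side: RANK() gives tied rows the same rank and skips the following ranks (1, 1, 3), DENSE_RANK() does not skip (1, 1, 2), and ROW_NUMBER() always numbers rows uniquely. The column aliases are illustrative.

```sql
-- Sketch: assumes MySQL 8.0+ with the Sakila sample database loaded.
SELECT
    title,
    length,
    RANK()       OVER (ORDER BY length DESC)        AS rank_with_gaps,
    DENSE_RANK() OVER (ORDER BY length DESC)        AS dense_rank_no_gaps,
    -- title breaks ties so the numbering is deterministic
    ROW_NUMBER() OVER (ORDER BY length DESC, title) AS row_num_unique
FROM film
WHERE length > 0            -- also excludes NULL lengths (NULL > 0 is not true)
ORDER BY length DESC, title
LIMIT 20;
```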

1. Rank films by their length and create an output table that includes the title, length, and rank columns only. Filter out any rows with null or zero values in the length column.

2. Rank films by length within the rating category and create an output table that includes the title, length, rating and rank columns only. Filter out any rows with null or zero values in the length column.

3. For each film in the Sakila database, produce a list that shows the actor or actress in its cast who has acted in the greatest number of films, along with the total number of films in which they have acted. *Hint: Use temporary tables, CTEs, or Views when appropriate to simplify your queries.*
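
Window functions cannot be used directly in a WHERE clause, because WHERE is evaluated before the window is computed. A common pattern is to compute the rank in a CTE and filter in the outer query. The sketch below, assuming MySQL 8.0+ with Sakila loaded, applies that pattern to return the longest film(s) in each rating; the CTE name `ranked_films` is illustrative.

```sql
-- Sketch: assumes MySQL 8.0+ with the Sakila sample database loaded.
WITH ranked_films AS (
    SELECT
        title,
        rating,
        length,
        RANK() OVER (PARTITION BY rating ORDER BY length DESC) AS length_rank
    FROM film
    WHERE length > 0
)
SELECT
    title,
    rating,
    length
FROM ranked_films
WHERE length_rank = 1       -- ties: every film sharing the top length is kept
ORDER BY rating, title;
```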

## Challenge 2

This challenge involves analyzing customer activity and retention in the Sakila database to gain insight into business performance.
By analyzing customer behavior over time, businesses can identify trends and make data-driven decisions to improve customer retention and increase revenue.

The goal of this exercise is to analyze customer activity and retention by calculating, for each month, the number of active customers, its percentage change from the previous month, and the number of retained customers. Use the Sakila database and build your queries progressively to reach the final result.

- Step 1. Retrieve the number of monthly active customers, i.e., the number of unique customers who rented a movie in each month.
- Step 2. Retrieve the number of active customers in the previous month.
- Step 3. Calculate the percentage change in the number of active customers between the current and previous month.
- Step 4. Calculate the number of retained customers every month, i.e., customers who rented movies in the current and previous months.
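
Steps 1 to 3 can be sketched with LAG(), assuming MySQL 8.0+ with Sakila loaded; the aliases and the window name `w` are illustrative. One caveat: LAG() returns the previous *row*, not the previous calendar month. Sakila's rentals jump from 2005-08 to 2006-02, so this version compares February 2006 with August 2005, whereas a calendar-based self-join would return NULL for that month.

```sql
-- Sketch: assumes MySQL 8.0+ with the Sakila sample database loaded.
WITH monthly_active AS (
    -- Step 1: unique renters per month
    SELECT
        DATE_FORMAT(rental_date, '%Y-%m') AS rental_month,
        COUNT(DISTINCT customer_id) AS active_customers
    FROM rental
    GROUP BY rental_month
)
SELECT
    rental_month,
    active_customers,
    -- Step 2: value from the previous row in month order (NULL for the first row)
    LAG(active_customers) OVER w AS prev_active_customers,
    -- Step 3: percentage change versus that previous row
    ROUND(
        (active_customers - LAG(active_customers) OVER w)
        / LAG(active_customers) OVER w * 100,
        2
    ) AS pct_change
FROM monthly_active
WINDOW w AS (ORDER BY rental_month)
ORDER BY rental_month;
```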

*Hint: Use temporary tables, CTEs, or Views when appropriate to simplify your queries.*
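
When choosing between these options for Step 4, keep in mind that MySQL does not let a `TEMPORARY` table be referred to more than once in the same query, so a self-join on one fails with a "Can't reopen table" error. A view or CTE has no such restriction. The sketch below assumes MySQL 8.0+ with Sakila loaded. The view name `customer_monthly_activity` is hypothetical and is used here only for illustration.

```sql
-- Sketch: assumes MySQL 8.0+ with Sakila; the view name is hypothetical.
CREATE OR REPLACE VIEW customer_monthly_activity AS
SELECT DISTINCT
    customer_id,
    DATE_FORMAT(rental_date, '%Y-%m') AS rental_month
FROM rental;

-- One row per customer per month, so COUNT(*) equals the active customers (Step 1).
SELECT
    rental_month,
    COUNT(*) AS active_customers
FROM customer_monthly_activity
GROUP BY rental_month
ORDER BY rental_month;
```

Because it is a view, `customer_monthly_activity` can then appear on both sides of the self-join needed for retained customers.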

## Requirements

- Fork this repo
- Clone it to your machine


## Getting Started

Complete the challenges in this README in a `.sql` file.

## Submission

- Upon completion, run the following commands:

```bash
git add .
git commit -m "Solved lab"
git push origin main
```

- Paste the link to your lab in the Student Portal.



`win-sql-lab.sql`: 148 additions, 0 deletions
@@ -0,0 +1,148 @@
USE sakila;
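-- Challenge 1.1: rank films by length (longest first), ignoring NULL/zero lengths
SELECT
    title,
    length,
    -- RANK is a reserved word in MySQL 8.0, so the alias is quoted as an identifier
    RANK() OVER (ORDER BY length DESC) AS `rank`
FROM
    film
WHERE
    length IS NOT NULL
    AND length > 0;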
-- Challenge 1.2: rank films by length within each rating category
SELECT
    title,
    length,
    rating,
    RANK() OVER (PARTITION BY rating ORDER BY length DESC) AS `rank`
FROM
    film
WHERE
    length IS NOT NULL
    AND length > 0;
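-- Challenge 1.3: for each film, the cast member(s) who acted in the most films overall
WITH ActorFilmCount AS (
    -- total number of films per actor
    SELECT
        actor_id,
        COUNT(film_id) AS film_count
    FROM
        film_actor
    GROUP BY
        actor_id
),
RankedCast AS (
    -- rank each film's cast by the actors' total film counts
    SELECT
        f.title,
        a.first_name,
        a.last_name,
        afc.film_count,
        RANK() OVER (PARTITION BY fa.film_id ORDER BY afc.film_count DESC) AS actor_rank
    FROM
        film f
    JOIN
        film_actor fa ON f.film_id = fa.film_id
    JOIN
        actor a ON fa.actor_id = a.actor_id
    JOIN
        ActorFilmCount afc ON a.actor_id = afc.actor_id
)
SELECT
    title,
    first_name,
    last_name,
    film_count
FROM
    RankedCast
WHERE
    actor_rank = 1  -- ties keep every top actor of the film
ORDER BY
    title;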
-- Challenge 2, Step 1: monthly active customers (unique renters per month)
SELECT
    DATE_FORMAT(rental_date, '%Y-%m') AS month,
    COUNT(DISTINCT customer_id) AS active_customers
FROM
    rental
GROUP BY
    month
ORDER BY
    month;
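-- Challenge 2, Step 2: active customers in the previous calendar month.
-- '-01' is appended so STR_TO_DATE gets a complete date: with '%Y-%m' alone the
-- day is 0, which MySQL's default SQL mode rejects (the result becomes NULL).
WITH MonthlyActiveCustomers AS (
    SELECT
        DATE_FORMAT(rental_date, '%Y-%m') AS month,
        COUNT(DISTINCT customer_id) AS active_customers
    FROM
        rental
    GROUP BY
        month
)

SELECT
    mac1.month,
    mac1.active_customers,
    mac2.active_customers AS prev_month_active_customers
FROM
    MonthlyActiveCustomers mac1
LEFT JOIN
    MonthlyActiveCustomers mac2
ON
    mac2.month = DATE_FORMAT(DATE_SUB(STR_TO_DATE(CONCAT(mac1.month, '-01'), '%Y-%m-%d'), INTERVAL 1 MONTH), '%Y-%m')
ORDER BY
    mac1.month;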
-- Challenge 2, Step 3: percentage change in active customers vs. the previous month
WITH MonthlyActiveCustomers AS (
    SELECT
        DATE_FORMAT(rental_date, '%Y-%m') AS month,
        COUNT(DISTINCT customer_id) AS active_customers
    FROM
        rental
    GROUP BY
        month
)

SELECT
    mac1.month,
    mac1.active_customers,
    mac2.active_customers AS prev_month_active_customers,
    -- NULL when there is no previous month to compare with
    ((mac1.active_customers - mac2.active_customers) / mac2.active_customers) * 100 AS percentage_change
FROM
    MonthlyActiveCustomers mac1
LEFT JOIN
    MonthlyActiveCustomers mac2
ON
    mac2.month = DATE_FORMAT(DATE_SUB(STR_TO_DATE(CONCAT(mac1.month, '-01'), '%Y-%m-%d'), INTERVAL 1 MONTH), '%Y-%m')
ORDER BY
    mac1.month;
-- Challenge 2, Step 4: retained customers (rented in both this month and the previous one)
WITH CustomerMonthlyActivity AS (
    -- one row per customer per active month
    SELECT
        customer_id,
        DATE_FORMAT(rental_date, '%Y-%m') AS month
    FROM
        rental
    GROUP BY
        customer_id, month
),

RetainedCustomers AS (
    SELECT
        curr.month AS current_month,
        prev.month AS previous_month,
        COUNT(DISTINCT curr.customer_id) AS retained_customers
    FROM
        CustomerMonthlyActivity curr
    JOIN
        CustomerMonthlyActivity prev
    ON
        curr.customer_id = prev.customer_id
        AND prev.month = DATE_FORMAT(DATE_SUB(STR_TO_DATE(CONCAT(curr.month, '-01'), '%Y-%m-%d'), INTERVAL 1 MONTH), '%Y-%m')
    GROUP BY
        curr.month, prev.month
)

SELECT
    current_month,
    retained_customers
FROM
    RetainedCustomers
ORDER BY
    current_month;
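
-- Step 4, alternative sketch (not part of the original solution): retained
-- customers with LAG() instead of a self-join. Assumes MySQL 8.0+; the CTE names
-- below are illustrative. LAG() returns each customer's previous active month,
-- and the customer counts as retained when that month is exactly one calendar
-- month before the current one.
WITH CustomerMonths AS (
    -- one row per customer per active month, as a DATE on the 1st of the month
    SELECT DISTINCT
        customer_id,
        CAST(DATE_FORMAT(rental_date, '%Y-%m-01') AS DATE) AS month_start
    FROM
        rental
),
CustomerMonthsWithPrev AS (
    SELECT
        customer_id,
        month_start,
        LAG(month_start) OVER (PARTITION BY customer_id ORDER BY month_start) AS prev_month_start
    FROM
        CustomerMonths
)
SELECT
    DATE_FORMAT(month_start, '%Y-%m') AS month,
    COUNT(*) AS retained_customers
FROM
    CustomerMonthsWithPrev
WHERE
    prev_month_start = month_start - INTERVAL 1 MONTH
GROUP BY
    month
ORDER BY
    month;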