coderefinery
diff --git a/content/10.DataFrame_Manipulation.md b/content/10.DataFrame_Manipulation.md
+223-50
@@ -1,24 +1,25 @@
-# DataFrame Manipulation & Sorting (25 mins)
-## Lecture Materials with Exercises
+# DataFrame Manipulation & Sorting
 
-### Introduction (2 minutes)
+## Introduction
 
-**Slide 1: DataFrame Manipulation & Sorting**
-- Now that we can import data, we need to reshape it for analysis
-- Most real-world datasets need significant manipulation before analysis
-- Selecting, adding, removing, and reordering data are fundamental skills
-- These operations build directly on our understanding of DataFrames as labeled, 2D structures
+**DataFrame Manipulation & Sorting:**
 
-**Talking Points:**
-- "In real-world data analysis, you'll spend about 80% of your time cleaning and manipulating data, and only 20% on actual analysis."
-- "The skills we're covering today form the backbone of data wrangling in Python."
-- "Think of these operations as transforming raw data into analysis-ready information."
+* Now that we can import data, we need to reshape it for analysis
+* Most real-world datasets need significant manipulation before analysis
+* Selecting, adding, removing, and reordering data are fundamental skills
+* These operations build directly on our understanding of DataFrames as labeled, 2D structures
 
----
+:::{discussion}
+
+* A common rule of thumb is that about 80% of the time in real-world data analysis goes to cleaning and manipulating data, and only 20% to the analysis itself
+* The skills we're covering today form the backbone of data wrangling in Python
+* Think of these operations as transforming raw data into analysis-ready information
 
-### 1. Column and Row Selection (7 minutes)
+:::
 
-**Slide 2: Different Ways to Select Data**
+## Column and Row Selection
+
+**Different Ways to Select Data:**
 
 | Selection Type | Purpose | Example Syntax |
 |----------------|---------|----------------|
@@ -29,7 +30,10 @@
 | Row and column | Get specific value(s) | `df.loc['index', 'column']` |
 | Slicing | Get ranges of data | `df.loc['idx1':'idx2', 'col1':'col2']` |
 
-**Code Example 1: Basic Selection**
+### Basic Selection
+
+:::{demo}
+
 ```python
 import pandas as pd
 import numpy as np
@@ -79,7 +83,79 @@ print("\n8. Selecting subset of rows and columns:")
 print(df.loc['emp001':'emp003', ['Name', 'Age', 'Salary']])
 ```
 
-**Code Example 2: Advanced Selection with Conditions**
+Output
+
+```none
+Original DataFrame:
+           Name  Age      City  Salary Department
+emp001    Alice   24  New York   65000         HR
+emp002      Bob   30    Boston   72000      Sales
+emp003  Charlie   35   Chicago   85000       Tech
+emp004    David   42   Seattle   92000       Tech
+emp005      Eva   28     Miami   70000    Finance
+
+1. Single column as Series:
+emp001    24
+emp002    30
+emp003    35
+emp004    42
+emp005    28
+Name: Age, dtype: int64
+
+2. Alternative syntax for columns without spaces:
+emp001    24
+emp002    30
+emp003    35
+emp004    42
+emp005    28
+Name: Age, dtype: int64
+
+3. Selecting multiple columns:
+           Name  Salary
+emp001    Alice   65000
+emp002      Bob   72000
+emp003  Charlie   85000
+emp004    David   92000
+emp005      Eva   70000
+
+4. Selecting row by index label:
+Name          Charlie
+Age                35
+City          Chicago
+Salary          85000
+Department       Tech
+Name: emp003, dtype: object
+
+5. Selecting row by position (third row):
+Name          Charlie
+Age                35
+City          Chicago
+Salary          85000
+Department       Tech
+Name: emp003, dtype: object
+
+6. Selecting multiple rows by position:
+           Name  Age     City  Salary Department
+emp002      Bob   30   Boston   72000      Sales
+emp003  Charlie   35  Chicago   85000       Tech
+emp004    David   42  Seattle   92000       Tech
+
+7. Selecting specific value (cell):
+72000
+
+8. Selecting subset of rows and columns:
+           Name  Age  Salary
+emp001    Alice   24   65000
+emp002      Bob   30   72000
+emp003  Charlie   35   85000
+```
+
+:::
+
+### Advanced Selection with Conditions
+
+:::{demo}
+
 ```python
 # Boolean selection - rows where Age > 30
 print("\n9. Boolean selection - employees over 30:")
@@ -98,15 +174,52 @@ print("\n12. Using .isin() - employees in HR or Finance:")
 print(df[df['Department'].isin(['HR', 'Finance'])])
 ```
 
-**Talking Points:**
-- "Notice that selecting a single column returns a Series, while selecting multiple columns maintains the DataFrame structure."
-- "The `.loc` accessor is used for label-based indexing, while `.iloc` is for position-based indexing."
-- "Boolean selection is incredibly powerful - it lets you filter data based on specific conditions."
-- "These selection methods can be combined in powerful ways to extract exactly the data you need."
+Output
 
-**Exercise 1: Selection Practice (2 minutes)**
+```none
+9. Boolean selection - employees over 30:
+           Name  Age     City  Salary Department
+emp003  Charlie   35  Chicago   85000       Tech
+emp004    David   42  Seattle   92000       Tech
+
+10. Multiple conditions - Tech department with salary > 80000:
+           Name  Age     City  Salary Department
+emp003  Charlie   35  Chicago   85000       Tech
+emp004    David   42  Seattle   92000       Tech
+
+11. Using query method - same condition:
+           Name  Age     City  Salary Department
+emp003  Charlie   35  Chicago   85000       Tech
+emp004    David   42  Seattle   92000       Tech
+
+12. Using .isin() - employees in HR or Finance:
+         Name  Age      City  Salary Department
+emp001  Alice   24  New York   65000         HR
+emp005    Eva   28     Miami   70000    Finance
+```
+
+:::
+
+:::{discussion}
+
+* Notice that selecting a single column returns a Series, while selecting multiple columns maintains the DataFrame structure
+* The `.loc` accessor is used for label-based indexing, while `.iloc` is for position-based indexing
+* Boolean selection is incredibly powerful - it lets you filter data based on specific conditions
+* These selection methods can be combined in powerful ways to extract exactly the data you need
+
+:::
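The points above can be checked with a short sketch. It rebuilds a trimmed copy of the employee table from the demo output (the `City` column is left out for brevity) and is meant as an illustration, not as part of the lesson's own demo code.

```python
import pandas as pd

# Trimmed stand-in for the lesson's employee table (values taken from the demo output)
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
        "Age": [24, 30, 35, 42, 28],
        "Salary": [65000, 72000, 85000, 92000, 70000],
        "Department": ["HR", "Sales", "Tech", "Tech", "Finance"],
    },
    index=["emp001", "emp002", "emp003", "emp004", "emp005"],
)

# Single brackets return a Series; double brackets keep a DataFrame
age_series = df["Age"]
age_frame = df[["Age"]]
print(type(age_series).__name__, type(age_frame).__name__)

# .loc slices by label and includes the end label;
# .iloc slices by position and excludes the end position
by_label = df.loc["emp002":"emp004"]
by_position = df.iloc[1:3]
print(len(by_label), len(by_position))

# Combine conditions with & or |, wrapping each condition in parentheses,
# then pick the columns to keep in the same .loc call
mask = (df["Department"] == "Tech") & (df["Salary"] > 80000)
tech_high_paid = df.loc[mask, ["Name", "Salary"]]
print(tech_high_paid)
```

Note the difference in slice endpoints: the label slice returns three rows, while the positional slice `1:3` returns two.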
+
+:::{exercise}
+
+**Selection Practice:**
+
+Using the `inventory` DataFrame defined below:
+
+* Select just the `Product_Name` and `Price` columns
+* Select all products that are in stock (`In_Stock` is `True`)
+* Select all electronics that cost less than 500
+* Select the 2nd and 3rd products using position-based indexing
 
-Have students execute:
 ```python
 # Create a dataset of product inventory
 products = {
@@ -119,7 +232,13 @@ products = {
     'Units': [15, 28, 0, 10, 45, 0]
 }
 inventory = pd.DataFrame(products)
+```
+
+:::
 
+:::{solution}
+
+```python
 # Tasks:
 # 1. Select just the Product_Name and Price columns
 names_prices = inventory[['Product_Name', 'Price']]
@@ -143,13 +262,13 @@ print("\nSecond and third products:")
 print(second_third)
 ```
 
-**Expected Learning Outcome:** Students should understand the different ways to select data from a DataFrame, including column selection, row selection by label and position, boolean filtering, and combinations of these methods.
+**Learning outcome:** You should now understand the different ways to select data from a DataFrame, including column selection, row selection by label and position, boolean filtering, and combinations of these methods.
 
----
+:::
 
-### 2. Adding and Removing Columns/Rows (6 minutes)
+## Adding and Removing Columns/Rows
 
-**Slide 3: Modifying DataFrame Structure**
+**Modifying DataFrame Structure:**
 
 | Operation | Method | Example |
 |-----------|--------|---------|
@@ -161,8 +280,25 @@ print(second_third)
 | Add row | Using loc | `df.loc['new_index'] = values` |
 | Add row | Using append/concat | `pd.concat([df, new_row])` |
 
-**Code Example 3: Adding and Removing Columns**
+### Adding and Removing Columns
+
+:::{demo}
+
 ```python
+# Recreate the employee dataset used in the selection demos
+# (assumes `import pandas as pd` has already been run)
+data = {
+    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
+    'Age': [24, 30, 35, 42, 28],
+    'City': ['New York', 'Boston', 'Chicago', 'Seattle', 'Miami'],
+    'Salary': [65000, 72000, 85000, 92000, 70000],
+    'Department': ['HR', 'Sales', 'Tech', 'Tech', 'Finance']
+}
+df = pd.DataFrame(data)
+df.index = ['emp001', 'emp002', 'emp003', 'emp004', 'emp005']  # Custom index
+
 # Continuing with our employee DataFrame
 print("Original DataFrame:")
 print(df)
@@ -201,7 +337,12 @@ print(df_minimal)
 # df.drop(['City', 'Active'], axis=1, inplace=True)
 ```
 
-**Code Example 4: Adding and Removing Rows**
+:::
+
+### Adding and Removing Rows
+
+:::{demo}
+
 ```python
 # 1. Removing a row by index label
 df_no_bob = df.drop('emp002')
@@ -232,16 +373,58 @@ print("\n4. DataFrame with another new employee:")
 print(df_newer)
 ```
 
-**Talking Points:**
-- "Adding columns is a common operation, especially when you need to create derived fields or features."
-- "Notice that we can add columns based on calculations from other columns - this is ideal for metrics and KPIs."
-- "The `drop()` function is powerful but doesn't modify the original DataFrame unless you specify `inplace=True`."
-- "Adding rows is less common but useful for simulation, testing, or creating summary rows."
-- "Always be careful with the `axis` parameter - `axis=0` is for rows, `axis=1` is for columns."
+:::
 
-**Exercise 2: Adding and Removing Data (2 minutes)**
+:::{discussion}
+
+* Adding columns is a common operation, especially when you need to create derived fields or features
+* Notice that we can add columns based on calculations from other columns - this is ideal for metrics and KPIs
+* The `drop()` function is powerful but doesn't modify the original DataFrame unless you specify `inplace=True`
+* Adding rows is less common but useful for simulation, testing, or creating summary rows
+* Always be careful with the `axis` parameter - `axis=0` is for rows, `axis=1` is for columns
+
+:::
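A minimal sketch of the `drop()` and `axis` behaviour described above. It uses a three-row stand-in table, and the 10% bonus rate is an assumption made only for illustration.

```python
import pandas as pd

# Small stand-in table; not the full lesson dataset
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Salary": [65000, 72000, 85000],
        "City": ["New York", "Boston", "Chicago"],
    },
    index=["emp001", "emp002", "emp003"],
)

# Derived column computed from an existing one (bonus rate is hypothetical)
df["Bonus"] = df["Salary"] * 0.10

# drop() returns a new DataFrame; df itself is left unchanged
without_city = df.drop("City", axis=1)   # axis=1 drops a column
without_bob = df.drop("emp002", axis=0)  # axis=0 drops a row

print("City" in df.columns, "City" in without_city.columns)
print(list(without_bob.index))
```

Assigning the result to a new name, as here, keeps the original available for comparison, which is often easier to reason about than `inplace=True`.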
+
+:::{exercise}
+
+**Adding and Removing Data:**
+
+```python
+products = {
+    'Product_ID': ['P001', 'P002', 'P003', 'P004', 'P005', 'P006'],
+    'Product_Name': ['Laptop', 'Smartphone', 'Tablet', 'Monitor', 'Keyboard', 'Mouse'],
+    'Category': ['Electronics', 'Electronics', 'Electronics', 
+                 'Electronics', 'Accessories', 'Accessories'],
+    'Price': [1200, 800, 350, 250, 75, 25],
+    'In_Stock': [True, True, False, True, True, False],
+    'Units': [15, 28, 0, 10, 45, 0]
+}
+inventory = pd.DataFrame(products)
+```
+
+Using the `inventory` DataFrame:
+
+* Add a `Value` column that multiplies `Price` by `Units`
+* Add a `Status` column: 'Available' if `In_Stock` is `True`, 'Out of Stock' otherwise
+* Remove the `In_Stock` column (now redundant with `Status`)
+* Add the `new_product` row defined below
+
+```python
+new_product = pd.Series({
+    'Product_ID': 'P007',
+    'Product_Name': 'Headphones',
+    'Category': 'Accessories',
+    'Price': 150,
+    'Units': 20,
+    'Value': 3000,
+    'Status': 'Available'
+})
+```
+
+:::
+
+:::{solution}
 
-Have students execute:
 ```python
 # Continue with the inventory DataFrame from Exercise 1
 print("Original inventory:")
@@ -263,25 +446,15 @@ print("\nInventory without In_Stock column:")
 print(inventory_updated)
 
 # 4. Add a new product row
-new_product = pd.Series({
-    'Product_ID': 'P007',
-    'Product_Name': 'Headphones',
-    'Category': 'Accessories',
-    'Price': 150,
-    'Units': 20,
-    'Value': 3000,
-    'Status': 'Available'
-})
 inventory_final = pd.concat([inventory_updated, pd.DataFrame([new_product])])
 print("\nInventory with new product:")
 print(inventory_final)
 ```
 
-**Expected Learning Outcome:** Students should understand how to add and remove columns and rows from a DataFrame using different methods, and how to create calculated columns based on existing data.
+:::
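One detail worth checking when adding a row with `pd.concat`: by default the new row keeps its own index label, so the result can contain a repeated label. The sketch below uses a two-row stand-in for `inventory` and shows the optional `ignore_index=True` argument, which renumbers the rows. The names `kept_labels` and `renumbered` are illustrative only.

```python
import pandas as pd

# Two-row stand-in for the exercise's inventory table
inventory = pd.DataFrame(
    {
        "Product_ID": ["P001", "P002"],
        "Product_Name": ["Laptop", "Smartphone"],
        "Price": [1200, 800],
    }
)
new_product = pd.Series(
    {"Product_ID": "P007", "Product_Name": "Headphones", "Price": 150}
)

# Wrapping the Series in a list turns it into a one-row DataFrame
new_row = pd.DataFrame([new_product])

# Default: original labels are kept, so label 0 appears twice
kept_labels = pd.concat([inventory, new_row])

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([inventory, new_row], ignore_index=True)

print(list(kept_labels.index))
print(list(renumbered.index))
```

Whether to renumber depends on the data: with a meaningful index such as employee IDs, keeping labels is usually what you want, while a default integer index is often better renumbered.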
 
----
 
-### 3. DataFrame Sorting (5 minutes)
+## DataFrame Sorting
 
 **Slide 4: Sorting Data**