coderefinery
diff --git a/content/2.NumPy_Data_Types.md b/content/2.NumPy_Data_Types.md (+54 -1)
diff --git a/content/3.Indexing_and_Slicing.md b/content/3.Indexing_and_Slicing.md (+21 -18)
diff --git a/content/4.Advance_indexing_filtering.md b/content/4.Advance_indexing_filtering.md (+74 -9)
diff --git a/content/5.Essential_array_operations.md b/content/5.Essential_array_operations.md (+97)
@@ -134,7 +134,7 @@ This homogeneity enables:
 
 For bioinformatics applications, this homogeneity helps ensure consistency when processing large datasets of gene expression values, sequence reads, or alignment scores.
 
-## Key NumPy Data Types for Bioinformatics
+## Key NumPy Data Types
 
 ### Integer Types
 
@@ -329,6 +329,8 @@ Different bioinformatics tools may expect specific data types:
 
 Being aware of these requirements helps create more robust analysis pipelines.
 
+## Key Takeaways
+
 :::{Keypoints}
 
 NumPy's specialized data types provide significant advantages for bioinformatics applications:
@@ -347,3 +349,54 @@ By choosing appropriate data types, bioinformaticians can:
 
 Understanding the distinctions between Python's general-purpose types and NumPy's specialized numeric types is essential for effective scientific programming in bioinformatics.
 :::
+
+## Hands-on
+
+:::{exercise} Hands-on
+
+```python
+# What is NumPy and why it's important for bioinformatics
+# Performance advantages over Python lists
+# Foundation for other scientific libraries
+
+import numpy as np
+
+# Read the CSV file into a numpy array
+## CSV file contains sample group information
+data = np.genfromtxt("test_data/Sample_group_info.csv", delimiter=',', dtype='str')
+
+# Print the numpy array information
+
+def print_array_info(array):
+    # Get the shape of the array
+    shape = array.shape
+
+    # Get the number of dimensions of the array
+    ndim = array.ndim
+
+    # Get the data type of the array
+    dtype = array.dtype
+
+    # Get the number of elements in the array
+    size = array.size
+
+    print(f"Shape: {shape} \nNumber of dimensions: {ndim} \nData type: {dtype} \nSize: {size}")
+
+
+print_array_info(data)
+
+# Read the CSV file into a numpy array with string dtype
+## CSV file contains RNA count matrix
+count_matrix = np.genfromtxt("test_data/count_matrix.csv", delimiter=',',
+                     dtype='str')
+print_array_info(count_matrix)
+
+# Remove sample names from the count matrix (cm) - Delete the first row
+## Convert the cm to a float32 array
+print(count_matrix[0:5, 0:5])
+print("___")
+cm = np.delete(count_matrix, 0, axis=0).astype("float32")
+print(cm[0:5, 0:5])
+```
+
+:::
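The load-then-convert step in the hands-on above can be tried without the lesson's data files. The sketch below is a minimal illustration, assuming a made-up three-sample CSV held in memory (`csv_text` is hypothetical and stands in for `test_data/count_matrix.csv`; no gene-ID column is assumed). It shows that reading with `dtype='str'` yields a string array, and that dropping the header row is what makes the `astype("float32")` conversion possible.

```python
import io

import numpy as np

# Hypothetical stand-in for test_data/count_matrix.csv:
# one header row of sample names, then rows of counts.
csv_text = "S1,S2,S3\n10,0,250\n3,42,7\n"

# Every cell, including the numbers, is read as text.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype="str")
print(raw.dtype, raw.shape)

# Drop the header row (axis=0, index 0), then convert the rest to float32.
counts = np.delete(raw, 0, axis=0).astype("float32")
print(counts.dtype, counts.shape)

# Compare the memory used by the text cells and by the numeric cells.
print(raw[1:].nbytes, counts.nbytes)
```

Calling `astype("float32")` on `raw` directly would raise a `ValueError`, because the sample names in the first row cannot be parsed as numbers.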
@@ -144,7 +144,7 @@ print(gene_expr[::2, :2])
 
 :::
 
-**Real-world significance in bioinformatics:**
+## Real-world significance in bioinformatics
 
 * Indexing:
   * Retrieving expression value for a specific gene in a specific condition
@@ -159,23 +159,6 @@ print(gene_expr[::2, :2])
   * Analyzing specific regions in protein contact maps
   * Extracting protein domains from structure coordinate arrays
 
-:::{Keypoints}
-
-* Efficient indexing and slicing are crucial for bioinformatics workflows
-* Key takeaways:
-  * Indexing for accessing individual elements
-  * Slicing for extracting regions of interest
-  * Leverage both for efficient data manipulation in matrices (gene × condition, position × sequence, etc.)
-  * Combine with boolean operations for filtering
-  * Remember zero-based indexing
-* Common pitfalls:
-  * Off-by-one errors (especially when converting between biology's 1-based and programming's 0-based systems)
-  * Overlooking the exclusive upper bound in slicing (end index is not included)
-  * Forgetting that modifying slices can modify the original array (use .copy() when needed)
-  * Confusing row-major vs. column-major operations
-
-:::
-
 ## Exercises - Array Indexing and Slicing Exercises
 
 :::{exercise}
@@ -373,3 +356,23 @@ print("Condition 2 values < 20:", gene_expr[:, 1][condition2_low])
 ```
 
 :::
+
+## Key Takeaways
+
+:::{Keypoints}
+
+* Efficient indexing and slicing are crucial for bioinformatics workflows
+* Key takeaways:
+  * Indexing for accessing individual elements
+  * Slicing for extracting regions of interest
+  * Leverage both for efficient data manipulation in matrices (gene × condition, position × sequence, etc.)
+  * Combine with boolean operations for filtering
+  * Remember zero-based indexing
+* Common pitfalls:
+  * Off-by-one errors (especially when converting between biology's 1-based and programming's 0-based systems)
+  * Overlooking the exclusive upper bound in slicing (end index is not included)
+  * Forgetting that modifying slices can modify the original array (use .copy() when needed)
+  * Confusing row-major vs. column-major operations
+
+:::
+
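Two of the pitfalls listed in the key points, the exclusive upper bound and slices that share memory with the original array, can be checked with a short sketch. The values below are made up for illustration; `gene_expr` here is a small stand-in array, not the lesson's data.

```python
import numpy as np

# Made-up matrix: 4 genes (rows) x 3 conditions (columns).
gene_expr = np.arange(12, dtype="float32").reshape(4, 3)

# The end index is excluded: rows 0 and 1 only.
first_two = gene_expr[0:2]

# A basic slice is a view onto the same memory; .copy() makes it independent.
view = gene_expr[:, 0]
safe = gene_expr[:, 0].copy()

view[0] = -1.0   # also changes gene_expr[0, 0]
safe[1] = -2.0   # leaves gene_expr untouched

print(first_two.shape)
print(gene_expr[:, 0])
print(np.shares_memory(view, gene_expr), np.shares_memory(safe, gene_expr))
```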
@@ -215,15 +215,6 @@ Boolean masking and `np.where()` operations are highly optimized in NumPy. They:
 
 For large datasets, these techniques are drastically faster than traditional iteration.
 
-:::{Keypoints}
-
-* Boolean masking provides an intuitive way to filter arrays based on conditions
-* `np.where()` in its single-argument form finds indices where conditions are true
-* `np.where(condition, x, y)` acts as a vectorized if-else statement
-* `np.isin()` lets us filter based on membership in a set of values
-
-:::
-
 ## Exercises: NumPy Boolean Masking and Advanced Filtering
 
 :::{exercise}
@@ -346,3 +337,77 @@ print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")
 ```
 
 :::
+
+## Key Takeaways
+
+:::{Keypoints}
+
+* Boolean masking provides an intuitive way to filter arrays based on conditions
+* `np.where()` in its single-argument form finds indices where conditions are true
+* `np.where(condition, x, y)` acts as a vectorized if-else statement
+* `np.isin()` lets us filter based on membership in a set of values
+
+:::
+
+## Hands-on
+
+:::{exercise} Hands-on
+
+```python
+
+import numpy as np
+
+# Read the CSV file into a numpy array
+data = np.genfromtxt("test_data/Sample_group_info.csv", delimiter=',', dtype='str')
+
+def print_array_info(array):
+    # Get the shape of the array
+    shape = array.shape
+    # Get the number of dimensions of the array
+    ndim = array.ndim
+    # Get the data type of the array
+    dtype = array.dtype
+    # Get the number of elements in the array
+    size = array.size
+    print(f"Shape: {shape} \nNumber of dimensions: {ndim} \nData type: {dtype} \nSize: {size}")
+
+# Find the row indices where the second column is 'iweak' (np.where returns a tuple of index arrays)
+iweak_index = np.where(data[:, 1] == 'iweak')
+print(iweak_index)
+print_array_info(iweak_index[0])
+
+# Find the row indices where the second column is 'iweak'
+## Assign the index array itself to iweak_index (not the tuple returned by np.where)
+iweak_index = np.where(data[:, 1] == 'iweak')[0]
+print_array_info(iweak_index)
+
+# Find the row indices where the second column is 'istrong'
+## Assign the index array itself to istrong_index (not the tuple returned by np.where)
+istrong_index = np.where(data[:, 1] == 'istrong')[0]
+print(istrong_index)
+print_array_info(istrong_index)
+
+# Load count matrix
+count_matrix = np.genfromtxt("test_data/count_matrix.csv", delimiter=',', dtype='str')
+
+# Preview the top-left 5x5 block of the count matrix (row 0 holds the sample names)
+print(count_matrix[0:5, 0:5])
+print("___")
+
+# Create a boolean mask marking the count-matrix columns whose sample name belongs to the 'iweak' group
+cm_iweak_mask = np.isin(count_matrix[0, :], data[iweak_index, 0])
+print(cm_iweak_mask[:30])
+
+# Find the indices of the columns in the count matrix where the sample group is 'iweak'
+cm_weak_cols = np.where(cm_iweak_mask)[0]
+print(cm_weak_cols)
+print_array_info(cm_weak_cols)
+
+# Find the indices of the columns in the count matrix where the sample group is 'istrong'
+cm_strong_cols = np.where(np.isin(count_matrix[0, :], data[istrong_index, 0]))[0]
+print(cm_strong_cols)
+print_array_info(cm_strong_cols)
+
+```
+
+:::
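The three calls summarised in the key points can be compared side by side on a tiny example. This is a sketch with made-up sample names and a hypothetical `header` array standing in for the first row of the count matrix; it is not the lesson's data.

```python
import numpy as np

# Made-up sample sheet: column 0 = sample name, column 1 = group label.
data = np.array([["S1", "iweak"], ["S2", "istrong"], ["S3", "iweak"]])

# Hypothetical column order of a count matrix header row.
header = np.array(["S3", "S1", "S2"])

# Single-argument form: a tuple with one index array per dimension.
rows = np.where(data[:, 1] == "iweak")
iweak_index = rows[0]

# Membership test: True where a header entry is one of the 'iweak' sample names.
mask = np.isin(header, data[iweak_index, 0])
cols = np.where(mask)[0]

# Three-argument form: a vectorised if-else over the mask.
labels = np.where(mask, "weak", "other")

print(type(rows).__name__, iweak_index)
print(mask, cols)
print(labels)
```

Indexing the tuple with `[0]` is what turns the `np.where` result into a plain index array that can be passed to `print_array_info` or used for fancy indexing.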
@@ -361,3 +361,100 @@ print(f"Row sums: {row_sums}")  # [6 15]
 ```
 
 :::
+
+:::{Keypoints}
+
+* **Reshaping Arrays:** Maintain the total number of elements when reshaping; use -1 for automatic dimension calculation.
+* **Concatenation of Arrays:** Combine arrays while matching dimensions, except along the concatenation axis.
+* **Statistical Functions:** Utilize NumPy’s statistical functions for data analysis, operating across different axes.
+* **Error Handling:** Be aware of shape requirements for concatenation to avoid errors.
+:::
+
+## Hands-on
+
+:::{exercise} Hands-on
+
+```python
+
+import numpy as np
+
+# Read the CSV file into a numpy array
+data = np.genfromtxt("test_data/Sample_group_info.csv", delimiter=',', dtype='str')
+
+def print_array_info(array):
+    # Get the shape of the array
+    shape = array.shape
+    # Get the number of dimensions of the array
+    ndim = array.ndim
+    # Get the data type of the array
+    dtype = array.dtype
+    # Get the number of elements in the array
+    size = array.size
+    print(f"Shape: {shape} \nNumber of dimensions: {ndim} \nData type: {dtype} \nSize: {size}")
+
+# Find the row indices where the second column is 'iweak' (np.where returns a tuple of index arrays)
+iweak_index = np.where(data[:, 1] == 'iweak')
+print(iweak_index)
+print_array_info(iweak_index[0])
+
+# Find the row indices where the second column is 'iweak'
+## Assign the index array itself to iweak_index (not the tuple returned by np.where)
+iweak_index = np.where(data[:, 1] == 'iweak')[0]
+print_array_info(iweak_index)
+
+# Find the row indices where the second column is 'istrong'
+## Assign the index array itself to istrong_index (not the tuple returned by np.where)
+istrong_index = np.where(data[:, 1] == 'istrong')[0]
+print(istrong_index)
+print_array_info(istrong_index)
+
+# Load count matrix
+count_matrix = np.genfromtxt("test_data/count_matrix.csv", delimiter=',', dtype='str')
+
+# Preview the top-left 5x5 block of the count matrix (row 0 holds the sample names)
+print(count_matrix[0:5, 0:5])
+print("___")
+
+# Create a boolean mask marking the count-matrix columns whose sample name belongs to the 'iweak' group
+cm_iweak_mask = np.isin(count_matrix[0, :], data[iweak_index, 0])
+print(cm_iweak_mask[:30])
+
+# Find the indices of the columns in the count matrix where the sample group is 'iweak'
+cm_weak_cols = np.where(cm_iweak_mask)[0]
+print(cm_weak_cols)
+print_array_info(cm_weak_cols)
+
+# Find the indices of the columns in the count matrix where the sample group is 'istrong'
+cm_strong_cols = np.where(np.isin(count_matrix[0, :], data[istrong_index, 0]))[0]
+print(cm_strong_cols)
+print_array_info(cm_strong_cols)
+
+# Remove sample names from the count matrix (cm) - Delete the first row
+## Convert the cm to a float32 array
+print(count_matrix[0:5, 0:5])
+print("___")
+cm = np.delete(count_matrix, 0, axis=0).astype("float32")
+print(cm[0:5, 0:5])
+
+# Convert cm to log scale
+cm = np.log2(cm + 1)
+print(cm)
+print_array_info(cm)
+
+# Calculate mean and STD of each gene in iweak samples
+iweak_mean = cm[:, cm_weak_cols].mean(axis=1)    ## Mean of iweak samples
+iweak_std = cm[:, cm_weak_cols].std(axis=1)      ## STD of iweak samples
+
+print(cm.shape)
+print("--------")
+print(iweak_mean[:5], iweak_mean.shape)
+print("--------")
+print(iweak_mean[:5, np.newaxis], iweak_mean[:, np.newaxis].shape)
+
+# Calculate mean and STD of each gene in istrong samples
+istrong_mean = cm[:, cm_strong_cols].mean(axis=1)  ## Mean of istrong samples
+istrong_std = cm[:, cm_strong_cols].std(axis=1)    ## STD of istrong samples
+
+```
+
+:::
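The per-gene statistics and the `np.newaxis` step in the hands-on above can be illustrated on a matrix small enough to check by hand. The sketch below uses made-up counts, and the centring step is an added illustration of why the extra axis is useful, not part of the lesson's code.

```python
import numpy as np

# Made-up counts: 2 genes (rows) x 3 samples (columns), log2(x + 1) transformed.
cm = np.log2(np.array([[0.0, 1.0, 3.0], [7.0, 7.0, 7.0]]) + 1)

# axis=1 collapses the sample axis, leaving one value per gene.
gene_mean = cm.mean(axis=1)
gene_std = cm.std(axis=1)   # population standard deviation (ddof=0 by default)

# Adding an axis turns shape (2,) into (2, 1), so the means broadcast across columns.
centered = cm - gene_mean[:, np.newaxis]

# reshape(-1, 1) builds the same column; -1 lets NumPy infer the row count.
same = gene_mean.reshape(-1, 1)

print(gene_mean, gene_mean.shape)
print(centered)
print(same.shape)
```

Without the added axis, `cm - gene_mean` would try to align a length-2 vector with the 3 sample columns and raise a broadcasting error.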