coderefinery
diff --git a/‎.DS_Store
0 Bytes b/‎.DS_Store
diff --git a/‎content/10.DataFrame_Manipulation.md
+1-4 b/‎content/10.DataFrame_Manipulation.md
diff --git a/‎content/11.Pandas_indexing_slicing.md
+2 b/‎content/11.Pandas_indexing_slicing.md
diff --git a/‎content/14.Pandas_summary_stats.md
+2 b/‎content/14.Pandas_summary_stats.md
diff --git a/‎content/4.Advance_indexing_filtering.md
+31 b/‎content/4.Advance_indexing_filtering.md
diff --git a/‎content/5.Essential_array_operations.md
+7 b/‎content/5.Essential_array_operations.md
diff --git a/‎content/6.Vectorized_Operations_in_NumPy.md
+73-16 b/‎content/6.Vectorized_Operations_in_NumPy.md
diff --git a/‎content/8.Introduction_to_pandas.md
+2 b/‎content/8.Introduction_to_pandas.md
diff --git a/‎content/image-12.png
93.5 KB b/‎content/image-12.png
diff --git a/‎content/image-13.png
11.6 KB b/‎content/image-13.png
diff --git a/‎content/image-14.png
36.3 KB b/‎content/image-14.png
diff --git a/‎content/image-15.png
106 KB b/‎content/image-15.png
diff --git a/‎content/image-16.png
1.58 MB b/‎content/image-16.png
diff --git a/‎content/image-17.png
1.51 MB b/‎content/image-17.png
@@ -588,7 +588,6 @@ print(inventory_final)
 
 :::
 
-
 ## DataFrame Sorting
 
 ### Sorting functions
@@ -600,7 +599,7 @@ print(inventory_final)
 | Multi-column sorting | Sort by multiple columns | `by=['col1', 'col2']` |
 | Custom sorting | Sort with custom orders | `by=col, key=function` |
 
-### Basic Sorting*
+### Basic Sorting
 
 * Sorting by a single column - ascending and descending
 * Sorting by index
@@ -867,9 +866,7 @@ print("\n3. Genes according to P-value and Effect Size within each Chromosome:")
 print(genetic_df.sort_values(by=["PValue","EffectSize", "Chromosome_Sorted"], ascending=[True, False, True])["Gene"])
 ```
 
-:::{discussion}
 The most significant findings are in the DNA repair genes BRCA1 and BRCA2 (p-values 0.0001 and 0.0005): BRCA1 carriers show 2.5 times higher odds and BRCA2 carriers 2.7 times higher odds of developing breast cancer. Other genes with significant but less pronounced associations include inflammatory pathway genes (TNF) and key tumor suppressor genes (TP53, PTEN). Together these highlight the multifactorial genetic architecture of breast cancer susceptibility, which spans DNA repair, cell cycle regulation, and inflammatory response pathways.
-:::
 
 :::
 
 
@@ -496,6 +496,8 @@ print(science_art_17)
 3. **Boolean Filtering:** Powerful way to extract data matching specific conditions
 :::
 
+## Homework
+
 :::{homework}
 
 ### `.loc`, `.iloc`, and `.at` Selection Methods
 
@@ -591,6 +591,8 @@ print(class_stats)
 5. **Reshaping Results:** Tools like unstack() and pivot_table() help transform grouped results into useful formats
 :::
 
+## Homework
+
 :::{homework}
 
 ### GroupBy Operations and the Split-Apply-Combine Pattern (7 minutes)
 
@@ -386,7 +386,18 @@ print_array_info(iweak_index)
 istrong_index = np.where(data[:, 1] == 'istrong')[0]
 print(istrong_index)
 print_array_info(istrong_index)
+```
+
+:::
 
+:::{solution} Visual representation
+**Visual representation - extracting `iweak` and `istrong` indices:**
+![alt text](image-12.png)
+:::
+
+:::{exercise} Hands-on
+
+```python
 # Load count matrix
 count_matrix = np.genfromtxt("test_data/count_matrix.csv", delimiter=',', dtype='str')
 
@@ -398,6 +409,19 @@ print("___")
 cm_iweak_mask = np.isin(count_matrix[0, :], data[iweak_index, 0])
 print(cm_iweak_mask[:30])
 
+```
+
+:::
+
+:::{solution} Visual representation
+
+**Visual representation - masking `iweak` in header:**
+![alt text](image-13.png)
+:::
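
The masking step can be tried on a tiny example before running it on the real files. The sketch below is only an illustration: the sample IDs, the `header` array, and the two-column metadata layout are made-up stand-ins (assumptions), not the contents of `test_data`.

```python
import numpy as np

# Hypothetical metadata: column 0 = sample ID, column 1 = group label
data = np.array([
    ["S1", "iweak"],
    ["S2", "istrong"],
    ["S3", "iweak"],
    ["S4", "istrong"],
])

# Hypothetical count-matrix header: first entry labels the gene column
header = np.array(["gene", "S3", "S1", "S4", "S2"])

# Row indices of the iweak samples in the metadata
iweak_index = np.where(data[:, 1] == "iweak")[0]

# Boolean mask: True where a header entry is one of the iweak sample IDs
iweak_mask = np.isin(header, data[iweak_index, 0])

# Column indices of the iweak samples in the header
iweak_cols = np.where(iweak_mask)[0]

print(iweak_index)  # [0 2]
print(iweak_mask)   # [False  True  True False False]
print(iweak_cols)   # [1 2]
```

`np.isin` ignores the order of the IDs, so the mask follows the header order even when the metadata lists the samples differently.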
+
+:::{exercise} Hands-on
+
+```python
 # Find the indices of the columns in the count matrix where the sample group is 'iweak'
 cm_weak_cols = np.where(cm_iweak_mask)[0]
 print(cm_weak_cols)
@@ -411,3 +435,10 @@ print_array_info(cm_strong_cols)
 ```
 
 :::
+
+:::{solution} Visual representation
+
+**Visual representation - extracting indices of `iweak` and `istrong` columns in count_matrix:**
+![alt text](image-14.png)
+
+:::
@@ -458,3 +458,10 @@ istrong_std = cm[:,cm_strong_cols].std(1)   ## STD of istrong samples
 ```
 
 :::
+
+:::{solution} Visual representation
+
+**Visual representation - Calculating mean and STD of each gene in `iweak` group:**
+![alt text](image-15.png)
+
+:::
@@ -691,47 +691,104 @@ iweak_std = cm[:, cm_weak_cols].std(1)      ## STD of iweak samples
 print(cm.shape)
 print("--------")
 print(iweak_mean[:5], iweak_mean.shape)
-print("--------")
-print(iweak_mean[:5, np.newaxis], iweak_mean[:, np.newaxis].shape)
 
 # Calculate mean and STD of each gene in istrong samples
 istrong_mean = cm[:,cm_strong_cols].mean(1) ## Mean of istrong disease samples
 istrong_std = cm[:,cm_strong_cols].std(1)   ## STD of istrong samples
 
+```
+
+**Z-scores:**
+
+* Gene expression measurements (counts) can have vastly different scales across different samples due to technical variations
+* The Z-score transformation standardizes these measurements
+
+```{math}
+    Z_{G} = \frac{(Count_G - \mu_{Count_{group}})}{\sigma_{Count_{group}}}
+```
+
+$$ Z_{G}: \text{Z-score for a gene } G $$
+$$ Count_G: \text{log10 count of gene } G \text{ in a given sample} $$
+$$ \mu_{Count_{group}}: \text{average across all samples in the given group, for each gene} $$
+$$ \sigma_{Count_{group}}: \text{standard deviation across all samples in the given group, for each gene} $$
+
+**Z-ratio = standardized Z-score difference (per gene):**
+
+* The Z-ratio provides a standardized measure of the difference between conditions for each gene
+* Dividing by the SD of the differences across all genes accounts for the overall variability in the experiment
+* A gene showing a difference of, say, 0.5 in average Z-score
+  * might be highly significant if most genes show very little difference (small SD of the Z-score differences),
+  * but not significant if many genes show large differences (large SD of the Z-score differences)
+* It puts the individual gene's change in the context of the overall experimental variation
+
+```{math}
+Z.score_{Diff_{gene}} = \bar{Z}_{Gene, istrong} - \bar{Z}_{Gene, iweak} \\
+
+Z_{Ratio, Gene} = \frac{Z.score_{Diff_{gene}}}{SD_{Z.score_{Diff_{gene}}}}
+```
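
A small worked example may help connect the two formulas to array operations. The sketch below uses made-up numbers and standardizes each sample across genes (`axis=0`). That choice is an assumption made for illustration, not the row-wise standardization used in the lesson code. One reason to look at it: when each gene is standardized with its own group mean and SD, that gene's average Z-score within the group is zero by construction, which the last lines check.

```python
import numpy as np

# Hypothetical log10 counts: 4 genes (rows) x 2 samples (columns) per group
weak = np.array([[1.0, 2.0],
                 [2.0, 3.0],
                 [3.0, 4.0],
                 [4.0, 5.0]])
strong = np.array([[4.0, 5.0],
                   [2.0, 3.0],
                   [3.0, 4.0],
                   [1.0, 2.0]])


def column_zscores(counts):
    """Standardize each sample (column) across all genes (assumed convention)."""
    return (counts - counts.mean(axis=0)) / counts.std(axis=0)


weak_z = column_zscores(weak)
strong_z = column_zscores(strong)

# Per-gene difference between the group averages of the Z-scores
diff_z = strong_z.mean(axis=1) - weak_z.mean(axis=1)

# Z-ratio: scale each difference by the SD of all differences
z_ratio = diff_z / diff_z.std()
print(z_ratio)

# Row-wise (per-gene) standardization: the per-gene mean Z-score is always 0
row_z = (weak - weak.mean(axis=1, keepdims=True)) / weak.std(axis=1, keepdims=True)
print(np.allclose(row_z.mean(axis=1), 0))  # True
```

With these numbers the first gene is higher in `strong`, the last gene is lower, and the two middle genes do not change, so their Z-ratios are zero.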
+
+```python
 # Calculate Z-scores of each gene in iweak samples (vectorized)
-cm_iweak_z = (cm[:, cm_weak_cols] - iweak_mean[:, np.newaxis]) / iweak_std[:, np.newaxis]
+cm_iweak_z = (cm[:, cm_weak_cols] - iweak_mean.reshape(-1, 1)) / iweak_std.reshape(-1, 1)
 print_array_info(cm_iweak_z)
 
 # Calculate stats for each gene in istrong samples
 istrong_mean = cm[:,cm_strong_cols].mean(1)
 istrong_std = cm[:,cm_strong_cols].std(1)
 
 # Calculate Z-scores of each gene in istrong samples (vectorized)
-cm_istrong_z = (cm[:, cm_strong_cols] - istrong_mean[:, np.newaxis]) / istrong_std[:, np.newaxis]
+cm_istrong_z = (cm[:, cm_strong_cols] - istrong_mean.reshape(-1, 1)) / istrong_std.reshape(-1, 1)
 print_array_info(cm_istrong_z)
 
-# Calculate Z-Ratio
-## Calculate difference between the averages of the observed gene Z scores of the two groups
-## Divide by the SD of all of the differences for that particular comparison
+# Calculate the Z-score difference between the two groups:
+#   1. difference between the per-gene average Z-scores of the two groups
+#   2. SD of those differences across all genes
+
 diff_z_scores = cm_istrong_z.mean(1) - cm_iweak_z.mean(1)
 std_diff = diff_z_scores.std()
 
 ### z-score ratio for each gene
+## Divide the Z-score differences by their SD
 z_score_ratios = diff_z_scores / std_diff
 print_array_info(z_score_ratios)
 print(z_score_ratios[:10])
+```
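
The change from `[:, np.newaxis]` to `.reshape(-1, 1)` in the Z-score lines does not alter the result: both turn a 1-D array of per-gene statistics into a column vector that broadcasts across the sample columns. A minimal sketch with a made-up matrix (the values are illustrative only) shows this, along with `keepdims=True` as a third option:

```python
import numpy as np

# Hypothetical matrix: 2 genes (rows) x 3 samples (columns)
cm = np.array([[1.0, 2.0, 3.0],
               [10.0, 20.0, 30.0]])

row_mean = cm.mean(axis=1)   # shape (2,)
row_std = cm.std(axis=1)     # shape (2,)

# Three equivalent ways to obtain column vectors of shape (2, 1)
z_reshape = (cm - row_mean.reshape(-1, 1)) / row_std.reshape(-1, 1)
z_newaxis = (cm - row_mean[:, np.newaxis]) / row_std[:, np.newaxis]
z_keepdims = (cm - cm.mean(axis=1, keepdims=True)) / cm.std(axis=1, keepdims=True)

print(row_mean.shape, row_mean.reshape(-1, 1).shape)  # (2,) (2, 1)
print(np.allclose(z_reshape, z_newaxis))   # True
print(np.allclose(z_reshape, z_keepdims))  # True
```

Without the extra axis, subtracting a shape `(2,)` array from a shape `(2, 3)` matrix raises a broadcasting error, because NumPy aligns shapes from the last dimension.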
 
-## Rank genes according to the Z score ratio
+:::
 
-### """ 
-# The Z-ratio provides a standardized measure of the difference between conditions for each gene. 
-# Dividing by the SD (difference - all genes) accounts for the overall variability in the experiment.
-# A gene showing a difference of, say, 0.5 in average Z-score might be highly significant if most genes show very little difference (small Z-score difference - SD), but not significant if many genes show large differences (large Z-score difference - SD).
-# It puts the individual gene's change in the context of the overall experimental variation.
-### """
+:::{solution} Visual representation
+
+```{math}
+Z_{G} = \frac{(Count_G - \mu_{Count_{group}})}{\sigma_{Count_{group}}}
+```
+
+**Visual representation - converting count matrix to z-score matrix:**
+![alt text](image-16.png)
+
+```{math}
+
+Z.score_{Diff_{gene}} = \bar{Z}_{Gene, istrong} - \bar{Z}_{Gene, iweak} \\
+
+Z_{Ratio, Gene} = \frac{Z.score_{Diff_{gene}}}{SD_{Z.score_{Diff_{gene}}}}
+```
 
-### Sort z_score_ratio in descending order and access indices
-### Rank genes using indices
+**Visual representation - calculating the Z-ratio:**
+
+![alt text](image-17.png)
+
+:::
+
+:::{exercise} Hands-on
+
+**Rank genes according to the Z score ratio:**
+
+* Sort `z_score_ratios` in descending order and access indices
+* Rank genes using indices
+
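The two steps above can be sketched with `np.argsort`, which returns the indices that would sort an array in ascending order; reversing them gives descending order. The gene names and ratio values below are hypothetical placeholders, not results from the lesson data.

```python
import numpy as np

# Hypothetical gene names and Z-ratios, for illustration only
genes = np.array(["GENE_A", "GENE_B", "GENE_C", "GENE_D"])
ratios = np.array([0.3, -1.2, 2.1, 0.9])

# Indices that sort the ratios in descending order
order = np.argsort(ratios)[::-1]

# Use the indices to rank the genes
ranked_genes = genes[order]

print(order)         # [2 3 0 1]
print(ranked_genes)  # ['GENE_C' 'GENE_D' 'GENE_A' 'GENE_B']
```

Converting a plain Python list of gene names to a NumPy array first is what allows indexing it with the `order` array.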
+```python
 
 gene_list = ["ACTR3B", "ANLN", "APOBEC3G", "AURKA", "BAG1", "BCL2", "BIRC5", "BLVRA", "CCL5", "CCNB1", "CCNE1", "CCR2", "CD2", "CD27", "CD3D", "CD52", "CD68", "CDC20", "CDC6", "CDH3", "CENPF", "CEP55", "CORO1A", "CTSL2", "CXCL9", "CXXC5", "EGFR", "ERBB2", "ESR1", "EXO1", "FGFR4", "FOXA1", "FOXC1", "GAPDH", "GPR160", "GRB7", "GSTM1", "GUSB", "GZMA", "GZMK", "HLA-DMA", "IL2RG", "KIF2C", "KRT14", "KRT17", "KRT5", "LCK", "MAPT", "MDM2", "MELK", "MIA", "MKI67", "MLPH", "MMP11", "MRPL19", "MYBL2", "MYC", "NAT1", "NDC80", "NUF2", "ORC6", "PGR", "PHGDH", "PRKCB", "PSMC4", "PTPRC", "PTTG1", "RRM2", "SCUBE2", "SF3A1", "SFRP1", "SH2D1A", "SLC39A6", "TFRC", "TMEM45B", "TP53", "TYMS", "UBE2C", "UBE2T", "VEGFA"]
 
 
@@ -435,6 +435,8 @@ print(f"Series type: {type(q2_series)}")
 
 :::
 
+## Homework
+
 :::{homework}
 
 **Creating pandas DataFrames from list of dictionaries:**