Commit 9eafb15

Author: pubudu
Commit message: Visual representation
1 parent 2982557 commit 9eafb15

14 files changed (+118 / -20 lines)

.DS_Store

0 Bytes
Binary file not shown.

content/10.DataFrame_Manipulation.md

+1-4
@@ -588,7 +588,6 @@ print(inventory_final)
 
 :::
 
-
 ## DataFrame Sorting
 
 ### Sorting functions
@@ -600,7 +599,7 @@ print(inventory_final)
 | Multi-column sorting | Sort by multiple columns | `by=['col1', 'col2']` |
 | Custom sorting | Sort with custom orders | `by=col, key=function` |
 
-### Basic Sorting*
+### Basic Sorting
 
 * Sorting by a single column - ascending and descending
 * Sorting by index
@@ -867,9 +866,7 @@ print("\n3. Genes according to P-value and Effect Size within each Chromosome:")
 print(genetic_df.sort_values(by=["PValue","EffectSize", "Chromosome_Sorted"], ascending=[True, False, True])["Gene"])
 ```
 
-:::{discussion}
 The most significant findings are in the DNA repair genes BRCA1 and BRCA2 (p-values 0.0001 and 0.0005), with BRCA1 carriers showing 2.5 times higher odds and BRCA2 carriers showing 2.7 times higher odds of developing breast cancer. Other genes showing significant but less pronounced associations include inflammatory pathway genes (TNF) and key tumor suppressor genes (TP53, PTEN), highlighting the multifactorial genetic architecture of breast cancer susceptibility that spans DNA repair, cell cycle regulation, and inflammatory response pathways.
-:::
 
 :::
 
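The sorting options touched by these hunks (single-column sorts in either direction, multi-column sorts with a per-column `ascending` list, `key=` for custom orders, and sorting by index) can be sketched on a small table. The column names echo the lesson's `genetic_df` example, but the rows below are hypothetical values for illustration only, and the `chr`-prefixed chromosome labels parsed inside `key=` are an assumption about how such labels might look.

```python
import pandas as pd

# Hypothetical toy table (not the lesson's genetic_df data)
df = pd.DataFrame({
    "Gene": ["G1", "G2", "G3", "G4"],
    "Chromosome": ["chr2", "chr10", "chr3", "chr1"],
    "PValue": [0.01, 0.001, 0.01, 0.05],
    "EffectSize": [1.2, 2.0, 1.8, 0.9],
})

# Single column: ascending is the default; ascending=False reverses it
by_p = df.sort_values(by="PValue")
by_effect_desc = df.sort_values(by="EffectSize", ascending=False)

# Multiple columns, one direction per column:
# smallest PValue first, ties broken by the largest EffectSize
multi = df.sort_values(by=["PValue", "EffectSize"], ascending=[True, False])

# A plain string sort puts "chr10" before "chr2"
lexical = df.sort_values(by="Chromosome")

# key= receives the sort column as a Series and must return a same-length Series;
# stripping the "chr" prefix and casting to int gives a numeric chromosome order
natural = df.sort_values(
    by="Chromosome",
    key=lambda s: s.str.replace("chr", "", regex=False).astype(int),
)

# Sorting by index restores the original row order
restored = multi.sort_index()

print(multi["Gene"].tolist())
print(lexical["Gene"].tolist())
print(natural["Gene"].tolist())
```

Converting labels inside `key=` sorts them numerically without adding a helper column to the DataFrame; the original values in `Chromosome` are left untouched.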
content/11.Pandas_indexing_slicing.md

+2
@@ -496,6 +496,8 @@ print(science_art_17)
 3. **Boolean Filtering:** Powerful way to extract data matching specific conditions
 :::
 
+## Homework
+
 :::{homework}
 
 ### `.loc`, `.iloc`, and `.at` Selection Methods

content/14.Pandas_summary_stats.md

+2
@@ -591,6 +591,8 @@ print(class_stats)
 5. **Reshaping Results:** Tools like unstack() and pivot_table() help transform grouped results into useful formats
 :::
 
+## Homework
+
 :::{homework}
 
 ### GroupBy Operations and the Split-Apply-Combine Pattern (7 minutes)

content/4.Advance_indexing_filtering.md

+31
@@ -386,7 +386,18 @@ print_array_info(iweak_index)
 istrong_index = np.where(data[:, 1] == 'istrong')[0]
 print(istrong_index)
 print_array_info(istrong_index)
+```
+
+:::
 
+:::{solution} Visual representation
+**Visual representation - extracting `iweak` and `istrong` indices:**
+![Extracting iweak and istrong indices](image-12.png)
+:::
+
+:::{exercise} Hands-on
+
+```python
 # Load count matrix
 count_matrix = np.genfromtxt("test_data/count_matrix.csv", delimiter=',', dtype='str')
 
@@ -398,6 +409,19 @@ print("___")
 cm_iweak_mask = np.isin(count_matrix[0, :], data[iweak_index, 0])
 print(cm_iweak_mask[:30])
 
+```
+
+:::
+
+:::{solution} Visual representation
+
+**Visual representation - masking `iweak` in header:**
+![Masking iweak in the count matrix header](image-13.png)
+:::
+
+:::{exercise} Hands-on
+
+```python
 # Find the indices of the columns in the count matrix where the sample group is 'iweak'
 cm_weak_cols = np.where(cm_iweak_mask)[0]
 print(cm_weak_cols)
@@ -411,3 +435,10 @@ print_array_info(cm_strong_cols)
 ```
 
 :::
+
+:::{solution} Visual representation
+
+**Visual representation - extracting indices of `iweak` and `istrong` columns in count_matrix:**
+![Extracting indices of iweak and istrong columns in count_matrix](image-14.png)
+
+:::
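The indexing chain in these hunks (`np.where` to find a group's rows, `np.isin` to flag matching header entries, then `np.where` again to turn the mask into column indices) can be sketched on toy arrays. The layout assumed here, a sample sheet `data` with sample IDs in column 0 and group labels in column 1, and a string `count_matrix` whose row 0 is the header, is inferred from the indexing in the diff (`data[:, 1]`, `data[iweak_index, 0]`, `count_matrix[0, :]`); the sample IDs and counts are hypothetical.

```python
import numpy as np

# Hypothetical sample sheet: column 0 = sample ID, column 1 = group label
data = np.array([
    ["S1", "iweak"],
    ["S2", "istrong"],
    ["S3", "iweak"],
    ["S4", "istrong"],
])

# Hypothetical count matrix read as strings: row 0 is the header
count_matrix = np.array([
    ["gene", "S3", "S2", "S1", "S4"],
    ["G1", "5", "7", "3", "9"],
    ["G2", "1", "0", "2", "4"],
])

# Row indices of each group in the sample sheet
iweak_index = np.where(data[:, 1] == "iweak")[0]
istrong_index = np.where(data[:, 1] == "istrong")[0]

# Boolean masks over the header: True where the column's sample ID belongs to the group
cm_iweak_mask = np.isin(count_matrix[0, :], data[iweak_index, 0])
cm_istrong_mask = np.isin(count_matrix[0, :], data[istrong_index, 0])

# Convert the masks into column indices usable for slicing the count matrix
cm_weak_cols = np.where(cm_iweak_mask)[0]
cm_strong_cols = np.where(cm_istrong_mask)[0]

print(iweak_index, istrong_index)
print(cm_weak_cols, cm_strong_cols)
```

Because the header order (`S3, S2, S1, S4`) differs from the sample-sheet order in this toy example, matching on the header with `np.isin` is what keeps the column indices aligned with the count matrix; reusing `iweak_index` directly as column indices would select the wrong columns.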

content/5.Essential_array_operations.md

+7
@@ -458,3 +458,10 @@ istrong_std = cm[:,cm_strong_cols].std(1) ## STD of istrong samples
 ```
 
 :::
+
+:::{solution} Visual representation
+
+**Visual representation - calculating mean and STD of each gene in `iweak` group:**
+![Mean and STD of each gene in the iweak group](image-15.png)
+
+:::
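Per-gene group statistics such as `cm[:, cm_strong_cols].std(1)` reduce along axis 1, across the selected sample columns, leaving one value per gene. A minimal sketch with a hypothetical numeric `cm` (genes as rows, samples as columns) and hypothetical `iweak` column indices:

```python
import numpy as np

# Hypothetical numeric count matrix: rows = genes, columns = samples
cm = np.array([
    [1.0, 2.0, 3.0, 10.0],
    [4.0, 4.0, 4.0, 0.0],
    [0.0, 2.0, 4.0, 6.0],
])
cm_weak_cols = np.array([0, 1, 2])  # hypothetical iweak sample columns

# axis=1: reduce across the selected sample columns, one value per gene (row)
iweak_mean = cm[:, cm_weak_cols].mean(1)
iweak_std = cm[:, cm_weak_cols].std(1)                # population SD (ddof=0, NumPy's default)
iweak_sd_sample = cm[:, cm_weak_cols].std(1, ddof=1)  # sample SD, for comparison

print(iweak_mean, iweak_mean.shape)
print(iweak_std)
print(iweak_sd_sample)
```

NumPy's `std` defaults to the population SD (`ddof=0`), whereas pandas' `DataFrame.std` defaults to the sample SD (`ddof=1`), so results can differ slightly between the two libraries. A gene whose values are identical within a group (the second row here) has SD 0, which makes its Z-scores NaN (0/0) in a later standardization step.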

content/6.Vectorized_Operations_in_NumPy.md

+73-16
@@ -691,47 +691,104 @@ iweak_std = cm[:, cm_weak_cols].std(1) ## STD of iweak samples
 print(cm.shape)
 print("--------")
 print(iweak_mean[:5], iweak_mean.shape)
-print("--------")
-print(iweak_mean[:5, np.newaxis], iweak_mean[:, np.newaxis].shape)
 
 # Calculate mean and STD of each gene in istrong samples
 istrong_mean = cm[:,cm_strong_cols].mean(1) ## Mean of istrong disease samples
 istrong_std = cm[:,cm_strong_cols].std(1) ## STD of istrong samples
 
+```
+
+**Z-scores:**
+
+* Gene expression measurements (counts) can have vastly different scales across samples due to technical variation
+* The Z-score transformation standardizes these measurements
+
+```{math}
+Z_{G} = \frac{Count_G - \mu_{Count_{group}}}{\sigma_{Count_{group}}}
+```
+
+$$ Z_{G}: \text{Z-score for gene } G $$
+$$ Count_G: \text{log10 count of gene } G \text{ in a given sample} $$
+$$ \mu_{Count_{group}}: \text{mean across all samples in the given group, for each gene} $$
+$$ \sigma_{Count_{group}}: \text{standard deviation across all samples in the given group, for each gene} $$
+
+**Z-ratio = Z-score difference (per gene):**
+
+* The Z-ratio provides a standardized measure of the difference between conditions for each gene
+* Dividing by the SD of the differences across all genes accounts for the overall variability in the experiment
+* A gene showing a difference of, say, 0.5 in average Z-score
+  might be highly significant if most genes show very little difference (small SD of the Z-score differences),
+  but not significant if many genes show large differences (large SD of the Z-score differences)
+* It puts the individual gene's change in the context of the overall experimental variation
+
+```{math}
+Z.score_{Diff_{gene}} = \bar{Z}_{Gene, istrong} - \bar{Z}_{Gene, iweak} \\
+
+Z_{Ratio, Gene} = \frac{Z.score_{Diff_{gene}}}{SD_{Z.score_{Diff_{gene}}}}
+```
+
+```python
 # Calculate Z-scores of each gene in iweak samples (vectorized)
-cm_iweak_z = (cm[:, cm_weak_cols] - iweak_mean[:, np.newaxis]) / iweak_std[:, np.newaxis]
+cm_iweak_z = (cm[:, cm_weak_cols] - iweak_mean.reshape(-1, 1)) / iweak_std.reshape(-1, 1)
 print_array_info(cm_iweak_z)
 
 # Calculate stats for each gene in istrong samples
 istrong_mean = cm[:,cm_strong_cols].mean(1)
 istrong_std = cm[:,cm_strong_cols].std(1)
 
 # Calculate Z-scores of each gene in istrong samples (vectorized)
-cm_istrong_z = (cm[:, cm_strong_cols] - istrong_mean[:, np.newaxis]) / istrong_std[:, np.newaxis]
+cm_istrong_z = (cm[:, cm_strong_cols] - istrong_mean.reshape(-1, 1)) / istrong_std.reshape(-1, 1)
 print_array_info(cm_istrong_z)
 
-# Calculate Z-Ratio
-## Calculate difference between the averages of the observed gene Z scores of the two groups
-## Divide by the SD of all of the differences for that particular comparison
+# Calculate the Z-ratio between the two groups
+# Step 1: for each gene, subtract the average iweak Z-score
+#         from the average istrong Z-score
+# Step 2: take the SD of those differences across all genes
+# Step 3: divide each gene's difference by that SD
+# (steps 1 and 2 here, step 3 below)
+
 diff_z_scores = cm_istrong_z.mean(1) - cm_iweak_z.mean(1)
 std_diff = diff_z_scores.std()
 
 ### z-score ratio for each gene
+## Divide each gene's Z-score difference by the SD of all differences
 z_score_ratios = diff_z_scores / std_diff
 print_array_info(z_score_ratios)
 print(z_score_ratios[:10])
+```
 
-## Rank genes according to the Z score ratio
+:::
 
-### """
-# The Z-ratio provides a standardized measure of the difference between conditions for each gene.
-# Dividing by the SD (difference - all genes) accounts for the overall variability in the experiment.
-# A gene showing a difference of, say, 0.5 in average Z-score might be highly significant if most genes show very little difference (small Z-score difference - SD), but not significant if many genes show large differences (large Z-score difference - SD).
-# It puts the individual gene's change in the context of the overall experimental variation.
-### """
+:::{solution} Visual representation
+
+```{math}
+Z_{G} = \frac{Count_G - \mu_{Count_{group}}}{\sigma_{Count_{group}}}
+```
+
+**Visual representation - converting count matrix to z-score matrix:**
+![Converting the count matrix to a Z-score matrix](image-16.png)
+
+```{math}
+
+Z.score_{Diff_{gene}} = \bar{Z}_{Gene, istrong} - \bar{Z}_{Gene, iweak} \\
+
+Z_{Ratio, Gene} = \frac{Z.score_{Diff_{gene}}}{SD_{Z.score_{Diff_{gene}}}}
+```
 
-### Sort z_score_ratio in descending order and access indices
-### Rank genes using indices
+**Visual representation - calculating the Z-ratio:**
+
+![Calculating the Z-ratio](image-17.png)
+
+:::
+
+:::{exercise} Hands-on
+
+**Rank genes according to the Z-score ratio:**
+
+* Sort `z_score_ratios` in descending order and access the indices
+* Rank genes using those indices
+
+```python
 
 gene_list = ["ACTR3B", "ANLN", "APOBEC3G", "AURKA", "BAG1", "BCL2", "BIRC5", "BLVRA", "CCL5", "CCNB1", "CCNE1", "CCR2", "CD2", "CD27", "CD3D", "CD52", "CD68", "CDC20", "CDC6", "CDH3", "CENPF", "CEP55", "CORO1A", "CTSL2", "CXCL9", "CXXC5", "EGFR", "ERBB2", "ESR1", "EXO1", "FGFR4", "FOXA1", "FOXC1", "GAPDH", "GPR160", "GRB7", "GSTM1", "GUSB", "GZMA", "GZMK", "HLA-DMA", "IL2RG", "KIF2C", "KRT14", "KRT17", "KRT5", "LCK", "MAPT", "MDM2", "MELK", "MIA", "MKI67", "MLPH", "MMP11", "MRPL19", "MYBL2", "MYC", "NAT1", "NDC80", "NUF2", "ORC6", "PGR", "PHGDH", "PRKCB", "PSMC4", "PTPRC", "PTTG1", "RRM2", "SCUBE2", "SF3A1", "SFRP1", "SH2D1A", "SLC39A6", "TFRC", "TMEM45B", "TP53", "TYMS", "UBE2C", "UBE2T", "VEGFA"]
 
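The Z-score, Z-ratio and ranking steps in this hunk can be sketched end to end on a small matrix; the values, gene names and column indices below are hypothetical. The sketch also checks that `reshape(-1, 1)`, which this commit swaps in, yields the same `(n_genes, 1)` column as `[:, np.newaxis]`. One caveat is worth testing: when each gene is standardized with its own group's mean and SD, as in the diff, every gene's mean Z-score inside that group is 0 by construction, so the between-group difference collapses to floating-point noise. The second variant instead standardizes each sample column across all genes, which is one reading (an assumption here) of the note that counts have different scales across samples; the Z-ratio and ranking steps are the same in both.

```python
import numpy as np

# Hypothetical log-scale expression values: rows = genes, columns = samples
genes = np.array(["G1", "G2", "G3", "G4", "G5"])
cm = np.array([
    [5.0, 6.0, 7.0, 9.0, 10.0, 11.0],   # G1: higher in the istrong columns
    [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],
    [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],
    [8.0, 9.0, 10.0, 8.0, 9.0, 10.0],
    [4.0, 4.0, 5.0, 4.0, 5.0, 4.0],
])
cm_weak_cols = np.array([0, 1, 2])     # hypothetical iweak columns
cm_strong_cols = np.array([3, 4, 5])   # hypothetical istrong columns


def zscore_rows(x):
    """Standardize each gene (row) with its own mean and SD, as in the diff."""
    mean = x.mean(1)
    std = x.std(1)
    # reshape(-1, 1) and [:, np.newaxis] both give an (n_genes, 1) column
    assert np.array_equal(mean.reshape(-1, 1), mean[:, np.newaxis])
    return (x - mean.reshape(-1, 1)) / std.reshape(-1, 1)


def zscore_columns(x):
    """Standardize each sample (column) across all genes."""
    return (x - x.mean(0)) / x.std(0)


def mean_z_difference(z_strong, z_weak):
    """Per-gene difference between the group averages of the Z-scores."""
    return z_strong.mean(1) - z_weak.mean(1)


# Per-gene standardization inside each group: group means of Z are 0 by construction
diff_rows = mean_z_difference(
    zscore_rows(cm[:, cm_strong_cols]), zscore_rows(cm[:, cm_weak_cols])
)
print(np.abs(diff_rows).max())  # only floating-point noise remains

# Per-sample standardization keeps the between-group signal
z_cols = zscore_columns(cm)
diff_cols = mean_z_difference(z_cols[:, cm_strong_cols], z_cols[:, cm_weak_cols])
z_score_ratios = diff_cols / diff_cols.std()  # divide by the SD of all differences

# Rank genes by Z-ratio, largest first
order = np.argsort(z_score_ratios)[::-1]
ranked_genes = genes[order]
print(ranked_genes)
```

`np.argsort` sorts ascending, so `[::-1]` puts the largest Z-ratio first. A plain Python list such as `gene_list` cannot be indexed with an index array, so it would need wrapping with `np.array(gene_list)` before applying `order`.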
content/8.Introduction_to_pandas.md

+2
@@ -435,6 +435,8 @@ print(f"Series type: {type(q2_series)}")
 
 :::
 
+## Homework
+
 :::{homework}
 
 **Creating pandas DataFrames from a list of dictionaries:**

content/image-12.png

93.5 KB

content/image-13.png

11.6 KB

content/image-14.png

36.3 KB

content/image-15.png

106 KB

content/image-16.png

1.58 MB

content/image-17.png

1.51 MB
