
Commit 0cbe82d

pubudu (author) committed
Boolean masking
1 parent 6a94101 commit 0cbe82d

3 files changed: +128 -3 lines changed

content/0.Numpy_for_bioinformatics.md

+1 -1
@@ -1,4 +1,4 @@
-# Numpy for Bioinformatics
+# Lesson plan: Numpy for Bioinformatics

## Overall objectives

content/3.Indexing_and_Slicing.md

+1
@@ -339,6 +339,7 @@ gene_expr = np.array([
```

**Tasks:**
+
1. Find all expression values greater than 30
2. Identify which genes have at least one expression value greater than 30
3. Create a boolean mask showing positions where expression is between 15 and 25
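A minimal sketch of the three tasks. The lesson's actual `gene_expr` values are not shown in this hunk, so the array below is hypothetical (genes as rows, samples as columns), and "between 15 and 25" is read as inclusive:

```python
import numpy as np

# Hypothetical expression matrix: 3 genes (rows) x 4 samples (columns).
# These are NOT the lesson's real gene_expr values.
gene_expr = np.array([
    [10, 35, 22, 18],
    [40, 12, 28, 31],
    [ 5, 16, 24, 29],
])

# Task 1: all expression values greater than 30 (returned as a 1-D array)
high_values = gene_expr[gene_expr > 30]
print("Values > 30:", high_values)

# Task 2: genes (rows) with at least one value greater than 30
genes_with_high = np.any(gene_expr > 30, axis=1)
print("Gene indices with a value > 30:", np.where(genes_with_high)[0])

# Task 3: mask of positions where 15 <= value <= 25 (same shape as gene_expr)
mid_mask = (gene_expr >= 15) & (gene_expr <= 25)
print(mid_mask)
```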

content/4.Advance_indexing_filtering.md

+126 -2
@@ -62,12 +62,13 @@ print("Boolean mask (data > 3):", mask)
selected_data = data[mask]
print("Selected elements:", selected_data)
# This produces: [4, 5]
+
+# Elegant approach: the mask has exactly the same shape as the data array,
+# and each position records whether that element meets our criterion
```

:::

-Notice how elegant this approach is: the mask array has the exact same shape as our data array, with each position containing information about whether that element meets our criteria.
-
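The point made in the code comments above, that the mask has the same shape as the data, is easy to check on a 2-D array. A minimal sketch with a hypothetical `data` array; it also shows that indexing with a mask returns the selected elements as a 1-D array:

```python
import numpy as np

# A small 2-D array (hypothetical values, chosen for illustration)
data = np.array([[1, 5, 2],
                 [4, 3, 6]])

mask = data > 3  # element-wise comparison produces a boolean array
print(mask.shape == data.shape)  # True: one boolean per element
print(mask)

# Indexing with the mask returns the selected elements as a 1-D array,
# in row-major (C) order
print(data[mask])  # [5 4 6]
```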
## Combining Multiple Conditions

We can combine multiple conditions using logical operators:
@@ -222,3 +223,126 @@ For large datasets, these techniques are drastically faster than traditional ite
* `np.isin()` lets us filter based on membership in a set of values

:::
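A short sketch of `np.isin()` on gene identifiers; the gene names and expression values are hypothetical, chosen only to illustrate the call:

```python
import numpy as np

# Hypothetical gene IDs and matching expression values (illustration only)
genes = np.array(["BRCA1", "TP53", "EGFR", "MYC", "KRAS"])
expression = np.array([12.5, 30.1, 8.7, 22.4, 15.0])

# Genes of interest (hypothetical list)
oncogenes = ["EGFR", "MYC", "KRAS"]

# np.isin returns a boolean mask: True where the element is in the given set
mask = np.isin(genes, oncogenes)
print(genes[mask])       # ['EGFR' 'MYC' 'KRAS']
print(expression[mask])  # expression values of the selected genes

# invert=True selects the complement
print(genes[np.isin(genes, oncogenes, invert=True)])  # ['BRCA1' 'TP53']
```

Because the result is an ordinary boolean mask, it can be combined with other conditions using `&` and `|` like any other mask.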

## Exercises: NumPy Boolean Masking and Advanced Filtering

:::{exercise}

**Exercise 1 - Basic Boolean Masking:**

Create a NumPy array of 20 random integers between 0 and 100 (inclusive). Then:

* Create a boolean mask to identify all numbers divisible by 7
* Use the mask to extract these numbers
* Count how many numbers are divisible by 7

:::

:::{solution}

```python
import numpy as np

# Create an array of 20 random integers between 0 and 100 (inclusive);
# randint's upper bound is exclusive, hence 101
np.random.seed(42)  # for reproducibility
numbers = np.random.randint(0, 101, 20)
print("Original array:", numbers)

# Create a boolean mask for numbers divisible by 7
mask = numbers % 7 == 0
print("Boolean mask:", mask)

# Extract numbers divisible by 7
divisible_by_7 = numbers[mask]
print("Numbers divisible by 7:", divisible_by_7)

# Count how many numbers are divisible by 7
count = np.sum(mask)  # True values are treated as 1, False as 0
print(f"Count of numbers divisible by 7: {count}")
```

:::

:::{exercise}

**Exercise 2 - Combined Conditions:**

Generate a NumPy array of 30 random integers between -50 and 50 (inclusive). Then:

* Create a mask to find all numbers that are both positive and even
* Create another mask to find all numbers that are either negative or divisible by 5
* Apply both masks to the array and display the results

:::

:::{solution}

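One possible solution, following the same pattern as Exercise 1. It is a sketch: the seed and the use of `randint(-50, 51, 30)` to get an inclusive range are choices made here, not requirements of the exercise.

```python
import numpy as np

# Create an array of 30 random integers between -50 and 50 (inclusive);
# randint's upper bound is exclusive, hence 51
np.random.seed(42)  # for reproducibility
numbers = np.random.randint(-50, 51, 30)
print("Original array:", numbers)

# Mask 1: positive AND even (use &, with parentheses around each condition)
pos_even_mask = (numbers > 0) & (numbers % 2 == 0)
print("Positive and even:", numbers[pos_even_mask])

# Mask 2: negative OR divisible by 5 (use |); note that 0 is divisible by 5
neg_or_div5_mask = (numbers < 0) | (numbers % 5 == 0)
print("Negative or divisible by 5:", numbers[neg_or_div5_mask])
```

The parentheses matter: `&` and `|` bind more tightly than comparisons such as `>` and `==`, so `numbers > 0 & numbers % 2 == 0` would not mean what it appears to.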
:::

:::{exercise}
**Exercise 3 - np.where() for Conditional Assignment:**

Create a 4x4 matrix of random integers between 1 and 20 (inclusive). Then:

* Use `np.where()` to replace all odd numbers with -1 while keeping even numbers unchanged
* Use `np.where()` again to create a new matrix where:
  * Numbers less than 10 remain the same
  * Numbers from 10 to 15 (inclusive) are replaced with 100
  * Numbers greater than 15 are replaced with 200
:::

:::{solution}

```python
import numpy as np

# Create a 4x4 matrix of random integers between 1 and 20 (inclusive)
np.random.seed(42)
matrix = np.random.randint(1, 21, (4, 4))
print("Original matrix:")
print(matrix)

# Replace odd numbers with -1, keep even numbers
odd_replaced = np.where(matrix % 2 == 0, matrix, -1)
print("\nMatrix with odd numbers replaced by -1:")
print(odd_replaced)

# Replace based on value ranges with nested np.where():
# < 10 -> unchanged, 10 to 15 -> 100, > 15 -> 200
transformed = np.where(matrix < 10, matrix,
                       np.where((matrix >= 10) & (matrix <= 15), 100, 200))
print("\nMatrix with conditional replacements:")
print(transformed)
```
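With three or more branches, nested `np.where()` calls become hard to read. `np.select()` is a different NumPy function (not the one the exercise asks for) that takes an ordered list of conditions and, for each element, uses the first condition that is True. A sketch that reproduces the same transformation:

```python
import numpy as np

np.random.seed(42)
matrix = np.random.randint(1, 21, (4, 4))

# Conditions are checked in order; the first True one wins, so the second
# condition is only reached by values >= 10. Unmatched elements get `default`.
conditions = [matrix < 10, matrix <= 15]
choices = [matrix, 100]
transformed_select = np.select(conditions, choices, default=200)

# Same result as the nested np.where() version
nested = np.where(matrix < 10, matrix,
                  np.where((matrix >= 10) & (matrix <= 15), 100, 200))
print(np.array_equal(transformed_select, nested))  # True
```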

:::

:::{exercise}
**Exercise 4 - DNA Sequence Analysis:**

In this exercise, a DNA sequence is stored as a NumPy array of single characters (A, T, G, C).

* Create a random DNA sequence of length 50 using `np.random.choice(['A', 'T', 'G', 'C'], 50)`
* Use boolean masking to count the number of each nucleotide (A, T, G, C)

:::

:::{solution}

```python
import numpy as np

# Create a random DNA sequence
np.random.seed(42)  # for reproducibility
dna_sequence = np.random.choice(['A', 'T', 'G', 'C'], 50)
print("DNA sequence:", ''.join(dna_sequence))

# Count each nucleotide: comparing the array with a letter gives a boolean
# mask, and summing the mask counts its True values
a_count = np.sum(dna_sequence == 'A')
t_count = np.sum(dna_sequence == 'T')
g_count = np.sum(dna_sequence == 'G')
c_count = np.sum(dna_sequence == 'C')

print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")
```
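The four separate comparisons can also be done in one step with broadcasting, which gives a 2-D boolean mask with one column per nucleotide. A sketch of this alternative, reusing the same seed:

```python
import numpy as np

np.random.seed(42)
dna_sequence = np.random.choice(['A', 'T', 'G', 'C'], 50)

nucleotides = np.array(['A', 'T', 'G', 'C'])

# Broadcasting a (50, 1) column against a (4,) row gives a (50, 4) boolean mask:
# mask[i, j] is True when position i holds nucleotide j
mask = dna_sequence[:, np.newaxis] == nucleotides
counts = mask.sum(axis=0)  # number of True values in each column

for base, n in zip(nucleotides, counts):
    print(f"{base}: {n}")

# Every position matches exactly one nucleotide, so the counts add up to 50
print(counts.sum())  # 50
```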

:::
