Commit ab0dc65

Committed by pubudu: Essential Array Operations with NumPy

Parent commit: 0cbe82d

7 files changed (+323, -0 lines)

.DS_Store (6 KB, binary file)

# Essential Array Operations with NumPy

:::{objectives}

1. Reshape arrays to transform data structures while preserving values
2. Combine arrays using concatenation operations along different axes
3. Generate descriptive statistics from arrays using NumPy's built-in functions
4. Apply the axis parameter correctly to perform row-wise and column-wise operations
5. Integrate reshaping, concatenation, and statistical functions to solve practical data problems

:::

:::{exercise} Time
20 Minutes
:::

## Introduction

NumPy is the foundation of Python's data science ecosystem. At its core is the `ndarray` object, an efficient, versatile container for large datasets. We'll explore three essential capabilities:

* Reshaping arrays to organize data differently
* Combining arrays using concatenation
* Generating summary statistics to understand our data

Let's dive into how these operations can transform the way we work with numerical data.

## Reshaping Arrays

### Understanding Array Dimensions

Arrays can have different numbers of dimensions:

* 1D arrays (vectors): simple sequences of values
* 2D arrays (matrices): tables with rows and columns
* 3D arrays and beyond: multi-dimensional structures

![alt text](image-4.png)

The number of dimensions (`ndim`) and the shape (`shape`) of an array tell us how its data is organized:

```python
import numpy as np

# Create a simple 1D array of six ones
a = np.ones(6)
print("Original array:")
print(a)
print(f"Dimensions: {a.ndim}")  # Number of dimensions
print(f"Shape: {a.shape}")      # Tuple showing the size along each dimension
```

Output:

```none
Original array:
[1. 1. 1. 1. 1. 1.]
Dimensions: 1
Shape: (6,)
```
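
To see how `ndim` and `shape` change with more dimensions, here is a small sketch that builds a 2D and a 3D array with `np.zeros`. The sizes `(2, 3)` and `(2, 3, 4)` are arbitrary choices for illustration. The extra attribute `size` gives the total number of elements, which is the product of the entries in `shape`.

```python
import numpy as np

# A 2D array: 2 rows, 3 columns
matrix = np.zeros((2, 3))
# A 3D array: 2 "layers", each with 3 rows and 4 columns
cube = np.zeros((2, 3, 4))

for name, arr in [("matrix", matrix), ("cube", cube)]:
    # ndim = number of axes, shape = size along each axis, size = total elements
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}, size={arr.size}")

# Expected output:
# matrix: ndim=2, shape=(2, 3), size=6
# cube: ndim=3, shape=(2, 3, 4), size=24
```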

### Reshaping Arrays using `reshape`

![alt text](image-6.png)

* Reshaping reorganizes the same data into a different shape
* The key rule: the total number of elements must stay the same (here, 6 = 2 * 3)

```python
# Create a 1D array containing the values 1 to 6
a = np.arange(1, 7)
# Reshape the 6-element 1D array into a 2D array (2 rows, 3 columns)
b = a.reshape(2, 3)
print("Reshaped to 2x3 array:")
print(b)
print(f"Dimensions: {b.ndim}")
print(f"Shape: {b.shape}")
```

Output

```none
Reshaped to 2x3 array:
[[1 2 3]
 [4 5 6]]
Dimensions: 2
Shape: (2, 3)
```
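
To see the key rule in action, the following sketch tries two reshapes of the same 6-element array. `(3, 2)` works because 3 * 2 = 6, while `(4, 2)` would need 8 elements, so NumPy raises a `ValueError`. The `try`/`except` is only there so the example runs to completion.

```python
import numpy as np

a = np.arange(1, 7)  # 6 elements: [1 2 3 4 5 6]

# 3 * 2 = 6 elements, so this reshape is allowed.
# Values fill the new shape row by row: [[1 2] [3 4] [5 6]]
c = a.reshape(3, 2)
print(c)

# 4 * 2 = 8 elements, which does not match 6, so NumPy refuses
try:
    a.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```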

#### Practical Example: Preparing a Simple Grayscale Image for an ML Model

* Imagine you have a tiny grayscale image, perhaps from a very simple dataset, represented as a 2D grid of pixel values. Many basic machine learning algorithms (like logistic regression or simple neural networks) expect input data where each row is a single sample (one image) and each column is a feature (one pixel value).

* Our task is to take the 2D image and "flatten" it into a single row (a 1x9 array) suitable for these algorithms.

```python
import numpy as np

# Imagine a tiny 3x3 pixel grayscale image.
# Each number is the brightness of a pixel (0 = black, 255 = white).
# This is a 2D NumPy array (a matrix).
image_2d = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

print("Original 2D Image Array:")
print(image_2d)
print("Shape of original image:", image_2d.shape)  # (3, 3) -> 3 rows, 3 columns

# Prepare for ML: flatten the image.
# Many ML models expect each sample (our image) as a single row, so we convert
# the 3x3 grid into a 1x9 row (1 row, 9 features/pixels), since 3 * 3 = 9.

# We want 1 row and let NumPy work out the number of columns:
# '-1' tells NumPy "calculate the correct number of columns for me".
flattened_image = image_2d.reshape(1, -1)

# Alternatively, we could be explicit:
# flattened_image = image_2d.reshape(1, 9)

print("\nFlattened Image Array (Ready for ML Model):")
print(flattened_image)
print("Shape of flattened image:", flattened_image.shape)  # (1, 9) -> 1 row, 9 columns
```

Output

```none
Original 2D Image Array:
[[10 20 30]
 [40 50 60]
 [70 80 90]]
Shape of original image: (3, 3)

Flattened Image Array (Ready for ML Model):
[[10 20 30 40 50 60 70 80 90]]
Shape of flattened image: (1, 9)
```
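
`reshape(1, -1)` produces a 2D array with one row. If you need a truly 1D array instead, NumPy also offers `flatten()`, `ravel()`, and `reshape(-1)`. The sketch below compares their shapes. One difference worth knowing: `flatten()` always returns a copy, while `ravel()` and `reshape()` return a view of the original data when they can (as they do here, because `image_2d` is stored contiguously).

```python
import numpy as np

image_2d = np.array([[10, 20, 30],
                     [40, 50, 60],
                     [70, 80, 90]])

row_2d = image_2d.reshape(1, -1)  # 2D: shape (1, 9), one sample per row
flat_a = image_2d.flatten()       # 1D: shape (9,), always a new copy
flat_b = image_2d.ravel()         # 1D: shape (9,), a view when possible
flat_c = image_2d.reshape(-1)     # 1D: shape (9,)

for name, arr in [("reshape(1, -1)", row_2d), ("flatten()", flat_a),
                  ("ravel()", flat_b), ("reshape(-1)", flat_c)]:
    print(f"{name:15} shape={arr.shape} ndim={arr.ndim}")

# All four hold the same values in the same (row-by-row) order
print(np.array_equal(row_2d.ravel(), flat_a))  # True
```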
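
#### Using -1 as a Dimension

When you pass -1 for one dimension, NumPy calculates that dimension from the total number of elements and the other dimensions you specify. Only one dimension can be -1.

In the example below, the 6x3 array can be read as two 3x3 images stacked vertically (18 values in total). Asking for 9 columns per row lets NumPy work out that 18 / 9 = 2 rows are needed, giving one flattened image per row:

```python
# Two 3x3 images stacked vertically: 6 rows x 3 columns = 18 values
image_2d2 = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
    [100, 50, 60],
    [55, 150, 200],
    [150, 100, 220]
])

# -1 lets NumPy infer the number of rows: 18 / 9 = 2
flattened = image_2d2.reshape(-1, 9)
print("Flattened images:")
print(flattened)
print("Shape:", flattened.shape)
```

Output

```none
Flattened images:
[[ 10  20  30  40  50  60  70  80  90]
 [100  50  60  55 150 200 150 100 220]]
Shape: (2, 9)
```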
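
A common variant is a batch of images stored as a 3D array with shape `(n_images, height, width)`. Assuming that layout, the sketch below turns it into the `(n_samples, n_features)` table many ML libraries expect by keeping the first dimension and letting -1 absorb the rest. It also confirms that passing -1 twice raises a `ValueError`. The variable names `batch` and `features` are illustrative.

```python
import numpy as np

# Hypothetical batch: 2 grayscale images, each 3x3 pixels -> shape (2, 3, 3)
batch = np.array([
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    [[100, 50, 60], [55, 150, 200], [150, 100, 220]],
])
print("Batch shape:", batch.shape)  # (2, 3, 3)

# Keep one row per image; -1 becomes 3 * 3 = 9 features
features = batch.reshape(batch.shape[0], -1)
print("Feature matrix shape:", features.shape)  # (2, 9)

# Two unknown dimensions are ambiguous, so NumPy rejects them
try:
    batch.reshape(-1, -1)
except ValueError as err:
    print("ValueError:", err)
```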

## Array Concatenation

Concatenation combines multiple arrays into a single larger array. This is useful when:

* Merging datasets
* Building up arrays piece by piece
* Combining results from different operations

![alt text](image-7.png)

### 1D Array Concatenation

Let's start with the simplest case, joining two 1D arrays:

```python
# Create two 1D arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Concatenate them (np.concatenate takes a sequence of arrays)
combined = np.concatenate((a, b))
print("Concatenated 1D arrays:")
print(combined)
```

Output

```none
Concatenated 1D arrays:
[1 2 3 4 5 6 7 8]
```
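
`np.concatenate` accepts any sequence of arrays, so you are not limited to two. A minimal sketch joining three 1D arrays passed as a list; the third array `c` is added here for illustration and shows that the arrays may differ in length along the joining axis.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10])  # a different length is fine along the joining axis

# Arrays are joined in the order they appear in the list
combined = np.concatenate([a, b, c])
print(combined)        # [ 1  2  3  4  5  6  7  8  9 10]
print(combined.shape)  # (10,)
```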

### 2D Array Concatenation

When working with 2D arrays, we need to specify the axis of concatenation:

* `axis=0`: join vertically, stacking the arrays' rows on top of each other (the number of rows grows)
* `axis=1`: join horizontally, placing the arrays' columns side by side (the number of columns grows)

#### Vertical Concatenation (axis=0)

```python
# Create 2D arrays
x = np.array([[1, 2], [3, 4]])  # 2x2 array
y = np.array([[5, 6]])          # 1x2 array

# Vertical concatenation (axis=0 is the default)
v_combined = np.concatenate((x, y))
print("Vertical concatenation (axis=0):")
print(v_combined)
```

Output

```none
Vertical concatenation (axis=0):
[[1 2]
 [3 4]
 [5 6]]
```
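
Note that `y` above is written as `[[5, 6]]`, a 2D array with one row. If the new row is a 1D array such as `np.array([5, 6])`, `np.concatenate` rejects it because the two arrays have different numbers of dimensions. A minimal sketch of one fix, which combines concatenation with reshaping: turn the 1D row into shape `(1, 2)` with `reshape(1, -1)` first. The name `new_row` is illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])  # shape (2, 2)
new_row = np.array([5, 6])      # shape (2,), a 1D array

# Mixing a 2D and a 1D array fails: the numbers of dimensions differ
try:
    np.concatenate((x, new_row))
except ValueError as err:
    print("ValueError:", err)

# Reshape the 1D row into a 1x2 array, then concatenate along axis 0
v_combined = np.concatenate((x, new_row.reshape(1, -1)), axis=0)
print(v_combined)
print(v_combined.shape)  # (3, 2)
```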
227+
228+
#### Horizontal Concatenation (axis=1)
229+
230+
```python
231+
# Create arrays for horizontal concatenation
232+
p = np.array([[1, 2], [3, 4]]) # 2x2 array
233+
q = np.array([[5], [6]]) # 2x1 array
234+
235+
# Horizontal concatenation (axis=1)
236+
h_combined = np.concatenate((p, q), axis=1)
237+
print("\nHorizontal concatenation (axis=1):")
238+
print(h_combined)
239+
```
240+
241+
Output
242+
243+
```none
244+
Horizontal concatenation (axis=1):
245+
[[1 2 5]
246+
[3 4 6]]
247+
```
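
NumPy also provides `np.vstack` and `np.hstack` as shorthands for these two cases. For the 2D arrays used above they give the same results as `np.concatenate` with `axis=0` and `axis=1`, which the sketch below checks. They are not identical in general: with 1D inputs, `np.vstack` first treats each array as a single row, so two 1D arrays become a 2D array.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
p = np.array([[1, 2], [3, 4]])
q = np.array([[5], [6]])

v1 = np.concatenate((x, y), axis=0)
v2 = np.vstack((x, y))  # stack vertically (row-wise)
h1 = np.concatenate((p, q), axis=1)
h2 = np.hstack((p, q))  # stack horizontally (column-wise)

print(np.array_equal(v1, v2))  # True
print(np.array_equal(h1, h2))  # True

# With 1D inputs, vstack treats each array as one row -> a 2x2 array
stacked_rows = np.vstack((np.array([1, 2]), np.array([3, 4])))
print(stacked_rows.shape)  # (2, 2)
```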

### Concatenation Requirements

For concatenation to work:

* All arrays must have the same number of dimensions
* Their sizes must match along every axis except the one you concatenate along

For example, `a` and `b` below both have one row but different numbers of columns. Joining them along axis 0 fails, because axis 1 (the columns) would have to match:

```python
a = np.array([[1, 2, 3]])     # Shape: (1, 3)
b = np.array([[4, 5, 6, 7]])  # Shape: (1, 4)

np.concatenate((a, b), axis=0)
```

Output

```none
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4
```

Joining the same arrays along axis 1 works, because the only other axis (axis 0, the rows) has size 1 for both:

```python
a = np.array([[1, 2, 3]])     # Shape: (1, 3)
b = np.array([[4, 5, 6, 7]])  # Shape: (1, 4)

np.concatenate((a, b), axis=1)
```

Output

```none
array([[1, 2, 3, 4, 5, 6, 7]])
```

## Summary Statistics

NumPy provides efficient functions to calculate statistical measures across arrays. These are essential for:

* Data exploration and understanding
* Identifying patterns and outliers
* Summarizing large datasets

| Function | Description |
| --- | --- |
| `np.sum()` | Sum of array elements |
| `np.min()` | Minimum value |
| `np.max()` | Maximum value |
| `np.mean()` | Arithmetic mean (average) |
| `np.median()` | Median value |
| `np.std()` | Standard deviation |
| `np.var()` | Variance |

Each function takes an optional `axis` parameter:

* `axis=None` (default): operate on all elements, as if the array were flattened
* `axis=0`: collapse the rows and operate down each column, giving one result per column
* `axis=1`: collapse the columns and operate across each row, giving one result per row

```python
# Create a 2D array
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

print("Our data:")
print(data)

# Sum of all elements
total = np.sum(data)
print(f"\nTotal sum: {total}")  # 21

# Column sums (axis=0)
col_sums = np.sum(data, axis=0)
print(f"Column sums: {col_sums}")  # [5 7 9]

# Row sums (axis=1)
row_sums = np.sum(data, axis=1)
print(f"Row sums: {row_sums}")  # [ 6 15]
```
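
The other functions in the table follow the same `axis` pattern. A short sketch applying them to the same `data` array; `np.round` is used only to keep the printed values short. Note that `np.std` and `np.var` compute the population versions by default (`ddof=0`); passing `ddof=1` gives the sample versions, which divide by n - 1 instead of n.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Whole-array statistics (axis=None)
print("Mean:", np.mean(data))                 # 3.5
print("Median:", np.median(data))             # 3.5
print("Std:", np.round(np.std(data), 4))      # 1.7078
print("Var:", np.round(np.var(data), 4))      # 2.9167

# Per-column statistics (axis=0): one value per column
print("Column means:", np.mean(data, axis=0))     # [2.5 3.5 4.5]
print("Column minimums:", np.min(data, axis=0))   # [1 2 3]

# Per-row statistics (axis=1): one value per row
print("Row means:", np.mean(data, axis=1))        # [2. 5.]
print("Row maximums:", np.max(data, axis=1))      # [3 6]

# Sample standard deviation (divides by n - 1 instead of n)
print("Sample std:", np.round(np.std(data, ddof=1), 4))  # 1.8708
```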
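
To tie the three topics together, here is a small hypothetical workflow. Two batches of sensor readings arrive as flat 1D arrays, each holding 2 samples with 3 features. We reshape each batch into a `(samples, features)` table, concatenate the batches along axis 0, and then summarize each feature with `axis=0`. The data, the names `batch_1`/`batch_2`, and the 3-feature layout are made up for illustration.

```python
import numpy as np

# Hypothetical flat readings: each batch holds 2 samples x 3 features
batch_1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
batch_2 = np.array([7.0, 8.0, 9.0, 10.0, 11.0, 12.0])

n_features = 3

# 1. Reshape: one row per sample, one column per feature
table_1 = batch_1.reshape(-1, n_features)  # shape (2, 3)
table_2 = batch_2.reshape(-1, n_features)  # shape (2, 3)

# 2. Concatenate: stack the samples from both batches
all_samples = np.concatenate((table_1, table_2), axis=0)
print("Combined shape:", all_samples.shape)  # (4, 3)

# 3. Summarize: per-feature statistics down the columns
print("Feature means:", all_samples.mean(axis=0))   # [5.5 6.5 7.5]
print("Feature maxima:", all_samples.max(axis=0))   # [10. 11. 12.]
print("Feature std:", all_samples.std(axis=0))
```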

content/image-4.png (54.5 KB)

content/image-5.png (111 KB)

content/image-6.png (111 KB)

content/image-7.png (22.9 KB)

content/index.md (+1)

````diff
@@ -52,6 +52,7 @@ Pandas for Bioinformatics ; 3 Hours
 2.NumPy_Data_Types.md
 3.Indexing_and_Slicing.md
 4.Advance_indexing_filtering.md
+5.Essential_array_operations.md
 
 ```
 
````
