From 58d34b60c226cea634b74308541174649e6db8f5 Mon Sep 17 00:00:00 2001
From: Vlad Dracula <erin.graham@jcu.edu.au>
Date: Mon, 11 Dec 2023 14:25:55 +1000
Subject: [PATCH] style guide changes cont; reduce passive voice

---
 episodes/01-introduction.md         | 44 +++++++++----------
 episodes/02-image-data.md           | 32 +++++++-------
 episodes/03-build-cnn.md            | 40 ++++++++---------
 episodes/04-fit-cnn.md              | 66 ++++++++++++++---------------
 episodes/05-evaluate-predict-cnn.md | 16 +++----
 episodes/06-conclusion.md           | 17 ++++----
 episodes/setup-gpu.md               | 20 ++++-----
 learners/setup.md                   | 20 ++++-----
 8 files changed, 126 insertions(+), 129 deletions(-)

diff --git a/episodes/01-introduction.md b/episodes/01-introduction.md
index 44b8d308..9e4fc657 100644
--- a/episodes/01-introduction.md
+++ b/episodes/01-introduction.md
@@ -24,23 +24,23 @@ exercises: 0
 ## What is machine learning?
 Machine learning is a set of tools and techniques which let us find patterns in data. This lesson will introduce you to only one of these techniques, **Deep Learning** with **Convolutional Neural Network**, abbreviated as **CNN**, but there are many more.
 
-The techniques break down into two broad categories, predictors and classifiers. Predictors are used to predict a value (or set of values) given a set of inputs, for example trying to predict the cost of something given the economic conditions and the cost of raw materials or predicting a country’s GDP given its life expectancy. Classifiers try to classify data into different categories, or assign a label; for example, deciding what characters are visible in a picture of some writing or if a message is spam or not.
+The techniques break down into two broad categories, predictors and classifiers. Predictors are used to predict a value (or set of values) given a set of inputs, for example trying to predict the cost of something given the economic conditions and the cost of raw materials or predicting a country’s GDP given its life expectancy. Classifiers try to classify data into different categories, or assign a label; for example, deciding what characters are visible in a picture of some writing or if an email or text message is spam or not.
 
 ## Training Data
 
-Many (but not all) machine learning systems “learn” by taking a series of input data and output data and using it to form a model. The maths behind the machine learning doesn’t care what the data is as long as it can represented numerically or categorised. Some examples might include:
+Many, but not all, machine learning systems “learn” by taking a series of input data and output data and using it to form a model. The maths behind the machine learning doesn’t care what the data is as long as it can be represented numerically or categorised. Some examples might include:
 
-- predicting a person’s weight based on their height
-- predicting house prices given stock market prices
-- classifying if an email is spam or not
-- classifying an image as, e.g., person, place, or particular object
+- Predicting a person’s weight based on their height.
+- Predicting house prices given stock market prices.
+- Classifying an email as spam or not.
+- Classifying an image as, e.g., a person, place, or particular object.
 
-Typically we will need to train our models with hundreds, thousands or even millions of examples before they work well enough to do any useful predictions or classifications with them.
+Typically we train our models with hundreds, thousands or even millions of examples before they work well enough to make any useful predictions or classifications.
 
 
 ## Deep Learning, Machine Learning and Artificial Intelligence
 
-Deep Learning (DL) is just one of many machine learning techniques, in which people often talk about machine learning being a form of artificial intelligence (AI). Definitions of artificial intelligence vary, but usually involve having computers mimic the behaviour of intelligent biological systems. Since the 1950s many works of science fiction have dealt with the idea of an artificial intelligence which matches (or exceeds) human intelligence in all areas. Although there have been great advances in AI and ML research recently, we can only come close to human like intelligence in a few specialist areas and are still a long way from a general purpose AI. The image below shows some differences between artificial intelligence, machine learning and deep learning.
+Deep Learning (DL) is just one of many machine learning techniques, and people often describe machine learning as a form of artificial intelligence (AI). Definitions of artificial intelligence vary, but usually involve having computers mimic the behaviour of intelligent biological systems. Since the 1950s many works of science fiction have dealt with the idea of an artificial intelligence which matches, or exceeds, human intelligence in all areas. Although there have been great advances in AI and ML research recently, we can only come close to human-like intelligence in a few specialist areas and are still a long way from a general-purpose AI. The image below illustrates some differences between artificial intelligence, machine learning and deep learning.
 
 ![The image above is by Tukijaaliwa, CC BY-SA 4.0, via Wikimedia Commons, [original source]](fig/01_AI_ML_DL_differences.png){alt='Three nested circles defining deep learning as a subset of machine learning which is a subset of artifical intelligence'}
 
@@ -57,7 +57,7 @@ Concept: Differentiation between traditional Machine Learning models and Deep Le
 
 Image classification is a fundamental task in computer vision, which is a field of artificial intelligence focused on teaching computers to interpret and understand visual information from the world. Image classification specifically involves the process of assigning a label or category to an input image. The goal is to enable computers to recognise and categorise objects, scenes, or patterns within images, just as a human would. Image classification can refer to one of several tasks:
 
-![](fig/01_Fei-Fei_Li_Justin_Johnson_Serena_Young__CS231N_2017.png){alt='Four types of image classification tasks include semantic segmentation where every pixel is labelled; classification and localisation that detects a single object like a cat; object detection that detects multiple objects like cats and dogs; and instance segmentation that detects each pixel of multiple objects'}
+![](fig/01_Fei-Fei_Li_Justin_Johnson_Serena_Young__CS231N_2017.png){alt='Four types of image classification tasks include semantic segmentation to label every pixel; classification and localisation to detect a single object like a cat; object detection to detect multiple objects like cats and dogs; and instance segmentation to detect each pixel of multiple objects'}
 
 Image classification has numerous practical applications, including:
 
@@ -70,13 +70,13 @@ Image classification has numerous practical applications, including:
 Convolutional Neural Networks (CNNs) have become a cornerstone in image classification due to their ability to automatically learn hierarchical features from images and achieve remarkable performance on a wide range of tasks.
 
 ## Deep Learning Workflow
-To apply Deep Learning to a problem there are several steps we need to go through:
+To apply Deep Learning to a problem there are several steps to go through:
 
 ### Step 1. Formulate / Outline the problem
 Firstly we must decide what it is we want our Deep Learning system to do. This lesson is all about image classification so our aim is to put an image into one of a few categories. Specifically in our case, we have 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
 
 ### Step 2. Identify inputs and outputs
-Next we need to identify what the inputs and outputs of the neural network will be. In our case, the data is images and the inputs could be the individual pixels of the images. We are performing a classification problem and we will have one output for each potential class.
+Next identify what the inputs and outputs of the neural network will be. In our case, the data is images and the inputs could be the individual pixels of the images. We are performing a classification problem and we will have one output for each potential class.
 
 ### Step 3. Prepare data
 Many datasets are not ready for immediate use in a neural network and will require some preparation. Neural networks can only really deal with numerical data, so any non-numerical data (e.g., images) will have to be somehow converted to numerical data. Information on how this is done and the data structure will be explored in [Episode 02 Introduction to Image Data](episodes/02-image-data).
@@ -85,7 +85,7 @@ For this lesson, we will use an existing image dataset known as CIFAR-10. We wil
 
 #### Preparing the code
 
-It is the goal of this training workshop to produce a Deep Learning program, using a Convolutional Neural Network.  At the end of this workshop, we hope that this code can be used as a "starting point".  We will be creating an "initial program" for this introduction chapter, that will be copied and used as a foundation for the rest of the episodes.
+It is the goal of this training workshop to produce a Deep Learning program, using a Convolutional Neural Network.  At the end of this workshop, we hope this code can be used as a "starting point".  We will create an "initial program" for this introduction chapter that will be copied and used as a foundation for the rest of the episodes.
 
 ```python
 # load the required packages
@@ -163,11 +163,11 @@ plt.show()
 
 ### Step 4. Choose a pre-trained model or build a new architecture from scratch
 
-Often we can use an existing neural network instead of designing one from scratch. Training a network can take a lot of time and computational resources. There are a number of well publicised networks which have been shown to perform well at certain tasks. If you know of one which already does a similar task well, then it makes sense to use one of these.
+Often we can use an existing neural network instead of designing one from scratch. Training a network can take a lot of time and computational resources. There are a number of well-publicised networks which have been demonstrated to perform well at certain tasks. If you know of one which already does a similar task well, then it makes sense to use it.
 
-If instead we decide we do want to design our own network then we need to think about how many input neurons it will have, how many hidden layers and how many outputs, what types of layers we use (we will explore the different types later on). This will probably need some experimentation and we might have to try tweaking the network design a few times before we see acceptable results.
+If instead we decide to design our own network, then we need to think about how many input neurons it will have, how many hidden layers and how many outputs, and what types of layers to use. This will require some experimentation and tweaking of the network design a few times before achieving acceptable results.
 
-Here we present an initial model that will be explained in detail later on:
+Here we present an initial model to be explained in detail later on:
 
 #### Define the Model
 
@@ -202,9 +202,9 @@ model_intro = keras.Model(inputs = inputs_intro,
 
 ### Step 5. Choose a loss function and optimizer
 
-The loss function tells the training algorithm how far away the predicted value was from the true value. We will learn how to choose a loss function in more detail later on.
+The loss function tells the training algorithm how far away the predicted value was from the true value. We will learn how to choose a loss function in more detail in [Episode 4 Compile and Train (Fit) a Convolutional Neural Network](episodes/04-fit-cnn.md).
 
-The optimizer is responsible for taking the output of the loss function and then applying some changes to the weights within the network. It is through this process that the “learning” (adjustment of the weights) is achieved.
+The optimizer is responsible for taking the output of the loss function and then applying some changes to the weights within the network. It is through this process that “learning” (adjustment of the weights) is achieved.
 
 ```python
 # compile the model
@@ -270,19 +270,19 @@ My result is different!
 
 While the neural network itself is deterministic, various factors in the training process, system setup, and data variability can lead to small variations in the output. These variations are usually minor and should not significantly impact the overall performance or behavior of the model.
 
-If you are finding significant differences in the model predictions, this could be a sign that the model is not fully converged, where "convergence" refers to the point where the model has reached an optimal or near-optimal state in terms of learning from the training data.
+If you are finding significant differences in the model predictions, this could be a sign the model has not fully converged. "Convergence" refers to the point where the model has reached an optimal or near-optimal state in terms of learning from the training data.
 :::::::::::::::::::::::::::::::::::::::::::::::::
 
 Congratulations, you just created your first image classification model and used it to classify an image! 
 
-Was the classification correct? Why might it be incorrect and What can we do about? 
+Was the classification correct? Why might it be incorrect and what can we do about it? 
 
-There are many ways we can try to improve the accuracy of our model, such as adding or removing layers to the model definition and fine-tuning the hyperparameters, which takes us to the next steps in our workflow.
+There are many ways to try to improve the accuracy of our model, such as adding layers to or removing layers from the model definition and fine-tuning the hyperparameters, which takes us to the next steps in our workflow.
 
 
 ### Step 8. Measure Performance
 
-Once we trained the network we want to measure its performance. To do this we use some additional data that was **not** part of the training; this is known as a test set. There are many different methods available for measuring performance and which one is best depends on the type of task we are attempting. These metrics are often published as an indication of how well our network performs.
+Once we have trained the network we want to measure its performance. To do this, we use additional data that was **not** part of the training, called a test dataset. There are many different methods available for measuring performance and which one is best depends on the type of task we are attempting. These metrics are often published as an indication of how well our network performs.
 
 ### Step 9. Tune Hyperparameters
 
@@ -310,7 +310,7 @@ associated with the lessons. They appear in the "Instructor View"
 - Machine learning is the process where computers learn to recognise patterns of data.
 - Deep learning is a subset of machine learning, which is a subset of artificial intelligence.
 - Convolutional neural networks are well suited for image classification.
-- To use Deep Learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, training the model, tuning hyperparameters, measuring performance before we can classify data.
+- To use Deep Learning effectively we follow a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, training the model, tuning hyperparameters, and measuring performance before we can classify data.
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 <!-- Collect your link references at the bottom of your document -->
diff --git a/episodes/02-image-data.md b/episodes/02-image-data.md
index c8ac2cae..c9d25b43 100644
--- a/episodes/02-image-data.md
+++ b/episodes/02-image-data.md
@@ -32,13 +32,13 @@ Firstly we must decide what it is we want our Deep Learning system to do. This l
 
 ### Step 2. Identify inputs and outputs
 
-Next we need to identify what the inputs and outputs of the neural network will be. In our case, the data is images and the inputs could be the individual pixels of the images. 
+Next we identify the inputs and outputs of the neural network. In our case, the data is images and the inputs could be the individual pixels of the images. 
 
 We are performing a classification problem and we want to output one category for each image.
 
 ### Step 3. Prepare data
 
-Deep Learning requires extensive training using example data which shows the network what output it should produce for a given input. In this workshop our network will be trained by being “shown” a series of images and told what they contain. Once the network is trained it should be able to take another image and correctly classify its contents.
+Deep Learning requires extensive training using example data which tells the network what output it should produce for a given input. In this workshop, our network will be trained on a series of images and told what they contain. Once the network is trained, it should be able to take another image and correctly classify its contents.
 
 You can use pre-existing data or prepare your own.
 
@@ -50,7 +50,7 @@ In some cases you will be able to download an image dataset that is already labe
 - [ImageNet] - 14 million hand-annotated images indicating objects from more than 20,000 categories. ImageNet sponsors an [annual software contest] where programs compete to achieve the highest accuracy. When choosing a pretrained network, the winners of these sorts of competitions are generally a good place to start.
 - [MS COCO] - >200,000 labelled images used for object detection, instance segmentation, keypoint analysis, and captioning
 
-Where labelled data exists, in most cases the data provider or other users will have created functions that you can use to load the data. We already did this in the introduction:
+Where labelled data exists, in most cases the data provider or other users will have created data-specific functions you can use to load the data. We already did this in the introduction:
 
 ```python
 from tensorflow import keras
@@ -66,7 +66,7 @@ In this instance the data is likely already prepared for use in a CNN. However,
 
 How much data do you need for Deep Learning?
 
-The rise of Deep Learning is partially due to the increased availability of very large datasets. But how much data do you actually need to train a Deep Learning model? Unfortunately, this question is not easy to answer. It depends, among other things, on the complexity of the task (which you often do not know beforehand), the quality of the available dataset and the complexity of the network. For complex tasks with large neural networks, we often find that adding more data continues to improve performance. However, this is also not a generic truth: if the data you add is too similar to the data you already have, it will not give much new information to the neural network.
+The rise of Deep Learning is partially due to the increased availability of very large datasets. But how much data do you actually need to train a Deep Learning model? Unfortunately, this question is not easy to answer. It depends, among other things, on the complexity of the task (which you often do not know beforehand), the quality of the available dataset and the complexity of the network. For complex tasks with large neural networks, adding more data often improves performance. However, this is not universally true: if the data you add is too similar to the data you already have, it will not give much new information to the neural network.
 
 In case you have too little data available to train a complex network from scratch, it is sometimes possible to use a pretrained network that was trained on a similar problem. Another trick is data augmentation, where you expand the dataset with artificial data points that could be real. An example of this is mirroring images when trying to classify cats and dogs. An horizontally mirrored animal retains the label, but exposes a different view.
 
@@ -74,13 +74,13 @@ In case you have too little data available to train a complex network from scrat
 
 #### Custom image data
 
-In other cases, you will need to create your own set of labelled images. 
+In other cases, you will create your own set of labelled images. 
 
 **Custom data i. Data collection and Labeling:**
 
 For image classification the label applies to the entire image; object detection requires bounding boxes around objects of interest, and instance or semantic segmentation requires each pixel to be labelled.
 
-There are a number of open source software that can be used to label your dataset, including:
+There are a number of open source software tools you can use to label your dataset, including:
 
 - (Visual Geometry Group) [VGG Image Annotator] (VIA)
 - [ImageJ] can be extended with plugins for annotation
@@ -110,9 +110,9 @@ For example, consider this image of a Jabiru, with a square area designated by a
 
 Now, if we zoomed in close enough to the red box, inte individual pixels would stand out:
 
-![](fig/02_Jabiru_TGS_marked_zoom_enlarged.jpg){alt='zoomed in area of Jabiru where you can the individual pixels stand out'}
+![](fig/02_Jabiru_TGS_marked_zoom_enlarged.jpg){alt='zoomed in area of Jabiru where the individual pixels stand out'}
 
-Note that each square in the enlarged image area (i.e. each pixel) is all one color, but that each pixel can have a different color from its neighbors. Viewed from a distance, these pixels seem to blend together to form the image.
+Note each square in the enlarged image area (i.e. each pixel) is all one color, but each pixel can be a different color from its neighbors. Viewed from a distance, these pixels seem to blend together to form the image.
 
 ### Working with Pixels
 
@@ -172,7 +172,7 @@ The new image has shape `(573, 552, 3)`, meaning it is much larger in size, 573x
 
 Recall from the introduction that our training data set consists of 50000 images of 32x32 pixels and three channels. 
 
-To reduce the computational load and ensure all of our images have a uniform size, we need to choose an image resolution (or size in pixels) and ensure that all of the images we use are resized to that shape to be consistent.
+To reduce the computational load and ensure all of our images have a uniform size, we need to choose an image resolution (or size in pixels) and ensure all of the images we use are resized to that shape to be consistent.
 
 There are a couple of ways to do this in python but one way is to specify the size you want using an argument to the `load_img()` function from `keras.utils`.
 
@@ -248,7 +248,7 @@ The min, max, and mean pixel values are 0.0 , 255.0 , and 87.0 respectively.
 After normalization, the min, max, and mean pixel values are 0.0 , 1.0 , and 0.0 respectively.
 ```
 
-Of course, if there are a large number of images to preprocess you do not want to copy and paste these steps for each image! Fortunately, Keras has a solution for that: [tf.keras.utils.image_dataset_from_directory]
+Of course, if there are a large number of images to preprocess you do not want to copy and paste these steps for each image! Fortunately, Keras has a solution: [tf.keras.utils.image_dataset_from_directory]
 
 
 ### One-hot encoding
@@ -277,7 +277,7 @@ Table 2. After One-Hot Encoding.
 | 0         | 0             | 1             |
 | 1         | 0             | 0             |
 
-Each category has its own binary column, and the value is set to 1 in the corresponding column for each row that matches that category.
+Each category has its own binary column, and the value is set to 1 in the corresponding column for each row that matches that category.
 
 The Keras function for one_hot encoding is called [to_categorical]:
 
@@ -334,7 +334,7 @@ There are several ways to augment your data to increase the diversity of the tra
   - brightness, contrast, or hue
   - these changes simulate variations in lighting conditions
  
-We will discuss image augmentation in this lesson, but it is important that you be aware of this type of data preparation because it can make a big difference in your model's ability to predict outside of your training data.
+We will not discuss image augmentation in this lesson, but it is important that you are aware of this type of data preparation because it can make a big difference in your model's ability to predict outside of your training data.
 
 Information about these operations are included in the Keras document for [Image augmentation layers]. 
 
@@ -342,16 +342,16 @@ Information about these operations are included in the Keras document for [Image
 
 The typical practice in machine learning is to split your data into two subsets: a **training** set and a **test** set. This initial split separates the data you will use to train your model from the data you will use to evaluate its performance.
 
-After this initial split, you can choose to further split the training set into a training set and a **validation set**. This is often done when you need to fine-tune hyperparameters, select the best model from a set of candidate models, or prevent overfitting.
+After this initial split, you can choose to further split the training set into a training set and a **validation set**. This is often done when you are fine-tuning hyperparameters, selecting the best model from a set of candidate models, or preventing overfitting.
 
-In the previous episodes we used the 'cifar10.load_data()' that comes with the Keras installation to return data that is split into two (train and test sets). Now we want to split the training data into training and validation sets.
+In the previous episode, we used the 'cifar10.load_data()' method included with the Keras installation to return a dataset split into train and test sets. Now we want to split the training data into training and validation sets.
 
 To split a dataset into training and test sets there is a very convenient function from sklearn called [train_test_split]: 
 
 `sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)`
 
 - The first two parameters are the dataset (X) and the corresponding targets (y) (i.e. class labels).
-- Next is the named parameter `test_size` this is the fraction of the dataset that is used for testing, in this case `0.2` means 20 per cent of the data will be used for testing.
+- Next is the named parameter `test_size`. This is the fraction of the dataset used for testing and in this case `0.2` means 20 per cent of the data will be used for testing.
 - `random_state` controls the shuffling of the dataset, setting this value will reproduce the same results (assuming you give the same integer) every time it is called.
 - `shuffle` which can be either `True` or `False`, it controls whether the order of the rows of the dataset is shuffled before splitting. It defaults to `True`.
 - `stratify` is a more advanced parameter that controls how the split is done. By setting it to `target` the train and test sets the function will return will have roughly the same proportions (with regards to the number of images of a certain class) as the dataset.
@@ -455,7 +455,7 @@ Data is typically split into the training, validation, and test data sets using
 
   - The data is split in such a way that each subset (training, validation, or test) maintains the same class distribution as the original dataset.
 
-  - This ensures that all classes are well-represented in each subset, which is important to avoid biased model evaluation.
+  - This ensures all classes are well-represented in each subset, which is important to avoid biased model evaluation.
 
 It's important to note that the exact split ratios (e.g., 80-10-10 or 70-15-15) may vary depending on the problem, dataset size, and specific requirements. Additionally, data splitting should be performed randomly to avoid introducing any biases into the model training and evaluation process.
 
diff --git a/episodes/03-build-cnn.md b/episodes/03-build-cnn.md
index 8e1de9a5..26fb9fac 100644
--- a/episodes/03-build-cnn.md
+++ b/episodes/03-build-cnn.md
@@ -33,7 +33,7 @@ A **neural network** is an artificial intelligence technique loosely based on th
 
 ![](fig/03_neuron.png){alt='diagram of a single neuron taking multiple inputs and their associated weights in and then applying an activation function to predict a single output'}
 
-Multiple neurons can be joined together by connecting the output of one to the input of another. These connections are associated with weights that determine the 'strength' of the connection, the weights are adjusted during training. In this way, the combination of neurons and connections describe a computational graph, an example can be seen in the image below. In most neural networks neurons are aggregated into layers. Signals travel from the input layer to the output layer, possibly through one or more intermediate layers called hidden layers. The image below shows an example of a neural network with three layers, each circle is a neuron, each line is an edge and the arrows indicate the direction data moves in.
+Multiple neurons can be joined together by connecting the output of one to the input of another. These connections are associated with weights that determine the 'strength' of the connection; the weights are adjusted during training. In this way, the combination of neurons and connections describes a computational graph; an example can be seen in the image below. In most neural networks, neurons are aggregated into layers. Signals travel from the input layer to the output layer, possibly through one or more intermediate layers called hidden layers. The image below illustrates an example of a neural network with three layers: each circle is a neuron, each line is an edge and the arrows indicate the direction in which data moves.
 
 ![The image above is by Glosser.ca, [CC BY-SA 3.0], via Wikimedia Commons, [original source]](fig/03_neural_net.png){alt='diagram of a neural with four neurons taking multiple inputs and their weights and predicting multiple outputs'}
 
@@ -41,7 +41,7 @@ Neural networks aren't a new technique, they have been around since the late 194
 
 ## Convolutional Neural Networks
 
-A convolutional neural network (CNN) is a type of artificial neural network (ANN) that is most commonly applied to analyze visual imagery. They are designed to recognize the spatial structure of images when extracting features.
+A convolutional neural network (CNN) is a type of artificial neural network (ANN) most commonly applied to analyze visual imagery. They are designed to recognize the spatial structure of images when extracting features.
 
 ### Step 4. Build an architecture from scratch or choose a pretrained model
 
@@ -85,7 +85,7 @@ The output from each layer becomes the input to the next layer.
 
 #### CNN Part 1. Input Layer
 
-The Input in Keras gets special treatment when images are used. Keras automatically calculates the number of inputs and outputs a specific layer needs and therefore how many edges need to be created. This means we need to let Keras know how big our input is going to be. We do this by instantiating a `keras.Input` class and pass it a tuple that indicates the dimensionality of the input data.
+The Input in Keras gets special treatment when images are used. Keras automatically calculates the number of inputs and outputs a specific layer needs and therefore how many edges need to be created. This means we must let Keras know how big our input is going to be. We do this by instantiating a `keras.Input` class and passing it a tuple to indicate the dimensionality of the input data.
 
 In our case, the shape of an image is defined by its pixel dimensions and number of channels:
 
@@ -120,7 +120,7 @@ Check out the [Layers API] section of the Keras documentation for each layer typ
 
 A **convolutional** layer is a fundamental building block in a CNN designed for processing structured grid data, such as images. It applies convolution operations to input data using learnable filters or kernels, extracting local patterns and features (e.g. edges, corners). These filters enable the network to capture hierarchical representations of visual information, allowing for effective feature learning.
 
-To find the particular features of an image, CNN's make use of a concept from image processing that precedes Deep Learning.
+To find the particular features of an image, CNNs make use of a concept from image processing that precedes Deep Learning.
 
 A **convolution matrix**, or **kernel**, is a matrix transformation that we 'slide' over the image to calculate features at each position of the image. For each pixel, we calculate the matrix product between the kernel and the pixel with its surroundings. A kernel is typically small, between 3x3 and 7x7 pixels. We can for example think of the 3x3 kernel:
 
@@ -129,13 +129,13 @@ A **convolution matrix**, or **kernel**, is a matrix transformation that we 'sli
  [0,   0,  0]
  [1,   1,  1]]
 ```
-This kernel will give a high value to a pixel if it is on a horizontal border between dark and light areas. Note that for RGB images, the kernel should also have a depth of 3, one for each color channel.
+This kernel will give a high value to a pixel if it is on a horizontal border between dark and light areas. For RGB images, the kernel should also have a depth of 3, one for each color channel.
 
-In the following image, the effect of such a kernel on the values of a single-channel image stands out. The red cell in the output matrix is the result of multiplying and summing the values of the red square in the input, and the kernel. Applying this kernel to a real image shows that it indeed detects horizontal edges.
+In the following image, the effect of such a kernel on the values of a single-channel image stands out. The red cell in the output matrix is the result of multiplying and summing the values of the red square in the input, and the kernel. Applying this kernel to a real image demonstrates it does indeed detect horizontal edges.
 
-![](fig/03_conv_matrix.png){alt='6x5 input matrix representing a single color channel image being multipled by a 3x3 kernel to produce a 4x4 output matrix that detects horizonal edges in an image '}
+![](fig/03_conv_matrix.png){alt='6x5 input matrix representing a single color channel image being multiplied by a 3x3 kernel to produce a 4x4 output matrix to detect horizontal edges in an image'}
 
-![](fig/03_conv_image.png){alt='single color channel image of a cat multiplied by a 3x3 kernel to produce an image of a cat where the edges that stand out'}
+![](fig/03_conv_image.png){alt='single color channel image of a cat multiplied by a 3x3 kernel to produce an image of a cat where the edges stand out'}
 
 Within our convolutional layer, the hidden units comprise multiple convolutional matrices, also known as kernels. The matrix values, serving as weights, are learned during the training process. The convolutional layer produces an 'image' for each kernel, representing the output derived by applying the kernel to each pixel.
 
@@ -150,17 +150,17 @@ We define arguments for the number of filters, the kernel size, and the activati
 
 The instantiation here has three parameters and a seemingly strange combination of parentheses, so let us break it down.
 
-- The first parameter is the number of filters we want in this layer and this is one of the hyperparameters of our system and needs to be chosen carefully. 
+- The first parameter is the number of filters in this layer. This is one of the hyperparameters of our system and should be chosen carefully. 
 
-The term **filter** in the context of CNN's is often used synonymously with kernel. However, a filter refers to the learned parameters (weights) that are applied during the convolution operation. For example, in a convolutional layer, you might have multiple filters (or kernels), each responsible for detecting different features in the input data. The parameter here specifies the number of output filters in the convolution.
+The term **filter** in the context of CNNs is often used synonymously with kernel. However, a filter refers to the learned parameters (weights) that are applied during the convolution operation. For example, in a convolutional layer, you might have multiple filters (or kernels), each responsible for detecting different features in the input data. The parameter here specifies the number of output filters in the convolution.
 
 It's good practice to start with a relatively small number of filters in the first layer to prevent overfitting and choosing a number of filters as a power of two (e.g., 32, 64, 128) is common.
 
 - The second parameter is the kernel size which we already discussed. Smaller kernels are often used to capture fine-grained features and odd-sized filters are preferred because they have a centre pixel which helps maintain spatial symmetry during covolutions.
 
-- The third parameter is the activation function to use; here we choose **relu** which is 0 for inputs that are 0 and below and the identity function (returning the same value) for inputs above 0. This is a commonly used activation function in deep neural networks that is proven to work well. We will discuss activation functions later in **Step 9. Tune hyperparameters** but to satisfy your curiosity, `ReLU` stands for Rectified Linear Unit (ReLU).
+- The third parameter is the activation function to use; here we choose **relu**, which is zero for inputs that are zero and below and the identity function (returning the same value) for inputs above zero. This is a commonly used activation function in deep neural networks that works well in practice. We will discuss activation functions later in **Step 9. Tune hyperparameters** but to satisfy your curiosity, `ReLU` stands for Rectified Linear Unit.
 
-- Next is an extra set of parenthenses with inputs in them that means that after an instance of the Conv2D layer is created, it can be called as if it was a function. This tells the Conv2D layer to connect the layer passed as a parameter, in this case the inputs.
+- Next is an extra set of parentheses with inputs in them. After an instance of the Conv2D layer is created, it can be called as if it were a function. This tells the Conv2D layer to connect the layer passed as a parameter, in this case the inputs.
 
 - Finally, we store a reference so we can pass it to the next layer.
 
@@ -171,9 +171,9 @@ It's good practice to start with a relatively small number of filters in the fir
 
 Convolutions applied to images can be hard to grasp at first. Fortunately, there are resources out there that enable users to interactively play around with images and convolutions:
 
-- [Image kernels explained] shows how different convolutions can achieve certain effects on an image, like sharpening and blurring.
+- [Image kernels explained] illustrates how different convolutions can achieve certain effects on an image, like sharpening and blurring.
 
-- The [convolutional neural network cheat sheet] shows animated examples of the different components of convolutional neural nets.
+- The [convolutional neural network cheat sheet] provides animated examples of the different components of convolutional neural nets.
 :::::::::::::::::::::::::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::: challenge
@@ -184,7 +184,7 @@ What do you think happens to the border pixels when applying a convolution?
 
 :::::::::::::::::::::::: solution
 
-There are different ways of dealing with border pixels. You can ignore them, which means that your output image is slightly smaller then your input. It is also possible to 'pad' the borders, e.g., with the same value or with zeros, so that the convolution can also be applied to the border pixels. In that case, the output image will have the same size as the input image.
+There are different ways of dealing with border pixels. You can ignore them, which means your output image is slightly smaller than your input. It is also possible to 'pad' the borders, e.g., with the same value or with zeros, so that the convolution can also be applied to the border pixels. In that case, the output image will have the same size as the input image.
 :::::::::::::::::::::::::::::::::
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -215,7 +215,7 @@ Convolutional and Pooling layers are also applicable to different types of data
 
 ##### **Dense layers**
 
-A **dense** layer has a number of neurons, which is a parameter you can choose when you create the layer. When connecting the layer to its input and output layers every neuron in the dense layer gets an edge (i.e. connection) to **all** of the input neurons and **all** of the output neurons.
+A **dense** layer has a number of neurons, which is a parameter you choose when you create the layer. When connecting the layer to its input and output layers, every neuron in the dense layer gets an edge (i.e. connection) to **all** of the input neurons and **all** of the output neurons.
 
 ![](fig/03-neural_network_sketch_dense.png){alt='diagram of a neural network with multiple inputs feeding into to two seperate dense layers with connections between all the inputs and outputs'}
 
@@ -237,7 +237,7 @@ The choice of how many neurons to specify is often determined through experiment
 
 Number of parameters
 
-Suppose we create a single Dense (fully connected) layer with 100 hidden units that connect to the input pixels, how many parameters does this layer have?
+Suppose we create a single Dense (fully connected) layer with 100 hidden units that connects to the input pixels. How many parameters does this layer have?
 
 :::::::::::::::::::::::: solution
 
@@ -292,7 +292,7 @@ The **Flatten** layer converts the output of the previous layer into a single on
 
 #### CNN Part 3. Output Layer
 
-Recall for the outputs we need to ask what we want to identify from the data. If we are performing a classification problem then typically we will have one output for each potential class. We finish with a Dense layer to connect the output cells of the convolutional layer to the outputs for our 10 classes.
+Recall for the outputs we asked ourselves what we want to identify from the data. If we are performing a classification problem, then typically we have one output for each potential class. We finish with a Dense layer to connect the output cells of the convolutional layer to the outputs for our 10 classes.
 
 Note the use of `softmax` activation for this Dense layer as opposed to the `ReLU` activation used above. We use softmax for multiclass data because it helps the computer give each option (class) a likelihood score, and the scores add up to 100 per cent. This way, it's easier to pick the one the computer thinks is most probable.
 
@@ -379,9 +379,9 @@ Even for this neural network, we had to make a choice on the number of hidden ne
 
 ## We have a model now what?
 
-This CNN should be able to run with the CIFAR-10 dataset and provide reasonable results for basic classification tasks. However, do keep in mind that this model is relatively simple, and its performance may not be as high as more complex architectures. The reason it's called deep learning is because in most cases, the more layers we have, i.e. the deeper and more sophisticated CNN architecture we use, the better the performance.
+This CNN should be able to run with the CIFAR-10 dataset and provide reasonable results for basic classification tasks. However, do keep in mind this model is relatively simple, and its performance may not be as high as more complex architectures. The reason it's called deep learning is because in most cases, the more layers we have, i.e. the deeper and more sophisticated CNN architecture we use, the better the performance.
 
-How can we tell? We can inspect a couple metrics produced during the training process to detect whether our model is underfitting or overfitting. To do that, we first need to continue with the next steps in our Deep Learning workflow, **Step 5. Choose a loss function and optimizer** and **Step 6. Train model**. 
+How can we tell? We can inspect a couple of metrics produced during the training process to detect whether our model is underfitting or overfitting. To do that, we continue with the next steps in our Deep Learning workflow, **Step 5. Choose a loss function and optimizer** and **Step 6. Train model**.
 
 Make sure you saved your model before moving on.
 
diff --git a/episodes/04-fit-cnn.md b/episodes/04-fit-cnn.md
index d6ee01a0..c7d91663 100644
--- a/episodes/04-fit-cnn.md
+++ b/episodes/04-fit-cnn.md
@@ -30,7 +30,7 @@ exercises: 2
 
 We have designed a convolutional neural network (CNN) that in theory we should be able to train to classify images. 
 
-We now need to select an appropriate optimizer and loss function that we will use during training (fitting). 
+We now select an appropriate optimizer and loss function to use during training (fitting). 
 
 Recall how we compiled our model in the introduction:
 ```
@@ -57,7 +57,7 @@ More information about loss functions can be found in the Keras [loss documentat
 
 Somewhat coupled to the loss function is the **optimizer**. The optimizer here refers to the algorithm with which the model learns to optimize on the provided loss function.
 
-We need to choose which optimizer to use and, if this optimizer has parameters, what values to use for those. Furthermore, we need to specify how many times to show the training samples to the optimizer. In other words, the optimizer is responsible for taking the output of the loss function and then applying some changes to the weights within the network. It is through this process that the “learning” (adjustment of the weights) is achieved.
+We need to choose which optimizer to use and, if this optimizer has parameters, what values to use for those. Furthermore, we specify how many times to present the training samples to the optimizer. In other words, the optimizer is responsible for taking the output of the loss function and then applying some changes to the weights within the network. It is through this process that the “learning” (adjustment of the weights) is achieved.
 
 ```
 ## compile the model
@@ -68,11 +68,11 @@ We need to choose which optimizer to use and, if this optimizer has parameters,
 
 **Adam** 
 
-Here we picked one of the most common optimizers that works well for most tasks, the **Adam** optimizer. Similar to activation functions, the choice of optimizer depends on the problem you are trying to solve, your model architecture and your data. Adam is a good starting point though, which is why we chose it. Adam has a number of parameters, but the default values work well for most problems so we will use it with its default parameters.
+Here we picked one of the most common optimizers demonstrated to work well for most tasks, the **Adam** optimizer. Similar to activation functions, the choice of optimizer depends on the problem you are trying to solve, your model architecture, and your data. Adam is a good starting point though, which is why we chose it. Adam has a number of parameters, but the default values work well for most problems so we will use it with its default parameters.
 
 It is defined by the `keras.optimizers.Adam` class and takes a single parameter `learning_rate=0.01`
 
-There are many optimizers to choose from so check the [optimizer documentation]. A couple more popular or famous ones include:
+The [optimizer documentation] describes the optimizers to choose from. A couple of the more popular ones include:
 
 - **Stochastic Gradient Descent (sgd)**: Stochastic Gradient Descent (SGD) is one of the fundamental optimization algorithms used to train machine learning models, especially neural networks. It is a variant of the gradient descent algorithm, designed to handle large datasets efficiently.
 
@@ -88,26 +88,26 @@ ChatGPT
 
 **Learning rate** is a hyperparameter that determines the step size at which the model's parameters are updated during training. A higher learning rate allows for more substantial parameter updates, which can lead to faster convergence, but it may risk overshooting the optimal solution. On the other hand, a lower learning rate leads to smaller updates, providing more cautious convergence, but it may take longer to reach the optimal solution. Finding an appropriate learning rate is crucial for effectively training machine learning models.
 
-The figure below illustrates a small learning rate that will not traverse toward the minima of the gradient descent algorithm in a timely manner, i.e. number of epochs.
+The figure below illustrates how a small learning rate will not reach the minimum of the loss in a timely manner, i.e. within a reasonable number of epochs.
 
-![Small learning rate leads to inefficient approach to loss minima](https://developers.google.com/static/machine-learning/crash-course/images/LearningRateTooSmall.svg "Small learning rate leads to inefficient approach to loss minima"){alt='plot of loss over value of weight shows how a small learning rate takes a long time to reach the optimal solution'}
+![Small learning rate leads to inefficient approach to loss minima](https://developers.google.com/static/machine-learning/crash-course/images/LearningRateTooSmall.svg "Small learning rate leads to inefficient approach to loss minima"){alt='Plot of loss over weight value illustrating how a small learning rate takes a long time to reach the optimal solution.'}
 
 On the other hand, specifying a learning rate that is *too high* will result in a loss value that never approaches the minima. That is, 'bouncing between the sides', thus never reaching a minima to cease learning.
 
-![A large learning rate results in overshooting the gradient descent minima](https://developers.google.com/static/machine-learning/crash-course/images/LearningRateTooLarge.svg){alt='plot of loss over value of weight shows how a large learning rate never approaches the optimal solution because it bounces between the sides'}
+![A large learning rate results in overshooting the gradient descent minima](https://developers.google.com/static/machine-learning/crash-course/images/LearningRateTooLarge.svg){alt='Plot of loss over weight value illustrating how a large learning rate never approaches the optimal solution because it bounces between the sides.'}
 
-Lastly, we can observe below that a modest learning rate will ensure that the product of multiplying the scalar gradient value, and the learning rate does not result in too small steps, nor a chaotic bounce between sides of the gradient where steepness is greatest.
+Finally, a modest learning rate ensures the product of the scalar gradient value and the learning rate results in neither steps that are too small nor a chaotic bounce between sides of the gradient where steepness is greatest.
 
-![An optimal learning rate supports a gradual approach to the minima](https://developers.google.com/static/machine-learning/crash-course/images/LearningRateJustRight.svg){alt='plot of loss over value of weight shows how a a good learning rate gets to optimal solution gradually'}
+![An optimal learning rate supports a gradual approach to the minima](https://developers.google.com/static/machine-learning/crash-course/images/LearningRateJustRight.svg){alt='Plot of loss over weight value illustrating how a good learning rate gets to the optimal solution gradually.'}
 
-(These images were obtained from [Google Developers Machine Learning Crash Course] and is licenced under the [Creative Commons 4.0 Attribution Licence].)
+These images were obtained from [Google Developers Machine Learning Crash Course] and are licensed under the [Creative Commons 4.0 Attribution Licence].
 
 ::::::::::::::::::::::::::::::::::::::::::::::
 
 
 #### Metrics
 
-After we select the desired optimizer and loss function we want to specify the metric(s) to be evaluated by the model during training and testing. A **metric** is a function that is used to judge the performance of your model.
+After we select the desired optimizer and loss function, we specify the metric(s) to be evaluated by the model during training and testing. A **metric** is a function used to judge the performance of your model.
 
 ```
 ## compile the model
@@ -116,15 +116,15 @@ After we select the desired optimizer and loss function we want to specify the m
 #                    metrics = ['accuracy']) 
 ```
 
-Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. Note that you may use any loss function as a metric.
+Metric functions are similar to loss functions, except the results from evaluating a metric are not used when training the model. Note you are able to use any loss function as a metric.
 
-Typically you will use `accuracy` which calculates how often predictions matches labels.
+Typically you will use `accuracy`, which calculates how often the model predictions match the true labels.
 
-The accuracy function creates two local variables, total and count that are used to compute the frequency with which predictions matches labels. This frequency is ultimately returned as accuracy: an operation that divides the  total by count.
+The accuracy function creates two local variables, total and count, that it uses to compute the frequency with which predictions match labels. This frequency is ultimately returned as accuracy: an operation that divides the total by count.
 
-A list of metrics can be found in the Keras [metrics] documentation.
+The Keras [metrics] documentation provides a list of potential metrics.
 
-Now that we have decided on which loss function, optimizer, and metric to use we can compile the model using `model.compile`. Compiling the model prepares it for training.
+Now that we have selected which loss function, optimizer, and metric to use, we compile the model using `model.compile`. Compiling the model prepares it for training.
 
 
 ### Step 6. Train (Fit) model
@@ -133,7 +133,7 @@ We are ready to train the model.
 
 Training the model is done using the `fit` method. It takes the image data and target (label) data as inputs and has several other parameters for certain options of the training. Here we only set a different number of epochs.
 
-A training **epoch** means that every sample in the training data has been shown to the neural network and used to update its parameters. In general, CNN models improve with more epochs of training, but only to a point.
+A training **epoch** means that every sample in the training data has been given to the neural network and used to update its parameters. In general, CNN models improve with more epochs of training, but only to a point.
 
 We want to train our model for 10 epochs:
 
@@ -168,16 +168,16 @@ The choice of batch size can have various implications, and there are situations
 
 **Generalization**: Using smaller batch sizes may improve the generalization of the model. It prevents the model from overfitting to the training data, as it gets updated more frequently and experiences more diverse samples during training.
 
-However, it's essential to consider the trade-offs of using different batch sizes. Smaller batch sizes may require more iterations to cover the entire dataset, which can lead to longer training times. Larger batch sizes can provide more stable gradients but might suffer from generalization issues. There is no one-size-fits-all answer, and you may need to experiment with different batch sizes to find the one that works best for your specific model, architecture, and dataset.
+However, it's essential to consider the trade-offs of using different batch sizes. Smaller batch sizes may require more iterations to cover the entire dataset, which can lead to longer training times. Larger batch sizes can provide more stable gradients, but might suffer from generalization issues. There is no one-size-fits-all answer. You should experiment with different batch sizes to find the best-performing one for your specific model, architecture, and dataset.
 
 :::::::::::::::::::::::::::::::::::::::::::::::
 
 
 ### Monitor Training Progress (aka Model Evaluation during Training)
 
-Now that we know more about the compilation and fitting of CNN's let us take a inspect the training metrics for our model.
+We now know more about the compilation and fitting of CNNs. Let us inspect the training metrics for our model.
 
-Using seaborn we can plot the training process using the history:
+Using seaborn, we can plot the training process using the history:
 
 ```python
 import seaborn as sns
@@ -193,9 +193,9 @@ sns.lineplot(ax=axes[0], data=history_intro_df[['loss', 'val_loss']])
 sns.lineplot(ax=axes[1], data=history_intro_df[['accuracy', 'val_accuracy']])
 ```
 
-![](fig/04_model_intro_accuracy_loss.png){alt='two panel figure; the figure on the left shows the training loss starting at 1.5 and decreasing to 0.7 and the validation loss decreasing from 1.3 to 1.0 before leveling out; the figure on the right shows the training accuracy increasing from 0.45 to 0.75 and the validation accuracy increasing from 0.53 to 0.65 before leveling off'}
+![](fig/04_model_intro_accuracy_loss.png){alt='two panel figure; the figure on the left illustrates the training loss starting at 1.5 and decreasing to 0.7 and the validation loss decreasing from 1.3 to 1.0 before leveling out; the figure on the right illustrates the training accuracy increasing from 0.45 to 0.75 and the validation accuracy increasing from 0.53 to 0.65 before leveling off'}
 
-This plot can be used to identify whether the training is well configured or whether there are problems that need to be addressed. The solid blue lines show the training loss and accuracy; the dashed orange lines show the validation loss and accuracy.
+This plot is used to identify whether the training is well configured or whether there are problems to address. The solid blue lines represent the training loss and accuracy; the dashed orange lines represent the validation loss and accuracy.
 
 ::::::::::::::::::::::::::::::::::::: challenge 
 
@@ -220,13 +220,13 @@ Inspect the training curves we have just made and recall the difference between
 :::::::::::::::::::::::::::::::::
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
-These is evidence of **overfitting** in these plots. If a model is overfitting, it means that the model performs exceptionally well on the training data but poorly on the validation or test data. Overfitting occurs when the model has learned to memorize the noise and specific patterns in the training data instead of generalizing the underlying relationships. As a result, the model fails to perform well on new, unseen data because it has become too specialized to the training set.
+There is evidence of **overfitting** in these plots. If a model is overfitting, it means the model performs exceptionally well on the training data, but poorly on the validation data. Overfitting occurs when the model has learned to memorize the noise and specific patterns in the training data instead of generalizing the underlying relationships. As a result, the model fails to perform well on new, unseen data because it has become too specialized to the training set.
 
 Key characteristics of an overfit model include:
 
 - High Training Accuracy, Low Validation Accuracy: The model achieves high accuracy on the training data but significantly lower accuracy on the validation (or test) data.
 
-- Small Training Loss, Large Validation Loss: The training loss is low, indicating that the model's predictions closely match the true labels in the training set. However, the validation loss is high, indicating that the model's predictions are far from the true labels in the validation set.
+- Small Training Loss, Large Validation Loss: The training loss is low, indicating the model's predictions closely match the true labels in the training set. However, the validation loss is high, indicating the model's predictions are far from the true labels in the validation set.
 
 How to Address Overfitting:
 
@@ -242,8 +242,8 @@ Underfitting occurs when the model is too simple or lacks the capacity to captur
 
 Key characteristics of an underfit model include:
 
-- Low Validation Accuracy: This indicates that the model is not learning from the data effectively.
-- Large Training Loss: The training loss (error) is high, indicating that the model's predictions are far from the true labels in the training set.
+- Low Validation Accuracy: This indicates the model is not learning from the data effectively.
+- Large Training Loss: The training loss (error) is high, indicating the model's predictions are far from the true labels in the training set.
 - Increasing validation loss.
 
 How to address underfitting:
@@ -258,9 +258,9 @@ How to address underfitting:
 
 #### Dropout
 
-Note that the training loss continues to decrease, while the validation loss stagnates, and even starts to increase over the course of the epochs. Similarly, the accuracy for the validation set does not improve anymore after some epochs. This means we are overfitting on our training data set.
+Note the training loss continues to decrease, while the validation loss stagnates, and even starts to increase over the course of the epochs. Similarly, the accuracy for the validation set does not improve anymore after some epochs. This means we are overfitting on our training data set.
 
-Techniques to avoid overfitting, or to improve model generalization, are termed **regularization techniques**. One of the most versatile regularization technique is **dropout** (Srivastava et al., 2014). Dropout essentially means that during each training cycle a random fraction of the dense layer nodes are turned off. This is described with the dropout rate between 0 and 1 which determines the fraction of nodes to silence at a time. 
+Techniques to avoid overfitting, or to improve model generalization, are termed **regularization techniques**. One of the most versatile regularization techniques is **dropout** (Srivastava et al., 2014). Dropout essentially means that during each training cycle a random fraction of the dense layer nodes are turned off. This is described with the dropout rate between zero and one, which determines the fraction of nodes to silence at a time.
 
 ![](fig/04-neural_network_sketch_dropout.png){alt='diagram of two neural networks; the first network is densely connected without dropout and the second network has some of the neurons dropped out of of the network'}
 
@@ -392,9 +392,9 @@ sns.lineplot(ax=axes[1], data=history_dropout_df[['accuracy', 'val_accuracy']])
 val_loss_dropout, val_acc_dropout = model_dropout.evaluate(val_images, val_labels, verbose=2)
 ```
 
-![](fig/04_model_dropout_accuracy_loss.png){alt='two panel figure; the figure on the left shows the training loss starting at 1.7 and decreasing to 1.0 and the validation loss decreasing from 1.4 to 0.9 before leveling out; the figure on the right shows the training accuracy increasing from 0.40 to 0.65 and the validation accuracy increasing from 0.5 to 0.67'}
+![](fig/04_model_dropout_accuracy_loss.png){alt='two panel figure; the figure on the left illustrates the training loss starting at 1.7 and decreasing to 1.0 and the validation loss decreasing from 1.4 to 0.9 before leveling out; the figure on the right illustrates the training accuracy increasing from 0.40 to 0.65 and the validation accuracy increasing from 0.5 to 0.67'}
 
-In this relatively uncommon ,  the training loss is higher than our validation loss while the validation accuracy is higher than the training accuracy. Using dropout or other regularization techniques during training can lead to a lower training accuracy.
+In this relatively uncommon situation, the training loss is higher than our validation loss while the validation accuracy is higher than the training accuracy. Using dropout or other regularization techniques during training can lead to a lower training accuracy.
 
 Dropout randomly "drops out" units during training, which can prevent the model from fitting the training data too closely. This regularization effect may lead to a situation where the model generalizes better on the validation set.
 
@@ -421,15 +421,15 @@ ChatGPT
 
 The regularization strength is controlled by a hyperparameter, often denoted as lambda (λ), that determines how much weight should be given to the regularization term. A larger λ value increases the impact of regularization, making the model simpler and more regularized.
 
-b. randomly "dropping out" a fraction of neurons during training. This means that during each training iteration, some neurons are temporarily removed from the network. Dropout effectively reduces the interdependence between neurons, preventing the network from relying too heavily on specific neurons and making it more robust.
+b. randomly "dropping out" a fraction of neurons during training. This means during each training iteration, some neurons are temporarily removed from the network. Dropout effectively reduces the interdependence between neurons, preventing the network from relying too heavily on specific neurons, and making it more robust.
 
 **Batch Normalization**: While not explicitly a regularization technique, Batch Normalization has a regularizing effect on the model. It normalizes the activations of each layer in the network, reducing internal covariate shift. This can improve training stability and reduce the need for aggressive dropout or weight decay.
 
 **Data Augmentation**: Data augmentation is a technique where the training data is artificially augmented by applying various transformations like rotation, scaling, flipping, and cropping to create new examples. This increases the diversity of the training data and helps the model generalize better to unseen data.
 
-**Early Stopping**: Early stopping is a form of regularization that stops the training process when the model's performance on a validation set starts to degrade. This prevents the model from overfitting by avoiding further training after the point of best validation performance.
+**Early Stopping**: Early stopping is a form of regularization that stops the training process when the model's performance on a validation set starts to degrade. It prevents the model from overfitting by avoiding further training after the point of best validation performance.
 
-By using regularization techniques, you can improve the generalization performance of CNNs and reduce the risk of overfitting. It's essential to experiment with different regularization methods and hyperparameters to find the optimal combination for your specific CNN architecture and dataset.
+Using regularization techniques improves the generalization performance of CNNs and reduces the risk of overfitting. It's essential to experiment with different regularization methods and hyperparameters to find the optimal combination for your specific CNN architecture and dataset.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
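A minimal sketch of the dropout idea described above, written in plain Python rather than Keras. The function name `apply_dropout` and the sample activations are assumptions made for illustration. Rescaling the surviving values by 1 / (1 - rate) mirrors the behaviour the Keras `Dropout` layer documents for training.

```python
import random


def apply_dropout(activations, rate, rng):
    """Silence each value with probability `rate` and rescale the survivors.

    Illustrative helper only, not part of Keras.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in the range [0, 1)")
    keep = 1.0 - rate
    # A node survives when its random draw is at or above the dropout rate.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(42)  # fixed seed so repeated runs give the same mask
layer_output = [0.5, 1.2, 0.0, 2.0, 0.7, 1.5, 0.3, 0.9]  # made-up activations
dropped = apply_dropout(layer_output, rate=0.5, rng=rng)
print(dropped)
```

With `rate=0.5`, roughly half of the values are zeroed on each call and the rest are doubled, so the expected sum of the layer output stays about the same. A different random mask is drawn on every training step, and no values are dropped at prediction time.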
diff --git a/episodes/05-evaluate-predict-cnn.md b/episodes/05-evaluate-predict-cnn.md
index 6b8f84b4..c9c46e97 100644
--- a/episodes/05-evaluate-predict-cnn.md
+++ b/episodes/05-evaluate-predict-cnn.md
@@ -33,7 +33,7 @@ Recall in [Episode 02 Introduction to Image Data](episodes/02-image-data.md) we
 
 When creating and using a test set there are a few things to check:
 
-- It only contains images that the model has never seen before.
+- It only contains images the model has never seen before.
 - It is sufficiently large to provide a meaningful evaluation of model performance. It should include images from every target label and images of classes not in your target set.
 - It is processed in the same way as your training set.
 
@@ -143,9 +143,9 @@ To understand a bit more about how this accuracy is obtained, we create a confus
 
 In the case of multiclass classifications, each cell value (C~i,j~) is equal to the number of observations known to be in group _i_ and predicted to be in group _j_. The diagonal cells in the matrix are where the true class and predicted class match.
 
-![](fig/05_confusion_matrix_explained.png){alt='for ten classes an example confusion matrix has 10 rows and 10 columns where the value in each cell is the number of observations predicted in that class and known to be that class. The diagonal cells are where the true and predicted classes match'}
+![](fig/05_confusion_matrix_explained.png){alt='for ten classes an example confusion matrix has 10 rows and 10 columns where the value in each cell is the number of observations predicted in that class and known to be in that class. The diagonal cells are where the true and predicted classes match.'}
 
-To create a confusion matrix we will use another convenient function from sklearn called `confusion_matrix`. This function takes as a first parameter the true labels of the test set. The second parameter is the predicted labels from our model.
+To create a confusion matrix, we use another convenient function from sklearn called `confusion_matrix`. This function takes as a first parameter the true labels of the test set. The second parameter is the predicted labels from our model.
 
 ```python
 from sklearn.metrics import confusion_matrix
@@ -186,7 +186,7 @@ sns.heatmap(confusion_df, annot=True)
 - The `annot=True` parameter here will put the numbers from the confusion matrix in the heatmap.
 - The `fmt=3g` will display the values with three significant digits.
 
-![](fig/05_pred_v_true_confusion_matrix.png){alt='Confusion matrix of model predictions where the color scale goes from black to light to represent values from 0 to the total number of test observations in our test set of 1000. The diagonal has much lighter colors indicating our model is predicting well but a few non-diagonal cells also have a ligher color to show where the model is making prediction errors.'}
+![](fig/05_pred_v_true_confusion_matrix.png){alt='Confusion matrix of model predictions where the color scale goes from black to light to represent values from 0 to the total number of test observations in our test set of 1000. The diagonal has much lighter colors, indicating our model is predicting well, but a few non-diagonal cells also have a lighter color to indicate where the model is making a large number of prediction errors.'}
 
 
 ::::::::::::::::::::::::::::::::::::: challenge 
@@ -203,7 +203,7 @@ Q3. What could we do to improve the performance?
 
 :::::::::::::::::::::::: solution 
 
-Q1. The confusion matrix shows that the predictions are not bad but can improved.
+Q1. The confusion matrix illustrates that the predictions are not bad but can be improved.
 
 Q2. I expected the performance to be better than average because the accuracy of the model I chose was 67 per cent on the validation set.
 
@@ -221,7 +221,7 @@ Recall the following from [Episode 01 Introduction to Deep Learning](episodes/01
 
 Hyperparameters are the parameters set by the person configuring the model instead of those learned by the algorithm itself. Like the dials on a radio which are *tuned* to the best frequency, hyperparameters can be *tuned* to the best combination for a given model and context.
 
-These hyperparameters can include the learning rate, the number of layers in the network, the number of neurons per layer, and many more. The tuning process is systematic searching for the best combination of hyperparameters that will optimize the model's performance.
+These hyperparameters can include the learning rate, the number of layers in the network, the number of neurons per layer, and many more. Tuning is a systematic search for the combination of hyperparameters that optimizes the model's performance.
 
 In some cases, it might be necessary to adjust these and re-run the training many times before we are happy with the result.
 
@@ -415,9 +415,9 @@ A third way to tune hyperparaters is brute force.
 
 In [Episode 03 Build a Convolutional Neural Network](episodes/03-build-cnn.md) we talked briefly about the `relu` activation function passed as an argument to our `Conv2D` hidden layers.
 
-An activation function is like a switch or a filter that we use in artificial neural networks, inspired by how our brains work. These functions play a crucial role in determining whether a neuron (a small unit in the neural network) should "fire" or become active. 
+An activation function is like a switch, or a filter, that we use in artificial neural networks, inspired by how our brains work. These functions play a crucial role in determining whether a neuron (a small unit in the neural network) should "fire" or become active. 
 
-Think of an activation function as a tiny decision-maker for each neuron in a neural network. It helps determine whether the neuron should 'fire', or pass on information, or stay 'off' and remain silent, much like a light switch that decides whether the light should be ON or OFF. Activation functions are crucial because they add non-linearity to the neural network. Without them, the network would be like a simple linear model, unable to learn complex patterns in data. 
+Think of an activation function as a tiny decision-maker for each neuron in a neural network. It helps determine whether the neuron should 'fire' and pass on information, or stay 'off' and remain silent, much like a light switch controls whether the light is ON or OFF. Activation functions are crucial because they add non-linearity to the neural network. Without them, the network would be like a simple linear model, unable to learn complex patterns in data.
 
 #### How do you know what activation function to choose?
 
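The switch analogy and the non-linearity claim above can be sketched in a few lines of plain Python. The two-layer "networks" here use made-up weights (2, 1, 3 and -4) chosen only for illustration. `relu` and `sigmoid` follow their standard definitions.

```python
import math


def relu(x):
    """Pass positive values through unchanged; silence everything else."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_stack(x):
    # Two layers with no activation collapse into one straight line: 6x - 1.
    hidden = 2.0 * x + 1.0
    return 3.0 * hidden - 4.0


def relu_stack(x):
    # The same two layers with ReLU in between are no longer a straight line.
    hidden = relu(2.0 * x + 1.0)
    return 3.0 * hidden - 4.0


for x in (-2.0, -1.0, 0.0, 1.0):
    print(x, linear_stack(x), relu_stack(x))
```

For negative inputs the hidden neuron in `relu_stack` stays 'off', so the output is flat, while `linear_stack` keeps falling along the same line. That bend is the non-linearity that lets a network fit patterns a linear model cannot.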
diff --git a/episodes/06-conclusion.md b/episodes/06-conclusion.md
index 3db07669..5661a269 100644
--- a/episodes/06-conclusion.md
+++ b/episodes/06-conclusion.md
@@ -23,7 +23,7 @@ exercises: 2
 
 ### Step 10. Share model
 
-Now that we have a trained network that performs at a level we are happy with and can maintain high prediction accuracy on a test dataset we might want to consider publishing a file with both the architecture of our network and the weights which it has learned (assuming we did not use a pre-trained network). This will allow others to use it as as pre-trained network for their own purposes and for them to (mostly) reproduce our result.
+We now have a trained network that performs at a level we are happy with and maintains high prediction accuracy on a test dataset. We should consider publishing a file with both the architecture of our network and the weights which it has learned (assuming we did not use a pre-trained network). This will allow others to use it as a pre-trained network for their own purposes and for them to (mostly) reproduce our result.
 
 Use `model.save` to save a model:
 
@@ -60,7 +60,7 @@ The saved .keras file contains:
 - The model's weights.
 - The model's optimizer's state (if any).
 
-Note that saving the model does not save the training history (i.e. training and validation loss and accuracy). For that you will need to save the model history dataframe we created for plotting.
+Note that saving the model does not save the training history (i.e. training and validation loss and accuracy). To keep it, save the model history dataframe we used for plotting.
 
 The Keras documentation for [Saving and Serialization] explains other ways to save your model.
 
@@ -68,7 +68,7 @@ To share your model with a wider audience it is recommended you create git repos
 
 #### Choosing a pretrained model
 
-If your data and problem is very similar to what others have done, you can often use a pretrained network. Even if your problem is different, but the data type is common (for example images), you can use a pretrained network and finetune it for your problem. A large number of openly available pretrained networks can be found in the [Model Zoo], [pytorch hub] or [tensorflow hub].
+If your data and problem are very similar to what others have done, a pre-trained network might be preferable. Even if your problem is different, if the data type is common (for example, images), you can use a pre-trained network and fine-tune it for your problem. A large number of openly available pre-trained networks can be found in the [Model Zoo], [pytorch hub] or [tensorflow hub].
 
 ### What else do I need to know?
 
@@ -78,12 +78,11 @@ In this lesson we chose to use [Keras] because it was designed to be easy to use
 
 The performance of Keras is sometimes not as good as other libraries and if you are going to move on to create very large networks using very large datasets then you might want to consider one of the other libraries. But for many applications the performance difference will not be enough to worry about and the time you will save with simpler code will exceed what you will save by having the code run a little faster.
 
-Keras also benefits from a very good set of [online documentation] and a large user community. You will find that most of the concepts from Keras translate very well across to the other libraries if you wish to learn them at a later date.
-
+Keras also benefits from a very good set of [online documentation] and a large user community. You will find most of the concepts from Keras translate very well across to the other libraries if you wish to learn them at a later date.
 
 A couple of those libraries include:
 
-- [TensorFlow] was developed by Google and is one of the older Deep Learning libraries, ported across many languages since it was first released to the public in 2015. It is very versatile and capable of much more than Deep Learning but as a result it often takes a lot more lines of code to write Deep Learning operations in TensorFlow than in other libraries. It offers (almost) seamless integration with GPU accelerators and Google's own TPU (Tensor Processing Unit) chips that are built specially for machine learning.
+- [TensorFlow] was developed by Google and is one of the older Deep Learning libraries, ported across many languages since it was first released to the public in 2015. It is very versatile and capable of much more than Deep Learning but as a result it often takes a lot more lines of code to write Deep Learning operations in TensorFlow than in other libraries. It offers (almost) seamless integration with GPU accelerators and Google's own TPU (Tensor Processing Unit) chips specially built for machine learning.
 
 - [PyTorch] was developed by Facebook in 2016 and is a popular choice for Deep Learning applications. It was developed for Python from the start and feels a lot more "pythonic" than TensorFlow. Like TensorFlow it was designed to do more than just Deep Learning and offers some very low level interfaces. [PyTorch Lightning] offers a higher level interface to PyTorch to set up experiments. Like TensorFlow it is also very easy to integrate PyTorch with a GPU. In many benchmarks it outperforms the other libraries.
 
@@ -96,9 +95,9 @@ A **GPU**, or **Graphics Processing Unit**, is a specialized electronic circuit
 
 As you have experienced in this lesson, training CNN models can take a long time. If you follow the steps presented here you will find you are training multiple models to find the one best suited to your needs, particularly when fine tuning hyperparameters. However you have also seen that running on CPU only machines can be done! So while a GPU is not an absolute requirement for deep learning, it can significantly accelerate your deep learning work and make it more efficient, especially for larger and more complex tasks. 
 
-If you don't have access to a powerful GPU locally, you can use cloud services that provide GPU instances for deep learning. This can be a cost-effective option for many users.
+If you don't have access to a powerful GPU locally, there are cloud services that provide GPU instances for deep learning. This may be the most cost-effective option for many users.
 
-#### It this the best/only way to code up CNN's for image classification?
+#### Is this the best/only way to code up CNNs for image classification?
 
 Absolutely not! The code we used in today's workshop might today be considered old fashioned. A lot of the data preprocessing we did by hand can now be done by adding different layer types to your model. The [preprocessing layers] section fo the Keras documentation provides several examples.
 
@@ -130,7 +129,7 @@ However, there are many other tasks which CNNs are well suited for:
 ::::::::::::::::::::::::::::::::::::: keypoints 
 
 - Deep Learning is well suited to classification and prediction problems such as image recognition.
-- To use Deep Learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, tuning Hyperparameters, measuring performance before we can classify data.
+- To use Deep Learning effectively, go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, tuning hyperparameters, and measuring performance before classifying data.
 - Keras is a Deep Learning library that is easier to use than many of the alternatives such as TensorFlow and PyTorch.
 - Graphical Processing Units are useful, though not essential, for deep learning tasks.
 
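Because the saved model file does not include the training history, one possible way to keep it is to write the per-epoch values to a CSV file next to the model. The sketch below uses only the standard library. The dictionary stands in for the `history.history` attribute returned by Keras `fit`, with made-up numbers, and the helper names `save_history` and `load_history` and the file name are assumptions made for illustration.

```python
import csv
import os
import tempfile

# Hypothetical per-epoch values standing in for `history.history`.
history = {
    "loss": [1.7, 1.3, 1.0],
    "accuracy": [0.40, 0.55, 0.65],
    "val_loss": [1.4, 1.1, 0.9],
    "val_accuracy": [0.50, 0.60, 0.67],
}


def save_history(history, path):
    """Write one row per epoch, with one column per metric."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", *history])
        for epoch, values in enumerate(zip(*history.values()), start=1):
            writer.writerow([epoch, *values])


def load_history(path):
    """Read the CSV back into a dict of metric name -> list of floats."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        loaded = {name: [] for name in reader.fieldnames if name != "epoch"}
        for row in reader:
            for name in loaded:
                loaded[name].append(float(row[name]))
    return loaded


path = os.path.join(tempfile.mkdtemp(), "model_history.csv")
save_history(history, path)
restored = load_history(path)
print(restored["val_accuracy"])
```

If the history is already in a pandas dataframe, as in the plotting code, calling its `to_csv` method achieves the same result in one line.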
diff --git a/episodes/setup-gpu.md b/episodes/setup-gpu.md
index 4d547b0d..5087226a 100644
--- a/episodes/setup-gpu.md
+++ b/episodes/setup-gpu.md
@@ -13,8 +13,7 @@ These instructions are for setting up tensorflow in a **GPU** capable environmen
 ::::::::::::::::::::::::::::::::::::: challenge
 ## Install Python using Anaconda
 
-[Python] is a popular language for scientific computing, and a frequent choice for machine learning as well. Installing all of its scientific packages
-individually can be a bit difficult, however, so we recommend the installer [Anaconda] which includes most (but not all) of the software you will need. Make sure you install the latest Python version 3.xx.
+[Python] is a popular language for scientific computing, and a frequent choice for machine learning as well. Installing all of its scientific packages individually can be a bit difficult, however, so we recommend the installer [Anaconda] which includes most (but not all) of the software you need. Make sure you install the latest Python version 3.xx.
 
 Also, please set up your python environment **at least** a day in advance of the workshop. If you encounter problems with the installation procedure *for Anaconda*, ask your workshop organizers via e-mail for assistance so you are ready to go as soon as the workshop begins.
 
@@ -45,8 +44,7 @@ Check out the [Mac OS X - Video tutorial] or:
 :::::::::::::::::::::::: solution 
 ### Linux
 
-Note that the following installation steps require you to work from the shell.
-If you run into any difficulties, please request help before the workshop begins.
+Note the following installation steps require you to work from the shell. If you run into any difficulties, please request help before the workshop begins.
 
 1.  Open [https://www.anaconda.com/products/distribution] with your web browser.
 
@@ -92,7 +90,7 @@ A terminal window will open with the title 'Anaconda Prompt':
 
 ![](fig/00_anaconda_prompt_window.png){alt='Screenshot of the terminal window that opens when you launch the Anaconda Prompt application'}
 
-Note the notation of the prompt inside the terminal window. The name inside the parentheses refers to which conda environment you are working inside of, and 'base' is the name given to the default environment that comes with every Anaconda distribution.
+Note the notation of the prompt inside the terminal window. The name inside the parentheses refers to which conda environment you are working inside of, and 'base' is the name given to the default environment included with every Anaconda distribution.
 
 To create a new environment for this lesson, the command starts with the conda keywords `conda create`, followed by a name for the new environment and the package(s) to install:
 
@@ -107,10 +105,10 @@ After the environment is created we tell Anaconda to use the new environment wit
 (cnn_workshop_gpu) C:\Users\Lab>
 ```
 
-You will know that you are in the right environment because the prompt changes from (base) to (cnn_workshop_gpu). 
+You will know you are in the right environment because the prompt changes from (base) to (cnn_workshop_gpu). 
 
 ::::::::::::::::::::::::::::::::::::::::: callout
-To set up a GPU environment you need to make sure that you have the appropriate hardware, system, and software necessary for GPU support. Here we are following the [Windows TensorFlow installation instructions] starting at **Step 5. GPU setup** but using Anaconda instead of Miniconda. Specific instructions can also be found there for [MacOS] and [Linux] environments.
+To set up a GPU environment, make sure you have the appropriate hardware, system, and software necessary for GPU support. Here we are following the [Windows TensorFlow installation instructions] starting at **Step 5. GPU setup** but using Anaconda instead of Miniconda. Specific instructions can also be found there for [MacOS] and [Linux] environments.
 :::::::::::::::::::::::::::::::::::::::::::::::::
 
 ### NVIDIA GPU
@@ -131,9 +129,9 @@ TODO Finish these instructions
 
 :::::::::::::::::::::::::::::::::::::::::::::::::
 
-There are two other packages we need to install that we could not install at the same time that we created the environment, `tensorflow` and `scikeras`.
+There are two other packages we could not install when we created the environment, `tensorflow` and `scikeras`.
 
-To install these two packages we have to use a different package manager called `pip`.
+To install these two packages we use a different package manager called `pip`.
 
 [pip] is the package management system for Python software packages. It is integrated into your local Python installation and runs regardless of your operating system too.
 
@@ -147,7 +145,7 @@ To install these two packages we have to use a different package manager called
 
 ## Start Spyder
 
-We teach this lesson using Python in [Spyder] (Scientific Python Development Environment), a free integrated development environment (IDE) that comes with Anaconda. Editing, interactive testing, debugging, and introspection tools are all included in Spyder.
+We teach this lesson using Python in [Spyder] (Scientific Python Development Environment), a free integrated development environment (IDE) included with Anaconda. Editing, interactive testing, debugging, and introspection tools are all included in Spyder.
 
 To start Spyder, type the command `spyder`, making sure you are still in the workshop environment:
 
@@ -186,7 +184,7 @@ Most versions will work fine with this lesson, but:
 
 ## Download the exercise python template file
 
-The aim for this workshop is to create a python script that you can used as a "base python program" that can be used for future projects.
+The aim for this workshop is to create a python script to use as a "base python program" for future projects.
 
 In an effort to not clutter the scripts developed in the workshop with episode exercise/challenge code, this workshop will use an exercises python script for all of the exercises completed throughout the workshop.
 
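After the `pip` step above, one way to confirm that both packages are visible from the workshop environment is a short check run in the Spyder console. This is a sketch under stated assumptions: the helper name `missing_packages` is made up for illustration, and it only tests whether each package can be found, not whether a GPU is detected.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two packages installed with pip in the setup instructions.
required = ["tensorflow", "scikeras"]
missing = missing_packages(required)

if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If a package is reported as missing, check that the prompt shows the workshop environment rather than `(base)` before re-running the `pip` command.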
diff --git a/learners/setup.md b/learners/setup.md
index 41f3acfa..4f3fbbd2 100644
--- a/learners/setup.md
+++ b/learners/setup.md
@@ -17,7 +17,7 @@ Please note you might want to consider installing the GPU enabled version of Ten
 ::::::::::::::::::::::::::::::::::::: challenge
 ## Install Python Using Anaconda
 
-[Python] is a popular language for scientific computing, and a frequent choice for machine learning as well. Installing all of its scientific packages individually can be a bit difficult, however, so we recommend the installer [Anaconda] which includes most (but not all) of the software you will need. Make sure you install the latest Python version 3.xx.
+[Python] is a popular language for scientific computing, and a frequent choice for machine learning as well. Installing all of its scientific packages individually can be a bit difficult, however, so we recommend the installer [Anaconda] which includes most (but not all) of the software you need. Make sure you install the latest Python version 3.xx.
 
 Also, please set up your python environment **at least** a day in advance of the workshop. If you encounter problems with the installation procedure, ask your workshop organizers via e-mail for assistance so you are ready to go as soon as the workshop begins.
 
@@ -28,7 +28,7 @@ Check out the [Windows - Video tutorial] or:
 
 1. Open [https://www.anaconda.com/products/distribution] with your web browser.
 
-2. Download the latest Python 3.xx installer for Windows.
+2. Download the Python 3.xx installer for Windows.
 
 3. Double-click the executable and install Python 3 using _MOST_ of the default settings. The only exception is to check the **Make Anaconda the default Python** option.
 ::::::::::::::::::::::::::::::::::
@@ -48,7 +48,7 @@ Check out the [Mac OS X - Video tutorial] or:
 :::::::::::::::::::::::: solution 
 ### Linux
 
-Note that the following installation steps require you to work from the shell.
+Note the following installation steps require you to work from the shell.
 If you run into any difficulties, please request help before the workshop begins.
 
 1.  Open [https://www.anaconda.com/products/distribution] with your web browser.
@@ -95,7 +95,7 @@ A terminal window will open with the title 'Anaconda Prompt':
 
 ![](fig/00_anaconda_prompt_window.png){alt='Screenshot of the terminal window that opens when you launch the Anaconda Prompt application'}
 
-Note the notation of the prompt inside the terminal window. The name inside the parentheses refers to which conda environment you are working inside of, and 'base' is the name given to the default environment that comes with every Anaconda distribution.
+Note the notation of the prompt inside the terminal window. The name inside the parentheses refers to which conda environment you are working inside of, and 'base' is the name given to the default environment included with every Anaconda distribution.
 
 To create a new environment for this lesson, the command starts with the conda keywords `conda create`, followed by a name for the new environment and the package(s) to install:
 
@@ -110,11 +110,11 @@ After the environment is created we tell Anaconda to use the new environment wit
 (cnn_workshop) C:\Users\Lab>
 ```
 
-You will know that you are in the right environment because the prompt changes from (base) to (cnn_workshop). 
+You will know you are in the right environment because the prompt changes from (base) to (cnn_workshop). 
 
-There are two other packages we need to install that we could not install at the same time that we created the environment, `tensorflow` and `scikeras`.
+There are two other packages we could not install when we created the environment, `tensorflow` and `scikeras`.
 
-To install these two packages we have to use a different package manager called `pip`.
+To install these two packages we use a different package manager called `pip`.
 
 [pip] is the package management system for Python software packages. It is integrated into your local Python installation and runs regardless of your operating system too.
 
@@ -125,7 +125,7 @@ To install these two packages we have to use a different package manager called
 ::::::::::::::::::::::::::::::::::::::::: spoiler
 ### Troubleshooting for Windows
 
-It is possible that Windows users will run into version conflicts. If you are on Windows and get errors running the command, you can try installing the packages using pip within a conda environment:
+Windows users may run into version conflicts. If you are on Windows and get errors running the command, try installing the packages using pip within a conda environment:
 
 ```code
 (base) C:\Users\Lab> conda create -n cnn_workshop python spyder
@@ -147,7 +147,7 @@ If you get errors running the installation command or conda hangs endlessly, you
 
 ## Start Spyder
 
-We teach this lesson using Python in [Spyder] (Scientific Python Development Environment), a free integrated development environment (IDE) that comes with Anaconda. Editing, interactive testing, debugging, and introspection tools are all included in Spyder.
+We teach this lesson using Python in [Spyder] (Scientific Python Development Environment), a free integrated development environment (IDE) included with Anaconda. Editing, interactive testing, debugging, and introspection tools are all included in Spyder.
 
 To start Spyder, type the command `spyder`, making sure you are still in the workshop environment:
 
@@ -185,7 +185,7 @@ Most versions will work fine with this lesson, but:
 
 ## Download the exercise python template file
 
-The aim for this workshop is to create a python script that you can used as a "base python program" that can be used for future projects.
+The aim for this workshop is to create a python script to use as a "base python program" for future projects.
 
 In an effort to not clutter the scripts developed in the workshop with episode exercise/challenge code, this workshop will use an exercises python script for all of the exercises completed throughout the workshop.