diff --git a/episodes/01-introduction.md b/episodes/01-introduction.md index bc837cd4..44b8d308 100644 --- a/episodes/01-introduction.md +++ b/episodes/01-introduction.md @@ -14,10 +14,10 @@ exercises: 0 ::::::::::::::::::::::::::::::::::::: objectives -- Explain the difference between artificial intelligence, machine learning and deep learning -- Understand the different types of computer vision tasks -- Know the difference between training, testing, and validation datasets -- Perform an image classification using a convolutional neural network (CNN) +- Explain the difference between artificial intelligence, machine learning and deep learning. +- Understand the different types of computer vision tasks. +- Know the difference between training, testing, and validation datasets. +- Perform an image classification using a convolutional neural network (CNN). :::::::::::::::::::::::::::::::::::::::::::::::: @@ -33,7 +33,7 @@ Many (but not all) machine learning systems “learn” by taking a series of in - predicting a person’s weight based on their height - predicting house prices given stock market prices - classifying if an email is spam or not -- classifying an image as eg, person, place, or particular object +- classifying an image as, e.g., person, place, or particular object Typically we will need to train our models with hundreds, thousands or even millions of examples before they work well enough to do any useful predictions or classifications with them. @@ -55,9 +55,9 @@ Concept: Differentiation between traditional Machine Learning models and Deep Le ## What is image classification? -Image classification is a fundamental task in computer vision, which is a field of artificial intelligence focused on teaching computers to interpret and understand visual information from the world. Image classification specifically involves the process of assigning a label or category to an input image. The goal is to enable computers to recognize and categorize objects, scenes, or patterns within images, just as a human would. Image classification can refer to one of several tasks: +Image classification is a fundamental task in computer vision, which is a field of artificial intelligence focused on teaching computers to interpret and understand visual information from the world. Image classification specifically involves the process of assigning a label or category to an input image. The goal is to enable computers to recognise and categorise objects, scenes, or patterns within images, just as a human would. 
Image classification can refer to one of several tasks: -![](fig/01_Fei-Fei_Li_Justin_Johnson_Serena_Young__CS231N_2017.png){alt='Four types of image classification tasks include semantic segmentation where every pixel is labelled; classification and localization that detects a single object like a cat; object detection that detects multiple objects like cats and dogs; and instance segmentation that detects each pixel of multiple objects'} +![](fig/01_Fei-Fei_Li_Justin_Johnson_Serena_Young__CS231N_2017.png){alt='Four types of image classification tasks include semantic segmentation where every pixel is labelled; classification and localisation that detects a single object like a cat; object detection that detects multiple objects like cats and dogs; and instance segmentation that detects each pixel of multiple objects'} Image classification has numerous practical applications, including: @@ -65,7 +65,7 @@ Image classification has numerous practical applications, including: - **Medical Imaging**: Diagnosing diseases from medical images like X-rays or MRIs. - **Quality Control**: Inspecting products for defects on manufacturing lines. - **Autonomous Vehicles**: Identifying pedestrians, traffic signs, and other vehicles in self-driving cars. -- **Security and Surveillance**: Detecting anomalies or unauthorized objects in security footage. +- **Security and Surveillance**: Detecting anomalies or unauthorised objects in security footage. Convolutional Neural Networks (CNNs) have become a cornerstone in image classification due to their ability to automatically learn hierarchical features from images and achieve remarkable performance on a wide range of tasks. @@ -73,15 +73,15 @@ Convolutional Neural Networks (CNNs) have become a cornerstone in image classifi To apply Deep Learning to a problem there are several steps we need to go through: ### Step 1. Formulate / Outline the problem -Firstly we must decide what it is we want our Deep Learning system to do. This lesson is all about image classification so our aim is to put an image into one of a few categories. Specifically in our case, we will be looking at 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck +Firstly we must decide what it is we want our Deep Learning system to do. This lesson is all about image classification so our aim is to put an image into one of a few categories. Specifically in our case, we have 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck ### Step 2. Identify inputs and outputs Next we need to identify what the inputs and outputs of the neural network will be. In our case, the data is images and the inputs could be the individual pixels of the images. We are performing a classification problem and we will have one output for each potential class. ### Step 3. Prepare data -Many datasets are not ready for immediate use in a neural network and will require some preparation. Neural networks can only really deal with numerical data, so any non-numerical data (eg images) will have to be somehow converted to numerical data. Information on how this is done and what the data looks like will be explored in [Episode 02 Introduction to Image Data](episodes/02-image-data). +Many datasets are not ready for immediate use in a neural network and will require some preparation. Neural networks can only really deal with numerical data, so any non-numerical data (e.g., images) will have to be somehow converted to numerical data. 
Information on how this is done and the data structure will be explored in [Episode 02 Introduction to Image Data](episodes/02-image-data).

-For this lesson, we will use an existing image dataset known as CIFAR-10. We will introduce this dataset and the different data preparation tasks in more detail in the next episode but for this introduction, we want to divide the data into **training**, **validation**, and **test** subsets; normalize the image pixel values to be between 0 and 1; and one-hot encode our image labels.
+For this lesson, we will use an existing image dataset known as CIFAR-10. We will introduce this dataset and the different data preparation tasks in more detail in the next episode but for this introduction, we want to divide the data into **training**, **validation**, and **test** subsets; normalise the image pixel values to be between 0 and 1; and one-hot encode our image labels.

#### Preparing the code

@@ -97,7 +97,7 @@ import numpy as np # library for working with images as arrays

# load the CIFAR-10 dataset included with the keras library
(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

-# normalize the RGB values to be between 0 and 1
+# normalise the RGB values to be between 0 and 1
train_images = train_images / 255.0
val_images = val_images / 255.0

@@ -135,16 +135,16 @@ Train: Images=(40000, 32, 32, 3), Labels=(40000, 10)
Validate: Images=(10000, 32, 32, 3), Labels=(10000, 10)
Test: Images=(10000, 32, 32, 3), Labels=(10000, 10)
```
-The training set consists of 40000 images of 32x32 pixels and 3 channels (RGB values) and labels.
+The training set consists of 40000 images of 32x32 pixels and three channels (RGB values), together with their labels.

-The validation and test datasets consist of 10000 images of 32x32 pixels and 3 channels (RGB values) and labels.
+The validation and test datasets each consist of 10000 images of 32x32 pixels and three channels (RGB values), together with their labels.

:::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::

-#### Visualize a subset of the CIFAR-10 dataset
+#### Visualise a subset of the CIFAR-10 dataset

```python
# create a figure object and specify width, height in inches
@@ -202,7 +202,7 @@ model_intro = keras.Model(inputs = inputs_intro,

### Step 5. Choose a loss function and optimizer

-The loss function tells the training algorithm how far away the predicted value was from the true value. We will look at choosing a loss function in more detail later on.
+The loss function tells the training algorithm how far away the predicted value is from the true value. We will learn how to choose a loss function in more detail later on.

The optimizer is responsible for taking the output of the loss function and then applying some changes to the weights within the network. It is through this process that the “learning” (adjustment of the weights) is achieved.

@@ -290,7 +290,7 @@ When building image recognition models in Python, especially using libraries lik

#### What are hyperparameters?

-Hyperparameters are all the parameters set by the person configuring the machine learning instead of those learned by the algorithm itself. These hyperparameters can include the learning rate, the number of layers in the network, the number of neurons per layer, and many more. Hyperparameter tuning refers to the process of systematically searching for the best combination of hyperparameters that will optimize the model's performance. This concept will be continued, with practical examples, in [Episode 05 Evaluate a Convolutional Neural Network and Make Predictions (Classifications)](episodes/05-evaluate-predict-cnn.md)
+Hyperparameters are all the parameters set by the person configuring the machine learning instead of those learned by the algorithm itself. These hyperparameters can include the learning rate, the number of layers in the network, the number of neurons per layer, and many more. Hyperparameter tuning refers to the process of systematically searching for the best combination of hyperparameters that will optimise the model's performance. This concept will be continued, with practical examples, in [Episode 05 Evaluate a Convolutional Neural Network and Make Predictions (Classifications)](episodes/05-evaluate-predict-cnn.md)
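+To make this distinction concrete, here is a minimal sketch (for illustration only; the values shown are arbitrary placeholders, not settings prescribed by this lesson) of where some common hyperparameters appear in Keras code:
+
+```python
+from tensorflow import keras
+
+# each of these values is chosen by the person building the model,
+# not learned from the data, which is what makes it a hyperparameter
+num_filters = 32       # number of filters in a convolutional layer
+kernel_size = (3, 3)   # size of the convolution kernel
+learning_rate = 0.001  # step size used by the optimizer during training
+
+inputs = keras.Input(shape=(32, 32, 3))
+x = keras.layers.Conv2D(num_filters, kernel_size, activation='relu')(inputs)
+x = keras.layers.Flatten()(x)
+outputs = keras.layers.Dense(10, activation='softmax')(x)
+
+model = keras.Model(inputs=inputs, outputs=outputs)
+model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
+              loss=keras.losses.CategoricalCrossentropy(),
+              metrics=['accuracy'])
+```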
### Step 10. Share Model

@@ -307,11 +307,12 @@ associated with the lessons. They appear in the "Instructor View"

::::::::::::::::::::::::::::::::::::: keypoints

-- Machine learning is the process where computers learn to recognise patterns of data
-- Deep learning is a subset of machine learning, which is a subset of artificial intelligence
-- Convolutional neural networks are well suited for image classification
+- Machine learning is the process where computers learn to recognise patterns of data.
+- Deep learning is a subset of machine learning, which is a subset of artificial intelligence.
+- Convolutional neural networks are well suited for image classification.
- To use Deep Learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, training the model, tuning hyperparameters, measuring performance before we can classify data.

::::::::::::::::::::::::::::::::::::::::::::::::

+
[original source]: https://en.wikipedia.org/wiki/File:AI-ML-DL.svg

diff --git a/episodes/02-image-data.md b/episodes/02-image-data.md
index bace53df..c8ac2cae 100644
--- a/episodes/02-image-data.md
+++ b/episodes/02-image-data.md
@@ -15,10 +15,10 @@ exercises: 2

::::::::::::::::::::::::::::::::::::: objectives

-- Identify sources of image data
-- Understand the properties of image data
-- Write code to plot image data
-- Prepare an image dataset to train a convolutional neural network (CNN)
+- Identify sources of image data.
+- Understand the properties of image data.
+- Write code to plot image data.
+- Prepare an image dataset to train a convolutional neural network (CNN).

::::::::::::::::::::::::::::::::::::::::::::::::

@@ -44,18 +44,18 @@ You can use pre-existing data or prepare your own.

#### Pre-existing image data

-In some cases you will be able to download an image dataset that is already labelled and can be used to classify a number of different object like we see with the CIFAR-10 dataset. Other examples include:
+In some cases you will be able to download an image dataset that is already labelled and can be used to classify a number of different objects, like the CIFAR-10 dataset. Other examples include:

- [MNIST database] - 60,000 training images of handwritten digits (0-9)
- [ImageNet] - 14 million hand-annotated images indicating objects from more than 20,000 categories. ImageNet sponsors an [annual software contest] where programs compete to achieve the highest accuracy. When choosing a pretrained network, the winners of these sorts of competitions are generally a good place to start.
- [MS COCO] - >200,000 labelled images used for object detection, instance segmentation, keypoint analysis, and captioning

-Where labelled data exists, in most cases the data provider or other users will have created functions that you can use to load the data. We already saw an example of this in the introduction:
+Where labelled data exists, in most cases the data provider or other users will have created functions that you can use to load the data. We already did this in the introduction:

```python
from tensorflow import keras

-# load the cifar dataset included with the keras library
+# load the CIFAR-10 dataset included with the keras library
(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()
```

@@ -66,7 +66,7 @@ In this instance the data is likely already prepared for use in a CNN. However,

How much data do you need for Deep Learning?

-The rise of Deep Learning is partially due to the increased availability of very large datasets. But how much data do you actually need to train a Deep Learning model? Unfortunately, this question is not easy to answer. It depends, among other things, on the complexity of the task (which you often do not know beforehand), the quality of the available dataset and the complexity of the network. For complex tasks with large neural networks, we often see that adding more data continues to improve performance. However, this is also not a generic truth: if the data you add is too similar to the data you already have, it will not give much new information to the neural network.
+The rise of Deep Learning is partially due to the increased availability of very large datasets. But how much data do you actually need to train a Deep Learning model? Unfortunately, this question is not easy to answer. It depends, among other things, on the complexity of the task (which you often do not know beforehand), the quality of the available dataset and the complexity of the network. For complex tasks with large neural networks, we often find that adding more data continues to improve performance. However, this is also not a generic truth: if the data you add is too similar to the data you already have, it will not give much new information to the neural network.

In case you have too little data available to train a complex network from scratch, it is sometimes possible to use a pretrained network that was trained on a similar problem. Another trick is data augmentation, where you expand the dataset with artificial data points that could be real. An example of this is mirroring images when trying to classify cats and dogs. A horizontally mirrored animal retains the label, but exposes a different view.

@@ -92,37 +92,37 @@ This step involves various tasks to enhance the quality and consistency of the d

- **Resizing**: Resize images to a consistent resolution to ensure uniformity and reduce computational load.

-- **Normalization**: Scale pixel values to a common range, often between 0 and 1 or -1 and 1. Normalization helps the model converge faster during training.
+- **Normalisation**: Scale pixel values to a common range, often between 0 and 1 or -1 and 1. Normalisation helps the model converge faster during training.

- **Label encoding** is a technique used to represent categorical data with numerical labels.

-- **Data Augmentation**: Apply random transformations (e.g., rotations, flips, shifts) to create new variations of the same image.
This helps improve the model's robustness and generalisation by exposing it to more diverse data.

-Before we look at some of these tasks in more detail we need to understand that the images we see on hard copy, view with our electronic devices, or process with our programs are represented and stored in the computer as numeric abstractions, or approximations of what we see with our eyes in the real world. And before we begin to learn how to process images with Python programs, we need to spend some time understanding how these abstractions work.
+Before we learn about some of these tasks in more detail, we need to understand that the images we see in print, view on our electronic devices, or process with our programs are represented and stored in the computer as numeric abstractions, or approximations of the real world. And before we begin to learn how to process images with Python programs, we need to spend some time understanding how these abstractions work.

### Pixels

-It is important to realise that images are stored as rectangular arrays of hundreds, thousands, or millions of discrete "picture elements," otherwise known as pixels. Each pixel can be thought of as a single square point of coloured light.
+It is important to realise that images are stored as rectangular arrays of hundreds, thousands, or millions of discrete "picture elements," otherwise known as pixels. Each pixel can be thought of as a single square point of colored light.

For example, consider this image of a Jabiru, with a square area designated by a red box:

![](fig/02_Jabiru_TGS_marked.jpg){alt='Jabiru image that is 552 pixels wide and 573 pixels high. A red square around the neck region indicates the area to zoom in on.'}

-Now, if we zoomed in close enough to see the pixels in the red box, we would see something like this:
+Now, if we zoomed in close enough to the red box, the individual pixels would stand out:

-![](fig/02_Jabiru_TGS_marked_zoom_enlarged.jpg){alt='zoomed in area of Jabiru where you can see individual pixels'}
+![](fig/02_Jabiru_TGS_marked_zoom_enlarged.jpg){alt='zoomed in area of Jabiru where you can see the individual pixels stand out'}

-Note that each square in the enlarged image area - each pixel - is all one colour, but that each pixel can have a different colour from its neighbors. Viewed from a distance, these pixels seem to blend together to form the image we see.
+Note that each square in the enlarged image area (i.e. each pixel) is all one color, but that each pixel can have a different color from its neighbors. Viewed from a distance, these pixels seem to blend together to form the image.

### Working with Pixels

-As noted, in practice, real world images will typically be made up of a vast number of pixels, and each of these pixels will be one of potentially millions of colours. In python, an image can be represented as a multidimensional array, also known as a `tensor`, where each element in the array corresponds to a pixel value in the image. In the context of images, these arrays often have dimensions for height, width, and color channels (if applicable).
+As noted, in practice, real world images will typically be made up of a vast number of pixels, and each of these pixels will be one of potentially millions of colors.
In python, an image can be represented as a multidimensional array, also known as a `tensor`, where each element in the array corresponds to a pixel value in the image. In the context of images, these arrays often have dimensions for height, width, and color channels (if applicable).

::::::::::::::::::::::::::::::::::::::::: callout

Matrices, arrays, images and pixels

-The matrix is mathematical concept - numbers evenly arranged in a rectangle. This can be a two dimensional rectangle, like the shape of the screen you're looking at now. Or it could be a three dimensional equivalent, a cuboid, or have even more dimensions, but always keeping the evenly spaced arrangement of numbers. In computing, array refers to a structure in the computer's memory where data is stored in evenly-spaced elements. This is strongly analogous to a matrix. A NumPy array is a type of variable (a simpler example of a type is an integer). For our purposes, the distinction between matrices and arrays is not important, we don't really care how the computer arranges our data in its memory. The important thing is that the computer stores values describing the pixels in images, as arrays. And the terms matrix and array can be used interchangeably.
+The matrix is a mathematical concept where numbers are evenly arranged in a rectangle. This can be a two dimensional rectangle, like the shape of your computer screen. Or it could be a three dimensional equivalent, a cuboid, or have even more dimensions, but always keeping the evenly spaced arrangement of numbers. In computing, array refers to a structure in the computer's memory where data is stored in evenly-spaced elements. This is strongly analogous to a matrix. A NumPy array is a type of variable (a simpler example of a type is an integer). For our purposes, the distinction between matrices and arrays is not important, we don't really care how the computer arranges our data in its memory. The important thing is that the computer stores values describing the pixels in images, as arrays. And the terms matrix and array can be used interchangeably.

::::::::::::::::::::::::::::::::::::::::::::::::::

@@ -137,16 +137,16 @@ Two of the most commonly used libraries for image representation and manipulatio

- The Pillow library (PIL fork) provides functions to open, manipulate, and save various image file formats. It represents images using its own Image class.
  - `from PIL import Image`
-  - see [PIL Image Module]
+  - [PIL Image Module] documentation

- TensorFlow images are often represented as tensors that have dimensions for batch size, height, width, and color channels. This framework provides tools to load, preprocess, and work with image data seamlessly.
  - `from tensorflow import keras`
-  - see [image preprocessing] documentation
+  - [image preprocessing] documentation
  - Note Keras image functions also use PIL

::::::::::::::::::::::::::::::::::::::::::::::::::

-Let us start by taking a closer look at the Jabiru image.
+Let us start with the Jabiru image.

```python
# load the libraries required
@@ -168,9 +168,9 @@ The new image is of type : and has t

### Image Dimensions - Resizing

-Here we see our new image has shape `(573, 552, 3)`, meaning it is much larger in size, 573x552 pixels; a rectangle instead of a square; and consists of 3 colour channels (RGB).
+The new image has shape `(573, 552, 3)`, meaning it is much larger in size, 573x552 pixels; a rectangle instead of a square; and consists of three color channels (RGB).
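+As a quick aside, we can make this concrete by converting the image to a NumPy array and inspecting it (an illustrative sketch; `new_img_pil` is assumed to be the name of the PIL image loaded above, so adjust it to match your own code):
+
+```python
+import numpy as np
+
+# convert the PIL image to a NumPy array to inspect the raw pixel values
+img_arr = np.asarray(new_img_pil)
+
+print(img_arr.shape)  # (573, 552, 3): height, width, colour channels
+print(img_arr.dtype)  # uint8: each value is an integer between 0 and 255
+print(img_arr[0, 0])  # the RGB values of the top-left pixel
+```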
-Recall from the introduction that our training data set consists of 50000 images of 32x32 pixels and 3 channels.
+Recall from the introduction that our training data set consists of 50000 images of 32x32 pixels and three channels.

To reduce the computational load and ensure all of our images have a uniform size, we need to choose an image resolution (or size in pixels) and ensure that all of the images we use are resized to that shape to be consistent.

@@ -187,7 +187,7 @@ print('The new image is still of type:', new_img_pil_small.__class__, 'but now h

The new image is still of type: but now has the same size (32, 32) as our training data.
```

-### Normalization
+### Normalisation

Image RGB values are between 0 and 255. As input for neural networks, it is better to have small input values. The process of converting the RGB values to be between 0 and 1 is called **normalization**.

@@ -218,7 +218,7 @@ Remember that normalization is not always mandatory, and there could be cases wh

Before we can normalize our image values we must convert the image to a numpy array.

-We saw how to do this in the introduction but what you may not have noticed is that the `keras.datasets.cifar10.load_data` function did the conversion for us whereas now we will do it ourselves.
+We introduced how to do this in [Episode 01 Introduction to Deep Learning](episodes/01-introduction.md) but what you may not have noticed is that the `keras.datasets.cifar10.load_data` function did the conversion for us whereas now we will do it ourselves.

```python
# convert the Image into an array for normalization
@@ -283,11 +283,11 @@ The Keras function for one_hot encoding is called [to_categorical]:

`tf.keras.utils.to_categorical(y, num_classes=None, dtype="float32")`

-- `y` is array-like with class values to be converted into a matrix (integers from 0 to num_classes - 1)
-- `num_classes` is the total number of classes. If None, this would be inferred as max(y) + 1
+- `y` is array-like with class values to be converted into a matrix (integers from 0 to num_classes - 1).
+- `num_classes` is the total number of classes. If None, this would be inferred as max(y) + 1.
- `dtype` is the data type expected by the input. Default: 'float32'

-We performed this operation in **Step 3. Prepare data** of the Introduction but let us look at the labels before and after one-hot encoding.
+We performed this operation in **Step 3. Prepare data** of the Introduction but let us inspect the labels before and after one-hot encoding.

```
print()
@@ -334,9 +334,9 @@ There are several ways to augment your data to increase the diversity of the tra

- brightness, contrast, or hue
  - these changes simulate variations in lighting conditions

-We will not be looking at image augmentation in this lesson but it is important that you be aware of this type of data preparation because it can make a big difference in your model's ability to predict outside of your training data.
+We will not explore image augmentation in this lesson, but it is important that you be aware of this type of data preparation because it can make a big difference in your model's ability to predict outside of your training data.

-Have a look at [Image augmentation layers] for information about these operations.
+Information about these operations is included in the Keras documentation for [Image augmentation layers].
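+Purely as an illustration of what such a step can look like (a minimal sketch using the Keras preprocessing layers linked above, available in recent Keras versions; the specific layers and parameter values are arbitrary choices, not part of this lesson's pipeline):
+
+```python
+from tensorflow import keras
+
+# a small stack of random transformations applied to images during training
+augmentation = keras.Sequential([
+    keras.layers.RandomFlip("horizontal"),     # mirror images left to right
+    keras.layers.RandomRotation(0.1),          # rotate up to +/- 10% of a full turn
+    keras.layers.RandomTranslation(0.1, 0.1),  # shift up to 10% in height and width
+])
+
+# training=True is needed so the random transformations are actually applied
+augmented_images = augmentation(train_images[:32], training=True)
+```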
### Data Splitting

The typical practice in machine learning is to split your data into two subsets: **training** and **test** sets.

After this initial split, you can choose to further split the training set into a training set and a **validation set**. This is often done when you need to fine-tune hyperparameters, select the best model from a set of candidate models, or prevent overfitting.

-In the previous episode we saw that the keras installation includes the Cifar-10 dataset and that by using the 'cifar10.load_data()' method the returned data is split into two (train and test sets). Now we just need to split the training data into training and validation sets.
+In the previous episodes we used the 'cifar10.load_data()' method that comes with the Keras installation, which returns the data already split into two sets (train and test). Now we want to split the training data into training and validation sets.

To split a dataset into training and test sets there is a very convenient function from sklearn called [train_test_split]:

`sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)`

-- The first two parameters are the dataset (X) and the corresponding targets (y) (i.e. class labels)
-- Next is the named parameter `test_size` this is the fraction of the dataset that is used for testing, in this case `0.2` means 20% of the data will be used for testing.
+- The first two parameters are the dataset (X) and the corresponding targets (y) (i.e. class labels).
+- Next is the named parameter `test_size`; this is the fraction of the dataset that is used for testing, in this case `0.2` means 20 per cent of the data will be used for testing.
- `random_state` controls the shuffling of the dataset, setting this value will reproduce the same results (assuming you give the same integer) every time it is called.
- `shuffle`, which can be either `True` or `False`, controls whether the order of the rows of the dataset is shuffled before splitting. It defaults to `True`.
- `stratify` is a more advanced parameter that controls how the split is done. By setting it to `target`, the train and test sets the function returns will have roughly the same proportions (with regards to the number of images of a certain class) as the dataset.

```python
train_images, val_images, train_labels, val_labels = train_test_split(train_images, train_labels, test_size=0.2, random_state=42)
```

::::::::::::::::::::::::::::::::::::: challenge

Training and Validation

-Take a look at the training and validation sets we created.
+Inspect the training and validation sets we created.

How many samples does each set have and are the classes well balanced?

@@ -463,34 +463,30 @@ It's important to note that the exact split ratios (e.g., 80-10-10 or 70-15-15)

## Finally!

-Our dataset is preprocessed and split into three sets which means we are ready to look at how we built our CNN in the introduction.
+Our dataset is preprocessed and split into three sets which means we are ready to learn how we built the CNN in the introduction.

::::::::::::::::::::::::::::::::::::: keypoints

-- Image datasets can be found online or created uniquely for your research question
-- Images consist of pixels arranged in a particular order
-- Image data is usually preprocessed before use in a CNN for efficiency, consistency, and robustness
+- Image datasets can be found online or created uniquely for your research question.
+- Images consist of pixels arranged in a particular order.
+- Image data is usually preprocessed before use in a CNN for efficiency, consistency, and robustness. :::::::::::::::::::::::::::::::::::::::::::::::: + [MNIST database]: https://en.wikipedia.org/wiki/MNIST_database [ImageNet]: https://www.image-net.org/ [annual software contest]: https://www.image-net.org/challenges/LSVRC/#:~:text=The%20ImageNet%20Large%20Scale%20Visual,image%20classification%20at%20large%20scale. [MS COCO]: https://cocodataset.org/#home - [VGG Image Annotator]: https://www.robots.ox.ac.uk/~vgg/software/via/ [ImageJ]: https://imagej.net/ [COCO Annotator]: https://github.com/jsbroks/coco-annotator - [PIL Image Module]: https://pillow.readthedocs.io/en/latest/reference/Image.html [image preprocessing]: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image - - [tf.keras.utils.image_dataset_from_directory]: https://keras.io/api/data_loading/image/ [to_categorical]: https://keras.io/api/utils/python_utils/#to_categorical-function - [Image augmentation layers]: https://keras.io/api/layers/preprocessing_layers/image_augmentation/ [train_test_split]: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html diff --git a/episodes/03-build-cnn.md b/episodes/03-build-cnn.md index ae2116c7..8e1de9a5 100644 --- a/episodes/03-build-cnn.md +++ b/episodes/03-build-cnn.md @@ -14,22 +14,22 @@ exercises: 2 ::::::::::::::::::::::::::::::::::::: objectives -- Understand how a convolutional neural network (CNN) differs from an artificial neural network (ANN) -- Explain the terms: kernel, filter -- Know the different layers: convolutional, pooling, flatten, dense +- Understand how a convolutional neural network (CNN) differs from an artificial neural network (ANN). +- Explain the terms: kernel, filter. +- Know the different layers: convolutional, pooling, flatten, dense. :::::::::::::::::::::::::::::::::::::::::::::::: ## Neural Networks -A **neural network** is an artificial intelligence technique loosely based on the way neurons in the brain work. A neural network consists of connected computational units called neurons. Each neuron ... +A **neural network** is an artificial intelligence technique loosely based on the way neurons in the brain work. A neural network consists of connected computational units called neurons. Each neuron: -- has one or more inputs, e.g. input data expressed as floating point numbers +- has one or more inputs, e.g., input data expressed as floating point numbers. - conducts three main operations most of the time: - take the weighted sum of the inputs - add an extra constant weight (i.e. a bias term) to this weighted sum - apply a non-linear function to the output so far (using a predefined activation function) -- returns one output value, again a floating point number +- returns one output value, again a floating point number. ![](fig/03_neuron.png){alt='diagram of a single neuron taking multiple inputs and their associated weights in and then applying an activation function to predict a single output'} @@ -45,9 +45,9 @@ A convolutional neural network (CNN) is a type of artificial neural network (ANN ### Step 4. Build an architecture from scratch or choose a pretrained model -Let us look at how to build a neural network from scratch. Although this sounds like a daunting task, with Keras it is surprisingly straightforward. With Keras you compose a neural network by creating layers and linking them together. +Let us explore how to build a neural network from scratch. 
Although this sounds like a daunting task, with Keras it is surprisingly straightforward. With Keras you compose a neural network by creating layers and linking them together.

-Let's look at our network from the introduction:
+This is the same network from the introduction:

```
# # CNN Part 1
@@ -75,7 +75,7 @@ Let's look at our network from the introduction:

### Parts of a neural network

-Here we can see there are three main components of a neural network:
+There are three main components of a neural network:

- CNN Part 1. Input Layer
- CNN Part 2. Hidden Layers
- CNN Part 3. Output Layer

@@ -129,9 +129,9 @@ A **convolution matrix**, or **kernel**, is a matrix transformation that we 'sli
 [0, 0, 0]
 [1, 1, 1]]
```
-This kernel will give a high value to a pixel if it is on a horizontal border between dark and light areas. Note that for RGB images, the kernel should also have a depth of 3, one for each colour channel.
+This kernel will give a high value to a pixel if it is on a horizontal border between dark and light areas. Note that for RGB images, the kernel should also have a depth of 3, one for each color channel.

-In the following image, we see the effect of such a kernel on the values of a single-channel image. The red cell in the output matrix is the result of multiplying and summing the values of the red square in the input, and the kernel. Applying this kernel to a real image shows that it indeed detects horizontal edges.
+The following image shows the effect of such a kernel on the values of a single-channel image. The red cell in the output matrix is the result of multiplying and summing the values of the red square in the input, and the kernel. Applying this kernel to a real image shows that it indeed detects horizontal edges.

![](fig/03_conv_matrix.png){alt='6x5 input matrix representing a single color channel image being multiplied by a 3x3 kernel to produce a 4x4 output matrix that detects horizontal edges in an image'}

@@ -148,19 +148,19 @@ We define arguments for the number of filters, the kernel size, and the activati

# x_intro = keras.layers.Conv2D(32, (3, 3), activation='relu')(inputs_intro)
```

-The instantiation here has three parameters and a seemingly strange combination of parentheses, so let us take a closer look.
+The instantiation here has three parameters and a seemingly strange combination of parentheses, so let us break it down.

- The first parameter is the number of filters we want in this layer and this is one of the hyperparameters of our system and needs to be chosen carefully. The term **filter** in the context of CNNs is often used synonymously with kernel. However, a filter refers to the learned parameters (weights) that are applied during the convolution operation. For example, in a convolutional layer, you might have multiple filters (or kernels), each responsible for detecting different features in the input data. The parameter here specifies the number of output filters in the convolution.

-It's good practice to start with a relatively small number of filters in the first layer to prevent overfitting and choosing a number of filters as a power of 2 (e.g., 32, 64, 128) is common.
+It's good practice to start with a relatively small number of filters in the first layer to prevent overfitting and choosing a number of filters as a power of two (e.g., 32, 64, 128) is common.

- The second parameter is the kernel size which we already discussed.
Smaller kernels are often used to capture fine-grained features and odd-sized filters are preferred because they have a centre pixel which helps maintain spatial symmetry during convolutions.

- The third parameter is the activation function to use; here we choose **relu** which is 0 for inputs that are 0 and below and the identity function (returning the same value) for inputs above 0. This is a commonly used activation function in deep neural networks that is proven to work well. We will discuss activation functions later in **Step 9. Tune hyperparameters** but to satisfy your curiosity, `ReLU` stands for Rectified Linear Unit (ReLU).

-Next we see an extra set of parenthenses with inputs in them, this means that after creating an instance of the Conv2D layer we call it as if it was a function. This tells the Conv2D layer to connect the layer passed as a parameter, in this case the inputs.
+Next is an extra set of parentheses with `inputs` in them; this means that after an instance of the Conv2D layer is created, it can be called as if it were a function. This tells the Conv2D layer to connect the layer passed as a parameter, in this case the inputs.

- Finally, we store a reference so we can pass it to the next layer.

@@ -184,7 +184,7 @@ What do you think happens to the border pixels when applying a convolution?

:::::::::::::::::::::::: solution

-There are different ways of dealing with border pixels. You can ignore them, which means that your output image is slightly smaller then your input. It is also possible to 'pad' the borders, e.g. with the same value or with zeros, so that the convolution can also be applied to the border pixels. In that case, the output image will have the same size as the input image.
+There are different ways of dealing with border pixels. You can ignore them, which means that your output image is slightly smaller than your input. It is also possible to 'pad' the borders, e.g., with the same value or with zeros, so that the convolution can also be applied to the border pixels. In that case, the output image will have the same size as the input image.

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

#### Pooling Layers

The convolutional layers are often intertwined with **Pooling** layers. As opposed to the convolutional layer used in feature extraction, the pooling layer alters the dimensions of the image and reduces it by a scaling factor. It is basically decreasing the resolution of your picture. The rationale behind this is that higher layers of the network should focus on higher-level features of the image. By introducing a pooling layer, the subsequent convolutional layer has a broader 'view' on the original image.

-As we saw with convolutional layers, Keras offers several pooling layers and one used for images (2D spatial data) is the `tf.keras.layers.MaxPooling2D` class.
+Similar to convolutional layers, Keras offers several pooling layers and one used for images (2D spatial data) is the `tf.keras.layers.MaxPooling2D` class.

```
# # Pooling layer with input window sized 2,2
@@ -228,7 +228,7 @@ In Keras, a densely-connected NN layer is defined by the `tf.keras.layers.Dense`

# x_intro = keras.layers.Dense(64, activation='relu')(x_intro)
```

-This instantiation has two parameters: the number of neurons and the activation function as we saw in the convolutional layer.
+This instantiation has two parameters: the number of neurons and the activation function, similar to the argument for the convolutional layer.

The choice of how many neurons to specify is often determined through experimentation and can impact the performance of our CNN. Too few neurons may not capture complex patterns in the data but too many neurons may lead to overfitting.

@@ -292,9 +292,9 @@ The **Flatten** layer converts the output of the previous layer into a single on

#### CNN Part 3. Output Layer

-Recall for the outputs we will need to look at what we want to identify from the data. If we are performing a classification problem then typically we will have one output for each potential class. We need to finish with a Dense layer to connect the output cells of the convolutional layer to the outputs for our 10 classes.
+Recall that for the outputs we need to ask what we want to identify from the data. If we are performing a classification problem then typically we will have one output for each potential class. We finish with a Dense layer to connect the output cells of the convolutional layer to the outputs for our 10 classes.

-Note the use of `softmax` activation for this Dense layer as opposed to the `ReLU` activation used above. We use softmax for multiclass data because it helps the computer give each option (class) a likelihood score, and the scores add up to 100%. This way, it's easier to pick the one the computer thinks is most probable.
+Note the use of `softmax` activation for this Dense layer as opposed to the `ReLU` activation used above. We use softmax for multiclass data because it helps the computer give each option (class) a likelihood score, and the scores add up to 100 per cent. This way, it's easier to pick the one the computer thinks is most probable.

```
# # Output layer with 10 units (one for each class) and softmax activation
@@ -335,6 +335,9 @@ model_intro = keras.Model(inputs=inputs_intro, outputs=outputs_intro, name="cifa

# view the model summary
model_intro.summary()

+# save the intro model for use in later episodes
+model_intro.save('fit_outputs/model_intro.keras')
+
```
```output
Model: "cifar_model_intro"
@@ -370,30 +373,33 @@ _________________________________________________________________

## How to choose an architecture?

-Even for this neural network, we had to make a choice on the number of hidden neurons. Other choices to be made are the number of layers and type of layers. You might wonder how you should make these architectural choices. Unfortunately, there are no clear rules to follow here, and it often boils down to a lot of trial and error. However, it is recommended to look what others have done with similar datasets and problems. Another best practice is to start with a relatively simple architecture. Once running start to add layers and tweak the network to see if performance increases.
+Even for this neural network, we had to make a choice on the number of hidden neurons. Other choices to be made are the number of layers and type of layers. You might wonder how you should make these architectural choices. Unfortunately, there are no clear rules to follow here, and it often boils down to a lot of trial and error. However, it is recommended to explore what others have done with similar datasets and problems. Another best practice is to start with a relatively simple architecture. Once it is running, start to add layers and tweak the network to test whether performance increases.

::::::::::::::::::::::::::::::::::::::::::::::
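+Before we move on, it is worth confirming that the model file we saved above can be loaded back in (a quick illustrative check; `keras.models.load_model` is the loading counterpart of `model.save`):
+
+```python
+# reload the saved model and confirm it matches the one we built
+model_check = keras.models.load_model('fit_outputs/model_intro.keras')
+model_check.summary()
+```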
## We have a model now what?

-This CNN should be able to run with the CIFAR-10 dataset and provide reasonable results for basic classification tasks. However, do keep in mind that this model is relatively simple, and its performance may not be as high as more complex architectures. The reason it's called deep learning is because in most cases, the more layers we have, ie, the deeper and more sophisticated CNN architecture we use, the better the performance.
+This CNN should be able to run with the CIFAR-10 dataset and provide reasonable results for basic classification tasks. However, do keep in mind that this model is relatively simple, and its performance may not be as high as more complex architectures. The reason it's called deep learning is because in most cases, the more layers we have, i.e. the deeper and more sophisticated CNN architecture we use, the better the performance.
+
+How can we tell? We can inspect a couple of metrics produced during the training process to detect whether our model is underfitting or overfitting. To do that, we first need to continue with the next steps in our Deep Learning workflow, **Step 5. Choose a loss function and optimizer** and **Step 6. Train model**.

-How can we tell? We can look at a couple metrics during the training process to detect whether our model is underfitting or overfitting. To do that, we first need to continue with the next steps in our Deep Learning workflow, **Step 5. Choose a loss function and optimizer** and **Step 6. Train model**.
+Make sure you saved your model before moving on.

::::::::::::::::::::::::::::::::::::: keypoints

-- Artificial neural networks (ANN) are a machine learning technique based on a model inspired by groups of neurons in the brain
-- Convolution neural networks (CNN) are a type of ANN designed for image classification and object detection
-- The filter size determines the size of the receptive field where information is extracted and the kernel size changes the mathematical structure
+- Artificial neural networks (ANN) are a machine learning technique based on a model inspired by groups of neurons in the brain.
+- Convolution neural networks (CNN) are a type of ANN designed for image classification and object detection.
+- The filter size determines the size of the receptive field where information is extracted and the kernel size changes the mathematical structure.
- A CNN can consist of many types of layers including convolutional, pooling, flatten, and dense (fully connected) layers
-- Convolutional layers are responsible for learning features from the input data
-- Pooling layers are often used to reduce the spatial dimensions of the data
-- The flatten layer is used to convert the multi-dimensional output of the convolutional and pooling layers into a flat vector
-- Dense layers are responsible for combining features learned by the previous layers to perform the final classification
+- Convolutional layers are responsible for learning features from the input data.
+- Pooling layers are often used to reduce the spatial dimensions of the data.
+- The flatten layer is used to convert the multi-dimensional output of the convolutional and pooling layers into a flat vector.
+- Dense layers are responsible for combining features learned by the previous layers to perform the final classification.
::::::::::::::::::::::::::::::::::::::::::::::::

+
[CC BY-SA 3.0]: https://creativecommons.org/licenses/by-sa/3.0
[original source]: https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg
[Layers API]: https://keras.io/api/layers/

diff --git a/episodes/04-fit-cnn.md b/episodes/04-fit-cnn.md
index 4445603f..d6ee01a0 100644
--- a/episodes/04-fit-cnn.md
+++ b/episodes/04-fit-cnn.md
@@ -17,12 +17,12 @@ exercises: 2

::::::::::::::::::::::::::::::::::::: objectives

-- Explain the difference between compiling and training (fitting) a CNN
-- Know how to select a loss function for your model
-- Understand what an optimizer is
-- Define the terms: learning rate, batch size, epoch
-- Understand what loss and accuracy are and how to monitor them during training
-- Explain what overfitting is and what to do about it
+- Explain the difference between compiling and training (fitting) a CNN.
+- Know how to select a loss function for your model.
+- Understand what an optimizer is.
+- Define the terms: learning rate, batch size, epoch.
+- Understand what loss and accuracy are and how to monitor them during training.
+- Explain what overfitting is and what to do about it.

::::::::::::::::::::::::::::::::::::::::::::::::

@@ -50,7 +50,7 @@ For classification purposes, there are a number of probabilistic losses to choos

The loss function is defined by the `tf.keras.losses.CategoricalCrossentropy` class.

-For more information about loss functions in Keras look at the [loss documentation].
+More information about loss functions can be found in the Keras [loss documentation].

#### Optimizer

We need to choose which optimizer to use and, if this optimizer has parameters, what values to use for those.

```
## compile the model
#model_intro.compile(optimizer = 'adam',
#            loss = keras.losses.CategoricalCrossentropy(),
-#            metrics = ['accuracy'])
+#            metrics = ['accuracy'])
```

**Adam**

@@ -88,7 +88,7 @@ ChatGPT

**Learning rate** is a hyperparameter that determines the step size at which the model's parameters are updated during training. A higher learning rate allows for more substantial parameter updates, which can lead to faster convergence, but it may risk overshooting the optimal solution. On the other hand, a lower learning rate leads to smaller updates, providing more cautious convergence, but it may take longer to reach the optimal solution. Finding an appropriate learning rate is crucial for effectively training machine learning models.

-In the figure below, we can see that a small learning rate will not traverse toward the minima of the gradient descent algorithm in a timely manner i.e. number of epochs.
+The figure below illustrates how a small learning rate will not traverse toward the minima of the gradient descent algorithm in a timely manner, i.e. within a reasonable number of epochs.

![Small learning rate leads to inefficient approach to loss minima](https://developers.google.com/static/machine-learning/crash-course/images/LearningRateTooSmall.svg "Small learning rate leads to inefficient approach to loss minima"){alt='plot of loss over value of weight shows how a small learning rate takes a long time to reach the optimal solution'}

@@ -120,9 +120,9 @@ Metric functions are similar to loss functions, except that the results from eva

Typically you will use `accuracy` which calculates how often predictions match labels.

-The accuracy function creates two local variables, total and count that are used to compute the frequency with which predictions matches labels.
This frequency is ultimately returned as accuracy: an operation that simply divides total by count.
+The accuracy function creates two local variables, total and count, that are used to compute the frequency with which predictions match labels. This frequency is ultimately returned as accuracy: an operation that divides the total by the count.

-For a list of metrics in Keras see [metrics].
+A list of metrics can be found in the Keras [metrics] documentation.

Now that we have decided on which loss function, optimizer, and metric to use we can compile the model using `model.compile`. Compiling the model prepares it for training.

@@ -148,7 +148,7 @@ The `batch_size` parameter defaults to 32. The **batch size** is an important hy

Note we are also creating a new variable `history_intro` to capture the history of the training in order to extract metrics we will use for model evaluation.

-There are other arguments we could use to fit our model, see the documentation for [fit method].
+Other arguments used to fit our model can be found in the documentation for the [fit method].

::::::::::::::::::::::::::::::::::::::::: spoiler

@@ -175,7 +175,7 @@ However, it's essential to consider the trade-offs of using different batch size

### Monitor Training Progress (aka Model Evaluation during Training)

-Now that we know more about the compilation and fitting of CNN's let us take a look at the training metrics for our model.
+Now that we know more about the compilation and fitting of CNNs, let us inspect the training metrics for our model.

Using seaborn we can plot the training process using the history:

```python
@@ -201,14 +201,14 @@ This plot can be used to identify whether the training is well configured or whe

## Inspect the Training Curve

-Looking at the training curves we have just made and recall the difference between the training and the validation datasets.
+Inspect the training curves we have just made and recall the difference between the training and the validation datasets.

1. How does the training progress?

- Does the loss increase or decrease?
- What about the accuracy?
- Do either change fast or slowly?
-- Do the graphs look very jittery?
+- Do the graph lines go up and down frequently (i.e. jitter)?

2. Do you think the resulting trained network will work well on the test set?

@@ -220,7 +220,7 @@ Looking at the training curves we have just made and recall the difference betwe

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

-If we look at these plots we can see signs of **overfitting**. If a model is overfitting, it means that the model performs exceptionally well on the training data but poorly on the validation or test data. Overfitting occurs when the model has learned to memorize the noise and specific patterns in the training data instead of generalizing the underlying relationships. As a result, the model fails to perform well on new, unseen data because it has become too specialized to the training set.
+There is evidence of **overfitting** in these plots. If a model is overfitting, it means that the model performs exceptionally well on the training data but poorly on the validation or test data. Overfitting occurs when the model has learned to memorize the noise and specific patterns in the training data instead of generalizing the underlying relationships. As a result, the model fails to perform well on new, unseen data because it has become too specialized to the training set.
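+One quick way to quantify this (an illustrative sketch; it assumes the `history_intro` object created by `model.fit` above, and the 0.1 threshold is an arbitrary rule of thumb rather than a standard value):
+
+```python
+# compare the final training and validation accuracy from the history
+train_acc = history_intro.history['accuracy'][-1]
+val_acc = history_intro.history['val_accuracy'][-1]
+print(f"train accuracy: {train_acc:.2f}, validation accuracy: {val_acc:.2f}")
+
+# a large gap between the two is a common sign of overfitting
+if train_acc - val_acc > 0.1:
+    print("the model may be overfitting")
+```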
Key characteristics of an overfit model include:

@@ -244,9 +244,9 @@ Key characteristics of an underfit model include:

- Low Validation Accuracy: This indicates that the model is not learning from the data effectively.
- Large Training Loss: The training loss (error) is high, indicating that the model's predictions are far from the true labels in the training set.

-- Increasing validation loss
+- Increasing validation loss.

-How to Address underfitting:
+How to address underfitting:

- Increase the model's complexity by adding more layers or units to the existing layers.
- Train the model for more epochs to give it more time to learn from the data.

@@ -280,7 +280,7 @@ tf.keras.layers.Dropout(rate, noise_shape=None, seed=None, **kwargs)

The `rate` parameter is a float between 0 and 1 and represents the fraction of the input units to drop.

-We want to add one Dropout Layer to our network that randomly drops 80% of the input units but where should we put it?
+We want to add one Dropout Layer to our network that randomly drops 60 per cent of the input units but where should we put it?

The placement of the dropout layer matters. Adding dropout before or after certain layers can have different effects. For example, it's common to place dropout after convolutional and dense layers but not typically after pooling layers. Let us add a third convolutional layer to our model and then the dropout layer.

@@ -302,7 +302,7 @@ x_dropout = keras.layers.Conv2D(32, (3, 3), activation='relu')(x_dropout)
x_dropout = keras.layers.MaxPooling2D((2, 2))(x_dropout)
# Second Convolutional layer with 64 filters, 3x3 kernel size, and ReLU activation
x_dropout = keras.layers.Conv2D(64, (3, 3), activation='relu')(x_dropout) # This is new!
-# Dropout layer andomly drops 60% of the input units
+# Dropout layer randomly drops 60 per cent of the input units
x_dropout = keras.layers.Dropout(0.6)(x_dropout) # This is new!
# Flatten layer to convert 2D feature maps into a 1D vector
x_dropout = keras.layers.Flatten()(x_dropout)
@@ -353,7 +353,7 @@ Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
```

-We can see that the dropout does not alter the dimensions of the image, and has zero parameters.
+Note the dropout does not alter the dimensions of the image and has zero parameters.

::::::::::::::::::::::::::::::::::::: challenge

@@ -389,12 +389,12 @@ fig.suptitle('cifar_model_dropout')

sns.lineplot(ax=axes[0], data=history_dropout_df[['loss', 'val_loss']])
sns.lineplot(ax=axes[1], data=history_dropout_df[['accuracy', 'val_accuracy']])

-val_loss_dropout, val_acc_dropout = model_dropout.evaluate(val_images, val_labels, verbose=2)
+val_loss_dropout, val_acc_dropout = model_dropout.evaluate(val_images, val_labels, verbose=2)

![](fig/04_model_dropout_accuracy_loss.png){alt='two panel figure; the figure on the left shows the training loss starting at 1.7 and decreasing to 1.0 and the validation loss decreasing from 1.4 to 0.9 before leveling out; the figure on the right shows the training accuracy increasing from 0.40 to 0.65 and the validation accuracy increasing from 0.5 to 0.67'}

-Here we see the relatively uncommon situation where our training loss is higher than our validation loss while the validation accuracy is higher than the training accuracy. If you are using dropout or other regularization techniques during training, they might lead to a lower training accuracy.
+In this relatively uncommon situation, the training loss is higher than the validation loss while the validation accuracy is higher than the training accuracy. Using dropout or other regularization techniques during training can lead to a lower training accuracy.

Dropout randomly "drops out" units during training, which can prevent the model from fitting the training data too closely. This regularization effect may lead to a situation where the model generalizes better on the validation set.

@@ -441,15 +441,17 @@ Based on our evaluation of the loss and accuracy metrics, the `model_dropout` ap

::::::::::::::::::::::::::::::::::::: keypoints

-- Use model.compile to compile a CNN
-- The choice of loss function will depend on your dataset and aim
-- The choice of optimizer often depends on experimentation and empirical evaluation
-- Use model.fit to make a train (fit) a CNN
-- Training/validation loss and accuracy can be used to evaluate a model during training
-- Dropout is one way to prevent overfitting
+- Use model.compile to compile a CNN.
+- The choice of loss function will depend on your data and aim.
+- The choice of optimizer often depends on experimentation and empirical evaluation.
+- Use model.fit to train (fit) a CNN.
+- Training/validation loss and accuracy can be used to evaluate a model during training.
+- Dropout is one way to prevent overfitting.

::::::::::::::::::::::::::::::::::::::::::::::::

+
[loss documentation]: https://keras.io/api/losses/
[optimizer documentation]: https://keras.io/api/optimizers/
[metrics]: https://keras.io/api/metrics/

diff --git a/episodes/05-evaluate-predict-cnn.md b/episodes/05-evaluate-predict-cnn.md
index ae48720f..6b8f84b4 100644
--- a/episodes/05-evaluate-predict-cnn.md
+++ b/episodes/05-evaluate-predict-cnn.md
@@ -15,11 +15,11 @@ exercises: 2

::::::::::::::::::::::::::::::::::::: objectives

-- Use a convolutional neural network (CNN) to make a prediction (ie classify an image)
-- Explain how to measure the performance of a CNN
-- Explain hyperparameter tuning
-- Be familiar with advantages and disadvantages of different optimizers
-- Understand what steps to take to improve model accuracy
+- Use a convolutional neural network (CNN) to make a prediction (i.e. classify an image).
+- Explain how to measure the performance of a CNN.
+- Explain hyperparameter tuning.
+- Be familiar with advantages and disadvantages of different optimizers.
+- Understand what steps to take to improve model accuracy.

::::::::::::::::::::::::::::::::::::::::::::::::

@@ -33,11 +33,9 @@ Recall in [Episode 02 Introduction to Image Data](episodes/02-image-data.md) we

When creating and using a test set there are a few things to check:

-- it only contains images that the model has never seen before
-- it is sufficiently large to provide a meaningful evaluation of model performance
-  - images from every target label
-  - images of classes not in your target set
-- it is processed in the same way as your training set
+- It only contains images that the model has never seen before.
+- It is sufficiently large to provide a meaningful evaluation of model performance. It should include images from every target label and images of classes not in your target set.
+- It is processed in the same way as your training set (see the sanity-check sketch below).
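For that last check, a minimal sanity test might look like the sketch below (run it after the loading step that follows; it assumes `train_images` and `test_images` are NumPy arrays already scaled to the [0, 1] range, as elsewhere in this lesson):

```python
# confirm the test set is processed the same way as the training set
assert test_images.dtype == train_images.dtype          # same dtype after conversion
assert test_images.shape[1:] == train_images.shape[1:]  # same height, width, and channels
assert test_images.min() >= 0.0 and test_images.max() <= 1.0  # same [0, 1] scaling
```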
Check to make sure you have a model in memory and a test dataset:

@@ -46,7 +44,7 @@ Check to make sure you have a model in memory and a test dataset:

model_best = keras.models.load_model('fit_outputs/model_dropout.h5') # pick your best model
print('We are using', model_best.name)

-# load the cifar dataset included with the keras packages
+# load the CIFAR-10 dataset included with the keras packages
(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

# normalize the RGB values to be between 0 and 1
@@ -63,7 +61,7 @@ print('The number and shape of images in our test dataset is:', test_images.shap
print('The number of labels in our test dataset is:', len(test_labels))
```
```output
-We are using cifar_model_dropout
+We are using cifar_model_dropout
The number and shape of images in our test dataset is: (10000, 32, 32, 3)
The number of labels in our test dataset is: 10000
```
@@ -185,8 +183,8 @@ We can then use the `heatmap` function from seaborn to create a nice visualizati

sns.heatmap(confusion_df, annot=True)
```

-- the `annot=True` parameter here will put the numbers from the confusion matrix in the heatmap
-- the `fmt=3g` will display the values with 3 significant digits
+- The `annot=True` parameter here will put the numbers from the confusion matrix in the heatmap.
+- Adding `fmt='.3g'` would display the values with three significant digits.

![](fig/05_pred_v_true_confusion_matrix.png){alt='Confusion matrix of model predictions where the color scale goes from black to light to represent values from 0 to the total number of test observations in our test set of 1000. The diagonal has much lighter colors indicating our model is predicting well but a few non-diagonal cells also have a lighter color to show where the model is making prediction errors.'}

@@ -199,7 +197,7 @@ Measure the performance of the neural network you trained and visualized as a co

Q1. Did the neural network perform well on the test set?

-Q2. Did you expect this from the training loss you saw?
+Q2. Did you expect this from the training loss plot?

Q3. What could we do to improve the performance?

@@ -207,9 +205,9 @@

Q1. The confusion matrix shows that the predictions are not bad but can be improved.

-Q2. I expected the performance to be better than average because the accuracy of the model I chose was 67% on the validation set.
+Q2. I expected the performance to be better than average because the accuracy of the model I chose was 67 per cent on the validation set.

-Q3. We can try many things to improve the performance from here. One of the first things we can try is to change the network architecture. However, in the interest of time and given we already saw how to build a CNN we will try to change the training parameters.
+Q3. We can try many things to improve the performance from here. One of the first things we can try is to change the network architecture. However, in the interest of time, and given we already learned how to build a CNN, we will now change the training parameters.

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

@@ -245,7 +243,7 @@ One common method for hyperparameter tuning is by using a `for` loop to change a

## Tune Dropout Rate using a For Loop

-Q1. What do you think would happen if you lower the dropout rate? Write some code to vary the dropout rate and see how it affects the model training.
+Q1. What do you think would happen if you lower the dropout rate?
Write some code to vary the dropout rate and investigate how it affects the model training.

Q2. You are varying the dropout rate and checking its effect on the model performance. What is the term associated with this procedure?

@@ -253,9 +251,9 @@ Q2. You are varying the dropout rate and checking its effect on the model perfor

Q1. Varying the dropout rate

-The code below instantiates and trains a model with varying dropout rates. You can see from the resulting plot that the ideal dropout rate in this case is around 0.45. This is where the validation loss is lowest.
+The code below instantiates and trains a model with varying dropout rates. The resulting plot indicates the ideal dropout rate in this case is around 0.45. This is where the validation loss is lowest.

-- NB1: It takes a while to train these 5 networks
+- NB1: It takes a while to train these five networks.
- NB2: You should do this with a test set and not with the validation set!

```python
@@ -287,7 +285,7 @@ for dropout_rate in dropout_rates:
    x_vary = keras.layers.MaxPooling2D((2, 2))(x_vary)
    # Second Convolutional layer with 64 filters, 3x3 kernel size, and ReLU activation
    x_vary = keras.layers.Conv2D(64, (3, 3), activation='relu')(x_vary)
-    # Dropout layer randomly drops x% of the input units
+    # Dropout layer randomly drops x per cent of the input units
    x_vary = keras.layers.Dropout(dropout_rate)(x_vary) # This is new!
    # Flatten layer to convert 2D feature maps into a 1D vector
    x_vary = keras.layers.Flatten()(x_vary)
@@ -310,12 +308,12 @@ for dropout_rate in dropout_rates:
                        validation_data = (val_images, val_labels),
                        batch_size = 32)

-    val_loss_vary, val_acc_vary = model_vary.evaluate(val_images, val_labels)
+    val_loss_vary, val_acc_vary = model_vary.evaluate(val_images, val_labels)
    val_losses_vary.append(val_loss_vary)

loss_df = pd.DataFrame({'dropout_rate': dropout_rates, 'val_loss_vary': val_losses_vary})

-sns.lineplot(data=loss_df, x='dropout_rate', y='val_loss_vary')
+sns.lineplot(data=loss_df, x='dropout_rate', y='val_loss_vary')
```

![](fig/05_vary_dropout_rate.png){alt='test loss plotted against five dropout rates ranging from 0.15 to 0.75 where the minimum test loss appears to occur between 0.4 and 0.5'}

@@ -338,7 +336,7 @@ For instance, suppose you're tuning two hyperparameters:

- Batch size: with possible values [10, 50, 100]

-- GridSearch will evaluate the model for all 3x3 = 9 combinations (e.g., {0.01, 10}, {0.01, 50}, {0.1, 10}, and so on)
+- GridSearch will evaluate the model for all 3 x 3 = 9 combinations (e.g., {0.01, 10}, {0.01, 50}, {0.1, 10}, and so on).

::::::::::::::::::::::::::::::::::::: challenge

@@ -398,15 +396,13 @@ grid_result = grid.fit(train_images, train_labels)
# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```
-Output from the GridSearch process should look similar to:
-
```output
Best: 0.586660 using {'optimizer': 'RMSprop'}
```

Thus, we can interpret from this output that our best tested optimiser is the **root mean square propagation** optimiser, or RMSprop.

-Curious about RMSprop? Read more here: [RMSprop in Keras] and [RMSProp, Cornell University].
+Curious about RMSprop? See [RMSprop in Keras] and [RMSProp, Cornell University].

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

@@ -500,7 +496,7 @@ plt.show()

![](fig/05_tune_activation_results.png){alt='Validation accuracy plotted against ten epochs for five different activation functions.
relu and Leaky relu have the highest accuracy around 0.60; sigmoid and selu are next with accuracy around 0.45 and tanh has the lowest accuracy of 0.35'}

-You can see in this figure that after 10 epochs the `ReLU` and `Leaky ReLU` activation functions appear to converge around 0.60% validation accuracy. We recommend when tuning your model to ensure you use enough epochs to be confident in your results.
+In this figure, after 10 epochs, the `ReLU` and `Leaky ReLU` activation functions appear to converge around 0.60 (60 per cent) validation accuracy. When tuning your model, we recommend you use enough epochs to be confident in your results.

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

@@ -509,14 +505,14 @@ You can see in this figure that after 10 epochs the `ReLU` and `Leaky ReLU` acti

## Open question: What could be next steps to further improve the model?

-With unlimited options to modify the model architecture or to play with the training parameters, deep learning can trigger very extensive hunting for better and better results. Usually models are "well behaving" in the sense that small chances to the architectures also only result in small changes of the performance (if any). It is often tempting to hunt for some magical settings that will lead to much better results. But do those settings exist? Applying common sense is often a good first step to make a guess of how much better could results be. In the present case we might certainly not expect to be able to reliably predict sunshine hours for the next day with 5-10 minute precision. But how much better our model could be exactly, often remains difficult to answer.
+With unlimited options to modify the model architecture or to play with the training parameters, deep learning can trigger a very extensive hunt for better and better results. Usually models are "well behaved" in the sense that small changes to the architecture result in only small changes in performance (if any). It is often tempting to hunt for some magical settings that will lead to much better results. But do those settings exist? Applying common sense is often a good first step to estimate how much better the results could be.

- What changes to the model architecture might make sense to explore?
- Ignoring changes to the model architecture, what might notably improve the prediction quality?

:::::::::::::::::::::::: solution

-This is an open question. And we don't actually know how far one could push this sunshine hour prediction (try it out yourself if you like! We're curious!). But there is a few things that might be worth exploring.
+This is an open question, but a few things might be worth exploring.

Regarding the model architecture:

@@ -524,11 +520,9 @@ Regarding the model architecture:

Other changes that might impact the quality notably:

-- The most obvious answer here would be: more data! Even this will not always work (e.g. if data is very noisy and uncorrelated, more data might not add much).
+- The most obvious answer here would be: more data! Even this will not always work (e.g., if data is very noisy and uncorrelated, more data might not add much).
- Related to more data: use data augmentation. By creating realistic variations of the available data, the model might improve as well.
-- More data can mean more data points (you can test it yourself by taking more than the 3 years we used here!)
-- More data can also mean more features! What about adding the month?
-- The labels we used here (sunshine hours) are highly biased, many days with no or nearly no sunshine but few with >10 hours. Techniques such as oversampling or undersampling might handle such biased labels better. Another alternative would be to not only look at data from one day, but use the data of a longer period such as a full week. This will turn the data into time series data which in turn might also make it worth to apply different model architectures....
+- More data can mean more data points and also more features!

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::

@@ -537,14 +531,15 @@ By now you should have a well-trained, finely-tuned model that makes accurate pr

::::::::::::::::::::::::::::::::::::: keypoints

-- Use model.predict to make a prediction with your model
-- Model accuracy must be measured on a test dataset with images your model has not seen before
-- There are many hyperparameters to choose from to improve model performance
-- Fitting separate models with different hyperparameters and comparing their performance is a common and good practice in deep learning
+- Use model.predict to make a prediction with your model.
+- Model accuracy must be measured on a test dataset with images your model has not seen before.
+- There are many hyperparameters to choose from to improve model performance.
+- Fitting separate models with different hyperparameters and comparing their performance is a common and good practice in deep learning.

::::::::::::::::::::::::::::::::::::::::::::::::

+
[RMSprop in Keras]: https://keras.io/api/optimizers/rmsprop/
[RMSProp, Cornell University]: https://optimization.cbe.cornell.edu/index.php?title=RMSProp

diff --git a/episodes/06-conclusion.md b/episodes/06-conclusion.md
index 2b5d08d3..3db07669 100644
--- a/episodes/06-conclusion.md
+++ b/episodes/06-conclusion.md
@@ -14,10 +14,10 @@ exercises: 2

::::::::::::::::::::::::::::::::::::: objectives

-- Learn how to save and load models
-- Know where to look for pretrained models
-- Understand what a GPU is and what it can do for you
-- Explain when to use a CNN and when not to
+- Learn how to save and load models.
+- Know where to search for pretrained models.
+- Understand what a GPU is and what it can do for you.
+- Explain when to use a CNN and when not to.

::::::::::::::::::::::::::::::::::::::::::::::::

@@ -25,48 +25,44 @@ exercises: 2

Now that we have a trained network that performs at a level we are happy with and can maintain high prediction accuracy on a test dataset we might want to consider publishing a file with both the architecture of our network and the weights which it has learned (assuming we did not use a pre-trained network). This will allow others to use it as a pre-trained network for their own purposes and for them to (mostly) reproduce our result.

-We have already seen how to save a model with `model.save`:
-```
-#model.save('model_final.h5')
+Use `model.save` to save a model:
+
+```python
+# save best model
+model.save('model_best.keras')
```

-The `save` method is actually an alias for `tf.keras.saving.save_model()` where the default `save_format=NONE`. By adding the extension **.h5** to our filename, keras will save the model in the legacy HDF5 format.
+The `save` method is actually an alias for `tf.keras.saving.save_model()` where the default `save_format=None`, so the format is inferred from the filename extension; the **.keras** extension selects the native Keras format.
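Since the extension drives the format when `save_format` is left as `None` (an assumption based on the behaviour of recent Keras versions), the same call can produce either format:

```python
model.save('model_best.keras')  # native Keras format, inferred from the .keras extension
model.save('model_final.h5')    # legacy HDF5 format, inferred from the .h5 extension
```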
-This saved model can be loaded again by using the `load_model` method as follows:
+This saved model can be loaded again by using the `load_model` method:

```python
# load a saved model
-pretrained_model = keras.models.load_model('model_final.h5')
+pretrained_model = keras.models.load_model('model_best.keras')
```

This loaded model can be used as before to predict.

```python
-# use the pretrained model here
-from icwithcnn_functions import prepare_image_icwithcnn
+# use the pretrained model to predict the class name of the first test image
+result_pretrained = pretrained_model.predict(test_images[0].reshape(1,32,32,3))

-new_img_path = "../data/Jabiru_TGS.JPG" # path to image
-new_img_prepped = prepare_image_icwithcnn(new_img_path)
-
-# predict the class name
-y_pretrained_pred = pretrained_model.predict(new_img_prepped)
-pretrained_predicted_class = class_names[y_pretrained_pred.argmax()]
-print(pretrained_predicted_class)
+print('The predicted probability of each class is: ', result_pretrained.round(4))
+print('The class with the highest predicted probability is: ', class_names[result_pretrained.argmax()])
```

```output
-frog
+cat
```

-The HDF5 file format contains:
+The saved .keras file contains:

-- configuration (architecture)
-- weights
-- optimizer's state (if any)
-  - allows you to continue training; useful for checkpointing
+- The model's configuration (architecture).
+- The model's weights.
+- The state of the model's optimizer (if any), which allows you to resume training from a checkpoint.

Note that saving the model does not save the training history (i.e. training and validation loss and accuracy). For that you will need to save the model history dataframe we created for plotting.

-To find out more about other file formats you can use to save your model see the Keras documentation for [Saving and Serialization].
+The Keras documentation for [Saving and Serialization] explains other ways to save your model.

To share your model with a wider audience it is recommended you create a git repository, for example on [GitHub], and upload your code, images, and model outputs to the cloud. In some cases, you may be able to offer up your model to an online repository of pretrained models.

@@ -98,13 +94,13 @@ A couple of those libraries include:

A **GPU**, or **Graphics Processing Unit**, is a specialized electronic circuit designed to accelerate graphics rendering and image processing in a computer. In the context of deep learning and machine learning, GPUs have become essential due to their ability to perform parallel computations at a much faster rate compared to traditional central processing units (CPUs). This makes them well-suited for the intensive matrix and vector operations that are common in deep learning algorithms.

-As you have seen in this lesson, training CNN models can take a long time. If you follow the steps presented here you will find you are training multiple models to find the one best suited to your needs, particularly when fine tuning hyperparameters. However you have also seen that running on CPU only machines can be done! So while a GPU is not an absolute requirement for deep learning, it can significantly accelerate your deep learning work and make it more efficient, especially for larger and more complex tasks.
+As you have experienced in this lesson, training CNN models can take a long time.
If you follow the steps presented here you will find you are training multiple models to find the one best suited to your needs, particularly when fine tuning hyperparameters. However, you have also seen that running on CPU-only machines can be done! So while a GPU is not an absolute requirement for deep learning, it can significantly accelerate your deep learning work and make it more efficient, especially for larger and more complex tasks.

If you don't have access to a powerful GPU locally, you can use cloud services that provide GPU instances for deep learning. This can be a cost-effective option for many users.

#### Is this the best/only way to code up CNNs for image classification?

-Absolutely not! The code we used in today's workshop might today be considered old fashioned. A lot of the data preprocessing we did by hand can now be done by simply adding different layer types to your model. See, for example, the [preprocessing layers] available with keras.
+Absolutely not! The code we used in today's workshop might already be considered old fashioned. A lot of the data preprocessing we did by hand can now be done by adding different layer types to your model. The [preprocessing layers] section of the Keras documentation provides several examples.

The point is that this technology, both hardware and software, is dynamic and changing at exponentially increasing rates. It is essential to stay curious and open to learning and follow up with continuous education and practice. Other strategies to stay informed include:

@@ -118,7 +114,7 @@ The point is that this technology, both hardware and software, is dynamic and ch

#### What other uses are there for neural networks?

-In addition to image classification, we saw in the introduction other computer vision tasks including object detection and instance and semantic segmentation. These can all be done with CNN's and are readily transferable to videos. Also included in these tasks is medical imaging for diagnoses of disease and, of course, facial recognition.
+In addition to image classification, [Episode 01 Introduction to Deep Learning](episodes/01-introduction.md) introduced other computer vision tasks, including object detection and instance and semantic segmentation. These can all be done with CNNs and are readily transferable to videos. Also included in these tasks is medical imaging for diagnoses of disease and, of course, facial recognition.

However, there are many other tasks which CNNs are well suited for:

@@ -136,11 +132,12 @@ However, there are many other tasks which CNNs are well suited for:

- Deep Learning is well suited to classification and prediction problems such as image recognition.
- To use Deep Learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, tuning hyperparameters, and measuring performance before we can classify data.
- Keras is a Deep Learning library that is easier to use than many of the alternatives such as TensorFlow and PyTorch.
-- Graphical Processing Units are useful, though not essential, for deep learning tasks
+- Graphics Processing Units are useful, though not essential, for deep learning tasks.
:::::::::::::::::::::::::::::::::::::::::::::::: + [Saving and Serialization]: https://keras.io/api/saving/ [GitHub]: https://github.com/ [Model Zoo]: https://modelzoo.co/ diff --git a/episodes/setup-gpu.md b/episodes/setup-gpu.md index 82119b5d..4d547b0d 100644 --- a/episodes/setup-gpu.md +++ b/episodes/setup-gpu.md @@ -2,7 +2,7 @@ title: "Setup - GPU" --- -This lesson is designed for Software Carpentry users who have completed [Plotting and Programming in Python] and are looking to jump straight into image classification. We recognize that this jump is quite large and have done our best to provide the content and code to perform these types of analyses. +This lesson is designed for Software Carpentry users who have completed [Plotting and Programming in Python] and want to jump straight into image classification. We recognize this jump is quite large and have done our best to provide the content and code to perform these types of analyses. The default [Setup](../learners/setup.md) is for CPU only environments. @@ -86,9 +86,9 @@ Conda should already be available in your system once you installed Anaconda suc The easiest way to create a conda environment for this lesson is to use the Anaconda Prompt. You can search for "anaconda prompt" using the Windows search function (Windows Logo Key) or Spotlight on macOS (Command + spacebar). -![](fig/00_anaconda_prompt_search.png){alt='Screenshot of what the Anaconda Prompt application looks like'} +![](fig/00_anaconda_prompt_search.png){alt='Screenshot of the Anaconda Prompt application'} -A terminal window will open with the title 'Anaconda Prompt' that looks like this: +A terminal window will open with the title 'Anaconda Prompt': ![](fig/00_anaconda_prompt_window.png){alt='Screenshot of the terminal window that opens when you launch the Anaconda Prompt application'} diff --git a/learners/setup.md b/learners/setup.md index a79b42da..41f3acfa 100644 --- a/learners/setup.md +++ b/learners/setup.md @@ -2,7 +2,7 @@ title: "Setup - CPU" --- -This lesson is designed for Software Carpentry users who have completed [Plotting and Programming in Python] and are looking to jump straight into image classification. We recognize that this jump is quite large and have done our best to provide the content and code to perform these types of analyses. +This lesson is designed for Software Carpentry users who have completed [Plotting and Programming in Python] and want to jump straight into image classification. We recognize this jump is quite large and have done our best to provide the content and code to perform these types of analyses. It uses the Anaconda package manager to install the required python packages, including the Spyder IDE. @@ -89,9 +89,9 @@ Conda should already be available in your system once you installed Anaconda suc The easiest way to create a conda environment for this lesson is to use the Anaconda Prompt. You can search for "anaconda prompt" using the Windows search function (Windows Logo Key) or Spotlight on macOS (Command + spacebar). -![](fig/00_anaconda_prompt_search.png){alt='Screenshot of what the Anaconda Prompt application looks like'} +![](fig/00_anaconda_prompt_search.png){alt='Screenshot of the Anaconda Prompt application'} -A terminal window will open with the title 'Anaconda Prompt' that looks like this: +A terminal window will open with the title 'Anaconda Prompt': ![](fig/00_anaconda_prompt_window.png){alt='Screenshot of the terminal window that opens when you launch the Anaconda Prompt application'}