Merge pull request #251 from sergiopaniego/main
Reviewed general punctuation and fixed some broken links
johko authored Jul 17, 2024
2 parents e65cc91 + 1c69233 commit f532359
Showing 35 changed files with 139 additions and 132 deletions.
11 changes: 6 additions & 5 deletions chapters/en/unit0/welcome/welcome.mdx
@@ -12,7 +12,7 @@ On this page, you can find how to join the learners community, make a submission

To obtain your certification for completing the course, complete the following assignments:

1. Training/fine-tuning a Model
1. Training/fine-tuning a model
2. Building an application and hosting it on Hugging Face Spaces

### Training/fine-tuning a Model
@@ -21,7 +21,8 @@ There are notebooks under the Notebooks/Vision Transformers section. As of now,

The model repository needs to have the following:

1. A properly filled model card, you can check out [here for more information](https://huggingface.co/docs/hub/en/model-cards)

1. A properly filled model card, you can check out [here for more information](https://huggingface.co/docs/hub/en/model-cards).
2. If you trained a model with transformers and pushed it to Hub, the model card will be generated. In that case, edit the card and fill in more details.
3. Add the dataset’s ID to the model card to link the model repository to the dataset repository.

@@ -34,7 +35,7 @@ In this assignment section, you'll be building a Gradio-based application for yo

## Certification 🥇

Once you've finished the assignments — Training/fine-tuning a Model and Creating a Space — please complete the [form](https://forms.gle/isiVSw59oiiHP6pN9) with your name, email, and links to your model and Space repositories to receive your certificate
Once you've finished the assignments — Training/fine-tuning a Model and Creating a Space — please complete the [form](https://forms.gle/isiVSw59oiiHP6pN9) with your name, email, and links to your model and Space repositories to receive your certificate.

## Join the community!

@@ -50,8 +51,8 @@ There are many channels focused on various topics on our Discord server. You wil

As a computer vision course learner, you may find the following set of channels particularly relevant:

- `#computer-vision`: a catch-all channel for everything related to computer vision.
- `#cv-study-group`: a place to exchange ideas, ask questions about specific posts and start discussions.
- `#computer-vision`: a catch-all channel for everything related to computer vision
- `#cv-study-group`: a place to exchange ideas, ask questions about specific posts and start discussions
- `#3d`: a channel to discuss aspects of computer vision specific to 3D computer vision

If you are interested in generative AI, we also invite you to join all channels related to the Diffusion Models: #core-announcements, #discussions, #dev-discussions, and #diff-i-made-this.
8 changes: 4 additions & 4 deletions chapters/en/unit1/feature-extraction/feature-matching.mdx
@@ -8,7 +8,7 @@ Imagine you have a giant box of puzzle pieces, and you're trying to find a speci

Now that we have an intuitive idea of how brute-force matches are found, let's dive into the algorithms. We are going to use the descriptors that we learned about in the previous chapter to find the matching features in two images.

First install and load libraries
First install and load libraries.

```bash
!pip install opencv-python
@@ -137,13 +137,13 @@ We also create a dictionary to specify the maximum leafs to visit as follows.
search_params = dict(checks=50)
```
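A `search_params` dictionary like this is typically paired with an `index_params` dictionary when the FLANN matcher is constructed. Here is a minimal, self-contained sketch of that pairing; the KD-tree settings, the toy descriptors, and the 0.7 ratio-test threshold are illustrative assumptions, with the real SIFT descriptors computed as shown below.

```python
import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)  # maximum number of leaves to visit per query

flann = cv2.FlannBasedMatcher(index_params, search_params)

# Toy float32 descriptors stand in for the SIFT descriptors computed below
des1 = np.random.rand(500, 128).astype(np.float32)
des2 = np.random.rand(500, 128).astype(np.float32)
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test keeps a match only when it clearly beats the runner-up
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(len(good))
```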

Initiate SIFT detector
Initiate SIFT detector.

```python
sift = cv2.SIFT_create()
```

Find the keypoints and descriptors with SIFT
Find the keypoints and descriptors with SIFT.

```python
kp1, des1 = sift.detectAndCompute(img1, None)
@@ -259,7 +259,7 @@ Fm, inliers = cv2.findFundamentalMat(mkpts0, mkpts1, cv2.USAC_MAGSAC, 0.5, 0.999
inliers = inliers > 0
```

Finally, we can visualize the matches
Finally, we can visualize the matches.

```python
draw_LAF_matches(
@@ -9,7 +9,7 @@ In digital image processing, operations on images are diverse and can be categor
- Statistical
- Geometrical
- Mathematical
- Transform operations.
- Transform operations

Each category encompasses different techniques, such as morphological operations under logical operations or fourier transforms and principal component analysis (PCA) under transforms. In this context, we refer to morphology as the group of operations that use structuring elements to generate images of the same size by looking into the values of the pixel neighborhood. Understanding the distinction between element-wise and matrix operations is important in image manipulation. Element-wise operations, such as raising an image to a power or dividing it by another image, involve processing each pixel individually. This pixel-based approach contrasts with matrix operations, which utilize matrix theory for image manipulation. Having said that, you can do whatever you want with images, as they are matrices containing numbers!
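As a small NumPy illustration of the element-wise versus matrix distinction described above (the array values are arbitrary):

```python
import numpy as np

img = np.arange(9, dtype=np.float32).reshape(3, 3)

squared = img ** 2       # element-wise: each pixel is processed on its own
ratio = img / (img + 1)  # element-wise division of one image by another
product = img @ img      # matrix multiplication: each output pixel mixes a whole row and column
```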

2 changes: 1 addition & 1 deletion chapters/en/unit1/image_and_imaging/imaging.mdx
@@ -16,7 +16,7 @@ The core of digital image formation is the function \\(f(x,y)\\), which is deter
</div>

In transmission-based imaging, such as X-rays, transmissivity takes the place of reflectivity. The digital representation of an image is essentially a matrix or array of numerical values, each corresponding to a pixel. The process of transforming continuous image data into a digital format is twofold:
- Sampling, which digitizes the coordinate values
- Sampling, which digitizes the coordinate values.
- Quantization, which converts amplitude values into discrete quantities.
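A toy NumPy sketch of these two steps (the sampling stride and the number of gray levels are arbitrary choices, purely for illustration):

```python
import numpy as np

# A continuous scene approximated by a dense array of intensities in [0, 1]
scene = np.random.rand(1024, 1024)

# Sampling: keep values only on a coarser coordinate grid
sampled = scene[::8, ::8]  # 128 x 128 samples

# Quantization: map continuous amplitudes to a small set of discrete levels
levels = 16
quantized = np.round(sampled * (levels - 1)).astype(np.uint8)
```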

The resolution and quality of a digital image significantly depend on the following:
10 changes: 7 additions & 3 deletions chapters/en/unit10/blenderProc.mdx
@@ -98,16 +98,20 @@ It is specifically created to help in the generation of realistic looking images
You can install BlenderProc via pip:

```bash
pip install blenderProc
pip install blenderProc
```

Alternately, you can clone the official [BlenderProc repository](https://github.com/DLR-RM/BlenderProc) from GitHub using Git:

`git clone https://github.com/DLR-RM/BlenderProc`
```bash
git clone https://github.com/DLR-RM/BlenderProc
```

BlenderProc must be run inside the blender python environment (bpy), as this is the only way to access the Blender API.

`blenderproc run <your_python_script>`
```bash
blenderproc run <your_python_script>
```
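As an illustration of what `<your_python_script>` might contain, here is a sketch modeled on BlenderProc's basic quickstart example; the object, light, and camera values are assumptions, so check the repository's examples for the exact current API.

```python
import blenderproc as bproc
import numpy as np

bproc.init()

# A simple object to render
obj = bproc.object.create_primitive("MONKEY")

# A point light so the object is visible
light = bproc.types.Light()
light.set_location([2, -2, 0])
light.set_energy(300)

# A single camera pose in front of the object
cam_pose = bproc.math.build_transformation_mat([0, -5, 0], [np.pi / 2, 0, 0])
bproc.camera.add_camera_pose(cam_pose)

# Render the scene and write the result to disk
data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)
```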

You can check out this notebook to try BlenderProc in Google Colab, demos the basic examples provided [here](https://github.com/DLR-RM/BlenderProc/tree/main/examples/basics).
Here are some images rendered with the basic example:
2 changes: 1 addition & 1 deletion chapters/en/unit10/datagen-diffusion-models.mdx
@@ -59,7 +59,7 @@ This means we have many tools under our belt to generate synthetic data!

## Approaches to Synthetic Data Generation

There are generally three cases for needing synthetic data,
There are generally three cases for needing synthetic data:

**Extending an existing dataset:**

20 changes: 10 additions & 10 deletions chapters/en/unit10/point_clouds.mdx
@@ -22,22 +22,22 @@ The 3D Point Data is mainly used in self-driving capabilities, but now other AI

## Generation and Data Representation

We will be using the python library [point-cloud-utils](https://github.com/fwilliams/point-cloud-utils), and [open-3d](https://github.com/isl-org/Open3D), which can be installed by
We will be using the python library [point-cloud-utils](https://github.com/fwilliams/point-cloud-utils), and [open-3d](https://github.com/isl-org/Open3D), which can be installed by:

```bash
pip install point-cloud-utils
pip install point-cloud-utils
```

We will be also using the python library open-3d, which can be installed by
We will be also using the python library open-3d, which can be installed by:

```bash
pip install open3d
pip install open3d
```

OR a Smaller CPU only version
OR a Smaller CPU only version:

```bash
pip install open3d-cpu
pip install open3d-cpu
```
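A small Open3D sketch to sanity-check the installation; the points are random values, purely for illustration:

```python
import numpy as np
import open3d as o3d

# Build a point cloud from 1,000 random 3D points
points = np.random.rand(1000, 3)
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

# Round-trip it through a PLY file, one of the formats discussed below
o3d.io.write_point_cloud("random.ply", pcd)
loaded = o3d.io.read_point_cloud("random.ply")
print(loaded)  # PointCloud with 1000 points
```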

Now, first we need to understand the formats in which these point clouds are stored in, and for that, we need to look at mesh cloud.
@@ -53,13 +53,13 @@ The type of file is inferred from its file extension. Some of the extensions sup

- A simple PLY object consists of a collection of elements for representation of the object. It consists of a list of (x,y,z) triplets of a vertex and a list of faces that are actually indices into the list of vertices.
- Vertices and faces are two examples of elements and the majority of the PLY file consists of these two elements.
- New properties can also be created and attached to the elements of an object, but these should be added in such a way that old programs do not break when these new properties are encountered
- New properties can also be created and attached to the elements of an object, but these should be added in such a way that old programs do not break when these new properties are encountered.

** STL (Standard Tessellation Language) **

- This format approximates the surfaces of a solid model with triangles.
- These triangles are also known as facets, where each facet is described by a perpendicular direction and three points representing the vertices of the triangle.
- However, these files have no description of Color and Texture
- However, these files have no description of Color and Texture.

** OFF (Object File Format) **

@@ -77,11 +77,11 @@ The type of file is inferred from its file extension. Some of the extensions sup

- X3D is an XML based 3D graphics file format for presentation of 3D information. It is a modular standard and is defined through several ISO specifications.
- The format supports vector and raster graphics, transparency, lighting effects, and animation settings including rotations, fades, and swings.
- X3D has the advantage of encoding color information (unlike STL) that is used during printing the model on a color 3D printer
- X3D has the advantage of encoding color information (unlike STL) that is used during printing the model on a color 3D printer.

** DAE (Digital Asset Exchange) **

- This is an XML schema which is an open standard XML schema, from which DAE files are built.
- This file format is based on the COLLADA (COLLAborative Design Activity) XML schema which is an open standard XML schema for the exchange of digital assets among graphics software applications
- This file format is based on the COLLADA (COLLAborative Design Activity) XML schema which is an open standard XML schema for the exchange of digital assets among graphics software applications.
- The format's biggest selling point is its compatibility across multiple platforms.
- COLLADA files aren't restricted to one program or manufacturer. Instead, they offer a standard way to store 3D assets.
4 changes: 2 additions & 2 deletions chapters/en/unit10/synthetic-lung-images.mdx
@@ -15,7 +15,7 @@ The generator has the following model architecture:
- Conv2D layer
- Batch Normalization layer
- ReLU activation
- Conv2D layer with Tanh activation
- Conv2D layer with Tanh activation.

The discriminator has the following model architecture:

@@ -27,7 +27,7 @@ The discriminator has the following model architecture:
- Conv2D layer
- Batch Normalization layer
- Leaky ReLU activation
- Conv2D layer with Sigmoid
- Conv2D layer with Sigmoid.

**Data Collection**

2 changes: 1 addition & 1 deletion chapters/en/unit10/synthetic_datasets.mdx
@@ -40,7 +40,7 @@ Semantic segmentation is vital for autonomous vehicles to interpret and navigate
| Name | Year | Description | Paper | | Additional Links |
|---------------------|--------------|-------------|----------------|---------------------|---------------------|
| Virtual KITTI 2 | 2020 | Virtual Worlds as Proxy for Multi-Object Tracking Analysis | [Virtual KITTI 2](https://arxiv.org/pdf/2001.10773.pdf) | | [Website](https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds/) |
| ApolloScape | 2019 | Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much large and richer labeling including holistic semantic dense point cloud for each site, stereo, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instance, high accurate location for every frame in various driving videos from multiple sites, cities, and daytimes | [The ApolloScape Open Dataset for Autonomous Driving and its Application](https://arxiv.org/abs/1803.06184) | | [Website](https://apolloscape.auto/) |
| ApolloScape | 2019 | Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much large and richer labeling including holistic semantic dense point cloud for each site, stereo, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instance, high accurate location for every frame in various driving videos from multiple sites, cities, and daytimes. | [The ApolloScape Open Dataset for Autonomous Driving and its Application](https://arxiv.org/abs/1803.06184) | | [Website](https://apolloscape.auto/) |
| Driving in the Matrix | 2017 | The core idea behind "Driving in the Matrix" is to use photo-realistic computer-generated images from a simulation engine to produce annotated data quickly. | [Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?](https://arxiv.org/pdf/1610.01983.pdf) | | [GitHub](https://github.com/umautobots/driving-in-the-matrix) ![GitHub stars](https://img.shields.io/github/stars/umautobots/driving-in-the-matrix.svg?style=social&label=Star) |
| CARLA | 2017 | **CARLA** (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. Technically, it operates similarly to, as an open source layer over Unreal Engine 4 that provides sensors in the form of RGB cameras (with customizable positions), ground truth depth maps, ground truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation). | [CARLA: An Open Urban Driving Simulator](https://arxiv.org/pdf/1711.03938v1.pdf) | | [Website](https://carla.org/) |
| Synthia | 2016 | A large collection of synthetic images for semantic segmentation of urban scenes. SYNTHIA consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lane-marking. | [The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes](https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Ros_The_SYNTHIA_Dataset_CVPR_2016_paper.html) | | [Website](https://synthia-dataset.net/) |
2 changes: 1 addition & 1 deletion chapters/en/unit12/conclusion.mdx
@@ -67,7 +67,7 @@ This is work that highlights and explores techniques for making machine learning
### 🧑‍🤝‍🧑 Inclusive

These are projects which broaden the scope of who builds and benefits in the machine learning world. Some examples:
- Curating diverse datasets that increase the representation of underserved groups
- Curating diverse datasets that increase the representation of underserved groups.
- Training language models on languages that aren't yet available on the Hugging Face Hub.
- Creating no-code and low-code frameworks that allow non-technical folk to engage with AI.

6 changes: 3 additions & 3 deletions chapters/en/unit13/hyena.mdx
@@ -91,8 +91,8 @@ Some work has been conducted to speed up this computation like FastFFTConv based

![nd_hyena.png](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/outlook_hyena_images/nd_hyena.png)
In essence, Hyena can be performed in two steps:
1. Compute a set of N+1 linear projections similarly of attention (it can be more than 3 projections)
2. Mixing up the projections: The matrix \\(H(u)\\) is defined by a combination of matrix multiplications
1. Compute a set of N+1 linear projections similarly of attention (it can be more than 3 projections).
2. Mixing up the projections: The matrix \\(H(u)\\) is defined by a combination of matrix multiplications.
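The long convolutions used in step 2 are what make the FFT trick pay off. Here is a minimal PyTorch sketch of an FFT-based causal long convolution between a projection `u` and an implicit filter `k`; the shapes and the zero-padding choice are assumptions for illustration, not the chapter's exact implementation.

```python
import torch


def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution in O(L log L) via the FFT.

    u: (batch, channels, L) input projection
    k: (channels, L) implicitly parametrized filter
    """
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular convolution becomes a linear one
    u_f = torch.fft.rfft(u, n=n)
    k_f = torch.fft.rfft(k, n=n)
    return torch.fft.irfft(u_f * k_f, n=n)[..., :L]


u = torch.randn(2, 64, 1024)
k = torch.randn(64, 1024)
print(fft_long_conv(u, k).shape)  # torch.Size([2, 64, 1024])
```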

## Why Hyena Matters

@@ -113,7 +113,7 @@ Hyena has been applied to N-Dimensional data with the Hyena N-D layer and can be
here is a noticeable enhancement in GPU memory efficiency with the increase in the number of image patches.

Hyena Hierarchy facilitates the development of larger, more efficient convolution models for long sequences.
The potential for Hyena type models for computer vision would be a more efficient GPU memory consumption of patches, that would allow :
The potential for Hyena type models for computer vision would be a more efficient GPU memory consumption of patches, that would allow:
- The processing of larger, higher-resolution images
- The use of smaller patches, allowing a fine-graine feature representation

11 changes: 6 additions & 5 deletions chapters/en/unit2/cnns/convnext.mdx
@@ -9,12 +9,13 @@ ConvNext represents a significant improvement to pure convolution models by inco
## Key Improvements
The author of the ConvNeXT paper starts building the model with a regular ResNet (ResNet-50), then modernizes and improves the architecture step-by-step to imitate the hierarchical structure of Vision Transformers.
The key improvements are:
- Training Techniques
- Macro Design
- Training techniques
- Macro design
- ResNeXt-ify
- Inverted Bottleneck
- Large Kernel Sizes
- Micro Design
- Inverted bottleneck
- Large kernel sizes
- Micro design

We will go through each of the key improvements.
These designs are not novel in itself. However, you can learn how researchers adapt and modify designs systematically to improve existing models.
To show the effectiveness of each improvement, we will compare the model's accuracy before and after the modification on ImageNet-1K.
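As a rough sketch of how several of these ideas (a large-kernel depthwise convolution, the inverted bottleneck, and micro-design changes such as LayerNorm and GELU) come together in a single block — simplified, and omitting details like layer scale and stochastic depth:

```python
import torch
import torch.nn as nn


class ConvNeXtStyleBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Large-kernel depthwise convolution (ResNeXt-ify + large kernel size)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        # Inverted bottleneck: expand by 4x, then project back
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)  # back to (N, C, H, W)
        return residual + x


block = ConvNeXtStyleBlock(96)
print(block(torch.randn(1, 96, 56, 56)).shape)  # torch.Size([1, 96, 56, 56])
```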
2 changes: 1 addition & 1 deletion chapters/en/unit3/vision-transformers/cvt.mdx
@@ -61,7 +61,7 @@ from einops import rearrange
from einops.layers.torch import Rearrange
```

2. Implementation of **Convolutional Projection**.
2. Implementation of **Convolutional Projection**

```python
def _build_projection(self, dim_in, dim_out, kernel_size, padding, stride, method):
8 changes: 4 additions & 4 deletions chapters/en/unit3/vision-transformers/detr.mdx
@@ -138,12 +138,12 @@ class DETR(nn.Module):
```
### Going line by line in the `forward` function:
**Backbone**
The input image is first put through a ResNet backbone and then a convolution layer, which reduces the dimension to the `hidden_dim`
The input image is first put through a ResNet backbone and then a convolution layer, which reduces the dimension to the `hidden_dim`.
```python
x = self.backbone(inputs)
h = self.conv(x)
```
they are declared in the `__init__` function
They are declared in the `__init__` function.
```python
self.backbone = nn.Sequential(*list(resnet50(pretrained=True).children())[:-2])
self.conv = nn.Conv2d(2048, hidden_dim, 1)
@@ -171,7 +171,7 @@ self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
```
**Resize**
Before going into the transformer, the features with size `(batch size, hidden_dim, H, W)` are reshaped to `(hidden_dim, batch size, H*W)`. This makes them a sequential input for the transformer
Before going into the transformer, the features with size `(batch size, hidden_dim, H, W)` are reshaped to `(hidden_dim, batch size, H*W)`. This makes them a sequential input for the transformer.
```python
h.flatten(2).permute(2, 0, 1)
```
@@ -185,7 +185,7 @@ In the end, the outputs, which is a tensor of size `(query_pos_dim, batch size,
```python
return self.linear_class(h), self.linear_bbox(h).sigmoid()
```
The first of which predicts the class. An additional class is added for the `No Object` class
The first of which predicts the class. An additional class is added for the `No Object` class.
```python
self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
```
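A quick way to sanity-check the module is to push a dummy image through it. The constructor arguments below mirror the original minimal DETR demo and are assumptions about this chapter's version; only the `(num queries, batch size, ...)` output layout described above is taken from the text.

```python
import torch

# Hypothetical constructor arguments, following the original DETR demo
model = DETR(num_classes=91, hidden_dim=256, nheads=8,
             num_encoder_layers=6, num_decoder_layers=6)
model.eval()

with torch.no_grad():
    logits, boxes = model(torch.randn(1, 3, 800, 1200))

print(logits.shape)  # (num_queries, 1, num_classes + 1)
print(boxes.shape)   # (num_queries, 1, 4)
```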
2 changes: 1 addition & 1 deletion chapters/en/unit3/vision-transformers/mobilevit.mdx
@@ -23,7 +23,7 @@ A diagram of the MobileViT Block is shown below:
Okay, that's a lot to take in. Let's break that down.

- The block takes in an image with multiple channels. Let's say for an RGB image 3 channels, so the block takes in a three channeled image.
- It then performs a N by N convolution on the channels appending them to the existing channels
- It then performs a N by N convolution on the channels appending them to the existing channels.
- The block then creates a linear combination of these channels and adds them to the existing stack of channels.
- For each channel these images are unfolded into flattened patches.
- Then these flattened patches are passed through a transformer to project them into new patches.
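The unfold-into-patches and fold-back step can be sketched with `einops`; the channel count, spatial size, and 2×2 patch size below are arbitrary illustrative values, not the chapter's configuration.

```python
import torch
from einops import rearrange

x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width)
ph, pw = 2, 2                   # patch size

# Unfold: every channel becomes a set of flattened patches
patches = rearrange(x, "b d (h ph) (w pw) -> b (ph pw) (h w) d", ph=ph, pw=pw)

# ... a transformer would process `patches` here ...

# Fold the (possibly transformed) patches back to the original layout
x_folded = rearrange(
    patches, "b (ph pw) (h w) d -> b d (h ph) (w pw)",
    h=32 // ph, w=32 // pw, ph=ph, pw=pw
)
assert torch.equal(x, x_folded)
```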