Merge pull request #251 from sergiopaniego/main
Reviewed general punctuation and fixed some broken links
johko authored Jul 17, 2024
2 parents e65cc91 + 1c69233 commit f532359
Showing 35 changed files with 139 additions and 132 deletions.
11 changes: 6 additions & 5 deletions chapters/en/unit0/welcome/welcome.mdx
@@ -12,7 +12,7 @@ On this page, you can find how to join the learners community, make a submission

To obtain your certification for completing the course, complete the following assignments:

1. Training/fine-tuning a Model
1. Training/fine-tuning a model
2. Building an application and hosting it on Hugging Face Spaces

### Training/fine-tuning a Model
@@ -21,7 +21,8 @@ There are notebooks under the Notebooks/Vision Transformers section. As of now,

The model repository needs to have the following:

1. A properly filled model card, you can check out [here for more information](https://huggingface.co/docs/hub/en/model-cards)

1. A properly filled model card, you can check out [here for more information](https://huggingface.co/docs/hub/en/model-cards).
2. If you trained a model with transformers and pushed it to Hub, the model card will be generated. In that case, edit the card and fill in more details.
3. Add the dataset’s ID to the model card to link the model repository to the dataset repository.

@@ -34,7 +35,7 @@ In this assignment section, you'll be building a Gradio-based application for yo

## Certification 🥇

Once you've finished the assignments — Training/fine-tuning a Model and Creating a Space — please complete the [form](https://forms.gle/isiVSw59oiiHP6pN9) with your name, email, and links to your model and Space repositories to receive your certificate
Once you've finished the assignments — Training/fine-tuning a Model and Creating a Space — please complete the [form](https://forms.gle/isiVSw59oiiHP6pN9) with your name, email, and links to your model and Space repositories to receive your certificate.

## Join the community!

@@ -50,8 +51,8 @@ There are many channels focused on various topics on our Discord server. You wil

As a computer vision course learner, you may find the following set of channels particularly relevant:

- `#computer-vision`: a catch-all channel for everything related to computer vision.
- `#cv-study-group`: a place to exchange ideas, ask questions about specific posts and start discussions.
- `#computer-vision`: a catch-all channel for everything related to computer vision
- `#cv-study-group`: a place to exchange ideas, ask questions about specific posts and start discussions
- `#3d`: a channel to discuss aspects of computer vision specific to 3D computer vision

If you are interested in generative AI, we also invite you to join all channels related to the Diffusion Models: #core-announcements, #discussions, #dev-discussions, and #diff-i-made-this.
8 changes: 4 additions & 4 deletions chapters/en/unit1/feature-extraction/feature-matching.mdx
@@ -8,7 +8,7 @@ Imagine you have a giant box of puzzle pieces, and you're trying to find a speci

Now that we have an intuitive idea of how brute-force matches are found, let's dive into the algorithms. We are going to use the descriptors that we learned about in the previous chapter to find the matching features in two images.

First install and load libraries
First install and load libraries.

```bash
!pip install opencv-python
@@ -137,13 +137,13 @@ We also create a dictionary to specify the maximum leafs to visit as follows.
search_params = dict(checks=50)
```
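A `search_params` dictionary like this is typically paired with an `index_params` dictionary when the FLANN matcher is constructed. Here is a minimal, self-contained sketch of that pairing; the KD-tree settings, the toy descriptors, and the 0.7 ratio-test threshold are illustrative assumptions, with the real SIFT descriptors computed as shown below.

```python
import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)  # maximum number of leaves to visit per query

flann = cv2.FlannBasedMatcher(index_params, search_params)

# Toy float32 descriptors stand in for the SIFT descriptors computed below
des1 = np.random.rand(500, 128).astype(np.float32)
des2 = np.random.rand(500, 128).astype(np.float32)
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test keeps a match only when it clearly beats the runner-up
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(len(good))
```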

Initiate SIFT detector
Initiate SIFT detector.

```python
sift = cv2.SIFT_create()
```

Find the keypoints and descriptors with SIFT
Find the keypoints and descriptors with SIFT.

```python
kp1, des1 = sift.detectAndCompute(img1, None)
@@ -259,7 +259,7 @@ Fm, inliers = cv2.findFundamentalMat(mkpts0, mkpts1, cv2.USAC_MAGSAC, 0.5, 0.999
inliers = inliers > 0
```

Finally, we can visualize the matches
Finally, we can visualize the matches.

```python
draw_LAF_matches(
@@ -9,7 +9,7 @@ In digital image processing, operations on images are diverse and can be categor
- Statistical
- Geometrical
- Mathematical
- Transform operations.
- Transform operations

Each category encompasses different techniques, such as morphological operations under logical operations or fourier transforms and principal component analysis (PCA) under transforms. In this context, we refer to morphology as the group of operations that use structuring elements to generate images of the same size by looking into the values of the pixel neighborhood. Understanding the distinction between element-wise and matrix operations is important in image manipulation. Element-wise operations, such as raising an image to a power or dividing it by another image, involve processing each pixel individually. This pixel-based approach contrasts with matrix operations, which utilize matrix theory for image manipulation. Having said that, you can do whatever you want with images, as they are matrices containing numbers!
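As a small NumPy illustration of the element-wise versus matrix distinction described above (the array values are arbitrary):

```python
import numpy as np

img = np.arange(9, dtype=np.float32).reshape(3, 3)

squared = img ** 2       # element-wise: each pixel is processed on its own
ratio = img / (img + 1)  # element-wise division of one image by another
product = img @ img      # matrix multiplication: each output pixel mixes a whole row and column
```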

2 changes: 1 addition & 1 deletion chapters/en/unit1/image_and_imaging/imaging.mdx
@@ -16,7 +16,7 @@ The core of digital image formation is the function \\(f(x,y)\\), which is deter
</div>

In transmission-based imaging, such as X-rays, transmissivity takes the place of reflectivity. The digital representation of an image is essentially a matrix or array of numerical values, each corresponding to a pixel. The process of transforming continuous image data into a digital format is twofold:
- Sampling, which digitizes the coordinate values
- Sampling, which digitizes the coordinate values.
- Quantization, which converts amplitude values into discrete quantities.
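A toy NumPy sketch of these two steps (the sampling stride and the number of gray levels are arbitrary choices, purely for illustration):

```python
import numpy as np

# A continuous scene approximated by a dense array of intensities in [0, 1]
scene = np.random.rand(1024, 1024)

# Sampling: keep values only on a coarser coordinate grid
sampled = scene[::8, ::8]  # 128 x 128 samples

# Quantization: map continuous amplitudes to a small set of discrete levels
levels = 16
quantized = np.round(sampled * (levels - 1)).astype(np.uint8)
```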

The resolution and quality of a digital image significantly depend on the following:
10 changes: 7 additions & 3 deletions chapters/en/unit10/blenderProc.mdx
@@ -98,16 +98,20 @@ It is specifically created to help in the generation of realistic looking images
You can install BlenderProc via pip:

```bash
pip install blenderProc
pip install blenderProc
```

Alternately, you can clone the official [BlenderProc repository](https://github.com/DLR-RM/BlenderProc) from GitHub using Git:

`git clone https://github.com/DLR-RM/BlenderProc`
```bash
git clone https://github.com/DLR-RM/BlenderProc
```

BlenderProc must be run inside the blender python environment (bpy), as this is the only way to access the Blender API.

`blenderproc run <your_python_script>`
```bash
blenderproc run <your_python_script>
```
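As an illustration of what `<your_python_script>` might contain, here is a sketch modeled on BlenderProc's basic quickstart example; the object, light, and camera values are assumptions, so check the repository's examples for the exact current API.

```python
import blenderproc as bproc
import numpy as np

bproc.init()

# A simple object to render
obj = bproc.object.create_primitive("MONKEY")

# A point light so the object is visible
light = bproc.types.Light()
light.set_location([2, -2, 0])
light.set_energy(300)

# A single camera pose in front of the object
cam_pose = bproc.math.build_transformation_mat([0, -5, 0], [np.pi / 2, 0, 0])
bproc.camera.add_camera_pose(cam_pose)

# Render the scene and write the result to disk
data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)
```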

You can check out this notebook to try BlenderProc in Google Colab, demos the basic examples provided [here](https://github.com/DLR-RM/BlenderProc/tree/main/examples/basics).
Here are some images rendered with the basic example:
2 changes: 1 addition & 1 deletion chapters/en/unit10/datagen-diffusion-models.mdx
@@ -59,7 +59,7 @@ This means we have many tools under our belt to generate synthetic data!

## Approaches to Synthetic Data Generation

There are generally three cases for needing synthetic data,
There are generally three cases for needing synthetic data:

**Extending an existing dataset:**

20 changes: 10 additions & 10 deletions chapters/en/unit10/point_clouds.mdx
@@ -22,22 +22,22 @@ The 3D Point Data is mainly used in self-driving capabilities, but now other AI

## Generation and Data Representation

We will be using the python library [point-cloud-utils](https://github.com/fwilliams/point-cloud-utils), and [open-3d](https://github.com/isl-org/Open3D), which can be installed by
We will be using the python library [point-cloud-utils](https://github.com/fwilliams/point-cloud-utils), and [open-3d](https://github.com/isl-org/Open3D), which can be installed by:

```bash
pip install point-cloud-utils
pip install point-cloud-utils
```

We will be also using the python library open-3d, which can be installed by
We will be also using the python library open-3d, which can be installed by:

```bash
pip install open3d
pip install open3d
```

OR a Smaller CPU only version
OR a Smaller CPU only version:

```bash
pip install open3d-cpu
pip install open3d-cpu
```
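A small Open3D sketch to sanity-check the installation; the points are random values, purely for illustration:

```python
import numpy as np
import open3d as o3d

# Build a point cloud from 1,000 random 3D points
points = np.random.rand(1000, 3)
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

# Round-trip it through a PLY file, one of the formats discussed below
o3d.io.write_point_cloud("random.ply", pcd)
loaded = o3d.io.read_point_cloud("random.ply")
print(loaded)  # PointCloud with 1000 points
```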

Now, first we need to understand the formats in which these point clouds are stored in, and for that, we need to look at mesh cloud.
@@ -53,13 +53,13 @@ The type of file is inferred from its file extension. Some of the extensions sup

- A simple PLY object consists of a collection of elements for representation of the object. It consists of a list of (x,y,z) triplets of a vertex and a list of faces that are actually indices into the list of vertices.
- Vertices and faces are two examples of elements and the majority of the PLY file consists of these two elements.
- New properties can also be created and attached to the elements of an object, but these should be added in such a way that old programs do not break when these new properties are encountered
- New properties can also be created and attached to the elements of an object, but these should be added in such a way that old programs do not break when these new properties are encountered.

** STL (Standard Tessellation Language) **

- This format approximates the surfaces of a solid model with triangles.
- These triangles are also known as facets, where each facet is described by a perpendicular direction and three points representing the vertices of the triangle.
- However, these files have no description of Color and Texture
- However, these files have no description of Color and Texture.

** OFF (Object File Format) **

@@ -77,11 +77,11 @@ The type of file is inferred from its file extension. Some of the extensions sup

- X3D is an XML based 3D graphics file format for presentation of 3D information. It is a modular standard and is defined through several ISO specifications.
- The format supports vector and raster graphics, transparency, lighting effects, and animation settings including rotations, fades, and swings.
- X3D has the advantage of encoding color information (unlike STL) that is used during printing the model on a color 3D printer
- X3D has the advantage of encoding color information (unlike STL) that is used during printing the model on a color 3D printer.

** DAE (Digital Asset Exchange) **

- This is an XML schema which is an open standard XML schema, from which DAE files are built.
- This file format is based on the COLLADA (COLLAborative Design Activity) XML schema which is an open standard XML schema for the exchange of digital assets among graphics software applications
- This file format is based on the COLLADA (COLLAborative Design Activity) XML schema which is an open standard XML schema for the exchange of digital assets among graphics software applications.
- The format's biggest selling point is its compatibility across multiple platforms.
- COLLADA files aren't restricted to one program or manufacturer. Instead, they offer a standard way to store 3D assets.
4 changes: 2 additions & 2 deletions chapters/en/unit10/synthetic-lung-images.mdx
@@ -15,7 +15,7 @@ The generator has the following model architecture:
- Conv2D layer
- Batch Normalization layer
- ReLU activation
- Conv2D layer with Tanh activation
- Conv2D layer with Tanh activation.

The discriminator has the following model architecture:

@@ -27,7 +27,7 @@ The discriminator has the following model architecture:
- Conv2D layer
- Batch Normalization layer
- Leaky ReLU activation
- Conv2D layer with Sigmoid
- Conv2D layer with Sigmoid.

**Data Collection**

2 changes: 1 addition & 1 deletion chapters/en/unit10/synthetic_datasets.mdx
@@ -40,7 +40,7 @@ Semantic segmentation is vital for autonomous vehicles to interpret and navigate
| Name | Year | Description | Paper | | Additional Links |
|---------------------|--------------|-------------|----------------|---------------------|---------------------|
| Virtual KITTI 2 | 2020 | Virtual Worlds as Proxy for Multi-Object Tracking Analysis | [Virtual KITTI 2](https://arxiv.org/pdf/2001.10773.pdf) | | [Website](https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds/) |
| ApolloScape | 2019 | Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much large and richer labeling including holistic semantic dense point cloud for each site, stereo, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instance, high accurate location for every frame in various driving videos from multiple sites, cities, and daytimes | [The ApolloScape Open Dataset for Autonomous Driving and its Application](https://arxiv.org/abs/1803.06184) | | [Website](https://apolloscape.auto/) |
| ApolloScape | 2019 | Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much large and richer labeling including holistic semantic dense point cloud for each site, stereo, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instance, high accurate location for every frame in various driving videos from multiple sites, cities, and daytimes. | [The ApolloScape Open Dataset for Autonomous Driving and its Application](https://arxiv.org/abs/1803.06184) | | [Website](https://apolloscape.auto/) |
| Driving in the Matrix | 2017 | The core idea behind "Driving in the Matrix" is to use photo-realistic computer-generated images from a simulation engine to produce annotated data quickly. | [Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?](https://arxiv.org/pdf/1610.01983.pdf) | | [GitHub](https://github.com/umautobots/driving-in-the-matrix) ![GitHub stars](https://img.shields.io/github/stars/umautobots/driving-in-the-matrix.svg?style=social&label=Star) |
| CARLA | 2017 | **CARLA** (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. Technically, it operates similarly to, as an open source layer over Unreal Engine 4 that provides sensors in the form of RGB cameras (with customizable positions), ground truth depth maps, ground truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation). | [CARLA: An Open Urban Driving Simulator](https://arxiv.org/pdf/1711.03938v1.pdf) | | [Website](https://carla.org/) |
| Synthia | 2016 | A large collection of synthetic images for semantic segmentation of urban scenes. SYNTHIA consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lane-marking. | [The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes](https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Ros_The_SYNTHIA_Dataset_CVPR_2016_paper.html) | | [Website](https://synthia-dataset.net/) |
2 changes: 1 addition & 1 deletion chapters/en/unit12/conclusion.mdx
@@ -67,7 +67,7 @@ This is work that highlights and explores techniques for making machine learning
### 🧑‍🤝‍🧑 Inclusive

These are projects which broaden the scope of who builds and benefits in the machine learning world. Some examples:
- Curating diverse datasets that increase the representation of underserved groups
- Curating diverse datasets that increase the representation of underserved groups.
- Training language models on languages that aren't yet available on the Hugging Face Hub.
- Creating no-code and low-code frameworks that allow non-technical folk to engage with AI.

6 changes: 3 additions & 3 deletions chapters/en/unit13/hyena.mdx
@@ -91,8 +91,8 @@ Some work has been conducted to speed up this computation like FastFFTConv based

![nd_hyena.png](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/outlook_hyena_images/nd_hyena.png)
In essence, Hyena can be performed in two steps:
1. Compute a set of N+1 linear projections similarly of attention (it can be more than 3 projections)
2. Mixing up the projections: The matrix \\(H(u)\\) is defined by a combination of matrix multiplications
1. Compute a set of N+1 linear projections similarly of attention (it can be more than 3 projections).
2. Mixing up the projections: The matrix \\(H(u)\\) is defined by a combination of matrix multiplications.
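The long convolutions used in step 2 are what make the FFT trick pay off. Here is a minimal PyTorch sketch of an FFT-based causal long convolution between a projection `u` and an implicit filter `k`; the shapes and the zero-padding choice are assumptions for illustration, not the chapter's exact implementation.

```python
import torch


def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution in O(L log L) via the FFT.

    u: (batch, channels, L) input projection
    k: (channels, L) implicitly parametrized filter
    """
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular convolution becomes a linear one
    u_f = torch.fft.rfft(u, n=n)
    k_f = torch.fft.rfft(k, n=n)
    return torch.fft.irfft(u_f * k_f, n=n)[..., :L]


u = torch.randn(2, 64, 1024)
k = torch.randn(64, 1024)
print(fft_long_conv(u, k).shape)  # torch.Size([2, 64, 1024])
```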

## Why Hyena Matters

@@ -113,7 +113,7 @@ Hyena has been applied to N-Dimensional data with the Hyena N-D layer and can be
here is a noticeable enhancement in GPU memory efficiency with the increase in the number of image patches.

Hyena Hierarchy facilitates the development of larger, more efficient convolution models for long sequences.
The potential for Hyena type models for computer vision would be a more efficient GPU memory consumption of patches, that would allow :
The potential for Hyena type models for computer vision would be a more efficient GPU memory consumption of patches, that would allow:
- The processing of larger, higher-resolution images
- The use of smaller patches, allowing a fine-graine feature representation

11 changes: 6 additions & 5 deletions chapters/en/unit2/cnns/convnext.mdx
@@ -9,12 +9,13 @@ ConvNext represents a significant improvement to pure convolution models by inco
## Key Improvements
The author of the ConvNeXT paper starts building the model with a regular ResNet (ResNet-50), then modernizes and improves the architecture step-by-step to imitate the hierarchical structure of Vision Transformers.
The key improvements are:
- Training Techniques
- Macro Design
- Training techniques
- Macro design
- ResNeXt-ify
- Inverted Bottleneck
- Large Kernel Sizes
- Micro Design
- Inverted bottleneck
- Large kernel sizes
- Micro design

We will go through each of the key improvements.
These designs are not novel in itself. However, you can learn how researchers adapt and modify designs systematically to improve existing models.
To show the effectiveness of each improvement, we will compare the model's accuracy before and after the modification on ImageNet-1K.
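As a rough sketch of how several of these ideas (a large-kernel depthwise convolution, the inverted bottleneck, and micro-design changes such as LayerNorm and GELU) come together in a single block — simplified, and omitting details like layer scale and stochastic depth:

```python
import torch
import torch.nn as nn


class ConvNeXtStyleBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Large-kernel depthwise convolution (ResNeXt-ify + large kernel size)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        # Inverted bottleneck: expand by 4x, then project back
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)  # back to (N, C, H, W)
        return residual + x


block = ConvNeXtStyleBlock(96)
print(block(torch.randn(1, 96, 56, 56)).shape)  # torch.Size([1, 96, 56, 56])
```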
2 changes: 1 addition & 1 deletion chapters/en/unit3/vision-transformers/cvt.mdx
@@ -61,7 +61,7 @@ from einops import rearrange
from einops.layers.torch import Rearrange
```

2. Implementation of **Convolutional Projection**.
2. Implementation of **Convolutional Projection**

```python
def _build_projection(self, dim_in, dim_out, kernel_size, padding, stride, method):
8 changes: 4 additions & 4 deletions chapters/en/unit3/vision-transformers/detr.mdx
@@ -138,12 +138,12 @@ class DETR(nn.Module):
```
### Going line by line in the `forward` function:
**Backbone**
The input image is first put through a ResNet backbone and then a convolution layer, which reduces the dimension to the `hidden_dim`
The input image is first put through a ResNet backbone and then a convolution layer, which reduces the dimension to the `hidden_dim`.
```python
x = self.backbone(inputs)
h = self.conv(x)
```
they are declared in the `__init__` function
They are declared in the `__init__` function.
```python
self.backbone = nn.Sequential(*list(resnet50(pretrained=True).children())[:-2])
self.conv = nn.Conv2d(2048, hidden_dim, 1)
@@ -171,7 +171,7 @@ self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
```
**Resize**
Before going into the transformer, the features with size `(batch size, hidden_dim, H, W)` are reshaped to `(hidden_dim, batch size, H*W)`. This makes them a sequential input for the transformer
Before going into the transformer, the features with size `(batch size, hidden_dim, H, W)` are reshaped to `(hidden_dim, batch size, H*W)`. This makes them a sequential input for the transformer.
```python
h.flatten(2).permute(2, 0, 1)
```
@@ -185,7 +185,7 @@ In the end, the outputs, which is a tensor of size `(query_pos_dim, batch size,
```python
return self.linear_class(h), self.linear_bbox(h).sigmoid()
```
The first of which predicts the class. An additional class is added for the `No Object` class
The first of which predicts the class. An additional class is added for the `No Object` class.
```python
self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
```
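A quick way to sanity-check the module is to push a dummy image through it. The constructor arguments below mirror the original minimal DETR demo and are assumptions about this chapter's version; only the `(num queries, batch size, ...)` output layout described above is taken from the text.

```python
import torch

# Hypothetical constructor arguments, following the original DETR demo
model = DETR(num_classes=91, hidden_dim=256, nheads=8,
             num_encoder_layers=6, num_decoder_layers=6)
model.eval()

with torch.no_grad():
    logits, boxes = model(torch.randn(1, 3, 800, 1200))

print(logits.shape)  # (num_queries, 1, num_classes + 1)
print(boxes.shape)   # (num_queries, 1, 4)
```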
2 changes: 1 addition & 1 deletion chapters/en/unit3/vision-transformers/mobilevit.mdx
@@ -23,7 +23,7 @@ A diagram of the MobileViT Block is shown below:
Okay, that's a lot to take in. Let's break that down.

- The block takes in an image with multiple channels. Let's say for an RGB image 3 channels, so the block takes in a three channeled image.
- It then performs a N by N convolution on the channels appending them to the existing channels
- It then performs a N by N convolution on the channels appending them to the existing channels.
- The block then creates a linear combination of these channels and adds them to the existing stack of channels.
- For each channel these images are unfolded into flattened patches.
- Then these flattened patches are passed through a transformer to project them into new patches.
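The unfold-into-patches and fold-back step can be sketched with `einops`; the channel count, spatial size, and 2×2 patch size below are arbitrary illustrative values, not the chapter's configuration.

```python
import torch
from einops import rearrange

x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width)
ph, pw = 2, 2                   # patch size

# Unfold: every channel becomes a set of flattened patches
patches = rearrange(x, "b d (h ph) (w pw) -> b (ph pw) (h w) d", ph=ph, pw=pw)

# ... a transformer would process `patches` here ...

# Fold the (possibly transformed) patches back to the original layout
x_folded = rearrange(
    patches, "b (ph pw) (h w) d -> b d (h ph) (w pw)",
    h=32 // ph, w=32 // pw, ph=ph, pw=pw
)
assert torch.equal(x, x_folded)
```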