From 469cc2552da216bc178a791db8f31b8203c4a423 Mon Sep 17 00:00:00 2001 From: Vijay Janapa Reddi Date: Mon, 21 Oct 2024 09:02:41 -0400 Subject: [PATCH] replace "**: " with ": " or "** : " (space after last star) Feedback from @Bravo --- .../arduino/nicla_vision/nicla_vision.qmd | 6 +- contents/labs/labs.qmd | 10 +- .../image_classification.qmd | 36 ++--- contents/labs/raspi/llm/llm.qmd | 126 +++++++++--------- contents/labs/raspi/raspi.qmd | 8 +- contents/labs/raspi/setup/setup.qmd | 52 ++++---- contents/labs/seeed/xiao_esp32s3/kws/kws.qmd | 6 +- .../labs/seeed/xiao_esp32s3/setup/setup.qmd | 12 +- .../labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd | 8 +- contents/optimizations/optimizations.qmd | 8 +- 10 files changed, 136 insertions(+), 136 deletions(-) diff --git a/contents/labs/arduino/nicla_vision/nicla_vision.qmd b/contents/labs/arduino/nicla_vision/nicla_vision.qmd index c19e7b11..50990d7c 100644 --- a/contents/labs/arduino/nicla_vision/nicla_vision.qmd +++ b/contents/labs/arduino/nicla_vision/nicla_vision.qmd @@ -6,9 +6,9 @@ These labs provide a unique opportunity to gain practical experience with machin ## Pre-requisites -- **Nicla Vision Board**: Ensure you have the Nicla Vision board. -- **USB Cable**: For connecting the board to your computer. -- **Network**: With internet access for downloading necessary software. +- **Nicla Vision Board** : Ensure you have the Nicla Vision board. +- **USB Cable** : For connecting the board to your computer. +- **Network** : With internet access for downloading necessary software. ## Setup diff --git a/contents/labs/labs.qmd b/contents/labs/labs.qmd index 10fc8474..a61d02d5 100644 --- a/contents/labs/labs.qmd +++ b/contents/labs/labs.qmd @@ -54,15 +54,15 @@ These labs are designed for: Each lab follows a structured approach: -1. **Introduction**: Explore the application and its significance in real-world scenarios. +1. **Introduction** : Explore the application and its significance in real-world scenarios. -2. **Setup**: Step-by-step instructions to configure the hardware and software environment. +2. **Setup** : Step-by-step instructions to configure the hardware and software environment. -3. **Deployment**: Guidance on training and deploying the pre-trained ML models on supported devices. +3. **Deployment** : Guidance on training and deploying the pre-trained ML models on supported devices. -4. **Exercises**: Hands-on tasks to modify and experiment with model parameters. +4. **Exercises** : Hands-on tasks to modify and experiment with model parameters. -5. **Discussion**: Analysis of results, potential improvements, and practical insights. +5. **Discussion** : Analysis of results, potential improvements, and practical insights. ## Troubleshooting and Support diff --git a/contents/labs/raspi/image_classification/image_classification.qmd b/contents/labs/raspi/image_classification/image_classification.qmd index 81fc56fa..ee70ca72 100644 --- a/contents/labs/raspi/image_classification/image_classification.qmd +++ b/contents/labs/raspi/image_classification/image_classification.qmd @@ -777,19 +777,19 @@ This Python script creates a web-based interface for capturing and organizing im #### Key Features: -1. **Web Interface**: Accessible from any device on the same network as the Raspberry Pi. -2. **Live Camera Preview**: This shows a real-time feed from the camera. -3. **Labeling System**: Allows users to input labels for different categories of images. -4. **Organized Storage**: Automatically saves images in label-specific subdirectories. -5. 
**Per-Label Counters**: Keeps track of how many images are captured for each label. -6. **Summary Statistics**: Provides a summary of captured images when stopping the capture process. +1. **Web Interface** : Accessible from any device on the same network as the Raspberry Pi. +2. **Live Camera Preview** : This shows a real-time feed from the camera. +3. **Labeling System** : Allows users to input labels for different categories of images. +4. **Organized Storage** : Automatically saves images in label-specific subdirectories. +5. **Per-Label Counters** : Keeps track of how many images are captured for each label. +6. **Summary Statistics** : Provides a summary of captured images when stopping the capture process. #### Main Components: -1. **Flask Web Application**: Handles routing and serves the web interface. -2. **Picamera2 Integration**: Controls the Raspberry Pi camera. -3. **Threaded Frame Capture**: Ensures smooth live preview. -4. **File Management**: Organizes captured images into labeled directories. +1. **Flask Web Application** : Handles routing and serves the web interface. +2. **Picamera2 Integration** : Controls the Raspberry Pi camera. +3. **Threaded Frame Capture** : Ensures smooth live preview. +4. **File Management** : Organizes captured images into labeled directories. #### Key Functions: @@ -1435,10 +1435,10 @@ The code creates a web application for real-time image classification using a Ra #### Key Components: -1. **Flask Web Application**: Serves the user interface and handles requests. -2. **PiCamera2**: Captures images from the Raspberry Pi camera module. -3. **TensorFlow Lite**: Runs the image classification model. -4. **Threading**: Manages concurrent operations for smooth performance. +1. **Flask Web Application** : Serves the user interface and handles requests. +2. **PiCamera2** : Captures images from the Raspberry Pi camera module. +3. **TensorFlow Lite** : Runs the image classification model. +4. **Threading** : Manages concurrent operations for smooth performance. #### Main Features: @@ -1491,10 +1491,10 @@ The code creates a web application for real-time image classification using a Ra #### Key Concepts: -1. **Concurrent Operations**: Using threads to handle camera capture and classification separately from the web server. -2. **Real-time Updates**: Frequent updates to the classification results without page reloads. -3. **Model Reuse**: Loading the TFLite model once and reusing it for efficiency. -4. **Flexible Configuration**: Allowing users to adjust the confidence threshold on the fly. +1. **Concurrent Operations** : Using threads to handle camera capture and classification separately from the web server. +2. **Real-time Updates** : Frequent updates to the classification results without page reloads. +3. **Model Reuse** : Loading the TFLite model once and reusing it for efficiency. +4. **Flexible Configuration** : Allowing users to adjust the confidence threshold on the fly. #### Usage: diff --git a/contents/labs/raspi/llm/llm.qmd b/contents/labs/raspi/llm/llm.qmd index 90098990..d4716090 100644 --- a/contents/labs/raspi/llm/llm.qmd +++ b/contents/labs/raspi/llm/llm.qmd @@ -46,13 +46,13 @@ GenAI provides the conceptual framework for AI-driven content creation, with LLM Large Language Models (LLMs) are advanced artificial intelligence systems that understand, process, and generate human-like text. These models are characterized by their massive scale in terms of the amount of data they are trained on and the number of parameters they contain. 
Critical aspects of LLMs include: -1. **Size**: LLMs typically contain billions of parameters. For example, GPT-3 has 175 billion parameters, while some newer models exceed a trillion parameters. +1. **Size** : LLMs typically contain billions of parameters. For example, GPT-3 has 175 billion parameters, while some newer models exceed a trillion parameters. -2. **Training Data**: They are trained on vast amounts of text data, often including books, websites, and other diverse sources, amounting to hundreds of gigabytes or even terabytes of text. +2. **Training Data** : They are trained on vast amounts of text data, often including books, websites, and other diverse sources, amounting to hundreds of gigabytes or even terabytes of text. -3. **Architecture**: Most LLMs use [transformer-based architectures](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)), which allow them to process and generate text by paying attention to different parts of the input simultaneously. +3. **Architecture** : Most LLMs use [transformer-based architectures](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)), which allow them to process and generate text by paying attention to different parts of the input simultaneously. -4. **Capabilities**: LLMs can perform a wide range of language tasks without specific fine-tuning, including: +4. **Capabilities** : LLMs can perform a wide range of language tasks without specific fine-tuning, including: - Text generation - Translation - Summarization @@ -60,17 +60,17 @@ Large Language Models (LLMs) are advanced artificial intelligence systems that u - Code generation - Logical reasoning -5. **Few-shot Learning**: They can often understand and perform new tasks with minimal examples or instructions. +5. **Few-shot Learning** : They can often understand and perform new tasks with minimal examples or instructions. -6. **Resource-Intensive**: Due to their size, LLMs typically require significant computational resources to run, often needing powerful GPUs or TPUs. +6. **Resource-Intensive** : Due to their size, LLMs typically require significant computational resources to run, often needing powerful GPUs or TPUs. -7. **Continual Development**: The field of LLMs is rapidly evolving, with new models and techniques constantly emerging. +7. **Continual Development** : The field of LLMs is rapidly evolving, with new models and techniques constantly emerging. -8. **Ethical Considerations**: The use of LLMs raises important questions about bias, misinformation, and the environmental impact of training such large models. +8. **Ethical Considerations** : The use of LLMs raises important questions about bias, misinformation, and the environmental impact of training such large models. -9. **Applications**: LLMs are used in various fields, including content creation, customer service, research assistance, and software development. +9. **Applications** : LLMs are used in various fields, including content creation, customer service, research assistance, and software development. -10. **Limitations**: Despite their power, LLMs can produce incorrect or biased information and lack true understanding or reasoning capabilities. +10. **Limitations** : Despite their power, LLMs can produce incorrect or biased information and lack true understanding or reasoning capabilities. We must note that we use large models beyond text, calling them *multi-modal models*. These models integrate and process information from multiple types of input simultaneously. 
They are designed to understand and generate content across various forms of data, such as text, images, audio, and video. @@ -94,17 +94,17 @@ SLMs are compact versions of LLMs designed to run efficiently on resource-constr Key characteristics of SLMs include: -1. **Reduced parameter count**: Typically ranging from a few hundred million to a few billion parameters, compared to two-digit billions in larger models. +1. **Reduced parameter count** : Typically ranging from a few hundred million to a few billion parameters, compared to two-digit billions in larger models. -2. **Lower memory footprint**: Requiring, at most, a few gigabytes of memory rather than tens or hundreds of gigabytes. +2. **Lower memory footprint** : Requiring, at most, a few gigabytes of memory rather than tens or hundreds of gigabytes. -3. **Faster inference time**: Can generate responses in milliseconds to seconds on edge devices. +3. **Faster inference time** : Can generate responses in milliseconds to seconds on edge devices. -4. **Energy efficiency**: Consuming less power, making them suitable for battery-powered devices. +4. **Energy efficiency** : Consuming less power, making them suitable for battery-powered devices. -5. **Privacy-preserving**: Enabling on-device processing without sending data to cloud servers. +5. **Privacy-preserving** : Enabling on-device processing without sending data to cloud servers. -6. **Offline functionality**: Operating without an internet connection. +6. **Offline functionality** : Operating without an internet connection. SLMs achieve their compact size through various techniques such as knowledge distillation, model pruning, and quantization. While they may not match the broad capabilities of larger models, SLMs excel in specific tasks and domains, making them ideal for targeted applications on edge devices. @@ -120,25 +120,25 @@ For more information on SLMs, the paper, [LLM Pruning and Distillation in Practi [Ollama](https://ollama.com/) is an open-source framework that allows us to run language models (LMs), large or small, locally on our machines. Here are some critical points about Ollama: -1. **Local Model Execution**: Ollama enables running LMs on personal computers or edge devices such as the Raspi-5, eliminating the need for cloud-based API calls. +1. **Local Model Execution** : Ollama enables running LMs on personal computers or edge devices such as the Raspi-5, eliminating the need for cloud-based API calls. -2. **Ease of Use**: It provides a simple command-line interface for downloading, running, and managing different language models. +2. **Ease of Use** : It provides a simple command-line interface for downloading, running, and managing different language models. -3. **Model Variety**: Ollama supports various LLMs, including Phi, Gemma, Llama, Mistral, and other open-source models. +3. **Model Variety** : Ollama supports various LLMs, including Phi, Gemma, Llama, Mistral, and other open-source models. -4. **Customization**: Users can create and share custom models tailored to specific needs or domains. +4. **Customization** : Users can create and share custom models tailored to specific needs or domains. -5. **Lightweight**: Designed to be efficient and run on consumer-grade hardware. +5. **Lightweight** : Designed to be efficient and run on consumer-grade hardware. -6. **API Integration**: Offers an API that allows integration with other applications and services. +6. **API Integration** : Offers an API that allows integration with other applications and services. -7. 
**Privacy-Focused**: By running models locally, it addresses privacy concerns associated with sending data to external servers. +7. **Privacy-Focused** : By running models locally, it addresses privacy concerns associated with sending data to external servers. -8. **Cross-Platform**: Available for macOS, Windows, and Linux systems (our case, here). +8. **Cross-Platform** : Available for macOS, Windows, and Linux systems (our case, here). -9. **Active Development**: Regularly updated with new features and model support. +9. **Active Development** : Regularly updated with new features and model support. -10. **Community-Driven**: Benefits from community contributions and model sharing. +10. **Community-Driven** : Benefits from community contributions and model sharing. To learn more about what Ollama is and how it works under the hood, you should see this short video from [Matt Williams](https://www.youtube.com/@technovangelist), one of the founders of Ollama: @@ -205,14 +205,14 @@ Using the option `--verbose` when calling the model will generate several statis Each metric gives insights into how the model processes inputs and generates outputs. Here’s a breakdown of what each metric means: -- **Total Duration (2.620170326s)**: This is the complete time taken from the start of the command to the completion of the response. It encompasses loading the model, processing the input prompt, and generating the response. -- **Load Duration (39.947908ms)**: This duration indicates the time to load the model or necessary components into memory. If this value is minimal, it can suggest that the model was preloaded or that only a minimal setup was required. -- **Prompt Eval Count (32 tokens)**: The number of tokens in the input prompt. In NLP, tokens are typically words or subwords, so this count includes all the tokens that the model evaluated to understand and respond to the query. -- **Prompt Eval Duration (1.644773s)**: This measures the model's time to evaluate or process the input prompt. It accounts for the bulk of the total duration, implying that understanding the query and preparing a response is the most time-consuming part of the process. -- **Prompt Eval Rate (19.46 tokens/s)**: This rate indicates how quickly the model processes tokens from the input prompt. It reflects the model’s speed in terms of natural language comprehension. -- **Eval Count (8 token(s))**: This is the number of tokens in the model’s response, which in this case was, “The capital of France is Paris.” -- **Eval Duration (889.941ms)**: This is the time taken to generate the output based on the evaluated input. It’s much shorter than the prompt evaluation, suggesting that generating the response is less complex or computationally intensive than understanding the prompt. -- **Eval Rate (8.99 tokens/s)**: Similar to the prompt eval rate, this indicates the speed at which the model generates output tokens. It's a crucial metric for understanding the model's efficiency in output generation. +- **Total Duration (2.620170326s)** : This is the complete time taken from the start of the command to the completion of the response. It encompasses loading the model, processing the input prompt, and generating the response. +- **Load Duration (39.947908ms)** : This duration indicates the time to load the model or necessary components into memory. If this value is minimal, it can suggest that the model was preloaded or that only a minimal setup was required. 
+- **Prompt Eval Count (32 tokens)** : The number of tokens in the input prompt. In NLP, tokens are typically words or subwords, so this count includes all the tokens that the model evaluated to understand and respond to the query. +- **Prompt Eval Duration (1.644773s)** : This measures the model's time to evaluate or process the input prompt. It accounts for the bulk of the total duration, implying that understanding the query and preparing a response is the most time-consuming part of the process. +- **Prompt Eval Rate (19.46 tokens/s)** : This rate indicates how quickly the model processes tokens from the input prompt. It reflects the model’s speed in terms of natural language comprehension. +- **Eval Count (8 token(s))** : This is the number of tokens in the model’s response, which in this case was, “The capital of France is Paris.” +- **Eval Duration (889.941ms)** : This is the time taken to generate the output based on the evaluated input. It’s much shorter than the prompt evaluation, suggesting that generating the response is less complex or computationally intensive than understanding the prompt. +- **Eval Rate (8.99 tokens/s)** : Similar to the prompt eval rate, this indicates the speed at which the model generates output tokens. It's a crucial metric for understanding the model's efficiency in output generation. This detailed breakdown can help understand the computational demands and performance characteristics of running SLMs like Llama on edge devices like the Raspberry Pi 5. It shows that while prompt evaluation is more time-consuming, the actual generation of responses is relatively quicker. This analysis is crucial for optimizing performance and diagnosing potential bottlenecks in real-time applications. @@ -526,10 +526,10 @@ As a result, we will have the model response in a JSON format: As we can see, several pieces of information are generated, such as: -- **response**: the main output text generated by the model in response to our prompt. +- **response** : the main output text generated by the model in response to our prompt. - `The capital of France is **Paris**. 🇫🇷` -- **context**: the token IDs representing the input and context used by the model. Tokens are numerical representations of text used for processing by the language model. +- **context** : the token IDs representing the input and context used by the model. Tokens are numerical representations of text used for processing by the language model. - `[106, 1645, 108, 1841, 603, 573, 6037, 576, 6081, 235336, 107, 108,` ` 106, 2516, 108, 651, 6037, 576, 6081, 603, 5231, 29437, 168428, ` ` 235248, 244304, 241035, 235248, 108]` @@ -537,11 +537,11 @@ As we can see, several pieces of information are generated, such as: The Performance Metrics: -- **total_duration**: The total time taken for the operation in nanoseconds. In this case, approximately 24.26 seconds. -- **load_duration**: The time taken to load the model or components in nanoseconds. About 19.83 seconds. -- **prompt_eval_duration**: The time taken to evaluate the prompt in nanoseconds. Around 16 nanoseconds. -- **eval_count**: The number of tokens evaluated during the generation. Here, 14 tokens. -- **eval_duration**: The time taken for the model to generate the response in nanoseconds. Approximately 2.5 seconds. +- **total_duration** : The total time taken for the operation in nanoseconds. In this case, approximately 24.26 seconds. +- **load_duration** : The time taken to load the model or components in nanoseconds. About 19.83 seconds. 
+- **prompt_eval_duration** : The time taken to evaluate the prompt in nanoseconds. Around 16 nanoseconds. +- **eval_count** : The number of tokens evaluated during the generation. Here, 14 tokens. +- **eval_duration** : The time taken for the model to generate the response in nanoseconds. Approximately 2.5 seconds. But, what we want is the plain 'response' and, perhaps for analysis, the total duration of the inference, so let's change the code to extract it from the dictionary: @@ -708,11 +708,11 @@ from pydantic import BaseModel, Field import instructor ``` -- **sys**: Provides access to system-specific parameters and functions. It's used to get command-line arguments. -- **haversine**: A function from the haversine library that calculates the distance between two geographic points using the Haversine formula. -- **openAI**: A module for interacting with the OpenAI API (although it's used in conjunction with a local setup, Ollama). Everything is off-line here. -- **pydantic**: Provides data validation and settings management using Python-type annotations. It's used to define the structure of expected response data. -- **instructor**: A module is used to patch the OpenAI client to work in a specific mode (likely related to structured data handling). +- **sys** : Provides access to system-specific parameters and functions. It's used to get command-line arguments. +- **haversine** : A function from the haversine library that calculates the distance between two geographic points using the Haversine formula. +- **openAI** : A module for interacting with the OpenAI API (although it's used in conjunction with a local setup, Ollama). Everything is off-line here. +- **pydantic** : Provides data validation and settings management using Python-type annotations. It's used to define the structure of expected response data. +- **instructor** : A module is used to patch the OpenAI client to work in a specific mode (likely related to structured data handling). ### 2. Defining Input and Model @@ -723,11 +723,11 @@ mylat = -33.33 # Latitude of Santiago de Chile mylon = -70.51 # Longitude of Santiago de Chile ``` -- **country**: On a Python script, getting the country name from command-line arguments is possible. On a Jupyter notebook, we can enter its name, for example, +- **country** : On a Python script, getting the country name from command-line arguments is possible. On a Jupyter notebook, we can enter its name, for example, - `country = "France"` -- **MODEL**: Specifies the model being used, which is, in this example, the phi3.5. -- **mylat** **and** **mylon**: Coordinates of Santiago de Chile, used as the starting point for the distance calculation. +- **MODEL** : Specifies the model being used, which is, in this example, the phi3.5. +- **mylat** **and** **mylon** : Coordinates of Santiago de Chile, used as the starting point for the distance calculation. ### 3. Defining the Response Data Structure @@ -738,7 +738,7 @@ class CityCoord(BaseModel): lon: float = Field(..., description="Decimal Longitude of the city") ``` -- **CityCoord**: A Pydantic model that defines the expected structure of the response from the LLM. It expects three fields: city (name of the city), lat (latitude), and lon (longitude). +- **CityCoord** : A Pydantic model that defines the expected structure of the response from the LLM. It expects three fields: city (name of the city), lat (latitude), and lon (longitude). ### 4. 
Setting Up the OpenAI Client @@ -752,8 +752,8 @@ client = instructor.patch( ) ``` -- **OpenAI**: This setup initializes an OpenAI client with a local base URL and an API key (ollama). It uses a local server. -- **instructor.patch**: Patches the OpenAI client to work in JSON mode, enabling structured output that matches the Pydantic model. +- **OpenAI** : This setup initializes an OpenAI client with a local base URL and an API key (ollama). It uses a local server. +- **instructor.patch** : Patches the OpenAI client to work in JSON mode, enabling structured output that matches the Pydantic model. ### 5. Generating the Response @@ -772,11 +772,11 @@ resp = client.chat.completions.create( ) ``` -- **client.chat.completions.create**: Calls the LLM to generate a response. -- **model**: Specifies the model to use (llava-phi3). -- **messages**: Contains the prompt for the LLM, asking for the latitude and longitude of the capital city of the specified country. -- **response_model**: Indicates that the response should conform to the CityCoord model. -- **max_retries**: The maximum number of retry attempts if the request fails. +- **client.chat.completions.create** : Calls the LLM to generate a response. +- **model** : Specifies the model to use (llava-phi3). +- **messages** : Contains the prompt for the LLM, asking for the latitude and longitude of the capital city of the specified country. +- **response_model** : Indicates that the response should conform to the CityCoord model. +- **max_retries** : The maximum number of retry attempts if the request fails. ### 6. Calculating the Distance @@ -786,12 +786,12 @@ print(f"Santiago de Chile is about {int(round(distance, -1)):,} \ kilometers away from {resp.city}.") ``` -- **haversine**: Calculates the distance between Santiago de Chile and the capital city returned by the LLM using their respective coordinates. -- **(mylat, mylon)**: Coordinates of Santiago de Chile. -- **resp.city**: Name of the country's capital -- **(resp.lat, resp.lon)**: Coordinates of the capital city are provided by the LLM response. -- **unit='km'**: Specifies that the distance should be calculated in kilometers. -- **print**: Outputs the distance, rounded to the nearest 10 kilometers, with thousands of separators for readability. +- **haversine** : Calculates the distance between Santiago de Chile and the capital city returned by the LLM using their respective coordinates. +- **(mylat, mylon)** : Coordinates of Santiago de Chile. +- **resp.city** : Name of the country's capital +- **(resp.lat, resp.lon)** : Coordinates of the capital city are provided by the LLM response. +- **unit='km'** : Specifies that the distance should be calculated in kilometers. +- **print** : Outputs the distance, rounded to the nearest 10 kilometers, with thousands of separators for readability. **Running the code** diff --git a/contents/labs/raspi/raspi.qmd b/contents/labs/raspi/raspi.qmd index 2e6dfe95..02241339 100644 --- a/contents/labs/raspi/raspi.qmd +++ b/contents/labs/raspi/raspi.qmd @@ -6,13 +6,13 @@ These labs offer invaluable hands-on experience with machine learning systems, l ## Pre-requisites -- **Raspberry Pi**: Ensure you have at least one of the boards: the Raspberry Pi Zero 2W, Raspberry Pi 4 or 5 for the Vision Labs, and the Raspberry 5 for the GenAi lab. -- **Power Adapter**: To Power on the boards. +- **Raspberry Pi** : Ensure you have at least one of the boards: the Raspberry Pi Zero 2W, Raspberry Pi 4 or 5 for the Vision Labs, and the Raspberry 5 for the GenAi lab. 
+- **Power Adapter** : To Power on the boards. - Raspberry Pi Zero 2-W: 2.5W with a Micro-USB adapter - Raspberry Pi 4 or 5: 3.5W with a USB-C adapter -- **Network**: With internet access for downloading the necessary software and controlling the boards remotely. -- **SD Card (32GB minimum) and an SD card Adapter**: For the Raspberry Pi OS. +- **Network** : With internet access for downloading the necessary software and controlling the boards remotely. +- **SD Card (32GB minimum) and an SD card Adapter** : For the Raspberry Pi OS. ## Setup diff --git a/contents/labs/raspi/setup/setup.qmd b/contents/labs/raspi/setup/setup.qmd index fcd3a891..56434433 100644 --- a/contents/labs/raspi/setup/setup.qmd +++ b/contents/labs/raspi/setup/setup.qmd @@ -12,17 +12,17 @@ The Raspberry Pi is a powerful and versatile single-board computer that has beco ### Key Features -1. **Computational Power**: Despite their small size, Raspberry Pis offers significant processing capabilities, with the latest models featuring multi-core ARM processors and up to 8GB of RAM. +1. **Computational Power** : Despite their small size, Raspberry Pis offers significant processing capabilities, with the latest models featuring multi-core ARM processors and up to 8GB of RAM. -2. **GPIO Interface**: The 40-pin GPIO header allows direct interaction with sensors, actuators, and other electronic components, facilitating hardware-software integration projects. +2. **GPIO Interface** : The 40-pin GPIO header allows direct interaction with sensors, actuators, and other electronic components, facilitating hardware-software integration projects. -3. **Extensive Connectivity**: Built-in Wi-Fi, Bluetooth, Ethernet, and multiple USB ports enable diverse communication and networking projects. +3. **Extensive Connectivity** : Built-in Wi-Fi, Bluetooth, Ethernet, and multiple USB ports enable diverse communication and networking projects. -4. **Low-Level Hardware Access**: Raspberry Pis provides access to interfaces like I2C, SPI, and UART, allowing for detailed control and communication with external devices. +4. **Low-Level Hardware Access** : Raspberry Pis provides access to interfaces like I2C, SPI, and UART, allowing for detailed control and communication with external devices. -5. **Real-Time Capabilities**: With proper configuration, Raspberry Pis can be used for soft real-time applications, making them suitable for control systems and signal processing tasks. +5. **Real-Time Capabilities** : With proper configuration, Raspberry Pis can be used for soft real-time applications, making them suitable for control systems and signal processing tasks. -6. **Power Efficiency**: Low power consumption enables battery-powered and energy-efficient designs, especially in models like the Pi Zero. +6. **Power Efficiency** : Low power consumption enables battery-powered and energy-efficient designs, especially in models like the Pi Zero. ### Raspberry Pi Models (covered in this book) @@ -36,21 +36,21 @@ The Raspberry Pi is a powerful and versatile single-board computer that has beco ### Engineering Applications -1. **Embedded Systems Design**: Develop and prototype embedded systems for real-world applications. +1. **Embedded Systems Design** : Develop and prototype embedded systems for real-world applications. -2. **IoT and Networked Devices**: Create interconnected devices and explore protocols like MQTT, CoAP, and HTTP/HTTPS. +2. **IoT and Networked Devices** : Create interconnected devices and explore protocols like MQTT, CoAP, and HTTP/HTTPS. -3. 
**Control Systems**: Implement feedback control loops, PID controllers, and interface with actuators. +3. **Control Systems** : Implement feedback control loops, PID controllers, and interface with actuators. -4. **Computer Vision and AI**: Utilize libraries like OpenCV and TensorFlow Lite for image processing and machine learning at the edge. +4. **Computer Vision and AI** : Utilize libraries like OpenCV and TensorFlow Lite for image processing and machine learning at the edge. -5. **Data Acquisition and Analysis**: Collect sensor data, perform real-time analysis, and create data logging systems. +5. **Data Acquisition and Analysis** : Collect sensor data, perform real-time analysis, and create data logging systems. -6. **Robotics**: Build robot controllers, implement motion planning algorithms, and interface with motor drivers. +6. **Robotics** : Build robot controllers, implement motion planning algorithms, and interface with motor drivers. -7. **Signal Processing**: Perform real-time signal analysis, filtering, and DSP applications. +7. **Signal Processing** : Perform real-time signal analysis, filtering, and DSP applications. -8. **Network Security**: Set up VPNs, firewalls, and explore network penetration testing. +8. **Network Security** : Set up VPNs, firewalls, and explore network penetration testing. This tutorial will guide you through setting up the most common Raspberry Pi models, enabling you to start on your machine learning project quickly. We'll cover hardware setup, operating system installation, and initial configuration, focusing on preparing your Pi for Machine Learning applications. @@ -60,23 +60,23 @@ This tutorial will guide you through setting up the most common Raspberry Pi mod ![](images/jpeg/zero-hardware.jpg) -- **Processor**: 1GHz quad-core 64-bit Arm Cortex-A53 CPU -- **RAM**: 512MB SDRAM -- **Wireless**: 2.4GHz 802.11 b/g/n wireless LAN, Bluetooth 4.2, BLE -- **Ports**: Mini HDMI, micro USB OTG, CSI-2 camera connector -- **Power**: 5V via micro USB port +- **Processor** : 1GHz quad-core 64-bit Arm Cortex-A53 CPU +- **RAM** : 512MB SDRAM +- **Wireless** : 2.4GHz 802.11 b/g/n wireless LAN, Bluetooth 4.2, BLE +- **Ports** : Mini HDMI, micro USB OTG, CSI-2 camera connector +- **Power** : 5V via micro USB port ### Raspberry Pi 5 ![](images/jpeg/r5-hardware.jpg) -- **Processor**: +- **Processor** : - Pi 5: Quad-core 64-bit Arm Cortex-A76 CPU @ 2.4GHz - Pi 4: Quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz -- **RAM**: 2GB, 4GB, or 8GB options (8GB recommended for AI tasks) -- **Wireless**: Dual-band 802.11ac wireless, Bluetooth 5.0 -- **Ports**: 2 × micro HDMI ports, 2 × USB 3.0 ports, 2 × USB 2.0 ports, CSI camera port, DSI display port -- **Power**: 5V DC via USB-C connector (3A) +- **RAM** : 2GB, 4GB, or 8GB options (8GB recommended for AI tasks) +- **Wireless** : Dual-band 802.11ac wireless, Bluetooth 5.0 +- **Ports** : 2 × micro HDMI ports, 2 × USB 3.0 ports, 2 × USB 2.0 ports, CSI camera port, DSI display port +- **Power** : 5V DC via USB-C connector (3A) ## Installing the Operating System @@ -122,14 +122,14 @@ Follow the steps to install the OS in your Raspi. 2. Insert a microSD card into your computer (a 32GB SD card is recommended) . 3. Open Raspberry Pi Imager and select your Raspberry Pi model. 4. Choose the appropriate operating system: - - **For Raspi-Zero**: For example, you can select: + - **For Raspi-Zero** : For example, you can select: `Raspberry Pi OS Lite (64-bit)`. 
![img](images/png/zero-burn.png) > Due to its reduced SDRAM (512MB), the recommended OS for the Rasp Zero is the 32-bit version. However, to run some machine learning models, such as the YOLOv8 from Ultralitics, we should use the 64-bit version. Although Raspi-Zero can run a *desktop*, we will choose the LITE version (no Desktop) to reduce the RAM needed for regular operation. - - For **Raspi-5**: We can select the full 64-bit version, which includes a desktop: + - For **Raspi-5** : We can select the full 64-bit version, which includes a desktop: `Raspberry Pi OS (64-bit)` ![](images/png/r5-burn.png) diff --git a/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd b/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd index e1e6f81f..2c9efecf 100644 --- a/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd +++ b/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd @@ -99,11 +99,11 @@ The I2S protocol consists of at least three lines: ![](https://hackster.imgix.net/uploads/attachments/1594628/image_8CRJmXD9Fr.png?auto=compress%2Cformat&w=740&h=555&fit=max) -**1. Bit (or Serial) clock line (BCLK or CLK)**: This line toggles to indicate the start of a new bit of data (pin IO42). +**1. Bit (or Serial) clock line (BCLK or CLK)** : This line toggles to indicate the start of a new bit of data (pin IO42). -**2. Word select line (WS)**: This line toggles to indicate the start of a new word (left channel or right channel). The Word select clock (WS) frequency defines the sample rate. In our case, L/R on the microphone is set to ground, meaning that we will use only the left channel (mono). +**2. Word select line (WS)** : This line toggles to indicate the start of a new word (left channel or right channel). The Word select clock (WS) frequency defines the sample rate. In our case, L/R on the microphone is set to ground, meaning that we will use only the left channel (mono). -**3. Data line (SD)**: This line carries the audio data (pin IO41) +**3. Data line (SD)** : This line carries the audio data (pin IO41) In an I2S data stream, the data is sent as a sequence of frames, each containing a left-channel word and a right-channel word. This makes I2S particularly suited for transmitting stereo audio data. However, it can also be used for mono or multichannel audio with additional data lines. 
diff --git a/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd b/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd index 8caa62e0..2b836e52 100644 --- a/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd +++ b/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd @@ -10,12 +10,12 @@ The [XIAO ESP32S3 Sense](https://www.seeedstudio.com/XIAO-ESP32S3-Sense-p-5639.h **XIAO ESP32S3 Sense Main Features** -- **Powerful MCU Board**: Incorporate the ESP32S3 32-bit, dual-core, Xtensa processor chip operating up to 240 MHz, mounted multiple development ports, Arduino / MicroPython supported -- **Advanced Functionality**: Detachable OV2640 camera sensor for 1600 * 1200 resolution, compatible with OV5640 camera sensor, integrating an additional digital microphone -- **Elaborate Power Design**: Lithium battery charge management capability offers four power consumption models, which allows for deep sleep mode with power consumption as low as 14μA -- **Great Memory for more Possibilities**: Offer 8MB PSRAM and 8MB FLASH, supporting SD card slot for external 32GB FAT memory -- **Outstanding RF performance**: Support 2.4GHz Wi-Fi and BLE dual wireless communication, support 100m+ remote communication when connected with U.FL antenna -- **Thumb-sized Compact Design**: 21 x 17.5mm, adopting the classic form factor of XIAO, suitable for space-limited projects like wearable devices +- **Powerful MCU Board** : Incorporate the ESP32S3 32-bit, dual-core, Xtensa processor chip operating up to 240 MHz, mounted multiple development ports, Arduino / MicroPython supported +- **Advanced Functionality** : Detachable OV2640 camera sensor for 1600 * 1200 resolution, compatible with OV5640 camera sensor, integrating an additional digital microphone +- **Elaborate Power Design** : Lithium battery charge management capability offers four power consumption models, which allows for deep sleep mode with power consumption as low as 14μA +- **Great Memory for more Possibilities** : Offer 8MB PSRAM and 8MB FLASH, supporting SD card slot for external 32GB FAT memory +- **Outstanding RF performance** : Support 2.4GHz Wi-Fi and BLE dual wireless communication, support 100m+ remote communication when connected with U.FL antenna +- **Thumb-sized Compact Design** : 21 x 17.5mm, adopting the classic form factor of XIAO, suitable for space-limited projects like wearable devices ![](./images/png/xiao_pins.png) diff --git a/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd b/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd index 90064ee8..0d77dd03 100644 --- a/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd +++ b/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd @@ -6,10 +6,10 @@ These labs provide a unique opportunity to gain practical experience with machin ## Pre-requisites -- **XIAO ESP32S3 Sense Board**: Ensure you have the XIAO ESP32S3 Sense Board. -- **USB-C Cable**: This is for connecting the board to your computer. -- **Network**: With internet access for downloading necessary software. -- **SD Card and an SD card Adapter**: This saves audio and images (optional). +- **XIAO ESP32S3 Sense Board** : Ensure you have the XIAO ESP32S3 Sense Board. +- **USB-C Cable** : This is for connecting the board to your computer. +- **Network** : With internet access for downloading necessary software. +- **SD Card and an SD card Adapter** : This saves audio and images (optional). 
## Setup diff --git a/contents/optimizations/optimizations.qmd b/contents/optimizations/optimizations.qmd index 285ab5c5..826dd868 100644 --- a/contents/optimizations/optimizations.qmd +++ b/contents/optimizations/optimizations.qmd @@ -91,10 +91,10 @@ A widely adopted and effective strategy for systematically pruning structures re There are several techniques for assigning these importance scores: -* **Weight Magnitude-Based Pruning**: This approach assigns importance scores to a structure by evaluating the aggregate magnitude of their associated weights. Structures with smaller overall weight magnitudes are considered less critical to the network's performance. -* **Gradient-Based Pruning**: This technique utilizes the gradients of the loss function with respect to the weights associated with a structure. Structures with low cumulative gradient magnitudes, indicating minimal impact on the loss when altered, are prime candidates for pruning. -* **Activation-Based Pruning**: This method tracks how often a neuron or filter is activated by storing this information in a parameter called the activation counter. Each time the structure is activated, the counter is incremented. A low activation count suggests that the structure is less relevant. -* **Taylor Expansion-Based Pruning**: This approach approximates the change in the loss function from removing a given weight. By assessing the cumulative loss disturbance from removing all the weights associated with a structure, you can identify structures with negligible impact on the loss, making them suitable candidates for pruning. +* **Weight Magnitude-Based Pruning** : This approach assigns importance scores to a structure by evaluating the aggregate magnitude of their associated weights. Structures with smaller overall weight magnitudes are considered less critical to the network's performance. +* **Gradient-Based Pruning** : This technique utilizes the gradients of the loss function with respect to the weights associated with a structure. Structures with low cumulative gradient magnitudes, indicating minimal impact on the loss when altered, are prime candidates for pruning. +* **Activation-Based Pruning** : This method tracks how often a neuron or filter is activated by storing this information in a parameter called the activation counter. Each time the structure is activated, the counter is incremented. A low activation count suggests that the structure is less relevant. +* **Taylor Expansion-Based Pruning** : This approach approximates the change in the loss function from removing a given weight. By assessing the cumulative loss disturbance from removing all the weights associated with a structure, you can identify structures with negligible impact on the loss, making them suitable candidates for pruning. The idea is to measure, either directly or indirectly, the contribution of each component to the model's output. Structures with minimal influence according to the defined criteria are pruned first. This enables selective, optimized pruning that maximally compresses models while preserving predictive capacity. In general, it is important to evaluate the impact of removing particular structures on the model's output, with recent works such as [@rachwan2022winning] and [@lubana2020gradient] investigating combinations of techniques like magnitude-based pruning and gradient-based pruning.
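
The structured pruning criteria listed in the `contents/optimizations/optimizations.qmd` hunk above all follow the same pattern: compute an importance score per structure, then remove the lowest-scoring structures first. As a rough illustration of the first criterion (weight magnitude-based pruning), the sketch below scores each filter of a convolutional layer by its aggregate L1 weight magnitude and selects the least important filters as pruning candidates. This is a minimal sketch, assuming PyTorch; the function names and the 25% pruning fraction are illustrative choices, not part of the patch or the underlying chapter.

```python
# Minimal sketch of weight magnitude-based (L1) filter scoring for structured pruning.
# Assumes PyTorch; function names and the pruning fraction are illustrative only.
import torch
import torch.nn as nn


def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    # Importance score per output filter: sum of |w| over (in_channels, kH, kW).
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))


def filters_to_prune(conv: nn.Conv2d, fraction: float = 0.25):
    scores = l1_filter_scores(conv)
    k = int(fraction * scores.numel())
    # The k filters with the smallest aggregate magnitude are the pruning candidates.
    return torch.argsort(scores)[:k].tolist()


if __name__ == "__main__":
    conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
    print(filters_to_prune(conv))  # indices of the least important filters
```

The other criteria described in that hunk (gradient-based, activation-based, and Taylor expansion-based pruning) would swap in a different scoring function while keeping the same select-the-lowest-scores step.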