Built site for gh-pages

harvard-edge · Sep 4, 2024 · 773b149
1 parent 52ca11c
commit 773b149

Showing 39 changed files with 195 additions and 2,157 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-72c27ed9
+5c8e4c92
diff --git a/CNAME b/CNAME
diff --git a/Machine-Learning-Systems.pdf b/Machine-Learning-Systems.pdf
diff --git a/contents/ai_for_good/ai_for_good.html b/contents/ai_for_good/ai_for_good.html
@@ -799,7 +799,7 @@ <h2 data-number="19.2" class="anchored" data-anchor-id="agriculture"><span class
 <p>Other sensors, such as GPS units and accelerometers, can track microclimate conditions, soil humidity, and livestock wellbeing. Local real-time data helps farmers respond and adapt better to changes in the field. TinyML analytics at the edge avoids lag, network disruptions, and the high data costs of cloud-based systems. Localized systems allow customization of specific crops, diseases, and regional issues.</p>
 <p>Widespread TinyML applications can help digitize smallholder farms to increase productivity, incomes, and resilience. The low cost of hardware and minimal connectivity requirements make solutions accessible. Projects across the developing world have shown the benefits:</p>
 <ul>
-<li><p>Microsoft’s <a href="https://www.microsoft.com/en-us/research/project/farmbeats-iot-agriculture/">FarmBeats</a> project is an end-to-end approach to enable data-driven farming by using low-cost sensors, drones, and vision and machine learning algorithms. The project aims to solve the problem of limited adoption of technology in farming due to the need for more power and internet connectivity in farms and the farmers’ limited technology savviness. The project aims to increase farm productivity and reduce costs by coupling data with farmers’ knowledge and intuition about their farms. The project has successfully enabled actionable insights from data by building artificial intelligence (AI) or machine learning (ML) models based on fused data sets.</p></li>
+<li><p>Microsoft’s <a href="https://www.microsoft.com/en-us/research/project/farmbeats-iot-agriculture/">FarmBeats</a> project is an end-to-end approach to enable data-driven farming by using low-cost sensors, drones, and vision and machine learning algorithms. The project seeks to solve the problem of limited adoption of technology in farming due to the need for more power and internet connectivity in farms and the farmers’ limited technology savviness. The project strives to increase farm productivity and reduce costs by coupling data with farmers’ knowledge and intuition about their farms. The project has successfully enabled actionable insights from data by building artificial intelligence (AI) or machine learning (ML) models based on fused data sets.</p></li>
 <li><p>In Sub-Saharan Africa, off-the-shelf cameras and edge AI have cut cassava disease losses from 40% to 5%, protecting a staple crop <span class="citation" data-cites="ramcharan2017deep">(<a href="../../references.html#ref-ramcharan2017deep" role="doc-biblioref">Ramcharan et al. 2017</a>)</span>.</p></li>
 <li><p>In Indonesia, sensors monitor microclimates across rice paddies, optimizing water usage even with erratic rains <span class="citation" data-cites="tirtalistyani2022indonesia">(<a href="../../references.html#ref-tirtalistyani2022indonesia" role="doc-biblioref">Tirtalistyani, Murtiningrum, and Kanwar 2022</a>)</span>.</p></li>
 </ul>
@@ -846,7 +846,7 @@ <h3 data-number="19.3.3" class="anchored" data-anchor-id="infectious-disease-con
 <p>Mosquitoes remain the most deadly disease vector worldwide, transmitting illnesses that infect over one billion people annually <span class="citation" data-cites="vectorborne">(<a href="../../references.html#ref-vectorborne" role="doc-biblioref"><span>“Vector-Borne Diseases,”</span> n.d.</a>)</span>. Diseases like malaria, dengue, and Zika are especially prevalent in resource-limited regions lacking robust infrastructure for mosquito control. Monitoring local mosquito populations is essential to prevent outbreaks and properly target interventions.</p>
 <div class="no-row-height column-margin column-container"><div id="ref-vectorborne" class="csl-entry" role="listitem">
 <span>“Vector-Borne Diseases.”</span> n.d. https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases.
-</div></div><p>Traditional monitoring methods are expensive, labor-intensive, and difficult to deploy remotely. The proposed TinyML solution aims to overcome these barriers. Small microphones coupled with machine learning algorithms can classify mosquitoes by species based on minute differences in wing oscillations. The TinyML software runs efficiently on low-cost microcontrollers, eliminating the need for continuous connectivity.</p>
+</div></div><p>Traditional monitoring methods are expensive, labor-intensive, and difficult to deploy remotely. The proposed TinyML solution overcomes these barriers. Small microphones coupled with machine learning algorithms can classify mosquitoes by species based on minute differences in wing oscillations. The TinyML software runs efficiently on low-cost microcontrollers, eliminating the need for continuous connectivity.</p>
 <p>A collaborative research team from the University of Khartoum and the ICTP is exploring an innovative solution using TinyML. In a recent paper, they presented a low-cost device that can identify disease-spreading mosquito species through their wing beat sounds <span class="citation" data-cites="altayeb2022classifying">(<a href="../../references.html#ref-altayeb2022classifying" role="doc-biblioref">Altayeb, Zennaro, and Rovai 2022</a>)</span>.</p>
 <div class="no-row-height column-margin column-container"><div id="ref-altayeb2022classifying" class="csl-entry" role="listitem">
 Altayeb, Moez, Marco Zennaro, and Marcelo Rovai. 2022. <span>“Classifying Mosquito Wingbeat Sound Using <span>TinyML</span>.”</span> In <em>Proceedings of the 2022 ACM Conference on Information Technology for Social Good</em>, 132–37. ACM. <a href="https://doi.org/10.1145/3524458.3547258">https://doi.org/10.1145/3524458.3547258</a>.

diff --git a/contents/benchmarking/benchmarking.html b/contents/benchmarking/benchmarking.html
@@ -795,7 +795,7 @@ <h1 class="title"><span id="sec-benchmarking_ai" class="quarto-section-identifie
 </figure>
 </div>
 <p>Benchmarking is critical to developing and deploying machine learning systems, especially TinyML applications. Benchmarks allow developers to measure and compare the performance of different model architectures, training procedures, and deployment strategies. This provides key insights into which approaches work best for the problem at hand and the constraints of the deployment environment.</p>
-<p>This chapter will provide an overview of popular ML benchmarks, best practices for benchmarking, and how to use benchmarks to improve model development and system performance. It aims to provide developers with the right tools and knowledge to effectively benchmark and optimize their systems, especially for TinyML systems.</p>
+<p>This chapter will provide an overview of popular ML benchmarks, best practices for benchmarking, and how to use benchmarks to improve model development and system performance. It provides developers with the right tools and knowledge to effectively benchmark and optimize their systems, especially for TinyML systems.</p>
 <div class="callout callout-style-default callout-tip callout-titled">
 <div class="callout-header d-flex align-content-center">
 <div class="callout-icon-container">
@@ -1463,7 +1463,7 @@ <h2 data-number="11.6" class="anchored" data-anchor-id="data-benchmarking"><span
 </div></div></figure>
 </div>
 <p>An alternative paradigm is emerging called data-centric AI. Rather than treating data as static and focusing narrowly on model performance, this approach recognizes that models are only as good as their training data. So, the emphasis shifts to curating high-quality datasets that better reflect real-world complexity, developing more informative evaluation benchmarks, and carefully considering how data is sampled, preprocessed, and augmented. The goal is to optimize model behavior by improving the data rather than just optimizing metrics on flawed datasets. Data-centric AI critically examines and enhances the data itself to produce beneficial AI. This reflects an important evolution in mindset as the field addresses the shortcomings of narrow benchmarking.</p>
-<p>This section will explore the key differences between model-centric and data-centric approaches to AI. This distinction has important implications for how we benchmark AI systems. Specifically, we will see how focusing on data quality and Efficiency can directly improve machine learning performance as an alternative to optimizing model architectures solely. The data-centric approach recognizes that models are only as good as their training data. So, enhancing data curation, evaluation benchmarks, and data handling processes can produce AI systems that are safer, fairer, and more robust. Rethinking benchmarking to prioritize data alongside models represents an important evolution as the field aims to deliver trustworthy real-world impact.</p>
+<p>This section will explore the key differences between model-centric and data-centric approaches to AI. This distinction has important implications for how we benchmark AI systems. Specifically, we will see how focusing on data quality and Efficiency can directly improve machine learning performance as an alternative to optimizing model architectures solely. The data-centric approach recognizes that models are only as good as their training data. So, enhancing data curation, evaluation benchmarks, and data handling processes can produce AI systems that are safer, fairer, and more robust. Rethinking benchmarking to prioritize data alongside models represents an important evolution as the field strives to deliver trustworthy real-world impact.</p>
 <section id="limitations-of-model-centric-ai" class="level3" data-number="11.6.1">
 <h3 data-number="11.6.1" class="anchored" data-anchor-id="limitations-of-model-centric-ai"><span class="header-section-number">11.6.1</span> Limitations of Model-Centric AI</h3>
 <p>In the model-centric AI era, a prominent characteristic was the development of complex model architectures. Researchers and practitioners dedicated substantial effort to devising sophisticated and intricate models in the quest for superior performance. This frequently involved the incorporation of additional layers and the fine-tuning of a multitude of hyperparameters to achieve incremental improvements in accuracy. Concurrently, there was a significant emphasis on leveraging advanced algorithms. These algorithms, often at the forefront of the latest research, were employed to improve the performance of AI models. The primary aim of these algorithms was to optimize the learning process of models, thereby extracting maximal information from the training data.</p>
@@ -1479,11 +1479,11 @@ <h3 data-number="11.6.2" class="anchored" data-anchor-id="the-shift-toward-data-
 <p>Data-centric AI puts a strong emphasis on the cleaning and labeling of data. Cleaning involves the removal of outliers, handling missing values, and addressing other data inconsistencies. Labeling, on the other hand, involves assigning meaningful and accurate labels to the data. Both these processes are crucial in ensuring that the AI model is trained on accurate and relevant data. Another important aspect of the data-centric approach is data augmentation. This involves artificially increasing the size and diversity of the dataset by applying various transformations to the data, such as rotation, scaling, and flipping training images. Data augmentation helps in improving the model’s robustness and generalization capabilities.</p>
 <p>There are several benefits to adopting a data-centric approach to AI development. First and foremost, it leads to improved model performance and generalization capabilities. By ensuring that the model is trained on high-quality, diverse data, the model can better generalize to new, unseen data <span class="citation" data-cites="gaviria2022dollar">(<a href="../../references.html#ref-gaviria2022dollar" role="doc-biblioref">Mattson et al. 2020b</a>)</span>.</p>
 <div class="no-row-height column-margin column-container"></div><p>Additionally, a data-centric approach can often lead to simpler models that are easier to interpret and maintain. This is because the emphasis is on the data rather than the model architecture, meaning simpler models can achieve high performance when trained on high-quality data.</p>
-<p>The shift towards data-centric AI represents a significant paradigm shift. By prioritizing the quality of the input data, this approach aims to improve model performance and generalization capabilities, ultimately leading to more robust and reliable AI systems. As we continue to advance in our understanding and application of AI, the data-centric approach is likely to play an important role in shaping the future of this field.</p>
+<p>The shift towards data-centric AI represents a significant paradigm shift. By prioritizing the quality of the input data, this approach tries to improve model performance and generalization capabilities, ultimately leading to more robust and reliable AI systems. As we continue to advance in our understanding and application of AI, the data-centric approach is likely to play an important role in shaping the future of this field.</p>
 </section>
 <section id="benchmarking-data" class="level3 page-columns page-full" data-number="11.6.3">
 <h3 data-number="11.6.3" class="anchored" data-anchor-id="benchmarking-data"><span class="header-section-number">11.6.3</span> Benchmarking Data</h3>
-<p>Data benchmarking aims to evaluate common issues in datasets, such as identifying label errors, noisy features, representation imbalance (for example, out of the 1000 classes in Imagenet-1K, there are over 100 categories which are just types of dogs), class imbalance (where some classes have many more samples than others), whether models trained on a given dataset can generalize to out-of-distribution features, or what types of biases might exist in a given dataset <span class="citation" data-cites="gaviria2022dollar">(<a href="../../references.html#ref-gaviria2022dollar" role="doc-biblioref">Mattson et al. 2020b</a>)</span>. In its simplest form, data benchmarking aims to improve accuracy on a test set by removing noisy or mislabeled training samples while keeping the model architecture fixed. Recent competitions in data benchmarking have invited participants to submit novel augmentation strategies and active learning techniques.</p>
+<p>Data benchmarking focuses on evaluating common issues in datasets, such as identifying label errors, noisy features, representation imbalance (for example, out of the 1000 classes in Imagenet-1K, there are over 100 categories which are just types of dogs), class imbalance (where some classes have many more samples than others), whether models trained on a given dataset can generalize to out-of-distribution features, or what types of biases might exist in a given dataset <span class="citation" data-cites="gaviria2022dollar">(<a href="../../references.html#ref-gaviria2022dollar" role="doc-biblioref">Mattson et al. 2020b</a>)</span>. In its simplest form, data benchmarking seeks to improve accuracy on a test set by removing noisy or mislabeled training samples while keeping the model architecture fixed. Recent competitions in data benchmarking have invited participants to submit novel augmentation strategies and active learning techniques.</p>
 <div class="no-row-height column-margin column-container"><div id="ref-gaviria2022dollar" class="csl-entry" role="listitem">
 Mattson, Peter, Vijay Janapa Reddi, Christine Cheng, Cody Coleman, Greg Diamos, David Kanter, Paulius Micikevicius, et al. 2020b. <span>“<span>MLPerf:</span> <span>An</span> Industry Standard Benchmark Suite for Machine Learning Performance.”</span> <em>IEEE Micro</em> 40 (2): 8–16. <a href="https://doi.org/10.1109/mm.2020.2974843">https://doi.org/10.1109/mm.2020.2974843</a>.
 </div></div><p>Data-centric techniques continue to gain attention in benchmarking, especially as foundation models are increasingly trained on self-supervised objectives. Compared to smaller datasets like Imagenet-1K, massive datasets commonly used in self-supervised learning, such as Common Crawl, OpenImages, and LAION-5B, contain higher amounts of noise, duplicates, bias, and potentially offensive data.</p>

diff --git a/contents/conclusion/conclusion.html b/contents/conclusion/conclusion.html
@@ -694,7 +694,7 @@ <h2 data-number="20.1" class="anchored" data-anchor-id="introduction"><span clas
 <p>Our journey started by tracing ML’s historical trajectory, from its theoretical foundations to its current state as a transformative force across industries (<a href="../dl_primer/dl_primer.html" class="quarto-xref"><span>Chapter 3</span></a>). This journey has highlighted the remarkable progress in the field, challenges, and opportunities.</p>
 <p>Throughout this book, we have looked into the intricacies of ML systems, examining the critical components and best practices necessary to create a seamless and efficient pipeline. From data preprocessing and model training to deployment and monitoring, we have provided insights and guidance to help readers navigate the complex landscape of ML system development.</p>
 <p>ML systems involve complex workflows, spanning various topics from data engineering to model deployment on diverse systems (<a href="../workflow/workflow.html" class="quarto-xref"><span>Chapter 4</span></a>). By providing an overview of these ML system components, we have aimed to showcase the tremendous depth and breadth of the field and expertise that is needed. Understanding the intricacies of ML workflows is crucial for practitioners and researchers alike, as it enables them to navigate the landscape effectively and develop robust, efficient, and impactful ML solutions.</p>
-<p>By focusing on the systems aspect of ML, we aim to bridge the gap between theoretical knowledge and practical implementation. Just as a healthy human body system allows the organs to function optimally, a well-designed ML system enables the models to consistently deliver accurate and reliable results. This book aims to empower readers with the knowledge and tools necessary to build ML systems that showcase the underlying models’ power and ensure smooth integration and operation, much like a well-functioning human body.</p>
+<p>By focusing on the systems aspect of ML, we aim to bridge the gap between theoretical knowledge and practical implementation. Just as a healthy human body system allows the organs to function optimally, a well-designed ML system enables the models to consistently deliver accurate and reliable results. This book’s goal is to empower readers with the knowledge and tools necessary to build ML systems that showcase the underlying models’ power and ensure smooth integration and operation, much like a well-functioning human body.</p>
 </section>
 <section id="knowing-the-importance-of-ml-datasets" class="level2" data-number="20.2">
 <h2 data-number="20.2" class="anchored" data-anchor-id="knowing-the-importance-of-ml-datasets"><span class="header-section-number">20.2</span> Knowing the Importance of ML Datasets</h2>