diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/01.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/01.png
new file mode 100644
index 000000000..98a272f84
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/01.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/02.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/02.png
new file mode 100644
index 000000000..d0b8df7cb
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/02.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/03.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/03.png
new file mode 100644
index 000000000..80e41973f
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/03.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/04.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/04.png
new file mode 100644
index 000000000..d098da4e1
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/04.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/05.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/05.png
new file mode 100644
index 000000000..8fa7609f6
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/05.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/06.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/06.png
new file mode 100644
index 000000000..a78e5ee6f
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/06.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/07.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/07.png
new file mode 100644
index 000000000..5993f29b2
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/07.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/08.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/08.png
new file mode 100644
index 000000000..a01e883ef
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/08.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/09.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/09.png
new file mode 100644
index 000000000..64d714c26
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/09.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/10.png b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/10.png
new file mode 100644
index 000000000..571783c51
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/android_halide/Figures/10.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/_index.md b/content/learning-paths/mobile-graphics-and-gaming/android_halide/_index.md
new file mode 100644
index 000000000..56bbf291a
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/android_halide/_index.md
@@ -0,0 +1,46 @@
+---
+title: "Halide Essentials: From Basics to Android Integration"
+minutes_to_complete: 180
+
+who_is_this_for: This is an introductory topic for software developers interested in learning how to use Halide for image processing.
+
+learning_objectives:
+ - Describe what Halide is and install it on your development machine.
+ - Build a simple camera/image processing workflow with Halide.
+ - Apply operation fusion to optimize a Halide pipeline.
+ - Integrate a Halide pipeline into an Android (Kotlin) project.
+
+prerequisites:
+    - Basic knowledge of C++ and Kotlin.
+    - A development machine with [Android Studio](https://developer.android.com/studio) installed.
+
+author: Dawid Borycki
+
+### Tags
+skilllevels: Introductory
+subjects: Performance and Architecture
+armips:
+ - Cortex-A
+ - Cortex-X
+operatingsystems:
+ - Android
+tools_software_languages:
+ - Android Studio
+ - Coding
+
+further_reading:
+ - resource:
+ title: Halide 19.0.0
+ link: https://halide-lang.org/docs/index.html
+ type: website
+ - resource:
+ title: Halide GitHub
+ link: https://github.com/halide/Halide
+ type: repository
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/_next-steps.md b/content/learning-paths/mobile-graphics-and-gaming/android_halide/_next-steps.md
new file mode 100644
index 000000000..c3db0de5a
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/android_halide/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/android.md b/content/learning-paths/mobile-graphics-and-gaming/android_halide/android.md
new file mode 100644
index 000000000..e1809816b
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/android_halide/android.md
@@ -0,0 +1,410 @@
+---
+# User change
+title: "Integrating Halide into an Android (Kotlin) Project"
+
+weight: 6
+
+layout: "learningpathall"
+---
+
+## Objective
+In this lesson, we’ll learn how to integrate a high-performance Halide image-processing pipeline into an Android application using Kotlin.
+
+## Overview of Mobile Integration with Halide
+Android is the world’s most widely used mobile operating system, powering billions of devices across diverse markets. This vast user base makes Android an ideal target platform for developers aiming to reach a broad audience, particularly in applications requiring sophisticated image and signal processing, such as augmented reality, photography, video editing, and real-time analytics.
+
+Kotlin, now the preferred programming language for Android development, combines concise syntax with robust language features, enabling developers to write maintainable, expressive, and safe code. It offers seamless interoperability with existing Java codebases and straightforward integration with native code via JNI, simplifying the development of performant mobile applications.
+
+## Benefits of Using Halide on Mobile
+Integrating Halide into Android applications brings several key advantages:
+1. Performance. Halide enables significant acceleration of complex image processing algorithms, often surpassing the speed of traditional Java or Kotlin implementations by leveraging optimized code generation. By generating highly optimized native code tailored for ARM CPUs or GPUs, Halide can dramatically increase frame rates and responsiveness, essential for real-time or interactive applications.
+2. Efficiency. On mobile devices, resource efficiency translates directly to improved battery life and reduced thermal output. Halide’s scheduling strategies (such as operation fusion, tiling, parallelization, and vectorization) minimize unnecessary memory transfers, CPU usage, and GPU overhead. This optimization substantially reduces overall power consumption, extending battery life and enhancing the user experience by preventing overheating.
+3. Portability. Halide abstracts hardware-specific details, allowing developers to write a single high-level pipeline that easily targets different processor architectures and hardware configurations. Pipelines can seamlessly run on various ARM-based CPUs and GPUs commonly found in Android smartphones and tablets, enabling developers to support a wide range of devices with minimal platform-specific modifications.
+
+In short, Halide delivers high-performance image processing without sacrificing portability or efficiency, a balance particularly valuable on resource-constrained mobile devices.
+
+### Android Development Ecosystem and Challenges
+While Android presents abundant opportunities for developers, the mobile development ecosystem brings its own set of challenges, especially for performance-intensive applications:
+1. Limited Hardware Resources. Unlike desktop or server environments, mobile devices have significant constraints on processing power, memory capacity, and battery life. Developers must optimize software meticulously to deliver smooth performance while carefully managing hardware resource consumption. Leveraging tools like Halide allows developers to overcome these constraints by optimizing computational workloads, making resource-intensive tasks feasible on constrained hardware.
+2. Cross-Compilation Complexities. Developing native code for Android requires handling multiple hardware architectures (such as ARMv7, ARM64, and sometimes x86/x86_64). Cross-compilation introduces complexities due to different instruction sets, CPU features, and performance characteristics. Managing this complexity involves careful use of the Android NDK, understanding toolchains, and correctly configuring build systems (e.g., Gradle, CMake). Halide helps mitigate these issues by abstracting away many platform-specific optimizations, automatically generating code optimized for target architectures.
+3. Image-Format Conversions (Bitmap ↔ Halide Buffer). Android typically handles images through the Bitmap class or similar platform-specific constructs, whereas Halide expects image data to be in raw, contiguous buffer formats. Developers must bridge the gap between Android-specific image representations (Bitmaps, YUV images from camera APIs, etc.) and Halide’s native buffer format. Proper management of these conversions—including considerations for pixel formats, stride alignment, and memory copying overhead—can significantly impact performance and correctness, necessitating careful design and efficient implementation of buffer-handling routines.
+
+## Project Requirements
+Before integrating Halide into your Android application, ensure you have the necessary tools and libraries.
+
+### Tools and Prerequisites
+1. Android Studio. [Download link](https://developer.android.com/studio).
+2. Android NDK (Native Development Kit). Install it from Android Studio (Tools → SDK Manager → SDK Tools → NDK (Side by side)).
+
+## Setting Up the Android Project
+### Create the Project
+1. Open Android Studio.
+2. Select New Project > Native C++.
+
+### Configure the Project
+1. Set the project Name to Arm.Halide.AndroidDemo.
+2. Choose Kotlin as the language.
+3. Set Minimum SDK to API 24.
+4. Click Next.
+5. Select C++17 from the C++ Standard dropdown list.
+6. Click Finish.
+
+## Configuring the Android Project
+Next, configure your Android project to use the files generated in the previous step. First, copy blur_threshold_android.a and blur_threshold_android.h into ArmHalideAndroidDemo/app/src/main/cpp. Ensure your cpp directory contains the following files:
+* native-lib.cpp
+* blur_threshold_android.a
+* blur_threshold_android.h
+* CMakeLists.txt
+
+Open CMakeLists.txt and modify it as follows (replace /path/to/halide with your Halide installation directory):
+```cmake
+cmake_minimum_required(VERSION 3.22.1)
+
+project("armhalideandroiddemo")
+include_directories(
+ /path/to/halide/include
+)
+
+add_library(blur_threshold_android STATIC IMPORTED)
+set_target_properties(blur_threshold_android PROPERTIES IMPORTED_LOCATION
+ ${CMAKE_CURRENT_SOURCE_DIR}/blur_threshold_android.a
+)
+
+add_library(${CMAKE_PROJECT_NAME} SHARED native-lib.cpp)
+
+target_link_libraries(${CMAKE_PROJECT_NAME}
+ blur_threshold_android
+ android
+ log)
+```
+
+Open the module-level build.gradle.kts file and modify it as follows:
+
+```kotlin
+plugins {
+ alias(libs.plugins.android.application)
+ alias(libs.plugins.kotlin.android)
+}
+
+android {
+ namespace = "com.arm.armhalideandroiddemo"
+ compileSdk = 35
+
+ defaultConfig {
+ applicationId = "com.arm.armhalideandroiddemo"
+ minSdk = 24
+ targetSdk = 34
+ versionCode = 1
+ versionName = "1.0"
+ ndk {
+ abiFilters += "arm64-v8a"
+ }
+ testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
+ externalNativeBuild {
+ cmake {
+ cppFlags += "-std=c++17"
+ }
+ }
+ }
+
+ buildTypes {
+ release {
+ isMinifyEnabled = false
+ proguardFiles(
+ getDefaultProguardFile("proguard-android-optimize.txt"),
+ "proguard-rules.pro"
+ )
+ }
+ }
+ compileOptions {
+ sourceCompatibility = JavaVersion.VERSION_11
+ targetCompatibility = JavaVersion.VERSION_11
+ }
+ kotlinOptions {
+ jvmTarget = "11"
+ }
+ externalNativeBuild {
+ cmake {
+ path = file("src/main/cpp/CMakeLists.txt")
+ version = "3.22.1"
+ }
+ }
+ buildFeatures {
+ viewBinding = true
+ }
+}
+
+dependencies {
+
+ implementation(libs.androidx.core.ktx)
+ implementation(libs.androidx.appcompat)
+ implementation(libs.material)
+ implementation(libs.androidx.constraintlayout)
+ testImplementation(libs.junit)
+ androidTestImplementation(libs.androidx.junit)
+ androidTestImplementation(libs.androidx.espresso.core)
+}
+```
+
+Click the Sync Now button at the top. To verify that everything is configured correctly, click Build > Make Project in Android Studio.
+
+## UI
+Now, you'll define the application's User Interface, consisting of two buttons and an ImageView. One button loads the image, the other processes it, and the ImageView displays both the original and processed images.
+1. Open the res/layout/activity_main.xml file, and replace its contents with a layout along the following lines (a minimal sketch that provides the two buttons and the ImageView the activity code expects):
+```xml
+<?xml version="1.0" encoding="utf-8"?>
+<androidx.constraintlayout.widget.ConstraintLayout
+    xmlns:android="http://schemas.android.com/apk/res/android"
+    xmlns:app="http://schemas.android.com/apk/res-auto"
+    android:layout_width="match_parent"
+    android:layout_height="match_parent">
+
+    <Button
+        android:id="@+id/btnLoadImage"
+        android:layout_width="wrap_content"
+        android:layout_height="wrap_content"
+        android:text="Load Image"
+        app:layout_constraintTop_toTopOf="parent"
+        app:layout_constraintStart_toStartOf="parent" />
+
+    <Button
+        android:id="@+id/btnProcessImage"
+        android:layout_width="wrap_content"
+        android:layout_height="wrap_content"
+        android:text="Process Image"
+        android:enabled="false"
+        app:layout_constraintTop_toTopOf="parent"
+        app:layout_constraintStart_toEndOf="@id/btnLoadImage" />
+
+    <ImageView
+        android:id="@+id/imageView"
+        android:layout_width="0dp"
+        android:layout_height="0dp"
+        android:scaleType="fitCenter"
+        app:layout_constraintTop_toBottomOf="@id/btnLoadImage"
+        app:layout_constraintBottom_toBottomOf="parent"
+        app:layout_constraintStart_toStartOf="parent"
+        app:layout_constraintEnd_toEndOf="parent" />
+
+</androidx.constraintlayout.widget.ConstraintLayout>
+```
+
+2. In MainActivity.kt, comment out the following line:
+
+```kotlin
+//binding.sampleText.text = stringFromJNI()
+```
+
+Now you can run the app to view the UI:
+
+
+
+## Processing
+You will now implement the image processing code. First, pick an image to process; here, we use the classic cameraman image. Then, create an assets folder under Arm.Halide.AndroidDemo/app/src/main, and save the image in that folder as img.png.
+
+Now, open MainActivity.kt and modify it as follows:
+```kotlin
+package com.arm.armhalideandroiddemo
+
+import android.graphics.Bitmap
+import android.graphics.BitmapFactory
+import androidx.appcompat.app.AppCompatActivity
+import android.os.Bundle
+import android.widget.Button
+import android.widget.ImageView
+import com.arm.armhalideandroiddemo.databinding.ActivityMainBinding
+import kotlinx.coroutines.CoroutineScope
+import kotlinx.coroutines.Dispatchers
+import kotlinx.coroutines.launch
+import kotlinx.coroutines.withContext
+import java.io.InputStream
+
+class MainActivity : AppCompatActivity() {
+
+ private lateinit var binding: ActivityMainBinding
+
+ private var originalBitmap: Bitmap? = null
+ private lateinit var btnLoadImage: Button
+ private lateinit var btnProcessImage: Button
+ private lateinit var imageView: ImageView
+
+ override fun onCreate(savedInstanceState: Bundle?) {
+ super.onCreate(savedInstanceState)
+
+ binding = ActivityMainBinding.inflate(layoutInflater)
+ setContentView(binding.root)
+
+ btnLoadImage = findViewById(R.id.btnLoadImage)
+ btnProcessImage = findViewById(R.id.btnProcessImage)
+ imageView = findViewById(R.id.imageView)
+
+ // Load the image from assets when the user clicks "Load Image"
+ btnLoadImage.setOnClickListener {
+ originalBitmap = loadImageFromAssets("img.png")
+ originalBitmap?.let {
+ imageView.setImageBitmap(it)
+ // Enable the process button only if the image is loaded.
+ btnProcessImage.isEnabled = true
+ }
+ }
+
+ // Process the image using Halide when the user clicks "Process Image"
+ btnProcessImage.setOnClickListener {
+ originalBitmap?.let { bmp ->
+ // Run the processing on a background thread using coroutines.
+ CoroutineScope(Dispatchers.IO).launch {
+ // Convert Bitmap to grayscale byte array.
+ val grayBytes = extractGrayScaleBytes(bmp)
+
+ // Call your native function via JNI.
+ val processedBytes = blurThresholdImage(grayBytes, bmp.width, bmp.height)
+
+ // Convert processed bytes back to a Bitmap.
+ val processedBitmap = createBitmapFromGrayBytes(processedBytes, bmp.width, bmp.height)
+
+ // Update UI on the main thread.
+ withContext(Dispatchers.Main) {
+ imageView.setImageBitmap(processedBitmap)
+ }
+ }
+ }
+ }
+ }
+
+ // Utility to load an image from the assets folder.
+ private fun loadImageFromAssets(fileName: String): Bitmap? {
+ return try {
+ val assetManager = assets
+ val istr: InputStream = assetManager.open(fileName)
+ BitmapFactory.decodeStream(istr)
+ } catch (e: Exception) {
+ e.printStackTrace()
+ null
+ }
+ }
+
+ // Convert Bitmap to a grayscale ByteArray.
+ private fun extractGrayScaleBytes(bitmap: Bitmap): ByteArray {
+ val width = bitmap.width
+ val height = bitmap.height
+ val pixels = IntArray(width * height)
+ bitmap.getPixels(pixels, 0, width, 0, 0, width, height)
+ val grayBytes = ByteArray(width * height)
+ var index = 0
+ for (pixel in pixels) {
+ val r = (pixel shr 16 and 0xFF)
+ val g = (pixel shr 8 and 0xFF)
+ val b = (pixel and 0xFF)
+ val gray = ((r + g + b) / 3).toByte()
+ grayBytes[index++] = gray
+ }
+ return grayBytes
+ }
+
+ // Convert a grayscale byte array back to a Bitmap.
+ private fun createBitmapFromGrayBytes(grayBytes: ByteArray, width: Int, height: Int): Bitmap {
+ val bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
+ val pixels = IntArray(width * height)
+ var idx = 0
+ for (i in 0 until width * height) {
+ val gray = grayBytes[idx++].toInt() and 0xFF
+ pixels[i] = (0xFF shl 24) or (gray shl 16) or (gray shl 8) or gray
+ }
+ bitmap.setPixels(pixels, 0, width, 0, 0, width, height)
+ return bitmap
+ }
+
+ external fun blurThresholdImage(inputBytes: ByteArray, width: Int, height: Int): ByteArray
+
+ companion object {
+ // Used to load the 'armhalideandroiddemo' library on application startup.
+ init {
+ System.loadLibrary("armhalideandroiddemo")
+ }
+ }
+}
+```
+
+This Kotlin Android application demonstrates integrating a Halide-generated image-processing pipeline within an Android app. The main activity (MainActivity) manages loading and processing an image stored in the application’s asset folder.
+
+When the app launches, the Process Image button is disabled. When a user taps Load Image, the app retrieves img.png from its assets directory and displays it within the ImageView, simultaneously enabling the Process Image button for further interaction.
+
+Upon pressing the Process Image button, the following sequence occurs:
+1. Background Processing. A Kotlin coroutine initiates processing on a background thread, ensuring the application’s UI remains responsive.
+2. Conversion to Grayscale. The loaded bitmap image is converted into a grayscale byte array using a simple RGB-average method, preparing it for processing by the native (JNI) layer.
+3. Native Function Invocation. This grayscale byte array, along with image dimensions, is passed to a native function (blurThresholdImage) defined via JNI. This native function is implemented using the Halide pipeline, performing operations such as blurring and thresholding directly on the image data.
+4. Post-processing. After the native function completes, the resulting processed grayscale byte array is converted back into a Bitmap image.
+5. UI Update. The coroutine then updates the displayed image (on the main UI thread) with this newly processed bitmap, providing the user immediate visual feedback.
+
+The code defines three utility methods:
+1. loadImageFromAssets - retrieves an image from the assets folder and decodes it into a Bitmap.
+2. extractGrayScaleBytes - converts a Bitmap into a grayscale byte array suitable for native processing.
+3. createBitmapFromGrayBytes - converts a grayscale byte array back into a Bitmap for display purposes.
+
+The JNI integration occurs through an external method declaration, blurThresholdImage, loaded via the companion object at app startup. The native library (armhalideandroiddemo) containing this function is compiled separately and integrated into the application (native-lib.cpp).
+
+You will now need to create the blurThresholdImage function. To do so, in Android Studio, place the cursor on the blurThresholdImage declaration, and then click Create JNI function for blurThresholdImage:
+
+
+This will generate a new function in the native-lib.cpp:
+```cpp
+extern "C"
+JNIEXPORT jbyteArray JNICALL
+Java_com_arm_armhalideandroiddemo_MainActivity_blurThresholdImage(JNIEnv *env, jobject thiz,
+ jbyteArray input_bytes,
+ jint width, jint height) {
+ // TODO: implement blurThresholdImage()
+}
+```
+
+Implement this function as follows:
+```cpp
+extern "C"
+JNIEXPORT jbyteArray JNICALL
+Java_com_arm_armhalideandroiddemo_MainActivity_blurThresholdImage(JNIEnv *env, jobject thiz,
+ jbyteArray input_bytes,
+ jint width, jint height) {
+ // Get the input byte array
+ jbyte* inBytes = env->GetByteArrayElements(input_bytes, nullptr);
+ if (inBytes == nullptr) return nullptr;
+
+ // Wrap the grayscale image in a Halide::Runtime::Buffer.
+    Halide::Runtime::Buffer<uint8_t> inputBuffer(reinterpret_cast<uint8_t*>(inBytes), width, height);
+
+ // Prepare an output buffer of the same size.
+    Halide::Runtime::Buffer<uint8_t> outputBuffer(width, height);
+
+    // Call the Halide AOT-compiled pipeline function.
+    blur_threshold(inputBuffer, outputBuffer);
+
+ // Allocate a jbyteArray for the output.
+ jbyteArray outputArray = env->NewByteArray(width * height);
+ // Copy the data from Halide's output buffer to the jbyteArray.
+    env->SetByteArrayRegion(outputArray, 0, width * height, reinterpret_cast<const jbyte*>(outputBuffer.data()));
+
+ env->ReleaseByteArrayElements(input_bytes, inBytes, JNI_ABORT);
+ return outputArray;
+}
+```
+Then add the following includes at the top of native-lib.cpp:
+```cpp
+#include "HalideBuffer.h"
+#include "Halide.h"
+#include "blur_threshold_android.h"
+```
+
+This C++ function acts as a bridge between Java (Kotlin) and native code. Specifically, the function blurThresholdImage is implemented using JNI, allowing it to be directly called from Kotlin. When invoked from Kotlin (through the external fun blurThresholdImage declaration), the function receives a grayscale image represented as a Java byte array (jbyteArray) along with its width and height.
+
+The input Java byte array (input_bytes) is accessed and pinned into native memory via GetByteArrayElements, providing a direct pointer (inBytes) to the grayscale data sent from Kotlin. The raw grayscale bytes are wrapped into a Halide::Runtime::Buffer object (inputBuffer), the buffer structure required by the Halide pipeline. An output buffer (outputBuffer) is created with the same dimensions as the input image to store the result produced by the Halide pipeline.
+
+The native function then invokes the Halide-generated AOT function blur_threshold, passing in both the input and output buffers. After processing, a new Java byte array (outputArray) is allocated to hold the processed grayscale data, and the data from the Halide output buffer is copied into it using SetByteArrayRegion. The native input buffer (inBytes) is explicitly released using ReleaseByteArrayElements with JNI_ABORT, as no changes were made to the input array. Finally, the processed byte array (outputArray) is returned to Kotlin.
+
+Through this JNI bridge, Kotlin can invoke high-performance native code. You can now re-run the application. Click the Load Image button, and then Process Image. You will see the following results:
+
+
+
+
+## Summary
+In this lesson, we’ve successfully integrated a Halide image-processing pipeline into an Android application using Kotlin. We started by setting up an Android project configured for native development with the Android NDK, employing Kotlin as the primary language. We then integrated Halide-generated static libraries and demonstrated their usage through Java Native Interface (JNI), bridging Kotlin and native code. This equips developers with the skills needed to harness Halide’s capabilities for building sophisticated, performant mobile applications on Android.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/aot-and-cross-compilation.md b/content/learning-paths/mobile-graphics-and-gaming/android_halide/aot-and-cross-compilation.md
new file mode 100644
index 000000000..52a4a68e7
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/android_halide/aot-and-cross-compilation.md
@@ -0,0 +1,142 @@
+---
+# User change
+title: "Ahead-of-time and cross-compilation"
+
+weight: 5
+
+layout: "learningpathall"
+---
+
+## Ahead-of-time and cross-compilation
+One of Halide’s standout features is the ability to compile image processing pipelines ahead-of-time (AOT), enabling developers to generate optimized binary code on their host machines rather than compiling directly on target devices. This AOT compilation process allows developers to create highly efficient libraries that run effectively across diverse hardware without incurring the runtime overhead associated with just-in-time (JIT) compilation.
+
+Halide also supports robust cross-compilation capabilities. Cross-compilation means using the host version of Halide (typically running on a desktop Linux or macOS system) to target different architectures, such as ARM for Android devices. Developers can thus optimize Halide pipelines on their host machine, produce libraries specifically optimized for Android, and integrate them seamlessly into Android applications. The generated pipeline code includes essential optimizations and can embed minimal runtime support, further reducing workload on the target device and ensuring responsiveness and efficiency.
+
+## Objective
+In this section, we leverage the host version of Halide to perform AOT compilation of an image processing pipeline via cross-compilation. The resulting pipeline library is specifically tailored to Android devices (targeting, for instance, arm64-v8a ABI), while the compilation itself occurs entirely on the host system. This approach significantly accelerates development by eliminating the need to build Halide or perform JIT compilation on Android devices. It also guarantees that the resulting binaries are optimized for the intended hardware, streamlining the deployment of high-performance image processing applications on mobile platforms.
+
+## Prepare Pipeline for Android
+The procedure implemented in the following code demonstrates how Halide’s AOT compilation and cross-compilation features can be utilized to create an optimized image processing pipeline for Android. We will run Halide on our host machine (in this example, macOS) to generate a static library containing the pipeline function, which will later be invoked from an Android device. Below is a step-by-step explanation of this process.
+
+Create a new file named blur-android.cpp with the following contents:
+
+```cpp
+#include "Halide.h"
+#include <iostream>
+#include <string>
+using namespace Halide;
+
+int main(int argc, char** argv) {
+ if (argc < 2) {
+        std::cerr << "Usage: " << argv[0] << " <output_basename>\n";
+ return 1;
+ }
+
+ std::string output_basename = argv[1];
+
+ // Configure Halide Target for Android
+ Halide::Target target;
+ target.os = Halide::Target::OS::Android;
+ target.arch = Halide::Target::Arch::ARM;
+ target.bits = 64;
+ target.set_feature(Target::NoRuntime, false);
+
+ // --- Define the pipeline ---
+ // Define variables
+ Var x("x"), y("y");
+
+ // Define input parameter
+ ImageParam input(UInt(8), 2, "input");
+
+ // Create a clamped function that limits the access to within the image bounds
+ Func clamped("clamped");
+ clamped(x, y) = input(clamp(x, 0, input.width()-1),
+ clamp(y, 0, input.height()-1));
+
+ // Now use the clamped function in processing
+ RDom r(0, 3, 0, 3);
+ Func blur("blur");
+
+ // Initialize blur accumulation
+    blur(x, y) = cast<uint16_t>(0);
+    blur(x, y) += cast<uint16_t>(clamped(x + r.x - 1, y + r.y - 1));
+
+ // Then continue with pipeline
+ Func blur_div("blur_div");
+    blur_div(x, y) = cast<uint8_t>(blur(x, y) / 9);
+
+ // Thresholding
+ Func thresholded("thresholded");
+    Expr t = cast<uint8_t>(128);
+    thresholded(x, y) = select(blur_div(x, y) > t, cast<uint8_t>(255), cast<uint8_t>(0));
+
+ // Simple scheduling
+ blur_div.compute_root();
+ thresholded.compute_root();
+
+ // --- AOT compile to a file ---
+ thresholded.compile_to_static_library(
+ output_basename, // base filename
+ { input }, // list of inputs
+ "blur_threshold", // name of the generated function
+ target
+ );
+
+ return 0;
+}
+```
+
+The program takes at least one command-line argument, the output base name used to generate the files (e.g., “blur_threshold_android”). Here, the target architecture is explicitly set within the code to Android ARM64:
+
+```cpp
+// Configure Halide Target for Android
+Halide::Target target;
+target.os = Halide::Target::OS::Android;
+target.arch = Halide::Target::Arch::ARM;
+target.bits = 64;
+target.set_feature(Target::NoRuntime, false);
+```
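+
+Equivalently, the same target can be constructed from a Halide target string, which is sometimes more convenient:
+
+```cpp
+// "arm-64-android" parses to arch = ARM, bits = 64, os = Android.
+Halide::Target target("arm-64-android");
+```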
+
+We declare spatial variables (x, y) and an ImageParam named “input” representing the input image data. We use boundary clamping (clamp) to safely handle edge pixels. Then, we apply a 3x3 blur with a reduction domain (RDom). The accumulated sum is divided by 9 (the number of pixels in the neighborhood), producing an average blurred image. Lastly, thresholding is applied, producing a binary output: pixels above a certain brightness threshold (128) become white (255), while others become black (0).
+
+Simple scheduling directives (compute_root) instruct Halide to compute intermediate functions at the pipeline’s root, simplifying debugging and potentially enhancing runtime efficiency.
+
+We invoke Halide’s AOT compilation function compile_to_static_library, which generates a static library (.a) containing the optimized pipeline and a corresponding header file (.h).
+
+```cpp
+thresholded.compile_to_static_library(
+ output_basename, // base filename for output files (e.g., "blur_threshold_android")
+ { input }, // list of input parameters to the pipeline
+ "blur_threshold", // the generated function name
+ target // our target configuration for Android
+);
+```
+
+This will produce:
+* A static library (blur_threshold_android.a) containing the compiled pipeline.
+* A header file (blur_threshold_android.h) declaring the pipeline function for use in other C++/JNI code.
+
+These generated files are then ready to integrate directly into an Android project via JNI, allowing efficient execution of the optimized pipeline on Android devices. The integration process is covered in the next section.
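+
+For reference, the generated header declares the pipeline as a plain C function operating on halide_buffer_t pointers. The exact contents are produced by Halide, but the declaration looks roughly like this (a sketch, not the verbatim generated file):
+
+```cpp
+// Sketch of the declaration emitted into blur_threshold_android.h.
+#include "HalideRuntime.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+int blur_threshold(struct halide_buffer_t *input, struct halide_buffer_t *output);
+
+#ifdef __cplusplus
+}
+#endif
+```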
+
+## Compilation instructions
+To compile the pipeline-generation program on your host system, use the following commands (replace /path/to/halide with your Halide installation directory):
+```console
+export DYLD_LIBRARY_PATH=/path/to/halide/lib
+g++ -std=c++17 blur-android.cpp -o blur_android \
+    -I/path/to/halide/include -L/path/to/halide/lib -lHalide \
+    -lpthread -ldl \
+    -Wl,-rpath,/path/to/halide/lib
+```
+
+Then execute the binary:
+```console
+./blur_android blur_threshold_android
+```
+
+This will produce two files:
+* blur_threshold_android.a: The static library containing your Halide pipeline.
+* blur_threshold_android.h: The header file needed to invoke the generated pipeline.
+
+We will integrate these files into our Android project in the following section.
+
+## Summary
+In this section, we’ve explored Halide’s powerful ahead-of-time (AOT) and cross-compilation capabilities, preparing an optimized image processing pipeline tailored specifically for Android devices. By using the host-based Halide compiler, we’ve generated a static library optimized for ARM64 Android architecture, incorporating safe boundary conditions, neighborhood-based blurring, and thresholding operations. This streamlined process allows seamless integration of highly optimized native code into Android applications, ensuring both development efficiency and runtime performance on mobile platforms.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/fusion.md b/content/learning-paths/mobile-graphics-and-gaming/android_halide/fusion.md
new file mode 100644
index 000000000..cb75b70da
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/android_halide/fusion.md
@@ -0,0 +1,199 @@
+---
+# User change
+title: "Demonstrating Operation Fusion"
+
+weight: 4
+
+layout: "learningpathall"
+---
+
+## Objective
+In this section, you’ll learn about operation fusion, a powerful performance optimization technique offered by Halide. We’ll explore how combining multiple processing stages into a single fused operation reduces memory usage, decreases scheduling overhead, and significantly enhances performance. You’ll see when and why to apply operation fusion, analyze the performance of a baseline pipeline, identify bottlenecks, and then leverage Halide’s scheduling constructs like compute_at, store_at, and fuse to optimize and accelerate your image-processing applications.
+
+## What is Operation Fusion?
+Operation fusion (also known as operator fusion or kernel fusion) is a technique used in high-performance computing, especially in image and signal processing pipelines, where multiple computational steps (operations) are combined into a single processing stage. Instead of computing and storing intermediate results separately, fused operations perform calculations in one continuous pass, reducing redundant memory operations and improving efficiency.
+
+## How Fusion Reduces Memory Bandwidth and Scheduling Overhead
+Every individual stage in a processing pipeline typically reads input data, computes intermediate results, and writes those results back to memory, and then the next stage reads this intermediate data again. This repeated read-write cycle introduces significant overhead, particularly in memory-intensive applications like image processing. Operation fusion dramatically reduces this overhead (as the sketch after this list illustrates) by:
+1. Reducing memory accesses. Intermediate results stay in CPU registers or caches rather than being repeatedly written to and read from main memory.
+2. Improving cache utilization. Data is accessed in a contiguous manner, improving CPU cache efficiency.
+3. Reducing scheduling overhead. By executing multiple operations in a single pass, scheduling complexity and overhead are minimized.
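+
+The following plain C++ sketch (hand-written loops, not Halide) illustrates the difference between the two execution strategies; the stage bodies and the threshold value are illustrative only:
+
+```cpp
+#include <cstdint>
+#include <vector>
+
+// Unfused: the intermediate image makes a full round trip through memory.
+void unfused(const std::vector<uint8_t>& in, std::vector<uint8_t>& out, int n) {
+    std::vector<uint8_t> blurred(n);                                  // intermediate buffer
+    for (int i = 0; i < n; i++) blurred[i] = in[i] / 2;               // stage 1: write all
+    for (int i = 0; i < n; i++) out[i] = blurred[i] > 64 ? 255 : 0;   // stage 2: read all
+}
+
+// Fused: the intermediate value stays in a register and is consumed immediately.
+void fused(const std::vector<uint8_t>& in, std::vector<uint8_t>& out, int n) {
+    for (int i = 0; i < n; i++) {
+        uint8_t blurred = in[i] / 2;       // stage 1
+        out[i] = blurred > 64 ? 255 : 0;   // stage 2, no intermediate buffer
+    }
+}
+```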
+
+## When to Use Operation Fusion
+Operation fusion is most beneficial in scenarios that involve multiple sequential operations or transformations performed on large datasets, particularly when these intermediate results are large or costly to recompute. Typical situations include:
+* Image filtering pipelines (blur, sharpen, threshold sequences)
+* Color transformations followed by thresholding or other pixel-wise operations
+* Complex signal processing operations involving repeated data transformations
+
+Operation fusion is less beneficial (or even detrimental) if intermediate results are frequently reused across multiple subsequent stages or if fusing operations complicates parallelism or vectorization opportunities.
+
+## Typical Scenarios and Performance-Critical Pipelines
+Some performance-critical pipelines where fusion is especially beneficial include:
+* Real-time video processing (e.g., streaming transformations)
+* Computational photography applications (HDR blending, tone mapping)
+* Computer vision tasks (feature extraction followed by thresholding)
+
+## Baseline Pipeline Analysis
+To demonstrate the benefit of operation fusion, we’ll revisit the Gaussian blur and threshold pipeline from the previous lesson and then apply fusion.
+
+### Review of the Non-Fused Pipeline
+Recall the pipeline we created previously:
+1. Gaussian Blur. Smoothes the image using a convolution kernel.
+2. Thresholding. Converts the blurred image to a binary image based on pixel intensities.
+
+In the non-fused version, these stages are separately realized, meaning intermediate blurred results are computed and stored in memory before thresholding.
+
+```cpp
+Halide::Func blur("blur");
+// blur definition here
+Halide::Buffer<uint8_t> blurBuffer = blur.realize({ width, height });
+
+Halide::Func thresholded("thresholded");
+thresholded(x, y) = Halide::cast<uint8_t>(Halide::select(blurBuffer(x, y) > 128, 255, 0));
+
+Halide::Buffer<uint8_t> outputBuffer = thresholded.realize({ width, height });
+```
+
+### Performance Profiling to Identify Bottlenecks
+The primary bottleneck in the non-fused pipeline is memory access:
+1. Intermediate results (blurBuffer) are written to and read from memory.
+2. This results in additional latency, reduced cache efficiency, and unnecessary memory bandwidth usage.
+
+Profiling this pipeline with tools like Halide’s built-in profiler typically shows memory bandwidth as a major limiting factor.
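+
+For a JIT-compiled pipeline such as this one, a simple way to enable Halide’s built-in profiler is to request the profile feature through the JIT target environment variable before running the binary; per-Func time statistics are printed when the process exits. For example, using the camera-capture binary from the previous section:
+
+```console
+HL_JIT_TARGET=host-profile ./camera-capture
+```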
+
+## Applying Operation Fusion in Halide
+To apply operation fusion, Halide provides powerful scheduling constructs that allow you to precisely control when and where operations are computed and stored.
+
+### Scheduling Techniques
+The three primary Halide scheduling methods to enable fusion are:
+1. compute_at - compute the values of one Func at the iteration point of another.
+2. store_at - store intermediate results at a particular loop iteration or stage to minimize memory footprint.
+3. fuse - merge loop variables of two dimensions into one, improving loop efficiency.
+
+### Using constructs like compute_at, store_at, and fuse
+Let’s fuse the blur and thresholding stages using compute_at to ensure both operations execute together, eliminating intermediate storage. To do so, create a new file camera-capture-fusion.cpp with the following contents:
+
+```cpp
+#include "Halide.h"
+#include <opencv2/opencv.hpp>
+#include <iostream>
+
+using namespace cv;
+using namespace std;
+
+// This function clamps the coordinate (coord) within [0, maxCoord - 1].
+static inline Halide::Expr clampCoord(Halide::Expr coord, int maxCoord) {
+ return Halide::clamp(coord, 0, maxCoord - 1);
+}
+
+int main() {
+ // Open the default camera with OpenCV.
+ VideoCapture cap(0);
+ if (!cap.isOpened()) {
+ cerr << "Error: Unable to open camera." << endl;
+ return -1;
+ }
+
+ while (true) {
+ // Capture a frame from the camera.
+ Mat frame;
+ cap >> frame;
+ if (frame.empty()) {
+ cerr << "Error: Received empty frame." << endl;
+ break;
+ }
+
+ // Convert the frame to grayscale.
+ Mat gray;
+ cvtColor(frame, gray, COLOR_BGR2GRAY);
+
+ // Ensure the grayscale image is continuous in memory.
+ if (!gray.isContinuous()) {
+ gray = gray.clone();
+ }
+
+ int width = gray.cols;
+ int height = gray.rows;
+
+ // Create a simple 2D Halide buffer from the grayscale Mat.
+        Halide::Buffer<uint8_t> inputBuffer(gray.data, width, height);
+
+ // Create a Halide ImageParam for a 2D UInt(8) image.
+ Halide::ImageParam input(Halide::UInt(8), 2, "input");
+ input.set(inputBuffer);
+
+ // Define variables for x (width) and y (height).
+ Halide::Var x("x"), y("y");
+
+ // Define a function that applies a 3x3 Gaussian blur.
+ Halide::Func blur("blur");
+ Halide::RDom r(0, 3, 0, 3);
+
+ // Kernel layout: [1 2 1; 2 4 2; 1 2 1], sum = 16.
+ Halide::Expr weight = Halide::select(
+ (r.x == 1 && r.y == 1), 4,
+ (r.x == 1 || r.y == 1), 2,
+ 1
+ );
+
+ Halide::Expr offsetX = x + (r.x - 1);
+ Halide::Expr offsetY = y + (r.y - 1);
+
+ // Manually clamp offsets to avoid out-of-bounds.
+ Halide::Expr clampedX = clampCoord(offsetX, width);
+ Halide::Expr clampedY = clampCoord(offsetY, height);
+
+ // Accumulate weighted sum in 32-bit int before normalization.
+        Halide::Expr val = Halide::cast<int32_t>(input(clampedX, clampedY)) * weight;
+
+        blur(x, y) = Halide::cast<uint8_t>(Halide::sum(val) / 16);
+
+ // Add a thresholding stage on top of the blurred result.
+ // If blur(x,y) > 128 => 255, else 0
+ Halide::Func thresholded("thresholded");
+        thresholded(x, y) = Halide::cast<uint8_t>(
+ Halide::select(blur(x, y) > 128, 255, 0)
+ );
+
+ // Apply fusion scheduling
+ blur.compute_at(thresholded, x);
+ thresholded.parallel(y);
+
+ // Realize the thresholded function. Wrap in try-catch for error reporting.
+        Halide::Buffer<uint8_t> outputBuffer;
+ try {
+ outputBuffer = thresholded.realize({ width, height });
+ } catch (const std::exception &e) {
+ cerr << "Halide pipeline error: " << e.what() << endl;
+ break;
+ }
+
+ // Wrap the Halide output in an OpenCV Mat and display.
+ Mat blurredThresholded(height, width, CV_8UC1, outputBuffer.data());
+
+ imshow("Processed image", blurredThresholded);
+
+ // Exit the loop if a key is pressed.
+ if (waitKey(30) >= 0) {
+ break;
+ }
+ }
+
+ cap.release();
+ destroyAllWindows();
+ return 0;
+}
+```
+
+The key change is the call to blur.compute_at(thresholded, x). This instructs Halide to compute the blur function at each iteration of the thresholding function’s inner loop (x), so each blurred value is computed just before it is thresholded, avoiding any intermediate buffer storage.
+
+In addition, thresholded.parallel(y) parallelizes the outer loop across multiple CPU cores, further accelerating execution.
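+
+This example only needs compute_at, but the other two constructs introduced above combine with it naturally. The following is a sketch of how they could be applied to this pipeline (an alternative schedule, not required for the result shown in this lesson):
+
+```cpp
+// store_at: allocate blur's storage once per row of thresholded while
+// still computing values per-x; this is the classic sliding-window idiom.
+blur.store_at(thresholded, y)
+    .compute_at(thresholded, x);
+
+// fuse: merge thresholded's x and y loops into a single loop variable,
+// then parallelize across the fused range instead of across y alone.
+Halide::Var xy("xy");
+thresholded.fuse(x, y, xy).parallel(xy);
+```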
+
+By using this fused schedule, we achieve:
+* Reduced memory usage (no intermediate storage).
+* Improved cache efficiency.
+* Reduced overall execution time.
+
+## Summary
+In this lesson, we learned about operation fusion in Halide, a powerful technique to reduce memory bandwidth and improve computational efficiency. We explored why fusion matters, identified scenarios where fusion is most effective, and demonstrated how Halide’s scheduling constructs (compute_at, store_at, fuse) enable you to apply fusion easily and effectively. By fusing the Gaussian blur and thresholding stages, we improved the performance of our real-time image processing pipeline.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/intro.md b/content/learning-paths/mobile-graphics-and-gaming/android_halide/intro.md
new file mode 100644
index 000000000..b37677dc1
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/android_halide/intro.md
@@ -0,0 +1,165 @@
+---
+# User change
+title: "Introduction, Background, and Installation"
+
+weight: 2
+
+layout: "learningpathall"
+---
+
+## Introduction
+Halide is a powerful, open-source programming language specifically designed to simplify and optimize high-performance image and signal processing pipelines. Initially developed by researchers at MIT and Adobe in 2012, Halide addresses a critical challenge in computational imaging: efficiently mapping image-processing algorithms onto diverse hardware architectures without extensive manual tuning. It accomplishes this by clearly separating the description of an algorithm (defining what computations are performed) from its schedule (detailing how and where those computations execute). This design enables rapid experimentation and effective optimization for various processing platforms, including CPUs, GPUs, and mobile hardware.
+
+A key advantage of Halide lies in its innovative programming model. By clearly distinguishing between algorithmic logic and scheduling decisions—such as parallelism, vectorization, memory management, and hardware-specific optimizations—developers can first focus on ensuring the correctness of their algorithms. Performance tuning can then be handled independently, significantly accelerating development cycles. This approach often yields performance that matches or even surpasses manually optimized code. As a result, Halide has seen widespread adoption across industry and academia, powering image processing systems at technology giants such as Google, Adobe, and Facebook, and enabling advanced computational photography features used by millions daily.
+
+In this learning path, you will explore Halide’s foundational concepts, set up your development environment, and create your first functional Halide application. By the conclusion, you will understand what makes Halide uniquely suited to efficient image processing and be ready to build your own optimized pipelines.
+
+The companion code for this Learning Path is available in the [Hello World](https://github.com/dawidborycki/Arm.Halide.Hello-World.git) and [Android Demo](https://github.com/dawidborycki/Arm.Halide.AndroidDemo.git) repositories.
+
+## Key Concepts in Halide
+### Separation of Algorithm and Schedule
+At the core of Halide’s design philosophy is the principle of clearly separating algorithms from schedules. Traditional image-processing programming tightly couples algorithmic logic with execution strategy, complicating optimization and portability. In contrast, Halide explicitly distinguishes these two components:
+* Algorithm. Defines what computations are performed—for example, image filters, pixel transformations, or other mathematical operations on image data.
+* Schedule. Specifies how and where these computations are executed, addressing critical details such as parallel execution, memory usage, caching strategies, and hardware-specific optimizations.
+
+This separation allows developers to rapidly experiment and optimize their code for different hardware architectures or performance requirements without altering the core algorithmic logic.
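+
+A minimal sketch makes this split concrete; the function and schedule below are illustrative, not taken from a specific application:
+
+```cpp
+#include "Halide.h"
+using namespace Halide;
+
+int main() {
+    Var x("x"), y("y");
+
+    // Algorithm: *what* is computed - a simple per-pixel expression.
+    Func f("f");
+    f(x, y) = x + y;
+
+    // Schedule: *how* it is computed - vectorize the inner loop and
+    // parallelize the outer one. The algorithm above is untouched.
+    f.vectorize(x, 8).parallel(y);
+
+    Buffer<int> out = f.realize({256, 256});
+    return 0;
+}
+```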
+
+### Functions, Vars, and Pipelines
+Halide introduces several essential concepts to simplify image-processing pipelines:
+* Functions (Func). Represent discrete computational steps or operations applied across pixels in an image. By defining computations declaratively, functions simplify the description of complex image-processing algorithms.
+* Variables (Var). Symbolize spatial coordinates or dimensions of the data (e.g., horizontal position x, vertical position y, and channel c). Vars serve as symbolic indices to define how computations apply across image data.
+* Pipelines. Comprise interconnected Halide functions that collectively form a complete image-processing workflow. Pipelines clearly define data dependencies, facilitating structured and modular image-processing tasks.
+
+### Scheduling Strategies (Parallelism, Vectorization, Tiling)
+Halide offers several powerful scheduling strategies designed for maximum performance:
+* Parallelism. Executes computations concurrently across multiple CPU cores, significantly reducing execution time for large datasets.
+* Vectorization. Enables simultaneous processing of multiple data elements using SIMD (Single Instruction, Multiple Data) instructions available on CPUs and GPUs, greatly enhancing performance.
+* Tiling. Divides computations into smaller blocks (tiles) optimized for cache efficiency, thus improving memory locality and reducing overhead due to memory transfers.
+
+By combining these scheduling techniques, developers can achieve optimal performance tailored specifically to their target hardware architecture.
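+
+As a brief sketch of how these strategies compose on a single Func (the tile and vector sizes here are illustrative, not tuned values):
+
+```cpp
+#include "Halide.h"
+using namespace Halide;
+
+void schedule_example() {
+    Var x("x"), y("y"), xo("xo"), yo("yo"), xi("xi"), yi("yi");
+    Func g("g");
+    g(x, y) = x * y;
+
+    // Tile into 64x64 blocks, vectorize within each tile,
+    // and parallelize across rows of tiles.
+    g.tile(x, y, xo, yo, xi, yi, 64, 64)
+     .vectorize(xi, 8)
+     .parallel(yo);
+
+    g.realize({512, 512});
+}
+```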
+
+## System Requirements and Environment Setup
+To start developing with Halide, your system must meet several requirements and dependencies.
+
+### Installation Options
+Halide can be set up using one of two main approaches:
+* Installing pre-built binaries - pre-built binaries are convenient, quick to install, and suitable for beginners or standard platforms (Windows, Linux, macOS).
+* Building from source - building Halide from source offers greater flexibility, allowing optimization for your specific hardware or operating system configuration.
+
+Here, we’ll use pre-built binaries:
+1. Visit the official Halide releases [page](https://github.com/halide/Halide/releases). As of this writing, the latest Halide version is v19.0.0.
+2. Download and unzip the binaries to a convenient location (e.g., /usr/local/halide on Linux/macOS or C:\halide on Windows).
+3. Optionally set environment variables to simplify further usage:
+```console
+export HALIDE_DIR=/path/to/halide
+export PATH=$HALIDE_DIR/bin:$PATH
+```
+
+In addition, the lessons in this path depend on the following software packages:
+1. LLVM (needed only if you build Halide from source; the pre-built binaries bundle it):
+* Linux (Ubuntu):
+```console
+sudo apt-get install llvm-15-dev libclang-15-dev clang-15
+```
+* macOS (Homebrew):
+```console
+brew install llvm
+```
+2. OpenCV (for image handling in later lessons):
+* Linux (Ubuntu):
+```console
+sudo apt-get install libopencv-dev pkg-config
+```
+* macOS (Homebrew):
+```console
+brew install opencv pkg-config
+```
+
+## Your First Halide Program
+Now you’re ready to build your first Halide-based application. Save the following as hello-world.cpp:
+```cpp
+#include "Halide.h"
+#include <opencv2/opencv.hpp>
+#include <iostream>
+#include <string>
+
+using namespace Halide;
+using namespace cv;
+
+int main() {
+ // Static path for the input image.
+ std::string imagePath = "img.png";
+
+ // Load the input image using OpenCV (BGR by default).
+ Mat input = imread(imagePath, IMREAD_COLOR);
+ if (input.empty()) {
+ std::cerr << "Error: Unable to load image from " << imagePath << std::endl;
+ return -1;
+ }
+
+ // Convert from BGR to RGB (Halide expects RGB order).
+ cvtColor(input, input, COLOR_BGR2RGB);
+
+ // Wrap the OpenCV Mat data in a Halide::Buffer.
+ // Dimensions: (width, height, channels)
+    Buffer<uint8_t> inputBuffer(input.data, input.cols, input.rows, input.channels());
+
+ // Create an ImageParam for symbolic indexing.
+ ImageParam inputImage(UInt(8), 3);
+ inputImage.set(inputBuffer);
+
+ // Define a Halide pipeline that inverts the image.
+ Var x("x"), y("y"), c("c");
+ Func invert("invert");
+ invert(x, y, c) = 255 - inputImage(x, y, c);
+
+    // Schedule the pipeline so that the channel dimension is the innermost loop,
+    // traversing the data in the same interleaved order OpenCV stores it.
+ invert.reorder(c, x, y);
+
+ // Realize the output buffer with the same dimensions as the input.
+    Buffer<uint8_t> outputBuffer = invert.realize({input.cols, input.rows, input.channels()});
+
+ // Wrap the Halide output buffer directly into an OpenCV Mat header.
+ // This does not copy data; it creates a header that refers to the same memory.
+ Mat output(input.rows, input.cols, CV_8UC3, outputBuffer.data());
+
+ // Convert RGB back to BGR for proper display in OpenCV.
+ cvtColor(output, output, COLOR_RGB2BGR);
+
+ // Display the input and processed image.
+ imshow("Original Image", input);
+ imshow("Inverted Image", output);
+ waitKey(0); // Wait for a key press before closing the window.
+
+ return 0;
+}
+```
+
+This program demonstrates how to combine Halide’s image processing capabilities with OpenCV’s image I/O and display functionality. It begins by loading an image from disk using OpenCV, specifically reading from a static file named img.png (here we use a Cameraman image). Since OpenCV loads images in BGR format by default, the code immediately converts the image to RGB format so that it is compatible with Halide’s expectations.
+
+Once the image is loaded and converted, the program wraps the raw image data into a Halide buffer, which captures the image’s dimensions (width, height, and the number of color channels). An ImageParam is then created, allowing the Halide pipeline to use symbolic indexing with variables. The core of the pipeline is defined by a function called invert, which computes a new pixel value by subtracting the original value from 255, effectively inverting the image’s colors. The scheduling directive invert.reorder(c, x, y) makes the channel dimension the innermost loop, so the pipeline walks the data in the same interleaved order in which OpenCV stores it; and because inversion is a purely per-element operation, the realized output can be handed back to OpenCV without any layout conversion.
+
+After the pipeline processes the image, the output is realized into another Halide buffer. Instead of copying pixel data back and forth, the program directly wraps this buffer into an OpenCV Mat header. This efficient step avoids unnecessary data duplication. Finally, because OpenCV displays images in BGR format, the code converts the processed image back from RGB to BGR. The program then displays both the original and the inverted images in separate windows, waiting for a key press before exiting. This workflow showcases a streamlined integration between Halide for high-performance image processing and OpenCV for handling input and output operations.
+
+## Compilation Instructions
+Compile the program as follows (replace /path/to/halide accordingly):
+```console
+export DYLD_LIBRARY_PATH=/path/to/halide/lib
+g++ -std=c++17 hello-world.cpp -o hello-world \
+ -I/path/to/halide/include -L/path/to/halide/lib -lHalide \
+ $(pkg-config --cflags --libs opencv4) -lpthread -ldl \
+ -Wl,-rpath,/path/to/halide/lib
+```
+
+Run the executable:
+```console
+./hello-world
+```
+
+You will see two windows displaying the original and inverted images:
+
+
+
+## Summary
+In this lesson, you’ve learned Halide’s foundational concepts, explored the benefits of separating algorithms and schedules, set up your development environment, and created your first functional Halide application integrated with OpenCV.
+
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_halide/processing-workflow.md b/content/learning-paths/mobile-graphics-and-gaming/android_halide/processing-workflow.md
new file mode 100644
index 000000000..8a25cb62b
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/android_halide/processing-workflow.md
@@ -0,0 +1,170 @@
+---
+# User change
+title: "Building a Simple Camera Image Processing Workflow"
+
+weight: 3
+
+layout: "learningpathall"
+---
+
+## Objective
+In this section, we’ll build a real-time camera processing pipeline using Halide. We’ll start by capturing video frames from a webcam using OpenCV, then implement a Gaussian blur to smooth the captured images, followed by thresholding to create a clear binary output highlighting the most prominent image features. After establishing this pipeline, we’ll explore how to optimize performance further by applying Halide’s tiling strategy, a technique for enhancing cache efficiency and execution speed, particularly beneficial for high-resolution or real-time applications.
+
+## Gaussian Blur And Thresholding
+Create a new camera-capture.cpp file and modify it as follows:
+```cpp
+#include "Halide.h"
+#include <opencv2/opencv.hpp>
+#include <iostream>
+
+using namespace cv;
+using namespace std;
+
+// This function clamps the coordinate (coord) within [0, maxCoord - 1].
+static inline Halide::Expr clampCoord(Halide::Expr coord, int maxCoord) {
+ return Halide::clamp(coord, 0, maxCoord - 1);
+}
+
+int main() {
+ // Open the default camera with OpenCV.
+ VideoCapture cap(0);
+ if (!cap.isOpened()) {
+ cerr << "Error: Unable to open camera." << endl;
+ return -1;
+ }
+
+ while (true) {
+ // Capture a frame from the camera.
+ Mat frame;
+ cap >> frame;
+ if (frame.empty()) {
+ cerr << "Error: Received empty frame." << endl;
+ break;
+ }
+
+ // Convert the frame to grayscale.
+ Mat gray;
+ cvtColor(frame, gray, COLOR_BGR2GRAY);
+
+ // Ensure the grayscale image is continuous in memory.
+ if (!gray.isContinuous()) {
+ gray = gray.clone();
+ }
+
+ int width = gray.cols;
+ int height = gray.rows;
+
+ // Create a simple 2D Halide buffer from the grayscale Mat.
+        Halide::Buffer<uint8_t> inputBuffer(gray.data, width, height);
+
+ // Create a Halide ImageParam for a 2D UInt(8) image.
+ Halide::ImageParam input(Halide::UInt(8), 2, "input");
+ input.set(inputBuffer);
+
+ // Define variables for x (width) and y (height).
+ Halide::Var x("x"), y("y");
+
+ // Define a function that applies a 3x3 Gaussian blur.
+ Halide::Func blur("blur");
+ Halide::RDom r(0, 3, 0, 3);
+
+ // Kernel layout: [1 2 1; 2 4 2; 1 2 1], sum = 16.
+ Halide::Expr weight = Halide::select(
+ (r.x == 1 && r.y == 1), 4,
+ (r.x == 1 || r.y == 1), 2,
+ 1
+ );
+
+ Halide::Expr offsetX = x + (r.x - 1);
+ Halide::Expr offsetY = y + (r.y - 1);
+
+ // Manually clamp offsets to avoid out-of-bounds.
+ Halide::Expr clampedX = clampCoord(offsetX, width);
+ Halide::Expr clampedY = clampCoord(offsetY, height);
+
+ // Accumulate weighted sum in 32-bit int before normalization.
+        Halide::Expr val = Halide::cast<int32_t>(input(clampedX, clampedY)) * weight;
+
+        blur(x, y) = Halide::cast<uint8_t>(Halide::sum(val) / 16);
+
+ // Add a thresholding stage on top of the blurred result.
+ // If blur(x,y) > 128 => 255, else 0
+ Halide::Func thresholded("thresholded");
+        thresholded(x, y) = Halide::cast<uint8_t>(
+ Halide::select(blur(x, y) > 128, 255, 0)
+ );
+
+ // Realize the thresholded function. Wrap in try-catch for error reporting.
+        Halide::Buffer<uint8_t> outputBuffer;
+ try {
+ outputBuffer = thresholded.realize({ width, height });
+ } catch (const std::exception &e) {
+ cerr << "Halide pipeline error: " << e.what() << endl;
+ break;
+ }
+
+ // Wrap the Halide output in an OpenCV Mat and display.
+ Mat blurredThresholded(height, width, CV_8UC1, outputBuffer.data());
+
+ imshow("Processed image", blurredThresholded);
+
+ // Exit the loop if a key is pressed.
+ if (waitKey(30) >= 0) {
+ break;
+ }
+ }
+
+ cap.release();
+ destroyAllWindows();
+ return 0;
+}
+```
+
+This code demonstrates a simple real-time image processing pipeline using Halide and OpenCV. Initially, it opens the computer’s default camera to continuously capture video frames. Each captured frame, originally in color, is converted into a single-channel grayscale image using OpenCV.
+
+The grayscale image data is then passed to Halide via a buffer to perform computations. Within Halide, the program implements a Gaussian blur using a 3x3 convolution kernel with weights specifically chosen to smooth the image (weights: [1 2 1; 2 4 2; 1 2 1]). To safely handle pixels near image borders, the coordinates are manually clamped, ensuring all pixel accesses remain valid within the image dimensions.
+
+After the Gaussian blur stage, a thresholding operation is applied. This step converts the blurred grayscale image into a binary image, assigning a value of 255 to pixels with intensity greater than 128 and 0 otherwise, thus highlighting prominent features against the background.
+
+Finally, the processed image is returned from Halide to an OpenCV matrix and displayed in real-time. The loop continues until a key is pressed, providing a smooth, interactive demonstration of Halide’s ability to accelerate and streamline real-time image processing tasks.
+
+## Compilation instructions
+Compile the program as follows (replace /path/to/halide accordingly):
+```console
+g++ -std=c++17 camera-capture.cpp -o camera-capture \
+ -I/path/to/halide/include -L/path/to/halide/lib -lHalide \
+ $(pkg-config --cflags --libs opencv4) -lpthread -ldl \
+ -Wl,-rpath,/path/to/halide/lib
+```
+
+Run the executable:
+```console
+./camera-capture
+```
+
+The output should look as in the figure below:
+
+
+## Tiling
+Tiling is a powerful scheduling optimization provided by Halide, allowing image computations to be executed efficiently by dividing the workload into smaller, cache-friendly blocks called tiles. By processing these smaller regions individually, we significantly improve data locality, reduce memory bandwidth usage, and better leverage CPU caches, ultimately boosting performance for real-time applications.
+
+We can easily extend our Gaussian blur and thresholding pipeline to leverage Halide’s built-in tiling capabilities. Let’s apply a simple tiling schedule to our existing pipeline. Add the following code segment immediately after defining the thresholded function:
+
+```cpp
+// Apply a simple tiling schedule
+Halide::Var x_outer, y_outer, x_inner, y_inner;
+thresholded.tile(x, y, x_outer, y_outer, x_inner, y_inner, 64, 64)
+ .parallel(y_outer);
+```
+
+The .tile(x, y, x_outer, y_outer, x_inner, y_inner, 64, 64) statement divides the image into 64×64 pixel tiles, significantly improving cache locality, while parallel(y_outer) executes each horizontal row of tiles in parallel across the available CPU cores, boosting execution speed.
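+
+Conceptually, the tiled schedule replaces the plain row-by-row loop nest with loops over tiles and loops within a tile. The following hand-written C++ sketch of the resulting loop structure is illustrative only, not Halide’s actual generated code (it also assumes the image dimensions divide evenly by 64):
+
+```cpp
+void tiled_loop_structure(int width, int height) {
+    // Halide runs this outer loop across CPU cores (parallel(y_outer)).
+    for (int y_outer = 0; y_outer < height / 64; y_outer++) {
+        for (int x_outer = 0; x_outer < width / 64; x_outer++) {
+            // Each 64x64 tile is processed as a unit, keeping its
+            // working set resident in cache (tile(..., 64, 64)).
+            for (int y_inner = 0; y_inner < 64; y_inner++) {
+                for (int x_inner = 0; x_inner < 64; x_inner++) {
+                    int x = x_outer * 64 + x_inner;
+                    int y = y_outer * 64 + y_inner;
+                    // ... compute the blurred, thresholded value at (x, y) ...
+                    (void)x; (void)y;
+                }
+            }
+        }
+    }
+}
+```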
+
+Recompile your application as before, then run:
+```console
+./camera-capture
+```
+
+## Summary
+In this section, we built a complete real-time image processing pipeline using Halide and OpenCV. Initially, we captured live video frames and applied Gaussian blur and thresholding to highlight image features clearly. By incorporating Halide’s tiling optimization, we also improved performance by enhancing cache efficiency and parallelizing computation. Through these steps, we demonstrated Halide’s capability to provide both concise, clear code and high performance, making it an ideal framework for demanding real-time image processing tasks.
+