LPs on Halide #1814

Open: wants to merge 5 commits into base `main`.

dawidborycki (Contributor)

Before submitting a pull request for a new Learning Path, please review Create a Learning Path

  • I have reviewed Create a Learning Path

Please do not include any confidential information in your contribution. This includes confidential microarchitecture details and unannounced product information.

  • I have checked my contribution for confidential information

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the Creative Commons Attribution 4.0 International License.


who_is_this_for: This is an introductory topic for software developers interested in learning how to use Halide for image processing.

learning_objectives:

Please update the wording so that each item completes the rendered question ("Upon completion of this learning path, you will be able to...").

- Demonstrating Operation Fusion
- Integrating Halide into an Android (Kotlin) Project

prerequisites:

Consider adding:

  • Basic C++ knowledge
  • Android Studio with an emulator

Alternatively, remove the prerequisites section altogether.


### Tags
skilllevels: Introductory
subjects: Performance and Architecture

Can you add image processing, or is this chosen from a predefined list?

Agreed, and "computer vision" as well.

---

## Introduction
Halide is an open-source domain-specific language, embedded in C++, designed to simplify and optimize high-performance image and signal processing pipelines. Initially developed by researchers at MIT and Adobe in 2012, Halide addresses a critical challenge in computational imaging: efficiently mapping image-processing algorithms onto diverse hardware architectures without extensive manual tuning. It accomplishes this by clearly separating the description of an algorithm (defining what computations are performed) from its schedule (detailing how and where those computations execute). This design enables rapid experimentation and effective optimization for various processing platforms, including CPUs, GPUs, and mobile hardware.

It accomplishes this by clearly separating the description of an algorithm (defining what computations are performed) from its schedule (detailing how and where those computations execute).
Consider changing "defining what computations are performed" to something that doesn't imply scheduling (e.g. applied filters between the input and output images).


A key advantage of Halide lies in its innovative programming model. By clearly distinguishing between algorithmic logic and scheduling decisions—such as parallelism, vectorization, memory management, and hardware-specific optimizations—developers can first focus on ensuring the correctness of their algorithms. Performance tuning can then be handled independently, significantly accelerating development cycles. This approach often yields performance that matches or even surpasses manually optimized code. As a result, Halide has seen widespread adoption across industry and academia, powering image processing systems at technology giants such as Google, Adobe, and Facebook, and enabling advanced computational photography features used by millions daily.

In this learning path, you will explore Halide’s foundational concepts, set up your development environment, and create your first functional Halide application. By the conclusion, you will understand what makes Halide uniquely suited to efficient image processing and be ready to build your own optimized pipelines.

Please consider changing the term "By the conclusion" to "when finished" or "by the end".

* CMakeLists.txt

Open CMakeLists.txt and modify it as follows (replace /path/to/halide with your Halide installation directory):
@balintelias, Apr 16, 2025

Please fix the double colon.

@@ -14,6 +14,8 @@ A key advantage of Halide lies in its innovative programming model. By clearly d


The companion code for this Learning Path is available in two repositories: [Arm.Halide.Hello-World](https://github.com/dawidborycki/Arm.Halide.Hello-World.git) and [Arm.Halide.AndroidDemo](https://github.com/dawidborycki/Arm.Halide.AndroidDemo.git).

Is the term "companion code" commonly used with this meaning?

@@ -142,6 +144,7 @@ After the pipeline processes the image, the output is realized into another Hali
## Compilation Instructions
Compile the program as follows (replace /path/to/halide accordingly):
```console
export DYLD_LIBRARY_PATH=/path/to/halide/lib:$DYLD_LIBRARY_PATH
```

Why is this macOS-specific? Please show the Linux alternative as well.

@@ -0,0 +1,165 @@
---
# User change
title: "Introduction, Background, and Installation"

The previous "chapter" was also called Introduction.

@@ -29,23 +29,13 @@ tools_software_languages:

further_reading:

Consider directly linking the Halide tutorials for further reading.

@stevesuzuki-arm left a comment:

This is my first round of feedback, covering only the introduction for now.


Please consider mentioning that this LP covers Halide applications targeting Android on Arm CPUs, to differentiate it from the official tutorials.


This separation allows developers to rapidly experiment and optimize their code for different hardware architectures or performance requirements without altering the core algorithmic logic.
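A minimal sketch of this separation in plain C++ (an analogy, not Halide's API): the algorithm is a single per-pixel function, and two hand-written "schedules", a row-major loop nest and a tiled loop nest, evaluate it in different orders while producing identical output. The brighten-by-40 operation and all function names are hypothetical. In Halide itself, only the schedule lines (for example `tile`, `vectorize`, or `parallel`) would change, while the algorithm definition stays the same.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Algorithm: WHAT each output pixel is (hypothetical brighten-and-clamp).
// Halide equivalent: out(x, y) = cast<uint8_t>(min(cast<int>(in(x, y)) + 40, 255));
inline uint8_t brighten(const std::vector<uint8_t>& in, int width, int x, int y) {
    int v = in[y * width + x] + 40;
    return static_cast<uint8_t>(v > 255 ? 255 : v);
}

// Schedule 1: plain row-major traversal (y outer, x inner).
std::vector<uint8_t> run_row_major(const std::vector<uint8_t>& in, int width, int height) {
    std::vector<uint8_t> out(in.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] = brighten(in, width, x, y);
    return out;
}

// Schedule 2: tiled traversal in tile x tile blocks; the algorithm is unchanged.
std::vector<uint8_t> run_tiled(const std::vector<uint8_t>& in, int width, int height, int tile) {
    std::vector<uint8_t> out(in.size());
    for (int ty = 0; ty < height; ty += tile)
        for (int tx = 0; tx < width; tx += tile)
            for (int y = ty; y < ty + tile && y < height; ++y)
                for (int x = tx; x < tx + tile && x < width; ++x)
                    out[y * width + x] = brighten(in, width, x, y);
    return out;
}
```

Because `brighten` never changes, any traversal order that visits every pixel gives the same image; only memory-access patterns and therefore performance differ.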

### Functions, Vars, and Pipelines

Without any code snippet, it might be difficult for the reader to understand what this means. Please consider one of the following options:
a) Add some code.
b) Remove this entire paragraph.
c) Instead, mention that Halide is a domain-specific language that provides predefined operators as building blocks specialized for image processing pipelines.
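As one possible take on option (a), the sketch below mimics in plain C++ what a Halide `Func` defined over `Var`s means: a pure function from pixel coordinates to a value, evaluated only when it is realized over a concrete domain. The comments show the corresponding Halide calls (`Var`, `Func`, `realize`); the `Image` struct and `realize` helper are hypothetical stand-ins, not Halide types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for Halide::Buffer<int32_t>: a width x height grid.
struct Image {
    int width, height;
    std::vector<int32_t> data;
    int32_t operator()(int x, int y) const { return data[y * width + x]; }
};

// Halide:  Halide::Var x("x"), y("y");
//          Halide::Func gradient("gradient");
//          gradient(x, y) = x + y;   // a definition only; nothing is computed yet
using PureFunc = std::function<int32_t(int, int)>;

// Halide:  Halide::Buffer<int32_t> out = gradient.realize({width, height});
// Realizing evaluates the definition at every (x, y) in the requested domain.
Image realize(const PureFunc& f, int width, int height) {
    Image out{width, height, std::vector<int32_t>(static_cast<size_t>(width) * height)};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out.data[y * width + x] = f(x, y);
    return out;
}
```

For example, `realize([](int x, int y) { return x + y; }, 4, 3)(3, 2)` yields 5. In Halide, a second `Func` could refer to `gradient` directly to form a two-stage pipeline, and the schedule decides whether the first stage is stored at all.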

### Installation Options
Halide can be set up using one of two main approaches:
* Installing pre-built binaries - pre-built binaries are convenient, quick to install, and suitable for beginners or standard platforms (Windows, Linux, macOS).
* Building from source - building Halide from source offers greater flexibility, allowing optimization for your specific hardware or operating system configuration.

True, but I think customizing the Halide compiler itself is for advanced experts, not for our target readers. Building from source is required if prebuilt binaries are unavailable for your environment, or if you want to use the latest Halide or LLVM under development.

1. LLVM (required for efficient compilation and optimization):
* Linux (Ubuntu):
```console
sudo apt-get install llvm-15-dev libclang-15-dev clang-15
```

IIRC, Halide defines specific supported versions of LLVM and supports three of them at a time (e.g. 19 is the primary target, with 18 and 20 also supported). Please double-check the latest status.

Halide depends on the following key software packages:
1. LLVM (required for efficient compilation and optimization):

"required for efficient compilation and optimization" might be redundant, as LLVM is simply an essential (mandatory) dependency.

```console
brew install llvm
```
2. OpenCV (for image handling in later lessons):

This is listed under "Halide depends on the following key software packages:", but I don't think Halide depends on OpenCV. Rather, OpenCV is required by this LP's application.


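```cpp
// Wrap the OpenCV Mat data in a Halide::Buffer without copying.
// Dimensions: (width, height, channels). cv::Mat stores color pixels
// interleaved (B, G, R per pixel), so use make_interleaved to get matching strides.
// Assumes the Mat is continuous (input.isContinuous() is true).
Buffer<uint8_t> inputBuffer = Buffer<uint8_t>::make_interleaved(input.data, input.cols, input.rows, input.channels());
```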

Just before this line, is the memory layout of input in CHW or HWC order? Depending on that, Buffer::make_interleaved() might be appropriate. Please check the difference.
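The difference between the two layouts comes down to strides, which a short plain C++ sketch can make concrete. A continuous 8-bit color `cv::Mat` is interleaved (HWC: the channels of each pixel are adjacent in memory), whereas the plain `Buffer(data, width, height, channels)` constructor assumes a planar layout (CHW: each channel is a separate plane). The helper functions below are hypothetical and only reproduce the offset arithmetic of each layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of (x, y, c) in an interleaved (HWC) image such as a continuous cv::Mat.
// Strides: x -> channels, y -> width * channels, c -> 1.
inline size_t interleaved_offset(int x, int y, int c, int width, int channels) {
    return static_cast<size_t>(y) * width * channels + static_cast<size_t>(x) * channels + c;
}

// Offset of (x, y, c) in a planar (CHW) image, the layout assumed by the
// plain Buffer(data, width, height, channels) constructor.
// Strides: x -> 1, y -> width, c -> width * height.
inline size_t planar_offset(int x, int y, int c, int width, int height) {
    return static_cast<size_t>(c) * width * height + static_cast<size_t>(y) * width + x;
}

// Hypothetical 2x1 BGR image stored interleaved:
// pixel 0 = (B10, G20, R30), pixel 1 = (B40, G50, R60).
std::vector<uint8_t> tiny_bgr_interleaved() { return {10, 20, 30, 40, 50, 60}; }
```

Reading these interleaved bytes with planar strides returns the wrong channel: asking for the blue value of pixel 1 gives 20 (the green value of pixel 0) instead of 40. `Halide::Buffer<uint8_t>::make_interleaved(data, width, height, channels)` sets up the interleaved strides, so `(x, y, c)` indexing in the pipeline refers to the intended bytes.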

![img2](Figures/02.png)

## Summary
In this lesson, you’ve learned Halide’s foundational concepts, explored the benefits of separating algorithms and schedules, set up your development environment, and created your first functional Halide application integrated with OpenCV.

At this point, the "benefits of separating algorithms and schedules" are not really demonstrated by the example above. I suppose they will be shown in a later example.


```cpp
// Allocate a jbyteArray for the output.
jbyteArray outputArray = env->NewByteArray(width * height);
// Copy the data from Halide's output buffer to the jbyteArray.
```

Please consider a zero-copy approach to reduce memory access.

```kotlin
// Run the processing on a background thread using coroutines.
CoroutineScope(Dispatchers.IO).launch {
    // Convert Bitmap to grayscale byte array.
    val grayBytes = extractGrayScaleBytes(bmp)
```

Please consider integrating the gray<->bitmap conversion into Halide, as that gives more opportunity to benefit from operator fusion.

```kotlin
    val processedBytes = blurThresholdImage(grayBytes, bmp.width, bmp.height)

    // Convert processed bytes back to a Bitmap.
    val processedBitmap = createBitmapFromGrayBytes(processedBytes, bmp.width, bmp.height)
```

Please consider integrating the gray<->bitmap conversion into Halide, as that gives more opportunity to benefit from operator fusion.
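To illustrate the fusion opportunity, here is a plain C++ sketch (not the Halide API) of an ARGB-to-gray, threshold, gray-to-ARGB sequence. Run as three separate passes, it materializes two full-size intermediate arrays; fused into one pass, each pixel goes through all three element-wise steps without any intermediate buffer. The ARGB_8888 packing, the integer luma weights (77, 150, 29)/256, and the threshold of 128 are assumptions; they are not necessarily what `extractGrayScaleBytes` or the LP's pipeline use, and the blur stage is omitted because it is a spatial operation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed packing: 0xAARRGGBB, as returned by Android's Bitmap.getPixels().
inline uint8_t to_gray(uint32_t argb) {
    uint32_t r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
    return static_cast<uint8_t>((77 * r + 150 * g + 29 * b) >> 8);  // weights sum to 256
}
inline uint8_t threshold(uint8_t v) { return v >= 128 ? 255 : 0; }
inline uint32_t to_argb(uint8_t v) {
    return 0xFF000000u | (uint32_t(v) << 16) | (uint32_t(v) << 8) | uint32_t(v);
}

// Unfused: three passes over memory and two full-size intermediate arrays.
std::vector<uint32_t> process_unfused(const std::vector<uint32_t>& in) {
    std::vector<uint8_t> gray(in.size()), bin(in.size());
    for (size_t i = 0; i < in.size(); ++i) gray[i] = to_gray(in[i]);
    for (size_t i = 0; i < in.size(); ++i) bin[i] = threshold(gray[i]);
    std::vector<uint32_t> out(in.size());
    for (size_t i = 0; i < in.size(); ++i) out[i] = to_argb(bin[i]);
    return out;
}

// Fused: one pass; intermediate values never leave registers.
std::vector<uint32_t> process_fused(const std::vector<uint32_t>& in) {
    std::vector<uint32_t> out(in.size());
    for (size_t i = 0; i < in.size(); ++i) out[i] = to_argb(threshold(to_gray(in[i])));
    return out;
}
```

Both functions return the same pixels; the fused one simply avoids writing and re-reading `gray` and `bin`, which is the kind of saving Halide's default inlining gives for element-wise stages.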

## Objective
In this lesson, we’ll learn how to integrate a high-performance Halide image-processing pipeline into an Android application using Kotlin.

## Overview of Mobile Integration with Halide

Can we merge the two sections "Overview of Mobile Integration with Halide" and "Benefits of Using Halide on Mobile" into one, focusing on how Halide helps with some of the challenges of image processing on mobile devices?

```

This will produce:
* A static library (blur_threshold_android.a) containing the compiled pipeline.

Does this also contain the Halide runtime functions for the specific target (i.e. arm-64-android)? If so, I think it is worth noting.

);

// Apply fusion scheduling
blur.compute_at(thresholded, x);

As x is the innermost loop variable, I think this is almost equivalent to the default schedule (inlined), but with more redundant memory access (i.e. writing to and reading from a single-element array).
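The observation can be checked with a hand-written C++ approximation of the two loop nests (not Halide's actual lowered code), assuming a 3-tap horizontal box blur with clamped edges feeding a threshold at 128. With the default inline schedule the blur expression is substituted into the threshold; with `blur.compute_at(thresholded, x)` a one-element scratch allocation is written and immediately read back inside the innermost `x` loop. Both produce the same output, and the second only adds a store and a load per pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed algorithm: 3-tap horizontal box blur with clamped edges.
inline int blur_at(const std::vector<uint8_t>& in, int w, int x, int y) {
    auto px = [&](int xx) { return int(in[y * w + std::min(std::max(xx, 0), w - 1)]); };
    return (px(x - 1) + px(x) + px(x + 1)) / 3;
}

// Default schedule (inline): the blur is substituted into the threshold expression.
std::vector<uint8_t> schedule_inline(const std::vector<uint8_t>& in, int w, int h) {
    std::vector<uint8_t> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[y * w + x] = blur_at(in, w, x, y) > 128 ? 255 : 0;
    return out;
}

// Approximation of blur.compute_at(thresholded, x).
std::vector<uint8_t> schedule_compute_at_x(const std::vector<uint8_t>& in, int w, int h) {
    std::vector<uint8_t> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int blur_scratch[1];                                 // scoped to one x iteration
            blur_scratch[0] = blur_at(in, w, x, y);              // produce blur for this point
            out[y * w + x] = blur_scratch[0] > 128 ? 255 : 0;    // consume it immediately
        }
    return out;
}
```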

```cpp
Halide::Func blur("blur");
// blur definition here
Halide::Buffer<uint8_t> blurBuffer = blur.realize({ width, height });
```

Right. When comparing against a schedule in which the entire intermediate frame is written, we usually use compute_root.
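For comparison, here is a hand-written C++ approximation of a `blur.compute_root()` schedule (not Halide's generated code), assuming a 3-tap horizontal box blur with clamped edges and a threshold at 128. The whole blurred frame is written to a `width * height` intermediate buffer before the threshold loop reads it back in a separate loop nest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed algorithm: 3-tap horizontal box blur with clamped edges.
inline int blur_at(const std::vector<uint8_t>& in, int w, int x, int y) {
    auto px = [&](int xx) { return int(in[y * w + std::min(std::max(xx, 0), w - 1)]); };
    return (px(x - 1) + px(x) + px(x + 1)) / 3;
}

// Approximation of blur.compute_root(): produce the entire blurred frame first...
std::vector<uint8_t> schedule_compute_root(const std::vector<uint8_t>& in, int w, int h) {
    std::vector<uint8_t> blur(in.size());  // full-frame intermediate buffer
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            blur[y * w + x] = static_cast<uint8_t>(blur_at(in, w, x, y));
    // ...then run the consumer (threshold) over it in a second loop nest.
    std::vector<uint8_t> out(in.size());
    for (size_t i = 0; i < in.size(); ++i)
        out[i] = blur[i] > 128 ? 255 : 0;
    return out;
}
```

This schedule costs one full intermediate frame of memory traffic. It mainly pays off when a consumer reads each intermediate value several times, as with stacked spatial filters, rather than once, as a pointwise threshold does.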

### Scheduling Techniques
The three primary Halide scheduling methods to enable fusion are:
1. compute_at - compute the values of one Func at the iteration point of another.
2. store_at - store intermediate results at a particular loop iteration or stage to minimize memory footprint.

store_at is a somewhat advanced scheduling directive for customizing how intermediates are stored. As this LP is an introduction targeting beginners, we could remove the mention here.

Operation fusion is less beneficial (or even detrimental) if intermediate results are frequently reused across multiple subsequent stages or if fusing operations complicates parallelism or vectorization opportunities.

## Typical Scenarios and Performance-Critical Pipelines
Some performance-critical pipelines where fusion is especially beneficial include:

I think these listed items are too broad. Most of the key points are already explained in the previous block.


## When to Use Operation Fusion
Operation fusion is most beneficial in scenarios that involve multiple sequential operations or transformations performed on large datasets, particularly when the intermediate results are large and therefore costly to write to and read back from memory. Typical situations include:
* Image filtering pipelines (blur, sharpen, threshold sequences)

In general, blurs and other spatial filters are not automatically good candidates for fusion. Depending on the filter and the pipeline structure, tiling may work better, because fusion computes intermediate values on the fly and ends up computing the same result multiple times. So there is a trade-off between memory locality and redundant computation. This is especially true when the filter kernel is large, or when there is a sequence of small convolutions (for example, five layers of 3x3).

Fusion improves element-wise operations the most, because, unlike spatial filters, they do not refer to the values of neighboring pixels.
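The trade-off can be quantified with a plain C++ sketch (not Halide code) of two stacked 3-tap box filters, `blur_x` followed by `blur_y`, both with clamped edges. Fully fusing `blur_x` into `blur_y` re-evaluates three `blur_x` values per output pixel, while materializing `blur_x` first (similar to `compute_root`) evaluates it once per pixel at the cost of a full intermediate buffer. The filter sizes are assumptions for illustration, and the global counter exists only to make the evaluation counts visible.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counts blur_x evaluations (instrumentation for illustration only).
static long g_blur_x_evals = 0;

inline int clamp_i(int v, int lo, int hi) { return std::min(std::max(v, lo), hi); }

// Stage 1: 3-tap horizontal box filter with clamped edges (row index clamped too).
int blur_x(const std::vector<int>& in, int w, int h, int x, int y) {
    ++g_blur_x_evals;
    y = clamp_i(y, 0, h - 1);
    auto px = [&](int xx) { return in[y * w + clamp_i(xx, 0, w - 1)]; };
    return (px(x - 1) + px(x) + px(x + 1)) / 3;
}

// Fully fused: each blur_y output recomputes the three blur_x values it needs.
std::vector<int> fused(const std::vector<int>& in, int w, int h) {
    std::vector<int> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[y * w + x] = (blur_x(in, w, h, x, y - 1) + blur_x(in, w, h, x, y) +
                              blur_x(in, w, h, x, y + 1)) / 3;
    return out;
}

// Materialized (compute_root-like): blur_x computed once per pixel into a buffer.
std::vector<int> materialized(const std::vector<int>& in, int w, int h) {
    std::vector<int> bx(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) bx[y * w + x] = blur_x(in, w, h, x, y);
    std::vector<int> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int up = bx[clamp_i(y - 1, 0, h - 1) * w + x];
            int down = bx[clamp_i(y + 1, 0, h - 1) * w + x];
            out[y * w + x] = (up + bx[y * w + x] + down) / 3;
        }
    return out;
}
```

For a 4x3 input, the fused version calls `blur_x` 36 times and the materialized version 12 times, while both return identical images. Tiling sits between these extremes: it recomputes only the rows on tile borders while keeping the intermediate small enough to stay in cache.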

Projects
Status: In Progress
4 participants