Skip to content

Latest commit

 

History

History

PWR040

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

PWR040: Consider loop tiling to improve the locality of reference

Issue

Inefficient matrix access pattern detected that can be fixed through loop tiling.

Actions

Apply loop tiling to the loop nest.

Relevance

Inefficient memory access patterns and low locality of reference are the main reasons for low performance on modern computer systems. Matrices are stored in a row-major order in C and column-major order in Fortran. Iterating over them column-wise (in C) and row-wise (in Fortran) is inefficient, because it uses the memory subsystem suboptimally.

Nested loops that iterate over matrices in an inefficient manner can be optimized by applying loop tiling. In contrast to loop interchange, loop tiling doesn't remove the inefficient memory access, but instead breaks the problem into smaller subproblems. Smaller subproblems have a much better locality of reference and are faster to solve. Using loop tiling, the pressure on the memory subsystem due to inefficient matrix access is decreased which leads to improvement in program speed.

Note

The benefit of loop tiling directly depends on the size of the dataset. Large datasets profit from loop tiling a lot, in contrast to small datasets that don't profit that much.

Code example

C

The following code shows two nested loops. The matrix B is accessed column-wise, which is inefficient. Loop interchange doesn't help either, because fixing the inefficient memory access pattern for B would introduce an inefficient memory access pattern for A:

void example(double **A, double **B, int n) {
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      A[i][j] = B[j][i];
    }
  }
}

After applying loop tiling, the locality of reference is improved and the performance is better. The tiled version of this loop nest is as follows:

for (int ii = 0; ii < n; ii += TILE_SIZE) {
  for (int jj = 0; jj < n; jj += TILE_SIZE) {
    for (int i = ii; i < MIN(ii + TILE_SIZE, n); i++) {
      for (int j = jj; j < MIN(jj + TILE_SIZE, n); j++) {
        A[i][j] = B[j][i];
      }
    }
  }
}

Fortran

The following code shows two nested loops. The matrix B is accessed row-wise, which is inefficient. Loop interchange doesn't help either, because fixing the inefficient memory access pattern for B would introduce an inefficient memory access pattern for A:

pure subroutine example(a, b)
  use iso_fortran_env, only: real32
  implicit none
  real(kind=real32), dimension(:, :), intent(out) :: a
  real(kind=real32), dimension(:, :), intent(in) :: b
  integer :: i, j

  do j = 1, size(a, 2)
    do i = 1, size(a, 1)
      a(i, j) = b(j, i)
    end do
  end do
end subroutine example

After applying loop tiling, the locality of reference is improved and the performance is better. The tiled version of this loop nest is as follows:

do jj = 1, size(a, 2), TILE_SIZE
  do ii = 1, size(a, 1), TILE_SIZE
    do j = jj, MIN(jj + TILE_SIZE, size(a, 2))
      do i = ii, MIN(ii + TILE_SIZE, size(a, 1))
        a(i, j) = b(j, i)
      end do
    end do
  end do
end do

Related resources

References