Commit

Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution

Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
martin-frbg committed May 14, 2024
2 parents b45a78c + 51ab190 commit 6ca9ffa
Showing 1 changed file with 10 additions and 0 deletions.
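A short derivation of the loop condition added in this commit, based only on the comment in the diff. Write p_m for nthreads_m and p_n for nthreads_n. Each thread then handles roughly an (m/p_m) × (n/p_n) submatrix. Every loop step keeps p_m p_n constant, so comparing the numerators n p_m + m p_n is the same as comparing the sums of the block dimensions. The starting split of 16 × 1 in the worked example is a hypothetical input, because the diff does not show how nthreads_m is first chosen.

```latex
\begin{align*}
\frac{n}{p_n} + \frac{m}{p_m} &= \frac{n\,p_m + m\,p_n}{p_m\,p_n} \\[4pt]
% one step (p_m, p_n) -> (p_m/2, 2 p_n) is taken iff it strictly lowers the numerator:
n\,p_m + m\,p_n > n\,\frac{p_m}{2} + 2\,m\,p_n
  &\iff \frac{n\,p_m}{2} > m\,p_n
  \iff \frac{n}{p_n} > 2\,\frac{m}{p_m} \\[4pt]
% worked example, m = n = 2048, hypothetical start (p_m, p_n) = (16, 1):
(16, 1)&:\quad 2048 > 2 \cdot 128 = 256 \quad\Rightarrow\quad \text{step} \\
(8, 2)&:\quad 1024 > 2 \cdot 256 = 512 \quad\Rightarrow\quad \text{step} \\
(4, 4)&:\quad 512 > 2 \cdot 512 = 1024 \text{ is false} \quad\Rightarrow\quad \text{stop, } 512 \times 512 \text{ blocks}
\end{align*}
```

In words, the loop moves a factor of 2 from the m split to the n split while each thread's block is more than twice as wide as it is tall, and while nthreads_m is still even.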
driver/level3/level3_thread.c: 10 additions, 0 deletions
```diff
@@ -826,6 +826,16 @@ int CNAME(blas_arg_t *args, BLASLONG *range_m, BLASLONG *range_n, IFLOAT *sa, IF
     if (nthreads_m * nthreads_n > args -> nthreads) {
       nthreads_n = blas_quickdivide(args -> nthreads, nthreads_m);
     }
+    /* The nthreads_m and nthreads_n are adjusted so that the submatrix */
+    /* to be handled by each thread preferably becomes a square matrix */
+    /* by minimizing an objective function 'n * nthreads_m + m * nthreads_n'. */
+    /* Objective function come from sum of partitions in m and n. */
+    /* (n / nthreads_n) + (m / nthreads_m) */
+    /* = (n * nthreads_m + m * nthreads_n) / (nthreads_n * nthreads_m) */
+    while (nthreads_m % 2 == 0 && n * nthreads_m + m * nthreads_n > n * (nthreads_m / 2) + m * (nthreads_n * 2)) {
+      nthreads_m /= 2;
+      nthreads_n *= 2;
+    }
   }
 
   /* Execute serial or parallel computation */
```
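A minimal standalone sketch of the added loop, so its effect can be tried in isolation. It assumes `long` in place of OpenBLAS's `BLASLONG`, and the function name `balance_2d_split` is hypothetical (it is not an OpenBLAS function). It applies only the rebalancing step from the patch to an already-chosen split; it does not reproduce how the surrounding code picks the initial nthreads_m or clamps nthreads_n.

```c
#include <assert.h>

/*
 * Hypothetical standalone version of the loop added in this commit.
 * `long` stands in for OpenBLAS's BLASLONG (an assumption of this sketch).
 * m, n:        dimensions of the output matrix being partitioned
 * *nthreads_m: number of partitions along m (updated in place)
 * *nthreads_n: number of partitions along n (updated in place)
 * The product (*nthreads_m) * (*nthreads_n) is unchanged by every step.
 */
void balance_2d_split(long m, long n, long *nthreads_m, long *nthreads_n)
{
    long tm = *nthreads_m;
    long tn = *nthreads_n;

    assert(tm > 0 && tn > 0);

    /* Same condition as the patch: while tm is even and moving a factor
     * of 2 from the m split to the n split strictly lowers the objective
     * n * tm + m * tn, perform that move. */
    while (tm % 2 == 0 && n * tm + m * tn > n * (tm / 2) + m * (tn * 2)) {
        tm /= 2;
        tn *= 2;
    }

    *nthreads_m = tm;
    *nthreads_n = tn;
}
```

For example, with m = n = 2048 and a starting split of 16 × 1, the loop stops at 4 × 4, giving each thread a 512 × 512 block. With m = 4000, n = 1000 and a starting 8 × 1 split it changes nothing, because halving nthreads_m would leave the objective equal (12000 in both cases) rather than strictly smaller.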
