Added 'sifive_x280' subconfig, kernel set. (#737)
Details:
- Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V
  instruction set architecture. The subconfig registers kernels from a
  correspondingly new kernel set, also named 'sifive_x280'.
- Added the aforementioned kernel set, which includes intrinsics- and
  assembly-based implementations of most level-1v kernels along with
  level-1f kernels axpy2v and dotaxpyv, packm kernels, and level-3 gemm,
  gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files).
- Registered the 'sifive_x280' subconfig as belonging to a singleton
  family by the same name.
- Added an entry to '.travis.yml' to test the new subconfig via qemu.
- Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280'
  subconfig and to reflect updated tarball names.
- Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz
  for their engagement on this commit.
- (cherry picked from commit 05388dd)

Fixed HPX barrier synchronization (#783)

Details:
- Fixed HPX barrier synchronization. HPX was hanging at larger core
  counts because BLIS was using non-HPX synchronization primitives,
  whereas only HPX synchronization primitives should be used under the
  HPX runtime. Hence, a C-style wrapper, hpx_barrier_t, was introduced
  to perform HPX barrier operations.
- Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with
  hpx::barrier when n_threads exceeds the actual hardware thread count
  causes synchronization issues that make HPX hang. This can be avoided
  by using hpx::futures, which are lightweight, robust, and scalable.
- (cherry picked from 7a87e57)

Fixed bug in sup threshold registration. (#782)

Details:
- Fixed a bug that resulted in BLIS non-deterministically calling the
  gemmsup handler, irrespective of the thresholds that are registered
  via bli_cntx_set_blkszs().
- Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup
  thresholds (BLIS_[MNK]T blocksizes) were being set to zero so that no
  operation ever matched the criteria for gemmsup (unless specific sup
  thresholds are registered). HOWEVER, these thresholds are set via
  bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was
  only copying the thresholds into the gks' cntx_t if the values were
  strictly positive. Thus, the zero values passed into
  bli_cntx_set_blkszs() were being ignored and those threshold slots
  within the gks were left uninitialized. The upshot of this is that the
  reference gemmsup handler was being called for gemm problems
  essentially at random (and as it turns out, very rarely the reference
  gemmsup implementation would encounter a divide-by-zero error).
- The problem was fixed by changing bli_blksz_copy_if_pos() so that it
  copies values that are non-negative (values >= 0 instead of > 0). The
  function was also renamed to bli_blksz_copy_if_nonneg().
- Also needed to standardize use of -1 as the sole value to embed into
  blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register
  a value for that slot (and instead let whatever existing values
  remain). This required updates to the bli_cntx_init_*() functions for
  bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some
  of these codes were using 0 instead of -1.
- Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and
  proposing a fix for this issue.
- (cherry picked from 8fff1e3)
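
The difference between the old and new copy semantics can be sketched in a few lines of C. This is a minimal illustration, not the BLIS source: toy_blksz_t, toy_copy_if_pos(), and toy_copy_if_nonneg() are hypothetical stand-ins for blksz_t, bli_blksz_copy_if_pos(), and bli_blksz_copy_if_nonneg(), and the four slots are assumed to correspond to the s, d, c, and z datatypes.

```c
#define TOY_NUM_DT 4 /* assumed slot order: s, d, c, z */

/* Hypothetical stand-in for blksz_t: one value per datatype. */
typedef struct { long v[TOY_NUM_DT]; } toy_blksz_t;

/* Old semantics: only strictly positive values are copied, so an
   explicit 0 in src is dropped and dst keeps whatever it held. */
void toy_copy_if_pos(const toy_blksz_t *src, toy_blksz_t *dst)
{
    for (int dt = 0; dt < TOY_NUM_DT; ++dt)
        if (src->v[dt] > 0)
            dst->v[dt] = src->v[dt];
}

/* New semantics: 0 is a legitimate value to register; only negative
   values (by convention, -1) mean "leave this slot alone". */
void toy_copy_if_nonneg(const toy_blksz_t *src, toy_blksz_t *dst)
{
    for (int dt = 0; dt < TOY_NUM_DT; ++dt)
        if (src->v[dt] >= 0)
            dst->v[dt] = src->v[dt];
}
```

With a source of { 0, 8, -1, 4 } and a destination prefilled with some other value, the old routine leaves slot 0 untouched (the zero is dropped), while the new routine writes the 0 and skips only the -1 slot.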

Update zen3 subconfig to support NVHPC compilers. (#779)

Details:
- Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file.
- Minor refactor to accommodate above edit.
- CREDITS file update.
- (cherry picked from 1e264a4)

Fixed brokenness when sba is disabled. (#777)

Details:
- Previously, disabling the sba via --disable-sba-pools resulted in a
  segfault due to a sanity-check-triggering abort(). The problem was
  that the sba, as currently used in the l3 thread decorators, did not
  yet (fully) support pools being disabled. The solution entailed
  creating a wrapper function, bli_sba_array_elem(), which either calls
  bli_apool_array_elem() (when sba pools are enabled at configure time)
  or returns a NULL sba_pool pointer (when sba pools are disabled), and
  calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note
  that the NULL pointer returned by bli_sba_array_elem() when the sba
  pools are disabled does no harm since in that situation the pointer
  goes unreferenced when acquiring and releasing small blocks. Thanks to
  John Mather for reporting this bug.
- Guarded the bodies of bli_sba_init() and bli_sba_finalize() with
  #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary
  to fix the aforementioned bug, but it seems like good practice.
- Moved the code in bli_l3_thrinfo_create() that checked that the array*
  pointer is non-NULL before calling bli_sba_array_elem() (previously
  bli_apool_array_elem()) into the definition of bli_sba_array_elem().
- Renamed various instances of 'pool' variables and function parameters
  to 'sba_pool' to emphasize what kind of pool it represents.
- Whitespace changes.
- (cherry picked from c2099ed)
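
The wrapper pattern described above might look roughly like the following. This is a sketch under stated assumptions rather than the actual BLIS code: toy_pool_t, toy_array_t, toy_apool_array_elem(), toy_sba_array_elem(), and the TOY_ENABLE_SBA_POOLS macro are hypothetical stand-ins for the real types, functions, and the BLIS_ENABLE_SBA_POOLS configure-time macro.

```c
#include <stddef.h>

/* Uncomment to mimic a build configured with sba pools enabled.
   (Hypothetical macro standing in for BLIS_ENABLE_SBA_POOLS.) */
/* #define TOY_ENABLE_SBA_POOLS */

/* Hypothetical stand-ins for the pool and the array of pools. */
typedef struct { int id; } toy_pool_t;
typedef struct { toy_pool_t *elems; size_t n; } toy_array_t;

/* Stand-in for bli_apool_array_elem(): assumes a valid array. */
toy_pool_t *toy_apool_array_elem(size_t i, toy_array_t *array)
{
    return &array->elems[i];
}

/* Stand-in for bli_sba_array_elem(): the NULL check on the array lives
   here, and the whole lookup collapses to NULL when pools are disabled. */
toy_pool_t *toy_sba_array_elem(size_t i, toy_array_t *array)
{
#ifdef TOY_ENABLE_SBA_POOLS
    if (array == NULL)
        return NULL;
    return toy_apool_array_elem(i, array);
#else
    (void)i;
    (void)array;
    return NULL; /* never dereferenced when pools are disabled */
#endif
}
```

Callers use the wrapper unconditionally. Whether a real pool or NULL comes back is decided once, at compile time, which keeps the #ifdef out of the thread decorators.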

Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)

Details:
- Expanded existing BLAS compatibility APIs to provide interfaces to
  [cz]symv_(), [cz]syr_(). This was easy since those operations were
  already implemented natively in BLIS; the APIs were previously
  omitted only because they were not formally part of the BLAS.
- Implemented [cz]rot_() by feeding code from LAPACK 3.11 through
  f2c.
- Thanks to James Foster for pointing out that LAPACK contains these
  additional symbols, which prompted these additions, as well as for
  testing the [cz]rot_() functions from Julia's test infrastructure.
- CREDITS file update.
- (cherry picked from 37ca4fd)

Fixes to HPX runtime code path. (#773)

Details:
- Fixed the hpx::for_each invocation and replaced it with hpx::for_loop.
  The HPX runtime was initialized using hpx::start, but the
  hpx::for_each function was being called on a non-HPX runtime (i.e.,
  the standard BLIS runtime with a single main thread). To run the loop
  on the HPX runtime correctly, the code now uses
  hpx::run_as_hpx_thread(func, args...).
- Replaced hpx::for_each with hpx::for_loop, which eliminates use of
  hpx::util::counting_iterator.
- Employ hpx::execution::chunk_size(1) to make sure that a thread
  resides on a particular core.
- Replaced hpx::apply() with its updated equivalent, hpx::post().
- Initialized tdata->id to 0 in libblis.c, as it is the main thread and
  is needed for writing results to the output file.
- By default, if not specified, the HPX runtime uses all N threads/cores
  available in the system. To use only n_threads out of those N
  threads, we use hpx::execution::experimental::num_cores(n_threads).
- (cherry picked from a4a6329)

Fixed broken link in Multithreading.md. (#774)

Details:
- Replaced 404'd link in docs/Multithreading.md with an archive from
   The Wayback Machine.
- CREDITS file update.
- (cherry picked from c6546c1)
fgvanzee committed May 23, 2024
1 parent 961e998 commit 097ca4e
Showing 98 changed files with 18,794 additions and 160 deletions.
11 changes: 11 additions & 0 deletions .travis.yml
@@ -86,6 +86,11 @@ matrix:
env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="rv32iv" \
CC=riscv32-unknown-linux-gnu-gcc \
LDFLAGS=-static
+- os: linux
+compiler: clang
+env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="sifive_x280" \
+CC=clang \
+LDFLAGS=-static
install:
- if [ "$CC" = "gcc" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-9"; fi
- if [ -n "$PACKAGES" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo apt-get install -y $PACKAGES; fi
@@ -106,6 +111,12 @@ script:
export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-g++;
export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv32 -cpu rv32,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
fi
+- if [ "$CONF" = "sifive_x280" ]; then
+$DIST_PATH/travis/do_riscv.sh "$CONF";
+export CC=$DIST_PATH/../toolchain/riscv/bin/clang;
+export CXX=$DIST_PATH/../toolchain/riscv/bin/clang++;
+export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=512 -B 0x100000";
+fi
- $DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF
- pwd
- ls -l
8 changes: 7 additions & 1 deletion CREDITS
@@ -17,11 +17,13 @@ but many others have contributed code, ideas, and feedback, including
Alex Arslan @ararslan
Vernon Austel (IBM, T.J. Watson Research Center)
Mohsen Aznaveh @Aznaveh (Texas A&M University)
+Abhishek Bagusetty @abagusetty (Argonne National Laboratory)
Satish Balay @balay (Argonne National Laboratory)
Kihiro Bando @bandokihiro
Matthew Brett @matthew-brett (University of Birmingham)
Jérémie du Boisberranger @jeremiedbb
Jed Brown @jedbrown (Argonne National Laboratory)
+Alex Chiang @alexsifivetw (SiFive)
Robin Christ @robinchrist
Dilyn Corner @dilyn-corner
Mat Cross @matcross (NAG)
@@ -37,12 +39,14 @@ but many others have contributed code, ideas, and feedback, including
Victor Eijkhout @VictorEijkhout (Texas Advanced Computing Center)
Evgeny Epifanovsky @epifanovsky (Q-Chem)
Isuru Fernando @isuruf
+James Foster @jd-foster (CSIRO)
Roman Gareev @gareevroman
Richard Goldschmidt @SuperFluffy
Chris Goodyer
Alexander Grund @Flamefire
John Gunnels @jagunnels (IBM, T.J. Watson Research Center)
Ali Emre Gülcü @Lephar
+@h-vetinari
Jeff Hammond @jeffhammond (Intel)
Jacob Gorm Hansen @jacobgorm
Shivaprashanth H (Global Edge)
@@ -52,7 +56,9 @@ but many others have contributed code, ideas, and feedback, including
Minh Quan Ho @hominhquan
Matthew Honnibal @honnibal
Stefan Husmann @stefanhusmann
+Aaron Hutchinson @Aaron-Hutchinson (SiFive)
Francisco Igual @figual (Universidad Complutense de Madrid)
+John Mather @jmather-sesi (SideFX Software)
Madeesh Kannan @shadeMe
Tony Kelman @tkelman
Lee Killough @leekillough (Tactical Computing Labs)
@@ -125,12 +131,12 @@ but many others have contributed code, ideas, and feedback, including
Meghana Vankadari @Meghana-vankadari (AMD)
Kiran Varaganti @kvaragan (AMD)
Natalia Vassilieva (Hewlett Packard Enterprise)
-@h-vetinari
Andrew Wildman @awild82 (University of Washington)
Zhang Xianyi @xianyi (Chinese Academy of Sciences)
Benda Xu @heroxbd
Guodong Xu @docularxu (Linaro.org)
RuQing Xu @xrq-phys (The University of Tokyo)
+Srinivas Yadav @srinivasyadav18
Costas Yamin @cosstas
Chenhan Yu @ChenhanYu (The University of Texas at Austin)
Roman Yurchak @rth (Symerio)
10 changes: 5 additions & 5 deletions config/bgq/bli_cntx_init_bgq.c
@@ -69,11 +69,11 @@ void bli_cntx_init_bgq( cntx_t* cntx )

// Initialize level-3 blocksize objects with architecture-specific values.
// s d c z
-bli_blksz_init_easy( &blkszs[ BLIS_MR ], 0, 8, 0, 4 );
-bli_blksz_init_easy( &blkszs[ BLIS_NR ], 0, 8, 0, 4 );
-bli_blksz_init_easy( &blkszs[ BLIS_MC ], 0, 1024, 0, 768 );
-bli_blksz_init_easy( &blkszs[ BLIS_KC ], 0, 2048, 0, 1536 );
-bli_blksz_init_easy( &blkszs[ BLIS_NC ], 0, 10240, 0, 10240 );
+bli_blksz_init_easy( &blkszs[ BLIS_MR ], -1, 8, -1, 4 );
+bli_blksz_init_easy( &blkszs[ BLIS_NR ], -1, 8, -1, 4 );
+bli_blksz_init_easy( &blkszs[ BLIS_MC ], -1, 1024, -1, 768 );
+bli_blksz_init_easy( &blkszs[ BLIS_KC ], -1, 2048, -1, 1536 );
+bli_blksz_init_easy( &blkszs[ BLIS_NC ], -1, 10240, -1, 10240 );

// Update the context with the current architecture's register and cache
// blocksizes (and multiples) for native execution.
10 changes: 5 additions & 5 deletions config/cortexa9/bli_cntx_init_cortexa9.c
@@ -69,11 +69,11 @@ void bli_cntx_init_cortexa9( cntx_t* cntx )

// Initialize level-3 blocksize objects with architecture-specific values.
// s d c z
-bli_blksz_init_easy( &blkszs[ BLIS_MR ], 4, 4, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NR ], 4, 4, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_MC ], 432, 176, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_KC ], 352, 368, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NC ], 4096, 4096, 0, 0 );
+bli_blksz_init_easy( &blkszs[ BLIS_MR ], 4, 4, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NR ], 4, 4, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_MC ], 432, 176, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_KC ], 352, 368, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NC ], 4096, 4096, -1, -1 );

// Update the context with the current architecture's register and cache
// blocksizes (and multiples) for native execution.
14 changes: 7 additions & 7 deletions config/knc/bli_cntx_init_knc.c
@@ -67,13 +67,13 @@ void bli_cntx_init_knc( cntx_t* cntx )

// Initialize level-3 blocksize objects with architecture-specific values.
// s d c z
-bli_blksz_init_easy( &blkszs[ BLIS_MR ], 0, 30, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NR ], 0, 8, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_MC ], 0, 120, 0, 0,
-0, 160, 0, 0 );
-bli_blksz_init ( &blkszs[ BLIS_KC ], 0, 240, 0, 0,
-0, 300, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NC ], 0, 14400, 0, 0 );
+bli_blksz_init_easy( &blkszs[ BLIS_MR ], -1, 30, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NR ], -1, 8, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_MC ], -1, 120, -1, -1,
+-1, 160, -1, -1 );
+bli_blksz_init ( &blkszs[ BLIS_KC ], -1, 240, -1, -1,
+-1, 300, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NC ], -1, 14400, -1, -1 );

// Update the context with the current architecture's register and cache
// blocksizes (and multiples) for native execution.
10 changes: 5 additions & 5 deletions config/penryn/bli_cntx_init_penryn.c
@@ -77,11 +77,11 @@ void bli_cntx_init_penryn( cntx_t* cntx )

// Initialize level-3 blocksize objects with architecture-specific values.
// s d c z
-bli_blksz_init_easy( &blkszs[ BLIS_MR ], 8, 4, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NR ], 4, 4, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_MC ], 768, 384, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_KC ], 384, 384, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NC ], 4096, 4096, 0, 0 );
+bli_blksz_init_easy( &blkszs[ BLIS_MR ], 8, 4, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NR ], 4, 4, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_MC ], 768, 384, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_KC ], 384, 384, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NC ], 4096, 4096, -1, -1 );

// Update the context with the current architecture's register and cache
// blocksizes (and multiples) for native execution.
10 changes: 5 additions & 5 deletions config/power7/bli_cntx_init_power7.c
@@ -67,11 +67,11 @@ void bli_cntx_init_power7( cntx_t* cntx )

// Initialize level-3 blocksize objects with architecture-specific values.
// s d c z
-bli_blksz_init_easy( &blkszs[ BLIS_MR ], 0, 8, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NR ], 0, 4, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_MC ], 0, 64, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_KC ], 0, 256, 0, 0 );
-bli_blksz_init_easy( &blkszs[ BLIS_NC ], 0, 4096, 0, 0 );
+bli_blksz_init_easy( &blkszs[ BLIS_MR ], -1, 8, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NR ], -1, 4, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_MC ], -1, 64, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_KC ], -1, 256, -1, -1 );
+bli_blksz_init_easy( &blkszs[ BLIS_NC ], -1, 4096, -1, -1 );

// Update the context with the current architecture's register and cache
// blocksizes (and multiples) for native execution.