Merge sifive_x280 with RISC-V base architecture support changes #3

leekillough · 2023-04-12T02:06:00Z

This merges flame#693, flame#735, flame#736, flame#738 and flame#740 into flame#737, and resolves merge conflicts, which are mostly caused by #if/#endif scopes for sifive_x280 and rv32/rv32i/rv64/rv64i overlapping.

* Move -fPIC insertion to subconfigs' make_defs.mk. Details: - Previously, common.mk was appending -fPIC to the CPICFLAGS variables set within the various subconfigurations' make_defs.mk files. This seemed somewhat unintuitive, and so now the -fPIC flag is assigned to the various subconfigs' CPICFLAGS variables in the respective make_defs.mk files. - This also commit changes the logic in common.mk so that instead of appending, the variable is overwritten, but now *only* in the case of Windows (since apparently -fPIC needs to be omitted there). Thanks to Nick Knight for catching and reporting this weirdness.

Details: - Added two PowerPoint files that contain slides depicting the classic Goto algorithm for matrix multiplication as well as its sister "panel-block" algorithm. These files reside in docs/diagrams.

Details: - Added PDF versions of the PowerPoint files added in 17cd260.

Details: - Added `output.testsuite` to .gitignore since it was previously not being matched by `output.testsuite.*`.

Details: - Implemented a new configure option, --disable-tls, which allows the user to optionally disable the use of thread-local storage qualifiers on static variables in BLIS. This option will rarely be needed, but in some situations may allow BLIS to compile when TLS is unavailable. Thanks to Nick Knight for suggesting this option. - Unlike the --disable-system option, --disable-tls does not forcibly disable threading. Instead, warnings of the possible consequences of using threading with TLS disabled are added to: - the output of './configure --help'; - the output of 'configure' the --disable-tls option is parsed; - the informational header output by the testsuite. Thanks to Minh Quan Ho for suggesting these warnings. - Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is defined to nothing when BLIS_ENABLE_TLS is not defined. - Defined bli_info_get_enable_tls(), which returns whether the cpp macro BLIS_ENABLE_TLS was defined. - Edited --disable-system configure status output for clarity. - Whitespace updates.

Added whitespace and other formatting fixes

Restore changes from sifive-blis-private#28

@bartoldeman

Details: - Added attributions associated with commits: - 98d4678 9b1beec: @bartoldeman - 2b05948 059f151: @ct-clmsn - Reordered attirubtion for @decandia50.

Details: - There are four RISC-V base configurations: 'rv32i', 'rv32iv', 'rv64i', and 'rv64iv', namely the 32-bit and 64-bit implementations with and without the 'V' vector extension. Additional extensions such as 'M' (multiplication), 'A' (atomics), 'F' ('float' hardware support), 'D' ('double' hardware support), and 'C' (compressed-length instructions), are automatically used when available. If they are not available, then software equivalents (e.g., softfloat and -latomic) are used. - './configure auto' can be invoked on a RISC-V build platform, and will automatically detect RISC-V CPU extensions through the RISC-V C API: https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md - The assembly kernels assume the presence of the vector extension RVV 1.0. - It is possible to build 'rv[32,64]iv' for any value of VLEN. However, if VLEN < 128, the targets will fall back to the generic kernels and blocksizes. - The vector microkernels are vector-length agnostic and work with every VLEN >=128, but are expected to work best with smaller vector lengths, i.e., VLEN <= 512. - The assembly kernels cover column major storage (rs_c == 1). - The blocksizes aim at being a good generic choice for out-of-order cores. They are not tuned to a specific RISC-V HPC core. - The vector kernels have been tested using vlen={128,256,512}. - The single- and double-precision assembly code routines for 'sgemm' and 'dgemm', or for 'cgemm' and 'zgemm', are combined in their RISC-V vector assembly source code, and are differentiated only with macros. - The XLEN=32 and XLEN=64 versions of the RISC-V assembly code are identical, except that callee-saved registers are saved and restored differently. There are RISC-V assembly code #include files for handling the saving and restoring of callee-saved registers, and they are future-proof if ever XLEN=128. - Multiplications, such as computing array strides and offsets, are performed in C, and later passed to the RISC-V assembly kernels. This is so that the compiler can determine whether the 'M' (multiply) extension is available and use multiplication instructions, or call library helper functions instead. - A new macro called bli_static_assert() has been added to perform static assertions at compile-time, regardless of the C/C++ dialect of the compiler. The original motivation of this was to ensure that calling RISC-V assembly kernels would not silently truncate arguments of type 'dim_t' or 'inc_t' (so-called "narrowing conversions"). - RISC-V CI tests have been added to Travis CI, using the riscv-gnu-toolchain cross-compiler, and qemu simulator. - Thanks to Lee Killough for collaborating on this commit.

Details: - PR flame#738 -- which moved -fPIC flag insertion responsibilities from common.mk to the subconfigs' individual make_defs.mk files -- was merged shortly before the introduction of new RISC-V subconfigs in flame#693. This commit brings those RISC-V subconfigs up to date with the new -fPIC conventions.

`FC` was used instead of `found_fc`.

Details: - This commit fixes issue flame#746, in which the _access() function (called from within blastest/f2c/open.c) is undeclared when compiling on Windows with clang 16.

Details: - Consolidated INSERT_GENTFUNC_* (and corresponding GENTPROT) macro sets using variadic macros (__VA_ARGS__), which means we no longer need a different INSERT_ macro for each possible number of arguments the macro might take. This change seems reasonable given that variadic macros are a standard C99 feature and widely supported. I took care not to use variadic macros where 0 variadic arguments are expected since that is a non-standard extension. - Added pre-typecast parentheses to arithmetic expressions in printf() statements in bli_thread_range_tlb.c.

Add detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`, and adjust some warning options in `config.mk`. Currently, no specific options for `nvc` have been added in the relevant configurations so it may not be usable without further tweaks.

Details: - Ever since 28b0982, herk, her2k, syrk, and syr2k have been implemented in terms of the gemmt expert API. And since the decision of which induced method to use (1m or native) is made *below* the level of the expert API, executing any of {herk,her2k,syrk,syr2k} results in BLIS checking the enablement status for gemmt. - This commit applies a band-aid of sorts to this issue by modifying bli_l3_ind_oper_get_enable() and bli_l3_ind_oper_set_enable() so that any attempts to query or modify the internal enablement status for herk, her2k, syrk, or syr2k instead does so for gemmt. - This solution isn't perfect since, in theory, the user could enable 1m for, say, herk but then disable it for syrk, and then be confused when herk runs via native execution. But we don't anticipate that users modify 1m enablement at the operation level, and so in practice this solution is likely fine for now.

Details: - Inserted a cache line of padding between various fields of the thrcomm_t and, in the case of the (presently defunct) tree barrier, fields of the barrier_t. This additional padding ensures that these fields, which both serve different purposes when performing a thread barrier, are only accessed when needed (and not just due to their spatial locality with their cache line neighbors). - Added a new cpp macro constant, BLIS_CACHE_LINE_SIZE, to bli_config_macro_defs. This new constant defines the size of a cache line (in bytes) and defaults to 64. - Special thanks to Leick Robinson for discovering this false sharing issue and developing/submitting the patch.

Remove C++ options; fix whitespace; fix spurious merge conflicts

leekillough · 2023-06-08T23:14:42Z

This PR, which merges sifive_x280 with the latest master (which includes my and @angsch's generic RISC-V code), passes all fast tests after using the attached script to build the toolchain. I'm running the full testsuite now.

build-riscv.sh.txt

cd
./build-riscv.sh
source  ~/riscv/rv64imafdv_lp64d_vlen512/riscv.sh

cd blis
./configure sifive_x280
make -j
make checkblis-fast
...
check-blistest.sh: All BLIS tests passed!

The CI is the main thing holding things up. @angsch and @fgvanzee have some experience there.

leekillough · 2023-06-09T22:47:07Z

$ make checkblis
Running test_libblis.x with output redirected to 'output.testsuite'
check-blistest.sh: All BLIS tests passed!

Details: - Rewrote the defintion of bli_thrcomm_tree_barrier() so that it (a) actually worked again, and (b) used atomics instead of a basic C99 spin loop. (Note that the conventional barrier implementation is still enabled by default; the tree barrier must be toggled on manually within the configuration.) - Added an early return to the definition of bli_thrcomm_barrier() in the cases where comm == NULL or comm->n_threads == 1. - Reordered thread-related and thread-dependent header #include directives in blis.h so that the BLIS_TREE_BARRIER and BLIS_TREE_BARRIER_ARITY macros, which would be defined in the target configuration's in the bli_family_*.h file, would be #included prior to the inclusion of the thrcomm_t header that uses them. - Changed the type of barrier_t.count from 'int' to 'dim_t'. - Changed the type of barrier_t.signal from 'volatile int' to 'gint_t'. - Special thanks to Leick Robinson for contributing these changes. - Whitespace changes.

leekillough · 2023-07-09T08:35:21Z

Superceded by #4

Aaron-Hutchinson and others added 14 commits March 29, 2023 11:20

Initial commit for sifive_x280 configuration.

d367fcc

Added mm_algorithm pptx files (bp and pb).

17cd260

Details: - Added two PowerPoint files that contain slides depicting the classic Goto algorithm for matrix multiplication as well as its sister "panel-block" algorithm. These files reside in docs/diagrams.

Added mm_algorithm pdf files (bp and pb).

38fc523

Details: - Added PDF versions of the PowerPoint files added in 17cd260.

Add output.testsuite to .gitignore (flame#736)

3f1432a

Details: - Added `output.testsuite` to .gitignore since it was previously not being matched by `output.testsuite.*`.

Added whitespace and other formatting fixes

cfe0b35

Restore changes from sifive-blis-private#28

c56fddc

Merge pull request flame#1 from sifive/pr/ahutchinson/whitespace_fixes

5b51f1f

Added whitespace and other formatting fixes

Merge pull request flame#2 from sifive/myeh/invscalv_scrub2

acd68f9

Restore changes from sifive-blis-private#28

CREDITS file update.

259f684

Details: - Added attributions associated with commits: - 98d4678 9b1beec: @bartoldeman - 2b05948 059f151: @ct-clmsn - Reordered attirubtion for @decandia50.

CREDITS file update.

593d017

leekillough mentioned this pull request Apr 19, 2023

Add sifive_x280 configuration flame/blis#737

Merged

devinamatthews and others added 5 commits May 5, 2023 14:22

Fix bug in detecting Fortran compiler vendor (flame#745)

6fd9aab

`FC` was used instead of `found_fc`.

Added missing #include <io.h> for Windows. (flame#747)

ef9d3e6

Details: - This commit fixes issue flame#746, in which the _access() function (called from within blastest/f2c/open.c) is undeclared when compiling on Windows with clang 16.

leekillough force-pushed the merge_sifive_x280 branch from 654336e to cebd8ec Compare June 7, 2023 04:58

fgvanzee and others added 4 commits June 7, 2023 16:11

merge

98af78a

Merge remote-tracking branch 'flame/master' into merge_sifive_x280_flame

e607676

Merge branch 'merge_sifive_x280' into merge_sifive_x280_flame

e8fe295

Remove C++ options; fix whitespace; fix spurious merge conflicts

leekillough force-pushed the merge_sifive_x280 branch from cebd8ec to e8fe295 Compare June 8, 2023 23:07

fgvanzee and others added 2 commits June 12, 2023 17:22

Merge remote-tracking branch 'flame/master' into merge_sifive_x280

7ae57cf

leekillough closed this Jul 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge sifive_x280 with RISC-V base architecture support changes #3

Merge sifive_x280 with RISC-V base architecture support changes #3

leekillough commented Apr 12, 2023

leekillough commented Jun 8, 2023

leekillough commented Jun 9, 2023

leekillough commented Jul 9, 2023

Merge sifive_x280 with RISC-V base architecture support changes #3

Merge sifive_x280 with RISC-V base architecture support changes #3

Conversation

leekillough commented Apr 12, 2023

leekillough commented Jun 8, 2023

leekillough commented Jun 9, 2023

leekillough commented Jul 9, 2023