forked from flame/blis
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge sifive_x280 with RISC-V base architecture support changes #3
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Move -fPIC insertion to subconfigs' make_defs.mk. Details: - Previously, common.mk was appending -fPIC to the CPICFLAGS variables set within the various subconfigurations' make_defs.mk files. This seemed somewhat unintuitive, and so now the -fPIC flag is assigned to the various subconfigs' CPICFLAGS variables in the respective make_defs.mk files. - This also commit changes the logic in common.mk so that instead of appending, the variable is overwritten, but now *only* in the case of Windows (since apparently -fPIC needs to be omitted there). Thanks to Nick Knight for catching and reporting this weirdness.
Details: - Added two PowerPoint files that contain slides depicting the classic Goto algorithm for matrix multiplication as well as its sister "panel-block" algorithm. These files reside in docs/diagrams.
Details: - Added PDF versions of the PowerPoint files added in 17cd260.
Details: - Added `output.testsuite` to .gitignore since it was previously not being matched by `output.testsuite.*`.
Details: - Implemented a new configure option, --disable-tls, which allows the user to optionally disable the use of thread-local storage qualifiers on static variables in BLIS. This option will rarely be needed, but in some situations may allow BLIS to compile when TLS is unavailable. Thanks to Nick Knight for suggesting this option. - Unlike the --disable-system option, --disable-tls does not forcibly disable threading. Instead, warnings of the possible consequences of using threading with TLS disabled are added to: - the output of './configure --help'; - the output of 'configure' the --disable-tls option is parsed; - the informational header output by the testsuite. Thanks to Minh Quan Ho for suggesting these warnings. - Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is defined to nothing when BLIS_ENABLE_TLS is not defined. - Defined bli_info_get_enable_tls(), which returns whether the cpp macro BLIS_ENABLE_TLS was defined. - Edited --disable-system configure status output for clarity. - Whitespace updates.
Added whitespace and other formatting fixes
Restore changes from sifive-blis-private#28
Details: - Added attributions associated with commits: - 98d4678 9b1beec: @bartoldeman - 2b05948 059f151: @ct-clmsn - Reordered attirubtion for @decandia50.
Details: - There are four RISC-V base configurations: 'rv32i', 'rv32iv', 'rv64i', and 'rv64iv', namely the 32-bit and 64-bit implementations with and without the 'V' vector extension. Additional extensions such as 'M' (multiplication), 'A' (atomics), 'F' ('float' hardware support), 'D' ('double' hardware support), and 'C' (compressed-length instructions), are automatically used when available. If they are not available, then software equivalents (e.g., softfloat and -latomic) are used. - './configure auto' can be invoked on a RISC-V build platform, and will automatically detect RISC-V CPU extensions through the RISC-V C API: https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md - The assembly kernels assume the presence of the vector extension RVV 1.0. - It is possible to build 'rv[32,64]iv' for any value of VLEN. However, if VLEN < 128, the targets will fall back to the generic kernels and blocksizes. - The vector microkernels are vector-length agnostic and work with every VLEN >=128, but are expected to work best with smaller vector lengths, i.e., VLEN <= 512. - The assembly kernels cover column major storage (rs_c == 1). - The blocksizes aim at being a good generic choice for out-of-order cores. They are not tuned to a specific RISC-V HPC core. - The vector kernels have been tested using vlen={128,256,512}. - The single- and double-precision assembly code routines for 'sgemm' and 'dgemm', or for 'cgemm' and 'zgemm', are combined in their RISC-V vector assembly source code, and are differentiated only with macros. - The XLEN=32 and XLEN=64 versions of the RISC-V assembly code are identical, except that callee-saved registers are saved and restored differently. There are RISC-V assembly code #include files for handling the saving and restoring of callee-saved registers, and they are future-proof if ever XLEN=128. - Multiplications, such as computing array strides and offsets, are performed in C, and later passed to the RISC-V assembly kernels. This is so that the compiler can determine whether the 'M' (multiply) extension is available and use multiplication instructions, or call library helper functions instead. - A new macro called bli_static_assert() has been added to perform static assertions at compile-time, regardless of the C/C++ dialect of the compiler. The original motivation of this was to ensure that calling RISC-V assembly kernels would not silently truncate arguments of type 'dim_t' or 'inc_t' (so-called "narrowing conversions"). - RISC-V CI tests have been added to Travis CI, using the riscv-gnu-toolchain cross-compiler, and qemu simulator. - Thanks to Lee Killough for collaborating on this commit.
Details: - PR flame#738 -- which moved -fPIC flag insertion responsibilities from common.mk to the subconfigs' individual make_defs.mk files -- was merged shortly before the introduction of new RISC-V subconfigs in flame#693. This commit brings those RISC-V subconfigs up to date with the new -fPIC conventions.
`FC` was used instead of `found_fc`.
Details: - This commit fixes issue flame#746, in which the _access() function (called from within blastest/f2c/open.c) is undeclared when compiling on Windows with clang 16.
Details: - Consolidated INSERT_GENTFUNC_* (and corresponding GENTPROT) macro sets using variadic macros (__VA_ARGS__), which means we no longer need a different INSERT_ macro for each possible number of arguments the macro might take. This change seems reasonable given that variadic macros are a standard C99 feature and widely supported. I took care not to use variadic macros where 0 variadic arguments are expected since that is a non-standard extension. - Added pre-typecast parentheses to arithmetic expressions in printf() statements in bli_thread_range_tlb.c.
Add detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`, and adjust some warning options in `config.mk`. Currently, no specific options for `nvc` have been added in the relevant configurations so it may not be usable without further tweaks.
Details: - Ever since 28b0982, herk, her2k, syrk, and syr2k have been implemented in terms of the gemmt expert API. And since the decision of which induced method to use (1m or native) is made *below* the level of the expert API, executing any of {herk,her2k,syrk,syr2k} results in BLIS checking the enablement status for gemmt. - This commit applies a band-aid of sorts to this issue by modifying bli_l3_ind_oper_get_enable() and bli_l3_ind_oper_set_enable() so that any attempts to query or modify the internal enablement status for herk, her2k, syrk, or syr2k instead does so for gemmt. - This solution isn't perfect since, in theory, the user could enable 1m for, say, herk but then disable it for syrk, and then be confused when herk runs via native execution. But we don't anticipate that users modify 1m enablement at the operation level, and so in practice this solution is likely fine for now.
leekillough
force-pushed
the
merge_sifive_x280
branch
from
June 7, 2023 04:58
654336e
to
cebd8ec
Compare
Details: - Inserted a cache line of padding between various fields of the thrcomm_t and, in the case of the (presently defunct) tree barrier, fields of the barrier_t. This additional padding ensures that these fields, which both serve different purposes when performing a thread barrier, are only accessed when needed (and not just due to their spatial locality with their cache line neighbors). - Added a new cpp macro constant, BLIS_CACHE_LINE_SIZE, to bli_config_macro_defs. This new constant defines the size of a cache line (in bytes) and defaults to 64. - Special thanks to Leick Robinson for discovering this false sharing issue and developing/submitting the patch.
Remove C++ options; fix whitespace; fix spurious merge conflicts
leekillough
force-pushed
the
merge_sifive_x280
branch
from
June 8, 2023 23:07
cebd8ec
to
e8fe295
Compare
This PR, which merges
The CI is the main thing holding things up. @angsch and @fgvanzee have some experience there. |
|
Details: - Rewrote the defintion of bli_thrcomm_tree_barrier() so that it (a) actually worked again, and (b) used atomics instead of a basic C99 spin loop. (Note that the conventional barrier implementation is still enabled by default; the tree barrier must be toggled on manually within the configuration.) - Added an early return to the definition of bli_thrcomm_barrier() in the cases where comm == NULL or comm->n_threads == 1. - Reordered thread-related and thread-dependent header #include directives in blis.h so that the BLIS_TREE_BARRIER and BLIS_TREE_BARRIER_ARITY macros, which would be defined in the target configuration's in the bli_family_*.h file, would be #included prior to the inclusion of the thrcomm_t header that uses them. - Changed the type of barrier_t.count from 'int' to 'dim_t'. - Changed the type of barrier_t.signal from 'volatile int' to 'gint_t'. - Special thanks to Leick Robinson for contributing these changes. - Whitespace changes.
Superceded by #4 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This merges flame#693, flame#735, flame#736, flame#738 and flame#740 into flame#737, and resolves merge conflicts, which are mostly caused by
#if/#endif
scopes forsifive_x280
andrv32/rv32i/rv64/rv64i
overlapping.