-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sifive_x280 configuration #737
Conversation
@Aaron-Hutchinson @nick-knight @myeh01 awesome work, much appreciated! Regarding steps 1-3 of the testing process, can these product be pre-built? This would really help CI build times... @angsch and @leekillough have been putting similar things here. |
@devinamatthews Yes, absolutely. The GNU toolchain build, in particular, is substantial. But your comment touches on a larger shortcoming of our PR: we have not addressed CI. (We meant to add a comment about this when we submitted the PR.) We are hoping for some guidance from the community on the best way to go about this, since we have little experience with setting up CI, and none with BLIS CI in particular. |
The PR can be merged without it. Once we get at least one RISC-V configuration running reliably in Travis then adding more shouldn't be too difficult. |
I think that we can extend the CI infrastructure that @leekillough and I set up. I am happy to help here. Further, before merging the PR, it would be good to check how the x280 target interacts with the auto configure and ISA detection work that we added. |
Is there a C macro which is always defined when an X280 compiler is being used? There is an auto-detect mechanism which auto-detects RISC-V architecture based on I want to improve it so that it can also detect X280, because with our PR, it will detect X280 as If X280 is detected when |
I'd be happy to upload a tarball of the prebuilt toolchain and QEMU for CI purposes. It looks like there's already a QEMU tarball in the link in your post, so I can try replacing the QEMU portion of our automation script with just downloading and unpacking that tarball. I can also do something similar with the prebuilt toolchain once it's uploaded. I think then translating the script over to CI would be much smoother.
Our automation script uses the upstream toolchain, so I'm not sure if there would be a way to differentiate it from |
Is there anything like |
@devinamatthews: There is no need to use a runtime @devinamatthews, @Aaron-Hutchinson @nick-knight : There are two RISC-V autodetection header files in PR693: bli_riscv_cpuid.h, which returns one of bli_riscv_detect_arch.h, which returns the full detected RISC-V architecture string, such as |
But if two companies make rv64iv chips how do you tell them apart? |
Hence my question in #737 (comment). @angsch and I have created a foundational RISC-V BLIS port which should be adaptable to all RISC-V variants. But we understand that there may be specific BLIS implementations for specific RISC-V implementations. The BLIS RISC-V autodetection mechanism is able to identify base features of the RISC-V implementation, such as whether |
Regarding prebuilding the toolchain for CI, I'm not sure how portable the toolchain that our script creates is. It appears it hardcodes some of the filepaths, and I fear this may cause some issues if I were to create a tarball of my local build and upload it (I have limited knowledge in this area, so correct me if I'm wrong). Would it be possible to have one of the CI machines build the toolchain itself and save the result for future runs? |
That concern is justified. I encountered incompatibilities when I first packaged qemu. To package qemu, I had to replicate the build environment of the CI machine. Further, the build of the toolchain was susceptible to the execution environment. I think that the incompatibilities are solely due to dismatching version of linked libraries such as glibc. I suggest that you use the tarball of qemu and the toolchain that Lee and I use in our PR. That runs successfully on the CI machine. |
I tried this and it is not possible. The Travis runs will hit a timeout. |
Can the timeout be increased for the steps that build the toolchain/QEMU? |
We were recommended to aim at a runtime of below 10 minutes for our rv[32,64]iv target. Note that |
Again please forgive my limited experience in this area. I would think there would be a way to save the toolchain and QEMU builds for use over multiple CI invocations and only build them when they either don't already exist on the machine or the builds become out of date. This way, they're only built once on the CI machine and nearly all CI runs will skip over the build steps for the toolchain and QEMU. |
I think Travis also has Docker images of the CI environment which you can run locally. |
GitHub has a 100 MB limit on tracked files before it requires paid service. Instead of files stored in the distribution, we would need to use released binaries, which have a 2 GB limit. That is the same 2 GB limit for Git Large File Storage in GitHub. Travis has quotas on how much CPU, memory and disk space can be used. Once the credits run out for a billing period, they must be bought with paid-for credits, or wait until the next billing period. See this also. According to @angsch, the dependency on linked libraries makes it a necessity to build the toolchain in an environment that is compatible with the CI machines. So you need to build on a fresh Ubuntu Focal machine / Docker container. |
Chunked tar.gz?
From: "Field G. Van Zee" ***@***.***>
Reply-To: flame/blis ***@***.***>
Date: Tuesday, April 4, 2023 at 2:19 PM
To: flame/blis ***@***.***>
Cc: "Matthews, Devin" ***@***.***>, Mention ***@***.***>
Subject: Re: [flame/blis] Add sifive_x280 configuration (PR #737)
[EXTERNAL SENDER]
@Aaron-Hutchinson<https://github.com/Aaron-Hutchinson>:
GitHub has a 100 MB limit<https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github> on tracked files before it requires paid service.
Instead of files stored in the distribution, we would need to use released binaries, which have a 2 GB limit<https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#distributing-large-binaries>. That is the same 2 GB limit<https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage> for Git Large File Storage in GitHub.
Travis has quotas<https://docs.travis-ci.com/user/billing-faq/> on how much CPU, memory and disk space can be used. Once the credits run out for a billing period, they must be bought with paid-for credits, or wait until the next billing period. See this also<https://www.jeffgeerling.com/blog/2020/travis-cis-new-pricing-plan-threw-wrench-my-open-source-works>.
According to @angsch<https://github.com/angsch>, the dependency on linked libraries makes it a necessity to build the toolchain in an environment that is compatible with the CI machines. So you need to build on a fresh Ubuntu Focal machine / Docker container.
—
Reply to this email directly, view it on GitHub<#737 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABIAZIM57GKN75CBKFPUUALW7RX3HANCNFSM6AAAAAAWMJJWJA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
I'm proposing that we do not track any toolchain/QEMU related files on GitHub, and just use build caching for them. It looks like Travis has built-in functionality for exactly this kind of purpose. See here and here. This line from the first link is particularly relevant:
|
@Aaron-Hutchinson Caching sounds fine to me. I read the links you provided, but I'm still not 100% certain how we would employ caching in this context. (Travis could use a few more examples in their documentation!) |
I agree that Travis' documentation is not very thorough. I've read a little bit about this feature and it's something I'd like to try pursuing. Does anyone know if there is a local version of Travis CI I can use on my own machine to test the results of changes to the |
I believe there is a local version using Docker. At least there was a few years ago. |
I haven't been able to find any official documentation on a local version, and unofficial discussions I've come across are a few years old and don't appear to work any more. It looks like they may have made this an Enterprise feature. |
Caching is not recommended for built toolchains (unless that document is outdated), and used to not be performed for Docker images, but seems to be now. See this and this too. |
config/sifive_x280/make_defs.mk
Outdated
CPPROCFLAGS := | ||
CMISCFLAGS := $(CMISCFLAGS_SIFIVE) -fdata-sections -ffunction-sections \ | ||
-fdiagnostics-color=always -fno-rtti -fno-exceptions \ | ||
-std=gnu++17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this read -std=gnu17
? I think that gnu++17
is a C++-only option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-std=gnu++17
should be removed completely since BLIS already adds std=c99
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. We just copied this from the generic make_defs.mk
without really understanding what was required by the project. IIRC, a bunch of the warning flags are also redundant (generated somewhere else in the build system).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aaron-Hutchinson I think you forgot to update CMISCFLAGS
when you rebased
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder! I did indeed forget. This will be fixed in the upcoming commit.
RISC-V General Toolchain BuilderThe following script is used in-house @tactcomplabs: build-riscv.txt (rename to
To use it, edit the variables at the top of the file, e.g.,
and then run To Build BLISAfter the toolchain is built,
Build issues encountered with this PR(The C++ options have been removed, and merge conflicts eliminated, in sifive#3 .) Your script sets:
while also using:
... which seems to exclude shared libraries, while also specifying options to use them. When using QEMU, our script sets:
... which allows QEMU to work with BLIS shared libraries. When I build my toolchain with tag
Is there a @angsch @nick-knight @Aaron-Hutchinson @devinamatthews @fgvanzee @ct-clmsn |
@Aaron-Hutchinson In order to avoid duplication, I tested the QEMU tarball that Lee and I use. I face the compilation problem with the vector intrinsics, too, so my test experimentally enabled all extension that x280 has. Based on these tests, I am confident that you can use the same QEMU tarball for your CI. The tarball lives in a sibling repo: https://github.com/flame/ci-utils/blob/master/riscv/qemu-riscv-2023.02.25-ubuntu-20.04.tar.gz. |
Thanks for all the feedback, sorry we're slow to respond. Regarding the RISC-V vector intrinsics issue, this name-mangling was introduced recently at the behest of the RISC-V Toolchains SIG, in riscv-non-isa/riscv-c-api-doc#31. It made its way into the vector intrinsics API, version 0.11 (multiple PRs, I won't try to list them all). That API change, in turn, appeared in LLVM 16.0.0. Unfortunately, I don't know the status with GCC. Historically, GCC has lagged LLVM w.r.t. chasing unratified/churning RISC-V specs, so I'm not surprised that LLVM works but GCC does not. On that last point, in case it isn't clear, the RISC-V vector intrinsics API is a community project, sponsored by RISC-V International: We are working towards v1.0 of the API but have not frozen yet. And it looks like we'll miss the GCC 13 window. The task group meets monthly; we'd love your company. If you have questions on GCC support for the latest intrinsics API changes, this is the right community to bring it up with. |
I am willing to do it for this PR, since I have been locally keeping it up to date . |
…code. However, it does not get correct results for complex BLIS routines which use segment loads (or call those that do). The intrinsic types check out and make sense, but it returns wrong answers. It's probably something really simple. For historical reference, see: riscv-non-isa/riscv-c-api-doc#43 flame#737 (comment) https://reviews.llvm.org/D152134 riscv-non-isa/rvv-intrinsic-doc#139 riscv-non-isa/rvv-intrinsic-doc#198 https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/auto-generated/intrinsic_funcs/03_vector_stride_segment_load_store_instructions_zvlsseg.md https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/auto-generated/intrinsic_funcs/04_vector_indexed_segment_load_store_instructions_zvlsseg.md
Added whitespace and other formatting fixes
Restore changes from sifive-blis-private#28
Our team would like to get this PR merged soon. We have some updates coming in shortly with minor changes, such as resolving the merge conflicts and updating the RISC-V intrinsics. What is the best way forward regarding the CI issue? From what I can tell from the comments above this is still unresolved. |
When you have updated the PR, I am happy to test locally if you can reuse the binaries that are used in the current CI pipeline. I am optimistic that the CI suggestions from above still work. |
acd68f9
to
8663e95
Compare
* Updated RISC-V intrinsics to match LLVM 17.0.2
All of the developmental changes we planned to make are now merged into @angsch If you're able and willing to run the CI tests locally, I think the branch should be in a stable place to do so now. Thank you! |
The following should work. I think it makes sense to use the same compiler version for all RISC-V targets, so the compiler version is bumped below for the already existing targets. diff --git a/.travis.yml b/.travis.yml
index 848cb184..bdfafb6b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -86,6 +86,11 @@ matrix:
env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="rv32iv" \
CC=riscv32-unknown-linux-gnu-gcc \
LDFLAGS=-static
+ - os: linux
+ compiler: clang
+ env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="sifive_x280" \
+ CC=clang \
+ LDFLAGS=-static
install:
- if [ "$CC" = "gcc" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-9"; fi
- if [ -n "$PACKAGES" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo apt-get install -y $PACKAGES; fi
@@ -106,6 +111,12 @@ script:
export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-g++;
export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv32 -cpu rv32,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
fi
+- if [ "$CONF" = "sifive_x280" ]; then
+ $DIST_PATH/travis/do_riscv.sh "$CONF";
+ export CC=$DIST_PATH/../toolchain/riscv/bin/clang;
+ export CXX=$DIST_PATH/../toolchain/riscv/bin/clang++;
+ export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=512 -B 0x100000";
+ fi
- $DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF
- pwd
- ls -l
diff --git a/travis/do_riscv.sh b/travis/do_riscv.sh
index a51d3306..9a114b0e 100755
--- a/travis/do_riscv.sh
+++ b/travis/do_riscv.sh
@@ -3,18 +3,21 @@
set -e
set -x
-TAG=2023.02.25
+TAG=2023.10.18
# The prebuilt toolchains only support hardfloat, so we only
# test these for now.
case $1 in
"rv32iv")
- TARBALL=riscv32-glibc-ubuntu-20.04-nightly-${TAG}-nightly.tar.gz
+ TARBALL=riscv32-glibc-ubuntu-20.04-gcc-nightly-${TAG}-nightly.tar.gz
;;
"rv64iv")
- TARBALL=riscv64-glibc-ubuntu-20.04-nightly-${TAG}-nightly.tar.gz
+ TARBALL=riscv64-glibc-ubuntu-20.04-gcc-nightly-${TAG}-nightly.tar.gz
;;
+ "sifive_x280")
+ TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz
*)
+ ;;
exit 1
;;
esac I zipped the patch due to Github's constraints of what can be attached. |
@angsch Looks like CI has failed after applying the patch due to not being able to find the compiler:
Any idea what went wrong? |
;; | ||
"sifive_x280") | ||
TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a QEMU in this tarball file. Is it necessary to get another one using the following commands?
# Once CI upgrades to jammy, the next three lines can be removed.
# The qemu version installed via packages (qemu-user qemu-user-binfmt)
# is sufficient.
TARBALL_QEMU=qemu-riscv-2023.02.25-ubuntu-20.04.tar.gz
wget https://github.com/flame/ci-utils/raw/master/riscv/${TARBALL_QEMU}
tar -xf $TARBALL_QEMU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just need to update TARBALL
to riscv64-glibc-ubuntu-{JAMMY_VER}-gcc-nightly-${TAG}-nightly.tar.gz
if the CI is upgraded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I didn't notice that now both the LLVM and the GNU toolchain include qemu.
Does soft link work?
Could you try using |
I think that a syntax error before that introduces the problem. My mistake, sorry. + "sifive_x280")
+ TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz
+ ;;
*)
exit 1
;;
esac (The In the meanwhile, I will try the qemu builds shipped with the toolchain. |
Thanks for the correction @angsch. Looks like with that fix the PR has passed the CI. |
Thank you everyone for your contributions and engagement on this PR! Does anyone else have any comments before I merge? 🚀 |
Details: - Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V instruction set architecture. The subconfig registers kernels from a correspondingly new kernel set, also named 'sifive_x280'. - Added the aforementioned kernel set, which includes intrinsics- and assembly-based implementations of most level-1v kernels along with level-1f kernels axpy2v dotaxpyv, packm kernels, and level-3 gemm, gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files). - Registered the 'sifive_x280' subconfig as belonging to a singleton family by the same name. - Added an entry to '.travis.yml' to test the new subconfig via qemu. - Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280' subconfig and to reflect updated tarball names. - Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz for their engagement on this commit. - (cherry picked from commit 05388dd) Fixed HPX barrier synchronization (#783) Details: - Fixed hpx barrier synchronization. HPX was hanging on larger cores because blis was using non-hpx synchronization primitives. But when using hpx-runtime only hpx-synchronization primitives should be used. Hence, a C style wrapper hpx_barrier_t is introduced to perform hpx barrier operations. - Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with hpx::barrier on n_threads greater than actual hardware thread count causes synchronization issues making hpx hanging. This can be avoided by using hpx::futures, which are relatively very lightweight, robust and scalable. - (cherry picked from 7a87e57) Fixed bug in sup threshold registration. (#782) Details: - Fixed a bug that resulted in BLIS non-deterministically calling the gemmsup handler, irrespective of the thresholds that are registered via bli_cntx_set_blkszs(). - Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup thresholds (BLIS_[MNK]T blocksizes) wre being set to zero so that no operation ever matched the criteria for gemmsup (unless specific sup thresholds are registered). HOWEVER, these thresholds are set via bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was only coping the thresholds into the gks' cntx_t if the values were strictly positive. Thus, the zero values passed into bli_cntx_set_blkszs() were being ignored and those threshold slots within the gks were left uninitialized. The upshot of this is that the reference gemmsup handler was being called for gemm problems essentially at random (and as it turns out, very rarely the reference gemmsup implementation would encounter a divide-by-zero error). - The problem was fixed by changing bli_blksz_copy_if_pos() so that it copies values that are non-negative (values >= 0 instead of > 0). The function was also renamed to bli_blksz_copy_if_nonneg() - Also needed to standardize use of -1 as the sole value to embed into blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register a value for that slot (and instead let whatever existing values remain). This required updates to the bli_cntx_init_*() functions for bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some of these codes were using 0 instead of -1. - Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and proposing a fix for this issue. - (cherry picked from 8fff1e3) Update zen3 subconfig to support NVHPC compilers. (#779) Details: - Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file. - Minor refactor to accommodate above edit. - CREDITS file update. - (cherry picked from 1e264a4) Fixed brokenness when sba is disabled. (#777) Details: - Previously, disabling the sba via --disable-sba-pools resulted in a segfault due to a sanity-check-triggering abort(). The problem was that the sba, as currently used in the l3 thread decorators, did not yet (fully) support pools being disabled. The solution entailed creating wrapper function, bli_sba_array_elem(), which either calls bli_apool_array_elem() (when sba pools are enabled at configure time) or returns a NULL sba_pool pointer (when sba pools are disabled), and calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note that the NULL pointer returned by bli_sba_array_elem() when the sba pools are disabled does no harm since in that situation the pointer goes unreferenced when acquiring and releasing small blocks. Thanks to John Mather for reporting this bug. - Guarded the bodies of bli_sba_init() and bli_sba_finalize() with #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary to fix the aforementioned bug, but it seems like good practice. - Moved the code in bli_l3_thrinfo_create() that checked that the array* pointer is non-NULL before calling bli_sba_array_elem() (previously bli_apool_array_elem()) into the definition of bli_sba_array_elem(). - Renamed various instances of 'pool' variables and function parameters to 'sba_pool' to emphasize what kind of pool it represents. - Whitespace changes. - (cherry picked from c2099ed) Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778) Details: - Expanded existing BLAS compatibility APIs to provide interfaces to [cz]symv_(), [cz]syr_(). This was easy since those operations were already implemented natively in BLIS; the APIs were previously omitted only because they were not formally part of the BLAS. - Implemented [cz]rot_() by feeding code from LAPACK 3.11 through f2c. - Thanks to James Foster for pointing out that LAPACK contains these additional symbols, which prompted these additions, as well as for testing the [cz]rot_() functions from Julia's test infrastructure. - CREDITS file update. - (cherry picked from 37ca4fd) Fixes to HPC runtime code path. (#773) Details: - Fixed hpx::for_each invocation and replace with hpx::for_loop. The HPX runtime was initialized using hpx::start, but the hpx::for_each function was being called on a non-hpx runtime (i.e standard BLIS runtime - single main thread). To run hpx::for_each on HPX runtime correctly, the code now uses hpx::run_as_hpx_thread(func, args...). - Replaced hpx::for_each with hpx::for_loop, which eliminates use of hpx::util::counting_iterator. - Employ hpx::execution::chunk_size(1) to make sure that a thread resides on a particular core. - Replaced hpx::apply() with updated version hpx::post(). - Initialize tdata->id = 0 in libblis.c to 0, as it is the main thread and is needed for writing results to output file. - By default, if not specified, the HPX runtime uses all N threads/cores available in the system. But, if we want to only specify n_threads out N threads, we use hpx::execution::experimental::num_cores(n_threads). - (cherry picked from a4a6329) Fixed broken link in Multithreading.md. (#774) Details: - Replaced 404'd link in docs/Multithreading.md with an archive from The Wayback Machine. - CREDITS file update. - (cherry picked from c6546c1)
This PR adds a new configuration to BLIS, called
sifive_x280
. This configuration is built for the RISC-V instruction set architecture and is optimized for SiFive's X280 processor. Included are implementations for most level 1, 1f, and 3 kernels, with the level 3gemm
andgemmtrsm
kernels receiving the most attention.Since this configuration targets RISC-V, compiling it and running tests on typical machines is challenging. For convenience, we've written a simple script that aims to make testing this configuration easier. The script can be found here, which has the following flow:
sifive_x280
, builds it, and runsmake check
.Developers for the
sifive_x280
implementation (in alphabetical order):Special thanks to @fgvanzee for their assistance in debugging various issues and helping our team understand the BLIS framework.
We look forward to your feedback and are very excited to join the BLIS community.