
Add sifive_x280 configuration #737

Merged: 11 commits, Nov 3, 2023


Aaron-Hutchinson
Contributor

This PR adds a new configuration to BLIS, called sifive_x280. This configuration is built for the RISC-V instruction set architecture and is optimized for SiFive's X280 processor. Included are implementations for most level 1, 1f, and 3 kernels, with the level 3 gemm and gemmtrsm kernels receiving the most attention.

Since this configuration targets RISC-V, compiling it and running tests on typical machines is challenging. For convenience, we've written a simple script that aims to make testing this configuration easier. The script (linked here) has the following flow:

  • downloads and builds the RISC-V GNU Linux toolchain (just for the C runtime)
  • downloads and builds the RISC-V LLVM Linux toolchain, integrating the C runtime from GNU
  • downloads QEMU and builds the riscv64 Linux usermode emulator
  • downloads BLIS, configures it for sifive_x280, builds it, and runs make check.

Developers for the sifive_x280 implementation (in alphabetical order):

Special thanks to @fgvanzee for their assistance in debugging various issues and helping our team understand the BLIS framework.

We look forward to your feedback and are very excited to join the BLIS community.

@devinamatthews
Member

@Aaron-Hutchinson @nick-knight @myeh01 awesome work, much appreciated! Regarding steps 1-3 of the testing process, can these products be pre-built? This would really help CI build times... @angsch and @leekillough have been putting similar things here.

@nick-knight

@devinamatthews Yes, absolutely. The GNU toolchain build, in particular, is substantial. But your comment touches on a larger shortcoming of our PR: we have not addressed CI. (We meant to add a comment about this when we submitted the PR.) We are hoping for some guidance from the community on the best way to go about this, since we have little experience with setting up CI, and none with BLIS CI in particular.

@devinamatthews
Copy link
Member

The PR can be merged without it. Once we get at least one RISC-V configuration running reliably in Travis then adding more shouldn't be too difficult.

@angsch
Collaborator

angsch commented Mar 30, 2023

> The PR can be merged without it. Once we get at least one RISC-V configuration running reliably in Travis then adding more shouldn't be too difficult.

I think that we can extend the CI infrastructure that @leekillough and I set up. I am happy to help here. Further, before merging the PR, it would be good to check how the x280 target interacts with the auto configure and ISA detection work that we added.

@leekillough
Collaborator

Is there a C macro which is always defined when an X280 compiler is being used?

There is an auto-detect mechanism which auto-detects RISC-V architecture based on __riscv* macros. It is used when ./configure auto is invoked.

I want to improve it so that it can also detect X280, because with our PR, it will detect X280 as rv64iv.

If X280 is detected when configure auto is used, do you want it to choose the sifive_x280 configuration?

@Aaron-Hutchinson
Contributor Author

> Regarding steps 1-3 of the testing process, can these products be pre-built? This would really help CI build times... @angsch and @leekillough have been putting similar things here.

I'd be happy to upload a tarball of the prebuilt toolchain and QEMU for CI purposes. It looks like there's already a QEMU tarball in the link in your post, so I can try replacing the QEMU portion of our automation script with just downloading and unpacking that tarball. I can also do something similar with the prebuilt toolchain once it's uploaded.

I think translating the script to CI would then be much smoother.

> Is there a C macro which is always defined when an X280 compiler is being used?

Our automation script uses the upstream toolchain, so I'm not sure if there would be a way to differentiate it from rv64iv through C preprocessor macros. @nick-knight would be able to say more, but is out-of-office through next week.

@devinamatthews
Member

Is there anything like cpuid on RISC-V?

@leekillough
Collaborator

@devinamatthews: There is no need to use a runtime cpuid on RISC-V, because there are predefined macros in the RISC-V C API. A runtime check would mean cross-compiling a probe and running it on a different host architecture, which requires a simulator inside configure and would be awkward. Fortunately, the RISC-V C API provides preprocessor macros for architecture detection, so $(CC) -E can be used to autodetect the RISC-V architecture.

@devinamatthews, @Aaron-Hutchinson @nick-knight :

There are two RISC-V autodetection header files in PR #693:

bli_riscv_cpuid.h, which returns one of rv32i, rv32iv, rv64i, rv64iv, or generic, depending on which of the four major RISC-V architectures is detected (XLEN=32 or XLEN=64, each with or without the V vector extension). If this autodetection header returns generic, the configure script falls back on the existing BLIS autodetection mechanism.

bli_riscv_detect_arch.h, which returns the full detected RISC-V architecture string, such as rv64imafdcv. The result of this header is used to form the -march= option. On some versions of Clang and GCC, -march=...v must be specified to enable the V vector extension. The BLIS rv32iv and rv64iv configurations force this by defining -DFORCE_RISCV_VECTOR when preprocessing bli_riscv_detect_arch.h, because preprocessing the header with default compiler options would not enable V.
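As a rough illustration of the preprocessor-only approach (not the actual contents of bli_riscv_cpuid.h, which is a C header), the same classification can be sketched in shell from the compiler's predefined-macro dump. The function name riscv_cpuid is hypothetical; the sketch relies only on the RISC-V C API macros __riscv, __riscv_xlen, and __riscv_vector, and on the GCC/Clang options -dM -E for listing predefined macros.

```shell
# Sketch: map a RISC-V compiler's predefined macros to one of the BLIS
# autodetection names (rv32i, rv32iv, rv64i, rv64iv) or "generic".
# riscv_cpuid is a hypothetical name; it reads `$CC -dM -E` output on stdin.
riscv_cpuid() {
  macros=$(cat)

  # __riscv is defined by GCC and Clang for every RISC-V target.
  if ! printf '%s\n' "$macros" | grep -q '^#define __riscv '; then
    echo generic
    return 0
  fi

  # __riscv_xlen gives the base integer register width (32 or 64).
  xlen=$(printf '%s\n' "$macros" |
    sed -n 's/^#define __riscv_xlen \([0-9][0-9]*\)$/\1/p')
  case $xlen in
    32 | 64) ;;
    *) echo generic; return 0 ;;
  esac

  # __riscv_vector is defined only when vector support is enabled via -march.
  if printf '%s\n' "$macros" | grep -q '^#define __riscv_vector '; then
    echo "rv${xlen}iv"
  else
    echo "rv${xlen}i"
  fi
}

# Example (the cross-compiler name is only an illustration):
#   riscv64-unknown-linux-gnu-gcc -march=rv64gcv -dM -E -x c /dev/null | riscv_cpuid
```

Because only the preprocessor runs, this works with a cross-compiler on any host; whether __riscv_vector appears depends on the -march string given to the compiler, which is why the V extension has to be forced for the rv32iv and rv64iv configurations.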

@devinamatthews
Member

But if two companies make rv64iv chips how do you tell them apart?

@leekillough
Collaborator

> But if two companies make rv64iv chips how do you tell them apart?

Hence my question in #737 (comment).

@angsch and I have created a foundational RISC-V BLIS port which should be adaptable to all RISC-V variants. But we understand that there may be specific BLIS implementations for specific RISC-V implementations.

The BLIS RISC-V autodetection mechanism is able to identify base features of the RISC-V implementation, such as whether A, M, V extensions are available, but unless there is a C macro to autodetect x280 or other implementations, the BLIS user will need to specify ./configure sifive_x280 instead of ./configure auto in order to get the most features out of a particular RISC-V implementation.

@Aaron-Hutchinson
Contributor Author

Regarding prebuilding the toolchain for CI, I'm not sure how portable the toolchain that our script creates is. It appears it hardcodes some of the filepaths, and I fear this may cause some issues if I were to create a tarball of my local build and upload it (I have limited knowledge in this area, so correct me if I'm wrong).

Would it be possible to have one of the CI machines build the toolchain itself and save the result for future runs?

@angsch
Collaborator

angsch commented Apr 4, 2023

> Regarding prebuilding the toolchain for CI, I'm not sure how portable the toolchain that our script creates is. It appears it hardcodes some of the filepaths, and I fear this may cause some issues if I were to create a tarball of my local build and upload it (I have limited knowledge in this area, so correct me if I'm wrong).

That concern is justified. I encountered incompatibilities when I first packaged qemu. To package qemu, I had to replicate the build environment of the CI machine. Further, the build of the toolchain was sensitive to the execution environment. I think that the incompatibilities are solely due to mismatched versions of linked libraries such as glibc.

I suggest that you use the tarball of qemu and the toolchain that Lee and I use in our PR. That runs successfully on the CI machine.

@angsch
Collaborator

angsch commented Apr 4, 2023

> Would it be possible to have one of the CI machines build the toolchain itself and save the result for future runs?

I tried this and it is not possible. The Travis runs will hit a timeout.

@Aaron-Hutchinson
Contributor Author

> I tried this and it is not possible. The Travis runs will hit a timeout.

Can the timeout be increased for the steps that build the toolchain/QEMU?

@angsch
Collaborator

angsch commented Apr 4, 2023

> Can the timeout be increased for the steps that build the toolchain/QEMU?

We were advised to aim for a runtime below 10 minutes for our rv[32,64]iv target. Note that make -j does not do the trick. Further, since your CI target will also be triggered when something unrelated is pushed (e.g. a non-RISC-V target), building the toolchain will burn CPU hours.

@Aaron-Hutchinson
Contributor Author

> We were recommended to aim at a runtime of below 10 minutes for our rv[32,64]iv target. Note that make -j does not do the trick. Further, since your CI target will be triggered also when something unrelated is pushed (e.g. a non-RISC-V target), building the toolchain will burn CPU hours.

Again please forgive my limited experience in this area. I would think there would be a way to save the toolchain and QEMU builds for use over multiple CI invocations and only build them when they either don't already exist on the machine or the builds become out of date. This way, they're only built once on the CI machine and nearly all CI runs will skip over the build steps for the toolchain and QEMU.
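A minimal sketch of the build-once logic described above, assuming the CI service persists a directory between runs (passed here as the first argument). The function names ensure_toolchain and build_toolchain are hypothetical, and build_toolchain is only a placeholder for the slow toolchain/QEMU build; this alone would not avoid a timeout on the first, cache-miss run.

```shell
# Placeholder for the slow step (building or unpacking the toolchain and QEMU).
# Hypothetical: a real CI job would run the actual build here.
build_toolchain() {
  mkdir -p "$1/bin"
}

# Reuse a cached toolchain directory if it was built for the requested tag;
# otherwise rebuild it from scratch and mark it valid afterwards.
ensure_toolchain() {
  cache_dir=$1
  tag=$2
  [ -n "$cache_dir" ] || return 1
  stamp="$cache_dir/.built-$tag"

  if [ -f "$stamp" ]; then
    echo "cache hit ($tag)"
    return 0
  fi

  # Missing, or built for a different tag: start from an empty directory.
  echo "cache miss ($tag): building"
  rm -rf "$cache_dir"
  mkdir -p "$cache_dir"
  build_toolchain "$cache_dir" "$tag" || return 1
  # Write the stamp last so an interrupted build is not mistaken for a valid one.
  touch "$stamp"
}
```

Keying the stamp file on the toolchain tag means that bumping the tag invalidates the cache automatically, while unrelated pushes reuse it. Whether Travis's caching is appropriate for a built toolchain at all is a separate question raised later in the thread.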

@devinamatthews
Member

I think Travis also has Docker images of the CI environment which you can run locally.

@leekillough
Collaborator

@Aaron-Hutchinson:

GitHub has a 100 MB limit on tracked files before it requires paid service.

Instead of files stored in the distribution, we would need to use released binaries, which have a 2 GB limit. That is the same 2 GB limit for Git Large File Storage in GitHub.

Travis has quotas on how much CPU, memory and disk space can be used. Once the credits for a billing period run out, more must be bought as paid-for credits, or the project must wait until the next billing period. See this also.

According to @angsch, the dependency on linked libraries makes it a necessity to build the toolchain in an environment that is compatible with the CI machines. So you need to build on a fresh Ubuntu Focal machine / Docker container.

@devinamatthews
Member

devinamatthews commented Apr 4, 2023 via email

@Aaron-Hutchinson
Contributor Author

> @Aaron-Hutchinson:
>
> GitHub has a 100 MB limit on tracked files before it requires paid service.
>
> Instead of files stored in the distribution, we would need to use released binaries, which have a 2 GB limit. That is the same 2 GB limit for Git Large File Storage in GitHub.
>
> Travis has quotas on how much CPU, memory and disk space can be used. Once the credits run out for a billing period, they must be bought with paid-for credits, or wait until the next billing period. See this also.
>
> According to @angsch, the dependency on linked libraries makes it a necessity to build the toolchain in an environment that is compatible with the CI machines. So you need to build on a fresh Ubuntu Focal machine / Docker container.

I'm proposing that we do not track any toolchain/QEMU related files on GitHub, and just use build caching for them. It looks like Travis has built-in functionality for exactly this kind of purpose. See here and here. This line from the first link is particularly relevant:

> Caches lets Travis CI store directories between builds, which is useful for storing dependencies that take longer to compile or download.

@fgvanzee
Member

@Aaron-Hutchinson Caching sounds fine to me. I read the links you provided, but I'm still not 100% certain how we would employ caching in this context. (Travis could use a few more examples in their documentation!)

@Aaron-Hutchinson
Contributor Author

> @Aaron-Hutchinson Caching sounds fine to me. I read the links you provided, but I'm still not 100% certain how we would employ caching in this context. (Travis could use a few more examples in their documentation!)

I agree that Travis' documentation is not very thorough. I've read a little bit about this feature and it's something I'd like to try pursuing.

Does anyone know if there is a local version of Travis CI I can use on my own machine to test the results of changes to the .travis.yml file? The answers I've found from searching around are badly out of date.

@devinamatthews
Member

I believe there is a local version using Docker. At least there was a few years ago.

@Aaron-Hutchinson
Contributor Author

I haven't been able to find any official documentation on a local version, and unofficial discussions I've come across are a few years old and don't appear to work any more. It looks like they may have made this an Enterprise feature.

@leekillough
Collaborator

Caching is not recommended for built toolchains (unless that document is outdated). Docker images used not to be cached, but they seem to be now. See this and this too.

CPPROCFLAGS :=
CMISCFLAGS := $(CMISCFLAGS_SIFIVE) -fdata-sections -ffunction-sections \
-fdiagnostics-color=always -fno-rtti -fno-exceptions \
-std=gnu++17
Collaborator


Should this read -std=gnu17? I think that gnu++17 is a C++-only option.

Member


-std=gnu++17 should be removed completely, since BLIS already adds -std=c99.


Thanks. We just copied this from the generic make_defs.mk without really understanding what was required by the project. IIRC, a bunch of the warning flags are also redundant (generated somewhere else in the build system).

Collaborator


@Aaron-Hutchinson I think you forgot to update CMISCFLAGS when you rebased

Contributor Author


Thanks for the reminder! I did indeed forget. This will be fixed in the upcoming commit.

@leekillough
Collaborator

> Since this configuration targets RISC-V, compiling it and running tests on typical machines is challenging. For convenience, we've written a simple script that aims to make testing this configuration easier. The script can be found here, which has the following flow:
>
>   • downloads and builds the RISC-V GNU Linux toolchain (just for the C runtime)
>   • downloads and builds the RISC-V LLVM Linux toolchain, integrating the C runtime from GNU
>   • downloads QEMU and builds the riscv64 Linux usermode emulator
>   • downloads BLIS, configures it for sifive_x280, builds it, and runs make check.

RISC-V General Toolchain Builder

The following script is used in-house @tactcomplabs:

build-riscv.txt (rename to build-riscv.sh).

  • It supports any valid RISC-V ARCH / ABI / VLEN combination (e.g., rv64imafdcv/lp64d, rv32imaf/ilp32f).
  • It supports GCC and Clang/LLVM (thanks to @cmuellner).
  • It supports QEMU and Spike/PK.
  • It supports specifying the branch/tag/commit of the riscv-gnu-toolchain to use (e.g., master, rvv-next, latest).
  • It clones from the repositories as needed, starting with a clean build each time.
  • It autodetects missing package dependencies on Debian-based platforms.
  • It creates a riscv.sh script to source to set all of the environment variables to cross-compile and run a software package with a simulator.
  • It unsets any environment variables set beforehand which could affect builds.
  • It uses color highlighting and interactive prompts if it's run on a terminal.
  • It profiles the time spent in each Bash function.

To use it, edit the variables at the top of the file, e.g.,

# Variables defining the RISC-V toolchain

# Build parameters
RISCV_ARCH=rv64imafdv
RISCV_ABI=lp64d
RISCV_VLEN=128

# gnu or llvm
COMPILER=gnu

# latest: The most recent tagged release of the RISC-V toolchain
# rvv-next: An experimental RISC-V toolchain branch (stale?)
# master: The latest development branch
# <commitID>
RISCV_GNU_TAG=rvv-next

# qemu or spike
RISCV_SIM=qemu

and then run ./build-riscv.sh or bash ./build-riscv.txt.

To Build BLIS

After the toolchain is built, cd blis and type, e.g.,

source ~/riscv/rv64imafdv_lp64d_vlen128/riscv.sh
./configure rv64iv
make -j
make -j checkblis-fast

Build issues encountered with this PR

(The C++ options have been removed, and the merge conflicts resolved, in sifive#3.)

Your script sets:

TESTSUITE_WRAPPER="$QEMU_PATH -cpu $QEMU_CPU -L $CLANG_CROSS_INSTALL_DIR/sysroot"

while also using:

BLIS_OPTIONS="--prefix=sifive_x280 --disable-shared"

... which seems to exclude shared libraries, while also specifying options to use them.

When using QEMU, our script sets:

export QEMU_LD_PREFIX=$RISCV/sysroot

... which allows QEMU to work with BLIS shared libraries.

When I build my toolchain with tag rvv-next and then attempt to build BLIS with sifive_x280, I get the following error:

Compiling obj/sifive_x280/kernels/sifive_x280/1/bli_addv_sifive_x280_intr/bli_addv_sifive_x280_intr.o ('sifive_x280' CFLAGS for kernels)
In file included from kernels/sifive_x280/1/bli_addv_sifive_x280_intr/bli_addv_sifive_x280_intr.c:40:
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c: In function 'bli_saddv_sifive_x280_intr':
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:38:34: warning: implicit declaration of function '__riscv_vsetvl_e32m8' [-Wimplicit-function-declaration]
   38 | #define VSETVL_(PRECISION, LMUL) __riscv_vsetvl_e##PRECISION##LMUL
      |                                  ^~~~~~~~~~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:39:33: note: in expansion of macro 'VSETVL_'
   39 | #define VSETVL(PRECISION, LMUL) VSETVL_(PRECISION, LMUL)
      |                                 ^~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c:52:21: note: in expansion of macro 'VSETVL'
   52 |         size_t vl = VSETVL(PREC, LMUL)(avl);
      |                     ^~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:43:37: warning: implicit declaration of function '__riscv_vle32_v_f32m8' [-Wimplicit-function-declaration]
   43 | #define VLE_V_F_(PRECISION, LMUL)   __riscv_vle##PRECISION##_v_f##PRECISION##LMUL
      |                                     ^~~~~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:44:36: note: in expansion of macro 'VLE_V_F_'
   44 | #define VLE_V_F(PRECISION, LMUL)   VLE_V_F_(PRECISION, LMUL)
      |                                    ^~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c:56:20: note: in expansion of macro 'VLE_V_F'
   56 |             xvec = VLE_V_F(PREC, LMUL) (x, vl);
      |                    ^~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:43:37: error: incompatible types when assigning to type 'vfloat32m8_t' from type 'int'
   43 | #define VLE_V_F_(PRECISION, LMUL)   __riscv_vle##PRECISION##_v_f##PRECISION##LMUL
      |                                     ^~~~~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:44:36: note: in expansion of macro 'VLE_V_F_'
   44 | #define VLE_V_F(PRECISION, LMUL)   VLE_V_F_(PRECISION, LMUL)
      |                                    ^~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c:56:20: note: in expansion of macro 'VLE_V_F'
   56 |             xvec = VLE_V_F(PREC, LMUL) (x, vl);
      |                    ^~~~~~~
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:696: obj/sifive_x280/kernels/sifive_x280/1/bli_addv_sifive_x280_intr/bli_addv_sifive_x280_intr.o] Error 1

Is there a rvv-next riscv-gnu-toolchain configure option which needs to be specified, in order to enable the vector intrinsics?

@angsch @nick-knight @Aaron-Hutchinson @devinamatthews @fgvanzee @ct-clmsn

@angsch
Collaborator

angsch commented Apr 19, 2023

@Aaron-Hutchinson In order to avoid duplication, I tested the QEMU tarball that Lee and I use. I face the compilation problem with the vector intrinsics, too, so my test experimentally enabled all extensions that the X280 has. Based on these tests, I am confident that you can use the same QEMU tarball for your CI. The tarball lives in a sibling repo: https://github.com/flame/ci-utils/blob/master/riscv/qemu-riscv-2023.02.25-ubuntu-20.04.tar.gz.

@nick-knight

nick-knight commented Apr 19, 2023

Thanks for all the feedback, sorry we're slow to respond.

Regarding the RISC-V vector intrinsics issue, this name-mangling was introduced recently at the behest of the RISC-V Toolchains SIG, in riscv-non-isa/riscv-c-api-doc#31. It made its way into the vector intrinsics API, version 0.11 (multiple PRs, I won't try to list them all). That API change, in turn, appeared in LLVM 16.0.0. Unfortunately, I don't know the status with GCC. Historically, GCC has lagged LLVM w.r.t. chasing unratified/churning RISC-V specs, so I'm not surprised that LLVM works but GCC does not.

On that last point, in case it isn't clear, the RISC-V vector intrinsics API is a community project, sponsored by RISC-V International:

We are working towards v1.0 of the API, but it has not been frozen yet, and it looks like we'll miss the GCC 13 window. The task group meets monthly; we'd love your company. If you have questions about GCC support for the latest intrinsics API changes, this is the right community to bring them up with.

@leekillough
Collaborator

Ah yes, the RVV intrinsics API is still not frozen; we should be prepared for churn.

I am willing to make the intrinsics update for this PR, since I have been keeping it up to date locally.

@Aaron-Hutchinson
Contributor Author

Our team would like to get this PR merged soon. We have some updates coming in shortly with minor changes, such as resolving the merge conflicts and updating the RISC-V intrinsics.

What is the best way forward regarding the CI issue? From what I can tell from the comments above this is still unresolved.

@angsch
Collaborator

angsch commented Oct 12, 2023

> What is the best way forward regarding the CI issue? From what I can tell from the comments above this is still unresolved.

When you have updated the PR, I am happy to test locally whether you can reuse the binaries that are used in the current CI pipeline. I am optimistic that the CI suggestions from above still work.

@Aaron-Hutchinson
Contributor Author

All of the developmental changes we planned to make are now merged into add_sifive_x280, and the RISC-V intrinsic updates and merge conflicts have been addressed. I believe our team is happy with the state of the branch.

@angsch If you're able and willing to run the CI tests locally, I think the branch should be in a stable place to do so now. Thank you!

@angsch
Collaborator

angsch commented Oct 19, 2023

The following should work. I think it makes sense to use the same compiler version for all RISC-V targets, so the compiler version is bumped below for the already existing targets.

diff --git a/.travis.yml b/.travis.yml
index 848cb184..bdfafb6b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -86,6 +86,11 @@ matrix:
     env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="rv32iv" \
       CC=riscv32-unknown-linux-gnu-gcc \
       LDFLAGS=-static
+  - os: linux
+    compiler: clang
+    env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="sifive_x280" \
+      CC=clang \
+      LDFLAGS=-static
 install:
 - if [ "$CC" = "gcc"  ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-9"; fi
 - if [ -n "$PACKAGES" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo apt-get install -y $PACKAGES; fi
@@ -106,6 +111,12 @@ script:
     export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-g++;
     export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv32 -cpu rv32,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
   fi
+- if [ "$CONF" = "sifive_x280" ]; then
+    $DIST_PATH/travis/do_riscv.sh "$CONF";
+    export CC=$DIST_PATH/../toolchain/riscv/bin/clang;
+    export CXX=$DIST_PATH/../toolchain/riscv/bin/clang++;
+    export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=512 -B 0x100000";
+  fi
 - $DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF
 - pwd
 - ls -l
diff --git a/travis/do_riscv.sh b/travis/do_riscv.sh
index a51d3306..9a114b0e 100755
--- a/travis/do_riscv.sh
+++ b/travis/do_riscv.sh
@@ -3,18 +3,21 @@
 set -e
 set -x
 
-TAG=2023.02.25
+TAG=2023.10.18
 
 # The prebuilt toolchains only support hardfloat, so we only
 # test these for now.
 case $1 in
 	"rv32iv")
-	TARBALL=riscv32-glibc-ubuntu-20.04-nightly-${TAG}-nightly.tar.gz
+	TARBALL=riscv32-glibc-ubuntu-20.04-gcc-nightly-${TAG}-nightly.tar.gz
 	;;
 	"rv64iv")
-	TARBALL=riscv64-glibc-ubuntu-20.04-nightly-${TAG}-nightly.tar.gz
+	TARBALL=riscv64-glibc-ubuntu-20.04-gcc-nightly-${TAG}-nightly.tar.gz
 	;;
+	"sifive_x280")
+	TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz
 	*)
+	;;
 	exit 1
 	;;
 esac

I zipped the patch because of GitHub's constraints on what can be attached.
0001-Add-sifive_x280-to-CI.zip

@Aaron-Hutchinson
Contributor Author

Thanks @angsch. I've opened a PR here to apply the CI patch and update the make_defs.mk.

@Aaron-Hutchinson
Contributor Author

@angsch Looks like CI has failed after applying the patch due to not being able to find the compiler:

configure: user specified a C compiler via CC (./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc).
configure: *** Could not find the C compiler specified via CC ('./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc').
configure: *** A working C compiler is required. Please set CC
configure: *** to a C compiler that exists (or unset CC).
The command "$DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF" exited with 1.

Any idea what went wrong?

;;
"sifive_x280")
TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz


We already have a QEMU in this tarball file. Is it necessary to get another one using the following commands?

# Once CI upgrades to jammy, the next three lines can be removed.
# The qemu version installed via packages (qemu-user qemu-user-binfmt)
# is sufficient.
TARBALL_QEMU=qemu-riscv-2023.02.25-ubuntu-20.04.tar.gz
wget https://github.com/flame/ci-utils/raw/master/riscv/${TARBALL_QEMU}
tar -xf $TARBALL_QEMU


We just need to update TARBALL to riscv64-glibc-ubuntu-{JAMMY_VER}-gcc-nightly-${TAG}-nightly.tar.gz if the CI is upgraded.

Collaborator


Good point, I didn't notice that now both the LLVM and the GNU toolchain include qemu.

@alexsifivetw

> @angsch Looks like CI has failed after applying the patch due to not being able to find the compiler:
>
> configure: user specified a C compiler via CC (./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc).
> configure: *** Could not find the C compiler specified via CC ('./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc').
> configure: *** A working C compiler is required. Please set CC
> configure: *** to a C compiler that exists (or unset CC).
> The command "$DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF" exited with 1.
>
> Any idea what went wrong?

Does a symbolic link work?

ln -s -f /your/path/to/clang /usr/bin/clang
CC=clang

Could you try setting the CC environment variable to an absolute path?

@angsch
Collaborator

angsch commented Nov 1, 2023

I think that a syntax error before that introduces the problem. My mistake, sorry.
Can we try do_riscv.sh with

+	"sifive_x280")
+	TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz
+	;;
 	 *)
 	exit 1
 	;;
 esac

(In my patch, the ;; and *) lines before exit were swapped.)

In the meantime, I will try the qemu builds shipped with the toolchain.
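For reference, a sketch of the corrected mapping as a standalone function; tarball_for is a hypothetical name, and the tarball names are copied from the patch above. In a shell case statement every branch must end with ;; before the next pattern, so a sifive_x280 branch that runs straight into *) is a syntax error, and the shell aborts the script before it reaches the download step.

```shell
# Sketch of the corrected case statement from travis/do_riscv.sh, wrapped in
# a function so the configuration-to-tarball mapping can be checked in isolation.
tarball_for() {
  conf=$1
  tag=$2
  case $conf in
    "rv32iv")
      echo "riscv32-glibc-ubuntu-20.04-gcc-nightly-${tag}-nightly.tar.gz"
      ;;
    "rv64iv")
      echo "riscv64-glibc-ubuntu-20.04-gcc-nightly-${tag}-nightly.tar.gz"
      ;;
    "sifive_x280")
      echo "riscv64-glibc-ubuntu-20.04-llvm-nightly-${tag}-nightly.tar.gz"
      ;;  # this terminator is what the original patch was missing
    *)
      # Unknown configuration; do_riscv.sh itself uses `exit 1` here.
      return 1
      ;;
  esac
}
```

Inside the script, the result could then be assigned with something like TARBALL=$(tarball_for "$1" "$TAG") || exit 1.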

@Aaron-Hutchinson
Contributor Author

Thanks for the correction @angsch. Looks like with that fix the PR has passed the CI.

@fgvanzee
Member

fgvanzee commented Nov 2, 2023

Thank you everyone for your contributions and engagement on this PR!

Does anyone else have any comments before I merge? 🚀

@fgvanzee merged commit 05388dd into flame:master Nov 3, 2023
1 check passed
fgvanzee added a commit that referenced this pull request May 29, 2024
Details:
- Added a new 'sifive_x280' subconfiguration for SiFive's X280 RISC-V
  processor. The subconfig registers kernels from a correspondingly new
  kernel set, also named 'sifive_x280'.
- Added the aforementioned kernel set, which includes intrinsics- and
  assembly-based implementations of most level-1v kernels along with
  level-1f kernels axpy2v and dotaxpyv, packm kernels, and level-3 gemm,
  gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files).
- Registered the 'sifive_x280' subconfig as belonging to a singleton
  family by the same name.
- Added an entry to '.travis.yml' to test the new subconfig via qemu.
- Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280'
  subconfig and to reflect updated tarball names.
- Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz
  for their engagement on this commit.
- (cherry picked from commit 05388dd)

Fixed HPX barrier synchronization (#783)

Details:
- Fixed HPX barrier synchronization. HPX was hanging on larger core
  counts because BLIS was using non-HPX synchronization primitives, but
  when the HPX runtime is used, only HPX synchronization primitives
  should be used. Hence, a C-style wrapper, hpx_barrier_t, is introduced
  to perform HPX barrier operations.
- Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with
  hpx::barrier when n_threads exceeds the actual hardware thread count
  causes synchronization issues that make HPX hang. This can be avoided
  by using hpx::futures, which are relatively lightweight, robust, and
  scalable.
- (cherry picked from 7a87e57)

Fixed bug in sup threshold registration. (#782)

Details:
- Fixed a bug that resulted in BLIS non-deterministically calling the
  gemmsup handler, irrespective of the thresholds that are registered
  via bli_cntx_set_blkszs().
- Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup
  thresholds (BLIS_[MNK]T blocksizes) were being set to zero so that no
  operation ever matched the criteria for gemmsup (unless specific sup
  thresholds are registered). HOWEVER, these thresholds are set via
  bli_cntx_set_blkszs(), which calls bli_blksz_copy_if_pos(), which was
  only copying the thresholds into the gks' cntx_t if the values were
  strictly positive. Thus, the zero values passed into
  bli_cntx_set_blkszs() were being ignored and those threshold slots
  within the gks were left uninitialized. The upshot of this is that the
  reference gemmsup handler was being called for gemm problems
  essentially at random (and, as it turns out, the reference gemmsup
  implementation would very rarely encounter a divide-by-zero error).
- The problem was fixed by changing bli_blksz_copy_if_pos() so that it
  copies values that are non-negative (values >= 0 instead of > 0). The
  function was also renamed to bli_blksz_copy_if_nonneg().
- Also needed to standardize use of -1 as the sole value to embed into
  blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register
  a value for that slot (and instead let whatever existing values
  remain). This required updates to the bli_cntx_init_*() functions for
  bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some
  of these codes were using 0 instead of -1.
- Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and
  proposing a fix for this issue.
- (cherry picked from 8fff1e3)
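
The difference between the old and new copy rules can be shown with a simplified blocksize struct. The names blksz_sketch_t, copy_if_pos(), and copy_if_nonneg() are hypothetical stand-ins, not the BLIS definitions. The value 99 in the destination represents whatever uninitialized contents a slot happened to hold.

```c
#include <stddef.h>

#define N_SLOTS 4  /* hypothetical: one slot per datatype (s, d, c, z) */

/* Simplified, hypothetical stand-in for BLIS's blksz_t. */
typedef struct { long v[N_SLOTS]; } blksz_sketch_t;

/* Old rule: copy only strictly positive values. A default of 0 is silently
   dropped, and the destination slot keeps whatever it held before. */
static void copy_if_pos(const blksz_sketch_t *src, blksz_sketch_t *dst)
{
    for (size_t i = 0; i < N_SLOTS; ++i)
        if (src->v[i] > 0) dst->v[i] = src->v[i];
}

/* Fixed rule: 0 is a legitimate value (e.g. "never use sup"), and -1 is
   the sole sentinel meaning "leave the existing value alone". */
static void copy_if_nonneg(const blksz_sketch_t *src, blksz_sketch_t *dst)
{
    for (size_t i = 0; i < N_SLOTS; ++i)
        if (src->v[i] >= 0) dst->v[i] = src->v[i];
}
```

With a source of { 0, -1, 5, 0 }, the old rule leaves both zero slots holding stale data. The new rule writes the zeros and skips only the -1 slot.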

Update zen3 subconfig to support NVHPC compilers. (#779)

Details:
- Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file.
- Minor refactor to accommodate above edit.
- CREDITS file update.
- (cherry picked from 1e264a4)

Fixed brokenness when sba is disabled. (#777)

Details:
- Previously, disabling the sba via --disable-sba-pools resulted in a
  segfault due to a sanity-check-triggering abort(). The problem was
  that the sba, as currently used in the l3 thread decorators, did not
  yet (fully) support pools being disabled. The solution entailed
  creating a wrapper function, bli_sba_array_elem(), which either calls
  bli_apool_array_elem() (when sba pools are enabled at configure time)
  or returns a NULL sba_pool pointer (when sba pools are disabled), and
  calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note
  that the NULL pointer returned by bli_sba_array_elem() when the sba
  pools are disabled does no harm since in that situation the pointer
  goes unreferenced when acquiring and releasing small blocks. Thanks to
  John Mather for reporting this bug.
- Guarded the bodies of bli_sba_init() and bli_sba_finalize() with
  #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary
  to fix the aforementioned bug, but it seems like good practice.
- Moved the code in bli_l3_thrinfo_create() that checked that the array*
  pointer is non-NULL before calling bli_sba_array_elem() (previously
  bli_apool_array_elem()) into the definition of bli_sba_array_elem().
- Renamed various instances of 'pool' variables and function parameters
  to 'sba_pool' to emphasize what kind of pool it represents.
- Whitespace changes.
- (cherry picked from c2099ed)
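
The shape of the wrapper can be sketched under simplified assumptions. The names pool_sketch_t, array_sketch_t, sba_array_elem(), and ENABLE_SBA_POOLS_SKETCH are hypothetical stand-ins for BLIS's pool_t, array_t, bli_sba_array_elem(), and BLIS_ENABLE_SBA_POOLS. The real function delegates to bli_apool_array_elem() rather than indexing an array directly.

```c
#include <stddef.h>

/* Hypothetical stand-ins for BLIS's pool_t and array_t. */
typedef struct { int id; } pool_sketch_t;
typedef struct { pool_sketch_t *elems; size_t n; } array_sketch_t;

static pool_sketch_t *sba_array_elem(size_t index, array_sketch_t *array)
{
#ifdef ENABLE_SBA_POOLS_SKETCH
    /* Pools enabled: the non-NULL check formerly done at the call site
       now lives inside the wrapper. */
    if (array == NULL) return NULL;
    return &array->elems[index];
#else
    /* Pools disabled: there is no pool, so return NULL. In this
       configuration the acquire/release paths never dereference it. */
    (void)index;
    (void)array;
    return NULL;
#endif
}
```

Because callers always go through the wrapper, the configure-time choice is handled in one place instead of at every call site in the thread decorators.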

Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)

Details:
- Expanded existing BLAS compatibility APIs to provide interfaces to
  [cz]symv_(), [cz]syr_(). This was easy since those operations were
  already implemented natively in BLIS; the APIs were previously
  omitted only because they were not formally part of the BLAS.
- Implemented [cz]rot_() by feeding code from LAPACK 3.11 through
  f2c.
- Thanks to James Foster for pointing out that LAPACK contains these
  additional symbols, which prompted these additions, as well as for
  testing the [cz]rot_() functions from Julia's test infrastructure.
- CREDITS file update.
- (cherry picked from 37ca4fd)
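
The rotation computed by LAPACK's zrot uses a real cosine c and a complex sine s. Unlike the BLAS zdrot, both c and s are not required to be real. The following unit-stride sketch shows the recurrence. The name zrot_sketch() is hypothetical, and this is not the f2c-generated code that BLIS ships. The LAPACK routine also handles arbitrary increments.

```c
#include <complex.h>
#include <stddef.h>

/* Plane rotation with real c and complex s, following the LAPACK zrot
   recurrence (unit stride only in this sketch):
     x[i] := c*x[i] + s*y[i]
     y[i] := c*y[i] - conj(s)*x[i]   (using the old value of x[i]) */
static void zrot_sketch(size_t n, double complex *x, double complex *y,
                        double c, double complex s)
{
    for (size_t i = 0; i < n; ++i) {
        double complex tmp = c * x[i] + s * y[i];
        y[i] = c * y[i] - conj(s) * x[i];
        x[i] = tmp;
    }
}
```

For example, with c = 0 and s = 1, the rotation maps x to y and y to -x. This gives a quick sanity check when comparing against another implementation.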

Fixes to HPX runtime code path. (#773)

Details:
- Fixed the hpx::for_each invocation and replaced it with hpx::for_loop.
  The HPX runtime was initialized using hpx::start, but hpx::for_each was
  being called on a non-HPX runtime (i.e., the standard BLIS runtime with
  a single main thread). To run hpx::for_each on the HPX runtime
  correctly, the code now uses hpx::run_as_hpx_thread(func, args...).
- Replaced hpx::for_each with hpx::for_loop, which eliminates use of
  hpx::util::counting_iterator.
- Employed hpx::execution::chunk_size(1) to make sure that a thread
  resides on a particular core.
- Replaced hpx::apply() with its updated counterpart, hpx::post().
- Initialized tdata->id to 0 in libblis.c, since it is the main thread
  and its id is needed for writing results to the output file.
- By default, the HPX runtime uses all N threads/cores available in the
  system. To use only n_threads of those N threads, the code uses
  hpx::execution::experimental::num_cores(n_threads).
- (cherry picked from a4a6329)

Fixed broken link in Multithreading.md. (#774)

Details:
- Replaced 404'd link in docs/Multithreading.md with an archive from
  The Wayback Machine.
- CREDITS file update.
- (cherry picked from c6546c1)