PULP DSP: Digital Signal Processing on Parallel Ultra Low Power Platform

Introduction

This repository contains DSP functions for PULP platform.

It contains the kernel functions for different ISA extensions of RISC-V being developed for PULP platforms (RV32IM, XPULPV2, etc.).

Currently it's being developed and tested on Mr.Wolf (fabric controller (FC) with IBEX (previous zero-riscy), i.e. RV32IM, and cluster with 8 RISCY cores, i.e. RV32IM with XPULPV2 extentions.).

Additionally, it supports the pulp-open and can be used with GWT GAP8.

Some of the functions are very optimized, while others are WIP. Contributions are welcome!

Structure of the repository

This repository contains:

src folder contains the source codes. In the folder you can find subfolders with different kinds of functions (e.g. BasicMathFunctions, StatisticsFunctions, etc.). In each subfolder you find the glue codes and a folder called kernel which contains the kernels for different ISA extensions. The glue codes are used for namely "gluing" the kernels, for example for checking the ISA extension of the calling platform (in the case of Mr. Wolf, it checks if it's FC or CLUSTER calling the function, in case of FC the kernel function for RV32IM is called, otherwise the kernel function for XPULPV2 extension is called. The glue code can be changed and adapted to the specific chip.
include folder with necessary header files. Especially the main header file plp_math.h has to be included in the codes which want to use this library. Moreover, the rtos_hal.h provides an hardware abstraction layer (HAL) for interfacing both the 'old' pulp-sdk (v1 branch, which supports Mr. Wolf and older chips) and the 'new' pulp-sdk (main branch, which supports newer chips and GAP8). Note: in the same header file it's possible to define macros (e.g. LOOPUNROLL if you want to take into consideration the option of unrolling or not unrolling the loops).
Makefile for compiling the library. Add your glue codes and kernel functions to be compiled.
test folder contains the testing setup used during the development of the library. For more details please read the README file in the folder.
docs folder contains the configurations for building documentation for the library.

Installation and usage

First of all, in order to install and use the library, you have to install the pulp-sdk. You can follow the instructions here.

Note that for Mr. Wolf and older chips you need to use the 'old' sdk on the v1 branch. While for newer chips and GAP8 use the 'new' sdk on the main branch.

Configure the sdk according to the chip and the platform you want to use (for example with the 'old' sdk source configs/wolfe.sh and source configs/platform-board.sh). Do not forget to source the sourceme.sh everytime you open a new terminal to set up the environmental variables (e.g. PULP_SDK_HOME) needed to run pulp projects. For more updated instructions, please refer to the pulp-sdk page.

Once you are done with the pulp-sdk setup, you can clone this repository, enter the pulp-dsp folder.

With the pulp-sdk on the v1 branch

To compile and install the library, do

make clean header all install

To use the library add PULP_LDFLAGS += -lplpdsp in the Makefile of your project and don't forget to include the necessary header files, e.g., plp_math.h, in your codes. Link also the math library using -lm.

If you add or modify the source codes and want to rebuild the library without recompiling unmodified files, do

make header build install

With the pulp-sdk on the main branch

You need to enable the PMSIS mode:

export PULP_RTOS=pmsis

To compile and install the library, do

make build-lib
make install-lib

Documentation

The documentation is built from the latest master and hosted at github pages: https://pulp-platform.github.io/pulp-dsp, using MkDocs-Material and Doxybook2.

You can also generate the reference manual by yourself by going to the docs folder and doing

doxygen Doxyfile

It creates the reference manual and you can browse it by opening html/index.html using a browser.

To add documentations use @defgroup, @ingroup, @addtogroup, etc. Please refer to plp_math.h and the source codes src/BasicMathFunctions/plp_dot_prod_i32.c and src/BasicMathFunctions/kernels/plp_dot_prod_i32s_rv32im.c as examples.

Test framework and benchmarks

Under the test folder you can test the functions and benchmark their performance by collecting number of cycles, instructions, instructions per cycle (i/c), instruction cache misses (imiss), load stalls (ld_stall), TCDM contentions (tcdm_cont), number of operations (ops, mostly counted as multiply-and-accumulate operations), and operations per cycle (ops/c).

An example on the 1D convolution function is shown below, run on gvsoc of Mr. Wolf. The device ibex means that the basic RV32IM ISA is used, while riscy means that the XPULPv2 ISA extensions are used. We can reach up to 11.739 MACs/cycle using the 8 cores of Mr. Wolf!

function	device	dimension	cycles	insn	i/c	imiss	ld_stall	tcdm_cont	ops	ops/c
plp_conv_i32	ibex	len_a=512; len_b=512	1649017	868026	0.526	0	447594	0	523776	0.318
plp_conv_i32	ibex	len_a=512; len_b=1024	3176006	1654308	0.521	0	880571	0	785920	0.247
plp_conv_i16	ibex	len_a=512; len_b=512	1659535	870761	0.525	0	454502	0	523776	0.316
plp_conv_i16	ibex	len_a=512; len_b=1024	3213505	1669232	0.519	0	896778	0	785920	0.245
plp_conv_i8	ibex	len_a=512; len_b=512	1639099	850447	0.519	0	454457	0	523776	0.320
plp_conv_i8	ibex	len_a=512; len_b=1024	3174745	1630530	0.514	0	896757	0	785920	0.248
plp_conv_i32	riscy	len_a=512; len_b=512	567109	541968	0.956	1111	52	0	523776	0.924
plp_conv_i32	riscy	len_a=512; len_b=1024	1028088	998842	0.972	1430	27	0	785920	0.764
plp_conv_i32_parallel	riscy	len_a=512; len_b=512	72510	68677	0.947	880	25	907	523776	7.224
plp_conv_i32_parallel	riscy	len_a=512; len_b=1024	131819	126403	0.959	891	25	1603	785920	5.962
plp_conv_i16	riscy	len_a=512; len_b=512	489358	457920	0.936	1254	49	0	523776	1.070
plp_conv_i16	riscy	len_a=512; len_b=1024	892258	846771	0.949	1331	25	0	785920	0.881
plp_conv_i16_parallel	riscy	len_a=512; len_b=512	63512	58444	0.920	814	25	1052	523776	8.247
plp_conv_i16_parallel	riscy	len_a=512; len_b=1024	115545	109386	0.947	869	25	1595	785920	6.802
plp_conv_i8	riscy	len_a=512; len_b=512	336855	291150	0.864	1551	19	0	523776	1.555
plp_conv_i8	riscy	len_a=512; len_b=1024	575868	503993	0.875	1210	11	0	785920	1.365
plp_conv_i8_parallel	riscy	len_a=512; len_b=512	44618	37599	0.843	880	23	1211	523776	11.739
plp_conv_i8_parallel	riscy	len_a=512; len_b=1024	80015	68701	0.859	891	23	2304	785920	9.822

To contribute

The library contains many optimized functions, but there are still many of them to be optimized. Contributions are very welcome and are accepted under Apache v2.0.

If you want to contribute, fork the repository and issue pull requests.

For each function you develop, note the following:

Include its corresponding test framework under the test folder or adapt it if it already exists;
Update the documentation;
Use clang-format to format the code;
Make sure that all tests pass;
Maintain the compatibility for both 'old' and 'new' sdk (see rtos_hal.h).

More details can be found in HACKATHON.md.

License and Attribution

All source code is released under Apache v2.0 license unless noted otherwise, please refer to the LICENSE file for details.

We are inspired by CMSIS-DSP (CMSIS_5 licensed under Apache v2.0) and partially adapted its structure and codes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

PULP DSP: Digital Signal Processing on Parallel Ultra Low Power Platform

Introduction

Structure of the repository

Installation and usage

With the pulp-sdk on the v1 branch

With the pulp-sdk on the main branch

Documentation

Test framework and benchmarks

To contribute

License and Attribution

Files

README.md

Latest commit

History

README.md

File metadata and controls

PULP DSP: Digital Signal Processing on Parallel Ultra Low Power Platform

Introduction

Structure of the repository

Installation and usage

With the pulp-sdk on the v1 branch

With the pulp-sdk on the main branch

Documentation

Test framework and benchmarks

To contribute

License and Attribution