This directory provides the source code of the hardware accelerator for the functional encryption for quadratic functions (FE-QF) scheme. The source code accompanies the following research article, which also provides a detailed description of the accelerator:
- Milad Bahadori, Kimmo Järvinen, Tilen Marc, and Miha Stopar: Speed Reading in the Dark: Accelerating Functional Encryption for Quadratic Functions with Reprogrammable Hardware, IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2021, no. 3, to appear.
The accelerator is a hardware/software codesign implemented on a Xilinx reprogrammable SoC to make the best use of both software and hardware. The accelerator implements the decryption operation of the FE-QF scheme from:
- Edouard Dufour Sans, Romain Gay, and David Pointcheval: Reading in the dark: Classifying encrypted digits with functional encryption, Cryptology ePrint Archive, Report 2018/206, 2018.
FE-QF decryption consists of several cryptographic pairings followed by a discrete logarithm. The HW side is implemented in programmable logic as a multi-core architecture whose cores are designed for efficient computation of cryptographic pairings, while the discrete logarithm is computed through an efficient interplay of software and hardware. The accelerator also uses a new parallel version of Shanks’ baby-step giant-step discrete logarithm algorithm, which is optimized for reasonably small positive and negative output values. The algorithm is split into a precomputation phase, which computes a large table, and an on-the-fly phase, which uses that table.
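The general idea of a baby-step giant-step search over a signed range can be illustrated with a small C sketch. It is not the accelerator's algorithm. It works in the multiplicative group modulo a prime p < 2^32 instead of the pairing target group. It handles negative outputs with a simple shift (solve for x + B in [0, 2B]), and it only mimics the parallel split by processing independent giant-step chunks one after another. The names `bsgs_precompute` and `bsgs_solve`, and the parameters `m` (table size) and `nworkers`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One baby-step table entry: value = g^exponent mod p. */
typedef struct {
    uint64_t value;
    uint64_t exponent;
} bsgs_entry;

/* (a * b) mod p; the product fits in 64 bits because p < 2^32. */
static uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t p) {
    return (a * b) % p;
}

/* Square-and-multiply exponentiation: base^e mod p. */
static uint64_t pow_mod(uint64_t base, uint64_t e, uint64_t p) {
    uint64_t r = 1 % p;
    base %= p;
    while (e > 0) {
        if (e & 1u) r = mul_mod(r, base, p);
        base = mul_mod(base, base, p);
        e >>= 1;
    }
    return r;
}

static int cmp_entry(const void *a, const void *b) {
    uint64_t x = ((const bsgs_entry *)a)->value;
    uint64_t y = ((const bsgs_entry *)b)->value;
    return (x > y) - (x < y);
}

/* Precomputation phase: table of g^j for j = 0..m-1, sorted by value so
 * that lookups can use binary search. The caller frees the table. */
static bsgs_entry *bsgs_precompute(uint64_t g, uint64_t p, size_t m) {
    bsgs_entry *t = malloc(m * sizeof *t);
    if (t == NULL) return NULL;
    uint64_t v = 1;
    for (size_t j = 0; j < m; j++) {
        t[j].value = v;
        t[j].exponent = j;
        v = mul_mod(v, g, p);
    }
    qsort(t, m, sizeof *t, cmp_entry);
    return t;
}

/* On-the-fly phase: find x in [-bound, bound] with g^x = h (mod p), p prime.
 * h' = h * g^bound has discrete log x' = x + bound in [0, 2*bound].
 * The giant steps are split into `nworkers` independent chunks; they run
 * sequentially here, but each chunk could be given to a different core.
 * Returns 1 and stores x in *x on success, 0 if no x in range exists. */
static int bsgs_solve(const bsgs_entry *table, size_t m, uint64_t g, uint64_t h,
                      uint64_t p, int64_t bound, unsigned nworkers, int64_t *x) {
    assert(m > 0 && nworkers > 0 && bound >= 0);
    uint64_t span = 2 * (uint64_t)bound;     /* x' lies in [0, span]   */
    uint64_t ngiant = (span + m) / m;        /* ceil((span + 1) / m)   */
    uint64_t hs = mul_mod(h, pow_mod(g, (uint64_t)bound, p), p);
    /* g^(-m) via Fermat's little theorem. */
    uint64_t gm_inv = pow_mod(pow_mod(g, m, p), p - 2, p);

    for (unsigned w = 0; w < nworkers; w++) {
        uint64_t start = ngiant * w / nworkers;
        uint64_t end = ngiant * (w + 1) / nworkers;
        /* Each chunk starts independently at gamma = h' * g^(-m*start). */
        uint64_t gamma = mul_mod(hs, pow_mod(gm_inv, start, p), p);
        for (uint64_t i = start; i < end; i++) {
            bsgs_entry key = { gamma, 0 };
            const bsgs_entry *hit = bsearch(&key, table, m, sizeof *table, cmp_entry);
            if (hit != NULL) {
                uint64_t xs = i * m + hit->exponent;  /* candidate x' */
                if (xs <= span) {
                    *x = (int64_t)xs - bound;
                    return 1;
                }
            }
            gamma = mul_mod(gamma, gm_inv, p);
        }
    }
    return 0;
}
```

For example, with p = 2^31 - 1, g = 7, m = 64, and B = 1000, `bsgs_solve` recovers any x in [-1000, 1000] from h = g^x after at most 32 table lookups. A larger table (bigger m) reduces the number of on-the-fly giant steps at the cost of memory, which is the trade-off a precomputation phase exploits.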
We have implemented the HW/SW codesign accelerator on real hardware. We targeted Xilinx programmable SoCs; specifically, we used the Zynq UltraScale+ MPSoC ZCU102 evaluation kit, which includes a Xilinx Zynq UltraScale+ MPSoC XCZU9EG-2FFVB1156 device featuring a quad-core ARM Cortex-A53 processor running at up to 1.5 GHz in the SW domain and a 16nm FinFET+ based FPGA in the HW domain. For the SW domain, we used C and the Xilinx Software Development Kit (SDK) as the development environment. For the HW domain, we used Verilog HDL and Xilinx Vivado v2019.1 to compile and implement the design on the FPGA.
The `fe-qf-hardware-accelerated` directory contains two sub-directories:
- `hw_side_rtl`, which includes the HDL source code of the different HW-side (i.e., FPGA-side) IP-cores.
- `sw_side_c`, which includes the source code of the SW side (i.e., the ARM core in this case).
The following steps describe how to build the system: first the HW side, then the SW side.
- Create and package new IP-cores for the HW-side modules/blocks in your directory with Vivado's IP generation option (Tools > Create and Package New IP, choosing to create a new AXI4 peripheral). These IP-cores are the CP-core, Mux-CP, DeMux-CP, GPM/GDM memory units, and interface/control units. All IP-cores can be created and packaged from the RTL code in the `hw_side_rtl` directory.
- Create a new, empty Vivado project in your directory and select the Zynq UltraScale+ MPSoC ZCU102 evaluation kit / XCZU9EG-2FFVB1156 as your board/device.
- Add the generated IP-cores to your current IP repository.
- Create a new block design in your Vivado project and add the Zynq UltraScale+ MPSoC IP to it.
- (Re-)customize your Zynq UltraScale+ MPSoC IP-core. For this design, enable the S_AXI_HP0_FPD (i.e., a slave HP interface) and M_AXI_HPM0_FPD (i.e., a master GP interface) ports/interfaces with a 128-bit width, as well as the PL_CLK, PL_RST, and PL_PS_IRQ (i.e., interrupt) signals. For PL_CLK, you may enter 210 MHz as the initial HW-side clock frequency.
- Add the required Xilinx-provided IP-cores (i.e., Xilinx IP sources in Vivado) to your block design: an AXI SmartConnect interconnect module for the HP interface of the Zynq processing system, an AXI Interconnect module for the GP interface of the Zynq processing system, 17 AXI GPIO blocks with 32-bit data width, a Processor System Reset, two concatenation modules (i.e., Concat IPs) for the interrupt and status signals, and an AXI-DMA IP-core. Enable both the read and write channels of the AXI-DMA IP-core with a 128-bit data width.
- Add the generated and packaged IP-cores to your block design: 16 CP-core instances together with the Mux-CP, DeMux-CP, GPM, GDM, and Interface/Control units.
- Run Vivado's automatic connection option (Run Connection Automation) and resolve any errors. In this step, use AXI-GPIO-0 to AXI-GPIO-15 to send the command signals to CP-Core-0 to CP-Core-15, respectively, and use the AXI-GPIO-16 block to receive the concatenated status signals of the CP-cores. The AXI-DMA module is connected to the CP-cores and the HP interface through the Mux-CP, DeMux-CP, and AXI SmartConnect modules.
- After checking all connections and IP-cores, validate your design in Vivado and resolve any errors and warnings.
- Set Vivado's synthesis and implementation strategies to high-performance options (e.g., the Performance_Explore implementation strategy).
- Run synthesis and implementation, and then generate a bitstream for the HW (FPGA) side. You may need to resolve errors and/or warnings along the way. Once all steps are complete, you can check the timing and utilization reports of the design in Vivado.
- The generated bitstream is provided in the `sw_side_c/zcu102_platform` directory as `ZCU102_HW_Side.bit` for downloading to the HW (FPGA) device.
- Export a hardware platform (i.e., an HDF file) that includes the generated bitstream for the Xilinx software development tools. You can export it to your current project directory.
- Finally, launch the Xilinx Software Development Kit (SDK) from Vivado. Your project files (including the generated HDF file and bitstream) are carried over to the SDK environment. Configuring and developing the SW side are described in the following steps.
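On the SW side, the AXI-GPIO mapping from the HW-side steps (AXI-GPIO-0 to AXI-GPIO-15 for commands, AXI-GPIO-16 for the concatenated status) might be used roughly as in the following C sketch. The status-word layout (bit i belongs to CP-core i, and 1 means idle/done) and all helper names are assumptions for illustration, not taken from the repository. On the board, the status word would be read with `XGpio_DiscreteRead` on the AXI-GPIO-16 instance of the Xilinx standalone `xgpio` driver; the helpers themselves are pure functions, so they can also be checked on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CP_CORES 16u

/* Assumption: the Concat IP places CP-core i's 1-bit status on bit i of
 * the 32-bit word read from AXI-GPIO-16; unused upper bits are ignored. */
static uint32_t cp_idle_mask(uint32_t status_word) {
    return status_word & ((1u << NUM_CP_CORES) - 1u);
}

/* Nonzero when every core selected in `cores` reports idle/done. */
static int cp_all_done(uint32_t status_word, uint32_t cores) {
    return (cp_idle_mask(status_word) & cores) == cores;
}

/* Picks up to `n_jobs` idle cores (lowest index first), writes their
 * indices to `out`, and returns how many were picked. On the board, the
 * caller would then send a command word to each picked core i through
 * AXI-GPIO-i, e.g. XGpio_DiscreteWrite(&gpio[i], 1, cmd). */
static size_t cp_pick_idle_cores(uint32_t status_word, size_t n_jobs, unsigned *out) {
    assert(out != NULL || n_jobs == 0);
    uint32_t idle = cp_idle_mask(status_word);
    size_t n = 0;
    for (unsigned i = 0; i < NUM_CP_CORES && n < n_jobs; i++) {
        if (idle & (1u << i)) out[n++] = i;
    }
    return n;
}
```

Keeping the bit decoding separate from the driver calls lets the same logic be reused whether the SW side polls AXI-GPIO-16 or reacts to the PL_PS_IRQ interrupt.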
- The `sw_side_c` directory contains three sub-directories. The `zcu102_platform` directory contains the HW-side initialization files, the system HDF file, and the bitstream file. This directory is also generated automatically when you export the HW-side design and launch the SDK from Vivado (as explained in the last two HW-side steps). The Xilinx SDK needs this directory and its initialization files to generate the first-stage boot loader (FSBL) and, more importantly, the board support package (BSP).
- In the Xilinx SDK environment, create a new managed make application project named `zcu102_fsbl`. In this step, select the `zcu102_platform` directory as your target hardware platform and select the C language option. Then, select the `Zynq MP FSBL` template available in the Xilinx SDK. This creates your FSBL directory and generates the `ZCU102_FSBL.elf` file in its sub-directory. You can find the source code and the final `ZCU102_FSBL.elf` file in the `sw_side_c/zcu102_fsbl` directory.
- Next, create a Board Support Package (BSP) for your project through the `New BSP Project` option in the Xilinx SDK. Select the `zcu102_platform` directory as your target hardware platform and select your preferred CPU core (e.g., psu_cortexa53_0, as in this work) with the 64-bit compiler. For simplicity, select standalone as the BSP operating system (OS). Continue and configure the BSP according to your preferences (e.g., set psu_uart_0 for stdin/stdout). When this step is finished, your SDK project contains a BSP sub-directory.
- Create a new application project named `sw_side_system`. Again, select the `zcu102_platform` directory as your target hardware platform and select your preferred CPU core (psu_cortexa53_0 in this work). Then, set the target software options: the C language, the 64-bit compiler, and the existing BSP generated in the previous step. You can now select a start-up project (e.g., the `Hello World` template from the list available in the SDK). This gives you a new application project for developing your SW-side program and algorithms and for interfacing with the HW side through the dedicated HP (via AXI-DMA) and GP (via AXI-GPIOs) interfaces.
- Before developing your SW-side program, you may test your SoC platform by running the `Hello World` program and checking its output on the UART_0 port. To do so, after powering up the ZCU102 evaluation kit, first run your FSBL application project (i.e., `ZCU102_FSBL.elf`) and then run `sw_side_system.elf` to observe the printed output in the open terminal.
- Now you can develop your SW-side program. The `sw_side_c/source` sub-directory contains two sub-directories, `src1` and `src2`. You can also find `ZCU102_SW_Side.elf`, the executable SW-side file, which you can program and run on the SW-side CPU core to check the process and outputs. The `sw_side_c/source/src1` directory contains the minimal SW-side source code for computing the decryption algorithm of the FE-QF scheme; all modules related to the pairing and discrete-logarithm computations are included. You can import all of its C source and header files into your SW-side application project. The `sw_side_c/source/src2` directory contains the SW-side C source and header files for the basic primitive computations. You can import these files into a separate application project (NOT the `sw_side_system` application project, which is for the decryption algorithm of the FE-QF scheme) for testing, debugging, and measurement purposes.
- Before running your application on the CPU core, check the run configuration of the SDK tool (i.e., initializing, programming the bitstream to the FPGA, ..., running the application on the CPU core).
- Finally, build and run your application in the SDK tool and observe the process and results. You can also modify, customize (the input/intermediate parameters, input vectors, etc.), extend, and debug the project to build your own application.
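Data exchanged through the HP interface moves in 128-bit (16-byte) beats, because both AXI-DMA channels are configured with a 128-bit data width. The following C sketch shows one way to pack multi-precision operands into a DMA buffer. The 64-bit limb layout, the least-significant-limb-first order, and the helper names are assumptions for illustration; the operand format actually expected by the CP-cores is defined by the repository sources. On the board, the packed buffer would typically be flushed from the data cache with `Xil_DCacheFlushRange` before the transfer is started with `XAxiDma_SimpleTransfer(..., XAXIDMA_DMA_TO_DEVICE)` from the Xilinx standalone drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Both AXI-DMA channels use a 128-bit (16-byte) data width. */
#define DMA_BEAT_BYTES 16u

/* Number of 16-byte beats needed to carry `n_limbs` 64-bit limbs. */
static size_t dma_beats_for_limbs(size_t n_limbs) {
    size_t bytes = n_limbs * sizeof(uint64_t);
    return (bytes + DMA_BEAT_BYTES - 1u) / DMA_BEAT_BYTES;
}

/* Copies the limbs (assumed least significant limb first, in the native
 * little-endian byte order of the Cortex-A53) into `buf` and zero-pads the
 * last beat. Returns the transfer length in bytes (a multiple of 16),
 * or 0 if `buf` is too small. */
static size_t dma_pack_limbs(const uint64_t *limbs, size_t n_limbs,
                             uint8_t *buf, size_t buf_len) {
    assert(buf != NULL);
    size_t len = dma_beats_for_limbs(n_limbs) * DMA_BEAT_BYTES;
    if (len > buf_len) return 0;
    memset(buf, 0, len);
    if (n_limbs > 0) memcpy(buf, limbs, n_limbs * sizeof(uint64_t));
    return len;
}
```

Rounding every transfer up to whole beats keeps the DMA length consistent with the 128-bit stream width; for example, three 64-bit limbs occupy two beats (32 bytes), with the last 8 bytes set to zero.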