This repository contains a summary of serverless benchmarks and pipelines designed to measure the performance of serverless architectures like Lithops.
Benchmark | Description | Language | Stages | Data set | Data format | LOC |
---|---|---|---|---|---|---|
General | ||||||
FLOPS Computation Test | Analyze Lithops computation in FLOPS. | Python3.10 | 1 | Autogenerated | NumPy array | 63 |
Object Storage Test | Measure the bandwidth from FaaS Service. | Python3.10 | 2 | Autogenerated | Bytes | 286 |
Montecarlo | Monte Carlo methods for computations over large amounts of random data. | Python3.10 | 1 | Autogenerated | NumPy array | 46 |
Mandelbrot classic | Mandelbrot set calculated on a limited space several times using the Cloudbutton toolkit. | Python3.10 | 7 | Autogenerated | Mandelbrot set | 109 |
Machine Learning | ||||||
Hyperparameter tuning | Hyperparameter tuning using the grid search algorithm. | Python3.10 | 1 | Amazon customer reviews | ft.txt | 111 |
Geospatial | ||||||
NDVI | Calculate NDVI from Object Storage images. | Python3.10 | 2 | Sentinel2 satellite image from the AWS Sentinel2 open data repository | Cloud-Optimized GeoTIFF (COG) | 1552 |
Model creation from LiDAR pre-processing | Create terrain models using LiDAR partitioner. | Python3.10 | 1 | laz files | laz | 212 |
Water Consumption | Calculate water consumption from crops using the Penman-Monteith formula and interpolation raster. | Python3.10 | 9 | Instituto Nacional de Informacion Geográfica | Tif files | 686 |
Metabolomics | ||||||
METASPACE metabolite annotation | Run the METASPACE metabolite annotation pipeline on cloud resources. | Python3.8 | 16 | Examples of datasets and databases | imzML | 2642 |
Genomics | ||||||
Variant Calling | Alignment of sequencing reads, stored as FASTQ files, to a reference genome, stored as a FASTA file. | Python3.10 | 9 | Trypanosome [Genome, SRR6052133], Human [Genome, SRR15068323, ERR9856489], Bos Taurus [Genome, SRR934415] | fasta, fastq | 4174 |
Astronomics | ||||||
Astronomica-interferometry | Radio interferometric data processing. | Python3.8 | 2 | SB205.MS SB206.MS SB207.MS SB208.MS SB209.MS SB210.MS | MS | 907 |
Elastic Exploration | ||||||
UTS | The Unbalanced Tree Search (UTS) benchmark. | Java 11 | 1 | Autogenerated | Dynamic tree | 2841 |
Mandelbrot with Mariani Silver | Render the Mandelbrot set using the Mariani-Silver algorithm. | Java 11 | 1 | Autogenerated | Mandelbrot set | 2735 |
Betweenness Centrality | Compute Betweenness Centrality (BC). | Java 11 | 1 | Autogenerated | Graph | 3119 |
Extreme Sorting | ||||||
TeraSort | Implementation of the TeraSort benchmark built on Lithops. | Python3.10 | 2 | TeraGen | ascii | 827 |
TOTAL: | 20310 |
In most cases there is a link to an external repository containing the code; the remaining benchmarks can be found here.
All workflows except the ones in Elastic Exploration utilize Lithops to easily deploy and run code on any major Cloud serverless platform.
For the geospatial benchmarks you first need to follow these instructions to set up the environment. Find more information about the Geospatial, Genomics and Metabolomics benchmarks here.
This is a benchmark to estimate the floating-point performance of the system for matrix multiplication operations using NumPy. It measures how many floating-point operations per second the system can perform for this specific operation.
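The measurement described above can be sketched as follows. This is a minimal local illustration, not the benchmark's own code: the function name, matrix size and repeat count are assumptions, and the operation count uses the usual approximation of 2·n³ floating-point operations for a dense n × n product.

```python
import time

import numpy as np


def estimate_matmul_flops(n=512, repeats=3):
    """Estimate FLOP/s for an n x n float64 matrix product (illustrative helper)."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    # A dense n x n product needs roughly 2 * n^3 floating-point operations.
    # Guard against a zero reading from the timer on very small inputs.
    return 2.0 * n**3 / max(best, 1e-12)


if __name__ == "__main__":
    print(f"{estimate_matmul_flops() / 1e9:.2f} GFLOPS")
```

Taking the best of several runs reduces the influence of warm-up and scheduling noise on the estimate.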
This benchmark measures the bandwidth using data transfer in Object Storage with a FaaS Service.
This contains two applications in which Monte Carlo methods are used to perform computations over large amounts of random data using cloud functions with Lithops. To run these simulations on AWS, configure your Lithops config file to use AWS and place it in the directory of execution. Then edit the code so that it does not use the IBM configuration implemented there.
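As a rough illustration of why Monte Carlo workloads map well onto cloud functions, the sketch below estimates pi from independent batches of random points. The pi example is chosen here for illustration and is not necessarily what the two applications compute. The built-in `map` is a local stand-in for a serverless map, and `count_hits` and `estimate_pi` are hypothetical names.

```python
import random


def count_hits(task):
    """Count random points in the unit square that fall inside the quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # one independent generator per task
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers=8, samples_per_worker=50_000):
    tasks = [(seed, samples_per_worker) for seed in range(workers)]
    # Local stand-in: the tasks are independent, so each one could be
    # executed by a separate cloud function and the counts summed afterwards.
    hits = sum(map(count_hits, tasks))
    return 4.0 * hits / (workers * samples_per_worker)


if __name__ == "__main__":
    print(estimate_pi())
```

Each task carries its own seed so that workers do not draw the same random sequence.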
Mandelbrot set calculated on a limited space several times using the Cloudbutton toolkit. A certain region of the linear space is treated as a matrix and divided into chunks so that the work can be distributed among many functions.
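The chunking idea can be sketched in plain Python as below. This is an assumption-laden illustration, not the benchmark's code: the chunk layout, the helper names `split_region` and `mandelbrot_chunk`, and the requirement that the grid divides evenly into tiles are all choices made for the example.

```python
def mandelbrot_chunk(chunk, max_iter=50):
    """Compute escape-iteration counts for one rectangular chunk of the grid."""
    x0, y0, dx, dy, cols, rows = chunk
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            point = complex(x0 + c * dx, y0 + r * dy)
            z, n = 0j, 0
            while n < max_iter and abs(z) <= 2.0:
                z = z * z + point
                n += 1
            row.append(n)
        result.append(row)
    return result


def split_region(xmin, xmax, ymin, ymax, width, height, tiles):
    """Split a width x height grid into tiles x tiles chunks.

    Assumes width and height are divisible by tiles.
    """
    dx, dy = (xmax - xmin) / width, (ymax - ymin) / height
    cols, rows = width // tiles, height // tiles
    return [
        (xmin + tx * cols * dx, ymin + ty * rows * dy, dx, dy, cols, rows)
        for ty in range(tiles)
        for tx in range(tiles)
    ]


if __name__ == "__main__":
    chunks = split_region(-2.0, 1.0, -1.5, 1.5, 64, 64, 4)
    # Local stand-in for distributing one chunk per function.
    results = list(map(mandelbrot_chunk, chunks))
    print(len(results), "chunks computed")
```

Because every chunk carries its own origin and step size, a function needs no shared state to compute its part of the matrix.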
Perform hyperparameter tuning using the grid search algorithm. We have a dataset consisting of Amazon product reviews and a sklearn classifier to classify these reviews. We take advantage of cloud functions to tune this classifier's hyperparameters and show how Lithops can be used for this kind of computation.
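Grid search evaluates every combination of candidate hyperparameter values independently, which is what makes it easy to fan out to cloud functions. The sketch below shows the structure only. `toy_score` is a hypothetical stand-in for "train the classifier and return its validation score", the parameter names are invented, and the built-in `map` stands in for a serverless map.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(score_fn, param_grid):
    """Score every combination and return (best_params, best_score)."""
    candidates = list(expand_grid(param_grid))
    # Each evaluation is independent, so this map is the step that
    # could be distributed, one candidate per function.
    scores = list(map(score_fn, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_score(params):
    """Hypothetical score with its maximum at alpha=0.1, ngram=2."""
    return -((params["alpha"] - 0.1) ** 2) - (params["ngram"] - 2) ** 2


if __name__ == "__main__":
    grid = {"alpha": [0.01, 0.1, 1.0], "ngram": [1, 2, 3]}
    print(grid_search(toy_score, grid))
```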
A use case of serverless image processing that consumes data from Object Storage. NDVI (Normalized Difference Vegetation Index) is calculated over many images to demonstrate high throughput and performance.
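For reference, NDVI is defined per pixel as (NIR − Red) / (NIR + Red). A minimal NumPy sketch of that formula follows. It assumes the near-infrared and red bands have already been read into arrays of the same shape (for Sentinel-2 these are commonly bands B08 and B04), and it does not reproduce the benchmark's I/O or tiling.

```python
import numpy as np


def ndvi(nir, red):
    """Compute NDVI = (NIR - Red) / (NIR + Red) element-wise."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    total = nir + red
    out = np.zeros_like(total)
    # Divide only where the denominator is non-zero; other pixels stay 0.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


if __name__ == "__main__":
    print(ndvi([[0.8, 0.5]], [[0.2, 0.5]]))
```

Returning 0 for pixels where both bands are zero is a choice made for this sketch; a real pipeline might mark them as no-data instead.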
7. Model creation from LiDAR pre-processing
The LiDAR partitioner is a novel tool that partitions LiDAR files based on the density of points. With this partitioned data we create several terrain models used in many geospatial workflows. We study the impact of load balancing by partitioning LiDAR data using the aforementioned density-based partitioner.
Pipeline that calculates water consumption from crops using the Penman-Monteith formula and interpolation rasters of temperature, wind, solar irradiance and humidity for a given day.
Run the METASPACE (a spatial metabolomics cloud platform that conducts molecular annotation of imaging mass spectrometry data) metabolite annotation pipeline on cloud resources using Lithops. The original implementation of this pipeline can be found in the METASPACE repository. We have adapted this implementation to work with Lithops 3.0.1 and with more recent package versions.
More information about this pipeline can be found on this IBM Blog post.
10. Variant Calling
In genomics, variant calling entails the alignment process, which is essentially a search for string similarities. This process aligns sequencing reads, typically stored as FASTQ files, with a reference genome, which is stored as a FASTA file. The reference genome and reads are split into smaller chunks for alignment.
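Splitting the reads into chunks can be illustrated with the sketch below. It assumes the common FASTQ layout of exactly four lines per record (header, sequence, separator, qualities) and is not the pipeline's own partitioner. `fastq_chunks` is a hypothetical helper name.

```python
from itertools import islice


def fastq_chunks(lines, reads_per_chunk):
    """Yield lists of FASTQ records, each list holding at most reads_per_chunk records.

    Assumes every record spans exactly four lines.
    """
    iterator = iter(lines)
    while True:
        block = list(islice(iterator, 4 * reads_per_chunk))
        if not block:
            return
        if len(block) % 4:
            raise ValueError("truncated FASTQ record")
        yield [tuple(block[i:i + 4]) for i in range(0, len(block), 4)]


if __name__ == "__main__":
    sample = ["@r1", "ACGT", "+", "IIII", "@r2", "TTTT", "+", "IIII"]
    for chunk in fastq_chunks(sample, 1):
        print(chunk)
```

Chunking on record boundaries keeps each read intact, so every chunk can be aligned independently.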
Process radio interferometric data through all the phases (rebinning, calibration and imaging) using Lithops.
12. UTS
The implementation of UTS presented here is the first to tackle elastic resource provisioning.
Render the Mandelbrot set using the Mariani-Silver algorithm as an optimization technique. This algorithm relies on the fact that the Mandelbrot set is connected: there is a path between any two points belonging to the set.
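The idea can be sketched as a recursive subdivision: evaluate only the border of a rectangle, fill the interior if the whole border has the same iteration count, and otherwise split the rectangle and recurse. The Python sketch below is an illustration under assumptions (the benchmark itself is written in Java): the function names, the `min_size` cut-off and the grid representation are invented for the example. On a sampled grid the fill step is an approximation, since fine detail can pass between border samples.

```python
def escape_count(c, max_iter):
    """Number of iterations before z = z*z + c leaves the disk of radius 2."""
    z, n = 0j, 0
    while n < max_iter and abs(z) <= 2.0:
        z = z * z + c
        n += 1
    return n


def mariani_silver(grid, x0, y0, w, h, to_c, max_iter=50, min_size=4):
    """Fill grid[y][x] for the w x h rectangle whose top-left cell is (x0, y0)."""
    if w <= 0 or h <= 0:
        return

    def value(x, y):
        # Compute each cell at most once.
        if grid[y][x] is None:
            grid[y][x] = escape_count(to_c(x, y), max_iter)
        return grid[y][x]

    border = [(x, y) for x in range(x0, x0 + w) for y in (y0, y0 + h - 1)]
    border += [(x, y) for y in range(y0, y0 + h) for x in (x0, x0 + w - 1)]
    counts = {value(x, y) for x, y in border}

    if len(counts) == 1:
        # Uniform border: assume the interior shares the same value.
        fill = counts.pop()
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                grid[y][x] = fill
    elif w <= min_size or h <= min_size:
        # Small rectangle: evaluate every cell directly.
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                value(x, y)
    else:
        hw, hh = w // 2, h // 2
        for sx, sy, sw, sh in (
            (x0, y0, hw, hh),
            (x0 + hw, y0, w - hw, hh),
            (x0, y0 + hh, hw, h - hh),
            (x0 + hw, y0 + hh, w - hw, h - hh),
        ):
            mariani_silver(grid, sx, sy, sw, sh, to_c, max_iter, min_size)


if __name__ == "__main__":
    size = 32
    grid = [[None] * size for _ in range(size)]
    mariani_silver(grid, 0, 0, size, size,
                   lambda x, y: complex(-2 + x * 0.125, -2 + y * 0.125))
    print(sum(row.count(50) for row in grid), "cells reached max_iter")
```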
Compute Betweenness Centrality (BC). The implementation follows Brandes’ algorithm described in the benchmark, augmenting Dijkstra’s single-source shortest paths (SSSP) algorithm for unweighted graphs.
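A compact Python sketch of Brandes' algorithm for unweighted graphs is given below for orientation. It is not the Java implementation from the benchmark. It uses breadth-first search for the shortest-path phase, which is what single-source shortest paths reduce to when all edges have unit weight, and it assumes the graph is passed as an adjacency dictionary.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm on an unweighted graph given as {node: [neighbors]}.

    Every node must appear as a key. For an undirected graph stored in both
    directions, each pair is counted twice, so halve the result if needed.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(betweenness_centrality(path))
```

The outer loop over source nodes is independent per source, which is the natural unit of parallel work.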
15. TeraSort
Implementation of the TeraSort benchmark (a distributed sort), built on Lithops. Tasks are executed on cloud functions, and object storage is used for reading and writing data (including the exchange of intermediate files). More information about this pipeline can be found on the following links: