Skip to content

eu-digital-green-certificates/dgc-bloomfilter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Standardized Bloom Filter Implementations

AboutDevelopmentDocumentationSupportContributeLicensing

About

This repository contains the source code of standardized bloom filter implementationn in different languages for the usage in covid associated applications. The code base itself is not bound to Covid Certificates itself and can be used in other applications as well.

Main goal of the standardization is, to have an 1:1 crossplatform behavior in swift, java, python, javascript, kotlin, nodejs, C#, Go and Other languages which are used in covid applications. (Please contribute language implementations if possible). Currently available:

  • Java
  • Swift
  • Kotlin
  • C#
  • Javascript
  • Nodejs
  • Go
  • Python

To standardize the implementation a set of abstract testdata and a standardized data format is created to align all implementations in the behavior.

Important Links:

Important Notes

To align the implementations, some points must be considered during the implementation:

  • The logarithmus function is logarithmus naturalis (base e)
  • all bytes are ordered in Big Endian
  • the bits will be count from left to right (10000000 --> 010000000 --> 001000000 etc.)
  • Use Bit Masks for comparing e.g. currentByte & 0x000010000 and combine the results byte for byte
  • All bits are grouped in blocks of 4 Byte (Array of 4 Byte Objects, not an Array Single Bits --> more space efficent)
  • It doesnt matter which blocksize is used in a language (unsigned or signed is also not important), the only important thing is the bitshifting capability for a single bit (no filling during shift).
  • Maximum of the Filter must be an array containing 4 Byte values with an amount of maximum 4 Bytes (e.g. int [] with i=Integer.MAX_VALUE)
  • Hash Function is currently SHA256 (Default: 0 in Dataformat), because other Functions like MUMUR_3 are not in every language available. If more Performance is necessary, please provide Enhancement Issues with your Change Proposals (or better a Pull Request)
  • It's only one hash function used with a seed within multiple rounds. Different hash functions are not necessary in terms of uniformity

Data Format

The data format is a serialized byte stream with the following structure in Big Endian Format:

Pos Byte Field Description
0-1 2 Byte signed Number (-32,768 to 32,767) Version Number which describes the current version of the used Data Format.
2 1 Byte signed Number (-128 to 127) Used Hashing Number of the used Hashalgorithm (TOB) or Hash Strategy. Default is 0 for Uniform SHA256 Hashing
3 1 Byte signed Number (-128 to 127) k Amount of used Hash function calculations
4-7 4 Byte signed Decimal (7 Digits) p Probility Rate
8-11 4 Byte signed Number (-2,147,483,648 to 2,147,483,647) n Amount of Elements for which the filter was constructed
12-15 4 Byte signed Number (-2,147,483,648 to 2,147,483,647) Current Filter Size Amount of Elements which the filter currently carries
16-21 4 Byte signed Number (-2,147,483,648 to 2,147,483,647) Data Length of Filter Amount of Bytes of the Filter
21-* 4 Byte signed Number (-2,147,483,648 to 2,147,483,647) Filter Bytes of the Bloom Filter

Support and feedback

The following channels are available for discussions, feedback, and support requests:

Type Channel
Issues
Other requests

How to contribute

Contribution and feedback is encouraged and always welcome. For more information about how to contribute, the project structure, as well as additional contribution information, see our Contribution Guidelines. By participating in this project, you agree to abide by its Code of Conduct at all times.

Licensing

Copyright (C) 2021 T-Systems International GmbH and all other contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the LICENSE for the specific language governing permissions and limitations under the License.

About

No description, website, or topics provided.

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages