Skip to content

one of the IITH course assignment for implementing bloom filters with using bit slicing for compact storage

Notifications You must be signed in to change notification settings

mSaify/bloom_filter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Implementing bloom filter with bit slicing array to store the index. The number of hashfunctions (k) for a given term is decided based on the frequency and Signal-to-noise ratio

bloom filter index

term_id [1 0 1 0 0] where activated bits combined at different locations indicates whether given term is present in the document.

Since bloom filter uses compact space , the activated bit can represent the value for multiple documents.

But instead of using bit to present the document location; multiple bits are activated by different hash function. If all of them are 1, indicates the terms is present in the document.

The algorithm definitely finds out if something is not present, but may lead to false positive in case if the bit is activated by some other term.

for example 0 0 0 are three position representing a term present in 8 documents

q1 in d1 turned bits 1 0 1 q1 in d2 might not be there but may turn same bits q1 in d2: hash functions generates same bit 1 0 1 q1 in d2 is false positive

the bit density and signal to noise ratio is used to control false positive rate.

To calculate number of hash functions need to be used to find bits for given term

ki = min(3;logd(ai/((1-ai)*snr)))

ai - is term frequency snr - signal to noise ratio logd - d is the bit density

Result Files generated While Indexing: bloomFilter.txt – the actual bit sliced array stored in a file bloomFilter_shape.txt – dimensions of bitSliced array. term_hash_lookup – term to number of hash functions generated. document_lookup.txt – lookup file for document names. expectedResult.txt – the sparse matrix for terms.

About

one of the IITH course assignment for implementing bloom filters with using bit slicing for compact storage

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages