Python package to accelerate the sparse matrix multiplication and top-n similarity selection

Vegoo89/sparse_dot_topn

sparse_dot_topn

sparse_dot_topn provides a fast way to perform a sparse matrix multiplication followed by selection of the top-n multiplication results.

Comparing very large feature vectors and picking the best matches often comes down, in practice, to a sparse matrix multiplication followed by selecting the top-n results in each row. This package implements a customized Cython function for that purpose. Compared with doing the same work using SciPy and NumPy functions, our Cython approach is about 40% faster and uses less memory.
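For reference, the two-step operation described above can be sketched with plain SciPy and NumPy. This is only an illustration of the baseline idea, not the package's Cython implementation; the function name `naive_dot_topn` and its parameters are hypothetical and chosen for this sketch.

```python
import numpy as np
from scipy.sparse import csr_matrix


def naive_dot_topn(a, b, ntop, lower_bound=0.0):
    """Multiply a @ b, then keep at most `ntop` entries per row above `lower_bound`.

    Hypothetical baseline for illustration; it materializes the full product
    before selecting, which is the cost a fused implementation tries to avoid.
    """
    product = (a @ b).tocsr()
    indptr, indices, data = [0], [], []
    for row in range(product.shape[0]):
        start, end = product.indptr[row], product.indptr[row + 1]
        cols = product.indices[start:end]
        vals = product.data[start:end]
        # Drop entries that do not exceed the threshold.
        keep = vals > lower_bound
        cols, vals = cols[keep], vals[keep]
        # Select the ntop largest values without sorting the whole row.
        if vals.size > ntop:
            top = np.argpartition(vals, -ntop)[-ntop:]
            cols, vals = cols[top], vals[top]
        # Order the survivors from largest to smallest.
        order = np.argsort(-vals)
        indices.extend(cols[order].tolist())
        data.extend(vals[order].tolist())
        indptr.append(len(indices))
    return csr_matrix(
        (np.asarray(data, dtype=float),
         np.asarray(indices, dtype=np.int32),
         np.asarray(indptr, dtype=np.int32)),
        shape=product.shape,
    )
```

Note that each row of the returned matrix is ordered by value rather than by column index, which is valid CSR but not in canonical form.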

This package was developed by the ING Wholesale Banking Advanced Analytics team. A blog post explains how we implemented it.

Example

    from scipy.sparse import rand
    from sparse_dot_topn import awesome_cossim_topn

    # Two random sparse matrices in CSR format:
    # a is 100 x 1,000,000 and b is 1,000,000 x 200.
    a = rand(100, 1000000, density=0.005, format='csr')
    b = rand(1000000, 200, density=0.005, format='csr')

    # For each row of a, keep the 5 largest entries of a * b that exceed 0.01.
    c = awesome_cossim_topn(a, b, 5, 0.01)

The script example/comparison.py compares our accelerated method with calling the SciPy and NumPy functions directly.
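The result `c` in the example is a SciPy sparse matrix, so the selected matches can be read back row by row. The helper below is a minimal sketch of one way to do that; the name `iter_matches` is hypothetical and not part of the package, and the small matrix used to demonstrate it is made up.

```python
import numpy as np
from scipy.sparse import csr_matrix


def iter_matches(c):
    """Yield (row, column, score) for every stored entry of a sparse matrix.

    Hypothetical helper for illustration; works on any SciPy sparse matrix.
    """
    c = c.tocsr()
    for row in range(c.shape[0]):
        start, end = c.indptr[row], c.indptr[row + 1]
        for col, score in zip(c.indices[start:end], c.data[start:end]):
            yield row, int(col), float(score)


# Stand-in for a top-n result: 2 rows, one stored match each.
demo = csr_matrix(np.array([[0.0, 0.5], [0.9, 0.0]]))
matches = list(iter_matches(demo))
```

With the `c` from the example, `list(iter_matches(c))` would give at most 5 tuples per row of `a`.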

Dependency and Install

Install numpy and cython before installing this package. Then run:

    pip install sparse_dot_topn

Uninstall

    pip uninstall sparse_dot_topn
