Implemented AnnoyIndex serialization to bytes objects in-memory #661

FuexFollets · 2024-01-25T22:56:29Z

Although AnnoyIndex has methods to save and load the state of an index to and from a file, there was no way to serialize that state without writing to a file. This matters when you want to use an AnnoyIndex object as a field in another class and still serialize the containing class efficiently. The main benefit is that some serialization libraries may be able to serialize the aggregate class more compactly. It also avoids complications such as file write permissions and finding a suitable temp directory, and makes serializing the aggregate easier overall.
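To make the aggregate use case concrete, here is a minimal sketch in Python. `ToyIndex` and `Catalog` are hypothetical stand-ins invented for illustration, not Annoy classes; the only thing borrowed from this PR is the idea of a `serialize` that returns bytes and a `deserialize` that rebuilds the index from them (written here as a classmethod, which is an assumption).

```python
import pickle
import struct


class ToyIndex:
    """Hypothetical stand-in for an index exposing serialize()/deserialize()."""

    def __init__(self, dim):
        self.dim = dim
        self.items = []  # list of float vectors, each of length dim

    def add_item(self, vector):
        assert len(vector) == self.dim
        self.items.append(list(vector))

    def serialize(self):
        # Header (dimension, item count) followed by all floats back to back.
        flat = [x for v in self.items for x in v]
        header = struct.pack("<II", self.dim, len(self.items))
        return header + struct.pack(f"<{len(flat)}f", *flat)

    @classmethod
    def deserialize(cls, data):
        dim, n = struct.unpack_from("<II", data, 0)
        flat = struct.unpack_from(f"<{dim * n}f", data, 8)
        index = cls(dim)
        index.items = [list(flat[i * dim:(i + 1) * dim]) for i in range(n)]
        return index


class Catalog:
    """Aggregate that owns an index plus ordinary metadata."""

    def __init__(self, name, index):
        self.name = name
        self.index = index

    def __getstate__(self):
        # Swap the index for its in-memory bytes; no temp file is involved.
        state = self.__dict__.copy()
        state["index"] = self.index.serialize()
        return state

    def __setstate__(self, state):
        state["index"] = ToyIndex.deserialize(state["index"])
        self.__dict__.update(state)


index = ToyIndex(2)
index.add_item([1.0, 0.0])
index.add_item([0.0, 1.0])
restored = pickle.loads(pickle.dumps(Catalog("demo", index)))
```

With bytes available in memory, the aggregate can be handed to `pickle` (or any other serializer that accepts bytes) in one step.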

This pull request includes the following:

- AnnoyIndex serialize and deserialize functions. The serialize method returns bytes, which can be passed to the deserialize method to 'load' them and recreate the index that was serialized.
- Bindings for the serialize and deserialize functions in Python, Go, and Lua
- Tests for the functionality and bindings in Python, Go, and Lua

All existing tests pass, along with the tests I have added in each language. The new tests simply compare the result of get_nns_by_item before and after serializing and deserializing an index. Unless I have overlooked something that needs changing, it should be ready.
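The shape of such a round-trip test can be sketched as follows. This is not the test code from the PR: `ToyIndex` is a hypothetical brute-force stand-in, and only the method names `get_nns_by_item`, `serialize`, and `deserialize` mirror the ones discussed here (treating `deserialize` as a classmethod is an assumption).

```python
import json


class ToyIndex:
    """Hypothetical brute-force index used only to exercise the test helper."""

    def __init__(self, vectors=None):
        self.vectors = [list(v) for v in (vectors or [])]

    def get_nns_by_item(self, i, n):
        # Exact nearest neighbours by squared Euclidean distance.
        query = self.vectors[i]

        def dist(j):
            return sum((a - b) ** 2 for a, b in zip(query, self.vectors[j]))

        return sorted(range(len(self.vectors)), key=dist)[:n]

    def serialize(self):
        return json.dumps(self.vectors).encode("utf-8")

    @classmethod
    def deserialize(cls, data):
        return cls(json.loads(data.decode("utf-8")))


def neighbours_survive_round_trip(index, n_items, k):
    """Return True if every item's k neighbours match before and after."""
    before = [index.get_nns_by_item(i, k) for i in range(n_items)]
    restored = type(index).deserialize(index.serialize())
    after = [restored.get_nns_by_item(i, k) for i in range(n_items)]
    return before == after


index = ToyIndex([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
ok = neighbours_survive_round_trip(index, n_items=4, k=3)
```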

erikbern · 2024-01-27T22:04:57Z

Thanks for the contribution! Since Annoy stores all data in contiguous memory, wouldn't it be enough to just copy that memory buffer and then load from that? I'm not sure if we need to reimplement the logic to find the roots.

Also what happens if you serialize an index that has mmapped a file? We should probably disallow that right?
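As a byte-level illustration of the mmap question, the sketch below copies the contents of a read-only memory-mapped file into an owned `bytes` object using only the Python standard library. The payload is arbitrary stand-in data, and nothing here reflects how Annoy manages its mapping internally; it only shows that a copy taken from a mapping is independent of the mapping once made.

```python
import mmap
import os
import tempfile

payload = bytes(range(256)) * 4  # arbitrary stand-in for index data

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Slicing a mapping returns a new bytes object, i.e. a real copy.
            copied = view[:]
finally:
    os.remove(path)

# `copied` remains valid after the mapping is closed and the file is deleted.
```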

FuexFollets · 2024-01-28T22:35:52Z

@erikbern Based on your suggestion, I have changed the code to store the roots and other information in the serialized bytes so that they do not need to be recalculated. I also tested serialization on an index that was loaded from a file via mmap, and it still works: the serialized result is the same. The main logic implemented here is gathering the buffers that the AnnoyIndex class points to into a single contiguous byte sequence, which is later turned back into an index.
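The idea of packing a header, the cached roots, and the raw node memory into one contiguous byte sequence can be sketched like this. The layout (three little-endian unsigned 32-bit header fields, then 32-bit root ids, then the node bytes) and the names `pack_index` and `unpack_index` are assumptions made for illustration; they are not the format used by this PR.

```python
import struct

# Hypothetical header layout: node_size, n_nodes, n_roots.
HEADER = struct.Struct("<III")


def pack_index(node_size, nodes, roots):
    """Concatenate header, root ids, and raw node memory into one bytes object."""
    if node_size <= 0 or len(nodes) % node_size != 0:
        raise ValueError("node buffer is not a whole number of nodes")
    n_nodes = len(nodes) // node_size
    header = HEADER.pack(node_size, n_nodes, len(roots))
    root_block = struct.pack(f"<{len(roots)}i", *roots)
    return header + root_block + bytes(nodes)


def unpack_index(data):
    """Recover (node_size, nodes, roots) without recomputing the roots."""
    node_size, n_nodes, n_roots = HEADER.unpack_from(data, 0)
    offset = HEADER.size
    roots = list(struct.unpack_from(f"<{n_roots}i", data, offset))
    offset += 4 * n_roots
    nodes = bytes(data[offset:offset + node_size * n_nodes])
    if len(nodes) != node_size * n_nodes:
        raise ValueError("truncated node buffer")
    return node_size, nodes, roots


nodes = bytes(range(48))  # three fake 16-byte nodes
blob = pack_index(16, nodes, [2, 1])
restored = unpack_index(blob)
```

Storing the roots explicitly trades a few extra bytes for skipping the root search on load, which is the trade-off discussed in this thread.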

FuexFollets added 13 commits January 23, 2024 19:03

Implemented AnnoyIndex serialization

e02e019

Implemented AnnoyIndex deserialization

813144d

Added CPython function headers for serialize and deserialize

83641f2

Implemented AnnoyIndex python c extensions for serialization and deserialization

c9d2d18

Fixed vector construction compile error

f0a2dc2

Added '.eggs/' directory to gitignore

7121f5c

Fix deserialization

f22e3f7

Added serialization test

4661f31

Added go module code

915d90b

Added test for go bindings serialization

f8c6c05

Implemented lua bindings for serialize and deserialize

b8a939f

Fixed deserialize functionality and added functions to table

d81b569

Added lua test 'serialize_deserialize'

f8e99d7

FuexFollets added 3 commits January 28, 2024 11:03

Added test for serialization on index mmaped from file

f7df982

Implemented root caching to replace computation of roots during deserialization

9ead484

Added extra checks to serialize test

ab49363

FuexFollets added 2 commits January 29, 2024 10:32

Made serialize method 'const'

7d6118c

Fixed serialization and deserialization

2d92ba9
