Jegao/label hot fix with main2 (#430)
* add codebook passing and pq/opq dim overwrite.

* Support per query filter (#279)

* Transferring Varun's changes from external fork with squash merge

* generating multiple gt's for each filter label + search with multiple filter labels (code cleanup)

* supporting no-filter + one filter label + filter label file (multiple filters) while computing GT

* generating multiple gt's + refactoring code for readability & cleanliness

* adding more tests for filtered search

* updating pr-test to test filtered cases

* lowering recall requirement for disk index

* transferred functions to filter_utils 

* adding more test for build and search without universal label

* adding one_per_point distribution to generate_synthetic_labels + cleaning up artifacts after compute gt+ removing minor errors

* refactoring search_disk_index to use a query filter vector
---------

Co-authored-by: patelyash <[email protected]>
Co-authored-by: Varun Sivashankar <[email protected]>

* Rebasing main's latest commits onto ravi/filter_support_rebased (#225)

- add code for two variants of filtered index, readme and CI tests

- add utils for synthetic label generation and CI tests.

* Add co-authors

Co-authored-by: ravishankar <[email protected]>
Co-authored-by: Varun Sivashankar <[email protected]>

---------

Co-authored-by: ravishankar <[email protected]>
Co-authored-by: David Kaczynski <[email protected]>
Co-authored-by: Siddharth Gollapudi <[email protected]>
Co-authored-by: Neelam Mahapatro <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>
Co-authored-by: REDMOND\patelyash <[email protected]>
Co-authored-by: Varun Sivashankar <[email protected]>

* Clang-format now errors on push and PR if formatting is incorrect (#236)

* Rather than sift through all the *.cpp and *.h in the root directory, we're looking for only the sources in our main repository for formatting. Git submodules are excluded

* Removing the --Werror flag only until we actually format all of the code in a future commit

* We're choosing to base our style on the Microsoft style guide and not make any changes

* Running format action on source code.  Settling on Google styling.  Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')

* Enabling error on malformatted file

* Revert "Enabling error on malformatted file"

This reverts commit fa33e82.

* Revert "Running format action on source code.  Settling on Google styling.  Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')"

This reverts commit e0281be.

* Trying again; formatting rules based on Google rules, disables sorting includes as that breaks us, and enabling check on build.

* Somehow this was missed in the mass format.  Formatting include/distance.h.

* Manually fixing the formatting because clang-format wouldn't, but WOULD flag it as invalid

* Update SSD_index.md (#258)

Fix typo in SSD index readme

* Add filter-diskann paper link to readme (#275)

* Update README.md (#277)

* update citation (#281)

* Some fixes to pass internal building pipeline (#282)

Remove warnings affecting internal build pipelines

---------

Co-authored-by: Yiyong Lin <[email protected]>

* Add support for multiple frozen points (#283)

* Add support for multiple frozen points

* Add the missing parameters to the constructor.

* Added filtered disk index readme (#276)

* Added filtered disk index readme

* Support per query filter (#279)

* Transferring Varun's changes from external fork with squash merge

* generating multiple gt's for each filter label + search with multiple filter labels (code cleanup)

* supporting no-filter + one filter label + filter label file (multiple filters) while computing GT

* generating multiple gt's + refactoring code for readability & cleanliness

* adding more tests for filtered search

* updating pr-test to test filtered cases

* lowering recall requirement for disk index

* transferred functions to filter_utils 

* adding more test for build and search without universal label

* adding one_per_point distribution to generate_synthetic_labels + cleaning up artifacts after compute gt+ removing minor errors

* refactoring search_disk_index to use a query filter vector
---------

Co-authored-by: patelyash <[email protected]>
Co-authored-by: Varun Sivashankar <[email protected]>

* update merging code

* Using boost program options under Visual Studio MSVC 14.0 Assertion failed

* some comments and rewriting

* add back LF, which might conflict with MSVC 14.0

* clang formatting change

* clang formatting

* revert back to LF

* unexpected failure on UT re-try

* adding default string to the path

* fix reference issue

* Fixing Build errors in remove_extra_typedef (#290)

remove _u, _s typedefs

* converting uint64's to size_t where they represent array offsets

---------

Co-authored-by: harsha vardhan simhadri <[email protected]>

* clang format

* bump it up to 512 for MAX_PQ_CHUNKS

* default codebook prefix value pass in for generate_quantized_data

* add check for disabling both -B and -QD pass in

* remove rules for force only one of -B and -QD

* clang change

* change clang format

* bring back -B params

* generate_quantized_data pass in reference instead of const string

* update clang and param reference

* updated dockerfile (#299)

* updated dockerfile

* add parallel build flag to dockerfile

* Adds CI jobs to build our docker container  (#302)

* Adding a step that at least builds the docker container.  I'm not yet sure how I want to actually integrate tests within the container, but at the least we should verify it builds

* docker build needs a path. i honestly thought it defaulted to the CWD

---------

Co-authored-by: Dax Pryce <[email protected]>

* Python API and Test Suite (#300)

* The first step in the python-api-enhancements branch.  We need to fix a problem with the Parameters class with a double free or segfault on deletion.

* Removing the parameters class in favor of the IndexRead and IndexWrite parameters classes.

* API changes and python packaging changes for linux.  It's almost ready for PR, but definitely ready for push.

* Suppressing the CIBuildWheel step on windows

* added in-mem static and dynamic index class to python bindings (#301)

* Advancing our version number to 0.5.0

* Some more updates as per harsha's comments on PR #300.  The diskann_bindings.cpp still need some more tlc and the wrapper needs to make use of it, and we also want to include some examples, but this is a good place to bring into main and then do further enhancements
---------

Co-authored-by: Harsha Vardhan Simhadri <[email protected]>

* reducing number of L values for stitched search (#307)

* reducing number of L values for stitched search in CI

* add a warning in prune_neighbor if zero distance neighbor is detected (#320)

* Fix condition on ubuntu version in README (#246)

* Fix building SSD index performance issue (#321)

Fix performance gap between in-mem and SSD based graph built by passing an appropriate number of threads.
---------

Co-authored-by: Yiyong Lin <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>

* remove the distance 0 warning in prune candidate the list, since diskann::cerr does not seem thread safe (#330)

* Set compile warning as error for core projects (#331)

* set(CMAKE_COMPILE_WARNING_AS_ERROR ON)


---------

Co-authored-by: Yiyong Lin <[email protected]>

* Create a data store abstraction (#305)

Create a virtual data store base class and a derived in-mem store class. In-mem index now uses the data store class.

---------

Co-authored-by: Gopal Srinivasa <[email protected]>
Co-authored-by: ravishankar <[email protected]>
Co-authored-by: yashpatel007 <[email protected]>

* Disabling Python builds (#338)

* Disabling Python builds

debian stretch no longer seems to have valid apt repos - or at least not ones that we can access - which means our cibuildwheel is failing.

* New python interface, build setup, apps and unit tests (#308)


---------

Co-authored-by: Dax Pryce <[email protected]>

* Adding some diagnostics to a pr build in an attempt to see what is going on with our systems prior to running our streaming/incremental tests

* fix cast error and add some status prints to in-mem-dynamic app

* Adding unit tests for both memory and disk index builder methods

* After the refactor and polish of the API was left half done, I also left half a jillion bugs in the library. At least I'm confident that build_memory_index and StaticMemoryIndex work in some cases, whereas before they barely were getting off the ground

* Sanity checks of static index (not comprehensive coverage), and tombstone file for test_dynamic_memory_index

* Argument range checks of some of the static memory index values.

* fixes for dynamic index in python interface (#334)

* create separate default number of frozen points for dynamic indices

* consolidate works

* remove superfluous param from dynamic index

* remove superfluous param from dynamic index

* batch insert and args modification to apps

* batch insert and args modification to apps

* typo

* Committing the updated unit tests. At least the initial sanity checks of StaticMemory are done

* Fixing an error in the static memory index ctor

* Formatting python with black

* Have to disable initial load with DynamicMemoryIndex, as there is no way to build a memory index with an associated tags file yet, making it impossible to load an index without tags

* Working on unit tests and need to pull harsha's changes

* I think I aligned this such that we can execute it via command line with the right behaviors

* Providing rest of parameters build_memory_index requires

* For some reason argparse is allowing a bunch of blank space to come in on arguments and they need to be stripped. It also needs to be using the right types.

* Recall test now works

* More unit tests for dynamic memory index

* Adding different range check for alpha, as the values are only really that realistic between 1 and 2. Below 1 is an error, and above 2 we'll probably make a warning going forward

* Storing this while I cut a new branch and walk back some work for a future branch

* Undoing the auto load of the dynamic index until I can debug why my tag vector files cause an error in diskann

* Updating the documentation for the python bindings. It's a lot closer than it was.

* Fixing a unit test

* add timers to dynamic apps (#337)

* add timers to dynamic apps

* clang format

* np.uintc vs. int for dtype of tags

* fixes to types in dynamic app

* cast tags to np.uintc array

* more timers

* added example code in comments in app file

* round elapsed

* fix typo

* fix typo

---------

Co-authored-by: Harsha Vardhan Simhadri <[email protected]>
Co-authored-by: harsha vardhan simhadri <[email protected]>

* Harshasi/timer python app (#341)

* added timer and QPS to static search app

* search only option to static index

* search only option to static index

* exposing metric in static function

* Force error on warnings and add casts to test directory (#342)

* Force error on warnings and add casts to test directory

* Use size_t for index of point IDs

* Refactor iterator and conditions for printing labels

---------

Co-authored-by: David Kaczynski <[email protected]>

* Enable Windows python bindings (#343)

* Use int64 for counter to fix windows compilation error

* Fix windows python bindings by adding install_lib command to move windows build output into python package

* Update to use Path instead of os

* Change batch_insert num_inserts signature to signed type for OpenMP compatibility

* Update num_inserts to int32_t per PR request

---------

Co-authored-by: Nick Caurvina <[email protected]>

* Use new macro(ENABLE_CUSTOM_LOGGER) to turn on Custom logger (#345)

* custom logger


---------

Co-authored-by: Yiyong Lin <[email protected]>

* updating from std cpp 14 to cpp 17 (#352)

* updating from std cpp 14 to cpp 17

* adding cmake_cxx_standard flag

* CICD Refactor (#354)

* Refactored the build processes. Broke things into components as much as
possible. We have standalone actions for the build processes to make
sure they are consistent across push or PR builds, a format-check that
doesn't rely on cmake to be there to work, and centralized our
randomized data generation into a single action that can be called in
each section.

We now are reusing as many of the steps as we can without copy/pasting,
which should ensure we're not making mistakes.

* Fixing the dynamic tests, the paths to the data were wrong

---------

Co-authored-by: yashpatel007 <[email protected]>

* Fix the disparity between disk and memory search for Universal label (#347)

* UNV Search Fix for Memory

* two places to update

* clang format

* unify find_common_filters function

* fix comments

- only return size of common filters from the find_common_filters function

* dummy comments

* clang format

* Reduce repetitive calls

* changing name and return type of function

* Remove compute_groundtruth from labels.yml (#363)

Co-authored-by: Yiyong Lin <[email protected]>

* Handle some corner cases in generate_cache_list_from_sample_queries (#361)

Co-authored-by: Yiyong Lin <[email protected]>

* Reduce the size of coord_scratch in SSDQueryScratch to reduce memory usage (#362)

* Remove useless coord_scratch in SSDQueryScratch to reduce memory usage


---------

Co-authored-by: Yiyong Lin <[email protected]>

* Upload data and binary files to artifact in CI workflows (#366)

* Upload data and binary files to artifact so that we could debug issue locally when the workflows fails

* use different artifact name for different scenarios

---------

Co-authored-by: Yiyong Lin <[email protected]>

* Python Type Enhancements (#364)

* Adding cosine distance - I didn't know we had that as a first level distance metric

* Making our mkl and iomp linking game more rigorously defined for the ubuntus

* Included latest as a path fragment twice on accident

* libmkl_def.so is named something different when installed via the intel oneapi installer

* Making a number of changes to homogenize our api (same parameters, minimize parameters as much as possible, etc)

* Stashing this and going to work on the CICD stuff, it's driving me nuts

* Fairly happy with the Python API now. Documentation needs another pass, the @Overloads in the .pyi files need to be addressed, and documentation checked again.  The apps folder also needs updating to use fire instead of argparse

* Updated build to not use tcmalloc for pybind, as well as fixed the pyproject.toml so that cibuildwheel can actually successfully build our project.

* Making a change to in-mem-static for the new api and also adjusting the comment in in-mem-dynamic a bit, though... I probably shouldn't have

* Add unit test project based on boost_unit_test_framework (#365)

* Add unit test project based on boost_unit_test_framework

* Add another dockerfile for developers

* update path

---------

Co-authored-by: Yiyong Lin <[email protected]>

* Fix inefficiency in constructing reverse label map (#373)

* single loop for reverse label map

* clang formatting

* unnecessary comments removed

* minor

---------

Co-authored-by: Varun Sivashankar <[email protected]>

* fixed a bug with loading medoids for sharded filtered index, and adde… (#368)

* fixed a bug with loading medoids for sharded filtered index, and added better caching for filtered index

clang-format

fixed minor cout error

addressed Yiyong's comments, and fixed a bug for finding medoid in sharded+filtered index

Fixed windows compile error (warnings)

Fix inefficiency in constructing reverse label map (#373)

* single loop for reverse label map

* clang formatting

* unnecessary comments removed

* minor

---------

Co-authored-by: Varun Sivashankar <[email protected]>

clang-formatted

* minor cleanup

* clang-format

---------

Co-authored-by: ravishankar <[email protected]>

* patelyash/index factory (#340)

* # This is a combination of 2 commits.

remove _u, _s typedefs

* added some seed files

* add seed files

* New distance metric hierarchy

* Refactoring changes

* Fixing compile errors in refactored code

* Fixing compile errors

* DiskANN Builds with initial refactoring changes

* Saving changes for Ravi

* More refactoring

* Refactor

* Fixed most of the bugs related to _data

* add seed files

* # This is a combination of 2 commits.

remove _u, _s typedefs

* added some seed files

* New distance metric hierarchy

* Refactoring changes

* Fixing compile errors in refactored code

* Fixing compile errors

* DiskANN Builds with initial refactoring changes

* Saving changes for Ravi

* More refactoring

* Refactor

* Fixed most of the bugs related to _data

* Post merge with main

* Refactored version which compiles on Windows

* now compiles on linux

* minor clean-up

* minor bug fix

* minor bug

* clang format fix + build error fix

* clang format fix

* minor changes

* added back the fast_l2 feature

* added back set_start_points in index.cpp

* Version for review

* Incorporating Harsha's comments - 2

* move implementation of abstract data store methods to a cpp file

* clang format

* clang format

* Added slot manager file (empty) and fixed compile errors

* fixed a linux compile error

* clang

* debugging workflow failure

* clang

* more debug

* more debug

* debug for workflow

* remove slot manager

* Removed the #ifdef WINDOWS directive from class definitions

* Refactoring alignment factor into distance hierarchy

* Fixing cosine distance

* Ensuring we call preprocess_query always

* Fixed distance invocations

* fixed cosine bug, clang-formatted

* cleaned up and added comments

* clang-formatted

* more clang-format

* clang-format 3

* remove deleted code in scratch.cpp

* reverted clang to Microsoft

* small change

* Removed slot_manager from this PR

* newline at EOF in_mem_Graph_store.cpp

* rename distance_metric to distance_fn

* resolving PR comments

* minor bug fix for initialization

* creating index_factory

* using index factory to build inmem index

* clang format fix

* minor bug fix

* fixing build error

* replacing mem_store with abstract_mem_store + injecting data_store to Index

* minor fix

* clang format fix

* commenting data_store injection to prevent double invocation and mem leak (for now)

* fixing the build for filters

* moving abstract index to abstract_index.h

* IndexBuildParamsbuilder to build IndexBuildParams properly with error checking

* fixing build errors

* fixing minor error

* refactoring index search to be simple

* clang format fix

* refactoring search_mem_index to use index factory

* clang fix

* minor fix

* minor fix for build

* optimize for fast l2 restore

* removing comments

* removing comments

* adding templating to IndexFactory (can't avoid it anymore)

* fixing build error

* fixing ubuntu build error

* ubuntu build exception fix

* passing num_pq_bytes

* giving one more shot to config-driven arch with boost::any (type erasure)

* clang fix

* modifying search to use boost::any

* fixing ubuntu build errors/warning

* created indexconfigbuilder and fixed a typo

* fixing error in pq build

* some comments + lazy_delete impl

* bumping to std c++17 & replacing boost::any with std::any

* clang fix

* c++ std 17 for ubuntu

* minor fix

* converting search to batch_search + A vector wrapper using std::any to store vector as a shared ptr

* adding AnyVector to encapsulate vector in std::any + adding basic yaml parser(WIP)

* adding wrapper code for vector and set, checked with Andrija

* fixing ubuntu build error

* trying to resolve ubuntu build error

* testing test streaming index with IndexFactory

* fixing ubuntu build error

* fixing search for test insert delete consolidate

* refactored test_streaming_scenario

* refactored test_insert_delete_consolidate to use AbstractIndex and Indexfactory

* fixing ubuntu build error

* making build method in abstract index consistent

* some code cleanup + abstract_cpp to add implementation

* removing comments and code cleanup

* build error fix

* fixing -Wreorder warning

* separating build structs to their header + refactor search and remove batch search

* fixing ubuntu build errors

* resolving segfault error from search_mem_index

* fixing query_result_tag allocation

* minor update

* search fix

* trying to fix windows latest build for dynamic index

* adding temp logging to debug windows latest build issue

* removing logging for debug

* fixing windows latest build error for dynamic index search

* moving any wrappers to separate file + organizing code

* fixing check error

* updating private var naming convention

* minor update

* unraveling search methods in abstract index. Iteration 1

* minor fix

* unused vars remove

* returning a unique_ptr to Abstract Index from index factory

* adding implementation from abstract_index.h to abstract_index.cpp

* making abstract index api more explicit (experiment)

* some code cleanup

* removing detected memory leaks (free up index)

* separating enums for data and graph strategy

* Index ctor(config) now uses injected datastore from IndexFactory

* distance in index population in new config ctor

* resolving some comments from Andrija

* Resolving some restructuring comments by Andrija

* minor fix

* fixing ubuntu build error

* warning fix

* simplified get() in anywrappers

* making index config a unique ptr and owned by IndexFactory

* removing complex if/else calling recursively + added unimplemented TagT to AbsIdx

* renaming get_instance to create_instance

* clang format fix

* removing const_cast from any_wrapper

* fixing andrija's comments

* removing warnings

---------

Co-authored-by: harsha vardhan simhadri <[email protected]>
Co-authored-by: Gopal Srinivasa <[email protected]>
Co-authored-by: ravishankar <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>

* patelyash/index factory (#340) (#380)



---------

Co-authored-by: Yash Patel <[email protected]>
Co-authored-by: harsha vardhan simhadri <[email protected]>
Co-authored-by: Gopal Srinivasa <[email protected]>
Co-authored-by: ravishankar <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>

* hot fix for python build (#383)

* some bug fixes when enabling EXEC_ENV_OLS (#377)

* some bug fixes when enabling EXEC_ENV_OLS

* avoid unit test failure

* unit test testing

* changed based on gopal's suggestion

* update load_impl(AlignedFileReader &reader)

* change the load_impl to be identical to objectstore

* remove blank

* Output distance file in memory index search (#382)

* Output distance file

* fix

---------

Co-authored-by: Shengjie Qian <[email protected]>

* Add WIN macro for non-win function (#360)

* Add WIN macro for non-win function

* fix vc16 compile issue

* fix compile issue

* fix compile issue

* fix compile issue

* clean up code

* small EXEC_ENV_OLS bug fix (#387)

* small bug fix

* test ubuntu fail

* formatting

* re-triggering unit test

* Python Refactor (#385)

* Refactor of diskannpy module code.

* 0.5.0.rc1 for python and enabling the build-python portion of the pr-test process.

* clang-format changes

* In theory this should speed up the python build drastically by only building the wheel for the python version and OS we're attempting to fan out to in our CICD job tree

* Missed a dollar sign

* Copy/pasting left a CICD step name that implied we were running a code formatting check when instead we were building a wheel.  This is now fixed.

* In theory, readying the release action too.  We won't know if it works until it merges and we cut a release, but at least the paths have been fixed

* Designated initializers just happened to work on linux but shouldn't have as they weren't added until cpp20

* Formatting

* Jinweizhang/filter paramsfix (#388)

* small bug fix

* test ubuntu fail

* formatting

* re-triggering unit test

* cause error, remove two character params

* cause error, remove two character params

* unit test fix

* clean up code

* add more accurate error handling

* fix filter build

* re-trigger test

* try lower recall number

* test with more values

* revert back to test unit test

* Update python-release.yml

Github actions fix: composite action `python-wheel` publishes wheels to the `wheels` artifact.  `python-release` workflow then looks for it in the `dist` artifact, which does not exist.

This is a CICD change only.

* Fixed inputs typo (#391)

* Fixed inputs typo

* Action 'checkout@v2' is deprecated

* Update pyproject.toml

Trying a new release of the python lib to see if there was a packaging error in the publication of rc1.

* Fixed param documentation (#393)

* Fixed param name in comments

* Hide rust/target

* Bypass errors in logging for non-msft-prod environments (#392)

* Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python.  Perhaps it also will fix any issues in our label test too?  I'd like to push it to GH and see.

* Formatting fixes

* Revert "Formatting fixes"

This reverts commit 9042595.

* Revert "Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python.  Perhaps it also will fix any issues in our label test too?  I'd like to push it to GH and see."

This reverts commit 7561009.

* The custom logging implementation is causing segfaults in python. We're not sure exactly where, but this is the easiest and quickest way to getting a working python release.

* All the integration tests are failing, and there's a chance the virtual dtor on AbstractDataStore might be the culprit, though I am not sure why.  I'm hoping it is so it won't fall on the logging changes.

* Formatting. Again.

* Improve help formatting in CLI tools (#390)

* Added utilities to standardize help across cli tools.  #370

* Made three option groupings (required/optional/print)

* Moved common parameter descriptions to a common file.  #370

* Updated usage statement for search_disk_app #370

* Updated range_search_disk_index to use the new required/optional format.  #370

* Updated test apps to use the new help format.  #370

* Fixed format issue.  #370

* Updated help format for the 'build' apps. #370

* Fixed code formatting.  #370

* Added src/*.hpp to the clang format.  #370

* Moved header into the headers directory.  #370

* Added missing configs.  #370

* Removed superfluous paths from include.  #370

* Added #pragma once.  #370

* Typo fixes.  #370

* Fixed capitalization of constant.  #370

* Make fail_if_recall description more accurate.  #370

* Changed to using set notation.  #370

* Better explanations for some options.  #370

* Added short explanation of file format.  #370

---------

Co-authored-by: Jon McLean <[email protected]>
Co-authored-by: Jonathan McLean <[email protected]>

* Python build with a far more portable wheel (#396)

* Identified the appropriate build flags to get a working python build that doesn't rely on -march=native or -mtune=native.  We've run benchmarks on multiple computers that indicate the only important flag other than -mavx2 -msse2 -mfma is -funroll-loops.  Optimization levels such as -O1, -O2, or -O3 actually make for less performant code. -Ofast is unavailable for use in Python, as it causes problems with floating point math in Python

* 1.22 was left in a comment despite 1.25 being the value specified

* Python 3.8 is not supported by numpy 1.25, so we're removing it.

* Jomclean/write timings (#397)

* Work-in-progress commit adding JSON output for timings.  in-mem-static is complete

* Added timings to dynamic and total-time to static

* Update pyproject.toml (#398)

Using the correct README for our publication to pypi.

* Added filename to log (#399)

* Jinwei/fix in memory compile error (#401)

* small bug fix

* test ubuntu fail

* formatting

* re-triggering unit test

* add small fix for in_mem_data_store when EXEC_ENV_OLS is enabled

* fix: use the passed in io_limit (#403)

* fix: use the passed in io_limit

* fix to be clang-formatted

* DynamicMemoryIndex bug fixes (#404)

* While simply creating a unit test to repro Issue #400, I found a number of bugs that I needed to address just to get it to work the way I had intended. This does not yet have what I would consider a comprehensive suite of test coverage for the DynamicMemoryIndex, but we at least do save it with the metadata file, we can load it correctly, and saving *always* calls consolidate_deletes() first if any item has been marked for deletion.

* We actually cannot save without compacting before save anyway. Removing the parameter from save() and hardcoding it to True until we can actually support it.

* Addressing some PR comments and readying a 0.5.0.rc5 release

* Pass nullptr as nullT when creating thread_data that's of ConcurrentQueue<SSDThreadData*> type; otherwise the default null_T is uninitialized and could point to arbitrary memory (#408)

* Preparing for 0.6.0 diskannpy release (#407)

* Some early staging for README updates and pyproject updates for a 0.6.0 release for diskannpy.

* Trying to fix the CI badge to point toward main's latest build

* Updating documentation for pdoc generation

* Documentation updates. Tightened up the API to drop list support (there were entirely too many cases where it wouldn't work, and it's easier to just tell people to convert it themselves)

* Some module reorganization to make pdoc actually display the docstrings for variables re-exported at the top level

* A copy paste happened that shouldn't have.

* Updating the apps to use the new 0.6.0 api

* Addressing PR feedback

* Some of the documentation changes didn't get made in both from_file or the constructor

* Fix compile issue

* fix some issue

* fix universal label

* fix label file path

* add unv label filepath in separate path function

* fix empty path issue

---------

Co-authored-by: jinwei14 <[email protected]>
Co-authored-by: Yash Patel <[email protected]>
Co-authored-by: patelyash <[email protected]>
Co-authored-by: Varun Sivashankar <[email protected]>
Co-authored-by: David Kaczynski <[email protected]>
Co-authored-by: ravishankar <[email protected]>
Co-authored-by: David Kaczynski <[email protected]>
Co-authored-by: Siddharth Gollapudi <[email protected]>
Co-authored-by: Neelam Mahapatro <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>
Co-authored-by: Dax Pryce <[email protected]>
Co-authored-by: Jakub Tarnawski <[email protected]>
Co-authored-by: Yiyong Lin <[email protected]>
Co-authored-by: Yiyong Lin <[email protected]>
Co-authored-by: Andrija Antonijevic <[email protected]>
Co-authored-by: Neelam Mahapatro <[email protected]>
Co-authored-by: harsha vardhan simhadri <[email protected]>
Co-authored-by: gopalrs <[email protected]>
Co-authored-by: Gopal Srinivasa <[email protected]>
Co-authored-by: yashpatel007 <[email protected]>
Co-authored-by: nicaurvi <[email protected]>
Co-authored-by: Nick Caurvina <[email protected]>
Co-authored-by: Varun Sivashankar <[email protected]>
Co-authored-by: rakri <[email protected]>
Co-authored-by: varat73 <[email protected]>
Co-authored-by: JieCin <[email protected]>
Co-authored-by: Shengjie Qian <[email protected]>
Co-authored-by: Jon McLean <[email protected]>
Co-authored-by: Jonathan McLean <[email protected]>
Co-authored-by: litan1 <[email protected]>
1 parent 3681b56 commit 07938b9
Showing 319 changed files with 30,059 additions and 4,937 deletions.
28 changes: 28 additions & 0 deletions .github/actions/build/action.yml
@@ -0,0 +1,28 @@
name: 'DiskANN Build Bootstrap'
description: 'Prepares DiskANN build environment and executes build'
runs:
  using: "composite"
  steps:
    # ------------ Linux Build ---------------
    - name: Prepare and Execute Build
      if: ${{ runner.os == 'Linux' }}
      run: |
        sudo scripts/dev/install-dev-deps-ubuntu.bash
        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DUNIT_TEST=True
        cmake --build build -- -j
        cmake --install build --prefix="dist"
      shell: bash
    # ------------ End Linux Build ---------------
    # ------------ Windows Build ---------------
    - name: Add VisualStudio command line tools into path
      if: runner.os == 'Windows'
      uses: ilammy/msvc-dev-cmd@v1
    - name: Run configure and build for Windows
      if: runner.os == 'Windows'
      run: |
        mkdir build && cd build && cmake .. -DUNIT_TEST=True && msbuild diskann.sln /m /nologo /t:Build /p:Configuration="Release" /property:Platform="x64" -consoleloggerparameters:"ErrorsOnly;Summary"
        cd ..
        mkdir dist
        mklink /j .\dist\bin .\x64\Release\
      shell: cmd
    # ------------ End Windows Build ---------------
13 changes: 13 additions & 0 deletions .github/actions/format-check/action.yml
@@ -0,0 +1,13 @@
name: 'Checking code formatting...'
description: 'Ensures code complies with code formatting rules'
runs:
  using: "composite"
  steps:
    - name: Checking code formatting...
      run: |
        sudo apt install clang-format
        find include -name '*.h' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
        find src -name '*.cpp' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
        find apps -name '*.cpp' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
        find python -name '*.cpp' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
      shell: bash
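The four `find` invocations in this action differ only in the directory and the file pattern. As a minimal sketch for running the same check locally from the repository root, the loop below reuses the `clang-format --Werror --dry-run` command from the action. The function names `format_targets` and `check_formatting` are hypothetical and not part of the repository, and `clang-format` is assumed to be on `PATH` rather than at `/usr/bin/clang-format`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Directory:pattern pairs copied from the format-check action.
format_targets() {
  printf '%s\n' "include:*.h" "src:*.cpp" "apps:*.cpp" "python:*.cpp"
}

# Hypothetical helper: run the action's dry-run check for each pair.
# Directories that do not exist are skipped.
check_formatting() {
  local target dir pattern
  while IFS= read -r target; do
    dir=${target%%:*}      # text before the first colon
    pattern=${target#*:}   # text after the first colon
    [ -d "$dir" ] || continue
    find "$dir" -name "$pattern" -type f -print0 |
      xargs -0 -P 16 clang-format --Werror --dry-run
  done < <(format_targets)
}

# Usage (from the repository root, with clang-format installed):
#   check_formatting
```

Keeping the pairs in one list means a new source directory needs only one extra entry, at the cost of a little indirection compared with the four explicit lines in the action.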
35 changes: 35 additions & 0 deletions .github/actions/generate-random/action.yml
@@ -0,0 +1,35 @@
name: 'Generating Random Data (Basic)'
description: 'Generates the random data files used in acceptance tests'
runs:
  using: "composite"
  steps:
    - name: Generate Random Data (Basic)
      run: |
        mkdir data
        echo "Generating random vectors for index"
        dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_10K_norm1.0.bin -D 10 -N 10000 --norm 1.0
        dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_10D_10K_norm50.0.bin -D 10 -N 10000 --norm 50.0
        dist/bin/rand_data_gen --data_type uint8 --output_file data/rand_uint8_10D_10K_norm50.0.bin -D 10 -N 10000 --norm 50.0
        echo "Generating random vectors for query"
        dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_1K_norm1.0.bin -D 10 -N 1000 --norm 1.0
        dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_10D_1K_norm50.0.bin -D 10 -N 1000 --norm 50.0
        dist/bin/rand_data_gen --data_type uint8 --output_file data/rand_uint8_10D_1K_norm50.0.bin -D 10 -N 1000 --norm 50.0
        echo "Computing ground truth for floats across l2, mips, and cosine distance functions"
        dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100
        dist/bin/compute_groundtruth --data_type float --dist_fn mips --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/mips_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100
        dist/bin/compute_groundtruth --data_type float --dist_fn cosine --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/cosine_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100
        echo "Computing ground truth for int8s across l2, mips, and cosine distance functions"
        dist/bin/compute_groundtruth --data_type int8 --dist_fn l2 --base_file data/rand_int8_10D_10K_norm50.0.bin --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
        dist/bin/compute_groundtruth --data_type int8 --dist_fn mips --base_file data/rand_int8_10D_10K_norm50.0.bin --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/mips_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
        dist/bin/compute_groundtruth --data_type int8 --dist_fn cosine --base_file data/rand_int8_10D_10K_norm50.0.bin --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/cosine_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
        echo "Computing ground truth for uint8s across l2, mips, and cosine distance functions"
        dist/bin/compute_groundtruth --data_type uint8 --dist_fn l2 --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
        dist/bin/compute_groundtruth --data_type uint8 --dist_fn mips --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/mips_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
        dist/bin/compute_groundtruth --data_type uint8 --dist_fn cosine --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/cosine_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
      shell: bash
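The nine `compute_groundtruth` calls above follow one naming scheme: data type, norm, and distance function determine the base, query, and ground-truth paths. As a rough sketch of that scheme, the dry-run script below only prints the commands and never invokes the binary. The function name `gt_commands` and the `type:norm` pair encoding are illustrative assumptions, not something the commit defines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print (do not run) the ground-truth commands from the generate-random action.
gt_commands() {
  local spec type norm fn base query
  # type:norm pairs as used in the action (float uses norm 1.0, the int types 50.0)
  for spec in float:1.0 int8:50.0 uint8:50.0; do
    type=${spec%%:*}
    norm=${spec##*:}
    base="data/rand_${type}_10D_10K_norm${norm}.bin"
    query="data/rand_${type}_10D_1K_norm${norm}.bin"
    for fn in l2 mips cosine; do
      echo "dist/bin/compute_groundtruth --data_type ${type} --dist_fn ${fn} --base_file ${base} --query_file ${query} --gt_file data/${fn}_rand_${type}_10D_10K_norm${norm}_10D_1K_norm${norm}_gt100 --K 100"
    done
  done
}

gt_commands
```

The output can be compared against the action text to spot a mistyped path before the acceptance tests depend on it.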
22 changes: 22 additions & 0 deletions .github/actions/python-wheel/action.yml
@@ -0,0 +1,22 @@
name: Build Python Wheel
description: Builds a python wheel with cibuildwheel
inputs:
  cibw-identifier:
    description: "CI build wheel identifier to build"
    required: true
runs:
  using: "composite"
  steps:
    - uses: actions/setup-python@v3
    - name: Install cibuildwheel
      run: python -m pip install cibuildwheel==2.11.3
      shell: bash
    - name: Building Python ${{inputs.cibw-identifier}} Wheel
      run: python -m cibuildwheel --output-dir dist
      env:
        CIBW_BUILD: ${{inputs.cibw-identifier}}
      shell: bash
    - uses: actions/upload-artifact@v3
      with:
        name: wheels
        path: ./dist/*.whl
42 changes: 42 additions & 0 deletions .github/workflows/build-python.yml
@@ -0,0 +1,42 @@
name: DiskANN Build Python Wheel
on: [workflow_call]
jobs:
  linux-build:
    name: Python - Ubuntu - ${{matrix.cibw-identifier}}
    strategy:
      fail-fast: false
      matrix:
        cibw-identifier: ["cp39-manylinux_x86_64", "cp310-manylinux_x86_64", "cp311-manylinux_x86_64"]
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 1
      - name: Building python wheel ${{matrix.cibw-identifier}}
        uses: ./.github/actions/python-wheel
        with:
          cibw-identifier: ${{matrix.cibw-identifier}}
  windows-build:
    name: Python - Windows - ${{matrix.cibw-identifier}}
    strategy:
      fail-fast: false
      matrix:
        cibw-identifier: ["cp39-win_amd64", "cp310-win_amd64", "cp311-win_amd64"]
    runs-on: windows-latest
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          submodules: true
          fetch-depth: 1
      - name: Building python wheel ${{matrix.cibw-identifier}}
        uses: ./.github/actions/python-wheel
        with:
          cibw-identifier: ${{matrix.cibw-identifier}}
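Each matrix entry reaches the `python-wheel` action as `CIBW_BUILD`, which selects the single wheel that `cibuildwheel` builds in that job. As an illustrative sketch of what the Linux matrix expands to, the script below prints one command per identifier without running anything. The function names `linux_identifiers` and `wheel_commands` are hypothetical, and actually running the printed commands would assume `cibuildwheel` is installed locally as in the action.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Identifiers copied from the linux-build matrix in build-python.yml.
linux_identifiers() {
  printf '%s\n' cp39-manylinux_x86_64 cp310-manylinux_x86_64 cp311-manylinux_x86_64
}

# Print (do not run) the command the python-wheel action executes per identifier.
wheel_commands() {
  local id
  while IFS= read -r id; do
    echo "CIBW_BUILD=${id} python -m cibuildwheel --output-dir dist"
  done < <(linux_identifiers)
}

wheel_commands
```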
28 changes: 28 additions & 0 deletions .github/workflows/common.yml
@@ -0,0 +1,28 @@
name: DiskANN Common Checks
# common means common to both pr-test and push-test
on: [workflow_call]
jobs:
  formatting-check:
    strategy:
      fail-fast: true
    name: Code Formatting Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 1
      - name: Checking code formatting...
        uses: ./.github/actions/format-check
  docker-container-build:
    name: Docker Container Build
    needs: [formatting-check]
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 1
      - name: Docker build
        run: |
          docker build .
107 changes: 107 additions & 0 deletions .github/workflows/disk-pq.yml
@@ -0,0 +1,107 @@
name: Disk With PQ
on: [workflow_call]
jobs:
  acceptance-tests-disk-pq:
    name: Disk, PQ
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-2019, windows-latest]
    runs-on: ${{matrix.os}}
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout repository
        if: ${{ runner.os == 'Linux' }}
        uses: actions/checkout@v3
        with:
          fetch-depth: 1
      - name: Checkout repository
        if: ${{ runner.os == 'Windows' }}
        uses: actions/checkout@v3
        with:
          fetch-depth: 1
          submodules: true
      - name: DiskANN Build CLI Applications
        uses: ./.github/actions/build

      - name: Generate Data
        uses: ./.github/actions/generate-random

      - name: build and search disk index (one shot graph build, L2, no diskPQ) (float)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1
          dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, no diskPQ) (int8)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1
          dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, no diskPQ) (uint8)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1
          dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, no diskPQ, build with PQ distance comparisons) (float)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot_buildpq5 -R 16 -L 32 -B 0.00003 -M 1 --build_PQ_bytes 5
          dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot_buildpq5 --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, no diskPQ, build with PQ distance comparisons) (int8)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 -R 16 -L 32 -B 0.00003 -M 1 --build_PQ_bytes 5
          dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, no diskPQ, build with PQ distance comparisons) (uint8)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 -R 16 -L 32 -B 0.00003 -M 1 --build_PQ_bytes 5
          dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (sharded graph build, L2, no diskPQ) (float)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006
          dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_sharded --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (sharded graph build, L2, no diskPQ) (int8)
        run: |
          dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006
          dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_sharded --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (sharded graph build, L2, no diskPQ) (uint8)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006
          dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_sharded --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, diskPQ) (float)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskpq_oneshot -R 16 -L 32 -B 0.00003 -M 1 --PQ_disk_bytes 5
          dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskpq_oneshot --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, diskPQ) (int8)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskpq_oneshot -R 16 -L 32 -B 0.00003 -M 1 --PQ_disk_bytes 5
          dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskpq_oneshot --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (one shot graph build, L2, diskPQ) (uint8)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskpq_oneshot -R 16 -L 32 -B 0.00003 -M 1 --PQ_disk_bytes 5
          dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskpq_oneshot --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: build and search disk index (sharded graph build, MIPS, diskPQ) (float)
        if: success() || failure()
        run: |
          dist/bin/build_disk_index --data_type float --dist_fn mips --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_mips_rand_float_10D_10K_norm1.0_diskpq_sharded -R 16 -L 32 -B 0.00003 -M 0.00006 --PQ_disk_bytes 5
          dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_mips_rand_float_10D_10K_norm1.0_diskpq_sharded --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/mips_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
      - name: upload data and bin
        uses: actions/upload-artifact@v3
        with:
          name: disk-pq
          path: |
            ./dist/**
            ./data/**