Add API and CLI command for updating multiple indexes at once (#4)
* Add API and CLI command for updating multiple indexes at once
* Add documentation and additional tests
1 parent 3191f24 commit a2e37b5
Showing 19 changed files with 646 additions and 36 deletions.
@@ -0,0 +1,88 @@
# How-To: Update Similarity Indexes

The Similarity Index is a feature provided by db-ally that takes user input and maps it to the closest matching value in the data source using a chosen similarity metric. This feature is handy when the user input does not exactly match the data source, such as when the user asks to "list all employees in the IT department," while the database categorizes this group as the "computer department." To learn more about Similarity Indexes, refer to the [Concept: Similarity Indexes](../concepts/similarity_indexes.md) page.

While Similarity Indexes can be used directly, they are usually used with [Views](../concepts/views.md), where they annotate the arguments of filter methods. This technique lets db-ally automatically match user-provided arguments to the most similar value in the data source. You can see an example of using similarity indexes with views on the [Quickstart Part 2: Semantic Similarity](../quickstart/quickstart2.md) page.

A Similarity Index stores an index of all possible values (e.g., on disk or in a different data store). Consequently, when the data source changes, the Similarity Index must be updated to reflect those changes. This guide explains the different ways to update Similarity Indexes.

You can update the Similarity Index through Python code or via the db-ally CLI. The following sections explain how to update these indexes using both methods:

* [Update Similarity Indexes via the CLI](#update-similarity-indexes-via-the-cli)
* [Update Similarity Indexes via Python Code](#update-similarity-indexes-via-python-code)

## Update Similarity Indexes via the CLI

To update Similarity Indexes via the CLI, use the `dbally update-index` command. This command requires a path to what you wish to update. The path should follow the format `path.to.module:ViewName.method_name.argument_name`, where each part after the colon is optional. The more specific your target is, the fewer Similarity Indexes will be updated.
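
To make the path format concrete, here is a minimal Python sketch of how such a path can be split into its parts. It mirrors the splitting logic of `SimilarityIndexDetector.from_path` in this commit, but `IndexTarget` and `parse_target` are hypothetical names used only for illustration; they are not part of db-ally.

```python
from typing import NamedTuple, Optional


class IndexTarget(NamedTuple):
    """Hypothetical container for the parts of an update path."""

    module: str
    view: Optional[str] = None
    method: Optional[str] = None
    argument: Optional[str] = None


def parse_target(path: str) -> IndexTarget:
    """Split "path.to.module:ViewName.method_name.argument_name" into its parts."""
    # Everything before the first colon is the module path.
    module, _, object_path = path.partition(":")
    parts = object_path.split(".") if object_path else []
    # Pad with None so that omitted trailing parts stay unset.
    view, method, argument = (parts + [None, None, None])[:3]
    return IndexTarget(module, view, method, argument)


print(parse_target("my_module.views"))
print(parse_target("my_module.views:MyView.method_name"))
```

Every part that is left out stays `None`, which corresponds to "update everything at this level".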

For example, to update all Similarity Indexes in a module `my_module.views`, use this command:

```bash
dbally update-index my_module.views
```

To update all Similarity Indexes in a specific View, add the name of the View following the module path:

```bash
dbally update-index my_module.views:MyView
```

To update all Similarity Indexes within a specific method of a View, add the method's name after the View name:

```bash
dbally update-index my_module.views:MyView.method_name
```

Lastly, to update all Similarity Indexes in a particular argument of a method, add the argument name after the method name:

```bash
dbally update-index my_module.views:MyView.method_name.argument_name
```

For example, given the following view:

## Update Similarity Indexes via Python Code

### Update a Single Similarity Index

To manually update a Similarity Index, call the `update` method on the Similarity Index object. The `update` method re-fetches all possible values from the data source and re-indexes them. Below is an example of how to manually update a Similarity Index:

```python
from dbally.similarity import SimilarityIndex

# Create a similarity index (fetcher and store are assumed to be defined elsewhere)
similarity_index = SimilarityIndex(fetcher=fetcher, store=store)

# Update the similarity index
await similarity_index.update()
```

### Detect Similarity Indexes in Views

If you are using Similarity Indexes to annotate arguments in views, you can use the [`SimilarityIndexDetector`][dbally.similarity.detector.SimilarityIndexDetector] to locate all Similarity Indexes in a view and update them.

For example, to update all Similarity Indexes in a view named `MyView` in a module named `my_module.views`, use the following code:

```python
from dbally.similarity.detector import SimilarityIndexDetector

detector = SimilarityIndexDetector.from_path("my_module.views:MyView")
# list_indexes() returns a dictionary keyed by index, so iterating yields the indexes
for index in detector.list_indexes():
    await index.update()
```

The `from_path` method constructs a `SimilarityIndexDetector` object from a view path string in the same format as the CLI command. The `list_indexes` method returns a dictionary that maps each detected Similarity Index to the list of view arguments (as `ViewName.method_name.argument_name` strings) that use it.
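
Because `list_indexes`, as implemented in this commit, returns a dictionary keyed by index object, the update loop can also report which view arguments each index serves. The sketch below is self-contained and runs without db-ally: `FakeIndex` is a hypothetical stand-in for a real similarity index, and the argument paths are made up. It also uses `asyncio.run`, which a plain script needs in order to call the asynchronous `update` method.

```python
import asyncio
from typing import Dict, List


class FakeIndex:
    """Hypothetical stand-in for a similarity index; it only counts update calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.update_count = 0

    async def update(self) -> None:
        await asyncio.sleep(0)  # stands in for re-fetching and re-indexing values
        self.update_count += 1


async def update_all(indexes: Dict[FakeIndex, List[str]]) -> None:
    # Each key is an index object; each value lists the arguments that use it.
    for index, used_by in indexes.items():
        print(f"Updating {index.name}, used by: {', '.join(used_by)}")
        await index.update()


country_index = FakeIndex("country_index")
detected = {
    country_index: [
        "MyView.filter_by_country.country",
        "MyView.filter_by_region.country",
    ]
}
asyncio.run(update_all(detected))
```

An index shared by several arguments appears once as a key, so it is updated only once.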

For instance, to detect all Similarity Indexes in a module, provide only the module path:

```python
detector = SimilarityIndexDetector.from_path("my_module.views")
```

Similarly, to detect all Similarity Indexes in a specific method of a view, provide the method name:

```python
detector = SimilarityIndexDetector.from_path("my_module.views:MyView.method_name")
```

Lastly, to detect all Similarity Indexes in a particular argument of a method, provide the argument name:

```python
detector = SimilarityIndexDetector.from_path("my_module.views:MyView.method_name.argument_name")
```

@@ -0,0 +1,7 @@
# SimilarityIndexDetector

`SimilarityIndexDetector` is a class that detects similarity indexes in views and updates them. To learn how to use it, see the [How-To: Update Similarity Indexes](../../how-to/update_similarity_indexes.md) guide.

::: dbally.similarity.detector.SimilarityIndexDetector

::: dbally.similarity.detector.SimilarityIndexDetectorException

@@ -0,0 +1,204 @@
import importlib
from types import ModuleType
from typing import Any, Dict, List, Optional, Type

from dbally.similarity import AbstractSimilarityIndex
from dbally.views import decorators
from dbally.views.base import ExposedFunction, MethodParamWithTyping
from dbally.views.methods_base import MethodsBaseView


class SimilarityIndexDetectorException(Exception):
    """
    Exception that occurred during similarity index discovery
    """

    def __init__(self, message: str):
        self.message = message
        super().__init__(message)

    def __str__(self) -> str:
        return self.message


class SimilarityIndexDetector:
    """
    Class used to detect similarity indexes. Works with method-based views that inherit
    from MethodsBaseView (including all built-in dbally views). Automatically detects similarity
    indexes on arguments of view's filter methods.
    Args:
        module: The module to search for similarity indexes
        chosen_view_name: The name of the view to search in (optional, all views if None)
        chosen_method_name: The name of the method to search in (optional, all methods if None)
        chosen_argument_name: The name of the argument to search in (optional, all arguments if None)
    """

    def __init__(
        self,
        module: ModuleType,
        chosen_view_name: Optional[str] = None,
        chosen_method_name: Optional[str] = None,
        chosen_argument_name: Optional[str] = None,
    ):
        self.module = module
        self.chosen_view_name = chosen_view_name
        self.chosen_method_name = chosen_method_name
        self.chosen_argument_name = chosen_argument_name

    @classmethod
    def from_path(cls, path: str) -> "SimilarityIndexDetector":
        """
        Create a SimilarityIndexDetector object from a path string in the format
        "path.to.module:ViewName.method_name.argument_name" where each part after the
        colon is optional.
        Args:
            path: The path to the object
        Returns:
            The SimilarityIndexDetector object
        Raises:
            SimilarityIndexDetectorException: If the module is not found
        """
        module_path, *object_path = path.split(":")
        object_parts = object_path[0].split(".") if object_path else []
        chosen_view_name = object_parts[0] if object_parts else None
        chosen_method_name = object_parts[1] if len(object_parts) > 1 else None
        chosen_argument_name = object_parts[2] if len(object_parts) > 2 else None

        module = cls.get_module_from_path(module_path)
        return cls(module, chosen_view_name, chosen_method_name, chosen_argument_name)

    @staticmethod
    def get_module_from_path(module_path: str) -> ModuleType:
        """
        Get the module from the given path
        Args:
            module_path: The path to the module
        Returns:
            The module
        Raises:
            SimilarityIndexDetectorException: If the module is not found
        """
        try:
            module = importlib.import_module(module_path)
        except ModuleNotFoundError as exc:
            raise SimilarityIndexDetectorException(f"Module {module_path} not found.") from exc
        return module

    def _is_methods_base_view(self, obj: Any) -> bool:
        """
        Check if the given object is a subclass of MethodsBaseView
        """
        return isinstance(obj, type) and issubclass(obj, MethodsBaseView) and obj is not MethodsBaseView

    def list_views(self) -> List[Type[MethodsBaseView]]:
        """
        List method-based views in the module, filtering by the chosen view name if given during initialization.
        Returns:
            List of views
        Raises:
            SimilarityIndexDetectorException: If the chosen view is not found
        """
        views = [
            getattr(self.module, name)
            for name in dir(self.module)
            if self._is_methods_base_view(getattr(self.module, name))
        ]
        if self.chosen_view_name:
            views = [view for view in views if view.__name__ == self.chosen_view_name]
            if not views:
                raise SimilarityIndexDetectorException(
                    f"View {self.chosen_view_name} not found in module {self.module.__name__}."
                )
        return views

    def list_filters(self, view: Type[MethodsBaseView]) -> List[ExposedFunction]:
        """
        List filters in the given view, filtering by the chosen method name if given during initialization.
        Args:
            view: The view
        Returns:
            List of filter methods
        Raises:
            SimilarityIndexDetectorException: If the chosen method is not found
        """
        methods = view.list_methods_by_decorator(decorators.view_filter)
        if self.chosen_method_name:
            methods = [method for method in methods if method.name == self.chosen_method_name]
            if not methods:
                raise SimilarityIndexDetectorException(
                    f"Filter method {self.chosen_method_name} not found in view {view.__name__}."
                )
        return methods

    def list_arguments(self, method: ExposedFunction) -> List[MethodParamWithTyping]:
        """
        List arguments in the given method, filtering by the chosen argument name if given during initialization.
        Args:
            method: The method
        Returns:
            List of arguments
        Raises:
            SimilarityIndexDetectorException: If the chosen argument is not found
        """
        parameters = method.parameters
        if self.chosen_argument_name:
            parameters = [parameter for parameter in parameters if parameter.name == self.chosen_argument_name]
            if not parameters:
                raise SimilarityIndexDetectorException(
                    f"Argument {self.chosen_argument_name} not found in method {method.name}."
                )
        return parameters

    def list_indexes(self, view: Optional[Type[MethodsBaseView]] = None) -> Dict[AbstractSimilarityIndex, List[str]]:
        """
        List similarity indexes in the module, filtering by the chosen view, method and argument names if given
        during initialization.
        Args:
            view: The view to search in (optional, all views if None)
        Returns:
            Dictionary mapping indexes to method arguments that use them
        Raises:
            SimilarityIndexDetectorException: If any of the chosen path parts is not found
        """
        indexes: Dict[AbstractSimilarityIndex, List[str]] = {}
        views = self.list_views() if view is None else [view]
        for view_class in views:
            for method in self.list_filters(view_class):
                for parameter in self.list_arguments(method):
                    if parameter.similarity_index:
                        indexes.setdefault(parameter.similarity_index, []).append(
                            f"{view_class.__name__}.{method.name}.{parameter.name}"
                        )
        return indexes

    async def update_indexes(self) -> None:
        """
        Update similarity indexes in the module, filtering by the chosen view, method and argument names if given
        during initialization.
        Raises:
            SimilarityIndexDetectorException: If any of the chosen path parts is not found
                or no similarity indexes are found
        """
        indexes = self.list_indexes()
        if not indexes:
            raise SimilarityIndexDetectorException("No similarity indexes found.")
        for index in indexes:
            await index.update()
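
The view discovery in `list_views` relies on scanning `dir(module)` and keeping only proper subclasses of the base view class. Below is a minimal, self-contained sketch of that technique. `BaseView`, `CandidateView`, and the `demo_views` module are hypothetical stand-ins, chosen so the example runs without db-ally installed.

```python
import types
from typing import Any, List


class BaseView:
    """Hypothetical stand-in for MethodsBaseView."""


def is_concrete_view(obj: Any) -> bool:
    # Accept classes derived from BaseView, but not BaseView itself.
    return isinstance(obj, type) and issubclass(obj, BaseView) and obj is not BaseView


def list_views(module: types.ModuleType) -> List[type]:
    # Scan every attribute of the module and keep only the view classes.
    return [getattr(module, name) for name in dir(module) if is_concrete_view(getattr(module, name))]


# Build a throwaway module so the example needs no files on disk.
demo = types.ModuleType("demo_views")
demo.BaseView = BaseView
demo.CandidateView = type("CandidateView", (BaseView,), {})
demo.helper = lambda: None  # a non-class attribute that must be ignored

print([view.__name__ for view in list_views(demo)])
```

The `obj is not BaseView` check matters because view modules usually import the base class, which would otherwise be reported as a view of its own.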