Add API and CLI command for updating multiple indexes at once (#4)
* Add documentation and additional tests
ludwiktrammer authored Apr 10, 2024
1 parent 3191f24 commit a2e37b5
Showing 19 changed files with 646 additions and 36 deletions.
88 changes: 88 additions & 0 deletions docs/how-to/update_similarity_indexes.md
@@ -0,0 +1,88 @@
# How-To: Update Similarity Indexes

The Similarity Index is a feature provided by db-ally that takes user input and maps it to the closest matching value in the data source using a chosen similarity metric. This feature is handy when the user input does not exactly match the data source, such as when the user asks to "list all employees in the IT department," while the database categorizes this group as the "computer department." To learn more about Similarity Indexes, refer to the [Concept: Similarity Indexes](../concepts/similarity_indexes.md) page.

While Similarity Indexes can be used directly, they are usually used with [Views](../concepts/views.md), annotating arguments to filter methods. This technique lets db-ally automatically match user-provided arguments to the most similar value in the data source. You can see an example of using similarity indexes with views on the [Quickstart Part 2: Semantic Similarity](../quickstart/quickstart2.md) page.

Similarity Indexes are designed to index all possible values (e.g., on disk or in a different data store). Consequently, when the data source changes, the Similarity Index must be updated to reflect those changes. This guide explains the different ways to update Similarity Indexes.

You can update the Similarity Index through Python code or via the db-ally CLI. The following sections explain how to update these indexes using both methods:

* [Update Similarity Indexes via the CLI](#update-similarity-indexes-via-the-cli)
* [Update Similarity Indexes via Python Code](#update-similarity-indexes-via-python-code)

## Update Similarity Indexes via the CLI

To update Similarity Indexes via the CLI, you can use the `dbally update-index` command. This command requires a path to what you wish to update. The path should follow the format `path.to.module:ViewName.method_name.argument_name`, where each part after the colon is optional. The more specific your target is, the fewer Similarity Indexes will be updated.

For example, to update all Similarity Indexes in a module `my_module.views`, use this command:

```bash
dbally update-index my_module.views
```

To update all Similarity Indexes in a specific View, add the name of the View following the module path:

```bash
dbally update-index my_module.views:MyView
```

To update all Similarity Indexes within a specific method of a View, add the method's name after the View name:

```bash
dbally update-index my_module.views:MyView.method_name
```

Lastly, to update the Similarity Index attached to a particular argument of a method, add the argument name after the method name:

```bash
dbally update-index my_module.views:MyView.method_name.argument_name
```
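
The path format used by the command can be illustrated with a small parser. This is only a sketch that mirrors the splitting rules described above (module before the colon, then optional view, method, and argument names); the `IndexPath` tuple and the `parse_index_path` function are hypothetical names and are not part of db-ally.

```python
from typing import NamedTuple, Optional


class IndexPath(NamedTuple):
    """Hypothetical container for the parts of an update path."""

    module: str
    view: Optional[str] = None
    method: Optional[str] = None
    argument: Optional[str] = None


def parse_index_path(path: str) -> IndexPath:
    """Split "path.to.module:ViewName.method_name.argument_name" into its parts."""
    # Everything before the colon is the module; everything after it is optional.
    module, _, object_path = path.partition(":")
    parts = object_path.split(".") if object_path else []
    # Pad with None so that a missing part means "match everything at this level".
    view, method, argument = (parts + [None, None, None])[:3]
    return IndexPath(module, view, method, argument)


print(parse_index_path("my_module.views"))
print(parse_index_path("my_module.views:MyView.method_name"))
```

A part that comes back as `None` corresponds to "update everything at this level", which is why shorter paths touch more indexes.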

## Update Similarity Indexes via Python Code

### Update a Single Similarity Index

To manually update a Similarity Index, call the `update` method on the Similarity Index object. The `update` method re-fetches all possible values from the data source and re-indexes them. Below is an example of how to manually update a Similarity Index:

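```python
from dbally.similarity.index import SimilarityIndex

# Create a similarity index (`fetcher` and `store` are assumed to be defined elsewhere)
similarity_index = SimilarityIndex(fetcher=fetcher, store=store)

# Re-fetch all values from the data source and re-index them
await similarity_index.update()
```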
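
Because `update` re-indexes everything, it is usually called on a schedule or when the data changes rather than on every query. The following is a minimal sketch of a periodic update loop, assuming nothing about db-ally beyond an awaitable `update()` method; `CountingIndex` and `update_periodically` are hypothetical names used only for illustration, and a real deployment would more likely rely on a scheduler or a cron job.

```python
import asyncio


class CountingIndex:
    """Hypothetical stand-in for a similarity index; it only counts update() calls."""

    def __init__(self) -> None:
        self.updates = 0

    async def update(self) -> None:
        self.updates += 1


async def update_periodically(index, interval_seconds: float, rounds: int) -> None:
    """Call index.update() `rounds` times, sleeping between consecutive calls."""
    for round_number in range(rounds):
        await index.update()
        # No need to sleep after the final round.
        if round_number < rounds - 1:
            await asyncio.sleep(interval_seconds)


index = CountingIndex()
asyncio.run(update_periodically(index, interval_seconds=0.01, rounds=3))
print(index.updates)
```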

### Detect Similarity Indexes in Views

If you are using Similarity Indexes to annotate arguments in views, you can use the [`SimilarityIndexDetector`][dbally.similarity.detector.SimilarityIndexDetector] to locate all Similarity Indexes in a view and update them.

For example, to update all Similarity Indexes in a view named `MyView` in a module labeled `my_module.views`, use the following code:

```python
from dbally.similarity.detector import SimilarityIndexDetector

detector = SimilarityIndexDetector.from_path("my_module.views:MyView")

# list_indexes() returns a dict keyed by index, so each index is visited once
for index in detector.list_indexes():
    await index.update()
```

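The `from_path` method constructs a `SimilarityIndexDetector` object from a path string in the same format as the one accepted by the CLI command. The `list_indexes` method returns a dictionary that maps each detected Similarity Index to the list of filter arguments that use it, so iterating over it yields each index once. To update every detected index in a single call, you can also use `await detector.update_indexes()`, which raises a `SimilarityIndexDetectorException` if no indexes are found.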
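
One reason to key the result by index is that the same Similarity Index can annotate several arguments, and it only needs to be updated once. The sketch below illustrates that grouping with stand-in objects; `StubIndex`, the `detected` list, and the argument paths are all hypothetical and do not come from db-ally.

```python
import asyncio
from typing import Dict, List


class StubIndex:
    """Hypothetical stand-in for a similarity index; it only counts update() calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.updates = 0

    async def update(self) -> None:
        self.updates += 1


country_index = StubIndex("country")
city_index = StubIndex("city")

# (argument path, index) pairs as a detector might report them.
# The country index is shared by two different filter arguments.
detected = [
    ("MyView.from_country.country", country_index),
    ("MyView.born_in.country", country_index),
    ("MyView.from_city.city", city_index),
]

# Group argument paths by index so that a shared index appears only once.
indexes: Dict[StubIndex, List[str]] = {}
for argument_path, index in detected:
    indexes.setdefault(index, []).append(argument_path)


async def update_all(grouped: Dict[StubIndex, List[str]]) -> None:
    """Update every distinct index exactly once."""
    for index in grouped:
        await index.update()


asyncio.run(update_all(indexes))
print({index.name: paths for index, paths in indexes.items()})
```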

For instance, to detect all Similarity Indexes in a module, provide only the module path:

```python
detector = SimilarityIndexDetector.from_path("my_module.views")
```

Similarly, to detect all Similarity Indexes in a specific method of a view, add the method name:

```python
detector = SimilarityIndexDetector.from_path("my_module.views:MyView.method_name")
```

Lastly, to detect the Similarity Index attached to a particular argument of a method, add the argument name:

```python
detector = SimilarityIndexDetector.from_path("my_module.views:MyView.method_name.argument_name")
```
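
Detection relies on the fact that Similarity Indexes are attached to filter arguments as type annotations. As a rough illustration of the idea, and not of db-ally's actual implementation, the sketch below reads `typing.Annotated` metadata from a plain function. `StubIndex`, `from_country`, and `find_indexes` are hypothetical names.

```python
from typing import Annotated, Dict, get_type_hints


class StubIndex:
    """Hypothetical stand-in for a similarity index."""


country_index = StubIndex()


def from_country(country: Annotated[str, country_index], limit: int) -> bool:
    """Hypothetical filter function with one annotated argument."""
    return True


def find_indexes(func) -> Dict[str, StubIndex]:
    """Map argument names to the StubIndex found in their Annotated metadata."""
    found: Dict[str, StubIndex] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    for name, hint in hints.items():
        # Annotated types expose their extra metadata through __metadata__.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, StubIndex):
                found[name] = meta
    return found


print(list(find_indexes(from_country)))
```

Only `country` carries an index here; `limit` and the return type are plain annotations without metadata, so they are skipped.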
3 changes: 3 additions & 0 deletions docs/how-to/use_custom_similarity_fetcher.md
@@ -59,6 +59,9 @@ You can use the index with a custom fetcher [the same way](../quickstart/quickst
 await breeds_similarity.update()
 ```

+!!! note
+    The `update` method will re-fetch all possible values from the data source and re-index them. Usually, you wouldn't call this method each time you use the similarity index. Instead, you would update the index periodically or when the data source changes. See the [How-To: Update Similarity Indexes](../how-to/update_similarity_indexes.md) guide for more information.
+
 Then, you can use the similarity index to find the most similar value to a user input and deliver a response based on that value.

 ```python
3 changes: 3 additions & 0 deletions docs/how-to/use_custom_similarity_store.md
@@ -66,6 +66,9 @@ You can use an index with a custom store [the same way](../quickstart/quickstart
 await country_similarity.update()
 ```

+!!! note
+    The `update` method will re-fetch all possible values from the data source and re-index them. Usually, you wouldn't call this method each time you use the similarity index. Instead, you would update the index periodically or when the data source changes. See the [How-To: Update Similarity Indexes](../how-to/update_similarity_indexes.md) guide for more information.
+
 Then, you can utilize the similarity index to find the closest matching value to a user input and generate a response based on that value.

 ```python
2 changes: 1 addition & 1 deletion docs/quickstart/quickstart2.md
@@ -94,7 +94,7 @@ country_similarity.update()
 ```

 !!! note
-    Typically, you wouldn't want to update the similarity index every time you run a query, but rather on a schedule or when the database changes.
+    The `update` method will re-fetch all possible values from the data source and re-index them. Usually, you wouldn't call this method each time you use the similarity index. Instead, you would update the index periodically or when the data source changes. See the [How-To: Update Similarity Indexes](../how-to/update_similarity_indexes.md) guide for more information.

 ## Annotating the Filter to Use the Similarity Index
 Now that we have the similarity index, we can use it to annotate the filter to use the similarity index when filtering candidates by country:
7 changes: 7 additions & 0 deletions docs/reference/similarity/detector.md
@@ -0,0 +1,7 @@
# SimilarityIndexDetector

`SimilarityIndexDetector` is a class that detects similarity indexes in views and updates them. To learn how to use it, see the [How-To: Update Similarity Indexes](../../how-to/update_similarity_indexes.md) guide.

::: dbally.similarity.detector.SimilarityIndexDetector

::: dbally.similarity.detector.SimilarityIndexDetectorException
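
When a path points at something that does not exist, the detector reports it through `SimilarityIndexDetectorException`. The sketch below shows the general pattern of wrapping an import failure in a domain-specific exception and handling it at the call site. It uses a locally defined `DetectorError` and `load_module` as hypothetical stand-ins so that it runs without db-ally installed.

```python
import importlib
from types import ModuleType


class DetectorError(Exception):
    """Hypothetical stand-in for SimilarityIndexDetectorException."""


def load_module(module_path: str) -> ModuleType:
    """Import a module, converting a missing module into a DetectorError."""
    try:
        return importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        # Chain the original error so the traceback keeps the root cause.
        raise DetectorError(f"Module {module_path} not found.") from exc


try:
    load_module("module_that_does_not_exist_12345")
except DetectorError as error:
    message = str(error)
    print(message)
```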
7 changes: 5 additions & 2 deletions mkdocs.yml
@@ -19,8 +19,10 @@ nav:
 - how-to/sql_views.md
 - how-to/pandas_views.md
 - how-to/custom_views.md
-- how-to/use_custom_similarity_fetcher.md
-- how-to/use_custom_similarity_store.md
+- Using similarity indexes:
+  - how-to/use_custom_similarity_fetcher.md
+  - how-to/use_custom_similarity_store.md
+  - how-to/update_similarity_indexes.md
 - how-to/log_runs_to_langsmith.md
 - how-to/create_custom_event_handler.md
 - how-to/openai_assistants_integration.md
@@ -55,6 +57,7 @@ nav:
 - reference/similarity/similarity_fetcher/index.md
 - reference/similarity/similarity_fetcher/sqlalchemy.md
 - reference/similarity/similarity_fetcher/sqlalchemy_simple.md
+- reference/similarity/detector.md
 - Embeddings:
 - reference/embeddings/index.md
 - reference/embeddings/openai.md
8 changes: 2 additions & 6 deletions src/dbally/iql/_processor.py
@@ -10,7 +10,6 @@
     IQLUnsupportedSyntaxError,
 )
 from dbally.iql._type_validators import validate_arg_type
-from dbally.similarity.index import SimilarityIndex

 if TYPE_CHECKING:
     from dbally.views.base import ExposedFunction
@@ -84,11 +83,8 @@ async def _parse_call(self, node: ast.Call) -> syntax.FunctionCall:
         for arg, arg_def in zip(node.args, func_def.parameters):
             arg_value = self._parse_arg(arg)

-            if hasattr(arg_def.type, "__metadata__"):
-                similarity_indexes = [meta for meta in arg_def.type.__metadata__ if isinstance(meta, SimilarityIndex)]
-
-                if similarity_indexes:
-                    arg_value = await similarity_indexes[0].similar(arg_value)
+            if arg_def.similarity_index:
+                arg_value = await arg_def.similarity_index.similar(arg_value)

             check_result = validate_arg_type(arg_def.type, arg_value)

204 changes: 204 additions & 0 deletions src/dbally/similarity/detector.py
@@ -0,0 +1,204 @@
import importlib
from types import ModuleType
from typing import Any, Dict, List, Optional, Type

from dbally.similarity import AbstractSimilarityIndex
from dbally.views import decorators
from dbally.views.base import ExposedFunction, MethodParamWithTyping
from dbally.views.methods_base import MethodsBaseView


class SimilarityIndexDetectorException(Exception):
    """
    Exception that occurred during similarity index discovery
    """

    def __init__(self, message: str):
        self.message = message
        super().__init__(message)

    def __str__(self) -> str:
        return self.message


class SimilarityIndexDetector:
    """
    Class used to detect similarity indexes. Works with method-based views that inherit
    from MethodsBaseView (including all built-in dbally views). Automatically detects similarity
    indexes on arguments of view's filter methods.

    Args:
        module: The module to search for similarity indexes
        chosen_view_name: The name of the view to search in (optional, all views if None)
        chosen_method_name: The name of the method to search in (optional, all methods if None)
        chosen_argument_name: The name of the argument to search in (optional, all arguments if None)
    """

    def __init__(
        self,
        module: ModuleType,
        chosen_view_name: Optional[str] = None,
        chosen_method_name: Optional[str] = None,
        chosen_argument_name: Optional[str] = None,
    ):
        self.module = module
        self.chosen_view_name = chosen_view_name
        self.chosen_method_name = chosen_method_name
        self.chosen_argument_name = chosen_argument_name

    @classmethod
    def from_path(cls, path: str) -> "SimilarityIndexDetector":
        """
        Create a SimilarityIndexDetector object from a path string in the format
        "path.to.module:ViewName.method_name.argument_name" where each part after the
        colon is optional.

        Args:
            path: The path to the object

        Returns:
            The SimilarityIndexDetector object

        Raises:
            SimilarityIndexDetectorException: If the module is not found
        """
        module_path, *object_path = path.split(":")
        object_parts = object_path[0].split(".") if object_path else []
        chosen_view_name = object_parts[0] if object_parts else None
        chosen_method_name = object_parts[1] if len(object_parts) > 1 else None
        chosen_argument_name = object_parts[2] if len(object_parts) > 2 else None

        module = cls.get_module_from_path(module_path)
        return cls(module, chosen_view_name, chosen_method_name, chosen_argument_name)

    @staticmethod
    def get_module_from_path(module_path: str) -> ModuleType:
        """
        Get the module from the given path

        Args:
            module_path: The path to the module

        Returns:
            The module

        Raises:
            SimilarityIndexDetectorException: If the module is not found
        """
        try:
            module = importlib.import_module(module_path)
        except ModuleNotFoundError as exc:
            raise SimilarityIndexDetectorException(f"Module {module_path} not found.") from exc
        return module

    def _is_methods_base_view(self, obj: Any) -> bool:
        """
        Check if the given object is a subclass of MethodsBaseView
        """
        return isinstance(obj, type) and issubclass(obj, MethodsBaseView) and obj is not MethodsBaseView

    def list_views(self) -> List[Type[MethodsBaseView]]:
        """
        List method-based views in the module, filtering by the chosen view name if given during initialization.

        Returns:
            List of views

        Raises:
            SimilarityIndexDetectorException: If the chosen view is not found
        """
        views = [
            getattr(self.module, name)
            for name in dir(self.module)
            if self._is_methods_base_view(getattr(self.module, name))
        ]
        if self.chosen_view_name:
            views = [view for view in views if view.__name__ == self.chosen_view_name]
            if not views:
                raise SimilarityIndexDetectorException(
                    f"View {self.chosen_view_name} not found in module {self.module.__name__}."
                )
        return views

    def list_filters(self, view: Type[MethodsBaseView]) -> List[ExposedFunction]:
        """
        List filters in the given view, filtering by the chosen method name if given during initialization.

        Args:
            view: The view

        Returns:
            List of filter methods

        Raises:
            SimilarityIndexDetectorException: If the chosen method is not found
        """
        methods = view.list_methods_by_decorator(decorators.view_filter)
        if self.chosen_method_name:
            methods = [method for method in methods if method.name == self.chosen_method_name]
            if not methods:
                raise SimilarityIndexDetectorException(
                    f"Filter method {self.chosen_method_name} not found in view {view.__name__}."
                )
        return methods

    def list_arguments(self, method: ExposedFunction) -> List[MethodParamWithTyping]:
        """
        List arguments in the given method, filtering by the chosen argument name if given during initialization.

        Args:
            method: The method

        Returns:
            List of arguments

        Raises:
            SimilarityIndexDetectorException: If the chosen argument is not found
        """
        parameters = method.parameters
        if self.chosen_argument_name:
            parameters = [parameter for parameter in parameters if parameter.name == self.chosen_argument_name]
            if not parameters:
                raise SimilarityIndexDetectorException(
                    f"Argument {self.chosen_argument_name} not found in method {method.name}."
                )
        return parameters

    def list_indexes(self, view: Optional[Type[MethodsBaseView]] = None) -> Dict[AbstractSimilarityIndex, List[str]]:
        """
        List similarity indexes in the module, filtering by the chosen view, method and argument names if given
        during initialization.

        Args:
            view: The view to search in (optional, all views if None)

        Returns:
            Dictionary mapping indexes to method arguments that use them

        Raises:
            SimilarityIndexDetectorException: If any of the chosen path parts is not found
        """
        indexes: Dict[AbstractSimilarityIndex, List[str]] = {}
        views = self.list_views() if view is None else [view]
        for view_class in views:
            for method in self.list_filters(view_class):
                for parameter in self.list_arguments(method):
                    if parameter.similarity_index:
                        indexes.setdefault(parameter.similarity_index, []).append(
                            f"{view_class.__name__}.{method.name}.{parameter.name}"
                        )
        return indexes

    async def update_indexes(self) -> None:
        """
        Update similarity indexes in the module, filtering by the chosen view, method and argument names if given
        during initialization.

        Raises:
            SimilarityIndexDetectorException: If any of the chosen path parts is not found
        """
        indexes = self.list_indexes()
        if not indexes:
            raise SimilarityIndexDetectorException("No similarity indexes found.")
        for index in indexes:
            await index.update()