docs(views): add documentation of freeform views #21

Merged
merged 3 commits into from
Apr 25, 2024
10 changes: 10 additions & 0 deletions docs/concepts/freeform_views.md
@@ -0,0 +1,10 @@
# Concept: Freeform Views

Freeform views are a type of [view](views.md) that provides a way for developers using db-ally to define what they need from the LLM without requiring a fixed response structure. This flexibility is beneficial when the data structure is unknown beforehand or when potential queries are too diverse to be covered by a structured view. Though freeform views offer more flexibility than structured views, they are less predictable, efficient, and secure, and may be more challenging to integrate with other systems. For these reasons, we recommend using [structured views](./structured_views.md) when possible.

Contributor

Maybe add a sentence or two about the strategy of:

  1. Starting with FreeFormView
  2. Collecting statistics about most common questions and failure cases
  3. Incorporating StructuredViews to overcome them

Unlike structured views, which define a response format and a set of operations the LLM may use in response to natural language queries, freeform views only have one task - to respond directly to natural language queries with data from the datasource. They accomplish this by implementing the [`ask`][dbally.views.base.BaseView] method. This method takes a natural language query as input and returns a response. The method also has access to the LLM model (via the `llm_client` attribute), which is typically used to retrieve the correct data from the source (for example, by generating a source-specific query string). To learn more about implementing freeform views, refer to the [How to: Custom Freeform Views](../how-to/custom_freeform_views.md) guide.
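
To make the `ask` flow described above more concrete, here is a minimal, self-contained sketch. It does not use db-ally's classes: `CannedLLMClient`, its `generate_sql` method, and `CandidateFreeformSketch` are hypothetical names invented for illustration, and the "LLM" simply returns a fixed SQL string. The real `ask` signature in db-ally may differ.

```python
import sqlite3


class CannedLLMClient:
    """Hypothetical stand-in for an LLM client: always returns the same SQL."""

    def generate_sql(self, question: str, schema: str) -> str:
        # A real client would send the question and the schema to a model.
        return "SELECT name FROM candidates WHERE country = 'France' ORDER BY name"


class CandidateFreeformSketch:
    """Illustrative freeform-style view; it does NOT subclass db-ally's BaseView."""

    schema = "candidates(id INTEGER, name TEXT, country TEXT)"

    def __init__(self, llm_client, connection: sqlite3.Connection):
        self.llm_client = llm_client
        self.connection = connection

    def ask(self, query: str) -> list:
        # 1. Let the model translate the question into source-specific SQL.
        sql = self.llm_client.generate_sql(query, self.schema)
        # 2. Run the generated SQL as-is; the result shape depends on the query.
        cursor = self.connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Tiny in-memory data source for the demonstration.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE candidates (id INTEGER, name TEXT, country TEXT)")
connection.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?)",
    [(1, "Alice", "France"), (2, "Bob", "Poland"), (3, "Chloe", "France")],
)

view = CandidateFreeformSketch(CannedLLMClient(), connection)
answer = view.ask("Find me French candidates")
```

Note that the columns in `answer` are whatever the generated SQL selects, which is exactly why the output of a freeform view is harder to rely on than that of a structured view.
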
Contributor

freeform views only have one task -> Isn't the task of StructuredView the same?

I mean, we can also run a freeform view in dry-run mode and directly obtain the SQL or API call.

The main difference is the IQL generation by the StructuredView, so I'd emphasise it.


## Security

!!! warning
When using freeform views, the LLM typically gets raw access to the data source and can execute arbitrary operations on it using the query language of the data source (e.g., SQL). This is powerful, but it requires the developer to be extremely cautious about securing the data source outside of db-ally. For instance, in the case of relational databases, the developer should ensure that the database user used by db-ally has read-only access to the database, and that the database does not contain any sensitive data that shouldn't be exposed to the LLM.
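
As an illustration of the read-only advice above, the following sketch opens a SQLite file in read-only mode and checks that a write attempt is rejected. SQLite is used only to keep the example self-contained; the table and helper names are made up for the demonstration. With a server database you would typically achieve the same effect by connecting as a user that has only `SELECT` privileges.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def make_demo_db(path: str) -> None:
    """Create a small database file using a normal read-write connection."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO candidates (name) VALUES ('Alice')")
    conn.commit()
    conn.close()


def open_read_only(path: str) -> sqlite3.Connection:
    """Open the same file through a SQLite URI with mode=ro."""
    return sqlite3.connect(Path(path).as_uri() + "?mode=ro", uri=True)


def try_write(conn: sqlite3.Connection) -> bool:
    """Return True if a destructive statement succeeds, False if it is rejected."""
    try:
        conn.execute("DELETE FROM candidates")
        return True
    except sqlite3.OperationalError:
        return False


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
make_demo_db(db_path)

ro_conn = open_read_only(db_path)
rows = ro_conn.execute("SELECT name FROM candidates").fetchall()  # reads still work
write_succeeded = try_write(ro_conn)  # writes are refused by the database itself
ro_conn.close()
```

The important point is that the restriction is enforced by the data source, not by prompt wording or by db-ally, so it holds even if the LLM generates a destructive statement.
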
5 changes: 2 additions & 3 deletions docs/concepts/iql.md
@@ -1,13 +1,12 @@
# Concept: IQL

Intermediate Query Language (IQL) is a simple language that serves as an abstraction layer between natural language and data source-specific query syntax, such as SQL. In db-ally, LLM utilizes IQL to express complex queries in a simplified way.
Intermediate Query Language (IQL) is a simple language that serves as an abstraction layer between natural language and data source-specific query syntax, such as SQL. With db-ally's [structured views](./structured_views.md), the LLM uses IQL to express complex queries in a simplified way.

For instance, an LLM might generate an IQL query like this when asked "Find me French candidates suitable for a senior data scientist position":

```
from_country('France') AND senior_data_scientist_position()
```

The capabilities made available to the AI model via IQL differ between projects. Developers control these by defining [Views](views.md). db-ally automatically exposes special methods defined in views, known as "filters", via IQL. For instance, the expression above suggests that the specific project contains a view that includes the `from_country` and `senior_data_scientist_position` methods (and possibly others that the LLM did not choose to use for this particular question). Additionally, the LLM can use Boolean operators (`and`,`or`, `not`) to combine individual filters into more complex expressions.
The capabilities made available to the AI model via IQL differ between projects. Developers control these by defining special [Views](structured_views.md). db-ally automatically exposes special methods defined in structured views, known as "filters", via IQL. For instance, the expression above suggests that the specific project contains a view that includes the `from_country` and `senior_data_scientist_position` methods (and possibly others that the LLM did not choose to use for this particular question). Additionally, the LLM can use Boolean operators (`and`,`or`, `not`) to combine individual filters into more complex expressions.
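
To illustrate how an expression such as the one above can be restricted to developer-defined filters, here is a toy evaluator. It is not db-ally's IQL parser: it borrows Python's `ast` module, naively lowercases the `AND`/`OR`/`NOT` keywords (so it would also alter those words inside string arguments), and works on a hard-coded list of rows. The `FILTERS` registry and the sample data are assumptions made for this sketch.

```python
import ast
import re

CANDIDATES = [
    {"name": "Alice", "country": "France", "title": "Senior Data Scientist"},
    {"name": "Bob", "country": "France", "title": "Junior Analyst"},
    {"name": "Carol", "country": "Poland", "title": "Senior Data Scientist"},
]


# Hypothetical filters: each call returns a predicate over a single row.
def from_country(country):
    return lambda row: row["country"] == country


def senior_data_scientist_position():
    return lambda row: row["title"] == "Senior Data Scientist"


# Only names registered here may be called from an expression.
FILTERS = {
    "from_country": from_country,
    "senior_data_scientist_position": senior_data_scientist_position,
}


def _build(node):
    """Turn an AST node into a row predicate, rejecting anything unexpected."""
    if isinstance(node, ast.BoolOp):
        parts = [_build(value) for value in node.values]
        if isinstance(node.op, ast.And):
            return lambda row: all(part(row) for part in parts)
        return lambda row: any(part(row) for part in parts)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        inner = _build(node.operand)
        return lambda row: not inner(row)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.func.id not in FILTERS or node.keywords:
            raise ValueError(f"Unknown or unsupported filter call: {node.func.id}")
        # Arguments must be plain literals; literal_eval refuses anything else.
        args = [ast.literal_eval(arg) for arg in node.args]
        return FILTERS[node.func.id](*args)
    raise ValueError("Unsupported expression")


def compile_expression(expression):
    """Compile a filter expression into a predicate over rows."""
    normalised = re.sub(r"\b(AND|OR|NOT)\b", lambda m: m.group(1).lower(), expression)
    return _build(ast.parse(normalised, mode="eval").body)


predicate = compile_expression("from_country('France') AND senior_data_scientist_position()")
matches = [row["name"] for row in CANDIDATES if predicate(row)]
```

Because every call must resolve to an entry in `FILTERS`, the model can only combine operations the developer has defined, which is the property the paragraph above describes.
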
Contributor

The word "special" somehow implied "hard" when I was reading it; maybe just use "structured" to familiarize users with this term.

Contributor

I know it will create a repetition, but sometimes they are not that bad


IQL is at the heart of db-ally. By providing a layer of abstraction between the LLM and the data source, it significantly contributes to the primary benefits of db-ally: consistency, security, efficiency, and portability. <!-- TODO: Link to benefits section of README -->
39 changes: 39 additions & 0 deletions docs/concepts/structured_views.md
@@ -0,0 +1,39 @@
# Concept: Structured Views

Structured views are a type of [view](../concepts/views.md), which provide a way for developers using db-ally to define what they need from the LLM, including:
Contributor

maybe change "are a type of a view" into something more like -> Implements [View] interface, providing more controlled approach of using db-ally, including ?


* The desired data structure, such as the specific fields to include from the data source.
* A set of operations the LLM may employ in response to natural language queries (currently only “filters” are supported, with more to come)
Contributor

I'd love to see a link to filters, but we do not have any page for them, so maybe we should create one? Maybe a header in the IQL concept page?


Given different natural language queries, a db-ally view will produce different responses while maintaining a consistent data structure. This consistency offers a reliable interface for integration - the code consuming responses from a particular structured view knows what data structure to expect and can utilize this knowledge when displaying or processing the data. This feature of db-ally makes it stand out in terms of reliability and stability compared to standard text-to-SQL approaches.

Each structured view can contain one or more “filters”, which the LLM may decide to choose and apply to the extracted data so that it meets the criteria specified in the natural language query. Given such a query, the LLM chooses which filters to use, provides arguments to the filters, and connects the filters with Boolean operators. The LLM expresses these filter combinations using a special language called [IQL](iql.md), in which the defined view filters provide a layer of abstraction between the LLM and the raw syntax used to query the data source (e.g., SQL).
Contributor

Here filters are clearly explained, but I think there should be a separate header in another section explaining just filters


!!! example
For instance, this is a simple [view that uses SQLAlchemy](../how-to/sql_views.md) to select data from specific columns in a SQL database. It contains a single filter that the LLM may optionally use to control which table rows to fetch:

```python
class CandidateView(SqlAlchemyBaseView):
    """
    A view for retrieving candidates from the database.
    """

    def get_select(self):
        """
        Defines which columns to select
        """
        return sqlalchemy.select(Candidate.id, Candidate.name, Candidate.country)

    @decorators.view_filter()
    def from_country(self, country: str):
        """
        Filter candidates from a specific country.
        """
        return Candidate.country == country
```

In addition to structured views, db-ally also provides [freeform views](freeform_views.md), which are more flexible and can be used to create views that do not require a fixed data structure. Freeform views come in handy when the data structure is not predefined or when the scope of potential queries is too vast to be addressed by a structured view. Conversely, structured views are more predictable, efficient, secure, and easier to integrate with other systems. Therefore, we recommend using structured views where possible. To read about the advantages and disadvantages of both kinds of views, refer to [Concept: Views](views.md).
Contributor

Maybe put this Freeform vs Structured View comparison into a separate section or page so that one can link to it?


A project can implement several structured views, each tailored to different output formats and filters to suit various use cases. It can also combine structured views with freeform views to allow a more flexible interface for users. The LLM selects the view that best matches the specific natural language query. For more information, consider reading our article on [Collections](collections.md).

See the [Quickstart](../quickstart/index.md) guide for a complete example of how to define and use structured views.
40 changes: 11 additions & 29 deletions docs/concepts/views.md
@@ -1,37 +1,19 @@
# Concept: Views
Contributor

Maybe the structured vs unstructured views use-cases and scenarios section should go here


Views provide a way for developers using db-ally to define what they need from the LLM, including:
Views are a core concept in db-ally. They represent a way to define what you need from the LLM and connect it to the data source. The library provides two types of views:

* The desired data structure, such as the specific fields to include from the data source.
* A set of operations the LLM may employ in response to natural language queries (currently only “filters” are supported, with more to come)
* [Structured views](structured_views.md) *(recommended)* - these define a desired data structure and a set of operations that the LLM may use in response to natural language queries.
* [Freeform views](freeform_views.md) - these provide a more flexible way to define views, without a specific data structure or predefined operations.

Given different natural language queries, a db-ally view will produce different responses while maintaining a consistent data structure. This consistency offers a reliable interface for integration - the code consuming responses from a particular view knows what data structure to expect and can utilize this knowledge when displaying or processing the data. This feature of db-ally makes it stand out in terms of reliability and stability compared to standard text-to-SQL approaches.
Structured views are built on top of [IQL](iql.md), a simple language that acts as an abstraction layer between natural language and data source-specific query syntax, such as SQL. IQL allows the LLM to express complex queries in a more straightforward manner. In contrast, freeform views operate directly on the raw data source, using the data source's query language.

Each view can contain one or more “filters”, which the LLM may decide to choose and apply to the extracted data so that it meets the criteria specified in the natural language query. Given such a query, LLM chooses which filters to use, provides arguments to the filters, and connects the filters with Boolean operators. The LLM expresses these filter combinations using a special language called [IQL](iql.md), in which the defined view filters provide a layer of abstraction between the LLM and the raw syntax used to query the data source (e.g., SQL).
We consider **structured views** to be at the heart of db-ally. These enable the library's core benefits (consistency, security, efficiency, and portability), and provide a reliable interface for integration. Structured views are especially useful for applications with precise requirements in behavior or data format. For this reason, we recommend using structured views whenever possible.

!!! example
For instance, this is a simple view that uses SQLAlchemy to select data from specific columns in a SQL database. It contains a single filter, that the LLM may optionally use to control which table rows to fetch: <!-- TODO: Add a link to how-to about SQL views -->
Here are the differences between structured and freeform views, in terms of the core benefits of db-ally:

```python
class CandidateView(SqlAlchemyBaseView):
    """
    A view for retrieving candidates from the database.
    """
* **Consistency**: Structured views ensure predictable output formats, while freeform views do not require a fixed data structure. The former is more predictable and easier to integrate with other systems, while the latter provides more flexibility.
* **Security**: Structured views limit data source operations to those predefined by developers, whereas freeform views often allow the LLM to execute arbitrary operations on the data source. The former approach is considerably more secure (including protection against SQL injection attacks), whilst the latter approach is more flexible but requires developers to ensure the security of data sources outside of db-ally.
* **Efficiency**: Structured views provide [a layer of abstraction](iql.md) between the model and the data, which enables the LLM to focus on essential aspects, improving performance. Complex operations from the data source perspective can appear simple to the LLM. Conversely, freeform views can operate on the raw data source, which can be powerful but may also make it more challenging for the LLM to deliver good performance.
* **Portability**: Both structured and freeform views are typically defined in terms of a specific data source type and can be integrated with various database technologies and other data sources. However, freeform views integrate more easily with data sources that already use a query language the LLM can generate (like SQL), while structured views aren't similarly limited since they come with their own query language (IQL).

    def get_select(self):
        """
        Defines which columns to select
        """
        return sqlalchemy.select(Candidate.id, Candidate.name, Candidate.country)

    @decorators.view_filter()
    def from_country(self, country: str):
        """
        Filter candidates from a specific country.
        """
        return Candidate.country == country
```

A project might implement multiple views, each tailored to different output formats and filters for various use cases. The LLM selects the appropriate view which best corresponds to the specific natural language query. For further details, consider reading our article on [Collections](collections.md).

See the [Quickstart](../quickstart/index.md) guide for a complete example of how to define and use views.
A project might implement multiple views, of both types, each customised for different use cases. The LLM selects the most appropriate view corresponding to the specific natural language query. For further details, consider reading our article on [Collections](collections.md).
3 changes: 3 additions & 0 deletions docs/how-to/custom_freeform_views.md
@@ -0,0 +1,3 @@
# How-To: Create custom Freeform Views

TODO
7 changes: 5 additions & 2 deletions docs/how-to/custom_views.md
@@ -3,7 +3,7 @@
!!! note
This is an advanced topic. If you're looking to create a view that retrieves data from an SQL database, please refer to the [SQL Views](sql_views.md) guide instead.

In this guide, we'll show you how to create views that connect to custom data sources. This could be useful if you need to retrieve data from a REST API, a NoSQL database, or any other data source not supported by the built-in base views.
In this guide, we'll show you how to create [structured views](../concepts/structured_views.md) that connect to custom data sources. This could be useful if you need to retrieve data from a REST API, a NoSQL database, or any other data source not supported by the built-in base views.

# Summary
Firstly, we will create a custom base view called `FilteredIterableBaseView` that retrieves data from a Python iterable and allows it to be filtered. It forms the base that implements data source-specific logic and lets other views inherit from it in order to define filters for specific use cases (similar to how `SqlAlchemyBaseView` is a base view provided by db-ally).
@@ -54,13 +54,16 @@ class CandidateView(FilteredIterableBaseView):
Lastly, we will illustrate how to use the `CandidateView` like any other view in db-ally. We will create an instance of the view, add it to a collection, and start querying it.

## Types of Custom Views
There are two main ways to create custom views:
There are two main ways to create custom structured views:

* By subclassing the `MethodsBaseView`: This is the most common method. These views expect filters to be defined as class methods and help manage them. All the built-in db-ally views use this method.
* By subclassing the `BaseStructuredView` directly: This is a more low-level method. It makes no assumptions about how filters are defined and managed. This may be useful if you want to create a view that doesn't fit the standard db-ally view pattern, like when the list of available filters is dynamic or comes from an external source. In these cases, you'll need to create the entire filter management logic yourself by implementing the `list_filters` and `apply_filters` methods.

If you're not sure which method to choose, we recommend starting with the `MethodsBaseView`. It's simpler and easier to use, and you can switch to the `BaseStructuredView` later if you find you need more control over filter management. For this guide, we'll focus on the `MethodsBaseView`.
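
To give a feel for the lower-level approach, here is a standalone sketch of a view whose filters come from a runtime mapping rather than from methods. It deliberately does not subclass `BaseStructuredView`: the class name, the constructor, the `execute` method, and the argument shapes of `list_filters` and `apply_filters` are all assumptions made for illustration, and db-ally's real signatures may differ.

```python
class DynamicFilterViewSketch:
    """Standalone illustration of filters that are defined by data, not by methods."""

    def __init__(self, rows, filter_specs):
        self.rows = rows
        # Maps a filter name to the row field it compares against.
        # In practice this could be loaded from a config file or an external service.
        self.filter_specs = filter_specs
        self._predicates = []

    def list_filters(self):
        """Report the filters that are currently available."""
        return sorted(self.filter_specs)

    def apply_filters(self, requested):
        """Register predicates for a mapping of filter name -> expected value."""
        for name, value in requested.items():
            if name not in self.filter_specs:
                raise ValueError(f"Unknown filter: {name}")
            field = self.filter_specs[name]
            # Bind field and value as defaults so each lambda keeps its own pair.
            self._predicates.append(lambda row, f=field, v=value: row[f] == v)

    def execute(self):
        """Return the rows that satisfy every applied filter."""
        return [row for row in self.rows if all(p(row) for p in self._predicates)]


rows = [
    {"name": "Alice", "country": "France", "years_of_experience": 7},
    {"name": "Bob", "country": "France", "years_of_experience": 2},
    {"name": "Carol", "country": "Poland", "years_of_experience": 7},
]
dynamic_view = DynamicFilterViewSketch(
    rows,
    {"from_country": "country", "with_experience": "years_of_experience"},
)
available = dynamic_view.list_filters()
dynamic_view.apply_filters({"from_country": "France", "with_experience": 7})
selected = [row["name"] for row in dynamic_view.execute()]
```

The trade-off matches the one described above: the view has full control over where filters come from, but it also has to implement the bookkeeping that `MethodsBaseView` would otherwise provide.
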

!!! note
Both are methods of creating [structured views](../concepts/structured_views.md). If you're looking to create a [freeform view](../concepts/freeform_views.md), refer to the [Freeform Views](custom_freeform_views.md) guide instead.

## The Example
Throughout the guide, we'll use an example of creating a custom base view called `FilteredIterableBaseView`. To keep things simple, the "data source" it uses is a list defined in Python. The goal is to demonstrate how to create a custom view and define filters for it. In most real-world scenarios, data would usually come from an external source, like a REST API or a database.

2 changes: 1 addition & 1 deletion docs/how-to/pandas_views.md
@@ -1,6 +1,6 @@
# How-To: Use Pandas DataFrames with db-ally

In this guide, you will learn how to write [views](../concepts/views.md) that use [Pandas](https://pandas.pydata.org/) DataFrames as their data source. You will understand how to define such a view, create filters that operate on the DataFrame, and register it while providing it with the source DataFrame.
In this guide, you will learn how to write [structured views](../concepts/structured_views.md) that use [Pandas](https://pandas.pydata.org/) DataFrames as their data source. You will understand how to define such a view, create filters that operate on the DataFrame, and register it while providing it with the source DataFrame.

The example used in this guide is a DataFrame containing information about candidates. The DataFrame includes columns such as `id`, `name`, `country`, `years_of_experience`. This is the same use case as the one in the [Quickstart](../quickstart/index.md) and [Custom Views](./custom_views.md) guides. Please feel free to compare the different approaches.
