Commit

Added DSL documentation to Guide
miguelgrinberg committed Jan 23, 2025
1 parent d2f93eb commit 59ab1ea
Showing 11 changed files with 2,318 additions and 1 deletion.
103 changes: 103 additions & 0 deletions docs/guide/dsl/asyncio.asciidoc
@@ -0,0 +1,103 @@
[[asyncio]]
==== Using asyncio with Elasticsearch DSL

The DSL module supports async/await with
https://docs.python.org/3/library/asyncio.html[asyncio]. To ensure that
you have all the required dependencies, install the `++[++async++]++`
extra:

[source,bash]
----
$ python -m pip install elasticsearch[async]
----

===== Connections

Use the `async++_++connections` module to manage your asynchronous
connections.

[source,python]
----
from elasticsearch.dsl import async_connections
async_connections.create_connection(hosts=['localhost'], timeout=20)
----

All the options available in the `connections` module can be used with
`async++_++connections`.
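
For example, you can register an additional named connection and retrieve
it by its alias, just as with the synchronous module (a minimal sketch;
the `qa` alias and host are hypothetical):

[source,python]
----
from elasticsearch.dsl import async_connections

# register an additional connection under the 'qa' alias
async_connections.create_connection(alias='qa', hosts=['esqa1.example.com'], timeout=60)

# retrieve a connection by its alias
es = async_connections.get_connection('qa')
----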

====== How to avoid 'Unclosed client session / connector' warnings on exit

These warnings come from the `aiohttp` package, which is used internally
by the `AsyncElasticsearch` client. They often appear when the
application exits, and are caused by HTTP connections that are still open
when they are garbage collected. To avoid these warnings, make sure that
you close your connections.

[source,python]
----
es = async_connections.get_connection()
await es.close()
----

===== Search DSL

Use the `AsyncSearch` class to perform asynchronous searches.

[source,python]
----
from elasticsearch.dsl import AsyncSearch
s = AsyncSearch().query("match", title="python")
async for hit in s:
    print(hit.title)
----

Instead of using the `AsyncSearch` object as an asynchronous iterator,
you can explicitly call the `execute()` method to get a `Response`
object.

[source,python]
----
s = AsyncSearch().query("match", title="python")
response = await s.execute()
for hit in response:
    print(hit.title)
----

An `AsyncMultiSearch` is available as well.

[source,python]
----
from elasticsearch.dsl import AsyncMultiSearch
ms = AsyncMultiSearch(index='blogs')
ms = ms.add(AsyncSearch().filter('term', tags='python'))
ms = ms.add(AsyncSearch().filter('term', tags='elasticsearch'))
responses = await ms.execute()
for response in responses:
    print("Results for query %r." % response.search.query)
    for hit in response:
        print(hit.title)
----

===== Asynchronous Documents, Indexes, and more

The `Document`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and
`FacetedSearch` classes all have asynchronous versions that use the same
name with an `Async` prefix. These classes expose the same interfaces as
the synchronous versions, but any methods that perform I/O are defined
as coroutines.
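
For example, a minimal sketch of an asynchronous document class (the
`Article` document and its fields are hypothetical):

[source,python]
----
from elasticsearch.dsl import AsyncDocument, Date, Text

class Article(AsyncDocument):
    title = Text()
    published_from = Date()

    class Index:
        name = 'blog'

async def store_article(title):
    # create the index and mappings if they do not exist yet
    await Article.init()
    article = Article(title=title)
    await article.save()
----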

Auxiliary classes that do not perform I/O do not have asynchronous
versions. The same classes can be used in synchronous and asynchronous
applications.

When using a custom analyzer (see the Analysis section) in an
asynchronous application, use the `async++_++simulate()` method to invoke
the Analyze API on it.
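
A minimal sketch, assuming a hypothetical custom analyzer definition:

[source,python]
----
from elasticsearch.dsl import analyzer, tokenizer

my_analyzer = analyzer(
    'my_analyzer',
    tokenizer=tokenizer('trigram', 'ngram', min_gram=3, max_gram=3),
    filter=['lowercase']
)

async def show_tokens(text):
    # async_simulate() invokes the Analyze API and returns the produced tokens
    result = await my_analyzer.async_simulate(text)
    for token in result.tokens:
        print(token.token)
----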

Consult the `api` section for details about each specific method.
125 changes: 125 additions & 0 deletions docs/guide/dsl/configuration.asciidoc
@@ -0,0 +1,125 @@
=== Configuration

There are several ways to configure connections for the library. The
easiest and most useful approach is to define one default connection
that can be used every time an API call is made without explicitly
passing in other connections.

[NOTE]
====
Unless you want to access multiple clusters from your application, it is
highly recommended that you use the `create++_++connection` method, so
that all operations automatically use that connection.
====

==== Default connection

To define a default connection that can be used globally, use the
`connections` module and the `create++_++connection` method like this:

[source,python]
----
from elasticsearch.dsl import connections
connections.create_connection(hosts=['localhost'], timeout=20)
----

===== Single connection with an alias

You can define the `alias` or name of a connection so you can easily
refer to it later. The default value for `alias` is `default`.

[source,python]
----
from elasticsearch.dsl import connections
connections.create_connection(alias='my_new_connection', hosts=['localhost'], timeout=60)
----

Additional keyword arguments (`hosts` and `timeout` in our example) will
be passed to the `Elasticsearch` class from `elasticsearch-py`.

To see all possible configuration options refer to the
https://elasticsearch-py.readthedocs.io/en/latest/api/elasticsearch.html[documentation].

==== Multiple clusters

You can define multiple connections to multiple clusters at the same
time using the `configure` method:

[source,python]
----
from elasticsearch.dsl import connections
connections.configure(
    default={'hosts': 'localhost'},
    dev={
        'hosts': ['esdev1.example.com:9200'],
        'sniff_on_start': True
    }
)
----

Such connections will be constructed lazily when requested for the first
time.

You can alternatively define multiple connections by adding them one by
one as shown in the following example:

[source,python]
----
# if you have configuration options to be passed to Elasticsearch.__init__
# this also shows creating a connection with the alias 'qa'
connections.create_connection('qa', hosts=['esqa1.example.com'], sniff_on_start=True)
# if you already have an Elasticsearch instance ready
connections.add_connection('another_qa', my_client)
----

===== Using aliases

When using multiple connections, you can refer to them using the string
alias specified when you created the connection.

This example shows how to refer to a connection by its alias:

[source,python]
----
s = Search(using='qa')
----

A `KeyError` will be raised if there is no connection registered with
that alias.
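
You can also retrieve a registered connection by its alias to call client
APIs directly (a minimal sketch, assuming the `qa` connection created
earlier):

[source,python]
----
from elasticsearch.dsl import connections

# raises KeyError if no connection is registered under this alias
es = connections.get_connection('qa')
print(es.info())
----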

==== Manual

If you don't want to supply a global configuration, you can always pass
in your own connection as an instance of `elasticsearch.Elasticsearch`
with the parameter `using` wherever it is accepted like this:

[source,python]
----
s = Search(using=Elasticsearch('localhost'))
----

You can even use this approach to override any connection the object
might already be associated with:

[source,python]
----
s = s.using(Elasticsearch('otherhost:9200'))
----

[NOTE]
====
When using the `dsl` module, it is highly recommended that you
use the built-in serializer
(`elasticsearch.dsl.serializer.serializer`) to ensure your objects
are correctly serialized into `JSON` every time. The
`create++_++connection` method that is described here (and that the
`configure` method uses under the hood) will do that automatically for
you, unless you explicitly specify your own serializer. The built-in
serializer also allows you to serialize your own objects - just define a
`to++_++dict()` method on your objects and that method will be
automatically called when serializing your custom objects to `JSON`.
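
As a minimal sketch of this behavior (the `DateRange` class and the
`published_from` field are hypothetical):

[source,python]
----
from elasticsearch.dsl import Search

class DateRange:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def to_dict(self):
        # called automatically by the built-in serializer
        return {'gte': self.start, 'lte': self.end}

# DateRange is converted to its dictionary form when the request is serialized
s = Search().filter('range', published_from=DateRange('2025-01-01', '2025-01-31'))
----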
====
5 changes: 5 additions & 0 deletions docs/guide/dsl/examples.asciidoc
@@ -0,0 +1,5 @@
=== Examples

Please see the
https://github.com/elastic/elasticsearch-py/tree/master/examples/dsl[examples]
directory for some complex examples using the DSL module.
145 changes: 145 additions & 0 deletions docs/guide/dsl/faceted_search.asciidoc
@@ -0,0 +1,145 @@
[[faceted_search]]
==== Faceted Search

The library comes with a simple abstraction aimed at helping you develop
faceted navigation for your data.

[NOTE]
====
This API is experimental and will be subject to change. Any feedback is
welcome.
====

===== Configuration

You can provide several configuration options (as class attributes) when
declaring a `FacetedSearch` subclass:

- `index`:
the name of the index (as string) to search through, defaults to
`'++_++all'`.
- `doc++_++types`:
list of `Document` subclasses or strings to be used, defaults to
`++[++'++_++all'++]++`.
- `fields`:
list of fields on the document type to search through. The list will
be passed to the `MultiMatch` query, so it can contain boost values
(`'title^5'`); defaults to `++[++'++*++'++]++`.
- `facets`:
dictionary of facets to display/filter on. The key is the name
displayed and the values should be instances of any `Facet` subclass,
for example: `++{++'tags': TermsFacet(field='tags')}`

====== Facets

There are several different facets available:

- `TermsFacet`:
provides an option to split documents into groups based on a value of
a field, for example `TermsFacet(field='category')`
- `DateHistogramFacet`:
split documents into time intervals, example:
`DateHistogramFacet(field="published++_++date", calendar++_++interval="day")`
- `HistogramFacet`:
similar to `DateHistogramFacet` but for numerical values:
`HistogramFacet(field="rating", interval=2)`
- `RangeFacet`:
allows you to define your own ranges for numerical fields:
`RangeFacet(field="comment++_++count", ranges=++[++("few", (None, 2)), ("lots", (2, None))++]++)`
- `NestedFacet`:
is just a simple facet that wraps another to provide access to nested
documents:
`NestedFacet('variants', TermsFacet(field='variants.color'))`

By default, facet results will only calculate the document count. If you
wish for a different metric, you can pass in any single value metric
aggregation as the `metric` kwarg
(`TermsFacet(field='tags', metric=A('max', field='timestamp'))`). When
specifying `metric`, the results will, by default, be sorted in
descending order by that metric. To change it to ascending, specify
`metric++_++sort="asc"`; to just sort by document count, use
`metric++_++sort=False`.
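
For example, a sketch of a facet that ranks tag buckets by the newest
document in each bucket (the field names are hypothetical):

[source,python]
----
from elasticsearch.dsl import A, FacetedSearch, TermsFacet

class CommentSearch(FacetedSearch):
    facets = {
        # sort tag buckets by the newest comment in each bucket, oldest first
        'tags': TermsFacet(
            field='tags',
            metric=A('max', field='created_at'),
            metric_sort='asc',
        ),
    }
----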

====== Advanced

If you require any custom behavior or modifications, simply override one
or more of the methods responsible for the class's functionality:

- `search(self)`:
is responsible for constructing the `Search` object used. Override
this if you want to customize the search object (for example by adding
a global filter for published articles only).
- `query(self, search)`:
adds the query part of the search (if a search input was specified), by
default using a `MultiMatch` query. Override this if you want to modify
the query type used.
- `highlight(self, search)`:
defines the highlighting on the `Search` object and returns a new one.
Default behavior is to highlight on all fields specified for search.

===== Usage

The custom subclass can be instantiated empty to provide an empty search
(matching everything), or with `query`, `filters` and `sort`, as shown in
the sketch after the list below.

- `query`:
is used to pass in the text of the query to be performed. If `None` is
passed in (the default), a `MatchAll` query will be used; for example,
`'python web'`.
- `filters`:
is a dictionary containing all the facet filters that you wish to
apply. Use the name of the facet (from the `.facets` attribute) as the
key and one of the possible values as the value. For example,
`++{++'tags': 'python'}`.
- `sort`:
is a tuple or list of fields on which the results should be sorted.
The format of the individual fields should be the same as those passed
to `Search.sort()`.
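
For example, a minimal sketch of instantiating the `BlogSearch` subclass
from the example below with all three arguments (the sort field is
hypothetical):

[source,python]
----
bs = BlogSearch(
    query='python web',
    filters={'tags': 'python'},
    sort=('-published_from',),
)
response = bs.execute()
----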

====== Response

The response returned from the `FacetedSearch` object (by calling
`.execute()`) is a subclass of the standard `Response` class that adds a
property called `facets`, which contains a dictionary with lists of
buckets, each represented by a tuple of the key, the document count, and
a flag indicating whether this value has been filtered on.

===== Example

[source,python]
----
from datetime import date

from elasticsearch.dsl import FacetedSearch, TermsFacet, DateHistogramFacet

class BlogSearch(FacetedSearch):
    doc_types = [Article, ]
    # fields that should be searched
    fields = ['tags', 'title', 'body']

    facets = {
        # use bucket aggregations to define facets
        'tags': TermsFacet(field='tags'),
        'publishing_frequency': DateHistogramFacet(field='published_from', interval='month')
    }

    def search(self):
        # override methods to add custom pieces
        s = super().search()
        return s.filter('range', published_from={'lte': 'now/h'})

bs = BlogSearch('python web', {'publishing_frequency': date(2015, 6, 1)})
response = bs.execute()

# access hits and other attributes as usual
total = response.hits.total
print('total hits', total.relation, total.value)
for hit in response:
    print(hit.meta.score, hit.title)

for (tag, count, selected) in response.facets.tags:
    print(tag, ' (SELECTED):' if selected else ':', count)

for (month, count, selected) in response.facets.publishing_frequency:
    print(month.strftime('%B %Y'), ' (SELECTED):' if selected else ':', count)
----
7 changes: 7 additions & 0 deletions docs/guide/dsl/howto.asciidoc
@@ -0,0 +1,7 @@
=== How-To Guides

include::search_dsl.asciidoc[]
include::persistence.asciidoc[]
include::faceted_search.asciidoc[]
include::update_by_query.asciidoc[]
include::asyncio.asciidoc[]