Commit

Added DSL documentation to Guide
miguelgrinberg committed Jan 23, 2025
1 parent d2f93eb commit 59ab1ea
Showing 11 changed files with 2,318 additions and 1 deletion.
103 changes: 103 additions & 0 deletions docs/guide/dsl/asyncio.asciidoc
@@ -0,0 +1,103 @@
[[asyncio]]
==== Using asyncio with Elasticsearch DSL

The DSL module supports async/await with
https://docs.python.org/3/library/asyncio.html[asyncio]. To ensure that
you have all the required dependencies, install the `++[++async++]++`
extra:

[source,bash]
----
$ python -m pip install elasticsearch[async]
----

===== Connections

Use the `async++_++connections` module to manage your asynchronous
connections.

[source,python]
----
from elasticsearch.dsl import async_connections
async_connections.create_connection(hosts=['localhost'], timeout=20)
----

All the options available in the `connections` module can be used with
`async++_++connections`.
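
For example, you can register an additional named connection and retrieve
it by its alias, just as with the synchronous module (a minimal sketch;
the `qa` alias and host are hypothetical):

[source,python]
----
from elasticsearch.dsl import async_connections

# register an additional connection under the 'qa' alias
async_connections.create_connection(alias='qa', hosts=['esqa1.example.com'], timeout=60)

# retrieve a connection by its alias
es = async_connections.get_connection('qa')
----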

====== How to avoid 'Unclosed client session / connector' warnings on exit

These warnings come from the `aiohttp` package, which is used internally
by the `AsyncElasticsearch` client. They often appear when the
application exits, and are caused by HTTP connections that are still open
when they are garbage collected. To avoid these warnings, make sure that
you close your connections.

[source,python]
----
es = async_connections.get_connection()
await es.close()
----

===== Search DSL

Use the `AsyncSearch` class to perform asynchronous searches.

[source,python]
----
from elasticsearch.dsl import AsyncSearch
s = AsyncSearch().query("match", title="python")
async for hit in s:
    print(hit.title)
----

Instead of using the `AsyncSearch` object as an asynchronous iterator,
you can explicitly call the `execute()` method to get a `Response`
object.

[source,python]
----
s = AsyncSearch().query("match", title="python")
response = await s.execute()
for hit in response:
    print(hit.title)
----

An `AsyncMultiSearch` is available as well.

[source,python]
----
from elasticsearch.dsl import AsyncMultiSearch
ms = AsyncMultiSearch(index='blogs')
ms = ms.add(AsyncSearch().filter('term', tags='python'))
ms = ms.add(AsyncSearch().filter('term', tags='elasticsearch'))
responses = await ms.execute()
for response in responses:
    print("Results for query %r." % response.search.query)
    for hit in response:
        print(hit.title)
----

===== Asynchronous Documents, Indexes, and more

The `Document`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and
`FacetedSearch` classes all have asynchronous versions that use the same
name with an `Async` prefix. These classes expose the same interfaces as
the synchronous versions, but any methods that perform I/O are defined
as coroutines.
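
For example, a minimal sketch of an asynchronous document class (the
`Article` document and its fields are hypothetical):

[source,python]
----
from elasticsearch.dsl import AsyncDocument, Date, Text

class Article(AsyncDocument):
    title = Text()
    published_from = Date()

    class Index:
        name = 'blog'

async def store_article(title):
    # create the index and mappings if they do not exist yet
    await Article.init()
    article = Article(title=title)
    await article.save()
----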

Auxiliary classes that do not perform I/O do not have asynchronous
versions. The same classes can be used in synchronous and asynchronous
applications.

When using a custom analyzer (see the Analysis section) in an
asynchronous application, use the `async++_++simulate()` method to invoke
the Analyze API on it.
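
A minimal sketch, assuming a hypothetical custom analyzer definition:

[source,python]
----
from elasticsearch.dsl import analyzer, tokenizer

my_analyzer = analyzer(
    'my_analyzer',
    tokenizer=tokenizer('trigram', 'ngram', min_gram=3, max_gram=3),
    filter=['lowercase']
)

async def show_tokens(text):
    # async_simulate() invokes the Analyze API and returns the produced tokens
    result = await my_analyzer.async_simulate(text)
    for token in result.tokens:
        print(token.token)
----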

Consult the `api` section for details about each specific method.
125 changes: 125 additions & 0 deletions docs/guide/dsl/configuration.asciidoc
@@ -0,0 +1,125 @@
=== Configuration

There are several ways to configure connections for the library. The
easiest and most useful approach is to define one default connection
that can be used every time an API call is made without explicitly
passing in other connections.

[NOTE]
====
Unless you want to access multiple clusters from your application, it is
highly recommended that you use the `create++_++connection` method, so
that all operations automatically use that connection.
====

==== Default connection

To define a default connection that can be used globally, use the
`connections` module and the `create++_++connection` method like this:

[source,python]
----
from elasticsearch.dsl import connections
connections.create_connection(hosts=['localhost'], timeout=20)
----

===== Single connection with an alias

You can define the `alias` or name of a connection so you can easily
refer to it later. The default value for `alias` is `default`.

[source,python]
----
from elasticsearch.dsl import connections
connections.create_connection(alias='my_new_connection', hosts=['localhost'], timeout=60)
----

Additional keyword arguments (`hosts` and `timeout` in our example) will
be passed to the `Elasticsearch` class from `elasticsearch-py`.

To see all possible configuration options refer to the
https://elasticsearch-py.readthedocs.io/en/latest/api/elasticsearch.html[documentation].

==== Multiple clusters

You can define multiple connections to multiple clusters at the same
time using the `configure` method:

[source,python]
----
from elasticsearch.dsl import connections
connections.configure(
    default={'hosts': 'localhost'},
    dev={
        'hosts': ['esdev1.example.com:9200'],
        'sniff_on_start': True
    }
)
----

Such connections will be constructed lazily when requested for the first
time.

You can alternatively define multiple connections by adding them one by
one as shown in the following example:

[source,python]
----
# if you have configuration options to be passed to Elasticsearch.__init__
# this also shows creating a connection with the alias 'qa'
connections.create_connection('qa', hosts=['esqa1.example.com'], sniff_on_start=True)
# if you already have an Elasticsearch instance ready
connections.add_connection('another_qa', my_client)
----

===== Using aliases

When using multiple connections, you can refer to them using the string
alias specified when you created the connection.

This example shows how to refer to a connection by its alias:

[source,python]
----
s = Search(using='qa')
----

A `KeyError` will be raised if there is no connection registered with
that alias.
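
You can also retrieve a registered connection by its alias to call client
APIs directly (a minimal sketch, assuming the `qa` connection created
earlier):

[source,python]
----
from elasticsearch.dsl import connections

# raises KeyError if no connection is registered under this alias
es = connections.get_connection('qa')
print(es.info())
----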

==== Manual

If you don't want to supply a global configuration, you can always pass
in your own connection as an instance of `elasticsearch.Elasticsearch`
with the parameter `using` wherever it is accepted like this:

[source,python]
----
s = Search(using=Elasticsearch('localhost'))
----

You can even use this approach to override any connection the object
might already be associated with:

[source,python]
----
s = s.using(Elasticsearch('otherhost:9200'))
----

[NOTE]
====
When using the `dsl` module, it is highly recommended that you
use the built-in serializer
(`elasticsearch.dsl.serializer.serializer`) to ensure your objects
are correctly serialized into `JSON` every time. The
`create++_++connection` method that is described here (and that the
`configure` method uses under the hood) will do that automatically for
you, unless you explicitly specify your own serializer. The built-in
serializer also allows you to serialize your own objects - just define a
`to++_++dict()` method on your objects and that method will be
automatically called when serializing your custom objects to `JSON`.
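
As a minimal sketch of this behavior (the `DateRange` class and the
`published_from` field are hypothetical):

[source,python]
----
from elasticsearch.dsl import Search

class DateRange:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def to_dict(self):
        # called automatically by the built-in serializer
        return {'gte': self.start, 'lte': self.end}

# DateRange is converted to its dictionary form when the request is serialized
s = Search().filter('range', published_from=DateRange('2025-01-01', '2025-01-31'))
----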
====
5 changes: 5 additions & 0 deletions docs/guide/dsl/examples.asciidoc
@@ -0,0 +1,5 @@
=== Examples

Please see the
https://github.com/elastic/elasticsearch-py/tree/master/examples/dsl[examples]
directory for some complex examples using the DSL module.
145 changes: 145 additions & 0 deletions docs/guide/dsl/faceted_search.asciidoc
@@ -0,0 +1,145 @@
[[faceted_search]]
==== Faceted Search

The library comes with a simple abstraction aimed at helping you develop
faceted navigation for your data.

[NOTE]
====
This API is experimental and will be subject to change. Any feedback is
welcome.
====

===== Configuration

You can provide several configuration options (as class attributes) when
declaring a `FacetedSearch` subclass:

- `index`:
the name of the index (as string) to search through, defaults to
`'++_++all'`.
- `doc++_++types`:
list of `Document` subclasses or strings to be used, defaults to
`++[++'++_++all'++]++`.
- `fields`:
list of fields on the document type to search through. The list will
be passed to the `MultiMatch` query, so it can contain boost values
(`'title^5'`); defaults to `++[++'++*++'++]++`.
- `facets`:
dictionary of facets to display/filter on. The key is the name
displayed and the values should be instances of any `Facet` subclass,
for example: `++{++'tags': TermsFacet(field='tags')}`

====== Facets

There are several different facets available:

- `TermsFacet`:
provides an option to split documents into groups based on a value of
a field, for example `TermsFacet(field='category')`
- `DateHistogramFacet`:
split documents into time intervals, example:
`DateHistogramFacet(field="published++_++date", calendar++_++interval="day")`
- `HistogramFacet`:
similar to `DateHistogramFacet` but for numerical values:
`HistogramFacet(field="rating", interval=2)`
- `RangeFacet`:
allows you to define your own ranges for numerical fields:
`RangeFacet(field="comment++_++count", ranges=++[++("few", (None, 2)), ("lots", (2, None))++]++)`
- `NestedFacet`:
is just a simple facet that wraps another to provide access to nested
documents:
`NestedFacet('variants', TermsFacet(field='variants.color'))`

By default, facet results will only calculate the document count. If you
wish for a different metric, you can pass in any single value metric
aggregation as the `metric` kwarg
(`TermsFacet(field='tags', metric=A('max', field='timestamp'))`). When
specifying `metric`, the results will, by default, be sorted in
descending order by that metric. To change it to ascending, specify
`metric++_++sort="asc"`; to just sort by document count, use
`metric++_++sort=False`.
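
For example, a sketch of a facet that ranks tag buckets by the newest
document in each bucket (the field names are hypothetical):

[source,python]
----
from elasticsearch.dsl import A, FacetedSearch, TermsFacet

class CommentSearch(FacetedSearch):
    facets = {
        # sort tag buckets by the newest comment in each bucket, oldest first
        'tags': TermsFacet(
            field='tags',
            metric=A('max', field='created_at'),
            metric_sort='asc',
        ),
    }
----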

====== Advanced

If you require any custom behavior or modifications, simply override one
or more of the methods responsible for the class's functionality:

- `search(self)`:
is responsible for constructing the `Search` object used. Override
this if you want to customize the search object (for example by adding
a global filter for published articles only).
- `query(self, search)`:
adds the query part of the search (if a search input was specified), by
default using a `MultiMatch` query. Override this if you want to modify
the query type used.
- `highlight(self, search)`:
defines the highlighting on the `Search` object and returns a new one.
Default behavior is to highlight on all fields specified for search.

===== Usage

The custom subclass can be instantiated empty to provide an empty search
(matching everything), or with `query`, `filters` and `sort`, as shown in
the sketch after the list below.

- `query`:
is used to pass in the text of the query to be performed. If `None` is
passed in (the default), a `MatchAll` query will be used; for example,
`'python web'`.
- `filters`:
is a dictionary containing all the facet filters that you wish to
apply. Use the name of the facet (from the `.facets` attribute) as the
key and one of the possible values as the value. For example,
`++{++'tags': 'python'}`.
- `sort`:
is a tuple or list of fields on which the results should be sorted.
The format of the individual fields should be the same as those passed
to `Search.sort()`.
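
For example, a minimal sketch of instantiating the `BlogSearch` subclass
from the example below with all three arguments (the sort field is
hypothetical):

[source,python]
----
bs = BlogSearch(
    query='python web',
    filters={'tags': 'python'},
    sort=('-published_from',),
)
response = bs.execute()
----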

====== Response

The response returned from the `FacetedSearch` object (by calling
`.execute()`) is a subclass of the standard `Response` class that adds a
property called `facets`, which contains a dictionary with lists of
buckets, each represented by a tuple of the key, the document count, and
a flag indicating whether this value has been filtered on.

===== Example

[source,python]
----
from datetime import date

from elasticsearch.dsl import FacetedSearch, TermsFacet, DateHistogramFacet

class BlogSearch(FacetedSearch):
    doc_types = [Article, ]
    # fields that should be searched
    fields = ['tags', 'title', 'body']

    facets = {
        # use bucket aggregations to define facets
        'tags': TermsFacet(field='tags'),
        'publishing_frequency': DateHistogramFacet(field='published_from', interval='month')
    }

    def search(self):
        # override methods to add custom pieces
        s = super().search()
        return s.filter('range', published_from={'lte': 'now/h'})

bs = BlogSearch('python web', {'publishing_frequency': date(2015, 6, 1)})
response = bs.execute()

# access hits and other attributes as usual
total = response.hits.total
print('total hits', total.relation, total.value)
for hit in response:
    print(hit.meta.score, hit.title)

for (tag, count, selected) in response.facets.tags:
    print(tag, ' (SELECTED):' if selected else ':', count)

for (month, count, selected) in response.facets.publishing_frequency:
    print(month.strftime('%B %Y'), ' (SELECTED):' if selected else ':', count)
----
7 changes: 7 additions & 0 deletions docs/guide/dsl/howto.asciidoc
@@ -0,0 +1,7 @@
=== How-To Guides

include::search_dsl.asciidoc[]
include::persistence.asciidoc[]
include::faceted_search.asciidoc[]
include::update_by_query.asciidoc[]
include::asyncio.asciidoc[]