[[asyncio]]
==== Using asyncio with Elasticsearch DSL

The DSL module supports async/await with
https://docs.python.org/3/library/asyncio.html[asyncio]. To ensure that
you have all the required dependencies, install the `++[++async++]++`
extra:

[source,bash]
----
$ python -m pip install elasticsearch-dsl[async]
----

===== Connections

Use the `async++_++connections` module to manage your asynchronous
connections.

[source,python]
----
from elasticsearch.dsl import async_connections

async_connections.create_connection(hosts=['localhost'], timeout=20)
----

All the options available in the `connections` module can be used with
`async++_++connections`.
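
For example, registering a named connection works the same way as with
the synchronous `connections` module shown later in this guide (the
alias below is illustrative):

[source,python]
----
from elasticsearch.dsl import async_connections

# create a second connection under an explicit alias
async_connections.create_connection(alias='my_new_connection', hosts=['localhost'], timeout=60)
----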

====== How to avoid 'Unclosed client session / connector' warnings on exit

These warnings come from the `aiohttp` package, which is used internally
by the `AsyncElasticsearch` client. They often appear when the
application exits and are caused by HTTP connections that are still open
when they are garbage collected. To avoid these warnings, make sure that
you close your connections:

[source,python]
----
es = async_connections.get_connection()
await es.close()
----

===== Search DSL

Use the `AsyncSearch` class to perform asynchronous searches.

[source,python]
----
from elasticsearch.dsl import AsyncSearch

s = AsyncSearch().query("match", title="python")
async for hit in s:
    print(hit.title)
----

Instead of using the `AsyncSearch` object as an asynchronous iterator,
you can explicitly call the `execute()` method to get a `Response`
object.

[source,python]
----
s = AsyncSearch().query("match", title="python")
response = await s.execute()
for hit in response:
    print(hit.title)
----

An `AsyncMultiSearch` is available as well.

[source,python]
----
from elasticsearch.dsl import AsyncMultiSearch

ms = AsyncMultiSearch(index='blogs')
ms = ms.add(AsyncSearch().filter('term', tags='python'))
ms = ms.add(AsyncSearch().filter('term', tags='elasticsearch'))

responses = await ms.execute()

for response in responses:
    print("Results for query %r." % response.search.query)
    for hit in response:
        print(hit.title)
----

===== Asynchronous Documents, Indexes, and more

The `Document`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and
`FacetedSearch` classes all have asynchronous versions that use the same
name with an `Async` prefix. These classes expose the same interfaces as
the synchronous versions, but any methods that perform I/O are defined
as coroutines.
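
For instance, a minimal sketch of an asynchronous document class might
look like this (the `Article` document and its fields are illustrative):

[source,python]
----
from elasticsearch.dsl import AsyncDocument, Date, Text

class Article(AsyncDocument):
    title = Text()
    published_from = Date()

    class Index:
        name = 'blogs'

# methods that perform I/O are coroutines and must be awaited
await Article.init()                    # create the index and mapping
article = Article(title='Hello async')
await article.save()
----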

Auxiliary classes that do not perform I/O do not have asynchronous
versions. The same classes can be used in synchronous and asynchronous
applications.

When using a custom analyzer in an asynchronous application, use the
`async++_++simulate()` method to invoke the Analyze API on it.
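
For example (a minimal sketch; the analyzer definition and the assumption
that the response exposes a `tokens` attribute follow the synchronous
`simulate()` behavior):

[source,python]
----
from elasticsearch.dsl import analyzer

my_analyzer = analyzer('my_analyzer',
    tokenizer='standard',
    filter=['lowercase']
)

# invoke the Analyze API asynchronously instead of calling simulate()
response = await my_analyzer.async_simulate('Async Elasticsearch DSL')
print([t.token for t in response.tokens])
----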

Consult the API section for details about each specific method.

=== Configuration

There are several ways to configure connections for the library. The
easiest and most useful approach is to define one default connection
that can be used every time an API call is made without explicitly
passing in other connections.

[NOTE]
====
Unless you want to access multiple clusters from your application, it is
highly recommended that you use the `create++_++connection` method; all
operations will then use that connection automatically.
====

==== Default connection

To define a default connection that can be used globally, use the
`connections` module and the `create++_++connection` method like this:

[source,python]
----
from elasticsearch.dsl import connections

connections.create_connection(hosts=['localhost'], timeout=20)
----

===== Single connection with an alias

You can define the `alias` or name of a connection so you can easily
refer to it later. The default value for `alias` is `default`.

[source,python]
----
from elasticsearch.dsl import connections

connections.create_connection(alias='my_new_connection', hosts=['localhost'], timeout=60)
----

Additional keyword arguments (`hosts` and `timeout` in our example) will
be passed to the `Elasticsearch` class from `elasticsearch-py`.

To see all possible configuration options refer to the
https://elasticsearch-py.readthedocs.io/en/latest/api/elasticsearch.html[documentation].

==== Multiple clusters

You can define multiple connections to multiple clusters at the same
time using the `configure` method:

[source,python]
----
from elasticsearch.dsl import connections

connections.configure(
    default={'hosts': 'localhost'},
    dev={
        'hosts': ['esdev1.example.com:9200'],
        'sniff_on_start': True
    }
)
----

Such connections will be constructed lazily when requested for the
first time.

You can alternatively define multiple connections by adding them one by
one as shown in the following example:

[source,python]
----
# if you have configuration options to be passed to Elasticsearch.__init__
# this also shows creating a connection with the alias 'qa'
connections.create_connection('qa', hosts=['esqa1.example.com'], sniff_on_start=True)

# if you already have an Elasticsearch instance ready
connections.add_connection('another_qa', my_client)
----

===== Using aliases

When using multiple connections, you can refer to them using the string
alias specified when you created the connection.

This example shows how to use an alias to a connection:

[source,python]
----
s = Search(using='qa')
----

A `KeyError` will be raised if there is no connection registered with
that alias.

==== Manual

If you don't want to supply a global configuration, you can always pass
in your own connection as an instance of `elasticsearch.Elasticsearch`
with the `using` parameter wherever it is accepted, like this:

[source,python]
----
s = Search(using=Elasticsearch('localhost'))
----

You can even use this approach to override any connection the object
might be already associated with:

[source,python]
----
s = s.using(Elasticsearch('otherhost:9200'))
----

[NOTE]
====
When using the `dsl` module, it is highly recommended that you use the
built-in serializer (`elasticsearch.dsl.serializer.serializer`) to
ensure your objects are correctly serialized into JSON every time. The
`create++_++connection` method that is described here (and that the
`configure` method uses under the hood) will do that automatically for
you, unless you explicitly specify your own serializer. The built-in
serializer also allows you to serialize your own objects: just define a
`to++_++dict()` method on your objects and that method will be
automatically called when serializing your custom objects to JSON.
====
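
For example, a minimal sketch of a custom object that the built-in
serializer can handle (the `Currency` class is illustrative):

[source,python]
----
class Currency:
    def __init__(self, amount, code):
        self.amount = amount
        self.code = code

    def to_dict(self):
        # called automatically by the built-in serializer
        return {'amount': self.amount, 'code': self.code}

# a field set to Currency(10, 'USD') would be serialized
# as {"amount": 10, "code": "USD"}
----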

=== Examples

Please see the
https://github.com/elastic/elasticsearch-py/tree/master/examples/dsl[examples]
directory for some complex examples using `elasticsearch-dsl`.

[[faceted_search]]
==== Faceted Search

The library comes with a simple abstraction aimed at helping you develop
faceted navigation for your data.

[NOTE]
====
This API is experimental and will be subject to change. Any feedback is
welcome.
====

===== Configuration

You can provide several configuration options (as class attributes) when
declaring a `FacetedSearch` subclass:

- `index`:
  the name of the index (as a string) to search through, defaults to
  `'++_++all'`.
- `doc++_++types`:
  list of `Document` subclasses or strings to be used, defaults to
  `++[++'++_++all'++]++`.
- `fields`:
  list of fields on the document type to search through. The list will
  be passed to the `MultiMatch` query, so it can contain boost values
  (`'title^5'`), defaults to `++[++'++*++'++]++`.
- `facets`:
  dictionary of facets to display/filter on. The key is the name
  displayed and values should be instances of any `Facet` subclass, for
  example: `++{++'tags': TermsFacet(field='tags')}`

====== Facets

There are several different facets available:

- `TermsFacet`:
  provides an option to split documents into groups based on the value
  of a field, for example `TermsFacet(field='category')`
- `DateHistogramFacet`:
  split documents into time intervals, example:
  `DateHistogramFacet(field="published++_++date", calendar++_++interval="day")`
- `HistogramFacet`:
  similar to `DateHistogramFacet` but for numerical values:
  `HistogramFacet(field="rating", interval=2)`
- `RangeFacet`:
  allows you to define your own ranges for numerical fields:
  `RangeFacet(field="comment++_++count", ranges=++[++("few", (None, 2)), ("lots", (2, None))++]++)`
- `NestedFacet`:
  is just a simple facet that wraps another to provide access to nested
  documents:
  `NestedFacet('variants', TermsFacet(field='variants.color'))`

By default, facet results will only calculate document count. If you
wish to use a different metric, you can pass in any single-value metric
aggregation as the `metric` kwarg
(`TermsFacet(field='tags', metric=A('max', field='timestamp'))`). When
specifying `metric`, the results will by default be sorted in descending
order by that metric. To change this to ascending, specify
`metric++_++sort="asc"`; to just sort by document count, use
`metric++_++sort=False`.
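
For example, a short sketch of a facet ranked by a metric (the field
names are illustrative):

[source,python]
----
from elasticsearch.dsl import A, TermsFacet

# group documents by tag; order the buckets by the highest rating
# in each group, ascending (descending is the default with a metric)
top_rated = TermsFacet(
    field='tags',
    metric=A('max', field='rating'),
    metric_sort='asc',
)
----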

====== Advanced

If you require any custom behavior or modifications, simply override one
or more of the methods responsible for the class's functionality:

- `search(self)`:
  is responsible for constructing the `Search` object used. Override
  this if you want to customize the search object (for example by adding
  a global filter for published articles only).
- `query(self, search)`:
  adds the query portion of the search (if a search input was
  specified), by default using a `MultiMatch` query. Override this if
  you want to modify the query type used.
- `highlight(self, search)`:
  defines the highlighting on the `Search` object and returns a new one.
  Default behavior is to highlight on all fields specified for search.
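
As a minimal sketch, a subclass that limits results to published
articles and disables highlighting might look like this (the index and
field names are illustrative):

[source,python]
----
from elasticsearch.dsl import FacetedSearch

class PublishedSearch(FacetedSearch):
    index = 'blogs'
    fields = ['title', 'body']

    def search(self):
        # start from the default Search object and add a global filter
        s = super().search()
        return s.filter('range', published_from={'lte': 'now/h'})

    def highlight(self, search):
        # return the search unchanged to skip highlighting entirely
        return search
----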

===== Usage

The custom subclass can be instantiated empty to provide an empty search
(matching everything), or with `query`, `filters` and `sort`:

- `query`:
  is used to pass in the text of the query to be performed. If `None` is
  passed in (the default), a `MatchAll` query will be used. For example,
  `'python web'`.
- `filters`:
  is a dictionary containing all the facet filters that you wish to
  apply. Use the name of the facet (from the `.facets` attribute) as the
  key and one of the possible values as the value. For example,
  `++{++'tags': 'python'}`.
- `sort`:
  is a tuple or list of fields on which the results should be sorted.
  The format of the individual fields is the same as for those passed to
  `Search.sort()`.
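
For instance, a sketch of instantiating such a subclass with all three
arguments (the `BlogSearch` class is the one defined in the example
below):

[source,python]
----
bs = BlogSearch(
    query='python web',            # text to search for
    filters={'tags': 'python'},    # pre-select a facet value
    sort=('-published_from',),     # newest articles first
)
response = bs.execute()
----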

====== Response

The response returned from the `FacetedSearch` object (by calling
`.execute()`) is a subclass of the standard `Response` class that adds a
property called `facets`, which contains a dictionary with lists of
buckets, each represented by a tuple of the key, the document count, and
a flag indicating whether this value has been filtered on.

===== Example

[source,python]
----
from datetime import date

from elasticsearch.dsl import FacetedSearch, TermsFacet, DateHistogramFacet

class BlogSearch(FacetedSearch):
    doc_types = [Article, ]
    # fields that should be searched
    fields = ['tags', 'title', 'body']

    facets = {
        # use bucket aggregations to define facets
        'tags': TermsFacet(field='tags'),
        'publishing_frequency': DateHistogramFacet(field='published_from', interval='month')
    }

    def search(self):
        # override methods to add custom pieces
        s = super().search()
        return s.filter('range', published_from={'lte': 'now/h'})

bs = BlogSearch('python web', {'publishing_frequency': date(2015, 6, 1)})
response = bs.execute()

# access hits and other attributes as usual
total = response.hits.total
print('total hits', total.relation, total.value)
for hit in response:
    print(hit.meta.score, hit.title)

for (tag, count, selected) in response.facets.tags:
    print(tag, ' (SELECTED):' if selected else ':', count)

for (month, count, selected) in response.facets.publishing_frequency:
    print(month.strftime('%B %Y'), ' (SELECTED):' if selected else ':', count)
----

=== How-To Guides

include::search_dsl.asciidoc[]
include::persistence.asciidoc[]
include::faceted_search.asciidoc[]
include::update_by_query.asciidoc[]
include::asyncio.asciidoc[]