Skip to content

Latest commit

 

History

History
44 lines (38 loc) · 1.81 KB

README.md

File metadata and controls

44 lines (38 loc) · 1.81 KB

elasticstream

Lightweight (no dependencies) Python script/library that streams from Elasticsearch documents to newline-delimited JSON or CSV output

bin/elasticstream usage

usage: elasticstream [-h] [-u URL] [-i INDEX] [-k KEEPALIVE] [-s SIZE]
                     [-d DSL] [-f {jsonl,csv}] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     Elasticsearch REST endpoint URL (default:
                        localhost:9200)
  -i INDEX, --index INDEX
                        Elasticsearch index to stream (wildcards allowed)
                        (default: *)
  -k KEEPALIVE, --keepalive KEEPALIVE
                        Duration to keep scroll alive. Tune according to
                        --size (default: 1m)
  -s SIZE, --size SIZE  Number of hits to scroll at once. Tune according to
                        sharding (default: 10)
  -d DSL, --dsl DSL     Elasticsearch DSL query or @file (e.g., "curl -d"
                        syntax) containing query (default: {"query":
                        {"match_all": {}}})
  -f {ndjson,csv}, --format {ndjson,csv}
                        Output format. Newline-delimited JSON or CSV (default:
                        ndjson)
  -o OUTPUT, --output OUTPUT
                        Output file to stream to. If not stdout, prints
                        progress to stdout (default: <open file '<stdout>',
                        mode 'w' at 0x1082fd150>)

elasticstream.ElasticStream usage

from elasticstream import ElasticStream

url = 'localhost:9200'
index = '*'
dsl = '{"query": {"match_all": {}}}'

stream = ElasticStream(url, index, dsl, scroll_keepalive='1m', scroll_size='10')

for hit in stream:
    print hit