
Demo scripts used for the BANO talk

Come and learn how you can enrich your existing data with normalized postal addresses and geolocation points, thanks to open data and the BANO project.

Most of the time, the postal addresses of our customers or users are poorly formatted or incomplete in our information systems. This can become a nightmare if, for example, you work in a call center and need to find a customer by their address. Imagine also how easily a sales team could put on a map where customers are located and where a new shop could be opened...

Let's take a simple example:

{
  "name": "Joe Smith",
  "address": {
    "number": "23",
    "street_name": "r verdiere",
    "city": "rochelle",
    "country": "France"
  }
}
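After enrichment, the same document could look like this (an illustrative sketch; the field layout follows the CSV fields listed in the side notes below):

{
  "name": "Joe Smith",
  "address": {
    "number": "23",
    "street_name": "Rue Verdière",
    "zipcode": "17000",
    "city": "La Rochelle",
    "country": "France"
  },
  "location": {
    "lat": 46.15735,
    "lon": -1.1551
  }
}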

Or the opposite: I have the coordinates, but I can't tell which postal address corresponds to them:

{
  "name": "Joe Smith",
  "location": {
    "lat": 46.15735,
    "lon": -1.1551
  }
}
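One way to answer that reverse question is a geo_distance sort against the BANO data: ask Elasticsearch for the closest indexed address. A sketch, assuming the bano-* indices map location as a geo_point:

GET bano-*/_search
{
  "size": 1,
  "sort": [
    {
      "_geo_distance": {
        "location": { "lat": 46.15735, "lon": -1.1551 },
        "order": "asc",
        "unit": "m"
      }
    }
  ]
}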

In this live coding session, I will show you how to solve all those problems using the Elastic Stack.

Setup

Run on cloud (recommended)

This configuration ingests the whole BANO dataset into an Elastic Cloud instance. You need to create a local .cloud file which contains:

CLOUD_ID=the_cloud_id_you_can_read_from_cloud_console
CLOUD_PASSWORD=the_generated_elastic_password
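If you want to check the credentials before ingesting anything, note that the Cloud ID is a deployment name followed by a base64-encoded host$es_uuid$kibana_uuid string, from which the Elasticsearch endpoint can be derived. A hypothetical sanity check, not part of the repository's scripts:

source .cloud
# The part after ":" decodes to host$es_uuid$kibana_uuid (use base64 -D on macOS).
DECODED=$(echo "${CLOUD_ID#*:}" | base64 -d)
ES_HOST="https://$(echo "$DECODED" | cut -d'$' -f2).$(echo "$DECODED" | cut -d'$' -f1)"
curl -s -u "elastic:${CLOUD_PASSWORD}" "$ES_HOST"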

Run:

./setup.sh

Run Locally

Run Elastic Stack:

docker-compose down -v
docker-compose up
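The compose file ships with the repository; a minimal sketch of what such a file typically defines (image versions and ports are assumptions):

version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:7.17.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"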

And run:

./setup.sh

Inject the whole dataset

Run:

./inject-all.sh

Open Kibana.

Go to the Dev Tools application and check that data is coming in with:

GET bano-*/_count

Go to the Maps application and check that the data is showing up there as well.

Side notes

The CSV fields we want to extract are listed below (a sample BANO line follows the list):

  • _id
  • address.number
  • address.street_name
  • address.zipcode
  • address.city
  • source
  • location.lat
  • location.lon
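For reference, a raw BANO CSV line maps onto those fields in order (id, number, street name, zipcode, city, source, lat, lon). An illustrative line, with made-up values matching the example above (XXXX stands for the street code inside the id):

17300_XXXX_00023,23,Rue Verdière,17000,La Rochelle,OSM,46.15735,-1.1551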

The fields we can remove, added automatically during ingestion, are listed below (a sketch of how to drop them follows the list):

  • @timestamp
  • input
  • ecs
  • host
  • agent
  • message
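One way to drop them at ingest time is a remove processor in an ingest pipeline (a sketch; the pipeline name is hypothetical and the repository may do this differently):

PUT _ingest/pipeline/bano-cleanup
{
  "processors": [
    {
      "remove": {
        "field": ["@timestamp", "input", "ecs", "host", "agent", "message"],
        "ignore_missing": true
      }
    }
  ]
}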

Logstash pipeline

The pipeline reads data from Elasticsearch (the person index), enriches it with the BANO dataset stored in Elasticsearch (the bano-* indices), and writes the corrected data back to Elasticsearch (the person-new index).
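A minimal sketch of such a Logstash configuration (the hosts, the match query, and the copied fields are assumptions, not the repository's actual file):

input {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "person"
  }
}
filter {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "bano-*"
    # Hypothetical match on the raw address fields.
    query => "address.city:%{[address][city]} AND address.street_name:%{[address][street_name]}"
    # Copy the normalized address and the geo point onto the event.
    fields => {
      "address" => "address"
      "location" => "location"
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "person-new"
  }
}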

The following dashboard shows the corrected dataset.

Dashboard
