Skip to content
This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Elasticsearch source skeleton #148

Merged
merged 20 commits into from
Oct 13, 2015
Merged

Conversation

mgrauer
Copy link
Contributor

@mgrauer mgrauer commented Oct 9, 2015

@danlamanna

Can you QA this? It should give you a good start.

Also nicks the corner of #134 .

@aashish24
Copy link
Contributor

👏

@mgrauer mgrauer force-pushed the elasticsearch_source_skeleton branch from 10c5930 to 0d3d159 Compare October 10, 2015 00:37
@mgrauer
Copy link
Contributor Author

mgrauer commented Oct 10, 2015

I wanted to do one more thing, I'm done now.

@cpatrick
Copy link
Contributor

Ah memories 😸 .

@danlamanna
Copy link
Member

@mgrauer This looks great.

We agreed Elasticsearch as a source makes sense to have in core Minerva so others can use it. If it makes sense to you I'll implement a very rudimentary backend for minerva_elasticsearch_query and get this merged in.

This will require adding elasticsearch as a Minerva dependency, which is a pretty lightweight wrapper around urllib3.

Following that, I will create analyses specific to the Elasticsearch indices people using Terra will want.

Sound good?

@mgrauer
Copy link
Contributor Author

mgrauer commented Oct 10, 2015

This all sounds good to me. I think it makes the most sense to add that additional endpoint to elasticsearch_source.py.

I think we said that endpoint will do something like

  • create a Job (async local or Romanesco)
  • create a Dataset, tie it into the Job (there was some discussion about adding it to the meta/minerva object in the Job, we'll have to figure out the exact specifics of this later)
  • return the Job from the endpoint
  • in the async Job, bring back the query result, save it as a JSON file in the Dataset
  • create GeoJson from the JSON, as a collection of lat/long points, save the GeoJson file to the Dataset
  • set the minerva metadata similar to this

And let me know if you have any better ideas or questions

@danlamanna
Copy link
Member

My concern with putting that into master is that in general, Elasticsearch indices won't always contain geospatial information. I want to avoid putting anything into core that is specific to a particular domain, Memex HT data in this instance.

So if we create the general endpoint which just processes an ES query and puts the resulting JSON into a dataset, then the user can do whatever they want with it.

Then in Terra, I will create a specific analysis that takes that dataset and alters it using the code you've mentioned.

@mgrauer
Copy link
Contributor Author

mgrauer commented Oct 10, 2015

great, makes sense.

@danlamanna
Copy link
Member

@mgrauer This is ready for review, let me know if the general approach makes sense and feel free to be as nitpicky as you'd like :)

@mgrauer
Copy link
Contributor Author

mgrauer commented Oct 12, 2015

I don't get the purpose of the elasticsearch dataset endpoint, it seems like it isn't used. Can you remove it and the test? Or else explain its use to me.

The way I see it, you have endpoints that do everything you need already, without this other endpoint

  • create an elasticsearch source
  • run a query job on an elasticsearch source
    • create a job
    • create a JSON dataset
    • the job runs the query on the source and uploads the result to the JSON dataset

@danlamanna
Copy link
Member

I'll remove it, I originally added it to follow convention with the WMS endpoints.

}
metadata['minerva'] = minerva_metadata
self.model('item').setMetadata(dataset, metadata)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace

if 'meta' in dataset:
...
self.model('item').setMetadata(dataset, metadata)

with

minerva_metadata = {
    'dataset_type': 'json',  
    'source_id': params['sourceId'],
    'source': 'elasticsearch',
    'elasticsearch_params': elasticsearchParams,
    'base_url': baseURL
}
updateMinervaMetadata(dataset, minerva_metadata)

?

I know this wasn't very clear.

@mgrauer
Copy link
Contributor Author

mgrauer commented Oct 12, 2015

Will you want to convert this to a romanesco job? If so, please add an issue for that.

@danlamanna
Copy link
Member

@mgrauer Updated. This look good to you?

@mgrauer
Copy link
Contributor Author

mgrauer commented Oct 13, 2015

LGTM.

danlamanna added a commit that referenced this pull request Oct 13, 2015
@danlamanna danlamanna merged commit 90cc0d0 into master Oct 13, 2015
@danlamanna danlamanna deleted the elasticsearch_source_skeleton branch October 13, 2015 17:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants