Page Scraping Enhancer #26

Open
valentinedwv opened this issue Oct 31, 2017 · 1 comment

@valentinedwv

Grab the URL from the form page, then scrape that page for text to submit to the pipeline.

@valentinedwv

Creating a service using sumy (https://github.com/miso-belica/sumy) and spyne.

Thoughts:

  • two endpoints: get_summary_url and get_summary

get_summary_url (url=,method='LexRank',sentences_count=10,keywords=false)

  • uses sumy to get a summary from the url
  • calls scigraph on the aggregated text to get keywords

get_summary (text=,method='LexRank',sentences_count=10,keywords=false,isHtml=false)

  • uses sumy to get a summary from the string
  • calls scigraph on the aggregated text to get keywords
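
A rough sketch of how the two endpoints above might wrap sumy, offered as one possible shape rather than the planned implementation. Only 'LexRank' is named in the proposal; the other method names, the `LANGUAGE` constant, and the helpers `resolve_method` and `_summarize` are assumptions made for illustration. The scigraph keyword step is left out, and sumy's `Tokenizer` needs the NLTK punkt data to be installed.

```python
import importlib

# Only 'LexRank' appears in the proposal; the other names are assumed extras
# that map onto summarizer classes shipped with sumy.
SUMMARIZERS = {
    "lexrank": ("sumy.summarizers.lex_rank", "LexRankSummarizer"),
    "lsa": ("sumy.summarizers.lsa", "LsaSummarizer"),
    "luhn": ("sumy.summarizers.luhn", "LuhnSummarizer"),
    "textrank": ("sumy.summarizers.text_rank", "TextRankSummarizer"),
}
LANGUAGE = "english"  # assumption: the proposal does not name a language


def resolve_method(method="LexRank"):
    """Return the (module, class) pair for a method name, case-insensitively."""
    try:
        return SUMMARIZERS[method.lower()]
    except KeyError:
        raise ValueError("unknown summarization method: %r" % method)


def _summarize(parser, method, sentences_count):
    # sumy is imported lazily so this module loads even where it is missing.
    module_name, class_name = resolve_method(method)
    summarizer = getattr(importlib.import_module(module_name), class_name)()
    return [str(s) for s in summarizer(parser.document, sentences_count)]


def get_summary_url(url, method="LexRank", sentences_count=10):
    """Fetch the page at `url` and return its summary sentences."""
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.html import HtmlParser
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    return _summarize(parser, method, sentences_count)


def get_summary(text, method="LexRank", sentences_count=10, isHtml=False):
    """Summarize a string, parsing it as HTML when isHtml is true."""
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.html import HtmlParser
    from sumy.parsers.plaintext import PlaintextParser
    tokenizer = Tokenizer(LANGUAGE)
    if isHtml:
        parser = HtmlParser.from_string(text, None, tokenizer)
    else:
        parser = PlaintextParser.from_string(text, tokenizer)
    return _summarize(parser, method, sentences_count)
```

Keeping the method lookup separate from the sumy calls means an unknown `method` value can be rejected before any page is fetched.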

returns:

{
  summary: [string, string],
  keywords: [{string, reference}, {string, reference}]
}
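
The return shape above could be modelled roughly as follows. This is a sketch under assumptions: the field names `string` and `reference` are taken literally from the outline, and the class names `Keyword` and `SummaryResponse` are hypothetical.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class Keyword:
    string: str     # the matched term
    reference: str  # assumed: an identifier or IRI returned by scigraph


@dataclass
class SummaryResponse:
    summary: List[str]
    # Empty when the caller leaves keywords=false.
    keywords: List[Keyword] = field(default_factory=list)

    def to_json(self):
        # asdict() recurses into the nested Keyword dataclasses.
        return json.dumps(asdict(self))
```

For example, `SummaryResponse(["First sentence."]).to_json()` yields an object with a one-item `summary` list and an empty `keywords` list.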

Deep dive:
Attempts to identify URLs on the page and retrieves them if appropriate:

  • known service (OGC, Swagger description)
  • known format (CSV, TSV, Excel)
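
One way the deep-dive step might pick out candidate links, using only the standard library. The detection rules here are assumptions chosen for illustration: file extensions for the known formats, a `request=GetCapabilities` query parameter for OGC services, and common description file names for Swagger. The names `LinkCollector`, `classify`, and `find_candidates` are hypothetical.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

KNOWN_FORMATS = {".csv": "CSV", ".tsv": "TSV", ".xls": "Excel", ".xlsx": "Excel"}
SWAGGER_NAMES = ("swagger.json", "swagger.yaml", "openapi.json", "openapi.yaml")


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href> tags of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def classify(url):
    """Return a label for a recognised link, or None (heuristics are assumed)."""
    parts = urlparse(url)
    ext = posixpath.splitext(parts.path)[1].lower()
    if ext in KNOWN_FORMATS:
        return KNOWN_FORMATS[ext]
    # Compare query keys and values case-insensitively.
    query = {k.lower(): [v.lower() for v in vs]
             for k, vs in parse_qs(parts.query).items()}
    if "getcapabilities" in query.get("request", []):
        return "OGC"
    if posixpath.basename(parts.path).lower() in SWAGGER_NAMES:
        return "Swagger"
    return None


def find_candidates(html, base_url):
    """Return (url, label) pairs for links worth retrieving."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    labelled = ((url, classify(url)) for url in collector.links)
    return [(url, label) for url, label in labelled if label]
```

Links that match none of the rules are dropped, so ordinary navigation links on the page are never fetched.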

@valentinedwv self-assigned this Nov 2, 2017