DATO.RSS

A seamless RSS Search Engine experience with a hint of Machine Learning.

SEED

A SQL dump of the database, with over 3 million entries collected over more than a year, can be downloaded at https://davidesantangelo.gumroad.com/l/nkyymb

BETA

Dato.RSS is in beta, and will likely see many changes in the near future.

If you have comments or suggestions, please send them to us through the Issues tab.

Thanks for trying the beta!

Search Engine: Quickly search through the millions of available RSS feeds.

RESTful API: Turns feed data into an awesome API. The API simplifies how you handle RSS, Atom, or JSON feeds. You can add and keep track of your favourite feed data with a simple, fast and clean REST API. All entries are enriched by Machine Learning and Semantic engines.

Example

curl 'https://<yourhost>/api/searches?q=news' | json_pp

{
  "data": [
    {
      "id": "86b0f829-e300-4eef-82e1-82f34d03aff6",
      "type": "entry",
      "attributes": {
        "title": "\"Pandemic, Infodemic\": 2 Cartoon Characters Battling Fake News In Assam",
        "url": "https://www.ndtv.com/india-news/coronavirus-pandemic-infodemic-2-cartoon-characters-battling-fake-news-in-assam-2222333",
        "published_at": 1588448805,
        "body": "An English daily in Assam's Guwahati has been publishing a cartoon strip to tackle the fake news related to the coronavirus pandemic. The two central characters- \"Pandemic and Infodemic\"- are being...<img src=\"http://feeds.feedburner.com/~r/NDTV-LatestNews/~4/lEmH201Q8jI\" height=\"1\" width=\"1\" alt=\"\"/>",
        "text": "An English daily in Assam's Guwahati has been publishing a cartoon strip to tackle the fake news related to the coronavirus pandemic. The two central characters- \"Pandemic and Infodemic\"- are being...",
        "categories": [
          "all india"
        ],
        "sentiment": null,
        "parent": {
          "id": "c97bdae6-b5d1-4966-b9f3-615e29d4d47d",
          "title": "NDTV News  -  Special",
          "url": "feed:http://feeds.feedburner.com/NDTV-LatestNews",
          "rank": 99
        },
        "tags": []
      },
      "relationships": {
        "feed": {
          "data": {
            "id": "c97bdae6-b5d1-4966-b9f3-615e29d4d47d",
            "type": "feed"
          }
        }
      }
    }
  ]
}
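
As a rough illustration of consuming this response from Ruby, the sketch below parses a trimmed copy of the payload above with the standard `json` library and pulls out each entry's title, publication time, and parent feed rank. The `summarize_entries` helper and the `SAMPLE_RESPONSE` constant are hypothetical names, not part of Dato.RSS, and the code assumes only the fields shown in the example.

```ruby
require "json"

# Trimmed copy of the example response above (only the fields used here).
# The quoted heredoc keeps the JSON escape sequences untouched.
SAMPLE_RESPONSE = <<~'JSON'
  {
    "data": [
      {
        "id": "86b0f829-e300-4eef-82e1-82f34d03aff6",
        "type": "entry",
        "attributes": {
          "title": "\"Pandemic, Infodemic\": 2 Cartoon Characters Battling Fake News In Assam",
          "published_at": 1588448805,
          "parent": { "title": "NDTV News  -  Special", "rank": 99 }
        }
      }
    ]
  }
JSON

# Hypothetical helper: returns one hash per entry with its title,
# publication time (converted from a Unix timestamp to UTC) and the
# rank of the feed it belongs to.
def summarize_entries(json_text)
  JSON.parse(json_text).fetch("data").map do |entry|
    attrs = entry.fetch("attributes")
    {
      title: attrs["title"],
      published_at: Time.at(attrs["published_at"]).utc,
      feed_rank: attrs.dig("parent", "rank")
    }
  end
end

summarize_entries(SAMPLE_RESPONSE).each do |e|
  puts "#{e[:published_at].strftime('%Y-%m-%d')} [rank #{e[:feed_rank]}] #{e[:title]}"
end
```

`published_at` is treated here as seconds since the Unix epoch, which matches the integer value in the example.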

Search

Search is implemented with PostgreSQL's built-in full-text search.

I used the pg_search gem, which can be used in two ways:

Multi Search: Search across multiple models and return a single array of results. Imagine having three models: Product, Brand, and Review. Using Multi Search we could search across all of them at the same time, seeing a single set of search results. This would be perfect for adding federated search functionality to your app.

Search Scope: Search within a single model, but with greater flexibility.

    execute <<-SQL
      ALTER TABLE entries
      ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('simple', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('simple', coalesce(body,'')), 'B') ||
        setweight(to_tsvector('simple', coalesce(url,'')), 'C')
      ) STORED;
    SQL

Feed Rank

Feed ranking is provided by openrank, a free root-domain authority metric based on the Common Search PageRank dataset. The value is normalized with:

((Math.log10(domain_rank) / Math.log10(100)) * 100).round
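
To make the normalization concrete, here is a small sketch that wraps the expression above in a method. The name `normalize_rank` and the guard for missing or non-positive input are assumptions added for illustration; only the formula itself comes from this README. Because `Math.log10(100)` is 2, the expression places the raw score on a logarithmic scale where 1 maps to 0, 10 maps to 50, and 100 maps to 100.

```ruby
# Hypothetical wrapper around the normalization expression above.
# domain_rank is assumed to be the raw openrank score.
def normalize_rank(domain_rank)
  # Assumption: treat missing or non-positive scores as rank 0,
  # because log10 is not usable there (-Infinity or a domain error).
  return 0 if domain_rank.nil? || domain_rank <= 0

  ((Math.log10(domain_rank) / Math.log10(100)) * 100).round
end

[1, 10, 50, 100].each do |score|
  puts "#{score} -> #{normalize_rank(score)}"
end
```

The logarithmic scale compresses differences between high scores: a raw score of 50 already normalizes to 85.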

Machine Learning

Machine learning is provided by the Dandelion API, a semantic text analytics service that turns text into actionable data. It extracts meaning from unstructured text and puts it in context through a simple API.

Add Feed

You can add as many feeds as you want for the automatic crawler to handle.

https://<yourhost>/feeds/new

Wiki

All API documentation is in the Wiki section. Feel free to make it better, of course.

https://github.com/davidesantangelo/dato.rss/wiki

Some features, such as adding a new feed, require a token with write permission. Currently only I can enable it, so contact me if you need one.

Built With

  • Ruby on Rails — Our back end API is a Rails app. It responds to requests RESTfully in JSON.
  • PostgreSQL — Our main data store is in Postgres.
  • Redis — We use Redis as a cache and for transient data.
  • Feedjira — Feedjira is a Ruby library designed to parse feeds.
  • Dandelion — Semantic Text Analytics as a service.
  • Sidekiq — Simple, efficient background processing for Ruby.
  • JSON:API Serialization — A fast JSON:API serializer for Ruby Objects.
  • PgSearch — PgSearch builds named scopes that take advantage of PostgreSQL's full text search.
  • TailwindCSS — A utility-first CSS framework for rapidly building custom user interfaces.

Plus lots of Ruby Gems, a complete list of which is at /main/Gemfile.

Sponsor me

If you want to help cover the server costs that keep dato.rss free and running, consider sponsoring me. Thanks!

GitHub sponsor

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/davidesantangelo/dato.rss. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

This project is available as open source under the terms of the MIT License.