apgiorgi/awesome-streaming-data-sources

Awesome Streaming Data Sources

A curated list of public data feeds (changing data sources). Emphasis is given to real-time sources, preferably through modern streaming protocols, but any quality data source is welcome.

[NOTE: Opinions may differ on whether polled feeds (HTTP short-polling, for example) count as "streaming". My intent is to collect sources that should be processed in a streaming fashion, close to real time, possibly without access to a historical dataset. As per Wikipedia, "Streaming data is data that is continuously generated by different sources. Such data should be processed incrementally using stream processing techniques without having access to all of the data." (https://en.wikipedia.org/wiki/Streaming_data)]

Because most of the feeds are currently available only via HTTP polling, I plan to ingest those sources into a streaming platform like Ably Hub or a data integrator (as soon as I have time) and also provide them through a streaming protocol. In addition, I use "git scraping" (Flat Data) to keep a historical view of relevant feeds. You can find more information on scraped-streaming-data-sources.
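To illustrate how a polled feed can be consumed in a streaming fashion, here is a minimal sketch that turns repeated HTTP-style polls into a stream of changes. It is not tied to any feed in this list: `poll_changes`, `fetch`, `interval_s`, `max_polls`, and `sleep` are hypothetical names introduced for the example, and it assumes each poll returns a JSON-serializable snapshot.

```python
import hashlib
import json
import time
from typing import Any, Callable, Iterator, Optional


def poll_changes(
    fetch: Callable[[], Any],
    interval_s: float = 60.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield a snapshot only when it differs from the previous poll.

    `fetch` is any zero-argument callable returning a JSON-serializable
    snapshot (hypothetical; in practice it would wrap an HTTP request).
    """
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        snapshot = fetch()
        # Hash a canonical JSON form so only the digest of the previous
        # snapshot is kept in memory, never the history.
        digest = hashlib.sha256(
            json.dumps(snapshot, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if digest != last_digest:
            last_digest = digest
            yield snapshot
        polls += 1
        # Wait between polls, but not after the final one.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)


if __name__ == "__main__":
    # Simulated feed: five polls, three distinct values.
    fake_feed = iter([{"v": 1}, {"v": 1}, {"v": 2}, {"v": 2}, {"v": 3}])
    for change in poll_changes(lambda: next(fake_feed), interval_s=0, max_polls=5):
        print(change)
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of any HTTP client and makes it easy to exercise with simulated data. The poll interval should respect each provider's stated resolution and rate limits.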

Would you like to help? Please open an issue or discussion thread.

Contents

🔑 - Registration || API Key required

Weather

Finance

Industrial

  • The Open Industrial Data Project 🔑

    [PROTOCOL: HTTP(API/SDK/Spark); PAYLOAD: JSON; MAX. RESOLUTION: ~20s]

    NOTE: Data is delayed by 7 days or more.

    The data originates from a single compressor on Aker BP’s Valhall oil platform in the North Sea. Aker BP selected the first stage compressor on the Valhall because it is a subsystem with clearly defined boundaries, rich in time series and maintenance data.

    The data set available in the Cognite Data Platform includes time series data, maintenance history, and Process & Instrumentation Diagrams (P&IDs) for Valhall’s first stage compressor and associated process equipment: first stage suction cooler, first stage suction scrubber, first stage compressor and first stage discharge coolers. In addition, data from the compressor’s lubrication system, dry gas seal system and condition monitoring system (temperature and vibration) will be available.

Energy

  • Balancing Mechanism Reporting Service (BMRS) 🔑

    [PROTOCOL: HTTP(API)/OpenWire/STOMP/AMQP; PAYLOAD: XML/CSV; MAX. RESOLUTION: ???]

    This is the primary channel for providing operational data relating to the Great Britain Electricity Balancing and Settlement arrangements. It is used extensively by market participants to help make trading decisions and understand market dynamics, and it acts as a prompt reporting platform as well as a means of accessing historic data.

  • PV_Live API

    [PROTOCOL: HTTP(API); PAYLOAD: JSON/CSV; MAX. RESOLUTION: 5 min]

    The PV_Live web API provides access to near-real-time and historical estimates of PV generation on the Great Britain transmission network.

Transport and Travel

  • Transport for London Open Data Unified API 🔑

    [PROTOCOL: WebSocket(SignalR)/HTTP(API); PAYLOAD: JSON/XML; MAX. RESOLUTION: 5s]

    All public TfL data (or "open data") is freely released for developers to use in their software and services. You can learn more at the TfL Digital Blog.

  • Open Rail Data (Darwin) 🔑

    [PROTOCOL: HTTP(API)/OpenWire/STOMP; PAYLOAD: JSON/XML/CIF; MAX. RESOLUTION: ???]

    National Rail Enquiries (NRE) built their Darwin system in the early to mid 2000s to improve the level of accuracy of information displayed to passengers. Darwin should be considered the single source of truth for passenger information, and it feeds information to nearly all customer information systems at stations.

    Darwin uses its own internal algorithms to forecast arrival and departure times along a train's route. It can also record additional and skipped stops before they happen, as well as report known delays, for example, if a train will leave its origin late due to awaiting train crew.

    Check the Open Rail Data Wiki for more information.

  • Network Rail feeds (TRUST) 🔑

    [PROTOCOL: HTTP(API)/OpenWire/STOMP; PAYLOAD: JSON/XML/CIF; MAX. RESOLUTION: 100Hz (5s batches)]

    TRUST's primary purpose is to act as a historical record of train movements, allowing comparison between scheduled and actual times, as well as to record cancellations. When a delay (a change in lateness between two TRUST reporting points) exceeds a certain threshold, a separate system called TRUST DA (Delay Attribution) requires that the delay be explained and attributed. This data is not available in real time at present, nor is it updated in real time, and a delay may be reattributed many times until it is agreed.

    Check the Open Rail Data Wiki for more information.
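The TRUST definition of a delay, a change in lateness between two reporting points, can be sketched as follows. This is an illustration only and not the TRUST or TRUST DA implementation: the `Report` record, its field names, and the 3-minute threshold are assumptions made for the example, since the actual threshold is not stated here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Report:
    """Hypothetical movement report at one reporting point (times in minutes)."""
    location: str
    scheduled_min: float
    actual_min: float

    @property
    def lateness(self) -> float:
        # Positive when the train is behind schedule at this point.
        return self.actual_min - self.scheduled_min


def delays_over_threshold(
    reports: List[Report], threshold_min: float = 3.0
) -> List[Tuple[str, str, float]]:
    """Return (from, to, change) for each pair of consecutive reporting
    points where lateness grew by more than `threshold_min`.

    The default threshold is an assumed value for illustration.
    """
    events = []
    for prev, cur in zip(reports, reports[1:]):
        change = cur.lateness - prev.lateness
        if change > threshold_min:
            events.append((prev.location, cur.location, change))
    return events


if __name__ == "__main__":
    journey = [
        Report("A", scheduled_min=0, actual_min=1),    # 1 minute late
        Report("B", scheduled_min=10, actual_min=16),  # 6 minutes late
        Report("C", scheduled_min=20, actual_min=25),  # 5 minutes late
    ]
    print(delays_over_threshold(journey))
```

Note that the train is still late at C, but no new delay is recorded there because lateness did not increase between B and C.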

Others

  • PubNub Sample Real-time Data Streams 🔑

    Twitter Firehose sample, Hacker News articles, Wikipedia changes, and some simulated streams.

  • Ably Hub - Open Data Streaming Program 🔑

    [PROTOCOL: HTTP/SSE/WebSocket/MQTT/Others; PAYLOAD: JSON/CSV; MAX. RESOLUTION: 5 min]

    Ably supports open data sources in the Hub, offering the API Streamer for free to open data producers and consumers, like BBC News, BART, Coindesk, MBTA, OpenWeatherMap, and more.

Related Lists

  • Awesome Public Datasets - A list of high quality public, mainly historical, datasets.
  • Awesome Streaming - A curated list of awesome streaming (stream processing) frameworks, applications, readings and other resources.
  • Public APIs - A collective list of free APIs for use in software and web development.

Please Contribute

Contributions are welcome! Please read the contribution guidelines first.
