Awesome Streaming Data Sources

A curated list of public data feeds (changing data sources). Emphasis is given to real-time sources, preferably through modern streaming protocols, but any quality data source is welcome.

[NOTE: Some may disagree with calling polled feeds (HTTP short-polling, for example) "streaming". My intent is to collect sources that should be processed in a streaming fashion, close to real time, possibly without access to a historical dataset. As per Wikipedia, "Streaming data is data that is continuously generated by different sources. Such data should be processed incrementally using stream processing techniques without having access to all of the data." (https://en.wikipedia.org/wiki/Streaming_data)]
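As a rough illustration of "processed incrementally ... without having access to all of the data", the sketch below keeps a running mean that is updated one value at a time. The function name and the sample readings are hypothetical; any feed in this list could supply the values.

```python
from typing import Iterable, Iterator


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one result per input."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: earlier values are never stored.
        mean += (value - mean) / count
        yield mean


if __name__ == "__main__":
    # Hypothetical readings arriving one at a time.
    for current in running_mean([10.0, 12.0, 14.0]):
        print(current)
```

The generator only holds a counter and the current mean, so memory use stays constant however long the feed runs.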

As most of the feeds are currently available exclusively via HTTP polling, I plan to ingest those sources into a streaming platform like Ably Hub or a data integrator (as soon as I have time) and also provide them through a streaming protocol. On top of that, I'm employing "git scraping" (Flat Data) to keep a historical view of relevant feeds. You can find more information in scraped-streaming-data-sources.
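One way to turn a polled feed into something stream-like is to emit a record only when it is new or has changed since the previous poll. The following is a minimal sketch of that idea, not the pipeline used by this project; the `Snapshot` type and the sample polls are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

Snapshot = Dict[str, str]  # record key -> record payload (hypothetical shape)


def changes(snapshots: Iterable[Snapshot]) -> Iterator[Tuple[str, str]]:
    """Yield (key, payload) only when a key is new or its payload changed."""
    last_seen: Snapshot = {}
    for snapshot in snapshots:
        for key, payload in snapshot.items():
            if last_seen.get(key) != payload:
                last_seen[key] = payload
                yield key, payload


if __name__ == "__main__":
    # Three hypothetical polls of the same feed.
    polls = [
        {"a": "1", "b": "2"},
        {"a": "1", "b": "3"},
        {"a": "1", "b": "3", "c": "4"},
    ]
    for event in changes(polls):
        print(event)
```

In practice each snapshot would come from an HTTP request on a timer, and the emitted events could be published to whichever streaming platform is in use.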

Would you like to help? Please open an issue or discussion thread.

Weather

NOAA National Data Buoy Center - Latest Observations

[PROTOCOL: HTTP; PAYLOAD: TSV; MAX. RESOLUTION: 5 min]

💾 Git Scraped Data

⚖️ Public Domain License

Measurement Descriptions and Units

"This file has the most recent observation (provided that the observation is less than two hours old) from all stations hosted on the NDBC website. Since this file has multiple stations, it also contains each station's position information (latitude and longitude). The file is relatively small, less than 100KB, and is updated approximately every 5 minutes, so it would be a good data source if you are interested in meteorological observations from multiple stations." More Information
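A file like this can be parsed into one record per station with the standard library alone. The sketch below assumes a `#`-prefixed header line naming the columns, an optional second `#` line with units, whitespace- or tab-separated fields, and `MM` as the marker for a missing value. The column names, station IDs, and values in `SAMPLE` are made up; check the Measurement Descriptions and Units page for the real layout.

```python
from typing import Dict, List, Optional


def parse_latest_obs(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse a header-plus-rows text file into one dict per station."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Assumption: the first line is a '#'-prefixed header naming the columns.
    header = lines[0].lstrip("#").split()
    records = []
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # Assumption: further '#' lines carry units, not data.
        fields = line.split()
        # Assumption: "MM" marks a missing measurement.
        record = {name: (None if value == "MM" else value)
                  for name, value in zip(header, fields)}
        records.append(record)
    return records


# Made-up sample in the assumed layout (not real observations).
SAMPLE = """\
#STN LAT LON ATMP
#text deg deg degC
AAAA1 10.00 -20.00 25.1
BBBB2 30.50 -40.25 MM
"""

if __name__ == "__main__":
    for row in parse_latest_obs(SAMPLE):
        print(row)
```

Values are kept as strings so the caller can decide how to convert each column.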
NOAA Weather Wire Service - Open Interface 🔑

[PROTOCOL: XMPP; PAYLOAD: XML; MAX. RESOLUTION: ???]

⚖️ Public Domain License

"NWWS is the fastest method to receive text alerts, warnings, advisories, and weather information from the National Weather Service (NWS) within 10 seconds of when the texts are issued." More Information
Netatmo Weather API

[PROTOCOL: HTTP (API); PAYLOAD: JSON; MAX. RESOLUTION: 5/10 min]

⚖️ Terms of Use

Publicly available data from Netatmo Weather stations.

Finance

Binance Websocket Live Market Streams

[PROTOCOL: WebSocket; PAYLOAD: JSON; MAX. RESOLUTION: 100ms]

⚖️ Terms of Use

This stream provides real-time updates on the latest price of all symbols on Binance.
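To keep a "latest price per symbol" view from such a stream, each incoming message can be folded into a dictionary. This is a sketch under assumptions: it treats every message as a JSON array of ticker objects in which `"s"` holds the symbol and `"c"` the latest price, and the sample messages are invented. Consult the Binance stream documentation for the exact payload, and use a WebSocket client of your choice to receive the messages.

```python
import json
from typing import Dict


def update_prices(message: str, prices: Dict[str, float]) -> Dict[str, float]:
    """Fold one stream message into the symbol -> latest price mapping."""
    for ticker in json.loads(message):
        # Assumption: "s" is the symbol and "c" the latest price (as a string).
        prices[ticker["s"]] = float(ticker["c"])
    return prices


if __name__ == "__main__":
    prices: Dict[str, float] = {}
    # Invented messages in the assumed shape (not real market data).
    update_prices('[{"s": "BTCUSDT", "c": "101.5"}, {"s": "ETHUSDT", "c": "7.25"}]', prices)
    update_prices('[{"s": "BTCUSDT", "c": "102.0"}]', prices)
    print(prices)
```

Only symbols present in a message are touched, so the mapping always reflects the most recent update seen for each symbol.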

Industrial

The Open Industrial Data Project 🔑

[PROTOCOL: HTTP(API/SDK/Spark); PAYLOAD: JSON; MAX. RESOLUTION: ~20s]

NOTE: Data is delayed by 7 days or more.

The data originates from a single compressor on Aker BP’s Valhall oil platform in the North Sea. Aker BP selected the first stage compressor on the Valhall because it is a subsystem with clearly defined boundaries, rich in time series and maintenance data.

The data set available in the Cognite Data Platform includes time series data, maintenance history, and Process & Instrumentation Diagrams (P&IDs) for Valhall’s first stage compressor and associated process equipment: first stage suction cooler, first stage suction scrubber, first stage compressor and first stage discharge coolers. In addition, data from the compressor’s lubrication system, dry gas seal system and condition monitoring system (temperature and vibration) will be available.

Energy

Balancing Mechanism Reporting Service (BMRS) 🔑

[PROTOCOL: HTTP(API)/OpenWire/STOMP/AMQP; PAYLOAD: XML/CSV; MAX. RESOLUTION: ???]

This is the primary channel for providing operational data relating to the Great Britain Electricity Balancing and Settlement arrangements. It is used extensively by market participants to help make trading decisions and understand market dynamics, and acts as a prompt reporting platform as well as a means of accessing historic data.
PV_Live API

[PROTOCOL: HTTP(API); PAYLOAD: JSON/CSV; MAX. RESOLUTION: 5 min]

The PV_Live web API provides access to near-real-time and historical estimates of PV generation on the Great Britain transmission network.

Transport and Travel

Transport for London Open Data Unified API 🔑

[PROTOCOL: WebSocket(SignalR)/HTTP(API); PAYLOAD: JSON/XML; MAX. RESOLUTION: 5s]

All public TfL data (or "open data") is freely released for developers to use in their software and services. You can learn more at the TfL Digital Blog.
Open Rail Data (Darwin) 🔑

[PROTOCOL: HTTP(API)/OpenWire/STOMP; PAYLOAD: JSON/XML/CIF; MAX. RESOLUTION: ???]

National Rail Enquiries (NRE) built their Darwin system in the early to mid 2000s to improve the level of accuracy of information displayed to passengers. Darwin should be considered the single source of truth for passenger information, and it feeds information to nearly all customer information systems at stations.

Darwin uses its own internal algorithms to forecast arrival and departure times along a train's route. It can also record additional and skipped stops before they happen, as well as report known delays, for example, if a train will leave its origin late due to awaiting train crew.

Check the Open Rail Data Wiki for more information.
Network Rail feeds (TRUST) 🔑

[PROTOCOL: HTTP(API)/OpenWire/STOMP; PAYLOAD: JSON/XML/CIF; MAX. RESOLUTION: 100Hz (5s batches)]

TRUST's primary purpose is to act as a historical record of train movements, allowing comparison between scheduled and actual times, as well as to record cancellations. When a delay - a change in lateness between two TRUST reporting points - of over a certain threshold occurs, a separate system called TRUST DA (Delay Attribution) requires that delay be explained and attributed. This data is not available in real-time at present, nor is it updated in real-time, and a delay may be reattributed many times until it is agreed.

Check the Open Rail Data Wiki for more information.
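The TRUST notion of a delay, a change in lateness between two reporting points that exceeds a threshold, can be sketched as follows. The `Report` tuple, the reporting point names, the times, and the threshold value are all hypothetical; the actual threshold and message fields are defined by the feed, not by this example.

```python
from typing import List, Tuple

# (reporting point, scheduled minute, actual minute); a hypothetical shape.
Report = Tuple[str, int, int]


def delays_over_threshold(reports: List[Report],
                          threshold: int) -> List[Tuple[str, str, int]]:
    """Return (from_point, to_point, change_in_lateness) above the threshold."""
    found = []
    for (prev_point, prev_sched, prev_actual), (point, sched, actual) in zip(
            reports, reports[1:]):
        # Lateness at a point is actual minus scheduled time; a delay is
        # how much that lateness grew between consecutive points.
        change = (actual - sched) - (prev_actual - prev_sched)
        if change > threshold:
            found.append((prev_point, point, change))
    return found


if __name__ == "__main__":
    # Lateness is 1, 2, then 7 minutes at points A, B, and C.
    reports = [("A", 0, 1), ("B", 10, 12), ("C", 20, 27)]
    print(delays_over_threshold(reports, threshold=3))
```

With these made-up reports, only the B to C leg is flagged, since lateness grows by 5 minutes there and by 1 minute on the A to B leg.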

Others

PubNub Sample Real-time Data Streams 🔑

Twitter Firehose sample, Hacker News articles, Wikipedia changes, and some simulated streams.
Ably Hub - Open Data Streaming Program 🔑

[PROTOCOL: HTTP/SSE/WebSocket/MQTT/Others; PAYLOAD: JSON/CSV; MAX. RESOLUTION: 5 min]

Ably supports open data sources in the Hub by offering the API Streamer for free to open data producers and consumers. Sources include BBC News, BART, Coindesk, MBTA, OpenWeatherMap, and more.

Related Lists

Awesome Public Datasets - A list of high quality public, mainly historical, datasets.
Awesome Streaming - A curated list of awesome streaming (stream processing) frameworks, applications, readings and other resources.
Public APIs - A collective list of free APIs for use in software and web development.

Please Contribute

Contributions are welcome! Please, read the contribution guidelines first.
