Data Gathering

Calvin J Lin edited this page Jul 7, 2020 · 1 revision

Currently, all data for this project comes from the TCEQ and PurpleAir websites, and all of it is publicly available.

PurpleAir

Data for the PurpleAir sensors used in this project can be obtained through the web interface on their website. However, because that process is time-consuming and laborious, we decided to use their ThingSpeak API, as described in this document, and download the data with a Python script. The steps we took to create the script are described below.

  1. Before you can download any data through ThingSpeak, you must determine the channel ID and API key for each sensor channel that you want to download data from. If you have only a few sensors, you can go to the PurpleAir map and click the JSON link under the "Get this widget" option for each sensor. The IDs and keys for UT's sensors can be found at the URL below:
https://www.purpleair.com/json?exclude=true&key=null&show=null&nwlat=30.291268505204116&selat=30.272526603783206&nwlng=-97.7717631299262&selng=-97.72423886855452

Note that this JSON file also contains location data and other metadata that ThingSpeak does not provide.

  2. Make HTTP GET requests to the ThingSpeak API as described here. You can download up to 8000 entries per request, which amounts to slightly over 11 days' worth of data at PurpleAir's highest frequency of one reading every 2 minutes (8000 × 2 minutes ≈ 11.1 days). If you are obtaining more than 11 days of data, you will have to make multiple GET requests and merge the data in your software later.

  3. Merge the data in Pandas.

  4. Replace the generic ThingSpeak headers with the correct ones used by PurpleAir.

  5. Generate a filename using the metadata contained in the JSON file referenced earlier.
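
The request-splitting step can be sketched as follows. This is a minimal illustration, not the project's actual script: it only plans the requests and makes no network calls. The URL shape (`/channels/<id>/feeds.csv` with `api_key`, `start`, and `end` query parameters) follows the ThingSpeak read-feed API as we understand it, and the channel ID and key in the usage example are hypothetical placeholders. The 11-day window is an assumption chosen to stay under the 8000-entry limit.

```python
from datetime import datetime, timedelta
from urllib.parse import quote, urlencode

MAX_ENTRIES = 8000                       # per-request limit described above
SAMPLE_INTERVAL = timedelta(minutes=2)   # PurpleAir's highest reporting frequency
# Assumption: 11 days per request, just under the ~11.1-day theoretical
# maximum (8000 entries * 2 minutes).
WINDOW = timedelta(days=11)


def request_windows(start, end, window=WINDOW):
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        yield cursor, upper
        cursor = upper


def feed_url(channel_id, api_key, start, end):
    """Build a CSV feed URL for one window (assumed ThingSpeak URL shape)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    query = urlencode(
        {
            "api_key": api_key,
            "start": start.strftime(fmt),
            "end": end.strftime(fmt),
        },
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"https://api.thingspeak.com/channels/{channel_id}/feeds.csv?{query}"


if __name__ == "__main__":
    # Hypothetical channel ID and read key, for illustration only.
    for lo, hi in request_windows(datetime(2020, 1, 1), datetime(2020, 1, 31)):
        print(feed_url("123456", "EXAMPLEKEY", lo, hi))
```

Each URL would then be fetched with a GET request and the resulting CSV chunks merged. Adjacent windows share a boundary timestamp, so it is worth dropping duplicate rows after merging.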
