
As a BrennerLEC technical expert, I would like the Open Data Hub to manage "physical" and "virtual" station locations associated with the low-cost air quality sensors in use #287

Open
rcavaliere opened this issue Jul 18, 2024 · 11 comments

@rcavaliere
Member

rcavaliere commented Jul 18, 2024

We need to handle a particular case for the air quality dataset provided by A22 through their AUGE platform, supplied by Algorab.

The low-cost sensors in use will be periodically removed from their physical locations on the highway, brought to an intercalibration site in Trento, and then reinstalled at a physical location on the highway that may differ from the previous one.

What we need is a model of the physical locations on the highway, including the intercalibration site, together with an association to the sensor (identified by the identifier AQxx) that is currently installed at each of them. This should ideally be organized through this mapping table: https://github.com/noi-techpark/bdp-commons/blob/main/data-collectors/environment-a22/src/main/resources/mappings/stationMappings.csv

What we have to ensure is:

  • physical stations are always visible on the Open Data Hub
  • physical stations always provide the details of the sensor currently installed
  • physical stations always provide the history of which sensor was installed at that physical location and when (this should be possible with the metadata history)
@clezag
Member

clezag commented Sep 13, 2024

@rcavaliere my proposal would be the following format for the CSV:

station_id,station_name,latitude,longitude,sensor_id,sensor_start
Stazione_KM140-605,Stazione_KM140-605,46.04227080945,11.11604421025,AIRQ10,01.01.2024
Stazione_KM140-605,Stazione_KM140-605,46.04227080945,11.11604421025,AIRQ15,01.05.2024
calibration_1,calibration_1,46.104338,11.110227,AIRQ10,01.05.2024
calibration_2,calibration_2,46.104338,11.110227,AIRQ15,05.03.2023

Since the new architecture will let us replay history, we also need to track which sensor was where, so that reimported data is not associated with the wrong sensor.
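A minimal Python sketch of that replay-safe lookup, assuming the CSV has already been parsed into `(station_id, sensor_id, start)` tuples and that the example dates are DD.MM.YYYY; `station_for_sensor` and `EXAMPLE` are hypothetical names. Each row stays in force until the next row for the same station, so a reading is attributed to whichever station held the sensor on that day.

```python
from datetime import date
from typing import Optional

# Hypothetical in-memory form of the mapping CSV: (station_id, sensor_id, start).
# An empty sensor_id means "no sensor installed from this date on".
History = list[tuple[str, str, date]]


def station_for_sensor(history: History, sensor_id: str, day: date) -> Optional[str]:
    """Return the station that hosted sensor_id on the given day, or None."""
    if not sensor_id:
        return None
    by_station: dict[str, list[tuple[date, str]]] = {}
    for station, sensor, start in history:
        by_station.setdefault(station, []).append((start, sensor))
    for station, rows in by_station.items():
        rows.sort()
        # The row in force on `day` is the last one whose start is <= day.
        current = None
        for start, sensor in rows:
            if start > day:
                break
            current = sensor
        if current == sensor_id:
            return station
    return None


# The proposed example, plus an empty-sensor row that "removes" AIRQ15
# from calibration_2 when it moves to KM 140 (assumed, per the removal rule).
EXAMPLE: History = [
    ("Stazione_KM140-605", "AIRQ10", date(2024, 1, 1)),
    ("Stazione_KM140-605", "AIRQ15", date(2024, 5, 1)),
    ("calibration_1", "AIRQ10", date(2024, 5, 1)),
    ("calibration_2", "AIRQ15", date(2023, 3, 5)),
    ("calibration_2", "", date(2024, 5, 1)),
]
```

For example, a reading from AIRQ10 on 2024-03-01 resolves to Stazione_KM140-605, while one from 2024-06-01 resolves to calibration_1.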

When a sensor is moved, just add a row to the CSV with the new sensor and its start date.
You then also have to "remove" the sensor from its previous location, by associating another sensor, or an empty one, with the old location.

In this example, we start with AIRQ15 in calibration and AIRQ10 set up at KM 140.
AIRQ10 is then moved to calibration, and AIRQ15 is moved to KM 140 in its place.
The intercalibration station where AIRQ15 was located is set to inactive because it no longer has a sensor.

An incidental advantage of this logic is that we can add changes in advance; they don't have to be synchronized with the actual moving of the sensor.
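The same rows are enough to derive the current state of every station. The Python sketch below (hypothetical names, same `(station_id, sensor_id, start)` tuple layout) ignores rows dated in the future, which is what allows changes to be entered in advance, and treats a station with no sensor as inactive, matching the intercalibration example above.

```python
from datetime import date

# Hypothetical parsed CSV rows: (station_id, sensor_id, start); "" = no sensor.
Row = tuple[str, str, date]


def current_state(rows: list[Row], today: date) -> dict[str, dict]:
    """Derive each station's current sensor and active flag as of `today`.

    Rows whose start lies after `today` are changes entered in advance;
    they are ignored until that date is reached.
    """
    latest: dict[str, tuple[date, str]] = {}
    for station, sensor, start in rows:
        if start > today:
            continue  # scheduled change, not yet in force
        if station not in latest or start > latest[station][0]:
            latest[station] = (start, sensor)

    state: dict[str, dict] = {}
    for station in sorted({r[0] for r in rows}):
        sensor = latest[station][1] if station in latest else ""
        # A station counts as active only while a sensor is attached to it.
        state[station] = {"sensor_id": sensor, "active": sensor != ""}
    return state
```

With the example rows plus an empty-sensor row for calibration_2 on 2024-05-01, the state on 2024-03-01 has AIRQ10 at KM 140 and calibration_1 inactive; on 2024-06-01 AIRQ15 is at KM 140 and calibration_2 is inactive.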

I will also implement a small verification script, so that whenever we update this file, our CI/CD first checks its validity, to avoid overlapping dates or multiple sensors associated with the same physical station.
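One possible shape for such a verification script, sketched in Python under the assumption that the CSV is already parsed into `(station_id, sensor_id, start)` tuples; `validate` is a hypothetical name and the CI wiring is left out. It checks two conditions: no station has two entries starting on the same date, and no sensor is installed at two stations over overlapping periods.

```python
from datetime import date
from typing import Optional

Row = tuple[str, str, date]  # (station_id, sensor_id, start); "" = no sensor


def validate(rows: list[Row]) -> list[str]:
    """Return a list of problems in the mapping; an empty list means valid."""
    errors: list[str] = []
    by_station: dict[str, list[tuple[date, str]]] = {}
    for station, sensor, start in rows:
        by_station.setdefault(station, []).append((start, sensor))

    # sensor_id -> list of (start, end, station); end None = still installed
    spans: dict[str, list[tuple[date, Optional[date], str]]] = {}
    for station, entries in by_station.items():
        entries.sort()
        # Check 1: two entries for one station must not share a start date.
        for (d1, _), (d2, _) in zip(entries, entries[1:]):
            if d1 == d2:
                errors.append(f"{station}: two entries start on {d1}")
        # Each entry is valid from its start until the next entry's start.
        for i, (start, sensor) in enumerate(entries):
            end = entries[i + 1][0] if i + 1 < len(entries) else None
            if sensor:
                spans.setdefault(sensor, []).append((start, end, station))

    # Check 2: a sensor must not be installed at two stations at once.
    for sensor, items in spans.items():
        items.sort(key=lambda s: s[0])
        far_end, far_station = items[0][1], items[0][2]
        for start, end, station in items[1:]:
            if far_end is None or start < far_end:
                errors.append(
                    f"{sensor}: at {far_station} and {station} simultaneously from {start}"
                )
            # Keep track of the span that reaches furthest into the future.
            if far_end is not None and (end is None or end > far_end):
                far_end, far_station = end, station
    return errors
```

Run on the four example rows exactly as listed above (reading the dates as DD.MM.YYYY), this reports that AIRQ15 is at calibration_2 and Stazione_KM140-605 at the same time from 2024-05-01, because no empty-sensor row closes calibration_2; adding `("calibration_2", "", date(2024, 5, 1))` makes the list pass.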

What do you think?

@clezag
Member

clezag commented Sep 13, 2024

@rcavaliere Will we deprecate the existing dataset in favor of completely new stations/names here?

@rcavaliere
Member Author

@clezag if I understood correctly, the stations with a station_id will always be flagged as active and available = TRUE as soon as they are in this CSV file, with the information about the associated sensor and sensor_start in the metadata. Right? If so, that's absolutely OK for me. Once you are ready for the switch, we deprecate the old file. What about historical data? I would suggest also putting the information about the "old movements" in the new CSV file, if possible.

@clezag
Member

clezag commented Sep 13, 2024

@rcavaliere

> if I understood correctly, the stations with a station_id will always be flagged as active and available = TRUE as soon as they are in this CSV file

I think it would be more correct to set stations that don't have any sensor attached to active=false, but that is something we can easily change.
But in general you are right: the Open Data Hub stations will be based on the CSV list, and we then just attach the data points we receive to the station according to the sensor mapping.

> the information about the associated sensor and sensor_start in the metadata. Right?

I would set the currently attached sensor_id as a single metadata field so that people can easily filter by it. But since the CSV already contains the full history, we could additionally expose the whole sensor history as a separate field.

> What about historical data? I would suggest also putting the information about the "old movements" in the new CSV file, if possible

One issue is that the current station codes are in fact the codes of the sensors (AIRQ10 etc.), so they probably have to change if we decouple sensors from physical stations. Do we create a new set of stations and migrate the data over?
We could also keep the old codes, but it could confuse users when the AIRQ10 station has the AIRQ15 sensor attached and the AIRQ10 sensor is somewhere else.
If we migrate, then I agree that we should also record the old movements.

@rcavaliere
Member Author

@clezag OK, let me think further about your proposal over the weekend...

@rcavaliere
Member Author

@clezag additional feedback from my side. The proposal is in general absolutely OK for me, so let's go in this direction.
For the historical data: what is relevant at present is the field stationcode, which carries the information about the exact sensor. What I could provide is the information about which sensor was installed where over time. We can then convert this information into the new CSV, as you proposed. We then have to assign the historical data to the new stations using the stationcode as the key; I think this should work (it would probably be a manual, one-time task). What do you think?
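Converting the installation history into the new CSV could look roughly like the Python sketch below. It assumes the information arrives as sensor-centric records `(sensor_id, station_id, installed_from, installed_until)`, which is a guess at the format A22 would supply; `to_mapping_rows` is a hypothetical helper. Whenever a sensor leaves a station and no other sensor starts there on the same day, it emits an empty-sensor row, following the "remove the sensor" rule proposed above.

```python
from datetime import date
from typing import Optional

# Hypothetical shape of the installation history A22 could supply:
# (sensor_id, station_id, installed_from, installed_until or None if still there)
Install = tuple[str, str, date, Optional[date]]
Row = tuple[str, str, date]  # (station_id, sensor_id, start); "" = no sensor


def to_mapping_rows(installs: list[Install]) -> list[Row]:
    """Convert sensor-centric install records into station-centric CSV rows."""
    # One row per installation. If two records share (station, start),
    # the later one wins here; a CI validation step should flag that case.
    rows: dict[tuple[str, date], str] = {
        (station, start): sensor for sensor, station, start, _ in installs
    }
    for _, station, _, end in installs:
        # A sensor left and nothing replaced it on the same day:
        # add an empty-sensor row so the station is marked as vacant.
        if end is not None and (station, end) not in rows:
            rows[(station, end)] = ""
    # Sort by station, then by start date, for a readable CSV.
    return sorted(
        ((station, sensor, start) for (station, start), sensor in rows.items()),
        key=lambda r: (r[0], r[2]),
    )
```

Fed the movements from the KM 140 example, this yields the four rows of the proposed table plus the empty-sensor row that closes calibration_2.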

@rcavaliere
Member Author

Waiting for A22 to grant additional access to the MQTT broker.

@clezag
Member

clezag commented Oct 24, 2024

@rcavaliere I've migrated the data collector to the new infrastructure, and implemented the virtual stations.
You can take a look here:
https://analytics.dev.testingmachine.eu/

Note that the old stations are still there, but not updated, and the elaborations are not active in this environment.

The CSV file with the stations is here: https://github.com/noi-techpark/opendatahub-collectors/blob/main/transformers/environment-a22/resources/stations.csv
You can see an example of sensor history for station A22_KM087-875 (next to Bolzano airport).
In the CSV file there are three lines for this station, each with a separate sensor and start date:

A22_KM087-875,Stazione_KM087-875,46.453644131,11.30604736775,AIRQ23,2020.01.01
A22_KM087-875,Stazione_KM087-875,46.453644131,11.30604736775,AIRQ24,2022.01.01
A22_KM087-875,Stazione_KM087-875,46.453644131,11.30604736775,AIRQ25,2023.01.01

The resulting station looks like this (excerpt of the response):

curl 'https://mobility.api.dev.testingmachine.eu/tree/EnvironmentStation?where=scode.eq.%22A22_KM087-875%22'
          "sactive": true,
          "savailable": true,
          "scode": "A22_KM087-875",
          "scoordinate": {
            "x": 11.306047,
            "y": 46.453644,
            "srid": 4326
          },
          "smetadata": {
            "sensor_id": "AIRQ25",
            "sensor_history": [
              {
                "id": "AIRQ23",
                "end": "2022-01-01",
                "start": "2020-01-01"
              },
              {
                "id": "AIRQ24",
                "end": "2023-01-01",
                "start": "2022-01-01"
              },
              {
                "id": "AIRQ25",
                "end": "",
                "start": "2023-01-01"
              }
            ]
          },
          "sname": "Stazione_KM087-875",
          "sorigin": "a22-algorab",
          "stype": "EnvironmentStation"
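To make the step from CSV rows to the metadata above concrete, here is a minimal Python sketch that rebuilds `sensor_id` and `sensor_history` for one station. It assumes the file has the six columns proposed earlier (station_id, station_name, latitude, longitude, sensor_id, sensor_start) in that order, with dates written as YYYY.MM.DD as in the three lines above; `build_metadata` is a hypothetical name, not the transformer's actual code.

```python
import csv
import io
from datetime import datetime

# Assumed column order, following the format proposed earlier in this thread.
FIELDS = ["station_id", "station_name", "latitude", "longitude", "sensor_id", "sensor_start"]


def build_metadata(csv_text: str, station_id: str) -> dict:
    """Build sensor_id / sensor_history metadata for one station from CSV rows."""
    rows = [
        r
        for r in csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS)
        if r["station_id"] == station_id
    ]
    # Dates in the file use YYYY.MM.DD; the API output uses ISO YYYY-MM-DD.
    entries = sorted(
        (datetime.strptime(r["sensor_start"], "%Y.%m.%d").date(), r["sensor_id"])
        for r in rows
    )
    history = []
    for i, (start, sensor) in enumerate(entries):
        # Each entry ends where the next one starts; the last one is open-ended.
        end = entries[i + 1][0].isoformat() if i + 1 < len(entries) else ""
        history.append({"id": sensor, "end": end, "start": start.isoformat()})
    current = entries[-1][1] if entries else ""
    return {"sensor_id": current, "sensor_history": history}
```

Fed the three A22_KM087-875 lines, it reproduces the `smetadata` shown above. Because `fieldnames` is passed explicitly, a header line, if the file has one, is read as an ordinary row and then dropped by the station filter.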

If you are happy with this, the next steps would be preparing the actual CSV containing the sensor changes, and then migrating the old measurements to the new stations.

@rcavaliere
Member Author

@clezag that's perfect! Everything is OK for me, and I don't have any comments on this. So should I now put the real history of the sensors in the CSV?

@clezag
Member

clezag commented Oct 25, 2024

@rcavaliere yes, please. But it's not blocking currently; I still have to prepare the data migration from the old stations.
EDIT: Sorry, it is blocking, since I don't know where to migrate the data without the sensor information.

For the migration, I will move (not copy) the data, and make the old stations unavailable (not just inactive), right?

@rcavaliere
Member Author

@clezag right, let's proceed with all these steps.
