EPIC: as traffic manager I would like to integrate A22 "traffic events" in the Open Data Hub #263
As discussed in the email between Roberto and Davide (31/05/21, 18:22). Note: for A22 events the active flag should not depend on stations during the syncStation.
I have created a PR (#280), but I have marked it as WIP for the following reasons:
As @davidebz said before, at the moment the data collector does not handle the active flag correctly. I've implemented it this way:
The specification also says something about planned events, but I didn't find any such event via the API ... I've set the range of the historical API (
@noctho thanks for the inputs. My feedback to your points:
@bertolla do you see particular adaptation needs at the writer level in order to properly handle the integration of this new dataset? Can the "active" field already be managed in the expected way as currently implemented, or do we need some refactoring?
First test release running on the test server in version 5.3.0-SNAPSHOT of bdp.
@rcavaliere I merged PR #280, but we do not have a pipeline yet... that would be needed to test it properly. Then we need to deploy the new core to production, and finally implement the event API on ninja... Hopefully in October I can finalize this.
@Piiit thanks for the info! The steps would be:
UPDATE by @Piiit
Planned to work on this in the 2nd half of November.
@rcavaliere I moved this back to TODO, since we still need to implement the pipelines etc.
@rcavaliere I need the following information:
We can fix the provenance on our end.
I set the eventStart and eventEnd properties of EventDto by taking the epoch seconds data_inizio and data_fine that I get from the data service. But as I see here, it should probably be the timestamp in milliseconds:
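For reference, a minimal sketch of the conversion in question, assuming data_inizio arrives as epoch seconds and EventDto expects epoch milliseconds (the names follow this discussion, not necessarily the actual bdp API):

```java
import java.time.Instant;

public class EpochConversion {
    public static void main(String[] args) {
        // data_inizio as delivered by the A22 service: epoch seconds
        long dataInizio = 1622477520L;

        // the DTO presumably expects epoch milliseconds, so multiply by 1000
        long eventStart = dataInizio * 1000L;

        System.out.println(eventStart);                        // 1622477520000
        System.out.println(Instant.ofEpochMilli(eventStart));  // 2021-05-31T16:12:00Z
    }
}
```

Passing the raw seconds value into a millisecond field would place every event shortly after 1970, so this conversion has to happen before building the DTO.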
@Piiit looking at the code, everything seems OK to me. Can we debug this a little in order to understand what happens here?
@rcavaliere Provenance insertion fixed. Now I will look into the start and end points of each interval, which are stored as bytea and not as a range type... I think this also needs to be fixed on our side.
@noctho You were right, I fixed that... the coordinates were also switched; fixed that too.
The intervals work now, but I need to know two things to conclude it:
Finally, I get many duplicate key errors. Is the
@rcavaliere I had a second look at the uuid problem and suggest renaming it. For users of the API this might be clearer: they insert and retrieve events, which must have a name and category, and within those they are unique...
I suggest following the same approach as for measurements, where we don't have timezones. That way it's clear that all timestamps are provided in the same way.
@Piiit for the uuid field: the idea we consolidated with Patrick B. was to use this field to avoid inserting multiple records of the same information. For an event record to be unique, we considered the following data (coming from the A22 web-service):
What is strange are the errors we get. The expected behaviour should be:
As you say, this logic should be covered by the writer and not by the Data Collector, which should only pass on the retrieved data. Can we adjust this behaviour?
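One possible way to implement that uniqueness idea is to derive a deterministic UUID from exactly the fields that make an event unique, so the same A22 record always maps to the same key. This is only a sketch; the class name, field values, and separator are assumptions, not the actual DC code:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class EventUuid {
    // Derive a stable version-3 UUID from the fields that make an event unique.
    // Re-sending the same record produces the same uuid, so the writer can
    // detect it instead of inserting a duplicate.
    static UUID deterministicUuid(String... uniquenessFields) {
        String joined = String.join("|", uniquenessFields);
        return UUID.nameUUIDFromBytes(joined.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID first  = deterministicUuid("329404", "Coda", "1622477520");
        UUID second = deterministicUuid("329404", "Coda", "1622477520");
        System.out.println(first.equals(second)); // true: same fields, same uuid
    }
}
```

Because the uuid is a pure function of the record's content, the check "does this event already exist?" reduces to a lookup on a single indexed column.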
UPDATED VERSION @rcavaliere OK. The errors come from the fact that the code was not complete... The check to see if an event already exists was not done; it had just a hard-coded "do-nothing", i.e. it always returned that the record does not exist. If these things should not be duplicated, we could already enforce that inside the DB itself... Also, the UUID could be created with Postgres native generated columns. We should go with constraints etc., not re-invent that on the client side.
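To make the bug described above concrete, here is a minimal sketch of the missing existence check, with the database replaced by an in-memory set and the class and method names invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class EventWriterSketch {
    private final Set<String> storedUuids = new HashSet<>();

    // Insert only if no event with the same uuid is stored yet.
    // The incomplete version effectively hard-coded "record does not exist",
    // so every re-sent event hit the unique constraint and raised a
    // duplicate key error.
    public boolean insertIfAbsent(String uuid) {
        if (storedUuids.contains(uuid)) {
            return false; // already stored: skip instead of violating the constraint
        }
        storedUuids.add(uuid);
        return true;
    }

    public static void main(String[] args) {
        EventWriterSketch writer = new EventWriterSketch();
        System.out.println(writer.insertIfAbsent("evt-329404")); // true
        System.out.println(writer.insertIfAbsent("evt-329404")); // false, duplicate skipped
    }
}
```

In the real writer the same effect can come from the database itself, e.g. a unique constraint combined with an insert that ignores conflicts, which matches the "constraints, not client-side logic" point above.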
@Piiit ok, please have a look at this and make a proposal on how to manage it. As said, the DC should not contain complex logic, but just provide the data retrieved from the API. All the intelligence related to data storage should be on the ODH core side.
@rcavaliere I suggest the following:
For the UUID problem, I checked the code again, also on the event-a22 DC side... I think the approach taken is not bad. I could not figure out another way to do it, because the writer cannot tell whether something already exists in such a complex JSON structure with so many fields without a major effort and thus performance problems. I would implement the unique id generation logic inside the dc-interface though, so that it is reusable in all DCs... for the UUID and the event_series_id, if not otherwise given. Here I have two ideas:
What do you think about this? |
@Piiit thanks for the proposals, I agree with them. I think that having two IDs is the right way to go:
In my view the Data Collector should be "stupid", i.e. always provide the event records available from the A22 API. The writer should just implement logic that checks whether it already has a record with the same identifier. As a data consumer, I can then filter and order by these fields.
@rcavaliere The DC works now and uses the new writer backend, which is running on our staging server... Let's keep it there for some time to check what lands inside Postgres... I will implement the API side now, so we can check it further. Just a remark: the event ID that comes from A22 is unique, so we cannot create event series ids from it... Is there maybe another field we could use to concatenate events into series?
@Piiit we have to check this by looking at the data. Thanks for the update!
@Piiit I wrote to A22 in order to better understand how the "id" field provided by A22 works. One question: I currently see in "name" a unique number (e.g. 329404). Where does this number come from in relation to the fields exposed by the A22 web-service?
@Piiit I am not sure that we still have the desired mapping as far as the A22 events data is concerned. I verified with A22, and they confirm that in the case of a queue we should have something like this:
The ID field exposed by A22 should be stored 1:1 into the field
Ok. I'll have to check it on the data collector's side... Let's open a new issue for that.
About the hashing: that is correct in the sense that if the same id comes from A22, the same hash should always come up... I'll explain that later.
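The "same id, same hash" property can be illustrated with SHA-256; the hash actually used by the writer may differ, and the helper here is purely for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StableHash {
    // Hash an A22 id; the same input always yields the same hex digest.
    static String sha256Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present in every JDK
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hashing the same id twice gives an identical 64-character digest.
        System.out.println(sha256Hex("329404").equals(sha256Hex("329404"))); // true
        System.out.println(sha256Hex("329404").length());                    // 64
    }
}
```

Determinism is the key property here: because the hash depends only on the A22 id, re-importing the same event can never produce a second distinct key.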
@rcavaliere The new issue is #426 ... archiving this one therefore.
This user story deals with the integration of the A22 traffic events in the ODH. The implementation is carried out by 1006.org and Catch&Solve. As far as the ODH data model is concerned, my suggestion is to consider an "event" like a station (since it is characterized by coordinates) and to use the metadata to characterize it. In this case we could have the situation that this type of station has no types and measurements associated with it.
Specification document:
201202_RichiestaOfferte_EventiStradali.pdf
Tasks covered by this epic: