EPIC: as traffic manager I would like to integrate A22 "traffic events" in the Open Data Hub #263
As discussed in the email between Roberto and Davide (31/05/21, 18:22). Note: for A22 events the active flag should not depend on stations during the syncStation.
I have created a PR (#280), but I have marked it as WIP for the following reasons:
As @davidebz said before, at the moment the data collector does not handle the active flag correctly. I've implemented it this way:
The specification also says something about planned events, but I didn't find any such event via the API ... I've set the range of the historical API (
@noctho thanks for the inputs. My feedback to your points:
@bertolla do you see particular adaptation needs at the writer level in order to properly handle the integration of this new dataset? Can the "active" field already be managed in the expected way as currently implemented, or do we need some refactoring?
First test release running on the test server in version 5.3.0-SNAPSHOT of bdp.
@rcavaliere I merged PR #280, but we do not have a pipeline yet... that would be needed to test it properly. Then we need to deploy the new core to production, and finally implement the event API on ninja... Hopefully in October I can finalize this.
@Piiit thanks for the info! The steps would be:
UPDATE by @Piiit
Planned to work on this in the 2nd half of November.
@rcavaliere I moved this back to TODO, since we still need to implement the pipelines etc.
@rcavaliere I need the following information:
We can fix the provenance on our end.
I set the eventStart and eventEnd properties of EventDto by taking the epoch seconds data_inizio and data_fine that I get from the data service. But as I see here, it should probably be the timestamp in milliseconds:
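For reference, a minimal sketch of the conversion in question, assuming data_inizio arrives as epoch seconds and EventDto expects epoch milliseconds (the names follow this discussion, not necessarily the actual bdp API):

```java
import java.time.Instant;

public class EpochConversion {
    public static void main(String[] args) {
        // data_inizio as delivered by the A22 service: epoch seconds
        long dataInizio = 1622477520L;

        // the DTO presumably expects epoch milliseconds, so multiply by 1000
        long eventStart = dataInizio * 1000L;

        System.out.println(eventStart);                        // 1622477520000
        System.out.println(Instant.ofEpochMilli(eventStart));  // 2021-05-31T16:12:00Z
    }
}
```

Passing the raw seconds value into a millisecond field would place every event shortly after 1970, so this conversion has to happen before building the DTO.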
@Piiit looking at the code, everything seems OK to me. Can we debug this a little in order to understand what happens here?
@rcavaliere Provenance insertion fixed. Now I will look into the start and end points of each interval, which are stored as bytea and not as a range type... I think this also needs to be fixed on our side.
@noctho You were right, I fixed that... the coordinates were also switched; fixed that too.
The intervals work now, but I need to know two things to conclude it:
Finally, I get many duplicate key errors. Is the
@rcavaliere I had a second look at the uuid problem and suggest renaming it. For users of the API this might be clearer: they insert and retrieve events, which must have a name and category, and within those they are unique...
I suggest following the same approach as for measurements, where we don't have timezones. That way it's clear that all timestamps are provided in the same way.
@Piiit for the uuid field: the idea we consolidated with Patrick B. was to use this field to avoid inserting multiple records of the same information. For an event record to be unique, we considered the following data (coming from the A22 web-service):
What is strange are the errors we get. The expected behaviour should be:
As you say, this logic should be covered by the writer and not by the Data Collector, which should only pass on the retrieved data. Can we adjust this behaviour?
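One possible way to implement that uniqueness idea is to derive a deterministic UUID from exactly the fields that make an event unique, so the same A22 record always maps to the same key. This is only a sketch; the class name, field values, and separator are assumptions, not the actual DC code:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class EventUuid {
    // Derive a stable version-3 UUID from the fields that make an event unique.
    // Re-sending the same record produces the same uuid, so the writer can
    // detect it instead of inserting a duplicate.
    static UUID deterministicUuid(String... uniquenessFields) {
        String joined = String.join("|", uniquenessFields);
        return UUID.nameUUIDFromBytes(joined.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID first  = deterministicUuid("329404", "Coda", "1622477520");
        UUID second = deterministicUuid("329404", "Coda", "1622477520");
        System.out.println(first.equals(second)); // true: same fields, same uuid
    }
}
```

Because the uuid is a pure function of the record's content, the check "does this event already exist?" reduces to a lookup on a single indexed column.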
UPDATED VERSION @rcavaliere OK. The errors come from the fact that the code was not complete... The check to see if an event already exists was not done; it had just a hard-coded "do-nothing", i.e. it always returned that the record does not exist. If these things should not be duplicated, we could already enforce that inside the DB itself... Also, the UUID could be created with Postgres native generated columns. We should go with constraints etc., not re-invent that on the client side.
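To make the bug described above concrete, here is a minimal sketch of the missing existence check, with the database replaced by an in-memory set and the class and method names invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class EventWriterSketch {
    private final Set<String> storedUuids = new HashSet<>();

    // Insert only if no event with the same uuid is stored yet.
    // The incomplete version effectively hard-coded "record does not exist",
    // so every re-sent event hit the unique constraint and raised a
    // duplicate key error.
    public boolean insertIfAbsent(String uuid) {
        if (storedUuids.contains(uuid)) {
            return false; // already stored: skip instead of violating the constraint
        }
        storedUuids.add(uuid);
        return true;
    }

    public static void main(String[] args) {
        EventWriterSketch writer = new EventWriterSketch();
        System.out.println(writer.insertIfAbsent("evt-329404")); // true
        System.out.println(writer.insertIfAbsent("evt-329404")); // false, duplicate skipped
    }
}
```

In the real writer the same effect can come from the database itself, e.g. a unique constraint combined with an insert that ignores conflicts, which matches the "constraints, not client-side logic" point above.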
@Piiit ok, please have a look at this and make a proposal on how to manage it. As said, the DC should not contain complex logic, but just provide the data retrieved from the API. All the intelligence related to data storage should be on the ODH core side.
@rcavaliere I suggest the following:
For the UUID problem, I checked the code again, also on the event-a22 DC side... I think the approach taken is not bad. I could not figure out another way to do it, because the writer cannot tell whether something already exists in such a complex JSON structure with so many fields without a major effort and thus performance problems. I would implement the unique id generation logic inside the dc-interface though, so that it is reusable in all DCs... for the UUID and the event_series_id, if not otherwise given. Here I have two ideas:
What do you think about this? |
@Piiit thanks for the proposals, I agree with them. I think that having two IDs is the right way to go:
In my view the Data Collector should be "stupid", i.e. always provide the event records available from the A22 API. The writer should just implement logic that checks whether it already has a record with the same identifier. As a data consumer, I can then filter and order by these fields.
@rcavaliere The DC works now and uses the new writer backend, which is running on our staging server... Let's keep it there for some time to check what lands inside Postgres... I will implement the API side now, so we can check it further. Just a remark: the event ID that comes from A22 is unique, so we cannot create event series ids from it... Is there maybe another field we could use to concatenate events into series?
@Piiit we have to check this by looking at the data. Thanks for the update!
@Piiit I wrote to A22 in order to better understand how the "id" field provided by A22 works. One question: I currently see in "name" a unique number (e.g. 329404). Where does this number come from in relation to the fields exposed by the A22 web-service?
@Piiit I am not sure that we still have the desired mapping as far as the A22 events data is concerned. I verified with A22, and they confirm that in the case of a queue we should have something like this:
The ID field exposed by A22 should be stored 1:1 into the field
Ok. I'll have to check it on the data collector's side... Let's open a new issue for that.
About the hashing: that is correct in the sense that if the same id comes from A22, the same hash should always come up... I'll explain that later.
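The "same id, same hash" property can be illustrated with SHA-256; the hash actually used by the writer may differ, and the helper here is purely for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StableHash {
    // Hash an A22 id; the same input always yields the same hex digest.
    static String sha256Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present in every JDK
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hashing the same id twice gives an identical 64-character digest.
        System.out.println(sha256Hex("329404").equals(sha256Hex("329404"))); // true
        System.out.println(sha256Hex("329404").length());                    // 64
    }
}
```

Determinism is the key property here: because the hash depends only on the A22 id, re-importing the same event can never produce a second distinct key.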
@rcavaliere The new issue is #426 ... archiving this one therefore.
This user story deals with the integration of the A22 traffic events in the ODH. The implementation is carried out by 1006.org and Catch&Solve. As far as the ODH data model is concerned, my suggestion is to consider an "event" like a station (since it is characterized by coordinates) and to use the metadata to characterize it. In this case we could have the situation that this type of station has no types and measurements associated with it.
Specification document:
201202_RichiestaOfferte_EventiStradali.pdf
Tasks covered by this epic: