Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving Trip Segmentation by reducing DB calls #956

Closed

Conversation

humbleOldSage
Copy link
Contributor

The changes below that led to these performance upgrades are investigated in e-mission/e-mission-docs#1041 . They are :

  1. db calls for transition and motion dataframes are moved upstream from is_tracking_restarted_in_range function and get_ongoing_motion_in_range in restart_checking.py to trip_segmentaiton.py. The old setting which had multiple db calls ( for each iteration ) now happen once in the improved setting.

  2. All the other changes in trip_segmentation.py and dwell_segmentation_dist_filter.py are just to support the change in point 1 ( above).

  3. in dwell_segmentation_time_filter.py,other than the changes to support point 1 ( above), there an additional improvement. The calculations for last10PointsDistances and last5MinsPoints are vectorised. For this, calDistance in common.py now supports numpy arrays.

The changes  below  that led to these performance upgrades are investigated in   e-mission/e-mission-docs#1041 . They are :

1.  db calls for transition and motion dataframes are moved upstream from  `is_tracking_restarted_in_range` function  and `get_ongoing_motion_in_range` in
 `restart_checking.py` to `trip_segmentaiton.py`.  The old setting which had multiple db calls ( for each iteration ) now happen once in the improved setting.

2. All the other changes in `trip_segmentation.py` and `dwell_segmentation_dist_filter.py` are just to support the change  in point 1 ( above).

3. in `dwell_segmentation_time_filter.py`,other than the changes to support point 1 ( above), there an additional improvement.  The calculations for `last10PointsDistances` and `last5MinsPoints` are vectorised.  For this,  `calDistance` in `common.py` now supports numpy arrays.
@shankari
Copy link
Contributor

shankari commented Feb 4, 2024

in dwell_segmentation_time_filter.py,other than the changes to support point 1 ( above), there an additional improvement. The calculations for last10PointsDistances and last5MinsPoints are vectorised. For this, calDistance in common.py now supports numpy arrays.

I don't see any indication of the performance improvement related to this change in the issue. I would anticipate the change to be minimal, since len(last10PointsDistances) == 10 and vectorization is unlikely to be helpful until we reach thousands of rows. I am not sure that the additional code complexity is worth it.

Also, it would be helpful if you would link to the specific comment in the issue that documented the improvement, and even duplicate the numbers here to make it easier to look up...

@humbleOldSage
Copy link
Contributor Author

humbleOldSage commented Feb 4, 2024

I don't see any indication of the performance improvement related to this change in the issue.

Its here e-mission/e-mission-docs#1041 (comment).

I tried some pandas improvements which might be overkill ( in which case we can roll this back) which reduced overall runtime from ~2.12 to ~1.5.

I have now added a separate comment for this.
e-mission/e-mission-docs#1041 (comment)

I would anticipate the change to be minimal, since len(last10PointsDistances) == 10 and vectorization is unlikely to be helpful until we reach thousands of rows. I am not sure that the additional code complexity is worth it.

Yes, it's just for 10 rows. But, since this part falls inside the loop, it's the overall loop time that is improved as mentioned.

... and even duplicate the numbers here to make it easier to look up...

Sure.

@humbleOldSage
Copy link
Contributor Author

humbleOldSage commented Feb 4, 2024

Overall, Current AndorraWrapper runtime is ~1.6s , iOS Runtime is ~0.4s and CombinedWrapper runtime is ~2.1 ( as mentioned here e-mission/e-mission-docs#1041 (comment) )

@shankari
Copy link
Contributor

shankari commented Feb 5, 2024

@humbleOldSage there is a comment in the issue that indicates that there were some failures, but I don't see any updates on the fix. e-mission/e-mission-docs#1041 (comment)

Also, what are you testing this on? Why do we have the magic number of 327? Please also separate the vectorization from the code that actually reduces DB calls so we can evaluate each of them independently

@humbleOldSage
Copy link
Contributor Author

Did , in a separate PR #958 . Closing this one since no updates expected here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants