The goal of this project is to develop a program that, given a series of points (latitude, longitude, timestamp) for a courier journey from A to B, disregards potentially erroneous points.
go build cmd/betterpath.go
./betterpath assets/points.csv
The first thing to do was to spot the erroneous points from the dataset given with the subject.
I plotted the coordinates on a map and noticed that some points were far from the main path:
The second thing to do was to identify which aspects of these points made them erroneous.
I chose to work on the distances between the points rather than on the speed of the courier, because computing speeds would have required trigonometric computations and a greater overhead.
Since the "wrong" distances seemed significantly higher than the "normal" ones, I decided to work on the standard deviation of the distances, inspired by my earlier work on linear and logistic regressions and by an interesting article about the normal distribution.
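To make the idea concrete, here is a minimal sketch of the standard deviation computation, not the project's actual code. The function name `stdDev` and the sample distances are hypothetical, and the population formula (dividing by the number of values) is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// stdDev returns the population standard deviation of vals.
// Hypothetical helper: the name and the choice of the population
// formula (dividing by n, not n-1) are assumptions of this sketch.
func stdDev(vals []float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	var sum float64
	for _, v := range vals {
		sum += v
	}
	mean := sum / float64(len(vals))

	var squares float64
	for _, v := range vals {
		d := v - mean
		squares += d * d
	}
	return math.Sqrt(squares / float64(len(vals)))
}

func main() {
	// Made-up distances: four "normal" hops and one suspicious jump.
	dists := []float64{1, 1, 1, 1, 10}
	fmt.Printf("standard deviation: %.2f\n", stdDev(dists))
}
```

With these made-up values the standard deviation is about 3.6, so only the jump of 10 exceeds it.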
Another matter was the position of the wrong points once the points were sorted by timestamp. Whether they were contiguous, at the beginning of the path or at the end, the stored data had to be updated carefully.
Two slices:
- for storing the coordinates and the timestamp of each point, using a structure Point
type Point struct {
	x float64
	y float64
	t int64
}
- for storing the distances between consecutive points, used to calculate the standard deviation
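As an illustration, the two slices could be built roughly as follows. This is a sketch under assumptions: `buildSlices` is a hypothetical name, and the distance is the planar distance between the raw coordinates, which may differ from what the project computes.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Point mirrors the structure described above.
type Point struct {
	x float64
	y float64
	t int64
}

// buildSlices is a hypothetical helper. It sorts a copy of pts by
// timestamp and returns it together with the distance between each
// pair of consecutive points (planar distance, an assumption).
func buildSlices(pts []Point) ([]Point, []float64) {
	sorted := append([]Point(nil), pts...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].t < sorted[j].t })

	var dists []float64
	for i := 1; i < len(sorted); i++ {
		dx := sorted[i].x - sorted[i-1].x
		dy := sorted[i].y - sorted[i-1].y
		dists = append(dists, math.Hypot(dx, dy))
	}
	return sorted, dists
}

func main() {
	// Made-up points, deliberately out of timestamp order.
	pts := []Point{{3, 4, 20}, {0, 0, 10}, {3, 4, 30}}
	sorted, dists := buildSlices(pts)
	fmt.Println(len(sorted), dists)
}
```

The distance slice always has one element fewer than the point slice, which is what lets the two be walked together.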
The two slices are traversed only once, simultaneously. When a distance is considered too high (i.e. greater than the standard deviation), the erroneous point of the pair is identified and deleted, and the affected distances are recomputed and checked again, because two erroneous points can be contiguous.
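The single pass can be sketched as below. It does not reproduce the project's code: instead of updating a stored distance slice, it compares each point with the last kept one, which has the same effect of re-checking the distance after a deletion. The names `dist` and `filter`, the rule used to decide that the first point is the outlier, and the sample data are all assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Point mirrors the structure described above.
type Point struct {
	x float64
	y float64
	t int64
}

// dist is the planar distance between two points (an assumption).
func dist(a, b Point) float64 {
	return math.Hypot(a.x-b.x, a.y-b.y)
}

// filter is a hypothetical helper. It walks pts (already sorted by
// timestamp) once and drops every point that lies further than
// threshold from the last kept point.
func filter(pts []Point, threshold float64) []Point {
	if len(pts) < 3 {
		return append([]Point(nil), pts...)
	}
	// Start of the path: if the first hop is too long but the second
	// is fine, assume the first point is the erroneous one.
	start := 0
	if dist(pts[0], pts[1]) > threshold && dist(pts[1], pts[2]) <= threshold {
		start = 1
	}
	kept := []Point{pts[start]}
	for _, p := range pts[start+1:] {
		// The next candidate is compared with the same kept point,
		// so contiguous erroneous points are dropped one after another.
		if dist(kept[len(kept)-1], p) > threshold {
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	// Made-up path with two contiguous outliers; threshold chosen by hand.
	pts := []Point{{0, 0, 1}, {1, 0, 2}, {2, 40, 3}, {3, 40, 4}, {4, 0, 5}, {5, 0, 6}}
	for _, p := range filter(pts, 10) {
		fmt.Println(p.x, p.y, p.t)
	}
}
```

In the project the threshold would be the standard deviation of the distances rather than a hand-picked value. An erroneous point at the end of the path needs no special case here, since it is simply never appended.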