A human behavior analyzer package
This project is aimed at collecting basic data about human behavior through cameras. The current version is the initial implementation of the idea. The collected data includes the location of each person (pixel coordinates, projected onto the map provided) and the actions performed by each person in the video. This output is returned as a storyboard, or profile, in a .json file in the following format:
//Insert image containing the format of the profiles
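The exact schema is defined by the profile-format image referenced above (not included here). Purely as an illustration of the kind of content described (per-person locations plus actions), a generated profile might resemble:

```json
{
  "person_id": 1,
  "storyboard": [
    {"frame": 0, "position": [412, 230], "action": "walking"},
    {"frame": 30, "position": [418, 245], "action": "standing"}
  ]
}
```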
Fig. 1: The basic pipeline of the algorithm
It starts by tracking multiple pedestrians using SORT (Simple Online and Realtime Tracking). Each frame is then cropped to the bounding box containing each tracked person and sent to an action detection algorithm (a SlowFast network). ...
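As a rough sketch of how these stages could fit together (the function and parameter names here are hypothetical placeholders, not the actual API of this repository, and a real SlowFast model consumes a short clip of crops rather than a single frame):

```python
import cv2
import numpy as np

def analyze_video(video_path, detect_people, tracker, classify_action):
    """Hypothetical pipeline sketch.
    detect_people(frame) -> [[x1, y1, x2, y2, score], ...] (e.g. a YOLOv3 detector)
    tracker              -> a SORT-style object whose update() returns boxes with track IDs
    classify_action(crop)-> an action label (e.g. from a SlowFast-style network)
    """
    cap = cv2.VideoCapture(video_path)
    profiles = {}  # track_id -> list of (frame_idx, bbox, action)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        detections = np.array(detect_people(frame))         # person detections for this frame
        tracks = tracker.update(detections)                  # SORT assigns a persistent ID per person
        for x1, y1, x2, y2, track_id in tracks:
            crop = frame[int(y1):int(y2), int(x1):int(x2)]   # crop the frame to the person's box
            action = classify_action(crop)                    # per-person action label
            profiles.setdefault(int(track_id), []).append(
                (frame_idx, (x1, y1, x2, y2), action))
        frame_idx += 1
    cap.release()
    return profiles
```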
(The version numbers listed are the ones used during testing and development.)
Package Name | Version No. |
---|---|
OpenCV | 4.1.0 |
NumPy | 1.15.4 |
SciPy | 1.1.0 |
FilterPy | 1.1.0 |
Numba | 0.39.0 |
Scikit-Image | 0.14.0 |
Scikit-Learn | 0.19.2 |
PyTorch | 1.0.1 |
TorchVision | 0.2.1 |
CudaToolKit | 9.0 |
Visit installation.md for steps to install the required dependencies and to set up the repository.
To use the SORT tracker, run the following command:
usage: main.py [-h] [-t OBJ_THRESH] [-n NMS_THRESH] -v VIDEO [--cuda] [-m]
[-i IMG] [-c CORR]
Human Behavior Analysis
optional arguments:
-h, --help show this help message and exit
-t OBJ_THRESH, --obj-thresh OBJ_THRESH
objectness threshold, DEFAULT: 0.5
-n NMS_THRESH, --nms-thresh NMS_THRESH
non max suppression threshold, DEFAULT: 0.4
-v VIDEO, --video VIDEO
flag for adding a video input
--cuda flag for running on GPU
-m, --map flag for projecting people on a map
-i IMG, --img IMG flag for providing an input map image to print the
tracking results on
-c CORR, --corr CORR correspondence points for the map projection as a .txt
NOTE: The -c flag is required with the -m flag, as it is necessary to generate the mapping. However, the -i flag, along with the input map image, does not have to be provided. The -i flag is only for drawing the points onto the image of the map to get a visual representation, and is not necessary for generating the profiles.
Example usage for tracking people in a video, and then projecting and printing the tracks onto a 2-D map (when inside the sort_tracker directory):
python3 main.py --cuda -v input_video.avi -m -i map.jpg -c corr_points.txt
The correspondence points (in corr_points.txt) for mapping the tracks of each person onto a map (map.jpg here) use the following format:
x11 y11 x12 y12
x21 y21 x22 y22
x31 y31 x32 y32
...
xn1 yn1 xn2 yn2
Here, each row contains two x-y coordinate pairs, so in the example above there are n correspondence points. For row n, xn1 yn1 is the x-y pair for a point in the video frame, and xn2 yn2 is the x-y pair for the same point in the image containing the map.
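As a sketch of how these correspondence points can be turned into a map projection (illustrative only; the file names and the exact projection code in this repository may differ), a homography can be estimated from the video-frame points to the map points and then applied to each tracked position:

```python
import cv2
import numpy as np

# Each row of corr_points.txt: x_video y_video x_map y_map
corr = np.loadtxt("corr_points.txt")
video_pts = corr[:, 0:2].astype(np.float32)
map_pts = corr[:, 2:4].astype(np.float32)

# Estimate the homography from video-frame coordinates to map coordinates.
# At least 4 correspondence points are required; more points with RANSAC
# make the estimate more robust.
H, _ = cv2.findHomography(video_pts, map_pts, cv2.RANSAC)

# Project a person's pixel position (e.g. the bottom-centre of a bounding box).
person_px = np.array([[[640.0, 360.0]]], dtype=np.float32)   # shape (1, 1, 2)
map_xy = cv2.perspectiveTransform(person_px, H)[0, 0]
print("Position on map:", map_xy)
```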
To train the action detection model on your own dataset, TensorBoardX is also required.
- Removing cars from detections
- Adding the dependencies and the requirements for action detection
- Integrating action detection
- Completing the description in the readme
- Generation of Profiles
- Obtaining timestamps online for storyboards
- Retraining YOLO for small images of people
- Paper: YOLOv3: An Incremental Improvement
- Paper Website
- YOLOv3 Tutorial
- Paper: SlowFast Networks for Video Recognition
- Paper: Simple Online and Realtime Tracking