Tf v2 migration #67

Open · wants to merge 2 commits into base: master

Changes from all commits
6 changes: 3 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
## Real-time Hand-Detection using Neural Networks (SSD) on Tensorflow.
## Real-time Hand-Detection using Neural Networks (SSD) on Tensorflow 2.

This repo documents steps and scripts used to train a hand detector using Tensorflow (Object Detection API). As with any DNN-based task, the most expensive (and riskiest) part of the process has to do with finding or creating the right (annotated) dataset. I was interested mainly in detecting hands on a table (egocentric viewpoint). I experimented first with the [Oxford Hands Dataset](http://www.robots.ox.ac.uk/~vgg/data/hands/) (the results were not good). I then tried the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/), which was a much better fit for my requirements.
This repo documents steps and scripts used to train a hand detector using Tensorflow 2 (Object Detection API). As with any DNN-based task, the most expensive (and riskiest) part of the process has to do with finding or creating the right (annotated) dataset. I was interested mainly in detecting hands on a table (egocentric viewpoint). I experimented first with the [Oxford Hands Dataset](http://www.robots.ox.ac.uk/~vgg/data/hands/) (the results were not good). I then tried the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/), which was a much better fit for my requirements.

The goal of this repo/post is to demonstrate how neural networks can be applied to the (hard) problem of tracking hands (egocentric and other views). Better still, it provides code that can be adapted to other use cases.

@@ -23,7 +23,7 @@ Both examples above were run on a macbook pro **CPU** (i7, 2.5GHz, 16GB). Some f
| 16 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) |
| 11 | 640 * 480 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) |

> Note: The code in this repo is written and tested with Tensorflow `1.4.0-rc0`. Using a different version may result in [some errors](https://github.com/tensorflow/models/issues/1581).
> Note: The code in this repo is written and tested with Tensorflow `2.3.1`. Using a different version may result in [some errors](https://github.com/tensorflow/models/issues/1581).
You may need to [generate your own frozen model](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/) graph using the [model checkpoints](model-checkpoint) in the repo to fit your TF version.

The tensorflow object detection repo has a [python file for exporting a checkpoint to frozen graph here](https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py). You can copy it to the current directory and use it as follows
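The repo's actual command sits in the collapsed part of this hunk. As a sketch only, the standard TF1 Object Detection API invocation of that script looks like the following; every path here is a placeholder to substitute:

```sh
# Sketch: export a trained checkpoint to a frozen inference graph (TF1 OD API).
# All paths are placeholders; use your own pipeline config and checkpoint.
python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/ssd_pipeline.config \
    --trained_checkpoint_prefix path/to/model.ckpt-XXXX \
    --output_directory path/to/exported_graph
```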
2 changes: 1 addition & 1 deletion detect_multi_threaded.py
@@ -18,7 +18,7 @@
def worker(input_q, output_q, cap_params, frame_processed):
    print(">> loading frozen model for worker")
    detection_graph, sess = detector_utils.load_inference_graph()
    sess = tf.Session(graph=detection_graph)
    sess = tf.compat.v1.Session(graph=detection_graph)
    while True:
        # print("> ===== in worker loop, frame ", frame_processed)
        frame = input_q.get()
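For context on the one-line change above: TF2 removed `tf.Session` from the top-level namespace, and the `tf.compat.v1` shim keeps v1 graph-and-session code working unchanged. A self-contained sketch of the pattern, with an illustrative toy graph rather than code from this repo:

```python
import tensorflow as tf

# Build a v1-style graph; graph mode applies inside this block even under TF2.
graph = tf.Graph()
with graph.as_default():
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, 3), name="x")
    y = tf.reduce_sum(x, axis=1, name="row_sum")

# Run it with the compat.v1 Session, mirroring the worker above.
with tf.compat.v1.Session(graph=graph) as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))  # -> [6.]
```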
171 changes: 171 additions & 0 deletions detect_multi_threaded.py.old
@@ -0,0 +1,171 @@
from utils import detector_utils as detector_utils
import cv2
import tensorflow as tf
import multiprocessing
from multiprocessing import Queue, Pool
import time
from utils.detector_utils import WebcamVideoStream
import datetime
import argparse

frame_processed = 0
score_thresh = 0.2

# Create a worker process that loads the graph and runs detection on frames
# from an input queue, putting annotated frames on an output queue


def worker(input_q, output_q, cap_params, frame_processed):
    print(">> loading frozen model for worker")
    detection_graph, sess = detector_utils.load_inference_graph()
    sess = tf.Session(graph=detection_graph)
    while True:
        # print("> ===== in worker loop, frame ", frame_processed)
        frame = input_q.get()
        if (frame is not None):
            # Actual detection. Variable boxes contains the bounding box coordinates for hands detected,
            # while scores contains the confidence for each of these boxes.
            # Hint: If len(boxes) > 1, you may assume you have found at least one hand (within your score threshold)

            boxes, scores = detector_utils.detect_objects(
                frame, detection_graph, sess)
            # draw bounding boxes
            detector_utils.draw_box_on_image(
                cap_params['num_hands_detect'], cap_params["score_thresh"],
                scores, boxes, cap_params['im_width'], cap_params['im_height'],
                frame)
            # add frame annotated with bounding box to queue
            output_q.put(frame)
            frame_processed += 1
        else:
            output_q.put(frame)
    sess.close()


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-src',
        '--source',
        dest='video_source',
        type=int,
        default=0,
        help='Device index of the camera.')
    parser.add_argument(
        '-nhands',
        '--num_hands',
        dest='num_hands',
        type=int,
        default=2,
        help='Max number of hands to detect.')
    parser.add_argument(
        '-fps',
        '--fps',
        dest='fps',
        type=int,
        default=1,
        help='Show FPS on detection/display visualization')
    parser.add_argument(
        '-wd',
        '--width',
        dest='width',
        type=int,
        default=300,
        help='Width of the frames in the video stream.')
    parser.add_argument(
        '-ht',
        '--height',
        dest='height',
        type=int,
        default=200,
        help='Height of the frames in the video stream.')
    parser.add_argument(
        '-ds',
        '--display',
        dest='display',
        type=int,
        default=1,
        help='Display the detected images using OpenCV. This reduces FPS')
    parser.add_argument(
        '-num-w',
        '--num-workers',
        dest='num_workers',
        type=int,
        default=4,
        help='Number of workers.')
    parser.add_argument(
        '-q-size',
        '--queue-size',
        dest='queue_size',
        type=int,
        default=5,
        help='Size of the queue.')
    args = parser.parse_args()

    input_q = Queue(maxsize=args.queue_size)
    output_q = Queue(maxsize=args.queue_size)

    video_capture = WebcamVideoStream(
        src=args.video_source, width=args.width, height=args.height).start()

    cap_params = {}
    frame_processed = 0
    cap_params['im_width'], cap_params['im_height'] = video_capture.size()
    cap_params['score_thresh'] = score_thresh

    # max number of hands we want to detect/track
    cap_params['num_hands_detect'] = args.num_hands

    print(cap_params, args)

    # spin up workers to parallelize detection.
    pool = Pool(args.num_workers, worker,
                (input_q, output_q, cap_params, frame_processed))

    start_time = datetime.datetime.now()
    num_frames = 0
    fps = 0
    index = 0

    cv2.namedWindow('Multi-Threaded Detection', cv2.WINDOW_NORMAL)

    while True:
        frame = video_capture.read()
        frame = cv2.flip(frame, 1)
        index += 1

        input_q.put(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        output_frame = output_q.get()

        output_frame = cv2.cvtColor(output_frame, cv2.COLOR_RGB2BGR)

        elapsed_time = (datetime.datetime.now() - start_time).total_seconds()
        num_frames += 1
        fps = num_frames / elapsed_time
        # print("frame ", index, num_frames, elapsed_time, fps)

        if (output_frame is not None):
            if (args.display > 0):
                if (args.fps > 0):
                    detector_utils.draw_fps_on_image("FPS : " + str(int(fps)),
                                                     output_frame)
                cv2.imshow('Multi-Threaded Detection', output_frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            else:
                if (num_frames == 400):
                    num_frames = 0
                    start_time = datetime.datetime.now()
                else:
                    print("frames processed: ", index, "elapsed time: ",
                          elapsed_time, "fps: ", str(int(fps)))
        else:
            # print("video end")
            break
    elapsed_time = (datetime.datetime.now() - start_time).total_seconds()
    fps = num_frames / elapsed_time
    print("fps", fps)
    pool.terminate()
    video_capture.stop()
    cv2.destroyAllWindows()
121 changes: 121 additions & 0 deletions detect_single_threaded.py.old
@@ -0,0 +1,121 @@
from utils import detector_utils as detector_utils
import cv2
import tensorflow as tf
import datetime
import argparse

detection_graph, sess = detector_utils.load_inference_graph()

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-sth',
        '--scorethreshold',
        dest='score_thresh',
        type=float,
        default=0.2,
        help='Score threshold for displaying bounding boxes')
    parser.add_argument(
        '-fps',
        '--fps',
        dest='fps',
        type=int,
        default=1,
        help='Show FPS on detection/display visualization')
    parser.add_argument(
        '-src',
        '--source',
        dest='video_source',
        default=0,
        help='Device index of the camera.')
    parser.add_argument(
        '-wd',
        '--width',
        dest='width',
        type=int,
        default=320,
        help='Width of the frames in the video stream.')
    parser.add_argument(
        '-ht',
        '--height',
        dest='height',
        type=int,
        default=180,
        help='Height of the frames in the video stream.')
    parser.add_argument(
        '-ds',
        '--display',
        dest='display',
        type=int,
        default=1,
        help='Display the detected images using OpenCV. This reduces FPS')
    parser.add_argument(
        '-num-w',
        '--num-workers',
        dest='num_workers',
        type=int,
        default=4,
        help='Number of workers.')
    parser.add_argument(
        '-q-size',
        '--queue-size',
        dest='queue_size',
        type=int,
        default=5,
        help='Size of the queue.')
    args = parser.parse_args()

    cap = cv2.VideoCapture(args.video_source)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, args.width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, args.height)

    start_time = datetime.datetime.now()
    num_frames = 0
    im_width, im_height = (cap.get(3), cap.get(4))
    # max number of hands we want to detect/track
    num_hands_detect = 2

    cv2.namedWindow('Single-Threaded Detection', cv2.WINDOW_NORMAL)

    while True:
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        ret, image_np = cap.read()
        # image_np = cv2.flip(image_np, 1)
        try:
            image_np = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
        except:
            print("Error converting to RGB")

        # Actual detection. Variable boxes contains the bounding box coordinates for hands detected,
        # while scores contains the confidence for each of these boxes.
        # Hint: If len(boxes) > 1, you may assume you have found at least one hand (within your score threshold)

        boxes, scores = detector_utils.detect_objects(image_np,
                                                      detection_graph, sess)

        # draw bounding boxes on frame
        detector_utils.draw_box_on_image(num_hands_detect, args.score_thresh,
                                         scores, boxes, im_width, im_height,
                                         image_np)

        # Calculate Frames per second (FPS)
        num_frames += 1
        elapsed_time = (datetime.datetime.now() - start_time).total_seconds()
        fps = num_frames / elapsed_time

        if (args.display > 0):
            # Display FPS on frame
            if (args.fps > 0):
                detector_utils.draw_fps_on_image("FPS : " + str(int(fps)),
                                                 image_np)

            cv2.imshow('Single-Threaded Detection',
                       cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR))

            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break
        else:
            print("frames processed: ", num_frames, "elapsed time: ",
                  elapsed_time, "fps: ", str(int(fps)))
17 changes: 17 additions & 0 deletions report.txt
@@ -0,0 +1,17 @@
TensorFlow 2.0 Upgrade Script
-----------------------------
Converted 1 files
Detected 0 issues that require attention
--------------------------------------------------------------------------------
================================================================================
Detailed log follows:

================================================================================
--------------------------------------------------------------------------------
Processing file 'detect_single_threaded_v1.py'
outputting to 'detect_single_threaded.py'
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------

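report.txt is the standard report emitted by TensorFlow's `tf_upgrade_v2` conversion tool. Judging from the file names in the log, a single-file run along these lines would produce it; the exact flags the author used are an assumption:

```sh
# Assumed invocation, reconstructed from the report's file names.
tf_upgrade_v2 \
    --infile detect_single_threaded_v1.py \
    --outfile detect_single_threaded.py \
    --reportfile report.txt
```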
38 changes: 38 additions & 0 deletions reportfile.txt
@@ -0,0 +1,38 @@
TensorFlow 2.0 Upgrade Script
-----------------------------
Converted 3 files
Detected 0 issues that require attention
--------------------------------------------------------------------------------
================================================================================
Detailed log follows:

================================================================================
================================================================================
Input tree: 'utils-v1/'
================================================================================
--------------------------------------------------------------------------------
Processing file 'utils-v1/detector_utils.py'
outputting to 'utils/detector_utils.py'
--------------------------------------------------------------------------------

41:23: INFO: Renamed 'tf.GraphDef' to 'tf.compat.v1.GraphDef'
42:13: INFO: Renamed 'tf.gfile.GFile' to 'tf.io.gfile.GFile'
46:15: INFO: Renamed 'tf.Session' to 'tf.compat.v1.Session'
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Processing file 'utils-v1/label_map_util.py'
outputting to 'utils/label_map_util.py'
--------------------------------------------------------------------------------

116:9: INFO: Renamed 'tf.gfile.GFile' to 'tf.io.gfile.GFile'
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Processing file 'utils-v1/__init__.py'
outputting to 'utils/__init__.py'
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------

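The `Input tree: 'utils-v1/'` line indicates this second report came from the tool's directory mode. A sketch of that invocation, with the names taken from the log and the flags again an assumption:

```sh
# Assumed invocation for the whole-directory conversion.
tf_upgrade_v2 \
    --intree utils-v1/ \
    --outtree utils/ \
    --reportfile reportfile.txt
```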
Empty file added utils-v1/__init__.py
Empty file.