
Developer Guide


Source Files

YOLO.swift > Swift interface and post-processing for YOLO object detection, producing rectangular bounds for objects of known classes, with four TensorFlow models to choose from.

FaceNet.swift > Swift interface and post-processing for FaceNet face recognition, converting face images to n-dimensional embeddings that can be used as feature-vector inputs to classifiers. Square face bounds are obtained with the native iOS face detector.

Inception.swift > Swift interface for Inception object recognition, producing a list of object classes (labels) and the probability that each class exists in the frame image.

Jetpac.swift > Swift interface for Jetpac’s DeepBeliefSDK, producing a list of labels and probabilities like Inception, but faster and less accurate.

kNN.swift > Swift interface for a kNN classifier, returning a class number and the minimum distance from a test feature vector to a collection of samples.

tfWrap.mm & .h > Objective-C wrapper around the TensorFlow library, with a session ready to run on images; used by YOLO, FaceNet, and Inception.

tfkNN.mm & .h > Objective-C wrapper around the TensorFlow library, with a session ready to run on the feature vectors of the kNN classifier; used by kNN.swift.

ViewController.swift > an example user interface using the detection and recognition capabilities.

Transparency.swift > a user interface element used in ViewController to draw colored rectangles.

Camera.swift > extension to ViewController that creates the AVCapture session and obtains image frames from the camera while previewing video output on a “WorldView” layer. Its key function is captureOutput( ), which runs the frameProcessing( ) task on each camera frame. By changing the closure assigned to the frameProcessing variable, you can switch the task currently running on camera frames (see the sketch below).
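
For example, switching the camera task might look like this (a minimal sketch; the closure’s exact type and the model instance name are assumptions, not the literal enVision code):

```swift
// Reassign the per-frame task: here, run a previously loaded YOLO model.
frameProcessing = { frame in
    let objects = yolo.run(image: frame)  // `yolo` loaded earlier via load( )
    // post-process or draw `objects` on screen
}
```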

TypeExtensions.swift > a personal collection of useful utilities and extension functions for common tasks in Swift programming, used sparingly in the code.

Main Classes

Found in files [ YOLO.swift ], [ FaceNet.swift ], [ Inception.swift ], [ Jetpac.swift ], [ kNN.swift ].

They wrap the recognition and detection capabilities in a consistent Swift interface and provide post-processing for the raw neural network outputs.

Each of these classes has three interface functions: load( ), run( ), and clean( ).

Before using a detector/recognizer, call load( ) to load the corresponding network into memory, then use it by calling run( ); when you no longer need it, call clean( ).
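
A minimal sketch of this lifecycle, using Inception as an example (argument labels and return types are assumptions; check the actual class headers):

```swift
let inception = Inception()
inception.load()                               // load the network into memory

// run( ) on a frame image returns (label, probability) pairs;
// `frame` stands for a camera frame image (e.g. from Camera.swift)
let predictions = inception.run(image: frame)
for (label, probability) in predictions where probability > 0.5 {
    print("\(label): \(probability)")
}

inception.clean()                              // free the memory when done
```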

class YOLO (Swift) “Object detection”

load(model) takes a number to specify which of the four available YOLO models to load.

run(image) takes a whole frame image and outputs an array of (label, probability, box, object image).

clean( ) frees the memory.
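
A hedged usage sketch following the interface above (the model index, argument labels, and tuple field names are assumptions):

```swift
let yolo = YOLO()
yolo.load(model: 1)                   // pick one of the four available YOLO models

let objects = yolo.run(image: frame)  // [(label, probability, box, object image)]
for (label, probability, box, snap) in objects where probability > 0.3 {
    // draw `box` on the Transparency layer, show `label`,
    // or pass `snap` to Jetpac's run2( ) for kNN classification
    print("\(label) at \(box): \(probability)")
}

yolo.clean()
```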

class FaceDetector (Swift) “Face detection”

wraps the CIDetector of type CIDetectorTypeFace.

extractFaces(image) takes a whole frame image and outputs an array of (face snap, box rectangle in screen coordinates, whether a smile was detected).

class FaceNet (Swift) “Face recognition”

load( ) loads the single available model.

run(image) takes a face snap image (from the iOS face detector) and outputs an array of numbers as a feature vector to be used for classification (e.g. by kNN).

clean( ) frees the memory.
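
A sketch of the face pipeline these two classes imply (tuple field names and argument labels are assumptions):

```swift
let detector = FaceDetector()
let facenet = FaceNet()
facenet.load()

// extractFaces returns (face snap, box, smile flag) tuples
for (snap, box, smiling) in detector.extractFaces(image: frame) {
    let embedding = facenet.run(image: snap)  // n-dimensional feature vector
    print("face at \(box), smiling: \(smiling), \(embedding.count) dims")
    // feed `embedding` to a classifier such as kNN (see below)
}

facenet.clean()
```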

class Inception (Swift) “Object recognition”

load( ) loads the default Inception v3 model.

loadRetrained( ) loads retrained models.

run(image) takes a whole frame image and outputs an array of (label, probability).

clean( ) frees the memory.

class Jetpac (Swift) “Object recognition”

load( ) uses the jpcnn_ function calls from DeepBeliefSDK to load the .ntwk file.

run(image) takes a whole frame image and outputs an array of (label, probability).

run2(image) takes a snap image and outputs an array of numbers as a feature vector to be used for classification (e.g. by kNN).

clean( ) frees the memory.

class kNN (Swift) “Feature classification”

load( ) loads the kNN model.

run(x, samples, classes) takes a feature vector x, an array of feature vectors (samples), and an array of corresponding integer class numbers (classes); it outputs the class closest to x and the minimum Euclidean distance between x and the samples of that class.

clean( ) frees the memory.
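
A hedged sketch putting kNN behind FaceNet, as in the recognition flow above (the variable names and argument labels are assumptions):

```swift
let knn = kNN()
knn.load()

// `embedding` comes from FaceNet.run( ); `storedVectors`/`storedClasses`
// stand for the saved sample vectors and their class numbers
let (bestClass, distance) = knn.run(x: embedding, samples: storedVectors, classes: storedClasses)
if distance < fnetThreshold {          // fnetThreshold: see Threshold Steppers below
    print("recognized class \(bestClass), distance \(distance)")
}

knn.clean()
```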

Auxiliary classes

class tfWrap (ObjC)

TensorFlow wrapper for all detector models that work on images.

class tfkNN (ObjC)

TensorFlow wrapper for kNN model that works on a feature vector and other sample vectors.

UI ViewController

An example of using the enVision classes in a simple user interface to show their capabilities. Replace it with your own user interface code.

It contains multiple extensions to ViewController, each corresponding to a certain UI element, and is divided into marked sections (//MARK:) that can be accessed from Xcode's jump bar:

YOLO/FaceNet/Inception/Jetpac: sections where the Main classes’ capabilities are used directly.

Menu UI: for the model selection menu. Each model needs to load( ) and to assign a closure to frameProcessing that eventually calls the model’s run( ) to process each frame.

Gestures: where taps and presses on the screen and data slots are processed.

Data Slots UI: for the five previews at the bottom of the screen and the list of images in each. Each slot corresponds to a UIImageView (the preview), an array of features generated with FaceNet or Jetpac, a label, and a photo.

Drawing Transparency: using a “Transparency” object to draw colored rectangles.

Threshold Steppers: the +/- buttons that set the thresholds: yoloThreshold, jyoloThreshold (for kNN after Jetpac on YOLO snaps), and fnetThreshold (for kNN after FaceNet).

Prediction list UI: the label/probability list displayed with Inception and Jetpac. It uses a decayValue factor to keep list updates smooth and less vulnerable to abrupt noise (see the sketch below).
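
One plausible reading of the decayValue smoothing is an exponential moving average over per-frame probabilities; this is a sketch under that assumption, not the actual enVision code:

```swift
let decayValue = 0.8                   // assumed smoothing factor in 0...1
var smoothed: [String: Double] = [:]

// Blend each fresh (label, probability) pair into a running average,
// so a single noisy frame cannot swing the displayed list.
func updatePredictions(_ fresh: [(label: String, probability: Double)]) {
    for (label, probability) in fresh {
        let previous = smoothed[label] ?? 0
        smoothed[label] = decayValue * previous + (1 - decayValue) * probability
    }
}
```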


Good Luck,