State of the Art: Faster R-CNN
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances such as SPPnet and Fast R-CNN reduced the running time of these detection networks, exposing region proposal computation as the bottleneck. Faster R-CNN introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, making region proposals nearly cost-free. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. It is trained end-to-end to generate high-quality region proposals, which Fast R-CNN then uses for detection. In the now-popular terminology of neural networks with 'attention' mechanisms, the RPN tells the unified network where to look. With the very deep VGG-16 model, the detection system runs at 5 fps (including all steps) on a GPU while achieving state-of-the-art detection accuracy on the PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO datasets with only 300 proposals per image. [Faster R-CNN Real Time Object Detection]
OpenCV-Based Object Detection
Computer vision and video detection systems have an immense variety of targeted applications, such as motion detection, optical-flow estimation, and, most importantly, object tracking. These uses span an extremely broad range of engineering fields, with increasing reliance on object tracking and recognition in biomedical settings as well as in everyday life (such as face-based phone unlocking). Because OpenCV represents every image and video frame as an array of values, detecting certain behaviors of certain objects, or manipulating the appearance of an image, becomes straightforward to accomplish. Detection algorithms rely on mathematical transforms applied on a pixel-by-pixel and frame-by-frame basis.
Digital Image Processing and Object Detection
There are a multitude of simple implementations of digital image processing with computer vision. Previous projects by members of this group include a colorblind-transform research paper documenting the possible advantages colorblind individuals have in identifying camouflaged objects. That work combined several edge detection algorithms with color transformations in image space to simulate varying degrees of protanopia and deuteranopia. Applications in real space have also been explored, such as a laser keyboard: using an infrared camera to detect laser light reflected from a planar line laser along a surface, OpenCV mapped the reflected light into space. By forming a chessboard coordinate grid and blobifying any reflected specks of light into their respective chessboard cells, this produced a very simple but efficient implementation of object localization.
There are a multitude of uses in industry as well, such as identifying facial features or tracking traffic flow rates. Iris recognition, license plate recognition, and assisted surgery all rely heavily on computer vision and object recognition.
Summary of Methods
Implementation of computer vision depends heavily on the target application. For some large, overarching problems, the solution can be found in a heavily simplified version of the same problem. For example, detecting a stove that has been left on boils down to finding an abundance of blue in a processed video frame (assuming the stovetop flame burns blue). This becomes an extremely simple problem: searching a filtered image for a count of blue pixels that exceeds a certain threshold. Other implementations are far more complex, such as SpaceX's Falcon landings, self-driving cars, or applications such as handprint recognition. These rely on autonomous systems that must respond to a set of conditions independently and be equipped to deal with many situations.
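The stovetop example above can be sketched in a few lines. The channel thresholds and the `min_pixels` cutoff are hypothetical values an application would have to tune:

```python
import numpy as np

def flame_visible(frame_bgr, blue_min=200, other_max=80, min_pixels=50):
    """Count pixels that look 'flame blue' and compare against a cutoff.

    Hypothetical policy: a pixel counts as blue when its B channel is high
    and its G/R channels are low; min_pixels is the alarm threshold.
    """
    b, g, r = frame_bgr[..., 0], frame_bgr[..., 1], frame_bgr[..., 2]
    blue = (b >= blue_min) & (g <= other_max) & (r <= other_max)
    return int(blue.sum()) >= min_pixels

# Synthetic 100x100 BGR frame with a 10x10 blue patch (100 blue pixels).
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[45:55, 45:55] = (255, 0, 0)
print(flame_visible(frame))  # True: 100 blue pixels >= 50
```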
However, many of these specific applications are distilled from a common procedure applied to a video for image processing.
Video Acquisition
First, a video that satisfies a set of basic requirements is acquired. Depending on the sensors used, the data can be ordinary still images or much more complex 3D representations of a real scene. These are ordinarily stored as frames of a video, where each frame is a set of pixels corresponding to light values in a color space such as RGB or YUV.
Processing and Extraction
This is where things start to get complicated. The video must be pre-processed in such a way that the information of interest can be extracted. For example, certain features in a video become much more obvious after an edge detection algorithm is applied. Sampling specific regions of interest in a video can also help narrow down detection later on. Blobs and points can be identified, as well as more complex properties such as direction of motion, texture, or depth of field.
Higher Level Processing and Detection
This can be achieved in several ways. Faster R-CNN relies on a pretrained model capable of distinguishing a car's pixel matrix from any other object. Haar cascades achieve the same effect, pinpointing within a localized area a pattern that most likely corresponds to the edges, shape, and behavior of a car. The processing applied to the data must satisfy such a model, along with a set of assumptions about general object parameters, such as size.
Image Recognition
Using the now-processed data, a detected object can be segmented into different categories depending on the processed results, any identified features, and any additional features of interest.
Confirmation
Finally, given the results, final decisions must be made, such as pass/fail on certain inspections and thresholding of results. Matches can also be confirmed, and annotation interfaces can additionally be used to help train a deep learning model on what constitutes an acceptable result.
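The pass/fail thresholding step can be sketched as a small decision function. The score and count cutoffs, and the `label`/`score` record shape, are hypothetical and would be set by the application:

```python
def confirm(detections, min_score=0.8, min_count=1):
    """Final pass/fail decision: are there enough confident detections?

    Hypothetical policy: keep detections scoring at or above min_score,
    pass when at least min_count survive. A real pipeline might also log
    the kept matches for annotation and later retraining.
    """
    kept = [d for d in detections if d["score"] >= min_score]
    return ("pass" if len(kept) >= min_count else "fail"), kept

verdict, kept = confirm([{"label": "car", "score": 0.93},
                         {"label": "car", "score": 0.41}])
print(verdict, len(kept))  # pass 1
```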