# Convolution Concept

1. Concept
Convolution is used in signal processing and analysis.
We can use convolution to construct the output of system for any arbitrary input signal, if we know the impulse response of system.
2. Definition
The mathematical form of convolution in discrete time domain;
$y[n]=x[n]*h[n]=\sum_{k=-\infty }^{+\infty }x[k].h[n-k]$
where:
$x[n]$ is input signal
$h[n]$ is impulse response, $h[n-k]$ time shifted
$y[n]$ is output
* denotes convolution
In order to understand the meaning of convolution, we have to take a look on impulse response and impulse decomposition.
3. Impulse decomposition.
The input signal is decomposed into simple additive components (in general, weighted sum of basis signals). And the system response of the input signal is the additive output of these components passed through the system.
We often use impulse (delta) functions for the basis signals.
Example how a signal is decomposed into a set of impulse (delta) functions:
Figure: input signal decomposition
An impulse (delta) function : $\delta [n]=\left\{\begin{matrix} 1, & n=0\\ 0, & n\neq 0 \end{matrix}\right.$
So
$x[0] = x[0].\delta [n] = 2\delta [n]$
$x[1] = x[1].\delta [n-1] = 3\delta [n-1]$
$x[2] = x[2].\delta [n-2] = 1\delta [n-2]$
because
$\delta [n-1]$ is 1 at n=1 and zeros at others.
Then
$x[n]$ can be represented by adding 3 shifted and scaled impulse functions.
$x[n] = x[0].\delta [n] +x[1].\delta [n-1]+x[2].\delta [n-2]=\sum_{k=-\infty }^{+\infty }x[k].h[n-k]$
4. Impulse Response
It is the output of a system resulting from an impulse input.
Figure: h[n] is impulse response
Figure: the response of a time-shifted impulse function is also a time-shifted response, shifted by K
Figure: a scaled in input signal causes a scaled in the output (linear), c is constant
Figure: 3 additive components input, 3 additive components output
Figure: convolution equation
Finally,
$x[n]=\sum x[k].\delta [n-k]\\ y[n]=\sum x[k].h [n-k]$
It means a signal is decomposed into a set of impulses and the output signal can be computed by adding the scaled and shifted impulse responses.
Convolution works on the linear and time invariant system.
5. Convolution in 1D
Consider example:
$x[n] = \{3, 4, 5\}$
$h[n] = \{2, 1\}$
Figure: x[n]
Figure: h[n]
From the equation of convolution:
$y[n]=x[n]*h[n]=\sum_{k=-\infty }^{+\infty }x[k].h[n-k]$
We have:
$y[0]=x[0].h[0-0]=3x2=6$
$y[1]=x[0].h[1-0]+x[1].h[1-1]=3x2+4x2=14$
$y[2]=x[0].h[2-0]+x[1].h[2-1]+x[2].h[2-2]=3x0+4x1+5x2=14$
$y[3]=x[0].h[3-0]+x[1].h[3-1]+x[2].h[3-2]+x[3].h[3-3]=4x0+5x1=5$
$y[4]=x[0].h[4-0]+x[1].h[4-1]+x[2].h[4-2]+x[3].h[4-3]+x[4].h[4-4]=0$
Figure: y[n]
The pattern for programming:
$y[i]=x[i]h[0]+x[i-1]h[1]+...+x[i-k]h[k]$
If you face the boundary such as x[-1],x[-2] you can skip the convolution at the boundary or pad 0 to them.
6. Convolution in 2D
2D convolution is extension of 1D convolution. It convolves both horizontal and vertical directions. It is used in image processing.
Figure: impulse function in 2D
The impulse (delta) function in 2D:
$\delta [m,n]=\left\{\begin{matrix} 1, & m,n=0\\ 0, & m,n\neq 0 \end{matrix}\right.$
Here, the matrix uses [column, row] form.
A signal can be decomposed into a sum of scaled and shifted impulse functions.
$x[m,n]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].\delta [m-i,n-j]$
and the output is:
$y[m,n]=x[m,n]*h[m,n]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].h [m-i,n-j]$
The index of kernel matrix can be negative:
Figure: a 3x3 matrix with indices -1,0,1, with origin is at middle of kernel.
Example:
$y[1,1]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].h [1-i,1-j]=\\ x[0,0].h[1-0,1-0]+x[1,0].h[1-1,1-0]+x[2,0].h[1-2,1-0]\\ + x[0,1].h[1-0,1-1]+x[1,1].h[1-1,1-1]+x[2,1].h[1-2,1-1]\\ + x[0,2].h[1-0,1-2]+x[1,2].h[1-1,1-2]+x[2,2].h[1-2,1-2]$

# Part 7: Studying Digital Image Processing with OpenCV-Python

This part focuses on Video Analysis.
1. Meanshift
1.1 Concept
Consider the example:
The blue circle "C1", its original center is blue rectangle "C1_o". Find the centroid of the points inside that window "C1_r" (small blue circle). Circle center and centroid do not match. Move blue circle such that center of new circle matches with previous centroid. Again find the new centroid. repeat the process until center of new circle and its centroid falls on the same location. It is green circle "C2" and it has maximum number of points (maximum pixel distribution). This is meanshift algorithm.
1.2 Meanshift in OpenCV
OpenCv provides cv2.meanShift() and we have to do:
+ Setup the target
+ Provide initial location of window
+ Find its histogram (only Hue is considered) so that we can backproject the target on each frame for  meanshift calculation.
2. Camshift
2.1 Concept
There is a problem with previous demo. Our window always has the same size when object is farther away and it is very close to camera. We need to adapt the window size with size and rotation of the target. We use camshift. It updates the size of the window and also calculates the orientation of best fitting ellipse to it. Then it applies the meanshift with new scaled search window.
2.2 Camshift in OpenCV
OpenCV provides cv2.CamShift() returns a rotated rectangle and box parameters.
Let 's make a demo using Camshift:
Tracking the bird in Flappy Bird game. Here are resources
+ The video for the demo is here.
+ Flappy bird template for using SIFT feature matching:
Figure: Flappy bird template
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 import numpy as np import cv2 MIN_MATCH_COUNT = 10 cap = cv2.VideoCapture('flappybird.mp4') ret,frame = cap.read() img2 = cv2.imread('bird.png') # trainImage gray1= cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY) gray2= cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY) # Initiate SIFT detector sift = cv2.xfeatures2d.SIFT_create() # find the keypoints and descriptors with SIFT kp1, des1 = sift.detectAndCompute(gray1,None) kp2, des2 = sift.detectAndCompute(gray2,None) # BFMatcher with default params bf = cv2.BFMatcher() matches = bf.knnMatch(des1,des2, k=2) # Apply ratio test good = [] for m,n in matches: if m.distance < 0.5*n.distance: good.append([m]) if len(good)>MIN_MATCH_COUNT: #find the object to track src_pts = np.float32([ kp1[m[0].queryIdx].pt for m in good ]).reshape(-1,1,2) c,r,w,h = cv2.boundingRect(src_pts) track_window = (c,r,w,h) # set up the ROI for tracking roi = frame[r:r+h, c:c+w] hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV) #these range was gotten by trial and error using the Pain tool #https://tech.deepumohan.com/2011/04/how-to-get-html-color-code-from-image.html mask = cv2.inRange(hsv_roi, np.array((0., 60.,32.)), np.array((130.,130.,130.))) roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180]) cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX) # Setup the termination criteria, either 10 iteration or move by atleast 1 pt term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 ) while(1): if ret == True: hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV) dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1) # apply meanshift to get the new location ret, track_window = cv2.CamShift(dst, track_window, term_crit) # Draw it on image pts = cv2.boxPoints(ret) pts = np.int0(pts) img2 = cv2.polylines(frame,[pts],True, 255,2) cv2.imshow('img2',img2) k = cv2.waitKey(60) & 0xff if k == 27: break else: cv2.imwrite(chr(k)+".jpg",img2) ret ,frame = cap.read() else: break cv2.destroyAllWindows() cap.release() else: print "Not enough matches are found - %d/%d" % (len(good),MIN_MATCH_COUNT) 
3. Optical Flow
3.1 Concept
Optical flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of object or camera. Optical flow has assumptions:
+ The pixel intensities of an object do not change between consecutive frames.
+ Neighbor pixels have similar motion.
Optical flow has many applications in areas like :
+ Video Compression
+ Video Stabilization
+ Measure the velocities of objects in the video. In general, moving objects that are closer to the camera will display more apparent motion than distant objects that are moving at the same speed.
Figure: a ball moving in 5 consecutive frames
The equation of Optical Flow is:
$f_{x}u+f_{y}v+f_{t}=0\\ f_{x}=\frac{\delta f}{\delta x};f_{y}=\frac{\delta f}{\delta y}\\ u=\frac{\delta x}{\delta t};v=\frac{\delta y}{\delta t}$
$f_{x};f_{y}$ are image gradient, $f_{y}$ is time gradient. $(u,v)$ are unknown.
There are several methods to solve this problem.
It assumes that the flow is essentially constant in a local neighborhood of the pixel under consideration, and solves the basic optical flow equations for all the pixels in that neighborhood, by the least squares criterion.
3.3 Optical Flow in OpenCV
OpenCV provides cv2.calcOpticalFlowPyrLK(). We have to do:
+ Using cv2.goodFeaturesToTrack() to find points to track.
+ Pass the previous frame, previous points and next frame to cv2.calcOpticalFlowPyrLK()
+ Function returns next points and status number with value of 1 if next point is found, else zero
Apply this for the Flappy Bird video above:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 import numpy as np import cv2 cap = cv2.VideoCapture('flappybird.mp4') # params for ShiTomasi corner detection feature_params = dict( maxCorners = 100, qualityLevel = 0.3, minDistance = 7, blockSize = 7 ) # Parameters for lucas kanade optical flow lk_params = dict( winSize = (15,15), maxLevel = 2, criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03)) # Create some random colors color = np.random.randint(0,255,(100,3)) # Take first frame and find corners in it ret, old_frame = cap.read() old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY) p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params) # Create a mask image for drawing purposes mask = np.zeros_like(old_frame) refind = 0 while(1): ret,frame = cap.read() frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) refind = refind + 1 #re-find feature to track after 100 frames in case object disappears if(refind == 100): old_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params) mask = np.zeros_like(frame) refind = 0 # calculate optical flow p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params) # Select good points good_new = p1[st==1] good_old = p0[st==1] # draw the tracks for i,(new,old) in enumerate(zip(good_new,good_old)): a,b = new.ravel() c,d = old.ravel() mask = cv2.line(mask, (a,b),(c,d), color[i].tolist(), 2) frame = cv2.circle(frame,(a,b),5,color[i].tolist(),-1) img = cv2.add(frame,mask) cv2.imshow('frame',img) k = cv2.waitKey(30) & 0xff if k == 27: break # Now update the previous frame and previous points old_gray = frame_gray.copy() p0 = good_new.reshape(-1,1,2) cv2.destroyAllWindows() cap.release() 
4. Background Subtraction
Consider the application which counts visitors or vehicles. We need to extract the person or vehicles alone or extract the moving foreground from static background. OpenCV provides some methods:
4.1 BackgroundSubtractorMOG
It is a Gaussian Mixture-based Background/Foreground Segmentation Algorithm. This method uses the characteristic that the probable background colours are the ones which stay longer and more static.
  1 2 3 4 5 6 7 8 9 10 11 12 13 import numpy as np import cv2 cap = cv2.VideoCapture('flappybird.mp4') fgbg = cv2.bgsegm.createBackgroundSubtractorMOG() while(1): ret, frame = cap.read() fgmask = fgbg.apply(frame) cv2.imshow('frame',fgmask) k = cv2.waitKey(30) & 0xff if k == 27: break cap.release() cv2.destroyAllWindows() 
4.2 BackgroundSubtractorMOG2
This method improves the adaptability to varying scenes due illumination changes compare to previous method.
  1 2 3 4 5 6 7 8 9 10 11 12 13 import numpy as np import cv2 cap = cv2.VideoCapture('flappybird.mp4') fgbg = cv2.createBackgroundSubtractorMOG2() while(1): ret, frame = cap.read() fgmask = fgbg.apply(frame) cv2.imshow('frame',fgmask) k = cv2.waitKey(30) & 0xff if k == 27: break cap.release() cv2.destroyAllWindows() 

# Part 6: Studying Digital Image Processing with OpenCV-Python

1. What is feature?
Consider the image:
Figure: example of feature (from OpenCV)
+ Blue rectangle is flat area. If we move the blue rectangle, it looks the same. So it is not special.
+ Black rectangle is on edge. If we move it in vertical direction, it changes. If we move it along the edge, it looks the same.
+ Red rectangle, it is a corner. if we move it, it looks different. So it is special. And corners are considered to be good features in an image.
Feature Detection : finding regions in images which have maximum variation when moved (by a small amount) in all regions around it.
Feature Description : describe the region around the feature so that the feature can be found in other images (we can stitch them).
Note: install this package before starting. just run:
pip install opencv-contrib-python
2. Harris Corner Detection
Refer to A Combined Corner and Edge Detector paper by Chris Harris & Mike Stephens
This algorithm basically finds the difference in intensity for a displacement of (u,v) in all directions.
$E(u,v)=\sum_{x,y}^{ }w(x,y)[I[x+u,y+v]-I[x,y]]^{2}$
$w(x,y)$: window function
$I[x+u,y+v]$: shifted intensity
$I[x,y]$: intensity
Then we maximize this function to find corners using Taylor Expansion and Sobel derivatives in x and y.
A score function is defined to determine if a window can contain a corner or not. Its formula:
$R=\lambda _{1}\lambda _{2}-k(\lambda _{1}+\lambda _{2})^{2}$
The result of Harris Corner Detection is a gray scale image with scores. Thresholding it to mark the corners in the image.
OpenCV provides  cv2.cornerHarris(). Its arguments are :
+ img - Input image, it should be grayscale and float32 type.
+ blockSize - It is the size of neighborhood considered for corner detection
+ ksize - Aperture parameter of Sobel derivative used.
+ k - Harris detector free parameter in the equation.
Apply Harris algorithm for image below:
Figure: input image to apply Harris
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 import cv2 import numpy as np filename = 'images/wall.png' img = cv2.imread(filename) gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) gray = np.float32(gray) dst = cv2.cornerHarris(gray,3,9,0.04) # Threshold for an optimal value, it may vary depending on the image. img[dst>0.1*dst.max()]=[0,0,255] cv2.imshow('dst',img) cv2.waitKey(0) cv2.destroyAllWindows() 
Figure: adjusting arguments to achieve good result
3. Shi-Tomasi Corner Detector & Good Features to Track
Shi-Tomasi score function was proposed:
$R=min(\lambda _{1},\lambda _{2})$
If it is a greater than a threshold value, it is considered as a corner.
Input image should be a gray scale image.
OpenCV has a function, cv2.goodFeaturesToTrack(). Its arguments:
+ Specify number of corners you want to find.
+ specify the quality level (between 0-1), which denotes the corner which quality is below this, will be rejected.
+ The minimum euclidean distance between corners detected.
Apply Shi-Tomasi for the image above:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 import numpy as np import cv2 img = cv2.imread('images/wall.png') gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) corners = cv2.goodFeaturesToTrack(gray,500,0.3,4) for i in corners: x,y = i.ravel() cv2.circle(img,(x,y),2, (0,0,255),-1) cv2.imshow('dst',img) cv2.waitKey(0) cv2.destroyAllWindows() 
Figure: adjusting arguments to achieve good result
4. Scale-Invariant Feature Transform (SIFT)
4.1 Concept
As its name, this algorithm will cover the case where image is scaled.
There are 4 steps:
Step 1: Scale-space Extrema Detection
Take the original image, calculate Difference of Gaussian (DoG) of it. It is obtained as the difference of Gaussian blurring of that image with two different scales (blur levels) $\sigma$ and $k\sigma$. Then, you resize the original image to half size (create another octaves) and do the similar process. As the figure below:

Figure: calculate DoG for octaves
Search for local maxima/minima in DoG images by taking one pixel in DoG image and compare with its 8 neighbours as well as 9 pixels in next scale and 9 pixels in previous scales. If it is a local extrema, it is a potential keypoint.
Figure: Search for local maxima/minima in DoG images
Step 2: Keypoint Localization
This step  eliminates any low-contrast keypoints and edge keypoints.
In order to eliminate potential keypoints, Taylor series expansion was used to calculate an intensity. If the intensity at this extrema is less than a threshold value, it is rejected. This threshold is called contrastThreshold in OpenCV.
In order to eliminate edges, 2x2 Hessian matrix was used to calculate a ratio. If this ratio is greater than a threshold, called edgeThreshold in OpenCV.
Step 3: Orientation Assignment
Assigned orientation to each keypoint to achieve invariance to image rotation.
The magnitude and orientation is calculated for all pixels around the keypoint. An orientation histogram with 36 bins covering 360 degrees is created. The highest peak and peaks above 80% of it in the histogram are considered to calculate the orientation. It creates keypoints with same location and scale, but different orientations.
Step 4: Keypoint Descriptor
Take a 16x16 neighborhood around the keypoint and devide it into 16 sub-blocks of 4x4 size. For each sub-block, 8 bin orientation histogram is created. So a total of 128 bin values are available. It is represented as a vector to form keypoint descriptor.
Step 5: Keypoint Matching
Keypoints between two images are matched by identifying their nearest neighbours.
4.2 SIFT in OpenCV
First we have to construct a SIFT object "sift"
sift.detect() function finds the keypoint in the images.
cv2.drawKeyPoints() draws the small circles on the location and orientation of keypoints.
sift.compute() which computes the descriptors from the keypoints
sift.detectAndCompute() find keypoints and descriptors in one step.
Apply SIFT for the image above:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 import cv2 import numpy as np img = cv2.imread('wall.png') gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) sift = cv2.xfeatures2d.SIFT_create(0,3,0.07,10,1.6) kp = sift.detect(gray,None) cv2.drawKeypoints(gray,kp,img,flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS) cv2.imshow('SIFT',img) cv2.waitKey(0) cv2.destroyAllWindows() 
Figure: output of SIFT
5. Speeded-Up Robust Features (SURF)
5.1 Concept
SURF is 3 times faster than SIFT. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint change and illumination change.
Instead of using DoG, SURF approximates LoG with Box Filter.
For orientation assignment, SURF uses wavelet responses in horizontal and vertical direction for a neighborhood of size 6s. In case rotation invariance is not required, Upright-SURF or U-SURF (upright flag in OpenCV) is used to speed up the process.
For feature description, SURF uses Wavelet responses in horizontal and vertical direction.
SURF feature descriptor has 64 dimensions and extended 128 dimension version. OpenCV supports both by setting the value of flag extended with 0 and 1 for 64-dim and 128-dim.
It uses sign of Laplacian to distinguishe bright blobs on dark backgrounds from the reverse situation. So in the matching step, only features that have the same type of contrast, will be compared, as image below:
Refer to SURF: Speeded Up Robust Features paper.
5.2 SURF in OpenCV
Apply SURF for image below:
Figure: input image for SURF
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 import cv2 import numpy as np img = cv2.imread('surf.png') gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) #here the Hessian threshold is 20000 surf = cv2.xfeatures2d.SURF_create(20000) #detect keypoints kp, des = surf.detectAndCompute(img,None) img2 = cv2.drawKeypoints(img,kp,None,(255,0,0),4) cv2.imshow('SURF',img2) cv2.waitKey(0) cv2.destroyAllWindows() 
Figure: adjusting arguments to achieve good result, and SURF act as a blobs detection (detect white objects in black background)
If Upright is enabled. We just add surf.setUpright(True)
Figure: All the orientations are shown in same direction
If using 128 dimensions version, just using surf.extended = True
6. Features from Accelerated Segment Test (FAST)
6.1 Concept
It is several times faster than other existing corner detectors.
It is not robust to high levels of noise.
It depends on a threshold.
Refer to Machine learning for high-speed corner detection.
The algorithm:
+ Select a test pixel p in the image and its intensity be Ip.
+ Select threshold value t.
+ Consider a circle of 16 pixels around the test pixel.
+ The pixel p is a corner if there are n contiguous pixels in the circle (white dash lines of 16 pixels) which are all brighter than Ip+t, or all darker than Ip−t.. n was chosen to be 12.
+ A high-speed test examines only the four pixels at 1, 9, 5 and 13. If p is a corner, then at least three of these must all be brighter than Ip+t or darker than Ip−t.
+ The full segment test examines all pixels in the circle.
+ There are several weaknesses:
++ It does not reject as many candidates for n < 12.
++ The choice of pixels is not optimal because its efficiency depends on ordering of the questions and distribution of corner appearances.
++ Results of high-speed tests are thrown away.
++ Multiple features are detected adjacent to one another.
First 3 points are addressed with a machine learning approach. Last one is addressed using non-maximal suppression.
6.2 FAST in OpenCV
Arguments:
+ Specify the threshold
+ non-maximum suppression to be applied or not.
+ the neighborhood, three flags are defined,  cv.FAST_FEATURE_DETECTOR_TYPE_5_8, cv.FAST_FEATURE_DETECTOR_TYPE_7_12 and cv.FAST_FEATURE_DETECTOR_TYPE_9_16.
Apply FAST for image:
Figure: input image for FAST
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 import numpy as np import cv2 as cv img = cv.imread('images/wall.png') gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY) # Initiate FAST object with default values fast = cv.FastFeatureDetector_create(55,True,cv.FAST_FEATURE_DETECTOR_TYPE_9_16 ) # find and draw the keypoints kp = fast.detect(gray,None) img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0)) # Print all default params print( "Threshold: {}".format(fast.getThreshold()) ) print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) ) print( "neighborhood: {}".format(fast.getType()) ) print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) ) cv.imwrite('fast_true.png',img2) # Disable nonmaxSuppression fast.setNonmaxSuppression(0) kp = fast.detect(img,None) print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) ) img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0)) cv.imwrite('fast_false.png',img3) 
Figure: no apply non-maximum suppression+adjusting threshold
Figure: apply non-maximum suppression+adjusting threshold
7. Binary Robust Independent Elementary Features (BRIEF)
7.1 Concept
BRIEF is a faster method feature descriptor calculation and matching. It also provides high recognition rate if there is not large in-plane rotation.
A vector with thousands of features consumes a lot of memory. Moreover, larger the memory, longer the time it takes for matching. These are not feasible for the low resource systems. So we need to compress it using methods such as PCA, LDA,  LSH (Locality Sensitive Hashing).
LSH converts SIFT descriptors from floating point numbers to binary strings. These binary strings are used to match features using Hamming distance.
BRIEF finds the binary strings directly without finding descriptors.
Refer to Binary Robust Independent Elementary Features paper.
From smoothed image, BRIEF takes $n_{d}(x,y)$ pairs. For each pair, compare the intensity of 2 pixels, if $I(p)<I(q)$ result is 1, else it is 0. Repeat for $n_{d}(x,y)$ pairs.
$n_{d}(x,y)$ can be 128, 256, 512. OpenCV represents it in bytes so the values will be 16, 32 and 64. Then using Hamming Distance to match these descriptors.
The paper recommends to use CenSurE to find the features and then apply BRIEF.
7.2 BRIEF in OpenCV
Apply FAST+BRIEF for the image above:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 import numpy as np import cv2 as cv img = cv.imread('wall.png') gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY) # Initiate FAST object with default values fast = cv.FastFeatureDetector_create(55,True,cv.FAST_FEATURE_DETECTOR_TYPE_9_16 ) # Initiate BRIEF extractor brief = cv.xfeatures2d.BriefDescriptorExtractor_create(64, True) # find and draw the keypoints kp = fast.detect(gray,None) # compute the descriptors with BRIEF kp, des = brief.compute(gray, kp) print brief.descriptorSize() img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0)) cv.imwrite('fast_true.png',img2) 
Figure: after applying BRIEF, comparing to previouys result
8. Oriented FAST and Rotated BRIEF (ORB)
Refer to ORB: An efficient alternative to SIFT or SURF paper.
ORB is an improvement of FAST keypoint detector and BRIEF descriptor. it implement many modifications to enhance the performance.
OpenCV provides cv2.ORB() or using feature2d common interface, with main arguments:
+ nFeatures : maximum number of features to be retained (by default 500)
+ scoreType : Harris score or FAST score to rank the features (by default, Harris score)
+ WTA_K : number of points that produce each element of the oriented BRIEF descriptor.
++ WTA_K=2, NORM_HAMMING distance is used for matching
++ WTA_K=3 or 4, NORM_HAMMING2 distance is used for matching
Apply ORB for image above:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 import cv2 import numpy as np img = cv2.imread('wall.png') gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) # Initiate ORB detector orb = cv2.ORB_create(300,1.1,4,10,0,2,1,31,10) # find the keypoints with ORB kp = orb.detect(img,None) # compute the descriptors with ORB kp, des = orb.compute(img, kp) # draw only keypoints location,not size and orientation img2 = cv2.drawKeypoints(img, kp, None, color=(255,0,0), flags=0) cv2.imshow('ORB',img2) cv2.waitKey(0) cv2.destroyAllWindows() 
Figure: output of ORB
9. Feature Matching
It takes the descriptor of one feature in first set and compare with all features in second set using specific distance calculation. And the closest one is returned.
OpenCV creates BFMatcher object using cv2.BFMatcher() with 2 optinal arguments:
+ normType : specify the distance measurement to be used.
++ cv2.NORM_L2 (default value) is for SIFT, SURF
++ cv2.NORM_HAMMING is for binary string based descriptors like ORB, BRIEF, BRISK
++ cv2.NORM_HAMMING2 is for ORB is using WTA_K == 3 or 4
+ crossCheck : the two features in both sets should match each other. The matched result with value (i,j) such that i-th descriptor in set A has j-th descriptor in set B as the best match and vice-versa.
+ BFMatcher.match() : returns the best match
+ BFMatcher.knnMatch() : returns k best matches
+ cv2.drawMatches() : draws the matches
+ cv2.drawMatchesKnn : draws all the k best matches
The DMatch object has following attributes:
+ DMatch.distance - Distance between descriptors. The lower, the better it is.
+ DMatch.trainIdx - Index of the descriptor in train descriptors
+ DMatch.queryIdx - Index of the descriptor in query descriptors
+ DMatch.imgIdx - Index of the train image.
Apply Matching with SIFT Descriptors for the image below:
Figure: image to search

Figure: this query image was cut + rotate + scale from original image
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 import numpy as np import cv2 from matplotlib import pyplot as plt img1 = cv2.imread('query.png') # queryImage img2 = cv2.imread('toy-story.jpg') # trainImage gray1= cv2.cvtColor(img1,cv2.COLOR_BGR2GRAY) gray2= cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY) # Initiate SIFT detector sift = cv2.xfeatures2d.SIFT_create() # find the keypoints and descriptors with SIFT kp1, des1 = sift.detectAndCompute(gray1,None) kp2, des2 = sift.detectAndCompute(gray2,None) # BFMatcher with default params bf = cv2.BFMatcher() matches = bf.knnMatch(des1,des2, k=2) # Apply ratio test good = [] for m,n in matches: if m.distance < 0.5*n.distance: good.append([m]) img3 = None # cv2.drawMatchesKnn expects list of lists as matches. img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,good,img3,flags=2) plt.imshow(cv2.cvtColor(img3, cv2.COLOR_BGR2RGB)),plt.show() 
Figure: Matching with SIFT Descriptors
10. Feature Matching + Homography to find Objects
How to find the object exactly on the source image?
We pass a set of points from both the images to cv2.findHomography() to find the perspective transformation of that object (It needs at least four points to find the transformation). Using RANSAC or LEAST_MEDIAN algorithm flag to eliminate abnormal data.
Then we use cv2.perspectiveTransform() to find the object.
Reuse the example above:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 import numpy as np import cv2 from matplotlib import pyplot as plt MIN_MATCH_COUNT = 10 img1 = cv2.imread('query.png') # queryImage img2 = cv2.imread('toy-story.jpg') # trainImage gray1= cv2.cvtColor(img1,cv2.COLOR_BGR2GRAY) gray2= cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY) # Initiate SIFT detector sift = cv2.xfeatures2d.SIFT_create() # find the keypoints and descriptors with SIFT kp1, des1 = sift.detectAndCompute(gray1,None) kp2, des2 = sift.detectAndCompute(gray2,None) # BFMatcher with default params bf = cv2.BFMatcher() matches = bf.knnMatch(des1,des2, k=2) # Apply ratio test good = [] for m,n in matches: if m.distance < 0.5*n.distance: good.append([m]) if len(good)>MIN_MATCH_COUNT: src_pts = np.float32([ kp1[m[0].queryIdx].pt for m in good ]).reshape(-1,1,2) dst_pts = np.float32([ kp2[m[0].trainIdx].pt for m in good ]).reshape(-1,1,2) M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0) #extract found ROI matchesMask = mask.ravel().tolist() h,w,d = img1.shape pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2) dst = cv2.perspectiveTransform(pts,M) #draw rectangle img2 = cv2.polylines(img2,[np.int32(dst)],True,255,3, cv2.LINE_AA) else: print "Not enough matches are found - %d/%d" % (len(good),MIN_MATCH_COUNT) matchesMask = None draw_params = dict(matchColor = (0,255,0), # draw matches in green color singlePointColor = None, matchesMask = matchesMask, # draw only inliers flags = 2) img3 = None # cv2.drawMatchesKnn expects list of lists as matches. img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,good,img3,flags=2) plt.imshow(cv2.cvtColor(img3, cv2.COLOR_BGR2RGB)),plt.show() 
Figure: Feature Matching + Homography to find Objects

# Part 5: Studying Digital Image Processing with OpenCV-Python

Here we apply Machine learning to Digital Image Processing
1. K-Nearest Neighbor (kNN)
1.1 Concept
kNN is a classification algorithm of supervised learning. The idea is to search for closest match of the test data in feature space.
A feature space is a space where all data are projected. Ex: in 2D coordinate space. each data has two features according to x and y coordinates. N features need N-dimensional space.
Figure: knn example from OpenCV
There are 2 classes: Blue Squares and Red Triangles. Now there is new Green circle. How it should be classify to one of these Blue/Red classes.
One method is to check which is its nearest neighbour. From the image, it is the Red Triangle. So it should be added into Red Triangle class. This method is called simply Nearest Neighbour. But if there are lot of Blue Squares near to it. So Blue Squares have more strength in that locality than Red Triangle. So just checking nearest one is not sufficient. We have to check some k nearest classes. Consider 3 cases:
+ k=3, there are two Red and one Blue. So it should be added to Red class.
+ k=7, there are 2 Red and 5 Blue. Now it should be added to Blue class.
+ k = 4, there are 2 Red and 2 Blue. It is confused. So we should take k as an odd number.
In case k=4, it is confused. In order to solve it, we can consider the 2 Red are more closer to Green than the 2 Blue. It means we give weights to each class depending on their distance to the Green. So those are near to Green has higher weights and vice versa. So which class gets highest total weights, Green will be added to that class.
1.2 kNN in OpenCV
Let 's apply kNN for the image above:
Figure: input image for kNN
We just aplly some known techniques so here are steps to do:
- Separate Red, Blue and Green objects into 3 classes.
- Assign co-ordinate for each object in each class using its contour and centroid.
- Apply kNN for training data and test it with the Green object.
Here label "0" is stand for Red object and label "1" stand for Blue object. And we have to test label of Green object is "0" or "1".
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 import cv2 as cv import matplotlib.pyplot as plt import numpy as np trainData = [] responses = [] #load the image img = cv.imread('images/knn_theory.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV) # define threshold range of red color in HSV lower_red = np.array([0, 100, 100]) upper_red = np.array([10,255,255]) lower_green = np.array([50,100,100]) upper_green = np.array([70,255,255]) lower_blue = np.array([110,100,100]) upper_blue = np.array([130,255,255]) #masks mask_red = cv.inRange(hsv, lower_red, upper_red) mask_green = cv.inRange(hsv, lower_green, upper_green) mask_blue = cv.inRange(hsv, lower_blue, upper_blue) #get Red mask = mask_red res = cv.bitwise_and(gray,gray, mask= mask) ret,thresh = cv.threshold(res,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU) _, contours, hierarchy = cv.findContours(thresh,cv.RETR_TREE,cv.CHAIN_APPROX_NONE) #for each Red object calculate centroid as coordinate for cnt in contours: M = cv.moments(cnt) cx = float(M['m10']/M['m00']) cy = float(M['m01']/M['m00']) trainData.append([cx, cy]) #Red object with label 0 responses.append(0) #get Blue mask = mask_blue res = cv.bitwise_and(gray,gray, mask= mask) ret,thresh = cv.threshold(res,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU) _, contours, hierarchy = cv.findContours(thresh,cv.RETR_TREE,cv.CHAIN_APPROX_NONE) #for each Blue object calculate centroid as coordinate for cnt in contours: M = cv.moments(cnt) cx = float(M['m10']/M['m00']) cy = float(M['m01']/M['m00']) trainData.append([cx, cy]) #Blue object with label 1 responses.append(1) #convert data to numpy for calculation trainData = np.array(trainData, dtype=np.float32) responses = np.array(responses, dtype=np.float32) #plot it red = trainData[responses.ravel()==0] plt.scatter(red[:,0],red[:,1],80,'r','^') blue = trainData[responses.ravel()==1] plt.scatter(blue[:,0],blue[:,1],80,'b','s') #get Green mask = mask_green res = cv.bitwise_and(gray,gray, mask= mask) ret,thresh = cv.threshold(res,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU) _, contours, hierarchy = cv.findContours(thresh,cv.RETR_TREE,cv.CHAIN_APPROX_NONE) M = cv.moments(contours[0]) new_comerx = float(M['m10']/M['m00']) new_comery = float(M['m01']/M['m00']) plt.scatter(new_comerx,new_comery,80,'g','o') newcomer = np.array([[new_comerx, new_comery]]).astype(np.float32) #Apply kNN knn = cv.ml.KNearest_create() knn.train(trainData, cv.ml.ROW_SAMPLE, responses) ret, results, neighbours ,dist = knn.findNearest(newcomer, 3) print( "result: {}\n".format(results) ) print( "neighbours: {}\n".format(neighbours) ) print( "distance: {}\n".format(dist) ) plt.show() 
Figure: for k=3, Green object belongs to to Red
2. Support Vector Machine (SVM)
2.1 Concept
The concept of SVM can be refered here.
2.2 SVM in OpenCV
Let 's apply SVM for image below:
Figure: input image for SVM
We do similar steps as for kNN.
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 import cv2 as cv import matplotlib.pyplot as plt import numpy as np trainData = [] responses = [] #load the image img = cv.imread('images/svm.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV) # define threshold range of red color in HSV lower_red = np.array([0, 100, 100]) upper_red = np.array([10,255,255]) lower_green = np.array([50,100,100]) upper_green = np.array([70,255,255]) lower_blue = np.array([110,100,100]) upper_blue = np.array([130,255,255]) #masks mask_red = cv.inRange(hsv, lower_red, upper_red) mask_green = cv.inRange(hsv, lower_green, upper_green) mask_blue = cv.inRange(hsv, lower_blue, upper_blue) #get Red mask = mask_red res = cv.bitwise_and(gray,gray, mask= mask) ret,thresh = cv.threshold(res,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU) _, contours, hierarchy = cv.findContours(thresh,cv.RETR_TREE,cv.CHAIN_APPROX_NONE) #for each Red object calculate centroid as coordinate for cnt in contours: M = cv.moments(cnt) cx = float(M['m10']/M['m00']) cy = float(M['m01']/M['m00']) trainData.append([cx, cy]) #Red object with label 0 responses.append(0) #get Blue mask = mask_blue res = cv.bitwise_and(gray,gray, mask= mask) ret,thresh = cv.threshold(res,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU) _, contours, hierarchy = cv.findContours(thresh,cv.RETR_TREE,cv.CHAIN_APPROX_NONE) #for each Blue object calculate centroid as coordinate for cnt in contours: M = cv.moments(cnt) cx = float(M['m10']/M['m00']) cy = float(M['m01']/M['m00']) trainData.append([cx, cy]) #Blue object with label 1 responses.append(1) #convert data to numpy for calculation trainData = np.array(trainData, dtype=np.float32) responses = np.array(responses, dtype=np.int) #plot it red = trainData[responses.ravel()==0] plt.scatter(red[:,0],red[:,1],80,'r','^') blue = trainData[responses.ravel()==1] plt.scatter(blue[:,0],blue[:,1],80,'b','s') #get Green mask = mask_green res = cv.bitwise_and(gray,gray, mask= mask) ret,thresh = cv.threshold(res,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU) _, contours, hierarchy = cv.findContours(thresh,cv.RETR_TREE,cv.CHAIN_APPROX_NONE) M = cv.moments(contours[0]) new_comerx = float(M['m10']/M['m00']) new_comery = float(M['m01']/M['m00']) plt.scatter(new_comerx,new_comery,80,'g','o') newcomer = np.array([[new_comerx, new_comery]]).astype(np.float32) #Apply kNN svm = cv.ml.SVM_create() svm.setKernel(cv.ml.SVM_LINEAR) svm.setType(cv.ml.SVM_C_SVC) svm.setC(2.67) svm.setGamma(5.383) svm.train(trainData, cv.ml.ROW_SAMPLE, responses) #predict result = svm.predict(newcomer)[1] print( "result: {}".format(result)) #support vectors print(svm.getUncompressedSupportVectors()) plt.show() 
Figure: Green object belongs to Blue
3. K-Means Clustering
3.1 Concept
Let 's see an example. There is a company that is going to release a new T-shirt to market. So they manufacture it in different sizes. The company make a graph of data of people's height and weight, as below:
Figure: chart of weight-height-T-shirt size
Instead of manufacturing all the sizes of T-shirts, using k-means clustering, they divide the sizes to 3 groups Small, Medium and Large that can fit into all the people. The algorithm gives best 3 group sizes. And if it doesn't fit all, company can divide to more groups, as below:
Figure: chart of weight-height-T-shirt groups
3.2 K-Means Clustering Algorithm
Consider data below:
Figure: dataset for K-Means Clustering
The algorithm do steps:
Step 1 - Algorithm randomly chooses two centroids (it can take 2 data as the centroids), C1 and C2
Step 2 - It calculates the distance from each point to both centroids. If a test data is more closer to C1, then that data is labelled with '0'. If it is closer to C2, then labelled as '1' (If more centroids are there, labelled as '2','3', ...).
Figure: '0' labelled with red, and '1' labelled with blue
Step 3 - Next calculate the new centroids (average) of all blue points and red points separately. C1 and C2 are shifted to new centroids.
Figure: new centroids
Repeat Step 2 and Step 3 until both centroids are converged to fixed points or maximum number of iterations is reached, or a specific accuracy is reached.
The objective is to minimize the function that represent sum of distances between C1<->Red_Points and C2<->Blue_Points.
Figure: optimal centroids
3.3 K-Means Clustering in OpenCV
Inputs:
+ samples : each feature should be put in a single column and its data type is np.float32
+ nclusters(K) : Number of clusters required.
+ criteria : termination condition. It is a tuple of 3 parameters ( type, max_iter, epsilon ):
++ type of termination condition. It has 3 flags:
+++ cv.TERM_CRITERIA_EPS - stop the iteration if required accuracy-epsilon is reached.
+++ cv.TERM_CRITERIA_MAX_ITER - stop the iteration if maximum number of iterations is reached.
+++ cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER - combination of 2 conditions.
++ max_iter - An integer specifying maximum number of iterations.
++ epsilon - Required accuracy
+ attempts : Flag to specify the number of times the algorithm is executed using different initial labellings.
+ flags : specify how initial centers are taken. Normally two flags are used for this : cv.KMEANS_PP_CENTERS and cv.KMEANS_RANDOM_CENTERS.
Outputs:
+ compactness : It is the sum of squared distance from each point to their corresponding centers.
+ labels : This is an array of labels
+ centers : This is an array of centroids.
Apply K-means Clustering for T-shirt chart image above:
Figure: input for K-Means clustering
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 import cv2 as cv import matplotlib.pyplot as plt import numpy as np trainData = [] #load the image img = cv.imread('images/kclus.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) ret,thresh = cv.threshold(gray,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU) _, contours, hierarchy = cv.findContours(thresh,cv.RETR_TREE,cv.CHAIN_APPROX_NONE) #for each object calculate centroid as coordinate for cnt in contours: M = cv.moments(cnt) cx = float(M['m10']/M['m00']) cy = float(M['m01']/M['m00']) trainData.append([cx, cy]) #convert data to numpy for calculation trainData = np.array(trainData, dtype=np.float32) #plot it plt.scatter(trainData[:,0],trainData[:,1],80,'b','o'),plt.xlabel('Height'),plt.ylabel('Weight'),plt.show() # define criteria and apply kmeans() criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0) #define 3 clusters ret,label,center=cv.kmeans(trainData,3,None,criteria,10,cv.KMEANS_RANDOM_CENTERS) # Now separate the data, Note the flatten() A = trainData[label.ravel()==0] B = trainData[label.ravel()==1] C = trainData[label.ravel()==2] # Plot the result for each cluster plt.scatter(A[:,0],A[:,1],c = 'y') plt.scatter(B[:,0],B[:,1],c = 'r') plt.scatter(C[:,0],C[:,1],c = 'b') plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's') plt.show() 
Figure: result of clustering into 3 clusters
3.4 Color Quantization
Color Quantization reduces number of colors in an image. So it reduces the memory. K-means clustering is used for color quantization.
There are 3 features (R,G,B), we do:
+ Reshape the image to an array of Mx3 size (M is number of pixels in image).
+ Apply K-means clustering.
+ Apply centroid values (it is also R,G,B) to all pixels. So result image will have specified number of colors.
+ Reshape it back to the shape of original image.
Apply for the image below in case K=2:

Figure: input image for color quantization
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 import numpy as np import cv2 as cv img = cv.imread('images/rose.jpg') Z = img.reshape((-1,3)) # convert to np.float32 Z = np.float32(Z) # define criteria, number of clusters(K) and apply kmeans() criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0) K = 2 ret,label,center=cv.kmeans(Z,K,None,criteria,10,cv.KMEANS_RANDOM_CENTERS) # Now convert back into uint8, and make original image center = np.uint8(center) #number of colors reduce to number of centroid colors res = center[label.flatten()] #reconstruct image res2 = res.reshape((img.shape)) cv.imshow('res2',res2) cv.waitKey(0) cv.destroyAllWindows() 
Figure: output image only has 2 color