Part 1: Studying Digital Image Processing with OpenCV-Python - Tech It Yourself

Tuesday, 15 May 2018

1. Setup
The following libraries are needed to follow along with OpenCV-Python:
import numpy as np
import cv2
from matplotlib import pyplot as plt
2. Operations on image
- Get the pixel at row y, column x: img[y,x] -> BGR value (NumPy indexes rows first, so the order is img[row, column], not img[x, y])
- Access the BLUE (BGR) value: img.item(y,x,0)
the GREEN (BGR) value: img.item(y,x,1)
and the RED (BGR) value: img.item(y,x,2)
- Set the red channel of every pixel to zero using NumPy: img[:,:,2] = 0 (for a BGR image)
- Split into B, G, R channels: b,g,r = cv2.split(img)
Note: cv2.split() is a costly operation (in terms of time), so use it only when you need it. Otherwise prefer NumPy indexing.
- Merge the B, G, R channels back into an image: img = cv2.merge((b,g,r))
3. Image properties
- img.shape => Shape of the image as (rows, columns, channels)
Note: If the image is grayscale, the returned tuple contains only the number of rows and columns, so checking len(img.shape) is a good way to tell whether a loaded image is grayscale or color.
- img.size => Total number of array elements (rows × columns × channels), which equals the pixel count only for a grayscale image
- img.dtype => Image datatype (e.g. uint8)
Note: img.dtype is very important while debugging because a large number of errors in OpenCV-Python code are caused by an invalid datatype.
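As a quick illustration of these properties, here is a small sketch using plain NumPy arrays in place of loaded images (the 480x640 size and the helper name `is_grayscale` are assumptions, not part of OpenCV):

```python
import numpy as np

# Stand-ins for a color image and a grayscale image of the same size.
color = np.zeros((480, 640, 3), dtype=np.uint8)
gray = np.zeros((480, 640), dtype=np.uint8)

def is_grayscale(image):
    """Hypothetical helper: a grayscale image has no channel axis."""
    return image.ndim == 2

print(color.shape, color.size, color.dtype)
print(gray.shape, gray.size, gray.dtype)
```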
4. Image ROI
Get the ROI covering rows 280 to 339 and columns 330 to 389: img[280:340, 330:390]
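One detail worth knowing is that a NumPy slice is a view, so writing to the ROI also changes the original image. A minimal sketch on a synthetic 400x500 image (the size is an assumption):

```python
import numpy as np

img = np.zeros((400, 500, 3), dtype=np.uint8)

# The slice is a view into img: rows 280..339, columns 330..389.
roi = img[280:340, 330:390]
roi[:] = (255, 0, 0)          # paints that region blue inside img as well

# Use .copy() when the ROI must be independent of the source image.
roi_copy = img[280:340, 330:390].copy()
roi_copy[:] = 0               # img is not affected by this assignment
```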
5. Bitwise Operations
This includes the bitwise AND, OR, NOT and XOR operations. They are highly useful for extracting any part of an image, and for defining and working with non-rectangular ROIs.
Apply bitwise operations to extract the background and foreground of the image "test.png" below:
Figure: The simple test image has white background and red foreground
import cv2

# load the image
img = cv2.imread('test.png')
# convert to grayscale
imggray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# binary threshold: the white background (gray > 100) becomes 255 in the mask,
# the red foreground (gray value about 76 for pure red) becomes 0
ret, mask = cv2.threshold(imggray, 100, 255, cv2.THRESH_BINARY)
# inverse mask: 255 on the red foreground
mask_inv = cv2.bitwise_not(mask)
# extract background (foreground pixels are blacked out)
img_bg = cv2.bitwise_and(img, img, mask=mask)
# extract foreground (background pixels are blacked out)
img_fg = cv2.bitwise_and(img, img, mask=mask_inv)
# display the result: img_bg or img_fg
cv2.imshow('result', img_bg)
cv2.waitKey(0)
cv2.destroyAllWindows()
6. Changing Colorspaces
We focus on the two most widely used conversions: BGR ↔ Gray and BGR ↔ HSV.
- For BGR → Gray conversion, use the flag cv2.COLOR_BGR2GRAY.
We already converted BGR → Gray in the previous example.
- For BGR → HSV conversion, use the flag cv2.COLOR_BGR2HSV.
In HSV it is easier to represent a color than in the BGR color space, so we can use it to extract a colored object.
Note: To list all the available COLOR_ conversion flags:
import cv2

flags = [i for i in dir(cv2) if i.startswith('COLOR_')]
print(flags)
Apply BGR → HSV to extract the red foreground of the picture "opencv_logo.png".
Figure: opencv_logo.png
Steps:
- Convert from the BGR to the HSV color space.
- Threshold the HSV image for a range of red color.
- Extract the red object alone.
Note: How do we find the HSV values to track?
import cv2
import numpy as np

# red color in BGR
red = np.uint8([[[0, 0, 255]]])
# convert to HSV
hsv_red = cv2.cvtColor(red, cv2.COLOR_BGR2HSV)
print(hsv_red)
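The result is [[[  0 255 255]]], so H = 0.
Now take [H-10, 100, 100] and [H+10, 255, 255] as the lower and upper bounds. Because H-10 would be negative here, the lower bound is clamped to [0, 100, 100] and the upper bound is [10, 255, 255].
So the code is: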
import cv2
import numpy as np

# load the image
img = cv2.imread('opencv_logo.png')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# define threshold range of red color in HSV
lower_red = np.array([0, 100, 100])
upper_red = np.array([10, 255, 255])
# create mask
mask = cv2.inRange(hsv, lower_red, upper_red)
# using bitwise and
res = cv2.bitwise_and(img, img, mask=mask)
# display the mask (show res instead to see the extracted pixels in color)
cv2.imshow('result', mask)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: result after extracting red foreground
To extract all three foreground colors, use the code below:
import cv2
import numpy as np

# load the image
img = cv2.imread('opencv_logo.png')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# define threshold ranges of the red, green and blue colors in HSV
lower_red = np.array([0, 100, 100])
upper_red = np.array([10, 255, 255])
lower_green = np.array([50, 100, 100])
upper_green = np.array([70, 255, 255])
lower_blue = np.array([110, 100, 100])
upper_blue = np.array([130, 255, 255])
# create one mask per color, then OR the 3 masks together
mask_red = cv2.inRange(hsv, lower_red, upper_red)
mask_green = cv2.inRange(hsv, lower_green, upper_green)
mask_blue = cv2.inRange(hsv, lower_blue, upper_blue)
mask = cv2.bitwise_or(cv2.bitwise_or(mask_red, mask_green), mask_blue)
# using bitwise and
res = cv2.bitwise_and(img, img, mask=mask)
# display result image
cv2.imshow('result', res)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: result after extracting 3 foreground colors
7. Image Thresholding
This section covers global thresholding, adaptive thresholding and Otsu's thresholding.
7.1 Global thresholding
If a pixel value is greater than a threshold value, it is assigned one value (for example white); otherwise it is assigned another value (for example black). In global thresholding one manually chosen threshold is applied to the whole image, and a suitable value is usually found by trial and error.
Use cv2.threshold with these arguments:
- The first argument is the source image, which should be a grayscale image.
- The second argument is the threshold value used to classify the pixel values.
- The third argument is maxVal, the value to be assigned when a pixel value is more than (or, for the inverted styles, less than) the threshold value.
- The fourth argument is the thresholding style, chosen from the list below.
OpenCV provides different styles of thresholding:
- cv.THRESH_BINARY
- cv.THRESH_BINARY_INV
- cv.THRESH_TRUNC
- cv.THRESH_TOZERO
- cv.THRESH_TOZERO_INV
Note: White = 0xFF, Black = 0x00
Figure: Output image for each thresholding style
Here is an example using cv.THRESH_TRUNC on the image below:
Figure: source image
import cv2 as cv

# load the image
img = cv.imread('test2.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# do threshold
ret, thresh = cv.threshold(gray_img, 127, 255, cv.THRESH_TRUNC)
# display result image
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
Because the white background and the "1111" text have pixel values greater than the threshold (127), their values are set to the threshold value. All other pixels keep their original values.
Figure: output image with cv.THRESH_TRUNC
7.2 Adaptive thresholding
The algorithm calculates the threshold for small regions of the image, so different regions of the same image get different thresholds. This is useful for images with varying illumination in different areas. The threshold value at each pixel location depends on the neighboring pixels. How is the threshold value calculated?
- First, calculate the mean or the Gaussian-weighted mean of the neighborhood values, selected with ADAPTIVE_THRESH_MEAN_C or ADAPTIVE_THRESH_GAUSSIAN_C.
- Then subtract the constant C from that mean.
Reuse the picture from the previous example:
Figure: source image
import cv2 as cv

# load the image
img = cv.imread('test2.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# adaptive threshold: Gaussian-weighted 3x3 neighborhood, C = 1
thresh = cv.adaptiveThreshold(gray_img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                              cv.THRESH_BINARY, 3, 1)
# display result image
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
For this image, the result is better than with the global thresholding method.
7.3 Otsu’s Binarization
This method works best on a bimodal image (an image whose histogram has two peaks). It automatically picks a threshold value roughly in the middle of those peaks. For images that are not bimodal, the binarization won't be accurate. We use cv.threshold() but add the extra flag cv.THRESH_OTSU, that is cv.THRESH_BINARY+cv.THRESH_OTSU, and pass 0 as the threshold value because it is computed automatically.
Apply Otsu's Binarization to an image with added noise:

Figure: source image
The histogram of this picture:
import cv2 as cv
from matplotlib import pyplot as plt

# load the image
img = cv.imread('test2.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# plot the histogram of gray levels (256 bins over the range 0..256)
plt.hist(gray_img.ravel(), bins=256, range=(0, 256))
plt.show()
Figure: The histogram has 2 peaks
import cv2 as cv

# load the image
img = cv.imread('test4.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# filter noise
blur = cv.GaussianBlur(gray_img, (5, 5), 0)
# Otsu picks the threshold automatically, so the value 0 passed here is ignored
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print('Threshold value: ' + str(retVal))
# display result image (the thresholded image, not the blurred one)
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: result using Otsu’s Binarization
Note: retVal is the optimal threshold found by Otsu's method. In this example it is 148. Passing this value to cv.threshold() with plain THRESH_BINARY gives a result similar to the Otsu method.
import cv2 as cv

# load the image
img = cv.imread('test4.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
blur = cv.GaussianBlur(gray_img, (5, 5), 0)
# do threshold with the value found by Otsu's method
ret, thresh = cv.threshold(blur, 148, 255, cv.THRESH_BINARY)
# display result image
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
8. Smoothing Image
Images can also be filtered with low-pass filters (LPF), high-pass filters (HPF), etc.
- An LPF helps in removing noise, blurring the image, etc.
- An HPF helps in finding edges in the image.
8.1 2D Convolution (Image Filtering)
In OpenCV, we use cv.filter2D() to convolve a kernel with an image.
Reusing the Otsu example, we replace cv.GaussianBlur(gray_img,(5,5),0) with cv.filter2D(). A 5x5 averaging filter kernel looks like this:
$K=\frac{1}{25}\begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}$
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test4.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply 5x5 averaging filter kernel
kernel = np.ones((5, 5), np.float32) / 25
blur = cv.filter2D(gray_img, -1, kernel)
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print('Threshold value: ' + str(retVal))
# display result image
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
The result should be similar to the Otsu example.
8.2 Image Blurring (Image Smoothing)
Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for removing high-frequency content (e.g. noise, edges) from the image, so edges are blurred a little by this operation.
OpenCV provides four main types of blurring techniques.
8.2.1 Averaging
This is done with the function cv.blur() or cv.boxFilter(). We specify the width and height of the kernel. A 5x5 normalized box filter looks like this:
$K=\frac{1}{25}\begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}$
Reuse the example above, but with the 5x5 normalized box filter:
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test4.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply 5x5 normalized box filter
blur = cv.blur(gray_img, (5, 5))
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print('Threshold value: ' + str(retVal))
# display result image
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
The result should be similar to the example above.
8.2.2 Gaussian Blurring
Instead of a box filter, this method uses a Gaussian kernel. Gaussian blurring is effective at removing Gaussian noise from the image.
OpenCV provides the function cv.GaussianBlur() with these main parameters:
- The width and height of the kernel, which should be positive and odd.
- The standard deviations in the X and Y directions, sigmaX and sigmaY respectively. If both are given as zeros, they are calculated from the kernel size.
If needed, a Gaussian kernel can also be created separately with cv.getGaussianKernel().
For an example, see the Otsu code above, which already uses cv.GaussianBlur().
8.2.3 Median Blurring
This is effective against salt-and-pepper noise in images. It takes the median of all the pixels under the kernel area and replaces the central element with this median value. Its kernel size should be a positive odd integer.
In the filters above, the newly calculated central element may be a pixel value that exists in the image or a new value. In median blurring, however, the central element is always replaced by some pixel value from the image.
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test4.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply 5x5 median blur
blur = cv.medianBlur(gray_img, 5)
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print('Threshold value: ' + str(retVal))
# display result image
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
The result should be similar to the example above.
8.2.4 Bilateral Filtering
It is effective at removing noise while keeping edges sharp. The operation is slower than the other filters.
The bilateral filter uses 2 Gaussian filters:
- A Gaussian filter in space makes sure that only nearby pixels are considered for blurring.
- A Gaussian function of intensity difference makes sure that only pixels with an intensity similar to the central pixel are considered for blurring. This preserves edges, since pixels at edges have a large intensity variation.
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test4.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply bilateral filter (d = 9, sigmaColor = 75, sigmaSpace = 75)
blur = cv.bilateralFilter(gray_img, 9, 75, 75)
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print('Threshold value: ' + str(retVal))
# display result image
cv.imshow('source', gray_img)
cv.imshow('result', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
After running this code, look at the edges of the black square and you will see the difference compared to the other filters.
Figure: edges are still there
9. Morphological Transformations
Morphological transformations are simple operations based on the image shape.
They are normally performed on binary images.
They need two inputs: the original image and a structuring element (or kernel) that decides the nature of the operation.
Common morphological operations include Erosion, Dilation, Opening and Closing.
9.1 Erosion
It erodes away the boundaries of the foreground object (always try to keep the foreground in white).
As the kernel slides over the image, a pixel (1 or 0) stays 1 only if all the pixels under the kernel are 1; otherwise it is set to 0 (eroded).
With this rule, pixels near the boundary are discarded depending on the size of the kernel, so the thickness of the foreground object decreases (the white region shrinks).
Erosion is useful for removing small white noise and for detaching two connected objects.
9.2 Dilation
It is the opposite of erosion.
A pixel is '1' if at least one pixel under the kernel is '1'. It increases the size of the foreground object (the white region in the image).
In cases like noise removal, erosion is followed by dilation: erosion removes white noise but also shrinks the object, so we dilate it afterwards. Dilation is also useful for joining broken parts of an object.
We combine the erosion and dilation examples into one for the image below:
Figure: example image for erosion and dilation
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test6.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply 5x5 median blur
blur = cv.medianBlur(gray_img, 5)
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)
kernel = np.ones((7, 7), np.uint8)
# erosion
erosion = cv.erode(thresh, kernel, iterations=1)
# dilation
dilation = cv.dilate(erosion, kernel, iterations=1)
# display result image
cv.imshow('source', gray_img)
cv.imshow('erosion', erosion)
cv.imshow('dilation', dilation)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: Erosion then Dilation
9.3 Opening
Opening is erosion followed by dilation. It is useful for removing noise.
Remove the noise from the picture below:
Figure: Noise image
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test7.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply 5x5 median blur
blur = cv.medianBlur(gray_img, 5)
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)
# apply opening
opening = cv.morphologyEx(thresh, cv.MORPH_OPEN, kernel)
# display result image
cv.imshow('opening', opening)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: after removing noise
9.4 Closing
Closing is the reverse of opening: dilation followed by erosion.
It is useful for closing small holes inside foreground objects, or small black points on the object.
Remove the holes from the object in the image below:
Figure: object with holes
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test6.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply 5x5 median blur
blur = cv.medianBlur(gray_img, 5)
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)
# apply closing
closing = cv.morphologyEx(thresh, cv.MORPH_CLOSE, kernel)
# display result image
cv.imshow('closing', closing)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: after removing holes
9.5 Morphological Gradient
It is the difference between the dilation and the erosion of an image. The result looks like the outline of the object.
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test6.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply 5x5 median blur
blur = cv.medianBlur(gray_img, 5)
retVal, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)
kernel = np.ones((5, 5), np.uint8)
# apply morphological gradient
gradient = cv.morphologyEx(thresh, cv.MORPH_GRADIENT, kernel)
# display result image
cv.imshow('MORPH_GRADIENT', gradient)
cv.waitKey(0)
cv.destroyAllWindows()
9.6 Structuring Element
In the previous examples we created rectangular kernels manually with NumPy. If you need elliptical, circular or cross-shaped kernels, you can use the OpenCV function cv.getStructuringElement(). Just pass the shape and size of the kernel to get the desired kernel.
Example:
cv.getStructuringElement(cv.MORPH_RECT,(5,5))
cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5))
cv.getStructuringElement(cv.MORPH_CROSS,(5,5))
10. Image Gradients
Gradient filters help to find image gradients and edges.
OpenCV provides three types of gradient filters, or high-pass filters: Sobel, Scharr and Laplacian.
10.1. Sobel and Scharr Derivatives
The Sobel operator combines Gaussian smoothing with differentiation, so it is more resistant to noise. It is a derivative mask used for edge detection, and it detects two kinds of edges in an image:
+ Vertical direction
+ Horizontal direction
For the Sobel operator you need to specify:
- The direction of the derivative to be taken (vertical or horizontal, through the arguments).
- The size of the kernel, through the argument ksize. If ksize = -1, a 3x3 Scharr filter is used, which gives better results than a 3x3 Sobel filter.
Let's take the Sobel derivative of the image:
Figure: Input for Sobel Derivative
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test5.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# take Sobel derivative in X direction
sobelx = cv.Sobel(gray_img, cv.CV_8U, 1, 0, ksize=5)
# take Sobel derivative in Y direction
sobely = cv.Sobel(gray_img, cv.CV_8U, 0, 1, ksize=5)
# display result image
cv.imshow('sobelx', sobelx)
cv.imshow('sobely', sobely)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: Sobel Derivative in X and Y directions
You can see that the Sobel derivative in X picks up intensity changes along the X direction (vertical edges), and the derivative in Y picks up changes along the Y direction (horizontal edges). But the results miss some edges. The reason is that the output datatype is uint8 (check with print(sobelx.dtype)). A black-to-white transition is taken as a positive slope (it has a positive value), while a white-to-black transition is taken as a negative slope (it has a negative value). When the data is converted to np.uint8, all negative slopes become zero, so those edges are lost.
To fix this, use a higher-precision output datatype (cv.CV_16S, cv.CV_64F), take the absolute value, and then convert back to cv.CV_8U.
So the new code is:
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test5.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# take Sobel derivative in X direction
sobelx64f = cv.Sobel(gray_img, cv.CV_64F, 1, 0, ksize=5)
# take Sobel derivative in Y direction
sobely64f = cv.Sobel(gray_img, cv.CV_64F, 0, 1, ksize=5)
# absolute value, then convert back to 8-bit; cv.convertScaleAbs() does both
# and saturates values above 255 instead of wrapping around like np.uint8()
sobelx_8u = cv.convertScaleAbs(sobelx64f)
sobely_8u = cv.convertScaleAbs(sobely64f)
# display result image
cv.imshow('sobelx', sobelx_8u)
cv.imshow('sobely', sobely_8u)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: edges were recovered
10.2. Laplacian Derivatives
The Laplacian operator is also a derivative operator used to find edges in an image.
The major difference between the Laplacian and other derivative operators like Prewitt, Sobel, Robinson and Kirsch is that those are all first-order derivative masks, while the Laplacian is a second-order derivative.
The Laplacian doesn't find edges in any particular direction (X or Y); instead it classifies edges as:
+ Inward Edge
+ Outward Edge

The Laplacian of the image is given by the formula:
$\Delta src=\frac{\partial ^{2}src}{\partial x^{2}}+\frac{\partial ^{2}src}{\partial y^{2}}$
where each derivative is found using Sobel derivatives.
Let's take the Laplacian derivative of the image:
Figure: Input for Laplacian Derivative
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test5.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# take Laplacian
laplacian = cv.Laplacian(gray_img, cv.CV_64F)
# convert to 8-bit absolute values so that imshow displays the result properly
laplacian_8u = cv.convertScaleAbs(laplacian)
# display result image
cv.imshow('laplacian', laplacian_8u)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: Output of Laplacian Derivative
11. Canny Edge Detection
Canny Edge Detection is a popular edge detection algorithm. It consists of the following steps:
+ Noise Reduction: edge detection is susceptible to noise in the image, so the first step is to remove the noise with a 5x5 Gaussian filter.
+ Finding the Intensity Gradient of the Image: the smoothed image is filtered with a Sobel kernel in both the horizontal and vertical directions to get the first derivative in the horizontal direction ($G_{x}$) and the vertical direction ($G_{y}$). From these two images, we can find the edge gradient and direction for each pixel as follows:
Edge_Gradient (G) = $\sqrt{G_{x}^{2}+G_{y}^{2}}$
Angle ($\theta$) = $\tan^{-1}\left(\frac{G_{y}}{G_{x}}\right)$
The gradient direction is always perpendicular to edges.
+ Non-maximum Suppression: after getting the gradient magnitude and direction, a full scan of the image is done to remove any unwanted pixels which may not constitute an edge. Every pixel is checked to see whether it is a local maximum in its neighborhood in the direction of the gradient. If so, it is considered for the next step; otherwise it is set to zero (suppressed).
For example, take a point A on an edge. The gradient direction is normal to the edge, and points B and C lie along the gradient direction on either side of A. So A is compared with B and C to see whether it forms a local maximum.
+ Hysteresis Thresholding: this step decides which candidate edges are really edges. For this we define two threshold values, minVal and maxVal. Any edges with an intensity gradient greater than maxVal are sure to be edges, and those below minVal are sure to be non-edges. Edges between these two thresholds are kept only if they are connected to "sure-edge" pixels; otherwise they are also discarded.
OpenCV provides cv.Canny() for Canny Edge Detection with these parameters:
+ The first argument is the input image.
+ The second and third arguments are minVal and maxVal respectively.
+ The fourth argument is the size of the Sobel kernel used to find image gradients (the aperture size). By default it is 3.
+ The last argument is L2gradient, which specifies the equation for finding the gradient magnitude. If it is True, the more accurate formula $\sqrt{G_{x}^{2}+G_{y}^{2}}$ is used; otherwise $|G_{x}|+|G_{y}|$ is used. By default it is False.
Apply Canny Edge Detection to the image below:
Figure: input of Canny Edge Detection
import cv2 as cv
import numpy as np

# load the image
img = cv.imread('test6.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# apply bilateral filter
blur = cv.bilateralFilter(gray_img, 9, 75, 75)
# apply Canny Edge Detection with minVal = 7 and maxVal = 200
edges = cv.Canny(blur, 7, 200)
# display result image
cv.imshow('edges', edges)
cv.waitKey(0)
cv.destroyAllWindows()
Figure: output of Canny Edge Detection
Another example
Write a small application that runs Canny edge detection with threshold values that can be varied using two trackbars.

import numpy as np
import cv2 as cv

def nothing(x):
    pass

# load the image, convert it to grayscale and smooth it
img = cv.imread('test6.png')
gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
blur = cv.bilateralFilter(gray_img, 9, 75, 75)
cv.namedWindow('image')
# create trackbars for the two Canny thresholds
cv.createTrackbar('minVal', 'image', 0, 255, nothing)
cv.createTrackbar('maxVal', 'image', 0, 255, nothing)
while True:
    cv.imshow('image', img)
    k = cv.waitKey(1) & 0xFF
    if k == 27:  # Esc key quits
        break
    # get current positions of the two trackbars
    minVal = cv.getTrackbarPos('minVal', 'image')
    maxVal = cv.getTrackbarPos('maxVal', 'image')
    img = cv.Canny(blur, minVal, maxVal)
cv.destroyAllWindows()
Figure: Solution for example
12. Image Pyramids
Consider searching for a face in an image: we are not sure at what size the face will appear. So we create a set of images with different resolutions and search for the face in all of them. Such a set of images with different resolutions is called an Image Pyramid (a stack with the biggest image at the bottom and the smallest image at the top).
There are 2 kinds:
+ Gaussian Pyramid: a higher level (lower resolution) is formed by blurring the lower level (higher resolution) image and removing every other row and column. An M×N image becomes an M/2×N/2 image, so the area reduces to one-fourth of the original area. This step is called an octave. OpenCV provides the cv2.pyrDown() and cv2.pyrUp() functions.
+ Laplacian Pyramid: a level in the Laplacian Pyramid is formed by the difference between that level in the Gaussian Pyramid and the expanded version of its upper level in the Gaussian Pyramid. Laplacian pyramid images look like edge images.
Figure: example input
Example of Gaussian Pyramid:
import numpy as np
import cv2

reso = cv2.imread('toy-story.jpg')
# halve the resolution four times and show every level
for i in range(0, 4):
    reso = cv2.pyrDown(reso)
    cv2.imshow(str(i+1), reso)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: Image Pyramid pyrDown
Example of Laplacian Pyramid:
Figure: The image above was resized to 512x512
import numpy as np
import cv2

A = cv2.imread('images/toy_story.jpg')
# generate Gaussian Pyramid for A
G = A.copy()
gpA = [G]
for i in range(6):
    G = cv2.pyrDown(G)
    gpA.append(G)
# generate Laplacian Pyramid for A
lpA = [gpA[5]]
for i in range(5, 0, -1):
    GE = cv2.pyrUp(gpA[i])
    L = cv2.subtract(gpA[i-1], GE)
    lpA.append(L)
# display every level of the Laplacian Pyramid
for i in range(0, len(lpA)):
    cv2.imshow(str(i+1), lpA[i])
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: Laplacian Pyramid