Wednesday, 11 July 2018

How to set up Tensorflow/Keras running on an Nvidia GPU

We will use Virtualenv for this setup. Just follow the commands:
1. Install Virtualenv:
sudo pip install virtualenv
2. Commands to setup Virtualenv
- mkdir virtualws
- virtualenv virtualws/keras_demo
- cd virtualws/keras_demo/bin
- source activate
3. Install Tensorflow and Keras
- pip install tensorflow_gpu
- pip install keras
Note: we installed "tensorflow_gpu"
4. Install driver for Nvidia GPU:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt-get install nvidia-xxx
Note: "xxx" can be checked here.
Or you can download the driver here (a .run file) and install it manually using the commands:
sudo chmod +x xxx.run
./xxx.run
Then verify the driver using: lsmod | grep nvidia
5. Install NVIDIA CUDA Toolkit and cuDNN
To check which versions of the NVIDIA CUDA Toolkit and cuDNN are needed, we create a simple demo that uses Tensorflow/Keras.
Create a test.py file with the content below and run python test.py
# 3. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility
 
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
 
# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
 
# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
 
# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
 
# 7. Define model architecture
model = Sequential()

model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(1,28,28), data_format='channels_first'))
model.add(Convolution2D(32, (3, 3), activation='relu', data_format='channels_first'))
model.add(MaxPooling2D(pool_size=(2,2), data_format='channels_first'))
model.add(Dropout(0.25))
 
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
 
# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
 
# 9. Fit model on training data
model.fit(X_train, Y_train,
          batch_size=32, epochs=10, verbose=1)
 
# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
Python will throw an error saying that it could not load a specific versioned library file. Based on that error, you can download the matching NVIDIA CUDA Toolkit and cuDNN versions.
If you downloaded the NVIDIA CUDA Toolkit as a ".run" file, just use the commands:
sudo chmod +x xxx.run
./xxx.run
Press ENTER until 100% is reached and choose YES for all the questions, except the one asking you to install the NVIDIA driver (if you already installed the driver, just choose NO).
Then open the hidden file .bashrc at /home/your_user/.bashrc and append the lines:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
In order to install cuDNN, unzip it and use the commands:
cd to_unzip_folder
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

6. Install dependencies
Run the commands:
sudo apt-get install libcupti-dev
Again open .bashrc and append the line:
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Note:
To monitor the GPU while it is running, use the command: watch -n 1 nvidia-smi
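As a quick sanity check that Tensorflow actually sees the GPU, you can run this minimal sketch (it uses the TF 1.x API of that time; the exact calls may differ in other versions):

# List the devices Tensorflow can see; a working setup shows a /device:GPU:0 entry
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.gpu_device_name())
print(device_lib.list_local_devices())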
That is all.
Run the demo above again. The training process is faster:
Figure: Tensorflow/Keras runs on Nvidia GPU

Friday, 6 July 2018

GPU concept

1. GPU
- The Graphics Processing Unit (GPU) is a processor specialized for processing graphics. However, we can implement any algorithm on it, not only graphics, with high efficiency and performance.
GPU computing - features:
- Massively parallel.
- Hundreds of cores.
- Thousands of threads.
- Cheap.
- Highly available.
- Computing:
    + 1 Teraflop (Single precision)
    + 100 Gflops (Double precision)
- Programable: CUDA
- Important factors to consider: power and cooling.
Figure: GPU vs CPU
Note:
- The amount of RAM featured on a certain GPU is typically the first spec listed when naming the card. Basically, the more VRAM a card has, the more complex tasks it can load. If the tasks overload your GPU's VRAM, the overflow goes to system RAM, which significantly degrades performance.
- On Linux and NVIDIA GPU, we can use the command "nvidia-smi" or "watch -n 1 nvidia-smi" or "nvidia-settings" to show GPU memory used and the processes using the GPU.
2. Compute Unified Device Architecture (CUDA)
Its characteristics:
- A compiler and toolkit for programming NVIDIA GPUs.
- API extends the C programming language.
- Runs on thousands of threads.
- Scalable model.
- Parallelism.
- Give a high level abstraction from hardware.
- CUDA language is vendor dependent.
- OpenCL is going to become an industry standard. It is a low level specification, more complex to program with than CUDA C.
3. CUDA architecture
- Abstracts from the hardware.
- Automatic thread management (can handle 100k+ threads).
- Languages: C, C++, OpenCL.
- OS: Windows, Linux, OS X.
- Host: the CPU and its memory (host memory).
- Device: the GPU and its memory (device memory).
Figure: CUDA architecture
The user just takes care of:
- Analyzing the algorithm to expose parallelism (block size, number of threads).
- Resource management (efficient data transfers; local data sets - register space and on-chip memory are limited).
4. Grid - Block - Thread - Kernel
4.1 Grid
The set of blocks is referred to as a grid.
4.2 Block
- Blocks are grouped in a grid.
- Blocks are independent.
- Each invocation can refer to its block index using blockIdx.x.
4.3 Thread
- Kernels are executed by threads.
- Each thread has an ID.
- Thousands of threads execute the same kernel.
- Threads are grouped into blocks.
- Threads in a block can synchronize execution.
- Each invocation can refer to its thread index using threadIdx.x.
4.4 Kernel
- A kernel is a simple C function that is executed on the GPU.
Figure: Block - Thread - Kernel
5. Work flow
Figure: GPU calculation Work flow
6. C extensions
Block size: (x, y, z). x*y*z = Maximum of 768 threads total. (Hw dependent)
Grid size: (x, y). Maximum of thousands of threads. (Hw dependent)
__global__ : to be called by the host but executed by the GPU.
__host__ : to be called and executed by the host.
__shared__ : variable in shared memory.
__syncthreads() : sync of threads within a block.
*Indexing Arrays with Blocks and Threads
Consider indexing an array with one element per thread (8 threads/block)
Figure: block - thread Id
With M threads per block, a unique index for each thread is given by:
int index = threadIdx.x + blockIdx.x * M;
or using
int index = threadIdx.x + blockIdx.x * blockDim.x; // blockDim.x number of threads
Figure: element 21 will be handled by thread 5 in block 2, following threadIdx.x + blockIdx.x * M = 5 + 2*8 = 21
Example: Vector Addition with Blocks and Threads
#include <stdlib.h>

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

// Helper referenced below: fill a host array with random integers
void random_ints(int *p, int n) {
 for (int i = 0; i < n; i++)
  p[i] = rand() % 100;
}

__global__ void add(int *a, int *b, int *c, int n) {
 int index = threadIdx.x + blockIdx.x * blockDim.x;
 // Avoid accessing beyond the end of the arrays
 if (index < n) 
  c[index] = a[index] + b[index]; 
}

int main(void) {
 int *a, *b, *c; // host copies of a, b, c
 int *d_a, *d_b, *d_c; // device copies of a, b, c
 int size = N * sizeof(int);
 
 // Alloc space for device copies of a, b, c
 cudaMalloc((void **)&d_a, size);
 cudaMalloc((void **)&d_b, size);
 cudaMalloc((void **)&d_c, size);
 
 // Alloc space for host copies of a, b, c and setup input values
 a = (int *)malloc(size); random_ints(a, N);
 b = (int *)malloc(size); random_ints(b, N);
 c = (int *)malloc(size);
 
 // Copy inputs to device
 cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
 cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);
 
 // Launch add() kernel on GPU with THREADS_PER_BLOCK threads
 add<<<N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);
 
 // Copy result back to host
 cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
 
 // Cleanup
 free(a); free(b); free(c);
 cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
 return 0;
}

Thursday, 21 June 2018

Isolate Python working environment using Virtualenv

While working on multiple Python projects, you may face the issue that one project needs a version of a package that is completely different from the others. Virtualenv resolves this issue by isolating Python working environments, so the dependencies required by each project are kept separate.
1. Install Virtualenv: sudo pip install virtualenv
2. Commands to setup Virtualenv
- mkdir ~/virtualws
- virtualenv ~/virtualws/keras_demo
- cd ~/virtualws/keras_demo/bin
- source activate
3. Test with Keras
- pip install tensorflow
- pip install keras
4. Run demo
Create a test.py file with the content below and run python test.py
# 3. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility
 
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
 
# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
 
# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
 
# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
 
# 7. Define model architecture
model = Sequential()

model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(1,28,28), data_format='channels_first'))
model.add(Convolution2D(32, (3, 3), activation='relu', data_format='channels_first'))
model.add(MaxPooling2D(pool_size=(2,2), data_format='channels_first'))
model.add(Dropout(0.25))
 
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
 
# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
 
# 9. Fit model on training data
model.fit(X_train, Y_train,
          batch_size=32, epochs=10, verbose=1)
 
# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
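Once the demo runs, you can confirm that it is using the isolated environment rather than the system Python. A minimal check (the exact path depends on where you created the virtualenv):

# Run inside the activated environment
import sys
# Should print something like /home/your_user/virtualws/keras_demo, not the system prefix
print(sys.prefix)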

Friday, 8 June 2018

Convolution Concept

1. Concept
Convolution is used in signal processing and analysis.
We can use convolution to construct the output of a system for any arbitrary input signal, if we know the impulse response of the system.
2. Definition
The mathematical form of convolution in the discrete time domain:
$y[n]=x[n]*h[n]=\sum_{k=-\infty }^{+\infty }x[k].h[n-k]$
where:
$x[n]$ is the input signal
$h[n]$ is the impulse response, $h[n-k]$ is its time-shifted version
$y[n]$ is the output
* denotes convolution
In order to understand the meaning of convolution, we have to take a look on impulse response and impulse decomposition.
3. Impulse decomposition.
The input signal is decomposed into simple additive components (in general, a weighted sum of basis signals), and the system response to the input signal is the sum of the outputs of these components passed through the system.
We often use impulse (delta) functions for the basis signals.
Example of how a signal is decomposed into a set of impulse (delta) functions:
Figure: input signal decomposition
An impulse (delta) function : $\delta [n]=\left\{\begin{matrix}
1, & n=0\\
0, & n\neq 0
\end{matrix}\right.$
So
$x[0] = x[0].\delta [n] = 2\delta [n]$
$x[1] = x[1].\delta [n-1] = 3\delta [n-1]$
$x[2] = x[2].\delta [n-2] = 1\delta [n-2]$
because $\delta [n-1]$ is 1 at n=1 and zero elsewhere.
Then
$x[n]$ can be represented by adding 3 shifted and scaled impulse functions.
$x[n] = x[0].\delta [n] +x[1].\delta [n-1]+x[2].\delta [n-2]=\sum_{k=-\infty }^{+\infty }x[k].\delta [n-k]$
4. Impulse Response
It is the output of a system resulting from an impulse input.
Figure: h[n] is impulse response
Figure: the response of a time-shifted impulse function is also a time-shifted response, shifted by K
Figure: a scaling of the input signal causes a scaling of the output (linearity), c is a constant
Figure: 3 additive components input, 3 additive components output
Figure: convolution equation
Finally,
$x[n]=\sum x[k].\delta [n-k]\\
y[n]=\sum x[k].h [n-k]$
It means a signal is decomposed into a set of impulses and the output signal can be computed by adding the scaled and shifted impulse responses. 
Convolution applies to linear, time-invariant systems.
5. Convolution in 1D
Consider example:
$x[n] = \{3, 4, 5\}$
$h[n] = \{2, 1\}$
Figure: x[n]
Figure: h[n]
From the equation of convolution:
$y[n]=x[n]*h[n]=\sum_{k=-\infty }^{+\infty }x[k].h[n-k]$
We have:
$y[0]=x[0].h[0-0]=3*2=6$
$y[1]=x[0].h[1-0]+x[1].h[1-1]=3*1+4*2=11$
$y[2]=x[0].h[2-0]+x[1].h[2-1]+x[2].h[2-2]=3*0+4*1+5*2=14$
$y[3]=x[0].h[3-0]+x[1].h[3-1]+x[2].h[3-2]+x[3].h[3-3]=4*0+5*1=5$
$y[4]=x[0].h[4-0]+x[1].h[4-1]+x[2].h[4-2]+x[3].h[4-3]+x[4].h[4-4]=0$
Figure: y[n]
The pattern for programming:
$y[i]=x[i]h[0]+x[i-1]h[1]+...+x[i-k]h[k]$
If you hit the boundary, such as x[-1] or x[-2], you can skip the convolution at the boundary or pad those samples with 0.
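To double-check the hand calculation above, here is a minimal sketch (it assumes NumPy is available) that implements the programming pattern and compares it with np.convolve:

# Verify the 1D convolution example: x = {3, 4, 5}, h = {2, 1}
import numpy as np

x = np.array([3, 4, 5])
h = np.array([2, 1])

def conv1d(x, h):
    # y[i] = x[i]h[0] + x[i-1]h[1] + ... , padding with 0 at the boundary
    y = np.zeros(len(x) + len(h) - 1)
    for i in range(len(y)):
        for k in range(len(h)):
            if 0 <= i - k < len(x):
                y[i] += x[i - k] * h[k]
    return y

print(conv1d(x, h))        # [ 6. 11. 14.  5.]
print(np.convolve(x, h))   # same result from NumPy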
6. Convolution in 2D
2D convolution is an extension of 1D convolution. It convolves in both the horizontal and vertical directions, and it is widely used in image processing.
Figure: impulse function in 2D
The impulse (delta) function in 2D:
$\delta [m,n]=\left\{\begin{matrix}
1, & m,n=0\\
0, & m,n\neq 0
\end{matrix}\right.$
Here, the matrix uses [column, row] form.
A signal can be decomposed into a sum of scaled and shifted impulse functions.
$x[m,n]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].\delta [m-i,n-j]$
and the output is:
$y[m,n]=x[m,n]*h[m,n]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].h [m-i,n-j]$
The indices of the kernel matrix can be negative:
Figure: a 3x3 kernel with indices -1, 0, 1; the origin is at the middle of the kernel
Example:
$
y[1,1]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].h [1-i,1-j]=\\
 x[0,0].h[1-0,1-0]+x[1,0].h[1-1,1-0]+x[2,0].h[1-2,1-0]\\
+ x[0,1].h[1-0,1-1]+x[1,1].h[1-1,1-1]+x[2,1].h[1-2,1-1]\\
+ x[0,2].h[1-0,1-2]+x[1,2].h[1-1,1-2]+x[2,2].h[1-2,1-2]
$
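As a quick check of the double sum, here is a minimal sketch (it assumes SciPy is installed; the input and kernel values are made up for illustration):

# 2D convolution of a small 3x3 input with a 3x3 kernel
import numpy as np
from scipy.signal import convolve2d

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
h = np.array([[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]])

# 'full' corresponds to the equation above; 'same' keeps the output the size of x
print(convolve2d(x, h, mode='full'))
print(convolve2d(x, h, mode='same'))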

Thursday, 7 June 2018

Part 7: Studying Digital Image Processing with OpenCV-Python

This part focuses on Video Analysis.
1. Meanshift
1.1 Concept
Consider the example:
The blue circle "C1" has its original center at the blue rectangle "C1_o". Find the centroid of the points inside that window, "C1_r" (the small blue circle). The circle center and the centroid do not match, so move the blue circle such that the center of the new circle coincides with the previous centroid. Find the new centroid again and repeat the process until the center of the new circle and its centroid fall on the same location. The result is the green circle "C2", which covers the maximum number of points (maximum pixel distribution). This is the meanshift algorithm.
1.2 Meanshift in OpenCV
OpenCV provides cv2.meanShift(), and we have to:
+ Set up the target
+ Provide the initial location of the window
+ Find its histogram (only Hue is considered) so that we can backproject the target on each frame for the meanshift calculation.
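Here is a minimal meanshift tracking sketch. It reuses 'flappybird.mp4' from the Camshift demo below, and the initial window (c, r, w, h) is a hand-picked assumption; in practice it would come from a detector or a user selection:

import numpy as np
import cv2

cap = cv2.VideoCapture('flappybird.mp4')
ret, frame = cap.read()

# Initial location of the tracking window (hand-picked for illustration)
c, r, w, h = 100, 100, 50, 50
track_window = (c, r, w, h)

# Histogram of the Hue channel of the target region
roi = frame[r:r+h, c:c+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves by less than 1 pt
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # meanShift returns the number of iterations and the shifted window
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
    cv2.imshow('meanshift', frame)
    if cv2.waitKey(60) & 0xff == 27:
        break

cap.release()
cv2.destroyAllWindows()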
2. Camshift
2.1 Concept
There is a problem with the previous demo: our window always has the same size, whether the object is far away or very close to the camera. We need to adapt the window size to the size and rotation of the target. For that we use camshift. It updates the size of the window and also calculates the orientation of the best-fitting ellipse to it, then applies meanshift with the new scaled search window.
2.2 Camshift in OpenCV
OpenCV provides cv2.CamShift(), which returns a rotated rectangle and the box parameters.
Let's make a demo using Camshift: tracking the bird in the Flappy Bird game. Here are the resources:
+ The video for the demo is here.
+ Flappy bird template for using SIFT feature matching:
Figure: Flappy bird template
import numpy as np
import cv2

MIN_MATCH_COUNT = 10
cap = cv2.VideoCapture('flappybird.mp4')
ret,frame = cap.read()
img2 = cv2.imread('bird.png') # trainImage
gray1= cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
gray2= cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY)
# Initiate SIFT detector
sift = cv2.xfeatures2d.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(gray1,None)
kp2, des2 = sift.detectAndCompute(gray2,None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1,des2, k=2)
# Apply ratio test
good = []
for m,n in matches:
    if m.distance < 0.5*n.distance:
        good.append([m])
if len(good)>MIN_MATCH_COUNT:
 #find the object to track
 src_pts = np.float32([ kp1[m[0].queryIdx].pt for m in good ]).reshape(-1,1,2)
 c,r,w,h = cv2.boundingRect(src_pts)
 track_window = (c,r,w,h)
 # set up the ROI for tracking
 roi = frame[r:r+h, c:c+w]
 hsv_roi =  cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
 #this range was obtained by trial and error using the Paint tool
 #https://tech.deepumohan.com/2011/04/how-to-get-html-color-code-from-image.html
 mask = cv2.inRange(hsv_roi, np.array((0., 60.,32.)), np.array((130.,130.,130.)))
 roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180])
 cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX)
 # Setup the termination criteria, either 10 iteration or move by atleast 1 pt
 term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )
 while(1):
  if ret == True:
      hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
      dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1)
      # apply meanshift to get the new location
      ret, track_window = cv2.CamShift(dst, track_window, term_crit)
      # Draw it on image
      pts = cv2.boxPoints(ret)
      pts = np.int0(pts)
      img2 = cv2.polylines(frame,[pts],True, 255,2)
      cv2.imshow('img2',img2)
      k = cv2.waitKey(60) & 0xff
      if k == 27:
          break
      else:
          cv2.imwrite(chr(k)+".jpg",img2)
      ret, frame = cap.read()
  else:
      break
 cv2.destroyAllWindows()
 cap.release()
else:
    print "Not enough matches are found - %d/%d" % (len(good),MIN_MATCH_COUNT)
3. Optical Flow
3.1 Concept
Optical flow is the pattern of apparent motion of image objects between two consecutive frames, caused by the movement of the object or the camera. Optical flow makes these assumptions:
    + The pixel intensities of an object do not change between consecutive frames.
    + Neighbor pixels have similar motion.
Optical flow has many applications in areas like :
    + Video Compression
    + Video Stabilization
    + Measure the velocities of objects in the video. In general, moving objects that are closer to the camera will display more apparent motion than distant objects that are moving at the same speed.
Figure: a ball moving in 5 consecutive frames
The equation of Optical Flow is:
$f_{x}u+f_{y}v+f_{t}=0\\
f_{x}=\frac{\delta f}{\delta x};f_{y}=\frac{\delta f}{\delta y}\\
u=\frac{\delta x}{\delta t};v=\frac{\delta y}{\delta t}$
$f_{x},f_{y}$ are the image gradients, $f_{t}$ is the gradient along time. $(u,v)$ are unknown.
There are several methods to solve this problem.
3.2 Lucas-Kanade method
It assumes that the flow is essentially constant in a local neighborhood of the pixel under consideration, and solves the basic optical flow equations for all the pixels in that neighborhood, by the least squares criterion.
3.3 Optical Flow in OpenCV
OpenCV provides cv2.calcOpticalFlowPyrLK(). We have to:
+ Use cv2.goodFeaturesToTrack() to find points to track.
+ Pass the previous frame, the previous points and the next frame to cv2.calcOpticalFlowPyrLK().
+ The function returns the next points and a status array whose value is 1 if the corresponding next point is found, else 0.
Apply this for the Flappy Bird video above:
import numpy as np
import cv2
cap = cv2.VideoCapture('flappybird.mp4')
# params for ShiTomasi corner detection
feature_params = dict( maxCorners = 100,
                       qualityLevel = 0.3,
                       minDistance = 7,
                       blockSize = 7 )
# Parameters for lucas kanade optical flow
lk_params = dict( winSize  = (15,15),
                  maxLevel = 2,
                  criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
# Create some random colors
color = np.random.randint(0,255,(100,3))
# Take first frame and find corners in it
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params)
# Create a mask image for drawing purposes
mask = np.zeros_like(old_frame)
refind = 0
while(1):
 ret,frame = cap.read()
 if not ret:
  break
 frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
 refind = refind + 1
 #re-find feature to track after 100 frames in case object disappears
 if(refind == 100):
  old_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
  p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params)
  mask = np.zeros_like(frame)
  refind = 0
 # calculate optical flow
 p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
 # Select good points
 good_new = p1[st==1]
 good_old = p0[st==1]
 # draw the tracks
 for i,(new,old) in enumerate(zip(good_new,good_old)):
  a,b = new.ravel()
  c,d = old.ravel()
  mask = cv2.line(mask, (a,b),(c,d), color[i].tolist(), 2)
  frame = cv2.circle(frame,(a,b),5,color[i].tolist(),-1)
 img = cv2.add(frame,mask)
 cv2.imshow('frame',img)
 k = cv2.waitKey(30) & 0xff
 if k == 27:
  break
 # Now update the previous frame and previous points
 old_gray = frame_gray.copy()
 p0 = good_new.reshape(-1,1,2)
cv2.destroyAllWindows()
cap.release()
4. Background Subtraction 
Consider an application that counts visitors or vehicles. We need to extract the people or vehicles alone, i.e. extract the moving foreground from the static background. OpenCV provides some methods:
4.1 BackgroundSubtractorMOG
It is a Gaussian Mixture-based Background/Foreground Segmentation algorithm. This method uses the observation that the probable background colours are the ones which stay longer and are more static.
import numpy as np
import cv2
cap = cv2.VideoCapture('flappybird.mp4')
fgbg = cv2.bgsegm.createBackgroundSubtractorMOG()
while(1):
    ret, frame = cap.read()
    if not ret:
        break
    fgmask = fgbg.apply(frame)
    cv2.imshow('frame',fgmask)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
cap.release()
cv2.destroyAllWindows()
4.2 BackgroundSubtractorMOG2
This method improves the adaptability to varying scenes due to illumination changes, compared to the previous method.
import numpy as np
import cv2
cap = cv2.VideoCapture('flappybird.mp4')
fgbg = cv2.createBackgroundSubtractorMOG2()
while(1):
    ret, frame = cap.read()
    if not ret:
        break
    fgmask = fgbg.apply(frame)
    cv2.imshow('frame',fgmask)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
cap.release()
cv2.destroyAllWindows()

Wednesday, 6 June 2018

Part 6: Studying Digital Image Processing with OpenCV-Python

1. What is feature?
Consider the image:
Figure: example of feature (from OpenCV)
+ The blue rectangle is a flat area. If we move the blue rectangle, it looks the same. So it is not special.
+ The black rectangle is on an edge. If we move it in the vertical direction, it changes; if we move it along the edge, it looks the same.
+ The red rectangle is a corner. If we move it, it looks different. So it is special, and corners are considered to be good features in an image.
Feature Detection: finding regions in an image which have maximum variation when moved (by a small amount) in all directions around them.
Feature Description: describing the region around a feature so that the feature can be found in other images (so that we can stitch or match them).
Note: install this package before starting. just run:
pip install opencv-contrib-python
2. Harris Corner Detection
Refer to A Combined Corner and Edge Detector paper by Chris Harris & Mike Stephens
This algorithm basically finds the difference in intensity for a displacement of (u,v) in all directions.
$E(u,v)=\sum_{x,y}^{ }w(x,y)[I[x+u,y+v]-I[x,y]]^{2}$
$w(x,y)$: window function
$I[x+u,y+v]$: shifted intensity
$I[x,y]$: intensity
Then we maximize this function to find corners using Taylor Expansion and Sobel derivatives in x and y.
A score function is defined to determine if a window can contain a corner or not. Its formula:
$R=\lambda _{1}\lambda _{2}-k(\lambda _{1}+\lambda _{2})^{2}$
The result of Harris Corner Detection is a grayscale image of scores. Threshold it to mark the corners in the image.
OpenCV provides  cv2.cornerHarris(). Its arguments are :
+ img - Input image, it should be grayscale and float32 type.
+ blockSize - It is the size of neighborhood considered for corner detection
+ ksize - Aperture parameter of Sobel derivative used.
+ k - Harris detector free parameter in the equation.
Apply Harris algorithm for image below:
Figure: input image to apply Harris
import cv2
import numpy as np

filename = 'images/wall.png'
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

gray = np.float32(gray)
dst = cv2.cornerHarris(gray,3,9,0.04)

# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.1*dst.max()]=[0,0,255]

cv2.imshow('dst',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: adjusting arguments to achieve good result
3. Shi-Tomasi Corner Detector & Good Features to Track
Shi-Tomasi score function was proposed:
$R=min(\lambda _{1},\lambda _{2})$
If it is greater than a threshold value, it is considered a corner.
Input image should be a gray scale image.
OpenCV has a function, cv2.goodFeaturesToTrack(). Its arguments:
+ The number of corners you want to find.
+ The quality level (between 0 and 1): corners whose quality is below this value are rejected.
+ The minimum Euclidean distance between detected corners.
Apply Shi-Tomasi for the image above:
import numpy as np
import cv2

img = cv2.imread('images/wall.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

corners = cv2.goodFeaturesToTrack(gray,500,0.3,4)
corners = np.int0(corners) # convert corner coordinates to integers for drawing

for i in corners:
    x,y = i.ravel()
    cv2.circle(img,(x,y),2, (0,0,255),-1)

cv2.imshow('dst',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: adjusting arguments to achieve good result
4. Scale-Invariant Feature Transform (SIFT)
4.1 Concept 
As its name suggests, this algorithm covers the case where the image is scaled.
There are 4 steps:
Step 1: Scale-space Extrema Detection
Take the original image and calculate the Difference of Gaussians (DoG) of it. It is obtained as the difference of Gaussian blurrings of that image with two different scales (blur levels) $\sigma$ and $k\sigma$. Then resize the original image to half size (creating another octave) and repeat the same process, as in the figure below:


Figure: calculate DoG for octaves
Search for local maxima/minima in the DoG images by taking one pixel in a DoG image and comparing it with its 8 neighbours as well as the 9 pixels in the next scale and the 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint.
Figure: Search for local maxima/minima in DoG images
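To make the DoG idea concrete, here is a minimal sketch that builds one DoG image from two Gaussian blurs (the file name reuses images/wall.png from the Harris example, and the sigma and k values are just assumptions for illustration):

# One DoG level: blur at two nearby scales and subtract
import cv2
import numpy as np

gray = cv2.imread('images/wall.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

sigma = 1.6
k = 2 ** 0.5
blur1 = cv2.GaussianBlur(gray, (0, 0), sigma)
blur2 = cv2.GaussianBlur(gray, (0, 0), k * sigma)
dog = blur2 - blur1   # SIFT builds a stack of these per octave

# Normalize for display
dog_vis = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imshow('DoG', dog_vis)
cv2.waitKey(0)
cv2.destroyAllWindows()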
Step 2: Keypoint Localization
This step eliminates low-contrast keypoints and edge keypoints.
To eliminate low-contrast keypoints, a Taylor series expansion is used to calculate the intensity at each extremum. If this intensity is less than a threshold value, the keypoint is rejected. This threshold is called contrastThreshold in OpenCV.
To eliminate edge keypoints, a 2x2 Hessian matrix is used to calculate a ratio of principal curvatures. If this ratio is greater than a threshold, called edgeThreshold in OpenCV, the keypoint is rejected.
Step 3: Orientation Assignment
An orientation is assigned to each keypoint to achieve invariance to image rotation.
The gradient magnitude and orientation are calculated for all pixels around the keypoint. An orientation histogram with 36 bins covering 360 degrees is created. The highest peak, and any peak above 80% of it, is used to compute the orientation. This can create keypoints with the same location and scale but different orientations.
Step 4: Keypoint Descriptor
Take a 16x16 neighborhood around the keypoint and divide it into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created, so a total of 128 bin values are available. They are represented as a vector to form the keypoint descriptor.
Step 5: Keypoint Matching
Keypoints between two images are matched by identifying their nearest neighbours.
4.2 SIFT in OpenCV
First we have to construct a SIFT object "sift".
sift.detect() finds the keypoints in the image.
cv2.drawKeypoints() draws small circles at the locations and orientations of the keypoints.
sift.compute() computes the descriptors from the keypoints.
sift.detectAndCompute() finds keypoints and computes descriptors in one step.
Apply SIFT for the image above:
import cv2
import numpy as np

img = cv2.imread('wall.png')
gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

sift = cv2.xfeatures2d.SIFT_create(0,3,0.07,10,1.6)
kp = sift.detect(gray,None)

cv2.drawKeypoints(gray,kp,img,flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('SIFT',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: output of SIFT
5. Speeded-Up Robust Features (SURF)
5.1 Concept
SURF is 3 times faster than SIFT. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint change and illumination change.
Instead of using DoG, SURF approximates the Laplacian of Gaussian (LoG) with a box filter.
For orientation assignment, SURF uses wavelet responses in horizontal and vertical direction for a neighborhood of size 6s. In case rotation invariance is not required, Upright-SURF or U-SURF (upright flag in OpenCV) is used to speed up the process.
For feature description, SURF uses Wavelet responses in horizontal and vertical direction.
The SURF feature descriptor has 64 dimensions and an extended 128-dimension version. OpenCV supports both by setting the extended flag to 0 or 1 for 64-dim and 128-dim respectively.
It uses the sign of the Laplacian to distinguish bright blobs on dark backgrounds from the reverse situation. So in the matching step, only features that have the same type of contrast will be compared, as in the image below:
Refer to SURF: Speeded Up Robust Features paper.
5.2 SURF in OpenCV
Apply SURF for image below:
Figure: input image for SURF
import cv2
import numpy as np

img = cv2.imread('surf.png')
gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#here the Hessian threshold is 20000
surf = cv2.xfeatures2d.SURF_create(20000)
#detect keypoints
kp, des = surf.detectAndCompute(img,None)

img2 = cv2.drawKeypoints(img,kp,None,(255,0,0),4)

cv2.imshow('SURF',img2)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: adjusting arguments to achieve good result, and SURF act as a blobs detection (detect white objects in black background)
If Upright is enabled, we just add surf.setUpright(True)
Figure: All the orientations are shown in same direction
To use the 128-dimension version, just call surf.setExtended(True)
6. Features from Accelerated Segment Test (FAST)
6.1 Concept
It is several times faster than other existing corner detectors.
It is not robust to high levels of noise.
It depends on a threshold.
Refer to Machine learning for high-speed corner detection. 
The algorithm:
+ Select a test pixel p in the image and its intensity be Ip.
+ Select threshold value t.
+ Consider a circle of 16 pixels around the test pixel.
+ The pixel p is a corner if there are n contiguous pixels in the circle (white dashed line of 16 pixels) which are all brighter than Ip+t, or all darker than Ip−t. n was chosen to be 12.
+ A high-speed test examines only the four pixels at 1, 9, 5 and 13. If p is a corner, then at least three of these must all be brighter than Ip+t or darker than Ip−t.
+ The full segment test examines all pixels in the circle.
+ There are several weaknesses:
 ++ It does not reject as many candidates for n < 12.
 ++ The choice of pixels is not optimal because its efficiency depends on ordering of the questions and distribution of corner appearances.
 ++ Results of high-speed tests are thrown away.
 ++ Multiple features are detected adjacent to one another.
The first three weaknesses are addressed with a machine learning approach. The last one is addressed using non-maximal suppression.
6.2 FAST in OpenCV 
Arguments:
+ The threshold.
+ Whether non-maximum suppression is applied or not.
+ The neighborhood; three flags are defined: cv.FAST_FEATURE_DETECTOR_TYPE_5_8, cv.FAST_FEATURE_DETECTOR_TYPE_7_12 and cv.FAST_FEATURE_DETECTOR_TYPE_9_16.
Apply FAST for image:
Figure: input image for FAST
import numpy as np
import cv2 as cv

img = cv.imread('images/wall.png')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# Initiate FAST object with default values
fast = cv.FastFeatureDetector_create(55,True,cv.FAST_FEATURE_DETECTOR_TYPE_9_16 )
# find and draw the keypoints
kp = fast.detect(gray,None)
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
# Print all default params
print( "Threshold: {}".format(fast.getThreshold()) )
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
print( "neighborhood: {}".format(fast.getType()) )
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )
cv.imwrite('fast_true.png',img2)
# Disable nonmaxSuppression
fast.setNonmaxSuppression(0)
kp = fast.detect(img,None)
print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )
img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
cv.imwrite('fast_false.png',img3)
Figure: without non-maximum suppression + adjusted threshold
Figure: with non-maximum suppression + adjusted threshold
7. Binary Robust Independent Elementary Features (BRIEF)
7.1 Concept
BRIEF is a faster method for feature descriptor calculation and matching. It also provides a high recognition rate as long as there is no large in-plane rotation.
A vector with thousands of features consumes a lot of memory; moreover, the larger the memory, the longer the matching takes. This is not feasible for low-resource systems, so we need to compress the descriptor using methods such as PCA, LDA, or LSH (Locality Sensitive Hashing).
LSH converts SIFT descriptors from floating point numbers to binary strings. These binary strings are used to match features using Hamming distance.
BRIEF finds the binary strings directly without finding descriptors.
Refer to Binary Robust Independent Elementary Features paper.
From a smoothed image, BRIEF takes $n_{d}$ (x,y) pairs. For each pair, it compares the intensity of the 2 pixels: if $I(p)<I(q)$ the result is 1, else it is 0. This is repeated for all $n_{d}$ pairs.
$n_{d}$ can be 128, 256 or 512. OpenCV represents it in bytes, so the values will be 16, 32 and 64. The Hamming distance is then used to match these descriptors.
The paper recommends to use CenSurE to find the features and then apply BRIEF.
7.2 BRIEF in OpenCV
Apply FAST+BRIEF for the image above:
import numpy as np
import cv2 as cv

img = cv.imread('wall.png')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# Initiate FAST object with default values
fast = cv.FastFeatureDetector_create(55,True,cv.FAST_FEATURE_DETECTOR_TYPE_9_16 )
# Initiate BRIEF extractor
brief = cv.xfeatures2d.BriefDescriptorExtractor_create(64, True)
# find and draw the keypoints
kp = fast.detect(gray,None)
# compute the descriptors with BRIEF
kp, des = brief.compute(gray, kp)
print(brief.descriptorSize())
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
cv.imwrite('fast_true.png',img2)
Figure: after applying BRIEF, compared to the previous result
8. Oriented FAST and Rotated BRIEF (ORB)
Refer to ORB: An efficient alternative to SIFT or SURF paper.
ORB builds on the FAST keypoint detector and the BRIEF descriptor, with many modifications to enhance the performance.
OpenCV provides cv2.ORB() or using feature2d common interface, with main arguments:
+ nFeatures : maximum number of features to be retained (by default 500)
+ scoreType : Harris score or FAST score to rank the features (by default, Harris score)
+ WTA_K : number of points that produce each element of the oriented BRIEF descriptor.
++ WTA_K=2, NORM_HAMMING distance is used for matching
++ WTA_K=3 or 4, NORM_HAMMING2 distance is used for matching
Apply ORB for image above:
import cv2
import numpy as np

img = cv2.imread('wall.png')
gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# Initiate ORB detector
orb = cv2.ORB_create(300,1.1,4,10,0,2,1,31,10)
# find the keypoints with ORB
kp = orb.detect(img,None)
# compute the descriptors with ORB
kp, des = orb.compute(img, kp)
# draw only keypoints location,not size and orientation
img2 = cv2.drawKeypoints(img, kp, None, color=(255,0,0), flags=0)

cv2.imshow('ORB',img2)
cv2.waitKey(0)
cv2.destroyAllWindows()
Figure: output of ORB
9. Feature Matching
It takes the descriptor of one feature in the first set and compares it with all features in the second set using some distance calculation, and the closest one is returned.
OpenCV creates a BFMatcher object using cv2.BFMatcher() with 2 optional arguments:
+ normType : specify the distance measurement to be used.
   ++ cv2.NORM_L2 (default value) is for SIFT, SURF
   ++ cv2.NORM_HAMMING is for binary string based descriptors like ORB, BRIEF, BRISK
   ++ cv2.NORM_HAMMING2 is for ORB is using WTA_K == 3 or 4
+ crossCheck : if true, the two features in both sets must match each other; a match (i,j) is returned only if the i-th descriptor in set A has the j-th descriptor in set B as its best match and vice-versa.
+ BFMatcher.match() : returns the best match
+ BFMatcher.knnMatch() : returns k best matches
+ cv2.drawMatches() : draws the matches
+ cv2.drawMatchesKnn : draws all the k best matches
The DMatch object has following attributes:
+ DMatch.distance - Distance between descriptors. The lower, the better it is.
+ DMatch.trainIdx - Index of the descriptor in train descriptors
+ DMatch.queryIdx - Index of the descriptor in query descriptors
+ DMatch.imgIdx - Index of the train image.
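Since the example below uses knnMatch() with SIFT, here is a minimal sketch of the other path: BFMatcher.match() with a binary descriptor (ORB) and NORM_HAMMING, with crossCheck enabled (the image file names are reused from the example below):

import cv2

img1 = cv2.imread('query.png', cv2.IMREAD_GRAYSCALE)      # queryImage
img2 = cv2.imread('toy-story.jpg', cv2.IMREAD_GRAYSCALE)  # trainImage

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance for binary descriptors, crossCheck for mutual best matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)  # lower distance = better

img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None, flags=2)
cv2.imshow('ORB matches', img3)
cv2.waitKey(0)
cv2.destroyAllWindows()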
Apply Matching with SIFT Descriptors for the image below:
Figure: image to search

Figure: this query image was cut + rotate + scale from original image
import numpy as np
import cv2

from matplotlib import pyplot as plt
img1 = cv2.imread('query.png')          # queryImage
img2 = cv2.imread('toy-story.jpg') # trainImage
gray1= cv2.cvtColor(img1,cv2.COLOR_BGR2GRAY)
gray2= cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY)
# Initiate SIFT detector
sift = cv2.xfeatures2d.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(gray1,None)
kp2, des2 = sift.detectAndCompute(gray2,None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1,des2, k=2)
# Apply ratio test
good = []
for m,n in matches:
    if m.distance < 0.5*n.distance:
        good.append([m])
img3 = None
# cv2.drawMatchesKnn expects list of lists as matches.
img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,good,img3,flags=2)
plt.imshow(cv2.cvtColor(img3, cv2.COLOR_BGR2RGB)),plt.show()
Figure: Matching with SIFT Descriptors
10. Feature Matching + Homography to find Objects
How do we find the object exactly in the source image?
We pass a set of matched points from both images to cv2.findHomography() to find the perspective transformation of that object (it needs at least four points to find the transformation). The RANSAC or LEAST_MEDIAN algorithm flag is used to eliminate outliers.
Then we use cv2.perspectiveTransform() to locate the object.
Reuse the example above:
import numpy as np
import cv2
from matplotlib import pyplot as plt
MIN_MATCH_COUNT = 10
img1 = cv2.imread('query.png')          # queryImage
img2 = cv2.imread('toy-story.jpg') # trainImage
gray1= cv2.cvtColor(img1,cv2.COLOR_BGR2GRAY)
gray2= cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY)
# Initiate SIFT detector
sift = cv2.xfeatures2d.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(gray1,None)
kp2, des2 = sift.detectAndCompute(gray2,None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1,des2, k=2)
# Apply ratio test
good = []
for m,n in matches:
    if m.distance < 0.5*n.distance:
        good.append([m])
if len(good)>MIN_MATCH_COUNT:
 src_pts = np.float32([ kp1[m[0].queryIdx].pt for m in good ]).reshape(-1,1,2)
 dst_pts = np.float32([ kp2[m[0].trainIdx].pt for m in good ]).reshape(-1,1,2)
 M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)
 #extract found ROI
 matchesMask = mask.ravel().tolist()
 h,w,d = img1.shape
 pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
 dst = cv2.perspectiveTransform(pts,M)
 #draw rectangle
 img2 = cv2.polylines(img2,[np.int32(dst)],True,255,3, cv2.LINE_AA)
else:
    print "Not enough matches are found - %d/%d" % (len(good),MIN_MATCH_COUNT)
    matchesMask = None


draw_params = dict(matchColor = (0,255,0), # draw matches in green color
                   singlePointColor = None,
                   matchesMask = matchesMask, # draw only inliers
                   flags = 2)
img3 = None
# cv2.drawMatchesKnn expects list of lists as matches.
img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,good,img3,flags=2)
plt.imshow(cv2.cvtColor(img3, cv2.COLOR_BGR2RGB)),plt.show()
Figure: Feature Matching + Homography to find Objects