Thursday, 31 January 2019

Collection of most common Neural Network mistakes


1) you didn't try to overfit a single batch first. (i.e. if you can't overfit a small amount of data you've got a simple bug somewhere)
2) you forgot to toggle train/eval mode for the net.
3) you forgot to .zero_grad() (in pytorch) before .backward().
4) you passed softmaxed outputs to a loss that expects raw logits.
5) you didn't use bias=False for your Linear/Conv2d layers when using BatchNorm, or conversely forgot to include it for the output layer. This one won't make you fail silently, but it introduces spurious parameters.
6) thinking view() and permute() are the same thing (& incorrectly using view)
7) I like to start with the simplest possible sanity checks - e.g. also training on all zero data first to see what loss I get with the base output distribution, then gradually include more inputs and scale up the net, making sure I beat the previous thing each time. (starting with small model + small amount of data & growing both together; I always find it really insightful)
(Then I turn my data back on and compare against that baseline loss :) Also, if training on all-zero data produces a nice decaying loss curve, this usually indicates a not very clever initialization; I sometimes like to tweak the final layer biases to be close to the base distribution.)
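A minimal sketch (PyTorch, since the list mentions it; the model, sizes and data are illustrative) of point 1, the overfit-a-single-batch check, which also exercises points 2, 3 and 4:
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()            # expects raw logits (point 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# one small, fixed batch of fake data
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

model.train()                                # point 2: train mode
for step in range(500):
    optimizer.zero_grad()                    # point 3: clear old gradients
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
print(loss.item())  # should approach 0; if it doesn't, there is likely a bug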

Wednesday, 22 August 2018

Studying Keras - Convolution Neural Network

1. Concept
2. How to build Models in Keras
2 ways to build models in Keras:
- Using Sequential : different predefined layers are stacked linearly, one after another.
- Using Functional API to build complex models.
E.g. using Sequential:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

N_HIDDEN = 128      # example values
DROPOUT = 0.3
nb_classes = 10

model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(N_HIDDEN))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.summary()
3. Prebuilt layers in Keras
A Dense layer is a fully connected neural network layer.
RNN layers (Recurrent, SimpleRNN, GRU, LSTM)
CNN layers (Conv1D, Conv2D, MaxPooling1D, MaxPooling2D)
4. Regularization
kernel_regularizer: Regularizer function applied to the weight matrix
bias_regularizer: Regularizer function applied to the bias vector
activity_regularizer: Regularizer function applied to the output of the layer
Dropout regularization is frequently used.
Note:
+ you can write your regularizers in an object-oriented way
5. Batch normalization
It is a way to accelerate learning and generally achieve better accuracy.
keras.layers.normalization.BatchNormalization()
6. Activation functions
7. Loss functions
There are 4 categories:
- Accuracy : used for classification problems.
+ binary_accuracy : for binary classification problems
+ categorical_accuracy : for multiclass classification problems
+ sparse_categorical_accuracy : for sparse targets
+ top_k_categorical_accuracy : for when the target class is within the top_k predictions
- Error loss : the difference between the values predicted and the values actually observed.
+ mse : mean square error
+ rmse : root mean square error
+ mae : mean absolute error
+ mape : mean absolute percentage error
+ msle : mean squared logarithmic error
- Hinge loss : for training classifiers. There are two versions: hinge and squared hinge
- Class loss : to calculate the cross-entropy for classification problems:
+ binary cross-entropy
+ categorical cross-entropy
8. Metrics
A metric is a function that is used to judge the performance of your model. It is similar to an objective function but the results from evaluating a metric are not used when training the model.
9. Optimizers
Keras provides optimizers such as SGD, RMSprop, and Adam.
10. Saving and loading the model
from keras.models import model_from_json

# save as JSON
json_string = model.to_json()
# save as YAML
yaml_string = model.to_yaml()
# load model from JSON:
model = model_from_json(json_string)
from keras.models import load_model

# save the whole model (architecture + weights + optimizer state)
model.save('my_model.h5')
# load the whole model back
model = load_model('my_model.h5')
10. Callbacks for customizing the training process
- The training process can be stopped when a metric has stopped improving by using keras.callbacks.EarlyStopping()
- You can define your own callback: class LossHistory(keras.callbacks.Callback)
- Checkpointing saves a snapshot of the application's state at regular intervals, so the application can be restarted/recovered from the last saved state in case of failure. The state of a deep learning model is the weights of the model at that time. Use ModelCheckpoint(). Set save_best_only to True to save a checkpoint only when the current metric is better than the previously saved one (see the sketch after this list).
- Using TensorBoard and Keras:
keras.callbacks.TensorBoard(log_dir='./logs', histogram_freq=0,write_graph=True, write_images=False)

#tensorboard --logdir=/full_path_to_your_logs
- Using Quiver and Keras : a tool for visualizing ConvNet features in an interactive way.
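A minimal sketch combining EarlyStopping and ModelCheckpoint as described above (the file name, metric and constants are illustrative; model, X_train and Y_train are assumed to be defined as in the examples below):
from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # stop when the validation loss has not improved for 5 epochs
    EarlyStopping(monitor='val_loss', patience=5),
    # save a snapshot only when the monitored metric improves
    ModelCheckpoint('weights.best.h5', monitor='val_loss', save_best_only=True)
]
model.fit(X_train, Y_train, epochs=20, batch_size=128,
          validation_split=0.2, callbacks=callbacks)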
11. A simple neural net in Keras
11.1 Input/Output
X_train : training set
Y_train : labels of training set
We set apart some of the data (the test set, X_test/Y_test below) for measuring the performance of the model; a further slice of the training data can be reserved for validation while training.
X_test : test set
Y_test : labels of test set
Data is converted into float32 to support GPU computation and normalized to [0, 1].
We use MNIST images; each image is 28x28 pixels, so the input layer has 28x28 = 784 neurons, one neuron per pixel.
Each pixel has a maximum intensity of 255, so we divide by 255 to normalize values into the range [0, 1].
The final layer has 10 neurons with a softmax activation function. Softmax "squashes" a K-dimensional vector of arbitrary real values into a K-dimensional vector of values in the range (0, 1) that sum to 1. In this example the output is a digit in [0-9], so K = 10.
Note: One-hot encoding (OHE)
In recognizing handwritten digits, an output value d in [0-9] can be encoded into a binary vector with 10 positions that is all zeros except for a 1 at the d-th position.
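For example, a minimal sketch of OHE with keras.utils.np_utils (here the digit 3 is an arbitrary example):
from keras.utils import np_utils

Y = np_utils.to_categorical([3], 10)
# Y is [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]] : all zeros except a 1 at position 3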
11.2 Setup model
Compile (compile()) the model by selecting (as sketched after this list):
- optimizer : the specific algorithm used to update weights when training our model.
- objective function - loss function - cost function : used by the optimizer to map an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. In this case it is to be minimized.
- metrics to evaluate the trained model
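A minimal sketch of compile() with the three choices above (SGD and these particular strings are just one possible combination):
from keras.optimizers import SGD

model.compile(loss='categorical_crossentropy',  # objective function to minimize
              optimizer=SGD(),                  # weight update algorithm
              metrics=['accuracy'])             # metric reported during training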
11.3 Train model
Train with fit(), selecting:
- epochs : the number of times the optimizer tries to adjust the weights so that the objective function is minimized.
- batch_size : the number of training samples observed before the optimizer performs a weight update.
- validation_split : reserve a part of the training set for validation.
Note: On Linux, we can use the command "nvidia-smi" to show GPU memory used so that you can increase batch size if not fully utilized.
Evaluate the network with evaluate() on (X_test, Y_test).
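A minimal sketch of fit() and evaluate() with the parameters above (the constant values are illustrative; model and the data arrays are assumed to be defined as in section 11.1):
NB_EPOCH = 20
BATCH_SIZE = 128
VALIDATION_SPLIT = 0.2   # reserve 20% of the training set for validation

history = model.fit(X_train, Y_train,
                    batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    validation_split=VALIDATION_SPLIT, verbose=1)

score = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])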
Figure: Result of training simple model
11.4 Visualize Model Training History
This helps with:
- Checking convergence and over-fitting when training.
- The History callback is registered when training all deep learning models. It records:
 + accuracy on the training and validation datasets over training epochs.
 + loss on the training and validation datasets over training epochs.
 + history.history.keys() returns ['acc', 'loss', 'val_acc', 'val_loss'], so we can plot them:
import matplotlib.pyplot as plt

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.show()
11.5 Improving the simple net
Note: for every experiment we should plot the charts to compare the effects.
- Add additional layers (hidden layers) to our network. After the first hidden layer, we add a second hidden layer, again with N_HIDDEN neurons and a ReLU activation function.
model.add(Dense(N_HIDDEN))
model.add(Activation('relu'))
Figure: Add additional layers
- Using Dropout. This is a form of regularization. It randomly drops, with the dropout probability, some of the neurons of the hidden layers. In the testing phase there is no dropout, so all of the trained neurons are used.
model.add(Dense(N_HIDDEN))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
- Testing different optimizers. Keras supports stochastic gradient descent (SGD) and the optimization algorithms RMSprop and Adam. RMSprop and Adam add the concept of momentum on top of SGD, which typically improves both accuracy and training speed.
from keras.optimizers import RMSprop, Adam
...
OPTIMIZER = RMSprop() # optimizer,
#OPTIMIZER = Adam() # optimizer
Figure: Accuracy when using Adam and RMSprop
Figure: Dropout combine Adam algorithm
- Adjusting the optimizer learning rate. This can help.
keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
#keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
#keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
Figure: The accuracy when changing learning rate
- Increasing the number of internal hidden neurons. The complexity of the model increases and each epoch takes longer because there are more parameters to optimize; the accuracy may improve.
Figure: The accuracy when changing number of neurons
- Increasing the training batch size. Look at the accuracy chart to choose the optimal batch size.
Figure: The accuracy when changing batch size
- Increasing the number of epochs
Figure: The accuracy when changing the number of epochs (no improvement)
11.6 Avoiding over-fitting
Learning is more about generalization than memorization.
Look at the graph below:
Figure: loss function on both validation and training sets
The loss function decreases on both the training and validation sets. However, at a certain point the loss on the validation set starts to increase. This is over-fitting, and it indicates a problem with model complexity.
The complexity of a model can be measured by the number of nonzero weights.
If we have 2 models M1 and M2 with the same value of the loss function over the training epochs, then we should choose the simpler model, the one with minimum complexity.
In order to avoid over-fitting we use regularization. Keras supports L1 (lasso, absolute values), L2 (ridge, squared values), and elastic net (L1 + L2 combined) regularization.
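A minimal sketch of adding such penalties to a Dense layer (the layer size and the 0.01 factors are arbitrary examples):
from keras import regularizers
from keras.layers import Dense

model.add(Dense(64,
                kernel_regularizer=regularizers.l2(0.01),      # penalty on the weights
                activity_regularizer=regularizers.l1(0.01)))   # penalty on the outputs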
11.7 Summarizing the experiments
We create a summarized table:
Figure: experiments summarized table
11.8 Hyper-parameters tuning
As above, we have multiple parameters that need to be optimized. This process is called hyperparameter tuning: finding the optimal combination of those parameters that minimizes the cost function. There are several approaches:
+ Grid search: build a model for each possible combination of all of the hyperparameter values provided.
E.g:
param1 = [10, 50, 100]
param2 = [3, 9, 27]
=> all possible combinations:[10,3];[10,9];[10,27];[50,3];[50,9];[50,27];[100,3];[100,9];[100,27]
+ Random search : build a statistical distribution for each hyperparameter and then choose values randomly. This is useful when hyperparameters have different impacts.
+ Bayesian optimization : the next hyperparameters to try are chosen based on the training results of the previous hyperparameters.
Refer here.
11.9 Predicting output
Using (a short sketch follows the list):
predictions = model.predict(X) : compute outputs
model.evaluate() : compute the loss values
model.predict_classes() : compute category outputs
model.predict_proba() : compute class probabilities
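A minimal sketch of these calls on the test set (model, X_test and Y_test are assumed to be defined; predict_classes()/predict_proba() are available on Sequential models):
predictions = model.predict(X_test)           # raw network outputs
classes = model.predict_classes(X_test)       # predicted class indices
probabilities = model.predict_proba(X_test)   # per-class probabilities
loss_and_metrics = model.evaluate(X_test, Y_test, verbose=0)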
12. Convolutional neural network (CNN)
General deep networks do not care about the spatial structure and relations within an image, but convolutional neural networks (ConvNets) exploit spatial information and therefore improve image classification. A CNN uses convolution and pooling to do that. A ConvNet has multiple filters stacked together which learn to recognize specific visual features independently of their location in the image.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(256, 256, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
This applies a 3 x 3 convolution with 32 filters on a 256 x 256 image with three input channels, followed by max pooling.
13. Simple net vs  Deep CNN
Using MNIST data, we test these 2 networks while reducing the training set size to 5,900; 3,000; 1,800; 600 and 300 examples. The test set size remains 10,000 examples. All the experiments are run for four training iterations.
Figure: Deep CNN is better than Simple net, even when it has more weights but less training data
13. Recognizing CIFAR-10 images with deep learning
The CIFAR-10 dataset contains 60,000 color images of 32 x 32 pixels in 3 channels, divided into 10 classes. Each class contains 6,000 images. The training set size is 50,000 images, while the test set size is 10,000 images.
The goal is to recognize unseen images and assign them to one of the 10 classes.
Figure: CIFAR-10 images
13.1  First simple deep network
13.2  Increasing the depth of network
13.3 Increasing data with data augmentation
This method generates more images for training. We take the available CIFAR training set and augment it with multiple types of transformations, including rotation, rescaling, horizontal/vertical flip, zooming, channel shift, etc.
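A minimal sketch using keras.preprocessing.image.ImageDataGenerator for such transformations (the parameter values, batch size and epoch count are illustrative; model, X_train and Y_train are assumed to be defined):
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,        # random rotations up to 40 degrees
    width_shift_range=0.2,    # random horizontal shifts
    height_shift_range=0.2,   # random vertical shifts
    zoom_range=0.2,           # random zooming
    horizontal_flip=True,     # random horizontal flips
    fill_mode='nearest')

# train on batches generated on the fly from the augmented data
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=128),
                    steps_per_epoch=len(X_train) // 128, epochs=20)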
13.4 Extracting features from pre-built deep learning models
Each layer learns to identify the features of input that are necessary for the final classification.
Higher layers compose lower layers features to form shapes or objects.
13.5 Transfer learning
Reuse pre-trained CNNs to build a new CNN when the data set may not be large enough to train an entire CNN from scratch.
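A minimal sketch of this idea with keras.applications, assuming VGG16 pre-trained on ImageNet as the base: freeze the convolutional base and train only a new classifier on top (the input size and layer sizes are illustrative):
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Dense, Flatten

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False            # reuse the pre-trained feature extractor as-is

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(10, activation='softmax')(x)   # number of classes in the new task

model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])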
to be continued ...

Wednesday, 11 July 2018

How to setup Tensorflow/Keras running on Nvidia GPU

We will use Virtualenv for this set-up. Just follow the commands:
1. Install Virtualenv:
sudo pip install virtualenv
2. Commands to setup Virtualenv
- mkdir virtualws
- virtualenv virtualws/keras_demo
- cd virtualws/keras_demo/bin
- source activate
3. Install Tensorflow and Keras
- pip install tensorflow_gpu
- pip install keras
Note: we installed "tensorflow_gpu"
4. Install driver for Nvidia GPU:
4.1 Method 1
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt-get install nvidia-xxx
Note: "xxx" can be checked here.
or you can download the driver here (.run file) and install it manually, then use the commands:
sudo chmod +x xxx.run
./xxx.run
4.2 Method 2
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
And using Software & Updates => Additional Drivers
Note: choose the version that match your Linux kernel version (using "uname -r")
4.3 Method 3
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt-get install nvidia-driver-xxx
Note: for this method you should use gcc/g++ version 5 (gcc-5, g++-5):
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5

Refer: https://askubuntu.com/questions/26498/how-to-choose-the-default-gcc-and-g-version
Then verify the driver using: lsmod | grep nvidia
5. Install NVIDIA CUDA Toolkit and cuDNN
To check which versions of the NVIDIA CUDA Toolkit and cuDNN are needed, we create a simple demo using Tensorflow/Keras.
Create a test.py file with the following content and run python test.py:
# 3. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility
 
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
 
# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
 
# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
 
# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
 
# 7. Define model architecture
model = Sequential()

model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(1,28,28), data_format='channels_first'))
model.add(Convolution2D(32, (3, 3), activation='relu', data_format='channels_first'))
model.add(MaxPooling2D(pool_size=(2,2), data_format='channels_first'))
model.add(Dropout(0.25))
 
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
 
# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
 
# 9. Fit model on training data
model.fit(X_train, Y_train, 
          batch_size=32, epochs=10, verbose=1)
 
# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
Python will throw an error saying that it could not load a CUDA library file of a specific version. Based on that, you can download the matching NVIDIA CUDA Toolkit and cuDNN versions.
If you downloaded the NVIDIA CUDA Toolkit as a ".run" file, just use the commands:
sudo chmod +x xxx.run
./xxx.run
Press ENTER until 100% is reached and choose YES for all the questions except the one asking you to install the NVIDIA driver (if you already installed the driver, choose NO).
Then open the hidden file .bashrc at /home/your_user/.bashrc and append the lines:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64\${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
In order to install cuDNN, unzip it and use the commands:
cd to_unzip_folder
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

6. Install dependencies
Run the commands:
sudo apt-get install libcupti-dev
Again open .bashrc and append the line:
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Note:
To monitor GPU running using command: watch -n 1 nvidia-smi
That is all.
Run the demo above again. The training process is faster:
Figure: Tensorflow/Keras runs on Nvidia GPU
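As a quick extra check that TensorFlow can actually see the GPU, a minimal sketch (TensorFlow 1.x API):
from tensorflow.python.client import device_lib

# lists CPU and GPU devices; a GPU entry confirms tensorflow_gpu found the driver and CUDA
print(device_lib.list_local_devices())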

Friday, 6 July 2018

GPU concept

1. GPU
- The Graphics Processing Unit (GPU) is a processor that was specialized for processing graphics, but we can implement any algorithm on it, not only graphics, with high efficiency and performance.
GPU computing - features:
- Massively parallel.
- Hundreds of cores.
- Thousands of threads.
- Cheap.
- Highly available.
- Computing:
    + 1 Teraflop (Single precision)
    + 100 Gflops (Double precision)
- Programable: CUDA
- Important factors to consider: power and cooling.
Figure: GPU vs CPU
Note:
- The amount of RAM featured on a certain GPU is typically the first spec listed when naming the card. Basically, the more vRAM a card has, the more complex the tasks it can load. If the tasks overload your GPU's vRAM, the overflow goes to system RAM, which significantly degrades performance.
- On Linux and NVIDIA GPU, we can use the command "nvidia-smi" or "watch -n 1 nvidia-smi" or "nvidia-settings" to show GPU memory used and the processes using the GPU.
2. Compute Unified Device Architecture (CUDA)
Its characteristics:
- A compiler and toolkit for programming NVIDIA GPUs.
- API extends the C programming language.
- Runs on thousands of threads.
- Scalable model.
- Parallelism.
- Give a high level abstraction from hardware.
- CUDA language is vendor dependent.
- OpenCL is going to become an industry standard. It is a low level specification, more complex to program with than CUDA C.
3. CUDA architecture
Abstracting from the hardware.
Automatic thread management (can handle 100k+ threads). Languages: C, C++, OpenCL.
OS: Windows, Linux, OS X.
Host: the CPU and its memory (host memory).
Device: the GPU and its memory (device memory).
Figure: CUDA architecture
The user just takes care of:
- Analyzing the algorithm to expose parallelism, e.g. choosing the block size and the number of threads.
- Resource management (efficient data transfers; local data set - register space and on-chip memory are limited).
4. Grid - Block - Thread - Kernel
4.1 Grid
The set of blocks is referred to as a grid.
4.2 Block
- Blocks are grouped in a grid.
- Blocks are independent.
- Each invocation can refer to its block index using blockIdx.x.
4.3 Thread
- Kernels are executed by threads.
- Each thread has an ID.
- Thousands of threads execute the same kernel.
- Threads are grouped into blocks.
- Threads in a block can synchronize execution.
- Each invocation can refer to its thread index using threadIdx.x.
4.4 Kernel
- A kernel is a simple C function that is executed on the GPU by many threads.
Figure: Block - Thread - Kernel
5. Work flow
Figure: GPU calculation Work flow
6. C extensions
Block size: (x, y, z). x*y*z = Maximum of 768 threads total. (Hw dependent)
Grid size: (x, y). Maximum of thousands of threads. (Hw dependent)
__global__ : to be called by the host but executed by the GPU.
__host__ : to be called and executed by the host.
__shared__ : variable in shared memory.
__syncthreads() : sync of threads within a block.
*Indexing Arrays with Blocks and Threads
Consider indexing an array with one element per thread (8 threads/block)
Figure: block - thread Id
With M threads per block, a unique index for each thread is given by:
int index = threadIdx.x + blockIdx.x * M;
or using
int index = threadIdx.x + blockIdx.x * blockDim.x; // blockDim.x = number of threads per block
Figure: element 21 will be handled by thread 5 of block 2, following threadIdx.x + blockIdx.x * M = 5 + 2*8 = 21
Example: Vector Addition with Blocks and Threads
#include <stdlib.h>

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

// helper (not shown in the original snippet): fill an array with random integers
void random_ints(int *x, int n) {
 for (int i = 0; i < n; ++i)
  x[i] = rand();
}

__global__ void add(int *a, int *b, int *c, int n) { 
 int index = threadIdx.x + blockIdx.x * blockDim.x;
 // Avoid accessing beyond the end of the arrays
 if (index < n) 
  c[index] = a[index] + b[index]; 
}

int main(void) {
 int *a, *b, *c; // host copies of a, b, c
 int *d_a, *d_b, *d_c; // device copies of a, b, c
 int size = N * sizeof(int);
 
 // Alloc space for device copies of a, b, c
 cudaMalloc((void **)&d_a, size);
 cudaMalloc((void **)&d_b, size);
 cudaMalloc((void **)&d_c, size);
 
 // Alloc space for host copies of a, b, c and setup input values
 a = (int *)malloc(size); random_ints(a, N);
 b = (int *)malloc(size); random_ints(b, N);
 c = (int *)malloc(size);
 
 // Copy inputs to device
 cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
 cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);
 
 // Launch add() kernel on GPU with THREADS_PER_BLOCK threads
 add<<<N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);
 
 // Copy result back to host
 cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
 
 // Cleanup
 free(a); free(b); free(c);
 cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
 return 0;
}

Thursday, 21 June 2018

Isolate Python working environment using Virtualenv

While working on multiple Python projects, you may face the issue that one project needs a version of a package that is completely different from the others. Virtualenv resolves this issue by isolating Python working environments, so the dependencies required by the projects are kept separate.
1. Install Virtualenv: sudo pip install virtualenv
2. Commands to setup Virtualenv
- mkdir ~/virtualws
- virtualenv ~/virtualws/keras_demo
- cd ~/virtualws/keras_demo/bin
- source activate
3. Test with Keras
- pip install tensorflow
- pip install keras
4. Run demo
Create a test.py file with the following content and run python test.py:
# 3. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility
 
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
 
# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
 
# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
 
# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
 
# 7. Define model architecture
model = Sequential()

model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(1,28,28), data_format='channels_first'))
model.add(Convolution2D(32, (3, 3), activation='relu', data_format='channels_first'))
model.add(MaxPooling2D(pool_size=(2,2), data_format='channels_first'))
model.add(Dropout(0.25))
 
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
 
# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
 
# 9. Fit model on training data
model.fit(X_train, Y_train, 
          batch_size=32, epochs=10, verbose=1)
 
# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)

Friday, 8 June 2018

Convolution Concept

1. Concept
Convolution is used in signal processing and analysis.
We can use convolution to construct the output of a system for any arbitrary input signal, if we know the impulse response of the system.
2. Definition
The mathematical form of convolution in the discrete time domain:
$y[n]=x[n]*h[n]=\sum_{k=-\infty }^{+\infty }x[k].h[n-k]$
where:
$x[n]$ is the input signal
$h[n]$ is the impulse response, $h[n-k]$ is its time-shifted version
$y[n]$ is the output
$*$ denotes convolution
In order to understand the meaning of convolution, we have to take a look on impulse response and impulse decomposition.
3. Impulse decomposition.
The input signal is decomposed into simple additive components (in general, weighted sum of basis signals). And the system response of the input signal is the additive output of these components passed through the system.
We often use impulse (delta) functions for the basis signals.
Example how a signal is decomposed into a set of impulse (delta) functions:
Figure: input signal decomposition
An impulse (delta) function : $\delta [n]=\left\{\begin{matrix}
1, & n=0\\
0, & n\neq 0
\end{matrix}\right.$
So
$x[0].\delta [n] = 2\delta [n]$
$x[1].\delta [n-1] = 3\delta [n-1]$
$x[2].\delta [n-2] = 1\delta [n-2]$
because 
$\delta [n-1]$ is 1 at n=1 and zero elsewhere.
Then
$x[n]$ can be represented by adding 3 shifted and scaled impulse functions.
$x[n] = x[0].\delta [n] +x[1].\delta [n-1]+x[2].\delta [n-2]=\sum_{k=-\infty }^{+\infty }x[k].\delta [n-k]$
4. Impulse Response
It is the output of a system resulting from an impulse input.
Figure: h[n] is impulse response
Figure: the response of a time-shifted impulse function is also a time-shifted response, shifted by K
Figure: a scaling of the input signal causes the same scaling of the output (linearity), c is a constant
Figure: 3 additive components input, 3 additive components output
Figure: convolution equation
Finally,
$x[n]=\sum x[k].\delta [n-k]\\
y[n]=\sum x[k].h [n-k]$
It means a signal is decomposed into a set of impulses and the output signal can be computed by adding the scaled and shifted impulse responses. 
Convolution works for linear and time-invariant systems.
5. Convolution in 1D
Consider example:
$x[n] = \{3, 4, 5\}$
$h[n] = \{2, 1\}$
Figure: x[n]
Figure: h[n]
From the equation of convolution:
$y[n]=x[n]*h[n]=\sum_{k=-\infty }^{+\infty }x[k].h[n-k]$
We have:
$y[0]=x[0].h[0-0]=3*2=6$
$y[1]=x[0].h[1-0]+x[1].h[1-1]=3*1+4*2=11$
$y[2]=x[0].h[2-0]+x[1].h[2-1]+x[2].h[2-2]=3*0+4*1+5*2=14$
$y[3]=x[0].h[3-0]+x[1].h[3-1]+x[2].h[3-2]+x[3].h[3-3]=4*0+5*1=5$
$y[4]=x[0].h[4-0]+x[1].h[4-1]+x[2].h[4-2]+x[3].h[4-3]+x[4].h[4-4]=0$
Figure: y[n]
The pattern for programming:
$y[i]=x[i]h[0]+x[i-1]h[1]+...+x[i-k]h[k]$
If you face the boundary, such as x[-1] or x[-2], you can either skip the convolution at the boundary or pad those values with 0.
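A quick check of the example above with NumPy (np.convolve treats values outside the signal as zero, matching the boundary note):
import numpy as np

x = [3, 4, 5]
h = [2, 1]
print(np.convolve(x, h))   # [ 6 11 14  5]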
6. Convolution in 2D
2D convolution is an extension of 1D convolution. It convolves in both the horizontal and vertical directions. It is used in image processing.
Figure: impulse function in 2D
The impulse (delta) function in 2D:
$\delta [m,n]=\left\{\begin{matrix}
1, & m,n=0\\
0, & m,n\neq 0
\end{matrix}\right.$
Here, the matrix uses [column, row] form.
A signal can be decomposed into a sum of scaled and shifted impulse functions.
$x[m,n]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].\delta [m-i,n-j]$
and the output is:
$y[m,n]=x[m,n]*h[m,n]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].h [m-i,n-j]$
The indices of the kernel matrix can be negative:
Figure: a 3x3 matrix with indices -1, 0, 1, with the origin at the middle of the kernel.
Example:
$
y[1,1]=\sum_{j=-\infty }^{+\infty}\sum_{i=-\infty }^{+\infty}x[i,j].h [1-i,1-j]=\\
 x[0,0].h[1-0,1-0]+x[1,0].h[1-1,1-0]+x[2,0].h[1-2,1-0]\\
+ x[0,1].h[1-0,1-1]+x[1,1].h[1-1,1-1]+x[2,1].h[1-2,1-1]\\
+ x[0,2].h[1-0,1-2]+x[1,2].h[1-1,1-2]+x[2,2].h[1-2,1-2]
$
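A minimal sketch of 2D convolution with SciPy (the 3x3 input and kernel are arbitrary examples; scipy.signal.convolve2d implements the equation above):
import numpy as np
from scipy.signal import convolve2d

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
h = np.array([[0, 0, 0],
              [0, 1, 0],
              [0, 0, 0]])   # 2D impulse at the origin: the output equals the input
print(convolve2d(x, h, mode='same'))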

Thursday, 7 June 2018

Part 7: Studying Digital Image Processing with OpenCV-Python

This part focuses on Video Analysis.
1. Meanshift
1.1 Concept
Consider the example:
The blue circle "C1" has its original center at the blue rectangle "C1_o". Find the centroid of the points inside that window, "C1_r" (the small blue circle). The circle center and the centroid do not match, so move the blue circle such that the center of the new circle matches the previous centroid. Again find the new centroid and repeat the process until the center of the new circle and its centroid fall on the same location. The result is the green circle "C2", which covers the maximum number of points (maximum pixel distribution). This is the meanshift algorithm.
1.2 Meanshift in OpenCV
OpenCV provides cv2.meanShift() and we have to do the following (a minimal sketch follows the list):
+ Set up the target
+ Provide the initial location of the window
+ Find its histogram (only Hue is considered) so that we can backproject the target on each frame for the meanshift calculation.
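A minimal meanshift sketch following these steps (the video file is the Flappy Bird clip used in the Camshift demo below, and the initial window location is an assumed placeholder; in practice you would pick it around the target):
import numpy as np
import cv2

cap = cv2.VideoCapture('flappybird.mp4')
ret, frame = cap.read()
# assumed initial window (column, row, width, height) around the target
c, r, w, h = 100, 100, 50, 50
track_window = (c, r, w, h)
# Hue histogram of the ROI, used for back-projection on every frame
roi = frame[r:r+h, c:c+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
# stop after 10 iterations or when the window moves by less than 1 pt
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # meanshift moves the window towards the densest back-projection area
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)
    x, y, w, h = track_window
    img = cv2.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
    cv2.imshow('meanshift', img)
    if cv2.waitKey(60) & 0xff == 27:
        break
cap.release()
cv2.destroyAllWindows()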
2. Camshift
2.1 Concept
There is a problem with the previous demo: our window always has the same size whether the object is far from or very close to the camera. We need to adapt the window size to the size and rotation of the target, which is what Camshift does. It updates the size of the window, calculates the orientation of the best-fitting ellipse, and then applies meanshift with the new scaled search window.
2.2 Camshift in OpenCV
OpenCV provides cv2.CamShift(), which returns a rotated rectangle and the box parameters.
Let's make a demo using Camshift:
Tracking the bird in the Flappy Bird game. Here are the resources:
+ The video for the demo is here.
+ Flappy bird template for using SIFT feature matching:
Figure: Flappy bird template
import numpy as np
import cv2

MIN_MATCH_COUNT = 10
cap = cv2.VideoCapture('flappybird.mp4')
ret,frame = cap.read()
img2 = cv2.imread('bird.png') # trainImage
gray1= cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
gray2= cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY)
# Initiate SIFT detector
sift = cv2.xfeatures2d.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(gray1,None)
kp2, des2 = sift.detectAndCompute(gray2,None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1,des2, k=2)
# Apply ratio test
good = []
for m,n in matches:
    if m.distance < 0.5*n.distance:
        good.append([m])
if len(good)>MIN_MATCH_COUNT:
 #find the object to track
 src_pts = np.float32([ kp1[m[0].queryIdx].pt for m in good ]).reshape(-1,1,2)
 c,r,w,h = cv2.boundingRect(src_pts)
 track_window = (c,r,w,h)
 # set up the ROI for tracking
 roi = frame[r:r+h, c:c+w]
 hsv_roi =  cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
 #this range was obtained by trial and error using the Paint tool at
 #https://tech.deepumohan.com/2011/04/how-to-get-html-color-code-from-image.html
 mask = cv2.inRange(hsv_roi, np.array((0., 60.,32.)), np.array((130.,130.,130.)))
 roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180])
 cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX)
 # Setup the termination criteria, either 10 iteration or move by atleast 1 pt
 term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )
 while(1):
  if ret == True:
      hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
      dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1)
      # apply meanshift to get the new location
      ret, track_window = cv2.CamShift(dst, track_window, term_crit)
      # Draw it on image
      pts = cv2.boxPoints(ret)
      pts = np.int0(pts)
      img2 = cv2.polylines(frame,[pts],True, 255,2)
      cv2.imshow('img2',img2)
      k = cv2.waitKey(60) & 0xff
      if k == 27:
          break
      else:
          cv2.imwrite(chr(k)+".jpg",img2)
      ret, frame = cap.read()
  else:
      break
 cv2.destroyAllWindows()
 cap.release()
else:
    print("Not enough matches are found - %d/%d" % (len(good), MIN_MATCH_COUNT))
3. Optical Flow
3.1 Concept
Optical flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of object or camera. Optical flow has assumptions:
    + The pixel intensities of an object do not change between consecutive frames.
    + Neighbor pixels have similar motion.
Optical flow has many applications in areas like :
    + Video Compression
    + Video Stabilization
    + Measure the velocities of objects in the video. In general, moving objects that are closer to the camera will display more apparent motion than distant objects that are moving at the same speed.
Figure: a ball moving in 5 consecutive frames
The equation of Optical Flow is:
$f_{x}u+f_{y}v+f_{t}=0\\
f_{x}=\frac{\delta f}{\delta x};f_{y}=\frac{\delta f}{\delta y}\\
u=\frac{\delta x}{\delta t};v=\frac{\delta y}{\delta t}$
$f_{x},f_{y}$ are the image gradients, $f_{t}$ is the gradient over time. $(u,v)$ are unknown.
There are several methods to solve this problem.
3.2 Lucas-Kanade method
It assumes that the flow is essentially constant in a local neighborhood of the pixel under consideration, and solves the basic optical flow equations for all the pixels in that neighborhood, by the least squares criterion.
3.3 Optical Flow in OpenCV
OpenCV provides cv2.calcOpticalFlowPyrLK(). We have to do:
+ Using cv2.goodFeaturesToTrack() to find points to track.
+ Pass the previous frame, previous points and next frame to cv2.calcOpticalFlowPyrLK()
+ The function returns the next points and a status array with value 1 if the corresponding next point was found, else 0.
Apply this to the Flappy Bird video above:
import numpy as np
import cv2
cap = cv2.VideoCapture('flappybird.mp4')
# params for ShiTomasi corner detection
feature_params = dict( maxCorners = 100,
                       qualityLevel = 0.3,
                       minDistance = 7,
                       blockSize = 7 )
# Parameters for lucas kanade optical flow
lk_params = dict( winSize  = (15,15),
                  maxLevel = 2,
                  criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
# Create some random colors
color = np.random.randint(0,255,(100,3))
# Take first frame and find corners in it
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params)
# Create a mask image for drawing purposes
mask = np.zeros_like(old_frame)
refind = 0
while(1):
 ret,frame = cap.read()
 # stop when the video ends (otherwise cvtColor would fail on an empty frame)
 if not ret:
  break
 frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
 refind = refind + 1
 #re-find feature to track after 100 frames in case object disappears
 if(refind == 100):
  old_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
  p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params)
  mask = np.zeros_like(frame)
  refind = 0
 # calculate optical flow
 p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
 # Select good points
 good_new = p1[st==1]
 good_old = p0[st==1]
 # draw the tracks
 for i,(new,old) in enumerate(zip(good_new,good_old)):
  a,b = new.ravel()
  c,d = old.ravel()
  mask = cv2.line(mask, (a,b),(c,d), color[i].tolist(), 2)
  frame = cv2.circle(frame,(a,b),5,color[i].tolist(),-1)
 img = cv2.add(frame,mask)
 cv2.imshow('frame',img)
 k = cv2.waitKey(30) & 0xff
 if k == 27:
  break
 # Now update the previous frame and previous points
 old_gray = frame_gray.copy()
 p0 = good_new.reshape(-1,1,2)
cv2.destroyAllWindows()
cap.release()
4. Background Subtraction 
Consider an application which counts visitors or vehicles. We need to extract the persons or vehicles alone, i.e. extract the moving foreground from the static background. OpenCV provides some methods:
4.1 BackgroundSubtractorMOG
It is a Gaussian Mixture-based Background/Foreground Segmentation Algorithm. This method uses the characteristic that the probable background colours are the ones which stay longer and more static.
import numpy as np
import cv2
cap = cv2.VideoCapture('flappybird.mp4')
fgbg = cv2.bgsegm.createBackgroundSubtractorMOG()
while(1):
    ret, frame = cap.read()
    fgmask = fgbg.apply(frame)
    cv2.imshow('frame',fgmask)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
cap.release()
cv2.destroyAllWindows()
4.2 BackgroundSubtractorMOG2
This method improves the adaptability to varying scenes due to illumination changes, compared to the previous method.
import numpy as np
import cv2
cap = cv2.VideoCapture('flappybird.mp4')
fgbg = cv2.createBackgroundSubtractorMOG2()
while(1):
    ret, frame = cap.read()
    fgmask = fgbg.apply(frame)
    cv2.imshow('frame',fgmask)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
cap.release()
cv2.destroyAllWindows()