Identifying cricketing shots using AI

Image classification using Deep Learning has been around for almost a decade. In fact, this field with the use of Convolutional Neural Networks (CNN) is quite mature and the algorithms work very well in image classification, object detection, facial recognition and self-driving cars. In this post, I use AI image classification to identify cricketing shots. While the problem falls in a well known domain, the application of image  classification in identifying cricketing shots is probably new. I have selected three cricketing shots, namely, the front drive, sweep shot, and the hook shot for this purpose. My purpose was to build a proof-of-concept and not a perfect product. I have kept the dataset deliberately small (for obvious reasons) of just about 14 samples for each cricketing shot, and for a total of about 41 total samples for both training and test data. Anyway, I get a reasonable performance from the AI model.

Included below are some examples of the data set

This post is based on this or on Image classification from Hugging face. Interestingly, this, the model used here is based on Vision Transformers (ViT from Google Brain) and not on Convolutional Neural Networks as is usually done.

The steps are to fine-tune ViT Transformer with the ‘strokes’ dataset are

  1. Install the necessary libraries
! pip install transformers[torch] datasets evaluate accelerate -U
! pip install -U accelerate
! pip install -U transformers

b) Login to Hugging Face account

 from huggingface_hub import notebook_login
notebook_login()

Login successful

c) Load the batting strokes dataset with 41 images

from datasets import load_dataset
df1 = load_dataset("tvganesh/strokes",split='train')
type(df1)
len(df1)

41
df1
Dataset({
    features: ['image', 'label'],
    num_rows: 41
})

d) Create a dictionary that maps the label name to an integer and vice versa. Display the labels

labels = df1.features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
label2id[label] = str(i)
id2label[str(i)] = label

labels

['front drive', 'hook shot', 'sweep shot']

e) Load ViT image processor. To apply the correct transformations, ImageProcessor is initialised with a configuration that was saved along with the pretrained model 

from transformers import AutoImageProcessor

checkpoint = "google/vit-base-patch16-224-in21k"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)

f) Apply image transformations to the images to make the model more robust against overfitting

from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
size = (
    image_processor.size["shortest_edge"]
    if "shortest_edge" in image_processor.size
    else (image_processor.size["height"], image_processor.size["width"])
)
_transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])

g) Create a preprocessing function to apply the transforms and return pixel_values of the image as the inputs to the model – :

def transforms(examples):
    examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
    del examples["image"]
    return examples

h) Apply the preprocessing function over the entire dataset, using Hugging Face Dataset’s ‘with_transform’ method

df1 = df1.with_transform(transforms)
from transformers import DefaultDataCollator
data_collator = DefaultDataCollator()

i) Evaluate model’s performance with evaluate

import evaluate
accuracy = evaluate.load("accuracy")

j) Calculate accuracy by passing in predictions and labels

import numpy as np
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

k) Load ViT by specifying the number of labels along with the number of expected labels, and the label mapping

 from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)

l)

  1. Pass the training arguments to Trainer along with the model, dataset, tokenizer, data collator, and compute_metrics function.
  2. Call train() to finetune your model.
training_args = TrainingArguments(
    output_dir="data_classify",
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    #gradient_accumulation_steps=4,
    per_device_eval_batch_size=6,
    num_train_epochs=20,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
)

trainer.train()

Epoch	Training Loss	Validation Loss	Accuracy
1	No log	0.434451	1.000000
2	No log	0.388312	1.000000
3	0.361200	0.409932	0.888889
4	0.361200	0.245226	1.000000
5	0.293400	0.196930	1.000000
6	0.293400	0.167858	1.000000
7	0.293400	0.140349	1.000000
8	0.203000	0.153016	1.000000
9	0.203000	0.116115	1.000000
10	0.150500	0.129171	1.000000
11	0.150500	0.103121	1.000000
12	0.150500	0.108433	1.000000
13	0.138800	0.107799	1.000000
14	0.138800	0.093700	1.000000
15	0.107600	0.100769	1.000000
16	0.107600	0.113148	1.000000
17	0.107600	0.100740	1.000000
18	0.104700	0.177483	0.888889
19	0.104700	0.084438	1.000000
20	0.090200	0.112654	1.000000
TrainOutput(global_step=80, training_loss=0.18118578270077706, metrics={'train_runtime': 176.3834, 'train_samples_per_second': 3.628, 'train_steps_per_second': 0.454, 'total_flos': 4.959531785650176e+16, 'train_loss': 0.18118578270077706, 'epoch': 20.0})

m) Push to Hub

trainer.push_to_hub()

You can try out my fine-tuned model at identify_stroke̱

Here are a couple of trials

As I mentioned before, the model should be reasonably accurate but not perfect, since my training dataset is extremely small. This is just a prototype to show that shot identification in cricket with AI is in the realm of the possible.

References

  1. Image classification
  2. AN IMAGE IS WORTH 16X16 WORDS:TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

Do take a look at

  1. Using Reinforcement Learning to solve Gridworld
  2. Deconstructing Convolutional Neural Networks with Tensorflow and Keras
  3. GenerativeAI:Using T5 Transformer model to summarise Indian Philosophy
  4. GooglyPlusPlus: Win Probability using Deep Learning and player embeddings
  5. T20 Win Probability using CTGANs, synthetic data
  6. Deep Learning from first principles in Python, R and Octave – Part 6
  7. Introducing QCSimulator: A 5-qubit quantum computing simulator in R
  8. Big Data 6: The T20 Dance of Apache NiFi and yorkpy
  9. Re-introducing cricketr! : An R package to analyze performances of cricketers

To see all posts click Index of posts

Understanding Neural Style Transfer with Tensorflow and Keras

Neural Style Transfer (NST)  is a fascinating area of Deep Learning and Convolutional Neural Networks. NST is an interesting technique, in which the style from an image, known as the ‘style image’ is transferred to another image ‘content image’ and we get a third a image which is a generated image which has the content of the original image and the style of another image.

NST can be used to reimagine how famous painters like Van Gogh, Claude Monet or a Picasso would have visualised a scenery or architecture. NST uses Convolutional Neural Networks (CNNs) to achieve this artistic style transfer from one image to another. NST was originally implemented by Gati et al., in their paper Neural Algorithm of Artistic Style. Convolutional Neural Networks have been very successful in image classification image recognition et cetera. CNN networks have also been have also generated very interesting pictures using Neural Style Transfer which will be shown in this post. An interesting aspect of CNN’s is that the first couple of layers in the CNN capture basic features of the image like edges and  pixel values. But as we go deeper into the CNN, the network captures higher level features of the input image.

To get started with Neural Style transfer  we will be using the VGG19 pre-trained network. The VGG19 CNN is a compact pre-trained your network which can be used for performing the NST. However, we could have also used Resnet or InceptionV3 networks for this purpose but these are very large networks. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.

What needs to be done to transfer the style from one of the image to another image. This brings us to the question – What is ‘style’? What is it that distinguishes Van Gogh’s painting or Picasso’s cubist art. Convolutional Neural Networks capture basic features in the lower layers and much more complex features in the deeper layers.  Style can be computed by taking the correlation of the feature maps in a layer L. This is my interpretation of how style is captured.  Since style  is intrinsic to  the image, it  implies that the style feature would exist across all the filters in a layer. Hence, to pick up this style we would need to get the correlation of the filters across channels of a lawyer. This is computed mathematically, using the Gram matrix which calculates the correlation of the activation of a the filter by the style image and generated image

To transfer the style from one image to the content image we need to do two parallel operations while doing forward propagation
– Compute the content loss between the source image and the generated image
– Compute the style loss between the style image and the generated image
– Finally we need to compute the total loss

In order to get transfer the style from the ‘style’ image to the ‘content ‘image resulting in a  ‘generated’  image  the total loss has to be minimised. Therefore backward propagation with gradient descent  is done to minimise the total loss comprising of the content and style loss.

Initially we make the Generated Image ‘G’ the same as the source image ‘S’

The content loss at layer ‘l’

L_{content} = 1/2 \sum_{i}^{j} ( F^{l}_{i,j} - P^{l}_{i,j})^{2}

where F^{l}_{i,j} and P^{l}_{i,j} represent the activations at layer ‘l’ in a filter i, at position ‘j’. The intuition is that the activations will be same for similar source and generated image. We need to minimise the content loss so that the generated stylized image is as close to the original image as possible. An intermediate layer of VGG19 block5_conv2 is used

The Style layers that are are used are

style_layers = [‘block1_conv1’,
‘block2_conv1’,
‘block3_conv1’,
‘block4_conv1’,
‘block5_conv1’]
To compute the Style Loss the Gram matrix needs to be computed. The Gram Matrix is computed by unrolling the filters as shown below (source: Convolutional Neural Networks by Prof Andrew Ng, Coursera). The result is a matrix of size n_{c} x n_{c} where n_{c} is the number of channels
The above diagram shows the filters of height n_{H} and width n_{W} with n_{C} channels
The contribution of layer ‘l’ to style loss is given by
L^{'}_{style} = \frac{\sum_{i}^{j} (G^{2}_{i,j} - A^l{i,j})^2}{4N^{2}_{l}M^{2}_{l}}
where G_{i,j}  and A_{i,j} are the Gram matrices of the style and generated images respectively. By minimising the distance in the gram matrices of the style and generated image we can ensure that generated image is a stylized version of the original image similar to the style image
The total loss is given by
L_{total} = \alpha L_{content} + \beta L_{style}
Back propagation with gradient descent works to minimise the content loss between the source and generated image, while the style loss tries to minimise the discrepancies in the style of the style image and generated image. Running through forward and backpropagation through several epochs successfully transfers the style from the style image to the source image.
You can check the Notebook at Neural Style Transfer

Note: The code in this notebook is largely based on the Neural Style Transfer tutorial from Tensorflow, though I may have taken some changes from other blogs. I also made a few changes to the code in this tutorial, like removing the scaling factor, or the class definition (Personally, I belong to the old school (C language) and am not much in love with the ‘self.”..All references are included below

Note: Here is a interesting thought. Could we do a Neural Style Transfer in music? Imagine Carlos Santana playing ‘Hotel California’ or Brian May style in ‘Another brick in the wall’. While our first reaction would be that it may not sound good as we are used to style of these songs, we may be surprised by a possible style transfer. This is definitely music to the ears!

 

Here are few runs from this

A) Run 1

1. Neural Style Transfer – a) Content Image – My portrait.  b) Style Image – Wassily Kadinsky Oil on canvas, 1913, Vassily Kadinsky’s composition

 

2. Result of Neural Style Transfer

 

 

2) Run 2

a) Content Image – Portrait of my parents b) Style Image –  Vincent Van Gogh’s ,Starry Night Oil on canvas 1889

 

2. Result of Neural Style Transfer

 

 

Run 3

1.  Content Image – Caesar 2 (Masai Mara- 20 Jun 2018).  Style Image – The Great Wave at Kanagawa – Katsushika Hokosai, 1826-1833

 

Screenshot 2020-04-12 at 12.40.44 PM

2. Result of Neural Style Transfer

lkg

 

 

Run 4

1.   Content Image – Junagarh Fort , Rajasthan   Sep 2016              b) Style Image – Le Pont Japonais by Claude Monet, Oil on canvas, 1920

 

 

2. Result of Neural Style Transfer

 

Neural Style Transfer is a very ingenious idea which shows that we can segregate the style of a painting and transfer to another image.

References

1. A Neural Algorithm of Artistic Style, Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
2. Neural style transfer
3. Neural Style Transfer: Creating Art with Deep Learning using tf.keras and eager execution
4. Convolutional Neural Network, DeepLearning.AI Specialization, Prof Andrew Ng
5. Intuitive Guide to Neural Style Transfer

See also

1. Big Data-5: kNiFi-ing through cricket data with yorkpy
2. Cricketr adds team analytics to its repertoire
3. Cricpy performs granular analysis of players
4. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
5. Programming Zen and now – Some essential tips-2
6. The Anomaly
7. Practical Machine Learning with R and Python – Part 5
8. Literacy in India – A deepR dive
9. “Is it an animal? Is it an insect?” in Android

To see all posts click Index of posts

De-blurring revisited with Wiener filter using OpenCV

In this post I continue to experiment with the de-blurring of images using the Wiener filter. For details on the Wiener filter, please look at my earlier post “Dabbling with Wiener filter using OpenCV”.  Thanks to Egli Simon, Switzerland for pointing out a bug in the earlier post which I have now fixed.  The Wiener filter attempts to de-blur by assuming that the source signal is convolved with a blur kernel in the presence of noise.   I have also included the blur kernel as estimated by E. Simon in the code. I am including the de-blurring with 3 different blur kernel radii and different values for the Wiener constant K.  While the de-blurring is still a long way off there is some success.

One of the reasons I have assumed a non-blind blur kernel and try to de-convolve with that. The Wiener filter tries to minimize the Mean Square Error (MSE)  which can be expressed as
e(f) = E[X(f) – X1(f)]^2                                – (1)

where e(f) is the Mean Square Error(MSE) in the frequency domain, X(f) is the original image and X1(f) is the estimated signal which we get by de-convolving the Wiener filter with the observed blurred image i.e. and E[] is the expectation

X1(f) = G(f) * Y(f)                                        -(2)
where G(f) is the Wiener de-convolution filter and Y(f) is the observed blurred image
Substituting (2) in (1) we get
e(f) = E[X(f) – G(f) *Y(f)]^2

If the above equation is solved we can effectively remove the blur.
Feel free to post comments, opinions or ideas.

Watch this space …
I will be back! Hasta la Vista!


Note: You can clone the code below from Git Hib – Another implementation of Weiner filter in OpenCV

The complete code is included below and should work as is

/*
============================================================================
Name : deblur_wiener.c
Author : Tinniam V Ganesh & Egli Simon
Version :
Copyright :
Description : Implementation of Wiener filter in OpenCV
============================================================================
*/

#include <stdio.h>
#include <stdlib.h>
#include “cxcore.h”
#include “cv.h”
#include “highgui.h”

#define kappa 10.0
#define rad 8

int main(int argc, char ** argv)
{
int height,width,step,channels,depth;
uchar* data1;

CvMat *dft_A;
CvMat *dft_B;

CvMat *dft_C;
IplImage* im;
IplImage* im1;

IplImage* image_ReB;
IplImage* image_ImB;

IplImage* image_ReC;
IplImage* image_ImC;
IplImage* complex_ImC;
CvScalar val;

IplImage* k_image_hdr;
int i,j,k;
char str[80];
FILE *fp;
fp = fopen(“test.txt”,”w+”);

int dft_M,dft_N;
int dft_M1,dft_N1;

CvMat* cvShowDFT1(IplImage*, int, int,char*);
void cvShowInvDFT1(IplImage*, CvMat*, int, int,char*);

im1 = cvLoadImage( “kutty-1.jpg”,1 );
cvNamedWindow(“Original-color”, 0);
cvShowImage(“Original-color”, im1);
im = cvLoadImage( “kutty-1.jpg”, CV_LOAD_IMAGE_GRAYSCALE );
if( !im )
return -1;

cvNamedWindow(“Original-gray”, 0);
cvShowImage(“Original-gray”, im);

// Create a random noise matrix
fp = fopen(“test.txt”,”w+”);
int val_noise[357*383];
for(i=0; i <im->height;i++){
for(j=0;j<im->width;j++){
fprintf(fp, “%d “,(383*i+j));
val_noise[383*i+j] = rand() % 128;
}
fprintf(fp, “\n”);
}

CvMat noise = cvMat(im->height,im->width, CV_8UC1,val_noise);

// Add the random noise matric to the image
cvAdd(im,&noise,im, 0);

cvNamedWindow(“Original + Noise”, 0);
cvShowImage(“Original + Noise”, im);

cvSmooth( im, im, CV_GAUSSIAN, 7, 7, 0.5, 0.5 );
cvNamedWindow(“Gaussian Smooth”, 0);
cvShowImage(“Gaussian Smooth”, im);

// Create a blur kernel
IplImage* k_image;
float r = rad;
float radius=((int)(r)*2+1)/2.0;

int rowLength=(int)(2*radius);
printf(“rowlength %d\n”,rowLength);
float kernels[rowLength*rowLength];
printf(“rowl: %i”,rowLength);
int norm=0; //Normalization factor
int x,y;
CvMat kernel;
for(x = 0; x < rowLength; x++)
for (y = 0; y < rowLength; y++)
if (sqrt((x – (int)(radius) ) * (x – (int)(radius) ) + (y – (int)(radius))* (y – (int)(radius))) <= (int)(radius))
norm++;
// Populate matrix
for (y = 0; y < rowLength; y++) //populate array with values
{
for (x = 0; x < rowLength; x++) {
if (sqrt((x – (int)(radius) ) * (x – (int)(radius) ) + (y – (int)(radius))
* (y – (int)(radius))) <= (int)(radius)) {
//kernels[y * rowLength + x] = 255;
kernels[y * rowLength + x] =1.0/norm;
printf(“%f “,1.0/norm);
}
else{
kernels[y * rowLength + x] =0;
}
}
}

/*for (i=0; i < rowLength; i++){
for(j=0;j < rowLength;j++){
printf(“%f “, kernels[i*rowLength +j]);
}
}*/

kernel= cvMat(rowLength, // number of rows
rowLength, // number of columns
CV_32FC1, // matrix data type
&kernels);
k_image_hdr = cvCreateImageHeader( cvSize(rowLength,rowLength), IPL_DEPTH_32F,1);
k_image = cvGetImage(&kernel,k_image_hdr);

height = k_image->height;
width = k_image->width;
step = k_image->widthStep/sizeof(float);
depth = k_image->depth;

channels = k_image->nChannels;
//data1 = (float *)(k_image->imageData);
data1 = (uchar *)(k_image->imageData);
cvNamedWindow(“blur kernel”, 0);
cvShowImage(“blur kernel”, k_image);

dft_M = cvGetOptimalDFTSize( im->height – 1 );
dft_N = cvGetOptimalDFTSize( im->width – 1 );

//dft_M1 = cvGetOptimalDFTSize( im->height+99 – 1 );
//dft_N1 = cvGetOptimalDFTSize( im->width+99 – 1 );

dft_M1 = cvGetOptimalDFTSize( im->height+3 – 1 );
dft_N1 = cvGetOptimalDFTSize( im->width+3 – 1 );

printf(“dft_N1=%d,dft_M1=%d\n”,dft_N1,dft_M1);

// Perform DFT of original image
dft_A = cvShowDFT1(im, dft_M1, dft_N1,”original”);
//Perform inverse (check)
//cvShowInvDFT1(im,dft_A,dft_M1,dft_N1, “original”); – Commented as it overwrites the DFT

// Perform DFT of kernel
dft_B = cvShowDFT1(k_image,dft_M1,dft_N1,”kernel”);
//Perform inverse of kernel (check)
//cvShowInvDFT1(k_image,dft_B,dft_M1,dft_N1, “kernel”);- Commented as it overwrites the DFT

// Multiply numerator with complex conjugate
dft_C = cvCreateMat( dft_M1, dft_N1, CV_64FC2 );

printf(“%d %d %d %d\n”,dft_M,dft_N,dft_M1,dft_N1);

// Multiply DFT(blurred image) * complex conjugate of blur kernel
cvMulSpectrums(dft_A,dft_B,dft_C,CV_DXT_MUL_CONJ);
//cvShowInvDFT1(im,dft_C,dft_M1,dft_N1,”blur1″);

// Split Fourier in real and imaginary parts
image_ReC = cvCreateImage( cvSize(dft_N1, dft_M1), IPL_DEPTH_64F, 1);
image_ImC = cvCreateImage( cvSize(dft_N1, dft_M1), IPL_DEPTH_64F, 1);
complex_ImC = cvCreateImage( cvSize(dft_N1, dft_M1), IPL_DEPTH_64F, 2);
printf(“%d %d %d %d\n”,dft_M,dft_N,dft_M1,dft_N1);

//cvSplit( dft_C, image_ReC, image_ImC, 0, 0 );
cvSplit( dft_C, image_ReC, image_ImC, 0, 0 );

// Compute A^2 + B^2 of denominator or blur kernel
image_ReB = cvCreateImage( cvSize(dft_N1, dft_M1), IPL_DEPTH_64F, 1);
image_ImB = cvCreateImage( cvSize(dft_N1, dft_M1), IPL_DEPTH_64F, 1);

// Split Real and imaginary parts
cvSplit( dft_B, image_ReB, image_ImB, 0, 0 );
cvPow( image_ReB, image_ReB, 2.0);
cvPow( image_ImB, image_ImB, 2.0);
cvAdd(image_ReB, image_ImB, image_ReB,0);
val = cvScalarAll(kappa);
cvAddS(image_ReB,val,image_ReB,0);

//Divide Numerator/A^2 + B^2
cvDiv(image_ReC, image_ReB, image_ReC, 1.0);
cvDiv(image_ImC, image_ReB, image_ImC, 1.0);

// Merge Real and complex parts
cvMerge(image_ReC, image_ImC, NULL, NULL, complex_ImC);
sprintf(str,”O/P Wiener – K=%6.4f rad=%d”,kappa,rad);

// Perform Inverse
cvShowInvDFT1(im, (CvMat *)complex_ImC,dft_M1,dft_N1,str);

cvWaitKey(-1);
return 0;
}

CvMat* cvShowDFT1(IplImage* im, int dft_M, int dft_N,char* src)
{
IplImage* realInput;
IplImage* imaginaryInput;
IplImage* complexInput;
CvMat* dft_A, tmp;
IplImage* image_Re;
IplImage* image_Im;
char str[80];
double m, M;
realInput = cvCreateImage( cvGetSize(im), IPL_DEPTH_64F, 1);
imaginaryInput = cvCreateImage( cvGetSize(im), IPL_DEPTH_64F, 1);
complexInput = cvCreateImage( cvGetSize(im), IPL_DEPTH_64F, 2);
cvScale(im, realInput, 1.0, 0.0);
cvZero(imaginaryInput);
cvMerge(realInput, imaginaryInput, NULL, NULL, complexInput);

dft_A = cvCreateMat( dft_M, dft_N, CV_64FC2 );
image_Re = cvCreateImage( cvSize(dft_N, dft_M), IPL_DEPTH_64F, 1);
image_Im = cvCreateImage( cvSize(dft_N, dft_M), IPL_DEPTH_64F, 1);

// copy A to dft_A and pad dft_A with zeros
cvGetSubRect( dft_A, &tmp, cvRect(0,0, im->width, im->height));
cvCopy( complexInput, &tmp, NULL );
if( dft_A->cols > im->width )
{
cvGetSubRect( dft_A, &tmp, cvRect(im->width,0, dft_A->cols – im->width, im->height));
cvZero( &tmp );
}

// no need to pad bottom part of dft_A with zeros because of
// use nonzero_rows parameter in cvDFT() call below
cvDFT( dft_A, dft_A, CV_DXT_FORWARD, complexInput->height );
strcpy(str,”DFT -“);
strcat(str,src);
cvNamedWindow(str, 0);

// Split Fourier in real and imaginary parts
cvSplit( dft_A, image_Re, image_Im, 0, 0 );

// Compute the magnitude of the spectrum Mag = sqrt(Re^2 + Im^2)
cvPow( image_Re, image_Re, 2.0);
cvPow( image_Im, image_Im, 2.0);
cvAdd( image_Re, image_Im, image_Re, NULL);
cvPow( image_Re, image_Re, 0.5 );

// Compute log(1 + Mag)
cvAddS( image_Re, cvScalarAll(1.0), image_Re, NULL ); // 1 + Mag
cvLog( image_Re, image_Re ); // log(1 + Mag)

cvMinMaxLoc(image_Re, &m, &M, NULL, NULL, NULL);
cvScale(image_Re, image_Re, 1.0/(M-m), 1.0*(-m)/(M-m));
cvShowImage(str, image_Re);
return(dft_A);
}

void cvShowInvDFT1(IplImage* im, CvMat* dft_A, int dft_M, int dft_N,char* src)
{
IplImage* realInput;
IplImage* imaginaryInput;
IplImage* complexInput;
IplImage * image_Re;
IplImage * image_Im;
double m, M;
char str[80];
realInput = cvCreateImage( cvGetSize(im), IPL_DEPTH_64F, 1);
imaginaryInput = cvCreateImage( cvGetSize(im), IPL_DEPTH_64F, 1);
complexInput = cvCreateImage( cvGetSize(im), IPL_DEPTH_64F, 2);
image_Re = cvCreateImage( cvSize(dft_N, dft_M), IPL_DEPTH_64F, 1);
image_Im = cvCreateImage( cvSize(dft_N, dft_M), IPL_DEPTH_64F, 1);

//cvDFT( dft_A, dft_A, CV_DXT_INV_SCALE, complexInput->height );
cvDFT( dft_A, dft_A, CV_DXT_INV_SCALE, dft_M);
strcpy(str,”DFT INVERSE – “);
strcat(str,src);
cvNamedWindow(str, 0);

// Split Fourier in real and imaginary parts
cvSplit( dft_A, image_Re, image_Im, 0, 0 );

// Compute the magnitude of the spectrum Mag = sqrt(Re^2 + Im^2)
cvPow( image_Re, image_Re, 2.0);
cvPow( image_Im, image_Im, 2.0);
cvAdd( image_Re, image_Im, image_Re, NULL);
cvPow( image_Re, image_Re, 0.5 );

// Compute log(1 + Mag)
cvAddS( image_Re, cvScalarAll(1.0), image_Re, NULL ); // 1 + Mag
cvLog( image_Re, image_Re ); // log(1 + Mag)

cvMinMaxLoc(image_Re, &m, &M, NULL, NULL, NULL);
cvScale(image_Re, image_Re, 1.0/(M-m), 1.0*(-m)/(M-m));
//cvCvtColor(image_Re, image, CV_GRAY2RGBA);
cvShowImage(str, image_Re);
}
See also
– De-blurring revisited with Wiener filter using OpenCV
–  Dabbling with Wiener filter using OpenCV
– Deblurring with OpenCV: Wiener filter reloaded
– Re-working the Lucy-Richardson Algorithm in OpenCV

You may also like
1.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
2.  Bend it like Bluemix, MongoDB with autoscaling – Part 1
3. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
4. A crime map of India in R: Crimes against women

Find me on Google+

Computer Vision: Getting started with OpenCV

 OpenCV (Open Source Computer Vision) is a library of APIs aimed at enabling real time computer vision. OpenCV had Intel as  the champion during its early developmental stages from 1999 to its first formal release in 2006. This post looks at the steps    involved in installing OpenCV on your Linux machine and getting started with some simple programs. For the steps for installing OpenCV in Windows look at my post “Installing and using OpenCV with Visual Studio 2010 express

As a first step download the tarball  OpenCV-2.3.1a.tar.bz2 from the link below to directory

$HOME/opencv

http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/2.3.1/ 

Unzip and and untar the bzipped file using

$ tar jxvf OpenCV-2.3.1a.tar.bz2

Now

$ cd OpenCV-2.3.1

You have to run cmake to configure the directories before running make. I personally found it

easier to run the cmake wizard using

$ cmake -i

Follow through all the prompts that the cmake wizard gives and make appropriate choices.

Once this complete

Run

$ make

Login as root

$ su – root

password: *******

Run

$ make install

This should install all the appropriate files and libraries in /usr/local/lib

Now assuming that everything is fine you should be good to go.

Start Eclipse, open a new C project.

Under Project->Properties->Settings->GCC Compiler ->Directories include the following 2 include paths

../opencv/OpenCV-2.3.1/include/opencv

../opencv/OpenCV-2.3.1/include/opencv2

Under

Project->Properties->Settings->GCC Linker ->Libraries in the library search path

include /usr/local/lib

Under libraries include the following

opencv_highgui , opencv_core , opencv_imgproc , opencv_highgui, opencv_ml, opencv_video, opencv_features2d, opencv_calib3d

, opencv_objdetect, opencv_contrib, opencv_legacy, opencv_flann

(To get the list of libraries you could also run the following command)

pkg-config –libs /usr/local/lib/pkgconfig/opencv.pc

-L/usr/local/lib -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_ml -lopencv_video -lopencv_features2d -lopencv_calib3d -lopencv_objdetect -lopencv_contrib -lopencv_legacy -lopencv_flann

Now you are ready to create your first OpenCV program

The one below will convert an image to test.png

#include “highgui.h”

int main( int argc, char** argv ) {

IplImage* img = cvLoadImage( argv[1],1);

cvSaveImage( “test.png”, img, 0);

cvReleaseImage( &img );

return 0;

}

If you get a runtime error cannot find shared library

“ibopencv_core.so.2.3: cannot open shared object file: No such file or directory”

then you need to ensure that the linker knows the paths of the libraries.

The commands are as follows

$vi /etc/ld.so.conf.d/opencv.conf

Enter

/usr/local/lib

and save file

Now execute

$ldconfig /etc/ld.so.conf

You can check if everything is fine by running

$[root@localhost mycode]# ldconfig -v | grep open

ldconfig: /etc/ld.so.conf.d/kernel-2.6.32.26-175.fc12.i686.PAE.conf:6: duplicate hwcap 0 nosegneg

libopencv_gpu.so.2.3 -> libopencv_gpu.so.2.3.1

libopencv_ml.so.2.3 -> libopencv_ml.so.2.3.1

libopencv_legacy.so.2.3 -> libopencv_legacy.so.2.3.1

libopencv_objdetect.so.2.3 -> libopencv_objdetect.so.2.3.1

libopencv_video.so.2.3 -> libopencv_video.so.2.3.1

…..

Now re-build the code and everything should be fine.

Here’s a second program to run various transformations to an image

IplImage* img = cvLoadImage( argv[1],1);

// create a window. Window name is determined by a supplied argument

cvNamedWindow( argv[1], CV_WINDOW_AUTOSIZE );

// Apply Gaussian smooth

//cvSmooth( img, img, CV_GAUSSIAN, 9, 9, 0, 0 );

cvErode (img,img,NULL,2);

// Display an image inside and window.

cvShowImage( argv[1], img );

//Save image

cvSaveImage( “/home/ganesh/Desktop/baby2.png“, img, 0);

….

There are many samples also downloaded along with the installation. You can try them out. I found the facedetect.cpp sample interesting. It is based on Haar cacades and works really well. Compile facedetect.cpp under samples/c

Check it out. Including facedetect.cpp detecting my face in real time …

Have fun with OpenCV.

Get going! Get hooked!

Find me on Google+