matrix multiplication – Giga thoughts …

Here is a simple Octave primer. Octave is a powerful language for implementing Machine Learning algorithms. As I have mentioned, its strength is its simplicity. I am including some basic commands with which you can get by when implementing fairly complex code.

%%Matrix
A matrix can be created as
a = [1 2 3; 4 7 8; 12 35 14]; % This is a 3 x 3 matrix

Matrix multiplication can be done between an m x n matrix and an n x k matrix as follows

a = [4 56 3; 2 3 4];      % a is a 2 x 3 matrix
b = [23 1; 3 12; 34 12];  % b is a 3 x 2 matrix
c = a*b;                  % c = (2 x 3) * (3 x 2) = 2 x 2 matrix

c =
   362   712
   191    86

%%Inverse of a matrix can be obtained by
d = pinv(c); % pinv computes the pseudo-inverse, which equals the inverse when c is invertible

octave-3.2.4.exe:37> d = pinv(c)
d =
  -8.2014e-004   6.7900e-003
   1.8215e-003  -3.4522e-003

%%Transpose of a matrix
e = c'; % e is the transpose of c

octave-3.2.4.exe:38> e = c'
e =
   362   191
   712    86

The following operations are done on all elements of a matrix or a vector
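a = [1 2; 3 4; 5 6];
k = 5.23;
c = k * a;   % Multiply each element by k
d = a - 2    % Subtract 2 from each element
e = a / 5    % Divide each element by 5
f = a .* a   % Element-wise multiplication (not a dot product)
g = a .^2;   % Square each element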

%% Select slice of matrix
b = a(:,2); % Select column 2 of matrix 'a' (all rows)
c = a(2,:)  % Select row 2 of matrix 'a' (all columns)

d = [7 8; 8 9; 10 11; 12 13]; % 4 rows 2 columns
d(2:3,:); % Select rows 2 to 3 (all columns)

octave-3.2.4.exe:41> d
d =
    7    8
    8    9
   10   11
   12   13

octave-3.2.4.exe:43> d(2:3,:)
ans =
    8    9
   10   11

%% Appending rows to matrix
a = [ 4 5; 5 6; 5 7; 9 8]; % 4 x 2
b = [ 1 3; 2 4]; % 2 x 2
c = [ a; b] % stack a over b
d = [b ; a] % stack b over a

octave-3.2.4.exe:44> a = [ 4 5; 5 6; 5 7; 9 8] % 4 x 2
a =
   4   5
   5   6
   5   7
   9   8

octave-3.2.4.exe:45> b = [ 1 3; 2 4] % 2 x 2
b =
   1   3
   2   4

octave-3.2.4.exe:46> c = [ a; b] % stack a over b
c =
   4   5
   5   6
   5   7
   9   8
   1   3
   2   4

octave-3.2.4.exe:47> d = [b ; a] % stack b over a
d =
   1   3
   2   4
   4   5
   5   6
   5   7
   9   8

%% Appending columns
a = [ 1 2 3; 3 4 5];
b = [ 1 2; 3 4];
c = [a b];
d = [b a];

octave-3.2.4.exe:48> a = [ 1 2 3; 3 4 5]
a =
   1   2   3
   3   4   5

octave-3.2.4.exe:49> b = [ 1 2; 3 4]
b =
   1   2
   3   4

octave-3.2.4.exe:50> c = [a b]
c =
   1   2   3   1   2
   3   4   5   3   4

octave-3.2.4.exe:51> d = [b a]
d =
   1   2   1   2   3
   3   4   3   4   5

%%Size of a matrix
[c d ] = size(a); % c = number of rows, d = number of columns
%% Creating a matrix of all zeros or ones
d = ones(3,2);
e = zeros(4,3);

%% Appending an intercept term to a matrix
a = [1 2 3; 4 5 6]; % 2 x 3
b = ones(2,1);
a = [b a];          % 2 x 4, with a leading column of ones

%% Plotting
Creating 2 vectors
x = [1 3 4 5 6];
y = [5 6 7 8 9];
plot(x,y);
%%Create labels
xlabel("X values");
ylabel("Y values");
axis([1 10 4 10]); % Set the range of x and y
title("Test plot");

%%Creating a 3D scatter plot
If we have a 3-column CSV file then we can load the data as follows
data = load('values.csv');
X = data(:, 1:2);
y = data(:, 3);
scatter3(X(:,1),X(:,2),y,[],[240 15 15],'x'); % X(:,1) - x axis, X(:,2) - y axis, y - z axis

%% Drawing a 3D mesh
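% xrange, yrange, mu, sigma and theta are assumed to be defined already
x = linspace(0,xrange + 20,10);
y = linspace(1,yrange+ 20,10);
[XX, YY ] = meshgrid(x,y);
[a b] = size(XX)

% Draw the mesh
ZZ = zeros(a, b); % preallocate the surface heights
for i=1:a,
  for j= 1:b,
    ZZ(i,j) = [1 (XX(i,j)-mu(1))/sigma(1) (YY(i,j) - mu(2))/sigma(2) ] * theta;
  end;
end;
mesh(XX,YY,ZZ);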

For more details please see post Informed choices using Machine Learning 2- Pitting Kumble, Kapil and B S Chandra

%% Creating different polynomial equations
Let X be a feature vector
then
X = [X X.^2 X.^3] % X X^2 X^3

This can be created using a for loop as follows
x = []; % start with an empty matrix
for i= 1:n
  xtemp = xinput .^i;
  x = [x xtemp];
end;

Finally, while doing multivariate regression, if we wanted to create polynomial terms of higher degree we could do as follows. Let us say we have a feature vector X made of 3 features x1, x2,

Let us say we wanted to create a polynomial of the form x1^2 x1.x2 x2^2; then we could create X as

X = [X(:,1).^2, X(:,1).*X(:,2), X(:,2).^2]

As you can see, Octave is a really powerful language for Machine Learning, and it takes just a handful of constructs to implement powerful Machine Learning algorithms.

In physics there are 4 types of forces – gravitational forces among celestial bodies, electro-magnetic forces and strong and weak forces at the sub-atomic level. The equations that seem to work among large bodies don’t seem to apply at the sub-atomic level though there have been several attempts at grand unification theories

Similarly in computing we have computing at the personal level, the enterprise level, the data-center level and the web scale level. The problems and paradigms at each level are very different and unique. The sequential processing, relational database accesses and network speeds of a local area network are very different from the parallel processing requirements, NoSQL based storage accesses and WAN latencies found at web scale.

Here is the first of my posts on paradigms at the Web Scale.

The internet now contains in excess of 1 billion hosts. This is based on a report in the World Fact Book published in 2012.

Across these 1 billion and odd hosts there are at least 1.5 billion pages that have been indexed. There must be several hundred million more that are not indexed by the major search engines.

Search engines like Google, Bing or Yahoo have to work on several hundred million pages. Similarly social web sites like Facebook, Twitter or LinkedIn have to deal with several hundred million users who constantly perform status updates, upload images, tweet etc. To handle large quantities of data efficiently and quickly there is a need for web scale algorithms.

One such algorithm is map-reduce, which had its origins in Google. Map-reduce essentially consists of a set of mappers, each of which takes a key-value pair as input and outputs 0 or more key-value pairs. The reducer takes all tuples with the same key, combines them based on some function and emits a key-value pair.

Map-reduce, and its open source avatar, Hadoop, are now used routinely to solve several large scale problems. To be honest, I was, and still am, puzzled that the 2 simple task types of mapping & reducing can be used for such a large variety of problems. However, it appears so.

I would have assumed that there would have been other flavors, maybe an ‘identify-update’, ‘determine-solve’ or some such equivalent, unless a large set of problems can be expressed as some combination of the map reduce paradigm.

Anyway, here are a few examples for which the map-reduce algorithm is useful.

Word Counting: The standard example for map-reduce is the word counting program. In this the map reduce algorithm generates a list of words with their corresponding word count from a set of input files. The Map task reads each document and breaks it into a sequence of words (w1, w2, w3 …). It then emits a key value pair as follows

(w1,1),(w2,1),(w3,1),(w1,1) and so on. If a word is repeated in the document it occurs multiple times in the output. Next, all the key-value pairs are grouped by key, and each group is sent to one of the reducer tasks. Each reducer then sums all the values for a key, thus giving the total count for each word.
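The word counting flow described above can be sketched in a few lines of Python. This is only a single-process illustration under simplifying assumptions: the function names (`map_words`, `group_by_key`, `reduce_counts`) are made up for this example, the "documents" are short in-memory strings, and the grouping step that a real framework such as Hadoop performs across machines is simulated with a dictionary.

```python
from collections import defaultdict

def map_words(document):
    """Map task: emit a (word, 1) pair for every word occurrence."""
    return [(word, 1) for word in document.lower().split()]

def group_by_key(pairs):
    """Shuffle step: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(word, counts):
    """Reduce task: sum the 1s emitted for a single word."""
    return (word, sum(counts))

def word_count(documents):
    # Run the map task over every document, then group and reduce.
    pairs = [pair for doc in documents for pair in map_words(doc)]
    grouped = group_by_key(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())

# Hypothetical input documents, for illustration only.
docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(docs)
print(counts)
```

A repeated word such as "the" is emitted three times by the mappers and only becomes a single total in the reducer, which is exactly the behaviour described above.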

Matrix multiplication: Big Data is a typical challenge in the web, where there is a need to determine patterns and trends in mountains of data. Machine learning algorithms are used to determine structure in data that has the 3 characteristics of volume, variety and velocity. Machine learning algorithms typically depend on matrix operations. Map-reduce is ideally suited for this, and one of Google's original purposes for map-reduce was matrix multiplication.

Let us assume that we have an n x n matrix M whose element in row i and column j is m_ij

Also let us assume that there is a vector ‘v’ whose jth element is v_j. Then the matrix-vector product is the vector x of length n whose ith element is given as

$$x_i = \sum_{j=1}^{n} m_{ij} v_j$$

Map function: The map function applies to each single element of the matrix M. For each element m_ij the map task outputs a key-value pair as follows: (i, m_ij v_j). Hence we will have n key-value pairs for each ‘i’ from 1 to n.

Reduce function: The reduce function takes all pairs with the same key ‘i’ and sums their values.

Hence each reducer will generate

$$x_i = \sum_{j=1}^{n} m_{ij} v_j$$

(Reference: Mining of Massive Datasets– Anand Rajaraman, Jure Leskovec, Jeffrey D Ullman)
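As a rough illustration of the map and reduce functions for the matrix-vector product, here is a small single-process Python sketch. It is not taken from the reference; the function names are invented for this example, indices are 0-based rather than 1-based, and it assumes the whole vector v is available to every map task.

```python
from collections import defaultdict

def map_element(i, j, m_ij, v):
    """Map task: for element m_ij emit the pair (i, m_ij * v_j)."""
    return (i, m_ij * v[j])

def reduce_row(i, products):
    """Reduce task: sum all products that share the row key i."""
    return (i, sum(products))

def matrix_vector(M, v):
    # Apply the map task to every element and group the results by row index.
    grouped = defaultdict(list)
    for i, row in enumerate(M):
        for j, m_ij in enumerate(row):
            key, value = map_element(i, j, m_ij, v)
            grouped[key].append(value)
    # Each reducer produces one element x_i of the result vector.
    x = [0] * len(M)
    for i, products in grouped.items():
        _, x[i] = reduce_row(i, products)
    return x

M = [[1, 2], [3, 4]]
v = [5, 6]
x = matrix_vector(M, v)
print(x)  # [17, 39]
```

Each row index acts as the key, so the n products belonging to row i all arrive at the same reducer, which adds them up to give x_i.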

This link gives a good write-up on a matrix x matrix multiplication,

Map-reduce for Relational Operations: Map-reduce can be used to perform many of the operations used in databases on large scale data, such as selection, projection, union, intersection, difference, natural join and grouping.

Here is an example taken from the ‘Web Intelligence & Big Data’ course on Coursera by Gautam Shroff.

Let us assume that there are 2 tables, ‘Sales by address’ and ‘City by address’, and the need is to find the total ‘Sales by City’. The SQL query for this is

SELECT SUM(Sale), City FROM Sales, Cities WHERE Sales.Addr_id = Cities.Addr_id GROUP BY City

This can be done by 2 map-reduce tasks.

The first map-reduce task groups the records of both tables by address as follows

Map1: The first map task will emit (Address, rest of record (Sale or City))

Reduce1: The first reduce task will SUM the sales for each address and attach the city for that address. Clearly this output will have multiple occurrences of each city.

At this point we will have the sum of the sales for every address, along with its city. However each city can occur multiple times. Now we have to GROUP BY City

Map2: Now the mapper emits (City, rest of record (Sales))

Reduce2: The 2nd reduce task now SUMs all the sales for each city.
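The two stages above can be sketched in Python as follows. This is a toy, in-memory version and not the course's code: the helper `run_map_reduce`, the record layout `(table, addr_id, payload)` and the sample rows are all assumptions made for the illustration. Addresses with no sales or no city are dropped, which mirrors the join in the SQL query.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Toy single-process driver: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    output = []
    for key, values in grouped.items():
        output.extend(reducer(key, values))
    return output

def map1(record):
    # Emit (address, rest of record) for rows of either table.
    table, addr_id, payload = record
    yield (addr_id, (table, payload))

def reduce1(addr_id, values):
    # Sum the sales for one address and attach that address's city.
    city = None
    amounts = []
    for table, payload in values:
        if table == "city":
            city = payload
        else:
            amounts.append(payload)
    if city is not None and amounts:
        yield (city, sum(amounts))

def map2(record):
    # Re-key the stage 1 output by city.
    city, total = record
    yield (city, total)

def reduce2(city, totals):
    # Sum the per-address totals for one city.
    yield (city, sum(totals))

# Hypothetical sample rows: (table, addr_id, sale amount or city name).
sales = [("sale", 1, 100), ("sale", 1, 50), ("sale", 2, 70), ("sale", 3, 30)]
cities = [("city", 1, "Bangalore"), ("city", 2, "Bangalore"), ("city", 3, "Chennai")]

stage1 = run_map_reduce(sales + cities, map1, reduce1)
sales_by_city = dict(run_map_reduce(stage1, map2, reduce2))
print(sales_by_city)
```

After the first stage "Bangalore" appears twice, once per address, and it is the second stage that collapses those rows into a single total per city.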

Clearly the map-reduce algorithm does address some major problem areas. It is extremely useful when there is a need to perform the same operation on multiple documents. It would definitely be useful in building an inverted index or in computing PageRank. Also, map-reduce is very powerful in handling matrix operations. Large classes of problems, such as machine learning and computer vision, use matrices extensively, and map-reduce becomes critical when these operations have to be done on large volumes of data. Besides, the ability of map-reduce to perform a large set of database operations is something that can be used in many situations on the web.

However it is no silver bullet for all types of problems.
