Neural Networks: The mechanics of backpropagation

The initial work on the ‘Backpropagation Algorithm’ started in the 1980s and led to an explosion of interest in Neural Networks and in the application of backpropagation.

The ‘Backpropagation’ algorithm computes the minimum of an error function with respect to the weights in the Neural Network, using the method of gradient descent. The combination of weights in a multi-layered neural network that minimizes the error/cost function is considered a solution of the learning problem.

[Figure: a single neuron with weighted inputs]

In the Neural Network above
out_{o1} = \sum_{i} w_{i}*x_{i}
E = 1/2(target - out)^{2}
\partial E/\partial out = 1/2 * 2 * (target - out) * (-1) = -(target - out)
\partial E/\partial w_{i} = \partial E/\partial out * \partial out/\partial w_{i}
\partial E/\partial w_{i} = -(target - out) * x_{i}
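
As a quick sanity check, here is a minimal sketch in Python (all values are purely illustrative, chosen only for the check) that compares this analytic gradient with a finite-difference approximation:

import numpy as np

# Single linear neuron: out = sum_i w_i * x_i, E = 1/2 * (target - out)^2
x = np.array([0.5, 1.5])                # illustrative inputs
w = np.array([0.2, -0.4])               # illustrative weights
target = 1.0

out = np.dot(w, x)
analytic = -(target - out) * x          # dE/dw_i = -(target - out) * x_i

def error(w):
    return 0.5 * (target - np.dot(w, x)) ** 2

eps = 1e-6                              # finite-difference check on w_0
w_plus = w.copy()
w_plus[0] += eps
numeric = (error(w_plus) - error(w)) / eps
print(analytic[0], numeric)             # the two values should agree closely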

Check out my book ‘Deep Learning from first principles: Second Edition – In vectorized Python, R and Octave’. My book starts with the implementation of a simple 2-layer Neural Network and works its way up to a generic L-layer Deep Learning Network, with all the bells and whistles. The derivations have been discussed in detail. The code has been extensively commented and is included in its entirety in the Appendix sections. My book is available on Amazon as a paperback ($18.99) and in a Kindle version ($9.99/Rs 449).

Perceptrons and single-layered neural networks can classify only if the sample space is linearly separable. For non-linear decision boundaries, a multi-layered neural network with backpropagation is required to generate more complex boundaries. The backpropagation algorithm computes the minimum of the error function in weight space using the method of gradient descent. This computation of the gradient requires the activation function to be both differentiable and continuous. Hence the sigmoid or logistic function is typically chosen as the activation function at every layer.

This post looks at a 3-layer neural network with 1 input, 1 hidden and 1 output layer. To a large extent this post is based on Matt Mazur’s detailed “A Step by Step Backpropagation Example”, Prof Hinton’s “Neural Networks for Machine Learning” at Coursera and a few other sources.

While Matt Mazur’s post uses example values, I derive the formulas for the gradient of the error with respect to each weight in the hidden and input layers. I intend to implement a vectorized version of backpropagation in Octave, R and Python, so this post is a prequel to that.

The 3 layer neural network is as below

[Figure: the 3-layer neural network]

Some basic derivations which are used in backpropagation

Chain rule of differentiation
Let y=f(u)
and u=g(x) then
\partial y/\partial x = \partial y/\partial u * \partial u/\partial x

An important result
y=1/(1+e^{-z})
Let x= 1 + e^{-z}  then
y = 1/x
\partial y/\partial x = -1/x^{2}
\partial x/\partial z = -e^{-z}

Using the chain rule of differentiation we get
\partial y/\partial z = \partial y/\partial x * \partial x/\partial z
= -1/(1+e^{-z})^{2} * -e^{-z} = e^{-z}/(1+e^{-z})^{2}
Since e^{-z}/(1+e^{-z})^{2} = 1/(1+e^{-z}) * (1 - 1/(1+e^{-z})) = y * (1-y)
Therefore \partial y/\partial z = y(1-y)                                   -(A)
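
Result (A) is easy to verify numerically. Below is a small Python sketch (the test point z = 0.7 is arbitrary) comparing y(1-y) against a finite-difference estimate of the derivative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7                                 # arbitrary test point
y = sigmoid(z)
analytic = y * (1 - y)                  # result (A)

eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z)) / eps
print(analytic, numeric)                # the two values should agree closely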

1) Feed forward network
The net input at the hidden layer neurons is
in_{h1} = w_{1}i_{1} + w_{2}i_{2} + b_{1}
in_{h2} = w_{3}i_{1} + w_{4}i_{2} + b_{1}

The sigmoid/logistic function is used to generate the activation outputs at each hidden layer neuron. The sigmoid is chosen because it is continuous and also has a continuous derivative

out_{h1} = 1/(1+e^{-in_{h1}})
out_{h2} = 1/(1+e^{-in_{h2}})

The net input at the output layer neurons is
in_{o1} = w_{5}out_{h_{1}} +  w_{6}out_{h_{2}} + b_{2}
in_{o2} = w_{7}out_{h_{1}} +  w_{8}out_{h_{2}} + b_{2}

Total error
E_{total} = 1/2\sum (target - output)^{2}
E_{total} = E_{o1} + E_{o2}
E_{total} = 1/2(target_{o_{1}} - out_{o_{1}})^{2} + 1/2(target_{o_{2}} - out_{o_{2}})^{2}
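
The feedforward equations above translate directly into code. Here is a minimal Python sketch of the forward pass; the inputs, weights, biases and targets are values I have assumed for illustration, not values taken from the figure:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed illustrative values
i1, i2 = 0.05, 0.10
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35    # input -> hidden
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60    # hidden -> output
target_o1, target_o2 = 0.01, 0.99

# Hidden layer: net input and activation
in_h1 = w1 * i1 + w2 * i2 + b1
in_h2 = w3 * i1 + w4 * i2 + b1
out_h1, out_h2 = sigmoid(in_h1), sigmoid(in_h2)

# Output layer: net input and activation
in_o1 = w5 * out_h1 + w6 * out_h2 + b2
in_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o1, out_o2 = sigmoid(in_o1), sigmoid(in_o2)

# Total squared error
E_total = 0.5 * (target_o1 - out_o1) ** 2 + 0.5 * (target_o2 - out_o2) ** 2
print(out_o1, out_o2, E_total)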

2) The backward pass
In the backward pass we need to compute how the squared error changes with a change in each weight, i.e. we compute \partial E_{total}/\partial w_{i} for each weight w_{i}. This is shown below.

A squared error is assumed

Error gradient  with w_{5}

[Figure: backpropagation at the output layer]
 \partial E_{total}/\partial w_{5} = \partial E_{total}/\partial out_{o_{1}} * \partial out_{o_{1}}/\partial in_{o_{1}} * \partial in_{o_{1}}/ \partial w_{5}                -(B)

Since
E_{total} = 1/2\sum (target - output)^{2}
E_{total} = 1/2(target_{o_{1}} - out_{o_{1}})^{2} + 1/2(target_{o_{2}} - out_{o_{2}})^{2}
\partial E_{total}/\partial out_{o1} = \partial E_{o1}/\partial out_{o1} + \partial E_{o2}/\partial out_{o1}
\partial E_{total}/\partial out_{o1} = \partial/\partial out_{o1}[1/2(target_{o1}-out_{o1})^{2} + 1/2(target_{o2}-out_{o2})^{2}]
\partial E_{total}/\partial out_{o1} = 2 * 1/2 * (target_{o1} - out_{o1}) * (-1) + 0 = -(target_{o1} - out_{o1})

Now considering the 2nd term in (B)
\partial out_{o1}/\partial in_{o1} = \partial/\partial in_{o1} [1/(1+e^{-in_{o1}})]

Using result (A)
 \partial out_{o1}/\partial in_{o1} = \partial/\partial in_{o1} [1/(1+e^{-in_{o1}})] = out_{o1}(1-out_{o1})

The 3rd term in (B)
\partial in_{o1}/\partial w_{5} = \partial/\partial w_{5} [w_{5}*out_{h1} + w_{6}*out_{h2} + b_{2}] = out_{h1}
 \partial E_{total}/\partial w_{5}=-(target_{o1} - out_{o1}) * out_{o1} *(1-out_{o1}) * out_{h1}

Having computed \partial E_{total}/\partial w_{5}, we now perform gradient descent, by computing a new weight, assuming a learning rate \alpha
 w_{5}^{+} = w_{5} - \alpha * \partial E_{total}/\partial w_{5}

If we do the same for \partial E_{total}/\partial w_{6}, the weight connecting out_{h2} to o1, we get
\partial E_{total}/\partial w_{6} = -(target_{o1} - out_{o1}) * out_{o1} * (1-out_{o1}) * out_{h2}
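
Continuing the forward-pass sketch above (re-using its variables), the output-layer gradients and the gradient-descent update can be written as follows; the learning rate alpha is an assumed value:

# delta_oi = dE_total/d(in_oi) = -(target_oi - out_oi) * out_oi * (1 - out_oi)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

dE_dw5 = delta_o1 * out_h1              # w5 connects h1 -> o1
dE_dw6 = delta_o1 * out_h2              # w6 connects h2 -> o1
dE_dw7 = delta_o2 * out_h1              # w7 connects h1 -> o2
dE_dw8 = delta_o2 * out_h2              # w8 connects h2 -> o2

alpha = 0.5                             # assumed learning rate
w5_new = w5 - alpha * dE_dw5            # and similarly for w6, w7, w8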

3) The hidden layer

[Figure: backpropagation at the hidden layer]
We now compute how the total error changes for a change in weight w_{1}
\partial E_{total}/\partial w_{1} = \partial E_{total}/\partial out_{h1} * \partial out_{h1}/\partial in_{h1} * \partial in_{h1}/\partial w_{1}                -(C)

Using
E_{total} = E_{o1} + E_{o2} we get
 \partial E_{total}/\partial w_{1}= (\partial E_{o1}/\partial out_{h1}+  \partial E_{o2}/\partial out_{h1}) * \partial out_{h1}/\partial in_{h1} * \partial in_{h1}/\partial w_{1}
\partial E_{total}/\partial w_{1}=(\partial E_{o1}/\partial out_{h1}+  \partial E_{o2}/\partial out_{h1}) * out_{h1}*(1-out_{h1})*i_{1}     -(D)

Considering the 1st term in (C)
 \partial E_{total}/\partial out_{h1}= \partial E_{o1}/\partial out_{h1}+  \partial E_{o2}/\partial out_{h1}

Now
\partial E_{o1}/\partial out_{h1} = \partial E_{o1}/\partial out_{o1} * \partial out_{o1}/\partial in_{o1} * \partial in_{o1}/\partial out_{h1}
\partial E_{o2}/\partial out_{h1} = \partial E_{o2}/\partial out_{o2} * \partial out_{o2}/\partial in_{o2} * \partial in_{o2}/\partial out_{h1}

which gives the following
\partial E_{o1}/\partial out_{o1} * \partial out_{o1}/\partial in_{o1} * \partial in_{o1}/\partial out_{h1} = -(target_{o1}-out_{o1}) * out_{o1}(1-out_{o1}) * w_{5}     -(E)
\partial E_{o2}/\partial out_{o2} * \partial out_{o2}/\partial in_{o2} * \partial in_{o2}/\partial out_{h1} = -(target_{o2}-out_{o2}) * out_{o2}(1-out_{o2}) * w_{7}     -(F)
(note that \partial in_{o2}/\partial out_{h1} = w_{7}, since in_{o2} = w_{7}*out_{h1} + w_{8}*out_{h2} + b_{2})

Combining (D), (E) & (F) we get
\partial E_{total}/\partial w_{1} = -[(target_{o1}-out_{o1}) * out_{o1}(1-out_{o1}) * w_{5} + (target_{o2}-out_{o2}) * out_{o2}(1-out_{o2}) * w_{7}] * out_{h1}*(1-out_{h1}) * i_{1}

This can be represented more generally as
\partial E_{total}/\partial w_{1} = -\sum_{i}[(target_{oi}-out_{oi}) * out_{oi}(1-out_{oi}) * w_{ji}] * out_{h1}*(1-out_{h1}) * i_{1}
where w_{ji} denotes the weight connecting h_{1} to output neuron o_{i} (here w_{5} and w_{7})

With this derivative a new value of w_{1} is computed
 w_{1}^{+} = w_{1} - \alpha * \partial E_{total}/\partial w_{1}

Hence there are 2 important results
At the output layer, for the weight w_{j} connecting hidden neuron h_{k} to output neuron o_{i}, we have
a) \partial E_{total}/\partial w_{j} = -(target_{oi} - out_{oi}) * out_{oi} * (1-out_{oi}) * out_{hk}
At each hidden layer, for the weight w_{k} connecting input i_{k} to hidden neuron h_{k}, we compute
b) \partial E_{total}/\partial w_{k} = -\sum_{i}[(target_{oi}-out_{oi}) * out_{oi}(1-out_{oi}) * w_{ji}] * out_{hk}*(1-out_{hk}) * i_{k}
where w_{ji} is the weight connecting h_{k} to output neuron o_{i}
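
Results (a) and (b) can be tied together in the same Python sketch, re-using the variables and deltas from the snippets above:

# Hidden-layer deltas: sum the error signals flowing back from both outputs
delta_h1 = (delta_o1 * w5 + delta_o2 * w7) * out_h1 * (1 - out_h1)
delta_h2 = (delta_o1 * w6 + delta_o2 * w8) * out_h2 * (1 - out_h2)

dE_dw1 = delta_h1 * i1                  # w1 connects i1 -> h1
dE_dw2 = delta_h1 * i2                  # w2 connects i2 -> h1
dE_dw3 = delta_h2 * i1                  # w3 connects i1 -> h2
dE_dw4 = delta_h2 * i2                  # w4 connects i2 -> h2

w1_new = w1 - alpha * dE_dw1            # gradient-descent update, as at the output layer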

Backpropagation was very successful in the early years, but the algorithm does have its problems, e.g. the issue of ‘vanishing’ and ‘exploding’ gradients. Yet it is a very key development in Neural Networks, and the issues with the backprop gradients have been addressed through techniques such as the momentum method, adaptive learning rates etc.

In this post I derived the gradients of the error with respect to the weights at the output layer and the hidden layer. As I already mentioned above, I intend to implement a vectorized version of the backpropagation algorithm in Octave, R and Python in the days to come.

Watch this space! I’ll be back

P.S. If you find any typos/errors, do let me know!

References
1. Neural Networks for Machine Learning, Prof Geoffrey Hinton, Coursera
2. A Step by Step Backpropagation Example, Matt Mazur
3. The Backpropagation Algorithm, R Rojas
4. Backpropagation Learning, Artificial Neural Networks, David S Touretzky
5. Artificial Intelligence, Prof Sudeshna Sarkar, NPTEL

Also see my other posts
1. Introducing QCSimulator: A 5-qubit quantum computing simulator in R
2. Design Principles of Scalable, Distributed Systems
3. A method for optimal bandwidth usage by auctioning available bandwidth using the OpenFlow protocol
4. De-blurring revisited with Wiener filter using OpenCV
5. GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables
6. Re-introducing cricketr! : An R package to analyze performances of cricketers

To see all my posts go to ‘Index of Posts’

Close encounters with the future


Published in Telecom Asia, Oct 22, 2013 – Close encounters with the future

Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1.5 tons.—POPULAR MECHANICS, 1949

Introduction: Ray Kurzweil in his non-fiction book “The Singularity is Near – When Humans Transcend Biology” predicts that by the year 2045 the Singularity will allow humans to transcend our ‘frail biological bodies’ and our ‘petty, derivative and circumscribed brains’. Specifically the book claims “that there will be a ‘technological singularity’ in the year 2045, a point where progress is so rapid it outstrips humans’ ability to comprehend it. Irreversibly transformed, people will augment their minds and bodies with genetic alterations, nanotechnology, and artificial intelligence”.

He believes that advances in robotics, AI, nanotechnology and genetics will grow exponentially and will lead us into a future realm of intelligence that will far exceed biological intelligence. This explosion will be the result of ‘accelerating returns from significant advances in technology’.

Futurescape

Here is a look at some of the more fascinating key trends in technology. You can decide whether we are heading to Singularity or not.

Autonomous Vehicles (AVs): Self-driving cars have moved from the realm of science fiction to reality in recent times. Google’s autonomous cars have already driven around half a million miles. All the major car manufacturers of the world, from BMW, Mercedes, Toyota and Nissan to Ford and GM, are coming out with their own versions of autonomous cars. These cars are equipped with Adaptive Cruise Control and Collision Avoidance technologies and are already taking over control from drivers. Moreover, AVs alert drivers if their attention strays from the road ahead for too long. Autonomous Vehicles work with the help of Vehicular Communication Technology.

Vehicular Communication, along with Intelligent Transport Systems (ITS), achieves safety by enabling communication between vehicles, people and roads. Vehicle-to-vehicle communication is the fundamental building block of autonomous, self-driving cars. It enables the exchange of data between vehicles and allows automobiles to “see” and adapt to driving obstacles more completely, preventing accidents besides resulting in more efficient driving.

Smart Assistants: From the defeat of Kasparov in chess by IBM’s Deep Blue in 1997, to the resounding victory in Jeopardy of IBM’s Watson, capable of understanding natural human language, to the now prevalent Apple intelligent assistant Siri, Artificially Intelligent (AI) systems have come a long way. The newest trend in this area is Smart Assistants. Robots are currently analyzing documents, filling prescriptions and handling other tasks that were once done exclusively by humans. Smart Assistants are already taking over the tasks of BPO operators, paralegals, store clerks and babysitters. Robots, in many ways, are not only smarter than humans, but also do not get easily bored.

Intelligent homes and intelligent offices: Rapid advances in technology will come closer to home, both literally and figuratively. The future home will be able to detect the presence of people, pets and smoke, and changes in humidity, moisture, lighting and temperature. Smart devices will monitor the environment and take appropriate steps to save energy, improve safety and enhance the security of homes. Devices will start learning your habits and enhance your comfort and convenience. Everything from thermostats, fire detectors and washing machines to refrigerators will be equipped with electronics capable of adapting to the environment. All gadgets at home will be accessible through laptops, tablets or smartphones, and we will be able to monitor all aspects of our intelligent home from anywhere.

Smart devices will also make major inroads into offices leading to the birth of intelligent offices where the lighting, heating, cooling will be based on the presence of people in the offices. This will result in an enormous savings in energy. The advances in intelligent homes and intelligent offices will be in the greater context of the Smart Grid.

Swarms of drones: In contrast to the use of weaponized drones for unmanned aerial surveys of enemy territory, we will soon have commercial drones being used for civilian purposes. The most compelling aspect of drones these days is the fact that they can be easily manufactured in large quantities, are cheap and can perform complex tasks either singly or collectively. Remotely controlled drones can perform hundreds of civilian jobs, including traffic monitoring, aerial surveying, oil pipeline inspections and the monitoring of crop conditions. Drones are also being employed for the conservation of wildlife. In the wilderness of Africa, drones are already helping to provide aerial footage of the landscape, track poachers and even herd elephants. However, before drones become a common sight, it is necessary to ensure that appropriate laws are made for maintaining the safety and security of civilians. This is likely to happen in the US in 2015, when the Federal Aviation Administration (FAA) comes up with rules to safely integrate drones into the American skies.

MOOC (Massive Open Online Course): The concept of the MOOC, or the ‘Massive Open Online Course’ from top colleges, though just a few years old, is already taking the world by storm. Coursera, edX and Udacity are the top 3 MOOC providers, among many others, and offer a variety of courses on technology, philosophy, sociology, computer science etc. As more courses become available online, the requirement of having a uniform start and end date will diminish gradually. The availability of course lectures at all times and through all devices, namely the laptop, tablet or smartphone, will result in large-scale adoption by students of all ages.

Contrary to regimented classes, MOOCs allow students to take classes at their own pace. It is likely that some students will breeze through an entire semester’s worth of classes in a few weeks. It is also likely that a few students will graduate in 4 years with more than a couple of degrees. MOOCs are a natural development, considering that the world is going to be more knowledge driven, with a need for experts with a diverse set of in-depth skills. Here is an interesting article in the WSJ, “What College Will Be Like in 2023”.

3D Printing: This is another technology that is bound to become ubiquitous in our future. 3D printers will revolutionize manufacturing in ways we could never imagine. A 3D printer is similar to a hot-glue gun attached to a robotic arm. It creates an object by stacking one layer of material, typically plastic or metal, on top of another. 3D printers have been used for making everything from prosthetic limbs, phone cases and lamps all the way to a NASA-funded 3D-printed pizza. Here is a great article in the New York Times, “Dinner Is Printed”. It is likely that a 3D printer will be as indispensable to our future homes as the refrigerator and the microwave.

Artificial sense organs: A recent news item in Science 2.0, “The Future touch sensitive prosthetic limbs”, discusses the invention of a prosthetic limb that can actually provide the sense of touch by stimulating the regions of the brain that deal with the sense of touch. The researchers identified the neural activity that occurs when grasping or feeling an object and successfully induced these patterns in the brain. Two parallel efforts are underway to understand how the human brain works: “The Human Brain Project”, which has 130 members from the European Union, and Obama’s BRAIN project. Both these projects attempt ‘to give us a deeper and more meaningful understanding of how the human brain operates’. Possibilities as in the movies ‘Avatar’ or ‘Terminator’ may not be far away.

The Others: Besides the above, technologies like Big Data, Cloud Computing, the Semantic Web, the Internet of Things and the Smart Grid will also swamp us in the future, and much has already been said about them.

Conclusion: The above sets of technologies represent seismic shifts and are bound to explode in our future in a million ways.

Given the advances in bionic limbs, machine-intelligent AI systems, MOOCs and Autonomous Vehicles, are we on target for the Singularity?

I wouldn’t be surprised at all!


The computer is not a dumb machine!

“The computer is a dumb machine. It needs to be told what to do at every step.” How often have we heard this refrain from friends and those who have only an incidental interaction with computers? To them a computer is like a ball which has to be kicked from place to place. These people are either ignorant of computers, say it by force of habit or have a fear of computers. However this is far from the truth. In this post, my 100th, I come to the defense of the computer in a slightly philosophical way.

The computer is truly a marvel of technology. The computer truly embodies untapped intelligence. In my opinion even a safety pin is frozen intelligence. From a piece of metal the safety pin can now hold things together while pinning them, besides incorporating an aspect of safety.

Stating that the computer is a dumb machine is like saying that a television is dumb and an airplane is dumber. An airplane probably represents a modern miracle, in which the laws of flight are built into every nut and bolt that goes into the plane. The electronics and the controls enable it to lift off, fly and land with precision, performing a miracle in every flight.

Similarly a computer, from the bare hardware to the uppermost layer of software, is nothing but layer upon layer of human ingenuity, creativity and innovation. At the bare metal, the hardware of the computer is made up of integrated chips that work at the rate of 1 billion+ instructions per second. The circuits are organized so precisely that they are able to work together and produce a coherent output, all at blazing speeds, with each instruction taking less than a billionth of a second.


On top of the bare-bones hardware we have programs that work at the level of assembly and machine code, made of 0s and 1s. The machine code is nothing more than an amorphous string of 0s and 1s. At this level the thing that is worked on (the object) and the thing that works on it (the subject) are indistinguishable. There is no subject and object at this level; what distinguishes them is the context.

Over this layer we have the Operating System (OS), which I would like to refer to as the mind of the computer. The OS manages many things all at once, much like the mind has complete control over the sense organs which receive external input. So the OS manages processes, memory, devices and the CPU (resources).

As humans, we like to pride ourselves that we have consciousness. Rather than going into any metaphysical discussion on what consciousness is or isn’t, it is clear that the OS keeps the computer completely conscious of the state of all its resources. So just as we react to data received through our sense organs, the computer reacts to input received through its devices (mouse, keyboard) or its memory etc. So does the computer have consciousness?

You say human beings are capable of thought. So what is thought but a sensible evaluation of known concepts? In a way the OS too is constantly churning in the background, trying to make sense of the state of the CPU, the memory or the disk.

Not one to give in, I can hear you say, “But human beings understand choice”. Really! So here is my program for a human being

If provoked
Get angry

If insulted
Get hurt

If ego stoked
Go mad with joy

Just kidding! Anyway, the recent advances in cognitive computing show it is possible to have computers choose the best alternative. IBM’s Watson is capable of evaluating alternative choices.

Over the OS we have compilers, and above that we have several applications.
The computer truly represents layers and layers of solidified human thought. Whether it is the precise hardware circuitry, the OS, the compilers or any application, they are all the result of human thought, and they are all constantly working in the computer.

So if your initial attempt to perform something useful did not quite work out, you must understand that you are working with decades of human thought embodied in the computer. Your instructions should be precise and logical; otherwise your attempts will be thwarted.

So whether it’s the computer, the mobile or your car, we should look at and appreciate the deep beauty that resides in these modern conveniences, gadgets and machinery.


Working with binary trees in Lisp

Success finally! I have been trying to create and work with binary trees in Lisp for some time. It is really difficult in Lisp, in which there are no real data structures. Trees are so obvious in a language like C, where one can visualize the left & right branches with pointers. The structure of a node in C is

struct node {
    int value;
    struct node *left;
    struct node *right;
};

Lisp has no such structures. Lisp is a list, or at best a list of lists. It is really how the program interprets the nested list of lists that makes the list a binary or an n-ary tree. As I mentioned before, I had not had a whole lot of success in creating a binary tree in Lisp for quite some time. Finally I happened to come across a small version of adding an element to a set in “Structure and Interpretation of Computer Programs (SICP)” by Harold Abelson, Gerald Jay Sussman and Julie Sussman. This is probably one of the best books I have read in a long time and contains some truly profound insights into computer programming.

I adapted the Scheme code into my version of adding a node. Finally I was able to code the in-order, pre-order and post-order traversals.

(Note: You can clone the code below from GitHub: Binary trees in Lisp)

In this version a node of a binary tree is represented as
(node left right), so
node -> (car tree)
left_branch -> (cadr tree)
right_branch -> (caddr tree)

Here is the code
(defun entry (tree)
  (car tree))

(defun left-branch (tree)
  (cadr tree))

(defun right-branch (tree)
  (caddr tree))

;; Create a node of a binary tree
(defun make-tree (entry left right)
  (list entry left right))

;; Insert element x into the tree
(defun add (x tree)
  (cond ((null tree) (make-tree x nil nil))
        ((= x (entry tree)) tree)
        ((< x (entry tree))
         (make-tree (entry tree)
                    (add x (left-branch tree))
                    (right-branch tree)))
        ((> x (entry tree))
         (make-tree (entry tree)
                    (left-branch tree)
                    (add x (right-branch tree))))))

So I can now create a tree with a create-tree function, which builds the tree in the global variable ‘tree’
(defun create-tree (elmnts)
  (dolist (x elmnts)
    (setf tree (add x tree))))

(setf tree nil)
NIL

(setf lst (list 23 12 1 4 5 28 4 9 10 45 89))
(23 12 1 4 5 28 4 9 10 45 89)

(create-tree lst)
NIL

Now I display the tree
tree

(23 (12 (1 NIL (4 NIL (5 NIL (9 NIL (10 NIL NIL))))) NIL) (28 NIL (45 NIL (89 NIL NIL))))

This can be represented pictorially as

            23
           /  \
         12    28
         /       \
        1         45
         \          \
          4          89
           \
            5
             \
              9
               \
                10

Now I created the 3 types of traversals
(defun inorder (tree)
  (cond ((null tree))
        (t (inorder (left-branch tree))
           (print (entry tree))
           (inorder (right-branch tree)))))

(defun preorder (tree)
  (cond ((null tree))
        (t (print (entry tree))
           (preorder (left-branch tree))
           (preorder (right-branch tree)))))

(defun postorder (tree)
  (cond ((null tree))
        (t (postorder (left-branch tree))
           (postorder (right-branch tree))
           (print (entry tree)))))
[89]> (inorder tree)

1
4
5
9
10
12
23
28
45
89

[90]> (preorder tree)
23
12
1
4
5
9
10
28
45
89
T

[91]> (postorder tree)
10
9
5
4
1
12
89
45
28
23
23

Note: A couple of readers responded to me saying that I was very wrong in saying that Lisp has no data structures. I would tend to agree that Lisp has evolved over the years to include data structures. I hope to pick up Lisp again some time later, from where I left off! Till that time….

You may also like
1. A crime map of India in R: Crimes against women
2.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
3.  Bend it like Bluemix, MongoDB with autoscaling – Part 2
4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
5. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data

For all posts see Index of posts


Taking baby steps in Lisp

Lisp can be both fascinating and frustrating. Fascinating, because you can write compact code to solve really complex problems. Frustrating, because you can easily get lost in its maze of parentheses. I, for one, have been truly smitten by Lisp. My initial encounter with Lisp did not yield much success as I tried to come to terms with its strange syntax. The books I read on the Lisp language typically dwell on the exotic features of Lisp, like writing Lisp code to solve the Towers of Hanoi or the Eight Queens problem. They talk about functions returning functions, back quotes and macros that can make your head spin.

I found it extremely difficult to digest the language with this approach. So I decided to view Lisp through the eyes of any other regular programming language like C, C++, Java, Perl, Python or Ruby. I was keen on being able to do regular things with Lisp before trying out its unique features. So I decided to investigate Lisp from this viewpoint and learn how to make Lisp do mundane things like an assignment, a conditional, a loop, arrays, input and output etc.

This post is centered on exactly that.

Assignment statement

The most fundamental requirement of any language is to perform an assignment. For e.g. these are assignment statements in Lisp and their equivalents in C

$ (setf x 5)                                             -> x = 5
$ (setf x (+ (* y 2) (* z 8)))                           -> x = 2*y + 8*z

Conditional statement
 
There are a couple of forms of the conditional statement in Lisp. The most basic is the ‘if’ statement, which is a special form. It gives you if-then-else, but without the possibility of an if-then-else if-else chain

if (condition) statement else-statement

In Lisp this is written as
$ (setf x 5)
$ (if (= x 5)
      (setf x (+ x 5))
      (setf x (- x 6)))
10

In C this equivalent to
$ x = 5
$ if (x == 5)
x = x + 5;
else
x = x -6;

However Lisp allows the if-then-else if-else chain through the use of the COND statement

So we could write

$ (setf x 10)
$ (setf y 10)
$ (cond ((< x 5) (setf x (+ x 8)) (setf y (* 2 y)))
        ((= x 10) (setf x (* x 2)))
        (t (setf x 8)))
20

The above statement in C would be
$ x = 10
$ y = 10
$ if (x < 5)
{
    x = x + 8;
    y = 2 * y;
}
else if (x == 10)
{
    x = x * 2;
}
else
    x = 8;

Loops
Lisp has many forms of loops: dotimes, dolist, do, loop for etc. I found the following the most intuitive and the best to get started with
$ (setf x 5)
$ (let ((i 0))
    (loop
      (setf y (* x i))
      (when (> i 10) (return))
      (print i) (prin1 y)
      (incf i)))

In C this could be written as
$ x = 5
for (i = 0; i <= 10; i++)
{
    y = x * i;
    printf("%d %d\n", i, y);
}

Another easy looping construct in Lisp is
(loop for x from 2 to 10 by 3
      do (print x))
In C this would be
for (x = 2; x <= 10; x += 3)
    printf("%d\n", x);

Arrays
To create an array of 10 elements with an initial value of 20
(setf numarray (make-array 10 :initial-element 20))
#(20 20 20 20 20 20 20 20 20 20)
To read an array element it is
$ (aref numarray 3)                  -> numarray[3]
For e.g.
(setf x (* 2 (aref numarray 4)))     -> x = numarray[4] * 2

Functions
(defun square (x)
  (* x x))
This is the same as

int square(int x)
{
    return x * x;
}

While in C you would invoke the function as
y = square(8);

In Lisp you would write as
(setf y (square 8))

Note: In Lisp the function is invoked as (function arg1 arg2 … argn) instead of function(arg1, arg2, …, argn)

Structures
a) Create a global variable *db*
(defvar *db* nil)

b) Make a function to add an employee
$ (defun make-emp (name age title)
    (list :name name :age age :title title))
;; minimal helper sketches, adapted from Practical Common Lisp:
;; add-emp pushes a record onto *db*, dump-db prints out the database
$ (defun add-emp (emp)
    (push emp *db*))
$ (defun dump-db ()
    (dolist (emp *db*)
      (format t "~{~a:~10t~a~%~}~%" emp)))
$ (add-emp (make-emp "ganesh" 49 "manager"))
$ (add-emp (make-emp "manish" 50 "gm"))
$ (add-emp (make-emp "ram" 46 "vp"))
$ (dump-db)

For a more complete and excellent post on managing a simple DB, look at Practical Common Lisp by Peter Seibel

Reading and writing to standard output
To write to standard output you can use
(print "This is a test") or
(print '(This is a test))
To read from standard input use
(let ((temp 0))
  (print '(Enter temp))
  (setf temp (read))
  (print (append '(the temp is) (list temp))))

Reading and writing to a file
The typical way to do this is to use

a) Read
(with-open-file (stream "C:\\acl82express\\lisp\\count.cl")
  (do ((line (read-line stream nil)
             (read-line stream nil)))
      ((null line))
    (print line)))

b) Write
(with-open-file (stream "C:\\acl82express\\lisp\\test.txt"
                        :direction :output
                        :if-exists :supersede)
  (write-line "test" stream)
  nil)

I found the following construct a lot easier
(let ((in (open "C:\\acl82express\\lisp\\count.cl" :if-does-not-exist nil)))
  (when in
    (loop for line = (read-line in nil)
          while line do (format t "~a~%" line))
    (close in)))

With the above you can get started on Lisp. However with just the above constructs the code one writes will be very “non-Lispy”. Anyway this is definitely a start.


The Future of Programming Languages

How will the computing landscape evolve in the years to come? Clearly computer architecture will evolve towards more parallel architectures, with multiple CPUs each handling a part of the problem in parallel. However, programming parallel architectures is no simple task and will challenge the greatest minds.

In a future where the problems and the architectures will be extremely complex, the programming language will itself evolve towards simplicity. The programming language will be based on the natural language that we use to define problems. Behind the scenes of the natural language interface will be complex Artificial Intelligence algorithms which will perform the difficult task of translating the definition of the problem into a high-level programming language like C++, Java etc.

The Artificial Intelligence interface will handle the tasks of creating variables, forming loops, defining classes and handling errors. The code generated by the machine will have far fewer syntactical errors than code created by human beings. However, while a large set of problems will be solvable through the AI interface, there will be a certain class of problems which will still require human intervention and the need to work in the languages of today.

One of the drivers for this natural language style of programming, besides the complexity of the computer architecture, is the need to bring a larger section of domain experts to the task of solving problems in their fields without the need to learn all the complex syntax and semantics of the current programming languages.

This will allow everybody from astrophysicists, geneticists, historians and statisticians to be able to utilize the computer to solve the difficult problems in their domain.


Ramblings on Lisp

In the world of programming languages Lisp can be considered truly ancient, along with its companion FORTRAN. However it has survived to this day. Lisp had its origins as early as 1958, when John McCarthy of MIT published the design of the language in an ACM paper titled “Recursive Functions of Symbolic Expressions and Their Computation by Machine”. Lisp was invented by McCarthy as a mathematical notation for computer programs and is based on the Lambda Calculus described by Alonzo Church in the 1930s.

Lisp has not had the popularity of other, more recent languages like C, C++ and Java, partly because it has an unfamiliar syntax and also a very steep learning curve. The Lisp syntax can be particularly intimidating to beginners with its series of parentheses. However it is one of the predominant languages used in the AI domain.

Some of the key characteristics of Lisp are

Lisp derives its name from LISt Processing. Hence while most programming languages compute on data, Lisp computes on a data structure, namely the list. A list can be visualized as a collection of elements which can be data, functions or lists themselves. Its power comes from the fact that the language includes in its syntax some of the operations that can be performed on lists and lists of lists. In fact many key features of the Lisp language have found their way into more current languages like Python, Ruby and Perl.

Second, Lisp is a symbolic processing language. This ability to manipulate symbols gives Lisp a powerful edge over other programming languages in AI domains like theorem proving or natural language processing.

Third, Lisp encourages a recursive style of programming. This makes the code much shorter than in other languages. Recursion enables the expression of a problem as a combination of a terminating condition and a self-describing sub-problem. A further advantage Lisp offers is a technique called “tail recursion”. The beauty of tail recursion is that, when the recursive call is the last operation in the function, it can be optimized so that the computing space is of the order of O(1) and not the O(n) that is common in languages like C, C++ and Java, where the size of the stack grows with each subsequent recursive call.

Lisp blurs the distinction between functions and data. Functions take other functions as arguments during computations, and Lisp programs are themselves lists, so functions can operate on other functions and on data in a self-repeating fashion.

The closest analogy to this is to think of machine code which is sequence of 32 bit binary words. Both the logic and the data on which they operate are 32 bit binary words and cannot be distinguished unless one knows where the program is supposed to start executing. If one were to take the snapshot of consecutive memory locations we will encounter 32 bit binary words which represent either a logical or arithmetic operation on data which are also 32 bit binary words.

Lisp is a malleable language and allows programmers to tailor the language to their own convenience. It allows programmers to mould the language so that it suits their programming style. Lisp programs evolve bottom-up, rather than in the top-down style adopted in other languages. The design methodology of Lisp programs takes an inside-out approach rather than an outside-in method.

Lisp has many eminent die-hard adherents who swear by the elegance and beauty of being able to solve difficult problems concisely. On the other hand there are those to whom Lisp represents an ancient Pharaoh’s curse that is difficult to get rid of.

However with Lisp, “Once smitten, you remain smitten”.


Singularity

Pete Mettle felt drowsy. He had been working for days on his new inference algorithm. Pete had been in the field of Artificial Intelligence (AI) for close to 3 decades and had established himself as the father of “semantics”. He was particularly renowned for his 3 principles of Artificial Intelligence. He had postulated the Principles of Learning as

The Principle of Knowledge Acquisition: This principle laid out the guidelines for knowledge acquisition by an algorithm. It clearly laid out the rules of what was knowledge and what was not. It could cleanly separate the wheat from the chaff in any textbook or research article.

The Principle of Knowledge Assimilation: This law gave the process for organizing the acquired knowledge into facts, rules and underlying principles. Knowledge assimilation involved storing the individual rules and the relations between the rules, and provided the basis for drawing conclusions from them.

The Principle of Knowledge Application: This principle, according to Pete, was the most important. It showed how all knowledge acquired and assimilated could be used to draw inferences and conclusions. In fact it also showed how knowledge could be extrapolated to make safe conclusions.

Zengine: The above 3 principles of Pete were hailed as a major landmark in AI. Pete started to work on an inference engine known as the “Zengine”, based on his 3 principles. Pete was almost finished fine-tuning his algorithm and wanted to test the Zengine on the World Wide Web. The World Wide Web had grown to gigantic proportions. A report in the May 2025 issue of the Wall Street Journal mentioned that the total data held on the internet had crossed 400 zettabytes and that the daily data stored on the web was close to 20 terabytes. It was a well-known fact that there was an enormous amount of information on the web on a wide variety of topics: wikis, blogs, articles, ideas, social networks and so on, covering almost every conceivable topic under the sun.

Pete was given special permission by the governments of the world to run his Zengine on the internet. It was Pete’s theory that it would take the Zengine at least a year to process the information on the web and make any reasonable inferences from it. Accompanied by worldwide publicity, the Zengine started its work of assimilating the information on the World Wide Web. The Zengine was programmed to periodically give Pete a status update of its progress.

A few months passed. The Zengine kept giving updates on the number of sites, periodicals and blogs it had condensed into its knowledge database. After about 10 months Pete received a mail. It read “Markets will crash in March 2026. Petrol prices will sky rocket” – Zengine. Pete was surprised at the forecast. So he invoked the API to check on what basis the claim had been made. To his surprise and amazement he found that a lot of events happening in the world had been used to make that claim, and they clearly seemed to point in that direction. A couple of months down the line there was another terse statement: “Rebellion very likely in Mogadishu in Dec 2027” – Zengine. The Zengine also came up with corollaries to Fermat’s last theorem. It was becoming clear to Pete and everybody else that the Zengine was indeed getting smarter by the day. It was also becoming apparent that the Zengine would soon be more powerful than human beings.

Celestial events: Around this time peculiar events were observed all over the world. There were a lot of celestial events happening. Phenomena like the aurora borealis became commonplace. On Dec 12, 2026 there was an unusual amount of electrical activity in the sky. Everywhere there were streaks of lightning. By evening, slivers of lightning hit the earth in several parts of the world. In fact if anybody had viewed the earth from outer space, it would have resembled a “nebula sphere”, with lightning streaks racing towards the earth in all directions. This seemed to happen for many days. Simultaneously the Zengine was getting more and more powerful. In fact it had learnt to spawn off multiple processes to gather information and return it.

Time-space discontinuity: People everywhere were petrified by these strange phenomena. On the one hand there was the fear of the takeover of the web by the Zengine, and on the other was this increased celestial activity. Finally, one morning in Jan 2028, there was a powerful crack followed by a sonic boom, and everywhere people had a moment of discontinuity. In the briefest of moments there was a natural time-space discontinuity, and mankind had progressed to the next stage in evolution.

The unconscious, the subconscious and the conscious all became a single faculty of super consciousness. It has always been known, from the time of Plato, that man knows everything there is to know. According to the Platonic doctrine of Recollection, human beings are born with a soul possessing all knowledge, and learning is just discovering or recollecting what the soul already knows. Similarly, according to Hindu philosophy, behind the individual consciousness of the Atman is the reality known as the Brahman, a universal consciousness attained in a deep state of mysticism through self-inquiry.

However this evolution, by some strange quirk of coincidence, seemed to coincide with the development of the world’s first truly learning machine. In this super-conscious state a learning machine was not something to be feared but something which could be used to benefit mankind. Just as cranes lift loads and earthmovers perform tasks that are beyond our physical capacity, so also a learning machine was a useful invention that could be used to harness the knowledge from mankind’s storehouse – the World Wide Web.
