Pregelian philosophy – Thinking Web Scale – Part 2

“If you squint the right way, you will notice there are graphs are everywhere”. This is a line from a Research blog at Google “Large scale graph computing at Google”.  Transportation problems, disease outbreaks, computer networks and social networks all use graph in one way or another. Social networks of today, with friends of  Facebook, followers in Twitter and connections in LinkedIn are all clearly graphs. The billions of pages in the World Wide Web with incoming and outgoing links is a massive directed graph.

Pregel is Google’s computing model for graph processing. Google came up with this programming model to compute Page Ranks of individual web sites. Pregel is a powerful framework for processing of directed graphs. A directed graph has vertices and edges. Edges are directed towards or away from the vertices. Pregel is a highly scalable model and is well suited for Web Scale problems. It is capable of processing directed graphs with billions of vertices and trillion of edges.

The Map Reduce paradigm with its message passing mechanism is not particularly well suited for this purpose. Map Reduce is capable of processing several documents, images, or matrices in parallel. But handling directed graphs with Map-reduce the combiners/reducers have to wait for the mappers to finish their tasks.  In Pregel programs are executed as a sequence of iterations in which each vertex receives messages sent to it in the previous iteration, execute code, and send messages to other vertices. Each vertex can  modify its state and the topology of the graph

Here is a the model of the Pregel.

pregel

This picture is taken from the lecture in Web Intelligence and Big Data course by Gautam Shroff on Coursera

Pregel works on a sequence of iterations known as supersteps. In each superstep, S, each vertex will receive messages sent to it in the previous superstep S-1, executes a user-defined function specified for the vertex and sends messages to other vertices which they will receive in superstep S+1. The supersteps at all vertices are conceptually supposed to occur in parallel.  At each superstep the vertex can alter its state and the state of the outgoing edges.  The synchronicity of the model is what makes the model semantically manageable.

Pregel is realized on hundreds of commodty serves. The input to the Pregel model is a directed graph. During initialization the vertices are partitioned and each server receives a set of vertices. Each vertex is associated with a modifiable user defined value.

In each superstep each node computes the user defined function in parallel using the message sent to it in the previous superstep.  A vertex can modify its state, the outgoing edges or the topology of the graph.

Pregel has been used for a variety of different problems ranging from determining Page Rank, Shortest path etc. Vertices are first class citizens in Pregel.  Algorithms terminate by a process of voting to halt. When all vertices vote to halt then the computation in Pregel is assumed to have completed. The algorithm as a whole terminates when all the nodes have voted to halt and there are no messages in transit.

A simple example of how Pregel determines the maximum value is illustrated in the original Google research paper Pregel: A System for Large-Scale Graph Processing.

1
The pseudo code for this can be written as

compute(){

i_val := val
while (m) {
if (m > val) {
val = m;
}
else if (i_val == val) {
vote_to_halt
}
else {
for each neighbor v
send_message(v, val)
}
}

In the above diagram the 4 vertices are initialized with the values shown. In superstep 1,

vertices 3 & 6 exchange their values. While 3 updates its value to 6 at its vertex, vertex 6 drops it and vote to halt. Similarly vertex 1 will update its value to 6 while vertex 2 which receives the value 1 will  vote to halt. In superstep 2vertex 2 will receive the value 6 from the vertex to its right, will be woken up and will update its value. In superstep 3 no messages are passed and all nodes would have now voted

Another interesting application is the evaluation of Page Rank. Page Rank essentially determines the probability of hitting a Page if a surfer clicked links on a Web page at random.

Page Rank is evaluated iteratively as follows (source wkipedia.org)

formula

This can be done iteratively through Pregel as below

virtual void Compute(MessageIterator* msgs) {
if (superstep() >= 1) {

for (; !msgs->Done(); msgs->Next()) {
sum += msgs->Value();
*MutableValue() = 0.15 / NumVertices() + 0.85 * sum; – – > (A)
}
if (superstep() < 30) {
const int64 n = GetOutEdgeIterator().size();
SendMessageToAllNeighbors(GetValue() / n); – – > (B)
} else {
VoteToHalt();
}
}
}

The Pregel computation is initialized such that in superstep 0, the value of each vertex

is 1 / NumVertices() .  In each of the first 30 supersteps, each vertex sends along each outgoing edge its tentative PageRank divided by the number of outgoing edges (see step B above).

Starting from superstep 1, each vertex sums up the values arriving on messages into sum and sets its own tentative PageRank to 0.15/NumVertices() + 0.85 * sum (see step A)  After reaching superstep 30, no further messages are sent and each vertex votes to halt.

Clearly the computation of PageRank of the pages indexed by Google in the World Wide Web consisting of billions of pages can be computed fairly efficiently by Pregel.

Pregel also includes ‘combiners’ that perform functions like SUM, MIN,MAX and AVERAGE to save computational steps.

Pregel also includes checkpointing where vertices save their states to revert to a previous state in case of failure.

The synchronous methodology of Pregel helps in avoiding issues of deadlocks and races which are prevalent in asynchronous communicating programs.

Pregel is a powerful programming model which will find applications in many Web Scale applications in the future.

Find me on Google+

Designing a Social Web Portal

Here’s another novel idea of mine, “Designing a Social Web Portal”  that has made it to IP.com (intellectual property).

I have included below the full article in which the Web Portal is re-imagined by adding the social paradigm to a portal.

Abstract

portalThe Social Web Portal, re-imagines the Web Portal using the social paradigm The Social Web Portal is common portal into which all users would login similar to Facebook, Google+ or Twitter. In the Social Web Portal users can choose their family, friends, acquaintances, professional colleagues. Once users are registered in the portal, the Social Web Portal will analyze the click stream history of all the registered users and display the relevant links for each user based on the user’s social circle. Hence in the Social Web Portal each user will get an instantaneous update of relevant,trending URLs/newsitems of his/her social circle based on the click stream data of the social circle in addition to articles of personal interest. Such a portal becomes important in this age of exploding information. The user is completely abreast of all topics of interest of his immediate social circle and the world at large.

Introduction

A large part of our lives is spent on the net. We browse the web for news, stock prices, technology tends, sports updates etc. To do this we typically go to our favorite web sites which are either news aggregators or news curators or we search the web for the required information. This article describes a completely new web browsing experience that is based on the social networking paradigm. This article describes a web portal where the content displayed is based on the browsing preferences of the user, the user’s friends circle, the user’s professional network and the world at large. So the Social Web Portal will display content that is based on the user own preferences, the collective browsing click streams of his/her social network, the user’s professional friends and the world at large. Such a web portal will give the user a snapshot of the kind of news articles that will be of great interest to
him/her. The inclusion of the social paradigm to web browsing provides the user a web browsing experience that is most closely tailored to the user’s taste.

Summary

Web portals like Lycos, Alta-vista, Yahoo, and Excite had their day in the sun early 1990’s. However all this changed with the entry of Google. It had a webpage with a single search bar. With a single stroke Google pushed all the portals to virtual oblivion.

It became obvious to the user that all information was just a “search away”. But much has changed since then. Many pages have been uploaded into the trillion servers that make up the internet. There is so much more information in the worldwide web. News articles, wikis, blogs, tweets, webinars, podcasts, photos, you tube content, social networks etc.

The internet now contains 8.11+ billion pages has more than 1.97 billion users, 266 million websites. We can expect the size to keep growing as the rate of information generation and our thirst for information keeps increasing.

In this world of exploding information the “humble search” will no longer be sufficient. As a user we would like to browse the web in a much more efficient, effective and personalized way. Neither will site aggregators like StumbleUpon, Digg, Reddit or sites which are news curators will be useful. We need to have a smart way to be able to navigate through this information deluge that is personalized to our tastes and to our social circle’s tastes.

We have now entered into an era of social networking where we keep contact with friends on social sites like Facebook, Google+ and with our professional network on LinkedIn and with the world at large on sites like Twitter and Pinterest. These social web sites deliver content based on our connections or our network.

The social web portal delivers content to the portal based on the user’s social network and the user’s social network’s browsing tendencies. It is in this context that it makes great sense to deliver a web portal experience that is based on the user’s personal, family, friend, professional and world browsing
preferences.

Description

Can a Web Portal render content to a single page with topics & news items based on the user’s social circle centered on the user?

The Social Web Portal discusses such a portal for the user which renders content dynamically based on the click stream of the user’s social network. The Social Web Portal will deliver content that has the user’s browsing preferences as the focal center while also displaying the browsing trends of the user’s family, the user’s close friends, the user’s professional colleagues and associates. Finally the portal would also include inputs from what the world at large is interested in and following. The web portal would analyze the key user’s preferences and then create a web portal based on its analysis of
what the user would like to see.

Social Web Portal – Fundamental concept and premise

The Social Web Portal is not a personalized home page in which RSS feeds or inputs from feed aggregators or site aggregators are taken. The Social Web Portal is common portal into which all users would log into similar to Facebook, Google+ or Twitter. All users can choose their friends, acquaintances, professional colleagues in this portal. Once users are registered the click stream history of the all the registered users are continuously updated by the Social Web Portal to a back-end database. Then based on each individual’s social circle the Social Web Portal will perform a statistical analysis of those URLs which were more relevant in the user’s social circle and display these URLs in the user’s page on the Social Web Portal. So when the user logs into the Social Web Portal the webpage will be personalized based on the user’s individual preferences and the collective browsing history of the user’s social circle (friends, colleagues,acquaintances etc).

The Social Web Portal does not take any feeds from existing social networking web sites like Facebook, Google+, Twitter or Youtube. It is independent of these sites. It does not aggregate the feeds from these sites nor does it depend on the social signals from these sites.

The Social Web Portal will generate ‘social signals” independently and completely based on the user’s social circle and the collective browsing history of the user’s social network.

Also the “Social Web Portal” is fundamentally different from link aggregators or a feed aggregators. As mentioned above the Social Web Portal will be based on a statistical analysis of the user network’s browsing history. So regardless of whether a user manually updates a Facebook/Google status, or a user submits a link to a link aggregating web site, the Social Web Portal will analyze the browsing history of the user’s social network and render the portal with the most browsed content.

The collective click stream of the user’s social circle will be analyzed statistically and the sites that have been most visited based on the user’s social circle will be displayed. Hence the user will be aware of the topics of interest of his/her social circle.

The major difference that the Social Web portal has with respect to link aggregators or feed aggregators mentioned above is that the Social Web Portal does not rely on either links submitted by the user’s social circle nor does it depend on the status updates of the user’s network.

The Web page rendered by the Social Web Portal will be based on a statistical ranking of the browsing history of the user’s social circle and also on the relative importance of the friends in the user’s social circle.

Detailed description

The Social Web Portal is based on the collective click stream activity of a user’s family, friends, professional circle and the world at large. This web portal will required to be signed on like any of the social network sites like FB, Google+ or Twitter. The web portal will have a window on the top right corner where the user can send invites, connection requests to his family members, friends and his professional colleagues. The click streams of all those who accept the user’s invite will be used to provide the web browsing experience for the user.

The user can also assign a degree of importance to each of his associations. So while a typical social network site like FB, Google+ or Twitter will provide the status updates of the connections of the user to the user’s updates and include the user’s updates in the connection’s updates, the Social Web Portal will keep track of the click streams of the all the users who have signed into the Social Web Portal. The browsing history of all the users who are registered in the Social Web Portal will be sent to a back-end database for subsequent processing and displaying in the appropriate social circle. Hence as the registered users travel from site to site their browsing history is captured and sent to the back-end database. The click stream history of all the registered users will be continuously updated to a back-end database. It will then render content to each of the individuals in the Social Web Portal
based on the network of that particular user.

The back-end database will be a repository of the browsing click streams of all the users who have signed up for the Social Web Portal. The browsing history of all registered users will be captured and sent to a back-end database, probably using cookies, on a regular basis. These cookies will be analyzed statistically by an application layer over the database which will then display content to a user based on the browsing history of the user’s social circle. Each association in the social circle will be ranked based on a degree of importance assigned by the user.

When a user opens the Social Web Portal the portal will query the back-end database based on the social network that the user has and the degree of importance that the user has for each of his/her connections.

The query will return the overall browsing preferences that are based on the user’s network i.e. the Social Web Portal will render the web page with the aggregate, collective web browsing tendencies of the user’s family, friends, colleagues and friends besides including the user’s own tastes and browsing preferences. So every user will be aware of the common trends and popular items in his/her social circle along with the trending topics in the world at large.

This can be represented in the diagram below

win

Fig 1. Dynamic Window in the Social Web Portal

The rectangle shown in the above window is something that can be tuned by each user for his/ her individual taste. The user can specify how much of the browsing tendencies of friends, family and colleagues he or she would like to include in the Social Web Portal. Based on the user’s taste the content that will be displayed on the user’s Social Web Portal will have appropriate content of the user’s family, friend, colleagues and World

The Social Web Portal for a user can be visualized to be represented as shown below

portal

Fig 2. Snapshot of what each user would see when he logs into the Social Web Portal

As can be seen this Web page will be customized to the user. It will display all the relevant news items and articles of interest for the user. Any user will also be interested to see what people in his/her particular domain are reading. For e.g. a person in finance would like to see specific topics in finance while also being interested in the other relevant news items that he may have missed but may have been read by his/her friends or colleagues.

In other words each user will get a snapshot of information. This information will be tailored to the user based on the individual’s personal preferences, the trending topics in among his family, friends, colleagues, acquaintances and the world at large. So every user will be fully abreast of the popular topics issues in the world without having to individually browse sites. The above figure shows how this snapshot would look for each user.

People also typically like to see if they are up to date with the world on topics. The Social Web Portal will ensure that popular articles automatically bubble up to each and every user.

A diagrammatic representation of the Social Web Portal in action can be represented as below

data

Fig 3. Browsing history maintained in a back-end database and displayed for each user.

In the above figure the click streams of the network of all the users of the Social Web Portal are collected in the distributed database. When a user logs into his Social Web Portal the query will return the overall browsing trends of the user’s family, friends, professional colleagues and the world. Those news items that are popular will be bubbled up to the user along with his or her own preferences. Hence the user will feel connected to his/her network and will have a novel browsing experience.

A diagrammatic representation of the Social Web Portal is shown below

swp1

Fig 4. A schematic of how the personalization happens in the Social Web Portal

In Fig 4 it can be seen that the bottom most layer contains the collective browsing history of all the registered users as they browse different web sites. This click stream will be updated at regular intervals. This browsing history is analyzed statistically to determine the most relevant and popular sites for each user’s social network and then ranked on the degree of importance of each individual in the social circle.

fc

Fig 5. Flow chart for the Social Web Portal

Hence the Social Web Portal will broadly perform the following activities

  • The collective browsing history of all registered users of the Social Web Portal will be sent for analysis to a back-end database
  • The Social Web Portal will render content based on the statistical analysis of the collective click stream activity of a user’s family, friends, professional circle and the world at large
  • The Social Web Portal will render content dynamically based on the statistical ranking of browsing history of user’s social circle
  • A user can configure the order of importance to each of the people in his/her social circle. The Social Web Portal the portal will query the back-end database based on the relative importance of each of the acquaintance of the user and also the statistical weight of “visited sites”
  • The Social Web Portal will render the web page with the “most visited sites” based on the aggregate, collective web browsing tendencies of the user’s family, friends, colleagues and friends besides including the user’s own tastes and browsing preferences.

Benefits

The Social Web Portal will usher in a completely new Web browsing experience. Adding the social paradigm to a user’s browsing experience can have multiple benefits. It will allow each user to know what new articles or items are popular among his or her network. A person can keep abreast of all the trends that are of interest to him/her. The Social Web Portal will be novel experience that will be completely tailored to each and every user.

Find me on Google+

Re-imagining the Web portal

Published in Telecom Asia, Mar 16, 2012 – Re-imagining the web portal

Web portals had their heyday in the early 1990’s. Remember Lycos, Alta-vista, Yahoo, and Excite – portals which had neatly partitioned the web into compartments for e.g. Autos, Beauty, Health, and Games etc. Enter Google. It had a webpage with a single search bar. With a single stroke Google pushed all the portals to virtual oblivion.

It became obvious to the user that all information was just a “search away”. There was no longer the need for neat categorization of all the information on the web. There was no need to work your way through links only to find your information at the “bottom of the heap”. The user was content to search their way to needed information.

That was then in the mid 1990s. But much has changed since then. Many pages have been uploaded into the trillion servers that make up the internet. There is so much more information in the worldwide web.  News articles, wikis, blogs, tweets, webinars, podcasts, photos, you tube content, social networks etc etc.

Here are some fun facts about the internet – It contains 8.11 billion pages (Worldwidewebsize), has more than 1.97 billion users, 266 million websites (State of the Internet). We can expect the size to keep growing as the rate of information generation and our thirst for information keeps increasing.

In this world of exploding information the “humble search” will no longer be sufficient. As a user we would like to browse the web in a much more efficient, effective and personalized way.   Neither will site aggregators like StumbleUpon, Digg, Reddit and the like will be useful.  We need to have a smart way to be able to navigate through this information deluge.

It is here I think that there is a great opportunity for re-imagining the Web Portal. As a user of the web it would be great if the user is shown a view of the web that is personalized to the tastes and interests that is centered on him. What I am proposing is a Web portal that metamorphoses dynamically based on the user’s click stream, the user’s browsing preferences, of his interests and inclinations as the focal center.  Besides the user’s own interests the web portal would also analyze the click streams of the user’s close friends, colleagues and associates. Finally the portal would also include inputs from what the world at large is interested in and following. The web portal would analyze the key user’s preferences and then create a web portal based on its analysis of what the user would like to see.

This can be represented in the diagram below

We have all heard of Google’s zeitgeist which is a massive database of the world’s inclinations and tendencies.  Such a similar database would probably be also held by Yahoo, Microsoft, FB, Twitter etc.

The Web portal in its new incarnation would present contents that are tailored specifically to each user’s browsing patterns. In a single page would be included all news, status updates, latest youtuble videos, tweets etc he would like to see.

In fact this whole functionality can be integrated into the Web browser. In its new avatar the Web Portal would have content that is dynamic, current and personalized to each individual user. Besides every user would also be able to view what his friends, colleagues and the world at large are browsing.

A few years down the line we may see “the return of the dynamic, re-invented Web Portal”.

Find me on Google+