To R is human …

“To R is human, to dabble in it fun”, one could say. In this post I try to play a little Nate Silver, looking at the Twitterverse. With the Indian general election of 2014, which will constitute the 16th Lok Sabha, around the corner, I wanted to play around a little. Anyway, here goes.

To get started with Twitter in R, we first need to establish a handshake between Twitter and R. The R app you create has to be authenticated and authorized with Twitter before it can mine tweets from the Twitterverse. The steps are fairly straightforward.

The first step is to create an app on Twitter at http://dev.twitter.com. Log in to your Twitter account, click the drop-down at your photo and choose “My applications”, then click “Create new application”. Now do the following:
– Enter a unique name for your application
– Enter a description
– For the ‘Website’ enter any valid URL
– Leave the Callback URL blank
– Accept the conditions

Leave this page open in your browser. The handshake between your R application and Twitter is established as follows:

#install the necessary packages
install.packages("ROAuth")
install.packages("twitteR")
install.packages("wordcloud")
install.packages("tm")

library("ROAuth")
library("twitteR")
library("wordcloud")
library("tm")
library(RCurl)

# Set SSL certs globally
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

# Twitter OAuth endpoints
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"

Now go back to your browser. In the Twitter application you created, choose the API Keys tab. Copy the API key and API secret and paste them into the next two lines:

apiKey <- "Your API key here"
apiSecret <- "Your API secret here"
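# Build the OAuth credential object, then start the handshake
twitCred <- OAuthFactory$new(consumerKey = apiKey, consumerSecret = apiSecret,
                             requestURL = reqURL, accessURL = accessURL, authURL = authURL)
twitCred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))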

When you enter this, you should see the following:
To enable the connection, please direct your web browser to:
https://api.twitter.com/oauth/authorize?oauth_token=WnTGL4eHsiNJRFRiW1UU3GoYSvVZiYDBbO3WAsZO

Copy the link and paste it into a new tab in your browser. Authorize the app, then copy the 7-digit PIN that Twitter shows and paste it at the prompt in R:
When complete, record the PIN given to you and provide it here: 7377963

registerTwitterOAuth(twitCred)

This should complete the authorization. Now you are good to go.
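
As an aside, pasting the key and secret straight into a script makes them easy to leak when the script is shared. A minimal sketch of one alternative, assuming the values have been stored in environment variables beforehand (the names TWITTER_API_KEY and TWITTER_API_SECRET and the helper get_credential() are made up for this illustration, not something Twitter or twitteR requires):

```r
# Read a credential from an environment variable and fail loudly if it is missing.
# The variable names used below are hypothetical; pick whatever suits your setup.
get_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Usage, once the variables are set (for example in ~/.Renviron):
# apiKey    <- get_credential("TWITTER_API_KEY")
# apiSecret <- get_credential("TWITTER_API_SECRET")
```

The rest of the handshake stays the same; only the source of apiKey and apiSecret changes.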

Here is a short example of performing Text Mining with the help of package “tm”.

I wanted to create a word cloud around the hashtag #NaMo.

Here is the code. We first search Twitter for the hashtag and then create a Corpus from the text of the tweets.

# Search Twitter for the hashtag #NaMo
r_stats <- searchTwitter("#NaMo", n = 500,
                         cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))


# Save text
r_stats_text <- sapply(r_stats, function(x) x$getText())
# Create a corpus
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))
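# Clean up the text: lower-case, strip punctuation, drop English stop words
# (tm 0.6 and later needs content_transformer() around base functions like tolower)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removeWords, stopwords("english"))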

# Now create a word cloud
wordcloud(r_stats_text_corpus)
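
Raw tweets tend to carry URLs, @mentions and the "RT" retweet marker, and these can end up as prominent "words" in the cloud. Here is a rough sketch, in base R only, of stripping them from the tweet text before the Corpus() step. The helper clean_tweets() and the sample strings are invented for illustration; the regular expressions are a starting point rather than a complete cleaner.

```r
# Strip Twitter-specific noise from a character vector of tweets.
clean_tweets <- function(tweets) {
  tweets <- gsub("https?://\\S+", " ", tweets, perl = TRUE)  # URLs
  tweets <- gsub("@\\w+:?", " ", tweets, perl = TRUE)         # @mentions (and a trailing colon)
  tweets <- gsub("\\bRT\\b", " ", tweets, perl = TRUE)        # retweet marker
  tweets <- gsub("\\s+", " ", tweets, perl = TRUE)            # collapse runs of whitespace
  trimws(tweets)
}

# Invented sample text, not real tweets
sample_tweets <- c("RT @someone: Rally today http://t.co/abc #NaMo",
                   "Big crowd at the rally   #NaMo")
clean_tweets(sample_tweets)
```

In the code above this would be applied as r_stats_text <- clean_tweets(r_stats_text) just before creating the corpus.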


The wordcloud() call creates a word cloud of the words most used with the hashtag, in this case #NaMo.
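
A word cloud is essentially a picture of a word-frequency table: the more often a word occurs, the larger it is drawn. To make that concrete, here is a small base R sketch that counts words by hand. The function word_freq(), its short stop-word list and the sample strings are all made up for this example; tm's stopwords() list is far longer.

```r
# Count word frequencies in a character vector, ignoring case, punctuation
# and a small illustrative stop-word list.
word_freq <- function(texts,
                      stop_words = c("the", "a", "at", "in", "of", "to", "is")) {
  texts <- tolower(texts)
  texts <- gsub("[[:punct:]]", " ", texts)          # punctuation becomes a space
  words <- unlist(strsplit(texts, "\\s+"))          # split on whitespace
  words <- words[nzchar(words) & !(words %in% stop_words)]
  sort(table(words), decreasing = TRUE)             # most frequent first
}

# Invented sample text, not real tweets
sample_texts <- c("Rally in the city #NaMo",
                  "Big rally today #NaMo",
                  "NaMo speaks at rally")
freq <- word_freq(sample_texts)
head(freq, 3)
```

The wordcloud() function also accepts a vector of words and a vector of frequencies, so a table like this could be plotted with wordcloud(names(freq), as.integer(freq)).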

You can clone the code at Rwordcloud

Watch this space. Hasta la vista. I’ll be back!

