R is not only the 18th letter of the English alphabet but also a free and open-source programming language for statistical computing and graphics, available for Windows, Mac, and Linux. It is widely used by statisticians and analysts for data analysis and visualization. Installing R is easy: point your browser to www.r-project.org and click the CRAN link in the Download section. The key data objects in R are vectors, lists, arrays, matrices and data frames. A vector stores elements of a single type; a list is a more general form of vector that can hold elements of any type; an array is a multi-dimensional vector that can store data in rows, columns and further dimensions; and matrices and data frames hold tabular data in rows and columns, much like an Excel sheet (a matrix holds a single data type, while the columns of a data frame can have different types).
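To make these definitions concrete, here is a minimal sketch that creates one object of each kind using only base R. The variable names and values are arbitrary placeholders chosen for illustration.

```r
# A vector holds elements of a single type (here, numeric)
v <- c(2.5, 3.1, 4.7)

# A list can mix types, including whole vectors as elements
l <- list(name = "R", version = 4, tags = c("stats", "graphics"))

# An array is a vector with dimensions: 2 rows x 3 columns x 2 layers
a <- array(1:12, dim = c(2, 3, 2))

# A matrix is a two-dimensional array; all elements share one type
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame holds tabular data whose columns may differ in type
df <- data.frame(id = 1:3,
                 language = c("R", "Python", "Julia"),
                 stringsAsFactors = FALSE)

# Print a compact summary of the data frame's structure
str(df)
```

Calling `str()` on any of these objects is a quick way to check its type and shape.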
The focus of this post is how to use R for analyzing Big Data. Big Data has many real-world applications, and social networking sites are ideal sources of it. One popular source is Twitter, a valuable platform for data mining and for the knowledge that can be drawn from it. Perhaps surprisingly, R can be used to extract and visualize Twitter data, although accessing that data from R takes a few setup steps.
STARTING WITH R
·         To explore Twitter data, you first need a Twitter account and an installed copy of R.
·         Using that account, register an application at the https://apps.twitter.com/ site. Fill in the basic information requested and create your Twitter application.
·         Your application's consumer key, consumer secret, access token and access token secret are the four values that the setup_twitter_oauth() function uses for authentication.
·         Scroll down and click the “Create my access token” button to generate the access token and access token secret for your application.
·         The steps above create the Twitter application. Once they are done, the next step is to connect R to Twitter by installing and loading the twitteR and ROAuth packages. Below is the R script to start the Twitter data analysis task:
>install.packages("twitteR")
>install.packages("ROAuth")
>library(twitteR)
>library(ROAuth)
·         On the MS Windows platform, also download the cURL CA certificate bundle, which is used to verify Twitter's SSL certificate during the handshake:
>download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
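·         Before connecting, save all the necessary values in variables, replacing the placeholder strings with the keys shown on your application page:
>consumerKey <- "your_consumer_key"
>consumerSecret <- "your_consumer_secret"
>AccessToken <- "your_access_token"
>AccessTokenSecret <- "your_access_token_secret"
>requestURL <- "https://api.twitter.com/oauth/request_token"
>accessURL <- "https://api.twitter.com/oauth/access_token"
>authURL <- "https://api.twitter.com/oauth/authorize"
>cred <- OAuthFactory$new(consumerKey = consumerKey, consumerSecret = consumerSecret, requestURL = requestURL, accessURL = accessURL, authURL = authURL)
>cred$handshake(cainfo = "cacert.pem")
·         Finally, authenticate your Twitter application by passing the stored key values to setup_twitter_oauth():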
>setup_twitter_oauth(consumerKey, consumerSecret, AccessToken, AccessTokenSecret)
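Hard-coding keys in a script makes them easy to leak when the script is shared. A minimal sketch of one alternative, assuming you keep the four values in environment variables: the variable names TWITTER_CONSUMER_KEY and so on, and the helper get_twitter_credentials(), are hypothetical choices for this example and are not part of twitteR.

```r
# Hypothetical environment variable names; set them yourself,
# for example in the file ~/.Renviron, which R reads at startup
twitter_env_vars <- c(
  consumerKey       = "TWITTER_CONSUMER_KEY",
  consumerSecret    = "TWITTER_CONSUMER_SECRET",
  accessToken       = "TWITTER_ACCESS_TOKEN",
  accessTokenSecret = "TWITTER_ACCESS_TOKEN_SECRET"
)

# Read the four credentials from the environment and stop early,
# naming any that are missing, instead of failing inside the OAuth call
get_twitter_credentials <- function(vars = twitter_env_vars) {
  values <- Sys.getenv(unname(vars), unset = "")
  names(values) <- names(vars)
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}

# Usage, once the variables are set and twitteR is loaded:
# creds <- get_twitter_credentials()
# setup_twitter_oauth(creds$consumerKey, creds$consumerSecret,
#                     creds$accessToken, creds$accessTokenSecret)
```

Failing with a named list of missing variables makes configuration mistakes easier to spot than a generic authentication error.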
Once all of the above steps are completed, we are ready to access and explore Twitter data. This brief walkthrough showed how to set up R for Big Data analysis, using Twitter as the data source.
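As a possible next step, the sketch below pulls tweets and takes a first look at them. It assumes authentication with setup_twitter_oauth() has already succeeded. searchTwitter() and twListToDF() are twitteR functions, while fetch_tweet_texts() and count_hashtags() are hypothetical helper names introduced here for illustration; hashtag counting is chosen only as a simple first exploration.

```r
# Fetch up to n recent tweets matching a query and return their text.
# Assumes setup_twitter_oauth() has already been called in this session.
fetch_tweet_texts <- function(query, n = 100) {
  tweets <- twitteR::searchTwitter(query, n = n)
  # twListToDF() turns the list of status objects into a data frame
  twitteR::twListToDF(tweets)$text
}

# Count hashtags (case-insensitively) in a character vector of tweet
# texts and return a named integer vector, most frequent first
count_hashtags <- function(texts) {
  tags <- unlist(regmatches(texts, gregexpr("#\\w+", texts, perl = TRUE)))
  counts <- table(tolower(tags))
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Usage (requires network access and valid credentials):
# texts <- fetch_tweet_texts("#rstats", n = 200)
# head(count_hashtags(texts), 10)
```

Keeping the network call separate from the counting step means the analysis function can be tried on any character vector, without contacting Twitter.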
