What is Hadoop?

Hadoop gets a lot of buzz these days in database and content management circles, but many people in the industry still don’t really know what it is or how it can best be applied.

Drawing on a discussion with Cloudera's CEO, we can summarize what Hadoop is in the question-and-answer format below.

Where did Hadoop come from?

The underlying technology was invented by Google back in their earlier days so they could usefully index all the rich textual and structural information they were collecting, and then present meaningful and actionable results to users. There was nothing on the market that would let them do that, so they built their own platform. Google’s innovations were incorporated into Nutch, an open source project, and Hadoop was later spun off from that. Yahoo has played a key role in developing Hadoop for enterprise applications.


What problems can Hadoop solve?

The Hadoop platform was designed to solve problems where you have a lot of data — perhaps a mixture of complex and structured data — and it doesn’t fit nicely into tables. It’s for situations where you want to run analytics that are deep and computationally extensive, like clustering and targeting. That’s exactly what Google was doing when it was indexing the web and examining user behavior to improve performance algorithms.

Hadoop applies to a bunch of markets. In finance, if you want to do accurate portfolio evaluation and risk analysis, you can build sophisticated models that are hard to jam into a database engine. But Hadoop can handle it. In online retail, if you want to deliver better search answers to your customers so they’re more likely to buy the thing you show them, that sort of problem is well addressed by the platform Google built. Those are just a few examples.

How is Hadoop architected?

Hadoop is designed to run on a large number of machines that don’t share any memory or disks. That means you can buy a whole bunch of commodity servers, slap them in a rack, and run the Hadoop software on each one. When you want to load all of your organization’s data into Hadoop, what the software does is bust that data into pieces that it then spreads across your different servers. There’s no one place where you go to talk to all of your data; Hadoop keeps track of where the data resides. And because multiple copies are stored, data on a server that goes offline or dies can be automatically replicated from a known good copy.

In a centralized database system, you’ve got one big disk connected to four or eight or 16 big processors. But that is as much horsepower as you can bring to bear. In a Hadoop cluster, every one of those servers has two or four or eight CPUs. You can run your indexing job by sending your code to each of the dozens of servers in your cluster, and each server operates on its own little piece of the data. Results are then delivered back to you in a unified whole. That’s MapReduce: you map the operation out to all of those servers and then you reduce the results back into a single result set.
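
A minimal, concrete taste of that flow is to run the word-count example that ships with Hadoop against a small input directory in HDFS. The jar location below is typical of a CDH YARN install and is an assumption to verify for your distribution (on MRv1 it is usually /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar):
# Copy some local text files into HDFS (assumes the HDFS and MapReduce daemons are running)
$ hadoop fs -mkdir wordcount-input
$ hadoop fs -put /etc/hadoop/conf/*.xml wordcount-input
# Map the job out to every node in the cluster, then reduce the partial counts into one result set
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount wordcount-input wordcount-output
$ hadoop fs -cat wordcount-output/part* | head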

Architecturally, the reason you’re able to deal with lots of data is because Hadoop spreads it out. And the reason you’re able to ask complicated computational questions is because you’ve got all of these processors, working in parallel, harnessed together.

At this point, do companies need to develop their own Hadoop applications?

It’s fair to say that a current Hadoop adopter must be more sophisticated than a relational database adopter. There are not that many “shrink wrapped” applications today that you can get right out of the box and run on your Hadoop cluster. It’s similar to the early ’80s when Ingres and IBM were selling their database engines and people often had to write applications locally to operate on the data.

That said, you can develop applications in a lot of different languages that run on the Hadoop framework. The developer tools and interfaces are pretty simple. Some of our partners — Informatica is a good example — have ported their tools so that they’re able to talk to data stored in a Hadoop cluster using Hadoop APIs. There are specialist vendors that are up and coming, and there are also a couple of general-purpose query tools: Hive, a version of SQL that lets you interact with data stored on a Hadoop cluster, and Pig, a language developed by Yahoo that allows for data flow and data transformation operations on a Hadoop cluster.

Hadoop’s deployment is a bit tricky at this stage, but the vendors are moving quickly to create applications that solve these problems. I expect to see more of the shrink-wrapped apps appearing over the next couple of years.

Where do you stand in the SQL vs NoSQL debate?

The term “NoSQL” was invented to create cachet around a bunch of different projects, each of which has different properties and behaves in different ways. The real question is, what problems are you solving? That’s what matters to users.


Installing HBase on Hadoop

Installing HBase

To install HBase on Red Hat-compatible systems:
$ sudo yum install hbase
To install HBase on Ubuntu and Debian systems:
$ sudo apt-get install hbase
To install HBase on SLES systems:
$ sudo zypper install hbase


To list the installed files on Ubuntu and Debian systems:
$ dpkg -L hbase
To list the installed files on Red Hat and SLES systems:
$ rpm -ql hbase
You can see that the HBase package has been configured to conform to the Linux Filesystem Hierarchy Standard. (To learn more, run man hier).
You are now ready to enable the server daemons you want to use with Hadoop. You can also enable Java-based client access by adding the JAR files in /usr/lib/hbase/ and /usr/lib/hbase/lib/ to your Java class path.
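For example, a minimal sketch of that classpath setup for a Java client, assuming the default CDH package layout and a JVM recent enough (Java 6 or later) to expand the * wildcards:
# Add the HBase JARs and their dependencies to the client classpath
$ export CLASSPATH=$CLASSPATH:/usr/lib/hbase/*:/usr/lib/hbase/lib/*
# The HBase shell is a quick way to confirm the installation (assumes the HBase daemons are already running)
$ hbase shell
hbase(main):001:0> status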

Installing Pig in Hadoop

Installing Pig

To install Pig on Red Hat-compatible systems:
$ sudo yum install pig
To install Pig on SLES systems:
$ sudo zypper install pig
To install Pig on Ubuntu and other Debian systems:
$ sudo apt-get install pig
  Note:
Pig automatically uses the active Hadoop configuration (whether standalone, pseudo-distributed mode, or distributed). After installing the Pig package, you can start the grunt shell.
To start the Grunt Shell (MRv1):
$ export PIG_CONF_DIR=/usr/lib/pig/conf
$ export PIG_CLASSPATH=/usr/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar:/usr/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar
$ pig 

grunt> 
To start the Grunt Shell (YARN):
  Important:
For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop in a YARN installation, set the HADOOP_MAPRED_HOME environment variable as follows:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
$ export PIG_CONF_DIR=/usr/lib/pig/conf
$ export PIG_CLASSPATH=/usr/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar:/usr/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar
$ pig 
...
grunt>
To verify that the input and output directories from the example grep job exist, list an HDFS directory from the Grunt shell:
grunt> ls
hdfs://localhost/user/joe/input <dir>
hdfs://localhost/user/joe/output <dir>
To run a grep example job over the input directory using Pig:
grunt> A = LOAD 'input';
grunt> B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
grunt> DUMP B;
To check the status of your job while it is running, look at the JobTracker web console http://localhost:50030/.
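The same grep example can also be run non-interactively in batch mode. A minimal sketch, assuming the input directory above exists in HDFS, pig is on your PATH, and using pig-grep-output as an illustrative output directory:
$ cat > grep-dfs.pig <<'EOF'
-- Filter lines that mention dfs properties and store the matches back into HDFS
A = LOAD 'input';
B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
STORE B INTO 'pig-grep-output';
EOF
$ pig -f grep-dfs.pig
$ hadoop fs -cat pig-grep-output/part* | head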

Installing Splunk on Hadoop

Tar file install

To install Splunk Enterprise on a Linux system, expand the tar file into an appropriate directory using the tar command:
tar xvzf splunk_package_name.tgz
The default install directory is splunk in the current working directory. To install into /opt/splunk, use the following command:
tar xvzf splunk_package_name.tgz -C /opt
Note: When you install Splunk Enterprise with a tar file:
  • Some non-GNU versions of tar might not have the -C argument available. In this case, if you want to install in /opt/splunk, either cd to /opt or place the tar file in /opt before running the tar command. This method will work for any accessible directory on your machine's filesystem.
  • Splunk does not create the splunk user automatically. If you want Splunk to run as a specific user, you must create the user manually before installing.
  • Ensure that the disk partition has enough space to hold the uncompressed volume of the data you plan to keep indexed.

RedHat RPM install

Ensure that the desired Splunk RPM package is available locally on the target server. Verify that the file is readable and executable by the Splunk user. If needed, change access:
 
chmod 744 splunk_package_name.rpm
To install the Splunk RPM in the default directory /opt/splunk:
rpm -i splunk_package_name.rpm
To install Splunk in a different directory, use the --prefix flag:
rpm -i --prefix=/opt/new_directory splunk_package_name.rpm
Note: Installing with rpm in a non-default directory is not recommended, because RPM offers no safety net at upgrade time: if the --prefix value does not match the existing installation directory, the upgrade will go awry.
To upgrade an existing Splunk Enterprise installation that resides in /opt/splunk using the RPM:
rpm -U splunk_package_name.rpm
Note: Upgrading with rpm upgrades the RPM package; it only applies to an installation that was originally done from an RPM. There is no smooth transition from tar installs to RPM installs. This is not a Splunk issue, but a fundamental packaging issue.
To upgrade an existing Splunk Enterprise installation that was done in a different directory, use the --prefix flag:
rpm -U --prefix=/opt/existing_directory splunk_package_name.rpm
Note: If you do not specify --prefix for your existing directory, rpm will install in the default location of /opt/splunk.
For example, to upgrade an existing installation at $SPLUNK_HOME=/opt/apps/splunk, enter the following:
rpm -U --prefix=/opt/apps splunk_package_name.rpm
To replace an existing Splunk Enterprise installation:
rpm -i --replacepkgs --prefix=/splunkdirectory/ splunk_package_name.rpm
If you want to automate your RPM install with kickstart, add the following to your kickstart file:
./splunk start --accept-license
./splunk enable boot-start 
Note: The second line is optional for the kickstart file.
Enable Splunk Enterprise to start at system boot by adding it to /etc/init.d/. Run this command as root (or with sudo) and specify the user that Splunk Enterprise should run as:
./splunk enable boot-start -user splunkuser

Debian DEB install

To install the Splunk DEB package:
dpkg -i splunk_package_name.deb
Note: You can only install the Splunk DEB package in the default location, /opt/splunk.


What gets installed

Splunk package status:
dpkg --status splunk
List all packages:
dpkg --list

Start Splunk

Splunk Enterprise can run as any user on the local system. If you run it as a non-root user, make sure that it has the appropriate permissions to read the inputs that you specify. Refer to the instructions for running Splunk Enterprise as a non-root user for more information.
To start Splunk Enterprise from the command line interface, run the following command from $SPLUNK_HOME/bin directory (where $SPLUNK_HOME is the directory into which you installed Splunk):
 ./splunk start
By convention, this document uses:
  • $SPLUNK_HOME to identify the path to your Splunk Enterprise installation.
  • $SPLUNK_HOME/bin/ to indicate the location of the command line interface.

Startup options

The first time you start Splunk Enterprise after a new installation, you must accept the license agreement. To start Splunk Enterprise and accept the license in one step:
 $SPLUNK_HOME/bin/splunk start --accept-license
Note: There are two dashes before the accept-license option.

Launch Splunk Web and log in

After you start Splunk Enterprise and accept the license agreement:
1. In a browser window, access Splunk Web at http://<hostname>:<port>.
  • hostname is the name of the machine running Splunk Enterprise.
  • port is the port you specified during installation (the default port is 8000).
Note: Use http (not https) the first time you access Splunk Web.
2. Splunk Web prompts you for login information (default: username admin, password changeme) before it launches. If you switch to Splunk Free, you will bypass this login page in future sessions.
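You can also confirm from the command line that the server is up before logging in; a small sketch using the same $SPLUNK_HOME convention as above:
# Check whether splunkd and Splunk Web are running
$ $SPLUNK_HOME/bin/splunk status
# Restart after making configuration changes
$ $SPLUNK_HOME/bin/splunk restart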

How do I switch to Splunk Free?

If you currently have Splunk Enterprise (trial or not), you can either wait for your Enterprise license to expire, or switch to a Free license at any time. To switch to a Free License:
1. Log in to Splunk Web as a user with admin privileges and navigate to Settings > Licensing.
2. Click Change license group at the top of the page.
3. Select Free license and click Save.
4. You are prompted to restart.

Installing Hive

Since we are using the Cloudera distribution (CDH), installing Hive is a very easy job.
Just type the command below:

$ sudo yum install hadoop-hive

Now, to use Hive from the shell, type the command below:

$ hive
hive>

You can also install Hive with HBase support. As of CDH3u5, Hive no longer has a dependency on HBase; if you want to use Hive with HBase (referred to as Hive-HBase integration), you need to install the hadoop-hive-hbase package, as follows:

$ sudo yum install hadoop-hive-hbase
To run a Hive script that uses Hive-HBase integration:

$ hive --auxpath /usr/lib/hive/lib/hbase.jar,/usr/lib/hive/lib/hive-hbase-handler-0.7.1-cdh3u6.jar,/usr/lib/hive/lib/zookeeper.jar,/usr/lib/hive/lib/guava-r06.jar -hiveconf hbase.zookeeper.quorum=<zookeeper_quorum> -f <script>


Using Hive with HBase

To allow Hive scripts to use HBase, add the following statements to the top of each script. Replace the <component_version> strings with current version numbers for CDH, Guava and the Hive HBase handler. (You can find current version numbers for CDH dependencies such as Guava in CDH's root pom.xml file for the current release, for example cdh-root-4.4.0.pom.)
ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hbase.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-<Hive-HBase-Handler_version>-cdh<CDH_version>.jar;
ADD JAR /usr/lib/hive/lib/guava-<Guava_version>.jar;
For example:
ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hbase.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar;
ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar;
  Note:
Instead of adding these statements to each script, you can populate the hive.aux.jars.path property in hive-site.xml; for example:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/lib/hive/lib/zookeeper.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar,file:///usr/lib/hive/lib/guava-11.0.2.jar,file:///usr/lib/hive/lib/hbase.jar</value>
</property>
Configuring Hive

In order to make setup easy for new users, Hive's Metastore is configured to store metadata locally in an embedded Apache Derby database. Unfortunately, this configuration only allows a single user to access the Metastore at a time. We strongly encourage users to use a MySQL database instead. This section describes how to configure Hive to use a remote MySQL database, which allows Hive to support multiple users. See the Hive Metastore documentation for additional information.

Prerequisites

Step 1: Install and start MySQL if you have not already done so
To install MySQL on a Red Hat system:
$ sudo yum install mysql-server
Step 2: Configure the MySQL Service and Connector
Before you can run the Hive metastore with a remote MySQL database, you must configure a connector to the remote MySQL database, set up the initial database schema, and configure the MySQL user account for the Hive user.
To install the MySQL connector on a Red Hat 6 system:
Install mysql-connector-java and symbolically link the file into the /usr/lib/hive/lib/ directory.
$ sudo yum install mysql-connector-java
$ ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar
To set the MySQL root password:
$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
To make sure the MySQL server starts at boot:

On Red Hat systems:
$ sudo /sbin/chkconfig mysqld on
$ sudo /sbin/chkconfig --list mysqld
mysqld          0:off   1:off   2:on    3:on    4:on    5:on    6:off
Step 3. Create the Database and User
The instructions in this section assume you are using Remote mode, and that the MySQL database is installed on a separate host from the metastore service, which is running on a host named metastorehost in the example.
  Note:
If the metastore service will run on the host where the database is installed, replace 'metastorehost' in the CREATE USER example with 'localhost'. Similarly, the value of javax.jdo.option.ConnectionURL in /etc/hive/conf/hive-site.xml (discussed in the next step) must be jdbc:mysql://localhost/metastore. For more information on adding MySQL users, see http://dev.mysql.com/doc/refman/5.5/en/adding-users.html.
Create the initial database schema using the hive-schema-0.10.0.mysql.sql file located in the /usr/lib/hive/scripts/metastore/upgrade/mysql directory.
Example
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;
You also need a MySQL user account for Hive to use to access the metastore. It is very important to prevent this user account from creating or altering tables in the metastore database schema.
  Important:
If you fail to restrict the ability of the metastore MySQL user account to create and alter tables, it is possible that users will inadvertently corrupt the metastore schema when they use older or newer versions of Hive.
Example
mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'metastorehost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'metastorehost';
mysql> FLUSH PRIVILEGES;
mysql> quit;
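To confirm that the new account can reach the schema, you can connect as the hive user and list the metastore tables. A small sketch, run from the metastore host; 'myhost' stands in for your MySQL host (the same name used in Step 4 below), so adjust it to your environment:
$ mysql -h myhost -u hive -p -e 'SHOW TABLES IN metastore;'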
Step 4: Configure the Metastore Service to Communicate with the MySQL Database
This step shows the configuration properties you need to set in hive-site.xml to configure the metastore service to communicate with the MySQL database, and provides sample settings. Though you can use the same hive-site.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.
Given a MySQL database running on myhost and the user account hive with the password mypassword, set the configuration as follows (overwriting any existing values).
  Note:
The hive.metastore.local property is no longer supported as of Hive 0.10; setting hive.metastore.uris is sufficient to indicate that you are using a remote metastore.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://myhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoStartMechanism</name> 
  <value>SchemaTable</value>
</property> 

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

Step 5: Test connectivity to the metastore:
$ hive -e "show tables;"
  Note:
This will take a while the first time.

Configuring HiveServer2

You must make the following configuration changes before using HiveServer2. Failure to do so may result in unpredictable behavior.

Table Lock Manager (Required)

You must properly configure and enable Hive's Table Lock Manager. This requires installing ZooKeeper and setting up a ZooKeeper ensemble.
  Important:
Failure to do this will prevent HiveServer2 from handling concurrent query requests and may result in data corruption.
Enable the lock manager by setting properties in /etc/hive/conf/hive-site.xml as follows (substitute your actual ZooKeeper node names for those in the example):
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>

<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
  Important:
Enabling the Table Lock Manager without specifying a list of valid Zookeeper quorum nodes will result in unpredictable behavior. Make sure that both properties are properly configured.

hive.zookeeper.client.port

If ZooKeeper is not using the default value for ClientPort, you need to set hive.zookeeper.client.port in /etc/hive/conf/hive-site.xml to the same value that ZooKeeper is using. Check /etc/zookeeper/conf/zoo.cfg to find the value for ClientPort. If ClientPort is set to any value other than 2181 (the default), set hive.zookeeper.client.port to the same value. For example, if ClientPort is set to 2222, set hive.zookeeper.client.port to 2222 as well:
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2222</value>
  <description>
  The port at which the clients will connect.
  </description>
</property>

JDBC driver

The connection URL format and the driver class are different for HiveServer2 and HiveServer1:
HiveServer version   Connection URL                  Driver class
HiveServer2          jdbc:hive2://<host>:<port>      org.apache.hive.jdbc.HiveDriver
HiveServer1          jdbc:hive://<host>:<port>       org.apache.hadoop.hive.jdbc.HiveDriver
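For a quick connectivity check against HiveServer2, the Beeline CLI bundled with HiveServer2-capable Hive releases uses the same JDBC URL format. A minimal sketch, assuming HiveServer2 is listening on its default port on the local host and no authentication has been configured:
$ beeline
beeline> !connect jdbc:hive2://localhost:10000
...
0: jdbc:hive2://localhost:10000> SHOW TABLES;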

Authentication

HiveServer2 can be configured to authenticate all connections; by default, it allows any client to connect. HiveServer2 supports either Kerberos or LDAP authentication; configure this in the hive.server2.authentication property in the hive-site.xml file. You can also configure Pluggable Authentication, which allows you to use a custom authentication provider for HiveServer2, and HiveServer2 Impersonation, which allows users to execute queries and access HDFS files as the connected user rather than the super user who started the HiveServer2 daemon.

Configuring HiveServer2 for YARN

To use HiveServer2 with YARN, you must set the HADOOP_MAPRED_HOME environment variable: add the following line to /etc/default/hive-server2:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Running HiveServer2 and HiveServer Concurrently

Cloudera recommends running HiveServer2 instead of the original HiveServer (HiveServer1) package in most cases; HiveServer1 is included for backward compatibility. Both HiveServer2 and HiveServer1 can be run concurrently on the same system, sharing the same data sets. This allows you to run HiveServer1 to support, for example, Perl or Python scripts that use the native HiveServer1 Thrift bindings.
Both HiveServer2 and HiveServer1 bind to port 10000 by default, so at least one of them must be configured to use a different port. You can set the port for HiveServer2 in hive-site.xml by means of the hive.server2.thrift.port property. For example:
<property>
  <name>hive.server2.thrift.port</name>
  <value>10001</value>
  <description>TCP port number to listen on, default 10000</description>
</property>
You can also specify the port (and the host IP address in the case of HiveServer2) by setting these environment variables:
HiveServer version   Port environment variable     Host address environment variable
HiveServer2          HIVE_SERVER2_THRIFT_PORT      HIVE_SERVER2_THRIFT_BIND_HOST
HiveServer1          HIVE_PORT                     (host bindings cannot be specified)
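For example, a sketch of starting HiveServer2 with these environment variables instead of hive-site.xml; the port and bind address values here are purely illustrative:
$ export HIVE_SERVER2_THRIFT_PORT=10001
$ export HIVE_SERVER2_THRIFT_BIND_HOST=10.20.30.40
$ hive --service hiveserver2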

Using Custom UDFs with HiveServer2

To use custom User-Defined Functions (UDFs) with HiveServer2, do the following:
  1. Copy the UDF JAR files to the machine(s) hosting the HiveServer2 server(s).
    Save the JARs to any directory you choose, and make a note of the path.
  2. Make the JARs available to the current instance of HiveServer2 by setting HIVE_AUX_JARS_PATH to the JARs' full pathname (the one you noted in Step 1) in hive-config.sh (see the sketch after this list).
      Note:
    The path can be the directory, or each JAR's full pathname in a comma-separated list.
    If you are using Cloudera Manager, use the HiveServer2 Service Environment Safety Valve to set HIVE_AUX_JARS_PATH.
  3. Add each JAR file's full pathname to the hive.aux.jars.path config property in hive-site.xml and re-start HiveServer2.
    This is to allow JARs to be passed to MapReduce jobs started by Hive.
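A minimal sketch of Step 2, assuming the UDF JARs were copied to /usr/lib/hive/udf-jars (a path chosen purely for illustration) and that hive-config.sh is in the usual CDH location, /usr/lib/hive/bin/hive-config.sh; verify both for your installation:
# Make the UDF JARs visible to HiveServer2 by appending to hive-config.sh
$ echo 'export HIVE_AUX_JARS_PATH=/usr/lib/hive/udf-jars' | sudo tee -a /usr/lib/hive/bin/hive-config.sh
# Restart HiveServer2 so the change takes effect
$ sudo service hive-server2 restart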

Starting the Metastore

  Important:
If you are running the metastore in Remote mode, you must start the metastore before starting HiveServer2.
You can run the metastore from the command line:
$ hive --service metastore
Use Ctrl-c to stop the metastore process running from the command line.
To run the metastore as a daemon, the command is:
$ sudo service hive-metastore start

Installing R in a Hadoop Cluster

There are multiple ways to install R and RStudio on a Unix system; here I describe the simplest approach, which I have used to install R successfully on my own system.

Installing R
Step 1: Download the latest R source package from CRAN.

Step 2: Choose a directory to install the R tree (R is not just a binary; it includes additional data sets, help files, font metrics, etc.). Let us call this place R_HOME. Untar the source code. This should create directories src, doc, and several more under a top-level directory; change to that top-level directory.
Issue the following commands:
./configure
make

Step 3: Check that the built system works correctly by running one of:
make check
or
make check-devel
or
make check-all
 
Step 4: Then type
make install
This will install to the following directories:
prefix/bin or bindir
the front-end shell script and other scripts and executables
prefix/man/man1 or mandir/man1
the man page
prefix/LIBnn/R or libdir/R
all the rest (libraries, on-line help system, …). Here LIBnn is usually ‘lib’, but may be ‘lib64’ on some 64-bit Linux systems. This is known as the R home directory.
where prefix is determined during configuration (typically /usr/local) and can be set by running configure with the option --prefix, as in
./configure --prefix=/where/you/want/R/to/go
This causes make install to install the R script to /where/you/want/R/to/go/bin, and so on. The prefix of the installation directories can be seen in the status message that is displayed at the end of configure. The installation may need to be done by the owner of prefix, often a root account.
You can install into another directory tree by using
make prefix=/path/to/here install
 
That's it. This completes the installation of R; you can now type R at your command prompt to start an R session.
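A quick way to confirm the build from the shell, assuming the installed bin directory is on your PATH:
$ R --version
$ Rscript -e 'sessionInfo()'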

Installing RStudio Server [CentOS]

32-bit

$ sudo yum install openssl098e # Required only for RedHat/CentOS 6 and 7
$ wget http://download2.rstudio.org/rstudio-server-0.98.1091-i686.rpm
$ sudo yum install --nogpgcheck rstudio-server-0.98.1091-i686.rpm
64-bit

$ sudo yum install openssl098e # Required only for RedHat/CentOS 6 and 7
$ wget http://download2.rstudio.org/rstudio-server-0.98.1091-x86_64.rpm
$ sudo yum install --nogpgcheck rstudio-server-0.98.1091-x86_64.rpm
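After the package installs, RStudio Server starts automatically and listens on port 8787 by default, so you can browse to http://<server-ip>:8787 and log in with a local system account. A small sketch for sanity-checking the install with the rstudio-server admin tool:
$ sudo rstudio-server verify-installation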


Installing RHadoop Packages

To install the required packages for RHadoop, please follow the link here.