Hadoop for Beginner: Banking Domain Case Study in Hadoop and R

In this post and the few that follow, we will analyze a banking domain dataset made up of several files with details of a bank's customers. The dataset was prepared by Petr Berka and Marta Sochorova.

The Berka dataset is a collection of financial information from a Czech bank. It covers more than 5,300 bank clients and approximately 1,000,000 transactions. The bank has also extended close to 700 loans and issued nearly 900 credit cards, all of which are represented in the data.

By the time you finish reading this post, you will have learned:

How to analyze a bank’s data to predict a customer’s quality
Using this analysis, we can place each customer into one of three categories:

Excellent: Customers who have a good record with the bank
Good: Customers with average earnings and a good record so far
Risky: Customers who are in debt to the bank or who have not repaid a loan on time

How to write a Pig UDF
How to connect Hadoop with R
How to load data from Hadoop to R
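
To make the three categories above concrete, here is a minimal sketch in Python of how such a rule could look. It is not the method used in the case study: the field names (monthly_income, in_debt, missed_loan_payment) and the income cutoff are hypothetical, chosen only for illustration, and the post does not say how "Excellent" differs from "Good" beyond earnings, so the sketch assumes a simple income threshold.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "average earnings"; the post gives no number.
AVERAGE_INCOME_CEILING = 50_000.0


@dataclass
class Customer:
    # All field names are illustrative, not taken from the Berka files.
    customer_id: int
    monthly_income: float
    in_debt: bool
    missed_loan_payment: bool


def categorize(customer: Customer) -> str:
    """Return "Risky", "Good", or "Excellent" for one customer."""
    # Risky: in debt to the bank, or a loan not repaid on time.
    if customer.in_debt or customer.missed_loan_payment:
        return "Risky"
    # Good: clean record, earnings at or below the assumed cutoff.
    if customer.monthly_income <= AVERAGE_INCOME_CEILING:
        return "Good"
    # Excellent: clean record, earnings above the assumed cutoff.
    return "Excellent"


# Example usage with made-up customers.
customers = [
    Customer(1, 80_000.0, in_debt=False, missed_loan_payment=False),
    Customer(2, 30_000.0, in_debt=False, missed_loan_payment=False),
    Customer(3, 90_000.0, in_debt=True, missed_loan_payment=False),
]
labels = {c.customer_id: categorize(c) for c in customers}
```

The risk check comes first so that a high earner who is in debt is still labeled Risky, which matches the category descriptions above.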

How to analyze a bank’s data to predict the customer’s quality

Prerequisites

Software            Technology
Java installed      Hadoop concepts
Hadoop installed    Java concepts
Pig installed       Pig concepts
R-base
RStudio
Ubuntu OS

View the detailed case study here.
