Banking Domain Case Study in Hadoop and R

In this blog and the next few ones that will follow, we will analyze a banking domain dataset, which contains several files with details of its customers. This database was prepared by Petr Berka and Marta Sochorova.
The Berka dataset is a collection of financial information from a Czech bank. The dataset deals with over 5,300 bank clients with approximately 1,000,000 transactions. Additionally, the bank represented in the dataset has extended close to 700 loans and issued nearly 900 credit cards, all of which are represented in the data.
By the time you finish reading this blog,  you would have learned :
  • How to analyze a bank’s data to predict a customer’s quality
  • Using this analysis we can categorize a customer into three categories:
  1. Excellent: Customers whose record is good with the bank
  2. Good: Customers who have average earning with a good record till now
  3. Risky: Customers who are under debt of bank or who has not paid the loan on time
  • How to write PIG UDF
  • How to connect Hadoop with R
  • How to load data from Hadoop to R
How to analyze a bank’s data to predict the customer’s quality
Prerequisite
Software Technology
  • Java installed Hadoop concepts
  • Hadoop installed Java concepts
  • Pig installed Pig concepts
  • R-base
  • Rstudio
  • Ubuntu OS


View the detail case study here.

No comments:

Post a Comment