DATA SCIENCE:
Data science is not based on a single tool; it is a combination of multiple tools.
DATA SCIENCE = BIG DATA + HADOOP + SPARK + STREAMING + MACHINE LEARNING + BUSINESS INTELLIGENCE + CLOUD (AWS) + PYTHON + R + NOSQL
• Machine learning with Python or R
• Two important algorithms for making predictions
• Balancing performance and complexity
• Penalized linear Regression
• Predictive models using Penalized linear regression
• Ensemble methods
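
As a minimal sketch of the penalized linear regression topic listed above, the Python below fits a one-feature ridge model in closed form. The function name ridge_slope, the toy data, and the penalty values are illustrative assumptions, not part of the original notes. It only aims to show how a larger penalty shrinks the coefficient, trading some fit for a simpler model.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a one-feature model y ~ w * x (no intercept).

    Minimizes sum((y - w*x)^2) + lam * w^2, which gives
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical toy data, roughly y = 2x with a little noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# lam = 0 is ordinary least squares; larger lam pulls w toward zero.
for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_slope(xs, ys, lam), 3))
```

In practice the penalty strength would be chosen by checking prediction error on held-out data rather than fixed by hand.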
MACHINE LEARNING FOR BIG DATA:
Industry applies two types of algorithms:
1. Supervised learning
2. Unsupervised learning
We can use machine learning for
• Retail
• E-commerce
• Advertising
• Stock Trading
• Healthcare
• IoT
Common languages for machine learning
• R
• Python
• Scala
• Ruby
Industry uses machine learning for
• Batch processing
• Online data processing
• Stream processing
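
The difference between the batch and streaming styles listed above can be sketched with a simple statistic. In the Python below, batch_mean sees the whole dataset at once, while StreamingMean updates its estimate one record at a time. Both names and the sample readings are hypothetical, chosen only for illustration.

```python
def batch_mean(values):
    """Batch style: the full dataset is available before computing."""
    return sum(values) / len(values)


class StreamingMean:
    """Streaming style: update the estimate as each record arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update: new_mean = old_mean + (value - old_mean) / n
        self.mean += (value - self.mean) / self.count
        return self.mean


readings = [10.0, 12.0, 11.0, 15.0]  # hypothetical sensor readings

stream = StreamingMean()
for r in readings:
    stream.update(r)

print(batch_mean(readings), stream.mean)
```

Both approaches arrive at the same mean here; the streaming version simply never needs to hold the full dataset in memory.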
INDUSTRY MACHINE LEARNING PROJECT
• Building statistical predictive models, interpreting results, and solving business problems.
• A technically qualified Data Science & Business Analytics professional with a strong foundation in R, machine learning, Spark, Python, and data mining techniques.
• Performing on-demand, exploratory, and targeted data analyses to obtain insights from data and turn those insights into action by creating predictive analytics business solutions.
• Solid expertise in data-driven descriptive analysis and predictive models to solve problems in a highly competitive, fast-paced environment.
Analytical
Data mining | ETL | Decisions | Time series | Predictive analytics | Text analytics
Discrete distributions & exploratory analysis | Experimental analysis
Mathematical
Linear programming
Intermediate probability | Random variables
Statistical
Supervised: Linear regression | Logistic regression | Support vector machine | Decision trees | Neural network | NLP
Unsupervised: K-means clustering | Hierarchical clustering
• Involved in the development of the candidate sourcing component in Analytics.
• Used web crawling techniques to extract useful information from news articles.
• Set up a flexibly configured software environment on the DigitalOcean cloud platform.
• Used MongoDB to store semi-structured data in a binary format (BSON).
• Used NLP (natural language processing) to identify entities, relations, and parts of speech (POS) in news articles.
• Experimented with advanced clustering and classification algorithms.
• Flexible in data preprocessing and data transformation
• Flexible in using R packages for data analysis and data visualization
• Created visualization graphs in R with RNeo4j database connectivity.
• Used the K-means algorithm to group customers by purchasing behavior and to target each group with offers, emails, SMS, and coupons that encourage further purchases. In the same way, K groups can be formed and the appropriate action analyzed for each group.
• Used market basket analysis to estimate which products are sold together.
• Used the random forest algorithm to predict how many customers will use a given offer and purchase the product.
• Used cross-validation (a model evaluation technique) to check how accurately the model predicts.
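
Two of the techniques in the bullets above, K-means customer grouping and market basket analysis, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names kmeans and pair_support, the customer features, and the sample baskets are all hypothetical. The K-means sketch seeds centroids with the first k points so that the result is deterministic, and the market basket sketch only computes pair support (the share of baskets containing both products).

```python
import math
from collections import Counter
from itertools import combinations


def kmeans(points, k, iterations=20):
    """Plain K-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    # Assumption: first k points as initial centroids (deterministic);
    # real projects usually use random or k-means++ seeding.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep empty cluster as-is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def pair_support(baskets):
    """Fraction of baskets in which each pair of products appears together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


# Hypothetical customers: (orders per month, average order value)
customers = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 220), (11, 210)]
labels, centroids = kmeans(customers, k=2)
print(labels, centroids)

# Hypothetical purchase baskets
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]
support = pair_support(baskets)
print(support[("bread", "milk")])
```

Each resulting customer group can then be matched with its own action (offer, email, SMS, coupon), and product pairs with high support are candidates for bundling or joint promotion.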
BUSINESS INTELLIGENCE: TABLEAU, QLIKVIEW
SQL TO TABLEAU
SQL TO QLIKVIEW
HIVE TO TABLEAU
HBASE TO TABLEAU
MONGODB TO TABLEAU