Data Science Training

DATA SCIENCE:

Data science is not based on one tool; it is a combination of multiple tools:

DATA SCIENCE = BIG DATA + HADOOP + SPARK + STREAMING + MACHINE LEARNING + BUSINESS INTELLIGENCE + CLOUD (AWS) + PYTHON + R + NOSQL

• Machine learning with Python or R

• Two important algorithm families for making predictions: penalized linear regression and ensemble methods
• Balancing performance and complexity
• Penalized linear regression
• Building predictive models using penalized linear regression
• Ensemble methods
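Penalized linear regression adds a cost on coefficient size to the least-squares objective, and that penalty is what trades prediction performance against model complexity. A minimal sketch of ridge (L2-penalized) regression for a single feature, using its closed-form solution in pure Python; the function name `ridge_fit_1d` and the toy data are illustrative assumptions. The lasso (L1) penalty, usually fitted with iterative methods such as coordinate descent, is not shown.

```python
def ridge_fit_1d(xs, ys, lam):
    """Fit y = intercept + slope * x, adding the penalty lam * slope**2.

    The intercept is not penalized; lam = 0 reduces to ordinary least squares.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centering the data separates the (unpenalized) intercept from the slope.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx + lam == 0:
        raise ValueError("slope is undefined: x is constant and lam is 0")
    # Closed-form minimizer of sum((y - a - b*x)**2) + lam * b**2.
    slope = sxy / (sxx + lam)
    intercept = y_mean - slope * x_mean
    return intercept, slope


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
for lam in (0.0, 10.0, 100.0):
    # Larger lam shrinks the slope toward 0: a simpler, more biased model.
    print(lam, ridge_fit_1d(xs, ys, lam))
```

With lam = 0 the fit recovers slope 2 and intercept 0; lam = 10 halves the slope to 1. In practice the penalty strength is chosen by cross-validation rather than fixed by hand.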



MACHINE LEARNING FOR BIG DATA:

Industry applies two main types of machine learning algorithms:

1. Supervised learning (learning from labeled examples)
2. Unsupervised learning (finding structure in unlabeled data)

Machine learning is used in:
• Retail
• E-commerce
• Advertising
• Stock trading
• Healthcare
• IoT

Common languages for machine learning:
• R
• Python
• Scala
• Ruby

Industry uses machine learning for:
• Batch processing
• Online data processing
• Stream processing
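The difference between batch and streaming processing can be sketched with a simple summary statistic: batch processing computes over the complete dataset at once, while streaming updates a running summary as each record arrives. This example uses Welford's online algorithm for the mean and variance; the class name `RunningStats` and the sample values are illustrative assumptions, and production streaming engines (for example Spark Structured Streaming) add windowing, fault tolerance, and distribution on top of this idea.

```python
import statistics


class RunningStats:
    """Incrementally track count, mean, and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Each record is seen once and then discarded; memory use stays constant.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


data = [2, 4, 4, 4, 5, 5, 7, 9]

# Batch: the whole dataset is available and processed at once.
batch_mean = statistics.mean(data)
batch_var = statistics.variance(data)

# Streaming: the same summaries, updated one record at a time.
stream = RunningStats()
for value in data:
    stream.update(value)

print(batch_mean, stream.mean)
print(batch_var, stream.variance)
```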




INDUSTRY MACHINE LEARNING PROJECT
• Building statistical predictive models, interpreting the results, and solving business problems.

• A technically qualified data science and business analytics professional with a strong foundation in the R language, machine learning, Spark, Python, and data mining techniques.

• Performing on-demand, exploratory, and targeted data analyses to obtain insights from data, and turning those insights into action by creating predictive analytics business solutions.

• Solid expertise in data-driven descriptive analysis and predictive modeling to solve problems in highly competitive, fast-paced environments.


Analytical
Data mining | ETL | Decisions | Time series | Predictive analytics | Text analytics

Discrete distributions | Exploratory analysis | Experimental analysis


Mathematical
Linear programming

Intermediate probability | Random variables


Statistical
Supervised: Linear regression | Logistic regression | Support vector machines | Decision trees | Neural networks | NLP


Unsupervised: K-means clustering | Hierarchical clustering


• Involved in developing the candidate sourcing component of the analytics platform.
• Used web crawling techniques to extract useful information from news articles.
• Set up and configured the software environment on the DigitalOcean cloud platform.
• Used a MongoDB database to store semi-structured data in BSON (binary JSON) format.
• Used NLP (natural language processing) to identify entities, relations, and parts of speech (POS) in news articles.
• Experimented with advanced clustering and classification algorithms.
• Proficient in data preprocessing and data transformation.
• Proficient in using R packages for data analysis and data visualization.
• Created graph visualizations in R, using RNeo4j to connect to a Neo4j database.
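The web crawling step comes down to downloading a page and pulling the useful text out of its HTML. A minimal sketch of the extraction half using Python's standard-library `html.parser`, assuming the article body sits in `<p>` elements (real news sites differ); the class name `ArticleTextExtractor` and the sample page are hypothetical, and downloading, robots.txt handling, and storage in MongoDB are left out.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect the page <title> and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_paragraph = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_paragraph = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_paragraph:
            # Collapse runs of whitespace and drop empty paragraphs.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside nested inline tags such as <b> still lands here.
        if self._in_title:
            self.title += data
        elif self._in_paragraph:
            self._chunks.append(data)


page = (
    "<html><head><title>Market update</title></head><body>"
    "<nav>Home | World</nav>"
    "<p>Shares rose <b>3%</b> today.</p><p>   </p>"
    "<p>Analysts expect more gains.</p>"
    "</body></html>"
)
parser = ArticleTextExtractor()
parser.feed(page)
parser.close()
print(parser.title)
print(parser.paragraphs)
```

Navigation text outside `<p>` is ignored, which is the point of the filter: the extracted paragraphs are what would then go to the NLP step.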

Using the K-means algorithm, we group customers based on their purchasing behavior and target each group with offers, emails, SMS messages, and coupons that encourage them to buy more. In the same way, we can form K groups and analyze which action to take for each group.
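A minimal sketch of the customer segmentation above, using plain Lloyd's-algorithm K-means in pure Python. The two features (annual spend and monthly visits), the customer values, and the function name `kmeans` are hypothetical; initializing with the first k points is a simplification, whereas libraries such as scikit-learn's `KMeans` default to k-means++ initialization. Features on very different scales should normally be standardized first.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points serve as the initial centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every customer to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no customer changed group, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a group becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (annual spend in $1000s, store visits per month).
customers = [(1, 1), (1.5, 2), (10, 9), (1, 0.5), (9, 10), (10.5, 9.5)]
labels, centroids = kmeans(customers, k=2)
for c, centre in enumerate(centroids):
    group = [cust for cust, label in zip(customers, labels) if label == c]
    print(f"group {c}: centroid={centre}, customers={group}")
```

Here the two groups separate low spenders from high spenders, and each group can then receive its own campaign.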
• Used market basket analysis to estimate which products are sold together.
• Used the random forest algorithm to predict how many customers use the offer they are given and purchase the product.
• Used cross-validation (a model evaluation technique) to check how accurately the model predicts.
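Market basket analysis is usually framed as association-rule mining: the support of a rule A -> B is the share of baskets containing both items, and its confidence is the share of baskets containing A that also contain B. A minimal sketch restricted to item pairs, in pure Python; the function name `pair_rules` and the baskets are hypothetical, and algorithms such as Apriori or FP-Growth generalize this to larger itemsets with pruning.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence)} for each rule a -> b over item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # One co-occurring pair yields two directional rules: a -> b and b -> a.
        rules[(a, b)] = (support, together / item_counts[a])
        rules[(b, a)] = (support, together / item_counts[b])
    return rules


baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
for (a, b), (support, confidence) in sorted(pair_rules(baskets, min_support=0.3).items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with butter also has bread (confidence 1.0), while only two of the three bread baskets have butter, so the rule is stronger in one direction than the other.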

BUSINESS INTELLIGENCE: TABLEAU, QLIKVIEW
SQL TO TABLEAU
SQL TO QLIKVIEW
HIVE TO TABLEAU
HBASE TO TABLEAU
MONGODB TO TABLEAU