Online Training Course:

INTRODUCTION TO DATA SCIENCE:

  • What is Data Science?
  • Who is a Data Scientist, and who can become one?
  • Real-time process of Data Science
  • Data Science Applications
  • Technologies used in Data Science
  • Prerequisite knowledge for learning Data Science

INTRODUCTION TO MACHINE LEARNING:

  • What is Machine Learning?
  • How do machines learn, compared with human learning?
  • Traditional Programming vs. Machine Learning
  • Machine Learning engineer responsibilities
  • Types of learning
    • Supervised learning
    • Unsupervised learning
  • Machine learning algorithms: KNN, Naïve Bayes, Decision Trees, Classification Rules, Regression (Linear Regression, Logistic Regression), K-means Clustering, Association Rules, Support Vector Machine, Random Forest

PYTHON PROGRAMMING:

  • What is Python? History of Python
  • Python Features, Applications of Python
  • Downloading and Installing Python
  • Python IDE: Jupyter Notebook & Spyder
  • What is Anaconda Navigator?
  • Downloading and Installing Anaconda, Jupyter Notebook & Spyder
  • Python vs. Other Programming Languages
  • Interactive Mode Programming & Script Mode Programming
  • Python Identifiers, Reserved Words
  • Lines and Indentation, Quotations, Comments
  • Assigning values to variables
  • Operators - Arithmetic Operators, Comparison (Relational) Operators, Assignment Operators, Logical Operators, Bitwise Operators, Membership Operators, Identity Operators
  • Flavors of Python, Python Versions
  • Data Types: int, float, complex, bool, str
  • List, Tuple, Range, Bytes & Bytearray
  • Set, Frozenset, Dict, None
  • Built-in Functions in Python, Slice Operator - Indexing
  • Mutable vs. Immutable, Modules and Packages
  • Database Connection - PyMySQL, Defining & Manipulating

NumPy with Python:

  • NumPy Environment setup in Python, Features of NumPy
  • Array Creation, Indexing & Slicing, Array Manipulation
  • Mathematical Functions, Statistical Functions

Pandas with Python:

  • Pandas Environment setup in Python
  • Features of Pandas, Data Structures
  • Series - Create Series, Accessing Data from Series with Position
  • DataFrame - Features of DataFrame, Create DataFrame, DataFrame from List, Dict, Row & Column Selecting, Adding & Deleting
  • Panel - Create and select data from Panel (legacy structure, removed in pandas 0.25)
  • Indexing & Selecting Data, Statistical Functions
  • Merging / Joining, Categorical Data

R PROGRAMMING:

  • R Programming Introduction
  • R vs. Other Programming Languages
  • Downloading and Installing R, What is CRAN?
  • R Programming IDE: RStudio, Downloading and Installing RStudio
  • Variable Assignment - Displaying & Deleting Variables
  • Comments – Single Line and Multi Line Comments
  • Data Types – Logical, Integer, Double, Complex, Character
  • Operators - Arithmetic Operators, Relational Operators, Logical Operators, Assignment Operators, R as Calculator, Performing different Calculations
  • Functions – Built-in Functions and User-Defined Functions
  • Data Structures – Vector, List, Matrix, Data Frame, Array, Factor
  • Built-in Constants & Functions

Setting Environment:

  • Search Packages in R Environment
  • Search for installed packages on the machine using built-in functions and manual search
  • Attach Packages to R Environment
  • Install Add-on Packages from CRAN
  • Detach Packages from R Environment
  • Functions and Packages Help

Vectors:

  • Vector Creation, Single Element Vector, Multiple Element Vector
  • Vector Manipulation, Subsetting & Accessing the Data in Vectors

Lists:

  • Creating a List, Naming List Elements, Accessing List Elements
  • Manipulating List Elements, Merging Lists, Converting List to Vector

Matrix:

  • Creating a Matrix, Accessing Elements of a Matrix
  • Matrix Manipulations, Dimensions of Matrix, Transpose of Matrix

Data Frames:

  • Create Data Frame, Vector to Data Frame
  • Why are characters converted into factors? – stringsAsFactors
  • Convert the columns of a data frame to characters
  • Extract Data from Data Frame
  • Expand Data Frame, Column Bind and Row Bind
  • Merging / Joining Data Frames – Inner Join, Outer Join & Cross Join

Arrays:

  • Create Array with Multiple Dimensions, Naming Columns and Rows
  • Accessing Array Elements, Manipulating Array Elements
  • Calculations across Array Elements

Factors:

  • Factors in Data Frame, Changing the Order of Levels
  • Generating Factor Levels, Deleting Factor Levels

Loading and Reading Data:

  • DATA EXTRACTION FROM CSV
    • Getting and Setting the Working Directory
    • Input as CSV File, Reading a CSV File
    • Analyzing the CSV File, Writing into a CSV File
  • DATA EXTRACTION FROM URL
  • DATA EXTRACTION FROM CLIPBOARD
  • DATA EXTRACTION FROM EXCEL
    • Install the "xlsx" Package
    • Verify and Load the "xlsx" Package, Input as "xlsx" File
    • Reading the Excel File, Writing the Excel File
  • DATA EXTRACTION FROM DATABASES
    • RMySQL Package, Connecting to MySQL
    • Querying the Tables, Query with Filter Clause
    • Updating Rows in the Tables, Inserting Data into the Tables
    • Creating Tables in MySQL, Dropping Tables in MySQL
    • Using the dplyr and tidyr packages

STATISTICS:

  • Mean, Median and Mode
  • Data Variability: Range, Quartiles, IQR, Calculating Percentiles
  • Variance, Standard Deviation, Statistical Summaries
  • Types of Distributions – Normal, Binomial, Poisson
  • Probability Distributions, Skewness, Outliers
  • Data Distribution, 68–95–99.7 rule (Empirical rule)
  • Descriptive Statistics and Inferential Statistics
  • Statistics Terms and Definitions, Types of Data
  • Data Measurement Scales, Normalization
  • Measure of Distance, Euclidean Distance
  • Probability Calculation – Independent & Dependent
  • Hypothesis Testing, Analysis of Variance
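
As a rough illustration of the descriptive measures listed above, here is a minimal sketch that uses only Python's standard `statistics` module. The sample values are invented for illustration, and the quartiles rely on the module's default "exclusive" method; other tools may compute them slightly differently.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample, invented purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability.
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method
iqr = q3 - q1
pop_variance = statistics.pvariance(data)     # population variance
pop_std_dev = statistics.pstdev(data)         # population standard deviation

# Empirical (68-95-99.7) rule: share of a normal distribution that lies
# within 1, 2 and 3 standard deviations of the mean.
std_normal = NormalDist(mu=0, sigma=1)
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(mean, median, mode, data_range, iqr, pop_variance, pop_std_dev)
for k, share in coverage.items():
    print(f"within {k} sd: {share:.4f}")
```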

DATA VISUALIZATION:

  • Data Visualization with Matplotlib and Seaborn
  • Data Visualization with the R graphics and grDevices packages
  • High Level Plotting and Low Level Plotting
  • Pie Charts - Title, Colors, Slice Percentages, Chart Legend
  • 3D Pie Charts
  • Box Plots - Outliers, Ranges, IQR, Quantiles, Median, Data Distribution Analysis, 68–95–99.7 rule (Empirical rule)
  • Bar Charts - Label, Title, Colors, Group Bar, Stacked Bar Charts
  • Histograms - Range of X and Y Values
  • Line Graphs - Types: Points, Lines, Both, Overplotted, Steps
  • Scatterplots
  • Combining Plots - par and layout

LAZY LEARNING – CLASSIFICATION USING NEAREST NEIGHBORS:

Understanding Classification Using Nearest Neighbors

  • The KNN algorithm
  • Calculating distance
  • Choosing an appropriate k
  • Preparing data for use with KNN
  • Why is the KNN algorithm lazy?

Diagnosing breast cancer with the KNN algorithm

  • Collecting data
  • Exploring and preparing the data
    • Transformation – normalizing numeric data
    • Data preparation – creating training and test datasets
  • Training a model on the data
  • Evaluating model performance
  • Improving model performance
    • Transformation – z-score standardization
    • Testing alternative values of k

PROBABILISTIC LEARNING – CLASSIFICATION USING NAÏVE BAYES:

Understanding Naïve Bayes

  • Basic concepts of Bayesian methods
  • Probability
  • Joint probability
  • Conditional probability with Bayes’ theorem

The Naïve Bayes Algorithm

  • The Naïve Bayes classification
  • The Laplace estimator
  • Using numeric features with Naïve Bayes

Filtering Mobile Phone Spam with the Naïve Bayes Algorithm

  • Collecting data
  • Exploring and preparing the data
    • Data preparation – processing text data for analysis
    • Data preparation – creating training and test datasets
    • Visualizing text data – word clouds
    • Data preparation – creating indicator features for frequent words
  • Training a model on the data
  • Evaluating model performance
  • Improving model performance

DIVIDE AND CONQUER – CLASSIFICATION USING DECISION TREES AND RULES:

Understanding decision trees

  • Divide and conquer
  • The C5.0 decision tree algorithm
    • Choosing the best split
    • Pruning the decision tree

Identifying risky bank loans using C5.0 decision trees

  • Collecting data
  • Exploring and preparing the data
    • Data preparation – creating random training and test datasets
  • Training a model on the data
  • Evaluating model performance
  • Improving model performance
    • Boosting the accuracy of decision trees
    • Making some mistakes more costly than others

Understanding classification rules

  • Separate and conquer
  • The One Rule (1R) algorithm
  • The RIPPER algorithm
  • Rules from decision trees

Identifying poisonous mushrooms with rule learners

  • Collecting data
  • Exploring and preparing data
  • Training a model on the data
  • Evaluating model performance
  • Improving model performance

FORECASTING NUMERIC DATA – REGRESSION METHODS:

Understanding regression

  • Simple linear regression
  • Ordinary least squares estimation
  • Correlations
  • Multiple linear regression

Predicting medical expenses using linear regression

  • Collecting data
  • Exploring and preparing data
    • Exploring relationships among features – the correlation matrix
    • Visualizing relationships among features – the scatter plot matrix
  • Training a model on the data
  • Evaluating model performance
  • Improving model performance
    • Model specification – adding non-linear relationships
    • Transformation – converting a numeric variable to a binary indicator
    • Model specification – adding interaction effects
    • Putting it all together – an improved regression model

Understanding regression trees and model trees

  • Adding regression to trees

Estimating the quality of wines with regression trees and model trees

  • Collecting data
  • Exploring and preparing the data
  • Training a model on the data
    • Visualizing decision trees
  • Evaluating model performance
    • Measuring performance with mean absolute error
  • Improving model performance

FINDING PATTERNS - MARKET BASKET ANALYSIS USING ASSOCIATION RULES:

Understanding Association Rules

  • The Apriori algorithm for association rule learning
    • Measuring rule interest – support and confidence
    • Building a set of rules with Apriori

Identifying frequently purchased groceries with association rules

  • Collecting data
  • Exploring and preparing the data
    • Data preparation – creating a sparse matrix for transaction data
    • Visualizing item support – item frequency plots
    • Visualizing transaction data – plotting the sparse matrix
  • Training a model on the data
  • Evaluating model performance
  • Improving model performance
    • Sorting the set of association rules
    • Taking subsets of association rules
    • Saving association rules to a file or data frame

FINDING GROUPS OF DATA - CLUSTERING WITH K-MEANS:

Understanding Clustering

  • Clustering as a machine learning task
  • The K-means algorithm for clustering
    • Using distance to assign and update clusters
    • Choosing the appropriate number of clusters

Finding teen market segments using K-means clustering

  • Collecting data
  • Exploring and preparing the data
    • Data preparation – dummy coding missing values
    • Data preparation – imputing missing values
  • Training a model on the data
  • Evaluating model performance
  • Improving model performance

EVALUATING MODEL PERFORMANCE:

Measuring Performance for Classification

  • Working with classification prediction data in R
  • A closer look at confusion matrices
  • Using confusion matrices to measure performance
  • Beyond accuracy – other measures of performance
    • The kappa statistic
    • Sensitivity and specificity
    • Precision and recall
    • The F-measure
  • Visualizing performance tradeoffs
    • ROC curves

Estimating future performance

  • The holdout method
  • Cross-validation
  • Bootstrap sampling

IMPROVING MODEL PERFORMANCE:

Tuning Stock Models for Better Performance

  • Using caret for automated parameter tuning
    • Creating a simple tuned model
    • Customizing the tuning process

Improving Model Performance with Meta-Learning

  • Understanding ensembles
  • Bagging
  • Boosting
  • Random forests
    • Training random forests
    • Evaluating random forest performance

DEEP LEARNING:

  • Installation of Theano, TensorFlow, Keras, OpenCV
  • Relating Deep Learning and Traditional Machine Learning
  • Basics of Neural Networks
  • Artificial Neural Networks
  • Deep Neural Networks
  • Convolutional Neural Networks
  • Recurrent Neural Networks
  • Deep Learning with Theano
  • Deep Learning with TensorFlow
  • Deep Learning with Keras
  • Deep Learning with OpenCV
  • Implementation of Deep Learning

ARTIFICIAL INTELLIGENCE:

  • AI Introduction
  • AI Intelligent Systems
  • AI Popular Search Algorithms
  • AI Fuzzy Logic Systems
  • AI Natural Language Processing
  • AI Robotics
  • AI Neural Networks

INTRODUCTION TO WEKA

EXPLORE WEKA MACHINE LEARNING TOOLKIT

  • Installation of WEKA
  • Features of WEKA Toolkit
  • Explore & Load data sets in Weka

PERFORM DATA PREPROCESSING TASKS

  • Apply Filters on data sets

PERFORMING CLASSIFICATION ON DATA SETS

  • J48 Classification Algorithm
  • Decision Trees Algorithm
  • K-NN Classification Algorithm
  • Naïve Bayes Classification Algorithm
  • Comparing Classification Results

PERFORMING REGRESSION ON DATA SETS

  • Simple Linear Regression Model, Multiple Linear Regression Model
  • Logistic Regression Model, Cross-Validation and Percentage Split

PERFORMING CLUSTERING ON DATA SETS

  • Clustering Techniques in Weka
  • Simple K-means Clustering Algorithm
  • Association Rule Mining on Data Sets
  • Apriori Association Rule Algorithm
  • Discretization in the Rule Generation Process

GRAPHICAL VISUALIZATION IN WEKA

  • Visualization Features in Weka
  • Visualize the data in various dimensions
  • Plot Histogram, Derive Interesting Insights