Below are links to websites, videos, and books that I found useful for getting an overview of different machine learning models and of the steps in the data preparation and model validation process.
I will update these links as I come across more great resources!
Best Sites for Machine Learning Mastery
I often find myself using the same sites over and over again for reference when exploring machine learning and deep learning. Links to my favorite resources are below:
Data Preparation
Data Preprocessing
- Preprocessing with sklearn: a complete and comprehensive guide
- Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization
- All about Categorical Variable Encoding
- Ordinal and One-Hot Encodings for Categorical Variables
Data Splitting for Training and Model Validation
- train_test_split Vs StratifiedShuffleSplit
- sklearn.cross_validation.StratifiedShuffleSplit
- Cross-validation: Evaluating Estimator Performance
Imputing Missing Values
Problem Types and Model Selection
- Difference Between Classification and Regression in Machine Learning
- Which Machine Learning Model to Use?
- Machine Learning Algorithms: Which One to Choose for Your Problem
Regularized Regression
- An Introduction to Ridge, Lasso, and Elastic Net Regression
- Regularization: Ridge, Lasso and Elastic Net
Essential Algorithms: Tree Methods
Ensemble Methods
Decision Trees
- What is a Decision Tree?
- A Step by Step CART Decision Tree Example
- sklearn.tree.DecisionTreeClassifier
- Classification And Regression Trees for Machine Learning
- InDepth: Parameter tuning for Decision Tree
- Impurity & Judging Splits – How Decision Trees Work
Random Forest
- An Implementation and Explanation of the Random Forest in Python
- Introduction to Random forest – Simplified
- A Beginner’s Guide to Random Forest Hyperparameter Tuning
- sklearn.ensemble.RandomForestClassifier
Gradient Boosted Trees
- A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
- Gradient Boosting Decision Tree Algorithm Explained
- Boosting algorithm: GBM
- Gradient Boosting Classifiers in Python with Scikit-Learn
- sklearn.ensemble.GradientBoostingClassifier
- Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python
XGBoost
- A Gentle Introduction to XGBoost for Applied Machine Learning
- An End-to-End Guide to Understand the Math behind XGBoost
- A Beginner’s guide to XGBoost
- XGBoost Algorithm: Long May She Reign!
Support Vector Machines (SVMs)
- Support Vector Machine — Simply Explained
- Chapter 2 : SVM (Support Vector Machine) — Theory
- Understanding Support Vector Machine (SVM) algorithm from examples (along with code)
- Support Vector Machines
- sklearn.svm.SVC
Multi-Class & Multi-Label Algorithms
- Multiclass and Multilabel Algorithms
- Multiclass Classification using scikit-learn
- How to Use One-vs-Rest and One-vs-One for Multi-Class Classification
- Multiclass Classification using Random Forest on Scikit-Learn Library
- Machine Learning — Multi-Class Classification with Imbalanced Dataset
Loss Functions
- Common Loss Functions in Machine Learning
- A Detailed Guide to 7 Loss Functions for Machine Learning Algorithms with Python Code
- Common loss functions that you should know!
Model Validation & Improvement
Model Evaluation & Improving Models
- Evaluating a Classification Model
- Validating Machine Learning Models with scikit-learn
- Regression: An Explanation of Regression Metrics And What Can Go Wrong
- Metrics and Scoring: Quantifying the Quality of Predictions
- How to Use ROC Curves and Precision-Recall Curves for Classification in Python
- Understanding AUC – ROC Curve
- sklearn.metrics.classification_report
- Feature Importances with Forests of Trees
- Important three techniques to improve machine learning model performance with imbalance datasets
Clustering Models
K-Means
- The Most Comprehensive Guide to K-Means Clustering You’ll Ever Need
- K-means Cluster Analysis
- Interpret All Statistics and Graphs for Cluster K-Means
Hierarchical Clustering
- What is Hierarchical Clustering?
- What is Hierarchical Clustering? on KDNuggets
- Understanding the Concept of the Hierarchical Clustering Technique
- A Beginner’s Guide to Hierarchical Clustering and how to Perform it in Python
DBSCAN
Validating Clustering Models
- Clustering Validation Statistics: 4 Vital Things Everyone Should Know – Unsupervised Machine Learning
- Determining The Optimal Number Of Clusters: 3 Must Know Methods
- Selecting the Number of Clusters with Silhouette Analysis on K-Means Clustering
- Cluster Validation Statistics: Must Know Methods
- Unsupervised Learning: Evaluating Clusters
Recommender Engines
General
- Comprehensive Guide to Build a Recommendation Engine from Scratch (in Python)
- Building a Movie Recommendation Engine in Python using Scikit-Learn
- Large Scale Jobs Recommendation Engine using Implicit Data in PySpark
Content-Based Method
Collaborative Filtering Method
Other Recommendation Engine Methods
- Recommender System — Singular Value Decomposition (SVD) & Truncated SVD
- Wide & Deep Learning for Recommender Systems
- ALS: Stock Portfolio Recommendations
Deep Learning
- Deep Learning: Feedforward Neural Network
- Deep Learning: Back Propagation
- Gentle Introduction to the Adam Optimization Algorithm for Deep Learning
- Adam — Latest Trends in Deep Learning Optimization
Time Series
General Principles
- Almost Everything You Need to Know About Time Series
- A Complete Tutorial on Time Series Modeling in R
- Seasonality in Python: Additive or Multiplicative Model?
- Is my Time Series Additive or Multiplicative?
Moving Average
- Moving Average Smoothing for Data Preparation and Time Series Forecasting in Python
- Moving Average Models
ARIMA
- ARIMA Models
- ARIMA Model – Complete Guide to Time Series Forecasting in Python
- How to Create an ARIMA Model for Time Series Forecasting in Python
- Understanding Auto Regressive Moving Average Model — ARIMA
Holt-Winters Model
Causal Impact/Bayesian Time Series
- Inferring Causal Impact using Bayesian Structural Time Series Models
- Inferring Causality in Time Series Data
Multi-Step Time Series
- Improving Multi-step Prediction of Learned Time Series Models
- Strategies for Multi-Step Time Series Forecasting
Prophet Models
- Prophet: Forecasting at Scale
- Journal Article: Forecasting at Scale
- Time Series Prediction using Prophet in Python
- Generate Quick and Accurate Time Series Forecasts using Facebook’s Prophet
- Using Prophet for Anomaly Detection
- Forecasting Multiple Time Series using Prophet in Parallel
