- What is entropy? Information gain (IG) concepts
- Gradient Boosting
- Bagging
- XGBoost (Why popular - parallelization)
- Trees for classification versus regression
- CART/Regression Trees, algorithmic change to incorporate regression in trees (majority class versus mean of the samples in each leaf to make the final prediction)
- Variance reduction method instead of IG
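
As a quick reference for the split criteria listed above, here is a minimal sketch (assuming NumPy and a binary split into a left and right child; the function names are illustrative, not from any library). Information gain is the parent's entropy minus the size-weighted entropy of the children; variance reduction is the same idea with variance in place of entropy, which is what regression trees typically use.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

def variance_reduction(parent, left, right):
    """Regression analogue: parent variance minus size-weighted child variance."""
    n = len(parent)
    weighted = (len(left) / n) * np.var(left) + (len(right) / n) * np.var(right)
    return float(np.var(parent) - weighted)
```

For example, splitting the labels `[0, 0, 1, 1]` into `[0, 0]` and `[1, 1]` removes all uncertainty, so the information gain equals the parent entropy of 1 bit.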
Estimation strategies: Maximum likelihood (MLE) versus Maximum a posteriori (MAP)
Naive Bayes, Logistic Regression
- Generative versus Discriminative models
- Logistic regression intuition from a perceptron
- Loss functions for Logistic regression
- Multiclass LR (derivations for likelihood estimation and gradient calculations)
- How Multiclass LR is different from MLPs (Multi-layer perceptron)
- Types, differences, uniqueness in norms L0, L1, L2
- Why L3, L4, L5, .. norms are not used
- Why is L1 sparse?
- Bagging - Boosting - Cross validation
- Boosting loss similarity to log-loss/Logistic regression
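
For the multiclass LR derivation mentioned above, the following is a minimal sketch, assuming NumPy, one-hot labels `Y`, and no bias term or regularization (all names are illustrative). With softmax probabilities `P`, the mean negative log-likelihood has gradient `X.T @ (P - Y) / n` with respect to the weights, which is the result the derivation should arrive at.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll_and_grad(W, X, Y):
    """Mean negative log-likelihood and its gradient with respect to W.

    X: (n, d) features, Y: (n, k) one-hot labels, W: (d, k) weights.
    """
    n = X.shape[0]
    P = softmax(X @ W)
    loss = -np.sum(Y * np.log(P + 1e-12)) / n  # small constant guards log(0)
    grad = X.T @ (P - Y) / n
    return float(loss), grad

def fit(X, Y, lr=0.1, steps=500):
    """Plain batch gradient descent starting from zero weights."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        _, grad = nll_and_grad(W, X, Y)
        W -= lr * grad
    return W
```

At `W = 0` every class gets probability `1/k`, so the loss starts at `log(k)`; that is a handy sanity check. The model is a single linear map followed by softmax, which is the sense in which it differs from an MLP: there is no hidden layer or nonlinearity between input and output.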
Regularization in Deep Networks
- Dropouts
- BatchNorm (Is it a regularizer?)
- Data augmentation as regularization
- Early stopping, multitask learning, adversarial learning
- Zoneout, DropConnect (specifically for LSTMs)
- What is PCA?
- Loss of PCA
- Difference between the two, convexity of both their losses
- Eigenvalue calculations
- What they depict, why important
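
To make the eigenvalue items concrete for PCA, here is a minimal sketch assuming NumPy and a data matrix with samples in rows (the function name `pca` is illustrative). The eigenvectors of the sample covariance matrix are the principal directions, and each eigenvalue is the variance of the data projected onto its eigenvector, which is why the largest eigenvalues matter most.

```python
import numpy as np

def pca(X, k):
    """Project X (n, d) onto its top-k principal components.

    Returns the projected data (n, k), the components (d, k),
    and the matching eigenvalues in descending order.
    """
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    components = eigvecs[:, order]
    return Xc @ components, components, eigvals[order]
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, so the eigenvalues are real and the eigenvectors orthonormal.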
Class imbalance issues
- Algorithmic ways
- Sampling ways
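
One example of each approach, as a minimal sketch assuming NumPy arrays (function names are illustrative). The algorithmic route reweights the loss per class, here with the common heuristic `n_samples / (n_classes * class_count)`; the sampling route resamples minority classes with replacement until every class matches the largest one.

```python
import numpy as np

def balanced_class_weights(y):
    """Per-class loss weights, inversely proportional to class frequency."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        picked.append(np.concatenate([members, extra]))
    idx = np.concatenate(picked)
    return X[idx], y[idx]
```

Oversampling should be applied to the training split only; resampling before the train/validation split leaks duplicated rows into validation.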
BayesNet and unsupervised learning
- Why is inference on a BayesNet intractable?
- Inference
- Monte carlo methods
- Gibbs Sampling
- Expectation-Maximization
- Gaussian Mixture models
- KMeans - loss and code from scratch
- KNNs and how they are different from KMeans
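
For the "KMeans from scratch" item, a minimal sketch of Lloyd's algorithm, assuming NumPy (the `init` parameter is an addition for reproducibility, not part of any standard API). The loss being minimized is the sum of squared distances from each point to its assigned centroid, often called inertia.

```python
import numpy as np

def kmeans(X, k, init=None, n_iters=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centroids = np.asarray(init, dtype=float).copy()

    for _ in range(n_iters):
        # Assignment step: squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid in place
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment and loss for the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = float(d2[np.arange(len(X)), labels].sum())
    return centroids, labels, inertia
```

The loss is non-convex, so the result depends on initialization and can be a local optimum; running several random restarts and keeping the lowest inertia is a common remedy. Note the contrast with KNN: KMeans is unsupervised and learns centroids, while KNN is a supervised method that stores labeled points and predicts from the nearest neighbors.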
Metrics to test a model
- Precision, recall, F1 - differences, use cases
- AUC, area under ROC curve
- What does the area signify? Use-case based questions
- Hinge loss
- Code implementation
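
A minimal sketch of the metrics above, assuming NumPy, binary labels in `{0, 1}` for precision/recall/AUC, and labels in `{-1, +1}` for hinge loss (function names are illustrative). The AUC here uses the pairwise interpretation: the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counted as half.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison, O(n_pos * n_neg)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def hinge_loss(y_pm1, scores):
    """Mean hinge loss max(0, 1 - y * s) for labels y in {-1, +1}."""
    y = np.asarray(y_pm1, dtype=float)
    s = np.asarray(scores, dtype=float)
    return float(np.maximum(0.0, 1.0 - y * s).mean())
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a sort-based computation is the usual choice for large datasets.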
Linear Regression - loss function calculation and derivations
- MLE vs MAP (Different estimation strategies)
- How MAP brings regularization in linear regression loss
- Convexity, solving the loss directly
- Kernel regression
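
The MLE/MAP connection above can be sketched in a few lines, assuming NumPy, Gaussian noise, and a zero-mean Gaussian prior on the weights (the function name and the `lam` parameter are illustrative). Under those assumptions, MLE reduces to ordinary least squares and MAP adds an L2 penalty, with `lam` playing the role of the ratio of noise variance to prior variance. Because the loss is convex and quadratic, it can be solved directly via the normal equations.

```python
import numpy as np

def fit_linear(X, y, lam=0.0):
    """Solve (X^T X + lam * I) w = X^T y.

    lam = 0 -> least squares (MLE under Gaussian noise).
    lam > 0 -> ridge regression (MAP under a zero-mean Gaussian prior).
    Note: this sketch penalizes every column of X, including any bias column.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y)
```

A small usage example: with a column of ones for the intercept and data generated exactly from `y = 1 + 2x`, `fit_linear(X, y)` recovers the weights `[1, 2]`, while a positive `lam` shrinks the weight vector toward zero.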
ICA (Independent component analysis) - difference from PCA/SVD
- When to use ICA?
Difference in decision boundaries for all algorithms (Tree vs Logistic vs Linear Reg vs SVMs vs Naive Bayes)