Machine Learning Concepts and Techniques


# MACHINE LEARNING – DETAILED NOTES

## 1. SUPERVISED LEARNING

### Feature Selection

Feature selection means choosing only the most important input variables. Benefits:

- Reduces overfitting

- Can improve model accuracy by removing noisy or redundant features

- Makes training faster

Methods:

1. **Filter methods** – Correlation, Chi-square, ANOVA F-test

2. **Wrapper methods** – Forward selection, backward elimination

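3. **Embedded methods** – Lasso (L1 penalty), decision tree feature importance. Ridge (L2) only shrinks coefficients and never removes a feature, so it is not a selection method by itself.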
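
As a rough illustration of a filter method, the sketch below scores each feature by its Pearson correlation with the target and keeps those above a cutoff. The toy data, the feature names, and the `threshold=0.5` default are assumptions made for the example, not values from these notes.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_by_correlation(features, target, threshold=0.5):
    """Keep feature names whose |correlation| with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

# Hypothetical toy data: "size" tracks the target, "noise" does not.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, -1.0, 2.0, -1.0, 2.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]
selected = select_by_correlation(features, target)
```

Correlation only captures linear relationships, so a filter like this can miss features that matter through interactions or non-linear effects.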

### Cross Validation

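Cross-validation estimates how well a model will perform on unseen data.

- **K-fold**: Data is split into K parts (folds); the model is trained K times, each time on K-1 folds and tested on the remaining fold.

- **Stratified K-fold**: Keeps the class ratio similar in all folds.

- **Leave-One-Out (LOO)**: K-fold with K equal to the number of samples; each sample is the test set once and the rest are training data.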
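
The splitting step can be sketched in a few lines. This is a minimal version that cuts the data into contiguous folds without shuffling (in practice the indices are usually shuffled first); the function name `k_fold_indices` is chosen for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))  # 3-fold split of 10 samples
loo = list(k_fold_indices(4, 4))     # K == n_samples gives Leave-One-Out
```

Every sample lands in exactly one test fold, so each one is used for testing once and for training K-1 times.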

### Bootstrapping

Bootstrapping draws repeated samples from the data **with replacement**, each typically the same size as the original dataset.

- Useful for estimating accuracy when data is small.

- Basis for **Bagging** and **Random Forests**.
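
A minimal sketch of the idea: resample a small set of scores with replacement many times and look at how much the mean varies. The `scores` values, the 1000 resamples, and the fixed seed are assumptions made for the example.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Means of n_resamples bootstrap samples, each drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return means

# Hypothetical accuracy scores from a small evaluation set.
scores = [0.71, 0.68, 0.74, 0.80, 0.66, 0.77, 0.73, 0.69]
means = bootstrap_means(scores)
std_error = statistics.stdev(means)  # bootstrap estimate of the standard error
```

The spread of the resampled means gives an uncertainty estimate for the average score without collecting more data.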

### Normalization

Normalization rescales features so that they have comparable ranges.

- **Min-Max Scaling**: (x - min) / (max - min), which maps values to the range [0, 1]

- **Z-score Standardization**: (x - mean) / std, which gives mean 0 and standard deviation 1
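
Both formulas can be sketched directly. The handling of a constant column (returning zeros) and the use of the population standard deviation are assumptions of this example.

```python
import statistics

def min_max_scale(values):
    """Apply (x - min) / (max - min) to every value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Apply (x - mean) / std using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = min_max_scale(heights_cm)
standardized = z_score(heights_cm)
```

The minimum and maximum (or mean and standard deviation) should be computed on the training data only and then reused for test data, otherwise information leaks from the test set.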

---

## 2. CLASSIFICATION ALGORITHMS

### Naïve Bayes

Probabilistic classifier that applies **Bayes' theorem** under the assumption that features are conditionally independent given the class.

Types:

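- Gaussian NB (continuous features assumed to be normally distributed)

- Multinomial NB (count features, such as word counts in text)

- Bernoulli NB (binary features)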

### Bayesian Network

Graphical model (a directed acyclic graph) whose nodes are variables and whose edges represent conditional dependencies between them.

### Decision Trees (ID3, C4.5)

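- **ID3** chooses each split using entropy and information gain.

- **C4.5** improves on ID3: it handles continuous attributes and missing values, uses gain ratio instead of raw information gain, and supports pruning.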
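
The split criterion used by ID3 can be sketched as below: entropy of the labels minus the weighted entropy after splitting on one categorical feature. The toy rows and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction from splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: columns are (outlook, windy).
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["play", "play", "stay", "stay"]
gain_outlook = information_gain(rows, labels, 0)
gain_windy = information_gain(rows, labels, 1)
```

Here splitting on outlook separates the classes perfectly while windy tells us nothing, so ID3 would pick outlook for the root node.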

### Support Vector Machine (SVM)

- Finds the hyperplane that maximizes the margin between classes.

- Uses the **kernel trick** to learn non-linear boundaries; common kernels are linear, polynomial, and RBF.
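
The three kernels named above are just similarity functions between two vectors. A minimal sketch follows; the defaults `degree=2`, `coef0=1.0`, and `gamma=0.5` are assumptions for the example, not recommended settings.

```python
import math

def linear_kernel(x, z):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2); equals 1 when x == z and decays with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

x, z = [1.0, 2.0], [3.0, 0.0]
k_linear = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z)
k_rbf = rbf_kernel(x, z)
```

An SVM only ever needs these pairwise values, which is why it can work in a high-dimensional feature space without computing the mapped features explicitly.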

### Extreme Learning Machine (ELM)

- Feedforward network with a single hidden layer.

- Hidden-layer weights are set randomly and left untrained; output weights are solved analytically (by least squares).

### Neural Network

- Layers: Input → Hidden → Output.

- Uses activation functions: ReLU, Sigmoid, Tanh.

- Learns using **backpropagation**.
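
A minimal sketch of the forward pass through such a network, using the activation functions listed above. The weights and inputs are made up for the example, and backpropagation (the training step) is not shown.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each output unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
hidden = dense([1.0, 2.0], [[0.5, 0.25], [-1.0, 1.0]], [0.0, 0.5], relu)
output = dense(hidden, [[1.0, 1.0]], [0.0], sigmoid)
```

Training would compare `output` with the target, then use backpropagation to push the error gradient back through the sigmoid and ReLU layers and adjust the weights.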

### VC Dimension

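Measures the capacity of a hypothesis space: the size of the largest set of points that the model can shatter, that is, classify correctly under every possible labeling.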

### Regularization

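Prevents overfitting by adding a penalty on the weights to the loss function.

- **L1 (Lasso)** – can drive weights to exactly zero, which acts as feature selection

- **L2 (Ridge)** – shrinks weights toward zero without eliminating them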
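
The difference between the two penalties is easiest to see for a single weight, where both have closed-form solutions. This sketch assumes a one-feature model with no intercept; the data and the penalty strength `lam=2.0` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping to 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def ridge_weight(xs, ys, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_weight(xs, ys, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + lam * |w|."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx

# Hypothetical weak feature: the unpenalized weight is 0.1.
xs = [1.0, 2.0, 3.0]
ys = [0.1, 0.2, 0.3]
w_ridge = ridge_weight(xs, ys, lam=2.0)  # shrunk, but still non-zero
w_lasso = lasso_weight(xs, ys, lam=2.0)  # set to exactly zero
```

Ridge divides the weight by a larger denominator, so it never reaches zero; Lasso subtracts a fixed amount and clips, which is what removes weak features.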

---

## 3. REGRESSION

### Linear Regression

Predicts a continuous value using a straight line: y = mx + c, where m is the slope and c is the intercept.
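
A minimal sketch of how m and c can be estimated with ordinary least squares for a single input variable. The data points are made up and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope: covariance of x and y over variance of x
    c = mean_y - m * mean_x  # the fitted line passes through the mean point
    return m, c

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c = fit_line(xs, ys)
prediction = m * 4.0 + c  # predicted y at x = 4
```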

### Multiple Linear Regression

y = b0 + b1x1 + b2x2 + ...

### Polynomial Regression

Fits a polynomial curve: y = a0 + a1x + a2x² + ...

### Support Vector Regression (SVR)

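Applies the SVM margin idea to regression: it fits a function that keeps most points within a tube of width ε around the prediction, and can use kernels for non-linear fits.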

---

## 4. COMMITTEE / ENSEMBLE METHODS

### Bagging

- Trains multiple models on bootstrapped samples.

- Reduces variance.

- Example: **Random Forest**.

### Boosting

- Models trained sequentially.

- Each corrects previous errors.

- Examples: AdaBoost, Gradient Boosting, XGBoost.
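
The sequential idea can be sketched with a bare-bones gradient boosting loop for regression under squared error: each new decision stump is fitted to the residuals left by the previous ones. This is an illustration under assumptions (1-D inputs, hypothetical data, `n_rounds=20`, `learning_rate=0.5`), not how AdaBoost or XGBoost are implemented.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of 1-D inputs under squared error."""
    thresholds = sorted(set(xs))[:-1]
    if not thresholds:
        raise ValueError("need at least two distinct x values")
    best = None
    for threshold in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical 1-D training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.1, 2.9, 5.2, 4.8]
mean_y = sum(ys) / len(ys)
baseline_mse = mean_squared_error(lambda x: mean_y, xs, ys)
boosted_mse = mean_squared_error(gradient_boost(xs, ys), xs, ys)
```

Each stump on its own is a weak model, but the sum of many stumps fitted to successive residuals follows the training data much more closely than the constant baseline.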

---

## 5. UNSUPERVISED LEARNING

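### K-Nearest Neighbour (KNN)

- Classifies a point by majority vote among its k nearest labelled points. Note that KNN is a supervised method; it is listed here because, like the clustering methods below, it is distance-based.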
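
A minimal KNN sketch using Euclidean distance and a majority vote. The training points, labels, and `k=3` are assumptions for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label of the majority class among the k training points nearest to query."""
    neighbours = sorted(zip(train_points, train_labels),
                        key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled points forming two groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_predict(points, labels, (2.0, 2.0))
near_b = knn_predict(points, labels, (6.0, 5.0))
```

There is no training phase: the labelled data is stored and all the work happens at prediction time, which is why KNN needs labels and why features should be on comparable scales.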

### K-Means Clustering

- Partitions data into K groups by minimizing the within-cluster sum of squared distances from each point to its cluster centroid.
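
The usual way to do this is Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The sketch below takes the starting centroids as an argument so the result is deterministic; the points and starting positions are hypothetical.

```python
import math

def k_means(points, centroids, n_iterations=10):
    """Lloyd's algorithm; returns final centroids and the last cluster assignment."""
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its assigned points.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid  # an empty cluster keeps its old centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids, so implementations commonly run the algorithm several times from different initializations and keep the best run.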

### Fuzzy K-Means

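Each data point has a **degree of membership** in every cluster (a value between 0 and 1, summing to 1 across clusters) instead of a single hard assignment. Also known as Fuzzy C-Means.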

### Hierarchical Clustering

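Builds a tree (dendrogram) of nested clusters. The linkage criterion defines the distance between two clusters:

- **Single linkage** – minimum distance between any pair of points, one from each cluster

- **Complete linkage** – maximum distance between any such pair

- **Average linkage** – average distance over all such pairs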
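
The three linkage criteria differ only in how they summarize the pairwise distances between two clusters, as this sketch shows. The two clusters are made-up examples.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]
d_single = single_linkage(cluster_a, cluster_b)
d_complete = complete_linkage(cluster_a, cluster_b)
d_average = average_linkage(cluster_a, cluster_b)
```

Agglomerative clustering repeatedly merges the two clusters with the smallest linkage distance, so the choice of criterion changes which clusters get merged first.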

### Non-Spherical Clustering

Algorithms that handle irregular cluster shapes:

- DBSCAN
- OPTICS

- Spectral Clustering

---

## 6. STATISTICAL TESTING METHODS

Used to test hypotheses, for example whether the difference between two models' scores is statistically significant.

- t-test

- Chi-square test

- ANOVA

- Mann-Whitney U-test
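
As one illustration, the sketch below computes the test statistic of Welch's t-test (the unequal-variance form of the two-sample t-test) for the scores of two models. The scores are hypothetical, and only the statistic is computed; turning it into a p-value needs the t distribution, which is left to a statistics library.

```python
import math
import statistics

def welch_t_statistic(sample_a, sample_b):
    """Difference in means divided by its estimated standard error."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical accuracy of two models over five independent runs.
model_a = [0.80, 0.82, 0.78, 0.81, 0.79]
model_b = [0.75, 0.74, 0.77, 0.76, 0.73]
t_statistic = welch_t_statistic(model_a, model_b)
```

A statistic far from zero suggests the gap between the two means is large relative to the run-to-run noise.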

---

## 7. PROBABILISTIC INFERENCE

Computing the probability of unknown variables given the observed evidence.

- Used in Bayesian networks, HMMs, belief propagation.

---

## 8. NEURAL NETWORKS & DEEP LEARNING

Deep learning models:

- CNN

- RNN

- LSTM

- Transformers

Concepts:

- Backpropagation

- Optimization: SGD, Adam

- Loss functions: Cross entropy, MSE
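
The two loss functions and a single SGD update can be sketched as follows. The function names, the `eps` guard against log(0), and the learning rate of 0.1 are assumptions for the example.

```python
import math

def mse(predictions, targets):
    """Mean squared error, typically used for regression outputs."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(probabilities, true_index, eps=1e-12):
    """Negative log-probability assigned to the correct class."""
    return -math.log(max(probabilities[true_index], eps))

def sgd_step(weights, gradients, learning_rate=0.1):
    """Move each weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

regression_loss = mse([1.0, 2.0], [1.0, 4.0])
classification_loss = cross_entropy([0.25, 0.75], true_index=1)
updated = sgd_step([1.0, 2.0], [0.5, -0.5])
```

Backpropagation supplies the `gradients`; the optimizer (plain SGD here, Adam in many deep learning setups) decides how to turn them into weight updates.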

---

## 9. EVOLUTIONARY ALGORITHMS

Inspired by biological evolution.

- Genetic Algorithm

- Genetic Programming

- Core operators: selection, crossover, mutation
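
A minimal genetic algorithm sketch on the toy "OneMax" problem, where fitness is the number of 1 bits in a bit string. Tournament selection, single-point crossover, keeping the best individual each generation (elitism), and all parameter values are choices made for this example.

```python
import random

def genetic_one_max(n_bits=20, population_size=30, n_generations=50,
                    mutation_rate=0.05, seed=0):
    """Evolve bit strings toward all ones; returns the best individual found."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    fitness = sum              # OneMax: fitness is the count of 1 bits
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(population_size)]

    def tournament(current):
        # Selection: pick two individuals at random and keep the fitter one.
        a, b = rng.sample(current, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(n_generations):
        next_population = [max(population, key=fitness)]  # elitism
        while len(next_population) < population_size:
            parent_1 = tournament(population)
            parent_2 = tournament(population)
            # Crossover: head of one parent joined to the tail of the other.
            cut = rng.randrange(1, n_bits)
            child = parent_1[:cut] + parent_2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)

best = genetic_one_max()
best_fitness = sum(best)
```

Because the best individual is always carried over, the top fitness never decreases from one generation to the next.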


---

## 10. APPLICATIONS

### Text Classification

Email spam detection and sentiment analysis, using models such as Naïve Bayes, SVM, or BERT.

### Disease Diagnosis

ML models classify diseases using symptoms or images.

### Biometric Systems

Face recognition, fingerprint matching.

### Real-Valued Classification

Tasks with a continuous (real-valued) output, handled with regression: age prediction, price estimation.

---

Common questions

Non-spherical clustering algorithms like DBSCAN and OPTICS can detect arbitrary-shaped clusters and handle noise, unlike K-means which assumes spherical clusters and may struggle with irregular shapes. They are more effective in settings with unevenly distributed or complexly shaped data, enabling the discovery of clusters that better reflect the underlying structure.

Statistical testing in machine learning, like t-tests or Chi-square tests, requires careful consideration of assumptions, data distribution, and sample size to ensure validity. These considerations prevent erroneous conclusions and confirm the statistical significance of model differences or relationships, making them crucial for reliable hypothesis validation and model evaluation.

The VC Dimension measures the capacity of a model's hypothesis space as the size of the largest set of points that the model can shatter, that is, classify correctly under every possible labeling of those points. A higher VC Dimension indicates a model's ability to fit complex patterns but may lead to overfitting. It influences model selection by helping balance complexity and generalization, guiding choices on model suitability for specific datasets.

Regularization controls model complexity to prevent overfitting by adding penalty terms to the loss function. L1 regularization (Lasso) can shrink some feature coefficients to exactly zero, effectively performing feature selection. L2 regularization (Ridge) shrinks all coefficients toward zero without eliminating any, which discourages the large weights that are prone to overfitting. Both modify model training by introducing a trade-off between fitting the training data and maintaining simplicity.

Bootstrapping is preferred in scenarios with small datasets, as it allows for estimation of accuracy by resampling the data with replacement, providing multiple datasets for robust model evaluation. It forms the basis of bagging and random forests by allowing multiple models to train on varied datasets, thus reducing variance and enhancing generalization.

Cross-validation, including techniques like K-fold and stratified K-fold, allows for reliable assessment by testing a model on different subsets of data, helping ensure it performs well on unseen data. However, it can be computationally intensive, especially with large datasets, and might not always appropriately handle highly imbalanced data without adjustments like stratification.

Ensemble methods like bagging (e.g., Random Forest) mitigate overfitting by reducing variance through training on bootstrapped samples, while boosting (e.g., AdaBoost, Gradient Boosting) enhances model accuracy by sequentially correcting errors of weak learners. These methods significantly improve performance on complex datasets, leveraging the strengths of multiple models to achieve better generalization.

Naïve Bayes classifiers differ in their assumptions about data distribution: Gaussian NB assumes normally distributed features, making it suitable for continuous data. Multinomial NB works well with count data like text classification. Bernoulli NB assumes binary data and is useful for binary features in text categorization tasks. Their application depends on the feature types and distribution assumptions.

The C4.5 algorithm improves upon ID3 by handling continuous data, supporting pruning to reduce overfitting, and dealing with missing values. These improvements enhance its robustness and applicability in practical scenarios, allowing it to handle a wider variety of datasets while maintaining efficiency in model complexity and accuracy.

Feature selection can improve a machine learning model's performance by reducing overfitting, improving accuracy, and decreasing training time. Common methods include filter methods (Correlation, Chi-square, ANOVA F-test), wrapper methods (forward selection, backward elimination), and embedded methods (Lasso, decision tree importance). Ridge is sometimes listed with the embedded methods, but it only shrinks coefficients and does not remove features.
