๐‚๐ก๐ž๐œ๐ค ๐˜๐จ๐ฎ๐ซ ๐Œ๐จ๐๐ž๐ฅ: ๐ˆ๐ฌ ๐ˆ๐ญ ๐†๐ž๐ญ๐ญ๐ข๐ง๐  ๐ˆ๐ญ ๐‘๐ข๐ ๐ก๐ญ, ๐จ๐ซ ๐‰๐ฎ๐ฌ๐ญ ๐…๐š๐ค๐ข๐ง๐  ๐ˆ๐ญ?โฃ

Sayemuzzaman Siam
4 min read · Jan 26, 2025


Your models pose, my models predict. We are not the same 😤

Training a machine learning model is like teaching a pet: you want it to learn the right tricks without overdoing it or forgetting the basics! But how do you know if your model is truly learning meaningful patterns or just memorizing noise in the data?

Techniques like **cross-validation**, **learning curves**, and **activation visualizations** help you quickly spot whether your model is **underfitting**, **overfitting**, or **perfectly balanced**. Let's dive in and find out.

๐“๐ž๐œ๐ก๐ง๐ข๐ช๐ฎ๐ž๐ฌ ๐ญ๐จ ๐ˆ๐๐ž๐ง๐ญ๐ข๐Ÿ๐ฒ ๐”๐ง๐๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐  ๐š๐ง๐ ๐Ž๐ฏ๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐ โฃ:

๐Ÿ. ๐‚๐ซ๐จ๐ฌ๐ฌ-๐•๐š๐ฅ๐ข๐๐š๐ญ๐ข๐จ๐งโฃ

๐‡๐จ๐ฐ ๐ˆ๐ญ ๐—ช๐จ๐ซ๐ค๐ฌ:

Split your data into multiple folds, and train and evaluate the model on each fold.โฃ

๐—ช๐ก๐š๐ญ ๐ญ๐จ ๐‹๐จ๐จ๐ค ๐…๐จ๐ซ:โฃ

Low performance across all folds โ†’ ๐”๐ง๐๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐ .โฃ

High variance between folds โ†’ ๐Ž๐ฏ๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐ .โฃ

๐๐ซ๐จ ๐“๐ข๐ฉ: Use stratified k-fold for imbalanced datasets to ensure each fold represents the class distribution.โฃ

๐Ÿ. ๐‹๐ž๐š๐ซ๐ง๐ข๐ง๐  ๐‚๐ฎ๐ซ๐ฏ๐ž๐ฌโฃ

๐‡๐จ๐ฐ ๐ˆ๐ญ ๐—ช๐จ๐ซ๐ค๐ฌ:

Plot the modelโ€™s performance (๐š๐œ๐œ๐ฎ๐ซ๐š๐œ๐ฒโ†‘ or ๐ž๐ซ๐ซ๐จ๐ซ-๐ฅ๐จ๐ฌ๐ฌโ†“) on both training and validation sets over time or as the training set size increases.โฃ

๐—ช๐ก๐š๐ญ ๐ญ๐จ ๐‹๐จ๐จ๐ค ๐…๐จ๐ซ:โฃ

Training and validation performance stabilize at a low level โ†’๐”๐ง๐๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐ .โฃ

Large gap between training and validation performance โ†’ ๐Ž๐ฏ๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐ .โฃ
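Here is one way to draw this plot with scikit-learn's learning_curve helper; the dataset and decision-tree model are illustrative stand-ins:

```python
# Hedged sketch: learning curves with scikit-learn and matplotlib.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)       # stand-in dataset
model = DecisionTreeClassifier(random_state=42)  # stand-in model

sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 8)
)

plt.plot(sizes, train_scores.mean(axis=1), "o-", label="training accuracy")
plt.plot(sizes, val_scores.mean(axis=1), "o-", label="validation accuracy")
plt.xlabel("Training set size")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
# Both curves flat and low    -> underfitting.
# Wide gap that never closes  -> overfitting.
```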

**3. Visualizing Activations (Neural Networks)**

**How It Works:**

Analyze the activations of layers in a neural network to see whether the model is learning useful features or overfitting to noise.

**Tools:** TensorBoard, Grad-CAM, or activation heatmaps.

**What to Look For:**

Uniform or uninformative activations → **underfitting**.

Overly specific activations (memorizing noise) → **overfitting**.
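Besides the tools above, a low-tech way to peek at activations is a forward hook. A minimal PyTorch sketch, assuming a toy fully connected network (every name and size here is illustrative):

```python
# Hedged sketch: capturing layer activations in PyTorch with a forward hook.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in network
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Record whatever the ReLU layer outputs on each forward pass.
model[1].register_forward_hook(save_activation("relu1"))

x = torch.randn(8, 32)              # stand-in batch
model(x)

act = activations["relu1"]
print("Mean activation:", act.mean().item())
print("Fraction of dead units:", (act == 0).float().mean().item())
# Near-uniform or mostly dead activations     -> possible underfitting.
# A few units firing only on specific samples -> possible memorization.
```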

๐Ž๐ฏ๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐ : ๐’๐ฒ๐ฆ๐ฉ๐ญ๐จ๐ฆ๐ฌ ๐š๐ง๐ ๐’๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง๐ฌโฃ

๐’๐ฒ๐ฆ๐ฉ๐ญ๐จ๐ฆ๐ฌ:โฃ

1.Model performs well on training data but poorly on validation/test data.โฃ

2.High variance in cross-validation results.โฃ

Overfitted bed: It fits the model perfectlyโ€ฆ but good luck getting anyone else in!

๐’๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง๐ฌ:โฃ

1.๐‘๐ž๐๐ฎ๐œ๐ž ๐Œ๐จ๐๐ž๐ฅ ๐‚๐จ๐ฆ๐ฉ๐ฅ๐ž๐ฑ๐ข๐ญ๐ฒ:

Use fewer layers (neural networks) or fewer parameters (reduce tree depth in decision trees).โฃ

2.๐‘๐ž๐ ๐ฎ๐ฅ๐š๐ซ๐ข๐ณ๐š๐ญ๐ข๐จ๐ง ๐“๐ž๐œ๐ก๐ง๐ข๐ช๐ฎ๐ž๐ฌ:โฃ

๐‹๐Ÿ (๐‹๐š๐ฌ๐ฌ๐จ): Encourages sparsity and feature selection.โฃ

๐‹๐Ÿ (๐‘๐ข๐๐ ๐ž): Smooths weights to prevent over-reliance on specific features.โฃ

๐ƒ๐ซ๐จ๐ฉ๐จ๐ฎ๐ญ: Randomly drop units during training (neural networks).โฃ

3.๐„๐š๐ซ๐ฅ๐ฒ ๐’๐ญ๐จ๐ฉ๐ฉ๐ข๐ง๐ : Stop training when validation performance stops improving.โฃ

4.๐ƒ๐š๐ญ๐š ๐€๐ฎ๐ ๐ฆ๐ž๐ง๐ญ๐š๐ญ๐ข๐จ๐ง: Increase dataset size through transformationsโฃ

5.๐๐š๐ญ๐œ๐ก ๐๐จ๐ซ๐ฆ๐š๐ฅ๐ข๐ณ๐š๐ญ๐ข๐จ๐ง: Normalize activations to stabilize training (neural networks).โฃ

6.๐”๐ฌ๐ž ๐’๐ข๐ฆ๐ฉ๐ฅ๐ž๐ซ ๐Œ๐จ๐๐ž๐ฅ๐ฌ: Switch to a less complex algorithm.โฃโฃ
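A minimal Keras sketch combining three of these fixes (dropout, batch normalization, and early stopping); the synthetic data, layer sizes, and patience value are illustrative assumptions, not a recipe:

```python
# Hedged sketch: dropout + batch normalization + early stopping in Keras.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 20).astype("float32")   # stand-in data
y = (X.sum(axis=1) > 10).astype("float32")       # stand-in labels

model = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),   # normalize activations to stabilize training
    layers.Dropout(0.3),           # randomly drop units to curb memorization
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop when validation loss stops improving, keeping the best weights.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```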

**Underfitting: Symptoms and Solutions**

**Symptoms:**

1. Model performs poorly on both training and validation/test data.

2. Low performance across all cross-validation folds.

You are underfit for this king-size bed; get a wife and have children.

๐’๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง๐ฌ:โฃ

1.๐ˆ๐ง๐œ๐ซ๐ž๐š๐ฌ๐ž ๐Œ๐จ๐๐ž๐ฅ ๐‚๐จ๐ฆ๐ฉ๐ฅ๐ž๐ฑ๐ข๐ญ๐ฒ: Add more layers (neural networks) or increase the number of parameters (increase tree depth in decision trees).โฃ

2.๐…๐ž๐š๐ญ๐ฎ๐ซ๐ž ๐„๐ง๐ ๐ข๐ง๐ž๐ž๐ซ๐ข๐ง๐ : Create new features or transform existing ones (polynomial features, interaction terms).โฃ

3.๐Œ๐จ๐ซ๐ž ๐ƒ๐š๐ญ๐š: Increase the number of data in the dataset.โฃ

4.๐‡๐ฒ๐ฉ๐ž๐ซ๐ฉ๐š๐ซ๐š๐ฆ๐ž๐ญ๐ž๐ซ ๐“๐ฎ๐ง๐ข๐ง๐ : Adjust hyperparameters like learning rate, number of layers, or number of estimators.โฃ

5.๐„๐ง๐ฌ๐ž๐ฆ๐›๐ฅ๐ž ๐Œ๐ž๐ญ๐ก๐จ๐๐ฌ: Combine multiple models (bagging, boosting) to improve accuracy and handle complex relationships.โฃโฃ
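To make solution 2 concrete, a small scikit-learn sketch that lifts an underfit linear model with polynomial features; the synthetic sine data and the degree are assumptions made for the demo:

```python
# Hedged sketch: fixing underfitting with polynomial features (scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))             # stand-in inputs
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)   # nonlinear target

linear = LinearRegression().fit(X, y)  # too simple: underfits
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)

print("Linear R^2:    ", round(linear.score(X, y), 3))  # low even on train data
print("Polynomial R^2:", round(poly.score(X, y), 3))    # captures the curve
# An underfit model scores poorly on its own training set;
# more expressive features are one way to close that gap.
```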

๐‚๐จ๐ฆ๐ฆ๐จ๐ง ๐’๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง๐ฌ ๐Ÿ๐จ๐ซ ๐๐จ๐ญ๐ก ๐”๐ง๐๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐  ๐š๐ง๐ ๐Ž๐ฏ๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐ โฃ:

1.๐‚๐ฅ๐ž๐š๐ง ๐ƒ๐š๐ญ๐š: Ensure the data is properly preprocessed, free from outliers, and representative of the problem space.โฃ

2.๐‹๐จ๐ฌ๐ฌ ๐…๐ฎ๐ง๐œ๐ญ๐ข๐จ๐ง๐ฌ: Choose or modify loss functions to better suit the problem (focal loss for class imbalance, Huber loss for robust regression).โฃ

3.๐‚๐ก๐š๐ง๐ ๐ž ๐€๐ฅ๐ ๐จ๐ซ๐ข๐ญ๐ก๐ฆ๐ฌ: Experiment with different algorithms that might perform better for your specific problem.โฃ
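As a small illustration of point 2, a scikit-learn sketch comparing ordinary squared loss with the more robust Huber loss on data containing a few injected outliers (the data and outlier values are invented for the demo):

```python
# Hedged sketch: robust regression with Huber loss (scikit-learn).
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))             # stand-in inputs
y = 2.0 * X.ravel() + rng.normal(0, 0.5, 100)     # true slope = 2.0
y[:5] += 40                                       # inject a few outliers

ols = LinearRegression().fit(X, y)    # squared loss: pulled by the outliers
huber = HuberRegressor().fit(X, y)    # Huber loss: largely ignores them

print("OLS slope:  ", round(ols.coef_[0], 2))
print("Huber slope:", round(huber.coef_[0], 2))   # closer to the true 2.0
```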

โฃ

๐๐ข๐š๐ฌ-๐•๐š๐ซ๐ข๐š๐ง๐œ๐ž ๐“๐ซ๐š๐๐ž๐จ๐Ÿ๐Ÿ: ๐“๐ก๐ž ๐๐ข๐  ๐๐ข๐œ๐ญ๐ฎ๐ซ๐žโฃ

High Bias = You are bias towards someone, not neutral! and High Variance = More scattered!

๐”๐ง๐๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐  โ†’ ๐‡๐ข๐ ๐ก ๐๐ข๐š๐ฌ: The model is too simple to capture the underlying patterns.โฃ

๐Ž๐ฏ๐ž๐ซ๐Ÿ๐ข๐ญ๐ญ๐ข๐ง๐  โ†’ ๐‡๐ข๐ ๐ก ๐•๐š๐ซ๐ข๐š๐ง๐œ๐ž: The model is too complex and memorizes noise instead of learning generalizable patterns.โฃ

