Underfitting

Published on: 21 Aug, 2024

Blog Author: N/A

Edited by: Ashish Kumar Srivastav

Reviewed by: Dheeraj Vaidya

What Is Underfitting?

Underfitting refers to a scenario in which a data model cannot capture the association between input and output variables with accuracy. Individuals and organizations should avoid making decisions based on such a model as it leads to erroneous or problematic outcomes.


Typically, this scenario arises when a machine learning model is too simple. Such a model may need more input features, less regularization, or additional training time. Unlike an overfitted model, an underfitted model fails to capture even the dominant trend in the data, which leads to poor performance and high training error.

  • Underfitting refers to a scenario in data science where a model oversimplifies the data within a dataset. Such a model has two key characteristics: low variance and high bias.
  • Common reasons behind this scenario are a model that is too simple (high bias), excessive regularization, insufficient training, and noise within the dataset used for training.
  • A key difference between underfitting and overfitting is that an underfitted model is overly simple, whereas an overfitted model is overly complex.
  • One can handle underfitting in machine learning by increasing the number or quality of features in a dataset.

Underfitting Explained

Underfitting refers to a scenario that arises when a machine learning model is not adequately complex to capture the associations between a target variable and the features of a dataset. Individuals and businesses must not make decisions using such a model, because the conclusions drawn from it are not grounded in an accurate representation of the data.

This kind of model often delivers poor performance even on the training data. In many cases, its predictions are oversimplified or implausible. If a model cannot generalize well to unseen or new data, then one cannot utilize it for prediction or classification tasks.

Ultimately, it is the ability to generalize to unseen data that allows people to utilize machine learning algorithms to classify data and make predictions. Individuals must note that low variance and high bias are useful indicators of this kind of scenario.

In a graphical representation of such a scenario, a straight line may fit the training data reasonably well, but it cannot fully render the curved association between two variables, say X and Y. Hence, the model's outcomes are inaccurate when one applies it to unseen or new data, especially if X values within the unseen data are much smaller or larger than the values within the training data.
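The straight-line scenario above can be sketched with a small NumPy experiment (an illustrative example, not from the original article): fitting a line to data generated from a curved, quadratic relationship leaves a large training error that a quadratic fit does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data with a curved (quadratic) relationship between X and Y
X = np.linspace(-3, 3, 50)
Y = X**2 + rng.normal(0, 0.3, size=X.shape)

# Degree-1 fit: a straight line cannot capture the curvature, so it underfits
line = np.polyval(np.polyfit(X, Y, 1), X)
mse_line = float(np.mean((Y - line) ** 2))

# Degree-2 fit: matches the true functional form of the data
quad = np.polyval(np.polyfit(X, Y, 2), X)
mse_quad = float(np.mean((Y - quad) ** 2))

print(f"linear training MSE: {mse_line:.3f}")
print(f"quadratic training MSE: {mse_quad:.3f}")
```

The linear model's training error stays high no matter how long it is "trained", because no straight line can follow the curve; this high error on the training data itself is the signature of underfitting.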

Reasons

Some reasons behind underfitting in machine learning are as follows:

  • A model may underfit when it is not given adequate training samples. Too much uncertainty or too little signal within the training data can leave the model unable to discern the underlying association between inputs and outputs.
  • In most cases, models underfit because they exhibit excessively high bias.
  • Noise or distortion within a dataset can prevent a model from learning meaningful patterns.
  • Underfitting may also occur if the input data is incompatible with the chosen model. For example, one should not fit a linear model to data with a strongly non-linear relationship.
  • If a dataset is very complex relative to the model's capacity, the model's predictions will be inaccurate and unreliable.

Examples

Let us look at a few underfitting examples to understand the concept better.

Example #1

Suppose a business utilized an underfitted model to make decisions. The model suggested that the organization always generates better sales when it spends more on advertising. In reality, however, the model failed to capture a saturation effect. In other words, it failed to capture that, at some point, revenue will flatten out irrespective of how much the organization spends on advertising. Since the business relied on that model to make decisions, it ended up overspending on marketing.

Example #2

Suppose an underfitted model suggested to an organization that hiring new employees would lead to increased output and higher revenue. So, the business decided to recruit more workers. In reality, however, the model was unable to identify the saturation effect. Simply put, it could not predict that output levels would soon flatten, irrespective of how many new workers joined. The business's costs increased substantially, and profits dropped as it made decisions based on that model.

How To Handle?

The different ways to tackle such a scenario are as follows:

  • Decrease the amount of regularization used. Note that regularization reduces a model's variance by penalizing complexity, but applying too much of it makes the learned features overly uniform and causes the model to miss the dominant trend.
  • Users need to make sure that a model is adequately but not overly trained, as a delicate balance exists between underfitting and overfitting.
  • A model needs sufficient data, or else it cannot capture patterns. In other words, if a model is trained on limited or overly similar data, it cannot interpret new data accurately.
  • Ensure the presence of adequate predictive features. A model will produce inaccurate results if it does not have enough predictive features to work with.
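The first point can be illustrated with ridge regression (a minimal sketch using NumPy's closed-form solution; the dataset and penalty values are made up for illustration): an overly strong regularization penalty shrinks the coefficients toward zero and raises the training error, while a weaker penalty lets the model fit the data.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(0, 0.1, size=100)

mse_by_lam = {}
for lam in (1000.0, 0.01):
    w = ridge_fit(X, y, lam)
    mse_by_lam[lam] = float(np.mean((y - X @ w) ** 2))
    print(f"lambda={lam}: training MSE={mse_by_lam[lam]:.3f}")
```

With the heavy penalty the model cannot reach the true coefficients and underfits even its own training data; lowering the penalty removes that constraint.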

Importance

It is important for one to know about such a scenario and avoid it, as using these models to make decisions could lead to higher costs and losses. Moreover, such a model may not consider all environmental aspects. This would lead to inaccurate results and undependable predictions.

One must note that such a model generates a higher error rate on unseen data as well as on the training set. An underfitted model offers a useful contrast to the issue of overfitting, and the ability to detect and address this scenario is a crucial element of the model development process.

Underfitting vs Overfitting

The concepts of underfitting and overfitting can be confusing for people new to the world of data science. To understand their meaning clearly, one must find out how they differ. In that regard, looking at their key differences is vital.

| Overfitting | Underfitting |
| --- | --- |
| This scenario occurs when a model is too complex. | It occurs when a model is too straightforward or simple. |
| The model fits all the data closely. | The model cannot figure out patterns and associations. |
| A good indicator of an overfitted model is high variance. | In this case, a good indicator is high bias. |

Frequently Asked Questions (FAQs)

1. How to identify overfitting and underfitting?

Individuals can spot whether a predictive model is overfitted or underfitted by comparing its prediction error on the training and evaluation data. In the case of underfitting, the model's performance is poor even on the training data. On the other hand, a model is overfitting the training data if it performs well on the training data but poorly on the evaluation data.
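This diagnostic can be sketched as follows (an illustrative example using NumPy polynomial fits; the split and degrees are arbitrary choices): a model whose error stays high on both the training and evaluation sets is underfitting.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)

# Simple interleaved split into training and evaluation halves
x_train, y_train = x[::2], y[::2]
x_eval, y_eval = x[1::2], y[1::2]

errors = {}
for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = float(np.mean((y_train - np.polyval(coeffs, x_train)) ** 2))
    eval_err = float(np.mean((y_eval - np.polyval(coeffs, x_eval)) ** 2))
    errors[degree] = (train_err, eval_err)
    print(f"degree={degree}: train MSE={train_err:.3f}, eval MSE={eval_err:.3f}")
```

The degree-1 model shows the underfitting signature: high error on both sets, because a line cannot track the sine wave. A flexible-enough model drives both errors down toward the noise level.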

2. Is underfitting better than overfitting?

In a relative sense, yes. There is no upper limit to how much overfitting can degrade generalization performance, whereas the degradation caused by underfitting is bounded.

3. Can a model be both underfitting and overfitting?

Individuals must note that both scenarios can occur for the same model, but only with respect to different datasets; a model may underfit one dataset while overfitting another.

4. Does cross-validation prevent underfitting?

Yes. Cross-validation reveals poor performance on held-out data early, which helps practitioners detect and correct underfitting in data science.

This article has been a guide to what Underfitting is. Here, we compare it with overfitting and explain its examples, reasons, importance, and how to handle it.