What Are Outliers in Maths?

Reviewed by:

Rama Sharma

How to Identify and Interpret Outliers in Data

An outlier is a value in a data set that differs markedly from the other values. In simple terms, outliers are values unusually far from the middle. Outliers usually have a significant impact on the mean but little or none on the median or mode, so their influence matters most when the mean is used to summarise the data. There is no single fixed rule for identifying outliers. A common convention treats a value as an outlier if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile.

What is an Outlier in a Line Plot?

Plotting the data on a number line as a dot plot helps you spot outliers: they appear as dots separated from the main cluster by a wide gap.

Outlier Examples

Outliers can be thought of as stragglers: extremely high or extremely low values in a data set that can throw off the statistics. For example, if you were measuring the heights of people in a room, your average would be thrown off if Robert Wadlow were in the room.

Robert Wadlow is the tallest man in recorded medical history; he stood 2.72 m (8 ft 11.1 in) tall when last measured on 27 June 1940.

Displaying Outliers in Box and Whisker Plots

Box and whisker plots often display outliers as individual dots set apart from the rest of the plot.

Below is a box and whisker plot of the distribution from above that does not display outliers.

(Image will be uploaded soon)

Below is a box and whisker plot of a similar distribution that does display outliers.

(Image will be uploaded soon)

Solved Examples

Below are step-by-step solutions to two outlier examples.

Example:

Determine the outliers of the data set. Also, evaluate the mean of the data set including the outliers and excluding the outliers.

35, 75, 20, 25, 15, 30, 30, 15, 45, 40, 110

Solution:

First, arrange the data set in order.

15, 15, 20, 25, 30, 30, 35, 40, 45, 75, 110

Now, plot the data on a number line in the form of a dot plot.

Judged visually from the dot plot, the values 75 and 110 lie far from the middle. Thus, these two values are treated as outliers for this data set.

Find the mean of the data including the outliers:

Mean = (Sum of the data values)/(Number of data values)

= (15 + 15 + 20 + 25 + 30 + 30 + 35 + 40 + 45 + 75 + 110)/11

= 440/11

= 40

Now find the mean without the outliers by removing the values far from the middle (i.e. 75 and 110):

Mean = (Sum of the data values)/(Number of data values)

= (15 + 15 + 20 + 25 + 30 + 30 + 35 + 40 + 45)/9

= 255/9

≈ 28.33

The mean of the given data set is 40 when the outliers are included, but about 28.33 when they are excluded.
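The arithmetic in this example can be checked with a short Python sketch. It assumes the two outliers are 75 and 110, as judged visually above; the variable names are illustrative only.

```python
from statistics import mean

data = [35, 75, 20, 25, 15, 30, 30, 15, 45, 40, 110]
outliers = {75, 110}  # assumption: taken from the visual judgement in the example

# Mean of all eleven values (sum 440).
mean_with = mean(data)

# Mean of the nine values left after dropping the outliers (sum 255).
mean_without = mean([x for x in data if x not in outliers])

print(f"{mean_with:.2f}")     # 40.00
print(f"{mean_without:.2f}")  # 28.33
```

Removing just two extreme values out of eleven lowers the mean from 40 to about 28.33, which illustrates how sensitive the mean is to outliers.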

Example:

For the data set 2, 5, 6, 9, 12, use the following five-number summary to decide whether there are any outliers:

Solution:

Minimum = 2

1st Quartile = 3.5

Median = 6

3rd Quartile = 10.5

Maximum = 12

IQR = 10.5 – 3.5 = 7.

Thus, 1.5·IQR = 10.5.

To identify any outliers, we look for values that lie more than 1.5·IQR, i.e. 10.5, beyond the quartiles.

1st quartile – 1.5·IQR = 3.5 – 10.5 = –7

3rd quartile + 1.5·IQR = 10.5 + 10.5 = 21

Since none of the data lies outside the interval from –7 to 21, we conclude that there are no outliers.
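The same steps can be sketched in Python. This is a minimal illustration that assumes the quartiles are found as the medians of the lower and upper halves of the sorted data (excluding the middle value when the count is odd), which reproduces the values 3.5 and 10.5 used here. Other quartile conventions exist and can give slightly different boundaries. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]           # values below the middle position
    upper = s[half + n % 2:]   # values above the middle position
    return s[0], median(lower), median(s), median(upper), s[-1]

data = [2, 5, 6, 9, 12]
minimum, q1, med, q3, maximum = five_number_summary(data)

iqr = q3 - q1                   # 10.5 - 3.5 = 7.0
lower_fence = q1 - 1.5 * iqr    # 3.5 - 10.5 = -7.0
upper_fence = q3 + 1.5 * iqr    # 10.5 + 10.5 = 21.0

# Any value outside [lower_fence, upper_fence] would be flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(minimum, q1, med, q3, maximum)  # 2 3.5 6 10.5 12
print(lower_fence, upper_fence)       # -7.0 21.0
print(outliers)                       # []
```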

Did You Know?

An outlier is a data point that lies outside the overall pattern of a distribution.
In a box and whisker plot, outliers are shown as individual dots.
A common rule says that a data point is an outlier if it lies more than 1.5·IQR below the first quartile or above the third quartile.
When outliers are displayed, the whiskers change: they stretch out only to the farthest point in the data set that isn't an outlier.
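
As a rough illustration of where the whiskers end, the sketch below finds the smallest and largest values that are not outliers under the 1.5·IQR rule. The data set and the function name `whisker_ends` are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default settings, which is only one of several quartile conventions.

```python
from statistics import quantiles

def whisker_ends(values):
    """Return (lower, upper) whisker ends under the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartile conventions vary between tools
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the farthest data points that are still inside the fences.
    inside = [x for x in values if low_fence <= x <= high_fence]
    return min(inside), max(inside)

sample = [10, 12, 13, 14, 15, 16, 18, 60]  # hypothetical data with one extreme value
print(whisker_ends(sample))  # (10, 18)
```

Here the value 60 would be drawn as a separate dot, and the upper whisker would stop at 18 rather than at the maximum.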

FAQs on What Are Outliers in Maths?

1. What exactly is an outlier in Maths?

An outlier is a data point that is significantly different from other observations in a data set. Think of it as a value that 'lies outside' the expected range of the data. For example, in the data set {2, 3, 4, 5, 100}, the number 100 is a clear outlier because it is abnormally far from the other values.

2. How do you find outliers in a data set using the IQR method?

The most common method taught as per the CBSE/NCERT syllabus involves the Interquartile Range (IQR). The steps are as follows:

First, arrange the data in ascending order and calculate the first quartile (Q1) and the third quartile (Q3).
Next, find the Interquartile Range (IQR) by subtracting Q1 from Q3 (Formula: IQR = Q3 - Q1).
Calculate the lower and upper boundaries. The lower boundary is Q1 - 1.5 * IQR, and the upper boundary is Q3 + 1.5 * IQR.
Any data point that falls below the lower boundary or above the upper boundary is identified as an outlier.

3. How do outliers affect the mean, median, and mode of a data set?

Outliers have a significant impact on some measures of central tendency but not others:

Mean: The mean is highly sensitive to outliers. A single very large or very small outlier can drastically pull the mean towards it, making it a less reliable measure for skewed data.
Median: The median is resistant to outliers. Since it is the middle value of an ordered dataset, an extreme value at either end does not change its position significantly.
Mode: The mode, being the most frequent value, is generally not affected by outliers unless the outlier itself becomes the most frequent number, which is rare.
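
A small Python sketch can make the comparison concrete. The data set below is hypothetical and chosen only for illustration: the same values are summarised once without and once with a single extreme value of 100.

```python
from statistics import mean, median, mode

base = [2, 3, 3, 4, 5]        # hypothetical data without an outlier
with_outlier = base + [100]   # same data plus one extreme value

for label, values in (("without outlier", base), ("with outlier", with_outlier)):
    print(label, mean(values), median(values), mode(values))

# without outlier 3.4 3 3
# with outlier 19.5 3.5 3
```

The mean jumps from 3.4 to 19.5, the median moves only from 3 to 3.5, and the mode stays at 3.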

4. Can a data set have negative outliers?

Yes, absolutely. An outlier's status is determined by its distance from the other data points, not by whether it is positive or negative. For example, in the data set {-20, 8, 9, 10, 12}, the value -20 is an outlier because it is unusually far from the cluster of positive numbers. The same calculation method using the IQR applies to data sets containing negative values.

5. What is the difference between an outlier and an error in data entry?

This is a critical distinction in statistics.

An outlier is a legitimate data point that is extreme but possible. For example, the salary of a CEO in a list of employee salaries would be a genuine outlier.
An error is an incorrect data point that is not possible and resulted from a mistake during data collection or entry. For example, recording a person's age as 200 years.

While all data entry errors will likely appear as outliers, not all outliers are errors. It is important to investigate the cause of an outlier before deciding to remove it from the data set.

6. Is it possible for most of the data points in a set to be classified as outliers?

No, not under the standard Interquartile Range (IQR) method. By definition, the range between Q1 and Q3 contains the middle 50% of the data, and the outlier boundaries (Q1 - 1.5*IQR and Q3 + 1.5*IQR) always lie at or beyond Q1 and Q3. Roughly half of the data points or more therefore fall inside the boundaries, so 'most' points cannot fall outside.

7. Why is it important to identify outliers when analysing data?

Identifying outliers is a crucial step in data analysis for several reasons:

Accuracy: They can reveal measurement or data entry errors that need to be corrected to ensure the data is accurate.
Insight: They can highlight unusual or special events within the data that might be interesting to study further, such as a day with record-breaking sales or a faulty sensor reading.
Reliability: Outliers can heavily skew statistical results, especially the mean and standard deviation. Identifying and appropriately handling them prevents misleading conclusions and ensures the analysis is reliable.