Statistics Numerical Measures: Mean, Median, Mode & Range

Reviewed by:

Rama Sharma

How to Apply Numerical Measures in Real-Life Data Analysis

Data is organized and summarized either graphically or numerically. Graphical descriptions of data are often used. But, if the given data set is large, constructing a graph becomes tedious. Although we can visualize the shape, centre, and spread of the distribution of the data set from the histogram, we cannot quantify data. We need to find out the numerical measures for describing data.

A statistic is a numerical descriptive measure calculated from sample data, whereas a parameter is a numerical descriptive measure of an entire population. Generally, the values of parameters are not known. We calculate statistics from the sample data and use them to make claims about the parameters of the population from which the sample was drawn.

Here, we will illustrate the numerical descriptive measures for sample and population, their computation, their meaning, and their uses.

Numerical Descriptive Measures For Sample

There are three types of numerical descriptive measures in statistics.

Measure of Central Tendency
Variability
Shape

Let's discuss each of them.

Measures of Central Tendency

Measure of central tendency is defined as the number that represents the centre of a set of ordered numerical data. The different measures of central tendency are mean, median, mode, and geometric mean.

Mean

Mean, also known as average, is a measure of the central tendency of a group of values. Mean generally refers to the arithmetic mean, as opposed to the harmonic mean or geometric mean. The value of the mean is strongly affected by outliers.

To calculate the mean, we take the sum of all the values and divide it by the number of values as shown below.

\[\bar{x} = \frac{x_{1} + x_{2} + x_{3} + \dots + x_{n}}{n}\]

If the mean is calculated from a sample of a population, it is known as the sample mean and is represented as \[(\bar{x})\], whereas the population mean is represented as \[(\mu)\].

Median

Median is a measure of central tendency that divides the ordered data into two parts, separating the upper half from the lower half. Unlike the mean, the median is not strongly affected by extreme values.

Locating The Median

If the given data is arranged in order, then the median is located at position \[\frac{n+1}{2}\] in the ordered data.

If the number of values is odd, the median is the middle number, whereas if the number of values is even, the median is the average of the two middle numbers.

Note: \[\frac{n+1}{2}\] is not the value of median but only the position of median in the ranked data.
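The position rule above can be sketched in Python. This is a minimal illustration; the function name `median_by_position` is an assumption made for this example and does not come from any library.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule on ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2              # 1-based position; ends in .5 when n is even
    lower = ordered[int(position) - 1]  # value at the whole-number part of the position
    if n % 2 == 1:
        return lower                    # odd count: the middle value itself
    upper = ordered[int(position)]      # even count: the next value up
    return (lower + upper) / 2          # average of the two middle values


print(median_by_position([7, 1, 5]))     # odd number of values
print(median_by_position([7, 1, 5, 3]))  # even number of values
```

Note that the position is only used to locate the value; the function returns the data value found there, not the position.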

Mode

Mode is the value that occurs most frequently. It is not affected by extreme values. It is used either for categorical data or numerical data. There may be several modes or no modes.
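The cases of several modes and no mode can be illustrated with a short Python sketch. The helper name `modes` is hypothetical, and it assumes the convention that a data set in which no value repeats has no mode.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values; return [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: if every value occurs once, there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([12, 17, 12, 13, 12]))                      # one mode
print(modes(["red", "blue", "red", "blue", "green"]))   # two modes, categorical data
print(modes([1, 2, 3]))                                 # no mode
```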

Geometric Mean

In Mathematics, the geometric mean is defined as the average that represents the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean, which uses their sum). For a collection {x₁, x₂,...xₙ} of positive real numbers, the geometric mean is defined as:

GM {x₁, x₂,...xₙ} = \[\sqrt[n]{x_{1} \times x_{2} \times \dots \times x_{n}}\]

Example:

Find the geometric mean of 2 and 32.

GM (2, 32) = \[\sqrt{2 \times 32}\] = \[\sqrt{64}\] = 8

Therefore, the geometric mean of 2 and 32 is 8.
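The same calculation can be sketched in Python. The function below is an illustrative assumption, not a prescribed method: it averages logarithms, which is mathematically equivalent to taking the n-th root of the product and avoids very large intermediate products.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed through logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs at least one value, all positive")
    # exp(mean of logs) equals the n-th root of the product of the values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The worked example above: the result should be approximately 8.
print(geometric_mean([2, 32]))
```

Because the result is a floating-point number, it may differ from 8 in the last decimal places.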

Measure of Variations

Measures of variation describe the spread or dispersion of values in a data set. The different measures of variation are:

Range
Interquartile range
Variance
Standard Deviation
Coefficient of Variation

Let us discuss each of the measures of variation.

Range

The difference between the greatest value and the smallest value of a given data set is termed as the range. It is the easiest measure of variation.

Range = Largest Value - Smallest Value

It ignores how data is distributed and is also sensitive to outliers.

Interquartile Range

The interquartile range is also a measure of spread, based on splitting a rank-ordered data set into four equal parts. The three values that mark the divisions are known as the first, second, and third quartiles, and are represented as Q1, Q2, and Q3.

First Quartile (Q1) - The first quartile is also known as the lower quartile. 25% of the observations are below it and the remaining 75% are above it.

Second Quartile (Q2) - The second quartile is also known as the median. 50% of the observations are below it and the other 50% are above it.

Third Quartile (Q3) - The third quartile is also known as the upper quartile. 75% of the observations are below it and the remaining 25% are above it.

Interquartile Range = Q3 - Q1
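As a rough sketch, the quartiles and the interquartile range can be computed with Python's standard `statistics.quantiles` function. Several conventions exist for locating quartiles in small data sets, so other tools may give slightly different values; the choice of `method="inclusive"` and the helper name `interquartile_range` are assumptions made for this example.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' quartile convention."""
    # quantiles(..., n=4) returns the three cut points that split the data into quarters.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]
q1, q2, q3, iqr = interquartile_range(data)
print(q1, q2, q3, iqr)
```

With this convention, Q2 matches the median of the data, as described above.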

Variance

Variance is approximately the average of the squared deviations of the values from the mean. For a sample, the sum of squared deviations is divided by n - 1 rather than n.

Sample Variance: S² = \[\frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^{2}}{n - 1}\]

Here,

\[\bar{X}\] = Arithmetic mean

n = Sample size

\[X_{i}\] = ith value of the variable X

Standard Deviation

Standard deviation is the most commonly used measure of variation for samples. It shows variation about the mean and has the same unit as the original data.

Sample Standard Deviation: S = \[\sqrt{\frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^{2}}{n - 1}}\]
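The two sample formulas can be written out step by step in Python. This is a minimal sketch; the function names are chosen for this example, and the standard library's `statistics.variance` and `statistics.stdev` compute the same quantities.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance, in the same unit as the data."""
    return math.sqrt(sample_variance(values))


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]
print(sample_variance(data))  # squared units
print(sample_std(data))       # original units
```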

Coefficient of Variation

The coefficient of variation is defined as the standard deviation divided by the mean, multiplied by 100. It is expressed as a percentage and shows variation relative to the mean.

The coefficient of variation can be used to compare two or more data sets measured in different units.

CV = \[(\frac{S}{\bar{X}})\] \[\times\] 100%

Measure of Variation Summary

The more the data are spread out, the larger the range, interquartile range, variance, and standard deviation.
The less the data are spread out, the smaller the range, interquartile range, variance, and standard deviation.
If there is no variation (all values are the same), then all these measures will be 0.
None of these measures will ever be negative.

Measure of Shape

The shape of the distribution shows how data is distributed. A distribution is either symmetric or skewed, and comparing the mean with the median indicates which.

Left-Skewed

Mean < Median

Symmetric

Mean = Median

Right-Skewed

Mean > Median
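The mean-versus-median comparison can be sketched as a small Python helper. It is only a rule of thumb, and the function name `describe_shape` and the `tolerance` parameter are assumptions introduced for this example (real data rarely has a mean exactly equal to its median).

```python
from statistics import mean, median


def describe_shape(values, tolerance=0.0):
    """Classify a data set by comparing its mean with its median."""
    gap = mean(values) - median(values)
    if abs(gap) <= tolerance:
        return "symmetric"          # mean = median (within the tolerance)
    if gap > 0:
        return "right-skewed"       # mean > median
    return "left-skewed"            # mean < median


print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 2, 3, 4, 50]))
print(describe_shape([1, 47, 48, 49, 50]))
```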

Numerical Descriptive Measures For Population

The numerical descriptive measures described previously are for samples, not populations.

Numerical descriptive measures that describe a population are known as parameters and are represented by Greek letters.

Important population parameters are population mean, population variance, and population standard deviation.

Population Mean - The population mean is the sum of all the values in the population divided by the size of the population, N.

\[\mu\] = \[\frac{\sum_{i=1}^{N}X_{i}}{N}\] = \[\frac{X_{1} + X_{2} + X_{3} + \dots + X_{N}}{N}\]

Where,

\[\mu\] - Population Mean

N - Population Size

Xi - ith value of the variable X

Population Variance - The population variance is the average of the squared deviations of the values from the population mean.

\[\sigma^{2}\] = \[\frac{\sum_{i=1}^{N}(X_{i} - \mu)^{2}}{N}\]

Where,

\[\mu\] - Population Mean

N - Population Size

Xi - ith value of the variable X

Population Standard Deviation - It is the most commonly used measure of variation and has the same unit as the original data.

Population Standard Deviation : \[\sigma\] = \[\sqrt{\frac{\sum_{i=1}^{N}(X_{i} - \mu)^{2}}{N}}\]

Where,

\[\mu\] - Population Mean

N- Population size

Xi - ith value of the variable X
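The population formulas differ from the sample formulas only in the divisor: N instead of n - 1. A minimal Python sketch follows; the function names and the eight-value data set are illustrative assumptions, and `statistics.pvariance` and `statistics.pstdev` in the standard library compute the same quantities.

```python
import math


def population_variance(values):
    """Average of squared deviations from the population mean (divide by N)."""
    values = list(values)
    N = len(values)
    if N == 0:
        raise ValueError("the population variance needs at least one value")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


# Illustrative data, treated here as a complete population.
population = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(population))
print(population_std(population))
```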

Solved Examples

1. Find the mean, median, mode, and range for the data given below.

12 , 17, 12, 13, 12, 14, 13, 21, 12

Solution:

Mean - It is the sum of all the values divided by the number of values as shown below.

Mean = \[\frac{12+17+12+13+12+14+13+21+12}{9}\] = 14

Median - The median is the middle or central value of the data set. To calculate the median, we will arrange the data in ascending order as 12, 12, 12, 12, 13, 13, 14, 17, 21.

There are 9 numbers, so the median is at position \[\frac{9+1}{2}\] = 5, that is, the 5th number in the ordered list.

Therefore, the median is 13.

Mode - The value that occurs most frequently in a given data is termed as mode. Accordingly, 12 is the mode.

Range = Largest Value - Smallest Value

Largest Value = 21

Smallest Value = 12

Range = 21 - 12 = 9
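The four results above can be checked with Python's standard `statistics` module. This is just a verification sketch; the variable names are chosen for this example.

```python
from statistics import mean, median, mode

data = [12, 17, 12, 13, 12, 14, 13, 21, 12]

summary = {
    "mean": mean(data),              # sum of values divided by their count
    "median": median(data),          # middle value of the ordered data
    "mode": mode(data),              # most frequent value
    "range": max(data) - min(data),  # largest value minus smallest value
}
print(summary)
```

The printed values should agree with the hand calculation: mean 14, median 13, mode 12, and range 9.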

2. Find the coefficient of variation for the data given below.

Stock A

Average Price of Last Year = 60
Standard Deviation = 6

Stock B

Average Price of Last Year = 100
Standard Deviation = 6

Solution:

Stock A :

Average Price of Last Year = 60
Standard Deviation = 6

CV of stock A = \[(\frac{S}{\bar{X}})\] \[\times\] 100% = \[\frac{6}{60}\] \[\times\] 100% = 10%

Stock B :

Average Price of Last Year = 100
Standard Deviation = 6

CV of stock B = \[(\frac{S}{\bar{X}})\] \[\times\] 100% = \[\frac{6}{100}\] \[\times\] 100% = 6%

Both stock A and stock B have the same standard deviation, but stock B is less variable relative to its average price.
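The comparison can also be expressed as a short Python sketch. The function name `coefficient_of_variation` is an assumption for this example; the inputs are the averages and standard deviations given in the problem.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of 0")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=6, mean=60)    # Stock A
cv_b = coefficient_of_variation(std_dev=6, mean=100)   # Stock B
print(cv_a, cv_b)
print("Stock B is relatively less variable" if cv_b < cv_a else "Stock A is relatively less variable")
```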

FAQs on Statistics Numerical Measures: Mean, Median, Mode & Range

1. What are the main numerical measures used in statistics?

In statistics, numerical measures are used to summarize and describe key features of a dataset. They are broadly classified into three main types:

Measures of Central Tendency: These describe the centre or typical value of a dataset. Examples include the mean, median, and mode.
Measures of Dispersion (or Variation): These describe how spread out the data points are. Key examples are the range, variance, and standard deviation.
Measures of Shape: These describe the distribution of the data, such as its symmetry or skewness.

2. What is the difference between mean, median, and mode?

The mean, median, and mode are all measures of central tendency, but they describe the 'centre' of the data in different ways:

The Mean is the arithmetic average, calculated by summing all values and dividing by the count of values. It is best used for symmetric data without extreme values.
The Median is the middle value in an ordered dataset. It is preferred when the data has outliers or is skewed, as it is not affected by extreme values.
The Mode is the most frequently occurring value in the dataset. It is the only measure of central tendency that can be used for categorical data.

3. In what real-world scenarios is the median a better measure than the mean?

The median is a better measure than the mean in situations where the dataset contains extreme values, or outliers, that can heavily skew the average. A classic example is measuring income levels or house prices in a neighbourhood. A few multi-million dollar properties would dramatically increase the mean price, making it unrepresentative of a typical house. The median, however, would provide a more accurate picture of the central value because it is barely affected by these outliers.
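A small Python sketch makes the effect visible. The house prices below are invented for illustration only (in thousands of dollars), with one very expensive property added to an otherwise similar neighbourhood.

```python
from statistics import mean, median

# Hypothetical prices in thousands of dollars; the last one is the outlier.
typical_prices = [200, 210, 220, 230]
with_outlier = typical_prices + [5000]

print(mean(typical_prices), median(typical_prices))  # mean and median agree
print(mean(with_outlier), median(with_outlier))      # mean jumps, median barely moves
```

In this made-up data, one outlier moves the mean from 215 to 1172, while the median only moves from 215 to 220.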

4. What do measures of dispersion like variance and standard deviation tell us?

Measures of dispersion, or variation, describe the spread or consistency of data points around the central value (usually the mean). A small standard deviation or variance indicates that the data points are clustered closely together, suggesting high consistency. Conversely, a large standard deviation means the data points are widely spread out, indicating low consistency or high variability.

5. Why is standard deviation often preferred over variance to describe data spread?

Although variance is crucial for calculations, standard deviation is often preferred for interpretation because it is expressed in the same units as the original data. For example, if you are measuring heights in centimetres, the standard deviation will also be in centimetres. The variance, however, would be in 'square centimetres,' which is not intuitive for describing spread. This makes the standard deviation easier to relate back to the mean and the dataset itself.

6. How does the relationship between the mean and median indicate the shape of a data distribution?

The comparison between the mean and median is a quick way to understand the skewness, or asymmetry, of a distribution:

If Mean = Median, the distribution is approximately symmetric.
If Mean > Median, the distribution is right-skewed (or positively skewed), meaning there is a 'tail' of high values pulling the mean up.
If Mean < Median, the distribution is left-skewed (or negatively skewed), indicating a 'tail' of low values is pulling the mean down.

7. What is the fundamental difference between a sample statistic and a population parameter?

The key difference lies in what they describe. A population parameter is a numerical value (e.g., population mean µ) that describes a characteristic of an entire population. It is a fixed but often unknown value. A sample statistic (e.g., sample mean x̄) is a value calculated from a subset, or sample, of the population. We use sample statistics to estimate and make inferences about unknown population parameters.

8. How does the coefficient of variation help in comparing two different datasets?

The coefficient of variation (CV) is a relative measure of dispersion that is particularly useful for comparing the variability of two or more datasets that are measured in different units or have very different means. It expresses the standard deviation as a percentage of the mean. A lower CV implies greater consistency or less variability relative to its mean. For example, you can use CV to compare the volatility of two different stocks, even if their average prices are vastly different.