Karl Pearson's Coefficient of Correlation: Calculation and Use

Reviewed by:

Dimple Tiwari

Karl Pearson Coefficient of Correlation – A Statistical Study

Karl Pearson's Coefficient of Correlation is an essential topic in statistics, and it is one of the most widely used ways of measuring correlation. The Karl Pearson coefficient measures linear correlation, and its value always lies in the range -1 to +1.

It is a quantitative method that gives a numerical value for the strength of the linear relationship between two variables, X and Y. But is it really useful for economic calculations? Let us explore Karl Pearson's Coefficient of Correlation in detail.

Do You Know?

The concept of correlation was introduced by Francis Galton in the 1880s.
Karl Pearson was a British statistician, widely regarded as one of the founders of modern statistics.
Pearson's coefficient is one of the most widely used measures of association between two variables because it is built on covariance.
Karl Pearson’s method is strongly affected by extreme values, so conclusions drawn from it should not be accepted without first checking the data.
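The link with covariance mentioned above can be written out explicitly. For N pairs of observations, r is the covariance of X and Y divided by the product of their standard deviations. Here all three use the divisor N (using N - 1 for all three gives the same r), so the 1/N factors cancel and leave an expression built only from deviations about the means:

\[r = \frac{\operatorname{Cov}(X,Y)}{\sigma_{X}\,\sigma_{Y}} = \frac{\frac{1}{N}\sum \left ( X-\bar{X} \right )\left ( Y-\bar{Y} \right )}{\sqrt{\frac{1}{N}\sum \left ( X-\bar{X} \right )^{2}}\,\sqrt{\frac{1}{N}\sum \left ( Y-\bar{Y} \right )^{2}}} = \frac{\sum \left ( X-\bar{X} \right )\left ( Y-\bar{Y} \right )}{\sqrt{\sum \left ( X-\bar{X} \right )^{2}\sum \left ( Y-\bar{Y} \right )^{2}}}\]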

What Do You Mean by Correlation Coefficient?

Before looking into details about Karl Pearson's Coefficient of Correlation, it is vital to brush up on fundamental concepts about correlation and its coefficient in general.

The correlation coefficient is a measure of the relationship between two variables, X and Y. It is a statistical tool that helps analyze and measure the degree of the linear relationship between the variables.

For example, a change in a person's monthly income (X) is usually accompanied by a change in their monthly expenditure (Y). Correlation measures the degree to which changes in one variable are associated with changes in the other.

Types of Correlation Coefficient

Depending on the direction of the relationship between variables, correlation can be of three types, namely –

Positive Correlation (0 to +1)
Negative Correlation (0 to -1)
Zero Correlation (0)

Positive Correlation (0 to +1)

In this case, the direction of change between X and Y is the same. For instance, an increase in the duration of a workout leads to an increase in the number of calories one burns.

Negative Correlation (0 to -1)

Here, the X and Y variables change in opposite directions. For example, when the price of a commodity increases, its demand decreases.

Zero Correlation (0)

There is no linear relationship between the variables in this case. For instance, an increase in height has no impact on one’s intelligence.

Now that we have refreshed our memory of these basics, let’s move on to Karl Pearson Coefficient of Correlation.

What is Karl Pearson’s Coefficient of Correlation?

This method is also known as the Product Moment Correlation Coefficient and was developed by Karl Pearson. It is one of the three most widely used methods of measuring correlation, alongside the Scatter Diagram method and Spearman’s Rank Correlation.

The Karl Pearson correlation coefficient method is quantitative: it gives a numerical value for the strength of the linear relationship between X and Y. This coefficient is denoted by ‘r’.

The Karl Pearson Coefficient of Correlation formula is expressed as

\[r = \frac{n\left ( \sum xy \right )-\left ( \sum x \right )\left ( \sum y \right )}{\sqrt{\left [ n\sum x^{2}-\left (\sum x \right )^{2} \right ]\left [ n\sum y^{2}-\left (\sum y \right )^{2} \right ]}}\]

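In this formula,

n = number of pairs of observations

Σxy = sum of the products of the paired x and y values

Σx and Σy = sums of the x and y values

Σx² and Σy² = sums of the squares of the x and y values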

Methods of Karl Pearson’s Coefficient of Correlation and its Calculation

The Karl Pearson coefficient can be obtained using various methods, which are mentioned below.

Actual Mean Method
Direct Method
Short-Cut (Assumed Mean/Indirect) Method
Step Deviation Method

1. Actual Mean Method

\[r = \frac{\sum \left ( X-\bar{X} \right )\left ( Y-\bar{Y} \right )}{\sqrt{\sum \left ( X-\bar{X} \right )^{2}\sum \left ( Y-\bar{Y} \right )^{2}}} = \frac{\sum xy}{\sqrt{\sum x^{2}\sum y^{2}}}\]

Where, \[\bar{X}\] = mean of X variable

\[\bar{Y}\] = mean of Y variable

In this Karl Pearson formula,

\[x = X-\bar{X}\] (deviation of each X value from the mean of X)

\[y = Y-\bar{Y}\] (deviation of each Y value from the mean of Y)
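The actual mean method translates almost line for line into code. The following is a minimal Python sketch, not a prescribed implementation; the function name `pearson_actual_mean` and the small data set are illustrative assumptions, chosen so that both means are whole numbers.

```python
import math


def pearson_actual_mean(xs, ys):
    """Karl Pearson's r using deviations from the actual means of X and Y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n  # X̄
    mean_y = sum(ys) / n  # Ȳ
    # x = X - X̄ and y = Y - Ȳ for every observation
    x = [value - mean_x for value in xs]
    y = [value - mean_y for value in ys]
    sum_xy = sum(a * b for a, b in zip(x, y))  # Σxy
    sum_x2 = sum(a * a for a in x)             # Σx²
    sum_y2 = sum(b * b for b in y)             # Σy²
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when either series is constant")
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Made-up data: X̄ = 3 and Ȳ = 4
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(round(pearson_actual_mean(X, Y), 4))  # Σxy = 6, Σx² = 10, Σy² = 6, so r = 6 / √60 ≈ 0.7746
```

The guard against a constant series reflects the formula itself: if every X (or every Y) is the same, the denominator is zero and r cannot be computed.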

2. Direct Method

Steps to Calculate the Coefficient of Correlation Using the Direct Method:

1. Calculate the sum of the X series ($ \Sigma X $).

2. Calculate the sum of the Y series ($ \Sigma Y $).

3. Square each value in the X series and find their total ($\Sigma X^2$).

4. Square each value in the Y series and find their total ($\Sigma Y^2 $).

5. Multiply corresponding values of the X and Y series, then find the total ($\Sigma XY$).

6. Use the formula below to compute the Coefficient of Correlation:

$ r = \frac{N \Sigma XY - \Sigma X \cdot \Sigma Y}{\sqrt{N \Sigma X^2 - (\Sigma X)^2} \cdot \sqrt{N \Sigma Y^2 - (\Sigma Y)^2}}$

Where:

N is the number of paired values in the dataset.
r is the coefficient of correlation.
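The six steps above can be sketched in Python as follows. This is an illustrative sketch under the assumption that the data arrive as two equal-length lists; the function name `pearson_direct` is hypothetical. The sample data are the same made-up values used for the actual mean method, so both methods should agree.

```python
import math


def pearson_direct(xs, ys):
    """Karl Pearson's r from raw sums, following the six steps of the direct method."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two pairs")
    n = len(xs)                                  # N, the number of pairs
    sum_x = sum(xs)                              # Step 1: ΣX
    sum_y = sum(ys)                              # Step 2: ΣY
    sum_x2 = sum(v * v for v in xs)              # Step 3: ΣX²
    sum_y2 = sum(v * v for v in ys)              # Step 4: ΣY²
    sum_xy = sum(a * b for a, b in zip(xs, ys))  # Step 5: ΣXY
    # Step 6: substitute the totals into the formula
    numerator = n * sum_xy - sum_x * sum_y
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when either series is constant")
    return numerator / (math.sqrt(spread_x) * math.sqrt(spread_y))


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(round(pearson_direct(X, Y), 4))  # 30 / (√50 × √30) ≈ 0.7746
```

With whole-number data, every intermediate total stays an exact integer, which is one reason the direct method is convenient when the means are fractions.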

3. Short-Cut (Assumed Mean) Method

\[d_{x} = X - A\]

\[d_{y} = Y - B\]

\[r = \frac{N\sum d_{x}d_{y}-\left ( \sum d_{x} \right )\left ( \sum d_{y} \right )}{\sqrt{N\sum d_{x}^{2}-\left ( \sum d_{x} \right )^{2}}\,\sqrt{N\sum d_{y}^{2}-\left ( \sum d_{y} \right )^{2}}}\]

In this Karl Pearson Correlation formula,

A and B = assumed means of the X-series and the Y-series, respectively
dx = deviation of the X-series from its assumed mean (X - A)
dy = deviation of the Y-series from its assumed mean (Y - B)
Σdxdy = sum of the products of the paired deviations dx and dy
Σdx² = sum of the squares of dx
Σdy² = sum of the squares of dy
Σdx = sum of the X-series deviations
Σdy = sum of the Y-series deviations

N is the number of observations in pairs.
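A short sketch makes it clear that the choice of assumed means does not change r; it only makes the hand arithmetic smaller. This Python example is illustrative: the function name `pearson_assumed_mean` and the made-up data are assumptions, and A = 120, B = 55 are simply convenient guesses, not the actual means.

```python
import math


def pearson_assumed_mean(xs, ys, a, b):
    """Karl Pearson's r using deviations from assumed means a (for X) and b (for Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two pairs")
    n = len(xs)
    dx = [x - a for x in xs]  # dx = X - A
    dy = [y - b for y in ys]  # dy = Y - B
    sum_dx = sum(dx)
    sum_dy = sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    spread_x = n * sum_dx2 - sum_dx ** 2
    spread_y = n * sum_dy2 - sum_dy ** 2
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when either series is constant")
    return numerator / (math.sqrt(spread_x) * math.sqrt(spread_y))


X = [110, 120, 130, 140, 150]
Y = [52, 54, 55, 54, 55]
print(round(pearson_assumed_mean(X, Y, 120, 55), 4))  # convenient assumed means
print(round(pearson_assumed_mean(X, Y, 0, 0), 4))     # A = B = 0 reduces to the direct method
```

Both lines print the same value (about 0.7746), because shifting every X by A and every Y by B leaves the numerator and both terms under the square roots unchanged.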

4. Step Deviation Method

\[r = \frac{\sum d_{x}'d_{y}'-\frac{\left ( \sum d_{x}' \right )\left ( \sum d_{y}' \right )}{N}}{\sqrt{\sum \left ( d_{x}' \right )^{2}-\frac{\left ( \sum d_{x}' \right )^{2}}{N}}\,\sqrt{\sum \left ( d_{y}' \right )^{2}-\frac{\left ( \sum d_{y}' \right )^{2}}{N}}}\]

In this particular Karl Pearson Method,

\[d_{x}' = \frac{d_{x}}{C_{1}}\]

\[d_{y}' = \frac{d_{y}}{C_{2}}\]

C1 = common factor for the X-series

C2 = common factor for the Y-series

A and B = assumed means of the X-series and the Y-series, respectively

dx = deviation of the X-series from its assumed mean (X - A)

dy = deviation of the Y-series from its assumed mean (Y - B)

Σdx′dy′ = sum of the products of the paired step deviations dx′ and dy′

Σdx′² = sum of the squares of dx′

Σdy′² = sum of the squares of dy′

Σdx′ = sum of the X-series step deviations

Σdy′ = sum of the Y-series step deviations

N is the number of observations in pairs.
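The step deviation formula can be checked with a small sketch. In this illustrative Python example, the function name `pearson_step_deviation` and the data are assumptions; C1 = 10 works because every X deviation is a multiple of 10, and C2 = 1 leaves the Y deviations as they are. The common factors are required to be positive, since dividing by a negative factor would flip the sign of r.

```python
import math


def pearson_step_deviation(xs, ys, a, b, c1, c2):
    """Karl Pearson's r using step deviations dx' = (X - A) / C1 and dy' = (Y - B) / C2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two pairs")
    if c1 <= 0 or c2 <= 0:
        raise ValueError("common factors C1 and C2 should be positive")
    n = len(xs)
    dxp = [(x - a) / c1 for x in xs]  # dx' = dx / C1
    dyp = [(y - b) / c2 for y in ys]  # dy' = dy / C2
    sum_dxp = sum(dxp)
    sum_dyp = sum(dyp)
    sum_dxp2 = sum(d * d for d in dxp)
    sum_dyp2 = sum(d * d for d in dyp)
    sum_dxpdyp = sum(p * q for p, q in zip(dxp, dyp))
    # Same form as the formula above: each sum is corrected by (sum × sum) / N
    numerator = sum_dxpdyp - sum_dxp * sum_dyp / n
    spread_x = sum_dxp2 - sum_dxp ** 2 / n
    spread_y = sum_dyp2 - sum_dyp ** 2 / n
    if spread_x <= 0 or spread_y <= 0:
        raise ValueError("r is undefined when either series is constant")
    return numerator / (math.sqrt(spread_x) * math.sqrt(spread_y))


X = [110, 120, 130, 140, 150]
Y = [52, 54, 55, 54, 55]
# dx' = -2, -1, 0, 1, 2 and dy' = -2, 0, 1, 0, 1
print(round(pearson_step_deviation(X, Y, 130, 54, 10, 1), 4))  # 6 / √60 ≈ 0.7746
```

Because r is unaffected by a change of origin (subtracting A and B) or of positive scale (dividing by C1 and C2), the result matches the other methods on the same data.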

Characteristics of Karl Pearson's Coefficient of Correlation

Value Range: It ranges from -1 to +1. A value close to +1 shows a strong positive relationship, close to -1 shows a strong negative relationship, and 0 means no linear relationship.
Direction: A positive value means both variables move in the same direction (e.g. if one increases, the other increases). A negative value means the variables move in opposite directions (e.g. if one increases, the other decreases).
Measures Linear Relationship: It only shows the strength of a straight-line (linear) relationship between two variables. It doesn’t work well for curved or complex relationships.
Unit-Free: The coefficient doesn’t depend on the units of measurement (e.g., cm, kg). It’s a pure number that shows strength and direction.
Symmetry: The correlation between X and Y is the same as the correlation between Y and X. This means the order of variables doesn’t matter.
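The unit-free and symmetry properties can be observed directly. This minimal Python sketch assumes made-up height and weight figures and a hypothetical helper `pearson_r`; it computes r with heights in centimetres, again with the same heights in metres, and once more with the two variables swapped.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Made-up heights and weights for five people
height_cm = [150, 160, 165, 170, 180]
weight_kg = [50, 56, 61, 64, 72]
height_m = [h / 100 for h in height_cm]  # same heights, different unit

r_cm = pearson_r(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)          # change of unit
r_swapped = pearson_r(weight_kg, height_cm)   # order of variables reversed

print(f"cm: {r_cm:.4f}  m: {r_m:.4f}  swapped: {r_swapped:.4f}")
```

All three values agree (up to tiny floating-point rounding in the metre case), and each lies between -1 and +1.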

Advantages and Disadvantages of Karl Pearson's Coefficient of Correlation:

Advantages

The formula is straightforward and easy to compute with basic statistical tools.
It not only shows how strongly two variables are related but also whether the relationship is positive or negative.
The coefficient is not affected by the units of measurement, making it easy to compare relationships across different datasets.
The correlation between X and Y is the same as the correlation between Y and X, so the result does not depend on the order of the variables.
It is a well-established and commonly used method, making it reliable for comparing relationships in many fields.

Disadvantages

It cannot detect non-linear relationships, so it may give misleading results if the relationship is not straight-line.
Extreme values in the data can distort the result and give an incorrect measure of correlation.
Tests of significance based on r assume that the data are approximately normally distributed, which may not be the case in real-life scenarios.
It only shows association, not cause and effect. Two variables may be correlated without one causing the other.
It only works with numerical data and cannot be used for categorical variables.
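The first disadvantage, blindness to non-linear relationships, is easy to demonstrate. In this illustrative Python sketch (the helper `pearson_r` and the data are assumptions), Y is exactly X squared, so Y is completely determined by X, yet r comes out as zero because the relationship is curved rather than straight.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # a perfect relationship, but along a curve

print(pearson_r(xs, ys))  # 0.0: no linear relationship, despite the exact curved one
```

This is why plotting the data before relying on r is a good habit: a scatter diagram would show the curve immediately.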

Solving a Few Karl Pearson Coefficient of Correlation Questions

Task 1: Refer to the table below and find out ‘r’ with the help of the provided data. Use the Actual Mean Method to solve it.

Price of Mango (Rs.)	15	25	35	40	50	65	75
Supply of Mango (units)	2	5	6	8	9	10	14

Task 2: With the help of this table below, find out ‘r’ using Karl Pearson Coefficient of Correlation Direct Method Formula.

Age of husband	21	24	27	29	31	35	38
Age of wife	19	21	25	26	29	32	34
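Once you have worked both tasks by hand, you can check your answers with a short script. This Python sketch uses the direct method for both tasks (the actual mean method gives the same r; only the arithmetic differs), and the function name `pearson_direct` is an illustrative assumption.

```python
import math


def pearson_direct(xs, ys):
    """Karl Pearson's r from raw sums (direct method)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


# Task 1: price (Rs.) and supply (units) of mangoes
price = [15, 25, 35, 40, 50, 65, 75]
supply = [2, 5, 6, 8, 9, 10, 14]

# Task 2: ages of husbands and wives
husband = [21, 24, 27, 29, 31, 35, 38]
wife = [19, 21, 25, 26, 29, 32, 34]

print(f"Task 1: r = {pearson_direct(price, supply):.3f}")
print(f"Task 2: r = {pearson_direct(husband, wife):.3f}")
```

For these data the script prints r ≈ 0.975 for Task 1 (from 3375 / √(19150 × 626)) and r ≈ 0.995 for Task 2 (from 1371 / √(1494 × 1272)), both very strong positive correlations.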

Pro Tip: Try to solve one or two Karl Pearson coefficient of correlation problems using all the methods to figure out which is the easiest and shortest method of the lot. However, make sure to be thorough with all the formulas of the Karl Pearson coefficient of correlation, so that you can attempt them in your exams with greater confidence.

Once you have solved these Karl Pearson Coefficient of Correlation sums, you will better understand the degree of relationship between the variables discussed and how it applies to real life.

Now that we have a fair idea of Pearson’s coefficient of correlation and are familiar with its question format, let’s learn about its properties as well.

In case you are wondering why you should study the properties of the coefficient of correlation: a clear idea of them comes in handy both during exam preparation and while solving Karl Pearson Coefficient of Correlation sums. They help you remember the small but vital points about r and avoid careless mistakes.

That being said, let’s glance through these significant properties in brief –

The Correlation Coefficient (r) does not have any unit.
r with a positive value signifies that both X and Y move along the same direction.
r with a negative value indicates an inverse relation between X and Y.
X and Y are said to be uncorrelated (no linear relationship) if the value of r is 0.
r with a value close to +1 or -1 signifies a strong linear relationship between two variables.
r with a value close to 0 signifies a weak linear relationship between two variables.
Correlation between two variables is said to be perfect if the value of r is either +1 or -1.
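The last property can be seen with data that lie exactly on a straight line. In this minimal Python sketch, the helper `pearson_r` and both made-up series are assumptions: one line rises and one falls, giving the two extreme values of r.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 3 for x in xs]  # exact straight line with positive slope
falling = [10 - x for x in xs]    # exact straight line with negative slope

print(pearson_r(xs, rising))   # 1.0: perfect positive correlation
print(pearson_r(xs, falling))  # -1.0: perfect negative correlation
```

Note that the slope itself (2 or -1 here) does not matter; only the direction and how closely the points follow a line affect r.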

Assumptions of Karl Pearson's Coefficient of Correlation

When we calculate the Karl Pearson Correlation, we need to keep a few assumptions in mind.

Following are the two main assumptions:

The relationship between the two variables is linear.
Outliers are kept to a minimum or removed entirely.

Outliers are observations that differ sharply from the rest of the data and do not fit the general pattern of the set. You can spot an outlier by plotting the data on graph paper (as a scatter diagram) and looking for points that lie far away from the rest.
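A single outlier can change r dramatically. This illustrative Python sketch (the helper `pearson_r`, the data, and the extra pair are all made up) computes r for five points with an upward trend, then again after adding one extreme pair that does not follow that trend.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r_clean = pearson_r(X, Y)

# One hypothetical extreme pair that breaks the upward trend
r_outlier = pearson_r(X + [6], Y + [-20])

print(f"without outlier: {r_clean:.2f}")   # 0.77
print(f"with outlier:    {r_outlier:.2f}")  # -0.59
```

One point out of six is enough to turn a fairly strong positive correlation into a moderate negative one, which is why the outlier assumption matters.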

Use of Karl Pearson Coefficient in Real Life

The Karl Pearson Coefficient of Correlation is used extensively in quantitative work. In economic problems it is especially valuable: once values of the two variables X and Y are collected, r measures the strength of the linear relationship between them.

This method helps us understand, logically and accurately, how a change in one variable is associated with a change in another. For example, a shoe manufacturer who wants to decide on the range of shoe sizes first needs to collect data on common foot sizes; by applying the Karl Pearson Coefficient of Correlation to such data, he can estimate requirements accordingly.

FAQs on Karl Pearson's Coefficient of Correlation: Calculation and Use

1. What does Karl Pearson's Coefficient of Correlation measure?

Karl Pearson's Coefficient of Correlation, denoted by 'r', is a statistical measure used to determine the strength and direction of a linear relationship between two quantitative variables. It provides a numerical value that indicates how closely the data points for two variables fall along a straight line.

2. What do the different values of Karl Pearson's coefficient 'r' signify?

The value of Karl Pearson's coefficient 'r' always lies between -1 and +1. Each value indicates a specific type of relationship:

Positive Correlation (0 to +1): This means both variables move in the same direction. A value of +1 indicates a perfect positive linear relationship.
Negative Correlation (0 to -1): This means the variables move in opposite directions. A value of -1 indicates a perfect negative linear relationship.
Zero Correlation (0): This indicates there is no linear relationship between the two variables.

3. What is the most common formula to calculate Karl Pearson's Coefficient of Correlation?

The most frequently used formula, often called the direct method formula, for calculating Karl Pearson's Coefficient of Correlation (r) is:

r = [NΣXY - (ΣX)(ΣY)] / √{[NΣX² - (ΣX)²][NΣY² - (ΣY)²]}

Where:

N is the number of pairs of observations.
ΣXY is the sum of the product of corresponding X and Y values.
ΣX and ΣY are the sums of the X and Y series, respectively.
ΣX² and ΣY² are the sums of the squares of the X and Y values.

4. Can Karl Pearson's coefficient prove a cause-and-effect relationship between two variables?

No, it cannot. This is a crucial limitation. Karl Pearson's coefficient only measures the strength and direction of an association between two variables. A strong correlation does not prove that a change in one variable causes the change in the other. This is often summarised as 'correlation does not imply causation'.

5. What are the key assumptions for using Karl Pearson's correlation coefficient reliably?

For the Karl Pearson coefficient to be a valid and reliable measure, two main assumptions must be met:

Linear Relationship: The relationship between the two variables must be linear. The coefficient is not suitable for measuring non-linear or curved relationships.
Minimal Outliers: The data should not have significant outliers (extreme values), as these can heavily distort the correlation coefficient and lead to misleading conclusions.

6. What are the main advantages of using Karl Pearson's method for measuring correlation?

Karl Pearson's method is widely used due to several key advantages:

It provides a quantitative value for both the strength and direction (positive or negative) of the relationship.
The coefficient is a unit-free measure, allowing for easy comparison of relationships across different datasets with different units.
The calculation is symmetric, meaning the correlation of X and Y is the same as the correlation of Y and X.

7. How do extreme values or outliers impact the Karl Pearson coefficient?

Extreme values, or outliers, can significantly distort the Karl Pearson coefficient of correlation. A single outlier that doesn't follow the general trend of the data can artificially inflate or deflate the 'r' value, giving a misleading impression of the strength of the linear relationship. It is one of the major disadvantages of this method.

8. What are the different methods available to calculate Karl Pearson's coefficient?

There are four primary methods to calculate Karl Pearson's coefficient of correlation. All of them give the same value of r, but each is more convenient for a particular kind of data:

Actual Mean Method: Uses deviations from the actual means of the X and Y series.
Direct Method: Uses the raw scores of the variables without calculating deviations.
Short-Cut Method (Assumed Mean Method): Uses deviations from an assumed mean, which simplifies calculations when the actual mean is a fraction.
Step-Deviation Method: A further simplification of the short-cut method, used when deviations can be divided by a common factor.