How to Calculate the Correlation Coefficient

Last updated on November 26, 2025

Correlation coefficients measure the strength and direction of the relationship between two variables, such as screen time and mental health. They are used in fields such as finance, education, and health care. In this topic, you will learn what the correlation coefficient is, how to calculate it, and how to interpret it.

What is the Correlation Coefficient?

The correlation coefficient is a statistical metric that measures how strongly two variables are linearly related. Its value ranges from -1 to 1. A coefficient of -1 indicates a perfect negative (inverse) correlation: as one variable increases, the other decreases along a straight line. A coefficient of 1 indicates a perfect positive correlation: both variables increase together along a straight line. A coefficient of zero indicates that there is no linear relationship between the variables.

Here are a few key takeaways to help you grasp the concept at a glance:

  • Correlation coefficients measure the strength of the relationship between two variables.
     
  • A correlation coefficient of 1 indicates a perfect positive relationship between two variables, and -1 indicates a perfect negative relationship.
     
  • If the variables have no linear relationship, the correlation coefficient is 0; if the relationship is weak, the coefficient is close to 0.
     
  • The Pearson coefficient or Pearson’s R is the most prominent type of correlation coefficient that measures the strength and direction of linear relationships.
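
As a rough illustration of these reference values, the sketch below computes r for three tiny made-up data sets. The helper name `pearson_r` and all of the numbers are assumptions chosen for illustration; they do not come from this article.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # y increases in step with x
falling = [10, 8, 6, 4, 2]   # y decreases in step with x
no_trend = [1, 3, 2, 3, 1]   # made-up values with no linear trend

r_rising = pearson_r(xs, rising)       # perfect positive correlation
r_falling = pearson_r(xs, falling)     # perfect negative correlation
r_no_trend = pearson_r(xs, no_trend)   # no linear correlation
```

The three results come out as 1, -1, and 0 (up to floating-point rounding), matching the takeaways above.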
     

What is the Difference Between Correlation and Regression?

Correlation and regression are related but different concepts. Let’s look at a few key differences between them:

| Correlation | Regression |
| --- | --- |
| Analyzes the strength and direction of the linear connection between two variables. | Models how a dependent variable changes with an independent variable. |
| Can be positive or negative, depending on the connection between the variables. | Establishes a functional dependence that expresses how changes in the independent variable are reflected in the dependent variable. |
| Symmetric: the correlation of x with y is the same as the correlation of y with x. | Not symmetric: regressing y on x generally gives a different result from regressing x on y. |
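
The last difference can be checked numerically. The sketch below is a minimal illustration, assuming the standard least-squares slope (covariance divided by the variance of the predictor) as the regression quantity to compare; the helper names are hypothetical, and the data are the study-hours values used in the worked example later in this article.

```python
from math import sqrt

hours = [2, 3, 5, 6, 4]
marks = [50, 60, 80, 85, 70]

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Correlation: covariance over the product of standard deviations."""
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    return cov(xs, ys) / cov(xs, xs)

r_xy = corr(hours, marks)
r_yx = corr(marks, hours)                # same value: correlation is symmetric
b_marks_on_hours = slope(hours, marks)   # marks gained per extra study hour
b_hours_on_marks = slope(marks, hours)   # study hours per extra mark
```

Swapping the variables leaves r unchanged, while the two regression slopes differ (9 versus roughly 0.11 for this data).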

Formula of Correlation Coefficient

The correlation coefficient formula helps measure the strength and direction of the relationship between two variables. Several forms of the formula are used depending on the type of data. The main correlation coefficient formulas are given below.


Pearson’s Correlation Coefficient Formula 
 

\( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)


Where:

  • n is the number of data pairs
     
  • \(Σx\) is the sum of all values for the first variable.
     
  • \(Σy\) is the sum of all values for the second variable.
     
  • \(Σxy\) is the sum of the products of each pair of x and y values.
     
  • \(Σx^2\) is the sum of the squares of the first variable’s values.
     
  • \(Σy^2\) is the sum of the squares of the second variable’s values.
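
A minimal sketch of this formula in Python is shown below. The function name `pearson_r_from_sums` and the four data pairs are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from the six quantities that appear in the formula."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data pairs (not from this article).
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

r = pearson_r_from_sums(
    n=len(xs),
    sum_x=sum(xs),                              # 10
    sum_y=sum(ys),                              # 14
    sum_xy=sum(x * y for x, y in zip(xs, ys)),  # 39
    sum_x2=sum(x * x for x in xs),              # 30
    sum_y2=sum(y * y for y in ys),              # 54
)
print(r)  # (156 - 140) / sqrt(20 * 20) = 0.8
```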

Sample Correlation Coefficient Formula
 

\(r_{xy} = \frac{s_{xy}}{s_x \, s_y} \)

Where:

  • \(s_x, s_y \) are the sample standard deviations
     
  • \(s_{xy}\) is the sample covariance
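
The sketch below shows one way to evaluate the sample formula, assuming the usual convention that the sample covariance and sample standard deviations divide by n - 1 (the formula above does not state the divisor). The function name and data are hypothetical.

```python
from statistics import mean, stdev

def sample_correlation(xs, ys):
    """Sample correlation: sample covariance over the product of sample SDs."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    # Sample covariance: divide the sum of products of deviations by n - 1.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.stdev is the sample standard deviation (n - 1 divisor).
    return s_xy / (stdev(xs) * stdev(ys))

# Hypothetical data pairs (not from this article).
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
r = sample_correlation(xs, ys)   # approximately 0.8
```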

Population Correlation Coefficient Formula

\(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} \)

Where:

  • \(\sigma_x, \sigma_y \) are the population standard deviations
     
  • \(\sigma_{xy} \) is the population covariance
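
The sample and population formulas look different but give the same number when applied to the same set of data pairs. A short derivation, assuming the sample quantities divide by \(n - 1\) and the population quantities divide by \(n\) (the usual convention, which the formulas above do not spell out):

\( r_{xy} = \frac{\frac{1}{n-1}\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} \; \sqrt{\frac{1}{n-1}\sum (y_i - \bar{y})^2}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum (x_i - \bar{x})^2} \; \sqrt{\sum (y_i - \bar{y})^2}} \)

The factor \(\frac{1}{n-1}\) in the numerator cancels against the two square roots of \(\frac{1}{n-1}\) in the denominator. Replacing \(n - 1\) with \(n\) cancels in exactly the same way, so the choice of divisor does not change the coefficient.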

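Linear Correlation Coefficient Formula
 

\(r_{xy} = \frac{n \sum_{i=1}^n x_i y_i - \left( \sum_{i=1}^n x_i \right)\left( \sum_{i=1}^n y_i \right)} {\sqrt{n \sum_{i=1}^n x_i^2 - \left( \sum_{i=1}^n x_i \right)^2} \; \sqrt{n \sum_{i=1}^n y_i^2 - \left( \sum_{i=1}^n y_i \right)^2}} \)

This is the same expression as Pearson’s formula, written with explicit summation indices and with the square root in the denominator split into two factors.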

How to Calculate the Correlation Coefficient?

We can calculate the correlation coefficient easily by understanding each step listed below:
 

  • To calculate the correlation coefficient, the first step is to find the covariance of the two variables.
     
  • Next, divide the covariance by the product of the variables’ standard deviations.
     
  • These two steps are summarized in the equation:
     

\(\rho_{xy} = \frac{\text{Cov}(x, y)}{\sigma_x \, \sigma_y} \)

Here:

\(\rho_{xy}\) represents Pearson’s product-moment correlation coefficient

Cov(x, y) is the covariance of variables x and y

\(\sigma_x, \sigma_y \) are the standard deviations of variables x and y.

Suppose a teacher wants to check whether students who study more hours tend to score higher marks. The data for 5 students is given below: 

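| Student | Study hours (x) | Marks (y) |
| --- | --- | --- |
| 1 | 2 | 50 |
| 2 | 3 | 60 |
| 3 | 5 | 80 |
| 4 | 6 | 85 |
| 5 | 4 | 70 |

We calculate the correlation coefficient using the formula below, where \(\sigma_{xy}\) denotes the covariance Cov(x, y):

\(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} \)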

Find the mean of x and y:

  \({\bar x} = {{2 + 3 + 5 + 6 + 4}\over{5 }}= 4 \)

\({\bar y} = {{50 + 60 + 80 + 85 + 70}\over {5}} = 69 \)

Finding the covariance: 


\(\text{Cov}(x, y) = \frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y}) \)

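| x | y | \(x - 4\) | \(y - 69\) | \((x - 4)(y - 69)\) |
| --- | --- | --- | --- | --- |
| 2 | 50 | -2 | -19 | 38 |
| 3 | 60 | -1 | -9 | 9 |
| 5 | 80 | 1 | 11 | 11 |
| 6 | 85 | 2 | 16 | 32 |
| 4 | 70 | 0 | 1 | 0 |

\(\sum (x_i - \bar{x})(y_i - \bar{y}) = 38 + 9 + 11 + 32 + 0 = 90\)

\(\text{Cov}(x, y) = \frac{90}{5} = 18\)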

Calculating the standard deviations: 
 

For x: \(\sigma_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \)
 

\(\sum (x_i - \bar{x})^2 = (-2)^2 + (-1)^2 + 1^2 + 2^2 + 0^2 \)

\(= 4 + 1 + 1 + 4 + 0 = 10\)

\(\sigma_x = \sqrt{\frac{10}{5}} \)

\(= \sqrt{2} \approx 1.414\)

For y: 

\(\sum (y_i - \bar{y})^2 = 361 + 81 + 121 + 256 + 1 = 820 \)

\(\sigma_y = \sqrt{\frac{820}{5}} \)

\(= \sqrt{164} \approx 12.806\)

Apply the formula: \(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} \)
 

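\(\rho_{xy} = \frac{18}{(1.414)(12.806)}\)

\(= \frac{18}{18.11} \approx 0.994 \)

\(\rho_{xy} \approx 0.99\)

A value this close to 1 indicates a strong positive correlation between study hours and marks.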
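
The worked example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are chosen here for illustration, and the data are the five students from the table above.

```python
from math import sqrt

hours = [2, 3, 5, 6, 4]        # x, study hours
marks = [50, 60, 80, 85, 70]   # y, marks
n = len(hours)

mean_x = sum(hours) / n        # 4.0
mean_y = sum(marks) / n        # 69.0

# Step 1: covariance = average product of the deviations from the means.
products = [(x - mean_x) * (y - mean_y) for x, y in zip(hours, marks)]
cov_xy = sum(products) / n     # 90 / 5 = 18.0

# Step 2: population standard deviations (divide by n, as in the text).
sigma_x = sqrt(sum((x - mean_x) ** 2 for x in hours) / n)   # sqrt(2)
sigma_y = sqrt(sum((y - mean_y) ** 2 for y in marks) / n)   # sqrt(164)

# Step 3: divide the covariance by the product of the standard deviations.
rho = cov_xy / (sigma_x * sigma_y)
print(round(rho, 3))   # 0.994
```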

Tips and Tricks to Master How to Calculate the Correlation Coefficient

The correlation coefficient indicates the strength of the relationship between two variables and whether it is positive or negative. Here are a few tips and tricks to master the correlation coefficient.

  • Start with the basic concept: know what a positive, a negative, and a zero coefficient each say about the data.
     
  • Students should understand and memorize the formula related to the correlation coefficient.
     
  • Students can organize the data using tables with columns for x, y, \(x^2\), \(y^2\), and xy to simplify calculations.
     
  • Students can use correlation coefficient calculators to verify the answer.
     
  • Teachers can use real-life examples to make the correlation meaningful for students.
     
  • Parents can encourage students to use tools such as calculators, Excel, or statistical software to help them understand the concept of correlation.
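
The table-based tip above can be sketched in code. The helper `correlation_table` and the sample data below are hypothetical; the point is only to show the five columns and their totals, which are the sums that Pearson’s formula needs.

```python
def correlation_table(xs, ys):
    """Return rows of (x, y, x*y, x**2, y**2) and a tuple of column totals."""
    rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals

# Hypothetical data pairs (not from this article).
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
rows, totals = correlation_table(xs, ys)

# Print the table with right-aligned columns, then the totals row.
print(f"{'x':>5}{'y':>5}{'xy':>5}{'x^2':>5}{'y^2':>5}")
for row in rows:
    print("".join(f"{value:>5}" for value in row))
print("".join(f"{value:>5}" for value in totals))
```

The totals row holds Σx, Σy, Σxy, Σx² and Σy² in that order.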

Common Mistakes and How to Avoid Them in Calculating Correlation Coefficients

Students tend to make mistakes when calculating correlation coefficients. Such mistakes occur due to various reasons. Let’s explore such errors and a few tips to avoid them:

Real-Life Applications of How to Calculate Correlation Coefficient

Correlation coefficients are applied in various fields to determine the linear relationship between two different quantities. Let’s learn how they can be applied in various fields:

  • Schools apply the correlation coefficient to analyze how factors such as study hours and screen time relate to students’ academic performance.
     
  • Businesses plan their marketing strategies using the correlation between consumer behavior and discounts.
     
  • Correlation is applied in healthcare to study how sleeping hours or eating habits affect certain diseases.
     
  • In social science, correlation is used to determine the relationship between literacy and the scope of job opportunities.
     
  • Phone manufacturers analyze the impact of mobile usage on battery life.

Problem 1

A café owner wants to analyze if temperature affects cool drinks sales. They collect data for 5 days:

The correlation coefficient is approximately 0.98, which shows that there is a strong positive correlation between the variables.

Explanation

We organize the data provided:
 

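| Day | Temperature (°C) (X) | Cool Drinks Sales (Y) | XY | \(X^2\) | \(Y^2\) |
| --- | --- | --- | --- | --- | --- |
| 1 | 24 | 200 | 4800 | 576 | 40000 |
| 2 | 32 | 300 | 9600 | 1024 | 90000 |
| 3 | 25 | 250 | 6250 | 625 | 62500 |
| 4 | 33 | 350 | 11550 | 1089 | 122500 |
| 5 | 40 | 450 | 18000 | 1600 | 202500 |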


Calculating the sums:

\(∑X = 24 + 32 + 25 +33 +40 = 154 \)

\(\sum Y = 200 + 300 + 250 + 350 + 450 = 1550 \)

\(∑XY= 4800 + 9600 +6250 +11550 +18000 = 50200 \)

\(\sum X^2 = 576 + 1024 + 625 + 1089 + 1600 = 4914 \)

\(∑Y^2 = 40000 + 90000 +62500 +122500 + 202500 = 517500\)

Given that \(n = 5\)

Using Pearson’s Correlation Formula,

 \( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)


Substituting values into the formula:

\(r = \frac{(5 \times 50200) - (154 \times 1550)} {\sqrt{[(5 \times 4914) - (154)^2]\,[(5 \times 517500) - (1550)^2]}} \)

\(= \frac{251000 - 238700}{\sqrt{(24570 - 23716)\,(2587500 - 2402500)}} \)

\(= \frac{12300}{\sqrt{854 \times 185000}} \)

\(= \frac{12300}{\sqrt{157990000}} \)

\(= \frac{12300}{12569.41} \)

\(r \approx 0.98 \)

Here, the resultant value shows that there is a strong positive correlation between the variables. This indicates that cool drink sales tend to rise as the temperature rises.
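
As a cross-check of the arithmetic, the sums and the final value can be recomputed in Python. This is a minimal sketch with variable names chosen for illustration; the data are the five days from the table in this problem.

```python
from math import sqrt

temps = [24, 32, 25, 33, 40]        # X, temperature in °C
sales = [200, 300, 250, 350, 450]   # Y, cool drink sales
n = len(temps)

# The five column sums used by Pearson's formula.
sum_x = sum(temps)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(temps, sales))
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in sales)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 2))   # 0.98
```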

Problem 2

A student wants to see whether reading time is related to exam scores. Data for 4 students is collected:

The resultant value is approximately 0.998, which shows that there is a strong positive correlation between the variables. This indicates that longer reading time is associated with higher test scores.

Explanation

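| Student | Reading hours (X) | Test score (Y) | XY | \(X^2\) | \(Y^2\) |
| --- | --- | --- | --- | --- | --- |
| 1 | 3 | 60 | 180 | 9 | 3600 |
| 2 | 5 | 70 | 350 | 25 | 4900 |
| 3 | 7 | 82 | 574 | 49 | 6724 |
| 4 | 9 | 95 | 855 | 81 | 9025 |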

Calculating the sum:

\(∑X = 3 + 5 +7 +9 = 24\)
 

\(∑Y = 60 + 70 + 82 + 95 = 307 \)
 

\(∑XY= 180 + 350 + 574 + 855 = 1959 \)

\( ∑X^2 = 9 + 25 + 49 + 81 = 164 \)

\( ∑Y^2 = 3600 + 4900 + 6724 + 9025 = 24249\)
 

Given that \(n = 4\)
 

Using Pearson’s Correlation Formula:

\( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)


Substituting values into the formula:
 

\(r = \frac{(4 \times 1959) - (24 \times 307)} {\sqrt{\, [4 \times 164 - 24^2] \, [4 \times 24249 - 307^2]\, }} \)
 

\(= \frac{7836 - 7368}{\sqrt{(656 - 576)\,(96996 - 94249)}} \)


\(= \frac{468}{\sqrt{80 \times 2747}} \)

\(= \frac{468}{\sqrt{219760}} \)

\(= \frac{468}{468.79} \)
 

\(r \approx 0.998\)
 

Here, the resultant value shows that there is a strong positive correlation between the variables. This indicates that longer reading time is associated with higher test scores.
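
Because the numerator (468) and the square root in the denominator are almost equal here, small rounding slips in the square root can change the last digit of r. One way to guard against that, sketched below with Python’s `fractions` module, is to keep \(r^2\) as an exact fraction and take a single square root at the end. The variable names are chosen for illustration.

```python
from fractions import Fraction
from math import sqrt

reading_hours = [3, 5, 7, 9]     # X
test_scores = [60, 70, 82, 95]   # Y
n = len(reading_hours)

sum_x = sum(reading_hours)
sum_y = sum(test_scores)
sum_xy = sum(x * y for x, y in zip(reading_hours, test_scores))
sum_x2 = sum(x * x for x in reading_hours)
sum_y2 = sum(y * y for y in test_scores)

numerator = n * sum_xy - sum_x * sum_y                             # 468
radicand = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)   # 219760

# Exact value of r squared; no rounding has happened yet.
r_squared = Fraction(numerator ** 2, radicand)

# The numerator is positive, so r is the positive square root.
r = sqrt(r_squared)
print(round(r, 3))   # 0.998
```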

Problem 3

Anita records the daily exercise hours and the weight loss of 5 of her friends over a month:

The correlation coefficient is 0.91 which indicates that there is a positive correlation between the variables.

Explanation

We organize the data provided: 
 

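| Friend | Exercise Hours (X) | Weight Loss (kg) (Y) | XY | \(X^2\) | \(Y^2\) |
| --- | --- | --- | --- | --- | --- |
| 1 | 3 | 4 | 12 | 9 | 16 |
| 2 | 2 | 3 | 6 | 4 | 9 |
| 3 | 4 | 5 | 20 | 16 | 25 |
| 4 | 1 | 2 | 2 | 1 | 4 |
| 5 | 5 | 10 | 50 | 25 | 100 |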

Calculating the sums:

\(∑X = 3 + 2 + 4 + 1 + 5 = 15\)

\(∑Y = 4 + 3 + 5 + 2 + 10 = 24\)

\(∑XY= 12 + 6 + 20 + 2 + 50 = 90 \)

\(∑X^2 = 9 + 4 + 16 + 1+ 25 = 55 \)

\(∑Y^2 = 16 + 9 + 25 + 4 + 100 = 154\)

Given that \(n = 5\)

Using Pearson’s Correlation Formula:

\(r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)} {\sqrt{\,\left[n\Sigma x^{2} - (\Sigma x)^{2}\right]\left[n\Sigma y^{2} - (\Sigma y)^{2}\right]\,}} \)

Substituting values into the formula:

\(= \frac{(5 \times 90) - (15 \times 24)} {\sqrt{\, [5 \times 55 - 15^{2}] \,[5 \times 154 - 24^{2}]\,}} \)

\(= \frac{450 - 360}{\sqrt{(275 - 225)\,(770 - 576)}} \)

\(= \frac{90}{\sqrt{50 \times 194}} \)

\(= \frac{90}{\sqrt{9700}} \)

\(= \frac{90}{98.49} \)

r ≈ 0.91
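
The result can be cross-checked with a different but equivalent calculation: convert each variable to z-scores (deviation from the mean divided by the population standard deviation) and average the products. This route is not used in the article; it is shown here only as an independent check, and the helper name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev

exercise = [3, 2, 4, 1, 5]       # X, exercise hours
weight_loss = [4, 3, 5, 2, 10]   # Y, weight loss in kg

def z_scores(values):
    """Standardize values using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

zx = z_scores(exercise)
zy = z_scores(weight_loss)

# Pearson's r equals the average product of paired z-scores.
r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
print(round(r, 2))   # 0.91
```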

FAQs on How to Calculate the Correlation Coefficient

1. How do we interpret what the value of the correlation coefficient indicates?

To interpret a correlation coefficient, compare it with the following scale:

If r is equal to 1: Indicates a perfect positive correlation, where both variables increase together.
If r is equal to -1: Indicates a perfect negative correlation, where one variable decreases as the other increases.
If r is equal to 0: No linear correlation (no linear relationship between the variables).
0 < r < 1: There is a positive correlation between the variables; the closer r is to 1, the stronger the relationship.
-1 < r < 0: There is a negative correlation, which means that as one variable increases the other tends to decrease; the closer r is to -1, the stronger the relationship.
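
The scale above can be turned into a small helper. This is only a sketch: the function name `describe_r` and, in particular, the cut-offs 0.3 and 0.7 for "weak" and "strong" are illustrative assumptions, not values given in this article, and conventions for these labels differ between fields.

```python
def describe_r(r, weak_below=0.3, strong_from=0.7):
    """Describe the direction and rough strength of a correlation coefficient.

    weak_below and strong_from are illustrative assumptions, not standard
    values; adjust them to whatever convention applies in your field.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= strong_from:
        strength = "strong"
    elif size >= weak_below:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

print(describe_r(0.98))   # strong positive correlation
print(describe_r(-0.2))   # weak negative correlation
```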

2. Give one real-life application of correlation.

Correlation helps us in determining the relationship between two variables, such as the effect of sleeping hours on productivity.
 

3. What is the formula for Pearson’s correlation coefficient?

\(r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)} {\sqrt{\,[n\Sigma x^{2} - (\Sigma x)^{2}]\, [n\Sigma y^{2} - (\Sigma y)^{2}]\,}} \)

Where:

  • r is the correlation coefficient.
  • n is the number of data pairs.
  • \(Σx\) is the sum of all values for the first variable.
  • \(Σy\) is the sum of all values for the second variable.
  • \(Σxy\) is the sum of the products of each pair of x and y values.
  • \(Σx^2\) is the sum of the squares of the first variable’s values.
  • \(Σy^2\) is the sum of the squares of the second variable’s values.
     

4. How can we calculate the correlation coefficient in simple steps?

To calculate the correlation coefficient, perform the following steps:

  • List the values X, Y, XY, \(X^2\), and \(Y^2\) in a table.
  • Find the sum of each column.
  • Substitute the sums into the correlation formula.
  • Simplify to obtain the value of r.
  • Interpret the value using the correlation scale.
     

5. Give one limitation of correlation analysis.

The correlation coefficient captures only linear relationships, so it can be close to 0 even when two variables have a strong non-linear relationship.
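
A small made-up example illustrates this limitation. In the sketch below, y is completely determined by x (y = x²), yet Pearson’s formula returns 0 because the relationship is not linear. The data are hypothetical.

```python
from math import sqrt

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # y depends entirely on x, but not linearly

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

numerator = n * sum_xy - sum_x * sum_y   # 5 * 0 - 0 * 10 = 0
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(r)   # 0.0
```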
 

Jaipreet Kour Wazir

About the Author

Jaipreet Kour Wazir is a data wizard with over 5 years of expertise in simplifying complex data concepts. From crunching numbers to crafting insightful visualizations, she turns raw data into compelling stories. Her journey from analytics to education ref
