Correlation Coefficient Calculator

Compute Pearson's r correlation coefficient and coefficient of determination R² from any set of paired (x, y) data. Get an instant strength label and direction.

Enter one x, y pair per line (comma or space separated). Example: 1, 3

Example result (for the five-pair study-hours dataset used in the worked example below): Pearson's r = +0.9972 (Strong Positive); R² = 0.9944; n = 5; x̄ = 6.0000; ȳ = 72.0000.

r ranges from −1 (perfect negative) to +1 (perfect positive). Values near 0 suggest little linear relationship.

Pearson Correlation Coefficient: Definition, Formula, and Interpretation

The correlation coefficient is one of the most widely used tools in statistics for describing the linear relationship between two variables. Named after English mathematician Karl Pearson, who formalized it in the late 19th century, Pearson's r quantifies both the strength and direction of a linear association. A single number between −1 and +1 summarizes how closely two variables move together, making it indispensable in fields from psychology and economics to biology and engineering.

Unlike covariance, which depends on the units of measurement, Pearson's r is dimensionless and scale-free. Whether you are comparing heights in centimeters with weights in kilograms, or exam scores with hours of study, r provides a directly interpretable and comparable measure.

The Formula

Pearson's r is calculated as: r = Σ((xi − x̄)(yi − ȳ)) / √(Σ(xi − x̄)² × Σ(yi − ȳ)²). Here, xi and yi are individual data values, x̄ and ȳ are the sample means, and the summations run over all n data pairs. The numerator captures how much x and y co-vary (their joint deviations from their respective means), while the denominator normalizes by the product of their individual spreads.
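The deviation form of the formula translates directly into code. A minimal pure-Python sketch (`pearson_r` is an illustrative name, not a library function):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the deviation form of the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of joint deviations from the means.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the individual spreads.
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs) *
                    sum((y - mean_y) ** 2 for y in ys))
    return num / den

# Study hours vs. exam scores from the worked example below:
r = pearson_r([2, 4, 6, 8, 10], [50, 60, 70, 85, 95])  # ≈ 0.9972
```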

Equivalently, r can be written as the covariance of x and y divided by the product of their standard deviations: r = Cov(x, y) / (σx × σy). This form makes the scale-independence of r explicit: dividing by the standard deviations removes the effect of units, leaving only the correlation structure.

The coefficient of determination R² = r² gives the proportion of the variance in one variable that is statistically explained by a linear relationship with the other. For example, r = 0.8 yields R² = 0.64, meaning 64% of the variability in y can be attributed to its linear relationship with x under this model.

Interpreting the Value of r

r = +1 indicates a perfect positive linear relationship: every data point lies exactly on a straight line with positive slope, so y increases at a constant rate as x increases. r = −1 indicates a perfect negative linear relationship: all points lie exactly on a straight line with negative slope, and y decreases at a constant rate as x increases. r = 0 means no linear relationship exists between the two variables, though a nonlinear relationship may still be present.

In practice, values are rarely exactly 0 or ±1. A common convention classifies |r| ≥ 0.7 as strong, 0.4 ≤ |r| < 0.7 as moderate, and 0 < |r| < 0.4 as weak. These thresholds are guidelines, and the appropriate interpretation depends on context. In social sciences, r = 0.3 may be considered meaningful; in physics or engineering, only values near ±1 may be practically significant.
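These conventional cutoffs can be captured in a small helper. This is an illustrative sketch of the guideline above, not any standard API:

```python
def strength_label(r):
    """Map r to a conventional strength/direction label.
    The 0.4 and 0.7 cutoffs are guidelines, not absolute rules."""
    magnitude = abs(r)
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.4:
        strength = "moderate"
    elif magnitude > 0:
        strength = "weak"
    else:
        return "none"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(strength_label(0.9972))  # strong positive
```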

It is important to note that r measures only linear association. Two datasets with r = 0 can still have a strong nonlinear relationship (for example, a perfect quadratic or sinusoidal pattern). Visualizing data with a scatter plot alongside the numerical r value is always recommended.

Assumptions and Limitations

Pearson's r assumes that the relationship between the two variables is linear, that the data are measured on interval or ratio scales, and that there are no extreme outliers that would disproportionately influence the result. Significance tests on r additionally assume approximate bivariate normality. Outliers can dramatically inflate or deflate r, potentially producing a misleading picture of the relationship. Checking a scatter plot for outliers before interpreting r is considered good practice.

Another important caveat: correlation does not imply causation. Even a strong r value does not indicate that changes in one variable cause changes in the other. A confounding variable — a third factor affecting both — can produce high correlation without any direct causal link. The classic example is the positive correlation between ice cream sales and drowning rates, both driven by summer heat rather than by each other.

When the normality assumption is violated or when data contain ordinal-scale variables, Spearman's rank correlation coefficient is often preferred over Pearson's r. Spearman's rho measures monotonic relationships and is more robust to outliers and non-normal distributions.

Worked Example

Suppose five students' hours of study (x) and exam scores (y) are: (2, 50), (4, 60), (6, 70), (8, 85), (10, 95). The means are x̄ = 6 and ȳ = 72. The deviations from the mean for x are −4, −2, 0, 2, 4 and for y are −22, −12, −2, 13, 23. The cross-products sum to 88 + 24 + 0 + 26 + 92 = 230. The sum of squared x-deviations is 40 and the sum of squared y-deviations is 1330.

Pearson's r = 230 / √(40 × 1330) = 230 / √53200 ≈ 0.997. This very high positive r indicates that study hours and exam scores are nearly perfectly linearly related in this sample. R² ≈ 0.994, suggesting that roughly 99.4% of the variance in exam scores is linearly explained by hours of study in this dataset.
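The hand calculation above can be checked step by step in a few lines:

```python
import math

x = [2, 4, 6, 8, 10]   # hours of study
y = [50, 60, 70, 85, 95]  # exam scores
n = len(x)
mean_x = sum(x) / n  # 6
mean_y = sum(y) / n  # 72

# Sum of cross-products and sums of squared deviations.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # 230
sxx = sum((a - mean_x) ** 2 for a in x)                       # 40
syy = sum((b - mean_y) ** 2 for b in y)                       # 1330

r = sxy / math.sqrt(sxx * syy)
print(round(r, 4), round(r * r, 4))  # 0.9972 0.9944
```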

This result illustrates how r captures a clear upward trend. In real-world datasets, results are rarely this clean — variability from measurement error, confounding factors, and natural variation typically produce r values well below 1 in magnitude.

Applications Across Fields

In psychology and social sciences, Pearson's r is ubiquitous. Researchers use it to examine relationships between test scores, to validate psychometric instruments, and to explore relationships between demographic factors and outcomes. In clinical research, r can describe the association between a biomarker and a disease severity measure.

In finance and economics, correlation is used to describe the co-movement of asset returns, measure diversification benefits in portfolios, and analyze economic indicators. A high positive correlation between two assets means they tend to rise and fall together, offering little diversification; low or negative correlation is prized for risk management.

In the natural sciences, r is routinely applied to calibration curves, to assess genetic associations between traits, and in environmental monitoring. In machine learning, correlation analysis is a common feature selection step — highly correlated features may be redundant and candidates for removal.

Statistical Significance of r

A nonzero r in a sample does not necessarily mean a true linear relationship exists in the population — it could arise from sampling variability. To assess whether the observed r is statistically significant, researchers use a t-test: t = r × √(n − 2) / √(1 − r²), which follows a t-distribution with n − 2 degrees of freedom under the null hypothesis that the population correlation is zero.
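The t statistic follows directly from r and n. A minimal sketch (`correlation_t_stat` is a hypothetical helper name; the critical value quoted in the comment is the standard two-sided 5% threshold for 3 degrees of freedom):

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: population correlation is zero,
    with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Worked example: r ≈ 0.9972 with n = 5 gives a very large t (≈ 23.1).
# Compare against the critical value of t with 3 degrees of freedom
# (about 3.182 at the two-sided 5% level) to judge significance.
t = correlation_t_stat(0.9972, 5)
```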

For small samples, even a fairly large |r| may not be statistically significant. With very large samples (n = 10,000), a tiny r = 0.03 can be statistically significant while explaining only 0.09% of variance — negligible in any practical context. Both statistical significance and effect size (r or R²) should be reported and interpreted together.

Frequently Asked Questions

What is the Pearson correlation coefficient?

Pearson's r is a number between −1 and +1 that measures the strength and direction of the linear relationship between two continuous variables. A value of +1 indicates a perfect positive linear relationship, −1 a perfect negative one, and 0 indicates no linear relationship.

What is a strong, moderate, or weak correlation?

A widely used guideline classifies |r| ≥ 0.7 as strong, 0.4 ≤ |r| < 0.7 as moderate, and 0 < |r| < 0.4 as weak. These are conventions, not absolute rules — what counts as meaningful depends on the field and context of the study.

What is R² and how does it differ from r?

R² (the coefficient of determination) equals r squared. While r indicates direction and strength of a linear relationship, R² represents the proportion of variance in one variable that can be linearly explained by the other. For example, r = 0.8 gives R² = 0.64, meaning 64% of the variance in y is explained by its linear relationship with x.

Can Pearson's r detect nonlinear relationships?

No. Pearson's r measures only linear associations. Two variables can have a strong nonlinear relationship (such as quadratic or exponential) while r remains close to 0. Always visualize data with a scatter plot alongside r to check for nonlinear patterns.

Does a high correlation mean causation?

No. A high r value indicates that the two variables tend to move together linearly, but it does not establish that one causes the other. A confounding variable (a third factor influencing both) can produce high correlation without any direct causal relationship. Establishing causation requires controlled experimental design or rigorous causal inference methods.

How many data points are needed for a meaningful correlation?

There is no hard minimum, but with very few data points (fewer than 5 to 10), the estimate of r is highly unstable and may not reflect the true population correlation. Statistical significance testing can help: with small n, even large observed r may not be significant. As a practical guide, at least 20 to 30 pairs are commonly recommended for a reasonably stable estimate.