Correlation in Excel — Complete Guide (2026)

Correlation in Excel using CORREL, PEARSON, and the correlation matrix. Step-by-step formulas, scatter plots, p-values, and Spearman rank in plain English.

Microsoft ExcelBy Katherine LeeMay 26, 202615 min read
Correlation in Excel — Complete Guide (2026)

Correlation in Excel tells you how strongly two columns of numbers move together. A monthly ad budget and monthly sales. Height and shoe size. Hours studied and final grade. If one goes up when the other goes up, they're positively correlated. If one goes up when the other goes down, the correlation is negative. If they wander independently, the correlation hovers near zero. The number Excel calculates — called r, the Pearson product-moment correlation coefficient — always sits somewhere between -1 and +1, and it summarises the linear relationship between two variables in a single decimal.

Here's the good news. You don't need a statistics degree to compute it. Excel ships with two built-in functions that do the maths for you: CORREL and PEARSON. They return the exact same number. For three or more variables at once, the Analysis ToolPak adds a one-click correlation matrix that fills a whole grid of r values in seconds.

This guide walks through every method — the simple functions, the matrix tool, scatter-plot visualisation, p-value significance testing, Spearman rank correlation for non-normal data, and the most common pitfalls that trip people up the first time they try this. By the end you'll know which method to use, when to use it, and how to interpret the result without overstating what correlation actually proves.

One thing to settle up front: correlation is not causation. A high r between two columns means they move together. It does not mean one causes the other. Ice-cream sales and shark attacks both spike in July. The correlation is real. The cause is summer, not the ice cream. We'll come back to this later — but if you take only one idea away from this guide, take that one. For more on data preparation before running correlations, see our Excel reference and the Excel pivot tables primer for summarising data first.

Microsoft Excel - Microsoft Excel certification study resource

How to Compute Correlation in Excel Right Now

Put your first variable in column A, your second variable in column B, same number of rows in each. In an empty cell, type =CORREL(A2:A100, B2:B100) and press Enter. Excel returns a number between -1 and +1. That's your correlation coefficient. Closer to +1 or -1 means a stronger relationship. Closer to 0 means weaker. That's it — the rest of this guide is interpretation, edge cases, and the matrix view for multiple variables.

How to Read the r Value

📈+1.0Perfect positive linear relationship
🔼+0.7Strong positive correlation
↗️+0.3Weak positive correlation
0No linear relationship
↘️-0.3Weak negative correlation
🔽-0.7Strong negative correlation

Two Functions, One Result

🧮CORRELMost Used

=CORREL(array1, array2) returns the Pearson r between two equal-length ranges. Most common function, shortest name, what you'll see in 90% of Excel tutorials and templates.

📐PEARSONIdentical Output

=PEARSON(array1, array2) returns the same number as CORREL. Excel keeps both for backwards compatibility. Identical output to ten decimal places — use whichever name reads more clearly in your spreadsheet.

📊Analysis ToolPakMatrix View

Data → Data Analysis → Correlation. Lets you select multiple columns at once and produces a full pairwise matrix. Requires enabling the ToolPak add-in first (File → Options → Add-ins).

🎯RSQVariance

=RSQ(known_y, known_x) returns r-squared (the coefficient of determination), which is just r multiplied by itself. Tells you how much variance in y is explained by x. Used a lot in regression contexts.

Let's walk through CORREL with a real example. Suppose column A holds twelve monthly advertising budgets — A2 through A13 — and column B holds the matching monthly revenue figures in the same rows. In any empty cell type =CORREL(A2:A13, B2:B13). Hit Enter.

Excel scans both ranges, calculates the means, computes the covariance, divides by the product of standard deviations, and spits out a single number. Say you get 0.84. That tells you ad spend and revenue move together strongly — when one rises, the other tends to rise too. The relationship is positive (both move the same direction) and strong (close to +1).

Now swap in PEARSON: =PEARSON(A2:A13, B2:B13). Same 0.84. Microsoft documented these functions identically — both implement the Pearson product-moment correlation formula. The only difference is the name. Use whichever feels more readable in your sheet. Most analysts default to CORREL because it's shorter and the name matches the everyday word.

What if your two columns are different lengths? Excel returns #N/A. The arrays must contain the same number of rows, and the rows must line up — row 5 in column A pairs with row 5 in column B. If one column has a blank cell, Excel ignores that whole pair (both the cell and its partner in the other column drop out of the calculation). This is usually what you want.

But it's worth knowing, because a column with stray blanks can produce a correlation based on fewer data points than you expected. Quick check: =COUNT(A2:A13) and =COUNT(B2:B13) should match — and should match the number of pairs you think you're analysing. For a refresher on counting cells, our COUNTIFS Excel guide covers conditional counting in detail.

Here's a useful interpretation framework. Values between 0.7 and 1.0 (or -0.7 and -1.0) signal a strong linear relationship. Between 0.3 and 0.7 — moderate. Below 0.3 — weak. Below 0.1 — basically noise. These thresholds aren't carved in stone (a 0.4 correlation in physics might be huge; a 0.4 in social science is normal) but they give you a starting frame. Always pair the number with a scatter plot before drawing conclusions. A scatter plot reveals outliers, non-linear shapes, and clusters that the single r value can hide entirely.

Building a Correlation Matrix in Excel

File → Options → Add-ins → Manage: Excel Add-ins → Go → check Analysis ToolPak → OK. A new Data Analysis button appears on the Data tab, far right. One-time setup per Excel install.

Excellence Playa Mujeres - Microsoft Excel certification study resource

From Raw Data to Full Matrix in Under a Minute

  • Lay your variables out as columns with one header row
  • Select the entire block including the header row
  • Go to Data → Data Analysis → Correlation
  • Tick 'Labels in first row' so headers carry through
  • Pick an output cell on the same sheet or a new one
  • Click OK — Excel writes the full matrix in one shot
  • Add a red-white-green colour scale via conditional formatting
  • Result: an instant heat map of every pairwise r value

Now to the matrix itself. A correlation matrix shows every pairwise r between three or more variables. Picture five columns of survey data: Age, Income, Education Years, Hours Worked, Job Satisfaction. The matrix is a 5×5 grid. Each cell answers a single question: "how strongly do these two variables move together?" The diagonal — Age vs Age, Income vs Income — is always exactly 1.0, since any variable is perfectly correlated with itself. The upper-right and lower-left triangles are mirror images, so the ToolPak shows only one half to save space.

The matrix is invaluable for spotting patterns fast. You might see Income and Education at r = 0.62 (strong link), Income and Age at r = 0.31 (modest link), and Hours Worked and Job Satisfaction at r = -0.18 (weak negative — people working more hours report slightly less satisfaction). Three insights in one glance. Without the matrix, you'd write ten separate CORREL formulas to get the same picture. With it, one click does the lot.

Reading tip: don't just chase the biggest absolute number. A correlation of 0.95 between two columns that essentially measure the same thing (revenue and gross sales, for example) is uninformative — of course they correlate. The interesting cells are moderate-strength correlations between variables you didn't expect to be linked. Those are the ones worth investigating with a scatter plot and, eventually, a proper hypothesis test. For richer data exploration, combining a matrix with Excel pivot tables often surfaces patterns that neither tool reveals alone.

A practical warning. The ToolPak's Correlation tool needs contiguous columns. If your variables aren't side by side, copy them into a fresh sheet first or you'll wrestle with the input range selector. And if any variable has zero variance (a column where every value is identical), Excel returns #DIV/0! for every cell in that row and column — there's no "movement" to correlate with anything else. Either drop the constant column or accept that it's not part of the analysis.

One more nuance worth flagging. The ToolPak gives you raw r values but no p-values. That means you see the strength of each relationship but not whether each one is statistically significant for your sample size. For a small dataset (say 10 rows), an r of 0.5 might not be statistically significant. For a large dataset (say 1,000 rows), an r of 0.1 might be highly significant. We'll cover p-value calculation later in this guide.

Pre-Flight Checks Before Running CORREL

  • Both columns contain numbers only — no text, no error values, no #N/A
  • Both ranges are the same length (=COUNT each range and compare)
  • Rows are paired correctly — A5 corresponds to B5, not B7
  • Neither column has zero variance (all identical values)
  • You've eyeballed a scatter plot to check for outliers and non-linear shapes
  • The relationship makes theoretical sense — not just a coincidence
  • Sample size is reasonable (at least 10 paired observations for meaningful r)
  • You're not measuring two columns of essentially the same data

A scatter plot is the single most useful companion to a correlation coefficient. The r value alone can deceive. A scatter plot can't. Highlight your two columns. Insert → Charts → Scatter (the first option with just dots). Excel drops a chart with one variable on the x-axis and the other on the y-axis. If the dots roughly form a rising line, you have a positive correlation. Falling line — negative. Random cloud — near-zero. Clear non-linear curve (a U-shape or wave) — the r value is meaningless, because Pearson correlation only measures linear relationships.

Here's the classic gotcha. Anscombe's quartet — a famous statistics example — shows four wildly different scatter plots that all share exactly the same correlation coefficient (r = 0.816). One is a clean linear trend. One is a curve. One is a line with a single huge outlier. One has all the action happening on one x-value with one extreme outlier dragging the r up. Same number, four entirely different stories. Don't trust the coefficient until you've looked at the dots. Always.

Add a trendline to make the linear fit visible. Click on any data point in the chart. Right-click → Add Trendline → Linear → tick "Display Equation on chart" and "Display R-squared value on chart". Excel draws the best-fit line and prints both the equation (y = mx + b form) and the R² value. R² is the correlation squared — so if r = 0.84, R² = 0.71. That R² tells you roughly 71% of the variation in y is explained by x. The remaining 29% is everything else (noise, other variables, measurement error).

R² is more interpretable than r for many audiences. "71% of variance explained" lands easier than "r equals 0.84". They're the same information in different clothes. Use whichever your reader will grasp faster. If you're presenting to a non-technical audience, R² as a percentage often works better. If you're publishing in a statistical context, report r with the sample size and p-value alongside.

Excel Spreadsheet - Microsoft Excel certification study resource

Significance Testing — Is the Correlation Real?

A correlation of 0.5 from 6 data points could easily happen by chance. The same 0.5 from 600 points almost certainly isn't random. The p-value formalises this — it's the probability of seeing a correlation this strong if there's actually no real relationship. Lower p-value = stronger evidence the correlation is real.

Spearman rank correlation is the non-parametric cousin of Pearson. Instead of working on the raw numbers, it ranks each column (1 for smallest, 2 for next smallest, and so on) and then runs the standard Pearson correlation on the ranks. The output — usually written as ρ (rho) or r_s — still ranges from -1 to +1 and is interpreted the same way.

But because it operates on ranks, it doesn't care about the shape of the distribution. Outliers stop being outliers once they're ranks. Non-linear-but-monotonic relationships (e.g. exponential growth — y always rises with x but the rate of rise accelerates) get captured properly by Spearman where Pearson under-states them.

Excel has no dedicated Spearman function, but you can build it in two cells. First, rank both columns. In C2: =RANK.AVG(A2, A:A, 1) and copy down. Same for D2 with column B. The third argument (1) means ascending — smallest gets rank 1. Now compute Pearson on the rank columns: =CORREL(C2:C100, D2:D100). That's your Spearman ρ. The whole exercise takes about thirty seconds once the data is laid out.

When should you reach for Spearman? Three scenarios. First, ordinal data — survey responses on a 1-to-5 scale, customer-satisfaction ratings, finishing positions in a race. Pearson assumes the gap between 4 and 5 is the same as the gap between 1 and 2, which often isn't true for ratings. Ranks side-step the problem. Second, heavy outliers — one billionaire in a sample of incomes can swing Pearson r dramatically; Spearman barely budges. Third, monotonic-but-non-linear curves — exponential, logarithmic, or saturation-shaped data where the direction is consistent but the rate of change varies. Spearman picks these up. Pearson loses them.

Worth knowing — RANK.AVG averages tied values rather than skipping. If two values are tied for 3rd place, both get rank 3.5 (average of 3 and 4) and the next value gets rank 5. RANK.EQ gives both ties the same rank (3) and the next value gets rank 5 — same end result for tied positions but ranks 3 and 4 are not used. RANK.AVG is the correct choice for Spearman, because Spearman's formula assumes ranks are averaged on ties. The classic =RANK() function (without the suffix) behaves like RANK.EQ but is kept only for backwards compatibility — use RANK.AVG.

Common Pitfalls to Avoid

🚫Causation ConfusionMost Common

A high r tells you two columns move together. It does not tell you why. Could be a third hidden variable driving both. Test on new data before believing the link.

⚠️Outliers Dragging rVisual Check

One extreme dot can swing the coefficient hard in either direction. Always plot the scatter first. Investigate outliers before — not after — computing r.

🔍Range RestrictionHidden Trap

Narrow sample of either variable shrinks the observed r relative to the true population value. Classic case: SAT vs college GPA at a single elite school.

A few common pitfalls deserve their own paragraph. The biggest is treating correlation as proof. People see r = 0.8 and reach for a conclusion. Resist. A high r tells you two things move together. It does not tell you why. The two columns might be driven by a third hidden variable. They might both respond to seasonality, inflation, demographics, or a holiday calendar. Test the relationship with new data before believing it.

Outliers are the second pitfall. A single extreme point can drag r in either direction. Plot first. If you see one dot floating away from the cloud, investigate it. Is it a data-entry typo? A legitimate but rare event? A different population accidentally mixed in? Decide on the outlier before computing the coefficient — not after. Removing outliers post-hoc to chase a higher r is the data-analysis equivalent of cheating.

The third pitfall is range restriction. If your two variables are only measured over a narrow slice of their actual range, correlation will look weaker than it really is. Classic case: SAT scores and college GPA at a single elite university. Both variables are truncated at the top end (only high scorers got in) so the observed correlation is much smaller than it would be in the full population of students. The relationship is real; the restricted sample hides it.

Finally, dynamic arrays in Excel 365 let you build correlation analyses that update themselves. Drop your data in a Table, write CORREL against the Table columns, and any new rows you add flow through automatically. No range fiddling. No copy-paste. The combination of Tables plus modern functions makes Excel a surprisingly capable statistical environment for routine analysis — not a replacement for R or Python, but more than enough for the day-to-day correlation checks most analysts actually need.

One last note on workflow. Many people compute a single correlation, see a number they like, and stop. Better practice is to compute the same correlation on subsets of your data — split by quarter, by region, by product line — and check whether the relationship holds across all of them. If r is 0.7 overall but drops to 0.1 in one subset and rises to 0.9 in another, the headline number is hiding a real story. That kind of stratified analysis turns a single coefficient into a small portfolio of insights.

Pearson vs Spearman

Use Pearson When
  • +Both variables are continuous and roughly normally distributed
  • +The relationship appears linear in the scatter plot
  • +There are no extreme outliers distorting the data
  • +You need standard parametric statistics (regression, ANOVA)
  • +Sample size is moderate to large (n ≥ 30)
  • +You're working with measurements like height, weight, temperature, currency
Use Spearman When
  • Data is ordinal — survey ratings, ranks, ordered categories
  • There are outliers you can't justify removing
  • The scatter plot shows a monotonic but non-linear curve
  • Distributions are heavily skewed or non-normal
  • Sample size is small (n < 20) and you can't verify normality
  • You want robustness over precision

Practice What You've Learned

FREE Excel Questions and Answers

60-question practice test covering formulas, functions, and core Excel skills.

Excel Formulas and Functions Practice

Drill the most common Excel functions including CORREL, PEARSON, and SUMIF.

Advanced Data Analysis Tools

Test your knowledge of ToolPak, pivot tables, and statistical functions.

Excel Questions and Answers

More Excel Guides

About the Author

Katherine LeeMBA, CPA, PHR, PMP

Business Consultant & Professional Certification Advisor

Wharton School, University of Pennsylvania

Katherine Lee earned her MBA from the Wharton School at the University of Pennsylvania and holds CPA, PHR, and PMP certifications. With a background spanning corporate finance, human resources, and project management, she has coached professionals preparing for CPA, CMA, PHR/SPHR, PMP, and financial services licensing exams.