Excel linear regression is one of the most practical statistical techniques you can run inside a spreadsheet, and it answers a deceptively simple question: how does one variable change when another variable changes? Whether you are forecasting sales from advertising spend, modeling exam scores against study hours, or projecting energy use based on temperature, Excel gives you at least five different ways to fit a straight line through your data. Each approach produces the same underlying slope and intercept, but the path you choose changes how much detail you get back about model quality, residuals, and confidence.
The classic equation behind every linear regression in Excel is y = mx + b, where m is the slope, b is the y-intercept, x is your independent variable, and y is the value you are trying to predict. Excel computes m and b using ordinary least squares, which minimizes the sum of squared vertical distances between each observed data point and the fitted line. This is the same math taught in introductory statistics courses, just wrapped inside fast worksheet functions like SLOPE, INTERCEPT, LINEST, FORECAST.LINEAR, and TREND that return results in milliseconds.
Beginners usually start with a scatter chart and a trendline because it is visual and requires no formulas. Intermediate users move to SLOPE and INTERCEPT for quick numeric output, and analysts who need R-squared, standard errors, and F-statistics turn to LINEST as an array formula or to the Data Analysis ToolPak. Each tool has tradeoffs in transparency, recalculation behavior, and how the output looks on the page, and learning when to use which one is what separates a casual spreadsheet user from a confident data modeler.
This guide walks through every method in order of complexity. You will see exactly which cells to click, which formulas to type, and how to read the numbers that come back. We cover single-variable regression first because it is easier to visualize, then expand into multiple regression where LINEST really earns its keep. Along the way we highlight common mistakes, such as forgetting to enter LINEST as an array in older Excel versions or misinterpreting a high R-squared value as proof of causation.
You do not need to install any add-ins to follow the worksheet-function methods, and most examples use sample data sets of fewer than fifty rows so you can recreate every screenshot in under five minutes. If you want fuller diagnostic output with residual plots and normal probability charts, we walk through enabling the Data Analysis ToolPak, which ships free with every desktop version of Excel and unlocks a guided regression dialog. Pair that with a standard deviation formula in Excel to fully describe the spread of your residuals.
By the end you will be able to fit a line, read the coefficients, compute predicted values, evaluate fit quality through R-squared and standard error, and present results in a chart that is ready to drop into a report. We also touch on extensions like polynomial trendlines and logarithmic transformations for cases where a straight line is not the right shape, plus the FORECAST.ETS function for time-series projections.
Linear regression is a foundational skill for analysts, students, scientists, and operations managers, and Excel happens to be the fastest environment in which to learn it. Once the workflow clicks, you will reach for it constantly to test hunches, set baselines, and communicate trends to colleagues who would never open a Python notebook. Let us start with the numbers behind the method.
Right-click any data series on a scatter chart and add a linear trendline. Display equation and R-squared on the chart for instant visual results without any formulas, perfect for presentations and quick exploration.
Two simple functions that return the regression line's slope and y-intercept as single numbers. Best when you only need m and b for a manual y = mx + b prediction and want lightweight, transparent formulas.
Returns slope, intercept, R-squared, standard errors, F-statistic, and residual sums in a single 5-row block. The professional choice when you need full regression diagnostics inside the worksheet itself.
Predict y-values for new x-inputs using the same least-squares math. FORECAST handles single predictions; TREND fills an entire range of fitted or projected values along an existing or new x-axis range.
Built-in add-in that opens a guided dialog producing ANOVA tables, coefficient confidence intervals, residual output, and normal probability plots. The closest Excel gets to dedicated statistical software like SPSS or R.
The fastest way to compute a regression line in Excel is the SLOPE and INTERCEPT pair. SLOPE(known_ys, known_xs) returns m, and INTERCEPT(known_ys, known_xs) returns b. Notice that the y-values come first in both functions, which trips up many first-time users who instinctively type x before y. Once you have m and b, you predict any new value with =m*new_x + b, or you can wrap it inside FORECAST.LINEAR(new_x, known_ys, known_xs) for a one-step calculation that hides the intermediate slope and intercept.
For richer output the LINEST function is the workhorse. Its syntax is LINEST(known_ys, known_xs, [const], [stats]). When you set the fourth argument to TRUE and enter the formula as an array across a 5-row by 2-column range, Excel returns slope and intercept in the first row, their standard errors in the second, R-squared and the standard error of the y-estimate in the third, the F-statistic and degrees of freedom in the fourth, and the regression and residual sums of squares in the fifth.
In Microsoft 365 and Excel 2021, LINEST spills automatically into adjacent cells, so you simply press Enter after typing the formula. In Excel 2019 and earlier, you must first select the full 5-by-2 range, type the formula, and confirm with Ctrl+Shift+Enter. Skipping that combination is the single most common reason users see only the slope value and conclude that LINEST is broken. The output is identical either way once entered correctly, and reading the third-row R-squared value is usually your first stop.
To make the LINEST output readable, label each cell. Put text headers like Slope, Intercept, SE Slope, SE Intercept, R-Squared, SE y-estimate, F-stat, df, SS reg, and SS res in an adjacent column or row. This turns a cryptic 10-number block into a self-documenting mini report you can copy into a dashboard or hand to a stakeholder without further explanation. Combine it with conditional formatting to flag low R-squared values in red and you have an at-a-glance fit indicator.
Many analysts also build helper columns for predicted y, residuals, and squared residuals. Predicted y is simply =slope*x + intercept, residual is =actual_y - predicted_y, and squared residual is the residual raised to the second power. Summing the squared residuals should match the SS res value LINEST returns, which is a useful sanity check that proves your manual model and Excel's internal calculation agree. These helper columns also feed directly into residual plots, which reveal whether the linear assumption holds.
If you prefer to skip formulas, the Insert Chart route is even faster. Create a scatter plot from two columns, right-click any data marker, choose Add Trendline, pick Linear, and tick the boxes for Display Equation on chart and Display R-squared value on chart.
The equation appears in y = mx + b form with full precision, and the R-squared number floats next to it. You can do this in under thirty seconds, which is why it is the most common technique in client-facing presentations. Combine this with techniques from our how to add a filter in Excel guide to slice data first.
For longer datasets, double-check that your x and y ranges are exactly the same length and contain no blank rows or text values. Excel silently ignores text in numeric ranges, which can shorten the effective sample size and skew results without warning. Use COUNT, COUNTA, and ISNUMBER checks before regressing, and if you see any #N/A or #DIV/0! errors in your LINEST output it almost always means there is a non-numeric cell hiding somewhere in your input range that needs to be cleaned up.
The trendline method is the most visual and beginner friendly approach in Excel. Start by selecting your two columns of data and inserting a Scatter chart from the Insert ribbon. Once the chart appears, right-click any data point, choose Add Trendline, and select Linear from the options panel that opens on the right. Excel instantly draws the best-fit line through your data using the same ordinary least squares math under the hood.
At the bottom of the trendline panel, tick Display Equation on chart and Display R-squared value on chart. The equation appears in the familiar y = mx + b format and R-squared sits next to it. This method is perfect for presentations because the audience sees both the visual fit and the math at once. The drawback is that you cannot easily extract individual coefficients into other cells for downstream calculations or sensitivity analysis.
The function method uses worksheet formulas to extract regression numbers into cells where you can reference them in further calculations. Type =SLOPE(B2:B50, A2:A50) to get m and =INTERCEPT(B2:B50, A2:A50) to get b, remembering that y always comes before x. For a single prediction, =FORECAST.LINEAR(new_x, ys, xs) wraps both calculations into one cell that updates whenever your data changes.
For full diagnostics use LINEST entered as an array across a 5-row by 2-column range. Press Ctrl+Shift+Enter in older Excel versions or simply Enter in Microsoft 365 where dynamic arrays spill automatically. This returns slope, intercept, both standard errors, R-squared, standard error of estimate, F-statistic, degrees of freedom, and the regression and residual sums of squares all at once, giving you the same numbers as professional statistical software.
The Data Analysis ToolPak is a free add-in that ships with Excel but is disabled by default. Enable it through File, Options, Add-Ins, then click Go next to Manage Excel Add-ins and check Analysis ToolPak. A new Data Analysis button appears on the Data ribbon. Click it, choose Regression from the list, and a guided dialog asks for your input Y range, input X range, and output destination.
Tick Labels if your ranges include headers, set a confidence level of 95%, and optionally request Residuals, Standardized Residuals, Residual Plots, Line Fit Plots, and Normal Probability Plots. The ToolPak produces a comprehensive output that includes regression statistics, ANOVA table, coefficient table with confidence intervals, and chart outputs. This is the closest Excel comes to dedicated statistical software and is the right choice for formal analysis and reporting.
R-squared measures how much variance in y is explained by your regression line, but a value of 0.95 does not prove x causes y. Always pair statistical fit with domain knowledge, check residual plots for hidden patterns, and confirm the relationship makes physical or business sense before drawing conclusions.
Interpreting the output of Excel linear regression begins with R-squared, which ranges from zero to one and tells you the proportion of variance in y explained by your model. An R-squared of 0.85 means 85 percent of the variation in your dependent variable is accounted for by changes in the independent variable, leaving 15 percent unexplained. Higher is generally better, but acceptable thresholds vary by field. Physics experiments often demand 0.99 or above, marketing analytics may accept 0.40, and behavioral science is happy with 0.20 because human behavior is inherently noisier than mechanical systems.
Standard error of the estimate, returned in row three column two of LINEST, measures the average distance between observed y-values and the regression line in the same units as y. If you are predicting sales in dollars, a standard error of 1,200 means typical predictions miss by about $1,200. This is often more useful than R-squared because it is in real-world units rather than an abstract proportion, and it gives stakeholders an immediate sense of model precision in language they already understand without statistical training.
The F-statistic in row four tests whether the overall regression is significant. Compare it against a critical F-value or compute the p-value with =F.DIST.RT(F_stat, df_reg, df_res). A p-value below 0.05 means the model explains significantly more variance than a flat line through the mean of y. For simple regression with one predictor, this F-test gives the same result as testing whether the slope coefficient is significantly different from zero, which you can also verify by dividing slope by its standard error to get a t-statistic.
Coefficient p-values are not returned directly by LINEST but are easy to compute. Take slope divided by its standard error to get t, then use =T.DIST.2T(ABS(t), df_res) to get the two-tailed p-value. A p-value under 0.05 means the slope is statistically distinct from zero at the 95 percent confidence level. The Data Analysis ToolPak does this calculation automatically and prints it in the coefficient table alongside lower and upper 95 percent confidence intervals, which is one reason the ToolPak is worth enabling for formal reports.
Residual analysis is the next layer of model validation that many spreadsheet users skip. Calculate residuals as actual y minus predicted y, then plot residuals against x or against predicted y. A good linear model produces residuals that scatter randomly around zero with constant variance. Patterns like a curve, a fan shape, or systematic over-prediction at certain x-values indicate the linear assumption is violated and you should consider transforming variables, adding polynomial terms, or moving to a non-linear model entirely.
Confidence intervals around predictions are tighter than prediction intervals around individual observations, and both widen as x moves further from the mean of your training data. This is why extrapolating far outside your observed x-range is risky. A regression fit on study hours between 1 and 10 should not be used to predict exam scores for someone studying 40 hours because you have no data to validate the line still applies. Always note the valid x-range alongside any prediction you publish to stakeholders.
Finally, watch out for multicollinearity in multiple regression. When two predictors are highly correlated with each other, their individual coefficients become unstable and their standard errors balloon, even though the overall model R-squared looks fine. Compute correlations between predictors before regressing and drop or combine variables with correlations above 0.8. Excel's CORREL function or a full correlation matrix from the ToolPak makes this check quick and prevents one of the most common interpretation mistakes in business regression analysis.
Multiple linear regression in Excel extends the same LINEST function to handle several predictor variables at once. The syntax becomes LINEST(known_ys, known_xs_range, TRUE, TRUE), where known_xs_range is a multi-column block containing each independent variable in its own column. Excel returns slopes in reverse order from right to left across the first row, followed by the intercept in the last position. This reversed ordering catches almost every newcomer off guard, so always label your output cells explicitly with the variable name each coefficient corresponds to.
For example, if you regress sales against advertising spend in column A, store visits in column B, and email opens in column C, the LINEST formula =LINEST(D2:D100, A2:C100, TRUE, TRUE) returns coefficients in the order: email-opens slope, store-visits slope, advertising-spend slope, intercept. Reading this output without labels invites disaster. A common practice is to place the column header names above the corresponding output cells in reverse order so the entire regression is self-documenting and copy-paste safe across worksheets and across team members.
The TREND function complements LINEST by producing fitted or forecast values directly. =TREND(known_ys, known_xs, new_xs) returns predicted y-values for any list of new x inputs you supply, using the same multiple regression coefficients calculated internally. This is much more efficient than computing predictions one cell at a time with manual slope and intercept references. TREND works for both simple and multiple regression depending on whether your known_xs is one column or many, and it spills automatically in modern Excel versions for clean output.
FORECAST.LINEAR is the single-variable cousin of TREND and is best suited for simple regression with one predictor. For time-series forecasting where seasonality matters, switch to FORECAST.ETS, which uses exponential smoothing rather than linear regression and accounts for repeating cycles like monthly sales patterns or daily web traffic rhythms. Excel also offers Forecast Sheet on the Data ribbon, which builds a full chart with confidence bands in two clicks, though it is not appropriate for non-time-series regression problems.
When relationships are not linear, transform your variables before regressing. Take the natural log of y with LN() to handle exponential growth, square x to fit a quadratic pattern, or use 1/x for inverse relationships. Run LINEST on the transformed data and back-transform predictions afterward.
You can also add a polynomial trendline to a chart and display its equation, which gives you the same effect visually without writing transformation formulas. Compare R-squared values across model forms to pick the best fit, but always verify with residual plots that the chosen form is appropriate. For sorting your residuals or filtering by error magnitude, see our guide to how to create a filter in Excel.
Dummy variables let you include categorical predictors in regression. Create a new column for each category minus one and code each row as 1 if the observation belongs to that category or 0 otherwise. For example, to model salary by region across four regions, add three dummy columns leaving one region as the baseline. The coefficients then represent the difference in y between each non-baseline region and the baseline, holding other predictors constant. This is the same technique used in professional statistics packages and works seamlessly with LINEST.
Finally, document your regression workflow inside the workbook so future readers, including your future self, can reproduce the analysis. Add a methods sheet that lists data sources, transformation steps, predictor definitions, and model performance metrics. Include the final equation in plain English. A well-documented regression model is far more valuable than a black-box result, and a few minutes of documentation pays dividends every time someone questions the analysis or needs to update it with new data months down the line.
Practical tips for getting clean, defensible regression results in Excel start with data preparation. Spend more time cleaning than analyzing. Remove duplicate rows, standardize date formats, convert text-stored numbers to true numerics, and verify units are consistent across observations. A regression of revenue measured in thousands against advertising measured in millions will silently produce a slope that is off by three orders of magnitude, and Excel will never warn you. Use COUNT, COUNTA, MIN, MAX, and AVERAGE on every input column to sanity check ranges before regressing.
Save raw data on a protected sheet and do all transformations on copies. This makes it easy to roll back changes if you discover a mistake and lets reviewers verify your work from the original source. Use Excel Tables (Ctrl+T) for your input data because tables automatically expand formula references when new rows are added, which is a huge time saver when you receive monthly data updates and want the regression model to refresh without manually adjusting every range reference scattered across the workbook.
Use named ranges for your x and y inputs to make formulas more readable. Instead of =LINEST(D2:D500, A2:C500, TRUE, TRUE), define names Sales, AdSpend, StoreVisits, and EmailOpens, then write =LINEST(Sales, CHOOSE({1,2,3}, AdSpend, StoreVisits, EmailOpens), TRUE, TRUE) or simply pre-arrange the columns and reference =LINEST(Sales, Predictors, TRUE, TRUE). Named ranges turn cryptic cell references into self-documenting code that other analysts can read without opening the source workbook to figure out which column is which.
Build a regression dashboard that pulls slope, intercept, R-squared, standard error, and key p-values into a single summary block at the top of your sheet. Include a small line-fit chart, a residual plot, and a prediction calculator where users can enter new x-values and instantly see predicted y-values with confidence bands. This turns a one-time analysis into a living tool that stakeholders can interact with, which dramatically increases the perceived value of your work and ensures the model gets used rather than archived.
Validate your model against held-out data before publishing. Take 20 percent of your observations and set them aside before fitting. Run LINEST on the remaining 80 percent, then use TREND or FORECAST.LINEAR to predict y for the held-out x-values and compare predictions against actuals. Calculate root mean squared error as SQRT(SUMSQ(actual - predicted)/n) and compare it to the standard error from the training fit. If they are similar, your model generalizes; if test error is much higher, you have overfit and need a simpler model.
Document assumptions in plain language alongside the numbers. State explicitly that the regression assumes a linear relationship, independent observations, normally distributed residuals, and constant variance. Note the x-range over which the model is valid and any data quality caveats. Stakeholders rarely read footnotes, but they reference them when challenged, and having assumptions written down protects you when someone tries to use the model in conditions it was never validated for, like predicting next year's results from last quarter's data.
Finally, keep learning. Excel regression is a gateway to deeper statistical modeling, and once you are comfortable with LINEST and the ToolPak, the next steps are logistic regression for binary outcomes, time-series ARIMA models for forecasting, and eventually moving to R or Python for machine learning. The mental model you build by working through least squares manually in Excel transfers directly to every other regression tool you will use later, which makes the time invested in mastering Excel regression genuinely cumulative rather than throwaway.