Linear Regression & Least Squares Method Explained: Definition and Examples

Often the questions we ask require us to make accurate predictions about how one factor affects an outcome. Sure, there are other factors at play, like how good the student is at that particular class, but we’re going to ignore confounding factors like these for now and work through a simple example. In this section, we’re going to explore least squares: understand what it means, learn the general formula and the steps to plot it on a graph, know its limitations, and see what tricks we can use with it. Once we have all the information needed for our equation, we are free to slot in values as we see fit. If we wanted to know the predicted grade of someone who spends 2.35 hours on their essay, all we need to do is swap that in for \(x\).

OLS then minimizes the sum of the squared differences between the observed values and the predicted values, ensuring the model offers the best fit to the data. However, if we attempt to predict sales at a temperature like 32 degrees Fahrenheit, which is outside the range of the dataset, the situation changes. In this case, the correlation may be weak, and extrapolating beyond the data range is not advisable. Instead, the best estimate in such scenarios is the mean of the y values, denoted as ȳ. For instance, if the mean of the y values is calculated to be 5,355, this would be the best guess for sales at 32 degrees, despite it being a less reliable estimate due to the lack of relevant data.
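As a minimal sketch of this fallback rule (the fitted coefficients and the observed temperature range below are hypothetical):

```python
def predict_sales(temp_f, a=-7800.0, b=320.0, x_min=60.0, x_max=100.0, y_mean=5355.0):
    """Predict sales from temperature with a fitted line y = a + b*x.

    a, b         -- hypothetical intercept and slope from a least squares fit
    x_min, x_max -- range of temperatures actually observed in the data
    y_mean       -- mean of the observed y values (the fallback estimate)
    """
    if x_min <= temp_f <= x_max:
        return a + b * temp_f   # interpolation: trust the fitted line
    return y_mean               # extrapolation: fall back to ȳ

print(predict_sales(75))   # inside the observed range -> uses the line
print(predict_sales(32))   # outside the range -> returns the mean, 5355.0
```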

For example, if you analyze ice cream sales against daily high temperatures, you might find a positive correlation where higher temperatures lead to increased sales. By applying least squares regression, you can derive a precise equation that models this relationship, allowing for predictions and deeper insights into the data. Another thing you might note is that the formula for the slope \(b\) is just fine provided you have statistical software to make the calculations. But what would you do if you were stranded on a desert island and needed to find the least squares regression line for the relationship between the depth of the tide and the time of day? The method also has a practical accounting use: it can segregate the fixed and variable cost components of a mixed cost figure, as sketched below.
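As a minimal sketch of that accounting use, assuming made-up activity levels (units produced) and total costs: the slope of the fitted line estimates the variable cost per unit, and the intercept estimates the fixed cost.

```python
# Hypothetical monthly activity levels (units) and total mixed costs ($).
units = [100, 150, 200, 250, 300]
costs = [1900, 2400, 2900, 3350, 3900]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(costs) / n

# Least squares slope and intercept: b = Sxy / Sxx, a = ȳ - b·x̄.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, costs))
s_xx = sum((x - x_bar) ** 2 for x in units)
variable_cost_per_unit = s_xy / s_xx
fixed_cost = y_bar - variable_cost_per_unit * x_bar

print(f"variable cost per unit ≈ {variable_cost_per_unit:.2f}")  # 9.90
print(f"fixed cost ≈ {fixed_cost:.2f}")                          # 910.00
```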

  • The slope of the line, \(b\), describes how changes in the variables are related.

Look at the graph below: the straight line shows the potential relationship between the independent variable and the dependent variable. The ultimate goal of this method is to reduce the difference between the observed responses and the responses predicted by the regression line, that is, to minimize the residuals of the points from the line. When the independent variable is error-free, a residual represents the “vertical” distance between the observed data point and the fitted curve (or surface). In total least squares, a residual instead represents the distance between a data point and the fitted curve measured along some chosen direction. Vertical residuals are mostly used in polynomial and hyperplane fitting problems, while perpendicular residuals are used in the general case, as seen in the image below; a quick comparison of the two distances is sketched after this paragraph.
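To make the distinction concrete, here is a minimal sketch comparing the two distances from a point to a line \(y = a + bx\) (the point and coefficients are made up):

```python
import math

def vertical_residual(x, y, a, b):
    # Distance measured straight up/down from the point to the line.
    return y - (a + b * x)

def orthogonal_distance(x, y, a, b):
    # Shortest (perpendicular) distance from the point to the line
    # b*x - y + a = 0, using the standard point-to-line formula.
    return abs(b * x - y + a) / math.sqrt(b * b + 1)

# Hypothetical point and line.
x, y, a, b = 2.0, 7.0, 1.0, 2.0           # line: y = 1 + 2x
print(vertical_residual(x, y, a, b))      # 2.0
print(orthogonal_distance(x, y, a, b))    # 2 / sqrt(5) ≈ 0.894
```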

Put literally, the least squares method of regression minimizes the sum of the squared errors made by the fitted equation. It is a popular mathematical approach used in data fitting, regression analysis, and predictive modeling: it finds the best-fit line or curve by minimizing the sum of squared differences between the observed data points and the predicted values, and it is widely used in statistics, machine learning, and engineering applications. More generally, the least squares method finds the best-fitting solution to a system of linear equations that may not have an exact solution, again by minimizing the sum of the squared residuals between the observed values and the values predicted by the model.
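In symbols, for a line \(\hat{y} = a + bx\) fitted to points \((x_1, y_1), \ldots, (x_n, y_n)\), the quantity being minimized is:

```latex
\min_{a,\,b} \; S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
```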

  • Regardless, predicting the future is a fun concept even if, in reality, the most we can hope to predict is an approximation based on past data points.
  • By examining these plots, one can identify patterns and trends, such as positive or negative correlations.

Example: Sam recorded how many hours of sunshine there were and how many ice creams were sold at the shop from Monday to Friday.

The idea behind the calculation is to minimize the sum of the squares of the vertical errors between the data points and the fitted line. The red points in the plot represent the available sample data. Independent variables are plotted as x-coordinates, and dependent ones are plotted as y-coordinates. The equation of the line of best fit obtained from the least squares method is plotted as the red line in the graph; a sketch of how such a plot can be produced follows.
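As a minimal sketch of producing such a plot (the sunshine and sales numbers below are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: hours of sunshine (x) vs ice creams sold (y).
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

# np.polyfit with degree 1 performs an ordinary least squares line fit.
b, a = np.polyfit(x, y, 1)   # slope, intercept
print(f"best-fit line: y = {a:.3f} + {b:.3f}x")

plt.scatter(x, y, color="red", label="data points")
plt.plot(x, a + b * x, color="red", label="least squares line")
plt.xlabel("independent variable (x)")
plt.ylabel("dependent variable (y)")
plt.legend()
plt.show()
```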

The proof, which may or may not show up on a quiz or exam, is left for you as an exercise.

We have two datasets; the first one (position zero) is for our pairs, so we show the dots on the graph. There isn’t much to be said about the code here, since it’s all the theory that we’ve been through earlier: we loop through the values to get the sums, averages, and all the other quantities we need to obtain the intercept (a) and the slope (b), as in the sketch below. By the way, you might want to note that the only assumption relied on for the above calculations is that the relationship between the response \(y\) and the predictor \(x\) is linear. Least squares is equivalent to maximum likelihood estimation when the model residuals are normally distributed with mean 0. In actual practice, computation of the regression line is done using a statistical computation package.
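A minimal sketch of such a loop, using hypothetical hours-studied vs. topics-covered pairs:

```python
# Hypothetical pairs: hours of continuous study (x) vs topics covered (y).
pairs = [(1.0, 1.5), (2.0, 3.0), (3.0, 4.5)]

n = len(pairs)
avg_x = sum(x for x, _ in pairs) / n
avg_y = sum(y for _, y in pairs) / n

# Slope b = Σ(x-x̄)(y-ȳ) / Σ(x-x̄)², intercept a = ȳ - b·x̄.
num = sum((x - avg_x) * (y - avg_y) for x, y in pairs)
den = sum((x - avg_x) ** 2 for x, _ in pairs)
b = num / den
a = avg_y - b * avg_x

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"predicted topics after 4 hours: {a + b * 4.0:.2f}")
```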

Uses in data fitting

The minimized sums of squared residuals can be used as a statistical criterion for the goodness of fit. When unit weights are used, the numbers should be divided by the variance of an observation. Least squares regression is widely used in predictive modeling, where the goal is to predict outcomes based on input features.
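One common goodness-of-fit criterion built on these sums is the coefficient of determination, \(R^2\); a minimal sketch, assuming a hypothetical fitted line \(a + bx\):

```python
def r_squared(pairs, a, b):
    """R² = 1 - SSE/SST for a fitted line y = a + b*x."""
    ys = [y for _, y in pairs]
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in pairs)  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in ys)              # total sum of squares
    return 1.0 - sse / sst

# Hypothetical data and coefficients.
pairs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
print(r_squared(pairs, a=0.15, b=1.93))  # close to 1 -> good linear fit
```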

For this reason, this type of regression is sometimes called two-dimensional Euclidean regression (Stein, 1983) or orthogonal regression. The idea behind finding the best-fit line is based on the assumption that the data are scattered about a straight line. The criterion for the best-fit line is that the sum of the squared errors (SSE) is minimized, that is, made as small as possible. Any other line you might choose would have a higher SSE than the best-fit line.

Proof: Ordinary least squares for simple linear regression
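A sketch of the standard argument: the error sum for a candidate line is

```latex
S(a, b) = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2 .
```

Setting the partial derivatives \(\partial S / \partial a\) and \(\partial S / \partial b\) to zero gives the normal equations

```latex
\sum_{i=1}^{n} \left( y_i - a - b x_i \right) = 0,
\qquad
\sum_{i=1}^{n} x_i \left( y_i - a - b x_i \right) = 0,
```

which solve to the familiar formulas

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x} .
```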

Having said that, and now that we’re not scared by the formula, we just need to figure out the \(a\) and \(b\) values. Before we jump into the formula and code, let’s define the data we’re going to use. For example, say we have a list of how many topics future engineers here at freeCodeCamp can cover if they invest 1, 2, or 3 hours of continuous study. Then we can predict how many topics will be covered after 4 hours of continuous study, even without that data being available to us.

Steps

First, we mark the data points on a graph. Then, we try to represent all the marked points as a straight line or a linear equation; the equation of such a line is obtained with the help of the least squares method. This is done to get the value of the dependent variable for an independent variable whose value was initially unknown. You might also appreciate understanding the relationship between the slope \(b\) and the sample correlation coefficient \(r\), shown below.
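That relationship can be stated in one line: the slope is the sample correlation scaled by the ratio of the standard deviations of \(y\) and \(x\),

```latex
b = r \, \frac{s_y}{s_x} .
```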

Least Squares Estimates

Imagine that you’ve plotted some data using a scatterplot, and that you fit a line for the mean of \(y\) through the data. Now let’s lock this line in place, and attach springs between the data points and the line. There is some cool physics at play here, involving the relationship between force and the energy needed to pull a spring a given distance: the energy stored in a spring grows with the square of its extension. It turns out that minimizing the overall energy in the springs is equivalent to fitting a regression line using the method of least squares.

It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models. The objective of OLS is to find the values of \(\beta_0, \beta_1, \ldots, \beta_p\) that minimize the sum of squared residuals (errors) between the actual and predicted values, as written out below. Least squares regression analysis, or the linear regression method, is deemed to be the most accurate and reliable way to divide a company’s mixed costs into their fixed and variable components. This is because the method takes into account all the data points plotted on the graph, at every activity level, which theoretically yields the best-fit regression line.
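Written out, the fitted model and the OLS objective are:

```latex
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
\qquad
\min_{\beta_0, \ldots, \beta_p} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 .
```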

Example

These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid. Econometric models often rely on least squares regression to analyze relationships between economic variables and to forecast future trends based on historical data. In the simple linear case, the fitted model is \(y = mx + q\), where \(y\) is the dependent variable, \(x\) is the independent variable, \(m\) is the slope, and \(q\) is the intercept.

This method, the method of least squares, finds values of the intercept and slope coefficient that minimize the sum of the squared errors. It helps us predict results based on an existing set of data, as well as spot anomalies in our data; anomalies are values that are too good, or bad, to be true or that represent rare cases. The least squares regression also helps in calculating the best-fit line for a set of data from both the activity levels and the corresponding total costs.

This process often involves the least squares method to determine the best-fit regression line, which can then be utilized for making predictions. In analyzing the relationship between weekly training hours and sales performance, we can utilize the least squares regression line to determine whether a linear model is appropriate for the data. The process begins by entering the data into a graphing calculator, where the training hours are represented as the independent variable (x) and sales performance as the dependent variable (y). If the residuals from the fitted line show a clear pattern rather than random scatter, this suggests that the relationship between training hours and sales performance is nonlinear, which is a critical insight for further analysis; a minimal sketch of such a check follows.
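As a minimal sketch of that check, assuming hypothetical hours/sales pairs (np.polyfit stands in for the calculator’s linear regression):

```python
import numpy as np

# Hypothetical data: weekly training hours (x) vs sales performance (y).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sales = np.array([12.0, 20.0, 31.0, 45.0, 62.0, 82.0])

# Fit a least squares line and inspect the residuals.
slope, intercept = np.polyfit(hours, sales, 1)
residuals = sales - (intercept + slope * hours)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f}x")
print("residuals:", np.round(residuals, 2))
# A U-shaped run of residuals (positive at the ends, negative in the
# middle, or vice versa) is a hint that the relationship is nonlinear.
```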