So, you want to run a linear regression in Excel. Maybe you need to do some forecasting, or maybe you want to tease out a relationship between two (or more) variables. Maybe you have to explain some phenomenon and predict it, but you don't have a clue how it works or why it does what it does. These are some of the reasons we run regressions. Excel has a built-in solver feature which does just that, but I would like to build some macros that can also perform regressions, sort of like deconstructing and rebuilding an engine to truly understand how it works. But first, what is a regression?

Before showing the equation, because it may be slightly off-putting for those who aren't terribly fond of math and also because I want to show the intuition first, let's look at a very simple problem: what determines the final score of a soccer game? It's an easy answer: the number of goals will tell us the final score. Observe how many times a team kicks the ball into the opposite goal and you will know their final score. If each goal were worth 2 points, all you would have to do is multiply the number of goals by 2, but since in this game each goal is worth one, you simply multiply each goal by 1 to get the final score. Why multiply by 1? Just to build some intuition; it becomes important later.

Now I feel confident enough to write down a simple equation:

$$\text{Final score} = 1 \times \text{Number of goals}$$

Final score is the dependent variable; it is what we are trying to explain. 1 is the coefficient on number of goals. We know in this case that it is 1 because everyone knows the rules of soccer: every goal is worth 1 point. If this were football (the American kind), then each touchdown would be worth 6 points, so the coefficient on that would be 6. Number of goals is the independent variable; we have no control over it and it is given by observations.

For simplicity we shall stay with the soccer example. Let's make this look a little bit more mathy:

$$F_s = 1 \times G$$

Got it, but that doesn't help too much, because this implies that we can already explain the final score: we can tell anyone the final score if we simply observe the number of goals. Let's change it so that it looks more like what you will see in real life:

$$F_s = \beta \times G$$

There we go. This implies that we believe the final score is a function of one variable (number of goals) times some coefficient (beta). We want to estimate that coefficient; if we can figure it out, we can explain our dependent variable, final score.

Now suppose we run the same kind of regression for a game with more than one way to score, like basketball, with the final score explained by the number of baskets, 3-pointers, and free throws. The result of that regression tells us that Baskets are worth 2 points, 3-pointers are worth 3 points (shocking!), and Free throws are worth 1 point. So we have explained how the final score operates, and to calculate it all we need to do is observe the number of each type of score. Neat!

In the general case we like to use x for the independent variables, with a subscript indicating their position. Each variable will have a coefficient, usually represented by the character beta:

$$F_s = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

And the dependent variable will usually be y, so we can substitute y in for $F_s$:

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

With y meaning Final score, $x_1$ being Baskets, $x_2$ being 3-pointers, and $x_3$ being Free throws.

Now, if you think about it, it's pretty interesting that we can actually take a final score and its components and break them up to calculate the value of each component. To see how, I want to look at a single variable regression (single variable with an intercept) in an un-idealized situation (meaning we don't actually know how many independent variables there are). We can calculate it out step by step in Excel. Then we can look at a regression with multiple variables in an un-idealized scenario and build a macro to run the regression. To complete that final step will require some linear algebra, which is just a scary word for matrix algebra. I can't make it any less scary sounding than that. Hopefully the above example has whetted your appetite to want to learn more.

The best way to explain how this works is to show a graph. We are trying to explain some phenomenon, y, and we have a ton of data points. Place those data points on a scatter plot and fit a line through them that runs as close to as many points as possible; the slope of that line will be the coefficient on the variable x (only 1 for now).

[Figure: scatter plot of the data points with a fitted blue line]

As you can see, the blue line crosses the y-axis at 1. With a slope of 2, the familiar slope-intercept form of that line is:

$$y = 2x + 1$$

In the regression, the equation for the line will work the same, except we usually write the intercept first, followed by all the other variables. So we might write the above line like:

$$y = \beta_0 + \beta_1 x + \epsilon$$

And of course $\beta_0$ would be equal to 1 and $\beta_1$ equal to 2. So $\beta_0$ is the intercept (some authors prefer to use the Greek symbol $\alpha$ (alpha) for the intercept, so you may see that in certain papers or textbooks). The other new term at the end of the equation is the Greek letter epsilon, $\epsilon$, which can be called the error term.
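To make the basketball and blue-line examples concrete before moving to Excel, here is a minimal sketch in Python (not the macro this series builds, and not Excel's own method). It assumes made-up data that follow the scoring rules exactly, so there is no error term, and it estimates the coefficients by solving the normal equations with plain Gaussian elimination. The function names `solve_linear_system` and `ols`, and all the numbers in `games` and `xs`, are hypothetical and chosen only for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("columns are linearly dependent; no unique fit")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up box scores: (baskets, 3-pointers, free throws) for five games.
games = [(30, 8, 15), (25, 12, 10), (28, 5, 20), (33, 10, 12), (22, 9, 18)]
# Final scores generated from the real scoring rules, with no noise added.
final_scores = [2 * b + 3 * t + 1 * f for b, t, f in games]
point_values = ols(games, final_scores)

# Made-up points lying exactly on y = 1 + 2x.
# The leading 1 in each row is the column that estimates the intercept.
xs = [0, 1, 2, 3, 4]
line_rows = [(1, x) for x in xs]
line_y = [1 + 2 * x for x in xs]
intercept, slope = ols(line_rows, line_y)

print([round(v, 6) for v in point_values])
print(round(intercept, 6), round(slope, 6))
```

Because the data were generated without noise, the estimated point values should land very close to 2, 3, and 1, and the line fit should recover an intercept of 1 and a slope of 2. With real, messy data the estimates would only approximate the true values, which is where the error term comes in.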