## Interview Question: Consequence of a Bug

Let’s assume you have a good random sample of data that you can comfortably run OLS on. However, due to a coding bug, you accidentally include every observation twice in the dataset you feed into your OLS. For example, if the original data had been:

    35 21 2 7
    32 38 1 -2
    93 74 10 61

where the first number on each line is the dependent variable and the rest are the independent variables, the erroneous dataset looks like:

    35 21 2 7
    35 21 2 7
    32 38 1 -2
    32 38 1 -2
    93 74 10 61
    93 74 10 61

(As you can see, every original input line was duplicated.)

When you run OLS on this faulty dataset, what happens to the regression estimates and statistics? Have they changed? If so, did they get larger or smaller?

[A real phone interview question I was given]

### One Response to Interview Question: Consequence of a Bug

1. Brett says:

ANSWER

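The biggest problem is with the t-stats of the coefficient estimates. The estimated residual variance of the erroneous dataset is essentially the same as in the original dataset, but because the regression thinks it has twice as much data, the standard errors shrink by a factor of about √2, so the t-stats are inflated by about √2 ≈ 1.41. Obviously this is not good: the coefficients look more significant than the data justify.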
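A short derivation makes the size of the effect concrete. The notation is an assumption for illustration, not from the original post: $n$ is the original number of observations, $k$ the number of estimated coefficients (including the intercept), $X$ and $y$ the original data, $RSS$ the residual sum of squares, and tildes mark quantities computed on the duplicated dataset. Standard errors are taken to be the usual homoskedastic OLS ones.

$$
\begin{aligned}
\tilde X'\tilde X &= 2\,X'X, \qquad \tilde X'\tilde y = 2\,X'y
  \;\;\Rightarrow\;\; \tilde\beta = (2X'X)^{-1}(2X'y) = \hat\beta, \\
\widetilde{RSS} &= 2\,RSS, \qquad \tilde s^2 = \frac{2\,RSS}{2n-k}, \\
\widetilde{SE}(\hat\beta_j)^2 &= \tilde s^2\,\big[(2X'X)^{-1}\big]_{jj}
  = \frac{RSS}{2n-k}\,\big[(X'X)^{-1}\big]_{jj}, \\
\frac{\tilde t_j}{t_j} &= \frac{SE(\hat\beta_j)}{\widetilde{SE}(\hat\beta_j)}
  = \sqrt{\frac{2n-k}{n-k}} \;\approx\; \sqrt{2}.
\end{aligned}
$$

For large $n$ each t-stat is therefore inflated by roughly $\sqrt{2} \approx 1.41$, and by slightly more in small samples where $k$ is not negligible relative to $n$.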

Coefficient estimates and R-squared will remain the same.
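The effect is easy to check numerically. The sketch below is only an illustration under stated assumptions: the data are simulated (not the three rows from the question), the helper `ols` is a hypothetical hand-rolled implementation using the textbook homoskedastic standard errors, and the sample size and coefficients are arbitrary.

```python
import numpy as np

def ols(y, X):
    """Hypothetical helper: OLS of y on X (X already contains an intercept column).

    Returns coefficients, classical standard errors, t-stats and R-squared.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    rss = resid @ resid
    s2 = rss / (n - k)                      # residual variance estimate
    se = np.sqrt(s2 * np.diag(xtx_inv))     # classical standard errors
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    return beta, se, beta / se, r2

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, k = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# Reproduce the bug: every row appears twice, back to back.
X_dup = np.repeat(X, 2, axis=0)
y_dup = np.repeat(y, 2)

beta, se, t, r2 = ols(y, X)
beta_dup, se_dup, t_dup, r2_dup = ols(y_dup, X_dup)

t_ratio = t_dup / t
expected_ratio = np.sqrt((2 * n - k) / (n - k))  # close to sqrt(2) for large n

print("coefficients unchanged:", np.allclose(beta, beta_dup))
print("R-squared unchanged:   ", np.isclose(r2, r2_dup))
print("t-stat ratio per coef: ", t_ratio)
print("sqrt((2n-k)/(n-k)):    ", expected_ratio)
```

With these settings the t-stat ratio is the same for every coefficient and sits just above √2, while the coefficients and R-squared are identical across the two fits.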