Sample Question #100 (econometrics)

What’s wrong with the following model, which tries to study the "happiness measure" (the *y* variable) of the American population?

*y* = α + β_{1} I{income < $35,000} + β_{2} I{income >= $35,000} + β_{3} *x* + ε

where *I*{ } is the indicator function, i.e., it’s 1 if the condition in the braces {} is true and 0 otherwise. *x* is a continuous exogenous variable that’s independent of income.

(Updated comment: A few visitors to my blog have complained that since not everyone knows econometrics, it would be "very" unfair to ask a question that many people simply are not familiar with. Okay, the little "contest" I had in mind was meant to be fun [as was duly noted in the previous comments], but I see these guys have a point. So, sorry, no more contest. I apologize for the snafu. I hope you enjoy this question if you know statistics and/or econometrics. Cheers! 7 pm EDT 8/21/07)

May I ask what x stands for? Thanks.

x is just some arbitrary independent variable.

ANSWER

[Again, the "contest" was cancelled]

First of all, you should recognize that there’s really nothing wrong with the equation itself. It simply links y with three independent variables, plus an intercept. As a mathematical reduced form of the "happiness" model, it’s valid.

The problems lie in how you estimate the model with data. There are two problems.

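1) Perfect collinearity. Every observation with a reported income falls into exactly one of the two brackets, so the two dummies always sum to 1, which is exactly the constant regressor attached to α. The coefficients are therefore not uniquely identified. The solution is to drop either α or one of the two dummies.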

2) Less obvious is the censored data problem. Many people in the U.S. have no reported income (such as babies, homeless people, and tax evaders), so this model cannot be estimated for these people, and whatever you estimate describes only people with reported income rather than the American population the question asks about. Depending on the context of the research and your perspective, this can be a very serious issue. In fact, many financial regression models suffer from this problem without the quants’ realizing it. (Trust me, when a sharp-minded client points this out to you and your boss, you’ll be feeling mighty humiliated.)
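
If you want to see both problems with your own eyes, here is a minimal NumPy sketch. Everything in it is made up for illustration: the simulated incomes, the variable names, and the choice to mark the first 20 people as having no reported income are all my assumptions, not part of the original question. The design matrix for the model as written has four columns but should come out with rank 3, while dropping one dummy restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Made-up data (assumption): incomes and an unrelated continuous regressor x.
income = rng.uniform(10_000, 90_000, size=n)
x = rng.normal(size=n)

# Problem 1: an intercept plus BOTH income dummies.
low = (income < 35_000).astype(float)
high = (income >= 35_000).astype(float)
const = np.ones(n)

X_full = np.column_stack([const, low, high, x])   # the model as written
X_fixed = np.column_stack([const, high, x])       # one dummy dropped

rank_full = np.linalg.matrix_rank(X_full)
rank_fixed = np.linalg.matrix_rank(X_fixed)
print("as written -> columns:", X_full.shape[1], "rank:", rank_full)
print("one dummy dropped -> columns:", X_fixed.shape[1], "rank:", rank_fixed)

# Problem 2: pretend the first 20 people have no reported income (assumption).
income_reported = income.copy()
income_reported[:20] = np.nan

low_r = income_reported < 35_000
high_r = income_reported >= 35_000

# With a missing income, neither condition is true, so the person
# belongs to neither bracket and the model has nothing to say about them.
neither = ~(low_r | high_r)
n_lost = int(neither.sum())
print("people the model cannot place in either bracket:", n_lost)
```

One thing to watch: comparisons against NaN in NumPy quietly evaluate to False instead of raising an error, so a person with missing income ends up with both dummies equal to 0. If you do not check for this, those rows can slip into the regression unnoticed.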