Sample Question #183 (econometrics)

Let’s assume we have a dataset of all the married couples in a mid-sized American city, with personal information such as each spouse’s age, occupation, and education level, as well as the number of extramarital affairs either spouse has had since marriage. What’s an appropriate model for studying which factors influence the number of extramarital affairs? Explain your answer carefully.

One thing you might want to mention is that this "mid-sized American city" dataset may not be representative of the general population, so if the goal is to understand the "universal" factors that influence infidelity, we may have a sample selection bias on our hands. Let’s assume this is not a concern, i.e., the researchers only want to know what influences the extent of infidelity in this particular dataset.

Since we’re modeling the number of affairs, that value for each person or each couple (depending on the specific framework) goes on the left-hand side of the equation as the dependent variable. The problem is that this variable will be 0 for many persons and couples, either because they lied on the survey about their extramarital affairs or because they genuinely have never cheated on their spouses. The observed values therefore pile up at the lower bound of 0. What we have, then, is a censored data problem, and a technique like Tobit analysis is needed to deal with the censoring; ordinary least squares would ignore the pile-up at 0 and give biased coefficient estimates.
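
To make the censoring argument concrete, the standard Tobit setup with a lower bound at 0 can be written roughly as below. The notation is introduced here for illustration and is not part of the question: y_i^* is a latent "propensity" behind the observed number of affairs y_i, x_i is the vector of explanatory variables (age, occupation, education and so on), and \Phi and \phi are the standard normal CDF and density.

```latex
% Latent equation and the censoring rule
y_i^* = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad
y_i = \max(0,\, y_i^*)

% Log-likelihood: censored observations contribute a probability,
% uncensored observations contribute a density
\log L(\beta, \sigma) =
  \sum_{i:\, y_i = 0} \log \Phi\!\left(-\frac{x_i'\beta}{\sigma}\right)
  + \sum_{i:\, y_i > 0} \left[ \log \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \log \sigma \right]
```

The first sum is what separates Tobit from a plain regression: a zero is treated as "the latent value was at or below 0" rather than as an exact observation.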

(Bonus question: how do you estimate a Tobit model?)
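
One common answer to the bonus question is maximum likelihood: write down the likelihood, in which censored observations contribute a normal CDF term and uncensored ones a normal density term, then maximize it numerically. The Python sketch below illustrates that idea under stated assumptions. The function names `tobit_negloglik` and `fit_tobit` are hypothetical helpers, the data are simulated with made-up parameters rather than taken from any real survey, and censoring is assumed to happen at 0 with normally distributed errors.

```python
import numpy as np
from scipy import optimize, stats


def tobit_negloglik(params, X, y, lower=0.0):
    """Negative log-likelihood of a Tobit model left-censored at `lower`."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # optimize log(sigma) so sigma stays positive
    xb = X @ beta
    censored = y <= lower
    # Censored rows contribute P(y* <= lower) = Phi((lower - xb) / sigma).
    ll_cens = stats.norm.logcdf((lower - xb[censored]) / sigma)
    # Uncensored rows contribute the normal density of the residual.
    ll_obs = stats.norm.logpdf((y[~censored] - xb[~censored]) / sigma) - log_sigma
    return -(ll_cens.sum() + ll_obs.sum())


def fit_tobit(X, y, lower=0.0):
    """Maximize the Tobit likelihood numerically, starting from the OLS fit."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma0 = np.std(y - X @ beta0)
    start = np.append(beta0, np.log(sigma0))
    res = optimize.minimize(tobit_negloglik, start, args=(X, y, lower), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1]), res


# Simulated data (made-up parameters), used only to exercise the estimator.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # intercept plus one regressor
true_beta, true_sigma = np.array([-0.5, 1.0]), 1.5
y_star = X @ true_beta + true_sigma * rng.normal(size=n)
y = np.maximum(y_star, 0.0)                   # observed outcome, censored at 0

beta_hat, sigma_hat, result = fit_tobit(X, y)
ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Tobit beta, sigma:", beta_hat, sigma_hat)
print("OLS beta:         ", ols_beta)
```

On the simulated data, the Tobit slope should land near the value used to generate the data, while the OLS slope is pulled toward zero by the pile-up of zeros. In practice you would also want standard errors, for example from the inverse Hessian of the log-likelihood, which this sketch does not compute.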