Sample Question #200 (statistics – regressions)

What’s the right statistical (or econometric) model when the dependent variable is the number of trades in a given stock in a 5-minute period? For example, if a total of 5,000 trades were executed in MSFT this morning between 9:30 and 9:35, the dependent variable takes on the value of 5,000. Our dataset contains such counts for many stocks over many 5-minute periods. (Don’t worry about the right-hand side variables — you can imagine they’re already taken care of.)

(Comment: this is a very real-world modeling problem)

ANSWER

It turns out that this is a very tricky question!

The correct model depends on what goes on the right-hand side. If the RHS variables (independent variables) influence whether a stock has any trades at all in a 5-minute window, then we need to use a censoring model for count data. This is because an observed 0 in the dependent variable is not just another count: it reflects a "latent" effect of the independent variables on whether the stock trades in the first place.
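One way to make the "zeros are driven by the RHS" idea concrete is a two-part (hurdle) likelihood: a logit decides whether the window has any trade at all, and a Poisson restricted to positive counts governs how many. The sketch below is only an illustration under those assumptions. The answer does not name a specific distribution, and the function names, coefficient vectors `gamma` and `beta`, and the example values are all hypothetical.

```python
import math

def log_poisson_pmf(k, lam):
    """Log of the Poisson pmf: k*log(lam) - lam - log(k!)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def hurdle_loglik(y, x, gamma, beta):
    """Log-likelihood of one observation under a logit/Poisson hurdle model.

    y     : observed trade count in the 5-minute window (non-negative int)
    x     : RHS values, with a leading 1.0 for the intercept
    gamma : coefficients of the zero / non-zero (logit) part (hypothetical)
    beta  : coefficients of the positive-count (Poisson) part (hypothetical)
    """
    # Part 1: probability that the window has at least one trade.
    z = sum(g * xi for g, xi in zip(gamma, x))
    p_any = 1.0 / (1.0 + math.exp(-z))
    if y == 0:
        return math.log(1.0 - p_any)
    # Part 2: Poisson with a log link, renormalised over y >= 1.
    lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    log_positive = log_poisson_pmf(y, lam) - math.log(1.0 - math.exp(-lam))
    return math.log(p_any) + log_positive

# Illustrative values only (assumptions, not estimates from any data).
x = [1.0, 0.5]
gamma = [0.2, 1.0]
beta = [1.0, 0.8]

# Sanity check: the implied probabilities over y = 0, 1, 2, ... sum to one.
total = sum(math.exp(hurdle_loglik(k, x, gamma, beta)) for k in range(200))
print(round(total, 6))
```

Summing `hurdle_loglik` over all stock-period observations gives the sample log-likelihood, which could then be maximised over `gamma` and `beta`. Because the two parts have separate coefficients, the same RHS variable can affect whether a stock trades differently from how it affects the number of trades.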

If, on the other hand, the RHS variables are purely random with respect to this outcome, that is, they do not in any way influence whether the LHS value is 0, we can use a truncated model for count data. It’s truncated because the count cannot fall below 0.
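As a minimal sketch, assuming a Poisson baseline (the answer does not specify a distribution), the snippet below evaluates a plain Poisson probability mass function, whose support already starts at 0, next to its zero-truncated variant, which is the form usually meant by a truncated count model. The function names and the rate `lam` are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson count with mean lam, k = 0, 1, 2, ..."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def zero_truncated_poisson_pmf(k, lam):
    """P(Y = k | Y >= 1): Poisson mass renormalised over positive counts."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

lam = 2.5  # hypothetical rate, chosen only for illustration

# Both distributions should sum to one over their supports.
plain_total = sum(poisson_pmf(k, lam) for k in range(100))
trunc_total = sum(zero_truncated_poisson_pmf(k, lam) for k in range(100))

# Truncation shifts the mean upward: E[Y | Y >= 1] = lam / (1 - exp(-lam)).
trunc_mean = sum(k * zero_truncated_poisson_pmf(k, lam) for k in range(100))
print(plain_total, trunc_total, trunc_mean)
```

In a regression setting, `lam` would be replaced by `exp(x'beta)` for each stock-period observation, so the RHS variables enter only through the mean of the count.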