Interview Question: Creating a Dummy

Sample Question #190 (programming – Matlab or R)

You have a very large dataset. For one of its variables (say the dataset describes a population of people and the variable of interest is age), you need to create a grouping variable that takes a distinct discrete value for each range of the original variable. For example, ages 0-18 might be group 1, ages 19-25 group 2, and so on.

How would you do this in Matlab or R?

(Hint: there’s a slow way and there’s a fast way.)

(Comment: what about doing this in other packages or languages such as C++, Gauss, SAS or Stata?)

[Question courtesy of Dr. Charles Qin of ITG]


One Response to Interview Question: Creating a Dummy

  1. Brett says:

    The slow way (and the usual way in languages without built-in vector operations, such as C++) is to loop through the data and use if or switch statements to assign each observation its group.
    Because Matlab and R (and S-Plus) operate on whole vectors, it is more efficient to use vector operations: first, decompose the original vector into dummy (0/1) vectors, one per range or cut point; then combine these dummy vectors, for example by summing them, to build the vector of grouping values. Implementation details in Matlab or R are left as an exercise.
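
For readers who want to try it, here is one possible sketch in R of the two approaches described in the comment above. It is not the commenter's own code. The sample ages and the cut points beyond 18 and 25 (namely 64, with everyone older in a fourth group) are made up for illustration, and the variable names are hypothetical.

```r
# Hypothetical example data and cut points (assumptions, not from the post).
age   <- c(0, 5, 18, 19, 25, 26, 40, 65, 90)
upper <- c(18, 25, 64)  # inclusive upper bounds of groups 1-3; older ages form group 4

# Slow way: loop over the observations and test each range in turn.
group_loop <- integer(length(age))
for (i in seq_along(age)) {
  if (age[i] <= 18) {
    group_loop[i] <- 1L
  } else if (age[i] <= 25) {
    group_loop[i] <- 2L
  } else if (age[i] <= 64) {
    group_loop[i] <- 3L
  } else {
    group_loop[i] <- 4L
  }
}

# Fast way: build one 0/1 dummy vector per cut point and add them up.
# Each comparison runs over the whole vector at once, with no explicit loop.
group_vec <- 1L + (age > 18) + (age > 25) + (age > 64)

# The same idea for an arbitrary vector of cut points: outer() builds an
# n-by-k matrix of dummies and rowSums() counts the cut points exceeded.
group_outer <- 1 + rowSums(outer(age, upper, ">"))

# Base R's cut() does the binning directly; labels = FALSE returns the
# integer group codes, and intervals are closed on the right by default.
group_cut <- cut(age, breaks = c(-Inf, upper, Inf), labels = FALSE)

# All four approaches should agree.
stopifnot(all(group_loop == group_vec),
          all(group_loop == group_outer),
          all(group_loop == group_cut))
```

On a very large dataset the `outer()` version allocates a full n-by-k matrix of dummies, so summing the comparisons one at a time or calling `cut()` is likely the more memory-friendly choice.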
