Interview Question: Outcasts

Sample Question #59 (statistics)

What are some ways to clean a dataset of outliers? Name at least three.

Tougher: If the dataset is huge, say over 100 million observations, how would you identify and filter out the outliers?

(Hint: what is an "outlier"?)


2 Responses to Interview Question: Outcasts

  1. Wu Chao says:

    I am glad to learn a new concept, outlier.

  2. Brett says:

    ANSWER

    1) Visual inspection: plot the data and eyeball the points that lie far away from the majority of the data.

    2) Percentile cutoffs: divide the data into percentiles and identify cutoff thresholds beyond which values are too "extreme".

    3) Repeatedly calculate the standard deviation until it is "stable" from one sub-sample to the next.
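
As a rough illustration of the percentile approach in (2), and one possible way to extend it to the 100-million-row case, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not part of the original answer: the helper names (`percentile_cutoffs`, `filter_outliers`, `reservoir_sample`), the 1st/99th percentile thresholds, and the idea of estimating the cutoffs from a fixed-size random sample (reservoir sampling) so the full dataset never has to be sorted or held in memory.

```python
import random
import statistics


def percentile_cutoffs(values, lower_pct=1, upper_pct=99):
    """Return (low, high) cutoffs at the given whole-number percentiles.

    The 1st/99th defaults are an illustrative choice, not a rule.
    """
    # With n=100, statistics.quantiles returns the 99 cut points
    # for percentiles 1..99, so percentile p sits at index p - 1.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def filter_outliers(values, low, high):
    """Keep only the values inside the closed interval [low, high]."""
    return [v for v in values if low <= v <= high]


def reservoir_sample(stream, k, seed=0):
    """Draw a uniform random sample of up to k items from a stream in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = x
    return sample


if __name__ == "__main__":
    # Small case: compute the cutoffs directly on the data.
    data = list(range(1, 1001)) + [10**6]
    low, high = percentile_cutoffs(data)
    cleaned = filter_outliers(data, low, high)
    print(len(data), len(cleaned))

    # Large case: estimate the cutoffs from a sample, then filter
    # the full data in a second streaming pass.
    sample = reservoir_sample(iter(data), k=200)
    s_low, s_high = percentile_cutoffs(sample)
    cleaned_stream = (v for v in data if s_low <= v <= s_high)
```

Cutoffs estimated from a sample are only approximate, and trimming at fixed percentiles always removes some points, whether or not they are really errors. That is one reason the hint about what counts as an "outlier" matters.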
