Identify potential outliers in a numeric vector.

outliers(x, method = c("boxplot", "stdev"), coef = NULL)

Arguments

x

Numeric vector.

method

Method for identifying outliers. Defaults to "boxplot". Other possibility is "stdev".

coef

Multiplier for determining how extreme a value must be to be flagged as a potential outlier. Defaults to 1.5 for boxplot method, indicating 1.5 times the inter-quartile range from the median, and to 2 for stdev method, indicating 2 times the standard deviation from the mean.

Value

Logical vector the same length as x, indicating for each value in x whether it is a possible outlier as determined by the specified method and coefficient.

Details

Takes a numeric vector, x, and returns a corresponding logical vector the same length as x. Values of TRUE in return vector indicate corresponding value in x may be an outlier. Handy for use with dplyr::mutate().

If method is boxplot, then values more than coef times the IQR above the 3rd quartile, or more than coef times the IQR below the 1st quartile are flagged as possible outliers.

If method is stdev, then values more than coef standard deviations from the mean are flagged as possible outliers.

Both methods are rather simple-minded ways of IDing possible outliers in the context of a presumed normal distribution.

TODO: Look around for inspiration as to other approaches, e.g.,

Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764-766. doi:10.1016/j.jesp.2013.03.013

Code for MAD based outlier detection at: https://raw.githubusercontent.com/hauselin/Rcode/master/detectOutliers.R

See also

Author

David Braze davebraze@gmail.com

Examples

set.seed(1234) x <- rt(50, df=2) bp <- boxplot(x)
x[outliers(x)]
#> 25% 75% #> -3.118453 3.247265
#> [1] -3.158160 9.782400 -3.712073 -4.738422