mad_outliers
- typhon.math.array.mad_outliers(arr, cutoff=10, mad0='raise')
Mask out MAD outliers
Mask out any values that lie more than N times the median absolute deviation (MAD) away from the median, where N is set by cutoff.
Although I (Gerrit Holl) came up with this myself, it’s also documented at:
http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers/
except that I rolled my own approach for “what if mad==0”.
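The basic test described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not typhon's implementation: the helper name `mad_mask_sketch` is hypothetical, and the sketch simply refuses to continue when the MAD is zero.

```python
import numpy as np


def mad_mask_sketch(arr, cutoff=10):
    """Illustrative only: flag values more than `cutoff` MADs from the median.

    Hypothetical helper, not typhon's code. It assumes a nonzero MAD.
    """
    arr = np.asarray(arr, dtype=float)
    med = np.median(arr)
    # Median absolute deviation: the median distance from the median.
    mad = np.median(np.abs(arr - med))
    if mad == 0:
        raise ValueError("MAD is zero; the ratio below is undefined")
    # Distance of each value from the median, in units of the MAD.
    return np.abs(arr - med) / mad > cutoff


values = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 55.0])
mask = mad_mask_sketch(values, cutoff=10)  # True only for the last element
```

Here the median is 10.05 and the MAD is about 0.15, so only the value 55.0 lies more than ten MADs away.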
Note: If all values except one are constant, it is not possible to determine whether the remaining one is an outlier or “reasonably close” to the rest, without additional hints. In this case, some outliers may go unnoticed.
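A small sketch of why this case is ambiguous, assuming plain NumPy and a hypothetical helper `mad` (not part of typhon): when more than half of the values are identical, the MAD is exactly zero no matter how far the odd value lies from the rest.

```python
import numpy as np


def mad(arr):
    """Median absolute deviation from the median (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    return np.median(np.abs(arr - np.median(arr)))


near = np.array([5.0, 5.0, 5.0, 5.0, 5.001])  # odd value is plausibly fine
far = np.array([5.0, 5.0, 5.0, 5.0, 500.0])   # odd value is plausibly an outlier

# In both arrays most values equal the median, so the MAD is zero and
# "distance / MAD" cannot tell the two odd values apart.
mad_near = mad(near)
mad_far = mad(far)
```

Both `mad_near` and `mad_far` are zero, so the data alone give no scale against which to judge the remaining value.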
- Parameters:
arr (numpy.ndarray) – n-D array with numeric dtype
cutoff (int) – Maximum tolerable distance from the median, normalised by the median absolute deviation
mad0 (str) – What to do if mad==0. Can be ‘raise’, ‘ignore’, or ‘perc’. In case of ‘perc’, will search for the lowest percentile at which the percentile absolute deviation is nonzero, increase the cutoff by the fractional approach toward percentile 100, and use that percentile instead. So if the first nonzero value is at the 75th percentile, it will use the 75th-percentile absolute deviation and increase the cutoff by a factor (100 - 50)/(100 - 75).
- Returns:
ndarray with bool dtype, True for outliers
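The cutoff scaling described for mad0=‘perc’ can be illustrated with a short sketch. It is a reading of the parameter description, not typhon's code: the helpers `scaled_cutoff` and `percentile_abs_dev` are hypothetical, and the search for the lowest usable percentile is left out, so the 75th percentile is chosen by hand to match the example in the description.

```python
import numpy as np


def scaled_cutoff(cutoff, percentile):
    """Scale the cutoff by (100 - 50) / (100 - percentile) (hypothetical helper)."""
    if not 50 <= percentile < 100:
        raise ValueError("percentile must lie in [50, 100)")
    return cutoff * (100 - 50) / (100 - percentile)


def percentile_abs_dev(arr, percentile):
    """Percentile absolute deviation from the median; 50 gives the MAD."""
    arr = np.asarray(arr, dtype=float)
    return np.percentile(np.abs(arr - np.median(arr)), percentile)


# Five of nine values are identical, so the MAD (50th percentile) is zero,
# while the 75th-percentile absolute deviation is not.
values = np.array([4, 4, 4, 4, 4, 3, 5, 2, 60], dtype=float)
dev = np.abs(values - np.median(values))
pad75 = percentile_abs_dev(values, 75)
mask = dev / pad75 > scaled_cutoff(10, 75)  # cutoff 10 becomes 20
```

With these values the 75th-percentile absolute deviation is 1 and the scaled cutoff is 20, so only the value 60 is masked. With typhon itself, the corresponding call would pass mad0='perc' to mad_outliers.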