This function finds outliers in a dataset using quantile random forests.

Usage

outqrf(
  data,
  quantiles_type = 1000,
  threshold = 0.025,
  impute = TRUE,
  verbose = 1,
  weight = FALSE,
  ...
)

Arguments

data

a data frame

quantiles_type

the quantile grid to use: '1000' uses seq(from = 0.001, to = 0.999, by = 0.001); '400' uses seq(from = 0.0025, to = 0.9975, by = 0.0025)

threshold

the threshold for outlier detection; the default is 0.025

impute

a boolean value indicating whether to impute missing values

verbose

a boolean value indicating whether to print verbose output (the default, 1, prints it)

weight

a boolean value indicating whether to weight the threshold. If TRUE, the actual threshold will be threshold*r2.

...

additional arguments passed to the ranger function
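As a rough illustration of the two numeric conventions described above, the following base-R sketch builds the two quantile grids named under quantiles_type and computes the effective threshold that the weight description gives. The value of r2 is a made-up placeholder, not output from a fitted model, and all variable names are illustrative only.

```r
# Quantile grids exactly as written in the quantiles_type description.
grid_1000 <- seq(from = 0.001, to = 0.999, by = 0.001)
grid_400  <- seq(from = 0.0025, to = 0.9975, by = 0.0025)

# Both grids stay strictly inside the open interval (0, 1).
range(grid_1000)
range(grid_400)

# Effective threshold when weight = TRUE, per the description: threshold * r2.
threshold <- 0.025   # default from the Usage section
r2 <- 0.8            # hypothetical value, chosen only for illustration
effective_threshold <- threshold * r2
effective_threshold
```

Under this reading, a model with a lower r2 ends up with a smaller effective threshold than the one passed in.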

Value

An object of class "outqrf", which is a list with the following elements:

  • Data: Original data set in unchanged row order

  • outliers: Compact representation of outliers. Each row corresponds to an outlier and contains the following columns:

    • row: Row number of the outlier

    • col: Variable name of the outlier

    • observed: Observed value of the outlier

    • predicted: Predicted value of the outlier

    • rank: Rank of the outlier

  • outMatrix: Predicted value at different quantiles for each observation

  • r.squared: R-squared value of the quantile random forest model

  • oob.error: Out-of-bag error of the quantile random forest model

  • rmse: RMSE of the quantile random forest model

  • threshold: Threshold for outlier detection
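To show how the outliers element might be handled downstream, here is a small base-R sketch. The data frame is built by hand with the documented columns (row, col, observed, predicted, rank), using a few rows copied from the example output on this page, so it runs without fitting a model. Treating a rank close to 0 or 1 as "more extreme" is an assumption read off that output, not a documented rule, and tail_distance is a hypothetical helper column.

```r
# Hand-built stand-in for qrf$outliers, using rows from the example output.
outliers <- data.frame(
  row       = c(21, 48, 118, 47),
  col       = c("Sepal.Length", "Sepal.Length", "Sepal.Width", "Petal.Length"),
  observed  = c(-0.3, -3.1, 16.5, 13.8),
  predicted = c(4.8, 4.7, 3.0, 1.5),
  rank      = c(0.005, 0.001, 0.999, 0.993)
)

# Number of flagged cells per variable.
table(outliers$col)

# Assumed notion of extremeness: distance of rank from the nearer end of [0, 1].
outliers$tail_distance <- pmin(outliers$rank, 1 - outliers$rank)

# List the flagged cells from most to least extreme under that assumption.
outliers[order(outliers$tail_distance), ]
```

The same steps should apply to a real result by replacing the hand-built data frame with qrf$outliers.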

Examples

iris_with_outliers <- generateOutliers(iris, p=0.05)
qrf <- outqrf(iris_with_outliers)
#> 
#> Outlier identification by quantiles random forests
#> 
#>   Variables to check:		Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
#>   Variables used to check:	Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
#> 
#>   Checking: Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  
qrf$outliers
#>    row          col observed predicted  rank
#> 1   21 Sepal.Length     -0.3       4.8 0.005
#> 2   48 Sepal.Length     -3.1       4.7 0.001
#> 3  101 Sepal.Length      0.1       6.7 0.001
#> 4    9  Sepal.Width     -1.7       3.2 0.001
#> 5  118  Sepal.Width     16.5       3.0 0.999
#> 6  127  Sepal.Width     -2.5       2.7 0.001
#> 7   12 Petal.Length     15.4       1.7 0.999
#> 8   30 Petal.Length     -6.6       1.3 0.001
#> 9   47 Petal.Length     13.8       1.5 0.993
#> 10 140 Petal.Length     18.6       5.8 0.999
#> 11  13  Petal.Width     12.9       0.3 0.999
#> 12  56  Petal.Width     15.2       1.7 0.999
#> 13  69  Petal.Width     -9.3       1.0 0.001
#> 14 147  Petal.Width     14.1       1.8 0.999
evaluateOutliers(iris, iris_with_outliers, qrf$outliers)
#>     Actual  Predicted      Cover   Coverage Efficiency 
#>      32.00      14.00      14.00       0.44       1.00
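The printed figures are consistent with Coverage being Cover / Actual and Efficiency being Cover / Predicted. That reading is inferred from the numbers shown here rather than taken from the evaluateOutliers documentation, so the sketch below is only a check of the arithmetic under that assumption.

```r
# Values copied from the evaluateOutliers output above.
actual    <- 32  # "Actual" column
predicted <- 14  # "Predicted" column
cover     <- 14  # "Cover" column

# Assumed definitions (not confirmed by the package documentation).
coverage   <- round(cover / actual, 2)
efficiency <- round(cover / predicted, 2)

c(Coverage = coverage, Efficiency = efficiency)
```

On this reading, every reported outlier was a true one, while fewer than half of the altered cells were found.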