This function finds outliers in a dataset using quantile random forests.
Usage
outqrf(
data,
quantiles_type = 1000,
threshold = 0.025,
impute = TRUE,
verbose = 1,
weight = FALSE,
...
)
Arguments
- data
a data frame
- quantiles_type
the grid of quantiles to predict: 1000 (the default) uses seq(from = 0.001, to = 0.999, by = 0.001); 400 uses seq(from = 0.0025, to = 0.9975, by = 0.0025)
- threshold
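a threshold for outlier detection (default 0.025); in the example below, every flagged value has a rank within 0.025 of either 0 or 1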
- impute
a boolean value indicating whether to impute missing values
- verbose
a flag indicating whether to print progress output; the default, 1, prints the messages shown in the Examples
- weight
a boolean value indicating whether to weight the threshold. If TRUE, the actual threshold will be threshold * r2.
- ...
additional arguments passed to the ranger function
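The two quantiles_type grids and the weight adjustment described above can be reproduced in base R. This is only a minimal sketch: the seq() calls are taken from the argument description, r2 = 0.9 is a made-up value standing in for whatever the fitted model reports, and effective_threshold is a hypothetical name that is not part of the package.

```r
# Quantile grids as given in the description of quantiles_type
grid_1000 <- seq(from = 0.001, to = 0.999, by = 0.001)
grid_400  <- seq(from = 0.0025, to = 0.9975, by = 0.0025)

length(grid_1000)  # 999 probabilities
length(grid_400)   # 399 probabilities

# Illustration of weight = TRUE: the threshold is scaled by r2.
threshold <- 0.025
r2 <- 0.9  # assumed value, for illustration only
effective_threshold <- threshold * r2
effective_threshold
```

Under this reading, a model with a lower r2 gets a smaller effective threshold and therefore flags fewer values.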
Value
An object of class "outqrf" and a list with the following elements.
Data
: Original data set in unchanged row order
outliers
: Compact representation of outliers. Each row corresponds to an outlier and contains the following columns:
  row
  : Row number of the outlier
  col
  : Variable name of the outlier
  observed
  : Value of the outlier
  predicted
  : Predicted value of the outlier
  rank
  : Rank of the outlier
outMatrix
: Predicted value at different quantiles for each observation
r.squared
: R-squared value of the quantile random forest model
oob.error
: Out-of-bag error of the quantile random forest model
rmse
: RMSE of the quantile random forest model
threshold
: Threshold for outlier detection
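Because outliers holds one row per flagged value, it can be explored with ordinary data-frame operations. The sketch below rebuilds a few rows of the printed qrf$outliers from the Examples section by hand, so it runs without the package. It assumes, based only on that output, that ranks near 0 mark unusually low values and ranks near 1 unusually high ones; the direction column and the counts object are illustrative names, not package output.

```r
# A few rows copied from the printed qrf$outliers in the Examples section
outliers <- data.frame(
  row       = c(21, 48, 118, 47),
  col       = c("Sepal.Length", "Sepal.Length", "Sepal.Width", "Petal.Length"),
  observed  = c(-0.3, -3.1, 16.5, 13.8),
  predicted = c(4.8, 4.7, 3.0, 1.5),
  rank      = c(0.005, 0.001, 0.999, 0.993)
)

# Assumption: a rank below 0.5 means the observed value sits in the lower tail
outliers$direction <- ifelse(outliers$rank < 0.5, "low", "high")

# Number of flagged values per variable
counts <- table(outliers$col)
counts
```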
Examples
iris_with_outliers <- generateOutliers(iris, p=0.05)
qrf <- outqrf(iris_with_outliers)
#>
#> Outlier identification by quantiles random forests
#>
#> Variables to check: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
#> Variables used to check: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
#>
#> Checking: Sepal.Length Sepal.Width Petal.Length Petal.Width
qrf$outliers
#>    row          col observed predicted  rank
#> 1   21 Sepal.Length     -0.3       4.8 0.005
#> 2   48 Sepal.Length     -3.1       4.7 0.001
#> 3  101 Sepal.Length      0.1       6.7 0.001
#> 4    9  Sepal.Width     -1.7       3.2 0.001
#> 5  118  Sepal.Width     16.5       3.0 0.999
#> 6  127  Sepal.Width     -2.5       2.7 0.001
#> 7   12 Petal.Length     15.4       1.7 0.999
#> 8   30 Petal.Length     -6.6       1.3 0.001
#> 9   47 Petal.Length     13.8       1.5 0.993
#> 10 140 Petal.Length     18.6       5.8 0.999
#> 11  13  Petal.Width     12.9       0.3 0.999
#> 12  56  Petal.Width     15.2       1.7 0.999
#> 13  69  Petal.Width     -9.3       1.0 0.001
#> 14 147  Petal.Width     14.1       1.8 0.999
evaluateOutliers(iris,iris_with_outliers,qrf$outliers)
#>     Actual  Predicted      Cover   Coverage Efficiency
#>      32.00      14.00      14.00       0.44       1.00
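The summary printed by evaluateOutliers is consistent with two simple ratios, although this page does not define them. Assuming Cover counts the flagged values that are true outliers, the sketch below reproduces the printed Coverage and Efficiency from Actual, Predicted, and Cover. The variable names and the formulas are an interpretation of the output, not the package's documented definition.

```r
# Figures copied from the evaluateOutliers output above
actual    <- 32  # Actual
predicted <- 14  # Predicted
cover     <- 14  # Cover

# Assumed definitions, chosen because they match the printed values
coverage   <- round(cover / actual, 2)     # 14 / 32 = 0.4375, rounds to 0.44
efficiency <- round(cover / predicted, 2)  # 14 / 14 = 1

c(Coverage = coverage, Efficiency = efficiency)
```

Read this way, every value the model flagged in this run was a genuine injected outlier, but fewer than half of the 32 injected outliers were found.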