Creates a 2D visualization of anomalies using PCA or UMAP dimensionality reduction. Anomalies are highlighted in red and normal points in blue. For large datasets (more than 3000 points), sampling is applied automatically to speed up plotting (see sample_rate).

Usage

plot_anomaly_projection(
  model,
  data,
  method = "mad",
  dim_reduction = c("pca", "umap"),
  contamination = 0.05,
  point_size = 2,
  point_alpha = 0.6,
  umap_n_neighbors = 15,
  umap_min_dist = 0.1,
  sample_rate = 0.05
)

Arguments

model

An isoForest model object

data

The original data (must be numeric)

method

Threshold method used to flag anomalies, such as "mad" or "contamination" (default: "mad")

dim_reduction

Dimensionality reduction method: "pca" or "umap" (default: "pca")
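
As a rough illustration of what a "pca" projection looks like, the sketch below maps the iris measurements onto their first two principal components with base R's prcomp(). It shows the general technique only; the centering and scaling choices are assumptions and may differ from what plot_anomaly_projection() does internally.

```r
# Illustrative only: a plain PCA projection to 2D with base R.
# Centering and scaling are assumptions, not confirmed package behavior.
pca <- prcomp(iris[1:4], center = TRUE, scale. = TRUE)

# Keep the first two principal components as plotting coordinates
proj <- as.data.frame(pca$x[, 1:2])
head(proj)
```

Each row of proj corresponds to one row of the input data, so anomaly labels can be attached to the coordinates by position.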

contamination

Contamination rate, i.e. the expected proportion of anomalies (default: 0.05; only used if method = "contamination")
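
To make the two threshold ideas concrete, here is a minimal base R sketch applied to a made-up vector of anomaly scores. Everything in it is an assumption for illustration: the scores are synthetic, the contamination cutoff is taken as the (1 - contamination) quantile, and the MAD rule uses median + 3 * MAD, which is one common convention. The package's own thresholds may be computed differently.

```r
# Synthetic anomaly scores: 95 low ("normal") and 5 high ("anomalous") values
scores <- c(seq(0.30, 0.50, length.out = 95),
            seq(0.70, 0.90, length.out = 5))

# Contamination-style cutoff (assumed): flag the top 5% of scores
contamination <- 0.05
cutoff_contam <- quantile(scores, probs = 1 - contamination, names = FALSE)
flag_contam <- scores > cutoff_contam

# MAD-style cutoff (assumed multiplier of 3): robust to the outliers themselves
cutoff_mad <- median(scores) + 3 * mad(scores)
flag_mad <- scores > cutoff_mad

c(contamination = sum(flag_contam), mad = sum(flag_mad))
```

A quantile cutoff always flags roughly the requested share of points, whereas a MAD-based cutoff flags only points that sit far from the bulk of the scores, so it can flag none at all on clean data.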

point_size

Size of points in the plot (default: 2)

point_alpha

Transparency of points (default: 0.6)

umap_n_neighbors

Number of neighbors for UMAP (default: 15)

umap_min_dist

Minimum distance for UMAP (default: 0.1)

sample_rate

Target proportion of anomalies among the displayed points (default: 0.05). The function samples normal points so that anomalies make up this proportion of the displayed data. Set to NULL to disable sampling. For example, with sample_rate = 0.05 and 100 anomalies, approximately 2000 points are displayed in total (100 / 0.05).
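
The arithmetic behind sample_rate can be sketched with a small helper. Note that expected_display_size() is hypothetical and not part of the package, and capping the result at the dataset size is an assumption; it only reproduces the calculation described above.

```r
# Hypothetical helper (not part of the package): estimate how many points
# end up in the plot for a given sample_rate.
expected_display_size <- function(n_anomalies, n_total, sample_rate = 0.05) {
  if (is.null(sample_rate)) {
    return(n_total)                              # sampling disabled: show everything
  }
  target <- round(n_anomalies / sample_rate)     # anomalies / target proportion
  min(target, n_total)                           # assumed cap: cannot exceed the data
}

expected_display_size(100, 5000, sample_rate = 0.05)  # 2000
expected_display_size(100, 5000, sample_rate = 0.10)  # 1000
expected_display_size(100, 5000, sample_rate = NULL)  # 5000
```

A higher sample_rate therefore means fewer normal points are drawn, which makes the anomalies stand out more but shows less of the surrounding structure.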

Value

A ggplot2 object showing the 2D projection with anomalies in red

Examples

if (FALSE) { # \dontrun{
# Using PCA for dimensionality reduction
model <- isoForest(iris[1:4])
plot_pca <- plot_anomaly_projection(model, iris[1:4], dim_reduction = "pca")
print(plot_pca)

# Using UMAP for dimensionality reduction
plot_umap <- plot_anomaly_projection(model, iris[1:4], dim_reduction = "umap")
print(plot_umap)

# For large datasets, automatic sampling is applied
large_data <- matrix(rnorm(5000 * 5), ncol = 5)
model <- isoForest(large_data)
plot_anomaly_projection(model, large_data)  # Samples so anomalies = 5% of display

# Custom sample rate: show fewer points (anomalies = 10% of display)
plot_anomaly_projection(model, large_data, sample_rate = 0.10)
} # }