Models Guide
List of Benchmarked Models
The following models are currently included in this benchmarking study:
Statistical models
Standard Deviation
Median Absolute Deviation (MAD)
Interquartile Range (IQR)
Z-score
Modified Z-score
Distance-based models
Euclidean Distance
Manhattan Distance
Minkowski Distance
Mahalanobis Distance
Machine-learning models
Isolation Forest
K-Nearest Neighbors (KNN)
Gaussian Mixture Models (GMM)
Local Outlier Factor (LOF)
Principal Component Analysis (PCA)
Autoencoders (AE)
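As a rough illustration of the statistical family listed above, the sketch below flags outliers with the modified Z-score (built on the median and MAD) and with the IQR fences. It is a minimal standard-library example, not the implementation used in this study: the function names, the sample values, the 3.5 cutoff, and the 1.5 fence multiplier are all assumptions chosen for illustration.

```python
import statistics


def modified_z_scores(values):
    """Return the modified Z-score of each value, based on the median and MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    # 0.6745 rescales the MAD so that it is comparable to a standard
    # deviation for normally distributed data.
    return [0.6745 * (v - med) / mad for v in values]


def modified_z_outliers(values, threshold=3.5):
    """Indices whose absolute modified Z-score exceeds the threshold (assumed 3.5)."""
    scores = modified_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


def iqr_outliers(values, k=1.5):
    """Indices lying outside the fences [Q1 - k*IQR, Q3 + k*IQR] (assumed k=1.5)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical per-cycle feature values with one obvious spike at index 6.
feature = [2.0, 2.1, 1.9, 2.05, 1.95, 2.0, 5.0]
print(modified_z_outliers(feature))  # [6]
print(iqr_outliers(feature))         # [6]
```

Both rules rely on the median or the quartiles rather than the mean, so the spike itself barely shifts the reference values it is compared against.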
Hyperparameter Tuning
In this study, we implemented hyperparameter tuning for unsupervised anomaly detection models using two different methods:
Leveraging meta-learning with labelled outliers: the model is trained with hyperparameter tuning on a labelled training dataset, and its performance is subsequently predicted on new, unlabelled test datasets; and
Using the output from the unsupervised model (e.g., predicted inliers from all anomaly detection models) as features for a supervised task, and adjusting the unsupervised model’s hyperparameters to maximize the performance of the downstream supervised model.
Rather than sampling hyperparameters blindly, as in random search, or exhaustively evaluating a computationally expensive grid, as in grid search, we used Bayesian optimization, which builds a probabilistic model of the objective function and uses it to choose the most promising hyperparameters to evaluate next.
Example
List of Model Tutorials
- Example (1): Baseline Isolation Forest without Hyperparameter Tuning
- Prerequisites
- Step-1: Load libraries
- Step-2: Load Benchmarking Dataset
- Step-3: Filter Dataset for a Selected Cell
- Step-4: Drop True Labels
- Step-5: Plot Cycle Data without Labels
- Step-6: Statistical Feature Transformation
- Step-7: Physics-informed Feature Extraction
- Step-8: Bubble Plot Visualization
- Step-9: Baseline Isolation Forest (without hyperparameter tuning)
- Step-10: Predict Probabilistic Anomaly Score Map
- Step-11: Model Performance Evaluation
- Step-12: Export Evaluation Metrics
- Example (2): Isolation Forest with Hyperparameter Tuning
- Prerequisites
- Step-1: Load libraries
- Step-2: Load Benchmarking Dataset
- Step-3: Filter Dataset for a Selected Cell
- Step-4: Load Benchmarking Dataset for Selected Cell
- Step-5: Drop True Labels
- Step-6: Plot Cycle Data without Labels
- Step-7: Load the Pre-computed Training Features
- Step-8: Hyperparameter Tuning with Optuna
- Step-9: Aggregate Best Trials
- Step-10: Evaluate Percentage of Perfect Recall and Precision
- Step-11: Plot Pareto Front
- Step-12: Export Hyperparameters to CSV
- Step-13: Train Model with Best Trial Parameters
- Step-14: Predict Probabilistic Anomaly Score Map
- Step-15: Model Performance Evaluation
- Step-16: Export Model Performance Metrics
- Step-17: Verify with True Labels
- Example (3): Baseline Autoencoder without Hyperparameter Tuning (Tohoku Dataset)
- Prerequisites
- Step-1: Load libraries
- Step-2: Load Tohoku Benchmarking Dataset
- Step-3: Filter Dataset for a Selected Cell
- Step-4: Drop True Labels
- Step-5: Plot Capacity Fade Without Labels
- Step-6: Feature Engineering with Mahalanobis Distance
- Step-7: Baseline Autoencoder (without hyperparameter tuning)
- Step-8: Predict Probabilistic Anomaly Score Map
- Step-9: Model Performance Evaluation
- Step-10: Export Evaluation Metrics
- Step-11: Visualize Predicted Anomalies
- Example (4): Autoencoder with Hyperparameter Tuning (Tohoku Dataset)
- Prerequisites
- Step-1: Load libraries
- Step-2: Load Benchmarking Dataset
- Step-3: Filter Dataset for a Selected Cell
- Step-4: Drop True Labels
- Step-5: Plot Cycle Capacity Fade without Labels
- Step-6: Feature Transformation
- Step-7: Hyperparameter Tuning with Optuna
- Step-8: Aggregate Best Trials
- Step-9: Evaluate Percentage of Perfect Recall and Precision
- Step-10: Plot Pareto Front
- Step-11: Export Hyperparameters to CSV
- Step-12: Train Model with Best Trial Parameters
- Step-13: Predict Probabilistic Anomaly Score Map
- Step-14: Model Performance Evaluation
- Step-15: Export Model Performance Metrics
- Step-16: Visualize Predicted Anomalies
- Example (5): Distance-Based Anomaly Detection using Euclidean Distance
- Example (6): K-Nearest Neighbors with Hyperparameter Tuning using Proxy Evaluation Metrics
- Step-1: Load libraries
- Step-2: Load Benchmarking Dataset
- Step-3: Load the Features DB
- Step-4: Hyperparameter Tuning with Optuna using Proxy Metrics
- Step-5: Aggregate Best Hyperparameters
- Step-6: Train Model with Best Hyperparameters
- Step-8: Predict Anomaly Score Map
- Step-9: Model Performance Evaluation