osbad.model
Utilities for anomaly detection benchmarking on battery cell data.
This module provides the ModelRunner class to orchestrate data
preparation, prediction, evaluation, and visualization for a single cell.
It supports several PyOD models via the PyODModelType type alias
and produces publication-ready figures saved per cell in an artifacts
directory.
- Key features:
create_model_x_input: Extracts selected feature columns from the input DataFrame and stores them as self.Xdata.
pred_outlier_indices_from_proba: Identifies indices of predicted outliers based on probability outputs and a decision threshold.
evaluate_indices: Compares predicted outliers against benchmark labels and computes recall and precision scores.
create_2d_mesh_grid: Builds a 2D mesh grid from the first two features of self.Xdata for plotting decision surfaces.
predict_anomaly_score_map: Fits a PyOD model, visualizes anomaly probabilities with a decision boundary, highlights predicted outliers, and saves the resulting figure to the cell’s artifact directory.
from osbad.model import ModelRunner
Module Contents
- osbad.model.SELECTED_FEATURE_COLS = ('log_max_diff_dQ', 'log_max_diff_dV')
- osbad.model.PyODModelType
Type alias for supported PyOD anomaly detection models.
This alias groups together the most commonly used PyOD estimators for outlier detection. It allows functions and methods to accept any of these model types without explicitly listing each one in the type annotations.
- Models included:
IForest (Isolation Forest)
KNN (k-Nearest Neighbors)
GMM (Gaussian Mixture Model)
LOF (Local Outlier Factor)
PCA (Principal Component Analysis for outliers)
AutoEncoder (Neural-network-based autoencoder for anomalies)
- class osbad.model.ModelRunner(cell_label: str, df_input_features: fireducks.pandas.DataFrame, selected_feature_cols: Tuple | List)
- df_input_features
- selected_features
- create_model_x_input()
Extract selected feature columns as a NumPy array.
This method selects the feature columns specified in self.selected_features from the input DataFrame self.df_input_features. The values are converted into a NumPy array and stored in self.Xdata for downstream model training and visualization.
- Parameters:
None
- Returns:
Array of shape (n_samples, n_features) containing the selected feature values.
- Return type:
np.ndarray
selected_feature_cols = (
    "log_max_diff_dQ",
    "log_max_diff_dV")

runner = ModelRunner(
    cell_label=selected_cell_label,
    df_input_features=df_features_per_cell,
    selected_feature_cols=selected_feature_cols
)

Xdata = runner.create_model_x_input()
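The column selection described above can be pictured with a small standalone sketch. It uses plain pandas rather than fireducks.pandas, the feature values are made up, and it is only an assumption about what create_model_x_input does internally, not the library's implementation.

```python
import pandas as pd

# Hypothetical per-cell feature table; the values are illustrative only.
df_features_per_cell = pd.DataFrame({
    "cycle_index": [0, 1, 2, 3],
    "log_max_diff_dQ": [-2.1, -2.0, 0.5, -2.2],
    "log_max_diff_dV": [-1.3, -1.2, 0.9, -1.4],
})
selected_feature_cols = ("log_max_diff_dQ", "log_max_diff_dV")

# Select the columns in the requested order and convert to a NumPy array
# of shape (n_samples, n_features).
Xdata = df_features_per_cell[list(selected_feature_cols)].to_numpy()
print(Xdata.shape)  # (4, 2)
```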
- pred_outlier_indices_from_proba(proba: numpy.ndarray, threshold: float, outlier_col: int = 1) → numpy.ndarray
Identify outlier sample indices from probability predictions.
In PyOD, probability output has shape (n_samples, 2), where:
column 0 = inlier probability
column 1 = outlier probability
This function selects all indices where the outlier probability is greater than or equal to the given threshold.
- Parameters:
proba (np.ndarray) – Array of shape (n_samples, 2) with predicted probabilities.
threshold (float) – Probability threshold at or above which a sample is flagged as an outlier.
outlier_col (int, optional) – Column index for outlier probability. Defaults to 1.
- Returns:
Array of indices for samples classified as outliers.
- Return type:
np.ndarray
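The thresholding rule described for this method can be sketched in a few lines of NumPy. The helper name outlier_indices_from_proba and the probability values are hypothetical; the sketch only illustrates the "greater than or equal to" selection on the outlier column and is not taken from the osbad source.

```python
import numpy as np

def outlier_indices_from_proba(proba, threshold, outlier_col=1):
    """Return indices whose outlier probability is >= threshold (sketch)."""
    return np.where(proba[:, outlier_col] >= threshold)[0]

# Made-up probabilities: column 0 = inlier, column 1 = outlier.
proba = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.6, 0.4],
])
idx = outlier_indices_from_proba(proba, threshold=0.7)
print(idx)  # [1 2]
```

Note that the sample with an outlier probability exactly equal to the threshold (row 1) is included.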
- evaluate_indices(df_benchmark_dataset: fireducks.pandas.DataFrame, pred_indices: numpy.ndarray) → Tuple[float, float]
Evaluate predicted outlier indices against benchmark labels.
Uses modval.evaluate_pred_outliers to align predicted outlier indices with the benchmark dataset. The resulting DataFrame is expected to contain the columns true_outlier and pred_outlier. Recall and precision are then computed from these columns.
- Parameters:
df_benchmark_dataset (pd.DataFrame) – Benchmark dataset containing ground-truth outlier labels.
pred_indices (np.ndarray) – Indices of predicted outliers from the model.
- Returns:
recall: Fraction of true outliers correctly identified.
precision: Fraction of predicted outliers that are true.
- Return type:
Tuple[float, float]
Note
Both recall and precision use zero_division=0. If the denominator is zero (TP + FP = 0 for precision, or TP + FN = 0 for recall), the calculated recall_score or precision_score is set to zero.
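To make the zero-division convention concrete, here is a NumPy-only sketch that computes recall and precision from two boolean label arrays. The function recall_precision is a hypothetical stand-in that mimics the zero_division=0 behaviour described in the note; it is not the code used by evaluate_indices.

```python
import numpy as np

def recall_precision(true_outlier, pred_outlier):
    """Recall and precision with a zero-denominator fallback of 0 (sketch)."""
    y_true = np.asarray(true_outlier, dtype=bool)
    y_pred = np.asarray(pred_outlier, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # true outliers that were flagged
    fp = int(np.sum(~y_true & y_pred))   # inliers that were flagged
    fn = int(np.sum(y_true & ~y_pred))   # true outliers that were missed
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return recall, precision

# One hit, one miss, one false alarm: TP = 1, FN = 1, FP = 1.
recall, precision = recall_precision([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
print(recall, precision)  # 0.5 0.5

# Nothing predicted: TP + FP = 0, so precision falls back to 0.
empty = recall_precision([1, 0, 0], [0, 0, 0])
print(empty)  # (0.0, 0.0)
```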
- proxy_evaluate_indices(pred_indices: numpy.ndarray, cycle_idx: numpy.ndarray, features: numpy.ndarray) → Tuple[float, float]
Evaluate the quality of predicted outlier indices using a proxy, regression-based approach. A linear regression model is fitted on the predicted inlier data, and two proxy evaluation metrics are computed from it: a regression loss score and an inlier count score.
- Parameters:
pred_indices (np.ndarray) – Indices of predicted outliers from the model.
cycle_idx (np.ndarray) – All predictor cycle indices or cycle numbers for the proxy regression model, obtained from the model input Xdata if ‘cycle_index’ is one of the features selected for extraction from self.df_input_features.
features (np.ndarray) – All target features, such as the voltage feature or the capacity discharge feature, obtained from the model input Xdata, apart from the predictor ‘cycle_index’ feature.
- Returns:
loss_score: Normalized MSE regression loss for the predicted inliers. A lower value (closer to 0) indicates that the outlier detection model performed well in excluding true positives (points that were indeed outliers). A value closer to 1 implies that the model was unable to remove the true positives.
inlier_score: Normalized inlier count score. It represents the proportion of data points retained after excluding predicted outliers. A higher value (closer to 1) means fewer points were removed, while a lower value indicates more aggressive outlier removal.
- Return type:
Tuple[float, float]
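The idea behind the two proxy metrics can be sketched as follows. The function proxy_scores is hypothetical, and so is the normalization: the residual MSE of a straight-line fit on the inliers is divided by the variance of the full feature and clipped to 1, since the documentation does not state how osbad normalizes the loss. For simplicity the sketch handles a single 1-D target feature.

```python
import numpy as np

def proxy_scores(pred_indices, cycle_idx, feature):
    """Proxy loss and inlier scores from a linear fit on inliers (sketch)."""
    cycle_idx = np.asarray(cycle_idx, dtype=float)
    feature = np.asarray(feature, dtype=float)

    # Everything not flagged as an outlier counts as a predicted inlier.
    inlier_mask = np.ones(cycle_idx.size, dtype=bool)
    inlier_mask[np.asarray(pred_indices, dtype=int)] = False

    # Fit feature ~ slope * cycle + intercept on the inliers only.
    x_in, y_in = cycle_idx[inlier_mask], feature[inlier_mask]
    slope, intercept = np.polyfit(x_in, y_in, deg=1)
    mse = np.mean((y_in - (slope * x_in + intercept)) ** 2)

    # ASSUMPTION: normalize by the variance of the full feature, clip to 1.
    total_var = np.var(feature)
    loss_score = float(min(mse / total_var, 1.0)) if total_var > 0 else 0.0

    # Fraction of points retained after removing predicted outliers.
    inlier_score = float(inlier_mask.sum() / cycle_idx.size)
    return loss_score, inlier_score

# Made-up data: a linear trend with one spike at cycle 5.
cycles = np.arange(10)
values = 0.1 * cycles
values[5] += 5.0

loss_good, inlier_good = proxy_scores([5], cycles, values)  # spike removed
loss_none, inlier_none = proxy_scores([], cycles, values)   # nothing removed
print(inlier_good, inlier_none)  # 0.9 1.0
```

Removing the spike leaves a perfectly linear series, so loss_good is close to 0, while keeping it gives a larger loss_none at the cost of no removed points.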
- create_2d_mesh_grid(square_grid: bool = True, grid_offset: int | float = 1)
Create a 2D mesh grid for visualization of anomaly scores.
This function generates a square mesh grid covering the range of the first two features in self.Xdata. The grid is expanded by ±1 unit beyond the min and max values to ensure full coverage. The resulting grid can be used for plotting decision boundaries and anomaly score heatmaps.
- Parameters:
None
- Returns:
xx (np.ndarray): 2D array of x-coordinates.
yy (np.ndarray): 2D array of y-coordinates.
meshgrid (np.ndarray): Flattened 2D grid of shape (n_points, 2) where each row is a (x, y) coordinate pair.
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
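A mesh grid with the three return values listed above could be built roughly like this. The helper make_2d_mesh_grid and its n_points argument are assumptions made for illustration; the sketch pads each axis by a fixed offset and does not try to reproduce the square_grid option.

```python
import numpy as np

def make_2d_mesh_grid(X, grid_offset=1.0, n_points=50):
    """Build xx, yy and a flattened (n_points**2, 2) grid (sketch)."""
    # Pad the range of the first two features by grid_offset on each side.
    x_min, x_max = X[:, 0].min() - grid_offset, X[:, 0].max() + grid_offset
    y_min, y_max = X[:, 1].min() - grid_offset, X[:, 1].max() + grid_offset

    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, n_points),
        np.linspace(y_min, y_max, n_points),
    )
    # Each row of meshgrid is one (x, y) coordinate pair.
    meshgrid = np.c_[xx.ravel(), yy.ravel()]
    return xx, yy, meshgrid

X = np.array([[0.0, 0.0], [2.0, 4.0]])
xx, yy, meshgrid = make_2d_mesh_grid(X)
print(xx.shape, meshgrid.shape)  # (50, 50) (2500, 2)
```

The flattened array is the form a model's predict-style methods expect, and the scores can be reshaped back to xx.shape for contour plotting.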
- predict_anomaly_score_map(selected_model: PyODModelType, model_name: str, xoutliers: fireducks.pandas.Series, youtliers: fireducks.pandas.Series, pred_outliers_index: numpy.ndarray, threshold: float = 0.7, square_grid=True, grid_offset=1)
Plot a 2D anomaly score map with decision boundaries.
This function fits the selected PyOD model to the dataset and visualizes the predicted anomaly scores across a 2D mesh grid. It creates a contour plot showing anomaly probabilities, a dashed decision boundary at the specified threshold, and highlights the predicted anomalous cycles. Annotations and a legend are added to label outliers, and the figure is saved in the artifacts directory.
- Parameters:
selected_model (PyODModelType) – Trained PyOD model used to predict anomaly scores.
model_name (str) – Name of the model, used as the plot title and in the output filename.
xoutliers (pd.Series) – x-coordinates of predicted anomalous samples.
youtliers (pd.Series) – y-coordinates of predicted anomalous samples.
pred_outliers_index (np.ndarray) – Indices of the predicted anomalous samples.
threshold (np.float64, optional) – Probability threshold for the anomaly decision boundary. Defaults to 0.7.
- Returns:
Axes object containing the anomaly score contour plot.
- Return type:
matplotlib.axes.Axes
axplot = runner.predict_anomaly_score_map(
    selected_model=model,
    model_name="Isolation Forest",
    xoutliers=df_outliers_pred["log_max_diff_dQ"],
    youtliers=df_outliers_pred["log_max_diff_dV"],
    pred_outliers_index=pred_outlier_indices,
    threshold=param_dict["threshold"]
)
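The plotting steps described for this method (score the mesh grid, draw filled contours, add a dashed boundary at the threshold, mark and annotate predicted outliers) can be sketched without PyOD. In the sketch below, toy_outlier_proba is a hypothetical distance-based stand-in for a fitted model's outlier probability, the data are synthetic, and the figure is written to an in-memory buffer rather than to an artifacts directory.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.3, size=(50, 2))  # synthetic two-feature data
X[0] = [2.0, 2.0]                       # one obvious anomaly

def toy_outlier_proba(points, data):
    """HYPOTHETICAL stand-in for a fitted model's outlier probability."""
    dist = np.linalg.norm(points - data.mean(axis=0), axis=1)
    return dist / (dist + 1.0)  # maps distances in [0, inf) to [0, 1)

threshold = 0.7
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100),
)
grid = np.c_[xx.ravel(), yy.ravel()]
zz = toy_outlier_proba(grid, X).reshape(xx.shape)  # scores on the grid
pred_idx = np.where(toy_outlier_proba(X, X) >= threshold)[0]

fig, ax = plt.subplots()
cs = ax.contourf(xx, yy, zz, levels=20, cmap="RdBu_r")
# Dashed decision boundary at the chosen probability threshold.
ax.contour(xx, yy, zz, levels=[threshold], colors="k", linestyles="--")
ax.scatter(X[:, 0], X[:, 1], s=10, c="k", label="cycles")
ax.scatter(X[pred_idx, 0], X[pred_idx, 1], c="gold", edgecolors="k",
           label="predicted outliers")
for i in pred_idx:
    ax.annotate(str(i), (X[i, 0], X[i, 1]))  # label each flagged sample
fig.colorbar(cs, ax=ax, label="outlier probability")
ax.set_title("Toy score map")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # a file path would be used in practice
plt.close(fig)
```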
Note
- Currently supported PyOD models in this benchmarking study are:
pyod.models.iforest.IForest
pyod.models.knn.KNN
pyod.models.gmm.GMM
pyod.models.lof.LOF
pyod.models.pca.PCA
pyod.models.auto_encoder.AutoEncoder