osbad.model
Utilities for anomaly detection benchmarking on battery cell data.
This module provides the ModelRunner class to orchestrate data
preparation, prediction, evaluation, and visualization for a single cell.
It supports several PyOD models via the PyODModelType type alias
and produces publication-ready figures saved per cell in an artifacts
directory.
- Key features:
create_model_x_input: Extracts selected feature columns from the input DataFrame and stores them as self.Xdata.
pred_outlier_indices_from_proba: Identifies indices of predicted outliers based on probability outputs and a decision threshold.
evaluate_indices: Compares predicted outliers against benchmark labels and computes recall and precision scores.
create_2d_mesh_grid: Builds a 2D mesh grid from the first two features of self.Xdata for plotting decision surfaces.
predict_anomaly_score_map: Fits a PyOD model, visualizes anomaly probabilities with a decision boundary, highlights predicted outliers, and saves the resulting figure to the cell's artifact directory.
from osbad.model import ModelRunner
Module Contents
- osbad.model.ROOT_DIR
- osbad.model.PATH_TO_ENV_VARIABLE
- osbad.model.USE_LATEX
- osbad.model.USE_LATEX = True
- osbad.model.SELECTED_FEATURE_COLS = ('log_max_diff_dQ', 'log_max_diff_dV')
- osbad.model.PyODModelType
Type alias for supported PyOD anomaly detection models.
This alias groups together the most commonly used PyOD estimators for outlier detection. It allows functions and methods to accept any of these model types without explicitly listing each one in the type annotations.
- Models included:
IForest (Isolation Forest)
KNN (k-Nearest Neighbors)
GMM (Gaussian Mixture Model)
LOF (Local Outlier Factor)
PCA (Principal Component Analysis for outliers)
AutoEncoder (Neural-network-based autoencoder for anomalies)
- class osbad.model.ModelRunner(cell_label: str, df_input_features: pandas.DataFrame, selected_feature_cols: Tuple | List)
- df_input_features
- selected_features
- create_model_x_input()
Extract selected feature columns as a NumPy array.
This method selects the feature columns specified in
self.selected_features from the input DataFrame self.df_input_features. The values are converted into a NumPy array and stored in self.Xdata for downstream model training and visualization.
- Parameters:
None
- Returns:
Array of shape (n_samples, n_features) containing the selected feature values.
- Return type:
np.ndarray
selected_feature_cols = ("log_max_diff_dQ", "log_max_diff_dV")
runner = ModelRunner(
    cell_label=selected_cell_label,
    df_input_features=df_features_per_cell,
    selected_feature_cols=selected_feature_cols,
)
Xdata = runner.create_model_x_input()
- pred_outlier_indices_from_proba(proba: numpy.ndarray, threshold: float, outlier_col: int = 1) numpy.ndarray
Identify outlier sample indices from probability predictions.
In PyOD, probability output has shape (n_samples, 2), where:
column 0 = inlier probability
column 1 = outlier probability
This function selects all indices where the outlier probability is greater than or equal to the given threshold.
- Parameters:
proba (np.ndarray) – Array of shape (n_samples, 2) with predicted probabilities.
threshold (float) – Probability threshold at or above which a sample is flagged as an outlier.
outlier_col (int, optional) – Column index for outlier probability. Defaults to 1.
- Returns:
Array of indices for samples classified as outliers.
- Return type:
np.ndarray
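The selection rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the osbad implementation: the standalone function name outlier_indices_from_proba and the sample probabilities are made up for the example, and only the "greater than or equal to the threshold" rule comes from the description.

```python
import numpy as np


def outlier_indices_from_proba(proba, threshold, outlier_col=1):
    """Hypothetical standalone version of the rule described above."""
    # Column `outlier_col` holds the outlier probability of each sample.
    outlier_proba = proba[:, outlier_col]
    # Keep every index whose outlier probability is >= threshold.
    return np.where(outlier_proba >= threshold)[0]


# Made-up probabilities with shape (n_samples, 2): [inlier, outlier].
proba = np.array([
    [0.90, 0.10],
    [0.25, 0.75],
    [0.30, 0.70],
    [0.60, 0.40],
])

indices = outlier_indices_from_proba(proba, threshold=0.7)
print(indices)  # [1 2]
```

Sample 2 sits exactly on the threshold and is still flagged, which reflects the inclusive comparison.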
- evaluate_indices(df_benchmark_dataset: pandas.DataFrame, pred_indices: numpy.ndarray) Tuple[float, float]
Evaluate predicted outlier indices against benchmark labels.
Uses
modval.evaluate_pred_outliers to align predicted outlier indices with the benchmark dataset. The resulting DataFrame is expected to contain the columns true_outlier and pred_outlier. Recall and precision are then computed from these columns.
- Parameters:
df_benchmark_dataset (pd.DataFrame) – Benchmark dataset containing ground-truth outlier labels.
pred_indices (np.ndarray) – Indices of predicted outliers from the model.
- Returns:
recall: Fraction of true outliers correctly identified.
precision: Fraction of predicted outliers that are true.
- Return type:
Tuple[float, float]
Note
Both recall and precision use
zero_division=0. If the denominator is zero (TP + FP = 0 for precision, or TP + FN = 0 for recall), the calculated recall_score or precision_score will be zero.
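To make the zero-division convention concrete, the sketch below computes recall and precision by hand with NumPy instead of calling a metrics library. The function name recall_precision and the label arrays are hypothetical; the only behavior taken from the note above is that a zero denominator yields a score of 0.

```python
import numpy as np


def recall_precision(true_outlier, pred_outlier):
    """Hand-rolled recall and precision with a zero_division=0 convention."""
    true_outlier = np.asarray(true_outlier, dtype=bool)
    pred_outlier = np.asarray(pred_outlier, dtype=bool)

    tp = int(np.sum(true_outlier & pred_outlier))    # true positives
    fp = int(np.sum(~true_outlier & pred_outlier))   # false positives
    fn = int(np.sum(true_outlier & ~pred_outlier))   # false negatives

    # Return 0.0 instead of raising when the denominator is zero.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return recall, precision


# One hit, one miss, one false alarm: tp=1, fn=1, fp=1.
recall, precision = recall_precision([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])

# No predicted outliers at all: TP + FP = 0, so precision falls back to 0.
recall_none, precision_none = recall_precision([1, 0, 0], [0, 0, 0])
```

In the second call, recall is 0 because the single true outlier was missed, while precision is 0 only by convention, since nothing was predicted.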
- proxy_evaluate_indices(pred_indices: numpy.ndarray, cycle_idx: numpy.ndarray, features: numpy.ndarray) Tuple[float, float]
Evaluate the quality of predicted outlier indices with a proxy, regression-based approach. A linear regression model is fitted on the predicted inlier data, and two proxy metrics are calculated from it: a regression loss score and an inlier count score.
- Parameters:
pred_indices (np.ndarray) – Indices of predicted outliers from the model.
cycle_idx (np.ndarray) – All predictor cycle indices (cycle numbers) for the proxy regression model, obtained from the model input Xdata if 'cycle_index' is one of the features selected from self.df_input_features.
features (np.ndarray) – All target features, such as the voltage feature or the capacity discharge feature, obtained from the model input Xdata, apart from the predictor 'cycle_index' feature.
- Returns:
A tuple containing loss_score and inlier_score.
- Return type:
Tuple[float, float]
Note
loss_score: Normalized MSE regression loss for predicted inliers. A lower value (closer to 0) indicates that the outlier detection model performed well in excluding true positives (points that were indeed outliers). A value closer to 1 implies that the model was unable to remove true positives.
inlier_score: Normalized inlier count score. It represents the proportion of data points retained after excluding predicted outliers. A higher value (closer to 1) means fewer points were removed, while a lower value indicates more aggressive outlier removal.
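The idea behind the two proxy metrics can be sketched as follows. This is an illustration under stated assumptions, not the osbad implementation: it assumes a single one-dimensional target feature, uses np.polyfit for the linear fit, and reports the raw (unnormalized) MSE because the normalization step is not described here. The function name proxy_scores and the synthetic data are hypothetical.

```python
import numpy as np


def proxy_scores(pred_indices, cycle_idx, features):
    """Return (raw_mse, inlier_fraction) for the predicted inliers."""
    cycle_idx = np.asarray(cycle_idx, dtype=float)
    features = np.asarray(features, dtype=float)

    # Mark every sample as an inlier, then drop the predicted outliers.
    inlier_mask = np.ones(cycle_idx.shape[0], dtype=bool)
    inlier_mask[pred_indices] = False

    x = cycle_idx[inlier_mask]
    y = features[inlier_mask]

    # Fit y ~ slope * x + intercept on the inliers only.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)

    raw_mse = float(np.mean(residuals ** 2))       # assumption: not normalized
    inlier_fraction = float(inlier_mask.mean())    # share of points retained
    return raw_mse, inlier_fraction


# Synthetic linear trend with one injected anomaly at cycle 4.
cycle_idx = np.arange(10)
features = 2.0 * cycle_idx + 1.0
features[4] += 50.0

mse_all, frac_all = proxy_scores(np.array([], dtype=int), cycle_idx, features)
mse_clean, frac_clean = proxy_scores(np.array([4]), cycle_idx, features)
```

Removing the anomalous cycle drives the regression loss toward zero while the retained fraction drops from 1.0 to 0.9, which is the trade-off the two scores are meant to expose.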
- create_2d_mesh_grid(square_grid: bool = True, grid_offset: int | float = 1)
Create a mesh grid spanning the first two feature dimensions.
Generates linearly spaced grids from
self.Xdata that are padded by grid_offset units. When square_grid is True, both axes share the combined min and max range to form a square plotting area.
- Parameters:
square_grid (bool, optional) – Enforce equal axis bounds for a square grid. Defaults to True.
grid_offset (Union[int, float], optional) – Margin added to each axis bound to expand the plotting coverage. Defaults to 1.
- Returns:
xx and yy mesh arrays plus a flattened meshgrid suitable for contour evaluation.
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
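A mesh grid of this kind can be built with np.linspace and np.meshgrid. The sketch below is one plausible reading of the description, not the osbad implementation: the function name make_2d_mesh_grid, the n_points resolution parameter, and the (n_points * n_points, 2) layout of the flattened grid are assumptions made for the example.

```python
import numpy as np


def make_2d_mesh_grid(Xdata, square_grid=True, grid_offset=1, n_points=100):
    """Hypothetical mesh builder over the first two feature columns."""
    x, y = Xdata[:, 0], Xdata[:, 1]

    if square_grid:
        # Both axes share the combined min/max, padded by grid_offset.
        lo = min(x.min(), y.min()) - grid_offset
        hi = max(x.max(), y.max()) + grid_offset
        x_lo, x_hi, y_lo, y_hi = lo, hi, lo, hi
    else:
        # Each axis keeps its own padded range.
        x_lo, x_hi = x.min() - grid_offset, x.max() + grid_offset
        y_lo, y_hi = y.min() - grid_offset, y.max() + grid_offset

    xx, yy = np.meshgrid(
        np.linspace(x_lo, x_hi, n_points),
        np.linspace(y_lo, y_hi, n_points),
    )
    # Flatten to one (x, y) pair per row so a model can score every grid point.
    meshgrid = np.c_[xx.ravel(), yy.ravel()]
    return xx, yy, meshgrid


Xdata = np.array([[0.0, 2.0], [1.0, 5.0]])
xx, yy, meshgrid = make_2d_mesh_grid(Xdata, square_grid=True, grid_offset=1)
xx_r, yy_r, _ = make_2d_mesh_grid(Xdata, square_grid=False, grid_offset=1)
```

With square_grid=True both axes span -1 to 6 for this input; with square_grid=False the x axis spans -1 to 2 and the y axis spans 1 to 6. Scores predicted on the flattened grid can be reshaped back to xx.shape for contour plotting.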
- predict_anomaly_score_map(selected_model: PyODModelType, model_name: str, xoutliers: pandas.Series, youtliers: pandas.Series, pred_outliers_index: numpy.ndarray, threshold: float = 0.7, square_grid=True, grid_offset=1)
Plot a 2D anomaly score map with decision boundaries.
This function fits the selected PyOD model to the dataset and visualizes the predicted anomaly scores across a 2D mesh grid. It creates a contour plot showing anomaly probabilities, a dashed decision boundary at the specified threshold, and highlights the predicted anomalous cycles. Annotations and a legend are added to label outliers, and the figure is saved in the artifacts directory.
- Parameters:
selected_model (PyODModelType) – Trained PyOD model used to predict anomaly scores.
model_name (str) – Name of the model, used as the plot title and in the output filename.
xoutliers (pd.Series) – x-coordinates of predicted anomalous samples.
youtliers (pd.Series) – y-coordinates of predicted anomalous samples.
pred_outliers_index (np.ndarray) – Indices of the predicted anomalous samples.
threshold (np.float64, optional) – Probability threshold for the anomaly decision boundary. Defaults to 0.7.
square_grid (bool, optional) – Use shared bounds for both axes to render a square mesh. Defaults to True.
grid_offset (Union[int, float], optional) – Margin added to axis limits when constructing the mesh grid. Defaults to 1.
- Returns:
Axes object containing the anomaly score contour plot.
- Return type:
matplotlib.axes.Axes
axplot = runner.predict_anomaly_score_map(
    selected_model=model,
    model_name="Isolation Forest",
    xoutliers=df_outliers_pred["log_max_diff_dQ"],
    youtliers=df_outliers_pred["log_max_diff_dV"],
    pred_outliers_index=pred_outlier_indices,
    threshold=param_dict["threshold"],
    square_grid=True,
    grid_offset=1,
)
Note
- Currently supported PyOD models in this benchmarking study are:
pyod.models.iforest.IForest
pyod.models.knn.KNN
pyod.models.gmm.GMM
pyod.models.lof.LOF
pyod.models.pca.PCA
pyod.models.auto_encoder.AutoEncoder