osbad.model

Utilities for anomaly detection benchmarking on battery cell data.

This module provides the ModelRunner class to orchestrate data preparation, prediction, evaluation, and visualization for a single cell. It supports several PyOD models via the PyODModelType type alias and produces publication-ready figures saved per cell in an artifacts directory.

Key features:

create_model_x_input: Extracts selected feature columns from the input DataFrame and stores them as self.Xdata.
pred_outlier_indices_from_proba: Identifies indices of predicted outliers based on probability outputs and a decision threshold.
evaluate_indices: Compares predicted outliers against benchmark labels and computes recall and precision scores.
create_2d_mesh_grid: Builds a 2D mesh grid from the first two features of self.Xdata for plotting decision surfaces.
predict_anomaly_score_map: Fits a PyOD model, visualizes anomaly probabilities with a decision boundary, highlights predicted outliers, and saves the resulting figure to the cell’s artifact directory.

from osbad.model import ModelRunner

Module Contents

osbad.model.ROOT_DIR

osbad.model.PATH_TO_ENV_VARIABLE

osbad.model.USE_LATEX

osbad.model.USE_LATEX = True

osbad.model.SELECTED_FEATURE_COLS = ('log_max_diff_dQ', 'log_max_diff_dV')

osbad.model.PyODModelType

Type alias for supported PyOD anomaly detection models.

This alias groups together the most commonly used PyOD estimators for outlier detection. It allows functions and methods to accept any of these model types without explicitly listing each one in the type annotations.

Models included:

IForest (Isolation Forest)
KNN (k-Nearest Neighbors)
GMM (Gaussian Mixture Model)
LOF (Local Outlier Factor)
PCA (Principal Component Analysis for outliers)
AutoEncoder (Neural-network-based autoencoder for anomalies)

class osbad.model.ModelRunner(cell_label: str, df_input_features: pandas.DataFrame, selected_feature_cols: Tuple | List)

df_input_features

selected_features

create_model_x_input()

Extract selected feature columns as a NumPy array.

This method selects the feature columns specified in self.selected_features from the input DataFrame self.df_input_features. The values are converted into a NumPy array and stored in self.Xdata for downstream model training and visualization.

Parameters:

None

Returns:

Array of shape (n_samples, n_features) containing the selected feature values.

Return type:

np.ndarray

selected_feature_cols = (
    "log_max_diff_dQ",
    "log_max_diff_dV")

runner = ModelRunner(
    cell_label=selected_cell_label,
    df_input_features=df_features_per_cell,
    selected_feature_cols=selected_feature_cols
)

Xdata = runner.create_model_x_input()

pred_outlier_indices_from_proba(proba: numpy.ndarray, threshold: float, outlier_col: int = 1) → numpy.ndarray

Identify outlier sample indices from probability predictions.

In PyOD, probability output has shape (n_samples, 2), where:

column 0 = inlier probability
column 1 = outlier probability

This function selects all indices where the outlier probability is greater than or equal to the given threshold.

Parameters:

proba (np.ndarray) – Array of shape (n_samples, 2) with predicted probabilities.
threshold (float) – Probability threshold at or above which a sample is flagged as an outlier.
outlier_col (int, optional) – Column index for outlier probability. Defaults to 1.

Returns:

Array of indices for samples classified as outliers.

Return type:

np.ndarray
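The selection rule described above can be sketched with plain NumPy. This is a minimal illustration, not the library's actual implementation; the standalone function name outlier_indices_from_proba and the sample probabilities are hypothetical.

```python
import numpy as np


def outlier_indices_from_proba(proba, threshold, outlier_col=1):
    """Return indices whose outlier probability is >= threshold.

    Hypothetical standalone helper mirroring the documented behavior.
    """
    proba = np.asarray(proba)
    # Column `outlier_col` holds the outlier probability for each sample.
    return np.where(proba[:, outlier_col] >= threshold)[0]


# Illustrative probabilities: column 0 = inlier, column 1 = outlier.
proba = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.6, 0.4],
])

indices = outlier_indices_from_proba(proba, threshold=0.7)
print(indices)  # [1 2]
```

Note that sample 1, whose outlier probability equals the threshold exactly, is included because the comparison is inclusive.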

evaluate_indices(df_benchmark_dataset: pandas.DataFrame, pred_indices: numpy.ndarray) → Tuple[float, float]

Evaluate predicted outlier indices against benchmark labels.

Uses modval.evaluate_pred_outliers to align predicted outlier indices with the benchmark dataset. The resulting DataFrame is expected to contain the columns true_outlier and pred_outlier. Recall and precision are then computed from these columns.

Parameters:

df_benchmark_dataset (pd.DataFrame) – Benchmark dataset containing ground-truth outlier labels.
pred_indices (np.ndarray) – Indices of predicted outliers from the model.

Returns:

recall: Fraction of true outliers correctly identified.
precision: Fraction of predicted outliers that are true.

Return type:

Tuple[float, float]

Note

Both recall and precision use zero_division=0. If the denominator of a score is zero (TP + FP = 0 for precision, or TP + FN = 0 for recall), that score is reported as zero instead of raising an error.
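To make the zero-division convention concrete, the following sketch computes recall and precision directly from boolean label arrays. It is an assumption-laden illustration: the function name recall_precision and the label arrays are hypothetical, and the method itself is documented as relying on recall and precision scoring with zero_division=0 rather than on this hand-written arithmetic.

```python
import numpy as np


def recall_precision(true_outlier, pred_outlier):
    """Compute (recall, precision), returning 0.0 when a denominator is zero.

    Hypothetical helper that imitates the zero_division=0 convention.
    """
    true_outlier = np.asarray(true_outlier, dtype=bool)
    pred_outlier = np.asarray(pred_outlier, dtype=bool)

    tp = int(np.sum(true_outlier & pred_outlier))    # true positives
    fp = int(np.sum(~true_outlier & pred_outlier))   # false positives
    fn = int(np.sum(true_outlier & ~pred_outlier))   # false negatives

    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return recall, precision


# Illustrative labels: 1 marks an outlier cycle, 0 an inlier cycle.
true_labels = [0, 1, 1, 0, 1]
pred_labels = [0, 1, 0, 1, 1]

# TP = 2, FN = 1, FP = 1, so both scores are 2/3.
recall, precision = recall_precision(true_labels, pred_labels)

# No predicted outliers: TP + FP = 0, so precision falls back to 0.0.
empty_recall, empty_precision = recall_precision(true_labels, [0, 0, 0, 0, 0])
```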

proxy_evaluate_indices(pred_indices: numpy.ndarray, cycle_idx: numpy.ndarray, features: numpy.ndarray) → Tuple[float, float]

Evaluate the quality of predicted outlier indices using a proxy regression-based approach. A linear regression model is fitted on the predicted inlier data, and two proxy evaluation metrics are calculated from it: a regression loss score and an inlier count score.

Parameters:

pred_indices (np.ndarray) – Indices of predicted outliers from the model.
cycle_idx (np.ndarray) – Predictor values for the proxy regression model: the cycle indices (cycle numbers) of all samples, obtained from the model input Xdata if ‘cycle_index’ is one of the features selected for extraction from self.df_input_features.
features (np.ndarray) – Target values for the proxy regression model, such as a voltage feature or a capacity discharge feature, obtained from the model input Xdata (all features apart from the predictor ‘cycle_index’).

Returns:

A tuple containing loss_score and inlier_score.

Return type:

Tuple[float, float]

Note

loss_score: Normalized MSE regression loss for predicted inliers. A lower value (closer to 0) indicates that the outlier detection model performed well in excluding true positives (points that were indeed outliers). A value closer to 1 implies that the model was unable to remove true positives.
inlier_score: Normalized inlier count score. It represents the proportion of data points retained after excluding predicted outliers. A higher value (closer to 1) means fewer points were removed, while a lower value indicates more aggressive outlier removal.
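The idea behind the two proxy scores can be illustrated with a short NumPy sketch. Several details here are assumptions, since the documentation does not specify them: the function name proxy_scores is hypothetical, a single 1-D target feature is used, and the MSE is normalized by the variance of the inlier targets (one plausible way to map the loss into the range 0 to 1). The actual normalization in osbad may differ.

```python
import numpy as np


def proxy_scores(pred_indices, cycle_idx, feature):
    """Return (loss_score, inlier_score) for a set of predicted outliers.

    Hypothetical sketch. Assumed normalization: MSE of a linear fit on the
    predicted inliers divided by the variance of the inlier targets.
    """
    cycle_idx = np.asarray(cycle_idx, dtype=float)
    feature = np.asarray(feature, dtype=float)

    # Everything not flagged as an outlier is treated as an inlier.
    inlier_mask = np.ones(cycle_idx.shape[0], dtype=bool)
    inlier_mask[np.asarray(pred_indices, dtype=int)] = False

    x_in, y_in = cycle_idx[inlier_mask], feature[inlier_mask]

    # Fit feature ~ slope * cycle_idx + intercept on the inliers only.
    slope, intercept = np.polyfit(x_in, y_in, deg=1)
    residuals = y_in - (slope * x_in + intercept)
    mse = np.mean(residuals ** 2)

    variance = np.var(y_in)
    loss_score = mse / variance if variance > 0 else 0.0

    # Proportion of data points retained after removing predicted outliers.
    inlier_score = inlier_mask.sum() / inlier_mask.size
    return float(loss_score), float(inlier_score)


# Illustrative data: a linear trend with one spike at cycle 5.
cycles = np.arange(10)
values = 2.0 * cycles + 1.0
values[5] += 50.0

loss_removed, inliers_removed = proxy_scores([5], cycles, values)
loss_kept, inliers_kept = proxy_scores([], cycles, values)
```

In this toy case, removing the spike leaves a perfectly linear series, so the loss drops to nearly zero while 90% of the points are retained; keeping the spike retains all points but yields a much larger loss.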

create_2d_mesh_grid(square_grid: bool = True, grid_offset: int | float = 1)

Create a mesh grid spanning the first two feature dimensions.

Generates linearly spaced grids from self.Xdata that are padded by grid_offset units. When square_grid is True, both axes share the combined min and max range to form a square plotting area.

Parameters:

square_grid (bool, optional) – Enforce equal axis bounds for a square grid. Defaults to True.
grid_offset (Union[int, float], optional) – Margin added to each axis bound to expand the plotting coverage. Defaults to 1.

Returns:

xx and yy mesh arrays plus a flattened meshgrid suitable for contour evaluation.

Return type:

Tuple[np.ndarray, np.ndarray, np.ndarray]
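A rough sketch of how such a grid could be built with NumPy follows. The function name make_2d_mesh_grid, the explicit xdata argument, and the n_points resolution are hypothetical; the documented method reads from self.Xdata and its grid resolution is not stated here.

```python
import numpy as np


def make_2d_mesh_grid(xdata, square_grid=True, grid_offset=1, n_points=200):
    """Build xx, yy, and a flattened (n_points**2, 2) grid from two features.

    Hypothetical standalone sketch; n_points is an assumed resolution.
    """
    xdata = np.asarray(xdata, dtype=float)

    # Pad the range of each of the first two features by grid_offset.
    x_min, x_max = xdata[:, 0].min() - grid_offset, xdata[:, 0].max() + grid_offset
    y_min, y_max = xdata[:, 1].min() - grid_offset, xdata[:, 1].max() + grid_offset

    if square_grid:
        # Share the combined bounds so both axes span the same range.
        lower, upper = min(x_min, y_min), max(x_max, y_max)
        x_min = y_min = lower
        x_max = y_max = upper

    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, n_points),
        np.linspace(y_min, y_max, n_points),
    )
    # Flatten to one (x, y) row per grid point for model evaluation.
    grid = np.c_[xx.ravel(), yy.ravel()]
    return xx, yy, grid


# Illustrative feature matrix with two columns.
xdata = np.array([[0.0, 2.0], [1.0, 5.0], [3.0, 4.0]])
xx, yy, grid = make_2d_mesh_grid(xdata, square_grid=True, grid_offset=1, n_points=50)
```

The flattened grid can be passed to a model's scoring or probability method, and the result reshaped to xx.shape for contour plotting.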

predict_anomaly_score_map(selected_model: PyODModelType, model_name: str, xoutliers: pandas.Series, youtliers: pandas.Series, pred_outliers_index: numpy.ndarray, threshold: float = 0.7, square_grid=True, grid_offset=1)

Plot a 2D anomaly score map with decision boundaries.

This function fits the selected PyOD model to the dataset and visualizes the predicted anomaly scores across a 2D mesh grid. It creates a contour plot showing anomaly probabilities, a dashed decision boundary at the specified threshold, and highlights the predicted anomalous cycles. Annotations and a legend are added to label outliers, and the figure is saved in the artifacts directory.

Parameters:

selected_model (PyODModelType) – Trained PyOD model used to predict anomaly scores.
model_name (str) – Name of the model, used as the plot title and in the output filename.
xoutliers (pd.Series) – x-coordinates of predicted anomalous samples.
youtliers (pd.Series) – y-coordinates of predicted anomalous samples.
pred_outliers_index (np.ndarray) – Indices of the predicted anomalous samples.
threshold (float, optional) – Probability threshold for the anomaly decision boundary. Defaults to 0.7.
square_grid (bool, optional) – Use shared bounds for both axes to render a square mesh. Defaults to True.
grid_offset (Union[int, float], optional) – Margin added to axis limits when constructing the mesh grid. Defaults to 1.

Returns:

Axes object containing the anomaly score contour plot.

Return type:

matplotlib.axes.Axes

axplot = runner.predict_anomaly_score_map(
    selected_model=model,
    model_name="Isolation Forest",
    xoutliers=df_outliers_pred["log_max_diff_dQ"],
    youtliers=df_outliers_pred["log_max_diff_dV"],
    pred_outliers_index=pred_outlier_indices,
    threshold=param_dict["threshold"],
    square_grid=True,
    grid_offset=1
)

Note

Currently supported PyOD models in this benchmarking study are:

pyod.models.iforest.IForest
pyod.models.knn.KNN
pyod.models.gmm.GMM
pyod.models.lof.LOF
pyod.models.pca.PCA
pyod.models.auto_encoder.AutoEncoder