osbad.model
===========

.. py:module:: osbad.model

.. autoapi-nested-parse::

   Utilities for anomaly detection benchmarking on battery cell data.

   This module provides the :class:`ModelRunner` class to orchestrate data
   preparation, prediction, evaluation, and visualization for a single cell.
   It supports several PyOD models via the :data:`PyODModelType` type alias
   and produces publication-ready figures saved per cell in an artifacts
   directory.

   Key features:

   - ``create_model_x_input``: Extracts selected feature columns from the
     input DataFrame and stores them as ``self.Xdata``.
   - ``pred_outlier_indices_from_proba``: Identifies indices of predicted
     outliers based on probability outputs and a decision threshold.
   - ``evaluate_indices``: Compares predicted outliers against benchmark
     labels and computes recall and precision scores.
   - ``create_2d_mesh_grid``: Builds a 2D mesh grid from the first two
     features of ``self.Xdata`` for plotting decision surfaces.
   - ``predict_anomaly_score_map``: Fits a PyOD model, visualizes anomaly
     probabilities with a decision boundary, highlights predicted outliers,
     and saves the resulting figure to the cell’s artifact directory.

   .. code-block:: python

      from osbad.model import ModelRunner


Module Contents
---------------

.. py:data:: SELECTED_FEATURE_COLS
   :value: ('log_max_diff_dQ', 'log_max_diff_dV')


.. py:data:: PyODModelType

   Type alias for supported PyOD anomaly detection models.

   This alias groups together the most commonly used PyOD estimators for
   outlier detection. It allows functions and methods to accept any of these
   model types without explicitly listing each one in the type annotations.

   Models included:

   - IForest (Isolation Forest)
   - KNN (k-Nearest Neighbors)
   - GMM (Gaussian Mixture Model)
   - LOF (Local Outlier Factor)
   - PCA (Principal Component Analysis for outliers)
   - AutoEncoder (Neural-network-based autoencoder for anomalies)


.. py:class:: ModelRunner(cell_label: str, df_input_features: fireducks.pandas.DataFrame, selected_feature_cols: Union[Tuple, List])

   .. py:attribute:: df_input_features


   .. py:attribute:: selected_features


   .. py:method:: create_model_x_input()

      Extract selected feature columns as a NumPy array.

      This method selects the feature columns specified in
      ``self.selected_features`` from the input DataFrame
      ``self.df_input_features``. The values are converted into a NumPy
      array and stored in ``self.Xdata`` for downstream model training and
      visualization.

      :returns: Array of shape (n_samples, n_features) containing the
                selected feature values.
      :rtype: np.ndarray

      .. code-block:: python

         from osbad.model import ModelRunner

         # selected_cell_label and df_features_per_cell are assumed to be
         # defined earlier in the workflow.
         selected_feature_cols = (
             "log_max_diff_dQ",
             "log_max_diff_dV")

         runner = ModelRunner(
             cell_label=selected_cell_label,
             df_input_features=df_features_per_cell,
             selected_feature_cols=selected_feature_cols
         )

         Xdata = runner.create_model_x_input()


   .. py:method:: pred_outlier_indices_from_proba(proba: numpy.ndarray, threshold: float, outlier_col: int = 1) -> numpy.ndarray

      Identify outlier sample indices from probability predictions.

      In PyOD, probability output has shape (n_samples, 2), where:

      - column 0 = inlier probability
      - column 1 = outlier probability

      This function selects all indices where the outlier probability is
      greater than or equal to the given threshold.

      :param proba: Array of shape (n_samples, 2) with predicted
                    probabilities.
      :type proba: np.ndarray
      :param threshold: Probability threshold at or above which a sample is
                        flagged as an outlier.
      :type threshold: float
      :param outlier_col: Column index for outlier probability.
                          Defaults to 1.
      :type outlier_col: int, optional

      :returns: Array of indices for samples classified as outliers.
      :rtype: np.ndarray


   .. py:method:: evaluate_indices(df_benchmark_dataset: fireducks.pandas.DataFrame, pred_indices: numpy.ndarray) -> Tuple[float, float]

      Evaluate predicted outlier indices against benchmark labels.

      Uses ``modval.evaluate_pred_outliers`` to align predicted outlier
      indices with the benchmark dataset. The resulting DataFrame is
      expected to contain the columns ``true_outlier`` and
      ``pred_outlier``. Recall and precision are then computed from these
      columns.

      :param df_benchmark_dataset: Benchmark dataset containing
                                   ground-truth outlier labels.
      :type df_benchmark_dataset: pd.DataFrame
      :param pred_indices: Indices of predicted outliers from the model.
      :type pred_indices: np.ndarray

      :returns:
         - recall: Fraction of true outliers correctly identified.
         - precision: Fraction of predicted outliers that are true.
      :rtype: Tuple[float, float]

      .. note::

         Both recall and precision use ``zero_division=0``. This means that
         when the denominator is zero (TP + FP = 0 for precision, or
         TP + FN = 0 for recall), the calculated ``precision_score`` or
         ``recall_score`` is zero.


   .. py:method:: proxy_evaluate_indices(pred_indices: numpy.ndarray, cycle_idx: numpy.ndarray, features: numpy.ndarray) -> Tuple[float, float]

      Evaluate the quality of predicted outlier indices using a proxy
      regression-based approach.

      A linear regression model is fitted on the predicted inlier data, and
      proxy evaluation metrics such as the regression loss score and the
      inlier count score are calculated from it.

      :param pred_indices: Indices of predicted outliers from the model.
      :type pred_indices: np.ndarray
      :param cycle_idx: All predictor cycle indices or cycle numbers for
                        the proxy regression model, obtained from the model
                        input Xdata if 'cycle_index' is one of the features
                        selected to be extracted from
                        ``self.df_input_features``.
      :type cycle_idx: np.ndarray
      :param features: All target features, such as the voltage feature or
                       the capacity discharge feature, obtained from the
                       model input Xdata, apart from the predictor
                       'cycle_index' feature.
      :type features: np.ndarray

      :returns:
         - loss_score: Normalized MSE regression loss for predicted
           inliers. A lower value (closer to 0) indicates that the outlier
           detection model performed well in excluding true positives
           (points that were indeed outliers). A value closer to 1 implies
           that the model was unable to remove true positives.
         - inlier_score: Normalized inlier count score. It represents the
           proportion of data points retained after excluding predicted
           outliers. A higher value (closer to 1) means fewer points were
           removed, while a lower value indicates more aggressive outlier
           removal.
      :rtype: Tuple[float, float]


   .. py:method:: create_2d_mesh_grid(square_grid: bool = True, grid_offset: Union[int, float] = 1)

      Create a 2D mesh grid for visualization of anomaly scores.

      This function generates a square mesh grid covering the range of the
      first two features in ``self.Xdata``. The grid is expanded by
      ``grid_offset`` (±1 unit by default) beyond the min and max values to
      ensure full coverage. The resulting grid can be used for plotting
      decision boundaries and anomaly score heatmaps.

      :returns:
         - xx (np.ndarray): 2D array of x-coordinates.
         - yy (np.ndarray): 2D array of y-coordinates.
         - meshgrid (np.ndarray): Flattened 2D grid of shape
           (n_points, 2) where each row is an (x, y) coordinate pair.
      :rtype: Tuple[np.ndarray, np.ndarray, np.ndarray]


   .. py:method:: predict_anomaly_score_map(selected_model: PyODModelType, model_name: str, xoutliers: fireducks.pandas.Series, youtliers: fireducks.pandas.Series, pred_outliers_index: numpy.ndarray, threshold: float = 0.7, square_grid=True, grid_offset=1)

      Plot a 2D anomaly score map with decision boundaries.

      This function fits the selected PyOD model to the dataset and
      visualizes the predicted anomaly scores across a 2D mesh grid. It
      creates a contour plot showing anomaly probabilities, a dashed
      decision boundary at the specified threshold, and highlights the
      predicted anomalous cycles. Annotations and a legend are added to
      label outliers, and the figure is saved in the artifacts directory.

      :param selected_model: Trained PyOD model used to predict anomaly
                             scores.
      :type selected_model: PyODModelType
      :param model_name: Name of the model, used as the plot title and in
                         the output filename.
      :type model_name: str
      :param xoutliers: x-coordinates of predicted anomalous samples.
      :type xoutliers: pd.Series
      :param youtliers: y-coordinates of predicted anomalous samples.
      :type youtliers: pd.Series
      :param pred_outliers_index: Indices of the predicted anomalous
                                  samples.
      :type pred_outliers_index: np.ndarray
      :param threshold: Probability threshold for the anomaly decision
                        boundary. Defaults to 0.7.
      :type threshold: float, optional

      :returns: Axes object containing the anomaly score contour plot.
      :rtype: matplotlib.axes.Axes

      .. code-block:: python

         # model, df_outliers_pred, pred_outlier_indices, and param_dict
         # are assumed to be defined earlier in the workflow.
         axplot = runner.predict_anomaly_score_map(
             selected_model=model,
             model_name="Isolation Forest",
             xoutliers=df_outliers_pred["log_max_diff_dQ"],
             youtliers=df_outliers_pred["log_max_diff_dV"],
             pred_outliers_index=pred_outlier_indices,
             threshold=param_dict["threshold"]
         )

      .. note::

         Currently supported PyOD models in this benchmarking study are:

         * ``pyod.models.iforest.IForest``
         * ``pyod.models.knn.KNN``
         * ``pyod.models.gmm.GMM``
         * ``pyod.models.lof.LOF``
         * ``pyod.models.pca.PCA``
         * ``pyod.models.auto_encoder.AutoEncoder``
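
The thresholding, scoring, and grid-building steps documented above can be illustrated without PyOD or ``osbad`` installed. The following is a minimal NumPy sketch under stated assumptions, not the library's implementation: the helper names ``outlier_indices_from_proba``, ``recall_precision``, and ``mesh_grid_2d``, the ``n_points`` argument, and all sample values are hypothetical, and the sketch ignores ``square_grid``, whose exact behavior is not described here.

.. code-block:: python

   import numpy as np


   def outlier_indices_from_proba(proba, threshold, outlier_col=1):
       """Return indices whose outlier probability is >= threshold."""
       proba = np.asarray(proba, dtype=float)
       return np.where(proba[:, outlier_col] >= threshold)[0]


   def recall_precision(true_indices, pred_indices):
       """Recall and precision from index sets; 0.0 on a zero denominator."""
       true_set, pred_set = set(true_indices), set(pred_indices)
       tp = len(true_set & pred_set)
       recall = tp / len(true_set) if true_set else 0.0
       precision = tp / len(pred_set) if pred_set else 0.0
       return recall, precision


   def mesh_grid_2d(xdata, grid_offset=1, n_points=50):
       """Grid over the first two feature columns, padded by grid_offset."""
       xdata = np.asarray(xdata, dtype=float)
       x_min = xdata[:, 0].min() - grid_offset
       x_max = xdata[:, 0].max() + grid_offset
       y_min = xdata[:, 1].min() - grid_offset
       y_max = xdata[:, 1].max() + grid_offset
       xx, yy = np.meshgrid(
           np.linspace(x_min, x_max, n_points),
           np.linspace(y_min, y_max, n_points),
       )
       # Each row of the flattened grid is one (x, y) coordinate pair.
       return xx, yy, np.c_[xx.ravel(), yy.ravel()]


   # Made-up probabilities in the (inlier, outlier) column layout.
   proba = np.array([[0.9, 0.1], [0.3, 0.7], [0.05, 0.95], [0.6, 0.4]])
   pred_indices = outlier_indices_from_proba(proba, threshold=0.7)

   # Assume, for illustration only, that sample 2 is the single true outlier.
   recall, precision = recall_precision([2], pred_indices.tolist())

   xx, yy, grid = mesh_grid_2d([[0.0, 1.0], [2.0, 3.0]], n_points=5)

With these sample values, samples 1 and 2 are flagged because the comparison is inclusive at the threshold, which gives a recall of 1.0 and a precision of 0.5. Passing two empty index lists to ``recall_precision`` returns ``(0.0, 0.0)``, mirroring the ``zero_division=0`` convention noted for ``evaluate_indices``.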