osbad.modval
Evaluation utilities for benchmarking anomaly detection models.
This module provides functions to compare predicted anomalous cycles against ground-truth labels from a benchmarking dataset. It includes utilities for aligning predictions with labels, visualizing results via confusion matrices, and summarizing performance metrics.
- Key features:
evaluate_pred_outliers: Aligns predicted outlier indices with the benchmarking dataset and produces a DataFrame containing cycle-wise true and predicted outlier labels.
generate_confusion_matrix: Generates a customized confusion matrix heatmap, highlighting correct predictions in palegreen and misclassifications in salmon.
eval_model_performance: Computes and prints standard evaluation metrics (accuracy, precision, recall, F1-score, Matthews correlation coefficient) and returns them in a single-row DataFrame.
import osbad.modval as modval
Module Contents
- osbad.modval.ROOT_DIR
- osbad.modval.PATH_TO_ENV_VARIABLE
- osbad.modval.USE_LATEX
- osbad.modval.USE_LATEX = True
- osbad.modval.evaluate_pred_outliers(df_benchmark: pandas.DataFrame, outlier_cycle_index: numpy.ndarray) → pandas.DataFrame
Compare the predicted outliers against the true outliers for each cycle and return the result as a new DataFrame.
- Parameters:
df_benchmark (pd.DataFrame) – Benchmarking dataset of the selected cell.
outlier_cycle_index (np.ndarray) – Predicted outliers from statistical methods or ML models.
- Returns:
A dataframe with predicted outliers and true outliers from the benchmarking dataset for each cycle.
- Return type:
pd.DataFrame
Example
df_eval_outlier_sd_dV = modval.evaluate_pred_outliers(
    df_benchmark=df_selected_cell,
    outlier_cycle_index=std_outlier_dV_index)
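The alignment step described above can be illustrated with a minimal sketch. This is not osbad's implementation: the helper name `align_pred_with_labels` and the input column names `cycle_index` and `outlier` are hypothetical and chosen only for illustration. Only the output column names `true_outlier` and `pred_outlier` are taken from this page.

```python
import numpy as np
import pandas as pd


def align_pred_with_labels(df_benchmark, outlier_cycle_index,
                           cycle_col="cycle_index", label_col="outlier"):
    """Hypothetical helper: build a cycle-wise table of true and predicted labels.

    Assumes df_benchmark has one or more rows per cycle, a cycle identifier
    column (cycle_col), and a binary ground-truth column (label_col).
    """
    # Collapse to one row per cycle; a cycle counts as a true outlier
    # if any of its rows is labelled 1.
    df_cycles = (
        df_benchmark.groupby(cycle_col, as_index=False)[label_col]
        .max()
        .rename(columns={label_col: "true_outlier"})
    )
    df_cycles["true_outlier"] = df_cycles["true_outlier"].astype(int)

    # Mark a cycle as a predicted outlier if its index is in the predicted set.
    df_cycles["pred_outlier"] = (
        df_cycles[cycle_col].isin(outlier_cycle_index).astype(int)
    )
    return df_cycles


# Small synthetic dataset: cycle 1 is a true outlier.
df_demo = pd.DataFrame({
    "cycle_index": [0, 0, 1, 1, 2, 2, 3, 3],
    "outlier":     [0, 0, 1, 1, 0, 0, 0, 0],
})

# A detector that flags cycles 1 and 3 (cycle 3 is a false positive).
df_eval_demo = align_pred_with_labels(df_demo, np.array([1, 3]))
print(df_eval_demo)
```

The resulting table has one row per cycle, which is the shape the other two functions on this page expect as input.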
- osbad.modval.generate_confusion_matrix(y_true: pandas.Series | numpy.ndarray, y_pred: pandas.Series | numpy.ndarray) → matplotlib.axes._axes.Axes
Generate a custom confusion matrix heatmap in which correct predictions are shaded palegreen and misclassifications are shaded salmon.
- Parameters:
y_true (pd.Series | np.ndarray) – True outliers from the benchmarking dataset.
y_pred (pd.Series | np.ndarray) – Predicted outliers from the statistical methods or ML models.
- Returns:
Matplotlib axes for additional external customization.
- Return type:
matplotlib.axes._axes.Axes
Example
df_eval_outlier_sd_dV = modval.evaluate_pred_outliers(
    df_benchmark=df_selected_cell,
    outlier_cycle_index=std_outlier_dV_index)
axplot = modval.generate_confusion_matrix(
    y_true=np.array(df_eval_outlier_sd_dV["true_outlier"]),
    y_pred=np.array(df_eval_outlier_sd_dV["pred_outlier"]))
fig_title = (r"SD on $\Delta V_\textrm{scaled,max,cyc}$\newline")
axplot.set_title(fig_title + "\n", fontsize=16)
plt.show()
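To make the colour convention concrete, the sketch below computes the 2×2 confusion counts with NumPy and assigns a colour name to each cell. It is an illustration under stated assumptions, not the library's plotting code: the helper `confusion_counts` is hypothetical, and the row/column orientation (rows = true label, columns = predicted label) is an assumption rather than something this page specifies.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Hypothetical helper: 2x2 counts with rows = true label, cols = predicted label."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    # Unbuffered in-place add so that repeated (true, pred) pairs are all counted.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Binary labels: 0 = inlier, 1 = outlier.
y_true_demo = [0, 1, 0, 0, 1, 0]
y_pred_demo = [0, 1, 0, 1, 0, 0]

cm_demo = confusion_counts(y_true_demo, y_pred_demo)

# Diagonal cells are correct predictions, off-diagonal cells are misclassifications.
cell_colors = np.where(np.eye(2, dtype=bool), "palegreen", "salmon")

print(cm_demo)
print(cell_colors)
```

With this orientation, `cm_demo[0, 0]` and `cm_demo[1, 1]` hold the true negatives and true positives, while `cm_demo[0, 1]` and `cm_demo[1, 0]` hold the false positives and false negatives.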
- osbad.modval.eval_model_performance(model_name, selected_cell_label: str, df_eval_outliers: pandas.DataFrame) → pandas.DataFrame
Evaluate and summarize model performance metrics.
This function computes model performance metrics (accuracy, precision, recall, F1-score, and Matthews correlation coefficient) using ground-truth and predicted outlier labels. It prints each metric to the console and returns the results as a one-row DataFrame for the specified model and cell.
- Parameters:
model_name (str) – Name of the machine learning model being evaluated.
selected_cell_label (str) – Identifier for the evaluated cell.
df_eval_outliers (pd.DataFrame) – DataFrame containing two columns:
- true_outlier: Ground-truth outlier labels.
- pred_outlier: Predicted outlier labels.
- Returns:
Single-row DataFrame with the evaluation metrics and metadata including ml_model and cell_index.
- Return type:
pd.DataFrame
Example
df_current_eval_metrics = modval.eval_model_performance(
    model_name="iforest",
    selected_cell_label=selected_cell_label,
    df_eval_outliers=df_eval_outlier)
Note
Both true_outlier and pred_outlier must be binary labels, where 0 = inlier and 1 = outlier.
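For reference, the five metrics named above follow directly from the four confusion counts. The sketch below uses only the standard library and is not how osbad computes them. The helper `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math


def binary_metrics(y_true, y_pred):
    """Hypothetical helper: standard metrics for binary labels (0 = inlier, 1 = outlier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient from the four counts.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
        "mcc": mcc,
    }


# Six cycles: one true positive, three true negatives,
# one false positive, and one false negative.
metrics_demo = binary_metrics(
    y_true=[0, 1, 0, 0, 1, 0],
    y_pred=[0, 1, 0, 1, 0, 0],
)
print(metrics_demo)
```

Because outliers are typically rare, accuracy alone can look high even when most outliers are missed. This is why precision, recall, F1-score, and the Matthews correlation coefficient are reported alongside it.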