osbad.modval
============

.. py:module:: osbad.modval

.. autoapi-nested-parse::

   Evaluation utilities for benchmarking anomaly detection models.

   This module provides functions to compare predicted anomalous cycles
   against ground-truth labels from a benchmarking dataset. It includes
   utilities for aligning predictions with labels, visualizing results via
   confusion matrices, and summarizing performance metrics.

   Key features:

   - ``evaluate_pred_outliers``: Aligns predicted outlier indices with the
     benchmarking dataset and produces a DataFrame containing cycle-wise
     true and predicted outlier labels.
   - ``generate_confusion_matrix``: Generates a customized confusion matrix
     heatmap, highlighting correct predictions in palegreen and
     misclassifications in salmon.
   - ``eval_model_performance``: Computes and prints standard evaluation
     metrics (accuracy, precision, recall, F1-score, Matthews correlation
     coefficient) and returns them in a single-row DataFrame.

   .. code-block:: python

      import osbad.modval as modval


Module Contents
---------------

.. py:data:: ROOT_DIR

.. py:data:: PATH_TO_ENV_VARIABLE

.. py:data:: USE_LATEX
   :value: True


.. py:function:: evaluate_pred_outliers(df_benchmark: pandas.DataFrame, outlier_cycle_index: numpy.ndarray) -> pandas.DataFrame

   Compare the predicted outliers against the true outliers for each cycle
   and return the result as a new DataFrame.

   :param df_benchmark: Benchmarking dataset of the selected cell.
   :type df_benchmark: pd.DataFrame
   :param outlier_cycle_index: Cycle indices predicted as outliers by
                               statistical methods or ML models.
   :type outlier_cycle_index: np.ndarray

   :returns: A DataFrame with the predicted outliers and the true outliers
             from the benchmarking dataset for each cycle.
   :rtype: pd.DataFrame

   .. rubric:: Example

   .. code-block:: python

      df_eval_outlier_sd_dV = modval.evaluate_pred_outliers(
          df_benchmark=df_selected_cell,
          outlier_cycle_index=std_outlier_dV_index)


.. py:function:: generate_confusion_matrix(y_true: Union[pandas.Series, numpy.ndarray], y_pred: Union[pandas.Series, numpy.ndarray]) -> matplotlib.axes._axes.Axes

   Generate a custom confusion matrix in which correct predictions are
   shaded palegreen and incorrect predictions are shaded salmon.

   :param y_true: True outliers from the benchmarking dataset.
   :type y_true: pd.Series | np.ndarray
   :param y_pred: Predicted outliers from the statistical methods or ML
                  models.
   :type y_pred: pd.Series | np.ndarray

   :returns: Matplotlib axes for additional external customization.
   :rtype: matplotlib.axes._axes.Axes

   .. rubric:: Example

   .. code-block:: python

      import matplotlib.pyplot as plt
      import numpy as np

      # Align the predicted outlier indices with the benchmarking labels
      df_eval_outlier_sd_dV = modval.evaluate_pred_outliers(
          df_benchmark=df_selected_cell,
          outlier_cycle_index=std_outlier_dV_index)

      # Plot the confusion matrix from the true and predicted labels
      axplot = modval.generate_confusion_matrix(
          y_true=np.array(df_eval_outlier_sd_dV["true_outlier"]),
          y_pred=np.array(df_eval_outlier_sd_dV["pred_outlier"]))

      # Customize the returned axes
      fig_title = (r"SD on $\Delta V_\textrm{scaled,max,cyc}$\newline")
      axplot.set_title(fig_title + "\n", fontsize=16)

      plt.show()


.. py:function:: eval_model_performance(model_name, selected_cell_label: str, df_eval_outliers: pandas.DataFrame) -> pandas.DataFrame

   Evaluate and summarize model performance metrics.

   This function computes model performance metrics (accuracy, precision,
   recall, F1-score, and Matthews correlation coefficient) using
   ground-truth and predicted outlier labels. It prints each metric to the
   console and returns the results as a one-row DataFrame for the specified
   model and cell.

   :param model_name: Name of the machine learning model being evaluated.
   :type model_name: str
   :param selected_cell_label: Identifier for the evaluated cell.
   :type selected_cell_label: str
   :param df_eval_outliers: DataFrame containing two columns:

                            - ``true_outlier``: Ground-truth outlier labels.
                            - ``pred_outlier``: Predicted outlier labels.
   :type df_eval_outliers: pd.DataFrame

   :returns: Single-row DataFrame with the evaluation metrics and metadata
             including ``ml_model`` and ``cell_index``.
   :rtype: pd.DataFrame

   .. rubric:: Example

   .. code-block:: python

      df_current_eval_metrics = modval.eval_model_performance(
          model_name="iforest",
          selected_cell_label=selected_cell_label,
          df_eval_outliers=df_eval_outlier)

   .. note::

      Both ``true_outlier`` and ``pred_outlier`` must be binary labels,
      where ``0`` = inlier and ``1`` = outlier.
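
The metrics reported by ``eval_model_performance`` can be illustrated with a
minimal, standard-library-only sketch. This is not the ``osbad``
implementation: the helpers ``align_predictions`` and ``binary_metrics`` are
hypothetical names introduced here, the sample labels are made up for
illustration, and returning ``0.0`` for an undefined ratio is a convention
assumed for this example. The alignment step reflects only the documented
behaviour of ``evaluate_pred_outliers`` (one binary label per cycle).

.. code-block:: python

   import math


   def align_predictions(cycle_index, outlier_cycle_index):
       """Return one 0/1 label per cycle: 1 if the cycle was flagged."""
       flagged = set(outlier_cycle_index)
       return [1 if cycle in flagged else 0 for cycle in cycle_index]


   def binary_metrics(y_true, y_pred):
       """Compute standard binary classification metrics (1 = outlier)."""
       pairs = list(zip(y_true, y_pred))
       tp = sum(1 for t, p in pairs if t == 1 and p == 1)
       tn = sum(1 for t, p in pairs if t == 0 and p == 0)
       fp = sum(1 for t, p in pairs if t == 0 and p == 1)
       fn = sum(1 for t, p in pairs if t == 1 and p == 0)

       accuracy = (tp + tn) / len(pairs)
       # Assumed convention: undefined ratios fall back to 0.0
       precision = tp / (tp + fp) if (tp + fp) else 0.0
       recall = tp / (tp + fn) if (tp + fn) else 0.0
       f1 = (2 * precision * recall / (precision + recall)
             if (precision + recall) else 0.0)

       # Matthews correlation coefficient from the confusion-matrix counts
       denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
       mcc = (tp * tn - fp * fn) / denom if denom else 0.0

       return {"accuracy": accuracy, "precision": precision,
               "recall": recall, "f1_score": f1, "mcc": mcc}


   # Illustrative data: ten cycles, three true outliers, three predictions
   cycles = list(range(10))
   true_outlier = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
   pred_outlier = align_predictions(cycles, [2, 5, 9])

   metrics = binary_metrics(true_outlier, pred_outlier)
   for name, value in metrics.items():
       print(f"{name}: {value:.3f}")

With heavily imbalanced labels, which is typical when outliers are rare,
accuracy alone can look high even when few outliers are found. The Matthews
correlation coefficient uses all four confusion-matrix counts, which is one
reason to report it alongside accuracy.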