osbad.viz
=========

.. py:module:: osbad.viz

.. autoapi-nested-parse::

   The methods outlined in this module visualize cycle data with and without
   anomalies.

   .. code-block::

      import osbad.viz as bviz


Module Contents
---------------

.. py:data:: ROOT_DIR

.. py:data:: PATH_TO_ENV_VARIABLE

.. py:data:: USE_LATEX
   :value: True

.. py:function:: plot_cycle_data(xseries: pandas.Series, yseries: pandas.Series, cycle_index_series: pandas.Series, xoutlier: pandas.Series = None, youtlier: pandas.Series = None) -> matplotlib.axes._axes.Axes

   Create a scatter plot of the cycling data with a colormap, a colorbar and
   the option to plot outliers.

   :param xseries: Data for x-axis (e.g. capacity data).
   :type xseries: pd.Series
   :param yseries: Data for y-axis (e.g. voltage data).
   :type yseries: pd.Series
   :param cycle_index_series: Data for cycle count.
   :type cycle_index_series: pd.Series
   :param xoutlier: Anomalous x-data. Defaults to None.
   :type xoutlier: pd.Series, optional
   :param youtlier: Anomalous y-data. Defaults to None.
   :type youtlier: pd.Series, optional

   :returns: Matplotlib axes for additional external customization.
   :rtype: mpl.axes._axes.Axes

   .. rubric:: Example

   .. code-block::

      # Anomalous cycle has label = 1
      # Normal cycle has label = 0
      # true outliers from benchmarking dataset
      df_true_outlier = df_selected_cell_no_label[
          df_selected_cell_no_label.cycle_index.isin(
              true_outlier_cycle_index)]

      # Plot normal cycles with true outliers
      axplot = bviz.plot_cycle_data(
          xseries=df_selected_cell_no_label["discharge_capacity"],
          yseries=df_selected_cell_no_label["voltage"],
          cycle_index_series=df_selected_cell_no_label["cycle_index"],
          xoutlier=df_true_outlier["discharge_capacity"],
          youtlier=df_true_outlier["voltage"])

      axplot.set_xlabel(
          r"Discharge capacity [Ah]",
          fontsize=14)
      axplot.set_ylabel(
          r"Discharge voltage [V]",
          fontsize=14)
      axplot.set_title(
          f"Cell {selected_cell_label}",
          fontsize=16)

      plt.show()

.. 
py:function:: hist_boxplot(df_variable: Union[pandas.Series, numpy.ndarray]) -> matplotlib.axes._axes.Axes

   Plot a combined boxplot and histogram of a feature.

   This function generates a two-part visualization for a given feature:
   a boxplot on the top (to show distribution, median, and potential
   outliers) and a histogram on the bottom (to show frequency distribution).
   Both plots share the same x-axis for easier comparison.

   :param df_variable: Input feature values as a pandas Series or
      NumPy array.
   :type df_variable: Union[pd.Series, np.ndarray]

   :returns: Matplotlib histogram axes. This allows additional external
      customization, such as setting labels or titles.
   :rtype: mpl.axes._axes.Axes

   .. rubric:: Example

   .. code-block::

      # Plot the histogram and boxplot of the scaled data
      ax_hist = bviz.hist_boxplot(
          df_variable=df_capacity_med_scaled["scaled_discharge_capacity"])

      ax_hist.set_xlabel(
          r"Discharge capacity, $Q_\textrm{dis}$ [Ah]",
          fontsize=12)
      ax_hist.set_ylabel(
          r"Count",
          fontsize=12)

      plt.show()

.. py:function:: scatterhist(xseries: pandas.Series, yseries: pandas.Series, cycle_index_series: pandas.Series, selected_cell_label=None) -> matplotlib.axes._axes.Axes

   Plot a scatter plot with marginal histograms for two features.

   This function creates a joint visualization consisting of:

   - A central scatter plot of two features, color-coded by cycle index.
   - A histogram of the x-series above the scatter plot.
   - A histogram of the y-series to the right of the scatter plot.

   The marginal histograms provide additional insight into the
   distributions of each variable, while the scatter plot shows their
   relationship.

   :param xseries: Data for the x-axis.
   :type xseries: pd.Series
   :param yseries: Data for the y-axis.
   :type yseries: pd.Series
   :param cycle_index_series: Series of cycle indices used for color
      mapping in the scatter plot.
   :type cycle_index_series: pd.Series
   :param selected_cell_label: Label for the cell, displayed as the plot
      title. Defaults to None.
   :type selected_cell_label: str, optional

   :returns: Matplotlib scatter plot axes. Enables further customization
      (e.g., labels, annotations).
   :rtype: mpl.axes._axes.Axes

   .. rubric:: Example

   .. code-block::

      axplot = bviz.scatterhist(
          xseries=df_selected_cell_no_label["discharge_capacity"],
          yseries=df_selected_cell_no_label["voltage"],
          cycle_index_series=df_selected_cell_no_label["cycle_index"],
          selected_cell_label=selected_cell_label)

      axplot.set_xlabel(
          r"Capacity, $Q_\textrm{dis}$ [Ah]",
          fontsize=12)
      axplot.set_ylabel(
          r"Voltage, $V_\textrm{dis}$ [V]",
          fontsize=12)

      plt.show()

.. py:function:: plot_explain_scaling(df_scaled_capacity: pandas.DataFrame, df_scaled_voltage: pandas.DataFrame, selected_cell_label: str, xoutlier: pandas.Series = None, youtlier: pandas.Series = None)

   Visualize statistical scaling transformations for a cell's cycles.

   This function creates a 2×3 grid of subplots to illustrate how scaling
   and statistical feature transformations are applied to cycling data.
   It plots original and scaled capacity-voltage curves, highlights
   detected outliers, and visualizes derived features such as
   median-square, IQR, and median/IQR ratio. A shared colorbar indicates
   cycle progression.

   :param df_scaled_capacity: DataFrame containing cycle-based capacity
      features. Must include the following columns:
      ``["discharge_capacity", "scaled_discharge_capacity",
      "median_square", "IQR", "median_square_IQR_ratio", "cycle_index"]``.
   :type df_scaled_capacity: pd.DataFrame
   :param df_scaled_voltage: DataFrame containing cycle-based voltage
      features. Must include the following columns:
      ``["voltage", "scaled_voltage", "median_square", "IQR",
      "median_square_IQR_ratio"]``.
   :type df_scaled_voltage: pd.DataFrame
   :param selected_cell_label: Identifier of the evaluated cell, used for
      titling the plots and naming the output file.
   :type selected_cell_label: str
   :param xoutlier: X-coordinates of outlier points to highlight in the
      voltage-capacity plot. Defaults to None.
   :type xoutlier: pd.Series, optional
   :param youtlier: Y-coordinates of outlier points to highlight in the
      voltage-capacity plot. Defaults to None.
   :type youtlier: pd.Series, optional

   :returns: The function saves the resulting figure as a PNG file and
      displays it.
   :rtype: None

   .. rubric:: Example

   .. code-block::

      # Path to DuckDB file
      db_filepath = str(
          Path.cwd()
          .parent
          .joinpath("database", "train_dataset_severson.db"))

      selected_cell_label = "2017-05-12_5_4C-70per_3C_CH17"

      # Import the BenchDB class
      # Load only the dataset based on the selected cell
      benchdb = BenchDB(
          db_filepath,
          selected_cell_label)

      # Extract true outliers cycle index from benchmarking dataset
      true_outlier_cycle_index = benchdb.get_true_outlier_cycle_index(
          df_selected_cell)

      # Anomalous cycle has label = 1
      # Normal cycle has label = 0
      # true outliers from benchmarking dataset
      df_true_outlier = df_selected_cell_without_labels[
          df_selected_cell_without_labels.cycle_index.isin(
              true_outlier_cycle_index)]

      # Instantiate the CycleScaling class
      scaler = CycleScaling(
          df_selected_cell=df_selected_cell_without_labels)

      # Implement median IQR scaling on the discharge capacity data
      df_capacity_med_scaled = scaler.median_IQR_scaling(
          variable="discharge_capacity",
          validate=True)

      # Implement median IQR scaling on the discharge voltage data
      df_voltage_med_scaled = scaler.median_IQR_scaling(
          variable="voltage",
          validate=True)

      # The keyword must match the signature: selected_cell_label
      bviz.plot_explain_scaling(
          df_scaled_capacity=df_capacity_med_scaled,
          df_scaled_voltage=df_voltage_med_scaled,
          selected_cell_label=selected_cell_label,
          xoutlier=df_true_outlier["discharge_capacity"],
          youtlier=df_true_outlier["voltage"]
      )

   .. note::

      - The plots include:

        1. Raw capacity-voltage curve.
        2. Scaled capacity-voltage curve.
        3. Capacity-voltage curve with detected outliers.
        4. Median-square transformation.
        5. Interquartile range (IQR).
        6. Median/IQR ratio.

      - All scatter plots are color-coded by ``cycle_index`` using a
        shared colorbar.
      - Figures are saved under ``fig_output/`` with a filename based on
        the cell label.

.. py:function:: compare_hist_limits(df_variable, df_norm_variable, upper_limit, lower_limit)

.. py:function:: plot_quantiles(xdata: pandas.Series | numpy.ndarray, ax: matplotlib.axes._axes.Axes, fit=False, validate=False) -> matplotlib.axes._axes.Axes

   Adapt the ``probplot`` method from ``scipy.stats`` to create the
   probability plot of a selected feature, so that the feature
   distribution can be compared to the theoretical quantiles of a normal
   distribution.

   :param xdata: Selected feature.
   :type xdata: pd.Series | np.ndarray
   :param ax: Matplotlib axes from a subplot.
   :type ax: mpl.axes._axes.Axes
   :param fit: If True, create a straight line fit through the
      probability plot. Defaults to False.
   :type fit: bool, optional
   :param validate: If True, compare the adapted visualization method
      with scipy's implementation. Defaults to False.
   :type validate: bool, optional

   :returns: Matplotlib axes for additional external customization.
   :rtype: mpl.axes._axes.Axes

   .. note::

      The straight dotted line in the probability plot indicates a
      perfect fit to the normal distribution. If most data points fall
      approximately along the straight line, the feature is consistent
      with the normal distribution. Anomalies appear as points far away
      from the main cluster and the straight line fit. If points deviate
      significantly in the tails, this suggests heavier tails compared to
      the theoretical normal distribution.

   .. rubric:: Example

   .. code-block::

      fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))

      ax1 = bviz.plot_quantiles(
          xdata=df_max_dV["max_diff"],
          ax=ax1,
          fit=True,
          validate=False)
      ax1.set_title(
          "Normality check before removing outliers")

      ax2 = bviz.plot_quantiles(
          xdata=df_max_dV_2nd_iter["max_diff"],
          ax=ax2,
          fit=True,
          validate=False)
      ax2.set_title(
          "Normality check after removing detected outliers")

      plt.show()

.. 
py:function:: plot_histogram_with_distribution_fit(df_variable: Union[pandas.Series, numpy.ndarray], method='norm') -> matplotlib.axes._axes.Axes

   Plot a histogram of feature values with fitted distribution overlay.

   This function visualizes the distribution of a selected feature by
   plotting its histogram and overlaying a fitted probability density
   function (PDF). The feature can be fitted with either a normal or
   lognormal distribution, based on the ``method`` argument.

   :param df_variable: Input feature data.
   :type df_variable: pd.Series | np.ndarray
   :param method: Distribution type for fitting. Must be one of:

      - ``norm``: Fit using ``scipy.stats.norm.fit`` and overlay a
        normal distribution.
      - ``lognorm``: Fit using ``scipy.stats.lognorm.fit`` and overlay a
        lognormal distribution.

      Defaults to ``"norm"``.
   :type method: str, optional

   :returns: Matplotlib axes object containing the histogram and fitted
      distribution plot. Can be customized further externally.
   :rtype: mpl.axes._axes.Axes

   .. rubric:: Example

   .. code-block::

      # Plot with normal fit
      axplot = bviz.plot_histogram_with_distribution_fit(
          df_variable=df_max_dV_2nd_iter["max_diff"],
          method="norm")
      axplot.set_xlabel(
          r"$\Delta V_\textrm{scaled,max,cyc}\;\textrm{[V]}$",
          fontsize=12)
      axplot.set_ylabel('Probability', fontsize=12)
      plt.show()

      # Plot with lognormal fit
      axplot = bviz.plot_histogram_with_distribution_fit(
          df_variable=np.array(df_max_dQ_2nd_iter["max_diff"]),
          method="lognorm")
      # Capacity differences are labeled in Ah
      axplot.set_xlabel(
          r"$\Delta Q_\textrm{scaled,max,cyc}\;\textrm{[Ah]}$",
          fontsize=12)
      axplot.set_ylabel('Probability', fontsize=12)
      plt.show()

   .. note::

      - Histogram uses ``bins="auto"`` for automatic bin width selection,
        balancing performance across distributions.
      - The histogram is normalized (``density=True``) so that the total
        area integrates to 1, matching the fitted PDF.
      - For ``"norm"`` fitting, see ``scipy.stats.norm.fit``.
      - For ``"lognorm"`` fitting, see ``scipy.stats.lognorm.fit``.

.. 
py:function:: calculate_bubble_size_ratio(df_variable: pandas.Series | numpy.ndarray) -> pandas.Series

   Calculate the bubble size of a feature in the bubble plot. The size
   depends on the anomaly score and is computed with the feature
   standardization method.

   :param df_variable: Selected feature.
   :type df_variable: pd.Series | np.ndarray

   :returns: Calculated bubble size of the feature.
   :rtype: pd.Series

   .. rubric:: Example

   .. code-block::

      df_bubble_size_dQ = bviz.calculate_bubble_size_ratio(
          df_variable=df_max_dQ["max_diff_dQ"])

      df_bubble_size_dV = bviz.calculate_bubble_size_ratio(
          df_variable=df_max_dV["max_diff"])

.. py:function:: plot_bubble_chart(xseries: pandas.Series, yseries: pandas.Series, bubble_size: numpy.ndarray | pandas.Series, unique_cycle_count: numpy.ndarray | pandas.Series = None, cycle_outlier_idx_label: numpy.ndarray = None, square_grid: bool = False) -> matplotlib.axes._axes.Axes

   Plot the bubble chart of each feature, with the bubble size ratio
   scaled by the anomaly score.

   :param xseries: Data to be plotted on the x-axis of the bubble chart.
   :type xseries: pd.Series
   :param yseries: Data to be plotted on the y-axis of the bubble chart.
   :type yseries: pd.Series
   :param bubble_size: Calculated bubble size depending on the anomaly
      score.
   :type bubble_size: np.ndarray|pd.Series
   :param unique_cycle_count: Unique cycle count of the selected cell.
      Defaults to None.
   :type unique_cycle_count: np.ndarray|pd.Series, optional
   :param cycle_outlier_idx_label: The index of anomalous cycles.
      Defaults to None.
   :type cycle_outlier_idx_label: np.ndarray, optional
   :param square_grid: Define a square grid with equal distance for the
      x-axis and y-axis. Defaults to False.
   :type square_grid: bool, optional

   :returns: Matplotlib axes for additional external customization.
   :rtype: mpl.axes._axes.Axes

   .. rubric:: Example

   .. 
code-block::

      # Plot the bubble chart and label the outliers
      axplot = bviz.plot_bubble_chart(
          xseries=df_features_per_cell["log_max_diff_dQ"],
          yseries=df_features_per_cell["log_max_diff_dV"],
          bubble_size=bubble_size,
          unique_cycle_count=unique_cycle_count,
          cycle_outlier_idx_label=true_outlier_cycle_index,
          square_grid=True)

      axplot.set_title(
          f"Cell {selected_cell_label}", fontsize=13)
      axplot.set_xlabel(
          r"$\log(\Delta Q_\textrm{scaled,max,cyc})\;\textrm{[Ah]}$",
          fontsize=12)
      axplot.set_ylabel(
          r"$\log(\Delta V_\textrm{scaled,max,cyc})\;\textrm{[V]}$",
          fontsize=12)

      output_fig_filename = (
          "log_bubble_plot_"
          + selected_cell_label
          + ".png")

      fig_output_path = (
          selected_cell_artifacts_dir.joinpath(output_fig_filename))

      plt.savefig(
          fig_output_path,
          dpi=200,
          bbox_inches="tight")

      plt.show()

.. py:function:: plot_multiple_outlier_cycles(df_selected_cell: pandas.DataFrame, potential_outlier_cycles: list, selected_cell_label: str) -> None

   Plot and annotate multiple potential outlier cycles.

   This function creates a grid of subplots, each highlighting one of the
   potential outlier cycles from the given cell dataset. Cycling data for
   all cycles is shown with a colormap, and the specified outlier cycles
   are annotated with text boxes. A shared colorbar indicates the cycle
   index range. The figure is saved to the cell’s artifacts directory and
   displayed interactively.

   :param df_selected_cell: Cycling dataset for the cell, containing
      ``discharge_capacity``, ``voltage`` and ``cycle_index``.
   :type df_selected_cell: pd.DataFrame
   :param potential_outlier_cycles: List of cycle indices to be
      highlighted as potential outliers.
   :type potential_outlier_cycles: list
   :param selected_cell_label: Identifier of the evaluated cell, used in
      the plot title and output filename.
   :type selected_cell_label: str

   :returns: The function saves and displays the generated plot.
   :rtype: None

   .. rubric:: Example

   .. 
code-block::

      # Get the cell-ID from cell_inventory
      selected_cell_label = "2017-05-12_5_4C-70per_3C_CH17"

      bviz.plot_multiple_outlier_cycles(
          df_selected_cell,
          potential_outlier_cycles=[0, 40, 147, 148],
          selected_cell_label=selected_cell_label)

.. py:function:: plot_single_outlier_cycle(df_selected_cell: pandas.DataFrame, selected_cycle_index: int, selected_cell_label: str)

   Plot and annotate a single cycle as a potential outlier.

   This function visualizes the voltage vs. discharge capacity for the
   selected cell and highlights one cycle specified by its cycle index.
   The chosen cycle is annotated as a potential outlier on the plot. The
   figure is saved to the artifacts directory for the selected cell and
   displayed interactively.

   :param df_selected_cell: Cycling dataset for the cell, containing
      ``discharge_capacity``, ``voltage`` and ``cycle_index``.
   :type df_selected_cell: pd.DataFrame
   :param selected_cycle_index: Cycle index to highlight as a potential
      outlier.
   :type selected_cycle_index: int
   :param selected_cell_label: Identifier of the evaluated cell, used in
      the plot title and output filename.
   :type selected_cell_label: str

   :returns: The function saves and displays the generated plot.
   :rtype: None

   .. rubric:: Example

   .. code-block::

      # Get the cell-ID from cell_inventory
      selected_cell_label = "2017-05-12_5_4C-70per_3C_CH17"

      bviz.plot_single_outlier_cycle(
          df_selected_cell,
          selected_cycle_index=147,
          selected_cell_label=selected_cell_label)
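
The single-cycle highlighting described for ``plot_single_outlier_cycle`` can
be approximated outside of ``osbad`` with plain pandas and matplotlib. The
sketch below is only an illustration on synthetic data and is not the
library's implementation: ``sketch_highlight_cycle``, ``df_demo`` and the
synthetic voltage curve are hypothetical names and values introduced here,
and the sketch does not save a figure or add the annotation box.

.. code-block:: python

   import matplotlib
   matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
   import matplotlib.pyplot as plt
   import numpy as np
   import pandas as pd


   def sketch_highlight_cycle(df, selected_cycle_index):
       """Hypothetical helper: scatter all cycles, then overlay one cycle."""
       fig, ax = plt.subplots(figsize=(6, 4))

       # All cycles, color-coded by cycle index
       points = ax.scatter(
           df["discharge_capacity"],
           df["voltage"],
           c=df["cycle_index"],
           cmap="viridis",
           s=8)
       fig.colorbar(points, ax=ax, label="Cycle index")

       # Rows that belong to the cycle to be highlighted
       df_cycle = df[df["cycle_index"] == selected_cycle_index]
       ax.scatter(
           df_cycle["discharge_capacity"],
           df_cycle["voltage"],
           color="red",
           s=12,
           label=f"Cycle {selected_cycle_index}")

       ax.legend()
       ax.set_xlabel("Discharge capacity [Ah]")
       ax.set_ylabel("Discharge voltage [V]")
       return ax, df_cycle


   # Synthetic stand-in for df_selected_cell: 5 cycles with 20 points each
   capacity = np.tile(np.linspace(0.0, 1.0, 20), 5)
   cycle_index = np.repeat(np.arange(5), 20)
   voltage = 3.6 - 0.8 * capacity - 0.01 * cycle_index

   df_demo = pd.DataFrame({
       "discharge_capacity": capacity,
       "voltage": voltage,
       "cycle_index": cycle_index})

   ax_demo, df_highlight = sketch_highlight_cycle(
       df_demo, selected_cycle_index=3)

Returning the axes mirrors the convention used by the other plotting
functions in this module, so labels and titles can still be adjusted by the
caller.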