osbad.database
==============

.. py:module:: osbad.database

.. autoapi-nested-parse::

   Utilities for loading and visualizing benchmarking data from DuckDB.

   This module defines :class:`BenchDB`, a helper for accessing battery cell
   datasets stored in DuckDB, removing labels for modeling, extracting
   ground-truth outlier indices, and plotting cycling curves. Figures are
   saved to a per-cell artifacts directory under
   ``bconf.PIPELINE_OUTPUT_DIR`` and may be shown interactively if
   ``bconf.SHOW_FIG_STATUS`` is enabled.

   Key features:

   - ``load_benchmark_dataset``: Load the training or test dataset for the
     selected cell from a DuckDB database (including the label column).
   - ``drop_labels``: Remove the ``outlier`` label column and, optionally,
     keep only a specified subset of columns.
   - ``get_true_outlier_cycle_index``: Retrieve the cycle indices labeled as
     anomalous (``outlier == 1``) for the selected cell.
   - ``plot_cycle_data``: Plot discharge voltage vs. capacity cycles;
     optionally highlight and annotate known anomalous cycles. Saves the
     figure to the cell's artifacts directory and can display it.
   - ``load_features_db``: Load precomputed features (train or test) for the
     selected cell from a DuckDB features database.

   .. rubric:: Example

   .. code-block::

      from osbad.database import BenchDB


Module Contents
---------------

.. py:data:: ROOT_DIR

.. py:data:: PATH_TO_ENV_VARIABLE

.. py:data:: USE_LATEX
   :value: True

.. py:class:: BenchDB(input_db_filepath: str, cell_label: str)

   Load and analyze benchmarking datasets for a single cell.

   The ``BenchDB`` class provides utilities for accessing and managing
   benchmarking datasets stored in DuckDB. It supports loading training and
   test datasets, extracting features, removing labels, plotting cycling
   data, and retrieving ground-truth outlier indices. Figures are saved to a
   per-cell artifacts directory, with optional display enabled via
   configuration.

   :param input_db_filepath: Path to the DuckDB benchmarking database file.
   :type input_db_filepath: str
   :param cell_label: Label of the cell to be analyzed.
   :type cell_label: str

   .. py:method:: load_benchmark_dataset(dataset_type='train') -> pandas.DataFrame

      Load the benchmarking dataset for the selected cell from DuckDB.

      This method connects to the DuckDB benchmarking database, loads either
      the training or test dataset, and filters the records for the selected
      cell label. The resulting DataFrame contains all cycles for that cell,
      including true outlier labels.

      :param dataset_type: Type of dataset to load. Must be one of:

                           - ``"train"``: Load training dataset from
                             ``df_train_dataset_sv``.
                           - ``"test"``: Load test dataset from
                             ``df_test_dataset_sv``.

                           Defaults to ``"train"``.
      :type dataset_type: str, optional

      :returns: Benchmarking dataset filtered for the selected cell label.
      :rtype: pd.DataFrame

      :raises AssertionError: If the selected cell label is not found in the
          database.

      .. rubric:: Example

      .. code-block::

         from pathlib import Path

         from osbad.database import BenchDB

         # Path to the DuckDB instance "train_dataset_severson.db":
         # osbad/database/train_dataset_severson.db
         db_filepath = (
             Path.cwd()
             .parent.parent.parent
             .joinpath("database", "train_dataset_severson.db"))

         # Cell ID taken from the cell inventory
         selected_cell_label = "2017-05-12_5_4C-70per_3C_CH17"

         # Create a BenchDB instance for the selected cell only
         benchdb = BenchDB(
             db_filepath,
             selected_cell_label)

         # Load the benchmarking dataset
         df_selected_cell = benchdb.load_benchmark_dataset(
             dataset_type="train")

   .. py:method:: drop_labels(df_selected_cell: pandas.DataFrame, filter_col: list = None) -> pandas.DataFrame

      Remove true outlier labels from a cell cycling dataset.

      This method drops the ``outlier`` column from the benchmarking dataset
      for the selected cell. Optionally, it also filters the dataset to
      retain only the specified columns.

      :param df_selected_cell: Input cycling dataset for a single cell,
                               including the ``outlier`` label column.
      :type df_selected_cell: pd.DataFrame
      :param filter_col: List of column names to retain after dropping
                         labels. If None, all remaining columns are returned.
                         Defaults to None.
      :type filter_col: list, optional

      :returns: DataFrame without the ``outlier`` column. If ``filter_col``
                is provided, only the specified columns are kept.
      :rtype: pd.DataFrame

      .. rubric:: Example

      .. code-block::

         from osbad.database import BenchDB

         # Create a BenchDB instance for the selected cell only
         benchdb = BenchDB(
             db_filepath,
             selected_cell_label)

         # Load the benchmarking dataset
         df_selected_cell = benchdb.load_benchmark_dataset(
             dataset_type="train")

         if df_selected_cell is not None:

             filter_col = [
                 "cell_index",
                 "cycle_index",
                 "discharge_capacity",
                 "voltage"]

             # Drop true labels from the benchmarking dataset
             # and keep only the selected columns
             df_selected_cell_without_labels = benchdb.drop_labels(
                 df_selected_cell,
                 filter_col)

   .. py:method:: get_true_outlier_cycle_index(df_selected_cell: pandas.DataFrame) -> numpy.ndarray

      Extract the true outlier cycle indices from the benchmarking dataset.

      :param df_selected_cell: Cycling dataset for the selected cell,
                               including the ``outlier`` label column.
      :type df_selected_cell: pd.DataFrame

      :returns: Cycle indices labeled as true outliers (``outlier == 1``) in
                the benchmarking dataset.
      :rtype: np.ndarray

      .. rubric:: Example

      .. code-block::

         # Extract the true outlier cycle indices
         # from the benchmarking dataset
         true_outlier_cycle_idx = benchdb.get_true_outlier_cycle_index(
             df_selected_cell)
         print(f"True outlier cycle index: {true_outlier_cycle_idx}")

   .. py:method:: plot_cycle_data(df_selected_cell_without_labels: pandas.DataFrame, true_outlier_cycle_index: list = None)

      Visualize discharge voltage vs. capacity cycles for a cell.

      This method plots cycling data for the selected cell. If a list of true
      outlier cycle indices is provided, those cycles are highlighted and
      annotated on the plot. Otherwise, the cycles are plotted without anomaly
      annotations. The figure is saved in the cell's artifacts directory.

      :param df_selected_cell_without_labels: DataFrame containing discharge
                                              capacity, voltage, and cycle
                                              index for the selected cell
                                              (without labels).
      :type df_selected_cell_without_labels: pd.DataFrame
      :param true_outlier_cycle_index: List of cycle indices known to be
                                       anomalous. If None, cycles are plotted
                                       without outlier highlights. Defaults to
                                       None.
      :type true_outlier_cycle_index: list, optional

      :returns: Axes object containing the cycle plot.
      :rtype: matplotlib.axes.Axes

      .. rubric:: Example

      .. code-block::

         # Plot cell data with true anomalies.
         # If the true outlier cycle index is not known,
         # cycling data will be plotted without labels.
         benchdb.plot_cycle_data(
             df_selected_cell_without_labels,
             true_outlier_cycle_index)

   .. py:method:: load_features_db(db_features_filepath: Union[pathlib.PosixPath, str], dataset_type: str)

      Load features for the selected cell from a DuckDB database.

      This method connects to a DuckDB database containing precomputed
      features for battery cells. It supports loading either training or test
      datasets, filters the data for the selected cell label, and returns the
      resulting feature DataFrame.

      :param db_features_filepath: Path to the DuckDB features database file.
      :type db_features_filepath: Union[pathlib.PosixPath, str]
      :param dataset_type: Type of dataset to load. Must be one of:

                           - ``"train"``: Load training features from
                             ``df_train_features_sv``.
                           - ``"test"``: Load test features from
                             ``df_test_features_sv``.
      :type dataset_type: str

      :returns: DataFrame containing features for the selected cell.
      :rtype: pd.DataFrame

      :raises AssertionError: If the selected cell label is not found in the
          database.

      .. rubric:: Example

      .. code-block::

         # Define the filepath to the ``train_features_severson.db``
         # DuckDB instance.
         # osbad/database/train_features_severson.db
         db_features_filepath = (
             Path.cwd()
             .parent.parent.parent
             .joinpath("database", "train_features_severson.db"))

         # Load only the training features dataset
         df_features_per_cell = benchdb.load_features_db(
             db_features_filepath,
             dataset_type="train")

         # Unique cycle indices available for the selected cell
         unique_cycle_count = (
             df_features_per_cell["cycle_index"].unique())
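
The label handling described for ``drop_labels`` and ``get_true_outlier_cycle_index`` can be tried on a toy DataFrame without a DuckDB file. The following is a minimal sketch in plain pandas that mirrors the documented behavior (drop the ``outlier`` column, optionally keep a column subset, and collect the cycle indices where ``outlier == 1``). It is not the ``BenchDB`` implementation: the helper names ``drop_labels_sketch`` and ``true_outlier_cycles_sketch``, the toy values, and the assumption that every row of a cycle carries the same label are all hypothetical.

```python
import pandas as pd

# Toy stand-in for one cell's benchmarking data (values are made up).
# Assumption: each cycle spans several rows that share one ``outlier`` label.
df_toy = pd.DataFrame({
    "cell_index": ["cell_A"] * 6,
    "cycle_index": [0, 0, 1, 1, 2, 2],
    "discharge_capacity": [0.0, 1.0, 0.0, 0.9, 0.0, 1.0],
    "voltage": [3.5, 2.0, 3.5, 2.4, 3.5, 2.0],
    "outlier": [0, 0, 1, 1, 0, 0],
})


def drop_labels_sketch(df, filter_col=None):
    """Drop the ``outlier`` column; optionally keep only ``filter_col``."""
    df_no_labels = df.drop(columns="outlier").reset_index(drop=True)
    if filter_col is not None:
        df_no_labels = df_no_labels[filter_col]
    return df_no_labels


def true_outlier_cycles_sketch(df):
    """Return the unique cycle indices whose rows have ``outlier == 1``."""
    return df.loc[df["outlier"] == 1, "cycle_index"].unique()


df_features = drop_labels_sketch(
    df_toy,
    filter_col=["cycle_index", "discharge_capacity", "voltage"])
outlier_cycles = true_outlier_cycles_sketch(df_toy)

print(list(df_features.columns))
print(outlier_cycles)
```

Keeping the two steps separate reflects the workflow in the examples above: the unlabeled frame is what a model or plot would consume, while the outlier cycle indices are held aside as ground truth for evaluation or annotation.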