osbad.database
Utilities for loading and visualizing benchmarking data from DuckDB.
This module defines BenchDB, a helper for accessing battery cell
datasets stored in DuckDB, removing labels for modeling, extracting ground-
truth outlier indices, and plotting cycling curves. Figures are saved to a
per-cell artifacts directory under bconf.PIPELINE_OUTPUT_DIR and may be
shown interactively if bconf.SHOW_FIG_STATUS is enabled.
- Key features:
  - load_benchmark_dataset: Load the training or test dataset for the selected cell from a DuckDB database (including the label column).
  - drop_labels: Remove the outlier label column and, optionally, keep only a specified subset of columns.
  - get_true_outlier_cycle_index: Retrieve the cycle indices labeled as anomalous (outlier == 1) for the selected cell.
  - plot_cycle_data: Plot discharge voltage vs. capacity cycles; optionally highlight and annotate known anomalous cycles. Saves the figure to the cell’s artifacts directory and can display it.
  - load_features_db: Load precomputed features (train or test) for the selected cell from a DuckDB features database.
Example
from osbad.database import BenchDB
Module Contents
- class osbad.database.BenchDB(input_db_filepath: str, cell_label: str)
Load and analyze benchmarking datasets for a single cell.
The BenchDB class provides utilities for accessing and managing benchmarking datasets stored in DuckDB. It supports loading training and test datasets, extracting features, removing labels, plotting cycling data, and retrieving ground-truth outlier indices. Figures are saved to a per-cell artifacts directory, with optional display enabled via configuration.
- Parameters:
input_db_filepath (str) – Path to the DuckDB benchmarking database file.
cell_label (str) – Label of the cell to be analyzed.
- load_benchmark_dataset(dataset_type='train') → pandas.DataFrame
Load benchmarking dataset for the selected cell from DuckDB.
This function connects to the DuckDB benchmarking database, loads either the training or test dataset, and filters the records for the selected cell label. The resulting DataFrame contains all cycles for that cell, including true outlier labels.
- Parameters:
dataset_type (str, optional) – Type of dataset to load. Must be one of:
  - "train": Load training dataset from df_train_dataset_sv.
  - "test": Load test dataset from df_test_dataset_sv.
  Defaults to "train".
- Returns:
Benchmarking dataset filtered for the selected cell label.
- Return type:
pd.DataFrame
- Raises:
AssertionError – If the selected cell label is not found in the database.
Example
from pathlib import Path

# Import the BenchDB class
from osbad.database import BenchDB

# Path to the DuckDB instance:
# "train_dataset_severson.db"
db_filepath = (
    Path.cwd()
    .parent.parent
    .joinpath("database", "train_dataset_severson.db"))

# Get the cell-ID from cell_inventory
selected_cell_label = "2017-05-12_5_4C-70per_3C_CH17"

# Load only the dataset based on the selected cell
benchdb = BenchDB(
    db_filepath,
    selected_cell_label)

# Load the benchmarking dataset
df_selected_cell = benchdb.load_benchmark_dataset(
    dataset_type="train")
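
The table selection and per-cell filtering described for load_benchmark_dataset can be illustrated with a minimal sketch. This is not the osbad implementation: the helpers resolve_table and filter_cell are hypothetical, the toy DataFrame stands in for a table read from DuckDB, and the column name cell_index is an assumption borrowed from the filter_col example in this page. Only the table names df_train_dataset_sv and df_test_dataset_sv come from the documentation above.

```python
import pandas as pd

# Table names as listed in the dataset_type parameter description.
DATASET_TABLES = {
    "train": "df_train_dataset_sv",
    "test": "df_test_dataset_sv",
}


def resolve_table(dataset_type: str = "train") -> str:
    """Map a dataset type to its table name (hypothetical helper)."""
    if dataset_type not in DATASET_TABLES:
        raise ValueError(
            f"dataset_type must be one of {sorted(DATASET_TABLES)}, "
            f"got {dataset_type!r}")
    return DATASET_TABLES[dataset_type]


def filter_cell(df: pd.DataFrame, cell_label: str,
                label_col: str = "cell_index") -> pd.DataFrame:
    """Keep only the rows of one cell (hypothetical helper).

    ``label_col`` is an assumed column name, not confirmed by osbad.
    """
    # Mirror the documented behavior: fail if the cell is unknown.
    assert cell_label in set(df[label_col]), (
        f"{cell_label} not found in the database")
    return df[df[label_col] == cell_label].reset_index(drop=True)


# Toy stand-in for a table loaded from DuckDB.
df_all = pd.DataFrame({
    "cell_index": ["cell_A", "cell_A", "cell_B"],
    "cycle_index": [0, 1, 0],
    "outlier": [0, 1, 0],
})

table_name = resolve_table("train")
df_cell = filter_cell(df_all, "cell_A")
print(table_name)
print(df_cell)
```

Asking for a cell label that is absent from the toy table triggers the AssertionError, matching the Raises entry above.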
- drop_labels(df_selected_cell: pandas.DataFrame, filter_col: list = None) → pandas.DataFrame
Remove true outlier labels from a cell cycling dataset.
This function drops the outlier column from the benchmarking dataset for the selected cell. Optionally, it can also filter the dataset to retain only the specified columns.
- Parameters:
df_selected_cell (pd.DataFrame) – Input cycling dataset for a single cell, including the outlier label column.
filter_col (list, optional) – List of column names to retain after dropping labels. If None, all remaining columns are returned. Defaults to None.
- Returns:
DataFrame without the outlier column. If filter_col is provided, only the specified columns are kept.
- Return type:
pd.DataFrame
Example
from pathlib import Path

# Import the BenchDB class
from osbad.database import BenchDB

# Path to the DuckDB instance: "train_dataset_severson.db"
db_filepath = (
    Path.cwd()
    .parent.parent
    .joinpath("database", "train_dataset_severson.db"))
print(db_filepath)

# Get the cell-ID from cell_inventory
selected_cell_label = "2017-05-12_5_4C-70per_3C_CH17"

# Load only the dataset based on the selected cell
benchdb = BenchDB(
    db_filepath,
    selected_cell_label)

# Load the benchmarking dataset
df_selected_cell = benchdb.load_benchmark_dataset(
    dataset_type="train")

if df_selected_cell is not None:
    filter_col = [
        "cell_index",
        "cycle_index",
        "discharge_capacity",
        "voltage"]

    # Drop true labels from the benchmarking dataset
    # and filter for selected columns only
    df_selected_cell_without_labels = benchdb.drop_labels(
        df_selected_cell,
        filter_col)
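
The documented behavior of drop_labels (remove the outlier column, then optionally keep a subset of columns) can be approximated in plain pandas. The function drop_labels_sketch and the toy data below are illustrative assumptions, not the library's code.

```python
import pandas as pd


def drop_labels_sketch(df: pd.DataFrame, filter_col=None) -> pd.DataFrame:
    """Drop the ``outlier`` column, then optionally select columns.

    Hypothetical stand-in for BenchDB.drop_labels.
    """
    df_out = df.drop(columns=["outlier"])
    if filter_col is not None:
        # Keep only the requested columns, in the requested order.
        df_out = df_out[filter_col]
    return df_out


# Toy cycling data for a single cell (values are made up).
df_selected_cell = pd.DataFrame({
    "cell_index": ["cell_A"] * 3,
    "cycle_index": [0, 0, 1],
    "discharge_capacity": [0.0, 0.5, 0.0],
    "voltage": [3.5, 3.2, 3.5],
    "internal_resistance": [0.016, 0.016, 0.017],
    "outlier": [0, 0, 1],
})

filter_col = ["cell_index", "cycle_index", "discharge_capacity", "voltage"]
df_without_labels = drop_labels_sketch(df_selected_cell, filter_col)
print(df_without_labels.columns.tolist())
```

Because DataFrame.drop returns a new object, the input DataFrame keeps its outlier column and can still be passed to get_true_outlier_cycle_index afterwards.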
- get_true_outlier_cycle_index(df_selected_cell: pandas.DataFrame) → numpy.ndarray
Extract the cycle indices of the true outliers from the benchmarking dataset.
- Parameters:
df_selected_cell (pd.DataFrame) – Cycling dataset for the selected cell, including the outlier label column.
- Returns:
Cycle indices labeled as true outliers (outlier == 1) in the benchmarking dataset.
- Return type:
np.ndarray
Example
# Extract true outliers cycle index
# from benchmarking dataset
true_outlier_cycle_idx = benchdb.get_true_outlier_cycle_index(
    df_selected_cell)
print(f"True outlier cycle index: {true_outlier_cycle_idx}")
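
As a rough illustration of what this lookup amounts to, the sketch below selects the rows with outlier == 1 and returns their unique cycle_index values as a NumPy array. The function name and toy data are assumptions. Whether the library deduplicates in exactly this way is not stated in this page.

```python
import numpy as np
import pandas as pd


def true_outlier_cycle_index_sketch(df: pd.DataFrame) -> np.ndarray:
    """Return the unique cycle indices whose rows are labeled outlier == 1.

    Hypothetical stand-in for BenchDB.get_true_outlier_cycle_index.
    """
    mask = df["outlier"] == 1
    return df.loc[mask, "cycle_index"].unique()


# Toy data: each cycle has several rows (one per measurement point).
df_selected_cell = pd.DataFrame({
    "cycle_index": [0, 0, 1, 1, 2, 2],
    "voltage": [3.5, 3.2, 3.5, 2.9, 3.5, 3.2],
    "outlier": [0, 0, 1, 1, 0, 0],
})

true_outlier_cycle_idx = true_outlier_cycle_index_sketch(df_selected_cell)
print(f"True outlier cycle index: {true_outlier_cycle_idx}")
```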
- plot_cycle_data(df_selected_cell_without_labels: pandas.DataFrame, true_outlier_cycle_index: list = None)
Visualize discharge voltage vs. capacity cycles for a cell.
This function plots cycling data for the selected cell. If a list of true outlier cycle indices is provided, those cycles are highlighted and annotated on the plot. Otherwise, no annotations of anomalies will be shown. The figure is saved in the cell’s artifacts directory.
- Parameters:
df_selected_cell_without_labels (pd.DataFrame) – DataFrame containing discharge capacity, voltage, and cycle index for the selected cell (without labels).
true_outlier_cycle_index (list, optional) – List of cycle indices known to be anomalous. If None, cycles are plotted without outlier highlights. Defaults to None.
- Returns:
Axes object containing the cycle plot.
- Return type:
matplotlib.axes.Axes
Example
# Plot cell data with true anomalies
# If the true outlier cycle index is not known,
# cycling data will be plotted without labels
benchdb.plot_cycle_data(
    df_selected_cell_without_labels,
    true_outlier_cycle_index)
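
For readers who want to see the general shape of such a plot, here is a minimal matplotlib sketch that draws one voltage vs. capacity curve per cycle and highlights the cycles passed as outliers. It is an assumption-laden illustration, not the osbad plotting code: the function name, colors, and legend-based annotation are invented here, and saving the figure to the artifacts directory is omitted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd


def plot_cycles_sketch(df: pd.DataFrame, true_outlier_cycle_index=None):
    """Plot voltage vs. discharge capacity, one line per cycle.

    Hypothetical stand-in for BenchDB.plot_cycle_data.
    """
    # Accept None, a list, or a NumPy array of cycle indices.
    if true_outlier_cycle_index is None:
        outliers = set()
    else:
        outliers = {int(c) for c in true_outlier_cycle_index}

    fig, ax = plt.subplots()
    for cycle, grp in df.groupby("cycle_index"):
        is_outlier = int(cycle) in outliers
        ax.plot(
            grp["discharge_capacity"],
            grp["voltage"],
            color="red" if is_outlier else "grey",
            linewidth=2 if is_outlier else 1,
            label=f"cycle {cycle}" if is_outlier else None)
    ax.set_xlabel("Discharge capacity")
    ax.set_ylabel("Voltage")
    if outliers:
        ax.legend(title="Anomalous cycles")
    return ax


# Toy data: two cycles with three points each (values are made up).
df_cycles = pd.DataFrame({
    "cycle_index": [0, 0, 0, 1, 1, 1],
    "discharge_capacity": [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
    "voltage": [3.5, 3.2, 2.0, 3.5, 2.8, 2.0],
})

ax = plot_cycles_sketch(df_cycles, true_outlier_cycle_index=[1])
```

Calling the sketch without the second argument draws every cycle in grey and adds no legend, which corresponds to the unlabeled case described above.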
- load_features_db(db_features_filepath: pathlib.PosixPath | str, dataset_type: str)
Load features for the selected cell from a DuckDB database.
This function connects to a DuckDB database containing precomputed features for battery cells. It supports loading either training or test datasets, filters the data for the selected cell label, and returns the resulting feature DataFrame.
- Parameters:
db_features_filepath (Union[pathlib.PosixPath, str]) – Path to the DuckDB features database file.
dataset_type (str) – Type of dataset to load. Must be one of:
  - "train": Load training features from df_train_features_sv.
  - "test": Load test features from df_test_features_sv.
- Returns:
DataFrame containing features for the selected cell.
- Return type:
pd.DataFrame
- Raises:
AssertionError – If the selected cell label is not found in the database.
Example
# Define the filepath to ``train_features_severson.db``
# DuckDB instance.
db_features_filepath = (
    Path.cwd()
    .parent.parent
    .joinpath("database", "train_features_severson.db"))

# Load only the training features dataset
df_features_per_cell = benchdb.load_features_db(
    db_features_filepath,
    dataset_type="train")

unique_cycle_count = (
    df_features_per_cell["cycle_index"].unique())