osbad.scaler

The methods in this module apply statistical feature transformations to the input features before they are passed to the anomaly detection methods in this benchmarking study.

from osbad.scaler import CycleScaling

Module Contents

class osbad.scaler.CycleScaling(df_selected_cell: pandas.DataFrame)

Implement statistical feature transformation methods on the selected dataframe.

# Only select the relevant features for the models while excluding
# the true labels from the benchmarking dataset.
df_selected_cell_without_labels = df_selected_cell[
    ["cell_index",
     "cycle_index",
     "discharge_capacity",
     "voltage"]].reset_index(drop=True)

# Instantiate the CycleScaling class for dataset without labels
scaler = CycleScaling(
    df_selected_cell=df_selected_cell_without_labels)

Note

True labels are stored in the dataframe as df_selected_cell["outlier"].

median_IQR_scaling(variable: str, validate: bool = False) → pandas.DataFrame

Apply median-IQR scaling to the selected feature from the dataframe so that abnormal cycles separate more clearly from normal cycles in the marginal histograms.

Parameters:

variable (str) – Name of the variable or feature to scale with the median-IQR method.
validate (bool, optional) – Validate and visually inspect whether the scaling is performed correctly. If True, this method returns additional columns with the results of the intermediate calculation steps. Defaults to False.

Returns:

Scaled variable with the corresponding cycle index.

Return type:

pd.DataFrame

Example:

# Instantiate the CycleScaling class
scaler = CycleScaling(
    df_selected_cell=df_selected_cell_without_labels)

# Implement median IQR scaling on the discharge capacity data
df_capacity_med_scaled = scaler.median_IQR_scaling(
    variable="discharge_capacity",
    validate=True)
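
As a rough illustration of the idea, the sketch below applies one common form of median-IQR (robust) scaling, (x - median) / IQR, to a toy dataframe. The formula, the helper name `median_iqr_scale`, and the toy values are assumptions made for illustration; the exact computation inside `CycleScaling.median_IQR_scaling` may differ.

```python
import pandas as pd


def median_iqr_scale(df: pd.DataFrame, variable: str) -> pd.DataFrame:
    """Scale one column as (x - median) / IQR (assumed formula)."""
    median = df[variable].median()
    # IQR = 75th percentile minus 25th percentile
    iqr = df[variable].quantile(0.75) - df[variable].quantile(0.25)
    scaled = (df[variable] - median) / iqr
    return pd.DataFrame({
        "cycle_index": df["cycle_index"],
        f"scaled_{variable}": scaled,
    })


# Toy data (made up for illustration): the last value is an obvious outlier.
df_toy = pd.DataFrame({
    "cycle_index": [0, 0, 1, 1, 2],
    "discharge_capacity": [1.0, 2.0, 3.0, 4.0, 100.0],
})
df_toy_scaled = median_iqr_scale(df_toy, "discharge_capacity")
print(df_toy_scaled)
```

Because the median and IQR are barely affected by a single extreme value, the outlier ends up far from the bulk of the scaled data, which is what makes the histogram separation easier.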

calculate_max_diff_per_cycle(df_scaled: pandas.DataFrame, variable_name: str) → pandas.DataFrame

Calculate the maximum feature difference per cycle to transform collective anomalies within a given cycle into cycle-wise point anomalies. If continuous abnormal voltage and current measurements are recorded in a cycle, that cycle is labelled as anomalous.

Parameters:

df_scaled (pd.DataFrame) – The dataframe with scaled feature.
variable_name (str) – Name of the feature or variable in the dataframe.

Returns:

Maximum feature difference per cycle with the corresponding cycle index.

Return type:

pd.DataFrame

Note

Although the cycle index may initially coincide with the natural index of the dataframe, do not use the natural index to label the cycle number, because the natural index may change if some anomalous cycles are removed from the dataframe.
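
A small sketch of why the natural index is unreliable as a cycle label is given below. The dataframe, its `max_diff` column, and the values are invented for illustration and are not taken from the benchmarking dataset.

```python
import pandas as pd

# Toy per-cycle table (values made up for illustration).
df_cycles = pd.DataFrame({
    "cycle_index": [0, 1, 2, 3],
    "max_diff": [0.10, 0.12, 5.00, 0.11],
})

# Drop the anomalous cycle 2, then reset the natural index.
df_clean = df_cycles[df_cycles["cycle_index"] != 2].reset_index(drop=True)

# The natural index is renumbered 0, 1, 2, but the cycle_index column
# still records the true cycle numbers 0, 1, 3.
natural_index = df_clean.index.tolist()
cycle_numbers = df_clean["cycle_index"].tolist()
print(natural_index, cycle_numbers)
```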

Example:

# maximum scaled capacity difference per cycle
df_max_dQ = scaler.calculate_max_diff_per_cycle(
    df_scaled=df_capacity_med_scaled,
    variable_name="scaled_discharge_capacity")

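# Scale the voltage data in the same way as the discharge capacity
df_voltage_med_scaled = scaler.median_IQR_scaling(
    variable="voltage",
    validate=True)

# maximum scaled voltage difference per cycle
df_max_dV = scaler.calculate_max_diff_per_cycle(
    df_scaled=df_voltage_med_scaled,
    variable_name="scaled_voltage")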
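
To make the collective-to-point transformation concrete, the sketch below reduces every cycle to a single number. It assumes that "maximum feature difference" means the maximum minus the minimum of the feature within a cycle; the helper `max_diff_per_cycle`, the output column name `max_diff`, and the toy values are hypothetical, and the library's own definition may differ.

```python
import pandas as pd


def max_diff_per_cycle(df_scaled: pd.DataFrame, variable_name: str) -> pd.DataFrame:
    """Return (max - min) of a feature within each cycle (assumed definition)."""
    grouped = df_scaled.groupby("cycle_index")[variable_name]
    max_diff = (grouped.max() - grouped.min()).rename("max_diff")
    # reset_index keeps cycle_index as an explicit column, so the cycle
    # label does not depend on the natural index of the dataframe.
    return max_diff.reset_index()


# Toy scaled data (made up): cycle 1 contains an abnormal excursion.
df_toy_scaled_voltage = pd.DataFrame({
    "cycle_index": [0, 0, 0, 1, 1, 1],
    "scaled_voltage": [0.0, 0.5, 1.0, 0.0, 4.0, 1.0],
})
df_toy_max_dV = max_diff_per_cycle(df_toy_scaled_voltage, "scaled_voltage")
print(df_toy_max_dV)
```

Each cycle now contributes one row, so a run of abnormal measurements inside a cycle shows up as one unusually large value that a point anomaly detector can flag.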

calculate_max_feature_derivative_per_cycle(Xfeature: pandas.Series, Yfeature: pandas.Series, cycle_index: pandas.Series) → pandas.DataFrame

Calculate the maximum derivative of Yfeature with respect to Xfeature (dY/dX) per cycle.

Parameters:

Xfeature (pd.Series) – Feature used as the denominator (X in dY/dX).
Yfeature (pd.Series) – Feature used as the numerator (Y in dY/dX).
cycle_index (pd.Series) – Cycle index of selected cell.

Returns:

Maximum feature derivative (dY/dX) per cycle.

Return type:

pd.DataFrame
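
One way to picture this calculation is the finite-difference sketch below. It assumes the derivative is approximated by first differences within each cycle and that the signed maximum is taken; the helper `max_derivative_per_cycle`, the output column name `max_dYdX`, and the toy values are hypothetical and do not reproduce the library's implementation.

```python
import numpy as np
import pandas as pd


def max_derivative_per_cycle(
    Xfeature: pd.Series, Yfeature: pd.Series, cycle_index: pd.Series
) -> pd.DataFrame:
    """Approximate dY/dX by first differences and take the max per cycle."""
    df = pd.DataFrame({
        "cycle_index": cycle_index.to_numpy(),
        "X": Xfeature.to_numpy(),
        "Y": Yfeature.to_numpy(),
    })
    # Differences are taken within each cycle, so no step spans two cycles.
    grouped = df.groupby("cycle_index")
    dX = grouped["X"].diff()
    dY = grouped["Y"].diff()
    # Treat division by a zero step in X as undefined rather than infinite.
    df["dYdX"] = (dY / dX).replace([np.inf, -np.inf], np.nan)
    # max() skips NaN values by default.
    return df.groupby("cycle_index")["dYdX"].max().reset_index(name="max_dYdX")


# Toy data (made up for illustration).
cycles = pd.Series([0, 0, 0, 1, 1, 1])
X = pd.Series([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
Y = pd.Series([0.0, 1.0, 3.0, 0.0, 5.0, 6.0])
df_toy_max_dYdX = max_derivative_per_cycle(X, Y, cycles)
print(df_toy_max_dYdX)
```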