osbad.scaler
============

.. py:module:: osbad.scaler

.. autoapi-nested-parse::

   The methods in this module apply statistical feature transformations
   to the input features before they are passed to the anomaly detection
   methods in this benchmarking study.

   .. code-block::

       from osbad.scaler import CycleScaling



Module Contents
---------------

.. py:class:: CycleScaling(df_selected_cell: fireducks.pandas.DataFrame)

   Implement statistical feature transformation methods on the selected
   dataframe.

   .. code-block::

       # Only select the relevant features for the models while excluding
       # the true labels from the benchmarking dataset.
       df_selected_cell_without_labels = df_selected_cell[
           ["cell_index",
            "cycle_index",
            "discharge_capacity",
            "voltage"]].reset_index(drop=True)

       # Instantiate the CycleScaling class for dataset without labels
       scaler = CycleScaling(
           df_selected_cell=df_selected_cell_without_labels)

   .. note::

       True labels are stored in the dataframe as
       ``df_selected_cell["outlier"]``.


   .. py:method:: median_IQR_scaling(variable: str, validate: bool = False) -> fireducks.pandas.DataFrame

      Apply median-IQR scaling to the selected feature of the dataframe
      to help separate abnormal cycles from normal cycles in the
      marginal histograms.

      :param variable: Variable or feature to scale with the median-IQR
                       method.
      :type variable: str
      :param validate: Validate and visually inspect whether the scaling
                       is performed correctly. If True, this method
                       returns additional columns with the results of
                       the intermediate calculation steps. Defaults to
                       False.
      :type validate: bool, optional

      :returns: Scaled variable with the corresponding cycle index.
      :rtype: pd.DataFrame

      Example::

          # Instantiate the CycleScaling class
          scaler = CycleScaling(
              df_selected_cell=df_selected_cell_without_labels)

          # Implement median IQR scaling on the discharge capacity data
          df_capacity_med_scaled = scaler.median_IQR_scaling(
              variable="discharge_capacity",
              validate=True)



   .. py:method:: calculate_max_diff_per_cycle(df_scaled: fireducks.pandas.DataFrame, variable_name: str) -> fireducks.pandas.DataFrame

      Calculate the maximum feature difference per cycle to transform
      collective anomalies of a given cycle into cycle-wise point
      anomalies. If continuous abnormal voltage and current measurements
      are recorded in a cycle, that cycle is labelled as an anomalous
      cycle.

      :param df_scaled: The dataframe with the scaled feature.
      :type df_scaled: pd.DataFrame
      :param variable_name: Name of the feature or variable in the
                            dataframe.
      :type variable_name: str

      :returns: Maximum feature difference per cycle with the
                corresponding cycle index.
      :rtype: pd.DataFrame

      .. Note::

          The cycle index may initially coincide with the natural index
          of the dataframe, but do not use the natural index to label
          the cycle number. The natural index may change if some
          anomalous cycles are removed from the dataframe.

      Example::

          # maximum scaled capacity difference per cycle
          df_max_dQ = scaler.calculate_max_diff_per_cycle(
              df_scaled=df_capacity_med_scaled,
              variable_name="scaled_discharge_capacity")

          # maximum scaled voltage difference per cycle
          df_max_dV = scaler.calculate_max_diff_per_cycle(
              df_scaled=df_voltage_med_scaled,
              variable_name="scaled_voltage")



   .. py:method:: calculate_max_feature_derivative_per_cycle(Xfeature: fireducks.pandas.Series, Yfeature: fireducks.pandas.Series, cycle_index: fireducks.pandas.Series) -> fireducks.pandas.DataFrame

      Calculate the maximum derivative of Yfeature with respect to
      Xfeature (dYdX) per cycle.

      :param Xfeature: Feature to be considered as the denominator.
      :type Xfeature: pd.Series
      :param Yfeature: Feature to be considered as the numerator.
      :type Yfeature: pd.Series
      :param cycle_index: Cycle index of the selected cell.
      :type cycle_index: pd.Series

      :returns: Maximum feature derivative (dYdX) per cycle.
      :rtype: pd.DataFrame
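
The three transformations documented above can be approximated with plain pandas to see what kind of output to expect. The following is a minimal sketch and not the ``osbad`` implementation. The helper names ``median_iqr_scale``, ``max_abs_diff_per_cycle`` and ``max_derivative_per_cycle`` and the output column names are hypothetical. It assumes that median-IQR scaling means ``(x - median) / IQR`` over the whole series, that the per-cycle difference is the largest absolute change between consecutive samples, and that the derivative is a finite-difference ratio. The actual methods may differ on each of these points.

```python
import numpy as np
import pandas as pd


def median_iqr_scale(values: pd.Series) -> pd.Series:
    """Assumed form of median-IQR scaling: (x - median) / IQR."""
    iqr = values.quantile(0.75) - values.quantile(0.25)
    return (values - values.median()) / iqr


def max_abs_diff_per_cycle(df: pd.DataFrame, variable_name: str) -> pd.DataFrame:
    """Largest absolute sample-to-sample change of a feature in each cycle."""
    records = []
    for cycle, group in df.groupby("cycle_index"):
        records.append({
            "cycle_index": cycle,
            "max_diff": group[variable_name].diff().abs().max(),
        })
    return pd.DataFrame(records)


def max_derivative_per_cycle(
    x: pd.Series, y: pd.Series, cycle_index: pd.Series
) -> pd.DataFrame:
    """Largest absolute finite-difference dY/dX in each cycle."""
    df = pd.DataFrame({
        "cycle_index": cycle_index.to_numpy(),
        "x": x.to_numpy(),
        "y": y.to_numpy(),
    })
    records = []
    for cycle, group in df.groupby("cycle_index"):
        dydx = group["y"].diff() / group["x"].diff()
        # Steps where X does not change yield inf or NaN; drop them.
        dydx = dydx.replace([np.inf, -np.inf], np.nan).dropna()
        records.append({"cycle_index": cycle, "max_dYdX": dydx.abs().max()})
    return pd.DataFrame(records)


# Toy data: two cycles of four samples; cycle 2 contains a capacity jump.
df_toy = pd.DataFrame({
    "cycle_index": [1, 1, 1, 1, 2, 2, 2, 2],
    "voltage": [4.0, 3.8, 3.6, 3.4, 4.0, 3.8, 3.6, 3.4],
    "discharge_capacity": [0.0, 0.1, 0.2, 0.3, 0.0, 0.1, 0.5, 0.6],
})

df_toy["scaled_discharge_capacity"] = median_iqr_scale(
    df_toy["discharge_capacity"])
df_max_dQ = max_abs_diff_per_cycle(df_toy, "scaled_discharge_capacity")
df_max_dQdV = max_derivative_per_cycle(
    x=df_toy["voltage"],
    y=df_toy["discharge_capacity"],
    cycle_index=df_toy["cycle_index"])

print(df_max_dQ)
print(df_max_dQdV)
```

In this toy dataset the capacity jump in cycle 2 produces a larger per-cycle maximum than cycle 1 in both tables. This is the effect the documentation describes: an anomaly spread across samples within a cycle becomes a single cycle-level value. Both tables carry an explicit ``cycle_index`` column and do not rely on the natural index of the dataframe.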