bluecast.evaluation.error_analysis

Module for error analysis.

This step follows the training step. Ideally, it uses the out of fold datasets stored during training with the ‘fit_eval’ methods.

Module Contents

Classes

OutOfFoldDataReader

OutOfFoldDataReaderCV

ErrorAnalyserClassificationMixin

ErrorDistributionPlotterMixin

ErrorAnalyserClassification

ErrorAnalyserClassificationCV

class bluecast.evaluation.error_analysis.OutOfFoldDataReader(bluecast_instance: bluecast.blueprints.cast.BlueCast)

Bases: bluecast.evaluation.base_classes.DataReader

read_data_from_bluecast_instance() → polars.DataFrame

Read out of fold datasets from the defined storage location.

Returns:

Out of fold dataset.

read_data_from_bluecast_cv_instance() → polars.DataFrame

Always raises an error when called.

Please use read_data_from_bluecast_instance instead.

Returns:

Nothing; an error is always raised.
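OutOfFoldDataReader implements only one of the two DataReader methods; the other deliberately raises so that a BlueCast instance is never read through the cross-validation path. The sketch below illustrates that guard pattern under stated assumptions: the class names are hypothetical, plain lists of dicts stand in for Polars DataFrames, and `ValueError` is an assumed exception type (the error BlueCast actually raises is not documented here).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]


class HypotheticalDataReader(ABC):
    """Hypothetical stand-in for bluecast.evaluation.base_classes.DataReader."""

    @abstractmethod
    def read_data_from_bluecast_instance(self) -> Rows: ...

    @abstractmethod
    def read_data_from_bluecast_cv_instance(self) -> Rows: ...


class SingleModelReader(HypotheticalDataReader):
    """Hypothetical reader for a single (non-CV) model."""

    def __init__(self, oof_rows: Rows) -> None:
        self.oof_rows = oof_rows

    def read_data_from_bluecast_instance(self) -> Rows:
        # Supported path: return a copy of the stored out of fold rows.
        return list(self.oof_rows)

    def read_data_from_bluecast_cv_instance(self) -> Rows:
        # Guard: fail loudly instead of returning data from the wrong source.
        # ValueError is an assumed exception type for this sketch.
        raise ValueError("Please use read_data_from_bluecast_instance instead.")
```

Raising, rather than silently delegating to the other method, makes a mismatch between the reader and the model type visible immediately.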

class bluecast.evaluation.error_analysis.OutOfFoldDataReaderCV(bluecast_instance: bluecast.blueprints.cast_cv.BlueCastCV)

Bases: bluecast.evaluation.base_classes.DataReader

read_data_from_bluecast_instance() → polars.DataFrame

Always raises an error when called.

Please use read_data_from_bluecast_cv_instance instead.

Returns:

Nothing; an error is always raised.

read_data_from_bluecast_cv_instance() → polars.DataFrame

Read out of fold datasets from defined storage location.

Returns:

Concatenated out of fold dataset.
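A BlueCastCV instance holds several trained models, so the CV reader returns their out of fold data as one concatenated dataset. A minimal sketch of that concatenation, assuming each inner list holds the out of fold rows of one model; the helper name and the added `fold` key are hypothetical, not documented BlueCast columns, and lists of dicts stand in for Polars DataFrames.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def concat_oof_folds(folds: List[List[Row]]) -> List[Row]:
    """Concatenate per-model out of fold rows into a single dataset.

    Hypothetical helper: a 'fold' key records which model each row came
    from, which helps when errors later need to be traced back to a fold.
    """
    combined: List[Row] = []
    for fold_id, rows in enumerate(folds):
        for row in rows:
            # Copy each row so the input data is left unchanged.
            combined.append({**row, "fold": fold_id})
    return combined
```

With Polars, the same step would typically be a vertical concatenation of the per-fold DataFrames.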

class bluecast.evaluation.error_analysis.ErrorAnalyserClassificationMixin

Bases: bluecast.evaluation.base_classes.ErrorAnalyser

analyse_errors(df: pandas.DataFrame | polars.DataFrame, descending: bool = True) → polars.DataFrame

Find the mean absolute error for every subsegment.

Parameters:

df – Preprocessed out of fold DataFrame.

descending – Whether the errors are sorted in descending order in the returned DataFrame.

Returns:

Polars DataFrame with all subsegments and the mean absolute error in each of them.
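analyse_errors averages the prediction error within each subsegment and sorts the result. The sketch below shows that aggregation on plain Python rows under two assumptions: a subsegment is every row sharing one value in one feature column, and errors sit in a `prediction_error` column (the name calculate_errors documents). The function and parameter names are hypothetical; the actual implementation works on Polars DataFrames.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def mean_error_per_segment(
    rows: List[Dict[str, Any]],
    feature_columns: List[str],
    error_column: str = "prediction_error",
    descending: bool = True,
) -> List[Tuple[str, Any, float]]:
    """Return (column, value, mean absolute error) for every subsegment."""
    sums: Dict[Tuple[str, Any], float] = defaultdict(float)
    counts: Dict[Tuple[str, Any], int] = defaultdict(int)
    for row in rows:
        error = abs(row[error_column])
        # Every row contributes to one subsegment per feature column.
        for column in feature_columns:
            key = (column, row[column])
            sums[key] += error
            counts[key] += 1
    result = [(col, val, sums[(col, val)] / counts[(col, val)]) for col, val in sums]
    # Worst subsegments first when descending=True.
    result.sort(key=lambda item: item[2], reverse=descending)
    return result
```

Sorting in descending order puts the subsegments where the model performs worst at the top, which is usually what an error analysis looks at first.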

class bluecast.evaluation.error_analysis.ErrorDistributionPlotterMixin(ignore_columns_during_visualization: List[str] | None = None)

Bases: bluecast.evaluation.base_classes.ErrorDistributionPlotter

plot_error_distributions(df: polars.DataFrame, target_column: str = 'target_class')

class bluecast.evaluation.error_analysis.ErrorAnalyserClassification(bluecast_instance: bluecast.blueprints.cast.BlueCast, ignore_columns_during_visualization=None)

Bases: OutOfFoldDataReader, bluecast.evaluation.base_classes.ErrorPreprocessor, ErrorAnalyserClassificationMixin, ErrorDistributionPlotterMixin

stack_predictions_by_class(df: polars.DataFrame) → polars.DataFrame

Stack class predictions into a long format.

BlueCast returns predictions for each class as separate columns. This function returns a DataFrame where all predictions are stacked into a single ‘prediction’ column.

Parameters:

df – Polars DataFrame with predictions in wide format.

Returns:

Polars DataFrame with stacked predictions.
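The wide-to-long reshaping described above can be sketched with plain Python rows. This is an illustration under assumptions: the wide column names (`proba_0`, `proba_1` in the usage below) are hypothetical, and the `target_class` key merely mirrors the default `target_column` of plot_error_distributions; BlueCast's actual column naming may differ.

```python
from typing import Any, Dict, List


def stack_predictions(
    rows: List[Dict[str, Any]],
    prediction_columns: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Turn one-column-per-class predictions into one row per class.

    prediction_columns maps each (hypothetical) wide column name to the
    class label it scores.
    """
    long_rows: List[Dict[str, Any]] = []
    for row in rows:
        # Non-prediction columns are copied to every output row.
        base = {k: v for k, v in row.items() if k not in prediction_columns}
        for column, class_label in prediction_columns.items():
            long_rows.append(
                {**base, "target_class": class_label, "prediction": row[column]}
            )
    return long_rows
```

For example, `stack_predictions([{"target": 0, "proba_0": 0.8, "proba_1": 0.2}], {"proba_0": 0, "proba_1": 1})` yields two rows, one per class, each carrying its own `prediction`. In Polars, this reshaping corresponds to an unpivot (melt) operation.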

calculate_errors(df: pandas.DataFrame | polars.DataFrame) → polars.DataFrame

Analyse errors of predictions on out of fold data.

Parameters:

df – DataFrame holding out of fold data and predictions.

Returns:

Polars DataFrame with additional ‘prediction_error’ column.
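How the ‘prediction_error’ column is derived is not spelled out here. One plausible definition for stacked classification predictions is sketched below as an assumption, not as BlueCast's implementation: the error is the absolute difference between a one-hot indicator (1 if the true label equals the class being scored, else 0) and the predicted probability for that class. The function name and the `target` / `target_class` keys are hypothetical.

```python
from typing import Any, Dict, List


def add_prediction_error(
    long_rows: List[Dict[str, Any]],
    target_column: str = "target",
    class_column: str = "target_class",
) -> List[Dict[str, Any]]:
    """Return copies of stacked rows with an added 'prediction_error' key.

    Assumed error definition: |indicator - prediction|, where the indicator
    is 1.0 when the true label matches the scored class, else 0.0.
    """
    result = []
    for row in long_rows:
        indicator = 1.0 if row[target_column] == row[class_column] else 0.0
        # New dicts are created so the input rows stay unchanged.
        result.append({**row, "prediction_error": abs(indicator - row["prediction"])})
    return result
```

Under this definition, a confident correct prediction yields an error near 0, and a confident wrong one yields an error near 1.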

analyse_segment_errors() → polars.DataFrame

Pipeline for error analysis.

Reads the out of fold datasets from the output location defined in the training config of the provided BlueCast instance, preprocesses the data, and calculates errors for all subsegments of the data. Numerical columns are split into quantiles to form subsegments.

Returns:

Polars DataFrame with subsegments and errors.
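The quantile step can be sketched with the standard library: numeric values are assigned to equal-frequency buckets with `statistics.quantiles`, and the error is averaged per bucket. The number of quantiles (4 by default here) and both helper names are assumptions; the actual BlueCast implementation works on Polars DataFrames and may bin differently.

```python
import statistics
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List


def quantile_bins(values: List[float], n_quantiles: int = 4) -> List[int]:
    """Assign each value to a quantile bucket numbered 0 .. n_quantiles - 1."""
    # n_quantiles - 1 cut points that split the data into equal-frequency groups
    # (older Python versions need at least two values).
    cut_points = statistics.quantiles(values, n=n_quantiles)
    # bisect_right counts how many cut points are <= each value.
    return [bisect_right(cut_points, value) for value in values]


def mean_error_by_quantile(
    values: List[float], errors: List[float], n_quantiles: int = 4
) -> Dict[int, float]:
    """Average absolute error per quantile bucket of one numeric column."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for bucket, error in zip(quantile_bins(values, n_quantiles), errors):
        sums[bucket] += abs(error)
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}
```

Equal-frequency buckets keep every subsegment reasonably populated, so a high average error in one bucket is less likely to come from a handful of outlier rows.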

class bluecast.evaluation.error_analysis.ErrorAnalyserClassificationCV(bluecast_instance: bluecast.blueprints.cast_cv.BlueCastCV, ignore_columns_during_visualization=None)

Bases: OutOfFoldDataReaderCV, bluecast.evaluation.base_classes.ErrorPreprocessor, ErrorAnalyserClassificationMixin, ErrorDistributionPlotterMixin

stack_predictions_by_class(df: polars.DataFrame) → polars.DataFrame

Stack class predictions into a long format.

BlueCast returns predictions for each class as separate columns. This function returns a DataFrame where all predictions are stacked into a single ‘prediction’ column.

Parameters:

df – Polars DataFrame with predictions in wide format.

Returns:

Polars DataFrame with stacked predictions.

calculate_errors(df: pandas.DataFrame | polars.DataFrame) → polars.DataFrame

Analyse errors of predictions on out of fold data.

Parameters:

df – DataFrame holding out of fold data and predictions.

Returns:

Polars DataFrame with additional ‘prediction_error’ column.

analyse_segment_errors() → polars.DataFrame

Pipeline for error analysis.

Reads the out of fold datasets from the output location defined in the training config of the provided BlueCastCV instance, preprocesses the data, and calculates errors for all subsegments of the data. Numerical columns are split into quantiles to form subsegments.

Returns:

Polars DataFrame with subsegments and errors.