`bluecast.evaluation.error_analysis`¶

Module for error analysis with a DuckDB backend.

This step follows the training step. Ideally, it uses the out-of-fold datasets stored by the ‘fit_eval’ methods. DuckDB is used for better performance and analytics capabilities.

Module Contents¶

Classes¶

`DuckDBErrorAnalysisEngine`	DuckDB-based engine for error analysis providing enhanced analytics capabilities.
`OutOfFoldDataReader`
`OutOfFoldDataReaderCV`
`ErrorAnalyserClassificationMixin`
`ErrorDistributionPlotterMixin`
`ErrorAnalyserClassification`
`ErrorAnalyserClassificationCV`

class bluecast.evaluation.error_analysis.DuckDBErrorAnalysisEngine(db_path: str | None = None)¶

DuckDB-based engine for error analysis providing enhanced analytics capabilities.

_init_database() → None¶: Initialize database schema for error analysis.

load_data(df: pandas.DataFrame, experiment_id: str, target_column: str) → None¶

Load error analysis data into DuckDB.

Parameters:

df – DataFrame with predictions and features
experiment_id – Unique identifier for this experiment
target_column – Name of the target column

compute_error_statistics(experiment_id: str) → Dict[str, pandas.DataFrame]¶

Compute comprehensive error statistics.

Parameters:

experiment_id – Experiment identifier

Returns:

Dictionary of statistical DataFrames
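The load-then-aggregate flow of `load_data` and `compute_error_statistics` can be illustrated with a minimal sketch. It uses the standard-library `sqlite3` module as a stand-in for DuckDB so that it runs without extra dependencies. The table name `error_analysis`, its columns, and the helper `compute_error_statistics_sketch` are hypothetical and are not taken from BlueCast's actual schema.

```python
import sqlite3

# Hypothetical out-of-fold rows: (experiment_id, target_class, prediction_error).
rows = [
    ("exp_1", 0, 0.10),
    ("exp_1", 0, 0.30),
    ("exp_1", 1, 0.60),
    ("exp_1", 1, 0.20),
    ("exp_2", 0, 0.90),  # other experiment, filtered out by the query below
]


def compute_error_statistics_sketch(conn, experiment_id):
    """Return {target_class: (row_count, mean_error, max_error)} for one experiment."""
    cursor = conn.execute(
        """
        SELECT target_class, COUNT(*), AVG(prediction_error), MAX(prediction_error)
        FROM error_analysis
        WHERE experiment_id = ?
        GROUP BY target_class
        ORDER BY target_class
        """,
        (experiment_id,),
    )
    return {cls: (n, mean_err, max_err) for cls, n, mean_err, max_err in cursor}


# In-memory database; sqlite3 is only a stand-in for the DuckDB connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE error_analysis "
    "(experiment_id TEXT, target_class INTEGER, prediction_error REAL)"
)
conn.executemany("INSERT INTO error_analysis VALUES (?, ?, ?)", rows)
stats = compute_error_statistics_sketch(conn, "exp_1")
conn.close()
```

Tagging every row with an experiment identifier lets several analyses share one table while each query stays scoped to a single experiment.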

create_error_visualizations(experiment_id: str, target_column: str = 'target_class') → Dict[str, plotly.graph_objects.Figure]¶

Create comprehensive error visualizations using Plotly.

Parameters:

experiment_id – Experiment identifier
target_column – Target column name

Returns:

Dictionary of Plotly figures

close() → None¶: Close the database connection and clean up temporary files if any were created.

class bluecast.evaluation.error_analysis.OutOfFoldDataReader(bluecast_instance: bluecast.blueprints.cast.BlueCast)¶

Bases: bluecast.evaluation.base_classes.DataReader

read_data_from_bluecast_instance() → polars.DataFrame¶

Read out of fold datasets from defined storage location.

Returns:

Out-of-fold dataset.

read_data_from_bluecast_cv_instance() → polars.DataFrame¶

Always fails when called.

Please use read_data_from_bluecast_instance instead.

Returns:

Nothing. Calling this method raises an error.

class bluecast.evaluation.error_analysis.OutOfFoldDataReaderCV(bluecast_instance: bluecast.blueprints.cast_cv.BlueCastCV)¶

Bases: bluecast.evaluation.base_classes.DataReader

read_data_from_bluecast_instance() → polars.DataFrame¶

Always fails when called.

Please use read_data_from_bluecast_cv_instance instead.

Returns:

Nothing. Calling this method raises an error.

read_data_from_bluecast_cv_instance() → polars.DataFrame¶

Read out of fold datasets from defined storage location.

Returns:

Out-of-fold dataset.

class bluecast.evaluation.error_analysis.ErrorAnalyserClassificationMixin¶

Bases: bluecast.evaluation.base_classes.ErrorAnalyser

analyse_errors(df: pandas.DataFrame | polars.DataFrame, descending: bool = True) → polars.DataFrame¶

Find mean absolute errors for all subsegments using DuckDB for enhanced analysis.

Parameters:

df – Preprocessed out-of-fold DataFrame.
descending – Whether the errors are sorted in descending order in the final DataFrame.

Returns:

Polars DataFrame with all subsegments and mean absolute error in each of them.
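As an illustration of what a per-subsegment mean absolute error looks like, here is a minimal pure-Python sketch. It does not use DuckDB or Polars, and the output keys `column_name`, `column_subset`, and `mean_error`, as well as the helper name, are assumptions made for this example rather than the documented output schema.

```python
from collections import defaultdict
from statistics import mean


def mean_error_per_segment(rows, feature_columns, descending=True):
    """Group rows by every (column, value) pair and average the absolute error."""
    grouped = defaultdict(list)
    for row in rows:
        for column in feature_columns:
            grouped[(column, row[column])].append(abs(row["prediction_error"]))
    result = [
        {"column_name": column, "column_subset": value, "mean_error": mean(errors)}
        for (column, value), errors in grouped.items()
    ]
    # Mirrors the `descending` flag: worst segments first by default.
    return sorted(result, key=lambda r: r["mean_error"], reverse=descending)


# Hypothetical preprocessed out-of-fold rows.
rows = [
    {"region": "north", "device": "mobile", "prediction_error": 0.8},
    {"region": "north", "device": "desktop", "prediction_error": 0.6},
    {"region": "south", "device": "mobile", "prediction_error": 0.2},
    {"region": "south", "device": "desktop", "prediction_error": 0.0},
]
segments = mean_error_per_segment(rows, ["region", "device"])
```

With `descending=True`, the segments where the model performs worst appear at the top, which is usually where inspection should start.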

class bluecast.evaluation.error_analysis.ErrorDistributionPlotterMixin(ignore_columns_during_visualization: List[str] | None = None)¶

Bases: bluecast.evaluation.base_classes.ErrorDistributionPlotter

plot_error_distributions(df: polars.DataFrame, target_column: str = 'target_class')¶: Enhanced error distribution plotting using Plotly with better visualizations.

class bluecast.evaluation.error_analysis.ErrorAnalyserClassification(bluecast_instance: bluecast.blueprints.cast.BlueCast, ignore_columns_during_visualization=None)¶

Bases: OutOfFoldDataReader, bluecast.evaluation.base_classes.ErrorPreprocessor, ErrorAnalyserClassificationMixin, ErrorDistributionPlotterMixin

stack_predictions_by_class(df: polars.DataFrame) → polars.DataFrame¶

Stack class predictions into a long format.

BlueCast returns predictions for each class as separate columns. This function returns a DataFrame where all predictions are stacked as a single ‘prediction’ column.

Parameters:

df – Polars DataFrame with wide predictions format.

Returns:

Polars DataFrame with stacked predictions.
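The wide-to-long reshaping can be sketched in plain Python. This is one plausible reading, not the library's implementation: it assumes the per-class columns are named `predictions_class_<n>` and that the stacked ‘prediction’ value is the probability predicted for each row's true class. Both are assumptions for illustration, and the helper name is hypothetical.

```python
def stack_predictions_sketch(rows, target_column="target_class",
                             prefix="predictions_class_"):
    """Stack per-class prediction columns into one 'prediction' column."""
    class_columns = sorted(key for key in rows[0] if key.startswith(prefix))
    stacked = []
    for column in class_columns:
        cls = int(column[len(prefix):])
        for row in rows:
            if row[target_column] != cls:
                continue  # assumption: keep only the true-class probability
            long_row = {k: v for k, v in row.items() if not k.startswith(prefix)}
            long_row["prediction"] = row[column]
            stacked.append(long_row)
    return stacked


# Hypothetical wide-format out-of-fold rows for a binary problem.
wide_rows = [
    {"age": 31, "target_class": 0, "predictions_class_0": 0.9, "predictions_class_1": 0.1},
    {"age": 54, "target_class": 1, "predictions_class_0": 0.3, "predictions_class_1": 0.7},
    {"age": 47, "target_class": 0, "predictions_class_0": 0.4, "predictions_class_1": 0.6},
]
stacked_rows = stack_predictions_sketch(wide_rows)
```

After stacking, every row carries a single ‘prediction’ value, so later steps no longer need to know how many classes exist.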

calculate_errors(df: pandas.DataFrame | polars.DataFrame) → polars.DataFrame¶

Analyse errors of predictions on out of fold data.

Parameters:

df – DataFrame holding out-of-fold data and predictions.

Returns:

Polars DataFrame with additional ‘prediction_error’ column.
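How the ‘prediction_error’ column is computed is not stated here. The sketch below assumes, purely for illustration, that the error is one minus the probability predicted for the true class, so that a confident correct prediction yields an error near zero. The helper name is hypothetical.

```python
def add_prediction_error(rows):
    """Return copies of the rows with an added 'prediction_error' field."""
    # Assumption: 'prediction' is the predicted probability of the true class.
    return [{**row, "prediction_error": 1.0 - row["prediction"]} for row in rows]


# Hypothetical stacked out-of-fold rows.
stacked = [
    {"target_class": 0, "prediction": 0.9},
    {"target_class": 1, "prediction": 0.25},
]
with_errors = add_prediction_error(stacked)
```

The input rows are left unmodified. The function returns new dictionaries that carry the additional field.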

analyse_segment_errors() → polars.DataFrame¶

Enhanced pipeline for error analysis with DuckDB backend.

Reads the out-of-fold datasets from the output location defined in the training config inside the provided BlueCast instance, preprocesses the data, and calculates errors for all subsegments of the data. Numerical columns are split into quantiles to obtain subsegments.

Returns:

Polars DataFrame with subsegments and errors.
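Splitting a numerical column into quantile-based subsegments can be sketched with the standard library. The number of bins (four), the sample values, and the helper name are assumptions for this example. The binning used inside BlueCast may differ.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, quantiles


def to_quantile_bins(values, n_bins=4):
    """Map each value to the index of the quantile bin it falls into."""
    cut_points = quantiles(values, n=n_bins)  # n_bins - 1 interior cut points
    return [bisect_right(cut_points, value) for value in values]


# Hypothetical numerical feature and per-row prediction errors.
ages = [18, 22, 25, 31, 38, 44, 52, 67]
errors = [0.1, 0.3, 0.2, 0.2, 0.5, 0.7, 0.9, 0.7]

bins = to_quantile_bins(ages)

# Mean error per quantile bin, i.e. per numerical subsegment.
errors_by_bin = defaultdict(list)
for bin_index, error in zip(bins, errors):
    errors_by_bin[bin_index].append(error)
mean_error_by_bin = {b: mean(errs) for b, errs in sorted(errors_by_bin.items())}
```

Quantile bins hold roughly equal numbers of rows, which keeps the per-segment error estimates comparable across segments.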

class bluecast.evaluation.error_analysis.ErrorAnalyserClassificationCV(bluecast_instance: bluecast.blueprints.cast_cv.BlueCastCV, ignore_columns_during_visualization=None)¶

Bases: OutOfFoldDataReaderCV, bluecast.evaluation.base_classes.ErrorPreprocessor, ErrorAnalyserClassificationMixin, ErrorDistributionPlotterMixin

stack_predictions_by_class(df: polars.DataFrame) → polars.DataFrame¶

Stack class predictions into a long format.

BlueCast returns predictions for each class as separate columns. This function returns a DataFrame where all predictions are stacked as a single ‘prediction’ column.

Parameters:

df – Polars DataFrame with wide predictions format.

Returns:

Polars DataFrame with stacked predictions.

calculate_errors(df: pandas.DataFrame | polars.DataFrame) → polars.DataFrame¶

Analyse errors of predictions on out of fold data.

Parameters:

df – DataFrame holding out-of-fold data and predictions.

Returns:

Polars DataFrame with additional ‘prediction_error’ column.

analyse_segment_errors() → polars.DataFrame¶

Enhanced pipeline for error analysis with DuckDB backend.

Reads the out-of-fold datasets from the output location defined in the training config inside the provided BlueCast instance, preprocesses the data, and calculates errors for all subsegments of the data. Numerical columns are split into quantiles to obtain subsegments.

Returns:

Polars DataFrame with subsegments and errors.
