bluecast.evaluation.error_analysis_regression

Module for regression error analysis with a DuckDB backend.

Enhanced error analysis for regression tasks, using DuckDB for better performance and analytics capabilities.

Module Contents

Classes

DuckDBRegressionErrorAnalysisEngine

DuckDB-based engine for regression error analysis with enhanced analytics.

OutOfFoldDataReaderRegression

Reader for out of fold datasets from BlueCast regression pipelines.

OutOfFoldDataReaderRegressionCV

Reader for out of fold datasets from BlueCast cross-validation (CV) regression pipelines.

ErrorAnalyserRegressionMixin

Abstract class to define the analysis of prediction errors on out of fold datasets.

ErrorDistributionRegressionPlotterMixin

Abstract class to define the plots for error analysis.

ErrorAnalyserRegression

Error analysis pipeline for BlueCastRegression instances: reads out of fold datasets, preprocesses them, analyses prediction errors, and plots error distributions.

ErrorAnalyserRegressionCV

Error analysis pipeline for BlueCastCVRegression instances: reads out of fold datasets, preprocesses them, analyses prediction errors, and plots error distributions.

class bluecast.evaluation.error_analysis_regression.DuckDBRegressionErrorAnalysisEngine(db_path: str | None = None)

DuckDB-based engine for regression error analysis with enhanced analytics.

_init_database() None

Initialize database schema for regression error analysis.

load_regression_data(df: pandas.DataFrame, experiment_id: str, target_column: str) None

Load regression error analysis data into DuckDB.

Parameters:
  • df – DataFrame with predictions and features

  • experiment_id – Unique identifier for this experiment

  • target_column – Name of the target column

compute_regression_statistics(experiment_id: str) Dict[str, pandas.DataFrame]

Compute comprehensive regression error statistics.

Parameters:

experiment_id – Experiment identifier

Returns:

Dictionary of statistical DataFrames

create_regression_visualizations(experiment_id: str) Dict[str, plotly.graph_objects.Figure]

Create comprehensive regression error visualizations using Plotly.

Parameters:

experiment_id – Experiment identifier

Returns:

Dictionary of Plotly figures

close() None

Close database connection and cleanup temporary files if created.
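The methods above suggest a load, compute, visualize, close lifecycle. The following is a minimal sketch of that call order, not the library's own code: `run_regression_error_analysis` is a hypothetical helper, and `engine` is assumed to be any object exposing the four methods documented above (for example a `DuckDBRegressionErrorAnalysisEngine`). The columns that `df` must contain are not documented here, so the sketch passes it through unchanged.

```python
from typing import Any, Dict, Tuple


def run_regression_error_analysis(
    engine: Any, df: Any, experiment_id: str, target_column: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Load data, compute statistics and figures, and always close the engine.

    Hypothetical helper: only the four documented engine methods are used.
    """
    try:
        engine.load_regression_data(df, experiment_id, target_column)
        stats = engine.compute_regression_statistics(experiment_id)
        figures = engine.create_regression_visualizations(experiment_id)
        return stats, figures
    finally:
        # close() is documented to close the connection and clean up
        # temporary files, so it runs even if an earlier step fails.
        engine.close()
```

Wrapping the calls in `try`/`finally` is a design choice of this sketch: it keeps the database connection from leaking when loading or computing raises.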

class bluecast.evaluation.error_analysis_regression.OutOfFoldDataReaderRegression(bluecast_instance: bluecast.blueprints.cast_regression.BlueCastRegression)

Bases: bluecast.evaluation.base_classes.DataReader

Reader for out of fold datasets from BlueCast regression pipelines.

read_data_from_bluecast_instance() polars.DataFrame

Read out of fold datasets from defined storage location.

Returns:

Out of fold dataset.

read_data_from_bluecast_cv_instance() polars.DataFrame

Function to fail when called.

Please use read_data_from_bluecast_instance instead.

Returns:

Nothing; calling this method raises an error.

class bluecast.evaluation.error_analysis_regression.OutOfFoldDataReaderRegressionCV(bluecast_instance: bluecast.blueprints.cast_cv_regression.BlueCastCVRegression)

Bases: bluecast.evaluation.base_classes.DataReader

Reader for out of fold datasets from BlueCast cross-validation (CV) regression pipelines.

read_data_from_bluecast_instance() polars.DataFrame

Function to fail when called.

Please use read_data_from_bluecast_cv_instance instead.

Returns:

Nothing; calling this method raises an error.

read_data_from_bluecast_cv_instance() polars.DataFrame

Read out of fold datasets from defined storage location for CV regression.

Returns:

Combined out of fold dataset.
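Each of the two reader classes supports exactly one of the two read methods, and the other one fails when called. A small sketch of how calling code might pick the right method follows. `read_out_of_fold_data` and its `uses_cv` flag are hypothetical names introduced for illustration; `reader` is assumed to be an instance of one of the two reader classes above.

```python
from typing import Any


def read_out_of_fold_data(reader: Any, uses_cv: bool) -> Any:
    """Call the one read method that the given reader supports.

    Hypothetical helper; the caller states whether the reader wraps a
    cross-validation pipeline.
    """
    if uses_cv:
        # OutOfFoldDataReaderRegressionCV: combined out of fold dataset.
        return reader.read_data_from_bluecast_cv_instance()
    # OutOfFoldDataReaderRegression: single out of fold dataset.
    return reader.read_data_from_bluecast_instance()
```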

class bluecast.evaluation.error_analysis_regression.ErrorAnalyserRegressionMixin

Bases: bluecast.evaluation.base_classes.ErrorAnalyser

Abstract class to define the analysis of prediction errors on out of fold datasets.

analyse_errors(df: pandas.DataFrame | polars.DataFrame, descending: bool = True, target_column: str = 'target_quantiles') polars.DataFrame

Enhanced regression error analysis using DuckDB for better insights.

Parameters:
  • df – Preprocessed out of fold DataFrame.

  • descending – Whether errors are sorted in descending order in the final DataFrame.

  • target_column – Name of the target column. Defaults to 'target_quantiles'.

Returns:

Polars DataFrame with enhanced error analysis results.

class bluecast.evaluation.error_analysis_regression.ErrorDistributionRegressionPlotterMixin(ignore_columns_during_visualization: List[str] | None = None)

Bases: bluecast.evaluation.base_classes.ErrorDistributionPlotter

Abstract class to define the plots for error analysis.

plot_error_distributions(df: polars.DataFrame, target_column: str = 'target_quantiles')

Enhanced error distribution plotting for regression using Plotly.

class bluecast.evaluation.error_analysis_regression.ErrorAnalyserRegression(bluecast_instance: bluecast.blueprints.cast_regression.BlueCastRegression, ignore_columns_during_visualization=None)

Bases: OutOfFoldDataReaderRegression, bluecast.evaluation.base_classes.ErrorPreprocessor, ErrorAnalyserRegressionMixin, ErrorDistributionRegressionPlotterMixin

Error analysis pipeline for BlueCastRegression instances: reads out of fold datasets, preprocesses them, analyses prediction errors, and plots error distributions.

stack_predictions_by_class(df: polars.DataFrame) polars.DataFrame

Add additional column with binned target.

Parameters:

df – Polars DataFrame with original targets.

Returns:

Polars DataFrame with additional binned targets column.
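To illustrate what a binned target column can look like, here is a standard-library sketch of quantile binning. It is not the library's implementation: `bin_target`, the number of bins, and the inclusive quantile method are assumptions made for this example, and plain lists stand in for a Polars column.

```python
from bisect import bisect_left
from statistics import quantiles
from typing import List, Sequence


def bin_target(values: Sequence[float], n_bins: int = 4) -> List[int]:
    """Return a quantile bin index (0 .. n_bins - 1) for every target value."""
    # quantiles() returns the n_bins - 1 cut points between the bins.
    cut_points = quantiles(values, n=n_bins, method="inclusive")
    # bisect_left places a value equal to a cut point into the lower bin.
    return [bisect_left(cut_points, v) for v in values]


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(bin_target(targets))  # [0, 0, 1, 1, 2, 2, 3, 3]
```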

calculate_errors(df: pandas.DataFrame | polars.DataFrame) polars.DataFrame

Analyse errors of predictions on out of fold data.

Parameters:

df – DataFrame holding out of fold data and predictions.

Returns:

Polars DataFrame with additional ‘prediction_error’ column.
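A minimal sketch of adding such an error column is shown below. The formula is an assumption: the documentation above does not say whether the error is signed, absolute, or squared, so the absolute difference is used here. `add_prediction_error` and the `predictions` column name are likewise hypothetical, and a list of dicts stands in for the DataFrame.

```python
from typing import Dict, List


def add_prediction_error(
    rows: List[Dict[str, float]],
    target_column: str,
    prediction_column: str = "predictions",
) -> List[Dict[str, float]]:
    """Return copies of the rows with an added 'prediction_error' key."""
    return [
        # Assumption: the error is the absolute difference between the
        # target and the prediction.
        {**row, "prediction_error": abs(row[target_column] - row[prediction_column])}
        for row in rows
    ]


rows = [
    {"target": 10.0, "predictions": 12.5},
    {"target": 3.0, "predictions": 2.0},
]
for row in add_prediction_error(rows, "target"):
    print(row["prediction_error"])  # 2.5, then 1.0
```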

analyse_segment_errors() polars.DataFrame

Enhanced pipeline for regression error analysis with DuckDB backend.

Reads the out of fold datasets from the output location defined in the training config inside the provided BlueCast instance, preprocesses the data, and calculates errors for all subsegments of the data. Numerical columns are split into quantiles to form subsegments.

Returns:

Polars DataFrame with subsegments and errors.
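The idea of splitting a numerical column into quantile subsegments and comparing their errors can be sketched with the standard library alone. This is an illustration under stated assumptions, not the DuckDB-backed implementation: `mean_error_by_quantile`, the choice of four bins, and the use of the mean as the aggregate are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean, quantiles
from typing import Dict, List, Sequence, Tuple


def mean_error_by_quantile(
    feature: Sequence[float],
    errors: Sequence[float],
    n_bins: int = 4,
    descending: bool = True,
) -> List[Tuple[int, float]]:
    """Return (bin index, mean error) pairs, ordered by mean error."""
    # Cut points that split the numerical feature into n_bins quantiles.
    cut_points = quantiles(feature, n=n_bins, method="inclusive")
    grouped: Dict[int, List[float]] = defaultdict(list)
    for value, error in zip(feature, errors):
        grouped[bisect_left(cut_points, value)].append(error)
    # One aggregated error per subsegment.
    segments = [(bin_index, mean(errs)) for bin_index, errs in grouped.items()]
    return sorted(segments, key=lambda s: s[1], reverse=descending)


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
errors = [0.5, 1.5, 4.0, 2.0, 0.0, 1.0, 3.0, 5.0]
print(mean_error_by_quantile(feature, errors))
# [(3, 4.0), (1, 3.0), (0, 1.0), (2, 0.5)]
```

Sorting the subsegments by error puts the worst-performing ranges of a feature first, which is usually where inspection should start.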

class bluecast.evaluation.error_analysis_regression.ErrorAnalyserRegressionCV(bluecast_instance: bluecast.blueprints.cast_cv_regression.BlueCastCVRegression, ignore_columns_during_visualization=None)

Bases: OutOfFoldDataReaderRegressionCV, bluecast.evaluation.base_classes.ErrorPreprocessor, ErrorAnalyserRegressionMixin, ErrorDistributionRegressionPlotterMixin

Error analysis pipeline for BlueCastCVRegression instances: reads out of fold datasets, preprocesses them, analyses prediction errors, and plots error distributions.

stack_predictions_by_class(df: polars.DataFrame) polars.DataFrame

Add additional column with binned target.

Parameters:

df – Polars DataFrame with original targets.

Returns:

Polars DataFrame with additional binned targets column.

calculate_errors(df: pandas.DataFrame | polars.DataFrame)

Analyse errors of predictions on out of fold data.

Parameters:

df – DataFrame holding out of fold data and predictions.

Returns:

Polars DataFrame with additional ‘prediction_error’ column.

analyse_segment_errors() polars.DataFrame

Enhanced pipeline for regression error analysis with DuckDB backend.

Reads the out of fold datasets from the output location defined in the training config inside the provided BlueCast instance, preprocesses the data, and calculates errors for all subsegments of the data. Numerical columns are split into quantiles to form subsegments.

Returns:

Polars DataFrame with subsegments and errors.