:py:mod:`bluecast.evaluation.error_analysis_regression`
=======================================================

.. py:module:: bluecast.evaluation.error_analysis_regression

.. autoapi-nested-parse::

   Module for regression error analysis with DuckDB backend.

   Enhanced error analysis for regression tasks with DuckDB for better performance
   and analytics capabilities.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bluecast.evaluation.error_analysis_regression.DuckDBRegressionErrorAnalysisEngine
   bluecast.evaluation.error_analysis_regression.OutOfFoldDataReaderRegression
   bluecast.evaluation.error_analysis_regression.OutOfFoldDataReaderRegressionCV
   bluecast.evaluation.error_analysis_regression.ErrorAnalyserRegressionMixin
   bluecast.evaluation.error_analysis_regression.ErrorDistributionRegressionPlotterMixin
   bluecast.evaluation.error_analysis_regression.ErrorAnalyserRegression
   bluecast.evaluation.error_analysis_regression.ErrorAnalyserRegressionCV


.. py:class:: DuckDBRegressionErrorAnalysisEngine(db_path: Optional[str] = None)

   DuckDB-based engine for regression error analysis with enhanced analytics.

   .. py:method:: _init_database() -> None

      Initialize database schema for regression error analysis.

   .. py:method:: load_regression_data(df: pandas.DataFrame, experiment_id: str, target_column: str) -> None

      Load regression error analysis data into DuckDB.

      :param df: DataFrame with predictions and features
      :param experiment_id: Unique identifier for this experiment
      :param target_column: Name of the target column

   .. py:method:: compute_regression_statistics(experiment_id: str) -> Dict[str, pandas.DataFrame]

      Compute comprehensive regression error statistics.

      :param experiment_id: Experiment identifier
      :return: Dictionary of statistical DataFrames

   .. py:method:: create_regression_visualizations(experiment_id: str) -> Dict[str, plotly.graph_objects.Figure]

      Create comprehensive regression error visualizations using Plotly.

      :param experiment_id: Experiment identifier
      :return: Dictionary of Plotly figures

   .. py:method:: close() -> None

      Close database connection and cleanup temporary files if created.


.. py:class:: OutOfFoldDataReaderRegression(bluecast_instance: bluecast.blueprints.cast_regression.BlueCastRegression)

   Bases: :py:obj:`bluecast.evaluation.base_classes.DataReader`

   Abstract class to define error reading out of fold datasets from BlueCast pipelines.

   .. py:method:: read_data_from_bluecast_instance() -> polars.DataFrame

      Read out of fold datasets from defined storage location.

      :return: Out of fold dataset.

   .. py:method:: read_data_from_bluecast_cv_instance() -> polars.DataFrame

      Function to fail when called. Please use ``read_data_from_bluecast_instance`` instead.

      :return: Will raise an error.


.. py:class:: OutOfFoldDataReaderRegressionCV(bluecast_instance: bluecast.blueprints.cast_cv_regression.BlueCastCVRegression)

   Bases: :py:obj:`bluecast.evaluation.base_classes.DataReader`

   Abstract class to define error reading out of fold datasets from BlueCast pipelines.

   .. py:method:: read_data_from_bluecast_instance() -> polars.DataFrame

      Function to fail when called. Please use ``read_data_from_bluecast_cv_instance`` instead.

      :return: Will raise an error.

   .. py:method:: read_data_from_bluecast_cv_instance() -> polars.DataFrame

      Read out of fold datasets from defined storage location for CV regression.

      :return: Combined out of fold dataset.


.. py:class:: ErrorAnalyserRegressionMixin

   Bases: :py:obj:`bluecast.evaluation.base_classes.ErrorAnalyser`

   Abstract class to define the analysis of prediction errors on out of fold datasets.

   .. py:method:: analyse_errors(df: Union[pandas.DataFrame, polars.DataFrame], descending: bool = True, target_column: str = 'target_quantiles') -> polars.DataFrame

      Enhanced regression error analysis using DuckDB for better insights.

      :param df: Preprocessed out of fold DataFrame.
      :param descending: Whether errors are sorted in descending order in the final DataFrame.
      :param target_column: Name of the target column. Defaults to ``'target_quantiles'``.
      :return: Polars DataFrame with enhanced error analysis results.


.. py:class:: ErrorDistributionRegressionPlotterMixin(ignore_columns_during_visualization: Optional[List[str]] = None)

   Bases: :py:obj:`bluecast.evaluation.base_classes.ErrorDistributionPlotter`

   Abstract class to define the plots for error analysis.

   .. py:method:: plot_error_distributions(df: polars.DataFrame, target_column: str = 'target_quantiles')

      Enhanced error distribution plotting for regression using Plotly.


.. py:class:: ErrorAnalyserRegression(bluecast_instance: bluecast.blueprints.cast_regression.BlueCastRegression, ignore_columns_during_visualization=None)

   Bases: :py:obj:`OutOfFoldDataReaderRegression`, :py:obj:`bluecast.evaluation.base_classes.ErrorPreprocessor`, :py:obj:`ErrorAnalyserRegressionMixin`, :py:obj:`ErrorDistributionRegressionPlotterMixin`

   Abstract class to define error reading out of fold datasets from BlueCast pipelines.

   .. py:method:: stack_predictions_by_class(df: polars.DataFrame) -> polars.DataFrame

      Add additional column with binned target.

      :param df: Polars DataFrame with original targets.
      :return: Polars DataFrame with additional binned targets column.

   .. py:method:: calculate_errors(df: Union[pandas.DataFrame, polars.DataFrame]) -> polars.DataFrame

      Analyse errors of predictions on out of fold data.

      :param df: DataFrame holding out of fold data and predictions.
      :return: Polars DataFrame with additional 'prediction_error' column.

   .. py:method:: analyse_segment_errors() -> polars.DataFrame

      Enhanced pipeline for regression error analysis with DuckDB backend.

      Reads the out of fold datasets from the output location defined in the
      training config inside the provided BlueCast instance, preprocesses the
      data, and calculates errors for all subsegments of the data. Numerical
      columns are split into quantiles to get subsegments.

      :return: Polars DataFrame with subsegments and errors.


.. py:class:: ErrorAnalyserRegressionCV(bluecast_instance: bluecast.blueprints.cast_cv_regression.BlueCastCVRegression, ignore_columns_during_visualization=None)

   Bases: :py:obj:`OutOfFoldDataReaderRegressionCV`, :py:obj:`bluecast.evaluation.base_classes.ErrorPreprocessor`, :py:obj:`ErrorAnalyserRegressionMixin`, :py:obj:`ErrorDistributionRegressionPlotterMixin`

   Abstract class to define error reading out of fold datasets from BlueCast pipelines.

   .. py:method:: stack_predictions_by_class(df: polars.DataFrame) -> polars.DataFrame

      Add additional column with binned target.

      :param df: Polars DataFrame with original targets.
      :return: Polars DataFrame with additional binned targets column.

   .. py:method:: calculate_errors(df: Union[pandas.DataFrame, polars.DataFrame])

      Analyse errors of predictions on out of fold data.

      :param df: DataFrame holding out of fold data and predictions.
      :return: Polars DataFrame with additional 'prediction_error' column.

   .. py:method:: analyse_segment_errors() -> polars.DataFrame

      Enhanced pipeline for regression error analysis with DuckDB backend.

      Reads the out of fold datasets from the output location defined in the
      training config inside the provided BlueCast instance, preprocesses the
      data, and calculates errors for all subsegments of the data. Numerical
      columns are split into quantiles to get subsegments.

      :return: Polars DataFrame with subsegments and errors.
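
The segment analysis described for ``analyse_segment_errors`` (numerical columns split into quantiles, errors calculated per subsegment) can be illustrated with a minimal, library-free sketch. This is not BlueCast's implementation: the helper names ``bin_into_quantiles`` and ``segment_errors``, the ``target`` and ``prediction`` keys, the sample rows, and the choice of mean absolute error as the error measure are all assumptions made for illustration.

.. code-block:: python

   from statistics import mean, quantiles


   def bin_into_quantiles(values, n_bins=4):
       """Return a 0-based quantile bin index for every value (hypothetical helper)."""
       # quantiles() returns n_bins - 1 cut points that divide the data.
       cuts = quantiles(values, n=n_bins)
       # A value's bin is the number of cut points it exceeds.
       return [sum(v > c for c in cuts) for v in values]


   def segment_errors(rows, feature, target="target", prediction="prediction",
                      n_bins=4, descending=True):
       """Mean absolute error per quantile segment of one numerical feature.

       Assumption: the error is |target - prediction|; the real pipeline may
       define its 'prediction_error' column differently.
       """
       bins = bin_into_quantiles([row[feature] for row in rows], n_bins)
       grouped = {}
       for row, bin_index in zip(rows, bins):
           error = abs(row[target] - row[prediction])
           grouped.setdefault(bin_index, []).append(error)
       result = [
           {
               "segment": f"{feature}_q{bin_index}",
               "mean_abs_error": mean(errors),
               "n_rows": len(errors),
           }
           for bin_index, errors in grouped.items()
       ]
       # Mirror the ``descending`` flag: worst segments first by default.
       return sorted(result, key=lambda r: r["mean_abs_error"], reverse=descending)


   # Made-up out-of-fold rows: predictions are worse for larger ``age`` values.
   rows = [
       {"age": age, "target": 10 * age,
        "prediction": 10 * age + (1 if age <= 4 else -3)}
       for age in range(1, 9)
   ]
   result = segment_errors(rows, "age", n_bins=2)

With these made-up rows the upper half of ``age`` is listed first because its errors are larger, which is the kind of ordering the ``descending`` parameter of ``analyse_errors`` controls.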