:py:mod:`bluecast.evaluation.error_analysis`
============================================

.. py:module:: bluecast.evaluation.error_analysis

.. autoapi-nested-parse::

   Module for error analysis with a DuckDB backend.

   This step follows the training step. Ideally, it uses the out-of-fold
   datasets stored by the 'fit_eval' methods. DuckDB serves as the backend
   for better performance and analytics capabilities.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bluecast.evaluation.error_analysis.DuckDBErrorAnalysisEngine
   bluecast.evaluation.error_analysis.OutOfFoldDataReader
   bluecast.evaluation.error_analysis.OutOfFoldDataReaderCV
   bluecast.evaluation.error_analysis.ErrorAnalyserClassificationMixin
   bluecast.evaluation.error_analysis.ErrorDistributionPlotterMixin
   bluecast.evaluation.error_analysis.ErrorAnalyserClassification
   bluecast.evaluation.error_analysis.ErrorAnalyserClassificationCV


.. py:class:: DuckDBErrorAnalysisEngine(db_path: Optional[str] = None)


   DuckDB-based engine for error analysis providing enhanced analytics capabilities.

   .. py:method:: _init_database() -> None

      Initialize database schema for error analysis.


   .. py:method:: load_data(df: pandas.DataFrame, experiment_id: str, target_column: str) -> None

      Load error analysis data into DuckDB.

      :param df: DataFrame with predictions and features
      :param experiment_id: Unique identifier for this experiment
      :param target_column: Name of the target column


   .. py:method:: compute_error_statistics(experiment_id: str) -> Dict[str, pandas.DataFrame]

      Compute comprehensive error statistics.

      :param experiment_id: Experiment identifier
      :return: Dictionary of statistical DataFrames


   .. py:method:: create_error_visualizations(experiment_id: str, target_column: str = 'target_class') -> Dict[str, plotly.graph_objects.Figure]

      Create comprehensive error visualizations using Plotly.

      :param experiment_id: Experiment identifier
      :param target_column: Target column name
      :return: Dictionary of Plotly figures


   .. py:method:: close() -> None

      Close database connection and cleanup temporary files if created.
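
The load, query, close lifecycle described above can be illustrated with a small sketch. It is not the engine's implementation: the standard-library ``sqlite3`` module stands in for DuckDB so the example has no dependencies, and the table name ``errors`` and its columns are hypothetical.

```python
import sqlite3

# Hypothetical rows: (experiment_id, target_class, prediction_error).
rows = [
    ("exp_a", 0, 0.10),
    ("exp_a", 0, 0.30),
    ("exp_a", 1, 0.60),
    ("exp_b", 1, 0.20),
]

# In-memory database, so nothing is written to disk.
con = sqlite3.connect(":memory:")
try:
    con.execute(
        "CREATE TABLE errors ("
        "experiment_id TEXT, target_class INTEGER, prediction_error REAL)"
    )
    con.executemany("INSERT INTO errors VALUES (?, ?, ?)", rows)

    # Per-class error statistics, restricted to a single experiment.
    stats = con.execute(
        """
        SELECT target_class, COUNT(*) AS n, AVG(prediction_error) AS mean_error
        FROM errors
        WHERE experiment_id = ?
        GROUP BY target_class
        ORDER BY target_class
        """,
        ("exp_a",),
    ).fetchall()
finally:
    # Always release the connection, mirroring the role of close().
    con.close()
```

Tagging every row with an experiment identifier lets several experiments share one table while each query stays scoped to a single run.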


.. py:class:: OutOfFoldDataReader(bluecast_instance: bluecast.blueprints.cast.BlueCast)


   Bases: :py:obj:`bluecast.evaluation.base_classes.DataReader`

   .. py:method:: read_data_from_bluecast_instance() -> polars.DataFrame

      Read out-of-fold datasets from the defined storage location.

      :return: Out of fold dataset.


   .. py:method:: read_data_from_bluecast_cv_instance() -> polars.DataFrame

      Not supported by this reader; fails when called.

      Please use ``read_data_from_bluecast_instance`` instead.

      :return: Will raise an error.


.. py:class:: OutOfFoldDataReaderCV(bluecast_instance: bluecast.blueprints.cast_cv.BlueCastCV)


   Bases: :py:obj:`bluecast.evaluation.base_classes.DataReader`

   .. py:method:: read_data_from_bluecast_instance() -> polars.DataFrame

      Not supported by this reader; fails when called.

      Please use ``read_data_from_bluecast_cv_instance`` instead.

      :return: Will raise an error.


   .. py:method:: read_data_from_bluecast_cv_instance() -> polars.DataFrame

      Read out-of-fold datasets from the defined storage location.

      :return: Out of fold dataset.
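
Both reader classes expose the same two methods, and each one rejects the method that does not match its instance type. A minimal sketch of that pattern follows. All class and method names here are hypothetical, and the choice of ``ValueError`` is an assumption, since the documentation above only states that an error is raised.

```python
from abc import ABC, abstractmethod


class Reader(ABC):
    """Hypothetical shared interface with one method per instance type."""

    @abstractmethod
    def read_single(self):
        ...

    @abstractmethod
    def read_cv(self):
        ...


class SingleReader(Reader):
    """Reader for a single (non cross-validated) instance."""

    def __init__(self, rows):
        self.rows = rows

    def read_single(self):
        # Supported path: return a copy of the stored rows.
        return list(self.rows)

    def read_cv(self):
        # Unsupported path: fail loudly and point to the right method.
        raise ValueError("Use read_single instead.")


reader = SingleReader([{"target": 1}])
data = reader.read_single()

try:
    reader.read_cv()
    failed = False
except ValueError:
    failed = True
```

Failing explicitly keeps a mismatched call from silently returning an empty or wrong dataset.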


.. py:class:: ErrorAnalyserClassificationMixin


   Bases: :py:obj:`bluecast.evaluation.base_classes.ErrorAnalyser`

   .. py:method:: analyse_errors(df: Union[pandas.DataFrame, polars.DataFrame], descending: bool = True) -> polars.DataFrame

      Find mean absolute errors for all subsegments using DuckDB for enhanced analysis.

      :param df: Preprocessed out of fold DataFrame.
      :param descending: Whether to sort the final DataFrame by error in descending order.
      :return: Polars DataFrame with all subsegments and mean absolute error in each of them.
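
The idea behind ``analyse_errors`` (mean absolute error per subsegment, optionally sorted in descending order) can be sketched in plain Python. This is an illustration under stated assumptions, not the DuckDB-backed implementation: the helper ``mean_abs_error_by_segment``, the ``region`` column, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def mean_abs_error_by_segment(rows, segment_key, descending=True):
    """Group rows by one column and average the absolute prediction error."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[segment_key]].append(abs(row["prediction_error"]))

    result = [
        {"segment": segment, "mean_abs_error": sum(errors) / len(errors)}
        for segment, errors in grouped.items()
    ]
    # Worst segments come first when descending is True.
    result.sort(key=lambda r: r["mean_abs_error"], reverse=descending)
    return result


# Hypothetical out-of-fold rows with a precomputed error column.
rows = [
    {"region": "north", "prediction_error": 0.1},
    {"region": "north", "prediction_error": -0.3},
    {"region": "south", "prediction_error": 0.8},
]
ranked = mean_abs_error_by_segment(rows, "region")
```

Sorting in descending order places the segments where the model performs worst at the top of the result.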


.. py:class:: ErrorDistributionPlotterMixin(ignore_columns_during_visualization: Optional[List[str]] = None)


   Bases: :py:obj:`bluecast.evaluation.base_classes.ErrorDistributionPlotter`

   .. py:method:: plot_error_distributions(df: polars.DataFrame, target_column: str = 'target_class')

      Plot error distributions using Plotly.


.. py:class:: ErrorAnalyserClassification(bluecast_instance: bluecast.blueprints.cast.BlueCast, ignore_columns_during_visualization=None)


   Bases: :py:obj:`OutOfFoldDataReader`, :py:obj:`bluecast.evaluation.base_classes.ErrorPreprocessor`, :py:obj:`ErrorAnalyserClassificationMixin`, :py:obj:`ErrorDistributionPlotterMixin`

   .. py:method:: stack_predictions_by_class(df: polars.DataFrame) -> polars.DataFrame

      Stack class predictions into a long format.

      BlueCast returns predictions for each class as separate columns. This function returns a DataFrame where
      all predictions are stacked as a single 'prediction' column.

      :param df: Polars DataFrame with predictions in wide format.
      :return: Polars DataFrame with stacked predictions.


   .. py:method:: calculate_errors(df: Union[pandas.DataFrame, polars.DataFrame]) -> polars.DataFrame

      Calculate prediction errors on out-of-fold data.

      :param df: DataFrame holding out of fold data and predictions.
      :return: Polars DataFrame with additional 'prediction_error' column.


   .. py:method:: analyse_segment_errors() -> polars.DataFrame

      Enhanced pipeline for error analysis with DuckDB backend.

      Reads the out-of-fold datasets from the output location defined in the training config inside the provided
      BlueCast instance, preprocesses the data and calculates errors for all subsegments of the data.
      Numerical columns are split into quantiles to form subsegments.

      :return: Polars DataFrame with subsegments and errors.
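
The first two steps, stacking wide class predictions into a long format and then adding a ``prediction_error`` column, can be sketched in plain Python. This is a sketch under assumptions rather than BlueCast's code: the column names ``target`` and ``predictions_class_<n>`` are hypothetical, and the error is assumed to be the absolute difference between a 0/1 class indicator and the predicted probability.

```python
def stack_predictions(rows, classes, target_column="target"):
    """Turn one row per sample into one row per (sample, class) pair."""
    long_rows = []
    for row in rows:
        for cls in classes:
            long_rows.append(
                {
                    "target": row[target_column],
                    "target_class": cls,
                    "prediction": row[f"predictions_class_{cls}"],
                }
            )
    return long_rows


def add_prediction_error(long_rows):
    """Assumed error definition: |indicator(true class) - predicted probability|."""
    out = []
    for r in long_rows:
        expected = 1.0 if r["target"] == r["target_class"] else 0.0
        out.append({**r, "prediction_error": abs(expected - r["prediction"])})
    return out


# Hypothetical wide-format predictions for a binary problem.
wide = [
    {"target": 0, "predictions_class_0": 0.75, "predictions_class_1": 0.25},
    {"target": 1, "predictions_class_0": 0.5, "predictions_class_1": 0.5},
]
long_rows = add_prediction_error(stack_predictions(wide, classes=[0, 1]))
```

With the data in long format, a single ``prediction`` column and a single error column are enough to compare all classes.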


.. py:class:: ErrorAnalyserClassificationCV(bluecast_instance: bluecast.blueprints.cast_cv.BlueCastCV, ignore_columns_during_visualization=None)


   Bases: :py:obj:`OutOfFoldDataReaderCV`, :py:obj:`bluecast.evaluation.base_classes.ErrorPreprocessor`, :py:obj:`ErrorAnalyserClassificationMixin`, :py:obj:`ErrorDistributionPlotterMixin`

   .. py:method:: stack_predictions_by_class(df: polars.DataFrame) -> polars.DataFrame

      Stack class predictions into a long format.

      BlueCast returns predictions for each class as separate columns. This function returns a DataFrame where
      all predictions are stacked as a single 'prediction' column.

      :param df: Polars DataFrame with predictions in wide format.
      :return: Polars DataFrame with stacked predictions.


   .. py:method:: calculate_errors(df: Union[pandas.DataFrame, polars.DataFrame]) -> polars.DataFrame

      Calculate prediction errors on out-of-fold data.

      :param df: DataFrame holding out of fold data and predictions.
      :return: Polars DataFrame with additional 'prediction_error' column.


   .. py:method:: analyse_segment_errors() -> polars.DataFrame

      Enhanced pipeline for error analysis with DuckDB backend.

      Reads the out-of-fold datasets from the output location defined in the training config inside the provided
      BlueCast instance, preprocesses the data and calculates errors for all subsegments of the data.
      Numerical columns are split into quantiles to form subsegments.

      :return: Polars DataFrame with subsegments and errors.
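
Splitting a numerical column into quantiles, as the pipeline does to form subsegments, can be sketched with the standard library. The helper ``quantile_bins``, the choice of four bins, and the sample ``ages`` values are hypothetical. The number of quantiles and the binning rule BlueCast actually uses are not stated here.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins=4):
    """Assign each value the index of the quantile bin it falls into."""
    # quantiles() returns n_bins - 1 cut points between the bins.
    cuts = quantiles(values, n=n_bins)
    # bisect_right counts how many cut points are <= the value.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical numerical feature.
ages = [18, 22, 25, 31, 38, 44, 52, 67]
bins = quantile_bins(ages)
```

Each bin index can then be used as a segment label, so that errors are averaged per quantile rather than per raw value.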