:py:mod:`bluecast.blueprints.cast_cv_regression`
================================================

.. py:module:: bluecast.blueprints.cast_cv_regression


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bluecast.blueprints.cast_cv_regression.BlueCastCVRegression


.. py:class:: BlueCastCVRegression(class_problem: Literal[regression] = 'regression', cat_columns: Optional[List[Union[str, float, int]]] = None, stratifier: Optional[Any] = None, conf_training: Optional[bluecast.config.training_config.TrainingConfig] = None, conf_xgboost: Optional[bluecast.config.training_config.XgboostTuneParamsRegressionConfig] = None, conf_params_xgboost: Optional[bluecast.config.training_config.XgboostRegressionFinalParamConfig] = None, experiment_tracker: Optional[bluecast.experimentation.tracking.ExperimentTracker] = None, custom_in_fold_preprocessor: Optional[bluecast.preprocessing.custom.CustomPreprocessing] = None, custom_last_mile_computation: Optional[bluecast.preprocessing.custom.CustomPreprocessing] = None, custom_preprocessor: Optional[bluecast.preprocessing.custom.CustomPreprocessing] = None, custom_feature_selector: Optional[Union[bluecast.preprocessing.feature_selection.BoostaRootaWrapper, bluecast.preprocessing.custom.CustomPreprocessing]] = None, ml_model: Optional[Union[bluecast.ml_modelling.xgboost.XgboostModel, Any]] = None, single_fold_eval_metric_func: Optional[bluecast.evaluation.eval_metrics.RegressionEvalWrapper] = None)


   Wrapper to train and predict with multiple BlueCast instances.

   Check the BlueCast class documentation for additional parameter details.
   A custom splitter can be provided.

   :param class_problem: Takes a string containing the class problem type. At the moment "regression" only.
   :param target_column: Takes a string containing the name of the target column.
   :param cat_columns: Takes a list of strings containing the names of the categorical columns. If not provided,
       BlueCast will infer these automatically.
   :param date_columns: Takes a list of strings containing the names of the date columns. If not provided,
       BlueCast will infer these automatically.
   :param time_split_column: Takes a string containing the name of the time split column. If not provided,
       BlueCast will not split the data by time or order, but do a random split instead.
   :param ml_model: Takes an instance of an XgboostModelRegression class. If not provided, BlueCast will instantiate one.
       Any model class can be passed, provided it inherits from the base class ml_modelling.base_model.BaseModel.
   :param custom_in_fold_preprocessor: Takes an instance of a CustomPreprocessing class. Allows users to execute
       preprocessing after the train test split within cv folds. This will be executed only if precise_cv_tuning in
       the conf_training is True. Custom ML models need to implement this themselves. This step is only useful when
       the preprocessing step would otherwise have a high chance of overfitting (e.g. oversampling techniques).
   :param custom_preprocessor: Takes an instance of a CustomPreprocessing class. Allows users to inject custom
       preprocessing steps which take place right after the train test split.
   :param custom_last_mile_computation: Takes an instance of a CustomPreprocessing class. Allows users to inject custom
       preprocessing steps which take place right before the model training.
   :param experiment_tracker: Takes an instance of an ExperimentTracker class. If not provided this will be initialized
       automatically.
   :param single_fold_eval_metric_func: Takes a function which calculates the evaluation metric for a single fold.
       Default is mean_squared_error. This function is used to calculate the evaluation metric for each fold during
       hyperparameter tuning when hyperparameter_tuning_rounds = 1 (default). Lower must be better.

   .. py:method:: prepare_data(df: pandas.DataFrame, target: str) -> Tuple[pandas.DataFrame, pandas.Series]


   .. py:method:: show_oof_scores(metric: str = 'RMSE') -> Tuple[float, float]

      Show out of fold scores.

      When BlueCastCVRegression's fit_eval function is called, multiple BlueCastRegression
      instances are trained and each of them predicts on unseen (out-of-fold) data.

      This function collects these scores and returns their mean and standard deviation.

      :param metric: String indicating which metric shall be returned.
      :return: Tuple with (mean, std) of oof scores
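
      The aggregation described above can be sketched in plain Python. This is only an illustration, not
      BlueCast's implementation: ``summarize_oof_scores`` and the score values are hypothetical, and the
      use of the population standard deviation is an assumption.

      .. code-block:: python

         import statistics
         from typing import List, Tuple


         def summarize_oof_scores(fold_scores: List[float]) -> Tuple[float, float]:
             """Return (mean, std) of per-fold scores (hypothetical helper)."""
             oof_mean = statistics.fmean(fold_scores)
             # Population standard deviation; whether BlueCast uses the population
             # or the sample variant is an assumption of this sketch.
             oof_std = statistics.pstdev(fold_scores)
             return oof_mean, oof_std


         # Made-up RMSE values, one per trained sub model.
         fold_rmse = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
         oof_mean, oof_std = summarize_oof_scores(fold_rmse)
         print(oof_mean, oof_std)

      A small standard deviation relative to the mean suggests that the sub models perform similarly
      across folds.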


   .. py:method:: fit(df: pandas.DataFrame, target_col: str) -> None

      Fit multiple BlueCastRegression instances on different data splits.

      The input df is expected to contain the target column.


   .. py:method:: fit_eval(df: pandas.DataFrame, target_col: str) -> Tuple[float, float]

      Fit multiple BlueCastRegression instances on different data splits.

      The input df is expected to contain the target column. Evaluation is executed on the
      out-of-fold dataset in each split.

      :param df: Pandas DataFrame that includes the target column
      :param target_col: String indicating the name of the target column
      :returns: Tuple of (oof_mean, oof_std) with scores on unseen data during eval
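
      The out-of-fold evaluation pattern can be sketched without any dependencies. The sketch below is
      an assumption-laden illustration, not BlueCast's code: ``kfold_indices``, ``rmse`` and
      ``fit_eval_sketch`` are hypothetical names, the folds are contiguous rather than produced by the
      configured stratifier, and a "predict the training mean" stand-in replaces the real model.

      .. code-block:: python

         import math
         import statistics
         from typing import List, Sequence, Tuple


         def kfold_indices(n_rows: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
             """Split row indices into contiguous (train, validation) folds."""
             folds = []
             for k in range(n_folds):
                 start = k * n_rows // n_folds
                 stop = (k + 1) * n_rows // n_folds
                 val_idx = list(range(start, stop))
                 train_idx = [i for i in range(n_rows) if i < start or i >= stop]
                 folds.append((train_idx, val_idx))
             return folds


         def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
             """Root mean squared error; lower is better."""
             return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


         def fit_eval_sketch(y: Sequence[float], n_folds: int = 5) -> Tuple[float, float]:
             """Score one stand-in model per fold on its held-out rows."""
             scores = []
             for train_idx, val_idx in kfold_indices(len(y), n_folds):
                 # Stand-in for model training: predict the mean of the training targets.
                 prediction = statistics.fmean(y[i] for i in train_idx)
                 y_val = [y[i] for i in val_idx]
                 scores.append(rmse(y_val, [prediction] * len(y_val)))
             return statistics.fmean(scores), statistics.pstdev(scores)


         oof_mean, oof_std = fit_eval_sketch([0.0, 0.0, 2.0, 2.0], n_folds=2)
         print(oof_mean, oof_std)

      Each row is scored exactly once by a model that never saw it during training, which is what makes
      the returned scores an estimate of performance on unseen data.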


   .. py:method:: predict(df: pandas.DataFrame, return_sub_models_preds: bool = False, save_shap_values: bool = False, mean_type: Literal[arithmetic, median, geometric, harmonic] = 'arithmetic') -> Union[pandas.DataFrame, pandas.Series]

      Predict on unseen data using multiple trained BlueCastRegression instances.

      :param df: Pandas DataFrame with unseen data
      :param return_sub_models_preds: If true will return a DataFrame with the predictions of each model
          stored in separate columns.
      :param save_shap_values: If True, calculates and saves shap values, so they can be used to plot
          waterfall plots for selected rows on demand.
      :param mean_type: String indicating the type of mean to be used to blend the predictions of the sub models.
          Possible values are 'arithmetic', 'median', 'geometric' and 'harmonic' (default='arithmetic').
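
      To illustrate how the ``mean_type`` options differ, the following sketch blends the predictions of
      several sub models row by row using Python's ``statistics`` module. ``BLENDERS`` and
      ``blend_predictions`` are hypothetical names and the numbers are made up; BlueCast's own blending
      code may differ in detail.

      .. code-block:: python

         import statistics
         from typing import Callable, Dict, List

         # Map each mean_type value to a standard-library aggregation function.
         BLENDERS: Dict[str, Callable[..., float]] = {
             "arithmetic": statistics.fmean,
             "median": statistics.median,
             "geometric": statistics.geometric_mean,  # requires strictly positive values
             "harmonic": statistics.harmonic_mean,  # requires non-negative values
         }


         def blend_predictions(
             sub_model_preds: List[List[float]], mean_type: str = "arithmetic"
         ) -> List[float]:
             """Blend predictions row by row.

             sub_model_preds[m][i] is the prediction of sub model m for row i.
             """
             blend = BLENDERS[mean_type]
             # zip(*...) regroups the values so that each tuple holds one row's predictions.
             return [blend(row) for row in zip(*sub_model_preds)]


         # Three hypothetical sub models predicting two rows.
         preds = [[1.0, 2.0], [4.0, 2.0], [16.0, 2.0]]
         for mean_type in BLENDERS:
             print(mean_type, blend_predictions(preds, mean_type))

      When the sub models agree (second row) all four options return the same value. When one sub model
      is far off (first row), the median, geometric and harmonic means are pulled less towards the
      outlier than the arithmetic mean.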


   .. py:method:: calibrate(x_calibration: pandas.DataFrame, y_calibration: pandas.Series, **kwargs) -> None

      Calibrate the model.

      This function computes nonconformity measures on the calibration data. They are then used to build
      prediction intervals via the predict_interval function. The mean prediction of all sub models is used.

      :param x_calibration: Pandas DataFrame without target column, that has not been seen by the model during
          training.
      :param y_calibration: Pandas Series holding the target value, that has not been seen by the model during
          training.


   .. py:method:: predict_interval(df: pandas.DataFrame, alphas: List[float]) -> pandas.DataFrame

      Create prediction intervals based on the given confidence levels.

      Conformal prediction guarantees that the correct value is present in the prediction band with a probability of
      1 - alpha.

      :param df: Pandas DataFrame holding unseen data
      :param alphas: List of floats indicating the desired significance levels (1 - confidence level), e.g. 0.05 for a
          95% prediction band.
      :returns: A Pandas DataFrame with sorted columns 'alpha_XX_low' (alpha) and 'alpha_XX_high' (1 - alpha) for each
          alpha in the provided list of alphas. To obtain the mean prediction call the 'predict' method.
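
      The idea behind calibrating on held-out data and then deriving bands can be illustrated with a
      textbook split conformal sketch. It assumes absolute residuals as nonconformity scores and
      symmetric bands around the point prediction. This is an assumption for illustration only and is
      not necessarily how BlueCast computes its 'alpha_XX_low' and 'alpha_XX_high' columns. All function
      names below are hypothetical.

      .. code-block:: python

         import math
         from typing import Dict, List, Sequence, Tuple


         def calibrate_sketch(
             y_calibration: Sequence[float], calibration_preds: Sequence[float]
         ) -> List[float]:
             """Return sorted nonconformity scores (absolute residuals)."""
             return sorted(abs(y - p) for y, p in zip(y_calibration, calibration_preds))


         def interval_half_width(scores: Sequence[float], alpha: float) -> float:
             """Pick the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
             n = len(scores)
             rank = math.ceil((n + 1) * (1 - alpha))
             if rank > n:
                 # Too few calibration rows for this alpha: the band is unbounded.
                 return math.inf
             return scores[rank - 1]


         def predict_interval_sketch(
             preds: Sequence[float], scores: Sequence[float], alphas: Sequence[float]
         ) -> Dict[float, List[Tuple[float, float]]]:
             """Return (low, high) bands per alpha for each point prediction."""
             bands = {}
             for alpha in alphas:
                 q = interval_half_width(scores, alpha)
                 bands[alpha] = [(p - q, p + q) for p in preds]
             return bands


         # Made-up calibration data: the residuals are 1.0, 2.0, ..., 9.0.
         y_cal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
         cal_preds = [0.0] * 9
         scores = calibrate_sketch(y_cal, cal_preds)
         print(predict_interval_sketch([10.0], scores, alphas=[0.5, 0.2]))

      A smaller alpha selects a larger calibration residual and therefore a wider band. With few
      calibration rows, very small alphas cannot be served, which the sketch signals with an infinite
      half width.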