:py:mod:`bluecast.ml_modelling.base_classes`
============================================

.. py:module:: bluecast.ml_modelling.base_classes

.. autoapi-nested-parse::

   Base classes for all ML models.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bluecast.ml_modelling.base_classes.BaseClassMlModel
   bluecast.ml_modelling.base_classes.BaseClassMlRegressionModel
   bluecast.ml_modelling.base_classes.XgboostBaseModel
   bluecast.ml_modelling.base_classes.CatboostBaseModel


Attributes
~~~~~~~~~~

.. autoapisummary::

   bluecast.ml_modelling.base_classes.PredictedProbas
   bluecast.ml_modelling.base_classes.PredictedClasses


.. py:data:: PredictedProbas

.. py:data:: PredictedClasses

.. py:class:: BaseClassMlModel

   Bases: :py:obj:`abc.ABC`

   Base class for all ML models.

   Enforces the implementation of the fit and predict methods.
   If hyperparameter tuning is required, then the fit method should implement the tuning.

   .. py:method:: fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> Optional[Any]
      :abstractmethod:


   .. py:method:: predict(df: pandas.DataFrame) -> Tuple[PredictedProbas, PredictedClasses]
      :abstractmethod:

      Predict on unseen data.

      :return: tuple of predicted probabilities and predicted classes


.. py:class:: BaseClassMlRegressionModel

   Bases: :py:obj:`abc.ABC`

   Base class for all ML regression models.

   Enforces the implementation of the fit and predict methods.
   If hyperparameter tuning is required, then the fit method should implement the tuning.

   .. py:method:: fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> Optional[Any]
      :abstractmethod:


   .. py:method:: predict(df: pandas.DataFrame) -> numpy.ndarray
      :abstractmethod:

      Predict on unseen data.

      :return: numpy array of predictions


.. py:class:: XgboostBaseModel(class_problem: Union[Literal[binary, multiclass], Literal[regression]], conf_training: Optional[bluecast.config.training_config.TrainingConfig] = None, conf_xgboost: Optional[Union[bluecast.config.training_config.XgboostTuneParamsConfig, bluecast.config.training_config.XgboostTuneParamsRegressionConfig]] = None, conf_params_xgboost: Optional[Union[bluecast.config.training_config.XgboostFinalParamConfig, bluecast.config.training_config.XgboostRegressionFinalParamConfig]] = None, experiment_tracker: Optional[bluecast.experimentation.tracking.ExperimentTracker] = None, custom_in_fold_preprocessor: Optional[bluecast.preprocessing.custom.CustomPreprocessing] = None, cat_columns: Optional[List[Union[str, float, int]]] = None, single_fold_eval_metric_func: Optional[Union[bluecast.evaluation.eval_metrics.ClassificationEvalWrapper, bluecast.evaluation.eval_metrics.RegressionEvalWrapper]] = None)

   .. py:method:: _load_xgboost_training_config(conf_xgboost) -> None


   .. py:method:: _load_xgboost_final_params(conf_params_xgboost) -> None


   .. py:method:: _load_training_settings_config(conf_training) -> None


   .. py:method:: _load_experiment_tracker(experiment_tracker) -> None


   .. py:method:: _create_d_matrices(x_train, y_train, x_test, y_test)


   .. py:method:: concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

      Prepare training dataset and concat with test data.

      This is only recommended if early stopping is not used or not used on the same eval set.

      :param x_train: Pandas DataFrame with data without labels.
      :param y_train: Pandas Series with labels.
      :param x_test: Pandas DataFrame with data without labels.
      :param y_test: Pandas Series with labels.
      :return: Prepared training dataset as Pandas DataFrame, Pandas Series (labels)


   .. py:method:: get_early_stopping_callback() -> Optional[List[xgboost.callback.EarlyStopping]]

      Create early stopping callback if configured.


   .. py:method:: autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
      :abstractmethod:


   .. py:method:: fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
      :abstractmethod:


   .. py:method:: create_fine_tune_search_space() -> Dict[str, numpy.array]


   .. py:method:: _get_param_space_fpr_grid_search(trial: optuna.trial) -> Dict[str, numpy.array]


   .. py:method:: _optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) -> None


   .. py:method:: orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)


   .. py:method:: _create_optuna_study(direction: str, sampler: Optional[optuna.samplers.BaseSampler] = None, study_name: str = 'hyperparameter_tuning', pruner: Optional[optuna.pruners.BasePruner] = None) -> optuna.Study

      Create an Optuna study with optional database backend support.

      :param direction: Direction to optimize ('minimize' or 'maximize')
      :param sampler: Optuna sampler to use
      :param study_name: Name of the study
      :param pruner: Optuna pruner to use
      :return: Configured Optuna study


.. py:class:: CatboostBaseModel(class_problem: Union[Literal[binary, multiclass], Literal[regression]], conf_training: Optional[bluecast.config.training_config.TrainingConfig] = None, conf_catboost: Optional[Union[bluecast.config.training_config.CatboostTuneParamsConfig, bluecast.config.training_config.CatboostTuneParamsRegressionConfig]] = None, conf_params_catboost: Optional[Union[bluecast.config.training_config.CatboostFinalParamConfig, bluecast.config.training_config.CatboostRegressionFinalParamConfig]] = None, experiment_tracker: Optional[bluecast.experimentation.tracking.ExperimentTracker] = None, custom_in_fold_preprocessor: Optional[bluecast.preprocessing.custom.CustomPreprocessing] = None, cat_columns: Optional[List[Union[str, float, int]]] = None, single_fold_eval_metric_func: Optional[Union[bluecast.evaluation.eval_metrics.ClassificationEvalWrapper, bluecast.evaluation.eval_metrics.RegressionEvalWrapper]] = None)

   Base model class for CatBoost, replicating the structure and logic of
   XgboostBaseModel.

   .. py:method:: _load_catboost_training_config(conf_catboost) -> None

      Loads the CatBoost tuning configuration. If none is provided, uses either
      the classification or the regression default class.


   .. py:method:: _load_catboost_final_params(conf_params_catboost) -> None

      Loads the CatBoost final parameters. If none are provided, uses either
      the classification or the regression default class.


   .. py:method:: _load_training_settings_config(conf_training) -> None

      Loads or creates a default TrainingConfig.


   .. py:method:: _load_experiment_tracker(experiment_tracker) -> None

      Loads or creates a default ExperimentTracker.


   .. py:method:: _create_pools(x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

      Creates CatBoost Pools for training and testing. Potentially uses sample weights
      if available. Also sets cat_features if self.cat_columns is provided.


   .. py:method:: concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

      Prepare the training dataset and optionally concatenate it with the test data
      for final model training, for the case where the model is trained on all data
      at once (as in the XGBoost base class).


   .. py:method:: get_early_stopping_callback()

      In CatBoost, early stopping is handled by setting ``od_type`` (overfitting detector)
      and ``od_wait`` (similar to early_stopping_rounds).

      Example: to replicate the 'EarlyStopping' callback from XGBoost, set::

         model = CatBoostClassifier(
             iterations=...,
             od_type='Iter',
             od_wait=self.conf_training.early_stopping_rounds,
             ...
         )

      For consistency with the XGBoost base class, this method returns None or a
      dictionary of parameters.


   .. py:method:: autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
      :abstractmethod:


   .. py:method:: fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
      :abstractmethod:


   .. py:method:: create_fine_tune_search_space() -> Dict[str, numpy.array]


   .. py:method:: _get_param_space_fpr_grid_search(trial: optuna.trial) -> Dict[str, Any]

      Similar to the XGBoost method for an Optuna-based grid or random search.
      For CatBoost, adjust it to whichever parameters should be tuned.


   .. py:method:: _optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) -> None

      Similar to the XGBoost method. Creates an Optuna study with a GridSampler
      (or any other sampler), runs it, and tracks the best score. Then updates
      self.conf_params_catboost accordingly if improvements are found.


   .. py:method:: orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

      Mirrors the XGBoost orchestrate_hyperparameter_tuning approach.


   .. py:method:: _create_optuna_study(direction: str, sampler: Optional[optuna.samplers.BaseSampler] = None, study_name: str = 'catboost_hyperparameter_tuning', pruner: Optional[optuna.pruners.BasePruner] = None) -> optuna.Study

      Create an Optuna study with optional database backend support for CatBoost.

      :param direction: Direction to optimize ('minimize' or 'maximize')
      :param sampler: Optuna sampler to use
      :param study_name: Name of the study
      :param pruner: Optuna pruner to use
      :return: Configured Optuna study
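
The note on ``CatboostBaseModel.get_early_stopping_callback`` says that CatBoost
early stopping is configured through ``od_type`` and ``od_wait`` and that the
method returns either None or a dictionary of parameters. The following is a
minimal sketch of that idea, not the library's implementation.
``TrainingSettings`` and ``early_stopping_params`` are hypothetical names used
only for illustration; the single assumption carried over from the text is that
the training configuration exposes ``early_stopping_rounds``.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Any, Dict, Optional


   @dataclass
   class TrainingSettings:
       """Hypothetical stand-in for the training config (assumption)."""

       early_stopping_rounds: Optional[int] = 20


   def early_stopping_params(conf: TrainingSettings) -> Optional[Dict[str, Any]]:
       """Map an XGBoost-style early stopping setting to CatBoost keywords.

       Returns None when early stopping is disabled, otherwise a dictionary
       that can be unpacked into a CatBoost model constructor.
       """
       if not conf.early_stopping_rounds:
           return None
       return {
           # 'Iter' stops after od_wait iterations without improvement.
           "od_type": "Iter",
           "od_wait": conf.early_stopping_rounds,
       }


   # Illustrative use: merge the result into other constructor arguments.
   extra = early_stopping_params(TrainingSettings(early_stopping_rounds=50)) or {}
   model_kwargs = {"iterations": 1000, **extra}

Returning a plain dictionary keeps the caller free to merge it with other
constructor arguments, which is one way to stay close to the callback-based
XGBoost interface without a CatBoost-specific callback object.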