bluecast.ml_modelling.base_classes

Base classes for all ML models.

Module Contents

Classes

BaseClassMlModel

Base class for all ML models.

BaseClassMlRegressionModel

Base class for all ML regression models.

XgboostBaseModel

CatboostBaseModel

Example base model class for CatBoost, replicating the structure and logic of XgboostBaseModel.

Attributes

PredictedProbas

PredictedClasses

bluecast.ml_modelling.base_classes.PredictedProbas
bluecast.ml_modelling.base_classes.PredictedClasses
class bluecast.ml_modelling.base_classes.BaseClassMlModel

Bases: abc.ABC

Base class for all ML models.

Enforces the implementation of the fit and predict methods. If hyperparameter tuning is required, then the fit method should implement the tuning.

abstract fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> Any | None
abstract predict(df: pandas.DataFrame) -> Tuple[PredictedProbas, PredictedClasses]

Predict on unseen data.

Returns:

Tuple of predicted probabilities and predicted classes
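The contract described above can be illustrated with a minimal sketch. `ModelContractSketch`, `MajorityClassModel` and `IncompleteModel` are hypothetical stand-ins defined here so the snippet runs without BlueCast installed. Real code would subclass `BaseClassMlModel` and work with pandas objects rather than plain lists.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, List, Optional, Tuple


class ModelContractSketch(ABC):
    """Hypothetical stand-in mirroring the documented fit/predict contract."""

    @abstractmethod
    def fit(self, x_train, x_test, y_train, y_test) -> Optional[Any]:
        ...

    @abstractmethod
    def predict(self, df) -> Tuple[List[float], List[int]]:
        ...


class MajorityClassModel(ModelContractSketch):
    """Toy binary classifier: always predicts the most frequent training label."""

    def fit(self, x_train, x_test, y_train, y_test) -> None:
        counts = Counter(y_train)
        self.majority_class = counts.most_common(1)[0][0]
        # Share of positive labels, reused as a constant predicted probability.
        self.positive_rate = counts[1] / len(y_train)

    def predict(self, df) -> Tuple[List[float], List[int]]:
        n_rows = len(df)
        probas = [self.positive_rate] * n_rows
        classes = [self.majority_class] * n_rows
        return probas, classes


class IncompleteModel(ModelContractSketch):
    """Implements fit only, so the abstract base refuses to instantiate it."""

    def fit(self, x_train, x_test, y_train, y_test) -> None:
        return None


model = MajorityClassModel()
model.fit([[0], [1], [2]], [[3]], [1, 1, 0], [1])
probas, classes = model.predict([[4], [5]])

try:
    IncompleteModel()
    incomplete_error = None
except TypeError as exc:
    # Raised because predict is still abstract.
    incomplete_error = exc
```

The `TypeError` on `IncompleteModel()` is how an `abc.ABC` base enforces that both methods are implemented.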

class bluecast.ml_modelling.base_classes.BaseClassMlRegressionModel

Bases: abc.ABC

Base class for all ML regression models.

Enforces the implementation of the fit and predict methods. If hyperparameter tuning is required, then the fit method should implement the tuning.

abstract fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> Any | None
abstract predict(df: pandas.DataFrame) -> numpy.ndarray

Predict on unseen data.

Returns:

Numpy array of predictions

class bluecast.ml_modelling.base_classes.XgboostBaseModel(class_problem: Literal[binary, multiclass] | Literal[regression], conf_training: bluecast.config.training_config.TrainingConfig | None = None, conf_xgboost: bluecast.config.training_config.XgboostTuneParamsConfig | bluecast.config.training_config.XgboostTuneParamsRegressionConfig | None = None, conf_params_xgboost: bluecast.config.training_config.XgboostFinalParamConfig | bluecast.config.training_config.XgboostRegressionFinalParamConfig | None = None, experiment_tracker: bluecast.experimentation.tracking.ExperimentTracker | None = None, custom_in_fold_preprocessor: bluecast.preprocessing.custom.CustomPreprocessing | None = None, cat_columns: List[str | float | int] | None = None, single_fold_eval_metric_func: bluecast.evaluation.eval_metrics.ClassificationEvalWrapper | bluecast.evaluation.eval_metrics.RegressionEvalWrapper | None = None)
_load_xgboost_training_config(conf_xgboost) -> None
_load_xgboost_final_params(conf_params_xgboost) -> None
_load_training_settings_config(conf_training) -> None
_load_experiment_tracker(experiment_tracker) -> None
_create_d_matrices(x_train, y_train, x_test, y_test)
concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

Prepare training dataset and concat with test data.

This is only recommended if early stopping is not used, or if early stopping is evaluated on a different eval set than the test data being concatenated.

Parameters:
  • x_train – Pandas DataFrame with data without labels.

  • y_train – Pandas Series with labels.

  • x_test – Pandas DataFrame with data without labels.

  • y_test – Pandas Series with labels.

Returns:

Prepared training dataset as Pandas DataFrame, Pandas Series (labels)
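As a rough sketch of the idea, the train and test splits can be stacked with `pandas.concat`. The function name `concat_full_train_sketch` is hypothetical, and the actual method may do additional preparation that is not documented here.

```python
import pandas as pd


def concat_full_train_sketch(*, x_train, y_train, x_test, y_test):
    """Stack train and test rows so a final model can be fit on all labelled data."""
    x_full = pd.concat([x_train, x_test])
    y_full = pd.concat([y_train, y_test])
    return x_full, y_full


# Toy data with non-overlapping indices, so features and labels stay aligned.
x_train = pd.DataFrame({"feature": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
y_train = pd.Series([0, 1, 0], index=[0, 1, 2])
x_test = pd.DataFrame({"feature": [4.0, 5.0]}, index=[3, 4])
y_test = pd.Series([1, 1], index=[3, 4])

x_full, y_full = concat_full_train_sketch(
    x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test
)
```

Once the test rows are part of the training data, they can no longer serve as an unbiased early-stopping set, which is the reason for the recommendation above.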

get_early_stopping_callback() -> List[xgboost.callback.EarlyStopping] | None

Create early stopping callback if configured.

abstract autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
abstract fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
create_fine_tune_search_space() -> Dict[str, numpy.array]
_optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) -> None
orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
_create_optuna_study(direction: str, sampler: optuna.samplers.BaseSampler | None = None, study_name: str = 'hyperparameter_tuning', pruner: optuna.pruners.BasePruner | None = None) -> optuna.Study

Create an Optuna study with optional database backend support.

Parameters:
  • direction – Direction to optimize (‘minimize’ or ‘maximize’)

  • sampler – Optuna sampler to use

  • study_name – Name of the study

  • pruner – Optuna pruner to use

Returns:

Configured Optuna study

class bluecast.ml_modelling.base_classes.CatboostBaseModel(class_problem: Literal[binary, multiclass] | Literal[regression], conf_training: bluecast.config.training_config.TrainingConfig | None = None, conf_catboost: bluecast.config.training_config.CatboostTuneParamsConfig | bluecast.config.training_config.CatboostTuneParamsRegressionConfig | None = None, conf_params_catboost: bluecast.config.training_config.CatboostFinalParamConfig | bluecast.config.training_config.CatboostRegressionFinalParamConfig | None = None, experiment_tracker: bluecast.experimentation.tracking.ExperimentTracker | None = None, custom_in_fold_preprocessor: bluecast.preprocessing.custom.CustomPreprocessing | None = None, cat_columns: List[str | float | int] | None = None, single_fold_eval_metric_func: bluecast.evaluation.eval_metrics.ClassificationEvalWrapper | bluecast.evaluation.eval_metrics.RegressionEvalWrapper | None = None)

Example base model class for CatBoost, replicating the structure and logic of XgboostBaseModel.

_load_catboost_training_config(conf_catboost) -> None

Loads CatBoost tuning configuration. If none is provided, uses either the classification or the regression default class.

_load_catboost_final_params(conf_params_catboost) -> None

Loads CatBoost final parameters. If none is provided, uses either the classification or the regression default class.

_load_training_settings_config(conf_training) -> None

Loads or creates a default TrainingConfig.

_load_experiment_tracker(experiment_tracker) -> None

Loads or creates a default ExperimentTracker.

_create_pools(x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

Creates CatBoost Pools for training and testing. Potentially uses sample weights if available. Also sets cat_features if self.cat_columns is provided.

concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

Prepare the training dataset and optionally concatenate it with the test data for final model training. Use this when the final model should be trained on all data at once, as in the XGBoost base class.

get_early_stopping_callback()

In CatBoost, early stopping is handled by the overfitting detector: od_type selects the detector type and od_wait plays a role similar to early_stopping_rounds. To replicate the EarlyStopping callback from XGBoost, set:

    model = CatBoostClassifier(
        iterations=…,
        od_type='Iter',
        od_wait=self.conf_training.early_stopping_rounds,
        …
    )

For consistency with the XGBoost base class, this method returns None or a dictionary of parameters.
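One way to picture the "None or a dictionary of parameters" return value is the sketch below. The helper name `early_stopping_params_sketch` and the `iterations` value are illustrative assumptions. Only the `od_type` and `od_wait` parameter names come from the description above.

```python
from typing import Dict, Optional, Union


def early_stopping_params_sketch(
    early_stopping_rounds: Optional[int],
) -> Optional[Dict[str, Union[str, int]]]:
    """Return overfitting-detector settings, or None when early stopping is off."""
    if not early_stopping_rounds:
        return None
    return {"od_type": "Iter", "od_wait": early_stopping_rounds}


# Merge the optional settings into the parameters passed to the CatBoost model.
base_params = {"iterations": 500}
extra = early_stopping_params_sketch(20)
catboost_params = {**base_params, **(extra or {})}
```

The merged dictionary could then be unpacked into a CatBoost estimator, for example `CatBoostClassifier(**catboost_params)`.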

abstract autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
abstract fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
create_fine_tune_search_space() -> Dict[str, numpy.array]

Similar to the XGBoost method: builds the search space for an Optuna-based grid or random search. For CatBoost, adjust it to whichever parameters should be tuned.
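A search space of this shape could be built as in the following sketch. The function name, the choice of `learning_rate` and `depth`, and the plus or minus 10% band are all assumptions for illustration. Which parameters and ranges the real method uses is not stated here.

```python
from typing import Dict

import numpy as np


def fine_tune_search_space_sketch(
    best_params: Dict[str, float], n_points: int = 5
) -> Dict[str, np.ndarray]:
    """Build a small grid around previously tuned CatBoost parameters."""
    learning_rate = best_params["learning_rate"]
    depth = int(best_params["depth"])
    return {
        # Evenly spaced values within 10% of the tuned learning rate.
        "learning_rate": np.linspace(
            learning_rate * 0.9, learning_rate * 1.1, n_points
        ),
        # Neighbouring integer depths, never below 1.
        "depth": np.arange(max(1, depth - 1), depth + 2),
    }


search_space = fine_tune_search_space_sketch({"learning_rate": 0.1, "depth": 6})
```

A dictionary like this maps each parameter name to its candidate values, which is the form a grid-style sampler expects.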

_optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) -> None

Similar to the XGBoost method. We create an Optuna study with a GridSampler (or any other sampler), run it, and track the best score. We then update self.conf_params_catboost accordingly if improvements are found.

orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

Mirrors the XGBoost orchestrate_hyperparameter_tuning approach.

_create_optuna_study(direction: str, sampler: optuna.samplers.BaseSampler | None = None, study_name: str = 'catboost_hyperparameter_tuning', pruner: optuna.pruners.BasePruner | None = None) -> optuna.Study

Create an Optuna study with optional database backend support for CatBoost.

Parameters:
  • direction – Direction to optimize (‘minimize’ or ‘maximize’)

  • sampler – Optuna sampler to use

  • study_name – Name of the study

  • pruner – Optuna pruner to use

Returns:

Configured Optuna study