bluecast.ml_modelling.base_classes
Base classes for all ML models.
Module Contents
Classes
BaseClassMlModel | Base class for all ML models.
BaseClassMlRegressionModel | Base class for all ML models.
XgboostBaseModel |
CatboostBaseModel | Example base model class for CatBoost, replicating the structure and logic of XgboostBaseModel.
Attributes
- bluecast.ml_modelling.base_classes.PredictedProbas
- bluecast.ml_modelling.base_classes.PredictedClasses
- class bluecast.ml_modelling.base_classes.BaseClassMlModel
Bases: abc.ABC
Base class for all ML models.
Enforces the implementation of the fit and predict methods. If hyperparameter tuning is required, the fit method should implement it.
- abstract fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> Any | None
- abstract predict(df: pandas.DataFrame) -> Tuple[PredictedProbas, PredictedClasses]
Predict on unseen data.
- Returns:
Tuple of predicted probabilities and predicted classes.
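The contract enforced by the abstract base class can be illustrated with a minimal sketch. It does not import bluecast: `SketchBaseModel`, `MajorityClassModel`, and `IncompleteModel` are hypothetical stand-ins that mirror only the documented fit/predict signatures, and plain Python lists replace the pandas objects the real class expects.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple


class SketchBaseModel(ABC):
    """Hypothetical stand-in mirroring the documented fit/predict contract."""

    @abstractmethod
    def fit(self, x_train: Any, x_test: Any, y_train: Any, y_test: Any) -> Optional[Any]:
        ...

    @abstractmethod
    def predict(self, df: Any) -> Tuple[List[float], List[int]]:
        ...


class MajorityClassModel(SketchBaseModel):
    """Toy binary model: predicts the positive-class rate seen during fit."""

    def __init__(self) -> None:
        self.positive_rate = 0.0

    def fit(self, x_train, x_test, y_train, y_test):
        # Any tuning logic would also live here, per the base class description.
        self.positive_rate = sum(y_train) / len(y_train)
        return None

    def predict(self, df):
        # Return (probabilities, classes), matching the documented tuple order.
        probas = [self.positive_rate for _ in df]
        classes = [int(p >= 0.5) for p in probas]
        return probas, classes


class IncompleteModel(SketchBaseModel):
    """Implements fit only, so it stays abstract and cannot be instantiated."""

    def fit(self, x_train, x_test, y_train, y_test):
        return None


model = MajorityClassModel()
model.fit([[1], [2], [3], [4]], [[5]], [1, 1, 1, 0], [1])
probas, classes = model.predict([[6], [7]])

try:
    IncompleteModel()
    incomplete_instantiable = True
except TypeError:
    # abc refuses to instantiate a class with unimplemented abstract methods.
    incomplete_instantiable = False
```

A subclass that omits either method fails at instantiation time rather than at the first call, which is the practical effect of deriving from abc.ABC.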
- class bluecast.ml_modelling.base_classes.BaseClassMlRegressionModel
Bases: abc.ABC
Base class for all ML models.
Enforces the implementation of the fit and predict methods. If hyperparameter tuning is required, the fit method should implement it.
- abstract fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> Any | None
- abstract predict(df: pandas.DataFrame) -> numpy.ndarray
Predict on unseen data.
- Returns:
NumPy array of predictions.
- class bluecast.ml_modelling.base_classes.XgboostBaseModel(class_problem: Literal['binary', 'multiclass'] | Literal['regression'], conf_training: bluecast.config.training_config.TrainingConfig | None = None, conf_xgboost: bluecast.config.training_config.XgboostTuneParamsConfig | bluecast.config.training_config.XgboostTuneParamsRegressionConfig | None = None, conf_params_xgboost: bluecast.config.training_config.XgboostFinalParamConfig | bluecast.config.training_config.XgboostRegressionFinalParamConfig | None = None, experiment_tracker: bluecast.experimentation.tracking.ExperimentTracker | None = None, custom_in_fold_preprocessor: bluecast.preprocessing.custom.CustomPreprocessing | None = None, cat_columns: List[str | float | int] | None = None, single_fold_eval_metric_func: bluecast.evaluation.eval_metrics.ClassificationEvalWrapper | bluecast.evaluation.eval_metrics.RegressionEvalWrapper | None = None)
- _load_xgboost_training_config(conf_xgboost) -> None
- _load_xgboost_final_params(conf_params_xgboost) -> None
- _load_training_settings_config(conf_training) -> None
- _load_experiment_tracker(experiment_tracker) -> None
- _create_d_matrices(x_train, y_train, x_test, y_test)
- concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
Prepare the training dataset and concatenate it with the test data.
This is only recommended if early stopping is not used, or is not evaluated on the same eval set.
- Parameters:
x_train – Pandas DataFrame with data without labels.
y_train – Pandas Series with labels.
x_test – Pandas DataFrame with data without labels.
y_test – Pandas Series with labels.
- Returns:
Prepared training dataset as Pandas DataFrame, Pandas Series (labels)
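The concatenation idea can be sketched with plain pandas calls. This is an illustration only, not the method's actual body: the real implementation may perform additional preparation, and the column name and values below are made up for the example.

```python
import pandas as pd

# Illustrative data; "feature" and "target" are assumed names.
x_train = pd.DataFrame({"feature": [1.0, 2.0, 3.0]})
y_train = pd.Series([0, 1, 0], name="target")
x_test = pd.DataFrame({"feature": [4.0, 5.0]})
y_test = pd.Series([1, 1], name="target")

# Stack test rows under the training rows and renumber the index so
# features and labels stay aligned row by row.
x_full = pd.concat([x_train, x_test], ignore_index=True)
y_full = pd.concat([y_train, y_test], ignore_index=True)
```

Once the test rows are part of the training data, they can no longer act as an independent evaluation set, which is why the method is discouraged when early stopping monitors that same set.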
- get_early_stopping_callback() -> List[xgboost.callback.EarlyStopping] | None
Create early stopping callback if configured.
- abstract autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
- abstract fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
- create_fine_tune_search_space() -> Dict[str, numpy.array]
- _get_param_space_fpr_grid_search(trial: optuna.trial) -> Dict[str, numpy.array]
- _optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) -> None
- orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
- _create_optuna_study(direction: str, sampler: optuna.samplers.BaseSampler | None = None, study_name: str = 'hyperparameter_tuning', pruner: optuna.pruners.BasePruner | None = None) -> optuna.Study
Create an Optuna study with optional database backend support.
- Parameters:
direction – Direction to optimize ('minimize' or 'maximize')
sampler – Optuna sampler to use
study_name – Name of the study
pruner – Optuna pruner to use
- Returns:
Configured Optuna study
- class bluecast.ml_modelling.base_classes.CatboostBaseModel(class_problem: Literal['binary', 'multiclass'] | Literal['regression'], conf_training: bluecast.config.training_config.TrainingConfig | None = None, conf_catboost: bluecast.config.training_config.CatboostTuneParamsConfig | bluecast.config.training_config.CatboostTuneParamsRegressionConfig | None = None, conf_params_catboost: bluecast.config.training_config.CatboostFinalParamConfig | bluecast.config.training_config.CatboostRegressionFinalParamConfig | None = None, experiment_tracker: bluecast.experimentation.tracking.ExperimentTracker | None = None, custom_in_fold_preprocessor: bluecast.preprocessing.custom.CustomPreprocessing | None = None, cat_columns: List[str | float | int] | None = None, single_fold_eval_metric_func: bluecast.evaluation.eval_metrics.ClassificationEvalWrapper | bluecast.evaluation.eval_metrics.RegressionEvalWrapper | None = None)
Example base model class for CatBoost, replicating the structure and logic of XgboostBaseModel.
- _load_catboost_training_config(conf_catboost) -> None
Loads the CatBoost tuning configuration. If none is provided, uses either the classification or the regression default class.
- _load_catboost_final_params(conf_params_catboost) -> None
Loads the CatBoost final parameters. If none are provided, uses either the classification or the regression default class.
- _load_training_settings_config(conf_training) -> None
Loads or creates a default TrainingConfig.
- _load_experiment_tracker(experiment_tracker) -> None
Loads or creates a default ExperimentTracker.
- _create_pools(x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
Creates CatBoost Pools for training and testing. Uses sample weights if available and sets cat_features if self.cat_columns is provided.
- concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
Prepare the training dataset and optionally concatenate it with the test data for final model training on all data at once (as in the XGBoost base class).
- get_early_stopping_callback()
In CatBoost, early stopping is handled by setting od_type (the overfitting detector type) and od_wait (similar to early_stopping_rounds). For example, to replicate the EarlyStopping callback from XGBoost, set:
    model = CatBoostClassifier(
        iterations=...,
        od_type='Iter',
        od_wait=self.conf_training.early_stopping_rounds,
        ...
    )
For consistency with the XGBoost base class, this method returns None or a dictionary of parameters.
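The "None or a dictionary of parameters" return shape might look like the following sketch. The helper name `early_stopping_params` and the rule that a missing or non-positive value disables early stopping are assumptions made for illustration; only the od_type and od_wait parameter names come from the description above.

```python
from typing import Dict, Optional, Union


def early_stopping_params(
    early_stopping_rounds: Optional[int],
) -> Optional[Dict[str, Union[str, int]]]:
    """Hypothetical helper: map a rounds setting to CatBoost overfitting-detector params."""
    # Assumption: None or a non-positive value means early stopping is disabled.
    if not early_stopping_rounds or early_stopping_rounds <= 0:
        return None
    return {"od_type": "Iter", "od_wait": early_stopping_rounds}


# Merge the detector settings into an existing parameter dictionary.
params = {"iterations": 500}
extra = early_stopping_params(20)
if extra is not None:
    params.update(extra)
```

Returning a plain dictionary keeps the caller free to merge the settings into whatever parameter set is later passed to the CatBoost model constructor.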
- abstract autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
- abstract fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
- create_fine_tune_search_space() -> Dict[str, numpy.array]
- _get_param_space_fpr_grid_search(trial: optuna.trial) -> Dict[str, Any]
Similar to the XGBoost method for an Optuna-based grid or random search. For CatBoost, adjust to whichever parameters you want to tweak.
- _optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) -> None
Similar to the XGBoost method. We create an Optuna study with a GridSampler (or any other sampler), run it, and track the best score. We then update self.conf_params_catboost accordingly if improvements are found.
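The "evaluate the grid, track the best score, update only on improvement" logic can be sketched without Optuna. This example swaps in itertools.product for Optuna's GridSampler and assumes the score is minimized; `grid_search`, `toy_loss`, and the parameter values are hypothetical and do not reflect the library's internals.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(
    objective: Callable[[Dict[str, float]], float],
    search_space: Dict[str, List[float]],
    baseline_params: Dict[str, float],
    baseline_score: float,
) -> Tuple[Dict[str, float], float]:
    """Evaluate every grid point; keep the baseline unless a point scores lower."""
    best_params, best_score = dict(baseline_params), baseline_score
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        candidate = {**baseline_params, **dict(zip(names, values))}
        score = objective(candidate)
        # Only replace the stored parameters when the score improves.
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


def toy_loss(params: Dict[str, float]) -> float:
    """Made-up objective with its minimum at learning_rate=0.1, depth=6."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["depth"] - 6) ** 2


space = {"learning_rate": [0.05, 0.1, 0.2], "depth": [4, 6, 8]}
start = {"learning_rate": 0.2, "depth": 4}
best_params, best_score = grid_search(toy_loss, space, start, toy_loss(start))
```

Passing the current best score as the baseline means a fine-tuning pass can never make the stored parameters worse, which matches the "if improvements are found" condition in the description.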
- orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)
Mirrors the XGBoost orchestrate_hyperparameter_tuning approach.
- _create_optuna_study(direction: str, sampler: optuna.samplers.BaseSampler | None = None, study_name: str = 'catboost_hyperparameter_tuning', pruner: optuna.pruners.BasePruner | None = None) -> optuna.Study
Create an Optuna study with optional database backend support for CatBoost.
- Parameters:
direction – Direction to optimize ('minimize' or 'maximize')
sampler – Optuna sampler to use
study_name – Name of the study
pruner – Optuna pruner to use
- Returns:
Configured Optuna study