`bluecast.ml_modelling.base_classes`¶

Base classes for all ML models.

Module Contents¶

Classes¶

`BaseClassMlModel`	Base class for all ML models.
`BaseClassMlRegressionModel`	Base class for all ML regression models.
`XgboostBaseModel`
`CatboostBaseModel`	Example base model class for CatBoost, replicating the structure and logic of `XgboostBaseModel`.

Attributes¶

`PredictedProbas`
`PredictedClasses`

bluecast.ml_modelling.base_classes.PredictedProbas¶

bluecast.ml_modelling.base_classes.PredictedClasses¶

class bluecast.ml_modelling.base_classes.BaseClassMlModel¶

Bases: abc.ABC

Base class for all ML models.

Enforces the implementation of the fit and predict methods. If hyperparameter tuning is required, then the fit method should implement the tuning.

abstract fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) → Any | None¶

abstract predict(df: pandas.DataFrame) → Tuple[PredictedProbas, PredictedClasses]¶

Predict on unseen data.

Returns:

Tuple of predicted probabilities and predicted classes
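As a rough illustration of the contract above, the following sketch implements `fit` and `predict` on a toy classifier. `ClassifierInterface` and `MajorityClassModel` are hypothetical stand-ins written for this example, not BlueCast classes. Plain Python lists replace the pandas objects from the documented signatures so that the snippet runs with the standard library only.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple


class ClassifierInterface(ABC):
    """Hypothetical stand-in that mirrors the documented fit/predict contract."""

    @abstractmethod
    def fit(self, x_train, x_test, y_train, y_test) -> Optional[Any]:
        ...

    @abstractmethod
    def predict(self, df) -> Tuple[List[float], List[int]]:
        ...


class MajorityClassModel(ClassifierInterface):
    """Toy binary model: predicts the positive-class rate seen during training."""

    def __init__(self) -> None:
        self.positive_rate = 0.0

    def fit(self, x_train, x_test, y_train, y_test) -> None:
        # Any hyperparameter tuning would also happen here; this toy model has none.
        self.positive_rate = sum(y_train) / len(y_train)

    def predict(self, df) -> Tuple[List[float], List[int]]:
        # Return (probabilities, classes), matching the documented tuple order.
        probas = [self.positive_rate for _ in df]
        classes = [int(p > 0.5) for p in probas]
        return probas, classes


model = MajorityClassModel()
model.fit(x_train=[[1], [2], [3], [4]], x_test=[[5]], y_train=[1, 1, 1, 0], y_test=[1])
probas, classes = model.predict([[6], [7]])
```

A real subclass would inherit from `BaseClassMlModel` and work with `pandas.DataFrame` and `pandas.Series` inputs instead.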

class bluecast.ml_modelling.base_classes.BaseClassMlRegressionModel¶

Bases: abc.ABC

Base class for all ML regression models.

Enforces the implementation of the fit and predict methods. If hyperparameter tuning is required, then the fit method should implement the tuning.

abstract fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) → Any | None¶

abstract predict(df: pandas.DataFrame) → numpy.ndarray¶

Predict on unseen data.

Returns:

Numpy array of predictions
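The enforcement described for this class comes from Python's `abc` machinery: a subclass that leaves an abstract method unimplemented cannot be instantiated. The sketch below shows this behaviour on a hypothetical stand-in, `RegressorInterface`, rather than on the BlueCast class itself. Its `predict` returns a plain list instead of a `numpy.ndarray` to avoid third-party imports.

```python
from abc import ABC, abstractmethod


class RegressorInterface(ABC):
    """Hypothetical stand-in mirroring the documented regression contract."""

    @abstractmethod
    def fit(self, x_train, x_test, y_train, y_test):
        ...

    @abstractmethod
    def predict(self, df):
        ...


class IncompleteModel(RegressorInterface):
    """Implements fit only, so the class is still abstract."""

    def fit(self, x_train, x_test, y_train, y_test):
        return None


class MeanModel(RegressorInterface):
    """Toy regressor that always predicts the training mean."""

    def __init__(self):
        self.mean = 0.0

    def fit(self, x_train, x_test, y_train, y_test):
        self.mean = sum(y_train) / len(y_train)

    def predict(self, df):
        return [self.mean for _ in df]


# Instantiating a class with a missing abstract method raises TypeError.
try:
    IncompleteModel()
    instantiation_failed = False
except TypeError:
    instantiation_failed = True

model = MeanModel()
model.fit([[1], [2]], [[3]], [2.0, 4.0], [3.0])
predictions = model.predict([[10], [20], [30]])
```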

class bluecast.ml_modelling.base_classes.XgboostBaseModel(class_problem: Literal[binary, multiclass] | Literal[regression], conf_training: bluecast.config.training_config.TrainingConfig | None = None, conf_xgboost: bluecast.config.training_config.XgboostTuneParamsConfig | bluecast.config.training_config.XgboostTuneParamsRegressionConfig | None = None, conf_params_xgboost: bluecast.config.training_config.XgboostFinalParamConfig | bluecast.config.training_config.XgboostRegressionFinalParamConfig | None = None, experiment_tracker: bluecast.experimentation.tracking.ExperimentTracker | None = None, custom_in_fold_preprocessor: bluecast.preprocessing.custom.CustomPreprocessing | None = None, cat_columns: List[str | float | int] | None = None, single_fold_eval_metric_func: bluecast.evaluation.eval_metrics.ClassificationEvalWrapper | bluecast.evaluation.eval_metrics.RegressionEvalWrapper | None = None)¶

_load_xgboost_training_config(conf_xgboost) → None¶

_load_xgboost_final_params(conf_params_xgboost) → None¶

_load_training_settings_config(conf_training) → None¶

_load_experiment_tracker(experiment_tracker) → None¶

_create_d_matrices(x_train, y_train, x_test, y_test)¶

concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

Prepare training dataset and concat with test data.

This is recommended only if early stopping is disabled or is evaluated on a different eval set, because the test data becomes part of the training data.

Parameters:

x_train – Pandas DataFrame with data without labels.
y_train – Pandas Series with labels.
x_test – Pandas DataFrame with data without labels.
y_test – Pandas Series with labels.

Returns:

Prepared training dataset as Pandas DataFrame, Pandas Series (labels)

get_early_stopping_callback() → List[xgboost.callback.EarlyStopping] | None¶

Create early stopping callback if configured.

abstract autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

abstract fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

create_fine_tune_search_space() → Dict[str, numpy.array]¶

_get_param_space_fpr_grid_search(trial: optuna.trial) → Dict[str, numpy.array]¶

_optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) → None¶

orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

_create_optuna_study(direction: str, sampler: optuna.samplers.BaseSampler | None = None, study_name: str = 'hyperparameter_tuning', pruner: optuna.pruners.BasePruner | None = None) → optuna.Study¶

Create an Optuna study with optional database backend support.

Parameters:

direction – Direction to optimize (‘minimize’ or ‘maximize’)
sampler – Optuna sampler to use
study_name – Name of the study
pruner – Optuna pruner to use

Returns:

Configured Optuna study

class bluecast.ml_modelling.base_classes.CatboostBaseModel(class_problem: Literal[binary, multiclass] | Literal[regression], conf_training: bluecast.config.training_config.TrainingConfig | None = None, conf_catboost: bluecast.config.training_config.CatboostTuneParamsConfig | bluecast.config.training_config.CatboostTuneParamsRegressionConfig | None = None, conf_params_catboost: bluecast.config.training_config.CatboostFinalParamConfig | bluecast.config.training_config.CatboostRegressionFinalParamConfig | None = None, experiment_tracker: bluecast.experimentation.tracking.ExperimentTracker | None = None, custom_in_fold_preprocessor: bluecast.preprocessing.custom.CustomPreprocessing | None = None, cat_columns: List[str | float | int] | None = None, single_fold_eval_metric_func: bluecast.evaluation.eval_metrics.ClassificationEvalWrapper | bluecast.evaluation.eval_metrics.RegressionEvalWrapper | None = None)¶

Example base model class for CatBoost, replicating the structure and logic of XgboostBaseModel.

_load_catboost_training_config(conf_catboost) → None¶

Loads the CatBoost tuning configuration. If none is provided, uses either the classification or the regression default class.

_load_catboost_final_params(conf_params_catboost) → None¶

Loads the CatBoost final parameters. If none is provided, uses either the classification or the regression default class.

_load_training_settings_config(conf_training) → None¶

Loads or creates a default TrainingConfig.

_load_experiment_tracker(experiment_tracker) → None¶

Loads or creates a default ExperimentTracker.

_create_pools(x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

Creates CatBoost Pools for training and testing. Potentially uses sample weights if available. Also sets cat_features if self.cat_columns is provided.

concat_prepare_full_train_datasets(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

Prepares the training dataset and optionally concatenates it with the test data for final model training, for workflows that train on all data at once (as in the XGBoost base class).

get_early_stopping_callback()¶

In CatBoost, early stopping is handled by setting od_type (the overfitting detector type) and od_wait (similar to early_stopping_rounds). To replicate the EarlyStopping callback from XGBoost, set:

model = CatBoostClassifier(iterations=…, od_type='Iter', od_wait=self.conf_training.early_stopping_rounds, …)

For consistency with the XGBoost base class, this method returns None or a dictionary of parameters.
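A minimal sketch of the "None or a dictionary of parameters" idea follows. The helper name `early_stopping_params` and its argument are assumptions made for illustration and are not part of BlueCast. Only the `od_type` and `od_wait` keys come from the description above.

```python
from typing import Dict, Optional, Union


def early_stopping_params(
    early_stopping_rounds: Optional[int],
) -> Optional[Dict[str, Union[str, int]]]:
    """Return CatBoost overfitting-detector settings, or None when disabled.

    Hypothetical helper: the real method reads its settings from the
    training configuration instead of taking an argument.
    """
    if not early_stopping_rounds:
        # No early stopping configured (None or 0).
        return None
    return {"od_type": "Iter", "od_wait": early_stopping_rounds}


enabled = early_stopping_params(20)
disabled = early_stopping_params(None)
```

When the result is not None, such a dictionary could be unpacked into the model constructor, for example `CatBoostClassifier(iterations=…, **enabled)`.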

abstract autotune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

abstract fine_tune(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

create_fine_tune_search_space() → Dict[str, numpy.array]¶

_get_param_space_fpr_grid_search(trial: optuna.trial) → Dict[str, Any]¶

Similar to the XGBoost method for an Optuna-based grid or random search. For CatBoost, adjust it to the parameters that should be tuned.

_optimize_and_plot_grid_search_study(objective: Callable, search_space: Dict[str, numpy.array]) → None¶

Similar to the XGBoost method. Creates an Optuna study with a GridSampler (or any other sampler), runs it, and tracks the best score. If an improvement is found, self.conf_params_catboost is updated accordingly.

orchestrate_hyperparameter_tuning(*, x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)¶

Mirrors the XGBoost orchestrate_hyperparameter_tuning approach.
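The grid search step can be pictured with a small standard-library sketch. It uses `itertools.product` in place of Optuna's GridSampler, assumes a minimizing objective, and omits plotting. The function `grid_search`, the toy objective, and the parameter names are hypothetical. The point is only the "update the parameters if an improvement is found" logic.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def grid_search(
    objective: Callable[[Params], float],
    search_space: Dict[str, List[float]],
    current_params: Params,
    current_best_score: float,
) -> Tuple[Params, float]:
    """Try every grid combination; keep the current params unless one scores lower."""
    best_params = dict(current_params)
    best_score = current_best_score
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        # Overlay the grid values on top of the existing parameters.
        candidate = {**current_params, **dict(zip(names, values))}
        score = objective(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


def toy_objective(params: Params) -> float:
    # Stand-in for a cross-validated loss; smallest at depth=6, learning_rate=0.1.
    return (params["depth"] - 6) ** 2 + abs(params["learning_rate"] - 0.1)


space = {"depth": [4, 6, 8], "learning_rate": [0.05, 0.1, 0.2]}
start = {"depth": 4, "learning_rate": 0.2}
best_params, best_score = grid_search(toy_objective, space, start, toy_objective(start))
```

In the documented method, the study direction and sampler are handled by Optuna, so this sketch covers only the bookkeeping around the best score.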

_create_optuna_study(direction: str, sampler: optuna.samplers.BaseSampler | None = None, study_name: str = 'catboost_hyperparameter_tuning', pruner: optuna.pruners.BasePruner | None = None) → optuna.Study¶

Create an Optuna study with optional database backend support for CatBoost.

Parameters:

direction – Direction to optimize (‘minimize’ or ‘maximize’)
sampler – Optuna sampler to use
study_name – Name of the study
pruner – Optuna pruner to use

Returns:

Configured Optuna study
