bluecast.ml_modelling.xgboost

Xgboost classification model.

This module contains a wrapper for the Xgboost classification model. It can be used to train and/or tune the model. It also calculates class weights for imbalanced datasets. Whether the weights are actually used depends on the outcome of hyperparameter tuning.
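The exact weighting formula is not stated here. As a rough illustration of class weighting for imbalanced data, the sketch below uses the common "balanced" heuristic, n_samples / (n_classes * class_count). The helper names balanced_class_weights and per_sample_weights are hypothetical and are not part of this module.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Hypothetical helper: weight each class by n_samples / (n_classes * class_count).

    This is the widely used "balanced" heuristic, assumed here for illustration;
    the module may compute its weights differently.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def per_sample_weights(
    labels: Sequence[Hashable], class_weights: Dict[Hashable, float]
) -> List[float]:
    """Expand class-level weights into one weight per training row."""
    return [class_weights[label] for label in labels]


# Toy imbalanced target: six negatives, two positives.
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y_train)
sample_weights = per_sample_weights(y_train, weights)
```

Per-row weights of this kind are typically handed to XGBoost through the weight argument of xgboost.DMatrix, so that rows of the rarer class contribute more to the loss.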

Module Contents

Classes

XgboostModel

Train and/or tune Xgboost classification model.

class bluecast.ml_modelling.xgboost.XgboostModel(class_problem: Literal['binary', 'multiclass'], conf_training: bluecast.config.training_config.TrainingConfig | None = None, conf_xgboost: bluecast.config.training_config.XgboostTuneParamsConfig | None = None, conf_params_xgboost: bluecast.config.training_config.XgboostFinalParamConfig | None = None, experiment_tracker: bluecast.experimentation.tracking.ExperimentTracker | None = None, custom_in_fold_preprocessor: bluecast.preprocessing.custom.CustomPreprocessing | None = None, cat_columns: List[str | float | int] | None = None, single_fold_eval_metric_func: bluecast.evaluation.eval_metrics.ClassificationEvalWrapper | None = None)

Bases: bluecast.ml_modelling.base_classes.XgboostBaseModel

Train and/or tune Xgboost classification model.

fit(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> xgboost.Booster

Train Xgboost model. Includes hyperparameter tuning by default.

autotune(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> None

Tune hyperparameters.

An alternative config can be provided to overwrite the hyperparameter search space.

train_single_fold_model(d_train, d_test, y_test, param, steps, pruning_callback)

_fine_tune_precise(tuned_params: Dict[str, Any], x_train: pandas.DataFrame, y_train: pandas.Series, x_test: pandas.DataFrame, y_test: pandas.Series)

fine_tune(x_train: pandas.DataFrame, x_test: pandas.DataFrame, y_train: pandas.Series, y_test: pandas.Series) -> None

predict(df: pandas.DataFrame) -> Tuple[numpy.ndarray, numpy.ndarray]

Predict on unseen data.

predict_proba(df: pandas.DataFrame) -> numpy.ndarray

Predict class scores on unseen data.
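The layout of the returned class scores is not specified here. As a minimal sketch, assuming binary scores are one positive-class probability per row and multiclass scores are one row of per-class probabilities, the scores could be turned into class labels as follows. The function names and the 0.5 threshold are illustrative assumptions, not part of this API.

```python
from typing import List, Sequence


def labels_from_binary_scores(
    scores: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Hypothetical helper: label a row 1 when its score reaches the threshold.

    Assumes each score is the probability of the positive class.
    """
    return [1 if score >= threshold else 0 for score in scores]


def labels_from_multiclass_scores(scores: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: pick the index of the highest score in each row.

    Assumes each row holds one score per class; ties go to the lowest index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy scores standing in for model output.
binary_labels = labels_from_binary_scores([0.1, 0.5, 0.9])
multiclass_labels = labels_from_multiclass_scores(
    [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
)
```

With imbalanced data a threshold other than 0.5 may be more appropriate, which is one reason to work from the raw scores rather than from hard labels.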