:py:mod:`bluecast.config.training_config`
=========================================

.. py:module:: bluecast.config.training_config

.. autoapi-nested-parse::

   Define training and common configuration parameters.

   Pydantic dataclasses are used to define the configuration parameters. This allows for type checking and validation of the configuration parameters and gives users a Pythonic way to define them. The configuration parameters are used in the training pipeline and in the evaluation pipeline. Default configurations can be loaded, adjusted and passed into the blueprints.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bluecast.config.training_config.Config
   bluecast.config.training_config.TrainingConfig
   bluecast.config.training_config.XgboostTuneParamsConfig
   bluecast.config.training_config.XgboostTuneParamsRegressionConfig
   bluecast.config.training_config.XgboostFinalParamConfig
   bluecast.config.training_config.XgboostRegressionFinalParamConfig


.. py:class:: Config

   .. py:attribute:: arbitrary_types_allowed
      :value: True


.. py:class:: TrainingConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`

   Define general training parameters.

   :param global_random_state: Global random state to use for reproducibility.
   :param increase_random_state_in_bluecast_cv_by: In BlueCastCV multiple models are trained. Define by how much the random state changes with each additional model.
   :param shuffle_during_training: Whether to shuffle the data during training when hypertuning_cv_folds > 1.
   :param hyperparameter_tuning_rounds: Number of hyperparameter tuning rounds. Not used when custom ML model is passed.
   :param hyperparameter_tuning_max_runtime_secs: Maximum runtime in seconds for hyperparameter tuning. Not used when custom ML model is passed.
   :param hypertuning_cv_folds: Number of cross-validation folds to use for hyperparameter tuning. Not used when custom ML model is passed.
   :param hypertuning_cv_repeats: Number of repetitions for each cross-validation fold during hyperparameter tuning. Not used when custom ML model is passed.
   :param sample_data_during_tuning: Whether to sample the data during tuning. Not used when custom ML model is passed.
   :param sample_data_during_tuning_alpha: Alpha value for sampling the data during tuning. The higher alpha, the fewer samples will be left. Not used when custom ML model is passed.
   :param class_weight_during_dmatrix_creation: Whether to use class weights during DMatrix creation. Not used when custom ML model is passed.
   :param early_stopping_rounds: Number of early stopping rounds during final training or when hyperparameter tuning follows a single train-test split. Not used when custom ML model is passed.
   :param autotune_model: Whether to autotune the model. Not used when custom ML model is passed.
   :param autotune_on_device: Whether to autotune on CPU or GPU. Choose any of ["auto", "gpu", "cpu"]. Not used when custom ML model is passed.
   :param autotune_n_random_seeds: Number of random seeds to use for autotuning. This changes Optuna's random seed only. Will be updated again after every nth trial. Not used when custom ML model is passed.
   :param update_hyperparameter_search_space_after_nth_trial: Update the hyperparameter search space after the nth trial. Not used when custom ML model is passed.
   :param plot_hyperparameter_tuning_overview: Whether to plot the hyperparameter tuning overview. Not used when custom ML model is passed.
   :param enable_feature_selection: Whether to enable recursive feature selection.
   :param calculate_shap_values: Whether to calculate SHAP values. Also used when custom ML model is passed. Not compatible with all ML models. See the SHAP documentation for more details.
   :param shap_waterfall_indices: List of sample indices to plot. Each index represents a sample (e.g. [0, 1, 499]).
   :param show_dependence_plots_of_top_n_features: Maximum number of dependence plots to show. Not used when custom ML model is passed.
   :param store_shap_values_in_instance: Whether to store the SHAP values in the BlueCast instance. Not applicable when custom ML model is used.
   :param train_size: Train size to use for train-test split.
   :param train_split_stratify: Whether to stratify the train-test split. Not used when custom ML model is passed.
   :param use_full_data_for_final_model: Whether to use the full data for the final model. This might cause overfitting. Not used when custom ML model is passed.
   :param cardinality_threshold_for_onehot_encoding: Categorical features with a cardinality less than or equal to this threshold will be one-hot encoded. The rest will be target encoded. Will be ignored if cat_encoding_via_ml_algorithm is set to True.
   :param infrequent_categories_threshold: Categories with a frequency less than this threshold will be grouped into a common group. This is done to reduce the risk of overfitting. Will be ignored if cat_encoding_via_ml_algorithm is set to True.
   :param cat_encoding_via_ml_algorithm: Whether to use an ML algorithm for categorical encoding. If True, the categorical encoding is done via an ML algorithm. If False, the categorical encoding is done via target encoding in the preprocessing steps. See the ReadMe for more details.
   :param show_detailed_tuning_logs: Whether to show detailed tuning logs. Not used when custom ML model is passed.
   :param enable_grid_search_fine_tuning: After hyperparameter tuning, run grid search tuning on a fine-grained grid based on the previous hyperparameter tuning. Only possible when autotune_model is True.
   :param gridsearch_nb_parameters_per_grid: Decides how many steps the grid shall have per parameter.
   :param gridsearch_tuning_max_runtime_secs: Sets the maximum time in seconds the tuning shall run. The latest trial will still be finished, so the total runtime may exceed this limit.
   :param experiment_name: Name of the experiment. Will be logged inside the ExperimentTracker.
   :param logging_file_path: Path to the logging file. If None, the logging will be printed to the Jupyter notebook instead.
   :param out_of_fold_dataset_store_path: Path to store the out of fold dataset. If None, the out of fold dataset will not be stored. Shall end with a slash. Only used when BlueCast instances are called with fit_eval method.

   .. py:attribute:: global_random_state
      :type: int
      :value: 33

   .. py:attribute:: increase_random_state_in_bluecast_cv_by
      :type: int
      :value: 200

   .. py:attribute:: shuffle_during_training
      :type: bool
      :value: True

   .. py:attribute:: hyperparameter_tuning_rounds
      :type: int
      :value: 200

   .. py:attribute:: hyperparameter_tuning_max_runtime_secs
      :type: int
      :value: 3600

   .. py:attribute:: hypertuning_cv_folds
      :type: int
      :value: 5

   .. py:attribute:: hypertuning_cv_repeats
      :type: int
      :value: 1

   .. py:attribute:: sample_data_during_tuning
      :type: bool
      :value: False

   .. py:attribute:: sample_data_during_tuning_alpha
      :type: float
      :value: 2.0

   .. py:attribute:: precise_cv_tuning
      :type: bool
      :value: False

   .. py:attribute:: early_stopping_rounds
      :type: Optional[int]
      :value: 20

   .. py:attribute:: autotune_model
      :type: bool
      :value: True

   .. py:attribute:: autotune_on_device
      :type: Literal[auto, gpu, cpu]
      :value: 'auto'

   .. py:attribute:: autotune_n_random_seeds
      :type: int
      :value: 1

   .. py:attribute:: update_hyperparameter_search_space_after_nth_trial
      :type: int
      :value: 200

   .. py:attribute:: plot_hyperparameter_tuning_overview
      :type: bool
      :value: True

   .. py:attribute:: enable_feature_selection
      :type: bool
      :value: False

   .. py:attribute:: calculate_shap_values
      :type: bool
      :value: True

   .. py:attribute:: shap_waterfall_indices
      :type: List[int]
      :value: []

   .. py:attribute:: show_dependence_plots_of_top_n_features
      :type: int
      :value: 0

   .. py:attribute:: store_shap_values_in_instance
      :type: bool
      :value: False

   .. py:attribute:: train_size
      :type: float
      :value: 0.8

   .. py:attribute:: train_split_stratify
      :type: bool
      :value: True

   .. py:attribute:: use_full_data_for_final_model
      :type: bool
      :value: True

   .. py:attribute:: cardinality_threshold_for_onehot_encoding
      :type: int
      :value: 5

   .. py:attribute:: infrequent_categories_threshold
      :type: int
      :value: 5

   .. py:attribute:: cat_encoding_via_ml_algorithm
      :type: bool
      :value: False

   .. py:attribute:: show_detailed_tuning_logs
      :type: bool
      :value: False

   .. py:attribute:: optuna_sampler_n_startup_trials
      :type: int
      :value: 10

   .. py:attribute:: enable_grid_search_fine_tuning
      :type: bool
      :value: False

   .. py:attribute:: gridsearch_tuning_max_runtime_secs
      :type: int
      :value: 3600

   .. py:attribute:: gridsearch_nb_parameters_per_grid
      :type: int
      :value: 5

   .. py:attribute:: bluecast_cv_train_n_model
      :type: Tuple[int, int]
      :value: (5, 1)

   .. py:attribute:: logging_file_path
      :type: Optional[str]

   .. py:attribute:: experiment_name
      :type: str
      :value: 'new experiment'

   .. py:attribute:: out_of_fold_dataset_store_path
      :type: Optional[str]


.. py:class:: XgboostTuneParamsConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`

   Define hyperparameter tuning search space.

   .. py:attribute:: max_depth_min
      :type: int
      :value: 1

   .. py:attribute:: max_depth_max
      :type: int
      :value: 10

   .. py:attribute:: alpha_min
      :type: float
      :value: 1e-08

   .. py:attribute:: alpha_max
      :type: float
      :value: 100

   .. py:attribute:: lambda_min
      :type: float
      :value: 1

   .. py:attribute:: lambda_max
      :type: float
      :value: 100

   .. py:attribute:: gamma_min
      :type: float
      :value: 1e-08

   .. py:attribute:: gamma_max
      :type: float
      :value: 10

   .. py:attribute:: min_child_weight_min
      :type: float
      :value: 1

   .. py:attribute:: min_child_weight_max
      :type: float
      :value: 100

   .. py:attribute:: sub_sample_min
      :type: float
      :value: 0.1

   .. py:attribute:: sub_sample_max
      :type: float
      :value: 1.0

   .. py:attribute:: col_sample_by_tree_min
      :type: float
      :value: 0.1

   .. py:attribute:: col_sample_by_tree_max
      :type: float
      :value: 1.0

   .. py:attribute:: col_sample_by_level_min
      :type: float
      :value: 1.0

   .. py:attribute:: col_sample_by_level_max
      :type: float
      :value: 1.0

   .. py:attribute:: max_bin_min
      :type: int
      :value: 128

   .. py:attribute:: max_bin_max
      :type: int
      :value: 1024

   .. py:attribute:: eta_min
      :type: float
      :value: 0.001

   .. py:attribute:: eta_max
      :type: float
      :value: 0.3

   .. py:attribute:: steps_min
      :type: int
      :value: 1000

   .. py:attribute:: steps_max
      :type: int
      :value: 1000

   .. py:attribute:: verbosity_during_hyperparameter_tuning
      :type: int
      :value: 0

   .. py:attribute:: verbosity_during_final_model_training
      :type: int
      :value: 0

   .. py:attribute:: booster
      :type: List[str]
      :value: ['gbtree']

   .. py:attribute:: grow_policy
      :type: List[str]
      :value: ['depthwise', 'lossguide']

   .. py:attribute:: tree_method
      :type: List[str]
      :value: ['exact', 'approx', 'hist']

   .. py:attribute:: xgboost_objective
      :type: str
      :value: 'multi:softprob'

   .. py:attribute:: xgboost_eval_metric
      :type: str
      :value: 'mlogloss'

   .. py:attribute:: xgboost_eval_metric_tune_direction
      :type: Literal[minimize, maximize]
      :value: 'minimize'


.. py:class:: XgboostTuneParamsRegressionConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`

   Define hyperparameter tuning search space.

   .. py:attribute:: max_depth_min
      :type: int
      :value: 1

   .. py:attribute:: max_depth_max
      :type: int
      :value: 10

   .. py:attribute:: alpha_min
      :type: float
      :value: 1e-08

   .. py:attribute:: alpha_max
      :type: float
      :value: 100

   .. py:attribute:: lambda_min
      :type: float
      :value: 1

   .. py:attribute:: lambda_max
      :type: float
      :value: 100

   .. py:attribute:: gamma_min
      :type: float
      :value: 1e-08

   .. py:attribute:: gamma_max
      :type: float
      :value: 10

   .. py:attribute:: min_child_weight_min
      :type: float
      :value: 1

   .. py:attribute:: min_child_weight_max
      :type: float
      :value: 100

   .. py:attribute:: sub_sample_min
      :type: float
      :value: 0.1

   .. py:attribute:: sub_sample_max
      :type: float
      :value: 1.0

   .. py:attribute:: col_sample_by_tree_min
      :type: float
      :value: 0.1

   .. py:attribute:: col_sample_by_tree_max
      :type: float
      :value: 1.0

   .. py:attribute:: col_sample_by_level_min
      :type: float
      :value: 1.0

   .. py:attribute:: col_sample_by_level_max
      :type: float
      :value: 1.0

   .. py:attribute:: max_bin_min
      :type: int
      :value: 128

   .. py:attribute:: max_bin_max
      :type: int
      :value: 1025

   .. py:attribute:: eta_min
      :type: float
      :value: 0.001

   .. py:attribute:: eta_max
      :type: float
      :value: 0.3

   .. py:attribute:: steps_min
      :type: int
      :value: 1000

   .. py:attribute:: steps_max
      :type: int
      :value: 1000

   .. py:attribute:: verbosity_during_hyperparameter_tuning
      :type: int
      :value: 0

   .. py:attribute:: verbosity_during_final_model_training
      :type: int
      :value: 0

   .. py:attribute:: booster
      :type: List[str]
      :value: ['gbtree']

   .. py:attribute:: grow_policy
      :type: List[str]
      :value: ['depthwise', 'lossguide']

   .. py:attribute:: tree_method
      :type: List[str]
      :value: ['exact', 'approx', 'hist']

   .. py:attribute:: xgboost_objective
      :type: str
      :value: 'reg:squarederror'

   .. py:attribute:: xgboost_eval_metric
      :type: str
      :value: 'rmse'

   .. py:attribute:: xgboost_eval_metric_tune_direction
      :type: Literal[minimize, maximize]
      :value: 'minimize'


.. py:class:: XgboostFinalParamConfig

   Define final hyper parameters.

   .. py:attribute:: params

   .. py:attribute:: sample_weight
      :type: Optional[Dict[str, float]]

   .. py:attribute:: classification_threshold
      :type: float
      :value: 0.5


.. py:class:: XgboostRegressionFinalParamConfig

   Define final hyper parameters.

   .. py:attribute:: params
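
The module description says that default configurations can be loaded, adjusted and passed on. The sketch below illustrates that pattern only and does not import BlueCast. ``TrainingConfigSketch`` is a hypothetical standard-library stand-in that copies a handful of the ``TrainingConfig`` fields and defaults listed above, and ``summarize_run`` is a hypothetical consumer standing in for a blueprint. With the library installed, the same steps would presumably start from ``TrainingConfig()`` instead.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class TrainingConfigSketch:
    """Hypothetical stand-in mirroring a few documented TrainingConfig defaults."""

    global_random_state: int = 33
    hyperparameter_tuning_rounds: int = 200
    hyperparameter_tuning_max_runtime_secs: int = 3600
    hypertuning_cv_folds: int = 5
    autotune_model: bool = True
    shap_waterfall_indices: List[int] = field(default_factory=list)
    experiment_name: str = "new experiment"


def summarize_run(config: TrainingConfigSketch) -> str:
    """Hypothetical consumer: reads the config the way a blueprint might."""
    if not config.autotune_model:
        return f"{config.experiment_name}: tuning skipped"
    return (
        f"{config.experiment_name}: up to {config.hyperparameter_tuning_rounds} "
        f"tuning rounds, {config.hypertuning_cv_folds} CV folds, "
        f"at most {config.hyperparameter_tuning_max_runtime_secs} s"
    )


# 1. Load the defaults.
default_config = TrainingConfigSketch()

# 2. Adjust a copy so the defaults stay available for comparison.
quick_config = replace(
    default_config,
    hyperparameter_tuning_rounds=10,
    hypertuning_cv_folds=1,
    experiment_name="quick check",
)

# 3. Pass the adjusted config to whatever consumes it.
print(summarize_run(default_config))
print(summarize_run(quick_config))
```

Adjusting a copy rather than the default object keeps both configurations side by side, which can help when comparing a quick run against a full tuning run.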