bluecast.preprocessing.feature_selection

Module Contents

Classes

BoostaRootaWrapper

BoostARoota

Functions

_create_shadow(x_train)

Take all X variables, creating copies and randomly shuffling them

_reduce_vars_xgb(x, y, metric, this_round, cutoff, ...)

Function to run through each

_reduce_vars_sklearn(x, y, clf, this_round, cutoff, ...)

Function to run through each

_BoostARoota(x, y, metric, clf, cutoff, iters, ...)

Function loops through, waiting for the stopping criteria to change

class bluecast.preprocessing.feature_selection.BoostaRootaWrapper(class_problem: Literal["binary", "multiclass", "regression"], random_state)

Bases: bluecast.preprocessing.custom.CustomPreprocessing

fit_transform(df: pandas.DataFrame, targets: pandas.Series) -> Tuple[pandas.DataFrame, pandas.Series | None]

transform(df: pandas.DataFrame, target: pandas.Series | None = None, predicton_mode: bool = False) -> Tuple[pandas.DataFrame, pandas.Series | None]

class bluecast.preprocessing.feature_selection.BoostARoota(metric=None, clf=None, cutoff=200, iters=10, max_rounds=100, delta=0.1, silent=True)

Bases: object

fit(x, y)

transform(x)

fit_transform(x, y)

bluecast.preprocessing.feature_selection._create_shadow(x_train)

Take all X variables, creating copies and randomly shuffling them.

:param x_train: the dataframe to create shadow features on
:return: dataframe 2x width and the names of the shadows for removing later
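The shadow-feature idea described above can be illustrated with a minimal, dependency-free sketch. It is not the BlueCast implementation: a plain dict of column lists stands in for the pandas DataFrame, and the function name `create_shadow`, the `seed` argument, and the `_shadow` suffix are assumptions made for illustration only.

```python
import random


def create_shadow(columns, seed=0):
    """Return a widened table plus the names of the added shadow columns.

    `columns` maps feature name -> list of values (a stand-in for a DataFrame).
    """
    rng = random.Random(seed)      # hypothetical seeding, for reproducibility
    widened = dict(columns)        # new dict; the original column lists are reused, not modified
    shadow_names = []
    for name, values in columns.items():
        shuffled = list(values)    # copy the column before shuffling
        rng.shuffle(shuffled)      # shuffling breaks any link to the target
        shadow_name = f"{name}_shadow"  # hypothetical naming scheme
        widened[shadow_name] = shuffled
        shadow_names.append(shadow_name)
    return widened, shadow_names


features = {"age": [23, 35, 41, 58], "income": [10, 20, 30, 40]}
widened, shadow_names = create_shadow(features)
```

The result has twice as many columns as the input, and each shadow column holds the same values as its source column in a random order. The returned names make it straightforward to drop the shadows again later.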

bluecast.preprocessing.feature_selection._reduce_vars_xgb(x, y, metric, this_round, cutoff, n_iterations, delta, silent)

Function to run through each

:param x: Input dataframe - X
:param y: Target variable
:param metric: Metric to optimize in XGBoost
:param this_round: Round so it can be printed to screen
:return: tuple - stopping criteria and the variables to keep

bluecast.preprocessing.feature_selection._reduce_vars_sklearn(x, y, clf, this_round, cutoff, n_iterations, delta, silent)

Function to run through each

:param x: Input dataframe - X
:param y: Target variable
:param clf: the fully specified classifier passed in by user
:param this_round: Round so it can be printed to screen
:return: tuple - stopping criteria and the variables to keep
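The docstrings of the two `_reduce_vars_*` helpers do not say how `cutoff` and `delta` enter the returned tuple. The sketch below assumes the behaviour of the upstream BoostARoota algorithm: a real feature is kept when its importance exceeds the mean shadow importance divided by `cutoff`, and the stopping flag is set when fewer than a `delta` fraction of the real features were removed. This is an assumption, not a reading of the BlueCast source, and `select_features` is a hypothetical name. Importances are passed in as a plain dict instead of being computed by a model.

```python
def select_features(importances, shadow_names, cutoff, delta):
    """Return (stop, keep) from one round of importances.

    `importances` maps every column name (real and shadow) to its importance.
    """
    shadow = set(shadow_names)
    shadow_values = [importances[name] for name in shadow_names]
    # Assumed rule: threshold is the mean shadow importance scaled down by cutoff.
    threshold = (sum(shadow_values) / len(shadow_values)) / cutoff
    real = [name for name in importances if name not in shadow]
    keep = [name for name in real if importances[name] > threshold]
    # Assumed rule: stop once more than (1 - delta) of the real features survive.
    stop = len(keep) / len(real) > 1 - delta
    return stop, keep


importances = {
    "a": 0.5, "b": 0.3, "c": 0.0,
    "a_shadow": 0.1, "b_shadow": 0.1, "c_shadow": 0.0,
}
stop, keep = select_features(
    importances, ["a_shadow", "b_shadow", "c_shadow"], cutoff=2, delta=0.1
)
```

In this toy input one of three real features is removed, which is more than 10 percent, so under the assumed rule the stopping flag stays unset and another round would follow.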

bluecast.preprocessing.feature_selection._BoostARoota(x, y, metric, clf, cutoff, iters, max_rounds, delta, silent)

Function loops through, waiting for the stopping criteria to change

:param x: X dataframe One Hot Encoded
:param y: Labels for the target variable
:param metric: The metric to optimize in XGBoost
:return: names of the variables to keep
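The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the BlueCast code: `run_rounds` and `toy_reduce` are hypothetical names, and `toy_reduce` stands in for a reduction step that returns a stopping flag and the variables to keep, as the `_reduce_vars_*` docstrings describe.

```python
def run_rounds(features, reduce_once, max_rounds):
    """Repeat the reduction step until it signals stop or max_rounds is hit."""
    kept = list(features)
    round_number = 0
    while True:
        round_number += 1
        stop, kept = reduce_once(kept, round_number)
        if stop or round_number >= max_rounds:
            break
    return kept


def toy_reduce(names, round_number):
    """Hypothetical reduction step: drop names starting with 'noise'."""
    kept = [name for name in names if not name.startswith("noise")]
    stop = len(kept) == len(names)  # nothing was removed, so signal stop
    return stop, kept


selected = run_rounds(["age", "noise_1", "income", "noise_2"], toy_reduce, max_rounds=100)
```

Here the first round removes the two noise columns, the second round removes nothing and sets the stopping flag, and the loop returns the remaining names. The `max_rounds` guard bounds the loop even if the reduction step never signals a stop.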