bluecast.preprocessing.infrequent_categories

Infrequent categories may cause overfitting.

This module groups infrequent categories into a common group to reduce the risk of overfitting.
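The grouping idea can be sketched with plain pandas. This is an illustration only, not BlueCast's own code. The helper name `group_infrequent`, the replacement label `"rare categories"`, and the rule that a category counts as infrequent when it occurs fewer than `threshold` times are all assumptions.

```python
import pandas as pd


def group_infrequent(
    col: pd.Series, threshold: int = 5, label: str = "rare categories"
) -> pd.Series:
    """Replace categories seen fewer than `threshold` times with `label`.

    Hypothetical helper; the label and the cut-off rule are assumptions.
    """
    counts = col.value_counts()
    frequent = counts[counts >= threshold].index
    # Keep values that belong to a frequent category, replace everything else.
    return col.where(col.isin(frequent), label)


colors = pd.Series(["red"] * 6 + ["blue"] * 5 + ["teal", "plum"])
grouped = group_infrequent(colors)
print(grouped.value_counts().to_dict())
# {'red': 6, 'blue': 5, 'rare categories': 2}
```

Here "teal" and "plum" each occur once, so both collapse into the shared label, while "red" and "blue" are kept unchanged.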

Module Contents

Classes

InFrequentCategoryEncoder

Group infrequent categories into a common group.

class bluecast.preprocessing.infrequent_categories.InFrequentCategoryEncoder(cat_columns: List[str | float | int], target_col: str | float | int, infrequent_threshold: int = 5)

Group infrequent categories into a common group.

fit_transform(x: pandas.DataFrame, y: pandas.Series) -> pandas.DataFrame

Find infrequent categories and transform column.

transform(x: pandas.DataFrame) -> pandas.DataFrame

Transform categories based on the frequencies already learned in fit_transform.
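The split between fitting and transforming can be sketched as follows. `RareGrouperSketch` is a hypothetical stand-in, not the actual `InFrequentCategoryEncoder`. It leaves out the `y` and `target_col` arguments because this page does not say how the target is used, and the label `"rare categories"` and the `>=` cut-off are assumptions.

```python
import pandas as pd


class RareGrouperSketch:
    """Hypothetical stand-in, not BlueCast's InFrequentCategoryEncoder."""

    def __init__(self, cat_columns, infrequent_threshold=5, label="rare categories"):
        self.cat_columns = cat_columns
        self.infrequent_threshold = infrequent_threshold
        self.label = label  # assumed replacement label
        self.frequent_ = {}  # column name -> set of categories kept as-is

    def fit_transform(self, x: pd.DataFrame) -> pd.DataFrame:
        # Learn the frequent categories from the training data only.
        for col in self.cat_columns:
            counts = x[col].value_counts()
            self.frequent_[col] = set(counts[counts >= self.infrequent_threshold].index)
        return self.transform(x)

    def transform(self, x: pd.DataFrame) -> pd.DataFrame:
        # Reuse the stored categories; nothing is recounted on new data.
        out = x.copy()
        for col in self.cat_columns:
            keep = out[col].isin(self.frequent_[col])
            out[col] = out[col].where(keep, self.label)
        return out


train = pd.DataFrame({"city": ["Oslo"] * 5 + ["Bergen"]})
test = pd.DataFrame({"city": ["Bergen"] * 7 + ["Oslo"]})

grouper = RareGrouperSketch(cat_columns=["city"])
train_out = grouper.fit_transform(train)
test_out = grouper.transform(test)
```

In this sketch "Bergen" is grouped in the test frame even though it occurs seven times there, because only the counts seen during fitting decide which categories are kept. That keeps training and inference data encoded consistently.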