bluecast.preprocessing.feature_types

Feature type detection and casting.

This module provides a convenience class to detect and cast feature types in a DataFrame. It detects numerical, categorical and datetime columns and casts columns to a specific type.

Module Contents

Classes

FeatureTypeDetector

Detect and cast feature types in DataFrame.

class bluecast.preprocessing.feature_types.FeatureTypeDetector(num_columns: List[str | int | float] | None = None, cat_columns: List[str | int | float] | None = None, date_columns: List[str | int | float] | None = None, all_null_cols: List[str | int | float] | None = None, zero_var_cols: List[str | int | float] | None = None)

Detect and cast feature types in DataFrame.

Column names for individual feature types can be provided. Otherwise, types are inferred and cast accordingly.

fit_transform_drop_all_null_columns(df: pandas.DataFrame) → pandas.DataFrame

Drop all columns with only null values.

transform_drop_all_null_columns(df: pandas.DataFrame) → pandas.DataFrame

Drop all columns with only null values.
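The fit/transform split for all-null columns can be sketched in plain pandas. This is an illustration only, not BlueCast's implementation: the helper `find_all_null_columns` is hypothetical, and the sketch assumes that the `fit_transform_*` variant records the dropped columns so that the `transform_*` variant can reuse them, which the docstrings do not state explicitly.

```python
import pandas as pd


def find_all_null_columns(df: pd.DataFrame) -> list:
    """Return the labels of columns whose values are all null (hypothetical helper)."""
    return [col for col in df.columns if df[col].isna().all()]


train = pd.DataFrame({"a": [1, 2, 3], "b": [None, None, None]})
test = pd.DataFrame({"a": [4, None], "b": [5.0, None]})

# "Fit" step: learn which columns to drop from the training data only.
all_null_cols = find_all_null_columns(train)

# Apply the same decision to both frames so their columns stay aligned,
# even though "b" is not entirely null in the test frame.
train_clean = train.drop(columns=all_null_cols)
test_clean = test.drop(columns=all_null_cols)
```

Reusing the recorded column list at transform time keeps training and inference data in the same shape.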

fit_transform_drop_zero_variance_columns(df: pandas.DataFrame) → pandas.DataFrame

Drop all columns with only one unique value.

transform_drop_zero_variance_columns(df: pandas.DataFrame) → pandas.DataFrame

Drop all columns with only one unique value.
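A minimal sketch of the "only one unique value" check in plain pandas follows. It is not taken from BlueCast; in particular, whether null values count as a distinct value is an assumption here (`nunique()` ignores them by default).

```python
import pandas as pd

df = pd.DataFrame({"constant": [7, 7, 7], "varied": [1, 2, 2]})

# nunique() counts distinct non-null values per column by default.
zero_var_cols = [col for col in df.columns if df[col].nunique() == 1]

# Columns with a single distinct value carry no signal for a model.
df_reduced = df.drop(columns=zero_var_cols)
```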

check_if_column_is_int_from_string(col: pandas.Series) → bool

Check if column contains any ints or strings that can be cast to ints.

check_if_column_is_float_from_string(col: pandas.Series) → bool

Check if column contains any floats or strings that can be cast to floats.
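One way to test whether string values can be cast to ints or floats is to attempt the cast and catch the failure. The helper `can_cast_all` below is hypothetical and requires every non-null value to be castable, which is one possible reading of the docstrings rather than confirmed BlueCast behaviour.

```python
import pandas as pd


def can_cast_all(col: pd.Series, caster) -> bool:
    """Return True if every non-null value in `col` survives `caster` (hypothetical helper)."""
    for value in col.dropna():
        try:
            caster(value)
        except (TypeError, ValueError):
            return False
    return True


ints_as_text = pd.Series(["1", "2", "3"])
floats_as_text = pd.Series(["1.5", "2", "3.25"])
words = pd.Series(["a", "b"])

ints_ok = can_cast_all(ints_as_text, int)            # every string parses as int
floats_not_int = can_cast_all(floats_as_text, int)   # int("1.5") raises ValueError
floats_ok = can_cast_all(floats_as_text, float)      # every string parses as float
words_ok = can_cast_all(words, float)                # "a" is not a number
```

Checking for ints before floats matters, because every int-like string also parses as a float.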

check_if_column_is_int(col: pandas.Series) → bool

Check if column is int.

check_if_column_is_float(col: pandas.Series) → bool

Check if column is float.

identify_num_columns(df: pandas.DataFrame) → pandas.DataFrame

Identify numerical columns based on already existing data type.

identify_bool_columns(df: pandas.DataFrame) → Tuple[List[str | float | int], List[str | float | int]]

Identify boolean columns based on data type.
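Splitting columns by boolean dtype can be sketched with `select_dtypes`. The meaning of the two returned lists (boolean columns first, all other columns second) is an assumption based on the `no_bool_cols` parameter name used elsewhere in this class, not something the docstring confirms.

```python
import pandas as pd

df = pd.DataFrame({"active": [True, False], "age": [31, 45], "name": ["Ann", "Bo"]})

# Columns whose dtype is already bool.
bool_cols = df.select_dtypes(include="bool").columns.tolist()

# Everything else, kept in the original column order.
no_bool_cols = [col for col in df.columns if col not in bool_cols]
```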

identify_date_time_columns(df: pandas.DataFrame, no_bool_cols: List[str | float | int])

Try casting columns to datetime. The expected datetime format is YYYY-MM-DD.
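A "try to cast, otherwise leave untouched" pass for YYYY-MM-DD strings might look like the sketch below. It uses `pandas.to_datetime` with an explicit format and is an illustration, not BlueCast's code; how the library handles columns that only partly match the format is not covered here.

```python
import pandas as pd

df = pd.DataFrame({
    "signup": ["2023-01-15", "2023-02-01"],
    "city": ["Berlin", "Lyon"],
})

date_columns = []
for col in df.columns:
    try:
        # With an explicit format, strings that do not match raise an error.
        parsed = pd.to_datetime(df[col], format="%Y-%m-%d")
    except (TypeError, ValueError):
        continue  # not a date column, leave it untouched
    df[col] = parsed
    date_columns.append(col)
```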

cast_rest_columns_to_object(df: pandas.DataFrame, bool_cols: List[str | float | int]) → pandas.DataFrame

Treat remaining columns.

Takes remaining columns and tries to cast them as numerical. If not successful, then columns are assumed to be categorical.
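The "numerical first, categorical as fallback" rule can be sketched with `pandas.to_numeric`. Representing categorical columns as `object` dtype is an assumption suggested by the method name; the actual casting logic in BlueCast may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "price": ["1.5", "2", "3.25"],
    "color": ["red", "blue", "red"],
})

for col in df.columns:
    try:
        # Succeeds only if every value can be parsed as a number.
        df[col] = pd.to_numeric(df[col])
    except (TypeError, ValueError):
        # Fallback: treat the column as categorical, stored as object.
        df[col] = df[col].astype(object)
```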

fit_transform_feature_types(df: pandas.DataFrame) → pandas.DataFrame

Identify and transform feature types.

Wrapper function to orchestrate the different detection methods.

transform_feature_types(df: pandas.DataFrame, ignore_cols: List[str | float | int | None]) → pandas.DataFrame

Transform feature types based on already mapped types.