bluecast.preprocessing.feature_types
Feature type detection and casting.
This is a convenience class to detect and cast feature types in a DataFrame. It can be used to detect numerical, categorical and datetime columns. It also casts columns to a specific type.
Module Contents
Classes
FeatureTypeDetector | Detect and cast feature types in DataFrame.
- class bluecast.preprocessing.feature_types.FeatureTypeDetector(num_columns: List[str | int | float] | None = None, cat_columns: List[str | int | float] | None = None, date_columns: List[str | int | float] | None = None, all_null_cols: List[str | int | float] | None = None, zero_var_cols: List[str | int | float] | None = None)
Detect and cast feature types in DataFrame.
Column names for individual feature types can be provided. Otherwise, types are inferred and columns are cast accordingly.
- fit_transform_drop_all_null_columns(df: pandas.DataFrame) → pandas.DataFrame
Drop all columns with only null values.
- transform_drop_all_null_columns(df: pandas.DataFrame) → pandas.DataFrame
Drop all columns with only null values.
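The fit/transform pairing above can be illustrated with a minimal pandas sketch. It is not BlueCast's implementation: the `AllNullDropper` class is a hypothetical stand-in, and it assumes the fit step records the all-null columns so that the transform step can drop the same ones from new data.

```python
import pandas as pd


class AllNullDropper:
    """Hypothetical helper for illustration; not part of bluecast."""

    def __init__(self):
        self.all_null_cols = []

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Record which columns contain only nulls in the training data.
        self.all_null_cols = [c for c in df.columns if df[c].isna().all()]
        return df.drop(columns=self.all_null_cols)

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Reuse the recorded list so new data ends up with the same columns,
        # even if a column is no longer entirely null there.
        return df.drop(columns=self.all_null_cols, errors="ignore")


train = pd.DataFrame({"a": [1, 2], "b": [None, None]})
new_data = pd.DataFrame({"a": [3, 4], "b": [5.0, None]})

dropper = AllNullDropper()
train_out = dropper.fit_transform(train)
new_out = dropper.transform(new_data)
```

Storing the column list at fit time keeps the training and inference schemas aligned, which is presumably why the class accepts `all_null_cols` in its constructor.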
- fit_transform_drop_zero_variance_columns(df: pandas.DataFrame) → pandas.DataFrame
Drop all columns with only one unique value.
- transform_drop_zero_variance_columns(df: pandas.DataFrame) → pandas.DataFrame
Drop all columns with only one unique value.
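As a rough sketch of what "only one unique value" can mean in pandas terms, the helper below flags such columns. The function name `find_zero_variance_columns` is hypothetical, and counting NaN as a distinct value is an assumption; the library may treat missing values differently.

```python
import pandas as pd


def find_zero_variance_columns(df: pd.DataFrame) -> list:
    # Assumption: NaN counts as its own value (dropna=False), so a column
    # holding one constant plus missing entries is NOT flagged.
    return [c for c in df.columns if df[c].nunique(dropna=False) <= 1]


train = pd.DataFrame({"const": [7, 7, 7], "varied": [1, 2, 3]})
zero_var_cols = find_zero_variance_columns(train)
reduced = train.drop(columns=zero_var_cols)
```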
- check_if_column_is_int_from_string(col: pandas.Series) → bool
Check if column contains any ints or strings that can be cast to ints.
- check_if_column_is_float_from_string(col: pandas.Series) → bool
Check if column contains any floats or strings that can be cast to floats.
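One way to approximate these two string checks is sketched below. It is an illustration only: `castable_kind` is a hypothetical helper, and it assumes a column qualifies when every non-null value parses as a number, which may be stricter than the library's own rule.

```python
import pandas as pd


def castable_kind(col: pd.Series) -> str:
    """Return "int", "float" or "none" for a column of numbers or strings."""
    values = col.dropna()
    if values.empty:
        return "none"
    try:
        # to_numeric raises if any value cannot be parsed as a number.
        converted = pd.to_numeric(values)
    except (ValueError, TypeError):
        return "none"
    # Whole numbers only -> the column can be treated as integer.
    if bool((converted % 1 == 0).all()):
        return "int"
    return "float"


int_like = castable_kind(pd.Series(["1", "2", "3"]))
float_like = castable_kind(pd.Series(["1.5", "2"]))
not_numeric = castable_kind(pd.Series(["a", "1"]))
```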
- check_if_column_is_int(col: pandas.Series) → bool
Check if column is int.
- check_if_column_is_float(col: pandas.Series) → bool
Check if column is float.
- identify_num_columns(df: pandas.DataFrame) → pandas.DataFrame
Identify numerical columns based on already existing data type.
- identify_bool_columns(df: pandas.DataFrame) → Tuple[List[str | float | int], List[str | float | int]]
Identify boolean columns based on data type.
- identify_date_time_columns(df: pandas.DataFrame, no_bool_cols: List[str | float | int])
Try casting columns to datetime. The expected datetime format is YYYY-MM-DD.
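The try-and-cast approach for YYYY-MM-DD strings can be sketched with `pandas.to_datetime`. This is an assumption-laden illustration rather than the library's code: `cast_date_columns` and its `candidate_cols` argument are hypothetical, with the candidate list standing in for the non-boolean columns.

```python
import pandas as pd


def cast_date_columns(df: pd.DataFrame, candidate_cols: list):
    """Cast every candidate column that parses as YYYY-MM-DD to datetime."""
    df = df.copy()
    date_cols = []
    for col in candidate_cols:
        try:
            # A strict format makes non-date strings raise instead of guessing.
            df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")
        except (ValueError, TypeError):
            continue  # Not a date column; leave it unchanged.
        date_cols.append(col)
    return df, date_cols


raw = pd.DataFrame(
    {"signup": ["2023-01-15", "2023-02-20"], "city": ["Berlin", "Paris"]}
)
casted, date_cols = cast_date_columns(raw, ["signup", "city"])
```

Passing an explicit format avoids silently accepting ambiguous strings, at the cost of rejecting dates written in any other layout.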
- cast_rest_columns_to_object(df: pandas.DataFrame, bool_cols: List[str | float | int]) → pandas.DataFrame
Treat remaining columns.
Takes remaining columns and tries to cast them as numerical. If not successful, then columns are assumed to be categorical.
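The numeric-first, categorical-fallback logic described above might look roughly like this. `cast_remaining` is a hypothetical helper, and representing categorical columns as `object` dtype is an assumption suggested by the method name.

```python
import pandas as pd


def cast_remaining(df: pd.DataFrame, cols: list):
    """Try a numeric cast per column; fall back to object (categorical)."""
    df = df.copy()
    cat_cols = []
    for col in cols:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            # Numeric cast failed, so treat the column as categorical.
            df[col] = df[col].astype(object)
            cat_cols.append(col)
    return df, cat_cols


raw = pd.DataFrame({"price": ["1.5", "2.0"], "color": ["red", "blue"]})
typed, cat_cols = cast_remaining(raw, ["price", "color"])
```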
- fit_transform_feature_types(df: pandas.DataFrame) → pandas.DataFrame
Identify and transform feature types.
Wrapper function to orchestrate the different detection methods.
- transform_feature_types(df: pandas.DataFrame, ignore_cols: List[str | float | int | None]) → pandas.DataFrame
Transform feature types based on already mapped types.