:py:mod:`bluecast.eda.data_leakage_checks` ========================================== .. py:module:: bluecast.eda.data_leakage_checks Module Contents --------------- Functions ~~~~~~~~~ .. autoapisummary:: bluecast.eda.data_leakage_checks.detect_leakage_via_correlation bluecast.eda.data_leakage_checks.detect_categorical_leakage .. py:function:: detect_leakage_via_correlation(data: pandas.DataFrame, target_column: Union[str, float, int], threshold: float = 0.9) -> List[Union[str, float, int, None]] Detect data leakage by checking for high correlations between the target column and other columns in the DataFrame. The target column must be part of the provided DataFrame. :param data: The DataFrame containing the data (numerical columns only for features) :param target_column: The name of the target column to check for correlations. :param threshold: The correlation threshold. If the absolute correlation value is greater than or equal to this threshold, it will be considered as a potential data leakage. :returns: True if data leakage is detected, False if not. .. py:function:: detect_categorical_leakage(data: pandas.DataFrame, target_column: Union[str, float, int], threshold: float = 0.9) -> List[Union[str, float, int, None]] Detect data leakage by calculating Theil's U for categorical variables with respect to the target. The target column must be part of the provided DataFrame. :param data: The DataFrame containing the data. :param target_column: The name of the target column. :param threshold: The threshold for Theil's U. Columns with U greater than or equal to this threshold will be considered potential data leakage. :returns: A list of column names with Theil's U greater than or equal to the threshold.