bluecast.eda.analyse

Module Contents

Functions

_create_data_hash(→ str)

Create a hash from DataFrame and additional arguments for caching.

_cached_plot_computation(func)

Decorator to cache expensive plot computations.

_entropy_fallback(p_x)

Fallback implementation for entropy calculation when scipy is not available.

find_bind_with_with_freedman_diaconis(data)

plot_pie_chart(→ plotly.graph_objects.Figure)

Create a pie chart with labels, sizes, and optional explosion.

plot_count_pair(→ plotly.graph_objects.Figure)

Compare the counts of the chosen categorical column between two DataFrames.

plot_count_pairs(→ None)

Compare the counts between two DataFrames of each categorical column in the provided list.

univariate_plots(→ None)

Plots univariate plots for all the columns in the dataframe. Only numerical columns are expected.

bi_variate_plots(→ None)

Plots bivariate plots for all column combinations in the dataframe.

correlation_heatmap(→ plotly.graph_objects.Figure)

Plots half of the heatmap showing correlations of all features.

correlation_to_target(→ plotly.graph_objects.Figure)

Plots correlations for all the columns in the dataframe in relation to the target column.

plot_against_target_for_regression(...)

Creates scatter plots for each column in num_columns against the target_col.

plot_pca(→ plotly.graph_objects.Figure)

Plots PCA for the dataframe. The target column must be part of the provided DataFrame.

plot_pca_cumulative_variance(→ plotly.graph_objects.Figure)

Plot the cumulative variance of principal components.

plot_pca_biplot(→ plotly.graph_objects.Figure)

Plots PCA biplot for the dataframe.

plot_tsne(→ plotly.graph_objects.Figure)

Plots t-SNE for the dataframe. The target column must be part of the provided DataFrame.

conditional_entropy(x, y)

theil_u(x, y)

plot_theil_u_heatmap(→ plotly.graph_objects.Figure)

Plot a heatmap for categorical data using Theil's U.

plot_null_percentage(→ plotly.graph_objects.Figure)

Plot the percentage of null values in each column.

check_unique_values(→ List[Union[str, int, float]])

Check whether the columns have a number of unique values close to the total number of rows, i.e. a share of unique values above the defined threshold.

plot_classification_target_distribution_within_categories(→ None)

Plot distribution of target across categorical features.

mutual_info_to_target(→ plotly.graph_objects.Figure)

Plots mutual information scores for all the categorical columns in the DataFrame in relation to the target column.

plot_ecdf(→ Union[plotly.graph_objects.Figure, ...)

Plot the empirical cumulative distribution function (ECDF) and histogram.

plot_distribution_by_time(→ plotly.graph_objects.Figure)

Plot the distribution of a feature over time.

plot_error_distributions(→ None)

Plots bivariate plots for each column in the dataframe with respect to the target.

plot_andrews_curve(→ plotly.graph_objects.Figure)

Plot Andrews curve.

plot_distribution_pairs(→ plotly.graph_objects.Figure)

Compare distributions of two datasets for a given feature.

plot_benfords_law(→ plotly.graph_objects.Figure)

Plot Benford's Law analysis for a numerical column.

_create_gradient_bar_chart(→ plotly.graph_objects.Figure)

Create a beautiful gradient bar chart for category frequencies.

plot_category_frequency(→ plotly.graph_objects.Figure)

Create a beautiful category frequency visualization for categorical/text data.

plot_missing_values_matrix(→ plotly.graph_objects.Figure)

Create a missing values matrix visualization.

_dashboard_update_plot(plot_type, selected_feature, ...)

Helper function for dashboard plot updates.

_dashboard_update_summary(selected_feature, df, ...[, ...])

Helper function for dashboard summary updates with dark theme styling.

_apply_pandas_query_filter(→ pandas.DataFrame)

Apply SQL-like filtering using pandas query syntax and operations.

_create_outlier_detection_plot(...)

Create IsolationForest outlier detection plot showing outlier scores and top outliers.

_create_benford_plot(→ plotly.graph_objects.Figure)

Create Benford's Law analysis plot for regression dashboard.

_create_category_frequency_plot(...)

Create category frequency plot for regression dashboard.

_create_violin_plot(→ plotly.graph_objects.Figure)

Create violin plot by target bins for regression dashboard.

_create_theil_u_plot(→ plotly.graph_objects.Figure)

Create Theil U heatmap for categorical features including the target.

_create_ecdf_plot(→ plotly.graph_objects.Figure)

Create ECDF analysis plot for dashboard.

_dashboard_update_regression_plot(plot_type, ...)

Helper function for regression dashboard plot updates with dark theme styling.

_create_benford_plot_classification(...)

Create Benford's Law analysis plot for classification dashboard.

_create_category_frequency_plot_classification(...)

Create category frequency plot for classification dashboard.

_dashboard_update_classification_plot(plot_type, ...)

Helper function for classification dashboard plot updates with dark theme styling.

create_eda_dashboard_regression(df, target_col[, ...])

Create a Dash dashboard for regression analysis with enhanced features.

create_eda_dashboard_classification(df, target_col[, ...])

Create a Dash dashboard for classification analysis with enhanced features.

create_eda_dashboard(df, target_col[, port, ...])

Create a Dash dashboard for exploratory data analysis.

Attributes

HAS_ISOLATION_FOREST

HAS_SHAP

HAS_PANDAS_QUERY

HAS_WORDCLOUD

HAS_SCIPY

HAS_STATSMODELS

_plot_cache

bluecast.eda.analyse.HAS_ISOLATION_FOREST = True
bluecast.eda.analyse.HAS_SHAP = True
bluecast.eda.analyse.HAS_PANDAS_QUERY = True
bluecast.eda.analyse.HAS_WORDCLOUD = True
bluecast.eda.analyse.HAS_SCIPY = True
bluecast.eda.analyse.HAS_STATSMODELS = True
bluecast.eda.analyse._plot_cache: Dict[str, Any]
bluecast.eda.analyse._create_data_hash(df: pandas.DataFrame, *args) str

Create a hash from DataFrame and additional arguments for caching.

bluecast.eda.analyse._cached_plot_computation(func)

Decorator to cache expensive plot computations.
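The two caching helpers above are documented only by their summaries. The following is a minimal sketch of the general pattern (hash the data plus arguments, then look the key up in a module-level dict), not BlueCast's implementation. The names `_cache`, `make_key`, `cached`, and `column_sum` are hypothetical, and plain lists of tuples stand in for a DataFrame.

```python
import hashlib
from functools import wraps
from typing import Any, Callable, Dict, List

# Hypothetical module-level cache, similar in spirit to _plot_cache.
_cache: Dict[str, Any] = {}


def make_key(rows: List[tuple], *args: Any) -> str:
    """Hash the data and any extra arguments into a hex digest."""
    digest = hashlib.sha256()
    digest.update(repr(rows).encode("utf-8"))
    digest.update(repr(args).encode("utf-8"))
    return digest.hexdigest()


def cached(func: Callable[..., Any]) -> Callable[..., Any]:
    """Reuse a stored result when the same data and arguments recur."""

    @wraps(func)
    def wrapper(rows: List[tuple], *args: Any) -> Any:
        key = func.__name__ + ":" + make_key(rows, *args)
        if key not in _cache:
            _cache[key] = func(rows, *args)
        return _cache[key]

    return wrapper


calls = {"n": 0}


@cached
def column_sum(rows: List[tuple], index: int) -> float:
    calls["n"] += 1  # counts how often the real computation runs
    return sum(row[index] for row in rows)


data = [(1, 2.0), (3, 4.0)]
first = column_sum(data, 1)
second = column_sum(data, 1)  # same data and arguments: served from the cache
```

Hashing the full data on every call has its own cost, so a pattern like this only pays off when the wrapped computation is considerably more expensive than the hash.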

bluecast.eda.analyse._entropy_fallback(p_x)

Fallback implementation for entropy calculation when scipy is not available. Uses natural logarithm to match scipy.stats.entropy behavior.

Parameters:

p_x – List of probabilities

Returns:

Shannon entropy (using natural logarithm)
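As an illustration of what such a fallback computes, here is a minimal standard-library sketch of Shannon entropy with the natural logarithm. It is not the library's code. The function name `shannon_entropy` is hypothetical, and normalizing the input so it sums to 1 is an assumption made here to mirror how scipy.stats.entropy treats unnormalized input.

```python
import math
from typing import Sequence


def shannon_entropy(p_x: Sequence[float]) -> float:
    """Shannon entropy in nats: H = -sum(p * ln(p)) over non-zero p."""
    total = sum(p_x)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Assumption: normalize so the values sum to 1.
    probs = [p / total for p in p_x]
    # Zero-probability outcomes contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])   # equals ln(2)
certain = shannon_entropy([1.0, 0.0])     # no uncertainty
```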

bluecast.eda.analyse.find_bind_with_with_freedman_diaconis(data: numpy.ndarray)
bluecast.eda.analyse.plot_pie_chart(df: pandas.DataFrame, column: str, explode: List[float] | None = None, colors: List[str] | None = None, show: bool = True) plotly.graph_objects.Figure

Create a pie chart with labels, sizes, and optional explosion.

Parameters:
  • df – Pandas DataFrame holding the column of interest

  • column – The column to be plotted

  • explode – (Optional) List of numerical values (not used in plotly version)

  • colors – (Optional) List with hexadecimal representations of colors in the RGB color model

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The pie chart figure

bluecast.eda.analyse.plot_count_pair(df_1: pandas.DataFrame, df_2: pandas.DataFrame, df_aliases: List[str] | None, feature: str, order: List[str] | None = None, palette: List[str] | None = None, show: bool = True) plotly.graph_objects.Figure

Compare the counts of the chosen categorical column between two DataFrames.

Parameters:
  • df_1 – Pandas DataFrame. I.e.: Train dataset

  • df_2 – Pandas DataFrame. I.e.: Test dataset

  • df_aliases – List with names of DataFrames that shall be shown on the count plots to represent them. Format: [df_1 representation, df_2 representation]

  • feature – String indicating categorical column to plot

  • order – List with category names to define the order they appear in the plot

  • palette – List with hexadecimal representations of colors in the RGB color model

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The count plot figure

bluecast.eda.analyse.plot_count_pairs(df_1: pandas.DataFrame, df_2: pandas.DataFrame, cat_cols: List[str], df_aliases: List[str] | None = None, palette: List[str] | None = None) None

Compare the counts between two DataFrames of each categorical column in the provided list.

Parameters:
  • df_1 – Pandas DataFrame. I.e.: Train dataset

  • df_2 – Pandas DataFrame. I.e.: Test dataset

  • df_aliases – List with names of DataFrames that shall be shown on the count plots to represent them. Format: [df_1 representation, df_2 representation]

  • cat_cols – List with strings indicating categorical column names to plot

  • palette – List with hexadecimal representations of colors in the RGB color model

bluecast.eda.analyse.univariate_plots(df: pandas.DataFrame, col_requires_at_least_n_values: int = 5) None

Plots univariate plots for all the columns in the dataframe. Only numerical columns are expected. The target column does not need to be part of the provided DataFrame.

The number of bins is determined using the Freedman-Diaconis rule.

Parameters:
  • df – DataFrame holding the features.

  • col_requires_at_least_n_values – Minimum number of unique values required to plot the feature. If the number of unique values is lower, the column will be skipped.
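For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3) and derives the bin count from the data range. The sketch below shows the rule itself in plain Python and is not the code behind find_bind_with_with_freedman_diaconis. The helper names, the linear-interpolation percentile, and the fallback to a single bin when the IQR is zero are all assumptions.

```python
import math
from typing import List, Sequence


def percentile(sorted_vals: List[float], q: float) -> float:
    """Percentile with linear interpolation between neighbouring ranks."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def freedman_diaconis_bins(data: Sequence[float]) -> int:
    """Number of histogram bins suggested by the Freedman-Diaconis rule."""
    vals = sorted(data)
    n = len(vals)
    if n < 2:
        return 1
    iqr = percentile(vals, 0.75) - percentile(vals, 0.25)
    if iqr == 0:
        return 1  # assumption: degenerate spread falls back to one bin
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil((vals[-1] - vals[0]) / width))


n_bins = freedman_diaconis_bins(list(range(1, 101)))
```

Because the width depends on the interquartile range rather than the standard deviation, the rule is less sensitive to outliers than bin rules based on the variance.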

bluecast.eda.analyse.bi_variate_plots(df: pandas.DataFrame, target: str, num_cols_grid: int = 4) None

Plots bivariate plots for all column combinations in the dataframe. The target column must be part of the provided DataFrame. Param num_cols_grid specifies how many columns the grid shall have.

Expects numeric columns only.

bluecast.eda.analyse.correlation_heatmap(df: pandas.DataFrame, show: bool = True) plotly.graph_objects.Figure

Plots half of the heatmap showing correlations of all features.

Expects numeric columns only.

Returns: - plotly.graph_objects.Figure: The correlation heatmap figure

bluecast.eda.analyse.correlation_to_target(df: pandas.DataFrame, target: str, show: bool = True) plotly.graph_objects.Figure

Plots correlations for all the columns in the dataframe in relation to the target column. The target column must be part of the provided DataFrame.

Expects numeric columns only.

Returns: - plotly.graph_objects.Figure: The correlation to target figure

bluecast.eda.analyse.plot_against_target_for_regression(df: pandas.DataFrame, num_columns: List[int | float | str], target_col: str, show: bool = True) plotly.graph_objects.Figure

Creates scatter plots for each column in num_columns against the target_col. Draws a regression line and shows statistical information.

If statsmodels is available: Uses OLS regression and shows p-values. If statsmodels is unavailable: Uses numpy linear regression and shows correlation coefficients.

Parameters:
  • df – The input DataFrame containing the data.

  • num_columns – List of column names to plot against the target column.

  • target_col – The target column name for regression.

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The regression plots figure

bluecast.eda.analyse.plot_pca(df: pandas.DataFrame, target: str, scale_data: bool = True, show: bool = True) plotly.graph_objects.Figure

Plots PCA for the dataframe. The target column must be part of the provided DataFrame.

Handles missing values by dropping rows with any NaN values before PCA.

Expects numeric columns only.

Parameters:
  • df – Pandas DataFrame. Should include the target variable.

  • target – String indicating the target column.

  • scale_data – If true, standard scaling will be performed before applying PCA, otherwise the raw data is used.

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The PCA plot figure

bluecast.eda.analyse.plot_pca_cumulative_variance(df: pandas.DataFrame, scale_data: bool = True, n_components: int = 10, show: bool = True) plotly.graph_objects.Figure

Plot the cumulative variance of principal components.

Handles missing values by dropping rows with any NaN values before PCA.

Parameters:
  • df – Pandas DataFrame. Should not include the target variable.

  • scale_data – If true, standard scaling will be performed before applying PCA, otherwise the raw data is used.

  • n_components – Number of total components to compute.

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The PCA cumulative variance figure

bluecast.eda.analyse.plot_pca_biplot(df: pandas.DataFrame, target: str, scale_data: bool = True, show: bool = True) plotly.graph_objects.Figure

Plots PCA biplot for the dataframe.

Handles missing values by dropping rows with any NaN values before PCA.

Expects numeric columns only.

Parameters:
  • df – Pandas DataFrame.

  • target – String indicating the target column. Will be dropped if part of the DataFrame.

  • scale_data – If true, standard scaling will be performed before applying PCA, otherwise the raw data is used.

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The PCA biplot figure

bluecast.eda.analyse.plot_tsne(df: pandas.DataFrame, target: str, perplexity=50, random_state=42, scale_data: bool = True, show: bool = True) plotly.graph_objects.Figure

Plots t-SNE for the dataframe. The target column must be part of the provided DataFrame.

Expects numeric columns only.

Parameters:
  • df – Pandas DataFrame. Should include the target variable.

  • target – String indicating which column is the target column. Must be part of the provided DataFrame.

  • perplexity – The perplexity parameter for t-SNE

  • random_state – The random state for t-SNE

  • scale_data – If true, standard scaling will be performed before applying t-SNE, otherwise the raw data is used.

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The t-SNE plot figure

bluecast.eda.analyse.conditional_entropy(x, y)
bluecast.eda.analyse.theil_u(x, y)
bluecast.eda.analyse.plot_theil_u_heatmap(data: pandas.DataFrame, columns: List[str | int | float], show: bool = True) plotly.graph_objects.Figure

Plot a heatmap for categorical data using Theil’s U.

Returns: - plotly.graph_objects.Figure: The Theil’s U heatmap figure
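conditional_entropy and theil_u are listed without descriptions. As background, Theil's U (the uncertainty coefficient) is commonly defined as U(x|y) = (H(x) - H(x|y)) / H(x), where H is Shannon entropy. The sketch below follows that textbook definition in plain Python. The `_sketch` names are hypothetical, and returning 1.0 when H(x) is zero is an assumption rather than documented library behaviour.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy_of(values: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of the observed category frequencies."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy_sketch(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """H(x|y) = -sum over (x, y) of p(x, y) * ln(p(x, y) / p(y))."""
    n = len(x)
    joint = Counter(zip(x, y))
    y_counts = Counter(y)
    # p(x, y) / p(y) simplifies to count(x, y) / count(y).
    return -sum((c / n) * math.log(c / y_counts[yv]) for (_, yv), c in joint.items())


def theil_u_sketch(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Fraction of the uncertainty in x that is explained by knowing y."""
    h_x = entropy_of(x)
    if h_x == 0:
        return 1.0  # assumption: a constant x counts as fully determined
    return (h_x - conditional_entropy_sketch(x, y)) / h_x


fine = ["a", "b", "c", "d"]
coarse = ["u", "u", "v", "v"]
u_fine_given_coarse = theil_u_sketch(fine, coarse)
u_coarse_given_fine = theil_u_sketch(coarse, fine)
```

The measure is asymmetric: knowing `fine` determines `coarse` completely, while knowing `coarse` removes only half of the uncertainty in `fine`. A heatmap of U values is therefore generally not symmetric.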

bluecast.eda.analyse.plot_null_percentage(dataframe: pandas.DataFrame, show: bool = True) plotly.graph_objects.Figure

Plot the percentage of null values in each column.

Returns: - plotly.graph_objects.Figure: The null percentage plot figure

bluecast.eda.analyse.check_unique_values(df: pandas.DataFrame, columns: List[str | int | float], threshold: float = 0.9) List[str | int | float]

Check whether the columns have a number of unique values close to the total number of rows, i.e. a share of unique values above the defined threshold.

Parameters:
  • df – The pandas DataFrame to check

  • columns – A list of column names to check

  • threshold – The threshold to check against

Returns:

A list of column names that have a high amount of unique values
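To make the threshold concrete, here is a rough sketch of the check described above, using a plain dict of lists as a stand-in for a DataFrame. The function name `high_cardinality_columns` is hypothetical, and both the ratio unique values / rows and the strict `>` comparison are assumptions about how the threshold is applied.

```python
from typing import Dict, Hashable, List, Sequence


def high_cardinality_columns(
    table: Dict[str, Sequence[Hashable]],
    columns: List[str],
    threshold: float = 0.9,
) -> List[str]:
    """Return the columns whose share of unique values exceeds the threshold."""
    flagged = []
    for col in columns:
        values = table[col]
        if not values:
            continue  # nothing to measure in an empty column
        ratio = len(set(values)) / len(values)
        if ratio > threshold:  # assumption: strict comparison
            flagged.append(col)
    return flagged


table = {
    "id": list(range(10)),        # every value unique: ratio 1.0
    "colour": ["r", "g"] * 5,     # two unique values: ratio 0.2
}
flagged = high_cardinality_columns(table, ["id", "colour"])
```

Columns flagged this way are often identifiers or free text, which usually carry little signal for a model and are candidates for dropping or special handling.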

bluecast.eda.analyse.plot_classification_target_distribution_within_categories(df: pandas.DataFrame, cat_columns: List[str], target_col: str) None

Plot distribution of target across categorical features.

This is suitable for classification tasks only.

Parameters:
  • df – Pandas DataFrame. Must include the target column.

  • cat_columns – List of categorical column names.

  • target_col – String indicating the target column name.

bluecast.eda.analyse.mutual_info_to_target(df: pandas.DataFrame, target: str, class_problem: Literal['binary', 'multiclass', 'regression'], show: bool = True, **mut_params) plotly.graph_objects.Figure

Plots mutual information scores for all the categorical columns in the DataFrame in relation to the target column. The target column must be part of the provided DataFrame.

Parameters:
  • df – DataFrame containing all columns including target column. Features are expected to be numerical.

  • target – String indicating which column is the target column.

  • class_problem – Any of [“binary”, “multiclass”, “regression”]

  • show – Whether to display the plot

  • mut_params – Dictionary passing additional arguments into sklearn’s mutual_info_classif function.

Returns: - plotly.graph_objects.Figure: The mutual information plot figure

bluecast.eda.analyse.plot_ecdf(df: pandas.DataFrame, columns: List[str | int | float], plot_all_at_once: bool = False, show: bool = True) plotly.graph_objects.Figure | List[plotly.graph_objects.Figure]

Plot the empirical cumulative distribution function (ECDF) and histogram.

Parameters:
  • df – DataFrame containing all columns including target column. Features are expected to be numerical.

  • columns – A list of column names to check.

  • plot_all_at_once – If True, plot all eCDFs in one plot. If False, plot each eCDF separately.

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure or List[plotly.graph_objects.Figure]: The ECDF figure(s)
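For readers unfamiliar with the ECDF: at each observed value x it gives the fraction of observations less than or equal to x. The following is a minimal standard-library sketch of that definition, independent of how plot_ecdf computes or draws it. The function name `ecdf` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, F(x)) pairs, where F(x) is the share of values <= x."""
    xs = sorted(values)
    n = len(xs)
    # bisect_right counts how many sorted values are <= x, which handles ties.
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]


steps = ecdf([3, 1, 2, 2])
```

Unlike a histogram, the ECDF needs no binning choice, which is why the two views complement each other.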

bluecast.eda.analyse.plot_distribution_by_time(df: pandas.DataFrame, col_to_plot: str, date_col: str, xlabel: str = 'Week', ylabel: str = 'Feature distribution', title: str = 'Weekly distribution of the feature', freq: str = 'W', show: bool = True) plotly.graph_objects.Figure

Plot the distribution of a feature over time.

Parameters:
  • df – Pandas DataFrame

  • col_to_plot – String indicating which column to plot

  • date_col – String indicating which column to use as date

  • xlabel – String indicating the x-axis label

  • ylabel – String indicating the y-axis label

  • title – String indicating the title of the plot

  • freq – Label indicating the frequency of the time grouping. Must be one of Pandas’ Offset aliases.

  • show – Whether to display the plot

Returns:

plotly.graph_objects.Figure: The time distribution figure

bluecast.eda.analyse.plot_error_distributions(df: pandas.DataFrame, target: str, prediction_error: str, num_cols_grid: int = 1, max_x_elements: int = 5) None

Plots bivariate plots for each column in the dataframe with respect to the target. Each subplot represents unique values of the target column. The ‘prediction_error’ is plotted using unique values of the target column as the hue. Param num_cols_grid specifies how many columns the grid shall have. max_x_elements determines the maximum number of unique values on the x-axis per plot.

bluecast.eda.analyse.plot_andrews_curve(df: pandas.DataFrame, target: str, n_samples: int | None = 200, random_state=500, show: bool = True) plotly.graph_objects.Figure

Plot Andrews curve.

Andrews curves help visualize whether the numerical features form inherent groupings with respect to a given grouping variable, here the target column.

Parameters:
  • df – Pandas DataFrame

  • target – String indicating the target column

  • n_samples – Int indicating how many samples shall be shown. If None, the full DataFrame is taken.

  • random_state – Random seed determining the DataFrame sampling.

  • show – Whether to display the plot

Returns:

plotly.graph_objects.Figure: The Andrews curve figure
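As background, an Andrews curve maps each row (x1, x2, x3, ...) to the function f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ..., usually evaluated for t between -pi and pi. The sketch below evaluates that standard formula for one row. It says nothing about how plot_andrews_curve scales, samples, or colours the data, and the function name `andrews_curve` is hypothetical.

```python
import math
from typing import List, Sequence


def andrews_curve(row: Sequence[float], t: float) -> float:
    """Evaluate the Andrews curve of one observation at position t."""
    value = row[0] / math.sqrt(2)
    for i, x in enumerate(row[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


row = [1.0, 2.0, 3.0]
# Sample the curve on an evenly spaced grid over [-pi, pi].
grid: List[float] = [-math.pi + k * (2 * math.pi / 100) for k in range(101)]
curve = [andrews_curve(row, t) for t in grid]
```

Rows with similar feature values produce curves that stay close together, so colouring the curves by target makes group structure visible.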

bluecast.eda.analyse.plot_distribution_pairs(df1: pandas.DataFrame, df2: pandas.DataFrame, feature: str, palette: List[str] | None = None, show: bool = True) plotly.graph_objects.Figure

Compare distributions of two datasets for a given feature.

Only the central 95% of the data is considered for the histogram.

Parameters:
  • df1 – DataFrame containing the feature.

  • df2 – Second DataFrame containing the feature for comparison.

  • feature – String indicating the feature name

  • palette – List of colors to use for the plots.

  • show – Whether to display the plot

Returns: - plotly.graph_objects.Figure: The distribution comparison figure
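One plausible reading of "the central 95%" is that values outside the 2.5th and 97.5th percentiles are excluded before the histogram is drawn. The sketch below illustrates that reading only. The symmetric cut, the interpolated percentile, and the names `percentile` and `central_range` are assumptions, not documented behaviour of plot_distribution_pairs.

```python
import math
from typing import List, Sequence


def percentile(sorted_vals: List[float], q: float) -> float:
    """Percentile with linear interpolation between neighbouring ranks."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def central_range(values: Sequence[float], coverage: float = 0.95) -> List[float]:
    """Keep only values between the lower and upper tail percentiles."""
    vals = sorted(values)
    tail = (1 - coverage) / 2  # assumption: equal share trimmed on each side
    lower = percentile(vals, tail)
    upper = percentile(vals, 1 - tail)
    return [v for v in vals if lower <= v <= upper]


values = list(range(1, 101)) + [10_000]  # one extreme outlier
kept = central_range(values)
```

Trimming the tails keeps a single extreme value from stretching the x-axis and squeezing the bulk of both distributions into one bin.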

bluecast.eda.analyse.plot_benfords_law(df: pandas.DataFrame, column: str, show: bool = True) plotly.graph_objects.Figure

Plot Benford’s Law analysis for a numerical column.

Benford’s Law states that in many naturally occurring datasets, the leading digit d (d ∈ {1, 2, …, 9}) occurs with probability: P(d) = log10(1 + 1/d)

This is useful for fraud detection and data quality analysis.

Parameters:
  • df – DataFrame containing the data

  • column – Name of the numerical column to analyze

  • show – Whether to display the plot

Returns:

plotly.graph_objects.Figure: The Benford’s Law figure
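The formula above can be checked numerically. The following sketch computes the expected Benford probabilities and the observed leading-digit frequencies of a list of numbers using only the standard library. It is an illustration of the law, not the code behind plot_benfords_law. The helper names are hypothetical, and skipping zeros and non-finite values is an assumption.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence


def benford_expected() -> Dict[int, float]:
    """Expected probability of each leading digit: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value: float) -> Optional[int]:
    """First significant digit of a number, or None if it has none."""
    value = abs(value)
    if value == 0 or math.isnan(value) or math.isinf(value):
        return None
    # Scientific notation always starts with the first significant digit.
    return int(f"{value:.17e}"[0])


def observed_frequencies(values: Sequence[float]) -> Dict[int, float]:
    """Relative frequency of each leading digit among the usable values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        return {}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


expected = benford_expected()
observed = observed_frequencies([10, 19, 250, 0])
```

Comparing `observed` against `expected` digit by digit is the basis of the plot. Large deviations are a prompt for a closer look, not proof of manipulation, since many legitimate datasets do not follow Benford's Law.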

bluecast.eda.analyse._create_gradient_bar_chart(value_counts: pandas.Series, column: str) plotly.graph_objects.Figure

Create a beautiful gradient bar chart for category frequencies.

Parameters:
  • value_counts – Series with category counts

  • column – Column name for labeling

Returns:

plotly.graph_objects.Figure with gradient bars

bluecast.eda.analyse.plot_category_frequency(df: pandas.DataFrame, column: str, max_categories: int = 20, show: bool = True) plotly.graph_objects.Figure

Create a beautiful category frequency visualization for categorical/text data.

Uses gradient colors for enhanced visual appeal. Falls back from word cloud to gradient bar chart when WordCloud library is unavailable.

Parameters:
  • df – DataFrame containing the data

  • column – Name of the categorical/text column

  • max_categories – Maximum number of categories to display

  • show – Whether to display the plot

Returns:

plotly.graph_objects.Figure: The category frequency figure

bluecast.eda.analyse.plot_missing_values_matrix(df: pandas.DataFrame, show: bool = True) plotly.graph_objects.Figure

Create a missing values matrix visualization.

Parameters:
  • df – DataFrame to analyze

  • show – Whether to display the plot

Returns:

plotly.graph_objects.Figure: The missing values matrix figure

bluecast.eda.analyse._dashboard_update_plot(plot_type: str, selected_feature: str, df: pandas.DataFrame, numeric_cols: List[str], target_col: str)

Helper function for dashboard plot updates.

Parameters:
  • plot_type – Type of plot to create

  • selected_feature – Selected feature for the plot

  • df – DataFrame containing the data

  • numeric_cols – List of numeric column names

  • target_col – Target column name

Returns:

Plotly figure object

bluecast.eda.analyse._dashboard_update_summary(selected_feature: str, df: pandas.DataFrame, numeric_cols: List[str], target_col: str | None = None)

Helper function for dashboard summary updates with dark theme styling. Shows statistics for both selected feature and target column.

Parameters:
  • selected_feature – Selected feature for the summary

  • df – DataFrame containing the data

  • numeric_cols – List of numeric column names

  • target_col – Target column name (optional)

Returns:

HTML div with tables or string message

bluecast.eda.analyse._apply_pandas_query_filter(df: pandas.DataFrame, query_text: str) pandas.DataFrame

Apply SQL-like filtering using pandas query syntax and operations.

Parameters:
  • df – DataFrame to filter

  • query_text – Query text (supports pandas query syntax or simple SQL-like syntax)

Returns:

Filtered DataFrame

bluecast.eda.analyse._create_outlier_detection_plot(df: pandas.DataFrame, target_col: str, dark_theme_layout: dict, contamination: float = 0.1) plotly.graph_objects.Figure

Create IsolationForest outlier detection plot showing outlier scores and top outliers.

Parameters:
  • df – DataFrame containing the data

  • target_col – Target column name

  • dark_theme_layout – Dark theme layout configuration

  • contamination – Expected proportion of outliers

Returns:

Plotly figure

bluecast.eda.analyse._create_benford_plot(selected_feature_x: str, df: pandas.DataFrame, numeric_cols: List[str], dark_theme_layout: dict) plotly.graph_objects.Figure

Create Benford’s Law analysis plot for regression dashboard.

bluecast.eda.analyse._create_category_frequency_plot(selected_feature_x: str, df: pandas.DataFrame, dark_theme_layout: dict) plotly.graph_objects.Figure

Create category frequency plot for regression dashboard.

bluecast.eda.analyse._create_violin_plot(selected_feature_x: str, df: pandas.DataFrame, target_col: str, dark_theme_layout: dict) plotly.graph_objects.Figure

Create violin plot by target bins for regression dashboard.

bluecast.eda.analyse._create_theil_u_plot(df: pandas.DataFrame, target_col: str, dark_theme_layout: dict, is_regression: bool = True) plotly.graph_objects.Figure

Create Theil U heatmap for categorical features including the target.

bluecast.eda.analyse._create_ecdf_plot(selected_feature_x: str, df: pandas.DataFrame, numeric_cols: List[str], dark_theme_layout: dict) plotly.graph_objects.Figure

Create ECDF analysis plot for dashboard.

bluecast.eda.analyse._dashboard_update_regression_plot(plot_type: str, selected_feature_x: str, selected_feature_y: str, df: pandas.DataFrame, numeric_cols: List[str], target_col: str)

Helper function for regression dashboard plot updates with dark theme styling.

bluecast.eda.analyse._create_benford_plot_classification(selected_feature_x: str, df: pandas.DataFrame, numeric_cols: List[str], dark_theme_layout: dict) plotly.graph_objects.Figure

Create Benford’s Law analysis plot for classification dashboard.

bluecast.eda.analyse._create_category_frequency_plot_classification(selected_feature_x: str, df: pandas.DataFrame, categorical_cols: List[str], dark_theme_layout: dict) plotly.graph_objects.Figure

Create category frequency plot for classification dashboard.

bluecast.eda.analyse._dashboard_update_classification_plot(plot_type: str, selected_feature_x: str, selected_feature_y: str, df: pandas.DataFrame, numeric_cols: List[str], categorical_cols: List[str], target_col: str)

Helper function for classification dashboard plot updates with dark theme styling.

bluecast.eda.analyse.create_eda_dashboard_regression(df: pandas.DataFrame, target_col: str, port: int = 8050, run_server: bool = True, jupyter_mode: str | None = None)

Create a Dash dashboard for regression analysis with enhanced features.

Parameters:
  • df – DataFrame to analyze

  • target_col – Target column name (should be numeric for regression)

  • port – Port number for the dashboard

  • run_server – Whether to start the server (set to False for testing)

  • jupyter_mode – Mode for Jupyter environments (“inline”, “external”, “tab”, “jupyterlab”). If None, runs as a regular server. For Kaggle/Colab, use “external”.

bluecast.eda.analyse.create_eda_dashboard_classification(df: pandas.DataFrame, target_col: str, port: int = 8050, run_server: bool = True, jupyter_mode: str | None = None)

Create a Dash dashboard for classification analysis with enhanced features.

Parameters:
  • df – DataFrame to analyze

  • target_col – Target column name (should be categorical for classification)

  • port – Port number for the dashboard

  • run_server – Whether to start the server (set to False for testing)

  • jupyter_mode – Mode for Jupyter environments (“inline”, “external”, “tab”, “jupyterlab”). If None, runs as a regular server. For Kaggle/Colab, use “external”.

bluecast.eda.analyse.create_eda_dashboard(df: pandas.DataFrame, target_col: str, port: int = 8050, run_server: bool = True, jupyter_mode: str | None = None)

Create a Dash dashboard for exploratory data analysis.

Parameters:
  • df – DataFrame to analyze

  • target_col – Target column name

  • port – Port number for the dashboard

  • run_server – Whether to start the server (set to False for testing)

  • jupyter_mode – Mode for Jupyter environments (“inline”, “external”, “tab”, “jupyterlab”). If None, runs as a regular server. For Kaggle/Colab, use “external”.