explorica.reports.presets.blocks
explorica.reports.presets.blocks.cardinality
Data cardinality block preset.
Provides a summary of feature cardinality, uniqueness, and constancy characteristics as an Explorica Block. The block helps identify constant, unique, and low-information features using multiple complementary metrics.
Functions
- get_cardinality_block(data, round_digits=4, nan_policy="drop_with_split")
Build a Block instance summarizing feature cardinality and constancy metrics.
Notes
Cardinality is described using both absolute and relative measures, including number of unique values and their ratios.
Constant and unique features are flagged explicitly via boolean indicators.
Entropy is reported in normalized form to allow comparison across features with different cardinalities.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_cardinality_block
>>> df = pd.DataFrame({
... 'a': [1, 1, 1, 1],
... 'b': [1, 2, 3, 4],
... 'c': ['x', 'x', 'y', 'y']
... })
>>> block = get_cardinality_block(df)
>>> block.block_config.title
'Cardinality'
>>> [table.title for table in block.block_config.tables]
['Constancy | uniqueness metrics']
>>> block.close_figures()
- explorica.reports.presets.blocks.cardinality.get_cardinality_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], round_digits: int = 4, nan_policy: str | Literal['drop_with_split', 'raise', 'include'] = 'drop_with_split') → Block
Generate a Block summarizing feature cardinality and constancy metrics.
This block provides an overview of uniqueness and constant-like behavior in the dataset. It is intended for exploratory data analysis and data quality assessment, helping identify features with low variability, redundant values, or high uniqueness.
The block contains a single table with the following columns:
is_unique : indicates if a feature has all unique values.
is_constant : indicates if a feature has a single unique value.
n_unique : number of unique values in the feature.
unique_ratio : ratio of unique values to the number of rows.
top_value_ratio : proportion of the most frequent value.
entropy (normalized) : Shannon entropy of the feature normalized by log2(n_unique), measuring information content and effective cardinality.
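The normalized entropy column can be reproduced by hand. A minimal sketch, where the `normalized_entropy` helper is illustrative and not part of the Explorica API:

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of `values`, normalized by log2(n_unique).

    Illustrative helper -- not part of the Explorica API.
    """
    counts = Counter(values)
    n = len(values)
    n_unique = len(counts)
    if n_unique < 2:
        # A constant series carries no information; log2(1) == 0,
        # so the normalized form is undefined (reported as NaN).
        return float("nan")
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(n_unique)

print(round(normalized_entropy([1, 2, 3, 4]), 4))  # all-unique values -> 1.0
print(round(normalized_entropy([1, 2, 2, 3]), 4))  # -> 0.9464
```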
- Parameters:
- data : Sequence[Any] or Mapping[str, Sequence[Any]]
Input dataset. Must be convertible to a pandas DataFrame. Both numeric and categorical columns are supported.
- round_digits : int, default=4
Number of decimal places to round ratio and entropy metrics.
- nan_policy : {'drop_with_split', 'raise', 'include'}, default='drop_with_split'
Policy for handling missing values:
'drop_with_split' : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.
'raise' : raise an error if any missing values are present.
'include' : treat NaN as a valid category (it counts towards uniqueness and entropy calculations).
- Returns:
- Block
An Explorica Block containing a single table with cardinality and constancy metrics for each feature.
Notes
Features with zero variance or all missing values will appear as NaN in relevant metrics.
This block complements data quality overview blocks by providing a deeper view of feature redundancy and variability.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_cardinality_block
>>> df = pd.DataFrame({
...     "A": [1, 2, 3, 4],
...     "B": [5, 5, 5, 5],
...     "C": [1, 2, 2, 3],
...     "D": [None, 1, None, 1]
... })
>>> block = get_cardinality_block(df, nan_policy="include")
>>> block.block_config.tables[0].table
   is_unique  is_constant  ...  top_value_ratio  entropy (normalized)
A       True        False  ...             0.25                1.0000
B      False         True  ...             1.00                   NaN
C      False        False  ...             0.50                0.9464
D      False        False  ...             0.50                0.0000

[4 rows x 6 columns]
>>> block.close_figures()
explorica.reports.presets.blocks.ctm module
Central tendency & dispersion block preset.
Provides a summary of the dataset’s central tendency (mean, median, mode) and dispersion (std, min, max, range) as an Explorica Block.
Functions
- get_ctm_block(data, nan_policy="drop_with_split", round_digits=4)
Build a Block instance containing tables of basic statistics for the dataset.
Notes
Designed for quick, high-level overview in Explorica reports.
Mode calculation includes categorical columns.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_ctm_block
>>> df = pd.DataFrame({'a': [1,2,3], 'b': ['x','y','z']})
>>> block = get_ctm_block(df)
>>> block.block_config.title
'Basic statistics for the dataset'
>>> [table.title for table in block.block_config.tables]
['Central tendency measures', None, 'Dispersion measures']
>>> block.close_figures()
- explorica.reports.presets.blocks.ctm.get_ctm_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], nan_policy: Literal['drop_with_split', 'raise'] = 'drop_with_split', round_digits: int = 4) → Block
Generate a Block containing central tendency statistics for a dataset.
This block provides central tendency measures and dispersion measures of a dataset, including mean, mode, median, std, min, max and range.
- Parameters:
- data : Sequence or Mapping of sequences
Input dataset. Can be a list of sequences or a dictionary of column names to sequences. Will be converted to a pandas DataFrame internally.
- nan_policy : {'drop_with_split', 'raise'}, default='drop_with_split'
Policy to handle missing values:
'drop_with_split' : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.
'raise' : raise an error if NaNs are present.
- round_digits : int, default=4
Number of decimal places to round numerical results.
- Returns:
- Block
A Block object containing the following tables:
“Central tendency measures”: mean and median for numerical columns.
“Mode”: mode for all columns, including categorical.
“Dispersion measures”: standard deviation, minimum, maximum, and range for numerical columns.
Notes
The function automatically separates numerical and categorical columns.
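The numeric/categorical split and the measures themselves can be approximated with plain pandas; this is a sketch of the idea, not the actual implementation:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "x", "z"]})

# Split columns by dtype before aggregating (the block's own selection
# logic may differ from this sketch).
numeric = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude="number")

central = numeric.agg(["mean", "median"])        # central tendency table
dispersion = numeric.agg(["std", "min", "max"])  # dispersion table
dispersion.loc["range"] = numeric.max() - numeric.min()

print(list(numeric.columns), list(categorical.columns))  # ['a'] ['b']
```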
Examples
>>> from explorica.reports.presets import get_ctm_block
>>> data = {'a': [1, 2, 3, 4], 'b': [5, 5, 6, 6]}
>>> block = get_ctm_block(data)
>>> # Contains central tendency and dispersion tables
>>> [table.title for table in block.block_config.tables]
['Central tendency measures', None, 'Dispersion measures']
>>> block.close_figures()
explorica.reports.presets.blocks.data_quality_overview
Data quality overview block preset.
Provides a quick summary of a dataset’s quality, including duplicated rows and counts/ratios of missing values. Designed for a fast, high-level exploratory analysis in Explorica reports.
Functions
- get_data_quality_overview_block(data, round_digits=4)
Build a Block instance containing metrics and a table summarizing duplicated rows and NaN counts/ratios.
Notes
Intended for quick inspection; use data_quality module for more detailed analysis.
round_digits controls numeric precision in NaN ratio calculations.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_data_quality_overview_block
>>> df = pd.DataFrame({'a': [1, None, 2], 'b': ['x', 'y', None]})
>>> block = get_data_quality_overview_block(df)
>>> block.block_config.title
'Data quality quick summary'
>>> [m['name'] for m in block.block_config.metrics]
['Duplicates rows', 'Duplicates ratio']
>>> block.block_config.tables[0].title
"NaN's count & ratio"
>>> block.close_figures()
- explorica.reports.presets.blocks.data_quality_overview.get_data_quality_overview_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], round_digits: int = 4) → Block
Generate a quick data quality overview block.
This block provides a concise summary of the dataset’s quality, including duplicated rows and missing values. It is intended for fast exploratory analysis without going into detailed data quality checks.
- Parameters:
- data : Sequence[Any] or Mapping[str, Sequence[Any]]
The input dataset. Can be a list of sequences, a dictionary of sequences, or any pandas-compatible structure convertible to a DataFrame.
- round_digits : int, default=4
Number of decimal places to round ratios (e.g., NaN ratios, duplicate ratio).
- Returns:
- Block
An Explorica Block containing:
Metrics:
“Duplicates rows”: number of duplicated rows in the dataset.
“Duplicates ratio”: ratio of duplicated rows to total rows.
Table:
"NaN's count & ratio": table summarizing count and ratio of missing values per column.
Notes
The “Duplicates ratio” is np.nan for empty datasets.
NaN ratios and counts are rounded to round_digits decimal places.
This block is meant to give a quick overview and does not replace a detailed data quality analysis.
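The underlying metrics are straightforward to reproduce with pandas; a sketch of the same computation (not Explorica's actual code):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, None], "b": ["x", "y", "y", "z"]})

n_rows = len(df)
# Fully duplicated rows (all columns equal to some earlier row).
dup_rows = int(df.duplicated().sum())
dup_ratio = round(dup_rows / n_rows, 4) if n_rows else float("nan")

# Per-column NaN counts and ratios, rounded as with round_digits=4.
nan_table = pd.DataFrame({
    "nan_count": df.isna().sum(),
    "nan_ratio": (df.isna().sum() / n_rows).round(4),
})
print(dup_rows, dup_ratio)  # 1 0.25
```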
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_data_quality_overview_block
>>> df = pd.DataFrame({
...     "a": [1, 2, 2, None],
...     "b": ["x", "y", "y", "z"]
... })
>>> block = get_data_quality_overview_block(df)
>>> metrics = {m['name']: m['value'] for m in block.block_config.metrics}
>>> metrics["Duplicates rows"]
1
>>> metrics["Duplicates ratio"]
np.float64(0.25)
>>> block.block_config.tables[0].table
   nan_count  nan_ratio
a          1       0.25
b          0       0.00
>>> block.close_figures()
explorica.reports.presets.blocks.data_shape
Data shape block preset.
Provides a quick summary of the dataset shape, including number of rows, columns, column types, and positional index check.
Functions
- get_data_shape_block(data, nan_policy="include")
Build a Block instance summarizing the dataset shape.
Notes
Designed for quick, high-level overview in Explorica reports.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_data_shape_block
>>> # Simple usage
>>> df = pd.DataFrame({'a': [1,2,3], 'b': ['x','y','z']})
>>> block = get_data_shape_block(df)
>>> block.block_config.title
'Dataset shape'
>>> block.close_figures()
- explorica.reports.presets.blocks.data_shape.get_data_shape_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], nan_policy: Literal['drop', 'raise', 'include'] = 'include') → Block
Generate a data shape block.
This block provides an overview of the dataset’s structural properties, including the number of rows and columns, the distribution of column types, and information about the dataset index.
- Parameters:
- data : Sequence[Any] or Mapping[str, Sequence[Any]]
The input dataset. Can be a list of sequences, a dictionary of sequences, or a pandas-compatible structure convertible to a DataFrame.
- nan_policy : {'drop', 'raise', 'include'}, default='include'
Policy for handling missing values:
'drop' : remove rows with NaN values before computing metrics.
'raise' : raise an error if NaN values are present.
'include' : keep rows with NaN values; they do not interfere with computation of structural metrics or column type counts.
- Returns:
- Block
An Explorica Block containing:
Metrics:
“Rows”: number of rows in the dataset.
“Columns”: number of columns in the dataset.
“Index is positional”: boolean indicating if the index behaves as a non-negative integer positional index (unique, integer, starting at 0).
Table:
“Data types”: a table summarizing the count of columns per data type, sorted descending by number of features.
Notes
The “Index is positional” metric uses a heuristic to determine if the index can be interpreted as a simple positional index, which is robust to missing rows or non-consecutive integer indices.
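One possible version of such a heuristic, shown purely for illustration (Explorica's internal check may differ):

```python
import pandas as pd

def index_is_positional(index: pd.Index) -> bool:
    """Illustrative heuristic: unique, integer-typed, non-negative,
    and starting at 0. Explorica's internal check may differ."""
    if not index.is_unique:
        return False
    if not pd.api.types.is_integer_dtype(index):
        return False
    return bool(index.min() == 0 and (index >= 0).all())

print(index_is_positional(pd.RangeIndex(5)))      # True
print(index_is_positional(pd.Index([0, 2, 3])))   # True -- gaps are tolerated
print(index_is_positional(pd.Index(["a", "b"])))  # False -- not integer-typed
```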
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_data_shape_block
>>> df = pd.DataFrame({
...     'a': [1, 2, 3],
...     'b': ['x', 'y', 'z']
... })
>>> block = get_data_shape_block(df)
>>> block.block_config.metrics
[{'name': 'Rows', 'value': 3, 'description': None}, ...]
>>> block.block_config.tables[0].table
    dtype  n_features
0   int64           1
1  object           1
>>> block.close_figures()
explorica.reports.presets.blocks.distributions
Distributions block preset.
Provides an overview of feature distributions as an Explorica Block, including numerical distribution descriptors and visual diagnostics. The block focuses on skewness, kurtosis, normality flags, and per-feature distribution visualizations.
Functions
get_distributions_block(data, threshold_skewness=0.25, threshold_kurtosis=0.25, round_digits=4, nan_policy="drop_with_split")
Build a Block instance summarizing feature distributions in a dataset.
Notes
The block operates on numerical columns only.
Skewness and excess kurtosis are used to assess distribution shape and deviation from normality.
Normality flags are derived using user-defined skewness and kurtosis thresholds.
Boxplots and distribution plots are rendered per feature; the first visualization in each group provides a group-level title.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_distributions_block
>>> df = pd.DataFrame({
... 'a': [1, 2, 3, 4, 5],
... 'b': [10, 10, 10, 10, 10]
... })
>>> block = get_distributions_block(df)
>>> block.block_config.title
'Distributions'
>>> [table.title for table in block.block_config.tables]
['Skewness and excess kurtosis']
>>> block.close_figures()
- explorica.reports.presets.blocks.distributions.get_distributions_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], threshold_skewness: float = 0.25, threshold_kurtosis: float = 0.25, round_digits: int = 4, nan_policy: Literal['drop_with_split', 'raise'] = 'drop_with_split') → Block
Generate a Block summarizing feature distributions in a dataset.
This block provides an overview of numeric features, including:
Skewness and excess kurtosis metrics in a table, with an is_normal flag according to provided thresholds.
Boxplots for all numeric features, plus individual boxplots per feature.
Distribution plots (histograms + optional KDE) for all numeric features.
The block is intended for exploratory data analysis and can be combined with other blocks (e.g., data quality, outliers) in reports.
- Parameters:
- data : Sequence[Any] or Mapping[str, Sequence[Any]]
Input dataset. Must be convertible to a pandas DataFrame. Only numeric columns are used for analysis.
- threshold_skewness : float, default=0.25
Maximum absolute skewness value for a feature to be considered approximately normal.
- threshold_kurtosis : float, default=0.25
Maximum absolute excess kurtosis value for a feature to be considered approximately normal.
- round_digits : int, default=4
Number of decimal digits for skewness and kurtosis values in the table.
- nan_policy : {'drop_with_split', 'raise'}, default='drop_with_split'
Policy to handle missing values:
'drop_with_split' : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.
'raise' : raise an error if NaNs are present.
- Returns:
- Block
An Explorica Block containing:
A table with skewness, excess kurtosis, and is_normal flags for numeric features.
Boxplots for all numeric features and individual boxplots per feature.
Distribution plots (histograms + optional KDE) for all numeric features.
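The is_normal flag can be approximated with plain pandas. Note that pandas uses bias-corrected estimators, so exact skewness/kurtosis values may differ slightly from Explorica's table; the sketch below is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 2, 3, 4, 5]})

t_skew, t_kurt = 0.25, 0.25  # the function's default thresholds
summary = pd.DataFrame({
    "skewness": df.skew().round(4),
    "kurtosis": df.kurt().round(4),  # pandas kurt() reports excess kurtosis
})
# A feature is flagged "normal" only if both shape metrics are small.
summary["is_normal"] = (
    summary["skewness"].abs().le(t_skew) & summary["kurtosis"].abs().le(t_kurt)
)
print(summary)
```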
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_distributions_block
>>> # Simple usage
>>> df = pd.DataFrame({
...     "a": [1, 2, 3, 4, 5],
...     "b": [2, 2, 3, 4, 5]
... })
>>> block = get_distributions_block(df)
>>> block.block_config.tables[0].table
   skewness  kurtosis  is_normal                       desc
a    0.0000    -1.300      False                low-pitched
b    0.3632    -1.372      False  right-skewed, low-pitched
>>> block.close_figures()
explorica.reports.presets.blocks.outliers
Outliers block preset.
Provides an overview of outliers in numerical features as an Explorica Block. The block summarizes outliers detected using multiple statistical methods, allowing comparison of their sensitivity and coverage.
Functions
- get_outliers_block(data, iqr_factor=1.5, zscore_factor=3.0, nan_policy="drop_with_split")
Build a Block instance containing a table summarizing outliers detected by different methods.
Notes
The block operates on numerical columns only.
Outliers are detected independently for each feature.
The interquartile range (IQR) method uses iqr_factor to control sensitivity to extreme values.
The Z-score method uses zscore_factor as a threshold for standardized deviation.
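Both rules are easy to reproduce by hand; a sketch of the two counts for a single series (illustrative, not Explorica's actual code):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 100], name="x")

# IQR rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR], k = iqr_factor.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
k = 1.5
iqr_outliers = int(((s < q1 - k * iqr) | (s > q3 + k * iqr)).sum())

# Z-score rule: flag values with |(x - mean) / std| > zscore_factor.
z = (s - s.mean()) / s.std()
zscore_outliers = int((z.abs() > 3.0).sum())

print(iqr_outliers, zscore_outliers)  # 1 0 -- the z-score rule misses it here
```

On tiny samples the z-score rule is much less sensitive than the IQR rule, since a single extreme value inflates the standard deviation it is measured against.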
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_outliers_block
>>> df = pd.DataFrame({
... 'a': [1, 2, 3, 100],
... 'b': [10, 11, 12, 13]
... })
>>> block = get_outliers_block(df)
>>> block.block_config.title
'Outliers'
>>> [table.title for table in block.block_config.tables]
['Count of outliers by different detection methods']
>>> block.close_figures()
- explorica.reports.presets.blocks.outliers.get_outliers_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], iqr_factor: float = 1.5, zscore_factor: float = 3.0, nan_policy: Literal['drop_with_split', 'raise'] = 'drop_with_split') → Block
Generate a Block summarizing outliers detected by different methods.
This block provides a compact overview of potential outliers in numeric features using multiple detection strategies. Currently, it includes counts of outliers detected by the IQR and Z-score methods.
If features with zero or near-zero variance are present, an additional table is included to explicitly report such features, as outliers cannot exist in constant series.
The resulting block is intended for exploratory data analysis and can be composed with other blocks (e.g., distribution or data quality blocks) in higher-level reports.
- Parameters:
- data : Sequence[Any] or Mapping[str, Sequence[Any]]
Input dataset. Must be convertible to a pandas DataFrame. Only numeric columns are considered for outlier detection.
- iqr_factor : float, default=1.5
Scaling factor used for the IQR-based outlier detection.
- zscore_factor : float, default=3.0
Threshold (in standard deviations) used for Z-score-based outlier detection.
- nan_policy : {'drop_with_split', 'raise'}, default='drop_with_split'
Policy to handle missing values:
'drop_with_split' : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.
'raise' : raise an error if NaNs are present.
- Returns:
- Block
An Explorica Block containing a single table:
“Count of outliers by different detection methods”: a table indexed by feature name, where each column corresponds to an outlier detection method.
"Features with zero or near-zero variance" (optional): a table listing features whose variance is zero or numerically close to zero. This table is included only if such features are detected in the dataset.
Notes
The block is intentionally minimal and currently focuses on outlier counts only. It is designed to be extensible, allowing additional detection methods or related summaries to be added in the future.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets.blocks import get_outliers_block
>>> df = pd.DataFrame({"x": [1, 2, 3, 100]})
>>> block = get_outliers_block(df)
>>> block.block_config.tables[0].table
   IQR (1.5)  Z-Score (3.0σ)
x          1               0
>>> block.close_figures()
explorica.reports.presets.blocks.relations_linear
Linear relations block preset.
Provides an overview of linear associations between numeric features and a specified target variable as an Explorica Block.
The block focuses on correlation-based analysis and is intended to be used as part of interaction-focused reports (e.g., Exploratory Data Analysis with a defined target). It summarizes pairwise linear relationships using both Pearson and Spearman correlation coefficients and highlights the strongest correlations involving the target.
Functions
- get_linear_relations_block(data, target=None, sample_size_threshold=5000, round_digits=4, nan_policy="drop")
Build a Block instance summarizing linear relationships in a dataset.
Notes
Only numeric features are included in the analysis.
Pearson correlation captures linear relationships under the assumption of approximately linear dependence.
Spearman correlation captures monotonic relationships and is less sensitive to outliers.
The target variable is included in correlation matrices and is required for ranking the highest correlation pairs.
Correlation significance (p-values) is not included and may be added in future releases.
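The difference between the two coefficients is easy to demonstrate with a monotonic but non-linear relationship; a small pandas sketch:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = x ** 3  # monotonic but clearly non-linear

pearson = x.corr(y, method="pearson")    # penalized by the curvature
spearman = x.corr(y, method="spearman")  # rank-based, so exactly 1.0 here
print(round(pearson, 4), round(spearman, 4))
```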
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_linear_relations_block
>>> df = pd.DataFrame({
... "x1": [1, 2, 3, 4],
... "x2": [2, 4, 6, 8],
... "x3": [4, 3, 2, 1]
... })
>>> y = pd.Series([1, 0, 1, 0], name="target")
>>> block = get_linear_relations_block(df, y)
>>> block.block_config.title
'Linear relations'
>>> block.block_config.tables[0].table
X Y coef method
0 x1 target -0.4472 pearson
1 x2 target -0.4472 pearson
2 x3 target 0.4472 pearson
3 x1 target -0.4472 spearman
4 x2 target -0.4472 spearman
>>> block.close_figures()
- explorica.reports.presets.blocks.relations_linear.get_linear_relations_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], target: Sequence[Any] | Mapping[str, Sequence[Any]] = None, sample_size_threshold: NaturalNumber = 5000, round_digits: int = 4, nan_policy: Literal['drop', 'raise'] = 'drop') → Block
Generate a Block summarizing linear relationships in a dataset.
This block provides an overview of linear associations between numeric features and a specified target variable. It includes correlation matrices (Pearson and Spearman) and a table of the highest correlation pairs ranked by absolute coefficient values.
- Parameters:
- data : Sequence[Any] or Mapping[str, Sequence[Any]]
Input dataset. Must be convertible to a pandas DataFrame. Only numeric columns are considered.
- target : Sequence[Any] or Mapping[str, Sequence[Any]], optional
Target variable for correlation analysis. Must be convertible to a pandas Series. If not provided, target-specific tables and visualizations are skipped.
- sample_size_threshold : int, default=5000
Threshold on the number of observations used to switch between scatter plots and hexbin plots for feature-target visualizations.
- round_digits : int, default=4
Number of decimal places for rounding correlation and diagnostic coefficients.
- nan_policy : {'drop', 'raise'}, default='drop'
Policy for handling missing values:
'drop' : remove rows with missing values.
'raise' : raise an error if missing values are present.
- Returns:
- Block
An Explorica Block containing:
Pearson correlation matrix between numeric features (and target if provided)
Spearman correlation matrix between numeric features (and target if provided)
Multicollinearity diagnostic table based on Variance Inflation Factor (VIF), included if number of numeric features >= 2
Multicollinearity diagnostic table based on highest pairwise correlation, included if number of numeric features >= 2
If a target is provided:
Table of highest correlation pairs (features vs target)
Feature-target visualizations (scatterplots if number of rows <= sample_size_threshold, hexbin plots otherwise)
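The VIF diagnostic can be sketched with NumPy alone: regress each feature on the others and convert the resulting R² into 1 / (1 − R²). This illustrates the metric only; it is not Explorica's implementation:

```python
import numpy as np
import pandas as pd

def vif(df: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (with an intercept).
    Illustrative sketch -- not Explorica's implementation."""
    X = df.to_numpy(dtype=float)
    out = {}
    for j, name in enumerate(df.columns):
        y = X[:, j]
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = ((y - A @ coef) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1 - ss_res / ss_tot
        out[name] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return pd.Series(out)

df = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [2, 4, 6, 8.1]})
print(vif(df))  # both VIFs are large: x1 and x2 are nearly collinear
```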
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_linear_relations_block
>>> df = pd.DataFrame({
...     "x1": [1, 2, 3, 4],
...     "x2": [2, 4, 6, 8],
...     "x3": [4, 3, 2, 1]
... })
>>> y = pd.Series([1, 0, 1, 0], name="target")
>>> block = get_linear_relations_block(df, y)
>>> block.block_config.title
'Linear relations'
>>> block.block_config.tables[0].table
    X       Y    coef    method
0  x1  target -0.4472   pearson
1  x2  target -0.4472   pearson
2  x3  target  0.4472   pearson
3  x1  target -0.4472  spearman
4  x2  target -0.4472  spearman
>>> block.close_figures()
explorica.reports.presets.blocks.relations_nonlinear
Non-linear relations block preset.
Provides a Block summarizing non-linear dependencies between numerical and categorical features using eta-squared (η²) and Cramer’s V metrics. The block includes heatmaps for both metrics and a table of top dependency pairs.
Functions
get_nonlinear_relations_block(numerical_data, categorical_data, numerical_target=None, categorical_target=None, **kwargs)
Build a Block instance summarizing non-linear dependencies between features.
Notes
Only one target type (numerical or categorical) can be provided at a time.
Non-linear dependencies are computed using η² (numerical-categorical) and Cramer’s V (categorical-categorical) only.
This function is intended for internal use in Explorica reports, but is exposed as a preset for user convenience.
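Both metrics can be computed from first principles; a sketch where the helper names are illustrative and not part of the Explorica API:

```python
import numpy as np
import pandas as pd

def eta_squared(values: pd.Series, groups: pd.Series) -> float:
    """η²: between-group sum of squares over total sum of squares."""
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = sum(
        len(g) * (g.mean() - grand_mean) ** 2
        for _, g in values.groupby(groups)
    )
    return float(ss_between / ss_total) if ss_total else float("nan")

def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Cramer's V from the chi-square statistic of a contingency table."""
    table = pd.crosstab(a, b).to_numpy()
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1  # degrees-of-freedom correction
    return float(np.sqrt(chi2 / (n * k))) if k else float("nan")

num = pd.Series([1.0, 2.0, 10.0, 11.0])
cat = pd.Series(["a", "a", "b", "b"])
print(round(eta_squared(num, cat), 4))  # ~0.9878: groups explain most variance
print(cramers_v(cat, pd.Series(["x", "x", "y", "y"])))  # 1.0: perfect association
```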
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_nonlinear_relations_block
>>> df_num = pd.DataFrame({'x1': [1,2,3], 'x2': [4,5,6]})
>>> df_cat = pd.DataFrame({'c1': ['a','b','a'], 'c2': ['x','y','x']})
>>> block = get_nonlinear_relations_block(df_num, df_cat)
>>> block.block_config.title
'Non-linear relations'
>>> block.close_figures()
- explorica.reports.presets.blocks.relations_nonlinear.get_nonlinear_relations_block(numerical_data: Sequence[Any] | Mapping[str, Sequence[Any]] = None, categorical_data: Sequence[Any] | Mapping[str, Sequence[Any]] = None, categorical_target: Sequence[Any] | Mapping[str, Sequence[Any]] = None, **kwargs) → Block
Generate a Block summarizing non-linear dependencies between features.
Computes non-linear dependency metrics between numerical and categorical features, including η² (eta squared) for numerical-categorical pairs and Cramer’s V for categorical-categorical pairs. Renders corresponding heatmaps and a table of top dependency pairs. If the dataset or target is insufficient, the block may be empty.
- Parameters:
- numerical_data : Sequence or Mapping, optional
Numerical features for dependency analysis; must be convertible to a pandas DataFrame.
- categorical_data : Sequence or Mapping, optional
Categorical features for dependency analysis; must be convertible to a pandas DataFrame.
- categorical_target : Sequence or Mapping, optional
Categorical target variable to include in the analysis.
- Returns:
- Block
An Explorica Block containing a subset of the following components, depending on the provided data and targets:
Visualizations:
η² (eta squared) dependency heatmap: added if both numerical_data and categorical_data are provided and contain at least one column each. Numerical and categorical targets, if provided, are included in the computation.
Cramer’s V dependency heatmap: added if categorical_data is provided and contains at least one column. A categorical target, if provided, is included in the computation.
Tables:
Table of highest non-linear dependency pairs: added only if categorical_target is provided. The table summarizes the strongest non-linear dependencies between features and the categorical target using η² and Cramer’s V where applicable.
If none of the above conditions are satisfied, the returned block will be empty (block.empty == True).
- Other Parameters:
- nan_policy : {'drop', 'raise'}, default='drop'
Policy for handling missing values:
'drop' : remove rows with missing values.
'raise' : raise an error if missing values are present.
- round_digits : int, default=4
Number of decimal places to round dependency coefficients in the table.
Notes
Only one target variable type (numerical or categorical) can be provided.
If no categorical target is provided, the table of top dependency pairs will be omitted.
Each component of the block (η² heatmap, Cramer’s V heatmap, top dependency table) is added only if the corresponding data and/or target are available.
If none of the conditions are satisfied, the returned block will be empty.
This block is designed for inclusion in non-linear, interaction-focused Explorica reports.
This function is designed to be tolerant to missing inputs and may return an empty block when insufficient data is provided.
Examples
>>> import pandas as pd
>>> from explorica.reports.presets import get_nonlinear_relations_block
>>> df_num = pd.DataFrame({'x1': [1, 2, 3], 'x2': [4, 5, 6]})
>>> df_cat = pd.DataFrame({'c1': ['a', 'b', 'a'], 'c2': ['x', 'y', 'x']})
>>> block = get_nonlinear_relations_block(df_num, df_cat)
>>> block.block_config.title
'Non-linear relations'
>>> block.close_figures()