explorica.reports.presets.blocks

explorica.reports.presets.blocks.cardinality

Data cardinality block preset.

Provides a summary of feature cardinality, uniqueness, and constancy characteristics as an Explorica Block. The block helps identify constant, unique, and low-information features using multiple complementary metrics.

Functions

get_cardinality_block(data, round_digits=4, nan_policy="drop_with_split")

Build a Block instance summarizing feature cardinality and constancy metrics.

Notes

  • Cardinality is described using both absolute and relative measures, including number of unique values and their ratios.

  • Constant and unique features are flagged explicitly via boolean indicators.

  • Entropy is reported in normalized form to allow comparison across features with different cardinalities.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_cardinality_block
>>> df = pd.DataFrame({
...     'a': [1, 1, 1, 1],
...     'b': [1, 2, 3, 4],
...     'c': ['x', 'x', 'y', 'y']
... })
>>> block = get_cardinality_block(df)
>>> block.block_config.title
'Cardinality'
>>> [table.title for table in block.block_config.tables]
['Constancy | uniqueness metrics']
>>> block.close_figures()
explorica.reports.presets.blocks.cardinality.get_cardinality_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], round_digits: int = 4, nan_policy: str | Literal['drop_with_split', 'raise', 'include'] = 'drop_with_split') → Block

Generate a Block summarizing feature cardinality and constancy metrics.

This block provides an overview of uniqueness and constant-like behavior in the dataset. It is intended for exploratory data analysis and data quality assessment, helping identify features with low variability, redundant values, or high uniqueness.

The block contains a single table with the following columns:

  • is_unique : indicates if a feature has all unique values.

  • is_constant : indicates if a feature has a single unique value.

  • n_unique : number of unique values in the feature.

  • unique_ratio : ratio of unique values to the number of rows.

  • top_value_ratio : proportion of the most frequent value.

  • entropy (normalized) : Shannon entropy of the feature normalized by log2(n_unique), measuring information content and effective cardinality.
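
The normalized-entropy column follows the formula above; here is a minimal illustrative re-implementation (a sketch, not Explorica's source):

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy divided by log2(n_unique), as described above.
    Illustrative only; Explorica's exact implementation may differ."""
    counts = Counter(values)
    n = len(values)
    n_unique = len(counts)
    if n_unique < 2:
        return float("nan")  # undefined for constant series
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n_unique)

print(round(normalized_entropy([1, 2, 2, 3]), 4))  # 0.9464
```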

Parameters:
data : Sequence[Any] or Mapping[str, Sequence[Any]]

Input dataset. Must be convertible to a pandas DataFrame. Both numeric and categorical columns are supported.

round_digits : int, default=4

Number of decimal places to round ratio and entropy metrics.

nan_policy : {'drop_with_split', 'raise', 'include'}, default='drop_with_split'

Policy for handling missing values:

  • ‘drop_with_split’ : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.

  • ‘raise’: raise an error if any missing values are present.

  • ‘include’: treat NaN as a valid category (counts towards uniqueness and entropy calculations).
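
The column-wise behavior of 'drop_with_split' can be sketched directly with pandas (illustrative, not the library's code):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 2], "b": [1, 1, 1]})
# Each column drops its own NaNs independently, so per-feature
# sample sizes may differ -- the behavior described above.
per_column = {col: df[col].dropna() for col in df.columns}
print({col: len(s) for col, s in per_column.items()})  # {'a': 2, 'b': 3}
```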

Returns:
Block

An Explorica Block containing a single table with cardinality and constancy metrics for each feature.

Notes

  • Features with zero variance or all missing values will appear as NaN in relevant metrics.

  • This block complements data quality overview blocks by providing a deeper view of feature redundancy and variability.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_cardinality_block
>>> df = pd.DataFrame({
...     "A": [1, 2, 3, 4],
...     "B": [5, 5, 5, 5],
...     "C": [1, 2, 2, 3],
...     "D": [None, 1, None, 1]
... })
>>> block = get_cardinality_block(df, nan_policy="include")
>>> block.block_config.tables[0].table
   is_unique  is_constant  ...  top_value_ratio  entropy (normalized)
A       True        False  ...             0.25                1.0000
B      False         True  ...             1.00                   NaN
C      False        False  ...             0.50                0.9464
D      False        False  ...             0.50                0.0000

[4 rows x 6 columns]
>>> block.close_figures()

explorica.reports.presets.blocks.ctm

Central tendency & dispersion block preset.

Provides a summary of the dataset’s central tendency (mean, median, mode) and dispersion (std, min, max, range) as an Explorica Block.

Functions

get_ctm_block(data, nan_policy="drop_with_split", round_digits=4)

Build a Block instance containing tables of basic statistics for the dataset.

Notes

  • Designed for quick, high-level overview in Explorica reports.

  • Mode calculation includes categorical columns.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_ctm_block
>>> df = pd.DataFrame({'a': [1,2,3], 'b': ['x','y','z']})
>>> block = get_ctm_block(df)
>>> block.block_config.title
'Basic statistics for the dataset'
>>> [table.title for table in block.block_config.tables]
['Central tendency measures', None, 'Dispersion measures']
>>> block.close_figures()
explorica.reports.presets.blocks.ctm.get_ctm_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], nan_policy: Literal['drop_with_split', 'raise'] = 'drop_with_split', round_digits: int = 4) → Block

Generate a Block containing central tendency statistics for a dataset.

This block provides central tendency measures and dispersion measures of a dataset, including mean, mode, median, std, min, max and range.

Parameters:
data : Sequence or Mapping of sequences

Input dataset. Can be a list of sequences or a dictionary of column names to sequences. Will be converted to a pandas DataFrame internally.

nan_policy : {'drop_with_split', 'raise'}, default='drop_with_split'

Policy to handle missing values:

  • ‘drop_with_split’ : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.

  • ‘raise’ : raise an error if NaNs are present.

round_digits : int, default=4

Number of decimal places to round numerical results.

Returns:
Block

A Block object containing the following tables:

  • “Central tendency measures”: mean and median for numerical columns.

  • Mode (untitled table): mode for all columns, including categorical.

  • “Dispersion measures”: standard deviation, minimum, maximum, and range for numerical columns.
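
The measures these tables report map onto standard pandas operations; a minimal sketch (the exact estimators Explorica uses, e.g. the ddof for std, are an assumption):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
stats = {
    "mean": s.mean(),             # 2.5
    "median": s.median(),         # 2.5
    "mode": s.mode().iloc[0],     # first mode if several exist
    "std": round(s.std(), 4),     # sample std (ddof=1); an assumption
    "min": s.min(),
    "max": s.max(),
    "range": s.max() - s.min(),   # 3
}
```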

Notes

  • The function automatically separates numerical and categorical columns.

Examples

>>> from explorica.reports.presets import get_ctm_block
>>> data = {'a': [1, 2, 3, 4], 'b': [5, 5, 6, 6]}
>>> block = get_ctm_block(data)
>>> # Contains central tendency and dispersion tables
>>> [table.title for table in block.block_config.tables]
['Central tendency measures', None, 'Dispersion measures']
>>> block.close_figures()

explorica.reports.presets.blocks.data_quality_overview

Data quality overview block preset.

Provides a quick summary of a dataset’s quality, including duplicated rows and counts/ratios of missing values. Designed for fast, high-level exploratory analysis in Explorica reports.

Functions

get_data_quality_overview_block(data, round_digits=4)

Build a Block instance containing metrics and a table summarizing duplicated rows and NaN counts/ratios.

Notes

  • Intended for quick inspection; use data_quality module for more detailed analysis.

  • round_digits controls numeric precision in NaN ratio calculations.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_data_quality_overview_block
>>> df = pd.DataFrame({'a': [1, None, 2], 'b': ['x', 'y', None]})
>>> block = get_data_quality_overview_block(df)
>>> block.block_config.title
'Data quality quick summary'
>>> [m['name'] for m in block.block_config.metrics]
['Duplicates rows', 'Duplicates ratio']
>>> block.block_config.tables[0].title
"NaN's count & ratio"
>>> block.close_figures()
explorica.reports.presets.blocks.data_quality_overview.get_data_quality_overview_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], round_digits: int = 4) → Block

Generate a quick data quality overview block.

This block provides a concise summary of the dataset’s quality, including duplicated rows and missing values. It is intended for fast exploratory analysis without going into detailed data quality checks.

Parameters:
data : Sequence[Any] or Mapping[str, Sequence[Any]]

The input dataset. Can be a list of sequences, a dictionary of sequences, or any pandas-compatible structure convertible to a DataFrame.

round_digits : int, default=4

Number of decimal places to round ratios (e.g., NaN ratios, duplicate ratio).

Returns:
Block

An Explorica Block containing:

  • Metrics:

    • “Duplicates rows”: number of duplicated rows in the dataset.

    • “Duplicates ratio”: ratio of duplicated rows to total rows.

  • Table:

    • “NaN’s count & ratio”: table summarizing count and ratio of missing values per column.
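
Both the metrics and the table can be reproduced with plain pandas (an illustrative sketch, not the preset's source):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, None], "b": ["x", "y", "y", "z"]})
dup_rows = int(df.duplicated().sum())      # 1 duplicated row
dup_ratio = round(dup_rows / len(df), 4)   # 0.25
nan_table = pd.DataFrame({
    "nan_count": df.isna().sum(),
    "nan_ratio": (df.isna().sum() / len(df)).round(4),
})
```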

Notes

  • The “Duplicates ratio” is np.nan for empty datasets.

  • NaN ratios and counts are rounded to round_digits decimal places.

  • This block is meant to give a quick overview and does not replace a detailed data quality analysis.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_data_quality_overview_block
>>> df = pd.DataFrame({
...     "a": [1, 2, 2, None],
...     "b": ["x", "y", "y", "z"]
... })
>>> block = get_data_quality_overview_block(df)
>>> metrics = {m['name']: m['value'] for m in block.block_config.metrics}
>>> metrics["Duplicates rows"]
1
>>> metrics["Duplicates ratio"]
np.float64(0.25)
>>> block.block_config.tables[0].table
   nan_count  nan_ratio
a          1       0.25
b          0       0.00
>>> block.close_figures()

explorica.reports.presets.blocks.data_shape

Data shape block preset.

Provides a quick summary of the dataset shape, including number of rows, columns, column types, and positional index check.

Functions

get_data_shape_block(data, nan_policy="include")

Build a Block instance summarizing the dataset shape.

Notes

  • Designed for quick, high-level overview in Explorica reports.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_data_shape_block
>>> # Simple usage
>>> df = pd.DataFrame({'a': [1,2,3], 'b': ['x','y','z']})
>>> block = get_data_shape_block(df)
>>> block.block_config.title
'Dataset shape'
>>> block.close_figures()
explorica.reports.presets.blocks.data_shape.get_data_shape_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], nan_policy: Literal['drop', 'raise', 'include'] = 'include') → Block

Generate a data shape block.

This block provides an overview of the dataset’s structural properties, including the number of rows and columns, the distribution of column types, and information about the dataset index.

Parameters:
data : Sequence[Any] or Mapping[str, Sequence[Any]]

The input dataset. Can be a list of sequences, a dictionary of sequences, or a pandas-compatible structure convertible to a DataFrame.

nan_policy : {'drop', 'raise', 'include'}, default='include'

Policy for handling missing values:

  • ‘drop’ : remove rows with NaN values before computing metrics.

  • ‘raise’: raise an error if NaN values are present.

  • ‘include’: keep rows with NaN values; they do not interfere with computation of structural metrics or column type counts.

Returns:
Block

An Explorica Block containing:

  • Metrics:

    • “Rows”: number of rows in the dataset.

    • “Columns”: number of columns in the dataset.

    • “Index is positional”: boolean indicating if the index behaves as a non-negative integer positional index (unique, integer, starting at 0).

  • Table:

    • “Data types”: a table summarizing the count of columns per data type, sorted descending by number of features.

Notes

  • The “Index is positional” metric uses a heuristic to determine if the index can be interpreted as a simple positional index, which is robust to missing rows or non-consecutive integer indices.
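
One way such a heuristic can be sketched (illustrative only, not Explorica's actual check): treat an index as positional if it is unique, integer-typed, monotonically increasing, and starts at 0, which tolerates gaps left by dropped rows:

```python
import pandas as pd

def index_is_positional(df):
    """Illustrative positional-index heuristic, not Explorica's implementation."""
    idx = df.index
    return bool(
        len(idx) > 0
        and idx.is_unique
        and pd.api.types.is_integer_dtype(idx)
        and idx.min() == 0
        and idx.is_monotonic_increasing
    )

df = pd.DataFrame({"a": [1, 2, 3]})
print(index_is_positional(df))                 # True
print(index_is_positional(df.drop(index=1)))   # True: gaps are tolerated
print(index_is_positional(df.set_index("a")))  # False: does not start at 0
```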

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_data_shape_block
>>> df = pd.DataFrame({
...     'a': [1, 2, 3],
...     'b': ['x', 'y', 'z']
... })
>>> block = get_data_shape_block(df)
>>> block.block_config.metrics
[{'name': 'Rows', 'value': 3, 'description': None}, ...]
>>> block.block_config.tables[0].table
    dtype  n_features
0   int64           1
1  object           1
>>> block.close_figures()

explorica.reports.presets.blocks.distributions

Distributions block preset.

Provides an overview of feature distributions as an Explorica Block, including numerical distribution descriptors and visual diagnostics. The block focuses on skewness, kurtosis, normality flags, and per-feature distribution visualizations.

Functions

get_distributions_block(data, threshold_skewness=0.25, threshold_kurtosis=0.25, round_digits=4, nan_policy="drop_with_split")

Build a Block instance summarizing feature distributions in a dataset.

Notes

  • The block operates on numerical columns only.

  • Skewness and excess kurtosis are used to assess distribution shape and deviation from normality.

  • Normality flags are derived using user-defined skewness and kurtosis thresholds.

  • Boxplots and distribution plots are rendered per feature; the first visualization in each group provides a group-level title.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_distributions_block
>>> df = pd.DataFrame({
...     'a': [1, 2, 3, 4, 5],
...     'b': [10, 10, 10, 10, 10]
... })
>>> block = get_distributions_block(df)
>>> block.block_config.title
'Distributions'
>>> [table.title for table in block.block_config.tables]
['Skewness and excess kurtosis']
>>> block.close_figures()
explorica.reports.presets.blocks.distributions.get_distributions_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], threshold_skewness: float = 0.25, threshold_kurtosis: float = 0.25, round_digits: int = 4, nan_policy: Literal['drop_with_split', 'raise'] = 'drop_with_split') → Block

Generate a Block summarizing feature distributions in a dataset.

This block provides an overview of numeric features, including:

  • Skewness and excess kurtosis metrics in a table, with an is_normal flag according to provided thresholds.

  • Boxplots for all numeric features, plus individual boxplots per feature.

  • Distribution plots (histograms + optional KDE) for all numeric features.

The block is intended for exploratory data analysis and can be combined with other blocks (e.g., data quality, outliers) in reports.

Parameters:
data : Sequence[Any] or Mapping[str, Sequence[Any]]

Input dataset. Must be convertible to a pandas DataFrame. Only numeric columns are used for analysis.

threshold_skewness : float, default=0.25

Maximum absolute skewness value for a feature to be considered approximately normal.

threshold_kurtosis : float, default=0.25

Maximum absolute excess kurtosis value for a feature to be considered approximately normal.

round_digits : int, default=4

Number of decimal digits for skewness and kurtosis values in the table.

nan_policy : {'drop_with_split', 'raise'}, default='drop_with_split'

Policy to handle missing values:

  • ‘drop_with_split’ : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.

  • ‘raise’ : raise an error if NaNs are present.

Returns:
Block

An Explorica Block containing:

  • A table with skewness, excess kurtosis, and is_normal flags for numeric features.

  • Boxplots for all numeric features and individual boxplots per feature.

  • Distribution plots (histograms + optional KDE) for all numeric features.
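
The skewness and excess kurtosis values reported in the table are consistent with population (biased) moment estimators; a self-contained sketch (whether Explorica uses exactly these estimators is an assumption):

```python
def skew_kurtosis(values):
    """Population skewness and excess kurtosis from raw moments.
    Illustrative sketch; reproduces the table values in the example below."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return round(m3 / m2 ** 1.5, 4), round(m4 / m2 ** 2 - 3, 4)

print(skew_kurtosis([1, 2, 3, 4, 5]))  # (0.0, -1.3)
print(skew_kurtosis([2, 2, 3, 4, 5]))  # (0.3632, -1.372)
```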

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_distributions_block
>>> # Simple usage
>>> df = pd.DataFrame({
...     "a": [1, 2, 3, 4, 5],
...     "b": [2, 2, 3, 4, 5]
... })
>>> block = get_distributions_block(df)
>>> block.block_config.tables[0].table
   skewness  kurtosis  is_normal                       desc
a    0.0000    -1.300      False                low-pitched
b    0.3632    -1.372      False  right-skewed, low-pitched
>>> block.close_figures()

explorica.reports.presets.blocks.outliers

Outliers block preset.

Provides an overview of outliers in numerical features as an Explorica Block. The block summarizes outliers detected using multiple statistical methods, allowing comparison of their sensitivity and coverage.

Functions

get_outliers_block(data, iqr_factor=1.5, zscore_factor=3.0, nan_policy="drop_with_split")

Build a Block instance containing a table summarizing outliers detected by different methods.

Notes

  • The block operates on numerical columns only.

  • Outliers are detected independently for each feature.

  • The interquartile range (IQR) method uses iqr_factor to control sensitivity to extreme values.

  • The Z-score method uses zscore_factor as a threshold for standardized deviation.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_outliers_block
>>> df = pd.DataFrame({
...     'a': [1, 2, 3, 100],
...     'b': [10, 11, 12, 13]
... })
>>> block = get_outliers_block(df)
>>> block.block_config.title
'Outliers'
>>> [table.title for table in block.block_config.tables]
['Count of outliers by different detection methods']
>>> block.close_figures()
explorica.reports.presets.blocks.outliers.get_outliers_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], iqr_factor: float = 1.5, zscore_factor: float = 3.0, nan_policy: Literal['drop_with_split', 'raise'] = 'drop_with_split') → Block

Generate a Block summarizing outliers detected by different methods.

This block provides a compact overview of potential outliers in numeric features using multiple detection strategies. Currently, it includes counts of outliers detected by the IQR and Z-score methods.

If features with zero or near-zero variance are present, an additional table is included to explicitly report such features, as outliers cannot exist in constant series.

The resulting block is intended for exploratory data analysis and can be composed with other blocks (e.g., distribution or data quality blocks) in higher-level reports.

Parameters:
data : Sequence[Any] or Mapping[str, Sequence[Any]]

Input dataset. Must be convertible to a pandas DataFrame. Only numeric columns are considered for outlier detection.

iqr_factor : float, default=1.5

Scaling factor used for the IQR-based outlier detection.

zscore_factor : float, default=3.0

Threshold (in standard deviations) used for Z-score-based outlier detection.

nan_policy : {'drop_with_split', 'raise'}, default='drop_with_split'

Policy to handle missing values:

  • ‘drop_with_split’ : Missing values are handled independently for each feature. For every column, NaNs are dropped column-wise before computing statistics. As a result, different features may be evaluated on different numbers of observations. This behavior is semantically correct in an EDA context, where preserving per-feature statistics is preferred over strict row-wise alignment.

  • ‘raise’ : raise an error if NaNs are present.

Returns:
Block

An Explorica Block containing the following tables:

  • “Count of outliers by different detection methods”: a table indexed by feature name, where each column corresponds to an outlier detection method.

  • "Features with zero or near-zero variance" (optional): a table listing features whose variance is zero or numerically close to zero. This table is included only if such features are detected in the dataset.

Notes

  • The block is intentionally minimal and currently focuses on outlier counts only. It is designed to be extensible, allowing additional detection methods or related summaries to be added in the future.
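
Both detection rules can be sketched in a few lines of pandas (illustrative; the quantile interpolation and the std's ddof used by Explorica are assumptions):

```python
import pandas as pd

def count_outliers(s, iqr_factor=1.5, zscore_factor=3.0):
    """Outlier counts under the IQR rule and the Z-score rule (sketch)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    iqr_count = int(((s < q1 - iqr_factor * iqr) | (s > q3 + iqr_factor * iqr)).sum())
    z = (s - s.mean()) / s.std(ddof=0)  # population std; an assumption
    z_count = int((z.abs() > zscore_factor).sum())
    return iqr_count, z_count

print(count_outliers(pd.Series([1, 2, 3, 100])))  # (1, 0)
```

Note how the two methods disagree on small samples: a single extreme value inflates the standard deviation enough to suppress its own Z-score, while the IQR rule still flags it.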

Examples

>>> import pandas as pd
>>> from explorica.reports.presets.blocks import get_outliers_block
>>> df = pd.DataFrame({"x": [1, 2, 3, 100]})
>>> block = get_outliers_block(df)
>>> block.block_config.tables[0].table
   IQR (1.5)  Z-Score (3.0σ)
x          1               0
>>> block.close_figures()

explorica.reports.presets.blocks.relations_linear

Linear relations block preset.

Provides an overview of linear associations between numeric features and a specified target variable as an Explorica Block.

The block focuses on correlation-based analysis and is intended to be used as part of interaction-focused reports (e.g., Exploratory Data Analysis with a defined target). It summarizes pairwise linear relationships using both Pearson and Spearman correlation coefficients and highlights the strongest correlations involving the target.

Functions

get_linear_relations_block(data, target=None, sample_size_threshold=5000, round_digits=4, nan_policy="drop")

Build a Block instance summarizing linear relationships in a dataset.

Notes

  • Only numeric features are included in the analysis.

  • Pearson correlation captures linear relationships under the assumption of approximately linear dependence.

  • Spearman correlation captures monotonic relationships and is less sensitive to outliers.

  • The target variable is included in correlation matrices and is required for ranking the highest correlation pairs.

  • Correlation significance (p-values) is not included and may be added in future releases.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_linear_relations_block
>>> df = pd.DataFrame({
...     "x1": [1, 2, 3, 4],
...     "x2": [2, 4, 6, 8],
...     "x3": [4, 3, 2, 1]
... })
>>> y = pd.Series([1, 0, 1, 0], name="target")
>>> block = get_linear_relations_block(df, y)
>>> block.block_config.title
'Linear relations'
>>> block.block_config.tables[0].table
    X       Y    coef    method
0  x1  target -0.4472   pearson
1  x2  target -0.4472   pearson
2  x3  target  0.4472   pearson
3  x1  target -0.4472  spearman
4  x2  target -0.4472  spearman
>>> block.close_figures()
explorica.reports.presets.blocks.relations_linear.get_linear_relations_block(data: Sequence[Any] | Mapping[str, Sequence[Any]], target: Sequence[Any] | Mapping[str, Sequence[Any]] = None, sample_size_threshold: NaturalNumber = 5000, round_digits: int = 4, nan_policy: Literal['drop', 'raise'] = 'drop') → Block

Generate a Block summarizing linear relationships in a dataset.

This block provides an overview of linear associations between numeric features and a specified target variable. It includes correlation matrices (Pearson and Spearman) and a table of the highest correlation pairs ranked by absolute coefficient values.

Parameters:
data : Sequence[Any] or Mapping[str, Sequence[Any]]

Input dataset. Must be convertible to a pandas DataFrame. Only numeric columns are considered.

target : Sequence[Any] or Mapping[str, Sequence[Any]], optional

Target variable for correlation analysis. Must be convertible to a pandas Series. If not provided, target-specific tables and visualizations are skipped.

sample_size_threshold : int, default=5000

Threshold on the number of observations used to switch between scatter plots and hexbin plots for feature-target visualizations.

round_digits : int, default=4

Number of decimal places for rounding correlation and diagnostic coefficients.

nan_policy : {'drop', 'raise'}, default='drop'

Policy for handling missing values:

  • ‘drop’ : remove rows with missing values.

  • ‘raise’: raise an error if missing values are present.

Returns:
Block

An Explorica Block containing:

  • Pearson correlation matrix between numeric features (and target if provided)

  • Spearman correlation matrix between numeric features (and target if provided)

  • Multicollinearity diagnostic table based on Variance Inflation Factor (VIF), included if number of numeric features >= 2

  • Multicollinearity diagnostic table based on highest pairwise correlation, included if number of numeric features >= 2

  • If a target is provided:

    • Table of highest correlation pairs (features vs target)

    • Feature-target visualizations (scatterplots if number of rows <= sample_size_threshold, hexbin plots otherwise)
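
The per-feature coefficients in the highest-correlation-pairs table can be reproduced with pandas' corrwith (a sketch of the computation, not the preset's code):

```python
import pandas as pd

df = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [2, 4, 6, 8], "x3": [4, 3, 2, 1]})
y = pd.Series([1, 0, 1, 0], name="target")
# Correlation of each feature against the target, under both methods
pearson = df.corrwith(y, method="pearson").round(4)
spearman = df.corrwith(y, method="spearman").round(4)
print(pearson["x1"], spearman["x1"])  # -0.4472 -0.4472
```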

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_linear_relations_block
>>> df = pd.DataFrame({
...     "x1": [1, 2, 3, 4],
...     "x2": [2, 4, 6, 8],
...     "x3": [4, 3, 2, 1]
... })
>>> y = pd.Series([1, 0, 1, 0], name="target")
>>> block = get_linear_relations_block(df, y)
>>> block.block_config.title
'Linear relations'
>>> block.block_config.tables[0].table
    X       Y    coef    method
0  x1  target -0.4472   pearson
1  x2  target -0.4472   pearson
2  x3  target  0.4472   pearson
3  x1  target -0.4472  spearman
4  x2  target -0.4472  spearman
>>> block.close_figures()

explorica.reports.presets.blocks.relations_nonlinear

Non-linear relations block preset.

Provides a Block summarizing non-linear dependencies between numerical and categorical features using eta-squared (η²) and Cramer’s V metrics. The block includes heatmaps for both metrics and a table of top dependency pairs.

Functions

get_nonlinear_relations_block(numerical_data, categorical_data, numerical_target=None, categorical_target=None, **kwargs)

Build a Block instance summarizing non-linear dependencies between features.

Notes

  • Only one target type (numerical or categorical) can be provided at a time.

  • Non-linear dependencies are computed using η² (numerical-categorical) and Cramer’s V (categorical-categorical) only.

  • This function is intended for internal use in Explorica reports, but is exposed as a preset for user convenience.

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_nonlinear_relations_block
>>> df_num = pd.DataFrame({'x1': [1,2,3], 'x2': [4,5,6]})
>>> df_cat = pd.DataFrame({'c1': ['a','b','a'], 'c2': ['x','y','x']})
>>> block = get_nonlinear_relations_block(df_num, df_cat)
>>> block.block_config.title
'Non-linear relations'
>>> block.close_figures()
explorica.reports.presets.blocks.relations_nonlinear.get_nonlinear_relations_block(numerical_data: Sequence[Any] | Mapping[str, Sequence[Any]] = None, categorical_data: Sequence[Any] | Mapping[str, Sequence[Any]] = None, categorical_target: Sequence[Any] | Mapping[str, Sequence[Any]] = None, **kwargs) → Block

Generate a Block summarizing non-linear dependencies between features.

Computes non-linear dependency metrics between numerical and categorical features, including η² (eta squared) for numerical-categorical pairs and Cramer’s V for categorical-categorical pairs. Renders corresponding heatmaps and a table of top dependency pairs. If the dataset or target is insufficient, the block may be empty.

Parameters:
numerical_data : Sequence or Mapping, optional

Numerical features for dependency analysis; must be convertible to a pandas DataFrame.

categorical_data : Sequence or Mapping, optional

Categorical features for dependency analysis; must be convertible to a pandas DataFrame.

categorical_target : Sequence or Mapping, optional

Categorical target variable to include in the analysis.

Returns:
Block

An Explorica Block containing a subset of the following components, depending on the provided data and targets:

Visualizations:

  • η² (eta squared) dependency heatmap: added if both numerical_data and categorical_data are provided and contain at least one column each. Numerical and categorical targets, if provided, are included in the computation.

  • Cramer’s V dependency heatmap: added if categorical_data is provided and contains at least one column. A categorical target, if provided, is included in the computation.

Tables:

  • Table of highest non-linear dependency pairs: added only if categorical_target is provided. The table summarizes the strongest non-linear dependencies between features and the categorical target using η² and Cramer’s V where applicable.

If none of the above conditions are satisfied, the returned block will be empty (block.empty == True).

Other Parameters:
nan_policy : {'drop', 'raise'}, default='drop'

Policy for handling missing values:

  • ‘drop’ : remove rows with missing values.

  • ‘raise’: raise an error if missing values are present.

round_digits : int, default=4

Number of decimal places to round dependency coefficients in the table.

Notes

  • Only one target variable type (numerical or categorical) can be provided.

  • If no categorical target is provided, the table of top dependency pairs will be omitted.

  • Each component of the block (η² heatmap, Cramer’s V heatmap, top dependency table) is added only if the corresponding data and/or target are available.

  • If none of the conditions are satisfied, the returned block will be empty.

  • This block is designed for inclusion in non-linear, interaction-focused Explorica reports.

  • This function is designed to be tolerant to missing inputs and may return an empty block when insufficient data is provided.
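
Both metrics follow their textbook definitions; a self-contained sketch of each (illustrative estimators, not Explorica's source):

```python
import math
import pandas as pd

def eta_squared(num, cat):
    """η²: between-group sum of squares over total sum of squares."""
    df = pd.DataFrame({"num": num, "cat": cat}).dropna()
    grand_mean = df["num"].mean()
    ss_total = ((df["num"] - grand_mean) ** 2).sum()
    ss_between = sum(
        len(g) * (g.mean() - grand_mean) ** 2
        for _, g in df["num"].groupby(df["cat"])
    )
    return ss_between / ss_total if ss_total else float("nan")

def cramers_v(cat_a, cat_b):
    """Cramer's V from a chi-squared statistic on the contingency table."""
    table = pd.crosstab(pd.Series(cat_a), pd.Series(cat_b))
    n = table.values.sum()
    expected = (
        table.sum(axis=1).values[:, None] * table.sum(axis=0).values[None, :] / n
    )
    chi2 = ((table.values - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return math.sqrt(chi2 / (n * k)) if k else float("nan")

print(round(eta_squared([1, 2, 10, 11], ["a", "a", "b", "b"]), 4))  # 0.9878
print(cramers_v(["x", "x", "y", "y"], ["p", "p", "q", "q"]))        # 1.0
```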

Examples

>>> import pandas as pd
>>> from explorica.reports.presets import get_nonlinear_relations_block
>>> df_num = pd.DataFrame({'x1': [1,2,3], 'x2': [4,5,6]})
>>> df_cat = pd.DataFrame({'c1': ['a','b','a'], 'c2': ['x','y','x']})
>>> block = get_nonlinear_relations_block(df_num, df_cat)
>>> block.block_config.title
'Non-linear relations'
>>> block.close_figures()