Source code for explorica.reports.presets.interactions

"""
Interactions presets for Explorica reports.

This module provides high-level orchestration utilities for building
interaction-focused Explorica reports. It does not implement statistical
methods itself; instead, it coordinates feature assignment, heuristic
inference, and composition of lower-level analytical blocks.

Functions
---------
**get_interactions_blocks(data, feature_assignment=None, category_threshold=30,
round_digits=4, nan_policy="drop")**

    Build linear and non-linear interaction blocks for Explorica reports.

**get_interactions_report(data, feature_assignment=None, category_threshold=30,
round_digits=4, nan_policy="drop")**

    Generate an interaction analysis report.

Notes
-----
- This module is pandas-based and expects tabular, column-addressable
  input data (``DataFrame`` or mapping of column names to sequences).
- User-provided ``FeatureAssignment`` objects always take precedence over
  heuristic feature and target inference.
- Only non-empty blocks are included in the final report.
- An empty result indicates insufficient information for interaction
  analysis rather than an execution error.

Examples
--------
>>> import pandas as pd
>>> from explorica.reports.presets import get_interactions_report
>>> # Simple usage
>>> df = pd.DataFrame({
...     "x1": [1, 2, 3, 4],
...     "x2": [10, 20, 30, 40],
...     "c1": ["a", "b", "a", "b"],
...     "y": [0, 1, 0, 1]
... })
>>> report = get_interactions_report(df)
>>> report.title
'Interaction analysis'
>>> report.close_figures()
"""

import warnings
from typing import Any, Hashable, Mapping, Sequence

import pandas as pd

from ..._utils import convert_dataframe
from ..core.block import Block
from ..core.report import Report
from ..utils import _split_features_by_assignment, normalize_assignment
from .blocks import get_linear_relations_block, get_nonlinear_relations_block

__all__ = [
    "get_interactions_blocks",
    "get_interactions_report",
]



[docs]
def get_interactions_blocks(
    data: pd.DataFrame | Mapping[str, Sequence[Any]],
    numerical_names: list[Hashable] = None,
    categorical_names: list[Hashable] = None,
    target_name: Hashable = None,
    **kwargs,
) -> list[Block]:
    """
    Generate linear and non-linear interaction blocks for Explorica reports.

    This function orchestrates the creation of two main blocks:

    1. Linear relations block,
       summarizing correlations and multicollinearity diagnostics.
    2. Non-linear relations block, summarizing eta² (numerical-categorical)
       and Cramer's V (categorical-categorical) dependencies.

    Parameters
    ----------
    data : pandas.DataFrame or Mapping[str, Sequence[Any]]
        Input dataset containing features and optionally target columns.
    numerical_names : list[Hashable], optional
        Names of numerical feature columns. If not provided, numerical features
        are inferred from column dtypes.
    categorical_names : list[Hashable], optional
        Names of categorical feature columns. If not provided, categorical
        features are inferred using cardinality-based heuristics.
    target_name : Hashable, optional
        Name of the target column in `data`. If provided and explicit target
        names are not specified, its type and cardinality are used to infer
        whether it should be treated as numerical, categorical, or both.

    Returns
    -------
    list[Block]
        List of generated Explorica `Block` instances:

        - The linear relations block is always included.
        - The non-linear relations block is included only if it contains
          metrics, visualizations, or tables (otherwise it is omitted).

    Other Parameters
    ----------------
    target_numerical_name : Hashable, optional
        Explicit name of the numerical target column.
        Takes precedence over heuristic inference.
    target_categorical_name : Hashable, optional
        Explicit name of the categorical target column.
        Takes precedence over heuristic inference.
    categorical_threshold : int, default=30
        Maximum number of unique values for a column to be considered
        categorical during heuristic inference.
    round_digits : int, default=4
        Number of decimal places to round coefficients in tables.
    nan_policy : {'drop', 'raise'}, default='drop'
        Policy for handling missing values:

        - 'drop' : remove rows containing NaNs.
        - 'raise': raise an error if missing values are present.

    Notes
    -----
    - Explicitly provided feature and target names always take precedence over
      heuristic inference.
    - Features may appear in both numerical and categorical sets if applicable.
    - This function is intended for EDA and interaction analysis purposes.
    - During the construction of EDA or interaction reports, many matplotlib figures
      may be opened (one per plot or table visualization). This is expected behavior
      when the dataset contains many features.
    - To prevent runtime warnings about too many open figures, these warnings are
      ignored internally.
    - To free memory after rendering, it is recommended to explicitly close figures:

      .. code-block:: python

          report = get_eda_report(df)
          report.render()
          report.close_figures()

      Or for individual blocks:

      .. code-block:: python

          block.close_figures()

    Examples
    --------
    >>> import pandas as pd
    >>> from explorica.reports.presets import get_interactions_blocks
    >>> df = pd.DataFrame({
    ...     "x1": [1, 2, 3, 4],
    ...     "x2": [10, 20, 30, 40],
    ...     "c1": ["a", "b", "a", "b"],
    ...     "y": [0, 1, 0, 1]
    ... })
    >>> blocks = get_interactions_blocks(
    ...     df,
    ...     numerical_names=["x1", "x2"],
    ...     categorical_names=["c1"],
    ...     target_name="y"
    ... )
    >>> len(blocks)
    2
    >>> [i.block_config.title for i in blocks]
    ['Linear relations', 'Non-linear relations']
    """
    other_params = {
        "target_numerical_name": kwargs.get("target_numerical_name", None),
        "target_categorical_name": kwargs.get("target_categorical_name", None),
        "categorical_threshold": kwargs.get("categorical_threshold", 30),
        "round_digits": kwargs.get("round_digits", 4),
        "nan_policy": kwargs.get("nan_policy", "drop"),
    }

    df = convert_dataframe(data)
    feature_assignment = normalize_assignment(
        df,
        numerical_names,
        categorical_names,
        numerical_target=other_params["target_numerical_name"],
        categorical_target=other_params["target_categorical_name"],
        target_name=target_name,
    )
    # Split df by assignments
    df_num, df_cat, target_num, target_cat = _split_features_by_assignment(
        df,
        feature_assignment,
        categorical_threshold=other_params["categorical_threshold"],
    )
    # We ignore mpl runtime warnings because EDA reports may open many figures.
    # It's assumed, that the user use ``Report.close_figures()``
    # and ``Block.close_figures`` after rendering
    blocks = []
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            module="explorica.visualizations",
            category=RuntimeWarning,
            message="More than 20 figures have been opened.",
        )

        if not df_num.empty:
            blocks.append(
                get_linear_relations_block(
                    df_num,
                    target_num,
                    round_digits=other_params["round_digits"],
                    nan_policy=other_params["nan_policy"],
                )
            )
        # Combine numerical features and numerical target for the nonlinear block.
        nonlinear_numerical = pd.concat([df_num, target_num], axis=1)
        block_nonlinear_rels = get_nonlinear_relations_block(
            nonlinear_numerical,
            df_cat,
            categorical_target=target_cat,
            round_digits=other_params["round_digits"],
            nan_policy=other_params["nan_policy"],
        )
        # block_nonlinear_rels can be empty
        if not block_nonlinear_rels.empty:
            blocks.append(block_nonlinear_rels)
    return blocks




[docs]
def get_interactions_report(
    data: pd.DataFrame | Mapping[str, Sequence[Any]],
    numerical_names: list[Hashable] = None,
    categorical_names: list[Hashable] = None,
    target_name: Hashable = None,
    **kwargs,
) -> Report:
    """
    Generate an interaction analysis report.

    This function is a high-level orchestrator that constructs an Explorica
    `Report` focused on feature interactions. It delegates feature selection,
    target assignment, and block composition to `get_interactions_blocks`,
    and wraps the resulting blocks into a single report.

    Parameters
    ----------
    data : pd.DataFrame or Mapping[str, Sequence[Any]]
        Input dataset containing features and optionally target columns.
    numerical_names : list[Hashable], optional
        Names of numerical feature columns. If not provided, numerical features
        are inferred from column dtypes.
    categorical_names : list[Hashable], optional
        Names of categorical feature columns. If not provided, categorical
        features are inferred using cardinality-based heuristics.
    target_name : Hashable, optional
        Name of the target column in `data`. If provided and explicit target
        names are not specified, its type and cardinality are used to infer
        whether it should be treated as numerical, categorical, or both.

    Returns
    -------
    Report
        An Explorica `Report` titled ``"Interaction analysis"`` containing
        zero or more blocks describing linear and non-linear feature
        interactions.

        The report may include:

        - A linear relations block (correlations, multicollinearity diagnostics,
          and feature–target visualizations).
        - A non-linear relations block (η² and Cramer's V dependency analysis).

        Only non-empty blocks are included in the report. If no interaction
        blocks can be constructed from the provided data and assignments,
        the report may be empty.

    Other Parameters
    ----------------
    target_numerical_name : Hashable, optional
        Explicit name of the numerical target column.
        Takes precedence over heuristic inference.
    target_categorical_name : Hashable, optional
        Explicit name of the categorical target column.
        Takes precedence over heuristic inference.
    categorical_threshold : int, default=30
        Maximum number of unique values for a column to be considered
        categorical during heuristic inference.
    round_digits : int, default=4
        Number of decimal places to round coefficients in all included blocks.
    nan_policy : {'drop', 'raise'}, default='drop'
        Policy for handling missing values across all blocks:

        - 'drop' : remove rows containing NaNs.
        - 'raise': raise an error if missing values are present.

    See Also
    --------
    get_interactions_blocks
        Constructs the individual interaction blocks used in the report.

    Notes
    -----
    - This function does not perform any analysis itself; it only orchestrates
      block construction and report assembly.
    - Explicitly provided feature and target names always take precedence over
      heuristic inference.
    - The presence and contents of each block depend on the availability of
      numerical and categorical features and on whether target variables are
      provided.
    - An empty report indicates insufficient information to compute interaction
      metrics, not an execution error.
    - During the construction of EDA or interaction reports, many matplotlib figures
      may be opened (one per plot or table visualization). This is expected behavior
      when the dataset contains many features.
    - To prevent runtime warnings about too many open figures, these warnings are
      ignored internally.
    - To free memory after rendering, it is recommended to explicitly close figures:

      .. code-block:: python

          report = get_eda_report(df)
          report.render()
          report.close_figures()

      Or for individual blocks:

      .. code-block:: python

          block.close_figures()

    Examples
    --------
    >>> import pandas as pd
    >>> from explorica.reports.presets import get_interactions_report
    >>> df = pd.DataFrame({
    ...     "x1": [1, 2, 3, 4],
    ...     "x2": [10, 20, 30, 40],
    ...     "c1": ["a", "b", "a", "b"],
    ...     "y": [0, 1, 0, 1],
    ... })
    >>> # Automatic feature and target inference
    >>> report = get_interactions_report(df, target_name="y")
    >>> len(report.blocks) > 0
    True

    >>> # Explicit feature assignment
    >>> report = get_interactions_report(
    ...     df,
    ...     numerical_names=["x1", "x2"],
    ...     categorical_names=["c1"],
    ...     target_name="y",
    ... )
    >>> report.title
    'Interaction analysis'

    >>> # Explicit target specification via kwargs
    >>> report = get_interactions_report(
    ...     df,
    ...     numerical_names=["x1", "x2"],
    ...     categorical_names=["c1"],
    ...     target_numerical_name="y",
    ... )
    >>> report.blocks
    [...]
    >>> report.close_figures()
    """
    blocks = get_interactions_blocks(
        data,
        numerical_names=numerical_names,
        categorical_names=categorical_names,
        target_name=target_name,
        **kwargs,
    )
    return Report(blocks, title="Interaction analysis")