explorica

explorica.types

Type utilities and descriptors used throughout the Explorica framework.

This module defines shared type structures that provide a consistent interface for handling visualization outputs, validating numeric inputs, and enabling stronger semantic typing across Explorica components. These types are not intended to replace Python’s built-in types; rather, they serve as lightweight descriptors and containers that improve clarity, correctness, and interoperability inside the framework.

Classes

VisualizationResult

Container object returned by all visualization functions. Stores the generated figure, axes (if applicable), engine metadata, sizing information, and arbitrary additional details.

TableResult

Standardized container for tabular results in Explorica.

NaturalNumber

Type descriptor enabling isinstance(x, NaturalNumber) checks for positive integers (natural numbers). Used in parameter validation across plotting and preprocessing utilities.

FeatureAssignment

Container for explicit assignment of feature types in a dataset.

Notes

  • The types defined in this module are part of the public API. They are designed to be stable, lightweight, and safe for direct user interaction.

  • NaturalNumber is implemented via a metaclass and acts like a pseudo-type. It cannot be instantiated and carries no behavior beyond validation.

  • Additional pseudo-types and data descriptors may be added in future releases to support broader patterns of type-driven validation within Explorica.

Examples

>>> # Using `VisualizationResult`
>>> from explorica import visualizations
>>> res = visualizations.scatterplot(
...     [1, 2, 3], [3, 2, 1], title="Example"
... )
>>> res.figure          # access underlying Matplotlib or Plotly figure
<Figure ...>
>>> res.axes            # Matplotlib only
<Axes: ...>
>>> res.title
'Example'
>>> # Using `NaturalNumber`
>>> from explorica.types import NaturalNumber
>>> isinstance(5, NaturalNumber)
True
>>> isinstance(5.0, NaturalNumber)
True
>>> isinstance(-1, NaturalNumber)
False
>>> isinstance("3", NaturalNumber)
False
class explorica.types.FeatureAssignment(numerical_features: list[str] = <factory>, categorical_features: list[str] = <factory>, numerical_target: str = None, categorical_target: str = None)

Bases: object

Container for explicit assignment of feature types in a dataset.

This dataclass allows the user to specify which columns correspond to numerical features, categorical features, and potential target variables (numerical or categorical). It is primarily used by Explorica orchestrators to extract and process features appropriately during report generation.

Attributes:
numerical_features : list of str, default=[]

List of column names in the dataset representing numerical features.

categorical_features : list of str, default=[]

List of column names in the dataset representing categorical features.

numerical_target : str, optional

Name of the numerical target variable column.

categorical_target : str, optional

Name of the categorical target variable column.

Notes

  • At most one target may be specified: either numerical_target or categorical_target, not both.
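The one-target rule can be illustrated with a small sketch. The documentation does not say where or how Explorica enforces this rule, so FeatureAssignmentSketch (a stand-in that mirrors the documented fields) and resolve_target are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureAssignmentSketch:
    """Stand-in mirroring the documented fields (not the Explorica class)."""

    numerical_features: list[str] = field(default_factory=list)
    categorical_features: list[str] = field(default_factory=list)
    numerical_target: Optional[str] = None
    categorical_target: Optional[str] = None


def resolve_target(assignment):
    """Hypothetical helper: return (column, kind) for the single target, or None."""
    has_num = assignment.numerical_target is not None
    has_cat = assignment.categorical_target is not None
    if has_num and has_cat:
        # Assumption: a conflicting assignment is treated as an error.
        raise ValueError("Specify numerical_target or categorical_target, not both.")
    if has_num:
        return assignment.numerical_target, "numerical"
    if has_cat:
        return assignment.categorical_target, "categorical"
    return None
```

Under these assumptions, an assignment with only numerical_target="y" resolves to ("y", "numerical"), and an assignment with no target resolves to None.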

Examples

>>> import pandas as pd
>>> from explorica.types import FeatureAssignment
>>>
>>> # Simple usage
>>> df = pd.DataFrame({
...     "cat_feat": ["A", "A", "B", "B", "A"],
...     "num_feat1": [35.2, 52000, 34, 243, 3.2],
...     "num_feat2": [1.2, 3.2, 3.5, 0.2, 99],
...     "y": [102, 230, 23, 20, 302],
... })
>>> assignment = FeatureAssignment(
...     numerical_features=["num_feat1", "num_feat2"],
...     categorical_features=["cat_feat"],
...     numerical_target="y",
... )
categorical_features: list[str]
categorical_target: str = None
numerical_features: list[str]
numerical_target: str = None
class explorica.types.NaturalNumber

Bases: object

Type descriptor for natural numbers (positive integers).

This class defines a numeric type representing natural numbers, i.e., positive integers greater than zero. It is primarily intended for type checking in Explorica and related libraries, and can be used wherever one wants to enforce that a numeric input is a natural number.

The descriptor implements the __instancecheck__ protocol, so that isinstance(value, NaturalNumber) returns True if and only if value satisfies all of the following:

  1. It is an int or a float.

  2. It is strictly greater than zero.

  3. It represents a whole number (integer value).

Notes

  • Floats are accepted only if they are exact integers, e.g. 1.0.

  • Non-numeric types (str, list, etc.) always return False.

  • Zero and negative numbers are not considered natural numbers.

  • This class is a type descriptor, not a numeric class. It cannot be instantiated or used for arithmetic. Its purpose is type validation.

  • The class is singleton-like: all checks are done via the metaclass, so isinstance works directly on the class.
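The metaclass mechanism described above can be sketched as follows. This is a minimal illustration of the __instancecheck__ protocol, not Explorica's actual source: NaturalNumberMeta, NaturalNumberSketch, and check_bins are hypothetical names, and the way instantiation is blocked is an assumption.

```python
class NaturalNumberMeta(type):
    """Hypothetical metaclass; the real Explorica implementation may differ."""

    def __instancecheck__(cls, value):
        # Rule 1: only int and float are considered.
        # Note: bool is a subclass of int, so True passes this sketch; the
        # documentation does not say how Explorica treats booleans.
        if isinstance(value, int):
            # Rule 2: strictly positive (ints are whole by definition).
            return value > 0
        if isinstance(value, float):
            # Rules 2 and 3: strictly positive and a whole number.
            # NaN and infinity fail one of the two checks.
            return value > 0 and value.is_integer()
        return False


class NaturalNumberSketch(metaclass=NaturalNumberMeta):
    """Stand-in pseudo-type used only through isinstance."""

    def __new__(cls, *args, **kwargs):
        # Assumption: instantiation is rejected with a TypeError.
        raise TypeError("NaturalNumberSketch cannot be instantiated")


def check_bins(n_bins):
    """Hypothetical validation helper showing typical use."""
    if not isinstance(n_bins, NaturalNumberSketch):
        raise ValueError("n_bins must be a natural number")
    return int(n_bins)
```

Because isinstance delegates to the metaclass of its second argument, the check runs against the class itself and no instance is ever created.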

Examples

>>> from explorica.types import NaturalNumber
>>> isinstance(1, NaturalNumber)
True
>>> isinstance(1.0, NaturalNumber)
True
>>> isinstance(0, NaturalNumber)
False
>>> isinstance(-5, NaturalNumber)
False
>>> isinstance(3.14, NaturalNumber)
False
>>> isinstance("5", NaturalNumber)
False
class explorica.types.TableResult(table: DataFrame, title: str | None = None, description: str | None = None, render_extra: dict | None = <factory>)

Bases: object

Standardized container for tabular results in Explorica.

This class represents a structured tabular artifact produced during exploratory data analysis (EDA), such as summary statistics, quality diagnostics, or interaction analysis results.

At the current stage, TableResult serves as a lightweight wrapper around a pandas DataFrame.

Parameters:
table : pandas.DataFrame

Tabular data. The DataFrame is expected to use a flat structure: no MultiIndex on rows and no MultiIndex on columns.

title : str, optional

Short human-readable title describing the table contents.

description : str, optional

Longer description providing context or interpretation guidelines for the table.

render_extra : dict, optional

Optional dictionary controlling rendering behavior for this table. Keys may include:

  • show_index : bool, default True - whether to display the row index in rendered output (HTML or PDF).

  • show_columns : bool, default True - whether to display column names.

  • Additional rendering hints may be added in the future.
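Since TableResult itself does not render anything, a consumer of render_extra has to fill in the documented defaults. The sketch below shows one way to do that; resolve_render_options and DEFAULT_RENDER_OPTIONS are hypothetical names, not part of Explorica.

```python
# Defaults taken from the documented keys of render_extra.
DEFAULT_RENDER_OPTIONS = {"show_index": True, "show_columns": True}


def resolve_render_options(render_extra=None):
    """Hypothetical helper: merge user hints over the documented defaults."""
    options = dict(DEFAULT_RENDER_OPTIONS)
    # Unknown keys are kept as-is, so future rendering hints pass through.
    options.update(render_extra or {})
    return options
```

A renderer built on pandas could then, for example, forward show_index and show_columns to the index and header arguments of DataFrame.to_html; whether Explorica's renderers do so is not stated here.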

Attributes:
description
title

Notes

  • TableResult is a passive data container and does not implement analytical logic or rendering behavior.

  • This class may be extended in the future to include additional metadata (e.g., per-column annotations or semantic roles).

Examples

>>> import pandas as pd
>>> from explorica.types import TableResult
>>> # Simple usage
>>> df = pd.DataFrame({
...     "feature": ["age", "income"],
...     "mean": [35.2, 52000],
...     "std": [8.1, 12000],
... })
>>> table = TableResult(
...     table=df,
...     title="Feature Summary Statistics",
...     description="Basic central tendency and dispersion measures."
... )
description: str | None = None
render_extra: dict | None
table: DataFrame
title: str | None = None
class explorica.types.VisualizationResult(figure: Figure, axes: Axes | None = None, engine: str = 'matplotlib', width: int | None = None, height: int | None = None, title: str | None = None, extra_info: dict = None)

Bases: object

Standardized container for the output of all Explorica visualization functions.

This dataclass provides a unified structure for accessing the generated figure, axes, metadata, and rendering backend. All visualization functions across Explorica return a VisualizationResult, ensuring that users always interact with plots through a consistent and predictable interface.

The container is engine-agnostic and supports both Matplotlib and Plotly. This allows downstream processing, inspection, chaining, or exporting of visualizations without needing to know which plotting backend produced them.

Parameters:
figure : matplotlib.figure.Figure or plotly.graph_objects.Figure

The figure object produced by the visualization function.

  • For Matplotlib, this is an instance of matplotlib.figure.Figure.

  • For Plotly, this is an instance of plotly.graph_objects.Figure.

axes : matplotlib.axes.Axes or None, default=None

The primary axes object when using Matplotlib. Plotly visualizations do not use axes and set this attribute to None.

engine : {'matplotlib', 'plotly'}, default='matplotlib'

Name of the plotting engine used to generate the visualization. Useful for backend-specific post-processing.

width : int or None

Width of the figure:

  • Measured in inches for Matplotlib.

  • Measured in pixels for Plotly.

height : int or None

Height of the figure:

  • Measured in inches for Matplotlib.

  • Measured in pixels for Plotly.

title : str or None

Title of the generated visualization. This duplicates the title set on the underlying figure for convenience and consistency.

extra_info : dict or None

Optional metadata dictionary containing additional details about the visualization. Typical fields may include:

  • 'palette': the color palette used,

  • 'trendline': trendline model or parameters (for scatterplots),

  • 'layout': Plotly layout overrides,

  • 'transform': preprocessing steps applied to input data,

  • or any backend-specific information used for reproducibility.
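Because width and height use different units per engine, code that compares sizes across backends needs a conversion step. The following is a minimal sketch under stated assumptions: size_in_pixels is a hypothetical helper, and the dpi default of 100 matches Matplotlib's default figure DPI but may not match the DPI of a given figure.

```python
def size_in_pixels(result, dpi=100):
    """Hypothetical helper: return (width, height) in pixels, or None if unset.

    `result` is any object exposing `engine`, `width`, and `height` as
    documented for VisualizationResult.
    """
    if result.width is None or result.height is None:
        return None
    if result.engine == "matplotlib":
        # Matplotlib sizes are in inches; convert using the assumed DPI.
        return round(result.width * dpi), round(result.height * dpi)
    # Plotly sizes are already in pixels.
    return result.width, result.height
```

For a Matplotlib result, passing dpi=result.figure.dpi would use the figure's actual resolution instead of the assumed default.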

Attributes:
axes
extra_info
height
title
width

Notes

VisualizationResult provides a unified interface for interacting with figures generated by different plotting engines. While the returned object always contains a figure-like object in figure, its behavior depends on the backend:

  • matplotlib: figure is a matplotlib.figure.Figure, and axes contains the primary matplotlib.axes.Axes instance.

  • plotly: figure is a plotly.graph_objects.Figure, and axes is always None.

Common operations such as figure.show() work for both backends, though the resulting UI differs (Matplotlib uses the local renderer or notebook backend; Plotly opens an interactive HTML-based viewer).

Backend-specific methods remain available. For example:

  • Matplotlib: figure.savefig(...)

  • Plotly: figure.write_html(...) or figure.to_json()

This design allows both standardized downstream usage (e.g. consistent access to title or extra_info) and full access to the native API of the underlying visualization engine.
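The backend-specific methods listed above suggest a simple dispatch on the engine attribute. The sketch below assumes only the documented attributes (engine and figure) and the native savefig and write_html methods; export_figure is a hypothetical helper, not an Explorica function.

```python
def export_figure(result, path):
    """Hypothetical helper: save result.figure using its backend's native API."""
    if result.engine == "matplotlib":
        # matplotlib.figure.Figure.savefig infers the format from the extension.
        result.figure.savefig(path)
    elif result.engine == "plotly":
        # plotly.graph_objects.Figure.write_html writes a standalone HTML file.
        result.figure.write_html(path)
    else:
        raise ValueError(f"Unsupported engine: {result.engine!r}")
    return path
```

Called as export_figure(result, "plot.png") for a Matplotlib result or export_figure(result, "plot.html") for a Plotly result, the helper keeps calling code free of backend checks.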

Examples

>>> # Basic usage with a Matplotlib-based visualization:
>>> from explorica import visualizations
>>> result = visualizations.scatterplot(
...     [1, 2, 3], [2, 4, 6], title="Demo Plot")
>>> result.figure        # Matplotlib Figure
<Figure ...>
>>> result.axes          # Matplotlib Axes
<Axes: ...>
>>> result.engine
'matplotlib'
>>> result.title
'Demo Plot'
>>> # Basic usage with a Plotly-based visualization:
>>> result = visualizations.mapbox(
...     lat=[34.05, 40.71],
...     lon=[-118.24, -74.00],
...     title="Cities Map"
... )
>>> result.figure        # Plotly Figure
Figure(...)
>>> result.axes is None
True
>>> result.engine
'plotly'
>>> result.title
'Cities Map'
>>> # Accessing extended metadata:
>>> result.extra_info
axes: Axes | None = None
engine: str = 'matplotlib'
extra_info: dict = None
figure: Figure
height: int | None = None
title: str | None = None
width: int | None = None