explorica.visualizations
explorica.visualizations.plots
High-level plotting utilities for Explorica visualizations.
This module provides a set of functions to generate common plots using Matplotlib, Seaborn, and Plotly. It standardizes plot outputs through the VisualizationResult dataclass and supports flexible styling, color palettes, and interactivity.
Functions
- barchart(data, category, ascending=None, horizontal=False, **kwargs)
Plots a bar chart from categorical and numerical data. Supports vertical or horizontal orientation, automatic sorting, and styling through Seaborn.
- piechart(data, category, autopct_method='value', **kwargs)
Draws a pie chart based on categorical and numerical data. Supports value, percent, or combined display on each segment.
- mapbox(lat, lon, category=None, **kwargs)
Generates an interactive geographic scatter plot using Plotly Mapbox. Supports categorical coloring, point scaling, hover labels, and Mapbox styling.
Notes
All plotting functions return an explorica.types.VisualizationResult, which provides a consistent interface for accessing the figure, axes (if applicable), plotting engine, and additional metadata.
plot_kws allows passing keyword arguments directly to the underlying plotting function used by the engine (Matplotlib, Seaborn, or Plotly). This provides fine-grained control over styling and behavior specific to that function.
Examples
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.plots import barchart
>>> # Basic vertical bar chart (Matplotlib)
>>> data = [3, 7, 5]
>>> categories = ['A', 'B', 'C']
>>> result = barchart(data, categories,
... plot_kws={'color':'skyblue', 'edgecolor':'black'})
>>> result.figure.show()
>>> # Pie chart with percentages displayed
>>> from explorica.visualizations.plots import piechart
>>> result = piechart(data, categories, autopct_method='percent')
>>> result.figure.show()
>>> result.extra_info
{'autopct_method': 'percent'}
>>> # Mapbox scatter plot with categorical coloring
>>> from explorica.visualizations.plots import mapbox
>>> lat = [34.05, 40.71, 37.77]
>>> lon = [-118.24, -74.00, -122.42]
>>> categories = ['City1', 'City2', 'City3']
>>> result = mapbox(lat, lon, category=categories)
>>> # Show interactive map with hover labels
>>> result.figure.show()
>>> # Close all mpl figures after usage
>>> plt.close('all')
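The precedence rule documented for plot_kws (user-supplied keys override internally generated defaults) can be pictured as a plain dictionary merge. The sketch below is only an illustration of that rule: merge_plot_kws and the default keys are hypothetical names, not part of Explorica.

```python
def merge_plot_kws(defaults, plot_kws=None):
    """Return a new dict in which user-supplied keys override the defaults."""
    merged = dict(defaults)          # copy so the defaults are not mutated
    merged.update(plot_kws or {})    # user-supplied keys win on conflict
    return merged


# Hypothetical defaults a plotting function might derive from its own parameters.
internal = {"alpha": 0.5, "color": "steelblue"}
final = merge_plot_kws(internal, {"color": "skyblue", "edgecolor": "black"})
print(final)
```

Copying the defaults before updating keeps the internally generated dictionary reusable across calls.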
- explorica.visualizations.plots.barchart(data: Sequence[float] | Mapping[Any, Sequence[float]], category: Sequence[Any] | Mapping[Any, Sequence[Any]], ascending: bool = None, horizontal: bool = False, **kwargs) -> VisualizationResult
Plot a Bar Chart using categorical and numerical data series.
This function creates a bar chart to visualize the relationship between categorical labels and numerical values. It supports both vertical and horizontal orientations, automatic sorting, and comprehensive styling options through integration with Seaborn’s visualization system.
Under the hood, the function uses Matplotlib's matplotlib.axes.Axes.bar and matplotlib.axes.Axes.barh functions and applies Seaborn styles for aesthetic defaults. Additional keyword arguments can be passed directly to the underlying Matplotlib calls via plot_kws. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- Parameters:
- data : Sequence[float] | Mapping[Any, Sequence[float]]
A sequence containing numerical values (bar heights).
- category : Sequence[Any] | Mapping[Any, Sequence[Any]]
A sequence containing categorical labels (bar names).
- ascending : bool, optional
If True or False, sorts the bars by value in ascending or descending order, respectively. If None (default), the original order is preserved.
- horizontal : bool, optional
If True, plots a horizontal bar chart (barh) instead of a vertical one. Defaults to False.
- opacity : float, default=0.5
Transparency of the bars (alpha value). Must be between 0 and 1.
- title : str, optional
The title of the chart. Defaults to an empty string.
- xlabel : str, optional
The label for the X-axis. Overrides the automatic label.
- ylabel : str, optional
The label for the Y-axis. Overrides the automatic label.
- figsize : tuple[float, float], optional
The Matplotlib figure size (width, height) in inches. Defaults to (10, 6).
- palette : str or dict, optional
The Seaborn/Matplotlib color palette to use for the plot.
- style : str, optional
The Matplotlib/Seaborn style to apply to the figure (e.g., 'whitegrid', 'darkgrid').
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Matplotlib functions (matplotlib.axes.Axes.bar and matplotlib.axes.Axes.barh). This allows overriding any default plotting behavior. If not provided, the function internally constructs a dictionary from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- nan_policy : {'drop', 'raise'}, default='drop'
Policy for handling NaN values in input data:
'raise' : raise ValueError if any NaNs are present in data.
'drop' : drop rows (axis=0) containing NaNs before computation. This does not drop entire columns.
- directory : str, optional
The path to the directory for saving the plot. If None (default), the plot is not saved.
- verbose : bool, optional
If True, prints messages about the saving process. Defaults to False.
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also explorica.types.VisualizationResult for full attribute details.
- Raises:
- ValueError
If the lengths of the ‘data’ and ‘category’ input series do not match. If the ‘data’ or ‘category’ input contains more than one column/dimension. If nan_policy=’raise’ and missing values (NaN/null) are found in the data.
- Warns:
- UserWarning
Raised if the input data is empty. An empty plot with a warning message will be returned in this case.
Notes
This function uses matplotlib.axes.Axes.bar and matplotlib.axes.Axes.barh under the hood. For complete parameter documentation and advanced customization options, see: matplotlib bar, matplotlib barh.
Examples
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.plots import barchart
>>>
>>>
>>> # Simple vertical Bar Chart
>>> values = [25, 40, 15, 60, 35]
>>> labels = ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry']
>>> plot = barchart(values, labels, title="Fruit sales")
>>> plot.figure.show()
>>> # Horizontal Bar Chart with descending sort:
>>> values_h = [150, 80, 220]
>>> labels_h = ['Group A', 'Group B', 'Group C']
>>> plot = barchart(values_h, labels_h,
...                 horizontal=True,
...                 ascending=False,
...                 palette='viridis')
>>> plot.figure.show()
>>> # Close all mpl figures after usage
>>> plt.close('all')
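The three-state ascending flag (True, False, or None to keep the original order) can be sketched in plain Python as follows. This is an illustrative helper under stated assumptions, not Explorica's implementation: the name sort_bars is hypothetical, and the length check simply mirrors the ValueError documented above.

```python
def sort_bars(values, labels, ascending=None):
    """Order (value, label) pairs the way the `ascending` flag describes."""
    if len(values) != len(labels):
        raise ValueError("data and category must have the same length")
    pairs = list(zip(values, labels))
    if ascending is not None:
        # list.sort() is stable, so tied values keep their original relative order
        pairs.sort(key=lambda pair: pair[0], reverse=not ascending)
    sorted_values = [value for value, _ in pairs]
    sorted_labels = [label for _, label in pairs]
    return sorted_values, sorted_labels


values_h = [150, 80, 220]
labels_h = ["Group A", "Group B", "Group C"]
print(sort_bars(values_h, labels_h, ascending=False))
```

Sorting the pairs together, rather than the values alone, keeps each label attached to its bar.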
- explorica.visualizations.plots.mapbox(lat: Sequence[float], lon: Sequence[float], category: Sequence | None = None, **kwargs) -> VisualizationResult
Display an interactive geographic scatter plot (Mapbox).
This function provides a high-level interface for visualizing spatial data using latitude and longitude coordinates. It supports categorical coloring, dynamic point sizing, custom hover labels, and Plotly Mapbox styling.
Under the hood, the function uses Plotly's plotly.express.scatter_map function and applies Plotly templates for aesthetic defaults. Additional keyword arguments can be passed directly to the underlying Plotly call via plot_kws. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- Parameters:
- lat : Sequence[float]
Latitude values for each point. Cannot contain null values.
- lon : Sequence[float]
Longitude values for each point. Must match the length of lat and cannot contain null values.
- category : Sequence[Any], optional
Categorical labels used to color the points. Must match the length of lat and lon, cannot contain nulls, and determines the number of discrete colors in the plot.
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also explorica.types.VisualizationResult for full attribute details.
- Other Parameters:
- hover_name : Sequence[Any], optional
Labels to show on hover. Must match the length of lat and lon.
- size : Sequence[float], optional
Numerical values used to scale point sizes. Must match the length of lat and lon.
- title : str, optional
Plot title.
- show_legend : bool, default=True
Whether to display the legend. Relevant only if category is provided.
- palette : Sequence[str], optional
List of colors (hex or named) for categories. If not provided, the default Plotly color sequence (px.colors.qualitative.Plotly) is used.
- opacity : float, default=0.7
Marker opacity.
- height : int, default=600
Figure height in pixels.
- width : int, default=800
Figure width in pixels.
- template : str, default="plotly_white"
Plotly template used for styling. E.g., "plotly_dark", "ggplot2", "seaborn".
- map_style : str, default="open-street-map"
Map style used for rendering the map. E.g., "carto-positron", "carto-darkmatter", "stamen-terrain", "open-street-map".
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Plotly function (plotly.express.scatter_map). This allows overriding any default plotting behavior. If not provided, the function internally constructs a dictionary from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- nan_policy : str, default="drop"
Policy for handling NaN values in input data. Supports 'drop' (removes rows with NaNs) or 'raise' (raises an error).
- directory : str or Path, optional
Path to save the figure as HTML.
- verbose : bool, default=False
Enable logging.
- Raises:
- ValueError
If lat, lon, or any optional input contains nulls, or if the input lengths do not match.
- Warns:
- UserWarning
Raised if the input data is empty. An empty plot with a warning message will be returned in this case.
Notes
This function uses plotly.express.scatter_map under the hood. For complete parameter documentation and advanced customization options, see: plotly scatter_map.
The plot is saved as an interactive HTML file when directory is set.
Color resolution is handled internally using resolve_plotly_palette.
This function is intended for rapid map-based EDA rather than full cartographic customization.
Examples
>>> from pathlib import Path
>>> from explorica.visualizations.plots import mapbox
>>>
>>>
>>> # Basic Mapbox scatter plot usage
>>> lat = [34.05, 40.71, 37.77]
>>> lon = [-118.24, -74.00, -122.42]
>>> result = mapbox(lat, lon, title="Major US Cities")
>>> result.figure.show()
>>> # Mapbox scatter plot with categorical coloring usage
>>> lat = [34.05, 40.71, 37.77, 51.50]
>>> lon = [-118.24, -74.00, -122.42, -0.12]
>>> category = ["US", "US", "US", "UK"]
>>> result = mapbox(lat, lon, category=category, title="USA vs UK Cities")
>>> result.figure.show()
>>> # HTML saving example
>>> lat = [34.05, 40.71]
>>> lon = [-118.24, -74.00]
>>> result = mapbox(
...     lat, lon,
...     plot_kws={"zoom": 4},
...     directory="./plots",
...     title="Saved Map"
... )
>>> result.figure.show()
>>> # Check that the file was actually created
>>> Path("./plots/Saved Map.html").exists()
True
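The input requirements listed under Raises (equal lengths, no nulls in lat, lon, or any optional sequence) could be checked along the lines of the sketch below. It is a plain-Python illustration only: validate_points is a hypothetical helper, and how such a check interacts with nan_policy="drop" inside Explorica is not specified here.

```python
import math


def validate_points(lat, lon, **optional):
    """Check lengths and nulls for coordinate inputs; raise ValueError on failure."""
    columns = {"lat": lat, "lon": lon}
    # Optional inputs (e.g. category, size, hover_name) are checked only when given.
    columns.update({k: v for k, v in optional.items() if v is not None})
    n = len(lat)
    for name, values in columns.items():
        if len(values) != n:
            raise ValueError(f"{name} has length {len(values)}, expected {n}")
        for value in values:
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan:
                raise ValueError(f"{name} contains null values")
    return n


print(validate_points([34.05, 40.71], [-118.24, -74.00], category=["US", "US"]))
```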
- explorica.visualizations.plots.piechart(data: Sequence[float], category: Sequence[Any], autopct_method: str = 'value', **kwargs) -> VisualizationResult
Draw a pie chart based on categorical and corresponding numerical data.
This function generates a pie chart where each segment represents a category from the input data. The size of each segment is proportional to the corresponding numerical value in data. The chart supports automatic display of percentages, raw values, or both on each segment.
Under the hood, the function uses Matplotlib's matplotlib.axes.Axes.pie function and applies Seaborn styles for aesthetic defaults. Additional keyword arguments can be passed directly to the underlying Matplotlib call via plot_kws. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- Parameters:
- data : Sequence[float]
A numerical sequence representing the sizes of the segments.
- category : Sequence[Any]
A categorical sequence representing the pie chart segments.
- autopct_method : str, default="value"
Determines how the values are displayed on the pie chart. Supported options: "percent", "value", "both".
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also explorica.types.VisualizationResult for full attribute details.
- Other Parameters:
- title : str, optional
Title of the pie chart.
- xlabel : str, optional
The label for the X-axis. Overrides the automatic label.
- ylabel : str, optional
The label for the Y-axis. Overrides the automatic label.
- show_legend : bool, default=True
Whether to display a legend.
- show_labels : bool, default=True
Whether to display category labels directly on the chart.
- palette : str or list, optional
Color palette to use for the plot.
- figsize : tuple, default=(10, 6)
Figure size (width, height) in inches.
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Matplotlib function (matplotlib.axes.Axes.pie). This allows overriding any default plotting behavior. If not provided, the function internally constructs a dictionary from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- directory : str, optional
If provided, the plot will be saved to this directory.
- nan_policy : {"drop", "raise"}, default="drop"
How to handle NaN values.
- verbose : bool, default=False
If True, print additional information.
- Raises:
- ValueError
If the input sizes do not match. If an invalid autopct_method is provided.
- Warns:
- UserWarning
Raised if the input data is empty. An empty plot with a warning message will be returned in this case.
Notes
This function uses matplotlib.axes.Axes.pie under the hood. For complete parameter documentation and advanced customization options, see: matplotlib pie.
Examples
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.plots import piechart
>>>
>>>
>>> # Simple pie chart displaying raw values
>>> data = [15, 30, 45, 10]
>>> categories = ["A", "B", "C", "D"]
>>> result = piechart(data, categories, autopct_method="value",
...                   title="Simple Pie")
>>> # Display the chart
>>> result.figure.show()
>>> result.title
'Simple Pie'
>>> # Pie chart showing percentages on each segment
>>> data = [50, 25, 25]
>>> categories = ["Apples", "Bananas", "Cherries"]
>>> result = piechart(data, categories,
...                   autopct_method="percent", show_legend=True)
>>> result.figure.show()
>>> result.extra_info["autopct_method"]
'percent'
>>> # Close all mpl figures after usage
>>> plt.close('all')
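Matplotlib's Axes.pie accepts a callable for its autopct argument, which receives each wedge's percentage. One way the three autopct_method options could map onto such a callable is sketched below; make_autopct and the exact label formats are assumptions for illustration, not Explorica's actual output.

```python
def make_autopct(values, method="value"):
    """Build a callable suitable for Matplotlib's `autopct` argument."""
    if method not in ("percent", "value", "both"):
        raise ValueError(f"unsupported autopct method: {method!r}")
    total = sum(values)

    def fmt(pct):
        # Matplotlib passes the wedge's share in percent; recover the raw value.
        value = pct * total / 100.0
        if method == "percent":
            return f"{pct:.1f}%"
        if method == "value":
            return f"{value:g}"
        return f"{value:g} ({pct:.1f}%)"

    return fmt


label = make_autopct([50, 25, 25], method="both")
print(label(50.0))
```

The returned function could then be passed as autopct=label to Axes.pie.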
explorica.visualizations.scatterplot
High-level plotting utilities for Explorica visualizations.
This module provides a high-level interface for generating scatter plots with optional categorization and trendline fitting. It is designed to offer a consistent and expressive API built on top of Matplotlib, with seamless support for themes, palettes, NaN handling, and plot saving.
Functions
- scatterplot(data, target, category=None, **kwargs)
Generates a scatter plot with optional categorization and a trendline.
Notes
All plotting functions return an explorica.types.VisualizationResult, which provides a consistent interface for accessing the figure, axes (if applicable), plotting engine, and additional metadata.
plot_kws allows passing keyword arguments directly to the underlying plotting function used by the engine (Matplotlib, Seaborn, or Plotly). This provides fine-grained control over styling and behavior specific to that function.
Examples
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.scatterplot import scatterplot
>>>
>>>
>>> data = [1, 2, 3, 4, 5, 6, 7]
>>> target = [2.5, 3.2, 4.8, 5.1, 7.5, 9.0, 10.5]
>>> category = ['A', 'A', 'B', 'B', 'A', 'B', 'A']
>>> # Basic scatterplot with linear trendline
>>> plot = scatterplot(data, target, trendline='linear', show_legend=False,
... title='Basic Scatterplot')
>>> plot.figure.show()
>>> # Scatterplot with categories and saving
>>> plot = scatterplot(
... data, target, category=category, show_legend=True,
... title='Scatterplot with Categories', directory='plots',
... figsize=(8, 5))
>>> # The figure is saved to the 'plots' directory with filename 'scatterplot.png'
>>> # Scatterplot with polynomial trendline and custom plot_kws
>>> plot = scatterplot(data, target, trendline='polynomial',
... plot_kws={'s': 100, 'marker': 'o', 'c': 'orange',
... 'edgecolor': 'black'},
... title='Scatterplot with Polynomial Trendline')
>>> plot.figure.show()
>>> plt.close(plot.figure)
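The NaN handling mentioned above follows two policies throughout these modules: 'drop' removes every row in which any input is missing, and 'raise' fails with a ValueError. A minimal plain-Python sketch of that behavior for paired sequences follows; apply_nan_policy is a hypothetical name, and the library itself may rely on pandas rather than this logic.

```python
import math


def _is_missing(value):
    """True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_nan_policy(columns, nan_policy="drop"):
    """Apply the policy row-wise to a list of equal-length sequences."""
    if nan_policy not in ("drop", "raise"):
        raise ValueError(f"unsupported nan_policy: {nan_policy!r}")
    rows = list(zip(*columns))
    has_missing = [any(_is_missing(v) for v in row) for row in rows]
    if nan_policy == "raise" and any(has_missing):
        raise ValueError("input contains NaN values")
    # 'drop' removes whole rows, so the columns stay aligned with each other.
    kept = [row for row, bad in zip(rows, has_missing) if not bad]
    if not kept:
        return [[] for _ in columns]
    return [list(col) for col in zip(*kept)]


x, y = apply_nan_policy([[1, 2, 3], [2.5, float("nan"), 4.8]])
print(x, y)
```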
- explorica.visualizations.scatterplot.scatterplot(data: Sequence[Number], target: Sequence[Number], category: Sequence[Any] = None, **kwargs) -> VisualizationResult
Generate a scatter plot with optional categorization and a trendline.
The function supports coloring points by category and displaying a fitted trendline. The trendline can be automatically calculated using Ordinary Least Squares (OLS) for linear or polynomial regression, or a custom pre-calculated user function can be provided.
Under the hood, the function uses Matplotlib's matplotlib.axes.Axes.scatter function and applies Seaborn styles for aesthetic defaults. Additional keyword arguments can be passed directly to the underlying Matplotlib call via plot_kws. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- Parameters:
- data : Sequence[Number]
Data for the X-axis.
- target : Sequence[Number]
Data for the Y-axis.
- category : Sequence[Any], optional
Categorical data used for coloring points and generating a legend. Defaults to None (no categorization).
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also explorica.types.VisualizationResult for full attribute details.
- Other Parameters:
- title : str, optional
Plot title.
- xlabel : str, optional
X-axis label.
- ylabel : str, optional
Y-axis label.
- title_legend : str, default="Category"
Title for the category legend.
- show_legend : bool, default=False
If True, displays the category legend (if categories exist).
- opacity : float, optional
Transparency level for scatter points (0.0 to 1.0).
- palette : str or list or None, optional
Color palette name (e.g., 'viridis') or a list of specific colors.
- cmap : str or None, optional
Colormap name for continuous data (rarely used in scatter plots but included for theme compatibility).
- style : str or None, optional
Matplotlib style context (e.g., 'seaborn-v0_8').
- figsize : tuple[int, int], default=(10, 6)
Figure size (width, height) in inches.
- trendline : str or Callable, optional
Method to draw a trendline. Supports 'linear', 'polynomial', or a custom callable function. If a string ('linear' or 'polynomial') is provided, the trendline is automatically fitted using Ordinary Least Squares (OLS) under the hood. If a callable function is provided, it must implement a mapping from a single numeric input to a single numeric output (y = f(x)); the function itself is not modified and will be automatically vectorized over the X-domain for plotting.
- trendline_kws : dict, optional
Additional arguments for the trendline function. Keys include:
- 'color' : str, optional
Color of the trendline.
- 'linestyle' : str, default='--'
Linestyle of the trendline (e.g., '-', '--', ':', '-.').
- 'linewidth' : int, default=2
Thickness of the trendline.
- 'x_range' : tuple[float, float], optional
The domain (min, max) for which the trendline should be calculated and plotted. If None, it uses the min and max of the input data (x).
- 'degree' : int, default=2
The degree of the polynomial to fit. Only used when trendline='polynomial'.
- 'dots' : int, default=1000
The number of data points used to draw the smooth trendline curve.
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Matplotlib function (matplotlib.axes.Axes.scatter). This allows overriding any default plotting behavior. If not provided, the function internally constructs a dictionary from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- nan_policy : str, default="drop"
Policy for handling NaN values in input data. Supports 'drop' (removes rows with NaNs) or 'raise' (raises an error).
- directory : str or None, optional
Directory path to save the plot. If None, the plot is not saved.
- verbose : bool, default=False
If True, prints save messages.
- Raises:
- ValueError
If trendline is a string and is not one of the supported methods (‘linear’, ‘polynomial’).
If trendline is a callable and its output is not a 1D sequence of numbers with the same length as x_domain.
If data, target, or category (if provided) are not 1D sequences or their lengths do not match.
If any of data, target, or category contains NaNs and nan_policy=’raise’.
If trendline_kws[‘degree’] or trendline_kws[‘dots’] are not natural numbers.
- Warns:
- UserWarning
Raised if the input data is empty. An empty plot with a warning message will be returned in this case. Also raised if the number of unique objects to visualize (categories + trendline) exceeds the number of available colors in the chosen palette.
Notes
This function uses matplotlib.axes.Axes.scatter under the hood. For complete parameter documentation and advanced customization options, see: matplotlib scatter.
Examples
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.scatterplot import scatterplot
>>>
>>>
>>> data = [1, 2, 3, 4, 5, 6, 7]
>>> target = [2.5, 3.2, 4.8, 5.1, 7.5, 9.0, 10.5]
>>> category = ['A', 'A', 'B', 'B', 'A', 'B', 'A']
>>>
>>> # Basic scatterplot with linear trendline
>>> plot = scatterplot(
...     data,
...     target,
...     trendline='linear',
...     show_legend=False,
...     title='Basic Scatterplot'
... )
>>> plot.figure.show()
>>> # Scatterplot with categories and saving the plot
>>> plot = scatterplot(
...     data,
...     target,
...     category=category,
...     show_legend=True,
...     title='Scatterplot with Categories',
...     directory='plots',
...     figsize=(8, 5)
... )
>>> # The figure is saved to the 'plots' directory with filename 'scatterplot.png'
>>> # Passing additional Matplotlib options via plot_kws and a polynomial trendline
>>> plot = scatterplot(
...     data,
...     target,
...     trendline="polynomial",
...     plot_kws={'s': 100, 'marker': 'o', 'c': 'orange', 'edgecolor': 'black'},
...     title='Scatterplot with Polynomial Trendline'
... )
>>> plot.figure.show()
>>> plt.close(plot.figure)
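The trendline behavior described above (an OLS fit for 'linear', or a user callable evaluated point by point over an evenly spaced X-domain of 'dots' points) can be sketched in plain Python. The helper names below are hypothetical, the polynomial case is left out for brevity, and the library's own fitting code may differ.

```python
def linspace(start, stop, num):
    """Evenly spaced points from start to stop inclusive (requires num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def fit_linear(x, y):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def trendline_points(x, y, trendline="linear", x_range=None, dots=1000):
    """Return (x_domain, y_values) for a linear fit or a custom callable."""
    lo, hi = x_range if x_range is not None else (min(x), max(x))
    x_domain = linspace(lo, hi, dots)
    if callable(trendline):
        f = trendline  # the user function is applied to each point, unmodified
    elif trendline == "linear":
        slope, intercept = fit_linear(x, y)
        f = lambda v: slope * v + intercept
    else:
        raise ValueError(f"unsupported trendline: {trendline!r}")
    return x_domain, [f(v) for v in x_domain]


xs, ys = trendline_points([0, 1, 2, 3], [1, 3, 5, 7], dots=4)
print(xs, ys)
```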
explorica.visualizations.statistical_plots
High-level plotting utilities for Explorica visualizations.
This module defines high-level functions for exploring distributions and relationships between numeric variables. Each function is independent and returns a VisualizationResult dataclass encapsulating the figure, axes, engine, and metadata.
Functions
- distplot(data, bins = 30, kde = True, **kwargs)
Plots a histogram of numeric data with optional Kernel Density Estimation (KDE).
- boxplot(data, **kwargs)
Draws a boxplot for a numeric variable to visualize distribution, median, and potential outliers.
- hexbin(data, target, **kwargs)
Creates a hexbin plot for two numeric variables. Useful for visualizing dense scatter data.
- heatmap(data, **kwargs)
Generates a heatmap to visualize a 2D array of numeric values.
Notes
All plotting functions return an explorica.types.VisualizationResult, which provides a consistent interface for accessing the figure, axes (if applicable), plotting engine, and additional metadata.
plot_kws allows passing keyword arguments directly to the underlying plotting function used by the engine (Matplotlib, Seaborn, or Plotly). This provides fine-grained control over styling and behavior specific to that function.
Examples
>>> from pathlib import Path
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.statistical_plots import distplot
>>>
>>>
>>> # Distribution plot with custom bins, KDE, and figure size
>>> data = np.random.normal(loc=0, scale=1, size=100)
>>> result = distplot(
... data,
... bins=30,
... kde=True,
... title="Normal Distribution Example",
... figsize = (8, 5),
... )
>>> result.figure.show()
>>> # Boxplot with figure saving
>>> from explorica.visualizations.statistical_plots import boxplot
>>> result = boxplot(
... data,
... title="Boxplot Example",
... directory="./plots",
... figsize = (6, 4)
... )
>>> Path("./plots/boxplot.png").exists()
True
>>> # Hexbin plot with color map and point sizing via plot_kws
>>> from explorica.visualizations.statistical_plots import hexbin
>>> x = np.random.randn(500)
>>> y = x*0.5 + np.random.randn(500)*0.5
>>> result = hexbin(
... x, y,
... gridsize=25,
... colormap="plasma",
... title="Hexbin Example",
... figsize = (7, 6),
... plot_kws={"mincnt": 1}
... )
>>> result.figure.show()
>>> # Heatmap with annotations, custom colormap, and figure size
>>> from explorica.visualizations.statistical_plots import heatmap
>>> matrix = np.random.rand(5, 5)
>>> result = heatmap(
... matrix,
... annot=True,
... cmap="coolwarm",
... title="Annotated Heatmap",
... figsize=(6, 5)
... )
>>> result.figure.show()
>>> plt.close(result.figure)
- explorica.visualizations.statistical_plots.boxplot(data: Sequence[float] | Mapping[str, Sequence[float]], **kwargs) -> VisualizationResult
Draw a boxplot for a numeric variable.
This function generates a standard boxplot to visualize the distribution, median, quartiles, and potential outliers of numeric data.
Under the hood, the function uses Matplotlib’s matplotlib.axes.Axes.boxplot function and applies Seaborn styles for aesthetic defaults. This allows passing additional kwargs directly to the underlying Matplotlib calls via plot_kws. For complete parameter documentation and advanced customization options, see urls below.
- Parameters:
- data : Sequence[float] | Mapping[str, Sequence[float]]
Numeric input data. Can be:
1D sequence of numbers
Dictionary with single key-value pair (value is numeric sequence)
pandas Series or single-column DataFrame
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also explorica.types.VisualizationResult for full attribute details.
- Other Parameters:
- title : str, optional
Title of the chart.
- xlabel : str, optional
Label for the X-axis.
- ylabel : str, optional
Label for the Y-axis.
- palette : str or list, optional
Color palette to use for the plot.
- figsize : tuple, default=(10, 6)
Figure size (width, height) in inches.
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Matplotlib function (matplotlib.axes.Axes.boxplot). This allows overriding any default plotting behavior. If not provided, the function internally constructs a dictionary from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- directory : str, optional
If provided, the plot will be saved to this directory.
- nan_policy : {"drop", "raise"}, default="drop"
How to handle NaN values.
- verbose : bool, default=False
If True, enables informational logging during plot generation.
- Warns:
- UserWarning
Raised if the input data is empty. An empty plot with a warning message will be returned in this case.
Notes
This function uses matplotlib.axes.Axes.boxplot under the hood. For complete parameter documentation and advanced customization options, see: matplotlib boxplot.
Examples
>>> from pathlib import Path
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.statistical_plots import boxplot
>>> # Basic boxplot
>>> plot = boxplot([5, 7, 8, 5, 6, 9, 12], title="Simple Boxplot")
>>> plot.figure.show()
>>> # Saving the plot to disk
>>> plot = boxplot(
...     [4, 6, 5, 7, 8],
...     directory="./plots",
...     title="Saved Boxplot"
... )
>>> Path("./plots/boxplot.png").exists()
True
>>> # Detecting potential outliers
>>> data_with_outliers = [10, 12, 11, 14, 100, 13, 12, 9, 105]
>>> plot = boxplot(data_with_outliers, title="Boxplot with Outliers")
>>> plot.figure.show()
>>> plt.close(plot.figure)
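By default, Matplotlib's boxplot draws whiskers out to 1.5 times the interquartile range (whis=1.5) and marks points beyond them as fliers. The sketch below reproduces that rule with the standard library to show which values in the last example would be flagged. iqr_outliers is a hypothetical helper, and quartile conventions may differ slightly between libraries.

```python
import statistics


def iqr_outliers(data, whis=1.5):
    """Return the values lying outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return [value for value in data if value < lower or value > upper]


data_with_outliers = [10, 12, 11, 14, 100, 13, 12, 9, 105]
print(iqr_outliers(data_with_outliers))  # -> [100, 105]
```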
- explorica.visualizations.statistical_plots.distplot(data: Sequence[float] | Mapping[str, Sequence[float]], bins: int = 30, kde: bool = True, **kwargs) -> VisualizationResult
Plot a histogram with optional kernel density estimate.
This function generates a histogram for univariate data with an optional kernel density estimate (KDE) curve overlay. It automatically handles data conversion, NaN values, and provides flexible styling options through integration with Seaborn’s plotting system.
Under the hood, the function uses Seaborn’s seaborn.histplot function and applies Seaborn styles for aesthetic defaults. This allows passing additional kwargs directly to the underlying Seaborn calls via plot_kws. For complete parameter documentation and advanced customization options, see urls below.
- Parameters:
- data : Sequence[float] | Mapping[str, Sequence[float]]
Numeric input data. Can be:
1D sequence of numbers
Dictionary with single key-value pair (value is numeric sequence)
pandas Series or single-column DataFrame
Must be one-dimensional.
- bins : int, default=30
Number of bins in the histogram. Must be a positive integer.
- kde : bool, default=True
If True, adds kernel density estimate curve.
- opacity : float, default=0.5
Transparency of the histogram bars (alpha value). Must be between 0 and 1.
- title : str, optional
Plot title.
- xlabel : str, optional
Label for the X-axis.
- ylabel : str, optional
Label for the Y-axis.
- figsize : tuple[float, float], default=(10, 6)
Figure size (width, height) in inches.
- style : str, optional
Seaborn style context (e.g., "whitegrid", "darkgrid").
- palette : str, optional
Seaborn color palette (e.g., "viridis", "husl").
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Seaborn function (sns.histplot). This allows overriding any default plotting behavior. If not provided, the function internally constructs a dictionary from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the links in the Notes below.
- directory : str, optional
File path to save figure (e.g., "./plot.png").
- nan_policy : str | Literal['drop', 'raise'], default='drop'
Policy for handling NaN values in input data:
'raise' : raise ValueError if any NaNs are present in data.
'drop' : drop rows (axis=0) containing NaNs before computation. This does not drop entire columns.
- verbose : bool, default=False
If True, enables informational logging during plot generation.
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also explorica.types.VisualizationResult for full attribute details.
- Raises:
- ValueError
If input data contains multiple columns (not 1-dimensional). If bins is not a positive integer.
- Warns:
- UserWarning
Raised if the input data is empty. An empty plot with a warning message will be returned in this case.
Notes
This function uses seaborn.histplot under the hood. For complete parameter documentation and advanced customization options, see: seaborn histplot.
Vectorization support is planned.
Empty data returns a placeholder plot with an informative message.
Examples
>>> from pathlib import Path
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.statistical_plots import distplot
>>> # Simple distribution plot with KDE
>>> data = [1, 2, 2, 3, 3, 3, 4]
>>> result = distplot(data, kde=True, title="Small Dataset Example")
>>> result.figure.show()
>>> # Distribution plot with figure saving
>>> data = [10, 20, 20, 30, 40, 40, 50]
>>> result = distplot(
...     data,
...     bins=5,
...     title="Saved Plot Example",
...     directory="./plots"
... )
>>> Path("./plots/distplot.png").exists()
True
>>> # Normal distribution with custom bins and figure size
>>> import numpy as np
>>> data = np.random.normal(loc=50, scale=15, size=200)
>>> result = distplot(
...     data,
...     bins=25,
...     kde=True,
...     title="Normal Distribution Example",
...     figsize=(8, 5),
...     plot_kws={"color": "skyblue"}
... )
>>> result.figure.show()
>>> plt.close(result.figure)
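The directory parameter is documented with a file path ("./plot.png"), while the saving example passes a directory ("./plots") and finds "distplot.png" inside it. One way such behavior could be implemented is sketched below. The helper resolve_output_path and the rule "a path with an extension is a file" are assumptions made for illustration, not the library's confirmed logic.

```python
from pathlib import Path


def resolve_output_path(directory, default_name="distplot.png"):
    """Hypothetical helper: decide where a figure would be written."""
    path = Path(directory)
    # Assumption: a path with a file extension names the target file itself.
    if path.suffix:
        return path
    # Otherwise treat it as a directory and append the default file name.
    return path / default_name


print(resolve_output_path("./plots"))
print(resolve_output_path("./plot.png"))
```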
- explorica.visualizations.statistical_plots.heatmap(data: Sequence[float] | Sequence[Sequence[float]] | Mapping[Any, Sequence[float]], **kwargs) -> VisualizationResult
Draw a heatmap from the provided data using Matplotlib and Seaborn.
Create a heatmap visualization from numerical data, supporting 1D sequences, 2D sequences, and mappings of keys to sequences. The function handles NaN values according to nan_policy and provides options for figure size, annotations, and saving the plot to a directory.
Under the hood, the function uses seaborn.heatmap and applies Seaborn styles for aesthetic defaults. Additional keyword arguments can be passed directly to the underlying Seaborn call via plot_kws. For complete parameter documentation and advanced customization options, see the seaborn heatmap link in the Notes below.
- Parameters:
- data : sequence of floats, sequence of sequences of floats, or mapping
Input data for the heatmap. Can be:
1D sequence of numerical values (converted to a 1-row heatmap),
2D sequence (list of lists, NumPy array, etc.),
Mapping of keys to sequences (converted to a DataFrame).
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also
explorica.types.VisualizationResult for full attribute details.
- Other Parameters:
- annot : bool, default=True
Whether to annotate the heatmap cells with their numeric values.
- cmap : str, optional
Color map for the heatmap.
- title : str, optional
Plot title.
- xlabel : str, optional
X-axis label.
- ylabel : str, optional
Y-axis label.
- figsize : tuple[float, float], default=(10, 6)
Figure size (width, height) in inches.
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Seaborn function (seaborn.heatmap), so that any default plotting behavior can be overridden. If not provided, the function builds the dictionary internally from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the seaborn heatmap link in the Notes below.
- directory : str or None, optional
Directory path to save the figure. If None, the figure is not saved.
- nan_policy : str, default="drop"
Policy for handling NaN values in input data. Supports 'drop' (removes rows containing NaNs) or 'raise' (raises a ValueError).
- verbose : bool, default=False
Whether to print additional messages during plotting.
- Raises:
- ValueError
If NaN values are present and nan_policy="raise".
- Warns:
- UserWarning
Issued if the input data is empty. In this case, an empty plot containing a warning message is returned.
Notes
This function uses seaborn.heatmap under the hood. For complete parameter documentation and advanced customization options, see seaborn heatmap.
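The two nan_policy options can be illustrated with a small standalone sketch. The helper apply_nan_policy is hypothetical and works on plain lists of rows. It only mirrors the documented behavior ('drop' removes rows containing NaNs, 'raise' raises ValueError) and does not reproduce the library's implementation.

```python
import math


def apply_nan_policy(rows, nan_policy="drop"):
    """Hypothetical helper: handle NaNs in a list of numeric rows."""

    def has_nan(row):
        return any(isinstance(v, float) and math.isnan(v) for v in row)

    if nan_policy == "raise":
        if any(has_nan(row) for row in rows):
            raise ValueError("Input data contains NaN values.")
        return list(rows)
    if nan_policy == "drop":
        # Drop whole rows that contain a NaN; columns are never removed.
        return [row for row in rows if not has_nan(row)]
    raise ValueError(f"Unsupported nan_policy: {nan_policy!r}")


rows = [[1.0, 2.0], [float("nan"), 4.0], [5.0, 6.0]]
cleaned = apply_nan_policy(rows)
print(cleaned)  # [[1.0, 2.0], [5.0, 6.0]]
```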
Examples
>>> from pathlib import Path
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.statistical_plots import heatmap
>>> # Simple usage
>>> plot = heatmap(
...     [[1, 2, 3, 4, 5],
...      [5, 4, 3, 2, 1],
...      [2, 4, 6, 8, 10]],
...     xlabel="X Variable",
...     ylabel="Y Variable",
...     cmap="viridis"
... )
>>> plot.figure.show()
>>> # Saving the plot to a directory
>>> plot = heatmap(
...     [[1, 2, 3, 4, 5],
...      [5, 4, 3, 2, 1],
...      [2, 4, 6, 8, 10]],
...     title="Heatmap Example",
...     directory="plots",
...     figsize=(8, 5)
... )
>>> # The figure is saved to the 'plots' directory with filename 'heatmap.png'
>>> # Passing additional Seaborn options via plot_kws
>>> plot = heatmap(
...     [[1, 2, 3, 4, 5],
...      [5, 4, 3, 2, 1],
...      [2, 4, 6, 8, 10]],
...     plot_kws={"cmap": "viridis", "cbar": True},
...     annot=False
... )
>>> plot.figure.show()
>>> plt.close(plot.figure)
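The three accepted input shapes for heatmap (1D sequence, 2D sequence, mapping) can be reduced to one grid-of-rows form. The sketch below shows one possible normalization in pure Python. The helper to_rows is hypothetical, and treating each mapping key as a column is an assumption based on how pandas.DataFrame builds a frame from a dict; equal-length sequences are also assumed.

```python
from collections.abc import Mapping
from numbers import Real


def to_rows(data):
    """Hypothetical helper: normalize heatmap input to (rows, column_labels)."""
    if isinstance(data, Mapping):
        # Assumption: each key becomes a column, as pandas.DataFrame(dict) does.
        labels = list(data)
        columns = [list(values) for values in data.values()]
        rows = [list(row) for row in zip(*columns)]
        return rows, labels
    data = list(data)
    if data and all(isinstance(value, Real) for value in data):
        # A 1D sequence becomes a single-row grid.
        return [data], None
    # Anything else is treated as a 2D sequence of rows.
    return [list(row) for row in data], None


print(to_rows([1, 2, 3]))                     # ([[1, 2, 3]], None)
print(to_rows({"a": [1, 2], "b": [3, 4]}))    # ([[1, 3], [2, 4]], ['a', 'b'])
```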
- explorica.visualizations.statistical_plots.hexbin(data: Sequence[Number], target: Sequence[Number], **kwargs) -> VisualizationResult
Create a hexbin plot for two numeric variables to visualize point density.
A hexbin plot is a bivariate histogram that uses hexagonal bins to display the density of points in a 2D space. It is particularly useful for large datasets where scatter plots become overcrowded.
Under the hood, the function uses matplotlib.axes.Axes.hexbin and applies Seaborn styles for aesthetic defaults. Additional keyword arguments can be passed directly to the underlying Matplotlib call via plot_kws. For complete parameter documentation and advanced customization options, see the matplotlib hexbin link in the Notes below.
- Parameters:
- data : Sequence[Number]
First numeric variable (plotted on x-axis).
- target : Sequence[Number]
Second numeric variable (plotted on y-axis).
- Returns:
- VisualizationResult
A dataclass encapsulating the result of a visualization. See also
explorica.types.VisualizationResult for full attribute details.
- Other Parameters:
- cmap : str or matplotlib.colors.Colormap, optional
Colormap used to color hexagons by count density. Defaults to None.
- opacity : float, default=1
Opacity of the hexagons (0 = fully transparent, 1 = fully opaque).
- gridsize : int, default=30
The number of hexagons in the x-direction. Larger values produce smaller hexagons and higher resolution.
- title : str, optional
Title of the chart.
- xlabel : str, optional
Label for the x-axis.
- ylabel : str, optional
Label for the y-axis.
- style : str, default=None
Seaborn style context (e.g., "whitegrid", "darkgrid").
- figsize : tuple, default=(10, 6)
Figure size (width, height) in inches.
- plot_kws : dict, optional
Dictionary of keyword arguments passed directly to the underlying Matplotlib function (matplotlib.axes.Axes.hexbin), so that any default plotting behavior can be overridden. If not provided, the function builds the dictionary internally from its own relevant parameters. Keys provided in plot_kws take precedence over internally generated defaults. For complete parameter documentation and advanced customization options, see the matplotlib hexbin link in the Notes below.
- directory : str, optional
If provided, the plot will be saved to this directory.
- nan_policy : {"drop", "raise"}, default="drop"
How to handle NaN values in the data. With "raise", a ValueError is raised if any NaN is present.
- verbose : bool, default=False
If True, enables informational logging during plot generation.
- Raises:
- ValueError
Raised if data and target have different lengths, if NaN values are present and nan_policy="raise", or if gridsize is not a positive integer.
- Warns:
- UserWarning
Issued if the input data is empty. In this case, an empty plot containing a warning message is returned.
Notes
This function uses matplotlib.axes.Axes.hexbin under the hood.
For complete parameter documentation and advanced customization options, see: matplotlib hexbin.
Hexbin plots are most effective with large datasets (>1000 points).
The color intensity represents the count of points in each hexagon.
For very sparse data, consider using a scatter plot instead.
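Two of the ValueError conditions listed under Raises (mismatched lengths and a non-positive gridsize) are simple to check before any plotting happens. The sketch below shows one way to write such checks. The helper validate_hexbin_inputs and its error messages are hypothetical and are not taken from the library.

```python
def validate_hexbin_inputs(data, target, gridsize=30):
    """Hypothetical helper mirroring two documented ValueError cases."""
    if len(data) != len(target):
        raise ValueError(
            f"data and target must have the same length, "
            f"got {len(data)} and {len(target)}."
        )
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(gridsize, bool) or not isinstance(gridsize, int) or gridsize <= 0:
        raise ValueError("gridsize must be a positive integer.")


# Valid input passes silently.
validate_hexbin_inputs([1, 2, 3], [2, 4, 6], gridsize=20)

# Mismatched lengths are rejected.
try:
    validate_hexbin_inputs([1, 2, 3], [2, 4])
except ValueError as error:
    print(error)
```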
Examples
>>> from pathlib import Path
>>> import matplotlib.pyplot as plt
>>> from explorica.visualizations.statistical_plots import hexbin
>>> # Simple usage
>>> plot = hexbin(
...     [1, 2, 3, 4, 5],
...     [2, 4, 6, 8, 10],
...     xlabel="X Variable",
...     ylabel="Y Variable",
...     gridsize=20
... )
>>> plot.figure.show()
>>> # Saving the plot to a directory
>>> plot = hexbin(
...     [1, 2, 3, 4, 5],
...     [2, 4, 6, 8, 10],
...     title="Hexbin Example",
...     directory="plots",
...     figsize=(8, 5)
... )
>>> # The figure is saved to the 'plots' directory with filename 'hexbin.png'
>>> # Passing additional Matplotlib options via plot_kws
>>> plot = hexbin(
...     [1, 2, 3, 4, 5],
...     [2, 4, 6, 8, 10],
...     plot_kws={"cmap": "viridis", "mincnt": 1},
...     gridsize=25
... )
>>> plot.figure.show()
>>> plt.close(plot.figure)