uadapy.data package

uadapy.data.data module

uadapy.data.data.fetch_covtype_distributions_normal()

Fetch the Forest Cover Type dataset as class-conditional multivariate Normals. Fits a multivariate Normal distribution for each of the 7 cover types over the 54-dimensional feature space (terrain and one-hot indicators).

Returns:

Seven multivariate Normal distributions, one per forest cover type.

Return type:

list of Distribution

uadapy.data.data.generate_synthetic_gmm(n_classes=3, n_dims=4, random_state=0)

Generates synthetic Gaussian Mixture Model distributions. Creates multiple classes, each represented as a GMM with random number of components (1 to 10). Per component, random means and covariances are generated to form the GMM.

Parameters:
  • n_classes (int, optional) – Number of classes to generate. Default value is 3.

  • n_dims (int, optional) – Dimensionality of the original data space. Default value is 4.

  • random_state (int, optional) – Random seed for reproducibility. Default value is 0.

Returns:

List of Distribution objects, each wrapping a MultivariateGMM model.

Return type:

list

uadapy.data.data.generate_synthetic_timeseries(timesteps=200, trend=0.1)

Generates synthetic time series data by modeling a combination of trend, periodic patterns, and noise using a multivariate normal distribution with an exponential quadratic kernel for covariance.

Parameters:

timesteps (int) – The time steps of the time series. Default value is 200.

Returns:

timeseries – An instance of the TimeSeries class, which represents a univariate time series.

Return type:

Timeseries object

uadapy.data.data.load_breast_cancer(normal: bool = False)

Load the Breast Cancer Wisconsin dataset as class-conditional distributions (KDE-based by default). Creates one nonparametric, KDE-backed Distribution for each label (benign, malignant) from the 30-dimensional features.

Parameters:

normal (bool, optional) – If False (default), use KDE per class. If True, use multivariate Normal.

Returns:

Two distributions, one for each class.

Return type:

list of Distribution

uadapy.data.data.load_breast_cancer_normal()

Load the Breast Cancer Wisconsin dataset as multivariate Normal distributions. Fits a multivariate Normal to the feature vectors for each class label (benign vs. malignant).

Returns:

Two multivariate Normal distributions, one per class.

Return type:

list of Distribution

uadapy.data.data.load_digits_normal()

Load the Digits dataset as class-conditional multivariate Normal distributions. Fits a multivariate Normal to the flattened 8x8 image vectors for each digit (0-9).

Returns:

Ten multivariate Normal distributions, one per digit.

Return type:

list of Distribution

uadapy.data.data.load_iris(normal: bool = False)

Load the Iris dataset as class-conditional distributions (KDE-based by default). This function creates one nonparametric, KDE-backed Distribution per Iris species (setosa, versicolor, virginica).

Parameters:

normal (bool, optional) – If False (default), represent each species with a KDE over raw samples (nonparametric). If True, fit a multivariate Normal per species.

Returns:

Three distributions, one for each Iris species (setosa, versicolor, virginica).

Return type:

list of Distribution

uadapy.data.data.load_iris_gmm(n_components=2, random_state=0)

Uses the iris dataset and fits a Gaussian Mixture Model for each class.

Parameters:
  • n_components (int, optional) – Number of mixture components for each GMM. Default value is 2.

  • random_state (int, optional) – Random seed for reproducibility. Default value is 0.

Returns:

List of Distribution objects, each wrapping a MultivariateGMM model fitted to one class of the iris dataset.

Return type:

list

uadapy.data.data.load_iris_normal()

Load the Iris dataset as class-conditional multivariate Normal distributions. This function fits a multivariate Normal distribution to the feature vectors of each Iris species (setosa, versicolor, virginica).

Returns:

Three multivariate Normal distributions, one for each Iris species.

Return type:

list of Distribution

Returns:

Three multivariate Normal distributions, one per Iris species.

Return type:

list of Distribution

uadapy.data.data.load_wine(normal: bool = False)

Load the Wine dataset as class-conditional distributions (KDE-based by default). Builds one KDE-backed, nonparametric Distribution per wine cultivar using the 13-dimensional chemical analysis features.

Parameters:

normal (bool, optional) – If False (default), use KDE per class. If True, use multivariate Normal.

Returns:

Three distributions, one for each wine cultivar.

Return type:

list of Distribution

uadapy.data.data.load_wine_normal()

Load the Wine dataset as class-conditional multivariate Normal distributions. Fits a multivariate Normal to the chemical analysis features for each of the three wine cultivars.

Returns:

Three multivariate Normal distributions, one per wine cultivar.

Return type:

list of Distribution

uadapy.data.data.make_blobs_distributions(n_samples: int = 1500, centers: int | ndarray = 3, n_features: int = 2, cluster_std: float | Iterable[float] = 1.0, random_state: int | None = 42, normal: bool = False)

Generate Gaussian blobs and return KDE-based class-conditional distributions (by default). Uses make_blobs to synthesize clustered data and builds one Distribution per cluster label.

Parameters:
  • n_samples (int, optional) – Total number of samples. Default is 1500.

  • centers (int or ndarray, optional) – Number of clusters or explicit centers. Default is 3.

  • n_features (int, optional) – Dimensionality of the feature space. Default is 2.

  • cluster_std (float or iterable, optional) – Standard deviation(s) of clusters. Default is 1.0.

  • random_state (int or None, optional) – Seed for reproducibility. Default is 42.

  • normal (bool, optional) – If False (default), KDE per cluster. If True, multivariate Normal per cluster.

Returns:

One distribution for each generated blob.

Return type:

list of Distribution

uadapy.data.data.make_blobs_distributions_normal(**kwargs)

Generate Gaussian blobs and fit class-conditional multivariate Normals. Creates synthetic clusters with make_blobs and fits a multivariate Normal to each cluster’s samples.

Parameters:

**kwargs – Forwarded to make_blobs_distributions (e.g., n_samples, centers, n_features, cluster_std, random_state).

Returns:

One multivariate Normal distribution per blob.

Return type:

list of Distribution

uadapy.data.data.make_circles_distributions(n_samples: int = 2000, noise: float = 0.1, factor: float = 0.5, random_state: int | None = 42, normal: bool = False)

Generate concentric circles and return KDE-based class-conditional distributions (by default). Creates two rings (inner/outer) and fits a KDE-backed Distribution to each.

Parameters:
  • n_samples (int, optional) – Total number of samples. Default is 2000.

  • noise (float, optional) – Standard deviation of Gaussian noise. Default is 0.1.

  • factor (float, optional) – Scale between inner/outer circle (0 < factor < 1). Default is 0.5.

  • random_state (int or None, optional) – Seed for reproducibility. Default is 42.

  • normal (bool, optional) – If False (default), KDE per class. If True, multivariate Normal per class.

Returns:

Two distributions, one for the inner circle and one for the outer circle.

Return type:

list of Distribution

uadapy.data.data.make_circles_distributions_normal(**kwargs)

Generate concentric circles and fit multivariate Normal distributions. Uses make_circles to create two concentric rings and fits a multivariate Normal to each ring’s samples.

Parameters:

**kwargs – Forwarded to make_circles_distributions (e.g., n_samples, noise, factor, random_state).

Returns:

Two multivariate Normal distributions, one per circle.

Return type:

list of Distribution

uadapy.data.data.make_classification_distributions_normal(n_samples: int = 2000, n_features: int = 10, n_informative: int = 5, n_redundant: int = 2, n_repeated: int = 0, n_classes: int = 3, class_sep: float = 1.0, flip_y: float = 0.01, weights: Iterable[float] | None = None, random_state: int | None = 42)

Generate a synthetic classification problem and fit multivariate Normals. Calls make_classification to synthesize an n-class dataset with controllable informative/redundant features and class separability, then fits a multivariate Normal per class for a parametric density view.

Parameters:
  • n_samples (int, optional) – Total number of samples to generate. Default is 2000.

  • n_features (int, optional) – Total number of features. Default is 10.

  • n_informative (int, optional) – Number of informative features. Default is 5.

  • n_redundant (int, optional) – Number of redundant features (linear combinations of informative). Default is 2.

  • n_repeated (int, optional) – Number of duplicated features. Default is 0.

  • n_classes (int, optional) – Number of target classes. Default is 3.

  • class_sep (float, optional) – Controls separability between classes (larger = easier). Default is 1.0.

  • flip_y (float, optional) – Fraction of samples whose class is randomly exchanged (label noise). Default is 0.01.

  • weights (iterable of float, optional) – Class weights that sum to 1. If None, classes are balanced. Default is None.

  • random_state (int or None, optional) – Seed for reproducibility. Default is 42.

Returns:

One distribution per generated class.

Return type:

list of Distribution

uadapy.data.data.make_gaussian_quantiles_data(normal: bool = False, n_samples: int = 2000, n_features: int = 2, n_classes: int = 3, mean: ndarray | None = None, cov: float = 1.0, random_state: int = 42)

Generate Gaussian-quantiles data and return KDE-based class-conditional distributions (by default). Samples from a multivariate Gaussian are split into labels by quantile thresholds.

Parameters:
  • normal (bool, optional) – If False (default), KDE per class. If True, multivariate Normal per class.

  • n_samples (int, optional) – Total number of samples. Default is 2000.

  • n_features (int, optional) – Feature dimensionality. Default is 2.

  • n_classes (int, optional) – Number of quantile-based classes. Default is 3.

  • mean (ndarray or None, optional) – Mean vector of the underlying Gaussian. Default is None (zeros).

  • cov (float, optional) – Covariance scaling factor. Default is 1.0.

  • random_state (int, optional) – Seed for reproducibility. Default is 42.

Returns:

One distribution per class created by the quantile split.

Return type:

list of Distribution

uadapy.data.data.make_gaussian_quantiles_normal(**kwargs)

Generate Gaussian-quantiles data and fit multivariate Normal distributions. Samples from an underlying multivariate Gaussian are split into classes by quantile thresholds (make_gaussian_quantiles).

Parameters:

**kwargs – Forwarded to make_gaussian_quantiles_data (e.g., n_samples, n_features, n_classes, mean, cov, random_state).

Returns:

One multivariate Normal distribution per quantile-defined class.

Return type:

list of Distribution

uadapy.data.data.make_hastie_10_2_distributions(n_samples: int = 12000, random_state: int | None = 42, normal: bool = False)

Generate the Hastie 10-2 binary classification dataset and return class-conditional distributions (KDE-based by default). This classic synthetic set has 10 features and labels y ∈ {-1, +1}. By default, this function builds one nonparametric, KDE-backed Distribution per class.

Parameters:
  • n_samples (int, optional) – Total number of samples to generate. Default is 12000 (as in sklearn).

  • random_state (int or None, optional) – Seed for reproducibility. Default is 42.

  • normal (bool, optional) – If False (default), use KDE per class. If True, use multivariate Normal.

Returns:

Two distributions, one for the class y = -1 and one for y = +1.

Return type:

list of Distribution

uadapy.data.data.make_hastie_10_2_distributions_normal(**kwargs)

Generate the Hastie 10-2 dataset and fit multivariate Normal distributions. Creates the 10-feature binary dataset and fits a multivariate Normal to each class (y ∈ {-1, +1} mapped to {0, 1}).

Parameters:

**kwargs – Forwarded to make_hastie_10_2_distributions (e.g., n_samples, random_state).

Returns:

Two multivariate Normal distributions, one per class.

Return type:

list of Distribution

uadapy.data.data.make_moons_distributions(n_samples: int = 2000, noise: float = 0.15, random_state: int | None = 42, normal: bool = False)

Generate the two-moons dataset and return KDE-based class-conditional distributions (by default). Produces two interleaving half-circles with label 0/1 and builds a KDE-backed Distribution for each.

Parameters:
  • n_samples (int, optional) – Total number of samples. Default is 2000.

  • noise (float, optional) – Standard deviation of Gaussian noise. Default is 0.15.

  • random_state (int or None, optional) – Seed for reproducibility. Default is 42.

  • normal (bool, optional) – If False (default), KDE per class. If True, multivariate Normal per class.

Returns:

Two distributions, one for each moon.

Return type:

list of Distribution

uadapy.data.data.make_moons_distributions_normal(**kwargs)

Generate the two-moons dataset and fit multivariate Normal distributions. Produces two interleaving half-circles with make_moons and fits a multivariate Normal to each moon’s samples.

Parameters:

**kwargs – Forwarded to make_moons_distributions (e.g., n_samples, noise, random_state).

Returns:

Two multivariate Normal distributions, one per moon.

Return type:

list of Distribution