conditional_kde.gaussian module

Module containing Gaussian versions of the Conditional KDE.

class conditional_kde.gaussian.ConditionalGaussian(bandwidth=1.0)

Bases: object

Conditional Gaussian. Makes a simple Gaussian fit to the data, allowing for conditioning.

Parameters:

bandwidth (float) – allows for additional smoothing/shrinking of the covariance. In most cases, it should be left as 1.

static _covariance_decomposition(cov, cond_mask, cond_only=False)

Decompose the covariance matrix into its conditional, unconditional and cross terms.

Parameters:

cov (array) – covariance matrix.
cond_mask (array) – boolean array defining conditional dimensions.
cond_only (bool) – whether to return only the conditional part of the covariance or all three parts.

Returns:

If cond_only is True, only the conditional part of the covariance; otherwise the conditional, unconditional and cross parts, in that order.
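As a rough illustration of the decomposition described above, the following NumPy sketch partitions a covariance matrix with a boolean mask. The function name decompose_covariance is hypothetical and this is not the library's implementation; only the argument names and the return order follow the description above, and the orientation of the cross block is an assumption.

```python
import numpy as np


def decompose_covariance(cov, cond_mask, cond_only=False):
    """Hypothetical helper: split cov into conditional, unconditional and cross blocks."""
    cov = np.asarray(cov, dtype=float)
    cond_mask = np.asarray(cond_mask, dtype=bool)

    # Block of the conditioned dimensions.
    cov_cond = cov[np.ix_(cond_mask, cond_mask)]
    if cond_only:
        return cov_cond

    # Block of the remaining (unconditioned) dimensions.
    cov_uncond = cov[np.ix_(~cond_mask, ~cond_mask)]
    # Cross block: rows are unconditioned, columns are conditioned dimensions
    # (this orientation is an assumption of the sketch).
    cov_cross = cov[np.ix_(~cond_mask, cond_mask)]
    return cov_cond, cov_uncond, cov_cross


cov = np.array([[4.0, 1.0, 0.5],
                [1.0, 3.0, 0.2],
                [0.5, 0.2, 2.0]])
mask = np.array([True, False, True])  # condition on the first and third feature
cov_cond, cov_uncond, cov_cross = decompose_covariance(cov, mask)
```

Here cov_cond is the 2x2 block of the first and third features, cov_uncond the 1x1 block of the second feature, and cov_cross the 1x2 block linking them.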

static _log_prob(X, mean, cov, add_norm=True)

Log probability of a Gaussian distribution.

Parameters:

X (array) – array of samples for which probability is calculated. Of shape (n, n_features).
mean (array) – mean of the Gaussian distribution.
cov (float, array) – covariance of the Gaussian distribution. If float, it is a variance shared by all features. If 1D array, it holds a separate variance for every feature. If 2D array, it is a full covariance matrix.
add_norm (bool) – whether to add the normalization factor to the calculation.

Returns:

Log probabilities.
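The three accepted forms of cov can be illustrated with a minimal NumPy sketch. The function gaussian_log_prob is a hypothetical stand-in written for this example, not the library's code; it promotes a scalar or 1D cov to a full matrix and then evaluates the usual multivariate Gaussian log-density.

```python
import numpy as np


def gaussian_log_prob(X, mean, cov, add_norm=True):
    """Hypothetical sketch of a Gaussian log-density with scalar, 1D or 2D cov."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n_features = X.shape[1]
    cov = np.asarray(cov, dtype=float)

    # Promote a shared variance or per-feature variances to a full matrix.
    if cov.ndim == 0:
        cov = np.eye(n_features) * cov
    elif cov.ndim == 1:
        cov = np.diag(cov)

    diff = X - np.asarray(mean, dtype=float)
    # Squared Mahalanobis distance of every sample from the mean.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    log_p = -0.5 * maha
    if add_norm:
        _, log_det = np.linalg.slogdet(cov)
        log_p -= 0.5 * (n_features * np.log(2.0 * np.pi) + log_det)
    return log_p


X = np.array([[0.0, 0.0], [1.0, -1.0]])
shared = gaussian_log_prob(X, mean=[0.0, 0.0], cov=2.0)
per_feature = gaussian_log_prob(X, mean=[0.0, 0.0], cov=[2.0, 2.0])
full = gaussian_log_prob(X, mean=[0.0, 0.0], cov=2.0 * np.eye(2))
```

All three calls describe the same distribution, so they should return the same log probabilities.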

fit(X, weights=None, features=None)

Fit the Conditional Gaussian to the data.

Parameters:

X (array) – data of shape (n_samples, n_features).
weights (array) – optional, weights of every sample, of shape (n_samples,).
features (list) – optional, list defining names for every feature. It’s used for referencing conditional dimensions. Defaults to [0, 1, …, n_features - 1].

Returns:

An instance of itself.
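For orientation, a weighted Gaussian fit can be sketched with plain NumPy as below. This is only one plausible reading of what fitting with weights involves; whether the library uses this particular estimator, and how bandwidth enters the covariance, is not stated here and is left out of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # data of shape (n_samples, n_features)
weights = rng.uniform(0.5, 1.5, size=100)   # one weight per sample

# Weighted mean and covariance of the data.
mean = np.average(X, axis=0, weights=weights)
cov = np.cov(X, rowvar=False, aweights=weights)

# Default feature names when none are given: [0, 1, ..., n_features - 1].
features = list(range(X.shape[1]))
```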

sample(conditionals=None, n_samples=1, random_state=None, keep_dims=False)

Generate random samples from the conditional model. There are two modes of sampling:

(1) specify conditionals as scalar values and sample n_samples out of the distribution;
(2) specify conditionals as arrays, in which case the number of samples equals the length of the arrays.

Parameters:

conditionals (dict) – variables (features) to condition upon. Dictionary keys must be feature names from features. For example, if self.features == [“a”, “b”, “c”] and one would like to condition on “a” and “c”, then conditionals = {“a”: cond_val_a, “c”: cond_val_c}. Conditioned values can be either floats or arrays; in the latter case, all conditioned arrays must be of the same size. Defaults to None, i.e. unconditional sampling.
n_samples (int) – number of samples to generate. Ignored if arrays are passed in conditionals. Defaults to 1.
random_state (np.random.RandomState, int) – seed or RandomState instance, optional. Determines the random number generation used to generate random samples. See the random_state entry in the scikit-learn Glossary.
keep_dims (bool) – whether to return non-conditioned dimensions only or keep given conditional values. Defaults to False.

Returns:

Array of samples, of shape (n_samples, n_features) if conditionals is None, or (n_samples, n_features - len(conditionals)) otherwise.
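The two sampling modes can be sketched with the textbook conditioning formulas for a multivariate Gaussian. The helper conditional_gaussian below is hypothetical and only illustrates the idea; it is not the library's sample method, and it expects the conditional values as an array of shape (n, n_conditionals).

```python
import numpy as np


def conditional_gaussian(mean, cov, cond_mask, cond_values):
    """Mean(s) and covariance of the unconditioned dims given the conditioned ones."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    cond_mask = np.asarray(cond_mask, dtype=bool)
    # Shape (n, n_conditionals): one row per requested conditional value.
    cond_values = np.atleast_2d(np.asarray(cond_values, dtype=float))

    cov_cc = cov[np.ix_(cond_mask, cond_mask)]
    cov_uu = cov[np.ix_(~cond_mask, ~cond_mask)]
    cov_uc = cov[np.ix_(~cond_mask, cond_mask)]

    # gain = cov_uc @ inv(cov_cc), computed with a linear solve.
    gain = np.linalg.solve(cov_cc, cov_uc.T).T
    cond_mean = mean[~cond_mask] + (cond_values - mean[cond_mask]) @ gain.T
    cond_cov = cov_uu - gain @ cov_uc.T
    return cond_mean, cond_cov


rng = np.random.default_rng(0)
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])
mask = np.array([True, False])  # condition on the first feature

# Mode 1: one scalar conditional value, n_samples draws.
m, c = conditional_gaussian(mean, cov, mask, [[0.5]])
samples_scalar = rng.multivariate_normal(m[0], c, size=1000)

# Mode 2: an array of conditional values, one draw per value.
values = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
m, c = conditional_gaussian(mean, cov, mask, values)
noise = rng.multivariate_normal(np.zeros(c.shape[0]), c, size=len(values))
samples_array = m + noise
```

In both modes the output keeps only the non-conditioned dimension, which corresponds to the shape (n_samples, n_features - len(conditionals)) described above.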

score_samples(X, conditional_features=None)

Compute the (un)conditional log-probability of each sample under the model.

Parameters:

X (array) – data of shape (n, n_features). Last dimension should match dimension of training data (n_features).
conditional_features (list) – subset of self.features naming the dimensions of the data to condition upon. Defaults to None, meaning unconditional log-probability.

Returns:

Conditional log probability for each sample in X.
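One standard way to obtain a conditional log-probability is to subtract the marginal log-density of the conditioned dimensions from the joint log-density. The sketch below shows this identity for a single Gaussian; the function names and the feature names “a” and “b” are made up for the example, and the library may compute the quantity differently.

```python
import numpy as np


def mvn_log_prob(X, mean, cov):
    """Log-density of a multivariate Gaussian for every row of X."""
    diff = X - mean
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, log_det = np.linalg.slogdet(cov)
    return -0.5 * (maha + X.shape[1] * np.log(2.0 * np.pi) + log_det)


def conditional_log_prob(X, mean, cov, cond_mask):
    """log p(x_rest | x_cond) = log p(x) - log p(x_cond)."""
    joint = mvn_log_prob(X, mean, cov)
    marginal = mvn_log_prob(
        X[:, cond_mask], mean[cond_mask], cov[np.ix_(cond_mask, cond_mask)]
    )
    return joint - marginal


features = ["a", "b"]                 # hypothetical feature names
mask = np.isin(features, ["a"])       # condition on feature "a"
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])
X = np.array([[0.5, 1.4], [0.0, 0.0]])
log_p = conditional_log_prob(X, mean, cov, mask)
```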

class conditional_kde.gaussian.ConditionalGaussianKernelDensity(whitening_algorithm='rescale', bandwidth='scott', **kwargs)

Bases: object

Conditional Kernel Density estimator.

Parameters:

whitening_algorithm (str) – data whitening algorithm, either None, “rescale” or “ZCA”. See util.DataWhitener for more details. “rescale” by default.
bandwidth (str, float) – the width of the Gaussian centered around every point. It can be either:

1. “scott”, using Scott’s parameter,
2. “optimized”, which minimizes cross entropy to find the optimal bandwidth, or
3. float, specifying the actual value.

By default, it uses Scott’s parameter.
**kwargs – additional kwargs used in the case of “optimized” bandwidth:

steps (int): how many steps to use in optimization, 10 by default.
cv_fold (int): cross validation fold, 5 by default.
n_jobs (int): number of jobs to run cross validation in parallel, -1 by default, i.e. using all available processors.
verbose (int): verbosity of the cross validation run, for more details see sklearn.model_selection.GridSearchCV.
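To give a feel for what an “optimized” bandwidth involves, here is a NumPy-only sketch that picks the bandwidth with the lowest cross-validated cross entropy, i.e. the highest mean held-out log-likelihood. It is an illustration under stated assumptions, not the library's procedure: the function names, the isotropic kernel, the candidate range and the fold assignment are all choices made for this example.

```python
import numpy as np


def kde_mean_log_likelihood(X_test, X_train, bandwidth):
    """Mean log-density of X_test under an isotropic Gaussian KDE on X_train."""
    n, d = X_train.shape
    sq_dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    log_k = -0.5 * sq_dist / bandwidth**2
    # Log-sum-exp over training points, stabilized by the row maximum.
    row_max = log_k.max(axis=1, keepdims=True)
    log_p = row_max[:, 0] + np.log(np.exp(log_k - row_max).sum(axis=1))
    log_p -= np.log(n) + 0.5 * d * np.log(2.0 * np.pi * bandwidth**2)
    return log_p.mean()


def select_bandwidth(X, candidates, cv_fold=5, seed=0):
    """Pick the candidate with the lowest cross-validated cross entropy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), cv_fold)
    cross_entropy = []
    for h in candidates:
        scores = []
        for k in range(cv_fold):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
            scores.append(kde_mean_log_likelihood(X[folds[k]], X[train_idx], h))
        # Cross entropy is the negative mean held-out log-likelihood.
        cross_entropy.append(-np.mean(scores))
    return candidates[int(np.argmin(cross_entropy))], np.array(cross_entropy)


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
candidates = np.logspace(-2, 1, 10)  # 10 steps; the search range is an assumption
best, cross_entropy = select_bandwidth(X, candidates)
```

The 10 candidates and 5 folds mirror the documented defaults of steps and cv_fold; parallel execution (n_jobs) and verbosity are omitted from the sketch.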

static _conditional_weights(conditional_values, conditional_data, cov, optimize_memory=False)

Weights for the sampling from the conditional distribution.

They amount to the conditioned part of the Gaussian for every data point.

Parameters:

conditional_values (array) – of length n_conditionals.
conditional_data (array) – of shape (n_samples, n_conditionals). Non-conditional dimensions are already removed here.
cov (float, array) – covariance matrix. If float, it is a variance shared by all features. If 1D array, it holds a separate variance for every feature. If 2D array, it is a full covariance matrix.
optimize_memory (bool) – applies only to vectorized conditionals; reduces the memory footprint at the cost of a longer computation time.

Returns:

Normalized weights.
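A minimal sketch of such weights, assuming a full 2D covariance over the conditioned dimensions and a single set of conditional values, could look as follows. The function conditional_weights is a hypothetical stand-in, and the optimize_memory option is not modelled.

```python
import numpy as np


def conditional_weights(conditional_values, conditional_data, cov):
    """Hypothetical sketch: normalized Gaussian weight of every data point.

    `cov` is taken here to be a full 2D covariance of the conditioned dimensions.
    """
    diff = conditional_data - conditional_values
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    log_w = -0.5 * maha
    # Subtract the maximum before exponentiating to avoid underflow; the
    # Gaussian normalization constant cancels when the weights are normalized.
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


conditional_data = np.array([[0.0], [1.0], [5.0]])
weights = conditional_weights(np.array([0.5]), conditional_data, np.array([[1.0]]))
```

The two data points equally close to the conditional value 0.5 receive equal weight, while the distant point at 5.0 receives almost none.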

static _covariance_decomposition(cov, cond_mask, cond_only=False)

Decompose the covariance matrix into its conditional, unconditional and cross terms.

Parameters:

cov (array) – covariance matrix.
cond_mask (array) – boolean array defining conditional dimensions.
cond_only (bool) – whether to return only the conditional part of the covariance or all three parts.

Returns:

If cond_only is True, only the conditional part of the covariance; otherwise the conditional, unconditional and cross parts, in that order.

static _log_prob(X, data, cov, add_norm=True)

Log probability of a Gaussian KDE distribution.

Parameters:

X (array) – array of samples for which probability is calculated. Of shape (n, n_features).
data (array) – KDE data, of shape (n_samples, n_features).
cov (float, array) – covariance of the Gaussian kernel. If float, it is a variance shared by all features. If 1D array, it holds a separate variance for every feature. If 2D array, it is a full covariance matrix.
add_norm (bool) – whether to add the normalization factor to the calculation.

Returns:

Log probabilities.
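For a Gaussian KDE, the log-density at a point is the log of the average of one Gaussian kernel per data point. The sketch below illustrates this with a full 2D cov and a log-sum-exp for numerical stability; kde_log_prob is a hypothetical function, and treating the 1/n_samples factor as always applied (independent of add_norm) is an assumption of this example.

```python
import numpy as np


def kde_log_prob(X, data, cov, add_norm=True):
    """Hypothetical sketch of a Gaussian KDE log-density with a full 2D cov."""
    n_samples, n_features = data.shape
    # Pairwise differences, shape (n, n_samples, n_features).
    diff = X[:, None, :] - data[None, :, :]
    maha = np.einsum("nij,jk,nik->ni", diff, np.linalg.inv(cov), diff)
    log_kernel = -0.5 * maha
    # Log-sum-exp over the data points, stabilized by the row maximum.
    row_max = log_kernel.max(axis=1, keepdims=True)
    log_p = row_max[:, 0] + np.log(np.exp(log_kernel - row_max).sum(axis=1))
    log_p -= np.log(n_samples)
    if add_norm:
        _, log_det = np.linalg.slogdet(cov)
        log_p -= 0.5 * (n_features * np.log(2.0 * np.pi) + log_det)
    return log_p


data = np.array([[-1.0], [1.0]])     # two kernels, centered at -1 and 1
X = np.array([[0.0]])
log_p = kde_log_prob(X, data, np.array([[1.0]]))
```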

_sample(conditionals=None, n_samples=1, random_state=None, keep_dims=False)

Generate random samples from the conditional model.

This sampler assumes that the dimensions have only been rescaled, not otherwise distorted. In other words, it works for the None and “rescale” whitening algorithms, but not for “ZCA”.
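One way to read this assumption: if the kernel has an independent variance per feature, conditioning only reweights the data points, and the remaining dimensions can be sampled by picking a data point and adding kernel noise. The sketch below illustrates that reading; the function name, the per-feature variances and the data are invented for the example and do not reproduce the library's _sample.

```python
import numpy as np


def sample_conditional_kde(data, variances, cond_mask, cond_values, n_samples, rng):
    """Hypothetical sketch: conditional KDE sampling with a diagonal kernel."""
    # Weight of each kernel: Gaussian density of the conditional values around
    # the conditioned coordinates of every data point (up to a constant).
    diff = data[:, cond_mask] - cond_values
    log_w = -0.5 * np.sum(diff**2 / variances[cond_mask], axis=1)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Pick kernels according to the weights, then add kernel noise in the
    # remaining dimensions only.
    idx = rng.choice(len(data), size=n_samples, p=w)
    centers = data[idx][:, ~cond_mask]
    noise = rng.normal(size=centers.shape) * np.sqrt(variances[~cond_mask])
    return centers + noise


rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
data[:, 1] += 2.0 * data[:, 0]  # make the second feature depend on the first
variances = np.array([0.05, 0.05])  # per-feature kernel variances (assumed values)
mask = np.array([True, False])
samples = sample_conditional_kde(data, variances, mask, np.array([1.0]), 200, rng)
```

With the first feature fixed at 1.0, the samples of the second feature should scatter around 2.0, reflecting the dependence built into the data.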

_sample_general(conditionals=None, n_samples=1, random_state=None, keep_dims=False)

Generate random samples from the conditional model.

This function is the most general sampler, without any assumptions. It should be used for ZCA.
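The difference between the two whitening styles can be seen in a small NumPy sketch. It assumes that “rescale” means dividing each feature by its standard deviation and that “ZCA” is the textbook transform built from the eigendecomposition of the covariance; util.DataWhitener may differ in details.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=2000)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# "Rescale"-style whitening: divide every feature by its standard deviation.
# Each whitened feature depends on one original feature only.
W_rescale = np.diag(1.0 / np.sqrt(np.diag(cov)))

# ZCA whitening: W = U diag(1 / sqrt(eigenvalues)) U^T, which also removes
# correlations and therefore mixes the original features.
eigvals, eigvecs = np.linalg.eigh(cov)
W_zca = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

cov_rescaled = np.cov(Xc @ W_rescale.T, rowvar=False)
cov_zca = np.cov(Xc @ W_zca.T, rowvar=False)
```

After rescaling, the features have unit variance but remain correlated; after ZCA, the covariance is the identity, and the non-zero off-diagonal entries of W_zca show that every whitened feature combines several original ones.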

fit(X, features=None)

Fit the Conditional Kernel Density.

Parameters:

X (array) – data of shape (n_samples, n_features).
features (list) – optional, list defining names for every feature. It’s used for referencing conditional dimensions. Defaults to [0, 1, …, n_features - 1].

Returns:

An instance of itself.

static log_scott(n_samples, n_features)

Scott’s parameter.
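For reference, Scott's factor for a Gaussian KDE is commonly written as n_samples ** (-1 / (n_features + 4)). The sketch below computes its natural logarithm; that the method returns the natural log of exactly this expression is an assumption based on its name, and log_scott_factor is a hypothetical helper.

```python
import numpy as np


def log_scott_factor(n_samples, n_features):
    """Natural log of Scott's factor n**(-1 / (d + 4)) (hypothetical helper)."""
    return -np.log(n_samples) / (n_features + 4)


# 1000 samples in 2 dimensions: factor = 1000 ** (-1 / 6)
factor = np.exp(log_scott_factor(1000, 2))
```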

sample(conditionals=None, n_samples=1, random_state=None, keep_dims=False)

Parameters:

conditionals (dict) – variables (features) to condition upon. Dictionary keys must be feature names from features. For example, if self.features == [“a”, “b”, “c”] and one would like to condition on “a” and “c”, then conditionals = {“a”: cond_val_a, “c”: cond_val_c}. Conditioned values can be either floats or arrays; in the latter case, all conditioned arrays must be of the same size. Defaults to None, i.e. normal (unconditional) KDE.
n_samples (int) – number of samples to generate. Ignored if arrays are passed in conditionals. Defaults to 1.
random_state (np.random.RandomState, int) – seed or RandomState instance, optional. Determines the random number generation used to generate random samples. See the random_state entry in the scikit-learn Glossary.
keep_dims (bool) – whether to return non-conditioned dimensions only or keep given conditional values. Defaults to False.

Returns:

Array of samples, of shape (n_samples, n_features) if conditionals is None, or (n_samples, n_features - len(conditionals)) otherwise.

score_samples(X, conditional_features=None)

Compute the (un)conditional log-probability of each sample under the model.

Parameters:

X (array) – data of shape (n, n_features). Last dimension should match dimension of training data (n_features).
conditional_features (list) – subset of self.features naming the dimensions of the data to condition upon. Defaults to None, meaning unconditional log-probability.

Returns:

Conditional log probability for each sample in X.