conditional_kde.interpolated module
Module containing Interpolated Conditional KDE.
- class conditional_kde.interpolated.InterpolatedConditionalGaussian(bandwidth=1.0)
Bases: object
Interpolated Conditional Gaussian estimator.
In contrast to ConditionalGaussian, which fits the full distribution and slices through it to obtain the conditional distribution, this estimator allows some dimensions of the data to be inherently conditional. For such dimensions, data must be available for every point on a grid.
To compute the final conditional density, one then interpolates over the inherently conditional dimensions and slices through the others as before.
- Parameters:
bandwidth (float) – scales the covariance, allowing additional smoothing or shrinking. In most cases, it should be left as 1.
- fit(data, inherent_features=None, features=None, interpolation_points=None, interpolation_method='linear')
Fit the Interpolated Conditional Gaussian.
Let Y = (y1, y2, …, yN) denote the inherently conditional random variables of the dataset, and X = (x1, x2, …, xM) the other variables, for which a sample of points is available. This function fits P(X | Y) at every point of a gridded Y space, so a set of X samples must be passed for every grid point. Later, interpolation in Y and slicing in X can be used to compute P(x1, x2 | x3, …, xM, y1, …, yN), or similar. Note that all Y values must be conditioned on.
- Parameters:
data (list of arrays, array) – data to fit, of shape (n_interp_1, n_interp_2, …, n_samples, n_features). For every point on the grid (n_interp_1, n_interp_2, …, n_interp_N) one needs to pass an (n_samples, n_features) dataset, for which a separate n_features-dim Gaussian KDE is fitted. All grid points must have the same number of features (n_features). If n_samples is not the same for every point, one needs to pass a nested list of arrays.
inherent_features (list) – optional, list defining the name of every inherently conditional feature. It is used for referencing conditional dimensions. Defaults to [-1, -2, …, -N], where N is the number of inherently conditional features.
features (list) – optional, list defining the name of every other feature. It is used for referencing conditional dimensions. Defaults to [0, 1, …, n_features - 1].
interpolation_points (dict) – optional, a dictionary of feature: list_of_values pairs. This defines the grid points for every inherently conditional feature. Every list of values should be strictly ascending. By default it amounts to: {-1: np.linspace(0, 1, n_interp_1), …, -N: np.linspace(0, 1, n_interp_N)}.
interpolation_method (str) – either “linear” or “nearest”, making linear interpolation between distributions or picking the closest one, respectively.
- Returns:
An instance of itself.
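The defaults described above for inherent_features and interpolation_points can be sketched in a few lines. This is only an illustration of the documented defaults, not the library's code: the helper name default_grid is hypothetical, and plain Python lists stand in for np.linspace so that the snippet has no dependencies.

```python
def default_grid(grid_shape):
    """Return default (inherent_features, interpolation_points) for a grid.

    grid_shape is (n_interp_1, ..., n_interp_N), i.e. the leading axes of
    `data`. Hypothetical helper that mirrors the documented defaults.
    """
    n_inherent = len(grid_shape)
    # Default names are negative integers: [-1, -2, ..., -N].
    inherent_features = [-(k + 1) for k in range(n_inherent)]
    interpolation_points = {}
    for name, n_interp in zip(inherent_features, grid_shape):
        if n_interp == 1:
            points = [0.0]
        else:
            # Pure-Python equivalent of np.linspace(0, 1, n_interp).
            points = [i / (n_interp - 1) for i in range(n_interp)]
        interpolation_points[name] = points
    return inherent_features, interpolation_points


names, points = default_grid((3, 5))
print(names)       # [-1, -2]
print(points[-1])  # [0.0, 0.5, 1.0]
```

Passing an explicit interpolation_points dictionary replaces these unit-interval grids with the physical values of each inherently conditional feature.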
- sample(inherent_conditionals, conditionals=None, n_samples=1, random_state=None, keep_dims=False)
Generate random samples from the conditional model. For inherent_conditionals, there is only one mode of sampling, in which only scalar values are accepted. For conditionals there are two different modes: (1) specify conditionals as scalar values and draw n_samples from the distribution; (2) specify conditionals as arrays, in which case the number of samples is the length of the arrays.
- Parameters:
inherent_conditionals (dict) – values of inherent (grid) features. These values are used to interpolate on the grid. All inherently conditional dimensions must be defined.
conditionals (dict) – desired variables (features) to condition upon. Dictionary keys should be only feature names from features. For example, if self.features == [“a”, “b”, “c”] and one would like to condition on “a” and “c”, then conditionals = {“a”: cond_val_a, “c”: cond_val_c}. Conditioned values can be either float or array, where in the case of the latter, all conditioned arrays have to be of the same size. Defaults to None, i.e. normal KDE.
n_samples (int) – number of samples to generate. Defaults to 1.
random_state (np.random.RandomState, int) – optional seed or RandomState instance. Determines the random number generation used to draw samples. See the Glossary entry for random_state.
keep_dims (bool) – whether to return non-conditioned dimensions only or keep given conditional values. Defaults to False.
- Returns:
Array of samples of shape (n_samples, N + n_features) if conditionals is None, or (n_samples, n_features - len(conditionals)) otherwise.
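The two sampling modes for conditionals determine how many rows come back. The sketch below restates that rule. The helper resolve_n_samples is hypothetical, and treating a mix of scalar and array values as "scalars are broadcast" is an assumption that the documentation does not spell out.

```python
def resolve_n_samples(conditionals=None, n_samples=1):
    """Number of rows a call to `sample` would produce (illustrative only)."""
    if not conditionals:
        return n_samples  # no conditioning: draw n_samples
    # Collect the lengths of all array-like conditioned values.
    lengths = {len(v) for v in conditionals.values() if hasattr(v, "__len__")}
    if not lengths:
        return n_samples  # mode (1): scalars only, draw n_samples
    if len(lengths) > 1:
        raise ValueError("all conditioned arrays must have the same size")
    return lengths.pop()  # mode (2): one sample per array entry


print(resolve_n_samples({"a": 0.5}, n_samples=10))                # 10
print(resolve_n_samples({"a": [1, 2, 3], "c": [4, 5, 6]}, 10))    # 3
```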
- score_samples(X, inherent_conditionals, conditional_features=None)
Compute the conditional log-probability of each sample under the model.
For simplicity of calculation, the grid point is fixed by defining a point in the inherently conditional dimensions. X is then an array of shape (n, n_features), including all other dimensions of the data.
- Parameters:
X (array) – data of shape (n, n_features). Last dimension should match dimension of training data (n_features).
inherent_conditionals (dict) – values of inherent (grid) features. These values are used to interpolate on the grid. All inherently conditional dimensions must be defined.
conditional_features (list) – subset of self.features naming the dimensions of the data to additionally condition upon. Defaults to None, meaning no additionally conditioned dimensions.
- Returns:
Conditional log probability for each sample in X, conditioned on inherently conditional dimensions by inherent_conditionals and other dimensions by conditional_features.
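Conditioning on additional features by slicing rests on the identity log p(x1 | x2) = log p(x1, x2) - log p(x2). The following sketch checks that identity for a single bivariate Gaussian against the closed-form conditional. All function names and numbers are illustrative, and the library's own implementation is not shown here.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_joint(x1, x2, mu, cov):
    """Log-density of a bivariate Gaussian with mean mu and 2x2 covariance."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = x1 - mu[0], x2 - mu[1]
    # Quadratic form d^T cov^{-1} d, written out for the 2x2 case.
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_conditional(x1, x2, mu, cov):
    """log p(x1 | x2) by slicing: joint minus marginal of the conditioned part."""
    return log_joint(x1, x2, mu, cov) - log_normal_pdf(x2, mu[1], cov[1][1])


mu = (0.0, 1.0)
cov = ((2.0, 0.6), (0.6, 1.0))
x1, x2 = 0.3, 1.5

# Closed-form conditional of a Gaussian, for comparison.
cond_mean = mu[0] + cov[0][1] / cov[1][1] * (x2 - mu[1])
cond_var = cov[0][0] - cov[0][1] ** 2 / cov[1][1]

print(log_conditional(x1, x2, mu, cov))
print(log_normal_pdf(x1, cond_mean, cond_var))  # same value up to rounding
```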
- class conditional_kde.interpolated.InterpolatedConditionalKernelDensity(whitening_algorithm='rescale', bandwidth='scott', **kwargs)
Bases: object
Interpolated Conditional Kernel Density estimator.
In contrast to ConditionalKernelDensity, which fits the full distribution and slices through it to obtain the conditional distribution, this estimator allows some dimensions of the data to be inherently conditional. For such dimensions, data must be available for every point on a grid.
To compute the final conditional density, one then interpolates over the inherently conditional dimensions and slices through the others as before.
- Parameters:
whitening_algorithm (str) – data whitening algorithm, either None, “rescale” or “ZCA”. See util.DataWhitener for more details. “rescale” by default.
bandwidth (str, float) – the width of the Gaussian centered around every point. It can be either:
“scott”, using Scott’s parameter,
“optimized”, which minimizes cross entropy to find the optimal bandwidth, or
a float, specifying the actual value.
By default, it uses Scott’s parameter.
**kwargs – additional kwargs used in the case of “optimized” bandwidth:
steps (int): how many steps to use in optimization, 10 by default.
cv_fold (int): cross validation fold, 5 by default.
n_jobs (int): number of jobs to run cross validation in parallel, -1 by default, i.e. using all available processors.
verbose (int): verbosity of the cross validation run, for more details see sklearn.model_selection.GridSearchCV.
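For orientation, Scott's rule is commonly written as a factor n^(-1/(d + 4)) for n samples in d dimensions (this is the form used, for example, by scipy.stats.gaussian_kde). Whether this estimator applies exactly that expression, and to which (possibly whitened) data, is an assumption here. The function name below is hypothetical.

```python
def scott_factor(n_samples, n_features):
    """Scott's rule-of-thumb bandwidth factor, n ** (-1 / (d + 4))."""
    return n_samples ** (-1.0 / (n_features + 4))


# The factor shrinks slowly as the sample grows and shrinks more slowly
# still in higher dimensions.
for n in (100, 1000, 10000):
    print(n, scott_factor(n, n_features=1), scott_factor(n, n_features=3))
```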
- fit(data, inherent_features=None, features=None, interpolation_points=None, interpolation_method='linear')
Fit the Interpolated Conditional Kernel Density.
Let Y = (y1, y2, …, yN) denote the inherently conditional random variables of the dataset, and X = (x1, x2, …, xM) the other variables, for which a sample of points is available. This function fits P(X | Y) at every point of a gridded Y space, so a set of X samples must be passed for every grid point. Later, interpolation in Y and slicing in X can be used to compute P(x1, x2 | x3, …, xM, y1, …, yN), or similar. Note that all Y values must be conditioned on.
- Parameters:
data (list of arrays, array) – data to fit, of shape (n_interp_1, n_interp_2, …, n_samples, n_features). For every point on the grid (n_interp_1, n_interp_2, …, n_interp_N) one needs to pass an (n_samples, n_features) dataset, for which a separate n_features-dim Gaussian KDE is fitted. All grid points must have the same number of features (n_features). If n_samples is not the same for every point, one needs to pass a nested list of arrays.
inherent_features (list) – optional, list defining the name of every inherently conditional feature. It is used for referencing conditional dimensions. Defaults to [-1, -2, …, -N], where N is the number of inherently conditional features.
features (list) – optional, list defining the name of every other feature. It is used for referencing conditional dimensions. Defaults to [0, 1, …, n_features - 1].
interpolation_points (dict) – optional, a dictionary of feature: list_of_values pairs. This defines the grid points for every inherently conditional feature. Every list of values should be strictly ascending. By default it amounts to: {-1: np.linspace(0, 1, n_interp_1), …, -N: np.linspace(0, 1, n_interp_N)}.
interpolation_method (str) – either “linear” or “nearest”, making linear interpolation between distributions or picking the closest one, respectively.
- Returns:
An instance of itself.
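The difference between the “linear” and “nearest” interpolation methods can be pictured as a choice of weights over neighboring grid points along one inherently conditional axis. The sketch below is one plausible reading of that behavior, not the library's implementation. The helper grid_weights and its return format are hypothetical.

```python
from bisect import bisect_right


def grid_weights(points, value, method="linear"):
    """Return [(grid_index, weight), ...] for `value` on an ascending grid."""
    if len(points) < 2:
        raise ValueError("need at least two grid points to interpolate")
    if not points[0] <= value <= points[-1]:
        raise ValueError("value lies outside the grid")
    # Index of the left neighbor, clamped so a right neighbor always exists.
    left = min(bisect_right(points, value) - 1, len(points) - 2)
    right = left + 1
    # Fractional position of `value` between the two neighbors, in [0, 1].
    t = (value - points[left]) / (points[right] - points[left])
    if method == "linear":
        return [(left, 1.0 - t), (right, t)]
    if method == "nearest":
        return [(left, 1.0)] if t <= 0.5 else [(right, 1.0)]
    raise ValueError("method must be 'linear' or 'nearest'")


grid = [0.0, 0.5, 1.0]
print(grid_weights(grid, 0.6, "linear"))   # mostly grid point 1, partly 2
print(grid_weights(grid, 0.6, "nearest"))  # grid point 1 only
```

The requirement that every list of values be strictly ascending is what makes the bisection step and the division by the neighbor spacing well defined.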
- sample(inherent_conditionals, conditionals=None, n_samples=1, random_state=None, keep_dims=False)
Generate random samples from the conditional model. For inherent_conditionals, there is only one mode of sampling, in which only scalar values are accepted. For conditionals there are two different modes: (1) specify conditionals as scalar values and draw n_samples from the distribution; (2) specify conditionals as arrays, in which case the number of samples is the length of the arrays.
- Parameters:
inherent_conditionals (dict) – values of inherent (grid) features. These values are used to interpolate on the grid. All inherently conditional dimensions must be defined.
conditionals (dict) – desired variables (features) to condition upon. Dictionary keys should be only feature names from features. For example, if self.features == [“a”, “b”, “c”] and one would like to condition on “a” and “c”, then conditionals = {“a”: cond_val_a, “c”: cond_val_c}. Conditioned values can be either float or array, where in the case of the latter, all conditioned arrays have to be of the same size. Defaults to None, i.e. normal KDE.
n_samples (int) – number of samples to generate. Defaults to 1.
random_state (np.random.RandomState, int) – optional seed or RandomState instance. Determines the random number generation used to draw samples. See the Glossary entry for random_state.
keep_dims (bool) – whether to return non-conditioned dimensions only or keep given conditional values. Defaults to False.
- Returns:
Array of samples of shape (n_samples, N + n_features) if conditionals is None, or (n_samples, n_features - len(conditionals)) otherwise.
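As background, drawing from an unconditional Gaussian kernel density amounts to picking a training point at random and perturbing it with Gaussian noise of the kernel width. The sketch below shows that textbook procedure with an isotropic kernel. It ignores whitening, grid interpolation and conditioning, so it is a simplified stand-in rather than what sample does internally, and the function name is hypothetical.

```python
import random


def sample_gaussian_kde(data, bandwidth, n_samples=1, random_state=None):
    """Draw samples from an isotropic Gaussian KDE built on `data`.

    data is a list of rows, each row a list of n_features floats.
    """
    rng = random.Random(random_state)  # seeded for reproducibility
    samples = []
    for _ in range(n_samples):
        center = rng.choice(data)  # pick one kernel uniformly
        # Perturb every coordinate with noise of standard deviation bandwidth.
        samples.append([rng.gauss(c, bandwidth) for c in center])
    return samples


data = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
draws = sample_gaussian_kde(data, bandwidth=0.3, n_samples=5, random_state=0)
print(len(draws), len(draws[0]))  # 5 2
```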
- score_samples(X, inherent_conditionals, conditional_features=None)
Compute the conditional log-probability of each sample under the model.
For simplicity of calculation, the grid point is fixed by defining a point in the inherently conditional dimensions. X is then an array of shape (n, n_features), including all other dimensions of the data.
- Parameters:
X (array) – data of shape (n, n_features). Last dimension should match dimension of training data (n_features).
inherent_conditionals (dict) – values of inherent (grid) features. These values are used to interpolate on the grid. All inherently conditional dimensions must be defined.
conditional_features (list) – subset of self.features naming the dimensions of the data to additionally condition upon. Defaults to None, meaning no additionally conditioned dimensions.
- Returns:
Conditional log probability for each sample in X, conditioned on inherently conditional dimensions by inherent_conditionals and other dimensions by conditional_features.
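The quantity being scored is a log-density of a Gaussian kernel density. As a minimal one-dimensional sketch, the unconditional version is the log of the average of Gaussian kernels, evaluated with the log-sum-exp trick for numerical stability. The function name is hypothetical, and the conditioning and grid interpolation performed by score_samples are deliberately left out.

```python
import math


def kde_log_density(x, data, bandwidth):
    """log of (1/n) * sum_i Normal(x; data[i], bandwidth**2) in one dimension."""
    log_norm = -math.log(bandwidth) - 0.5 * math.log(2 * math.pi)
    log_terms = [log_norm - 0.5 * ((x - xi) / bandwidth) ** 2 for xi in data]
    # Log-sum-exp: factor out the largest term before exponentiating.
    m = max(log_terms)
    log_sum = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return log_sum - math.log(len(data))


print(kde_log_density(0.0, [-1.0, 1.0], bandwidth=1.0))
```

A conditional score then follows by subtracting the log-density of the conditioned dimensions from the joint log-density, in the same way as for a single Gaussian.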