TransportMaps.KL.minimize_KL_divergence

Module Contents

Functions

- minimize_kl_divergence() – Solve \(\arg \min_{\bf a}\mathcal{D}_{KL}\left(\pi, (T^\sharp\pi_{\rm tar})_{\bf a}\right)\)
- minimize_kl_divergence_objective() – Objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- minimize_kl_divergence_grad_a_objective() – Gradient of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- minimize_kl_divergence_tuple_grad_a_objective() – Function evaluation and gradient of the objective \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- minimize_kl_divergence_hess_a_objective() – Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- minimize_kl_divergence_action_hess_a_objective() – Action of the Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) on a given direction.
- minimize_kl_divergence_action_storage_hess_a_objective() – Assemble the Hessian of \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) and compute its action on the vector \(v\).
- minimize_kl_divergence_component() – Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)
- minimize_kl_divergence_component_objective() – Objective function \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left[\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right]\)
- minimize_kl_divergence_component_grad_a_objective() – Gradient of the objective function \(-\sum_{i=0}^m \nabla_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left[ \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right]\)
- minimize_kl_divergence_component_hess_a_objective() – Hessian of the objective function \(-\sum_{i=0}^m \nabla^2_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left[ \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right]\)
- minimize_kl_divergence_pointwise_monotone() – Compute \({\bf a}^\star = \arg\min_{\bf a}\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\)
- minimize_kl_divergence_pointwise_monotone_constraints()
- minimize_kl_divergence_pointwise_monotone_da_constraints()
- minimize_kl_divergence_pointwise_monotone_component() – Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, qtype: int = None, qparams=None, x=None, w=None, params_d1=None, params_d2=None, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, hessact=False, precomp_type='uni', batch_size=None, mpi_pool=None, grad_check=False, hess_check=False)
Solve \(\arg \min_{\bf a}\mathcal{D}_{KL}\left(\pi, (T^\sharp\pi_{\rm tar})_{\bf a}\right)\)
- Parameters:
  - d1 (Distribution) – sampling distribution \(\pi\)
  - d2 (ParametricTransportMapDistribution) – target distribution \(\pi_{\rm tar}\)
  - qtype (int) – quadrature type number provided by \(\pi\)
  - qparams (object) – inputs necessary to the generation of the selected quadrature
  - x (ndarray [\(m,d\)]) – quadrature points
  - w (ndarray [\(m\)]) – quadrature weights
  - params_d1 (dict) – parameters for the evaluation of \(\pi\)
  - params_d2 (dict) – parameters for the evaluation of \(\pi_{\rm tar}\)
  - x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization
  - regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.
  - tol (float) – tolerance to be used to solve the KL-divergence problem
  - maxit (int) – maximum number of iterations
  - ders (int) – order of derivatives available for the solution of the optimization problem: 0 -> derivative free, 1 -> gradient, 2 -> Hessian
  - fungrad (bool) – whether the target distribution provides the method Distribution.tuple_grad_x_log_pdf(), computing the evaluation and the gradient in one step. Used only for ders==1.
  - hessact (bool) – use the action of the Hessian. The target distribution must implement Distribution.action_hess_x_log_pdf().
  - precomp_type (str) – whether to precompute univariate ('uni') or multivariate ('multi') Vandermonde matrices
  - batch_size (list [3 or 2] of int) – the list contains the size of the batch to be used for each iteration. A size 1 corresponds to a completely non-vectorized evaluation; a size None corresponds to a completely vectorized one.
  - mpi_pool (mpi_map.MPI_Pool) – pool of processes
  - grad_check (bool) – whether to use finite differences to check the correctness of the gradient
  - hess_check (bool) – whether to use finite differences to check the correctness of the Hessian
- Returns:
  log information from the solver
- Return type:
  log (dict)

Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
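A minimal usage sketch follows. Only the minimize_kl_divergence signature comes from this page; the map factory, the distribution constructors, and the quadrature convention (qtype=0 read as Monte Carlo with qparams samples) are assumptions about the surrounding TransportMaps API.

    # Sketch only: the constructors below are ASSUMED, not documented on this page.
    import TransportMaps as TM
    import TransportMaps.Distributions as DIST
    from TransportMaps.KL.minimize_KL_divergence import minimize_kl_divergence

    dim = 2
    pi = DIST.StandardNormalDistribution(dim)      # sampling distribution (assumed constructor)
    pi_tar = DIST.StandardNormalDistribution(dim)  # trivial stand-in target, for illustration only
    T = TM.Default_IsotropicIntegratedExponentialTriangularTransportMap(dim, 3)  # assumed factory
    d2 = DIST.PullBackParametricTransportMapDistribution(T, pi_tar)              # assumed wrapper

    log = minimize_kl_divergence(
        pi, d2,
        qtype=0, qparams=1000,                     # assumed: Monte Carlo with 1000 samples
        regularization={'type': 'L2', 'alpha': 1e-3},  # Tikhonov penalty, as documented above
        tol=1e-4, maxit=100, ders=2,               # use gradient and Hessian
    )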
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_objective(a, params)
Objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_grad_a_objective(a, params)
Gradient of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_tuple_grad_a_objective(a, params)
Function evaluation and gradient of the objective \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_hess_a_objective(a, params)
Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_action_hess_a_objective(a, da, params)
Action of the Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) on the direction \(da\).
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_action_storage_hess_a_objective(a, v, params)
Assemble the Hessian of \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) and compute its action on the vector \(v\), for the KL-divergence minimization problem.
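These callbacks all share the (a, params) calling convention, which lines up with the fun/jac/hess interface of scipy.optimize.minimize via args=(params,). The sketch below shows one plausible wiring, not the module's actual solver loop; the contents of params are internal to the module and assumed to be prepared by minimize_kl_divergence.

    # Sketch: driving the (a, params) callbacks with SciPy (ders == 2 case).
    import scipy.optimize as sciopt
    from TransportMaps.KL import minimize_KL_divergence as mkl

    def solve(a0, params):
        # Newton-CG is a hypothetical choice; the module may use a different method.
        res = sciopt.minimize(
            mkl.minimize_kl_divergence_objective, a0, args=(params,),
            jac=mkl.minimize_kl_divergence_grad_a_objective,
            hess=mkl.minimize_kl_divergence_hess_a_objective,
            method='Newton-CG', tol=1e-4,
        )
        return res.x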
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component(f: TransportMaps.Maps.Functionals.ProductDistributionParametricPullbackComponentFunction, x, w, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, precomp_type='uni', batch_size=None, cache_level=1, mpi_pool=None)
Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)
- Parameters:
  - f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)
  - x (ndarray [\(m,d\)]) – quadrature points
  - w (ndarray [\(m\)]) – quadrature weights
  - x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization
  - regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.
  - tol (float) – tolerance to be used to solve the KL-divergence problem
  - maxit (int) – maximum number of iterations
  - ders (int) – order of derivatives available for the solution of the optimization problem: 0 -> derivative free, 1 -> gradient, 2 -> Hessian
  - fungrad (bool) – whether the distributions \(\pi_1,\pi_2\) provide the method Distribution.tuple_grad_x_log_pdf(), computing the evaluation and the gradient in one step. Used only for ders==1.
  - precomp_type (str) – whether to precompute univariate ('uni') or multivariate ('multi') Vandermonde matrices
  - batch_size (list [3 or 2] of int, or list of such lists) – the list contains the size of the batch to be used for each iteration. A size 1 corresponds to a completely non-vectorized evaluation; a size None corresponds to a completely vectorized one (see the option sketch after this parameter list). If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes to be used for each component of the map.
  - cache_level (int) – use high-level caching during the optimization: 0 stores the function evaluations, 1 the gradient evaluations, -1 nothing
  - mpi_pool (mpi_map.MPI_Pool or list thereof) – pool of processes to be used; None stands for one process. If the target distribution is a ProductDistribution, the minimization problem decouples and mpi_pool is a list containing one pool per component of the map.
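For concreteness, the option values described above can be assembled as follows. The numeric values are arbitrary placeholders, and reading the batch list as one entry per derivative order (function, gradient, Hessian) is an interpretation of the "[3 or 2]" length, not something stated on this page.

    # Option values following the parameter descriptions above (placeholders).
    regularization = {'type': 'L2', 'alpha': 1e-3}  # Tikhonov penalty with coefficient alpha
    batch_size = [None, 512, 128]                   # assumed order: [function, gradient, Hessian]
    cache_level = 0                                 # cache function evaluations only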
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_objective(a, params)
Objective function \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left[\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right]\)
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_grad_a_objective(a, params)
Gradient of the objective function \(-\sum_{i=0}^m \nabla_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left[ \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right]\)
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_hess_a_objective(a, params)
Hessian of the objective function \(-\sum_{i=0}^m \nabla^2_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left[ \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right]\)
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, x=None, w=None, params_d1=None, params_d2=None, x0=None, regularization=None, tol=0.0001, maxit=100, ders=1, fungrad=False, hessact=False, precomp_type='uni', batch_size=None, mpi_pool=None, grad_check=False, hess_check=False)
Compute \({\bf a}^\star = \arg\min_{\bf a}\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\)
- Parameters:
  - d1 (Distribution) – distribution \(\pi_1\)
  - d2 (Distribution) – distribution \(\pi_2\)
  - x (ndarray [\(m,d\)]) – quadrature points
  - w (ndarray [\(m\)]) – quadrature weights
  - params_d1 (dict) – parameters for distribution \(\pi_1\)
  - params_d2 (dict) – parameters for distribution \(\pi_2\)
  - x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization
  - regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.
  - tol (float) – tolerance to be used to solve the KL-divergence problem
  - maxit (int) – maximum number of iterations
  - ders (int) – order of derivatives available for the solution of the optimization problem: 0 -> derivative free (SLSQP), 1 -> gradient (SLSQP)
  - fungrad (bool) – whether the target distribution provides the method Distribution.tuple_grad_x_log_pdf(), computing the evaluation and the gradient in one step. Used only for ders==1.
  - hessact (bool) – this option is disabled for linear span maps (no Hessian used)
  - precomp_type (str) – whether to precompute univariate ('uni') or multivariate ('multi') Vandermonde matrices
  - batch_size (list [2] of int) – the list contains the size of the batch to be used for each iteration. A size 1 corresponds to a completely non-vectorized evaluation; a size None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes to be used for each component of the map.
  - mpi_pool (mpi_map.MPI_Pool or list thereof) – pool of processes to be used; None stands for one process. If the target distribution is a ProductDistribution, the minimization problem decouples and mpi_pool is a list containing one pool per component of the map.
  - grad_check (bool) – whether to use finite differences to check the correctness of the gradient
  - hess_check (bool) – whether to use finite differences to check the correctness of the Hessian
- Returns:
  log information from the solver
- Return type:
  log (dict)
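This variant takes explicit quadrature nodes and weights (x, w) rather than (qtype, qparams). A minimal sketch follows, reusing the pi and d2 objects from the first sketch; the sampler call and the Monte Carlo weighting are assumptions, not part of this page.

    # Sketch: pointwise-monotone minimization with explicit Monte Carlo quadrature.
    import numpy as np
    from TransportMaps.KL.minimize_KL_divergence import (
        minimize_kl_divergence_pointwise_monotone,
    )

    m = 1000
    x = pi.rvs(m)                  # assumed sampler on the distribution object
    w = np.ones(m) / m             # equal Monte Carlo weights

    log = minimize_kl_divergence_pointwise_monotone(
        pi, d2, x=x, w=w,
        ders=1,                    # SLSQP with gradients; no Hessian for linear span maps
        tol=1e-4, maxit=100,
    )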
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_constraints(a, params)
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_da_constraints(a, params)
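These two helpers are undocumented, but their names and the shared (a, params) convention suggest they evaluate the pointwise monotonicity constraints and their Jacobian with respect to the coefficients. One plausible wiring into SciPy's SLSQP constraint interface, offered purely as a sketch:

    # Hypothetical wiring of the constraint helpers into SLSQP (ders == 1 case).
    import scipy.optimize as sciopt
    from TransportMaps.KL import minimize_KL_divergence as mkl

    def solve_constrained(a0, params):
        cons = [{
            'type': 'ineq',  # SLSQP enforces fun(a, *args) >= 0 pointwise
            'fun': mkl.minimize_kl_divergence_pointwise_monotone_constraints,
            'jac': mkl.minimize_kl_divergence_pointwise_monotone_da_constraints,
            'args': (params,),
        }]
        res = sciopt.minimize(
            mkl.minimize_kl_divergence_objective, a0, args=(params,),
            jac=mkl.minimize_kl_divergence_grad_a_objective,
            method='SLSQP', constraints=cons, tol=1e-4,
        )
        return res.x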
- TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_component(f, x, w, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, precomp_type='uni', batch_size=None, cache_level=1, mpi_pool=None)
Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)
- Parameters:
  - f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)
  - x (ndarray [\(m,d\)]) – quadrature points
  - w (ndarray [\(m\)]) – quadrature weights
  - x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization
  - regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.
  - tol (float) – tolerance to be used to solve the KL-divergence problem
  - maxit (int) – maximum number of iterations
  - ders (int) – order of derivatives available for the solution of the optimization problem: 0 -> derivative free, 1 -> gradient, 2 -> Hessian
  - fungrad (bool) – whether the distributions \(\pi_1,\pi_2\) provide the method Distribution.tuple_grad_x_log_pdf(), computing the evaluation and the gradient in one step. Used only for ders==1.
  - precomp_type (str) – whether to precompute univariate ('uni') or multivariate ('multi') Vandermonde matrices
  - batch_size (list [3 or 2] of int, or list of such lists) – the list contains the size of the batch to be used for each iteration. A size 1 corresponds to a completely non-vectorized evaluation; a size None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes to be used for each component of the map.
  - cache_level (int) – use high-level caching during the optimization: 0 stores the function evaluations, 1 the gradient evaluations, -1 nothing
  - mpi_pool (mpi_map.MPI_Pool or list thereof) – pool of processes to be used; None stands for one process. If the target distribution is a ProductDistribution, the minimization problem decouples and mpi_pool is a list containing one pool per component of the map (see the layout sketch after this list).
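When the target is a ProductDistribution, the descriptions above say the problem decouples across the map components, and both batch_size and mpi_pool become per-component lists. A sketch of that layout; pool construction via mpi_map is an assumption, and None entries simply mean serial execution:

    # Per-component option layout for a ProductDistribution target with `dim` components.
    dim = 3
    batch_size = [[None, 256, 64]] * dim   # one assumed [function, gradient, Hessian] triple per component
    mpi_pool = [None] * dim                # one pool (or None = serial) per component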