# TransportMaps.KL.minimize_KL_divergence¶

## Module Contents¶

### Functions¶

• minimize_kl_divergence(d1, d2[, qtype, qparams, x, w, ...]) – Solve $$\arg \min_{\bf a}\mathcal{D}_{KL}\left(\pi, (T^\sharp\pi_{\rm tar})_{\bf a}\right)$$

• minimize_kl_divergence_objective(a, params) – Objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

• minimize_kl_divergence_grad_a_objective(a, params) – Gradient of the objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

• minimize_kl_divergence_tuple_grad_a_objective(a, params) – Function evaluation and gradient of the objective $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

• minimize_kl_divergence_hess_a_objective(a, params) – Hessian of the objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

• minimize_kl_divergence_action_hess_a_objective(a, da, params) – Action of the Hessian of the objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ on the direction $$v$$.

• minimize_kl_divergence_action_storage_hess_a_objective(a, v, params) – Assemble the Hessian of $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ and compute its action on the vector $$v$$, for the KL-divergence minimization problem.

• minimize_kl_divergence_component(f, x, w[, x0, ...]) – Compute $${\bf a}^\star = \arg\min_{\bf a}-\sum_{i=0}^m \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) = \arg\min_{\bf a}-\sum_{i=0}^m f(x_i)$$

• minimize_kl_divergence_component_objective(a, params) – Objective function $$-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)$$

• minimize_kl_divergence_component_grad_a_objective(a, params) – Gradient of the objective function $$-\sum_{i=0}^m \nabla_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

• minimize_kl_divergence_component_hess_a_objective(a, params) – Hessian of the objective function $$-\sum_{i=0}^m \nabla^2_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

• minimize_kl_divergence_pointwise_monotone(d1, d2[, x, ...]) – Compute $${\bf a}^* = \arg\min_{\bf a}\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$

• minimize_kl_divergence_pointwise_monotone_component(f, x, w[, x0, ...]) – Compute $${\bf a}^\star = \arg\min_{\bf a}-\sum_{i=0}^m \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) = \arg\min_{\bf a}-\sum_{i=0}^m f(x_i)$$
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, qtype: int = None, qparams=None, x=None, w=None, params_d1=None, params_d2=None, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, hessact=False, precomp_type='uni', batch_size=None, mpi_pool=None, grad_check=False, hess_check=False)[source]

Solve $$\arg \min_{\bf a}\mathcal{D}_{KL}\left(\pi, (T^\sharp\pi_{\rm tar})_{\bf a}\right)$$

Parameters:
• d1 – sampling distribution

• d2 – target distribution $$\pi_{\rm tar}$$

• qtype (int) – number identifying the quadrature type to be used, among those provided by $$\pi$$

• qparams (object) – inputs necessary to the generation of the selected quadrature

• x (ndarray [$$m,d$$]) – quadrature points

• w (ndarray [$$m$$]) – quadrature weights

• params_d1 (dict) – parameters for the evaluation of $$\pi$$

• params_d2 (dict) – parameters for the evaluation of $$\pi_{\rm tar}$$

• x0 (ndarray [$$N$$]) – coefficients to be used as initial values for the optimization

• regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.

• tol (float) – tolerance to be used to solve the KL-divergence problem.

• maxit (int) – maximum number of iterations

• ders (int) – order of derivatives available for the solution of the optimization problem. 0 -> derivative free, 1 -> gradient, 2 -> hessian.

• fungrad (bool) – whether the target distribution provides the method Distribution.tuple_grad_x_log_pdf(), computing the evaluation and the gradient in one step. Used only for ders==1.

• hessact (bool) – use the action of the Hessian. The target distribution must implement the function Distribution.action_hess_x_log_pdf().

• precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

• batch_size (list [3 or 2] of int) – the list contains the size of the batch to be used for each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one.

• mpi_pool (mpi_map.MPI_Pool) – pool of processes

• grad_check (bool) – whether to use finite differences to check the correctness of the gradient

• hess_check (bool) – whether to use finite differences to check the correctness of the Hessian

Returns:

log – logging information regarding the optimization

Return type:

dict
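For intuition, the quadrature form of the objective being minimized can be written out in a self-contained 1-D sketch. This is not the TransportMaps API: the Gaussian target, the monotone linear map $$T_{\bf a}(x) = a_0 + e^{a_1} x$$, and the Gauss-Hermite rule are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D analogue of minimize_kl_divergence: fit a monotone
# linear map T_a(x) = a0 + exp(a1)*x pushing the standard normal pi
# onto a target N(mu, sig^2), by minimizing the quadrature approximation
#   D_KL(pi, T^sharp pi_tar) ~ -sum_i w_i [ log pi_tar(T_a(x_i))
#                                           + log dT_a/dx(x_i) ] + const.
mu, sig = 2.0, 0.5                               # illustrative target

def log_pi_tar(y):
    return -0.5 * ((y - mu) / sig) ** 2 - np.log(sig * np.sqrt(2 * np.pi))

# Gauss-Hermite quadrature for the standard normal (the (qtype, qparams)
# pair plays this role in the real call; (x, w) passes points directly).
xg, wg = np.polynomial.hermite_e.hermegauss(20)
w = wg / np.sqrt(2 * np.pi)                      # weights now sum to 1

def objective(a):
    T = a[0] + np.exp(a[1]) * xg                 # monotone by construction
    return -np.sum(w * (log_pi_tar(T) + a[1]))   # a1 = log of constant dT/dx

res = minimize(objective, np.zeros(2), tol=1e-4)
a_star = res.x                                   # a0 ~ mu, exp(a1) ~ sig
```

At the optimum the recovered map is $$T(x) \approx \mu + \sigma x$$, the exact transport between the two Gaussians, mirroring what the tol and (qtype, qparams) arguments control in the real call.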

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_objective(a, params)[source]

Objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• params (dict) – parameters for the evaluation

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_grad_a_objective(a, params)[source]

Gradient of the objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• params (dict) – parameters for the evaluation

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_tuple_grad_a_objective(a, params)[source]

Function evaluation and gradient of the objective $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• params (dict) – parameters for the evaluation
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_hess_a_objective(a, params)[source]

Hessian of the objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ for the KL-divergence minimization.

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• params (dict) – parameters for the evaluation
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_action_hess_a_objective(a, da, params)[source]

Action of the Hessian of the objective function $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ on the direction $$v$$.

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• da (ndarray [$$N$$]) – direction $$v$$ on which to apply the Hessian

• params (dict) – parameters for the evaluation
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_action_storage_hess_a_objective(a, v, params)[source]

Assemble the Hessian of $$\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$ and compute its action on the vector $$v$$, for the KL-divergence minimization problem.

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• v (ndarray [$$N$$]) – vector on which to apply the Hessian

• params (dict) – parameters for the evaluation
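To see why a Hessian action can be preferable to Hessian storage (the distinction between the two functions above), here is a generic matrix-free sketch: the action on a direction is approximated by central differences of the gradient, so the full $$N \times N$$ matrix is never assembled. The quadratic test objective and the finite-difference scheme are illustrative assumptions; the library computes the action analytically.

```python
import numpy as np

# Matrix-free Hessian action: for a smooth objective with gradient g,
# H(a) @ v ~ (g(a + eps*v) - g(a - eps*v)) / (2*eps), so no N x N
# Hessian is ever stored.
def action_hess(g, a, v, eps=1e-5):
    return (g(a + eps * v) - g(a - eps * v)) / (2 * eps)

# Check on J(a) = 0.5 * a^T A a, whose Hessian is A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
g = lambda a: A @ a                     # gradient of the quadratic
a = np.array([0.3, -0.7])
v = np.array([1.0, 2.0])
Hv = action_hess(g, a, v)               # matches A @ v = [3.0, 2.5]
```

Since the gradient here is linear, the central difference is exact up to rounding; for the nonlinear KL objective an analytic action, as in the library, is the more accurate choice.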
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component(f: TransportMaps.Maps.Functionals.ProductDistributionParametricPullbackComponentFunction, x, w, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, precomp_type='uni', batch_size=None, cache_level=1, mpi_pool=None)[source]

Compute $${\bf a}^\star = \arg\min_{\bf a}-\sum_{i=0}^m \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) = \arg\min_{\bf a}-\sum_{i=0}^m f(x_i)$$

Parameters:
• f – function $$f$$

• x (ndarray [$$m,d$$]) – quadrature points

• w (ndarray [$$m$$]) – quadrature weights

• x0 (ndarray [$$N$$]) – coefficients to be used as initial values for the optimization

• regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.

• tol (float) – tolerance to be used to solve the KL-divergence problem.

• maxit (int) – maximum number of iterations

• ders (int) – order of derivatives available for the solution of the optimization problem. 0 -> derivative free, 1 -> gradient, 2 -> hessian.

• fungrad (bool) – whether the distributions $$\pi_1,\pi_2$$ provide the method Distribution.tuple_grad_x_log_pdf() computing the evaluation and the gradient in one step. This is used only for ders==1.

• precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

• batch_size (list [3 or 2] of int or list of batch_size) – the list contains the size of the batch to be used for each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes to be used for each component of the map.

• cache_level (int) – level of high-level caching to use during the optimization: 0 stores function evaluations, 1 also stores gradient evaluations, and -1 disables caching.

• mpi_pool (mpi_map.MPI_Pool or list of mpi_pool) – pool of processes to be used, None stands for one process. If the target distribution is a ProductDistribution, then the minimization problem decouples and mpi_pool is a list containing mpi_pools for each component of the map.
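The decoupling this function exploits can be sketched on a hypothetical product target: since the objective is a sum of independent per-component terms, each component's coefficients are optimized separately (which is also why batch_size and mpi_pool become per-component lists). The 2-D Gaussian product target and monotone linear maps below are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# For a product target pi(y) = prod_k pi_k(y_k) and a diagonal map whose
# k-th component T_k depends only on x_k and its own coefficients, the
# objective -sum_i w_i sum_k [log pi_k(T_k(x_i)) + log d_k T_k(x_i)]
# splits, so each component is a small independent problem.
mus, sigs = [1.0, -2.0], [0.5, 2.0]          # illustrative product target
xg, wg = np.polynomial.hermite_e.hermegauss(20)
w = wg / np.sqrt(2 * np.pi)                  # standard-normal quadrature

def component_objective(a, mu, sig):
    T = a[0] + np.exp(a[1]) * xg             # monotone linear T_k
    return -np.sum(w * (-0.5 * ((T - mu) / sig) ** 2 + a[1]))

# One independent solve per component -- the role this function plays
# for each k in the library.
coeffs = [minimize(component_objective, np.zeros(2), args=(m, s)).x
          for m, s in zip(mus, sigs)]
```

Each sub-problem recovers its own component's transport, $$T_k(x) \approx \mu_k + \sigma_k x$$, without ever touching the other components' coefficients.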

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_objective(a, params)[source]

Objective function $$-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)$$

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• params (dict) – parameters for the evaluation

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_grad_a_objective(a, params)[source]

Gradient of the objective function $$-\sum_{i=0}^m \nabla_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• params (dict) – parameters for the evaluation
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_hess_a_objective(a, params)[source]

Hessian of the objective function $$-\sum_{i=0}^m \nabla^2_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

Parameters:
• a (ndarray [$$N$$]) – coefficients $${\bf a}$$

• params (dict) – parameters for the evaluation
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, x=None, w=None, params_d1=None, params_d2=None, x0=None, regularization=None, tol=0.0001, maxit=100, ders=1, fungrad=False, hessact=False, precomp_type='uni', batch_size=None, mpi_pool=None, grad_check=False, hess_check=False)[source]

Compute: $${\bf a}^* = \arg\min_{\bf a}\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)$$

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (Distribution) – distribution $$\pi_2$$

• x (ndarray [$$m,d$$]) – quadrature points

• w (ndarray [$$m$$]) – quadrature weights

• params_d1 (dict) – parameters for distribution $$\pi_1$$

• params_d2 (dict) – parameters for distribution $$\pi_2$$

• x0 (ndarray [$$N$$]) – coefficients to be used as initial values for the optimization

• regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.

• tol (float) – tolerance to be used to solve the KL-divergence problem.

• maxit (int) – maximum number of iterations

• ders (int) – order of derivatives available for the solution of the optimization problem. 0 -> derivative free (SLSQP), 1 -> gradient (SLSQP).

• fungrad (bool) – whether the target distribution provides the method Distribution.tuple_grad_x_log_pdf() computing the evaluation and the gradient in one step. This is used only for ders==1.

• hessact (bool) – this option is disabled for linear span maps (no Hessian used)

• precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

• batch_size (list [2] of int) – the list contains the size of the batch to be used for each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes to be used for each component of the map.

• mpi_pool (mpi_map.MPI_Pool or list of mpi_pool) – pool of processes to be used, None stands for one process. If the target distribution is a ProductDistribution, then the minimization problem decouples and mpi_pool is a list containing mpi_pools for each component of the map.

• grad_check (bool) – whether to use finite differences to check the correctness of the gradient

• hess_check (bool) – whether to use finite differences to check the correctness of the Hessian

Returns:

log – logging information regarding the optimization

Return type:

dict

Note

The parameter pairs (qtype, qparams) and (x, w) are mutually exclusive, but exactly one of the two pairs must be provided.

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_constraints(a, params)[source]
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_da_constraints(a, params)[source]
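The role of these constraint functions can be illustrated with a self-contained 1-D analogue: a linear-span map $$T_{\bf a}(x) = a_0 + a_1 x + a_2 x^3$$ is not monotone by construction, so SLSQP enforces $$\partial_x T_{\bf a}(x_i) \ge \delta$$ only at the quadrature points, which is the kind of pointwise inequality these functions supply to the solver. The Gaussian target and the cubic parameterization are assumptions for illustration, not the library's map classes.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D analogue of the pointwise-monotone approach: the map
# T_a(x) = a0 + a1*x + a2*x**3 may be non-monotone, so monotonicity is
# imposed only at the quadrature points via SLSQP inequality constraints.
mu, sig, delta = 2.0, 0.5, 1e-8

xg, wg = np.polynomial.hermite_e.hermegauss(20)
w = wg / np.sqrt(2 * np.pi)                 # standard-normal quadrature

def dT(a):                                  # dT/dx at the quadrature points
    return a[1] + 3.0 * a[2] * xg**2        # linear in a -> linear constraints

def objective(a):
    T = a[0] + a[1] * xg + a[2] * xg**3
    d = np.maximum(dT(a), 1e-12)            # guard infeasible trial points
    return -np.sum(w * (-0.5 * ((T - mu) / sig) ** 2 + np.log(d)))

cons = {"type": "ineq", "fun": lambda a: dT(a) - delta}
res = minimize(objective, np.array([0.0, 1.0, 0.0]),
               method="SLSQP", constraints=cons, tol=1e-8)
a_star = res.x                              # close to [mu, sig, 0]
```

The cons dictionary plays the role of minimize_kl_divergence_pointwise_monotone_constraints (and its derivative counterpart) in the library's SLSQP solve; the cubic coefficient vanishes at the optimum because the exact transport between Gaussians is affine.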
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_component(f, x, w, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, precomp_type='uni', batch_size=None, cache_level=1, mpi_pool=None)[source]

Compute $${\bf a}^\star = \arg\min_{\bf a}-\sum_{i=0}^m \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) = \arg\min_{\bf a}-\sum_{i=0}^m f(x_i)$$

Parameters:
• f (ProductDistributionParametricPullbackComponentFunction) – function $$f$$

• x (ndarray [$$m,d$$]) – quadrature points

• w (ndarray [$$m$$]) – quadrature weights

• x0 (ndarray [$$N$$]) – coefficients to be used as initial values for the optimization

• regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If key type=='L2', Tikhonov regularization is applied with the coefficient given in key alpha.

• tol (float) – tolerance to be used to solve the KL-divergence problem.

• maxit (int) – maximum number of iterations

• ders (int) – order of derivatives available for the solution of the optimization problem. 0 -> derivative free, 1 -> gradient, 2 -> hessian.

• fungrad (bool) – whether the distributions $$\pi_1,\pi_2$$ provide the method Distribution.tuple_grad_x_log_pdf() computing the evaluation and the gradient in one step. This is used only for ders==1.

• precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

• batch_size (list [3 or 2] of int or list of batch_size) – the list contains the size of the batch to be used for each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes to be used for each component of the map.

• cache_level (int) – level of high-level caching to use during the optimization: 0 stores function evaluations, 1 also stores gradient evaluations, and -1 disables caching.

• mpi_pool (mpi_map.MPI_Pool or list of mpi_pool) – pool of processes to be used, None stands for one process. If the target distribution is a ProductDistribution, then the minimization problem decouples and mpi_pool is a list containing mpi_pools for each component of the map.

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_component_constraints(a, params)[source]
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_component_da_constraints(a, params)[source]