TransportMaps.KL.minimize_KL_divergence

Module Contents

Functions

minimize_kl_divergence(d1, d2[, qtype, qparams, x, w, ...])

Solve \(\arg \min_{\bf a}\mathcal{D}_{KL}\left(\pi, (T^\sharp\pi_{\rm tar})_{\bf a}\right)\)

minimize_kl_divergence_objective(a, params)

Objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

minimize_kl_divergence_grad_a_objective(a, params)

Gradient of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

minimize_kl_divergence_tuple_grad_a_objective(a, params)

Function evaluation and gradient of the objective \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

minimize_kl_divergence_hess_a_objective(a, params)

Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

minimize_kl_divergence_action_hess_a_objective(a, da, ...)

Action of the Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) on the direction \(da\)

minimize_kl_divergence_action_storage_hess_a_objective(a, ...)

Assemble the Hessian of \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) and compute its action on the vector \(v\), for the KL-divergence minimization problem.

minimize_kl_divergence_component(f, x, w[, x0, ...])

Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[ \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) \right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)

minimize_kl_divergence_component_objective(a, params)

Objective function \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left[ \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) \right]\)

minimize_kl_divergence_component_grad_a_objective(a, ...)

Gradient of the objective function: \(-\sum_{i=0}^m \nabla_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

minimize_kl_divergence_component_hess_a_objective(a, ...)

Hessian of the objective function: \(-\sum_{i=0}^m \nabla^2_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

minimize_kl_divergence_pointwise_monotone(d1, d2[, x, ...])

Compute: \({\bf a}^* = \arg\min_{\bf a}\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\)

minimize_kl_divergence_pointwise_monotone_constraints(a, ...)

minimize_kl_divergence_pointwise_monotone_da_constraints(a, ...)

minimize_kl_divergence_pointwise_monotone_component(f, x, w)

Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[ \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) \right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)

minimize_kl_divergence_pointwise_monotone_component_constraints(a, ...)

minimize_kl_divergence_pointwise_monotone_component_da_constraints(a, ...)

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, qtype: int = None, qparams=None, x=None, w=None, params_d1=None, params_d2=None, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, hessact=False, precomp_type='uni', batch_size=None, mpi_pool=None, grad_check=False, hess_check=False)[source]

Solve \(\arg \min_{\bf a}\mathcal{D}_{KL}\left(\pi, (T^\sharp\pi_{\rm tar})_{\bf a}\right)\)

Parameters:
  • d1 (Distribution) – sampling distribution \(\pi\)

  • d2 (ParametricTransportMapDistribution) – target distribution \(\pi_{\rm tar}\)

  • qtype (int) – quadrature type number provided by \(\pi\)

  • qparams (object) – inputs necessary to the generation of the selected quadrature

  • x (ndarray [\(m,d\)]) – quadrature points

  • w (ndarray [\(m\)]) – quadrature weights

  • params_d1 (dict) – parameters for the evaluation of \(\pi\)

  • params_d2 (dict) – parameters for the evaluation of \(\pi_{\rm tar}\)

  • x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization

  • regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If the key type is 'L2', Tikhonov regularization is applied with the coefficient given by the key alpha.

  • tol (float) – tolerance to be used to solve the KL-divergence problem.

  • maxit (int) – maximum number of iterations

  • ders (int) – order of derivatives available for the solution of the optimization problem: 0 (derivative-free), 1 (gradient), 2 (Hessian).

  • fungrad (bool) – whether the target distribution provides the method Distribution.tuple_grad_x_log_pdf(), computing the evaluation and the gradient in one step. Used only for ders==1.

  • hessact (bool) – use the action of the Hessian. The target distribution must implement the function Distribution.action_hess_x_log_pdf().

  • precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

  • batch_size (list [3 or 2] of int) – the list contains the batch size to be used for each type of evaluation. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one.

  • mpi_pool (mpi_map.MPI_Pool) – pool of processes

  • grad_check (bool) – whether to use finite differences to check the correctness of the gradient

  • hess_check (bool) – whether to use finite differences to check the correctness of the Hessian

Returns:

log information from the solver

Return type:

log (dict)
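
A minimal calling sketch in Python. The construction of the two distributions is library-specific and elided: rho is assumed to be a Distribution playing the role of \(\pi\), and ptar a ParametricTransportMapDistribution representing \((T^\sharp\pi_{\rm tar})_{\bf a}\); the meaning of the quadrature type numbers is defined by the distribution, and the value used below is an assumption.

    from TransportMaps.KL.minimize_KL_divergence import minimize_kl_divergence

    # rho  : sampling Distribution (pi)          -- assumed built elsewhere
    # ptar : ParametricTransportMapDistribution  -- assumed built elsewhere
    log = minimize_kl_divergence(
        rho, ptar,
        qtype=0, qparams=1000,  # assumed: type 0 selects Monte Carlo with 1000 points
        ders=2,                 # gradient and Hessian available
        regularization={'type': 'L2', 'alpha': 1e-3},  # Tikhonov penalty
        tol=1e-4, maxit=100,
    )
    # (qtype, qparams) and (x, w) are mutually exclusive; precomputed quadrature
    # points and weights can be passed instead:
    # log = minimize_kl_divergence(rho, ptar, x=x, w=w, ders=2)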

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_objective(a, params)[source]

Objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • params (dict) – dictionary of parameters

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_grad_a_objective(a, params)[source]

Gradient of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • params (dict) – dictionary of parameters

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_tuple_grad_a_objective(a, params)[source]

Function evaluation and gradient of the objective \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • params (dict) – dictionary of parameters

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_hess_a_objective(a, params)[source]

Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) for the KL-divergence minimization.

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • params (dict) – dictionary of parameters
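
The objective, gradient, and Hessian callables above all share the signature (a, params). A hedged sketch, not the driver's actual internals, of how such callables fit scipy.optimize.minimize once the params dictionary and an initial guess a0 are available:

    from scipy.optimize import minimize
    from TransportMaps.KL.minimize_KL_divergence import (
        minimize_kl_divergence_objective,
        minimize_kl_divergence_grad_a_objective,
        minimize_kl_divergence_hess_a_objective,
    )

    # a0: initial coefficients (ndarray [N]); params: parameter dictionary,
    # both assumed assembled elsewhere. params is forwarded through args.
    res = minimize(
        minimize_kl_divergence_objective, a0, args=(params,),
        jac=minimize_kl_divergence_grad_a_objective,
        hess=minimize_kl_divergence_hess_a_objective,
        method='Newton-CG', tol=1e-4, options={'maxiter': 100},
    )

When fungrad is set, minimize_kl_divergence_tuple_grad_a_objective returns the pair (value, gradient), which matches scipy's jac=True convention for the objective.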

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_action_hess_a_objective(a, da, params)[source]

Action of the Hessian of the objective function \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) on the direction \(da\)

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • da (ndarray [\(N\)]) – vector on which to apply the Hessian

  • params (dict) – dictionary of parameters

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_action_storage_hess_a_objective(a, v, params)[source]

Assemble the Hessian of \(\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\) and compute its action on the vector \(v\), for the KL-divergence minimization problem.

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • v (ndarray [\(N\)]) – vector on which to apply the Hessian

  • params (dict) – dictionary of parameters
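
When hessact is set, only Hessian-vector products are required. The action function's signature (a, da, params) lines up with scipy's hessp(x, p, *args) convention; a hedged sketch under the same assumptions as above:

    from scipy.optimize import minimize
    from TransportMaps.KL.minimize_KL_divergence import (
        minimize_kl_divergence_objective,
        minimize_kl_divergence_grad_a_objective,
        minimize_kl_divergence_action_hess_a_objective,
    )

    # Newton-CG never forms the full Hessian: it only requests its action on
    # a direction da, which is what the callable above computes.
    res = minimize(
        minimize_kl_divergence_objective, a0, args=(params,),
        jac=minimize_kl_divergence_grad_a_objective,
        hessp=minimize_kl_divergence_action_hess_a_objective,
        method='Newton-CG', tol=1e-4,
    )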

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component(f: TransportMaps.Maps.Functionals.ProductDistributionParametricPullbackComponentFunction, x, w, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, precomp_type='uni', batch_size=None, cache_level=1, mpi_pool=None)[source]

Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[ \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) \right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)

Parameters:
  • f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)

  • x (ndarray [\(m,d\)]) – quadrature points

  • w (ndarray [\(m\)]) – quadrature weights

  • x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization

  • regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If the key type is 'L2', Tikhonov regularization is applied with the coefficient given by the key alpha.

  • tol (float) – tolerance to be used to solve the KL-divergence problem.

  • maxit (int) – maximum number of iterations

  • ders (int) – order of derivatives available for the solution of the optimization problem: 0 (derivative-free), 1 (gradient), 2 (Hessian).

  • fungrad (bool) – whether the distributions \(\pi_1,\pi_2\) provide the method Distribution.tuple_grad_x_log_pdf() computing the evaluation and the gradient in one step. This is used only for ders==1.

  • precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

  • batch_size (list [3 or 2] of int, or list of such lists) – the list contains the batch size to be used for each type of evaluation. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes for each component of the map.

  • cache_level (int) – level of caching to be used during the optimization: 0 caches function evaluations, 1 caches gradient evaluations, -1 caches nothing.

  • mpi_pool (mpi_map.MPI_Pool or list of mpi_map.MPI_Pool) – pool of processes to be used; None stands for a single process. If the target distribution is a ProductDistribution, the minimization problem decouples and mpi_pool is a list containing one pool per component of the map.
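
When the problem decouples over a ProductDistribution, the per-component lists above take the following shapes. A hedged illustration with placeholder values; reading the three inner entries as function/gradient/Hessian batches is an assumption inferred from the [3 or 2] annotation:

    # Hypothetical 3-component map with ders=2:
    n_components = 3
    batch_size = [[None, None, 200]] * n_components  # fun/grad fully vectorized; Hessian in batches of 200
    mpi_pool = [None] * n_components                 # one entry per component; None = single process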

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_objective(a, params)[source]

Objective function \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left[ \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) \right]\)

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • params (dict) – dictionary of parameters

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_grad_a_objective(a, params)[source]

Gradient of the objective function: \(-\sum_{i=0}^m \nabla_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • params (dict) – dictionary of parameters

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_component_hess_a_objective(a, params)[source]

Hessian of the objective function: \(-\sum_{i=0}^m \nabla^2_{\bf a} f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left( \log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

Parameters:
  • a (ndarray [\(N\)]) – coefficients

  • params (dict) – dictionary of parameters

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, x=None, w=None, params_d1=None, params_d2=None, x0=None, regularization=None, tol=0.0001, maxit=100, ders=1, fungrad=False, hessact=False, precomp_type='uni', batch_size=None, mpi_pool=None, grad_check=False, hess_check=False)[source]

Compute: \({\bf a}^* = \arg\min_{\bf a}\mathcal{D}_{KL}\left(\pi_1, \pi_{2,{\bf a}}\right)\)

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)

  • x (ndarray [\(m,d\)]) – quadrature points

  • w (ndarray [\(m\)]) – quadrature weights

  • params_d1 (dict) – parameters for distribution \(\pi_1\)

  • params_d2 (dict) – parameters for distribution \(\pi_2\)

  • x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization

  • regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If the key type is 'L2', Tikhonov regularization is applied with the coefficient given by the key alpha.

  • tol (float) – tolerance to be used to solve the KL-divergence problem.

  • maxit (int) – maximum number of iterations

  • ders (int) – order of derivatives available for the solution of the optimization problem: 0 (derivative-free, SLSQP), 1 (gradient, SLSQP).

  • fungrad (bool) – whether the target distribution provides the method Distribution.tuple_grad_x_log_pdf() computing the evaluation and the gradient in one step. This is used only for ders==1.

  • hessact (bool) – this option is disabled for linear span maps (no Hessian used)

  • precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

  • batch_size (list [2] of int) – the list contains the batch size to be used for each type of evaluation. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes for each component of the map.

  • mpi_pool (mpi_map.MPI_Pool or list of mpi_map.MPI_Pool) – pool of processes to be used; None stands for a single process. If the target distribution is a ProductDistribution, the minimization problem decouples and mpi_pool is a list containing one pool per component of the map.

  • grad_check (bool) – whether to use finite differences to check the correctness of the gradient

  • hess_check (bool) – whether to use finite differences to check the correctness of the Hessian

Returns:

log information from the solver

Return type:

log (dict)

Note

The parameter pairs (qtype, qparams) and (x, w) are mutually exclusive, but one of the two must be provided.

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_constraints(a, params)[source]
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_da_constraints(a, params)[source]
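
The two constraint callables above are undocumented; judging from their (a, params) signatures and the SLSQP method named under ders, they plausibly return the pointwise monotonicity constraint values and their Jacobian with respect to the coefficients. A hedged sketch of how such a pair maps onto scipy's SLSQP constraint interface, with a0 and params assumed assembled as before:

    from scipy.optimize import minimize
    from TransportMaps.KL.minimize_KL_divergence import (
        minimize_kl_divergence_objective,
        minimize_kl_divergence_grad_a_objective,
        minimize_kl_divergence_pointwise_monotone_constraints,
        minimize_kl_divergence_pointwise_monotone_da_constraints,
    )

    # SLSQP treats 'ineq' constraints as fun(a) >= 0 componentwise; 'jac'
    # returns the derivative of the constraint values with respect to a.
    cons = {
        'type': 'ineq',
        'fun': minimize_kl_divergence_pointwise_monotone_constraints,
        'jac': minimize_kl_divergence_pointwise_monotone_da_constraints,
        'args': (params,),
    }
    res = minimize(
        minimize_kl_divergence_objective, a0, args=(params,),
        jac=minimize_kl_divergence_grad_a_objective,
        method='SLSQP', constraints=cons, tol=1e-4,
    )
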
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_component(f, x, w, x0=None, regularization=None, tol=0.0001, maxit=100, ders=2, fungrad=False, precomp_type='uni', batch_size=None, cache_level=1, mpi_pool=None)[source]

Compute \({\bf a}^\star = \arg\min_{\bf a} -\sum_{i=0}^m \left[ \log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i) \right] = \arg\min_{\bf a} -\sum_{i=0}^m f(x_i)\)

Parameters:
  • f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)

  • x (ndarray [\(m,d\)]) – quadrature points

  • w (ndarray [\(m\)]) – quadrature weights

  • x0 (ndarray [\(N\)]) – coefficients to be used as initial values for the optimization

  • regularization (dict) – defines the regularization to be used. If None, no regularization is applied. If the key type is 'L2', Tikhonov regularization is applied with the coefficient given by the key alpha.

  • tol (float) – tolerance to be used to solve the KL-divergence problem.

  • maxit (int) – maximum number of iterations

  • ders (int) – order of derivatives available for the solution of the optimization problem: 0 (derivative-free), 1 (gradient), 2 (Hessian).

  • fungrad (bool) – whether the distributions \(\pi_1,\pi_2\) provide the method Distribution.tuple_grad_x_log_pdf() computing the evaluation and the gradient in one step. This is used only for ders==1.

  • precomp_type (str) – whether to precompute univariate Vandermonde matrices ‘uni’ or multivariate Vandermonde matrices ‘multi’

  • batch_size (list [3 or 2] of int, or list of such lists) – the list contains the batch size to be used for each type of evaluation. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. If the target distribution is a ProductDistribution, the optimization problem decouples and batch_size is a list of lists containing the batch sizes for each component of the map.

  • cache_level (int) – level of caching to be used during the optimization: 0 caches function evaluations, 1 caches gradient evaluations, -1 caches nothing.

  • mpi_pool (mpi_map.MPI_Pool or list of mpi_map.MPI_Pool) – pool of processes to be used; None stands for a single process. If the target distribution is a ProductDistribution, the minimization problem decouples and mpi_pool is a list containing one pool per component of the map.

TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_component_constraints(a, params)[source]
TransportMaps.KL.minimize_KL_divergence.minimize_kl_divergence_pointwise_monotone_component_da_constraints(a, params)[source]