TransportMaps.KL.KL_divergence

Module Contents

Functions

kl_divergence(d1, d2[, params1, params2, cache, ...])

Compute \(\mathcal{D}_{KL}(\pi_1 | \pi_2)\)

grad_a_kl_divergence(d1, d2[, params1, params2, ...])

Compute \(\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)

tuple_grad_a_kl_divergence(d1, d2[, params1, params2, ...])

Compute \(\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)\)

hess_a_kl_divergence(d1, d2[, params1, params2, ...])

Compute \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)

action_hess_a_kl_divergence(da, d1, d2[, params1, ...])

Compute \(\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle\)

storage_hess_a_kl_divergence(d1, d2[, params1, ...])

Assemble \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\).

action_stored_hess_a_kl_divergence(H, v)

Evaluate action of \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\) on vector \(v\).

kl_divergence_component(f[, params, cache, x, w, ...])

Compute \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left(\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right)\)

grad_a_kl_divergence_component(f[, params, cache, x, ...])

Compute \(-\sum_{i=0}^m \nabla_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

hess_a_kl_divergence_component(f[, params, cache, x, ...])

Compute \(-\sum_{i=0}^m \nabla^2_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

grad_t_kl_divergence(x, d1, d2[, params1, params2, ...])

Compute \(\nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).

grad_x_grad_t_kl_divergence(x, d1, d2[, params1, ...])

Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).

tuple_grad_x_grad_t_kl_divergence(x, d1, d2[, ...])

Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).

TransportMaps.KL.KL_divergence.kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.Distribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None), d1_entropy=True)[source]

Compute \(\mathcal{D}_{KL}(\pi_1 | \pi_2)\)

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (Distribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • cache (dict) – cached values

  • qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • qparams (object) – parameters necessary for the construction of the quadrature

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process)

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

  • d1_entropy (bool) – whether to include the entropy term \(\mathbb{E}_{\pi_1}[\log \pi_1]\) in the KL divergence

Returns:

(float) – \(\mathcal{D}_{KL}(\pi_1 | \pi_2)\)

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
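As a rough illustration of the quantity being approximated, the sketch below evaluates \(\sum_i w_i (\log\pi_1(x_i) - \log\pi_2(x_i))\) from a user-supplied quadrature (x, w) for two 1-D Gaussians. It does not call TransportMaps: the helpers `gaussian_log_pdf`, `midpoint_quadrature` and `kl_from_quadrature` are hypothetical, the midpoint rule is a stand-in for whatever quadrature qtype selects, and the treatment of `d1_entropy=False` (dropping the \(\mathbb{E}_{\pi_1}[\log \pi_1]\) term) is an assumption based on the parameter description above.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log-density of N(mean, std^2) at the scalar x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def midpoint_quadrature(mean, std, n=4000, half_width=10.0):
    """Points/weights such that sum(w_i * g(x_i)) approximates E_{N(mean, std^2)}[g]."""
    lo = mean - half_width * std
    h = 2.0 * half_width * std / n
    x = [lo + (i + 0.5) * h for i in range(n)]
    w = [math.exp(gaussian_log_pdf(xi, mean, std)) * h for xi in x]
    return x, w


def kl_from_quadrature(log_pdf1, log_pdf2, x, w, d1_entropy=True):
    """Approximate E_{pi_1}[log pi_1 - log pi_2]; optionally drop the log pi_1 term."""
    total = 0.0
    for xi, wi in zip(x, w):
        term = -log_pdf2(xi)
        if d1_entropy:
            term += log_pdf1(xi)
        total += wi * term
    return total


def log_pi1(t):
    return gaussian_log_pdf(t, 0.0, 1.0)


def log_pi2(t):
    return gaussian_log_pdf(t, 1.0, 2.0)


x, w = midpoint_quadrature(0.0, 1.0)
kl = kl_from_quadrature(log_pi1, log_pi2, x, w)
# Closed form for two Gaussians, used here only as a sanity check.
kl_exact = math.log(2.0 / 1.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5
# Without the pi_1 term the result is the cross-entropy -E_{pi_1}[log pi_2].
cross_entropy = kl_from_quadrature(log_pi1, log_pi2, x, w, d1_entropy=False)
```

The term dropped by `d1_entropy=False` does not depend on \(\pi_2\), so in this sketch both variants have the same minimizer over \(\pi_2\).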

TransportMaps.KL.KL_divergence.grad_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))[source]

Compute \(\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • cache (dict) – cached values

  • qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • qparams (object) – parameters necessary for the construction of the quadrature

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one.

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(ndarray [\(N\)]) –

\(\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
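Since \(\pi_1\) does not depend on \({\bf a}\), the gradient reduces to \(-\sum_i w_i \nabla_{\bf a}\log\pi_{2,{\bf a}}(x_i)\) once a quadrature is fixed. The sketch below checks this on a stand-in family \(\pi_{2,{\bf a}} = \mathcal{N}(a_0, e^{2a_1})\) against central finite differences. The family and every function name are assumptions made for illustration; in the library \(\pi_{2,{\bf a}}\) is defined through a parametric transport map.

```python
import math


def quadrature_std_normal(n=4000, half_width=10.0):
    """Midpoint rule (x, w) with sum(w_i * g(x_i)) ~ E_{N(0,1)}[g]."""
    h = 2.0 * half_width / n
    x = [-half_width + (i + 0.5) * h for i in range(n)]
    w = [math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi) * h for xi in x]
    return x, w


def log_pi2(x, a):
    """Log-density of the stand-in family N(a[0], exp(a[1])^2)."""
    z = (x - a[0]) * math.exp(-a[1])
    return -0.5 * z * z - a[1] - 0.5 * math.log(2.0 * math.pi)


def grad_a_log_pi2(x, a):
    """Gradient of log_pi2 with respect to a = (mean, log-std)."""
    z = (x - a[0]) * math.exp(-a[1])
    return [z * math.exp(-a[1]), z * z - 1.0]


def neg_cross_term(a, x, w):
    """-E_{pi_1}[log pi_{2,a}]: the only part of the divergence that depends on a."""
    return -sum(wi * log_pi2(xi, a) for xi, wi in zip(x, w))


def grad_a_kl(a, x, w):
    """-sum_i w_i grad_a log pi_{2,a}(x_i)."""
    g = [0.0, 0.0]
    for xi, wi in zip(x, w):
        gi = grad_a_log_pi2(xi, a)
        g[0] -= wi * gi[0]
        g[1] -= wi * gi[1]
    return g


x, w = quadrature_std_normal()      # pi_1 = N(0, 1)
a = [1.0, math.log(2.0)]            # pi_{2,a} = N(1, 2^2)
grad = grad_a_kl(a, x, w)

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-5
grad_fd = []
for k in range(2):
    a_plus = list(a)
    a_minus = list(a)
    a_plus[k] += eps
    a_minus[k] -= eps
    diff = neg_cross_term(a_plus, x, w) - neg_cross_term(a_minus, x, w)
    grad_fd.append(diff / (2.0 * eps))
```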

TransportMaps.KL.KL_divergence.tuple_grad_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None), d1_entropy=True)[source]

Compute \(\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)\)

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • cache (dict) – cached values

  • qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • qparams (object) – parameters necessary for the construction of the quadrature

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one.

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

  • d1_entropy (bool) – whether to include the entropy term \(\mathbb{E}_{\pi_1}[\log \pi_1]\) in the KL divergence

Returns:

(tuple) –

\(\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)\)

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
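Returning the value and the gradient together lets both reuse the same per-point evaluations, and the batch_size argument bounds how many points are processed at once. The sketch below shows one way this could look for the stand-in family \(\pi_{2,{\bf a}} = \mathcal{N}(a_0, e^{2a_1})\), accumulating \(-\sum_i w_i\log\pi_{2,{\bf a}}(x_i)\) (the divergence without the entropy term) and its gradient batch by batch. The function names and the batching loop are assumptions; this page does not show how the library batches internally.

```python
import math


def batches(n, batch_size):
    """Yield (start, stop) index pairs; batch_size=None means a single batch."""
    step = n if batch_size is None else batch_size
    for start in range(0, n, step):
        yield start, min(start + step, n)


def tuple_grad_a_objective(a, x, w, batch_size=None):
    """Return (-sum_i w_i log pi_{2,a}(x_i), its gradient in a), batch by batch."""
    value, grad = 0.0, [0.0, 0.0]
    inv_std = math.exp(-a[1])
    const = 0.5 * math.log(2.0 * math.pi)
    for start, stop in batches(len(x), batch_size):
        for xi, wi in zip(x[start:stop], w[start:stop]):
            z = (xi - a[0]) * inv_std  # shared by the value and the gradient
            value += wi * (0.5 * z * z + a[1] + const)
            grad[0] -= wi * z * inv_std
            grad[1] -= wi * (z * z - 1.0)
    return value, grad


# Five equally weighted points with mean 0 and second moment 1.
x = [-1.5, -0.5, 0.0, 0.5, 1.5]
w = [0.2] * 5
a = [0.3, -0.2]
value_full, grad_full = tuple_grad_a_objective(a, x, w, batch_size=None)
value_b2, grad_b2 = tuple_grad_a_objective(a, x, w, batch_size=2)
# a = (0, 0) matches the sample mean and variance, so the gradient vanishes there.
value_opt, grad_opt = tuple_grad_a_objective([0.0, 0.0], x, w, batch_size=1)
```

The batch size changes memory use, not the result: all three calls traverse the same points with the same weights.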

TransportMaps.KL.KL_divergence.hess_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))[source]

Compute \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • cache (dict) – cached values

  • qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • qparams (object) – parameters necessary for the construction of the quadrature

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one.

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(ndarray [\(N,N\)]) –

\(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

TransportMaps.KL.KL_divergence.action_hess_a_kl_divergence(da, d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))[source]

Compute \(\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle\)

Parameters:
  • da (ndarray [\(N\)]) – vector on which to apply the Hessian

  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • cache (dict) – cached values

  • qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • qparams (object) – parameters necessary for the construction of the quadrature

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one.

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(ndarray [\(N\)]) –

\(\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle\)

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
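A Hessian action maps a direction \(\delta{\bf a}\) to a vector of the same length, which is all that matrix-free Newton-type solvers need. The sketch below compares an explicit Hessian with a matrix-free product for the stand-in family \(\pi_{2,{\bf a}} = \mathcal{N}(a_0, e^{2a_1})\). The central difference of gradients is a swapped-in technique chosen for brevity, not a description of how the library computes the action, and all names are hypothetical.

```python
import math


def grad_a(a, x, w):
    """Gradient in a of -sum_i w_i log pi_{2,a}(x_i), pi_{2,a} = N(a[0], exp(a[1])^2)."""
    inv_std = math.exp(-a[1])
    g = [0.0, 0.0]
    for xi, wi in zip(x, w):
        z = (xi - a[0]) * inv_std
        g[0] -= wi * z * inv_std
        g[1] -= wi * (z * z - 1.0)
    return g


def hess_a(a, x, w):
    """Explicit 2x2 Hessian in a of the same objective."""
    inv_std = math.exp(-a[1])
    h = [[0.0, 0.0], [0.0, 0.0]]
    for xi, wi in zip(x, w):
        z = (xi - a[0]) * inv_std
        h[0][0] += wi * inv_std * inv_std
        h[0][1] += wi * 2.0 * z * inv_std
        h[1][1] += wi * 2.0 * z * z
    h[1][0] = h[0][1]  # the Hessian is symmetric
    return h


def action_hess_a(da, a, x, w, eps=1e-6):
    """Hessian-vector product from a central difference of gradients (no matrix formed)."""
    g_plus = grad_a([ai + eps * di for ai, di in zip(a, da)], x, w)
    g_minus = grad_a([ai - eps * di for ai, di in zip(a, da)], x, w)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


x = [-1.5, -0.5, 0.0, 0.5, 1.5]
w = [0.2] * 5
a = [0.5, 0.0]
da = [1.0, -2.0]

H = hess_a(a, x, w)
Hda_explicit = [
    H[0][0] * da[0] + H[0][1] * da[1],
    H[1][0] * da[0] + H[1][1] * da[1],
]
Hda_matrix_free = action_hess_a(da, a, x, w)
```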

TransportMaps.KL.KL_divergence.storage_hess_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))[source]

Assemble \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\).

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • cache (dict) – cached values

  • qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • qparams (object) – parameters necessary for the construction of the quadrature

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one.

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(None) – the result is stored in params2['hess_a_kl_divergence']

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

Note

The dictionary params2 must be provided, because the result is stored in it.

TransportMaps.KL.KL_divergence.action_stored_hess_a_kl_divergence(H, v)[source]

Evaluate action of \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\) on vector \(v\).

Parameters:
  • v (ndarray [\(N\)]) – vector \(v\)

  • H (ndarray [\(N,N\)]) – Hessian \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)

Returns:

(ndarray [\(N\)]) –

\(\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),v\rangle\)
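The storage/action pair separates the expensive assembly of the Hessian from its repeated application to vectors. A minimal sketch of that pattern follows, with plain lists standing in for ndarrays. Only the dictionary key `'hess_a_kl_divergence'` is taken from the description of storage_hess_a_kl_divergence; the helper names and the example matrix are hypothetical.

```python
def store_hessian(params2, hessian):
    """Keep an assembled Hessian in the params dict under the documented key."""
    params2['hess_a_kl_divergence'] = [list(row) for row in hessian]


def action_stored_hessian(H, v):
    """Dense matrix-vector product H v for a row-major list-of-lists H."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]


params2 = {}
store_hessian(params2, [[1.0, -1.0], [-1.0, 2.5]])  # assembled once

H = params2['hess_a_kl_divergence']
# The stored matrix is then applied to as many vectors as the solver requests.
results = [action_stored_hessian(H, v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, -2.0])]
```

With NumPy arrays the product in `action_stored_hessian` is simply `H.dot(v)`.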

TransportMaps.KL.KL_divergence.kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)[source]

Compute \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left(\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right)\)

Parameters:
  • f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)

  • params (dict) – parameters for function \(f\)

  • cache (dict) – cached values

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process)

  • mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f

Returns:

(float) – value
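To make the component objective concrete, the sketch below evaluates it for a 1-D linear map \(T[{\bf a}](x) = a_0 + a_1 x\) with a standard normal \(\pi\), so that \(f(x) = \log\pi(T(x)) + \log\partial_x T(x)\). The map, the names, and the use of the weights w inside the sum are assumptions made for illustration; the library's component functions act on the k-th component of a multivariate map.

```python
import math


def component_f(x, a):
    """f[a](x) = log pi(T[a](x)) + log dT[a]/dx for T[a](x) = a[0] + a[1] * x, pi = N(0, 1)."""
    if a[1] <= 0.0:
        raise ValueError("the map must be increasing: a[1] > 0")
    t = a[0] + a[1] * x
    log_pi = -0.5 * t * t - 0.5 * math.log(2.0 * math.pi)
    return log_pi + math.log(a[1])


def kl_component(a, x, w):
    """Weighted version of -sum_i f[a](x_i)."""
    return -sum(wi * component_f(xi, a) for xi, wi in zip(x, w))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.2] * 5
mean = sum(wi * xi for xi, wi in zip(x, w))
std = math.sqrt(sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)))

a_identity = [0.0, 1.0]
a_whitening = [-mean / std, 1.0 / std]  # minimizer within this linear family
j_identity = kl_component(a_identity, x, w)
j_whitening = kl_component(a_whitening, x, w)
```

Within this family the objective is smallest for the map that standardizes the points, which is the behavior expected when \(T\) pushes the sampled distribution toward \(\pi\).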

TransportMaps.KL.KL_divergence.grad_a_kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)[source]

Compute \(-\sum_{i=0}^m \nabla_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

Parameters:
  • f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)

  • params (dict) – parameters for function \(f\)

  • cache (dict) – cached values

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process)

  • mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f

Returns:

(ndarray [\(N\)]) – value

TransportMaps.KL.KL_divergence.hess_a_kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)[source]

Compute \(-\sum_{i=0}^m \nabla^2_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)

Parameters:
  • f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)

  • params (dict) – parameters for function \(f\)

  • cache (dict) – cached values

  • x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process)

  • mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f

Returns:

(ndarray [\(N,N\)]) – value

TransportMaps.KL.KL_divergence.grad_t_kl_divergence(x, d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.PullBackTransportMapDistribution, params1=None, params2=None, cache1=None, cache2=None, grad_x_tm=None, batch_size=None)[source]

Compute \(\nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).

This corresponds to:

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (PullBackTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • cache1 (dict) – cache for distribution \(\pi_1\)

  • cache2 (dict) – cache for distribution \(\pi_2\)

  • grad_x_tm – optional argument passed if \(\nabla_x T(x)\) has been already computed

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process)

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

TransportMaps.KL.KL_divergence.grad_x_grad_t_kl_divergence(x, d1, d2: TransportMaps.Distributions.PullBackTransportMapDistribution, params1=None, params2=None, grad_x_tm=None, grad_t=None, batch_size=None, mpi_pool_tuple=(None, None))[source]

Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).

This corresponds to:

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (PullBackTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • grad_x_tm – optional argument passed if \(\nabla_x T(x)\) has been already computed

  • grad_t – optional argument passed if the first variation has been already computed

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process)

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

TransportMaps.KL.KL_divergence.tuple_grad_x_grad_t_kl_divergence(x, d1, d2: TransportMaps.Distributions.PullBackTransportMapDistribution, params1=None, params2=None, grad_x_tm=None, batch_size=None, mpi_pool_tuple=(None, None))[source]

Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).

This corresponds to:

Parameters:
  • d1 (Distribution) – distribution \(\pi_1\)

  • d2 (PullBackTransportMapDistribution) – distribution \(\pi_2\)

  • params1 (dict) – parameters for distribution \(\pi_1\)

  • params2 (dict) – parameters for distribution \(\pi_2\)

  • grad_x_tm – optional argument passed if \(\nabla_x T(x)\) has been already computed

  • batch_size (int) – size of the batch to be evaluated in each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process)

  • mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.