# TransportMaps.KL.KL_divergence

## Module Contents

### Functions

• kl_divergence(d1, d2[, params1, params2, cache, ...]) – Compute $$\mathcal{D}_{KL}(\pi_1 | \pi_2)$$

• grad_a_kl_divergence(d1, d2[, params1, params2, ...]) – Compute $$\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$

• tuple_grad_a_kl_divergence(d1, d2[, params1, params2, ...]) – Compute $$\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)$$

• hess_a_kl_divergence(d1, d2[, params1, params2, ...]) – Compute $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$

• action_hess_a_kl_divergence(da, d1, d2[, params1, ...]) – Compute $$\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle$$

• storage_hess_a_kl_divergence(d1, d2[, params1, ...]) – Assemble $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$.

• action_stored_hess_a_kl_divergence(H, v) – Evaluate action of $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$ on vector $$v$$.

• kl_divergence_component(f[, params, cache, x, w, ...]) – Compute $$-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left(\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right)$$

• grad_a_kl_divergence_component(f[, params, cache, x, ...]) – Compute $$-\sum_{i=0}^m \nabla_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

• hess_a_kl_divergence_component(f[, params, cache, x, ...]) – Compute $$-\sum_{i=0}^m \nabla^2_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

• grad_t_kl_divergence(x, d1, d2[, params1, params2, ...]) – Compute $$\nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))$$.

• grad_x_grad_t_kl_divergence(x, d1, d2[, params1, ...]) – Compute $$\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))$$.

• tuple_grad_x_grad_t_kl_divergence(x, d1, d2[, ...]) – Compute $$\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))$$.

TransportMaps.KL.KL_divergence.kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.Distribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None), d1_entropy=True)

Compute $$\mathcal{D}_{KL}(\pi_1 | \pi_2)$$

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (Distribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• cache (dict) – cached values

• qtype (int) – quadrature type to be used for the approximation of $$\mathbb{E}_{\pi_1}$$

• qparams (object) – parameters necessary for the construction of the quadrature

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, the batch size is the size of the batch for each process.)

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

• d1_entropy (bool) – whether to include the entropy term $$\mathbb{E}_{\pi_1}[\log \pi_1]$$ in the KL divergence

Returns:

(float) – $$\mathcal{D}_{KL}(\pi_1 | \pi_2)$$

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
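
As an illustration of what the quadrature approximation computes, the following is a minimal standard-library sketch, not the library's implementation. It assumes that the rule (x, w) satisfies $$\sum_i w_i g(x_i) \approx \mathbb{E}_{\pi_1}[g]$$ and that setting d1_entropy to False simply drops the $$\mathbb{E}_{\pi_1}[\log \pi_1]$$ term. The helper names (`normal_logpdf`, `midpoint_quadrature`, `kl_quadrature`) are hypothetical.

```python
import math

def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a 1D normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def midpoint_quadrature(logpdf, lo=-10.0, hi=10.0, m=2000):
    """Points x_i and weights w_i such that sum_i w_i g(x_i) ~ E_pi[g]."""
    h = (hi - lo) / m
    x = [lo + (i + 0.5) * h for i in range(m)]
    w = [math.exp(logpdf(xi)) * h for xi in x]
    return x, w

def kl_quadrature(logpdf1, logpdf2, x, w, d1_entropy=True):
    """Approximate E_{pi_1}[log pi_1 - log pi_2] with the rule (x, w)."""
    total = 0.0
    for xi, wi in zip(x, w):
        term = -logpdf2(xi)
        if d1_entropy:
            # Assumed meaning of the flag: include E_{pi_1}[log pi_1].
            term += logpdf1(xi)
        total += wi * term
    return total

# pi_1 = N(0, 1), pi_2 = N(1, 1); the exact divergence is 0.5.
pi1 = lambda x: normal_logpdf(x, 0.0, 1.0)
pi2 = lambda x: normal_logpdf(x, 1.0, 1.0)
x, w = midpoint_quadrature(pi1)
kl_full = kl_quadrature(pi1, pi2, x, w)
kl_cross = kl_quadrature(pi1, pi2, x, w, d1_entropy=False)
print(kl_full, kl_cross)
```

Dropping the entropy term changes the value but not its dependence on $$\pi_2$$, which is why it can be skipped when the result is only used as an objective in $$\pi_2$$.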

TransportMaps.KL.KL_divergence.grad_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))

Compute $$\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (ParametricTransportMapDistribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• cache (dict) – cached values

• qtype (int) – quadrature type to be used for the approximation of $$\mathbb{E}_{\pi_1}$$

• qparams (object) – parameters necessary for the construction of the quadrature

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one.

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(ndarray [$$N$$]) – $$\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
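
The gradient with respect to the coefficients can be checked on a toy problem. The sketch below is an assumption-laden illustration in plain Python, not the library's code: $$\pi_{2,{\bf a}}$$ is taken to be the pullback of a standard normal through the hypothetical linear map $$T_{\bf a}(x) = (x - a_0)/e^{a_1}$$, so that $$\log\pi_{2,{\bf a}}(x) = \log\pi(T_{\bf a}(x)) + \log\partial_x T_{\bf a}(x)$$, and the analytic gradient is compared with central finite differences.

```python
import math

def log_pullback(a, x):
    """log pi(T_a(x)) + log dT_a/dx for pi standard normal, T_a(x) = (x - mu) / exp(s)."""
    mu, s = a
    t = (x - mu) / math.exp(s)
    return -0.5 * t * t - 0.5 * math.log(2.0 * math.pi) - s

def objective(a, x, w):
    """-sum_i w_i log pi_{2,a}(x_i); the entropy of pi_1 does not depend on a."""
    return -sum(wi * log_pullback(a, xi) for xi, wi in zip(x, w))

def grad_a_objective(a, x, w):
    """Analytic gradient of `objective` with respect to a = (mu, s)."""
    mu, s = a
    g_mu, g_s = 0.0, 0.0
    for xi, wi in zip(x, w):
        t = (xi - mu) / math.exp(s)
        g_mu -= wi * t / math.exp(s)
        g_s -= wi * (t * t - 1.0)
    return [g_mu, g_s]

def central_difference(a, x, w, eps=1e-5):
    """Finite-difference approximation of the same gradient."""
    fd = []
    for k in range(len(a)):
        up = list(a)
        up[k] += eps
        dn = list(a)
        dn[k] -= eps
        fd.append((objective(up, x, w) - objective(dn, x, w)) / (2.0 * eps))
    return fd

# Midpoint rule for pi_1 = N(0, 1): sum_i w_i g(x_i) ~ E_{pi_1}[g].
h = 0.01
x = [-10.0 + (i + 0.5) * h for i in range(2000)]
w = [math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi) * h for xi in x]

a = [1.0, 0.0]
grad = grad_a_objective(a, x, w)
fd_grad = central_difference(a, x, w)
print(grad, fd_grad)
```

For this choice of $$\pi_1$$ and ${\bf a} = (1, 0)$ the exact gradient is $$(1, -1)$$, which both evaluations reproduce up to quadrature and finite-difference error.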

TransportMaps.KL.KL_divergence.tuple_grad_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None), d1_entropy=True)

Compute $$\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)$$

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (ParametricTransportMapDistribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• cache (dict) – cached values

• qtype (int) – quadrature type to be used for the approximation of $$\mathbb{E}_{\pi_1}$$

• qparams (object) – parameters necessary for the construction of the quadrature

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one.

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(tuple) –

$$\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)$$

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
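
The rule stated in the note can be made explicit with a small argument check. This is only a sketch of the documented contract: `resolve_quadrature` and `equal_weight_rule` are hypothetical names, and how the library actually builds a rule from (qtype, qparams) is not shown here.

```python
def resolve_quadrature(qtype=None, qparams=None, x=None, w=None, build=None):
    """Return (x, w), accepting exactly one of the pairs (qtype, qparams) or (x, w).

    `build` is a hypothetical callable mapping (qtype, qparams) to (x, w).
    """
    rule_given = qtype is not None or qparams is not None
    points_given = x is not None or w is not None
    if rule_given == points_given:
        raise ValueError("provide exactly one of (qtype, qparams) or (x, w)")
    if points_given:
        if x is None or w is None:
            raise ValueError("x and w must be provided together")
        if len(x) != len(w):
            raise ValueError("x and w must have the same length")
        return x, w
    if qtype is None or qparams is None:
        raise ValueError("qtype and qparams must be provided together")
    return build(qtype, qparams)

def equal_weight_rule(qtype, qparams):
    """Hypothetical rule: qparams equally weighted midpoints on (0, 1)."""
    m = qparams
    return [(i + 0.5) / m for i in range(m)], [1.0 / m] * m

# Either pass the points and weights directly ...
x_given, w_given = resolve_quadrature(x=[0.0, 1.0], w=[0.5, 0.5])
# ... or let a rule generate them.
x_built, w_built = resolve_quadrature(qtype=0, qparams=4, build=equal_weight_rule)
```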

TransportMaps.KL.KL_divergence.hess_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))

Compute $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (ParametricTransportMapDistribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• cache (dict) – cached values

• qtype (int) – quadrature type to be used for the approximation of $$\mathbb{E}_{\pi_1}$$

• qparams (object) – parameters necessary for the construction of the quadrature

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one.

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(ndarray [$$N,N$$]) – $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

TransportMaps.KL.KL_divergence.action_hess_a_kl_divergence(da, d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))

Compute $$\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle$$

Parameters:
• da (ndarray [$$N$$]) – vector on which to apply the Hessian

• d1 (Distribution) – distribution $$\pi_1$$

• d2 (ParametricTransportMapDistribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• cache (dict) – cached values

• qtype (int) – quadrature type to be used for the approximation of $$\mathbb{E}_{\pi_1}$$

• qparams (object) – parameters necessary for the construction of the quadrature

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one.

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(ndarray [$$N$$]) – $$\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle$$

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.
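
To make the difference between assembling the Hessian and applying it concrete, here is a minimal sketch under the same toy assumptions as a linear pullback map $$T_{\bf a}(x) = (x - a_0)/e^{a_1}$$ of a standard normal, with $$\pi_1 = \mathcal{N}(0,1)$$. The functions `hess_a_kl` and `action_hess_a_kl` are hypothetical stand-ins written in plain Python; they only illustrate that the Hessian-vector product can be accumulated point by point without forming the $$N \times N$$ matrix.

```python
import math

def quadrature_std_normal(m=2000, lo=-10.0, hi=10.0):
    """Midpoint rule with sum_i w_i g(x_i) ~ E[g(X)], X ~ N(0, 1)."""
    h = (hi - lo) / m
    x = [lo + (i + 0.5) * h for i in range(m)]
    w = [math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi) * h for xi in x]
    return x, w

def hess_a_kl(a, x, w):
    """Assemble the 2x2 Hessian of -sum_i w_i log pi_{2,a}(x_i) in a = (mu, s)."""
    mu, s = a
    es = math.exp(s)
    H = [[0.0, 0.0], [0.0, 0.0]]
    for xi, wi in zip(x, w):
        t = (xi - mu) / es
        H[0][0] += wi / (es * es)
        H[0][1] += wi * 2.0 * t / es
        H[1][1] += wi * 2.0 * t * t
    H[1][0] = H[0][1]
    return H

def action_hess_a_kl(da, a, x, w):
    """Accumulate the Hessian-vector product directly, never storing H."""
    mu, s = a
    es = math.exp(s)
    out = [0.0, 0.0]
    for xi, wi in zip(x, w):
        t = (xi - mu) / es
        out[0] += wi * (da[0] / (es * es) + 2.0 * t / es * da[1])
        out[1] += wi * (2.0 * t / es * da[0] + 2.0 * t * t * da[1])
    return out

x, w = quadrature_std_normal()
a = [1.0, 0.0]
da = [1.0, 1.0]
H = hess_a_kl(a, x, w)
Hda = action_hess_a_kl(da, a, x, w)
# Reference: explicit matrix-vector product with the assembled Hessian.
Hda_ref = [sum(hij * dj for hij, dj in zip(row, da)) for row in H]
print(H, Hda)
```

The result of the action has length $$N$$ (here 2), whereas the assembled Hessian has shape $$N \times N$$.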

TransportMaps.KL.KL_divergence.storage_hess_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))

Assemble $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$.

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (ParametricTransportMapDistribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• cache (dict) – cached values

• qtype (int) – quadrature type to be used for the approximation of $$\mathbb{E}_{\pi_1}$$

• qparams (object) – parameters necessary for the construction of the quadrature

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one.

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Returns:

(None) – the result is stored in params2['hess_a_kl_divergence']

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

Note

The dictionary params2 must be provided.

TransportMaps.KL.KL_divergence.action_stored_hess_a_kl_divergence(H, v)

Evaluate action of $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$ on vector $$v$$.

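Parameters:
• H – Hessian $$\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})$$

• v (ndarray [$$N$$]) – vector $$v$$

Returns: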

(ndarray [$$N$$]) –

$$\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),v\rangle$$
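
The assemble-once, apply-many pattern suggested by the storage and stored-action functions can be sketched as follows. This is an illustrative stand-in in plain Python: the key name 'hess_a_kl_divergence' comes from the documentation above, while `store_hessian`, `action_stored_hessian`, and the dense list-of-rows layout are assumptions.

```python
def store_hessian(params, H):
    """Keep an assembled Hessian in the parameter dictionary for later reuse."""
    params['hess_a_kl_divergence'] = H

def action_stored_hessian(H, v):
    """Return the matrix-vector product H v for a dense H given as a list of rows."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]

# Assemble once (here a fixed 2x2 example matrix) ...
params2 = {}
store_hessian(params2, [[1.0, -2.0], [-2.0, 4.0]])

# ... then apply it to as many vectors as needed without re-assembling.
H = params2['hess_a_kl_divergence']
vectors = ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
products = [action_stored_hessian(H, v) for v in vectors]
print(products)
```

This trades the memory of an $$N \times N$$ matrix for cheap repeated products, which suits iterative solvers that request many Hessian actions at the same coefficients.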

TransportMaps.KL.KL_divergence.kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)

Compute $$-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left(\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right)$$

Parameters:
• f (ProductDistributionParametricPullbackComponentFunction) – function $$f$$

• params (dict) – parameters for function $$f$$

• cache (dict) – cached values

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, the batch size is the size of the batch for each process.)

• mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f

Returns:

(float) – value
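
The batch_size semantics described above can be illustrated with a short sketch. It assumes, for illustration only, that the quadrature weights multiply each term of the sum; `batched_weighted_sum` is a hypothetical helper, not a function of this module.

```python
import math

def batched_weighted_sum(f, x, w, batch_size=None):
    """Accumulate sum_i w_i f(x_i) over consecutive batches of points.

    batch_size=None processes all points in a single batch;
    batch_size=1 processes them one at a time.
    """
    m = len(x)
    step = max(m, 1) if batch_size is None else batch_size
    total = 0.0
    for start in range(0, m, step):
        xb = x[start:start + step]
        wb = w[start:start + step]
        total += sum(wi * f(xi) for xi, wi in zip(xb, wb))
    return total

# Toy component function: log-density of a standard normal (identity map).
f = lambda t: -0.5 * t * t - 0.5 * math.log(2.0 * math.pi)
x = [0.1 * i for i in range(10)]
w = [0.1] * 10

# The batch size changes how the work is split, not the value of the sum.
values = [-batched_weighted_sum(f, x, w, batch_size=b) for b in (None, 1, 3)]
print(values)
```

Smaller batches lower peak memory at the cost of more iterations; the three values agree up to floating-point rounding.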

TransportMaps.KL.KL_divergence.grad_a_kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)

Compute $$-\sum_{i=0}^m \nabla_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

Parameters:
• f (ProductDistributionParametricPullbackComponentFunction) – function $$f$$

• params (dict) – parameters for function $$f$$

• cache (dict) – cached values

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, the batch size is the size of the batch for each process.)

• mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f

Returns:

(float) – value

TransportMaps.KL.KL_divergence.hess_a_kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)

Compute $$-\sum_{i=0}^m \nabla^2_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)$$

Parameters:
• f (ProductDistributionParametricPullbackComponentFunction) – function $$f$$

• params (dict) – parameters for function $$f$$

• cache (dict) – cached values

• x (ndarray [$$m,d$$]) – quadrature points used for the approximation of $$\mathbb{E}_{\pi_1}$$

• w (ndarray [$$m$$]) – quadrature weights used for the approximation of $$\mathbb{E}_{\pi_1}$$

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, the batch size is the size of the batch for each process.)

• mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f

Returns:

(float) – value

TransportMaps.KL.KL_divergence.grad_t_kl_divergence(x, d1, d2[, params1, params2, ...])

Compute $$\nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))$$.

This corresponds to:

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• cache1 (dict) – cache for distribution $$\pi_1$$

• cache2 (dict) – cache for distribution $$\pi_2$$

• grad_x_tm – optional argument passed if $$\nabla_x T(x)$$ has been already computed

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, the batch size is the size of the batch for each process.)

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

TransportMaps.KL.KL_divergence.grad_x_grad_t_kl_divergence(x, d1, d2[, params1, ...])

Compute $$\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))$$.

This corresponds to:

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (PullBackTransportMapDistribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• grad_x_tm – optional argument passed if $$\nabla_x T(x)$$ has been already computed

• grad_t – optional argument passed if the first variation has been already computed

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, the batch size is the size of the batch for each process.)

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.

TransportMaps.KL.KL_divergence.tuple_grad_x_grad_t_kl_divergence(x, d1, d2[, ...])

Compute $$\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))$$.

This corresponds to:

Parameters:
• d1 (Distribution) – distribution $$\pi_1$$

• d2 (PullBackTransportMapDistribution) – distribution $$\pi_2$$

• params1 (dict) – parameters for distribution $$\pi_1$$

• params2 (dict) – parameters for distribution $$\pi_2$$

• grad_x_tm – optional argument passed if $$\nabla_x T(x)$$ has been already computed

• batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation; a size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, the batch size is the size of the batch for each process.)

• mpi_pool_tuple (tuple of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2

Note

The parameters (qtype,qparams) and (x,w) are mutually exclusive, but one pair of them is necessary.