TransportMaps.KL.KL_divergence
Module Contents
Functions
| kl_divergence | Compute \(\mathcal{D}_{KL}(\pi_1 | \pi_2)\) |
| grad_a_kl_divergence | Compute \(\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\) |
| tuple_grad_a_kl_divergence | Compute \(\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)\) |
| hess_a_kl_divergence | Compute \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\) |
| action_hess_a_kl_divergence | Compute \(\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle\) |
| storage_hess_a_kl_divergence | Assemble \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\). |
| action_stored_hess_a_kl_divergence | Evaluate action of \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\) on vector \(v\). |
| kl_divergence_component | Compute \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left(\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right)\) |
| grad_a_kl_divergence_component | Compute \(-\sum_{i=0}^m \nabla_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\) |
| hess_a_kl_divergence_component | Compute \(-\sum_{i=0}^m \nabla^2_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\) |
| grad_t_kl_divergence | Compute \(\nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\). |
| grad_x_grad_t_kl_divergence | Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\). |
| tuple_grad_x_grad_t_kl_divergence | Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\). |
- TransportMaps.KL.KL_divergence.kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.Distribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None), d1_entropy=True)
Compute \(\mathcal{D}_{KL}(\pi_1 | \pi_2)\)
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (Distribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
cache (dict) – cached values
qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)
qparams (object) – parameters necessary for the construction of the quadrature
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process.)
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
d1_entropy (bool) – whether to include the entropy term \(\mathbb{E}_{\pi_1}[\log \pi_1]\) in the KL divergence
- Returns:
(float) – \(\mathcal{D}_{KL}(\pi_1 | \pi_2)\)
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
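The quantity being approximated can be illustrated without the library. The following is a minimal sketch in plain Python that does not call TransportMaps: it builds a trapezoid rule whose points and weights play the role of `x` and `w`, and estimates \(\mathbb{E}_{\pi_1}[\log\pi_1 - \log\pi_2]\) for two univariate Gaussians. The helper names (`gauss_logpdf`, `trapezoid_rule`, `kl_sketch`) are hypothetical, and the handling of `d1_entropy` reflects only the parameter description above.

```python
import math

def gauss_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def trapezoid_rule(logpdf1, lo, hi, m):
    """Points x_i and weights w_i with sum_i w_i g(x_i) ~ E_{pi_1}[g]."""
    h = (hi - lo) / (m - 1)
    x = [lo + i * h for i in range(m)]
    w = [h * math.exp(logpdf1(xi)) for xi in x]
    w[0] *= 0.5   # trapezoid end-point correction
    w[-1] *= 0.5
    return x, w

def kl_sketch(logpdf1, logpdf2, x, w, d1_entropy=True):
    """Quadrature estimate of E_{pi_1}[log pi_1 - log pi_2]."""
    total = 0.0
    for xi, wi in zip(x, w):
        term = -logpdf2(xi)
        if d1_entropy:
            term += logpdf1(xi)  # the E_{pi_1}[log pi_1] contribution
        total += wi * term
    return total

def logp1(t):
    return gauss_logpdf(t, 0.0, 1.0)   # pi_1 = N(0, 1)

def logp2(t):
    return gauss_logpdf(t, 1.0, 2.0)   # pi_2 = N(1, 2^2)

x, w = trapezoid_rule(logp1, -12.0, 12.0, 4001)
kl = kl_sketch(logp1, logp2, x, w)
kl_no_entropy = kl_sketch(logp1, logp2, x, w, d1_entropy=False)

# Closed form for univariate Gaussians:
# log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2
exact = math.log(2.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5
```

Dropping the entropy term shifts the result by a constant that does not depend on \(\pi_2\), so it leaves the minimizer over \(\pi_2\) unchanged.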
- TransportMaps.KL.KL_divergence.grad_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))
Compute \(\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
cache (dict) – cached values
qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)
qparams (object) – parameters necessary for the construction of the quadrature
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one.
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
- Returns:
(ndarray [\(N\)]) – \(\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
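As an illustration of what a gradient with respect to the map coefficients looks like, here is a minimal sketch in plain Python that does not use TransportMaps. It assumes a hypothetical one-dimensional linear map \(T_{\bf a}(x) = a_0 + a_1 x\) with \(a_1 > 0\) and a standard normal reference, so that \(\log\pi_{2,{\bf a}}(x) = \log\phi(T_{\bf a}(x)) + \log a_1\). The entropy term of \(\pi_1\) does not depend on \({\bf a}\) and is omitted. All function names are illustrative.

```python
import math

def quad_points(mu, sigma, m=2001, half_width=10.0):
    """Trapezoid points/weights with sum_i w_i g(x_i) ~ E_{N(mu, sigma^2)}[g]."""
    lo = mu - half_width * sigma
    h = 2.0 * half_width * sigma / (m - 1)
    x = [lo + i * h for i in range(m)]
    norm = sigma * math.sqrt(2.0 * math.pi)
    w = [h * math.exp(-0.5 * ((xi - mu) / sigma) ** 2) / norm for xi in x]
    w[0] *= 0.5
    w[-1] *= 0.5
    return x, w

def objective(a, x, w):
    """E_{pi_1}[-log pi_{2,a}], i.e. the KL divergence without the entropy term."""
    total = 0.0
    for xi, wi in zip(x, w):
        t = a[0] + a[1] * xi  # T_a(x_i)
        total += wi * (0.5 * t * t + 0.5 * math.log(2.0 * math.pi) - math.log(a[1]))
    return total

def grad_a_sketch(a, x, w):
    """Gradient of objective(a, x, w) with respect to a = (a0, a1)."""
    g0 = g1 = 0.0
    for xi, wi in zip(x, w):
        t = a[0] + a[1] * xi
        g0 += wi * t                       # d/da0 of t^2 / 2
        g1 += wi * (t * xi - 1.0 / a[1])   # d/da1 of t^2 / 2 - log(a1)
    return [g0, g1]

x, w = quad_points(mu=1.0, sigma=2.0)      # pi_1 = N(1, 2^2)
a = [0.3, 0.7]
g = grad_a_sketch(a, x, w)

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
fd = []
for k in range(2):
    a_plus = list(a)
    a_plus[k] += eps
    a_minus = list(a)
    a_minus[k] -= eps
    fd.append((objective(a_plus, x, w) - objective(a_minus, x, w)) / (2.0 * eps))
```

For this toy problem the gradient vanishes at \({\bf a} = (-\mu/\sigma, 1/\sigma)\), the coefficients of the map that standardizes \(\pi_1\).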
- TransportMaps.KL.KL_divergence.tuple_grad_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None), d1_entropy=True)
Compute \(\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)\)
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
cache (dict) – cached values
qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)
qparams (object) – parameters necessary for the construction of the quadrature
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one.
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
- Returns:
(tuple) – \(\left(\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\nabla_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\right)\)
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
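A value-and-gradient pair is the form that many first-order optimizers consume, and computing both in one pass lets them share the map evaluation. The sketch below is plain Python and does not call TransportMaps; it reuses the hypothetical linear map \(T_{\bf a}(x) = a_0 + a_1 x\) with a standard normal reference, omits the entropy term, and runs a hand-tuned fixed-step gradient descent. The function names and the step size are assumptions made for illustration only.

```python
import math

def quad_points(mu, sigma, m=801, half_width=10.0):
    """Trapezoid points/weights with sum_i w_i g(x_i) ~ E_{N(mu, sigma^2)}[g]."""
    lo = mu - half_width * sigma
    h = 2.0 * half_width * sigma / (m - 1)
    x = [lo + i * h for i in range(m)]
    norm = sigma * math.sqrt(2.0 * math.pi)
    w = [h * math.exp(-0.5 * ((xi - mu) / sigma) ** 2) / norm for xi in x]
    w[0] *= 0.5
    w[-1] *= 0.5
    return x, w

def tuple_grad_a_sketch(a, x, w):
    """Return (value, gradient) of E_{pi_1}[-log pi_{2,a}] in a single pass."""
    val = g0 = g1 = 0.0
    for xi, wi in zip(x, w):
        t = a[0] + a[1] * xi  # T_a(x_i), shared by the value and the gradient
        val += wi * (0.5 * t * t + 0.5 * math.log(2.0 * math.pi) - math.log(a[1]))
        g0 += wi * t
        g1 += wi * (t * xi - 1.0 / a[1])
    return val, [g0, g1]

x, w = quad_points(mu=1.0, sigma=2.0)   # pi_1 = N(1, 2^2)
a = [0.0, 1.0]                          # start from the identity map
step = 0.1                              # fixed step, chosen by hand for this example
for _ in range(300):
    value, grad = tuple_grad_a_sketch(a, x, w)
    a = [a[0] - step * grad[0], a[1] - step * grad[1]]
```

The iterates approach \({\bf a} = (-0.5, 0.5)\), the map that standardizes \(\mathcal{N}(1, 2^2)\), and the objective approaches the entropy of \(\pi_1\), which corresponds to a KL divergence of zero.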
- TransportMaps.KL.KL_divergence.hess_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))
Compute \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
cache (dict) – cached values
qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)
qparams (object) – parameters necessary for the construction of the quadrature
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one.
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
- Returns:
(ndarray [\(N,N\)]) – \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\)
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
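For intuition about the shape and structure of the \(N \times N\) result, the following plain-Python sketch assembles the Hessian for the same hypothetical linear map \(T_{\bf a}(x) = a_0 + a_1 x\) with a standard normal reference (so \(N = 2\)). It does not call TransportMaps, and the helper names are illustrative.

```python
import math

def quad_points(mu, sigma, m=2001, half_width=10.0):
    """Trapezoid points/weights with sum_i w_i g(x_i) ~ E_{N(mu, sigma^2)}[g]."""
    lo = mu - half_width * sigma
    h = 2.0 * half_width * sigma / (m - 1)
    x = [lo + i * h for i in range(m)]
    norm = sigma * math.sqrt(2.0 * math.pi)
    w = [h * math.exp(-0.5 * ((xi - mu) / sigma) ** 2) / norm for xi in x]
    w[0] *= 0.5
    w[-1] *= 0.5
    return x, w

def hess_a_sketch(a, x, w):
    """Hessian of E_{pi_1}[-log pi_{2,a}] for T_a(x) = a0 + a1 x."""
    h00 = h01 = h11 = 0.0
    for xi, wi in zip(x, w):
        h00 += wi                                 # d^2/da0^2 of t^2 / 2
        h01 += wi * xi                            # mixed derivative
        h11 += wi * (xi * xi + 1.0 / a[1] ** 2)   # d^2/da1^2 of t^2 / 2 - log(a1)
    return [[h00, h01], [h01, h11]]

x, w = quad_points(mu=1.0, sigma=2.0)   # pi_1 = N(1, 2^2)
H = hess_a_sketch([0.3, 0.7], x, w)

# A symmetric 2x2 matrix is positive definite iff H[0][0] > 0 and det(H) > 0
det = H[0][0] * H[1][1] - H[0][1] ** 2
```

In this example the Hessian is positive definite for every \(a_1 > 0\), which is what makes the toy objective convex in \({\bf a}\).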
- TransportMaps.KL.KL_divergence.action_hess_a_kl_divergence(da, d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))
Compute \(\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle\)
- Parameters:
da (ndarray [\(N\)]) – vector on which to apply the Hessian
d1 (Distribution) – distribution \(\pi_1\)
d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
cache (dict) – cached values
qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)
qparams (object) – parameters necessary for the construction of the quadrature
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one.
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
- Returns:
(ndarray [\(N\)]) – \(\langle\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}}),\delta{\bf a}\rangle\)
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
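The action of the Hessian on a direction can be accumulated point by point without ever forming the \(N \times N\) matrix. The plain-Python sketch below shows this for the hypothetical linear map \(T_{\bf a}(x) = a_0 + a_1 x\) with a standard normal reference. It does not call TransportMaps and is not meant to describe how the library organizes the computation.

```python
import math

def quad_points(mu, sigma, m=2001, half_width=10.0):
    """Trapezoid points/weights with sum_i w_i g(x_i) ~ E_{N(mu, sigma^2)}[g]."""
    lo = mu - half_width * sigma
    h = 2.0 * half_width * sigma / (m - 1)
    x = [lo + i * h for i in range(m)]
    norm = sigma * math.sqrt(2.0 * math.pi)
    w = [h * math.exp(-0.5 * ((xi - mu) / sigma) ** 2) / norm for xi in x]
    w[0] *= 0.5
    w[-1] *= 0.5
    return x, w

def action_hess_a_sketch(da, a, x, w):
    """Hessian-vector product for E_{pi_1}[-log pi_{2,a}], matrix-free."""
    r0 = r1 = 0.0
    for xi, wi in zip(x, w):
        dt = da[0] + da[1] * xi                    # directional derivative of T_a(x_i)
        r0 += wi * dt
        r1 += wi * (dt * xi + da[1] / a[1] ** 2)   # includes the -log(a1) curvature
    return [r0, r1]

x, w = quad_points(mu=1.0, sigma=2.0)   # pi_1 = N(1, 2^2)
Hda = action_hess_a_sketch([1.0, -2.0], [0.3, 0.7], x, w)
```

The result has the same length as `da`, and only running sums of size \(N\) are stored during the loop.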
- TransportMaps.KL.KL_divergence.storage_hess_a_kl_divergence(d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.ParametricTransportMapDistribution, params1=None, params2=None, cache=None, qtype=None, qparams=None, x=None, w=None, batch_size=None, mpi_pool_tuple=(None, None))
Assemble \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\).
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (ParametricTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
cache (dict) – cached values
qtype (int) – quadrature type to be used for the approximation of \(\mathbb{E}_{\pi_1}\)
qparams (object) – parameters necessary for the construction of the quadrature
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one.
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
- Returns:
(None) – the result is stored in params2['hess_a_kl_divergence']
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
Note
The dictionary params2 must be provided.
- TransportMaps.KL.KL_divergence.action_stored_hess_a_kl_divergence(H, v)
Evaluate action of \(\nabla^2_{\bf a}\mathcal{D}_{KL}(\pi_1 | \pi_{2,{\bf a}})\) on vector \(v\).
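The store-then-apply pattern suggested by these two functions can be sketched as follows. This is plain Python with hypothetical helper names, and it assumes that applying a stored Hessian amounts to an ordinary matrix-vector product. Only the dictionary key `'hess_a_kl_divergence'` is taken from the documentation above.

```python
def store_hessian_sketch(params2, H):
    """Keep an assembled Hessian in the caller's dictionary; returns None."""
    params2['hess_a_kl_divergence'] = H

def action_stored_sketch(H, v):
    """Matrix-vector product H v with a previously assembled H."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]

params2 = {}
store_hessian_sketch(params2, [[1.0, 1.0], [1.0, 9.0]])  # illustrative 2x2 Hessian

# The stored matrix can be reused for several directions without reassembly
H = params2['hess_a_kl_divergence']
Hv1 = action_stored_sketch(H, [1.0, 0.0])
Hv2 = action_stored_sketch(H, [1.0, -2.0])
```

Assembling once pays off when the same Hessian is applied to many vectors, for example inside an iterative linear solver.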
- TransportMaps.KL.KL_divergence.kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)
Compute \(-\sum_{i=0}^m f(x_i) = -\sum_{i=0}^m \left(\log\pi\circ T_k(x_i) + \log\partial_{x_k}T_k(x_i)\right)\)
- Parameters:
f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)
params (dict) – parameters for function \(f\)
cache (dict) – cached values
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process.)
mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f
- Returns:
(float) – value
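The component-wise sum is useful because, for a triangular map and a product reference density, the log pullback density splits into one term per component. The plain-Python sketch below checks this on a hypothetical lower-triangular linear map on \(\mathbb{R}^2\) with a standard normal reference. It does not call TransportMaps. Multiplying each term by a weight `w[i]` is an assumption; unit weights reproduce the unweighted sum in the formula above.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def std_normal_logpdf(t):
    """Log-density of the one-dimensional standard normal."""
    return -0.5 * t * t - LOG_SQRT_2PI

def component_sketch(T_k, dT_k, x, w):
    """-sum_i w_i [log pi(T_k(x_i)) + log d_{x_k} T_k(x_i)] for one component."""
    return -sum(wi * (std_normal_logpdf(T_k(p)) + math.log(dT_k(p)))
                for p, wi in zip(x, w))

# Hypothetical lower-triangular linear map: T_1 depends on x_1, T_2 on (x_1, x_2)
def T1(p):
    return 0.5 * p[0] - 0.2

def dT1(p):
    return 0.5            # d T_1 / d x_1

def T2(p):
    return 0.3 * p[0] + 2.0 * p[1] + 0.1

def dT2(p):
    return 2.0            # d T_2 / d x_2

x = [(0.0, 0.0), (1.0, -1.0), (-0.5, 2.0)]
w = [1.0, 1.0, 1.0]       # unit weights, matching the unweighted formula

parts = component_sketch(T1, dT1, x, w) + component_sketch(T2, dT2, x, w)

# Direct evaluation of -sum_i w_i log(pullback density) for comparison
direct = 0.0
for p, wi in zip(x, w):
    log_ref = std_normal_logpdf(T1(p)) + std_normal_logpdf(T2(p))
    log_det = math.log(dT1(p) * dT2(p))   # triangular Jacobian: product of diagonal
    direct -= wi * (log_ref + log_det)
```

Because each component contributes an independent term, the components can be evaluated (and, when each has its own coefficients, optimized) separately.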
- TransportMaps.KL.KL_divergence.grad_a_kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)
Compute \(-\sum_{i=0}^m \nabla_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)
- Parameters:
f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)
params (dict) – parameters for function \(f\)
cache (dict) – cached values
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process.)
mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f
- Returns:
(float) – value
- TransportMaps.KL.KL_divergence.hess_a_kl_divergence_component(f, params=None, cache=None, x=None, w=None, batch_size=None, mpi_pool=None)
Compute \(-\sum_{i=0}^m \nabla^2_{\bf a}f[{\bf a}](x_i) = -\sum_{i=0}^m \nabla^2_{\bf a} \left(\log\pi\circ T_k[{\bf a}](x_i) + \log\partial_{x_k}T_k[{\bf a}](x_i)\right)\)
- Parameters:
f (ProductDistributionParametricPullbackComponentFunction) – function \(f\)
params (dict) – parameters for function \(f\)
cache (dict) – cached values
x (ndarray [\(m,d\)]) – quadrature points used for the approximation of \(\mathbb{E}_{\pi_1}\)
w (ndarray [\(m\)]) – quadrature weights used for the approximation of \(\mathbb{E}_{\pi_1}\)
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process.)
mpi_pool (mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of f
- Returns:
(float) – value
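The role of `batch_size` described throughout this module can be pictured with a small chunking loop. This is a generic plain-Python sketch with hypothetical names, not the library's implementation: `f_batch` stands for any function that evaluates a list of points in one call, which in practice would be a vectorized array operation.

```python
def batched_sum(f_batch, x, batch_size=None):
    """Sum f over x, calling f_batch on chunks of at most batch_size points."""
    if batch_size is None:
        batch_size = max(len(x), 1)   # one call over all points
    total = 0.0
    calls = 0
    for start in range(0, len(x), batch_size):
        total += sum(f_batch(x[start:start + batch_size]))
        calls += 1
    return total, calls

def squares(chunk):
    """Stand-in for a batch evaluation of an integrand."""
    return [v * v for v in chunk]

points = [float(i) for i in range(10)]
full = batched_sum(squares, points)                     # batch_size=None
one_by_one = batched_sum(squares, points, batch_size=1)
chunks_of_4 = batched_sum(squares, points, batch_size=4)
```

All three calls return the same sum; only the number of evaluations of `f_batch` changes, which trades peak memory against per-call overhead.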
- TransportMaps.KL.KL_divergence.grad_t_kl_divergence(x, d1: TransportMaps.Distributions.Distribution, d2: TransportMaps.Distributions.PullBackTransportMapDistribution, params1=None, params2=None, cache1=None, cache2=None, grad_x_tm=None, batch_size=None)
Compute \(\nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).
This corresponds to:
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (PullBackTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
cache1 (dict) – cache for distribution \(\pi_1\)
cache2 (dict) – cache for distribution \(\pi_2\)
grad_x_tm – optional argument passed if \(\nabla_x T(x)\) has been already computed
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process.)
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
- TransportMaps.KL.KL_divergence.grad_x_grad_t_kl_divergence(x, d1, d2: TransportMaps.Distributions.PullBackTransportMapDistribution, params1=None, params2=None, grad_x_tm=None, grad_t=None, batch_size=None, mpi_pool_tuple=(None, None))
Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).
This corresponds to:
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (PullBackTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
grad_x_tm – optional argument passed if \(\nabla_x T(x)\) has been already computed
grad_t – optional argument passed if the first variation has been already computed
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process.)
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.
- TransportMaps.KL.KL_divergence.tuple_grad_x_grad_t_kl_divergence(x, d1, d2: TransportMaps.Distributions.PullBackTransportMapDistribution, params1=None, params2=None, grad_x_tm=None, batch_size=None, mpi_pool_tuple=(None, None))
Compute \(\nabla_x \nabla_T \mathcal{D}_{KL}(\pi_1, \pi_2(T))\).
This corresponds to:
- Parameters:
d1 (Distribution) – distribution \(\pi_1\)
d2 (PullBackTransportMapDistribution) – distribution \(\pi_2\)
params1 (dict) – parameters for distribution \(\pi_1\)
params2 (dict) – parameters for distribution \(\pi_2\)
grad_x_tm – optional argument passed if \(\nabla_x T(x)\) has been already computed
batch_size (int) – size of the batch to be evaluated at each iteration. A size of 1 corresponds to a completely non-vectorized evaluation. A size of None corresponds to a completely vectorized one. (Note: if nprocs > 1, then the batch size defines the size of the batch for each process.)
mpi_pool_tuple (tuple [2] of mpi_map.MPI_Pool) – pool of processes to be used for the evaluation of d1 and d2
Note
The parameters (qtype, qparams) and (x, w) are mutually exclusive, but one pair of them is necessary.