TransportMaps.Samplers.puwr

puwr – Error analysis for Monte Carlo data in Python

This module implements the method for estimating the autocorrelation time described in [1]. The reference provides a MATLAB implementation; our aim is to provide an alternative that does not depend on proprietary software. Read [1] carefully to understand exactly what this package does and how to interpret its output. Note also that a similar method is described in [2], which may be more suitable for your use case.

Module Contents

Classes

DataInfo

Given an array of data points, extract information such as the number of observables, replica, and measurements per replicum.

DataProject

Class to calculate the projected data

Functions

means(data)

Calculate per-observable means:

deriv(f, alpha, h)

Calculates the partial numerical derivative of a function.

gamma(data, f)

Calculates an estimator for the autocorrelation function

tauint(data, f[, full_output, plots])

Estimate the autocorrelation time of data as presented in [1].

correlated_data([tau, n])

Generate correlated data as explained in the appendix of [1].

idf(n)

Project on n-th argument:

puwr.means(data)

Calculate per-observable means:

>>> from puwr import means
>>> import numpy as np
>>> data = [[np.linspace(0,10,20)],[np.linspace(10,20,20)]]
>>> means(data)
array([  5.,  15.])
Parameters:

data – The input data is assumed to be in the format \(\mathtt{data[}\alpha\mathtt{][r][i]} = a_\alpha^{i,r}\) where \(i\) labels the measurements, \(r\) the replica and \(\alpha\) the observables.

exception puwr.DataSanityCheckFail(s)

Bases: Exception

Exception class.

__str__()

Return str(self).

class puwr.DataInfo(data)

Bases: object

Given an array of data points, extract information like the number of observables, replica, and measurements per replicum. If there is not the same number of measurements per replicum for each observable, an exception is raised:

>>> from puwr import DataInfo, correlated_data
>>> d = DataInfo([[correlated_data(5,20)[0][0],
...                correlated_data(6,20)[0][0]]])
>>> d.nobs  # 1 observable
1
>>> d.R     # 2 'replica'
2
>>> d.Nr    # 20 measurements / replicum
[20, 20]
>>> DataInfo([correlated_data(5,20)[0],
...           correlated_data(6,29)[0]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "puwr.py", line 61, in __init__
    self.R = len(data[0]) # number of replica
puwr.DataSanityCheckFail: 'Inconsistent number of measurements/replicum'
Parameters:

data – The input data is assumed to be in the format \(\mathtt{data[}\alpha\mathtt{][r][i]} = a_\alpha^{i,r}\) where \(i\) labels the measurements, \(r\) the replica and \(\alpha\) the observables.

Raises:

DataSanityCheckFail

puwr.deriv(f, alpha, h)

Calculates the partial numerical derivative of a function.

\[\partial_{i,h} f(x_0, \ldots, x_n) = \frac 1 {2h} \{ f(x_0, \dots, x_i + h, \ldots, x_n) - f(x_0, \dots, x_i - h, \ldots, x_n)\}.\]
Parameters:
  • f – Function.

  • alpha – Calculate partial derivative with respect to the alpha-th argument.

  • h – Step-size.

Returns:

The derivative (function).
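
For illustration, here is a minimal sketch of how deriv might be used; the helper function f below is hypothetical, and the check only verifies that the central difference reproduces the exact derivative of a simple polynomial:

>>> from puwr import deriv
>>> f = lambda x, y: x * y**2          # exact derivative w.r.t. y is 2*x*y
>>> dfdy = deriv(f, 1, 1e-4)           # numerical derivative w.r.t. the second argument
>>> abs(dfdy(2., 3.) - 12.) < 1e-6     # 2*2*3 = 12
True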

class puwr.DataProject(data)

Class to calculate the projected data

\[a_f^{i,r} = \sum_\alpha \bar{\bar{f}}_\alpha a_\alpha^{i,r}\,, \quad f_\alpha = \frac{\mathrm d f}{\mathrm d A_\alpha}\]

for arbitrary functions.

Parameters:

data – The input data is assumed to be in the format \(\mathtt{data[}\alpha\mathtt{][r][i]} = a_\alpha^{i,r}\) where \(i\) labels the measurements, \(r\) the replica and \(\alpha\) the observables.

project(f)

Calculate the actual projected data w.r.t. the function \(f(A_1, ..., A_n)\),

\[a_f^{i,r} = \sum_\alpha \bar{\bar{f}}_\alpha a_\alpha^{i,r}\,, \quad f_\alpha = \frac{\mathrm d f}{\mathrm d A_\alpha}\]
Parameters:

f – Function.

Returns:

Projected data \(a_f^{i,r}\) (array).
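
A hedged usage sketch follows; nothing is asserted about the concrete shape of the returned array, and idf(0) simply projects on the first observable:

>>> from puwr import DataProject, correlated_data, idf
>>> data = correlated_data(tau=5, n=1000)   # one observable, one replicum
>>> dp = DataProject(data)
>>> a_f = dp.project(idf(0))                # projected data for f(A_1) = A_1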

puwr.gamma(data, f)

Calculates an estimator for the autocorrelation function

\[\begin{split}\bar{\bar{\Gamma}}_F(t) =& \sum_{\alpha\beta} \bar{\bar{f}}_\alpha\,\bar{\bar{f}}_\beta \bar{\bar{\Gamma}}_{\alpha\beta}(t)\,,\\ \bar{\bar{\Gamma}}_{\alpha\beta}(t) =& \frac 1 {N -R t} \sum_{r = 1}^R \sum_{i = 1}^{N_r -t} (a_\alpha^{i,r} - \bar{\bar{a}}_\alpha) (a_\beta^{i+t,r} - \bar{\bar{a}}_\beta)\,,\\ f_\alpha = \frac{\mathrm d f}{\mathrm d A_\alpha}\end{split}\]

where \(f\) is a function, \(i\) labels the measurements, \(r\) the replica, and \(\alpha\) the observables.

Parameters:
  • data – The input data is assumed to be in the format \(\mathtt{data[}\alpha\mathtt{][r][i]} = a_\alpha^{i,r}\) where \(i\) labels the measurements, \(r\) the replica and \(\alpha\) the observables.

  • f – Function \(f\) as above.
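
A minimal call sketch; the return format is not documented above, so nothing is asserted about it here, and idf(0) is used to select the first observable as the function \(f\):

>>> from puwr import gamma, correlated_data, idf
>>> data = correlated_data(tau=2, n=1000)   # one observable, one replicum
>>> G = gamma(data, idf(0))                 # estimator for Gamma_F(t) of the first observable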

puwr.tauint(data, f, full_output=False, plots=False)

Estimate the autocorrelation time of data as presented in [1].

Parameters:
  • data – The input data is assumed to be in the format \(\mathtt{data[}\alpha\mathtt{][r][i]} = a_\alpha^{i,r}\) where \(i\) labels the measurements, \(r\) the replica and \(\alpha\) the observables.

  • f – Function or integer. If a function is passed, the autocorrelation time is calculated for the secondary observable defined by applying the function to the input data (the primary observables, in the order they appear in data, are passed as arguments to f). If an integer is passed, the autocorrelation time of data[f] is estimated.

  • full_output – If set to True, the autocorrelation matrix \(\bar{\bar{\Gamma}}\) and the optimal window size \(W\) will be appended to the output.

  • plots – If set to True, and if the matplotlib package is installed, plots are produced that depict the autocorrelation matrix and estimated autocorrelation time vs. the window size.

Returns:

A tuple containing the mean, the variance, the estimated autocorrelation time, and the estimated error of the latter.
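
A typical call, sketched under the assumption that the four return values come in the order listed above; the variable names are illustrative only:

>>> from puwr import tauint, correlated_data
>>> data = correlated_data(tau=5, n=10000)      # synthetic data with known autocorrelation time
>>> mean, var, tint, d_tint = tauint(data, 0)   # analyze the first (and only) primary observable
>>> # tint should come out close to 5 for this synthetic data set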

puwr.correlated_data(tau=5, n=10000)

Generate correlated data as explained in the appendix of [1]. One draws a sequence of \(n\) normally distributed random numbers \(\eta_i, i = 1,\ldots,n\) with unit variance and vanishing mean. From this one constructs

\[\begin{split}\nu_1 = \eta_1,\quad \nu_{i+1} = \sqrt{1 - a^2} \eta_{i+1} + a \nu_i\,,\\ a = \frac {2 \tau - 1}{2\tau + 1}, \quad \tau \geq \frac 1 2\,,\end{split}\]

where \(\tau\) is the autocorrelation time:

>>> from puwr import correlated_data
>>> correlated_data(2, 10)
[[array([ 1.02833043,  1.08615234,  1.16421776,  1.15975754,
          1.23046603,  1.13941114,  1.1485227 ,  1.13464388,
          1.12461557,  1.15413354])]]
Parameters:
  • tau – Target autocorrelation time.

  • n – Number of data points to generate.

puwr.idf(n)

Project on n-th argument:

idf(n) = lambda *a : a[n]
Parameters:

n – Index of the argument to project on.
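
Since idf(n) is just lambda *a: a[n], a quick doctest-style illustration:

>>> from puwr import idf
>>> idf(1)(10, 20, 30)   # select the second argument
20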