I got an error "NameError: free variable 'gfk' referenced before assignment in enclosing scope"

Question

Hi,

I am trying to use your TransportMaps library to track an airplane using ground based radar. However, I got an error : NameError: free variable 'gfk' referenced before assignment in enclosing scope.

I would very appreciate if you can find some time to answer me.

Best regards,

Jacks.

Here the python version information:

import sys, scipy, numpy,TransportMaps; print(scipy.__version__, numpy.__version__, sys.version_info,TransportMaps.__version__)
('1.0.0', '1.14.0', sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0), '1.1b2')

Here the python script I use:

import numpy as np
import TransportMaps.Distributions as DIST
import TransportMaps.Maps as MAPS
import TransportMaps as TM
import TransportMaps.Distributions.Decomposable as DISTDEC
import TransportMaps.Likelihoods as LKL
init_x = np.array([100, 200, 2000])
dt = 0.05
state_dim = 3
obs_dim = 1
F =  np.eye(3) + np.array([[0, 1, 0],
                       [0, 0, 0],
                       [0, 0, 0]]) * dt

Q =   np.eye(state_dim)
Q0 =  50 *np.eye(state_dim)
nu_x0 = DIST.GaussianDistribution(init_x, Q0)
nu_w = DIST.GaussianDistribution(np.zeros(state_dim), Q)
nsteps = 10
Z = np.zeros( (state_dim, nsteps+1) )
Z[:,0] = np.array([200, 100, 1000])
for i in range(nsteps):
    Z[:,i+1] = np.dot(F, Z[:,i])  + nu_w.rvs(1)[0,:]  
range_std = 5. 
R = np.diag([range_std**2])
nu_v = DIST.GaussianDistribution(np.zeros(obs_dim),  R)
Y = []
for i in range(nsteps+1):
    Y.append(np.sqrt(Z[2,i]**2+ Z[0,i]**2) + nu_v.rvs(1)[0,:] )

class NonlinearMap(MAPS.Map):
    def __init__(self):
        super(NonlinearMap, self).__init__(state_dim, obs_dim)
    def evaluate(self, x, *args, **kwargs):
        out = np.zeros((x.shape[0], self.dim_out))
        out[:,0] = (x[:,0]**2 + x[:,2]**2) ** 0.5 
        return out
    def grad_x(self, x, *args, **kwargs):
        out = np.zeros((x.shape[0], self.dim_out, self.dim_in))  # out.shape = (m x dy x dx)  
        horiz_dist = x[:,0]
        altitude = x[:,2]
        denom = np.sqrt(horiz_dist**2 + altitude**2) 
        out[:,0,0] = horiz_dist/denom
        out[:,0,2] = altitude/denom
        return out
    
    def hess_x(self, x, *args, **kwargs):
#         eq 3.83
        out = np.zeros((x.shape[0], self.dim_out, self.dim_in, self.dim_in)) # out.shape = (m x dy x dx x dx)
        horiz_dist = x[:,0]
        altitude = x[:,2]
        denom = np.sqrt(horiz_dist**2 + altitude**2)
        out[:,0,0,0] = altitude**2 /denom**3
        out[:,0,0,2] =  -1 * horiz_dist * altitude /denom**3
        out[:,0,2,0] = out[:,0,0,2]
        out[:,0,2,2] = horiz_dist**2 /denom**3
        return out

PhiMap = MAPS.LinearMap(np.zeros(state_dim), F)
pi_trans = DISTDEC.AR1TransitionDistribution(nu_w, PhiMap)
pi = DISTDEC.SequentialHiddenMarkovChainDistribution([], [])
for n in range(nsteps+1):
    Hmap = NonlinearMap()
    ll = LKL.AdditiveLogLikelihood(Y[n], nu_v, Hmap)
    if n == 0: pin = nu_x0
    else: pin = pi_trans
    pi.append(pin, ll)
lap = TM.laplace_approximation(pi)

The whole Error information:

NameError                                 Traceback (most recent call last)
<ipython-input-1-28eebd6a4657> in <module>()
     66     else: pin = pi_trans
     67     pi.append(pin, ll)
---> 68 lap = TM.laplace_approximation(pi)

/Users/junjie/anaconda/lib/python2.7/site-packages/TransportMaps/Routines.pyc in laplace_approximation(pi, params, x0, tol, ders, fungrad)
   1486                               method='newton-cg',
   1487                               tol=tol,
-> 1488                               options=options)
   1489     else:
   1490         raise ValueError("ders parameter not valid. Chose between 0,1,2.")

/Users/junjie/anaconda/lib/python2.7/site-packages/scipy/optimize/_minimize.pyc in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    482     elif meth == 'newton-cg':
    483         return _minimize_newtoncg(fun, x0, args, jac, hess, hessp, callback,
--> 484                                   **options)
    485     elif meth == 'l-bfgs-b':
    486         return _minimize_lbfgsb(fun, x0, args, jac, bounds,

/Users/junjie/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.pyc in _minimize_newtoncg(fun, x0, args, jac, hess, hessp, callback, xtol, eps, maxiter, disp, return_all, **unknown_options)
   1597             msg = ("Warning: CG iterations didn't converge.  The Hessian is not "
   1598                    "positive definite.")
-> 1599             return terminate(3, msg)
   1600 
   1601         pk = xsupi  # search direction is solution to system.

/Users/junjie/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.pyc in terminate(warnflag, msg)
   1517             print("         Hessian evaluations: %d" % hcalls)
   1518         fval = old_fval
-> 1519         result = OptimizeResult(fun=fval, jac=gfk, nfev=fcalls[0],
   1520                                 njev=gcalls[0], nhev=hcalls, status=warnflag,
   1521                                 success=(warnflag == 0), message=msg, x=xk,

NameError: free variable 'gfk' referenced before assignment in enclosing scope

dabi · Answer 1 · Apr 6, 2018

Hi Jack, you are trying to solve the following problem, right?

\[ \nu_{Z_0}=\mathcal{N}(0, Q_0) \;, \quad \nu_w = \mathcal{N}(0,Q) \;,\quad \nu_v=\mathcal{N}(0,R) \]

\[ Z_{t+1} = F Z_{t} + \varepsilon \;, \quad \varepsilon \sim \nu_{w} \]

\[ Y_t = \sqrt{Z_{2,t}^2 + Z_{0,t}^2} + \delta \;, \quad \delta\sim\nu_v \]

I just tried to run your code and I run on a different error, namely

Traceback (most recent call last):
  File "script.py", line 68, in <module>
    lap = TM.laplace_approximation(pi)
  File "/home/dabi/VC-Projects/Software/Mine/Current/transportmaps-private/TransportMaps/Routines.py", line 1664, in laplace_approximation
    x0 = pi.prior.rvs(1).flatten()
  File "/home/dabi/VC-Projects/Software/Mine/Current/transportmaps-private/TransportMaps/Distributions/DistributionBase.py", line 68, in rvs
    raise NotImplementedError("The method is not implemented for this distribution")
NotImplementedError: The method is not implemented for this distribution

This means that the random sampler for the prior \( \nu_Z \) distribution is not implelemented (now that I know it, I will do it soon). If this is not implemented, then you need to provide a starting point for the optimization used to find the MAP point for the Laplace approximation. For example, if you wanted to use the generating states as a starting point, you would do

lap = TM.laplace_approximation(pi, x0=Z)

This way the code runs on my machine. Let me know whether you are still having the same problem and I can dig more into it.

Best,

Daniele

PS: Also, if you want to see whether the optimization worked fine, you can lower the logging level to the INFO level (20) or to the debug level (10):

TM.logger.setLevel(20)

answered Apr 6, 2018 by dabi (307 points)

Thanks. It works now.

Best,

Jack

commented Apr 6, 2018 by jack (2 points)
edited Apr 7, 2018 by jack

The SequentialInference is sensitive to the prior information? I have compared the SequentialInference method with the laplace approximation method and I found the performance of SequentialInference is influenced by the prior information.

The python code is:

import numpy as np
import TransportMaps.Distributions as DIST
import TransportMaps.Maps as MAPS
import TransportMaps as TM
import TransportMaps.Distributions.Decomposable as DISTDEC
import TransportMaps.Likelihoods as LKL
from matplotlib.ticker import MaxNLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import TransportMaps.Algorithms.SequentialInference as ALG
import TransportMaps.Diagnostics as DIAG
TM.logger.setLevel(20)
init_x = np.array([100, 200, 2000])
dt = 0.05
state_dim = 3
obs_dim = 1
F =  np.eye(3) + np.array([[0, 1, 0],
                       [0, 0, 0],
                       [0, 0, 0]]) * dt

Q =   np.eye(state_dim)
Q0 =  10 *np.eye(state_dim)
nu_x0 = DIST.GaussianDistribution(init_x-10, Q0)
nu_w = DIST.GaussianDistribution(np.zeros(state_dim), Q)
nsteps = 20
Z = np.zeros( (state_dim, nsteps+1) )
Z[:,0] = init_x
for i in range(nsteps):
    Z[:,i+1] = np.dot(F, Z[:,i])  + nu_w.rvs(1)[0,:]
range_std = 1.
R = np.diag([range_std**2])
nu_v = DIST.GaussianDistribution(np.zeros(obs_dim),  R)
Y = []
for i in range(nsteps+1):
    Y.append(np.sqrt(Z[2,i]**2+ Z[0,i]**2) + nu_v.rvs(1)[0,:] )

class NonlinearMap(MAPS.Map):
    def __init__(self):
        super(NonlinearMap, self).__init__(state_dim, obs_dim)
    def evaluate(self, x, *args, **kwargs):
        out = np.zeros((x.shape[0], self.dim_out))
        out[:,0] = (x[:,0]**2 + x[:,2]**2) ** 0.5
        return out
    def grad_x(self, x, *args, **kwargs):
        out = np.zeros((x.shape[0], self.dim_out, self.dim_in))  # out.shape = (m x dy x dx)
        horiz_dist = x[:,0]
        altitude = x[:,2]
        denom = np.sqrt(horiz_dist**2 + altitude**2)
        out[:,0,0] = horiz_dist/denom
        out[:,0,2] = altitude/denom
        return out

    def hess_x(self, x, *args, **kwargs):
#         eq 3.83
        out = np.zeros((x.shape[0], self.dim_out, self.dim_in, self.dim_in)) # out.shape = (m x dy x dx x dx)
        horiz_dist = x[:,0]
        altitude = x[:,2]
        denom = np.sqrt(horiz_dist**2 + altitude**2)
        out[:,0,0,0] = altitude**2 /denom**3
        out[:,0,0,2] =  -1 * horiz_dist * altitude /denom**3
        out[:,0,2,0] = out[:,0,0,2]
        out[:,0,2,2] = horiz_dist**2 /denom**3
        return out


INT = ALG.TransportMapsSmoother()
pi = DISTDEC.SequentialHiddenMarkovChainDistribution([], [])
PhiMap = MAPS.LinearMap(np.zeros(state_dim), F)
pi_trans = DISTDEC.AR1TransitionDistribution(nu_w, PhiMap)
for n in range(0,nsteps+1):
    print n
    Hmap = NonlinearMap()
    ll = LKL.AdditiveLogLikelihood(Y[n], nu_v, Hmap)
    if n == 0: pin = nu_x0
    else: pin = pi_trans
    if n == 0: dim_tm = state_dim
    else: dim_tm = state_dim*2
    tm = TM.Default_IsotropicIntegratedSquaredTriangularTransportMap(
        dim=dim_tm, order=1)
    # Solution parameters
    solve_params = {'qtype': 3, 'qparams': [1]*dim_tm,
                    'tol': 1e-3, 'ders':2}
    INT.assimilate(pin, ll, tm, solve_params)

filt_tm_list = INT.filtering_map_list
rho = DIST.StandardNormalDistribution(state_dim)
samp_list = []
for tm in filt_tm_list:
    push_tm_rho = DIST.PushForwardTransportMapDistribution(tm, rho)
    samp_list.append( push_tm_rho.rvs(10000) )


T = INT.smoothing_map
pi = INT.pi
pull_T_pi = DIST.PullBackTransportMapDistribution(T, pi)
rho = DIST.StandardNormalDistribution(pi.dim)
lap = TM.laplace_approximation(pi,x0=np.zeros(pi.dim)+10)
var_lap = DIAG.variance_approx_kl(lap, pi, qtype=0, qparams=10000)
var_seq = DIAG.variance_approx_kl(rho, pull_T_pi, qtype=0, qparams=10000)
print("Variance diagnostic - Laplace:    %e" % var_lap)
print("Variance diagnostic - Sequential: %e" % var_seq)
idx = np.arange(0,(nsteps+1)*state_dim,3)
z1_samp = np.vstack([ s[:,0] for s in samp_list ]).T
z2_samp = np.vstack([ s[:,1] for s in samp_list ]).T
z3_samp = np.vstack([ s[:,2] for s in samp_list ]).T
idx = np.arange(0,(nsteps+1)*state_dim,3)
mean1 = np.mean(z1_samp, axis=0)
mean2 = np.mean(z2_samp, axis=0)
mean3 = np.mean(z3_samp, axis=0)

fig = plt.figure()
plt.plot(lap.mu[idx],'r-',label=r"$\mathbb{E}[{\bf Z}_0 \vert {\bf y}_{0:k}]$-lap")
plt.plot(mean1,'k-',label=r"$\mathbb{E}[{\bf Z}_0 \vert {\bf y}_{0:k}]$-Seq")
plt.plot(Z[0,:],'b-',label=r"${\bf Z}_0$")
plt.legend()
plt.show()

When I set the prior covariance Q0= 5 *np.eye(state_dim), the filter will lose the target. However, when I set Q0= 10 *np.eye(state_dim), it performs well.

Besides, it runs slow when I set the order=2 and qparams=[3]*dim_tm.

Best,

Jack

commented Apr 7, 2018 by jack (2 points)

On the prior

Yes, in general the solution is sensitive to prior information (both for the sequential inference algorithm and the Laplace approximation). In the end the prior information concur in the definition of your problem/posterior, and the fact that you are able to track an object or not depends from two factors:

whether you give a good starting point and appropriately model the transitions and updates
you characterize the posterior distribution

If the model is not good (e.g. no good prior/transitions/updates), you can still characterize the posterior accurately, but in practice you are finding the "right answer to the wrong question". This is the case for any filtering technique I know of: if your starting guess is not good, it's quite unlikely the filter will recover.

On this account, I see that you have changed 100 to 10 in this line

nu_x0 = DIST.GaussianDistribution(init_x-10, Q0)

This is indeed gives a better starting point for the filter, as well as relaxing the prior on x0 by using a variance 10 instead of 5. In fact, I tried to use

nu_x0 = DIST.GaussianDistribution(init_x, Q0)

and it works very well. Again, one cannot always have accurate initial measurements, but then he should account for that by using an appropriate variance in Q0.

On the comparison with the Laplace approximation

The graphical comparison between Laplace and sequential inference is difficult to do, because the two methods are trying to answer to different questions.

The Laplace approximation finds a local-Gaussian approximation to the full posterior (after seeing all the data). This means that: (a) it cannot provide the filtering distribution at a cost that is constant in time, (b) it tends to perform well in the characterization of the full posterior (smoothing) after many steps, because the posterior tends to concentrate in the limit of infinite data.
The sequential inference method approximates the smoothing sequentially. This means that: (a) it provides us the filtering distributions at a cost that is constant in time, (b) the approximation of the full posterior (smoothing) is subject to accumulation of error (which probably the reason why you see that the variance diagnostic on the smoothing is better for the Laplace approximation with respect to the sequential inference algorithm)

Given this, if you want to graphically compare the two methods, you should plot the smoothing distribution given by the sequential algorithm:

# Sample smoothing
push_T_rho = DIST.PushForwardTransportMapDistribution(T, rho)
samps = push_T_rho.rvs(10000)
samp_list = [ samps[:,i:i+3] for i in range(0,pi.dim,3) ]
plot_filtering(np.array(Z), samp_list, fig=None, marker='o-')

The flexibility given by the sequential inference algorithm is that one can tune the approximation complexity to get the desired accuracy (which will be reflected also in better accuracy of the filtering distributions). As you probably experienced, this is done at the expences of computational time.

Monitoring convergence

One way to check whether the map you are using at each step is of the sufficient order, is to turn on the step-by-step variance diagnostic. First turn on logging at INFO level for the package TransportMaps (the indication I gave you before was turning on INFO level only for a sub-part of TransportMap):

TM.setLogLevel(20)

Next, use the following setting for each step:

solve_params = {'qtype': 3, 'qparams': [3]*dim_tm,
                'tol': 1e-3, 'ders': 1, 'maxit': 1000}
var_diag_conv_params = {'qtype': 3, 'qparams': [4]*dim_tm}
INT.assimilate(pin, ll, tm, solve_params,
               var_diag_convergence_params=var_diag_conv_params)

This will print out (and store in the variable INT.var_diag_convergence) the variance diagnostic for each step.

Using the setting above, I see that already the linear map (order 1) is doing a good job in approximating each step (the variance diagnostics are around \( < 10^{-6} \) ). This means that the fact that the variance diagnostic of the full smoothing is higher than in the Laplace approximation case, is given mostly by error accumulation. This error could be "cleaned away" by optimizing the resulting smoothing map with respect to the full posterior or by using a unbiased sampler (MCMC/importance sampling) preconditioned by this map. Both of these approach, however, defeat the purpose of filtering, because require the evaluation of the full posterior (smoothing).

I don't think you would gain anything from going to order 2, but rather you will need to use more quadrature points to identify all the coefficiens of the second order map, i.e. increase the computational cost. In our experiments (in the problems where higher orders were needed, e.g. the ones described in the tutorial) we often use an "adaptive" approach in the construction of the single maps. This feature is going to be available in the next release of the software.

I hope to have answered to most of your questions.

commented Apr 7, 2018 by dabi (307 points)

Thank you for your answers, I understand it now.

Best,

Jack

commented Apr 7, 2018 by jack (2 points)

Thanks, It all make sense now.

commented Aug 4, 2020 by batjoro (0 points)

I got an error "NameError: free variable 'gfk' referenced before assignment in enclosing scope"

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

On the prior

On the comparison with the Laplace approximation

Monitoring convergence

Please log in or register to add a comment.