Inherits from
- basemodel: scipy.maxentropy.maxentropy.basemodel
Method summary
- __init__(self)
- estimate(self)
- expectations(self)
- lognormconst(self)
- logpdf(self, fx, log_prior_x = None)
- pdf(self, fx)
- pdf_function(self)
- resample(self)
- setsampleFgen(self, sampler, staticsample = True)
- settestsamples(self, F_list, logprob_list, testevery = 1, priorlogprob_list = None)
- stochapprox(self, K)
- test(self)
Methods
- __init__(self)
- estimate(self)
This function approximates both the feature expectation vector E_p f(X) and the log of the normalization term Z with importance sampling. It also computes the sample variance of the component estimates of the feature expectations as

    varE = var(E_1, ..., E_T)

where T is self.matrixtrials and E_t is the estimate of E_p f(X) approximated using the t-th auxiliary feature matrix.

It doesn't return anything, but stores the member variables logZapprox, mu and varE. (This is done because some optimization algorithms retrieve the dual function and the gradient function in separate function calls, but we can compute them more efficiently together.)

It uses a supplied generator sampleFgen whose .next() method returns features of random observations s_j generated according to an auxiliary distribution aux_dist. It uses these either in a matrix (with multiple runs) or with a sequential procedure, which has more updating overhead but can potentially stop earlier (needing fewer samples). In the matrix case, the features F = {f_i(s_j)} and the vector [log aux_dist(s_j)] of log probabilities are generated by calling resample().

We use [Rosenfeld01Wholesentence]'s estimate of E_p[f_i]:

    {sum_j p(s_j) / aux_dist(s_j) f_i(s_j)} / {sum_j p(s_j) / aux_dist(s_j)}

Note that this estimator is consistent but biased. Because the normalization constant Z cancels between numerator and denominator, it equals

    {sum_j p_dot(s_j) / aux_dist(s_j) f_i(s_j)} / {sum_j p_dot(s_j) / aux_dist(s_j)}

where p_dot(s) = exp(theta.f(s)) is the unnormalized model density. The estimator of E_p f_i(X) is computed in log space as num_i / denom, where

    num_i = exp(logsumexp(theta.f(s_j) - log aux_dist(s_j) + log f_i(s_j)))
    denom = n * Zapprox,    with Zapprox = exp(self.lognormconst())

The denominator n * Zapprox can be computed directly as

    exp(logsumexp(log p_dot(s_j) - log aux_dist(s_j)))
      = exp(logsumexp(theta.f(s_j) - log aux_dist(s_j)))
- expectations(self)
Estimates the feature expectations E_p[f(X)] under the current model p = p_theta using the given sample feature matrix. If self.staticsample is True, it uses the current feature matrix self.sampleF. If self.staticsample is False or self.matrixtrials is greater than 1, it draws one or more sample feature matrices F afresh using the generator function supplied to setsampleFgen().
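A minimal NumPy sketch of the ratio estimator described for estimate() and expectations(), assuming F is an (m x n) array whose column j holds f(s_j) and log_aux holds log aux_dist(s_j). Instead of taking log f_i(s_j), it normalizes the importance weights with a log-sum-exp, so it also handles zero or negative feature values; that substitution is ours, not the module's. The helper names estimate_one_matrix and expectations_over_trials are hypothetical.

```python
import numpy as np


def logsumexp_1d(a):
    """Numerically stable log(sum(exp(a))) for a 1-d array."""
    a = np.asarray(a, dtype=float)
    a_max = a.max()
    return a_max + np.log(np.exp(a - a_max).sum())


def estimate_one_matrix(theta, F, log_aux):
    """Ratio (self-normalized) importance-sampling estimate of E_p f(X).

    F is (m x n): column j holds f(s_j); log_aux[j] = log aux_dist(s_j).
    """
    # log of unnormalized weights p_dot(s_j) / aux_dist(s_j)
    log_w = theta @ F - log_aux
    # normalized weights: the unknown Z cancels in the ratio
    w = np.exp(log_w - logsumexp_1d(log_w))
    # weighted average of every feature over the n sample points
    return F @ w


def expectations_over_trials(theta, sampler, matrixtrials):
    """Average the estimate over matrixtrials fresh matrices; also return varE."""
    estimates = []
    for _ in range(matrixtrials):
        output = next(sampler)          # (F, lp) or (F, lp, sample)
        F, log_aux = output[0], output[1]
        estimates.append(estimate_one_matrix(theta, F, log_aux))
    E = np.array(estimates)             # shape (matrixtrials, m)
    mu = E.mean(axis=0)
    varE = E.var(axis=0)                # np.var with its default ddof=0
    return mu, varE
```

For example, with a sampler that always yields the four points {0, 1, 2, 3} with f(x) = x under a uniform aux_dist, theta = [log 2] gives mu = [34/15] (the exact expectation) and varE = [0].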
- lognormconst(self)
Estimates the log of the normalization constant (partition function) Z using the current sample matrix F.
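As a sketch, assuming the same (m x n) layout of F and a vector log_aux of log aux_dist(s_j) values, the estimate of log Z is the log of the mean importance weight exp(theta.f(s_j)) / aux_dist(s_j), evaluated with a log-sum-exp to avoid overflow. The function name lognormconst_sketch is hypothetical.

```python
import numpy as np


def lognormconst_sketch(theta, F, log_aux):
    """Importance-sampling estimate of log Z(theta).

    Z(theta) is estimated by (1/n) * sum_j exp(theta.f(s_j)) / aux_dist(s_j),
    where column j of the (m x n) array F holds f(s_j).
    """
    log_w = theta @ F - log_aux               # theta.f(s_j) - log aux_dist(s_j)
    a_max = log_w.max()                       # shift to avoid overflow in exp
    log_sum = a_max + np.log(np.exp(log_w - a_max).sum())
    return log_sum - np.log(F.shape[1])       # divide by n in log space
```

As a check, if aux_dist is uniform on the four points {0, 1, 2, 3}, f(x) = x, and the sample contains each point once, theta = [log 2] gives log Z = log(1 + 2 + 4 + 8) = log 15 exactly.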
- logpdf(self, fx, log_prior_x = None)
Returns the log of the estimated density p(x) = p_theta(x) at the point x. If log_prior_x is None, this is defined as

    log p(x) = theta.f(x) - log Z

where f(x) is given by the (m x 1) array fx. If, instead, fx is a 2-d (m x n) array, this function interprets each of its columns j = 0, ..., n-1 as a feature vector f(x_j), and returns an array containing the log pdf value of each point x_j under the current model. log Z is estimated using the sample provided with setsampleFgen().

The optional argument log_prior_x is the log of the prior density p_0 at the point x (or at each point x_j if fx is 2-dimensional). The log pdf of the model is then defined as

    log p(x) = log p_0(x) + theta.f(x) - log Z

and p then represents the model of minimum KL divergence D(p||p_0) instead of maximum entropy.
- pdf(self, fx)
Returns the estimated density p_theta(x) at the point x with feature statistic fx = f(x). This is defined as

    p_theta(x) = exp(theta.f(x)) / Z(theta)

where Z is the estimated value self.normconst() of the partition function.
- pdf_function(self)
Returns the estimated density p_theta(x) as a function p(f) taking a vector f = f(x) of feature statistics at any point x. This is defined as

    p_theta(x) = exp(theta.f(x)) / Z
- resample(self)
(Re)samples the matrix F of sample features.
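The density methods logpdf(), pdf() and pdf_function() described above reduce to a few lines once theta and an estimate of log Z are available. This sketch assumes fx is either a length-m vector or an (m x n) array whose columns are the feature vectors of n points, matching the feature-matrix layout used by setsampleFgen(); logpdf_sketch and pdf_function_sketch are hypothetical names, and log Z is passed in rather than estimated.

```python
import numpy as np


def logpdf_sketch(theta, logZ, fx, log_prior_x=None):
    """log p(x) = [log p_0(x)] + theta.f(x) - log Z.

    fx is a length-m vector f(x), or an (m x n) array whose columns are
    f(x_0), ..., f(x_{n-1}); theta @ fx handles both shapes.
    """
    log_p = theta @ np.asarray(fx, dtype=float) - logZ
    if log_prior_x is not None:
        log_p = log_p + log_prior_x     # minimum-KL model relative to p_0
    return log_p


def pdf_function_sketch(theta, logZ):
    """Return a closure p(f) = exp(theta.f - log Z) over feature vectors f."""
    def p(f):
        return np.exp(logpdf_sketch(theta, logZ, f))
    return p
```

Returning a closure fixes theta and log Z at the moment pdf_function is called, so later parameter updates do not change an already returned p(f).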
- setsampleFgen(self, sampler, staticsample = True)
Initializes the Monte Carlo sampler to use the supplied generator of samples' features and log probabilities. This is an alternative to defining a sampler in terms of a (fixed size) feature matrix sampleF and accompanying vector samplelogprobs of log probabilities.
Calling sampler.next() should generate tuples (F, lp), where F is an (m x n) matrix of features of the n sample points x_1,...,x_n, and lp is an array of length n containing the (natural) log probability density (pdf or pmf) of each point under the auxiliary sampling distribution.
The output of sampler.next() can optionally be a 3-tuple (F, lp, sample) instead of a 2-tuple (F, lp). In this case the value 'sample' is stored as the attribute self.sample. This is useful for inspecting the output and understanding the model characteristics.
If matrixtrials > 1 and staticsample = True (which is useful for estimating the variance between the different feature estimates), sampler.next() will be called once for each trial (0, ..., matrixtrials - 1) for each iteration. This allows using a set of feature matrices, each of which stays constant over all iterations.
We now insist that sampleFgen.next() return the entire sample feature matrix to be used in each iteration, to avoid the overhead of extra function calls and memory copying (and extra code).
An alternative was to supply a list of samplers, sampler=[sampler0, sampler1, ..., sampler_{m-1}, samplerZ], one for each feature and one for estimating the normalization constant Z. But this code was unmaintained, and has now been removed (but it's in Ed's CVS repository :).
Example use:

    >>> import spmatrix
    >>> from scipy.maxentropy import bigmodel
    >>> model = bigmodel()
    >>> def sampler():
    ...     n = 0
    ...     while True:
    ...         f = spmatrix.ll_mat(1,3)
    ...         f[0,0] = n+1; f[0,1] = n+1; f[0,2] = n+1
    ...         yield f, 1.0
    ...         n += 1
    ...
    >>> model.setsampleFgen(sampler())
    >>> type(model.sampleFgen)
    <type 'generator'>
    >>> [model.sampleF[0,i] for i in range(3)]
    [1.0, 1.0, 1.0]
matrixtrials is now set as a class property rather than passed as an argument to this function, because an argument could be overwritten (perhaps accidentally, by its default value) when this function is called again (e.g. to change the matrix size).
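Here is a sketch of a generator satisfying the sampler protocol described for setsampleFgen(), for a hypothetical auxiliary distribution that is uniform on the integers 0, ..., 9, with two hypothetical features f_0(x) = x and f_1(x) = x**2. It yields the optional 3-tuple (F, lp, sample). In Python 3 the documented call sampler.next() is written next(sampler); .next() is the Python 2 spelling.

```python
import numpy as np


def uniform_int_sampler(n, low=0, high=10, seed=None):
    """Hypothetical sampler: yields (F, lp, sample) tuples forever.

    The auxiliary distribution is uniform on {low, ..., high - 1}.
    F is (m x n) with m = 2 features, f_0(x) = x and f_1(x) = x**2.
    lp[j] is the natural-log pmf of sample[j] under that distribution.
    """
    rng = np.random.default_rng(seed)
    log_pmf = -np.log(high - low)
    while True:
        sample = rng.integers(low, high, size=n)
        F = np.vstack([sample, sample ** 2]).astype(float)
        lp = np.full(n, log_pmf)
        yield F, lp, sample
```

Each call to next() draws a fresh matrix, which is what a non-static sample (staticsample = False) or several matrix trials would consume.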
- settestsamples(self, F_list, logprob_list, testevery = 1, priorlogprob_list = None)
Requests that the model be tested every 'testevery' iterations during fitting using the provided list F_list of feature matrices, each representing a sample {x_j} from an auxiliary distribution q, together with the corresponding log probability mass or density values log {q(x_j)} in logprob_list. This is useful as an external check on the fitting process with sample path optimization, which could otherwise reflect the vagaries of the single sample being used for optimization rather than the population as a whole.
If self.testevery > 1, the test is performed only once every self.testevery calls.
If priorlogprob_list is not None, it should be a list of arrays of log(p_0(x_j)) values, j = 0, ..., n - 1, specifying the prior distribution p_0 for the sample points x_j for each of the test samples.
- stochapprox(self, K)
Tries to fit the model to the feature expectations K using stochastic approximation, with the Robbins-Monro stochastic approximation algorithm:

    theta_{k+1} = theta_k - a_k g_k - a_k e_k

where g_k is the gradient vector (= feature expectations E - K) evaluated at the point theta_k, a_k is the sequence a_k = a_0 / k, where a_0 is a step size parameter defined as self.a_0 in the model, and e_k is an unknown error term representing the uncertainty of the estimate of g_k. Each step moves against the gradient, which drives E toward K. We assume e_k is well-behaved enough for the algorithm to converge.
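To illustrate the Robbins-Monro iteration, the sketch below fits a hypothetical one-feature toy model, x in {0, 1} with f(x) = x, so that E_theta f(X) = 1 / (1 + exp(-theta)). Rather than importance sampling, it draws n exact samples per step, so the gradient estimate has the form g_k + e_k with zero-mean noise e_k. The values a_0 = 4, n = 100 and 2000 iterations are illustrative choices, not defaults of the module.

```python
import numpy as np


def stochapprox_bernoulli(K, a_0=4.0, iters=2000, n=100, seed=0):
    """Robbins-Monro fit of a hypothetical one-feature Bernoulli model.

    x in {0, 1}, f(x) = x, so E_theta f(X) = sigmoid(theta).
    Each step estimates E from n exact draws, giving a noisy gradient.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    for k in range(1, iters + 1):
        p1 = 1.0 / (1.0 + np.exp(-theta))   # model probability of x = 1
        E_hat = rng.binomial(n, p1) / n     # noisy estimate of E_theta f(X)
        g_hat = E_hat - K                   # estimate of g_k = E - K (plus e_k)
        a_k = a_0 / k                       # step sizes a_k = a_0 / k
        theta -= a_k * g_hat                # step against the gradient
    return theta
```

With K = 0.7 the iterate settles near log(0.7 / 0.3) ≈ 0.847, the parameter whose model expectation equals K.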
- test(self)
Estimates the dual and gradient on the external samples, keeping track of the parameters that yield the minimum such dual. The vector of desired (target) feature expectations is stored as self.K.
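A sketch of the bookkeeping behind settestsamples() and test(), under stated assumptions: the dual is taken to be L(theta) = log Z(theta) - theta.K, whose gradient E - K matches the gradient used by stochapprox(); each test sample's dual and gradient are estimated by importance sampling and then averaged over the test samples; and the optional prior enters as log p_0(x_j) added to theta.f(x_j). The class ExternalSampleTester is hypothetical and is not the module's implementation.

```python
import numpy as np


def _logsumexp(a):
    """Numerically stable log(sum(exp(a))) for a 1-d array."""
    a_max = np.max(a)
    return a_max + np.log(np.sum(np.exp(a - a_max)))


class ExternalSampleTester:
    """Hypothetical helper: tests theta on fixed external samples."""

    def __init__(self, F_list, logprob_list, K, testevery=1,
                 priorlogprob_list=None):
        self.F_list = F_list                  # (m x n_i) feature matrices
        self.logprob_list = logprob_list      # log q(x_j) for each sample
        self.priorlogprob_list = priorlogprob_list
        self.K = np.asarray(K, dtype=float)   # target feature expectations
        self.testevery = testevery
        self.calls = 0
        self.best_dual = np.inf
        self.best_theta = None

    def test(self, theta):
        """Return (dual, gradient) averaged over test samples, or None if skipped."""
        self.calls += 1
        if self.calls % self.testevery != 0:
            return None                       # only test every testevery calls
        duals, grads = [], []
        for i, (F, lp) in enumerate(zip(self.F_list, self.logprob_list)):
            log_w = theta @ F - lp            # log p_dot(x_j) - log q(x_j)
            if self.priorlogprob_list is not None:
                log_w = log_w + self.priorlogprob_list[i]
            lse = _logsumexp(log_w)
            logZ = lse - np.log(F.shape[1])   # log of the mean weight
            w = np.exp(log_w - lse)           # normalized weights
            duals.append(logZ - theta @ self.K)
            grads.append(F @ w - self.K)
        dual = float(np.mean(duals))
        if dual < self.best_dual:             # keep the minimizing parameters
            self.best_dual = dual
            self.best_theta = np.array(theta, copy=True)
        return dual, np.mean(grads, axis=0)
```

Because the test samples stay fixed, successive duals are directly comparable, which is what makes tracking the minimizing parameters meaningful.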
