Dynamic Structural Econometrics - Unobserved heterogeneity and CCP estimation

Prepared by Aaron Barkley, University of Melbourne
Table of Contents

The Expectation-Maximization (EM) Algorithm

We start with an overview of the Expectation-Maximization algorithm. Formally described by Dempster et al. (1977), the EM algorithm provides a computationally convenient way to obtain maximum likelihood estimates in "incomplete data" environments. For our purposes, we can think of settings in which each observation in the data is generated by one of two different distributions. The econometrician does not observe which distribution generated each data point and only observes the overall (mixed) distribution. The goal is to recover the distribution associated with each type as well as the distribution of the latent variable that governs the mixture over types.
As an example, suppose we have the following data:
rng(2023);
nobs=1000;
p=rand([nobs,1]);
Y=(p>0.6).*poissrnd(2,[nobs,1]) + (p<=0.6).*poissrnd(8,[nobs,1]);
histogram(Y,'Normalization',"probability")
The above figure shows the empirical distribution of 1000 observations of an integer-valued random variable Y. Suppose we would like to fit a parametric model to this data. The main feature that complicates this exercise is that the data is bimodal with a heavy right tail, and few (if any) standard distributions for discrete random variables will accurately capture this feature. To test this, let's fit two common parametric distributions, Poisson and Negative Binomial, and see how they do.
poiss_hat = poissfit(Y);
nbin_hat = nbinfit(Y);
x=0:16;
histogram(Y,'Normalization','pdf','FaceColor',[0.4,0.4,0.4]);
hold on
plot(x,poisspdf(x,poiss_hat));
plot(x,nbinpdf(x,nbin_hat(1),nbin_hat(2)),'--');
hold off;
legend('Empirical distribution','Poisson estimates','Neg. Bin. estimates','location','northeast')
As we can see from the figure above, the results are not encouraging. Simple parametric models may not suffice to capture the qualitative features of the empirical distribution. While we could consider nonparametric approaches to estimating the distribution, these have demanding data requirements. We also might have other reasons to believe that the distribution is generated by a finite number of different "types" of random draws that are mixed together in the overall distribution of Y.
A common way to estimate these distributions is to consider a mixture model. That is, we will assume that each draw of Y is obtained from distribution $F_1$ with probability $\pi_1$ and from distribution $F_2$ with probability $\pi_2$. The mixing parameters $(\pi_1, \pi_2)$ govern the distribution over types, while the distributions $F_1$ and $F_2$ describe the distribution of outcomes within each type. Typically, we will parameterize the distributions $F_1$ and $F_2$ with parameters $\theta_1$ and $\theta_2$. Noting the restriction that $\pi_1 + \pi_2 = 1$, we have three parameters we need to estimate: $\pi_1$, $\theta_1$, and $\theta_2$.
The log-likelihood for observation $i$ can be expressed as
$$\ell_i(\theta_1, \theta_2, \pi_1) = \log\big( \pi_1 f(y_i \mid \theta_1) + (1 - \pi_1) f(y_i \mid \theta_2) \big).$$
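As a point of reference for the discussion below, here is a minimal sketch of how this mixture log-likelihood could be handed directly to a generic optimizer. The handle name mixture_negloglik and the exponential/logistic reparameterizations (used only to keep the optimizer in the valid parameter region) are illustrative choices, not part of the EM routine developed in this note:
% Direct (non-EM) maximum likelihood for the two-component Poisson mixture.
% theta_j = exp(params(j)) and pi_1 = 1/(1+exp(-params(3))) keep the parameters valid.
mixture_negloglik = @(params) -sum(log( ...
    1./(1+exp(-params(3))) .* poisspdf(Y, exp(params(1))) + ...
    (1 - 1./(1+exp(-params(3)))) .* poisspdf(Y, exp(params(2))) ));
params_direct = fminsearch(mixture_negloglik, [0, 1, 0]); % results can be sensitive to starting values
theta_direct = exp(params_direct(1:2)); pi_direct = 1/(1+exp(-params_direct(3)));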
Practical applications have found that direct maximum likelihood approaches to estimating these parameters perform poorly: convergence may be slow, there can be many local optima, and the estimates may be extremely sensitive to starting values. The alternative approach of Dempster et al. (1977) is to update the parameters $\pi$ and $\theta$ in separate stages. The main features of the estimator can be described in four steps:
  1. Fix initial values for $\pi_1$, $\theta_1$, and $\theta_2$, and set convergence criteria.
  2. Given the current values for $(\theta_1, \theta_2, \pi_1)$, compute the posterior type probabilities, i.e., calculate $\Pr(z_i = j \mid y_i, \theta, \pi)$ for $j = 1, 2$, where $z_i$ denotes the latent type of observation $i$. This yields an updated value for $\pi$ and pointwise estimates of the probability that each observation $i$ is of type $j$.
  3. Hold fixed the type probabilities for each $i$ computed in step 2 and maximize the likelihood with respect to $(\theta_1, \theta_2)$ given these probabilities. This yields updated values for $\theta_1$ and $\theta_2$.
  4. Repeat steps 2-3 until convergence (per-iteration change in the overall likelihood is "small").
We will implement this algorithm on the data set above using the assumptions that (i) there are two components in the mixture (i.e., observations are each associated with one of two unobserved types) and (ii) the two types each follow a Poisson distribution with parameter $\theta_j$, $j = 1, 2$.
% Step 1 (a): initial values for theta and lambda
obs=size(Y,1);
theta_0=[1,2];
pi_0=[0.5,0.5];
 
% Step 1 (b): specify convergence criteria
iter=1;
maxiter=1000;
lik0=10000;
normdiff=10000;
tolerance=1e-6;
while normdiff>tolerance && iter<=maxiter
 

Expectation step.

The expectation step takes the current values of $\theta$ and $\pi$ as fixed and calculates the probability of observation $i$ being of type 1 ($z_i = 1$) or type 2 ($z_i = 2$). We can write the density function of Y conditional on $\theta$, $\pi$, and the (unobserved) $z_i$ as
$$f(y_i \mid z_i = j, \theta, \pi) = \frac{\theta_j^{y_i} e^{-\theta_j}}{y_i!}.$$
Similarly, the distribution of $z_i$ conditional on $\theta$ and $\pi$ is
$$\Pr(z_i = j \mid \theta, \pi) = \pi_j,$$
where $\pi_1 + \pi_2 = 1$. These equations together with Bayes' theorem imply that the conditional probability that observation $i$ is of type $j$ is the likelihood of mixture component $j$ weighted by the current estimate of the mixing probability $\pi_j$:
$$\Pr(z_i = j \mid y_i, \theta, \pi) = \frac{\pi_j f(y_i \mid \theta_j)}{\sum_{k=1}^{2} \pi_k f(y_i \mid \theta_k)}.$$
% Expectation:
EZ=zeros(obs,length(pi_0));
temp_pdf=poisspdf(repmat(Y,1,length(theta_0)),repmat(theta_0,obs,1));
for j=1:length(pi_0)
EZ(:,j)=(pi_0(j)*temp_pdf(:,j))./(temp_pdf*pi_0');
end
 
In the code above, the "temp_pdf" variable creates an $N \times 2$ matrix of likelihood values, where the first column contains the likelihoods evaluated at $\theta_1$ and the second column contains the likelihoods evaluated at $\theta_2$.
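As a quick aside, one can confirm that the resulting entries of EZ are well-formed conditional probabilities by checking that each row sums to one. This is purely an illustrative sanity check and can be omitted:
% Optional sanity check: each row of EZ is a distribution over the two types
assert(max(abs(sum(EZ,2) - 1)) < 1e-12);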

Maximization step.

The maximization step performs a standard maximum likelihood procedure in which the probabilities of being in each group are treated as known and fixed.
Because the expected complete-data log-likelihood, $\sum_i \sum_j \hat{q}_{ij} \big[ \log f(y_i \mid \theta_j) + \log \pi_j \big]$ with $\hat{q}_{ij} = \Pr(z_i = j \mid y_i, \theta, \pi)$, is additively separable in $\theta$ and $\pi$, the maximization step can be done in two parts. Updating $\pi$ is simple: maximizing with respect to $\pi$ requires only taking the average of the type probabilities,
$$\hat{\pi}_j = \frac{1}{N} \sum_{i=1}^{N} \hat{q}_{ij}.$$
That is, we simply take the mean of each column of the "EZ" matrix we computed above.
Updating θ requires us to numerically optimize the first component of the log-likelihood function above.
Since Matlab has a built-in function for the pmf of a Poisson distribution, this likelihood function can be defined and optimized with two lines of code.
% Maximization
theta_fun= @(theta) (-1)*sum(sum(log(poisspdf(repmat(Y,1,length(theta_0)),repmat(theta,obs,1))).*EZ));
[theta_1,~]=fminsearch(theta_fun,theta_0);
pi_1=(1/obs)*sum(EZ);
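As an aside, for the Poisson case the weighted maximum likelihood problem for $\theta$ has a closed-form solution: the updated $\theta_j$ is the EZ-weighted mean of Y. We use fminsearch above because the numerical approach carries over to mixture components without closed-form updates, but the following one-liner (illustrative only, and not used elsewhere in this note) gives essentially the same answer:
% Closed-form Poisson M-step: theta_j is the EZ-weighted sample mean of Y
theta_closed = (EZ'*Y)' ./ sum(EZ);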
 
The last step in each iteration is to calculate the overall likelihood and check for convergence. If the squared change in the likelihood between iterations, $(\ell^{(k-1)} - \ell^{(k)})^2$, is less than the convergence tolerance, we stop. Otherwise, we proceed to the next iteration.
% Update likelihood:
lik1=sum(sum(log(poisspdf(repmat(Y,1,length(theta_1)),repmat(theta_1,obs,1))).*EZ)) + sum(sum(log(pi_1).*EZ));
normdiff=(lik0-lik1)^2;
 
% Update for next iteration
lik0=lik1;
theta_0=theta_1;
pi_0=pi_1;
iter=iter+1;
end
 
ThetaHat=theta_1;
PiHat=pi_1;
 
With the estimates obtained, let's see how we did:
x=0:20;
Y_fit=PiHat(1)*poisspdf(x,ThetaHat(1))+PiHat(2)*poisspdf(x,ThetaHat(2));
plot(x,Y_fit)
hold on
histogram(Y,'Normalization','probability','FaceColor',[0.4,0.4,0.4]);
hold off
legend('Estimated mixture distribution','Empirical distribution','location','northeast')
This looks much better: the estimated distribution matches the shape of the empirical distribution much more closely. Since we know the true parameters used to generate the data, we can also compare these with the estimated parameters:
FiniteMixtureEstimatesTable = table([ThetaHat'; PiHat'],[2; 8; 0.4; 0.6],'VariableNames',{'Estimates','True values'},'RowNames',{'Theta_1','Theta_2','Pi_1','Pi_2'})
FiniteMixtureEstimatesTable = 4×2 table

               Estimates    True values
               _________    ___________
    Theta_1     1.9176           2
    Theta_2     7.9876           8
    Pi_1        0.41922         0.4
    Pi_2        0.58078         0.6
The estimator also converges quickly, requiring fewer than 30 iterations.
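One simple way to report this (an illustrative addition; it assumes the loop counter iter from the code above is still in scope) is to print the iteration counter after the loop exits:
% Report the number of completed EM iterations (iter is incremented once per pass)
fprintf('EM converged after %d iterations.\n', iter-1);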

Using the EM algorithm in a dynamic discrete choice model

Now we apply the EM algorithm to a dynamic discrete choice model. The basic features of the algorithm do not change: at each iteration, we compute the probability that each individual (or bus) is of type s given the most recent estimates of θ and π, and then use these probabilities to obtain new estimates of θ and π.

Model description and setup

We continue to use the model of Harold Zurcher's bus engine replacement problem introduced in Rust (1987) as the framework for estimation. To briefly reiterate the model, noting that we now assume the bus type s is unobservable, the per-period payoffs to replacing the engine ($d = 1$) and to continuing with the current engine ($d = 2$) for a bus with mileage $x_t$ and type $s$ are
$$u_1(x_t, s) + \epsilon_{1t} = 0 + \epsilon_{1t}, \qquad u_2(x_t, s) + \epsilon_{2t} = \theta_0 + \theta_1 x_t + \theta_2 s + \epsilon_{2t},$$
where replacement resets mileage to zero, mileage otherwise evolves according to the transition density $f(x_{t+1} \mid x_t)$, and future payoffs are discounted by $\beta$. The shocks $\epsilon_{jt}$ are i.i.d. Type I EV, so that $E[\epsilon_{jt}] = \gamma$, where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant.
clear
rng(57);
% States (mileage)
x_min = 0.0 ; x_max = 10.0;
delta_x = 0.5;
x_len = (x_max-x_min)/delta_x+1;
x = linspace(x_min, x_max, x_len);
% State transitions
x_tday = repmat(x, [x_len,1]);
x_next = x_tday';
f = (x_next >= x_tday)' .* exp(- (x_next - x_tday)') .* (1 - exp(-delta_x));
f(:,end) = 1 - sum(f(:,1:(end-1)), 2); % The last column is set so that the sum of probs is 1
F = cumsum(f(:,:), 2);
 
F2=f;
F1=zeros(size(f));
F1(:,1)=1;
emc = double(eulergamma);
 
% Define parameters and simulate data
beta = 0.9;
theta = [2.00, -0.15, 1.0];
T=40;
S=1:2;
X=x;
N=1000;
pi=0.4;
[data_t, data_x, data_s, data_d, data_x_index] = generate_data(N, T, S, f, F, X,beta, pi, theta);
 
% Optional: save data
% Data_vars={'data_t', 'data_x', 'data_s', 'data_d', 'data_x_index'};
% Const_vars={'beta','theta','S','X','x_len','pi','T','x','N','f','F','F1','F2','emc'};
% All_vars=[Data_vars, Const_vars];
% save("Bus_engine_unobs_het_data.mat",All_vars{:});

Estimation

We begin by setting initial values for the parameters θ and π, and use the initial value for π (through the implied per-bus type probabilities) to initialize the conditional choice probabilities $\hat{p}_1(x, s)$.
The convergence criterion is a tolerance of 1e-6 on the relative change in the likelihood between values 25 iterations apart (checked only after the first 50 iterations). Finally, we set the maximum number of iterations to 1000 and include a check (and message) in case we reach this limit before converging.
% Optional: load data
% load("Bus_engine_unobs_het_data.mat");
iter=1;
max_iter=1000;
tol=1e-6;
exit_flag=0;
stored_lik_vals=zeros(max_iter,1);
 
theta_0=0.1*ones(3,1);
pi_hat=0.5;
q_new=[pi_hat*ones(N,1), (1-pi_hat)*ones(N,1)];
ccp_hat = zeros(length(X),length(S));
epsilon=1e-4;
for j=1:length(X)
for s=1:length(S)
ccp_hat(j,s) = sum(repmat(q_new(:,s),[T,1]).*(data_x_index(:)==j).*(data_d(:)==1))/sum(repmat(q_new(:,s),[T,1]).*(data_x_index(:)==j));
end
end
ccp_hat=max(epsilon, min(1-epsilon,ccp_hat));
 
To illustrate the method as it applies to the bus engine replacement case, we go through one iteration first before putting each term inside the loop that completes the estimation.

Expectation step.

With initial values for the parameters established, we first need to calculate the likelihood for each bus n, time period t, and bus type s. Recall that in the finite Poisson mixture example above, we updated the guess for the probability that observation $i$ was of type $j$ by using the π-weighted likelihoods of observing outcome $y_i$ conditional on being type $j$ with type-specific parameter $\theta_j$. With the dynamic bus engine replacement problem, we proceed similarly. However, we have to remember that the relevant likelihood for establishing whether bus n is of type s is the full sequence of replacement decisions made for that bus. Therefore, we calculate the individual likelihood components for each period, bus, and type, and then take the product across time periods t to compute the overall likelihood associated with each bus type s.
Each individual likelihood term can be calculated as
$$\ell_{nt}(s) = \frac{\mathbf{1}\{d_{nt} = 2\}\,\exp\big(v_2(x_{nt}, s) - v_1(x_{nt}, s)\big) + \mathbf{1}\{d_{nt} = 1\}}{1 + \exp\big(v_2(x_{nt}, s) - v_1(x_{nt}, s)\big)},$$
where we exploit the one-period finite dependence property inherent to the renewal of the bus engine to express the difference in conditional value functions in terms of the one-period-ahead conditional choice probabilities:
$$v_1(x, s) = \beta\big[\gamma - \log p_1(0, s)\big], \qquad v_2(x, s) = \theta_0 + \theta_1 x + \theta_2 s + \beta \sum_{x'} f(x' \mid x)\big[\gamma - \log p_1(x', s)\big],$$
where $p_1(x, s)$ is the conditional choice probability of replacement in state $(x, s)$.
u2 = theta_0(1) + theta_0(2)*X' + theta_0(3)*S;
v1_ccp = repmat(beta*(-log(ccp_hat(1,:))) + beta*emc,[length(X),1]);
v2_ccp = u2 + beta*F2(:,:)*(0 - log(ccp_hat(:,:))) + beta*emc;
 
% Expectation step:
likelihood_pointwise_1=((data_d(:)==2).*exp((v2_ccp(data_x_index(:),1)-v1_ccp(data_x_index(:),1)))+(1-(data_d(:)==2)))./(1+exp(v2_ccp(data_x_index(:),1)-v1_ccp(data_x_index(:),1)));
likelihood_pointwise_2=((data_d(:)==2).*exp((v2_ccp(data_x_index(:),2)-v1_ccp(data_x_index(:),2)))+(1-(data_d(:)==2)))./(1+exp(v2_ccp(data_x_index(:),2)-v1_ccp(data_x_index(:),2)));
 
The likelihood associated with each bus n conditional on engine type s is the product over time of the likelihoods of the observed decisions conditional on the states $x_{nt}$, engine type s, conditional choice probabilities $\hat{p}_1$, and utility parameters θ: $L_n(s) = \prod_{t=1}^{T} \ell_{nt}(s)$. This term plays the role of the observation-level likelihood from our simple EM algorithm example above. As before, we weight the engine-type-specific likelihoods by the mixing parameter π, yielding an overall per-bus estimate of the probability of being engine type s of
$$q_{ns} = \frac{\pi_s L_n(s)}{\sum_{s'} \pi_{s'} L_n(s')}.$$

ll_pw1=reshape(likelihood_pointwise_1,[N,T]);
ll_pw2=reshape(likelihood_pointwise_2,[N,T]);
likelihood_by_n=[prod(ll_pw1,2), prod(ll_pw2,2)];
 
q_new(:,1) = pi_hat*likelihood_by_n(:,1)./(pi_hat*likelihood_by_n(:,1) + (1-pi_hat)*likelihood_by_n(:,2));
q_new(:,2) = 1-q_new(:,1);
 

Maximization step.

Having established the expectation that each bus n is of type s in the previous step, we now perform the maximization step of the algorithm. As before, updating the mixing parameter π is simple: we take the mean of the bus-level probabilities of being of type s,
$$\hat{\pi} = \frac{1}{N} \sum_{n=1}^{N} q_{n1}.$$
To update the utility parameters θ, we first need to update the conditional choice probabilities, since we are using CCP methods to calculate the likelihood terms. We update the conditional choice probabilities using the bin estimator, weighting each observation by the per-bus probability of being of type s:
$$\hat{p}_1(x, s) = \frac{\sum_{n} \sum_{t} q_{ns}\, \mathbf{1}\{x_{nt} = x\}\, \mathbf{1}\{d_{nt} = 1\}}{\sum_{n} \sum_{t} q_{ns}\, \mathbf{1}\{x_{nt} = x\}}.$$
The last maximization step is to update the utility parameters θ. Here we maximize the likelihood, again weighting each likelihood component by the probability that bus n is of type s using the estimates $q_{ns}$ obtained in the previous step:
$$\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \sum_{s} q_{ns} \sum_{t=1}^{T} \log \ell_{nt}(s; \theta).$$
% Maximization step:
pi_hat = mean(q_new(:,1));
ccp_new = zeros(length(X),length(S));
epsilon=1e-4;
for j=1:length(X)
for s=1:length(S)
ccp_new(j,s) = sum(repmat(q_new(:,s),[T,1]).*(data_x_index(:)==j).*(data_d(:)==1))/sum(repmat(q_new(:,s),[T,1]).*(data_x_index(:)==j));
end
end
never_reached=isnan(ccp_new(:)); % keep track of states that never occur in the data
ccp_new=max(epsilon, min(1-epsilon,ccp_new)); % Makes sure that all probabilities are in the open interval (0,1), as "1" and "0" observations cause problems.
lik_fun = @(theta) ccp_likelihood_inner(theta,X,S,f,data_x_index,q_new,data_d,ccp_new,beta,T);
 
[updated_theta,fval] = fminunc(lik_fun,theta_0,optimoptions('fminunc','Display','none'));
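The never_reached indicator computed above is not otherwise used. One optional diagnostic (an illustrative addition, not required for the estimator) is to flag (mileage, type) cells that receive no weight in the data, since their CCPs are then pinned at the epsilon bounds rather than estimated:
% Optional diagnostic: report (mileage, type) cells never visited in the data
if any(never_reached)
    fprintf('%d of %d (mileage, type) cells are never reached in the data.\n', ...
        sum(never_reached), numel(never_reached));
end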
 
Finally, we include everything in a while loop so as to iterate until convergence:
 
tic;
cond=0;
while cond==0 && iter<max_iter
u2 = theta_0(1) + theta_0(2)*X' + theta_0(3)*S;
v1_ccp = repmat(beta*(-log(ccp_hat(1,:))) + beta*emc,[length(X),1]);
v2_ccp = u2 + beta*F2(:,:)*(0 - log(ccp_hat(:,:))) + beta*emc;
% STEP 1: Expectation
likelihood_pointwise_1=((data_d(:)==2).*exp((v2_ccp(data_x_index(:),1)-v1_ccp(data_x_index(:),1)))+(1-(data_d(:)==2)))./(1+exp(v2_ccp(data_x_index(:),1)-v1_ccp(data_x_index(:),1)));
likelihood_pointwise_2=((data_d(:)==2).*exp((v2_ccp(data_x_index(:),2)-v1_ccp(data_x_index(:),2)))+(1-(data_d(:)==2)))./(1+exp(v2_ccp(data_x_index(:),2)-v1_ccp(data_x_index(:),2)));
 
ll_pw1=reshape(likelihood_pointwise_1,[N,T]);
ll_pw2=reshape(likelihood_pointwise_2,[N,T]);
likelihood_by_n=[prod(ll_pw1,2), prod(ll_pw2,2)];
 
q_new(:,1) = pi_hat*likelihood_by_n(:,1)./(pi_hat*likelihood_by_n(:,1) + (1-pi_hat)*likelihood_by_n(:,2));
q_new(:,2)=1-q_new(:,1);
 
% STEP 2: Maximization
pi_hat = mean(q_new(:,1));
ccp_new = zeros(length(X),length(S));
epsilon=1e-4;
for j=1:length(X)
for s=1:length(S)
ccp_new(j,s) = sum(repmat(q_new(:,s),[T,1]).*(data_x_index(:)==j).*(data_d(:)==1))/sum(repmat(q_new(:,s),[T,1]).*(data_x_index(:)==j));
end
end
never_reached=isnan(ccp_new(:)); % keep track of states that never occur in the data
ccp_new=max(epsilon, min(1-epsilon,ccp_new)); % Makes sure that all probabilities are in the open interval (0,1), as "1" and "0" observations cause problems.
lik_fun = @(theta) ccp_likelihood_inner(theta,X,S,f,data_x_index,q_new,data_d,ccp_new,beta,T);
 
[updated_theta,fval] = fminunc(lik_fun,theta_0,optimoptions('fminunc','Display','none'));
 
% STEP 3: Re-calculate pointwise likelihood with updated thetas, pi
% hat, and ccps, and check convergence
u2 = updated_theta(1) + updated_theta(2)*X' + updated_theta(3)*S;
v1_ccp = repmat(beta*(-log(ccp_new(1,:))) + beta*emc,[length(X),1]);
v2_ccp = u2 + beta*F2(:,:)*(0 - log(ccp_new(:,:))) + beta*emc;
 
likelihood_pointwise_1=((data_d(:)==2).*exp((v2_ccp(data_x_index(:),1)-v1_ccp(data_x_index(:),1)))+(1-(data_d(:)==2)))./(1+exp(v2_ccp(data_x_index(:),1)-v1_ccp(data_x_index(:),1)));
likelihood_pointwise_2=((data_d(:)==2).*exp((v2_ccp(data_x_index(:),2)-v1_ccp(data_x_index(:),2)))+(1-(data_d(:)==2)))./(1+exp(v2_ccp(data_x_index(:),2)-v1_ccp(data_x_index(:),2)));
 
temp_val=sum(q_new(:,1).*log(prod(reshape(likelihood_pointwise_1,[N,T]),2)) + q_new(:,2).*log(prod(reshape(likelihood_pointwise_2,[N,T]),2))) + sum(log(pi_hat)*q_new(:,1) + log(1-pi_hat)*q_new(:,2));
stored_lik_vals(iter)=temp_val;
if iter>50
if abs((stored_lik_vals(iter) - stored_lik_vals(iter-25))./stored_lik_vals(iter))<tol
cond=1;
end
end
% STEP 4: Update values and continue to next iteration if cond==0
iter=iter+1;
theta_0=updated_theta;
ccp_hat=ccp_new;
if iter >= max_iter
exit_flag=1;
disp('Maximum number of iterations reached')
end
end
toc
Elapsed time is 75.517123 seconds.
 
 
UnobservedHetCCP_EstimatesTable = table([updated_theta; pi_hat],[theta'; pi],'VariableNames',{'CCP estimates','True values'},'RowNames',{'Theta_0','Theta_1','Theta_2','Pi'})
UnobservedHetCCP_EstimatesTable = 4×2 table

               CCP estimates    True values
               _____________    ___________
    Theta_0        1.9939             2
    Theta_1      -0.14786         -0.15
    Theta_2        1.0279             1
    Pi            0.41936           0.4
Important disclaimer: in the data generation function, we have assigned exogenous mileage states to each bus in the initial period, $t = 1$. This is unlikely to hold in general. Suppose all buses had started with zero mileage in period 0 but we, the econometricians, only observed the buses starting in period 10. Because mileage accumulation and replacement decisions are correlated with the unobserved type s, we would have an initial conditions problem: the first state in which we observe each bus, $x_{n1}$, will be correlated with the unobserved type, and we will need to control for this in order to obtain consistent estimates. There are several ways to account for this; see the supplementary material for Arcidiacono and Miller (2011) for one such implementation.

Functions

function [ccp_lik] = ccp_likelihood_inner(theta,X,S,x_tran,data_x_index,q,data_d,ccp_hat,beta,T)
% Weighted CCP (negative) log-likelihood for the two-type bus engine replacement model.
% q(:,1) holds the per-bus probability of being type 1; the columns of ccp_hat are
% replacement probabilities by mileage state for each bus type.
emc = double(eulergamma);
% Flow utility of continuing and CCP-based conditional value functions
u2 = theta(1) + theta(2)*X' + theta(3)*S;
v1_ccp = repmat(beta*(-log(ccp_hat(1,:))) + beta*emc,[length(X),1]);
v2_ccp = u2 + beta*x_tran(:,:)*(0 - log(ccp_hat(:,:))) + beta*emc;
% Expand the per-bus type-1 probabilities to one weight per (bus, period) observation
q_use=repmat(q(:,1),[1,T]);
q_use(q_use==2)=0; % no effect when q contains probabilities; retained as a safeguard
% Type-weighted logit log-likelihood contributions, summed over observations
ccp_lik_1 = -sum(q_use(:).*((data_d(:)==2).*(v2_ccp(data_x_index,1)-v1_ccp(data_x_index,1)) - log(1+exp(v2_ccp(data_x_index,1)-v1_ccp(data_x_index,1)))));
ccp_lik_2 = -sum((1-q_use(:)).*((data_d(:)==2).*(v2_ccp(data_x_index,2)-v1_ccp(data_x_index,2)) - log(1+exp(v2_ccp(data_x_index,2)-v1_ccp(data_x_index,2)))));
ccp_lik = ccp_lik_1+ccp_lik_2;
 
end
 
The value function iteration and data generation functions below are used only to generate the data, not to estimate the model. We have modified the data-generating function slightly compared with previous lecture notes so that it assigns exogenous mileage states x to each bus in the first observed period, rather than letting the model run for a number of periods before sampling mileage observations. This allows us to ignore the initial conditions problem when estimating the distribution of s (see the disclaimer above).
function [value_function,value_diff,ccps] = value_function_iteration(S,x_tran,X, beta,theta)
% Solve the dynamic programming problem by value function iteration, separately for
% each bus type s. Type I EV shocks give the closed-form log-sum (Emax) update and
% logit conditional choice probabilities.
x_len = length(unique(X));
emc = double(eulergamma);
value_function_2 = zeros(x_len,length(S));
ccps=zeros(x_len,length(S));
value_diff=1;
max_iter=1000;
iter=1;
while value_diff>1e-5 && iter < max_iter
value_function_1 = value_function_2;
for s=1:length(S)
v1 = repmat(beta*value_function_1(1,s),[x_len,1]);
v2=theta(1) + theta(2)*X' + theta(3)*s + beta*(x_tran(:,:)*value_function_1(:,s));
value_function_2(:,s) = log(exp(v1)+exp(v2)) + emc;
ccps(:,s)=1./(1+exp(v2-v1));
end
iter=iter+1;
value_diff = max(max((value_function_2 - value_function_1).^2));
end
value_function = value_function_2;
end
 
 
function [data_t, data_x, data_s, data_d, data_x_index] = generate_data(N, T, S, x_tran, x_tran_cumul, X, beta,pi,theta)
initial_run=0; % No burn-in periods: initial mileage states are drawn exogenously at random (see the note above)
x_index = ones(N, T+initial_run); x_index(:,1)=randi(length(X),[N,1]);
x_sim = zeros(N,T+initial_run); x_sim(:,1)=X(x_index(:,1));
s_sim = (rand(N,1) > pi) + 1;
d_sim = zeros(N, T+initial_run);
draw_ccp = rand(N, T+initial_run);
draw_x = rand(N, T+initial_run);
 
value_function = value_function_iteration(S,x_tran,X,beta, theta);
for n = 1:N
for t = 1:T+initial_run
% j=1 replace
u1 = 0.0;
v1 = u1 + beta*value_function(1,s_sim(n)); % Reset to zero mileage deterministically if engine is replaced.
% j=2 is continue
u2 = theta(1) + theta(2) * X(x_index(n,t)) + theta(3) * s_sim(n);
v2 = u2 + beta* x_tran(x_index(n,t),:)*value_function(:,s_sim(n));
% Conditional choice probability
p1 = exp(v1)/(exp(v1)+exp(v2));
 
% Simulate next period
d_sim(n,t) = (draw_ccp(n,t) > p1) + 1; % 1: replace, 2: continue
x_index(n,t+1) = (d_sim(n,t)==2) * sum((draw_x(n,t) > x_tran_cumul(x_index(n,t),:))) + 1;
x_sim(n,t+1) = X(x_index(n,t+1));
end
end
data_x = x_sim(:, initial_run+1:T+initial_run);
data_d = d_sim(:, initial_run+1:T+initial_run);
data_s = repmat(s_sim, [1,T]);
data_t = repmat(transpose(1:T), [1,N])';
[~, data_x_index] = ismember(data_x,X);
return
 
end

References

Arcidiacono, P., & Ellickson, P. B. (2011). Practical methods for estimation of dynamic discrete choice models. Annual Review of Economics, 3(1), 363-394.
Arcidiacono, P., & Miller, R. A. (2011). Conditional choice probability estimation of dynamic discrete choice models with unobserved heterogeneity. Econometrica, 79(6), 1823-1867.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55(5), 999-1033.