Dynamic Structural Econometrics - CCP estimation

Prepared by Aaron Barkley, University of Melbourne

Table of Contents

The random utility model Moving from static to dynamic Bus engine replacement Model Representation and identification Two ways to develop an estimator. Data CCP estimation Full-information maximum likelihood estimation Exercise: using the unrestricted utility estimates directly (Bonus: Method 3. Stationary representation of the ex-ante value function.) Functions Stationary CCP likelihood function Data generation Full-information maximum likelihood Value function iteration References

Disclaimer: This file has been written with an emphasis on clarity. In particular, this means that (i) the file is completely self-contained in the sense that all necessary functions are defined within the file (as opposed to be stored as separate .m files) and (ii) code has not been optimized for performance.

The random utility model

We begin by reviewing the random utility model introduced in McFadden (1974, 1981). This model features many of the components we will use in considering dynamic discrete choice models, including discrete choices (binary in our case), logistic regression, and so on.

Consider an individual decision maker deciding whether to make a binary decision. For example, suppose an individual is moving across the United States to take a new job. They are planning to drive their car to their new city and sell the car when they get there. However, the current mileage on the car indicates that the car might break down before completing the long trip to the new city. The question facing the individual is whether or not to sell/trade in their car before they leave and obtain a more reliable car.

Utility that is additively separable in observed state x and idiosyncratic utility term ϵ. Let denote the utility from selling the current car. We represent this utility as

Normalizing the deterministic component of utility for one choice to zero. In this case, we let for all x so that .
The idiosyncratic utility draws are iid from the same distribution: Type I Extreme Value. In particular, this means that follows a Logistic distribution, so that Pr, the familiar logistic probability.
Finally, we assume that the utility from selling the car is linear in the observed mileage: , where represents the monetary benefit to the individual from not having to obtain a new, more reliable car and is the expected cost of repairs if the car breaks down before reaching the destination.

Suppose we observe a set of decisions from N individuals about whether or not to take the trip with the current car. If

, the car was replaced for individual n, and if

, then the car was kept.

Taking the log and plugging in

and

gives the log-likelihood associated with observation n:

Maximizing the log-likelihood with respect to the parameters θ conditional on the observed decisions and car mileages

will generate the structural estimates. The code below simulates some sample data and estimates this model.

rng(2023);
theta=[5; -2]; % Utility parameters
Nobs=1000; % Number of observations
x=2.5+0.5*randn([Nobs,1]); % State variables
x=round(x,1);
[~,x_ind]=ismember(x,unique(x)); % Encoded states -- we'll explain this below
X=unique(x);
x_len=length(X);
ChoiceProb= exp([ones(Nobs,1), x]*theta)./(1+exp([ones(Nobs,1), x]*theta)); % Logit choice probabilities
p=rand([Nobs,1]);
d=(p<ChoiceProb); % Simulate choices with uniform random draws.
 
% Now estimate with the logit log-likelihood
lik_fun = @(theta) -sum((d==1).*([ones(Nobs,1), x]*theta) - log(1+exp([ones(Nobs,1), x]*theta)));
[theta_hat,~]=fminunc(lik_fun,[0;1],optimoptions('fminunc','Display','none'));
RUM_estimates = table(theta_hat,theta,'VariableNames',{'Logit estimates','True values'})
RUM_estimates = 2×2 table 
 Logit estimatesTrue values
14.90675
2-1.9544-2

	Logit estimates	True values
1	4.9067	5
2	-1.9544	-2

Moving from static to dynamic

Suppose that each individual was not making a one-off decision about replacing the car. For simplicity, suppose that they anticipate moving again within the next two years, and expect to have to make the same decision about whether or not to replace their car for something more reliable at that time. (For example, the individual is a recent economics PhD graduate who is taking a two year postdoc, and expects that at the end of the postdoc they will have to move again.) Each individual is rationally forward looking, and so will anticipate how obtaining a more reliable car now might help them with moving to a new city again in a few years.

Suppose we ignore this dynamic feature of the decision-making process when we estimate the model, and assume that the consumers are making one-off decisions as in the case above. What happens to our estimates? (Ignore the specific details about the data generation in the code below -- we will cover this later!)

 
emc=double(eulergamma);
beta=0.9;
ChoiceProb2=exp([ones(x_len,1), X]*theta)./(1+exp([ones(x_len,1), X]*theta));
test_ind_1=((1:x_len)+min(5,x_len-(1:x_len)))';
test_ind_2=repmat(5,[x_len,1]);
v2=[ones(x_len,1), X]*theta + beta*(emc-log(ChoiceProb2(test_ind_1)));
v1=beta*(emc-log(ChoiceProb2(test_ind_2)));
ChoiceProb1=exp(v2-v1)./(1+exp(v2-v1));
p1=p; 
p2=rand([Nobs,1]);
d1=(p1<ChoiceProb1(x_ind));
x2=5.*(1-d1) + d1.*test_ind_1(x_ind);
[~,x2_ind]=ismember(x2,(1:max(x_ind))');
d2=(p2<ChoiceProb2(x2_ind));
 
lik_fun = @(theta) -sum((d1==1).*([ones(Nobs,1), x]*theta) - log(1+exp([ones(Nobs,1), x]*theta)));
 
[theta_hat_static,~]=fminunc(lik_fun,[0;1],optimoptions('fminunc','Display','none'));
Static_estimates = table(theta_hat_static,theta,'VariableNames',{'Logit "static" estimates','True values'})
Static_estimates = 2×2 table 
 Logit "static" estimatesTrue values
12.90495
2-0.70613-2

	Logit "static" estimates	True values
1	2.9049	5
2	-0.70613	-2

As we can see from the above table, the estimates are now significantly biased. Assuming a static model when the model is dynamic and consumers are forward-looking will lead to incorrect inferences about preferences! If we fix this and account for dynamics, we get the correct estimates. (Again, ignore the implementation details -- this will get covered in the next section.)

 
lik_fun2= @(theta) -sum((d1==1).*([ones(Nobs,1), x]*theta + beta*(emc-log(ChoiceProb2(test_ind_1(x_ind))))-beta*(emc-log(ChoiceProb2(test_ind_2(x_ind))))) ...
    - log(1+exp([ones(Nobs,1), x]*theta + beta*(emc-log(ChoiceProb2(test_ind_1(x_ind))))-beta*(emc-log(ChoiceProb2(test_ind_2(x_ind)))))));
[theta_hat_dynamic,~]=fminunc(lik_fun2,[0;1],optimoptions('fminunc','Display','none'));
Dynamic_estimates = table(theta_hat_dynamic,theta,'VariableNames',{'Logit dynamic estimates','True values'})
Dynamic_estimates = 2×2 table 
 Logit dynamic estimatesTrue values
14.96515
2-1.992-2

	Logit dynamic estimates	True values
1	4.9651	5
2	-1.992	-2

Bus engine replacement

We now consider the problem of Harold Zurcher studied in Rust (1987). Zurcher maintains a fleet of buses. Maintenance on the buses becomes more costly as the bus engines accumulate mileage. Replacing the engines effectively resets the mileage on the bus and lowers the maintenance cost; however, replacing the engines is much more costly than a single maintenance. The problem facing Zurcher is to decide when to replace the engines: replacing the engines too early generates high costs through expensive engine replacements, while waiting too long means that maintenance becomes excessively costly over the long term.

We assume that, as in Rust (1987), we have monthly data on bus maintenance records. This includes data on a set of buses

over T months. The observations include mileage readings

for each bus and and indicator

for whether the engine has been replaced.

Model

Time is discrete, with denoting each period, with .
Each period, Harold Zurcher chooses between two actions. He can either replace the bus engine, , or not replace . These are mutually exclusive, so that .
Zurcher's payoff depends on a single state variable that represents the mileage of the bus. Mileage advances by a random amount if the bus engine is not replaced and is set to zero if the bus engine is replaced. Denote by the distribution over next period mileage conditional on current mileage after making choice d. When the bus engine is replaced, mileage resets to zero, so that for all . Mileage accumulation when the engine is not replaced follows a discrete analog of the exponential distribution up to some maximum mileage where mileage increment is .

In addition to mileage , there is a permanent and observable bus-specific type . Assume that the deterministic component of flow payoffs for each action for a bus of type s are given by

Payoffs also have an additively separable idiosyncratic component so that the total flow payoff from choice j in period t is . We assume that are i.i.d draws from a Type I EV distribution.
Let denote the time-varying state variable together with the permanent bus type s.
Denoting the optimal decision rule at t by , we define the ex-ante value function for state in period t as

The conditional value function is the value function for choice j in period t and state less the utility shock . In the case of Zurcher's bus engine replacement problem, these functions can be expressed as

where we have "undone" the combination of

and s in the state variable to make clear that the state transition function f only affects mileage accumulation and not the permanent bus type s.

Note that all t subscripts have been removed from value functions and state transitions: this is a stationary, infinite horizon problem.
In the binary choice setting with Type I EV preference shocks we can express conditional choice probabilities very simply:

Representation: finally, we can use the results of Hotz and Miller (1993) to express the difference between the value function and conditional value function :

where

when

Type I EV, where

is the Euler-Mascheroni constant.

Representation and identification

The mapping between the difference conditional value functions and the conditional choice probabilities

forms the basis for representation and identification. To generate the conditions that establish identification, we use the restrictions imposed on the utility function. In our case, we have assumed that

for all states

. Define

as the distribution over state variables reached after

periods with respect to the choice sequence

. For simplicity, assume for now that there is only a single bus type s, so that we can effectively ignore this state variable and express states and transitions only in terms of bus mileage s. The distribution over period τ states

is expressed as

In our setting, we will define

, i.e., the reference choice in each period is to replace the engine. Repeated application of

using replacement as the reference choice in each period

generates that

This equation establishes identification of

for each

: everything on the right-hand side is known (

) or can be estimated (

). It is useful to "stack" the moment condition equations for each state x and express the moment conditions in matrix form.

Two ways to develop an estimator.

With this matrix representation, we can write the moment conditions in matrix form in two ways that permit straightforward estimation.

1. Using the unrestricted utilities.

When

, we can use the matrix version of geometric series convergence to express the utility as:

Note that everything on the right-hand side is known (utilities, β, transition matrices) or can be estimated (

2. Using the differenced conditional value functions.

Recall that if we can express

in terms of the parameters θ, then we can use the logit estimator directly. While we can derive an expression directly from the representation identity, it is easiest to work from the unrestricted utility estimates above.

This form allows us to use the logit estimator:

is a known function of the structural parameter vector θ, and everything else is known or can be estimated.

(3. Bonus method: using the ex-ante value function)

We can also express the ex-ante value function with this type of matrix representation (see Arcidiacono and Ellickson (2011) for additional discussion):

where ∗ is element-wise multiplication,

is the

vector of choice probabilities for choice d and λ is an

vector of ones.

Methods 1 and 2 above are especially useful when both (i)

is linear in θ and (ii)

for all x. In settings without these assumptions, Method 3 using parameters from both utility functions

and

may be a more practical way to represent the value function. Note that the second term in parentheses above is simply the probability that each choice is optimal

multiplied by the expected utility conditional on that choice being the optimal choice

for each possible choice d.

Data

clc; clear
rng(19);
 
%x_min = 0.0 ; x_max = 15.0; 
%delta_x = 0.05;
x_min = 0.0 ; x_max = 18.0; 
delta_x = 0.9;
x_len = (x_max-x_min)/delta_x+1;

We have set

and

. With mileage starting at zero, there are 21 state values.

format shortG
x = linspace(x_min, x_max, x_len);
StatesTable = table((1:10)',x(1:10)','VariableNames',{'State index (encoded state)','State value'})
StatesTable = 10×2 table 
 State index (encoded state)State value
110
220.9
331.8
442.7
553.6
664.5
775.4
886.3
997.2
10108.1

	State index (encoded state)	State value
1	1	0
2	2	0.9
3	3	1.8
4	4	2.7
5	5	3.6
6	6	4.5
7	7	5.4
8	8	6.3
9	9	7.2
10	10	8.1

Knowing the state values is needed for calculating the flow payoffs

, but for nearly every other calculation we are primarily interested in the state's index. We call this the encoded state: it maps value associated with a state to that state's number. For example, in the representation above, the encoded state of mileage = 0.2 is 5. (Note that this will change slightly below when we add the bus types to the state variable.)

Calculating the probability of transitioning from one state to another requires referring to the relevant row and column indices in the state transition matrix f. Recall that

for

. We can define the state transition matrix in two steps. First, we assign the probabilities for each state

, and then we lump all of the leftover probability from mileage larger than

into the last state.

x_tday = repmat(x, [x_len,1]);
x_next = x_tday';                                      
f = (x_next >= x_tday)' .* exp(- (x_next - x_tday)') .* (1 - exp(-delta_x)); 
f(:,end) = 1 - sum(f(:,1:(end-1)), 2); % The last column is set so that the sum of probs is 1
F = cumsum(f(:,:), 2);
 
F2=f;
F1=zeros(size(f));
F1(:,1)=1;
emc = double(eulergamma);
F2(1:10,1:10)
ans = 10×10
      0.59343      0.24127     0.098093     0.039882     0.016215    0.0065924    0.0026803    0.0010897   0.00044305   0.00018013
            0      0.59343      0.24127     0.098093     0.039882     0.016215    0.0065924    0.0026803    0.0010897   0.00044305
            0            0      0.59343      0.24127     0.098093     0.039882     0.016215    0.0065924    0.0026803    0.0010897
            0            0            0      0.59343      0.24127     0.098093     0.039882     0.016215    0.0065924    0.0026803
            0            0            0            0      0.59343      0.24127     0.098093     0.039882     0.016215    0.0065924
            0            0            0            0            0      0.59343      0.24127     0.098093     0.039882     0.016215
            0            0            0            0            0            0      0.59343      0.24127     0.098093     0.039882
            0            0            0            0            0            0            0      0.59343      0.24127     0.098093
            0            0            0            0            0            0            0            0      0.59343      0.24127
            0            0            0            0            0            0            0            0            0      0.59343

Defined in this way, each row i of f corresponds to the probability distribution over states conditional on state i. The first row is the distribution over next period states conditional on the bus engine having a mileage of zero. The probability of remaining at zero is

, the probability of mileage increasing to state 2 is

, and so on. All elements below the diagonal are zero because the bus mileage cannot decrease without replacing the engine.

We assume that the replacement decision resets mileage to zero and that the bus cannot accumulate any mileage in this period (i.e., the act of replacing the engine takes the entire period and no driving can occur for that bus).

Data generation

We use the following parameters and data settings to simulate the data, where π is the fraction of buses that have

beta = 0.9;

theta = [2.00, -0.15, 1.0];

T=15;

S=1:2;

X=x;

N=1000;

pi=0.4;

[data_t, data_x, data_s, data_d, data_x_index] = generate_data(N, T, S, f, F, X,beta, pi, theta);

plot(data_t(1,1:12), data_x(1,1:12));

xline(data_t(1,data_d(1,1:12)==1),'--');

%xline(7,'--');

xlabel('Time'); ylabel('Mileage');title('Mileage and replacement for bus #1');

legend('Mileage','Engine replacement','location','northwest');

% Optional: save data

%Data_vars={'data_t', 'data_x', 'data_s', 'data_d', 'data_x_index'};

%Const_vars={'beta','theta','S','X','x_len','pi','T','x','N','f','F','F1','F2','emc'};

%All_vars=[Data_vars, Const_vars];

%save("Bus_engine_ccp_data.mat",All_vars{:});

CCP estimation

The CCP estimation routine has three steps.

(i) Estimating the conditional choice probabilities

Our first step is to estimate the conditional choice probabilities. We can obtain the maximum likehood estimates by a simple frequency estimator:

% Optional: load data
%load("Bus_engine_ccp_data.mat");
 
ccp_hat = zeros(length(X),length(S));
epsilon=1e-4;
for j=1:length(X)
    for s=1:length(S)
        ccp_hat(j,s) = sum(data_d(data_x_index(:)==j & data_s(:)==s)==1)/sum(data_x_index(:)==j & data_s(:)==s);
    end
end
never_reached=isnan(ccp_hat(:)); % keep track of states that never occur in the data
ccp_hat=max(epsilon, min(1-epsilon,ccp_hat)); % Makes sure that all probabilities are in the open interval (0,1), as "1" and "0" observations cause problems.
 

In practice, it is common to use a parametric first-stage and estimate the conditional choice probabilities with a flexible parametric or semi-parametric form. Here we use a logit with quadratic and interaction terms to estimate the conditional choice probabilities.

Remember, the goal at this stage is choice prediction using the observed states. Many different estimators can be used in this first stage. The primary objective is that capture how states predict choices as accurately as possible. We estimate λ using the parametric representation above and then use

to generate fitted values for the conditional choice probabilities.

start_est=tic;
X_logit=[data_x(:) data_x(:).^2 (data_s(:)==2) (data_s(:)==2).*data_x(:)]';
lambda_hat = mnrfit(X_logit',data_d(:));
 
ccp_hat_logit=zeros(length(X),length(S));
 
for s=1:length(S)
    X_logit_fit=[X', X'.^2, (s==2)*ones(size(X,2),1), (s==2).*X' ];
    temp_ccps=mnrval(lambda_hat,X_logit_fit);
    ccp_hat_logit(:,s)=temp_ccps(:,1);
end
 
ccp_hat = ccp_hat_logit;
 
% Compare with true CCPs
[~,~,ccps_true] = value_function_iteration(S,F2,X, beta,theta);
 
%figure(1);subplot(1,2,1);h=heatmap(1-ccp_hat(1:20,:));
%h.Title='Estimates for non-replacement p_2';h.XLabel = 'Bus type s'; h.YLabel='Mileage state index';
%subplot(1,2,2);h2=heatmap(1-ccps_true(1:20,:));
%h2.Title='True p_2';h2.XLabel = 'Bus type s'; h2.YLabel='Mileage state index';

Combining states: to have fewer expressions to work with, we will combine the two states, mileage and bus type, into a single state variable (

) and work with this object.

all_states = [X', ones(length(X),1); X', 2*ones(length(X),1)];
ccp_tot=ccp_hat(:);
F1_tot=[F1, zeros(x_len); zeros(x_len), F1];
F2_tot=[F2, zeros(x_len); zeros(x_len), F2];
[~, x_s_index]=ismember( [data_x(:), data_s(:)], all_states, 'rows');
 
num_states = size(all_states,1);

The variable is the encoded state of each observation in the data. It maps each observation to a row of the matrix.
The matrix and are the combined state transitions for all states (all mileage states plus the two observed bus types). These are constructed by blocks of zeros and the original state transition matrices because bus types cannot change: a "type 2" bus will always be a "type 2" bus, so all probabilities associated with "type 1" bus states are zeros.

(ii) Calculate the future value term

BigV_diff = beta*(F2_tot - F1_tot)*((eye(num_states) - beta*F1_tot)\(emc-log(ccp_tot)));
 

(iii) Estimate θ using the logit representation

Recalling that

, it is straightforward to define the log-likelihood function. This is similar to the log-likelihood for the static random utility model; the difference is that there is a state-specific offset term (our "BigV_diff") applied to each observation ("BigV_est").

X_logit_struct = [ones(num_states,1) all_states];
 
X_est = X_logit_struct(x_s_index,:); % Get the state variables associated with each observation
BigV_est = BigV_diff(x_s_index); % Get the BigV_diff term associated with each observation
 
% Log-likelihood: like static case, but with a constant "BigV" term for each state:
lik_fun = @(theta) -sum((data_d(:)==2).*(X_est*theta + BigV_est) - log(1+exp(X_est*theta + BigV_est)));
[theta_hat_ccp,fval_ccp]=fminunc(lik_fun,[1,1,1]');
Local minimum possible.

fminunc stopped because the size of the current step is less than
the value of the step size tolerance.

<stopping criteria details>
LogitCCPEstimatesTable = table(theta_hat_ccp,theta','VariableNames',{'Logit CCP estimates','True values'})
LogitCCPEstimatesTable = 3×2 table 
 Logit CCP estimatesTrue values
12.00272
2-0.15967-0.15
30.978111

	Logit CCP estimates	True values
1	2.0027	2
2	-0.15967	-0.15
3	0.97811	1

The table above shows that the estimated values are quite similar to the true values and can be generated with two logistic regressions and a few calculations.

Full-information maximum likelihood estimation

For reference, we also include a full information maximum likelihood estimation routine based on the nested fixed point. This version is simple to code but computationally very slow; however, it gives a sense of how close the CCPs estimate are to a fully efficient benchmark.

fiml_start=tic;
fiml_fun = @(theta) FIML_log_likelihood(theta,data_x,data_s,data_d,X,S,data_x_index,f,beta);
 
theta_0=0.1*ones(3,1);
options = optimoptions('fminunc','Display','none');
[theta_hat,~] = fminunc(fiml_fun,theta_0,options);
FIML_EstimatesTable = table(theta_hat,theta','VariableNames',{'FIML estimates','True values'})
FIML_EstimatesTable = 3×2 table 
 FIML estimatesTrue values
12.06432
2-0.15989-0.15
30.984111
fiml_time=toc(fiml_start)
fiml_time = 
      0.31592

	FIML estimates	True values
1	2.0643	2
2	-0.15989	-0.15
3	0.98411	1

Exercise: using the unrestricted utility estimates directly

Defining the estimator

This estimator follows the moment condition formulation above, using the unrestricted utility estimates to develop a minimum distance estimator (Altug and Miller (1998)).

We estimate the unrestricted utility of not replacing the engine for each state according to

where

is an

vector of zeros, and

is an

vector of utilities for choice 2.

Finally, we define a weighting matrix W and estimate the parameters so as to minimize the weighted squared deviation.

where

and

are vectors of the utilities associated with each state.

Exercise: Implement this estimator using the estimated conditional choice probabilities from the previous sections. Use the identity matrix as a weighting matrix. When you construct the estimator, use one moment condition for each observation; that is, you should use the

variable to "copy" moment conditions corresponding to the number of times they appear in the data. You should estimate θ using an OLS estimator.

% Exercise code here

(Bonus: Method 3. Stationary representation of the ex-ante value function.)

Recall from above that, using a similar derivation as the unrestricted utility estimates from Section 1.2, we can express the ex-ante value function in terms of the flow utility, state transition probabilities, and Ψ terms (see Arcidiacono and Ellickson (2011) for details):

where ∗ is element-wise multiplication,

is the

vector of choice probabilities for choice d and λ is an

vector of ones. The relevant term for evaluating the logit likelihood is

, which we can express as

with the above expression for

, we can now have an expression for the differenced conditional value function that we can use in the logit likelihood.

Now we are ready to begin construction of the estimator. One of the advantages of the CCP approach is that we can perform many of these calculations outside of the optimization routine, reducing the number of calculations that need to be conducted for each optimization iteration.

To simplify the representation, we construct the components of the ex-ante value function

in pieces. We start with the term

, which we define as

V1=(eye(size(all_states,1))-beta*(repmat(ccp_tot,[1,length(all_states)]).*F1_tot + repmat(1-ccp_tot,[1,length(all_states)]).*F2_tot));

Next, we define

V2=(ccp_tot.*(emc-log(ccp_tot))+(1-ccp_tot).*(emc - log(1-ccp_tot)));

Finally, we have

. Recalling that

, we can express this as

. We're going to leave off the θ term for now, as this will be the main argument in the optimization function:

V3=((1-ccp_tot).*[ones(length(all_states),1) all_states]);

Finally, we write the optimization function (see "ccp_stationary_likelihood" below) and define the likelihood function inline using the inputs we created above:

 
temp_lik=@(theta) ccp_stationary_likelihood(theta, data_d, F1_tot, F2_tot,x_s_index,all_states,beta,V1,V2,V3);
[theta_hat_stationary,fval]=fminunc(temp_lik,[1,1,1]');
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.

<stopping criteria details>
est_time=toc(start_est)
est_time = 
       1.2722
StationaryCCPEstimatesTable = table(theta_hat_stationary,theta','VariableNames',{'Stationary CCP estimates','True values'})
StationaryCCPEstimatesTable = 3×2 table 
 Stationary CCP estimatesTrue values
12.06072
2-0.15926-0.15
30.984271

	Stationary CCP estimates	True values
1	2.0607	2
2	-0.15926	-0.15
3	0.98427	1

Functions

Stationary CCP likelihood function

function [lik]=ccp_stationary_likelihood(theta, data_d, F1_tot, F2_tot,x_s_index,all_states,beta,V1,V2,V3)
 

We first "re-assemble" the pieces we defined above. The value defined below is

V_exante=V1\(V2+V3*theta); % Using the "slash" notation for matrix inverse multiplication as recommended by Matlab.

Finally, we use the ex-ante value function to evaluate the differenced conditional value function for each observation in the sample. We use the "x_s_index" variable, which maps each state observation into an index for one of the 602 state variables, to do this in one step.

v_diff =[ones(size(x_s_index,1),1) all_states(x_s_index,:)]*theta + beta*(F2_tot(x_s_index,:)-F1_tot(x_s_index,:))*V_exante;
lik=-sum((data_d(:)==2).*(v_diff) - log(1+exp(v_diff)));
end

Data generation

function [data_t, data_x, data_s, data_d, data_x_index] = generate_data(N, T, S, x_tran, x_tran_cumul, X, beta,pi,theta)
    
    initial_run=5; % We start the state variables at zero and let the model run for five periods before we start sampling
    x_index = ones(N, T+initial_run); 
    x_sim = zeros(N,T+initial_run);
    
    s_sim  = (rand(N,1) > pi) + 1;
    d_sim  = zeros(N, T+initial_run); 
    draw_ccp = rand(N, T+initial_run);
    draw_x  = rand(N, T+initial_run);
 
    value_function = value_function_iteration(S,x_tran,X,beta, theta);
    for n = 1:N
        for t = 1:T+initial_run
            % j=1 replace 
            u1 = 0.0;
            v1 = u1 + beta*value_function(1,s_sim(n)); % Reset to zero mileage deterministically if engine is replaced.
            % j=2 is continue
            u2 = theta(1) + theta(2) * X(x_index(n,t)) + theta(3) * s_sim(n);
            v2 = u2 + beta* x_tran(x_index(n,t),:)*value_function(:,s_sim(n));
            
            % Conditional choice probability
            p1 = exp(v1)/(exp(v1)+exp(v2)); 
 
            % Simulate next period
            d_sim(n,t)    = (draw_ccp(n,t) > p1) + 1; % 1: replace, 2: continue
            x_index(n,t+1) = (d_sim(n,t)==2) * sum((draw_x(n,t) > x_tran_cumul(x_index(n,t),:))) + 1; 
            x_sim(n,t+1) = X(x_index(n,t+1));
        end
    end
    
    data_x = x_sim(:, initial_run+1:T+initial_run);
    data_d = d_sim(:, initial_run+1:T+initial_run);
    data_s = repmat(s_sim,  [1,T]);
    data_t = repmat(transpose(1:T), [1,N])';
    [~, data_x_index] = ismember(data_x,X); 
    
    return 
 
end
 

Full-information maximum likelihood

function [fiml_lik] = FIML_log_likelihood(theta,data_x,data_s,data_d,X,S,data_x_index,x_tran,beta)
 
value_function = value_function_iteration(S,x_tran,X,beta,theta);
 
D_temp = data_d(:);
S_temp = data_s(:);
X_index_temp = data_x_index(:);
 
v1 = repmat(beta*value_function(1,S),[length(X),1]);
v2 = theta(1) + theta(2)*X' + theta(3)*S + beta*(x_tran(:,:)*value_function(:,:));
 
x_s_index    = sub2ind(size(v2), X_index_temp, S_temp);
fiml_lik = -sum((D_temp==2).*(v2(x_s_index)-v1(x_s_index)) - log(1+exp(v2(x_s_index)-v1(x_s_index))));
 
end

Value function iteration

function [value_function,value_diff,ccps] = value_function_iteration(S,x_tran,X, beta,theta)
x_len = length(unique(X));
emc = double(eulergamma);
value_function_2 = zeros(x_len,length(S));
ccps=zeros(x_len,length(S));
value_diff=1;
max_iter=1000;
iter=1;
while value_diff>1e-5 && iter < max_iter
    value_function_1 = value_function_2;
    for s=1:length(S)
                v1 = repmat(beta*value_function_1(1,s),[x_len,1]);
                v2=theta(1) + theta(2)*X' + theta(3)*s + beta*(x_tran(:,:)*value_function_1(:,s));
                value_function_2(:,s) = log(exp(v1)+exp(v2)) + emc;
                ccps(:,s)=1./(1+exp(v2-v1));
    end
    iter=iter+1;
    value_diff = max(max((value_function_2 - value_function_1).^2));
end
value_function = value_function_2;
end
 

References

Abbring, J. H., & Daljord, Ø. (2020). Identifying the discount factor in dynamic discrete choice models. Quantitative Economics, 11(2), 471-501.

Altuğ, S., & Miller, R. A. (1998). The effect of work experience on female wages and labour supply. The Review of Economic Studies, 65(1), 45-85.

Arcidiacono, P., & Ellickson, P. B. (2011). Practical methods for estimation of dynamic discrete choice models. Annu. Rev. Econ., 3(1), 363-394.

Arcidiacono, P., & Miller, R. A. (2011). Conditional choice probability estimation of dynamic discrete choice models with unobserved heterogeneity. Econometrica, 79(6), 1823-1867.

Caoui, E. H., Hollenbeck, B., & Osborne, M. (2022). The Impact of Dollar Store Expansion on Local Market Structure and Food Access. Working paper.

Kalouptsidi, M., Scott, P. T., & Souza-Rodrigues, E. (2021). Linear IV regression estimators for structural dynamic discrete choice models. Journal of Econometrics, 222(1), 778-804.

Keane, M. P., & Wolpin, K. I. (1997). The career decisions of young men. Journal of political Economy, 105(3), 473-522.

Magnac, T., & Thesmar, D. (2002). Identifying dynamic discrete decision processes. Econometrica, 70(2), 801-816.

McFadden, D. (1981). Econometric Models of Probabilistic Choice. Structural analysis of discrete data with econometric applications, 198272.

Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica: Journal of the Econometric Society, 999-1033.