Dynamic Structural Econometrics - CCP estimation

Prepared by Aaron Barkley, University of Melbourne
Table of Contents
Disclaimer: This file has been written with an emphasis on clarity. In particular, this means that (i) the file is completely self-contained in the sense that all necessary functions are defined within the file (as opposed to be stored as separate .m files) and (ii) code has not been optimized for performance.

The random utility model

We begin by reviewing the random utility model introduced in McFadden (1974, 1981). This model features many of the components we will use in considering dynamic discrete choice models, including discrete choices (binary in our case), logistic regression, and so on.
Consider an individual decision maker deciding whether to make a binary decision. For example, suppose an individual is moving across the United States to take a new job. They are planning to drive their car to their new city and sell the car when they get there. However, the current mileage on the car indicates that the car might break down before completing the long trip to the new city. The question facing the individual is whether or not to sell/trade in their car before they leave and obtain a more reliable car.
Suppose we observe a set of decisions from N individuals about whether or not to take the trip with the current car. If , the car was replaced for individual n, and if , then the car was kept.
Taking the log and plugging in and gives the log-likelihood associated with observation n:
Maximizing the log-likelihood with respect to the parameters θ conditional on the observed decisions and car mileages will generate the structural estimates. The code below simulates some sample data and estimates this model.
rng(2023);
theta=[5; -2]; % Utility parameters
Nobs=1000; % Number of observations
x=2.5+0.5*randn([Nobs,1]); % State variables
x=round(x,1);
[~,x_ind]=ismember(x,unique(x)); % Encoded states -- we'll explain this below
X=unique(x);
x_len=length(X);
ChoiceProb= exp([ones(Nobs,1), x]*theta)./(1+exp([ones(Nobs,1), x]*theta)); % Logit choice probabilities
p=rand([Nobs,1]);
d=(p<ChoiceProb); % Simulate choices with uniform random draws.
 
% Now estimate with the logit log-likelihood
lik_fun = @(theta) -sum((d==1).*([ones(Nobs,1), x]*theta) - log(1+exp([ones(Nobs,1), x]*theta)));
[theta_hat,~]=fminunc(lik_fun,[0;1],optimoptions('fminunc','Display','none'));
RUM_estimates = table(theta_hat,theta,'VariableNames',{'Logit estimates','True values'})
RUM_estimates = 2×2 table
 Logit estimatesTrue values
14.90675
2-1.9544-2

Moving from static to dynamic

Suppose that each individual was not making a one-off decision about replacing the car. For simplicity, suppose that they anticipate moving again within the next two years, and expect to have to make the same decision about whether or not to replace their car for something more reliable at that time. (For example, the individual is a recent economics PhD graduate who is taking a two year postdoc, and expects that at the end of the postdoc they will have to move again.) Each individual is rationally forward looking, and so will anticipate how obtaining a more reliable car now might help them with moving to a new city again in a few years.
Suppose we ignore this dynamic feature of the decision-making process when we estimate the model, and assume that the consumers are making one-off decisions as in the case above. What happens to our estimates? (Ignore the specific details about the data generation in the code below -- we will cover this later!)
 
emc=double(eulergamma);
beta=0.9;
ChoiceProb2=exp([ones(x_len,1), X]*theta)./(1+exp([ones(x_len,1), X]*theta));
test_ind_1=((1:x_len)+min(5,x_len-(1:x_len)))';
test_ind_2=repmat(5,[x_len,1]);
v2=[ones(x_len,1), X]*theta + beta*(emc-log(ChoiceProb2(test_ind_1)));
v1=beta*(emc-log(ChoiceProb2(test_ind_2)));
ChoiceProb1=exp(v2-v1)./(1+exp(v2-v1));
p1=p;
p2=rand([Nobs,1]);
d1=(p1<ChoiceProb1(x_ind));
x2=5.*(1-d1) + d1.*test_ind_1(x_ind);
[~,x2_ind]=ismember(x2,(1:max(x_ind))');
d2=(p2<ChoiceProb2(x2_ind));
 
lik_fun = @(theta) -sum((d1==1).*([ones(Nobs,1), x]*theta) - log(1+exp([ones(Nobs,1), x]*theta)));
 
[theta_hat_static,~]=fminunc(lik_fun,[0;1],optimoptions('fminunc','Display','none'));
Static_estimates = table(theta_hat_static,theta,'VariableNames',{'Logit "static" estimates','True values'})
Static_estimates = 2×2 table
 Logit "static" estimatesTrue values
12.90495
2-0.70613-2
As we can see from the above table, the estimates are now significantly biased. Assuming a static model when the model is dynamic and consumers are forward-looking will lead to incorrect inferences about preferences! If we fix this and account for dynamics, we get the correct estimates. (Again, ignore the implementation details -- this will get covered in the next section.)
 
lik_fun2= @(theta) -sum((d1==1).*([ones(Nobs,1), x]*theta + beta*(emc-log(ChoiceProb2(test_ind_1(x_ind))))-beta*(emc-log(ChoiceProb2(test_ind_2(x_ind))))) ...
- log(1+exp([ones(Nobs,1), x]*theta + beta*(emc-log(ChoiceProb2(test_ind_1(x_ind))))-beta*(emc-log(ChoiceProb2(test_ind_2(x_ind)))))));
[theta_hat_dynamic,~]=fminunc(lik_fun2,[0;1],optimoptions('fminunc','Display','none'));
Dynamic_estimates = table(theta_hat_dynamic,theta,'VariableNames',{'Logit dynamic estimates','True values'})
Dynamic_estimates = 2×2 table
 Logit dynamic estimatesTrue values
14.96515
2-1.992-2

Bus engine replacement

We now consider the problem of Harold Zurcher studied in Rust (1987). Zurcher maintains a fleet of buses. Maintenance on the buses becomes more costly as the bus engines accumulate mileage. Replacing the engines effectively resets the mileage on the bus and lowers the maintenance cost; however, replacing the engines is much more costly than a single maintenance. The problem facing Zurcher is to decide when to replace the engines: replacing the engines too early generates high costs through expensive engine replacements, while waiting too long means that maintenance becomes excessively costly over the long term.
We assume that, as in Rust (1987), we have monthly data on bus maintenance records. This includes data on a set of buses over T months. The observations include mileage readings for each bus and and indicator for whether the engine has been replaced.

Model

where we have "undone" the combination of and s in the state variable to make clear that the state transition function f only affects mileage accumulation and not the permanent bus type s.
where when Type I EV, where is the Euler-Mascheroni constant.

Representation and identification

The mapping between the difference conditional value functions and the conditional choice probabilities forms the basis for representation and identification. To generate the conditions that establish identification, we use the restrictions imposed on the utility function. In our case, we have assumed that for all states . Define as the distribution over state variables reached after periods with respect to the choice sequence . For simplicity, assume for now that there is only a single bus type s, so that we can effectively ignore this state variable and express states and transitions only in terms of bus mileage s. The distribution over period τ states is expressed as
In our setting, we will define , i.e., the reference choice in each period is to replace the engine. Repeated application of using replacement as the reference choice in each period generates that
This equation establishes identification of for each : everything on the right-hand side is known () or can be estimated (). It is useful to "stack" the moment condition equations for each state x and express the moment conditions in matrix form.

Two ways to develop an estimator.

With this matrix representation, we can write the moment conditions in matrix form in two ways that permit straightforward estimation.
1. Using the unrestricted utilities.
When , we can use the matrix version of geometric series convergence to express the utility as:
Note that everything on the right-hand side is known (utilities, β, transition matrices) or can be estimated ().
2. Using the differenced conditional value functions.
Recall that if we can express in terms of the parameters θ, then we can use the logit estimator directly. While we can derive an expression directly from the representation identity, it is easiest to work from the unrestricted utility estimates above.
This form allows us to use the logit estimator: is a known function of the structural parameter vector θ, and everything else is known or can be estimated.
(3. Bonus method: using the ex-ante value function)
We can also express the ex-ante value function with this type of matrix representation (see Arcidiacono and Ellickson (2011) for additional discussion):
where is element-wise multiplication, is the vector of choice probabilities for choice d and λ is an vector of ones.
Methods 1 and 2 above are especially useful when both (i) is linear in θ and (ii) for all x. In settings without these assumptions, Method 3 using parameters from both utility functions and may be a more practical way to represent the value function. Note that the second term in parentheses above is simply the probability that each choice is optimal multiplied by the expected utility conditional on that choice being the optimal choice for each possible choice d.

Data

clc; clear
rng(19);
 
%x_min = 0.0 ; x_max = 15.0;
%delta_x = 0.05;
x_min = 0.0 ; x_max = 18.0;
delta_x = 0.9;
x_len = (x_max-x_min)/delta_x+1;
We have set and . With mileage starting at zero, there are 21 state values.
format shortG
x = linspace(x_min, x_max, x_len);
StatesTable = table((1:10)',x(1:10)','VariableNames',{'State index (encoded state)','State value'})
StatesTable = 10×2 table
 State index (encoded state)State value
110
220.9
331.8
442.7
553.6
664.5
775.4
886.3
997.2
10108.1
Knowing the state values is needed for calculating the flow payoffs , but for nearly every other calculation we are primarily interested in the state's index. We call this the encoded state: it maps value associated with a state to that state's number. For example, in the representation above, the encoded state of mileage = 0.2 is 5. (Note that this will change slightly below when we add the bus types to the state variable.)
Calculating the probability of transitioning from one state to another requires referring to the relevant row and column indices in the state transition matrix f. Recall that for . We can define the state transition matrix in two steps. First, we assign the probabilities for each state , and then we lump all of the leftover probability from mileage larger than into the last state.
x_tday = repmat(x, [x_len,1]);
x_next = x_tday';
f = (x_next >= x_tday)' .* exp(- (x_next - x_tday)') .* (1 - exp(-delta_x));
f(:,end) = 1 - sum(f(:,1:(end-1)), 2); % The last column is set so that the sum of probs is 1
F = cumsum(f(:,:), 2);
 
F2=f;
F1=zeros(size(f));
F1(:,1)=1;
emc = double(eulergamma);
F2(1:10,1:10)
ans = 10×10
0.59343 0.24127 0.098093 0.039882 0.016215 0.0065924 0.0026803 0.0010897 0.00044305 0.00018013 0 0.59343 0.24127 0.098093 0.039882 0.016215 0.0065924 0.0026803 0.0010897 0.00044305 0 0 0.59343 0.24127 0.098093 0.039882 0.016215 0.0065924 0.0026803 0.0010897 0 0 0 0.59343 0.24127 0.098093 0.039882 0.016215 0.0065924 0.0026803 0 0 0 0 0.59343 0.24127 0.098093 0.039882 0.016215 0.0065924 0 0 0 0 0 0.59343 0.24127 0.098093 0.039882 0.016215 0 0 0 0 0 0 0.59343 0.24127 0.098093 0.039882 0 0 0 0 0 0 0 0.59343 0.24127 0.098093 0 0 0 0 0 0 0 0 0.59343 0.24127 0 0 0 0 0 0 0 0 0 0.59343
Defined in this way, each row i of f corresponds to the probability distribution over states conditional on state i. The first row is the distribution over next period states conditional on the bus engine having a mileage of zero. The probability of remaining at zero is , the probability of mileage increasing to state 2 is , and so on. All elements below the diagonal are zero because the bus mileage cannot decrease without replacing the engine.
We assume that the replacement decision resets mileage to zero and that the bus cannot accumulate any mileage in this period (i.e., the act of replacing the engine takes the entire period and no driving can occur for that bus).
Data generation
We use the following parameters and data settings to simulate the data, where π is the fraction of buses that have :
beta = 0.9;
theta = [2.00, -0.15, 1.0];
T=15;
S=1:2;
X=x;
N=1000;
pi=0.4;
[data_t, data_x, data_s, data_d, data_x_index] = generate_data(N, T, S, f, F, X,beta, pi, theta);
 
 
plot(data_t(1,1:12), data_x(1,1:12));
xline(data_t(1,data_d(1,1:12)==1),'--');
%xline(7,'--');
xlabel('Time'); ylabel('Mileage');title('Mileage and replacement for bus #1');
legend('Mileage','Engine replacement','location','northwest');
 
% Optional: save data
%Data_vars={'data_t', 'data_x', 'data_s', 'data_d', 'data_x_index'};
%Const_vars={'beta','theta','S','X','x_len','pi','T','x','N','f','F','F1','F2','emc'};
%All_vars=[Data_vars, Const_vars];
%save("Bus_engine_ccp_data.mat",All_vars{:});

CCP estimation

The CCP estimation routine has three steps.
(i) Estimating the conditional choice probabilities
Our first step is to estimate the conditional choice probabilities. We can obtain the maximum likehood estimates by a simple frequency estimator:
% Optional: load data
%load("Bus_engine_ccp_data.mat");
 
ccp_hat = zeros(length(X),length(S));
epsilon=1e-4;
for j=1:length(X)
for s=1:length(S)
ccp_hat(j,s) = sum(data_d(data_x_index(:)==j & data_s(:)==s)==1)/sum(data_x_index(:)==j & data_s(:)==s);
end
end
never_reached=isnan(ccp_hat(:)); % keep track of states that never occur in the data
ccp_hat=max(epsilon, min(1-epsilon,ccp_hat)); % Makes sure that all probabilities are in the open interval (0,1), as "1" and "0" observations cause problems.
 
In practice, it is common to use a parametric first-stage and estimate the conditional choice probabilities with a flexible parametric or semi-parametric form. Here we use a logit with quadratic and interaction terms to estimate the conditional choice probabilities.
Remember, the goal at this stage is choice prediction using the observed states. Many different estimators can be used in this first stage. The primary objective is that capture how states predict choices as accurately as possible. We estimate λ using the parametric representation above and then use to generate fitted values for the conditional choice probabilities.
start_est=tic;
X_logit=[data_x(:) data_x(:).^2 (data_s(:)==2) (data_s(:)==2).*data_x(:)]';
lambda_hat = mnrfit(X_logit',data_d(:));
 
ccp_hat_logit=zeros(length(X),length(S));
 
for s=1:length(S)
X_logit_fit=[X', X'.^2, (s==2)*ones(size(X,2),1), (s==2).*X' ];
temp_ccps=mnrval(lambda_hat,X_logit_fit);
ccp_hat_logit(:,s)=temp_ccps(:,1);
end
 
ccp_hat = ccp_hat_logit;
 
% Compare with true CCPs
[~,~,ccps_true] = value_function_iteration(S,F2,X, beta,theta);
 
%figure(1);subplot(1,2,1);h=heatmap(1-ccp_hat(1:20,:));
%h.Title='Estimates for non-replacement p_2';h.XLabel = 'Bus type s'; h.YLabel='Mileage state index';
%subplot(1,2,2);h2=heatmap(1-ccps_true(1:20,:));
%h2.Title='True p_2';h2.XLabel = 'Bus type s'; h2.YLabel='Mileage state index';
Combining states: to have fewer expressions to work with, we will combine the two states, mileage and bus type, into a single state variable () and work with this object.
all_states = [X', ones(length(X),1); X', 2*ones(length(X),1)];
ccp_tot=ccp_hat(:);
F1_tot=[F1, zeros(x_len); zeros(x_len), F1];
F2_tot=[F2, zeros(x_len); zeros(x_len), F2];
[~, x_s_index]=ismember( [data_x(:), data_s(:)], all_states, 'rows');
 
num_states = size(all_states,1);
(ii) Calculate the future value term
BigV_diff = beta*(F2_tot - F1_tot)*((eye(num_states) - beta*F1_tot)\(emc-log(ccp_tot)));
 
(iii) Estimate θ using the logit representation
Recalling that , it is straightforward to define the log-likelihood function. This is similar to the log-likelihood for the static random utility model; the difference is that there is a state-specific offset term (our "BigV_diff") applied to each observation ("BigV_est").
X_logit_struct = [ones(num_states,1) all_states];
 
X_est = X_logit_struct(x_s_index,:); % Get the state variables associated with each observation
BigV_est = BigV_diff(x_s_index); % Get the BigV_diff term associated with each observation
 
% Log-likelihood: like static case, but with a constant "BigV" term for each state:
lik_fun = @(theta) -sum((data_d(:)==2).*(X_est*theta + BigV_est) - log(1+exp(X_est*theta + BigV_est)));
[theta_hat_ccp,fval_ccp]=fminunc(lik_fun,[1,1,1]');
Local minimum possible. fminunc stopped because the size of the current step is less than the value of the step size tolerance. <stopping criteria details>
LogitCCPEstimatesTable = table(theta_hat_ccp,theta','VariableNames',{'Logit CCP estimates','True values'})
LogitCCPEstimatesTable = 3×2 table
 Logit CCP estimatesTrue values
12.00272
2-0.15967-0.15
30.978111

The table above shows that the estimated values are quite similar to the true values and can be generated with two logistic regressions and a few calculations.

Full-information maximum likelihood estimation

For reference, we also include a full information maximum likelihood estimation routine based on the nested fixed point. This version is simple to code but computationally very slow; however, it gives a sense of how close the CCPs estimate are to a fully efficient benchmark.
fiml_start=tic;
fiml_fun = @(theta) FIML_log_likelihood(theta,data_x,data_s,data_d,X,S,data_x_index,f,beta);
 
theta_0=0.1*ones(3,1);
options = optimoptions('fminunc','Display','none');
[theta_hat,~] = fminunc(fiml_fun,theta_0,options);
FIML_EstimatesTable = table(theta_hat,theta','VariableNames',{'FIML estimates','True values'})
FIML_EstimatesTable = 3×2 table
 FIML estimatesTrue values
12.06432
2-0.15989-0.15
30.984111
fiml_time=toc(fiml_start)
fiml_time =
0.31592

Exercise: using the unrestricted utility estimates directly

Defining the estimator
This estimator follows the moment condition formulation above, using the unrestricted utility estimates to develop a minimum distance estimator (Altug and Miller (1998)).
We estimate the unrestricted utility of not replacing the engine for each state according to
,
where , , is an vector of zeros, and is an vector of utilities for choice 2.
Finally, we define a weighting matrix W and estimate the parameters so as to minimize the weighted squared deviation.
where and are vectors of the utilities associated with each state.
Exercise: Implement this estimator using the estimated conditional choice probabilities from the previous sections. Use the identity matrix as a weighting matrix. When you construct the estimator, use one moment condition for each observation; that is, you should use the variable to "copy" moment conditions corresponding to the number of times they appear in the data. You should estimate θ using an OLS estimator.
% Exercise code here
 

(Bonus: Method 3. Stationary representation of the ex-ante value function.)

Recall from above that, using a similar derivation as the unrestricted utility estimates from Section 1.2, we can express the ex-ante value function in terms of the flow utility, state transition probabilities, and Ψ terms (see Arcidiacono and Ellickson (2011) for details):
where is element-wise multiplication, is the vector of choice probabilities for choice d and λ is an vector of ones. The relevant term for evaluating the logit likelihood is , which we can express as
with the above expression for , we can now have an expression for the differenced conditional value function that we can use in the logit likelihood.
Now we are ready to begin construction of the estimator. One of the advantages of the CCP approach is that we can perform many of these calculations outside of the optimization routine, reducing the number of calculations that need to be conducted for each optimization iteration.
To simplify the representation, we construct the components of the ex-ante value function in pieces. We start with the term , which we define as
V1=(eye(size(all_states,1))-beta*(repmat(ccp_tot,[1,length(all_states)]).*F1_tot + repmat(1-ccp_tot,[1,length(all_states)]).*F2_tot));
Next, we define :
V2=(ccp_tot.*(emc-log(ccp_tot))+(1-ccp_tot).*(emc - log(1-ccp_tot)));
Finally, we have . Recalling that , we can express this as . We're going to leave off the θ term for now, as this will be the main argument in the optimization function:
V3=((1-ccp_tot).*[ones(length(all_states),1) all_states]);
Finally, we write the optimization function (see "ccp_stationary_likelihood" below) and define the likelihood function inline using the inputs we created above:
 
temp_lik=@(theta) ccp_stationary_likelihood(theta, data_d, F1_tot, F2_tot,x_s_index,all_states,beta,V1,V2,V3);
[theta_hat_stationary,fval]=fminunc(temp_lik,[1,1,1]');
Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance. <stopping criteria details>
est_time=toc(start_est)
est_time =
1.2722
StationaryCCPEstimatesTable = table(theta_hat_stationary,theta','VariableNames',{'Stationary CCP estimates','True values'})
StationaryCCPEstimatesTable = 3×2 table
 Stationary CCP estimatesTrue values
12.06072
2-0.15926-0.15
30.984271

Functions

Stationary CCP likelihood function

function [lik]=ccp_stationary_likelihood(theta, data_d, F1_tot, F2_tot,x_s_index,all_states,beta,V1,V2,V3)
 
We first "re-assemble" the pieces we defined above. The value defined below is
V_exante=V1\(V2+V3*theta); % Using the "slash" notation for matrix inverse multiplication as recommended by Matlab.
Finally, we use the ex-ante value function to evaluate the differenced conditional value function for each observation in the sample. We use the "x_s_index" variable, which maps each state observation into an index for one of the 602 state variables, to do this in one step.
v_diff =[ones(size(x_s_index,1),1) all_states(x_s_index,:)]*theta + beta*(F2_tot(x_s_index,:)-F1_tot(x_s_index,:))*V_exante;
lik=-sum((data_d(:)==2).*(v_diff) - log(1+exp(v_diff)));
end

Data generation

function [data_t, data_x, data_s, data_d, data_x_index] = generate_data(N, T, S, x_tran, x_tran_cumul, X, beta,pi,theta)
initial_run=5; % We start the state variables at zero and let the model run for five periods before we start sampling
x_index = ones(N, T+initial_run);
x_sim = zeros(N,T+initial_run);
s_sim = (rand(N,1) > pi) + 1;
d_sim = zeros(N, T+initial_run);
draw_ccp = rand(N, T+initial_run);
draw_x = rand(N, T+initial_run);
 
value_function = value_function_iteration(S,x_tran,X,beta, theta);
for n = 1:N
for t = 1:T+initial_run
% j=1 replace
u1 = 0.0;
v1 = u1 + beta*value_function(1,s_sim(n)); % Reset to zero mileage deterministically if engine is replaced.
% j=2 is continue
u2 = theta(1) + theta(2) * X(x_index(n,t)) + theta(3) * s_sim(n);
v2 = u2 + beta* x_tran(x_index(n,t),:)*value_function(:,s_sim(n));
% Conditional choice probability
p1 = exp(v1)/(exp(v1)+exp(v2));
 
% Simulate next period
d_sim(n,t) = (draw_ccp(n,t) > p1) + 1; % 1: replace, 2: continue
x_index(n,t+1) = (d_sim(n,t)==2) * sum((draw_x(n,t) > x_tran_cumul(x_index(n,t),:))) + 1;
x_sim(n,t+1) = X(x_index(n,t+1));
end
end
data_x = x_sim(:, initial_run+1:T+initial_run);
data_d = d_sim(:, initial_run+1:T+initial_run);
data_s = repmat(s_sim, [1,T]);
data_t = repmat(transpose(1:T), [1,N])';
[~, data_x_index] = ismember(data_x,X);
return
 
end
 

Full-information maximum likelihood

function [fiml_lik] = FIML_log_likelihood(theta,data_x,data_s,data_d,X,S,data_x_index,x_tran,beta)
 
value_function = value_function_iteration(S,x_tran,X,beta,theta);
 
D_temp = data_d(:);
S_temp = data_s(:);
X_index_temp = data_x_index(:);
 
v1 = repmat(beta*value_function(1,S),[length(X),1]);
v2 = theta(1) + theta(2)*X' + theta(3)*S + beta*(x_tran(:,:)*value_function(:,:));
 
x_s_index = sub2ind(size(v2), X_index_temp, S_temp);
fiml_lik = -sum((D_temp==2).*(v2(x_s_index)-v1(x_s_index)) - log(1+exp(v2(x_s_index)-v1(x_s_index))));
 
end

Value function iteration

function [value_function,value_diff,ccps] = value_function_iteration(S,x_tran,X, beta,theta)
x_len = length(unique(X));
emc = double(eulergamma);
value_function_2 = zeros(x_len,length(S));
ccps=zeros(x_len,length(S));
value_diff=1;
max_iter=1000;
iter=1;
while value_diff>1e-5 && iter < max_iter
value_function_1 = value_function_2;
for s=1:length(S)
v1 = repmat(beta*value_function_1(1,s),[x_len,1]);
v2=theta(1) + theta(2)*X' + theta(3)*s + beta*(x_tran(:,:)*value_function_1(:,s));
value_function_2(:,s) = log(exp(v1)+exp(v2)) + emc;
ccps(:,s)=1./(1+exp(v2-v1));
end
iter=iter+1;
value_diff = max(max((value_function_2 - value_function_1).^2));
end
value_function = value_function_2;
end
 

References

Abbring, J. H., & Daljord, Ø. (2020). Identifying the discount factor in dynamic discrete choice models. Quantitative Economics, 11(2), 471-501.
Altuğ, S., & Miller, R. A. (1998). The effect of work experience on female wages and labour supply. The Review of Economic Studies, 65(1), 45-85.
Arcidiacono, P., & Ellickson, P. B. (2011). Practical methods for estimation of dynamic discrete choice models. Annu. Rev. Econ., 3(1), 363-394.
Arcidiacono, P., & Miller, R. A. (2011). Conditional choice probability estimation of dynamic discrete choice models with unobserved heterogeneity. Econometrica, 79(6), 1823-1867.
Caoui, E. H., Hollenbeck, B., & Osborne, M. (2022). The Impact of Dollar Store Expansion on Local Market Structure and Food Access. Working paper.
Kalouptsidi, M., Scott, P. T., & Souza-Rodrigues, E. (2021). Linear IV regression estimators for structural dynamic discrete choice models. Journal of Econometrics, 222(1), 778-804.
Keane, M. P., & Wolpin, K. I. (1997). The career decisions of young men. Journal of political Economy, 105(3), 473-522.
Magnac, T., & Thesmar, D. (2002). Identifying dynamic discrete decision processes. Econometrica, 70(2), 801-816.
McFadden, D. (1981). Econometric Models of Probabilistic Choice. Structural analysis of discrete data with econometric applications, 198272.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica: Journal of the Econometric Society, 999-1033.