Skip to content

Commit 943ebdd

Browse files
committed
Updated SMM.md
1 parent 2ea2886 commit 943ebdd

File tree

2 files changed

+274
-0
lines changed

2 files changed

+274
-0
lines changed

docs/book/struct_est/SMM.md

Lines changed: 274 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -728,6 +728,280 @@ print('Sim. mean =', mean_sim)
728728
print('Sim. variance =', var_sim)
729729
```
730730

731+
We can also simulate many $(S)$ data sets of test scores, each with $N=161$ test scores. The estimate of the model moments will be the average of the simulated data moments across the simulations.
732+
733+
```{code-cell} ipython3
734+
:tags: []
735+
736+
N = 161
737+
S = 100
738+
mu_2 = 300.0
739+
sig_2 = 30.0
740+
cut_lb = 0.0
741+
cut_ub = 450.0
742+
np.random.seed(25) # Set the random number seed to get same answers every time
743+
unif_vals_2 = sts.uniform.rvs(0, 1, size=(N, S))
744+
draws_2 = trunc_norm_draws(unif_vals_2, mu_2, sig_2,
745+
cut_lb, cut_ub)
746+
747+
mean_sim, var_sim = data_moments2(draws_2)
748+
print("Mean test score in each simulation:")
749+
print(mean_sim)
750+
print("")
751+
print("Variance of test scores in each simulation:")
752+
print(var_sim)
753+
mean_mod = mean_sim.mean()
754+
var_mod = var_sim.mean()
755+
print("")
756+
print('Estimated model mean (avg. of means) =', mean_mod)
757+
print('Estimated model variance (avg. of variances) =', var_mod)
758+
```
759+
760+
Our SMM model moments $\hat{m}(\tilde{scores}_i|\mu,\sigma)$ are an estimate of the true models moments that we got in the GMM case by integrating using the PDF of the truncated normal distribution. Our SMM moments we got by simulating the data $S$ times and taking the average of the simulated data moments across the simulations as our estimator of the model moments.
761+
762+
Define the error vector as the vector of percent deviations of the model moments from the data moments.
763+
764+
$$ e(\tilde{scores}_i,scores_i|\mu,\sigma) \equiv \frac{\hat{m}(\tilde{scores}_i|\mu,\sigma) - m(scores_i)}{m(scores_i)} $$
765+
766+
The SMM estimator for this moment vector is the following.
767+
768+
$$ (\hat{\mu}_{SMM},\hat{\sigma}_{SMM}) = (\mu,\sigma):\quad \min_{\mu,\sigma} e(\tilde{scores}_i,scores_i|\mu,\sigma)^T \, W \, e(\tilde{scores}_i,scores_i|\mu,\sigma) $$
769+
770+
Now let's define a criterion function that takes as inputs the parameters and the estimator for the weighting matrix $\hat{W}$.
771+
772+
```{code-cell} ipython3
773+
:tags: []
774+
775+
def err_vec2(data_vals, unif_vals, mu, sigma, cut_lb, cut_ub, simple):
776+
'''
777+
--------------------------------------------------------------------
778+
This function computes the vector of moment errors (in percent
779+
deviation from the data moment vector) for SMM.
780+
--------------------------------------------------------------------
781+
INPUTS:
782+
data_vals = (N,) vector, test scores data
783+
unif_vals = (N, S) matrix, S simulations of N observations from
784+
uniform distribution U(0,1)
785+
mu = scalar, mean of the nontruncated normal distribution
786+
from which the truncated normal is derived
787+
sigma = scalar > 0, standard deviation of the nontruncated
788+
normal distribution from which the truncated normal is
789+
derived
790+
cut_lb = scalar or string, ='None' if no lower bound cutoff is
791+
given, otherwise is scalar lower bound value of
792+
distribution. Values below this cutoff have zero
793+
probability
794+
cut_ub = scalar or string, ='None' if no upper bound cutoff is
795+
given, otherwise is scalar lower bound value of
796+
distribution. Values below this cutoff have zero
797+
probability
798+
simple = boolean, =True if errors are simple difference, =False
799+
if errors are percent deviation from data moments
800+
801+
OTHER FUNCTIONS AND FILES CALLED BY THIS FUNCTION:
802+
trunc_norm_draws()
803+
data_moments()
804+
805+
OBJECTS CREATED WITHIN FUNCTION:
806+
mean_data = scalar, mean value of data
807+
var_data = scalar > 0, variance of data
808+
moms_data = (2, 1) matrix, column vector of two data moments
809+
mean_model = scalar, estimated mean value from model
810+
var_model = scalar > 0, estimated variance from model
811+
moms_model = (2, 1) matrix, column vector of two model moments
812+
err_vec = (2, 1) matrix, column vector of two moment error
813+
functions
814+
815+
FILES CREATED BY THIS FUNCTION: None
816+
817+
RETURNS: err_vec
818+
--------------------------------------------------------------------
819+
'''
820+
sim_vals = trunc_norm_draws(unif_vals, mu, sigma, cut_lb, cut_ub)
821+
mean_data, var_data = data_moments2(data_vals)
822+
moms_data = np.array([[mean_data], [var_data]])
823+
mean_sim, var_sim = data_moments2(sim_vals)
824+
mean_model = mean_sim.mean()
825+
var_model = var_sim.mean()
826+
moms_model = np.array([[mean_model], [var_model]])
827+
if simple:
828+
err_vec = moms_model - moms_data
829+
else:
830+
err_vec = (moms_model - moms_data) / moms_data
831+
832+
return err_vec
833+
834+
835+
def criterion(params, *args):
836+
'''
837+
--------------------------------------------------------------------
838+
This function computes the SMM weighted sum of squared moment errors
839+
criterion function value given parameter values and an estimate of
840+
the weighting matrix.
841+
--------------------------------------------------------------------
842+
INPUTS:
843+
params = (2,) vector, ([mu, sigma])
844+
mu = scalar, mean of the normally distributed random variable
845+
sigma = scalar > 0, standard deviation of the normally
846+
distributed random variable
847+
args = length 5 tuple,
848+
(xvals, unif_vals, cut_lb, cut_ub, W_hat)
849+
xvals = (N,) vector, values of the truncated normally
850+
distributed random variable
851+
unif_vals = (N, S) matrix, matrix of draws from U(0,1) distribution.
852+
This fixes the seed of the draws for the simulations
853+
cut_lb = scalar or string, ='None' if no lower bound cutoff is
854+
given, otherwise is scalar lower bound value of
855+
distribution. Values below this cutoff have zero
856+
probability
857+
cut_ub = scalar or string, ='None' if no upper bound cutoff is
858+
given, otherwise is scalar lower bound value of
859+
distribution. Values below this cutoff have zero
860+
probability
861+
W_hat = (R, R) matrix, estimate of optimal weighting matrix
862+
863+
OTHER FUNCTIONS AND FILES CALLED BY THIS FUNCTION:
864+
err_vec2()
865+
866+
OBJECTS CREATED WITHIN FUNCTION:
867+
err = (2, 1) matrix, column vector of two moment error
868+
functions
869+
crit_val = scalar > 0, GMM criterion function value
870+
871+
FILES CREATED BY THIS FUNCTION: None
872+
873+
RETURNS: crit_val
874+
--------------------------------------------------------------------
875+
'''
876+
mu, sigma = params
877+
xvals, unif_vals, cut_lb, cut_ub, W_hat = args
878+
err = err_vec2(xvals, unif_vals, mu, sigma, cut_lb, cut_ub,
879+
simple=False)
880+
crit_val = err.T @ W_hat @ err
881+
882+
return crit_val
883+
```
884+
885+
```{code-cell} ipython3
886+
:tags: []
887+
888+
mu_test = 400
889+
sig_test = 70
890+
cut_lb = 0.0
891+
cut_ub = 450.0
892+
sim_vals = trunc_norm_draws(unif_vals_2, mu_test, sig_test, cut_lb, cut_ub)
893+
mean_sim, var_sim = data_moments2(sim_vals)
894+
mean_mod = mean_sim.mean()
895+
var_mod = var_sim.mean()
896+
err_vec2(data, unif_vals_2, mu_test, sig_test, cut_lb, cut_ub, simple=False)
897+
crit_test = criterion(np.array([mu_test, sig_test]), data, unif_vals_2,
898+
0.0, 450.0, np.eye(2))
899+
print("Average of mean test scores across simulations is:", mean_mod)
900+
print("")
901+
print("Average variance of test scores across simulations is:", var_mod)
902+
print("")
903+
print("Criterion function value is:", crit_test[0][0])
904+
```
905+
906+
Now we can perform the SMM estimation using SciPy's minimize function to choose the values of $\mu$ and $\sigma$ of the truncated normal distribution that best fit the data by minimizing the crietrion function. Let's start with the identity matrix as our estimate for the optimal weighting matrix $W = I$.
907+
908+
```{code-cell} ipython3
909+
:tags: []
910+
911+
mu_init_1 = 300
912+
sig_init_1 = 30
913+
params_init_1 = np.array([mu_init_1, sig_init_1])
914+
W_hat1_1 = np.eye(2)
915+
smm_args1_1 = (data, unif_vals_2, cut_lb, cut_ub, W_hat1_1)
916+
results1_1 = opt.minimize(criterion, params_init_1, args=(smm_args1_1),
917+
method='L-BFGS-B',
918+
bounds=((1e-10, None), (1e-10, None)))
919+
mu_SMM1_1, sig_SMM1_1 = results1_1.x
920+
print('mu_SMM1_1=', mu_SMM1_1, ' sig_SMM1_1=', sig_SMM1_1)
921+
```
922+
923+
```{code-cell} ipython3
924+
:tags: []
925+
926+
mean_data, var_data = data_moments2(data)
927+
print('Data mean of scores =', mean_data, ', Data variance of scores =', var_data)
928+
sim_vals_1 = trunc_norm_draws(unif_vals_2, mu_SMM1_1, sig_SMM1_1, cut_lb, cut_ub)
929+
mean_sim_1, var_sim_1 = data_moments2(sim_vals_1)
930+
mean_model_1 = mean_sim_1.mean()
931+
var_model_1 = var_sim_1.mean()
932+
err_1 = err_vec2(data, unif_vals_2, mu_SMM1_1, sig_SMM1_1, cut_lb, cut_ub,
933+
False).reshape(2,)
934+
print("")
935+
print('Model mean 1 =', mean_model_1, ', Model variance 1 =', var_model_1)
936+
print("")
937+
print('Error vector 1 =', err_1)
938+
print("")
939+
print("Results from scipy.opmtimize.minimize:")
940+
print(results1_1)
941+
```
942+
943+
Let's plot the PDF implied by these SMM estimates $(\hat{\mu}_{SMM},\hat{\sigma}_{SMM})=(612.337, 197.264)$ against the histogram of the data in {numref}`Figure %s <FigSMM_EconScoreSMM1>` below.
944+
945+
```{code-cell} ipython3
946+
:tags: ["remove-output"]
947+
948+
# Plot the histogram of the data
949+
count, bins, ignored = plt.hist(data, 30, density=True,
950+
edgecolor='black', linewidth=1.2, label='data')
951+
plt.title('Econ 381 scores: 2011-2012', fontsize=20)
952+
plt.xlabel('Total points')
953+
plt.ylabel('Percent of scores')
954+
plt.xlim([0, 550]) # This gives the xmin and xmax to be plotted"
955+
956+
# Plot the estimated SMM PDF
957+
dist_pts = np.linspace(0, 450, 500)
958+
plt.plot(dist_pts, trunc_norm_pdf(dist_pts, mu_SMM1_1, sig_SMM1_1, 0.0, 450.0),
959+
linewidth=2, color='k', label='PDF: ($\hat{\mu}_{SMM1}$,$\hat{\sigma}_{SMM1}$)=(612.34, 197.26)')
960+
plt.legend(loc='upper left')
961+
962+
plt.show()
963+
```
964+
965+
```{figure} ../../../images/smm/Econ381scores_smm1.png
966+
---
967+
height: 500px
968+
name: FigSMM_EconScoreSMM1
969+
---
970+
SMM-estimated PDF function and data histogram, 2 moments, identity weighting matrix, Econ 381 scores (2011-2012)
971+
```
972+
973+
That looks just like the maximum likelihood estimate from the {ref}`Chap_MaxLikeli` chapter. Let's see what the criterion function looks like for different values of $\mu$ and $\sigma$.
974+
975+
```{code-cell} ipython3
976+
:tags: []
977+
978+
mu_vals = np.linspace(60, 700, 50)
979+
sig_vals = np.linspace(20, 250, 50)
980+
crit_vals = np.zeros((50, 50))
981+
crit_args = (data, unif_vals_2, cut_lb, cut_ub, W_hat1_1)
982+
for mu_ind in range(50):
983+
for sig_ind in range(50):
984+
crit_params = np.array([mu_vals[mu_ind], sig_vals[sig_ind]])
985+
crit_vals[mu_ind, sig_ind] = criterion(crit_params, *crit_args)[0][0]
986+
987+
mu_mesh, sig_mesh = np.meshgrid(mu_vals, sig_vals)
988+
989+
crit_SMM1_1 = criterion(np.array([mu_SMM1_1, sig_SMM1_1]), *crit_args)[0][0]
990+
991+
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
992+
ax.plot_surface(sig_mesh, mu_mesh, crit_vals, rstride=8,
993+
cstride=1, cmap=cmap1)
994+
ax.scatter(sig_SMM1_1, mu_SMM1_1, crit_SMM1_1, color='red', marker='o',
995+
s=10, label='SMM1 estimate')
996+
ax.view_init(elev=12, azim=30, roll=0)
997+
ax.set_title('Criterion function for values of mu and sigma')
998+
ax.set_xlabel(r'$\sigma$')
999+
ax.set_ylabel(r'$\mu$')
1000+
ax.set_zlabel(r'Crit. func.')
1001+
1002+
plt.show()
1003+
```
1004+
7311005

7321006
(SecSMM_CodeExmp_BM72)=
7331007
### Brock and Mirman (1972) estimation by SMM

images/smm/Econ381scores_smm1.png

134 KB
Loading

0 commit comments

Comments
 (0)