Advanced Stata Workshop - FHSS Research Support Center

Presentation Layout
• Visualization and Graphing
• Macros and Looping
• Panel and Survey Data
• Postestimation
Visualization and Graphing in Stata
[Figure: "Life expectancy at birth vs. GNP per capita". Scatter plot of life expectancy at birth against loggnp, with a histogram (Fraction) of each variable along the margins. Source: 1998 data from The World Bank Group]
Intro To Graphing In Stata
. sysuse auto, clear
. graph twoway scatter mpg weight //Note that you don't need to type graph or twoway
. scatter mpg weight
“graph” is often optional. So is “twoway” in this case.
Note: Nearly all graphing commands start with “graph”, and “twoway” is a large family of graphs.
[Figure: scatter plot produced by the commands above]
Creating Multiple Graphs with “by():”
. twoway scatter mpg weight, by(foreign)
Note that the value label (Domestic, Foreign) is displayed above each graph, and the variable label is displayed in the note at the bottom of the graph (“Graphs by Car type”).
[Figure: two scatter plots of mpg against Weight (lbs.), one panel for Domestic and one for Foreign. Note: Graphs by Car type]
Overlaying “twoway” graphs
. twoway scatter mpg weight || lfit mpg weight
The || tells Stata to put the second graph on top of the first one, so order matters! You don’t need to type “twoway” twice; it applies to both.
. twoway (scatter mpg weight) (lfit mpg weight)
This is another way of writing the command. It doesn’t matter which one you use.
[Figure: both commands draw the same graph, a scatter of Mileage (mpg) against Weight (lbs.) with the Fitted values line on top]
"by()" statements with overlaid graphs
. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)
“qfitci” is a type of graph which plots the prediction line from a quadratic regression, and adds a confidence interval. The “stdf” option specifies that the confidence interval be created on the basis of the standard error of the forecast.
stdf is an option of qfitci. by(foreign) is an option of twoway.
[Figure: panels for Domestic and Foreign, each showing the 95% CI, the Fitted values line, and the Mileage (mpg) scatter against Weight (lbs.). Note: Graphs by Car type]
"by()" statements with overlaid graphs
Another way of writing the previous command is:
. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)
[Figure: the same graph as before, with Domestic and Foreign panels showing the 95% CI, Fitted values, and Mileage (mpg) against Weight (lbs.)]
So:
. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)
This way is easier to read.
. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)
This way is easier to type.
Graphs with Many Options and Overlays
You can make pretty impressive graphs just from code, if you overlay the graphs and
specify certain options like: multiple axes, notes, titles and subtitles, axis titles and
labels, and legends.
Code for Previous Graph
. use http://www.stata-press.com/data/r12/uslifeexp, clear
. generate diff = le_wm - le_bm
. label var diff "Difference"
. #delimit ;
. twoway line le_wm year, yaxis(1 2) xaxis(1 2)
>     || line le_bm year
>     || line diff year
>     || lfit diff year
>     ||,
>     ytitle( "", axis(2) )
>     xtitle( "", axis(2) )
>     xlabel( 1918, axis(2) )
>     ylabel( 0(5)20, axis(2) grid gmin angle(horizontal) )
>     ylabel( 0 20(10)80, gmax angle(horizontal) )
>     ytitle( "Life expectancy at birth (years)" )
>     title( "White and black life expectancy" )
>     subtitle( "USA, 1900-1999" )
>     note( "Source: National Vital Statistics, Vol 50, No. 6"
>           "(1918 dip caused by 1918 Influenza Pandemic)" )
>     legend(label(1 "White males") label(2 "Black males") );
. #delimit cr
This may look scary, but it is actually
fairly straightforward. See the
accompanying do-file for
explanation of each component.
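As an alternative to #delimit, a long graph command in a do-file can be continued across lines with ///. The following is a minimal sketch on the auto data, not the workshop's do-file; the titles, note text, legend labels, and the graph name overlay_demo are illustrative assumptions.

```stata
* Sketch: an overlaid graph with titles, using /// line continuation
sysuse auto, clear
twoway (scatter mpg weight)                              ///
       (lfit mpg weight),                                ///
       title("Mileage vs. weight")                       ///
       subtitle("auto.dta")                              ///
       ytitle("Mileage (mpg)")                           ///
       note("Source: example dataset shipped with Stata") ///
       legend(label(1 "Observed") label(2 "Linear fit")) ///
       name(overlay_demo, replace)   // hypothetical graph name
```

Each /// must be preceded by a space; everything after it on that line is ignored.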
Using the Graph Editor
It is often easier to make changes in the graph editor than to specify all the options in code.
. tsline nci abc
[Figure: graph 1 is the default tsline output, with date on the x axis and lines for NASDAQ Composite Index and ABC.com, Inc. share price. Graph 2 is the edited version, titled "ABC.com Inc. Closing Share Price vs. Nasdaq Composite Index", subtitle "Sep 24, 2009 - June 7, 2010", y axis "Share Price (USD)" from 0 to 16, monthly date labels from Oct 1, 2009 to Jun 1, 2010, caption "Source: CRSP, Bloomberg"]
Let’s make graph 1 into graph 2 by using the graph editor tools.
Recording Edits in the Graph Editor
Before you start making changes, click the record button. After
you are done, click it again, and save your changes as a
recording so you can “play” them back later. We will save this
recording as advanced_workshop_1.
Graph Element: Change
Graph Title: Enter title using quotes to separate lines, color = black
Graph Subtitle: Enter subtitle
Graph Region: Color = Bluish-gray
Y-Axis: Range = 0 to 16 by 2, axis line = medium thick, add title, label angle = horizontal, grid lines = off
X-Axis: title = off, minor ticks = off, suggest # of ticks = 8, alternate spacing of adjacent labels = on, change label format, label size = small, axis line = medium thick
Plot 1 line: color = green, width = thick
Plot 2 line: color = blue, width = thick
Caption: Add caption
Play Your Graph Recording
You can create a graph, open the graph editor, click the green play button, and then play back your recorded edits.
Or, you can play your edits right from the code:
. tsline nci abc, play(advanced_workshop_1)
You can run your recorded edits on a graph of a different type, though in this case not all of your edits will make sense:
. twoway (scatter nci date) (scatter abc date) ///
> , play(advanced_workshop_1)
You can also run all of your recorded edits on a different graph, and just change the title:
. tsline comp_world comp_planet , play(advanced_workshop_1)
[Figure: two graphs with the recorded edits applied, both titled "ABC.com Inc. Closing Share Price vs. Nasdaq Composite Index", subtitle "Sep 24, 2009 - June 7, 2010", caption "Source: CRSP, Bloomberg". One plots NASDAQ Composite Index and ABC.com, Inc. share price; the other plots Computer World share price and Computer Planet share price, with y axis "Share Price (USD)"]
Storing and Moving Your Recordings
Graph recordings are stored as .grec files in your “personal”
folder, under the “grec” folder. Type “personal” to see where
this is; normally it is C:\ado\personal. So by default Stata
should store your .grec files in C:\ado\personal\grec.
. personal
your personal ado-directory is c:\ado\personal\
. dir c:\ado\personal\grec\
   0.4k   2/21/13  9:12  advanced_workshop_1.grec
   0.7k   3/01/12  9:48  jeff_test_recording_graph_edits.grec
   0.9k   5/17/12 15:47  line..grec
   1.3k  11/21/12 10:12  x grid.grec
Unfortunately, if you are not faculty, you are probably using lab computers to use
Stata, and when they are re-imaged, you will lose the files in your grec folder. So
you can store the recordings on your flash drive by clicking the Browse button
when you save your recording. Now, when you are in the graph editor and click
the play button, your recording will not appear in the list because it is not stored
where Stata knows to look for it. Never fear, just click Browse, and navigate to
where your .grec file is. If you want your recording to be available right from code,
as in play(advanced_workshop_1), you will need to move it (at least temporarily)
to the “grec” folder, or write the directory location in the code:
play("E:\flashdrive\Graph Recordings\advanced_workshop_1")
(The quotes are needed because the path contains a space.)
Using Schemes in Graphing
Recordings are great if you are going to be making the same kind of graph a lot. But a
recording for a scatter plot will hardly affect a histogram at all, and might even make it
look terrible. If you want to change the look of all graphs that you make, you may want to
make a scheme. Schemes are text files which tell Stata how to draw graphs.
. sysuse uslifeexp2, clear
. scatter le year
. scatter le year, scheme(economist)
[Figure: the same scatter plot of life expectancy against Year (1900 to 1940), drawn once with the default scheme and once with the economist scheme]
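Besides the scheme() option on a single graph, the default scheme can be changed for the whole session with set scheme. A minimal sketch, assuming the s1mono and s2color schemes listed by graph query, schemes are installed:

```stata
* Sketch: change the default scheme for the rest of the session
sysuse uslifeexp2, clear
set scheme s1mono          // graphs drawn from now on use s1mono
scatter le year
display "`c(scheme)'"      // shows the scheme currently in effect
set scheme s2color         // switch to the s2color scheme
```

Adding the permanently option to set scheme keeps the setting across Stata sessions.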
More on Schemes
. graph query, schemes
Available schemes are
    s2color     see help scheme_s2color
    s2mono      see help scheme_s2mono
    s2manual    see help scheme_s2manual
    s2gmanual
    s2gcolor
    s1color     see help scheme_s1color
    s1mono      see help scheme_s1mono
    s1rcolor    see help scheme_s1rcolor
    s1manual    see help scheme_s1manual
    sj          see help scheme_sj
    economist   see help scheme_economist
Schemes are very powerful, because they let you implement a certain look without
specifying a long series of options in every graph, or running every graph through the
graph editor. However, creating schemes is fairly time-consuming.
For more on creating your own schemes, see:
http://www3.eeg.uminho.pt/economia/nipe/2010_Stata_UGM/papers/Rising.pdf
And http://www.ats.ucla.edu/stat/stata/seminars/stata_graph/graphsem.txt
Manipulating Graphs: Memory vs. Disk
When you draw a graph, it is stored in memory, under the name Graph.
. sysuse auto, clear
. scatter price mpg
If you draw another graph, it replaces the previous one in memory, and is now called Graph.
. scatter price length
If you want to have multiple graphs up at the same time, you can use the name option.
. scatter price mpg, name(scatter1)
graph save writes a graph from memory to disk as a .gph file. The copy in memory stays there until you drop it.
. cd C:\Users\nickj22\Downloads\
. graph save scatter1 mygraph1.gph
graph dir lists all graphs in memory and on disk (in the current directory)
. graph dir
Graph
scatter1
mygraph1.gph
graph drop drops a graph from memory. Graphs contain the data files they represent, so if
the dataset is large, they can actually take up quite a bit of memory.
. graph drop scatter1
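The memory and disk steps above can be strung together in a do-file. This is a sketch under assumptions: the files are written to the current working directory, and the PNG export assumes a Stata installation that supports that format.

```stata
* Sketch: move a graph between memory and disk
sysuse auto, clear
scatter price mpg, name(scatter1, replace)   // graph held in memory
graph save scatter1 mygraph1.gph, replace    // copy written to disk
graph drop scatter1                          // remove it from memory
graph use mygraph1.gph                       // redisplay it from disk
graph export mygraph1.png, replace           // export for a document
```

graph export picks the output format from the file suffix.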
Manipulating Graphs: Demo
Graph manipulation commands are quite useful for exploratory analysis.
See do file for code.
Note: Annotated code is in the do file for all of these
More Example Graphs
Histogram, with overlaid normal distribution
[Figure: histograms of average education level (y axis: Percent) with an overlaid normal density ("normal educ"), drawn as graphs by Census region (NE, N Cntrl, South, West). Source: US Census, 1980 and 1990]
More Example Graphs
Use graph bar to make bar graphs
[Figure: "Average July and January temperatures by regions of the United States". Bar graph with July and January bars for N.E., N. Central, South, and West, each bar labeled with its value. Source: U.S. Census Bureau, U.S. Dept. of Commerce]
More Example Graphs
Use graph combine to combine 3 graphs into one:
[Figure: "Life expectancy at birth vs. GNP per capita". A scatter plot of life expectancy at birth against loggnp, combined with a histogram (Fraction) of each variable. Source: 1998 data from The World Bank Group]
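A combined graph of this kind can be sketched as follows. This is not the workshop's do-file: it assumes the lifeexp example dataset (variables lexp and gnppc), and the graph names and layout options are illustrative.

```stata
* Sketch: build three graphs without drawing them, then combine
sysuse lifeexp, clear
generate loggnp = log10(gnppc)
scatter lexp loggnp, name(main, replace) nodraw
histogram lexp, fraction horizontal name(hy, replace) nodraw
histogram loggnp, fraction name(hx, replace) nodraw
graph combine hy main hx, cols(2)                          ///
    title("Life expectancy at birth vs. GNP per capita")   ///
    note("Source: 1998 data from The World Bank Group")    ///
    name(combined, replace)
```

nodraw keeps the component graphs from popping up; only the combined graph is displayed.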
More Example Graphs
graph matrix is a great alternative to a correlation matrix for investigating relationships between variables.
[Figure: "Correlations among 1998 life-expectancy data". Scatter plot matrix of Avg. annual % growth, Life expectancy at birth, Log GNP per capita, and safewater. Source: The World Bank Group]
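A scatter plot matrix like this one can be sketched with the lifeexp example dataset. The dataset and variable names (popgrowth, lexp, gnppc, safewater) are assumptions about what the workshop used.

```stata
* Sketch: scatter plot matrix next to the correlation matrix
sysuse lifeexp, clear
generate lgnppc = ln(gnppc)
label variable lgnppc "Log GNP per capita"
graph matrix popgrowth lexp lgnppc safewater,               ///
    title("Correlations among 1998 life-expectancy data")   ///
    note("Source: The World Bank Group")
correlate popgrowth lexp lgnppc safewater
```

Adding the half option to graph matrix draws only the lower triangle.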
More Example Graphs
Get data labels (called marker labels in Stata) from the values of another variable.
[Figure: "Life expectancy vs. GNP per capita, North, Central, and South America". Scatter plot of life expectancy against GNP per capita (thousands of dollars), with each point labeled by country name (for example Canada, United States, Mexico, Brazil, Haiti). Data source: World Bank, 1998]
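Marker labels come from the mlabel() option. A minimal sketch, assuming the lifeexp example dataset and assuming that region codes 2 and 3 identify North and South America in that file; the variable gnp000 is created here for illustration.

```stata
* Sketch: label each point with the value of another variable
sysuse lifeexp, clear
generate gnp000 = gnppc/1000
label variable gnp000 "GNP per capita (thousands of dollars)"
* region codes 2 and 3 are an assumption; check with: label list
scatter lexp gnp000 if inlist(region, 2, 3),   ///
    mlabel(country) mlabposition(3)            ///
    title("Life expectancy vs. GNP per capita")
```

mlabposition() takes a clock position, so 3 puts the label to the right of the marker.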
More Example Graphs
xtline on a panel data set can overlay lines for each value of the panel variable.
[Figure: "Calories Consumed by Subject, Jan 1 2002 - Jan 1 2003". Calories consumed against Date, with one line each for Tess, Arnold, and Sam]
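An overlaid panel line plot can be sketched as below. The dataset name xtline1 and its variables person, day, and calories are assumptions (the workshop does not name its data), and webuse needs an internet connection.

```stata
* Sketch: one line per panel, drawn on the same axes
webuse xtline1, clear
xtset person day
xtline calories, overlay                  ///
    title("Calories Consumed by Subject")
```

Without overlay, xtline draws a separate panel for each value of the panel variable.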
Macros
• Macros come in two general types:
1. Globals
   • Exist until Stata is closed
2. Locals
   • Exist until the end of the do file
• Other types of macros exist, but are rarely used
global vs. local
Creating the global
. global names "Ballav Nick ChongMing Joe David"
Creating the local
. local names2 "Jake Steven Jose Tyrell Martin"
. di "`names2'"
Jake Steven Jose Tyrell Martin
. di "$names"
Ballav Nick ChongMing Joe David
.
end of do-file
End of the do file
. di "`names2'"

The local no longer exists, so nothing is displayed.
. di "$names"
Ballav Nick ChongMing Joe David
Conversely, the global still exists.
- References to locals have to be enclosed in single quotes: a left quote (`) before the name and a right quote (') after it
- References to globals have to begin with a $
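The same steps can be run as one do-file. This sketch reuses the names from the slide; the word count and macro drop lines are additions for illustration.

```stata
* Sketch: define, use, and clean up a global and a local
global names "Ballav Nick ChongMing Joe David"
local names2 "Jake Steven Jose Tyrell Martin"
display "$names"
display "`names2'"
local n : word count $names      // extended macro function
display "The global holds `n' names"
macro drop names                 // remove the global when done
display "[$names]"               // now displays empty brackets
```

Dropping globals you no longer need avoids surprises later in a long session, since they persist until Stata is closed.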
When do we need “for” loops?
• If a Stata program involves repetitive actions on a group of variables, files, or other items
• Examples
  • Creating new variables
  • Recoding missing values on a list of variables
  • Merging multiple datasets
  • Labeling variables
Determining what macros already exist
. macro list
names:          Ballav Nick ChongMing Joe David
S_level:        95
F1:             help advice;
F2:             describe;
F7:             save
F8:             use
S_ADO:          UPDATES;BASE;SITE;.;PERSONAL;PLUS;OLDPLACE
S_StataSE:      SE
S_FLAVOR:       Intercooled
S_OS:           Windows
S_OSDTL:        64-bit
S_MACH:         PC (64-bit x86-64)
_names2:        Jake Steven Jose Tyrell Martin
names is the global we created.
S_level through S_MACH are general macros automatically created by Stata.
_names2 is the local we created.
Foreach
• Syntax of the foreach command
– foreach lname {in | of varlist} list {
      commands referring to `lname'
  }
• The open brace must appear on the same line as the foreach;
• Nothing may follow the open brace except, of course, comments; the first command to be executed must appear on a new line;
• The close brace must appear on a line by itself
• Differences between the -in- option and the -of varlist- option in the -foreach- command
– foreach i in variable1-variable5 {
      Stata commands
  }
– The list has only one element, the literal text “variable1-variable5”
– foreach i of varlist variable1-variable5 {
      Stata commands
  }
– The list is expanded as a variable list, so there are five variables, variable1 through variable5
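The difference can be checked by counting loop iterations. A minimal sketch on the auto data, where price-weight spans six consecutive variables; the counter names k_in and k_of are illustrative.

```stata
* Sketch: count how many times each loop body runs
sysuse auto, clear
local k_in = 0
foreach v in price-weight {
    local ++k_in           // runs once: the list is one literal word
}
local k_of = 0
foreach v of varlist price-weight {
    local ++k_of           // runs once per variable in the range
}
display "in: `k_in' element(s); of varlist: `k_of' variable(s)"
```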
Stata commands in recoding variables
Without using the foreach command
recode v1 (99 = .)
recode v2 (99 = .)
recode v3 (99 = .)
recode v4 (99 = .)
foreach command with "in" option
foreach x in v1 v2 v3 v4 {
    recode `x' (99 = .)
}
foreach command with "of varlist" option
foreach x of varlist v1-v4 {
    recode `x' (99 = .)
}
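To try the loop without a real dataset, it can be run on a few made-up rows. The data below are invented purely for illustration.

```stata
* Sketch: recode 99 to missing on toy data
clear
input v1 v2 v3 v4
 1 99  3  4
99  2 99  4
 1  2  3 99
end
foreach x of varlist v1-v4 {
    recode `x' (99 = .)
}
list
* one-line alternative for this task: mvdecode v1-v4, mv(99)
```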
Using macros to store variable names
. global ind_vars = "age iq gender weight anxiety"
Global for ind. vars
. reg depress $ind_vars

      Source |       SS       df       MS              Number of obs =      91
-------------+------------------------------           F(  5,    85) =   10.64
       Model |  14.1382227     5  2.82764454           Prob > F      =  0.0000
    Residual |   22.587052    85  .265730024           R-squared     =  0.3850
-------------+------------------------------           Adj R-squared =  0.3488
       Total |  36.7252747    90  .408058608           Root MSE      =  .51549

------------------------------------------------------------------------------
     depress |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0193698   .0137039    -1.41   0.161    -.0466168    .0078771
          iq |  -.0093197   .0130535    -0.71   0.477    -.0352736    .0166341
      gender |   -.240945   .1438568    -1.67   0.098     -.526971    .0450809
      weight |  -.0188543   .0229981    -0.82   0.415    -.0645807    .0268721
     anxiety |   .5563893   .0831225     6.69   0.000     .3911195    .7216592
       _cons |   2.332141   1.466546     1.59   0.115    -.5837454    5.248027
------------------------------------------------------------------------------
Global for ind. vars
. reg depress $ind_vars sleep satlife

      Source |       SS       df       MS              Number of obs =      89
-------------+------------------------------           F(  7,    81) =   19.34
       Model |  22.2722042     7  3.18174346           Prob > F      =  0.0000
    Residual |  13.3233014    81  .164485202           R-squared     =  0.6257
-------------+------------------------------           Adj R-squared =  0.5934
       Total |  35.5955056    88  .404494382           Root MSE      =  .40557

------------------------------------------------------------------------------
     depress |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0252532   .0109484    -2.31   0.024    -.0470371   -.0034692
          iq |  -.0212878   .0103962    -2.05   0.044     -.041973   -.0006025
      gender |  -.0288896   .1233419    -0.23   0.815    -.2743013     .216522
      weight |   -.017562   .0181686    -0.97   0.337    -.0537117    .0185878
     anxiety |   .3652071    .074345     4.91   0.000     .2172839    .5131304
       sleep |  -.6100973   .1435988    -4.25   0.000    -.8958139   -.3243807
     satlife |  -.4784158   .1009435    -4.74   0.000    -.6792618   -.2775698
       _cons |   4.336996   1.192267     3.64   0.000     1.964759    6.709233
------------------------------------------------------------------------------
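The same pattern works with a local when the variable list is only needed inside one do-file. A minimal sketch on the auto data; the macro name controls and the choice of regressors are illustrative.

```stata
* Sketch: store a list of regressors once, reuse it in several models
sysuse auto, clear
local controls "mpg weight length"
regress price `controls'
regress price `controls' foreign    // same list plus one more variable
```

Changing the list in one place then updates every model that refers to it.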
Running Parallel lists with macros
. **** running parallel lists ****
. local 1 "cat dog cow pig"
. local 2 "meow woof moo oinkoink"
. local n : word count `1'
.
. forvalues i = 1/`n' {
  2. local a : word `i' of `1'
  3. local b : word `i' of `2'
  4. di "`a' says `b'"
  5. }
cat says meow
dog says woof
cow says moo
pig says oinkoink
What each line does:
- local 1 "cat dog cow pig": create a local called “1”
- local 2 "meow woof moo oinkoink": create a local called “2”
- local n : word count `1': create macro n = # of words in macro 1
- local a : word `i' of `1': extract word `i' from local “1”
- local b : word `i' of `2': extract word `i' from local “2”
- di "`a' says `b'": use the new locals in a display command with other text
- The last four lines are the results
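A practical use of parallel lists is renaming variables, with one list of old names and one list of new names. A minimal sketch on the auto data; the new names are invented for illustration.

```stata
* Sketch: rename variables by walking two lists in parallel
sysuse auto, clear
local old "price mpg weight"
local new "cost mileage wt"
local n : word count `old'
forvalues i = 1/`n' {
    local a : word `i' of `old'
    local b : word `i' of `new'
    rename `a' `b'
}
describe cost mileage wt
```

The two lists must have the same number of words and the same order, since only the position links them.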
Creating a program in Stata
. program printit
  1. display "Listing the values of four variables"
  2. list make price mpg foreign
  3. end
.
. printit
Listing the values of four variables

      make                 price   mpg   foreign
  1.  AMC Concord          4,099    22   Domestic
  2.  AMC Pacer            4,749    17   Domestic
  3.  AMC Spirit           3,799    22   Domestic
  4.  Buick Century        4,816    20   Domestic
  5.  Buick Electra        7,827    15   Domestic
  6.  Buick LeSabre        5,788    18   Domestic
  7.  Buick Opel           4,453    26   Domestic
  8.  Buick Regal          5,189    20   Domestic
  9.  Buick Riviera       10,372    16   Domestic
 10.  Buick Skylark        4,082    19   Domestic
 11.  Cad. Deville        11,385    14   Domestic
 12.  Cad. Eldorado       14,500    14   Domestic
 13.  Cad. Seville        15,906    21   Domestic
 14.  Chev. Chevette       3,299    29   Domestic
 15.  Chev. Impala         5,705    16   Domestic
 16.  Chev. Malibu         4,504    22   Domestic
 17.  Chev. Monte Carlo    5,104    22   Domestic
 18.  Chev. Monza          3,667    24   Domestic
 19.  Chev. Nova           3,955    19   Domestic
 20.  Dodge Colt           3,984    30   Domestic
 21.  Dodge Diplomat       4,010    18   Domestic
 22.  Dodge Magnum         5,886    16   Domestic
 23.  Dodge St. Regis      6,342    17   Domestic
 24.  Ford Fiesta          4,389    28   Domestic
 25.  Ford Mustang         4,187    21   Domestic
 26.  Linc. Continental   11,497    12   Domestic
 27.  Linc. Mark V        13,594    12   Domestic
 28.  Linc. Versailles    13,466    14   Domestic
 29.  Merc. Bobcat         3,829    22   Domestic
 30.  Merc. Cougar         5,379    14   Domestic
 31.  Merc. Marquis        6,165    15   Domestic
 32.  Merc. Monarch        4,516    18   Domestic
 33.  Merc. XR-7           6,303    14   Domestic
 34.  Merc. Zephyr         3,291    20   Domestic
 35.  Olds 98              8,814    21   Domestic
 36.  Olds Cutl Supr       5,172    19   Domestic
 37.  Olds Cutlass         4,733    19   Domestic
 38.  Olds Delta 88        4,890    18   Domestic
 39.  Olds Omega           4,181    19   Domestic
 40.  Olds Starfire        4,195    24   Domestic
 41.  Olds Toronado       10,371    16   Domestic
 42.  Plym. Arrow          4,647    28   Domestic
 43.  Plym. Champ          4,425    34   Domestic
 44.  Plym. Horizon        4,482    25   Domestic
 45.  Plym. Sapporo        6,486    26   Domestic
 46.  Plym. Volare         4,060    18   Domestic
 47.  Pont. Catalina       5,798    18   Domestic
 48.  Pont. Firebird       4,934    18   Domestic
 49.  Pont. Grand Prix     5,222    19   Domestic
What each part does:
- program: the command name
- printit: the program name
- display "...": first command to be run when the program is invoked
- list make price mpg foreign: second command to be run when the program is invoked
- end: tells Stata that there are no more commands to be used as part of the program
- Invoke the program by simply typing the program name and then running it in Stata
- The listing above is the result
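A program becomes more reusable when it accepts arguments. The sketch below is an illustrative variant, not part of the workshop: the name printvars and the nrows() option are hypothetical, and syntax is the standard way to parse a variable list and options.

```stata
* Sketch: a small program that takes a variable list and an option
capture program drop printvars      // lets the do-file be rerun
program define printvars
    syntax varlist [, Nrows(integer 5)]
    display "Listing the first `nrows' values of: `varlist'"
    list `varlist' in 1/`nrows'
end

sysuse auto, clear
printvars make price mpg foreign
printvars make price, nrows(3)
```

Without the capture program drop line, running the do-file a second time would stop with an error because the program is already defined.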
SVYset and SVY Prefix
Simple vs. Complex Sample
• Many statistical techniques assume a simple random sample
• Simple random sample: each element of the population has an equal probability of being sampled.
Complex Survey
• Sampling weights
– inverse of the probability of being sampled
– each weight represents the number of population elements that a sampled element stands for
• Clustering
– groups sampled together
– primary sampling units (PSU): first-level clusters
• Stratification
– strata: groups of clusters
– each stratum is sampled separately
Example
• States, Counties, Schools, Students
sample states in different regions
sample counties within each state
sample schools within each county
sample students from schools
svyset
• svyset psu? [pweight=?], strata(?) fpc(?) || ssu?, fpc(?)
psu = primary sampling unit
ssu = secondary sampling unit (the unit sampled at the next stage)
pweight = probability weight
fpc = finite population correction (total # of units in the stratum or cluster that the units at that stage are sampled from)
|| = next stage
SVYSET Examples
• use http://www.stata-press.com/data/r12/multistage
• svyset county [pw=sampwgt], strata(state) fpc(ncounties) || school, fpc(nschools)
• save highschool
• use highschool
• svyset
SVY Prefix Examples
• svy: proportion race sex
• svy: tab race sex, ci
• svy: tab race sex, count ci
• svy, subpop(if sex==1): mean weight height
• svy, subpop(if sex==2): mean weight height, over(race)
• svy: reg weight sex
Note: subpop() is preferred over an “if” qualifier, because with subpop() Stata still uses all cases when estimating standard errors.
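The effect of subpop() can be seen in the stored results: the full sample stays in the estimation, and only the subpopulation contributes to the mean. A minimal sketch using the dataset and svyset line from the examples above; it needs an internet connection, and the scalar names are illustrative.

```stata
* Sketch: subpop() keeps all observations in the design
use http://www.stata-press.com/data/r12/multistage, clear
svyset county [pw=sampwgt], strata(state) fpc(ncounties) || school, fpc(nschools)
svy, subpop(if sex==1): mean weight
scalar n_all = e(N)        // observations used for the design
scalar n_sub = e(N_sub)    // observations in the subpopulation
display "All obs: " n_all "   subpopulation obs: " n_sub
```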
Take-home Message
• Ask what sampling design was used for your data before running the analysis.
• If the data come from a complex survey, consider svyset or multilevel modeling.
xtset and xtprefix
xtset—Declare Panel Data
• xtset panelvar: specify the unit observed repeatedly
• xtset panelvar timevar [, tsoptions]: also specify the time variable
• xtset: display the current xtset settings
• xtset, clear: clear the xtset settings
Menu
Statistics > Longitudinal/panel data > Setup and utilities > Declare dataset to be panel data
Time-Unit Options
• [unitoptions]: specify the units of time
clocktime, daily, weekly, monthly, quarterly, halfyearly, yearly…
• [deltaoption]: specify the duration between observations
delta(#), e.g. delta(2)
delta((exp)), e.g. delta((7*24))
delta(# units), e.g. delta(10 min) or delta(7 days)
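The delta() option matters as soon as time-series operators are used. A minimal sketch on invented data observed every two years; the variable names are illustrative.

```stata
* Sketch: with delta(2), one period is two years, so L.y looks back two years
clear
input id year y
1 2000 10
1 2002 12
1 2004 15
2 2000 20
2 2002 21
2 2004 25
end
xtset id year, delta(2)
generate y_lag = L.y
list, sepby(id)
```

Without delta(2), L.y would look for the year immediately before (2001, 2003), which is not in the data, and y_lag would be missing everywhere.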
Xtdescribe—pattern of xt data
• xtdescribe [if] [in] [, options]
Options:
patterns(#): e.g. p(10) displays at most 10 patterns
width(#): e.g. w(80) displays 80 columns
Menu
Statistics > Longitudinal/panel data > Setup and utilities > Describe pattern of xt data
Examples
• use http://www.stata-press.com/data/r12/nlswork
• xtset
• browse
• xtdes, p(20)
• xtsum hours
• xttab race
• xtreg ln_w grade age ttl_exp tenure south, mle
Postestimation in Stata
Generating variables with fitted values
• After a regression, use the “predict newvar” syntax to create a new variable that contains the fitted values for each observation.
• If the model is fitted only for a limited sample, restrict predict to that sample to get the predicted values for it; Stata marks the estimation sample with e(sample).
Generating variables with residuals
• After a regression, use the “predict newvar, r” syntax to create a new variable that contains the residuals for each observation.
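Both uses of predict can be sketched on the auto data. The model and the variable names yhat and resid are illustrative, not taken from the workshop's do-file.

```stata
* Sketch: fitted values and residuals for the estimation sample only
sysuse auto, clear
regress price mpg weight if foreign == 0
predict yhat if e(sample)               // fitted values
predict resid if e(sample), residuals   // residuals
summarize yhat resid
```

Without if e(sample), predict would also fill in values for the foreign cars that were left out of the regression.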
Reformat and write regression tables to
document files
• The ‘outreg’ command can be used to reformat and write regression tables to document files
• outreg has lots of options that let us customize the look of the output table.
Margins
• margins can be useful for understanding regression results
• Example: in a regression of price on weight and weight squared, the coefficient on weight is misleading, because an increase in weight affects both weight and weight squared. So, the total effect depends on the starting value of weight.
• margins with the dydx(weight) and atmeans options sets the variables to their means and finds the derivative of expected price with respect to weight at that point.
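A sketch of this kind of example on the auto data follows. The exact regression from the slide is not reproduced in the text, so the covariates here are assumptions; the key point is that weight squared is entered with factor-variable notation so that margins knows the two terms move together.

```stata
* Sketch: marginal effect of weight in a model with weight squared
sysuse auto, clear
regress price c.weight##c.weight i.foreign mpg
margins, dydx(weight) atmeans
```

If weight squared were generated as a separate variable, margins would treat it as unrelated to weight and report only the coefficient on weight.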
Marginsplot
• Often, the results from margins can be hard to read.
• The command ‘marginsplot’ can be used to visualize the results and understand them better.
• Example
[Figure: "Average Marginal Effects". Effects on Pr(Diabetes) of 1.black and 1.female, plotted against age in years from 20 to 70]
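A plot like this can be sketched as below. The dataset nhanes2 and the model specification are assumptions, chosen because they produce the same axis and legend labels; webuse needs an internet connection, and the graph name mplot is illustrative.

```stata
* Sketch: average marginal effects at several ages, then plot them
webuse nhanes2, clear
logistic diabetes i.black i.female age
margins, dydx(black female) at(age=(20(10)70)) vsquish
marginsplot, name(mplot, replace)
```

marginsplot always plots the results of the most recent margins command, so it must be run immediately after it.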
Using saved results
• Stata stores results from a command in various forms: scalars, strings, matrices, etc. Such results are called returned results
• Returned results can be used to make other computations in Stata
• We can type ‘return list’ after we run a command to see the returned results
• We can use the returned results like variables and perform computations
• Example: gen range = r(max) - r(min)
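A minimal sketch of the workflow on the auto data. It assumes summarize is the command whose results are used; the scalar name price_range is illustrative.

```stata
* Sketch: inspect and reuse r() results from summarize
sysuse auto, clear
summarize price
return list
scalar price_range = r(max) - r(min)
display "Range of price: " price_range
```

r() results are overwritten by the next command that returns results, so copy anything you need into a scalar, local, or variable right away.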
Using saved results contd…
• Results are stored mainly as r() class or e() class, depending on the command used
• To see r() class results, type return list; to see e() class results, type ereturn list
• Matrices in returned results can be used as regular matrices.
• More advanced computations with matrices can be done in Mata, which is a matrix language built into Stata.
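For example, the coefficient vector and its covariance matrix can be copied out of e() and used like any other matrix. A minimal sketch on the auto data; the model and the names b, V, and se_mpg are illustrative.

```stata
* Sketch: use e() matrices after a regression
sysuse auto, clear
regress price mpg weight
ereturn list
matrix b = e(b)                  // 1 x 3 row vector of coefficients
matrix V = e(V)                  // 3 x 3 covariance matrix
matrix list b
scalar se_mpg = sqrt(V[1,1])     // standard error of the mpg coefficient
display "SE of mpg coefficient: " se_mpg
```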
Post estimation statistics
• estat ic
  - displays the AIC and BIC information criteria
  - available only after commands that report a log likelihood
  - given two models fitted to the same data, the one with the smaller AIC and BIC values is preferred
• estat vce
  - displays the covariance matrix estimates
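Comparing two models by AIC and BIC can be sketched as follows, on the auto data. The two specifications and the stored names m1 and m2 are illustrative.

```stata
* Sketch: compare information criteria across two models
sysuse auto, clear
regress price mpg
estat ic
estimates store m1
regress price mpg weight foreign
estat ic
estimates store m2
estimates stats m1 m2       // both models side by side
estat vce                   // covariance matrix of the last model
```

The comparison is only meaningful when both models are fitted to the same observations.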
Postfile
• Results can be stored in a Stata dataset using the ‘postfile’ command
• This can be useful when we have to run a lot of regressions, for example in Monte Carlo simulations.
• Let’s consider an example from the Stata manual:
Suppose we want the means and variances from 10,000 randomly constructed 100-observation samples of data, and we want to store the results in results.dta.
We could do that as follows (refer to the do file)
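A sketch of how this simulation could look is given below. It follows the general postfile, post, postclose pattern rather than reproducing the workshop's do-file; the seed, the variable name x, and the use of rnormal() are assumptions.

```stata
* Sketch: collect the mean and variance of 10,000 simulated samples
set seed 12345
tempname sim
postfile `sim' mean var using results, replace
quietly {
    forvalues i = 1/10000 {
        drop _all
        set obs 100
        generate x = rnormal()
        summarize x
        post `sim' (r(mean)) (r(Var))   // one row per sample
    }
}
postclose `sim'
use results, clear
summarize mean var
```

Each call to post adds one observation to results.dta, so the data in memory can be dropped and rebuilt inside the loop without losing the collected results.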
