Appendix
Solutions
Exercise 1 Solution
1:
. sysuse lifeexp
(Life expectancy, 1998)
3:
. sysuse sandstone, clear
(Subsea elevation of Lamont sandstone in an area of Ohio)
or
. clear
. sysuse sandstone
(Subsea elevation of Lamont sandstone in an area of Ohio)
4:
If the working directory isn’t convenient, change it using the dialogue box. Then run save sandstone.
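The same steps can also be done from the command line instead of the dialogue box. A minimal sketch, assuming a hypothetical folder path that you would replace with your own:

```stata
* Show the current working directory
pwd

* Change it (this path is a hypothetical placeholder; substitute your own)
cd "C:/Users/yourname/Documents/stata-workshop"

* Load the data and save a copy in the new working directory
sysuse sandstone, clear
save sandstone, replace
```

The replace option is included here only so the sketch can be rerun without an error if sandstone.dta already exists.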
Exercise 2 Solution
1:
. webuse census9, clear
(1980 Census data by state)
2: The data is from the 1980 census, which we can see from the data label (visible in describe). From describe, simple we see that we’ve got two state identifiers, death rate, population, median age and census region.
3: Since the data has 50 rows, it’s a good guess there are no missing states.
4: Looking at describe, we see that the two state identifiers are strings and the rest are numeric.
5: compress. Nothing saved, because Stata had already compressed the data before posting it!
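The checks in items 2 to 5 could be run as follows. This is only a sketch of one way to do it; isid and count are not used in the solution above and are included as optional extra checks.

```stata
webuse census9, clear

* Data label, variable names, storage types and variable labels
describe

* Number of rows (one per state is expected)
count

* Exits with an error if state does not uniquely identify the rows
isid state

* Count rows where the state name is empty
count if missing(state)

* Try to shrink storage types; reports the bytes saved, if any
compress
```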
Exercise 3 Solution
1:
. webuse census9, clear
(1980 Census data by state)
. save mycensus9, replace
file mycensus9.dta saved
The replace option is added in case I run it twice.
2:
. rename drate deathrate
. label variable deathrate "Death rate per 10,000"
3:
. tab region

     Census |
     region |      Freq.     Percent        Cum.
------------+-----------------------------------
         NE |          9       18.00       18.00
    N Cntrl |         12       24.00       42.00
      South |         16       32.00       74.00
       West |         13       26.00      100.00
------------+-----------------------------------
      Total |         50      100.00

. label list cenreg
cenreg:
           1 NE
           2 N Cntrl
           3 South
           4 West
. label define region_label 1 "Northeast" 2 "North Central" 3 "South" 4 "West"
. label values region region_label
. label drop cenreg
. label list
region_label:
           1 Northeast
           2 North Central
           3 South
           4 West
. tab region

Census region |      Freq.     Percent        Cum.
--------------+-----------------------------------
    Northeast |          9       18.00       18.00
North Central |         12       24.00       42.00
        South |         16       32.00       74.00
         West |         13       26.00      100.00
--------------+-----------------------------------
        Total |         50      100.00

4:
. save, replace
file mycensus9.dta saved
Exercise 4 Solution
. use mycensus9, clear
(1980 Census data by state)
1: Use summarize and codebook to take a look at the mean/max/min. No errors detected.
2:
. codebook, compact

Variable   Obs Unique     Mean     Min       Max  Label
-------------------------------------------------------------------------------
state       50     50        .       .         .  State
state2      50     50        .       .         .  Two-letter state abbreviation
deathrate   50     30     84.3      40       107  Death rate per 10,000
pop         50     50  4518149  401851  2.37e+07  Population
medage      50     37    29.54    24.2      34.7  Median age
region      50      4     2.66       1         4  Census region
-------------------------------------------------------------------------------

deathrate and medage both have fewer than 50 unique values. This is because both are heavily rounded. If we saw more precision, there would be more unique entries.
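To see how rounding reduces the number of unique values, one option is to round a variable further and compare the counts. A sketch, where medage_int is a hypothetical variable name introduced only for illustration:

```stata
use mycensus9, clear

* Round median age to the nearest whole year
generate medage_int = round(medage)

* Compare the Unique column for the original and the rounded version
codebook medage medage_int, compact

* Remove the illustration variable again
drop medage_int
```

The rounded version cannot have more unique values than the original, which is the same effect that leaves deathrate and medage with fewer than 50.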
3:
. codebook, problems

           Potential problems in dataset mycensus9.dta

               Potential problem   Variables
--------------------------------------------------
string vars with embedded blanks   state
--------------------------------------------------
This just flags the spaces (" ") inside multi-word state names. Not a real problem!
Exercise 5 Solution
. use mycensus9, clear
(1980 Census data by state)
1:
. generate deathperc = deathrate/10000
. label variable deathperc "Proportion of population deceased in 1980"
. list death* in 1/5

     +---------------------+
     | deathr~e   deathp~c |
     |---------------------|
  1. |       91      .0091 |
  2. |       40       .004 |
  3. |       78      .0078 |
  4. |       99      .0099 |
  5. |       79      .0079 |
     +---------------------+

2:
. generate agecat = 1 if medage < .
. replace agecat = 2 if medage > 26.2 & medage <= 30.1
(30 real changes made)
. replace agecat = 3 if medage > 30.1 & medage <= 32.8
(17 real changes made)
. replace agecat = 4 if medage > 32.8 & medage < .
(1 real change made)
. label define agecat_label 1 "Significantly below national average" ///
>                           2 "Below national average" ///
>                           3 "Above national average" ///
>                           4 "Significantly above national average"
. label values agecat agecat_label
. tab agecat, mi

                              agecat |      Freq.     Percent        Cum.
-------------------------------------+-----------------------------------
Significantly below national average |          2        4.00        4.00
              Below national average |         30       60.00       64.00
              Above national average |         17       34.00       98.00
Significantly above national average |          1        2.00      100.00
-------------------------------------+-----------------------------------
                               Total |         50      100.00

We have no missing data (seen with summarize and codebook in the previous exercise), but it’s good practice to check for it anyway.
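A few ways to check for missing values directly are sketched below. These commands are not part of the solution above; they are one possible way to make the check explicit.

```stata
* Count observations where medage is missing
count if missing(medage)

* Summarise missingness across the variables in memory
misstable summarize

* Stop with an error if any observation was left without a category
assert !missing(agecat)
```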
3:
. bysort agecat: summarize deathrate

-------------------------------------------------------------------------------
-> agecat = Significantly below national average

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |          2        47.5     10.6066         40         55

-------------------------------------------------------------------------------
-> agecat = Below national average

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |         30    81.76667    9.761583         50         94

-------------------------------------------------------------------------------
-> agecat = Above national average

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |         17    91.76471    8.422659         73        104

-------------------------------------------------------------------------------
-> agecat = Significantly above national average

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |          1         107           .        107        107

We see that the groups with a higher median age tend to have higher death rates.
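A more compact view of the same comparison is possible with tabstat. This is an alternative sketch, not the approach used in the solution above:

```stata
* One row per age category, with the same statistics that summarize reports
tabstat deathrate, by(agecat) statistics(n mean sd min max)
```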
4:
. preserve
. gsort -deathrate
. list state deathrate in 1

     +--------------------+
     |   state   deathr~e |
     |--------------------|
  1. | Florida        107 |
     +--------------------+

. gsort +deathrate
. list state deathrate in 1

     +-------------------+
     |  state   deathr~e |
     |-------------------|
  1. | Alaska         40 |
     +-------------------+

. gsort -medage
. list state medage in 1

     +------------------+
     |   state   medage |
     |------------------|
  1. | Florida    34.70 |
     +------------------+

. gsort +medage
. list state medage in 1

     +----------------+
     | state   medage |
     |----------------|
  1. | Utah     24.20 |
     +----------------+

. restore
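The extremes can also be found without re-sorting the data, by using the results that summarize stores. This is an alternative sketch rather than the method used above; unlike listing the first row after a sort, it shows every state tied for the minimum or maximum.

```stata
* summarize stores the minimum and maximum in r(min) and r(max)
quietly summarize deathrate
list state deathrate if deathrate == r(max)

* Rerun summarize so the stored results are fresh before the second list
quietly summarize deathrate
list state deathrate if deathrate == r(min)
```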
5:
. encode state2, gen(statecodes)
. codebook statecodes

-------------------------------------------------------------------------------
statecodes                                        Two-letter state abbreviation
-------------------------------------------------------------------------------

                  Type: Numeric (long)
                 Label: statecodes

                 Range: [1,50]                        Units: 1
         Unique values: 50                        Missing .: 0/50

              Examples: 10  GA
                        20  MD
                        30  NH
                        40  SC