Appendix
Solutions
Exercise 1 Solution
1:
. sysuse lifeexp
(Life expectancy, 1998)
3:
. sysuse sandstone, clear
(Subsea elevation of Lamont sandstone in an area of Ohio)
or
. clear
. sysuse sandstone
(Subsea elevation of Lamont sandstone in an area of Ohio)
4: If the working directory isn’t convenient, change it using the dialog box. Then save sandstone.
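For those who prefer commands to the dialog box, the same steps might look like the sketch below, written as it would appear in a do-file. The folder path is a made-up placeholder, not something from the exercise, so substitute your own. The replace option is only there so that save doesn’t fail on a second run.

```stata
* Show the current working directory
pwd
* Change to a more convenient folder (hypothetical path; use your own)
cd "C:/Users/me/Documents/stata-workshop"
* Load the data and save a copy into the new working directory
sysuse sandstone, clear
save sandstone, replace
```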
Exercise 2 Solution
1:
. webuse census9, clear
(1980 Census data by state)
2: The data is from the 1980 census, which we can see from the data label (shown when the data is loaded and in the header of describe). We’ve got two state identifiers, death rate, population, median age and census region.
3: Since the data has 50 rows, it’s a good guess there are no missing states.
4: Looking at describe, we see that the two state identifiers are strings, the rest numeric.
5: compress. Nothing is saved, because Stata already compressed the data before posting it!
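A minimal sketch of how answers 3 to 5 could be checked with commands rather than by eye. The isid and assert lines go beyond what the exercise asks for; they are only one possible way to confirm the guess, and they stop with an error if state repeats or is empty.

```stata
webuse census9, clear
* Overview of variables, storage types, and labels
describe
* Number of observations; 50 suggests one row per state
count
* Errors out unless state uniquely identifies the rows
isid state
* Errors out if any state name is empty
assert !missing(state)
* Try to shrink storage types; reports how many bytes were saved
compress
```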
Exercise 3 Solution
1:
. webuse census9, clear
(1980 Census data by state)
. save mycensus9, replace
(file mycensus9.dta not found)
file mycensus9.dta saved
The replace option is added so the command still works if I run it a second time, when mycensus9.dta already exists.
2:
. rename drate deathrate
. label variable deathrate "Death rate per 10,000"
3:
. tab region
     Census |
     region |      Freq.     Percent        Cum.
------------+-----------------------------------
         NE |          9       18.00       18.00
    N Cntrl |         12       24.00       42.00
      South |         16       32.00       74.00
       West |         13       26.00      100.00
------------+-----------------------------------
      Total |         50      100.00
. label list cenreg
cenreg:
1 NE
2 N Cntrl
3 South
4 West
. label define region_label 1 "Northeast" 2 "North Central" 3 "South" 4 "West"
. label values region region_label
. label drop cenreg
. label list
region_label:
1 Northeast
2 North Central
3 South
4 West
. tab region
Census region |      Freq.     Percent        Cum.
--------------+-----------------------------------
    Northeast |          9       18.00       18.00
North Central |         12       24.00       42.00
        South |         16       32.00       74.00
         West |         13       26.00      100.00
--------------+-----------------------------------
        Total |         50      100.00
4:
. save, replace
file mycensus9.dta saved
Exercise 4 Solution
. use mycensus9, clear
(1980 Census data by state)
1: Use summarize and codebook to take a look at the mean/max/min. No errors detected.
2:
. codebook, compact
Variable    Obs Unique      Mean     Min       Max  Label
-------------------------------------------------------------------------------
state        50     50         .       .         .  State
state2       50     50         .       .         .  Two-letter state abbreviation
deathrate    50     30      84.3      40       107  Death rate per 10,000
pop          50     50   4518149  401851  2.37e+07  Population
medage       50     37     29.54    24.2      34.7  Median age
region       50      4      2.66       1         4  Census region
-------------------------------------------------------------------------------
deathrate and medage both have fewer than 50 unique values. This is because both are heavily rounded. If they were recorded with more precision, there would be more unique entries.
3:
. codebook, problems
Potential problems in dataset mycensus9.dta
Potential problem Variables
--------------------------------------------------
string vars with embedded blanks state
--------------------------------------------------
This just flags state names that contain spaces (" "). Not a real problem!
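To see which values trigger the flag, one option is to list the state names that contain a space. This is only an illustrative check, assuming mycensus9 is still in memory:

```stata
* strpos() returns the position of the first space, or 0 if there is none
list state if strpos(state, " ") > 0
```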
Exercise 5 Solution
. use mycensus9, clear
(1980 Census data by state)
1:
. generate deathperc = deathrate/10000
. label variable deathperc "Proportion of population deceased in 1980"
. list death* in 1/5
     +---------------------+
     | deathr~e   deathp~c |
     |---------------------|
  1. |       91      .0091 |
  2. |       40       .004 |
  3. |       78      .0078 |
  4. |       99      .0099 |
  5. |       79      .0079 |
     +---------------------+
2:
. generate agecat = 1 if medage < .
. replace agecat = 2 if medage > 26.2 & medage <= 30.1
(30 real changes made)
. replace agecat = 3 if medage > 30.1 & medage <= 32.8
(17 real changes made)
. replace agecat = 4 if medage > 32.8
(1 real change made)
. label define agecat_label 1 "Significantly below national average" ///
> 2 "Below national average" ///
> 3 "Above national average" ///
> 4 "Significantly above national average"
. label values agecat agecat_label
. tab agecat, mi
                              agecat |      Freq.     Percent        Cum.
-------------------------------------+-----------------------------------
Significantly below national average |          2        4.00        4.00
              Below national average |         30       60.00       64.00
              Above national average |         17       34.00       98.00
Significantly above national average |          1        2.00      100.00
-------------------------------------+-----------------------------------
                               Total |         50      100.00
We have no missing data (seen with summarize and codebook in the previous exercise), but it’s good practice to check for it anyway.
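As an aside, the same categories should be reproducible in one line with irecode(), which returns 0 if medage <= 26.2, 1 if it falls in (26.2, 30.1], and so on, and missing if medage is missing. This is an alternative sketch, not the approach the exercise expects; agecat2 is a hypothetical name, and the code assumes agecat and agecat_label were created as above.

```stata
* irecode() numbers the intervals from 0, so add 1 to match agecat
generate agecat2 = irecode(medage, 26.2, 30.1, 32.8) + 1
label values agecat2 agecat_label
* Stops with an error if the two versions disagree anywhere
assert agecat2 == agecat
```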
3:
. bysort agecat: summarize deathrate
-------------------------------------------------------------------------------
-> agecat = Significantly below national average
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |          2        47.5      10.6066        40         55
-------------------------------------------------------------------------------
-> agecat = Below national average
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |         30    81.76667     9.761583        50         94
-------------------------------------------------------------------------------
-> agecat = Above national average
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |         17    91.76471     8.422659        73        104
-------------------------------------------------------------------------------
-> agecat = Significantly above national average
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   deathrate |          1         107            .       107        107
We see that the groups with a higher median age tend to have higher death rates.
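If the repeated blocks are hard to compare, a more compact view of the same statistics is available. Two options, either of which should work with the variables defined above:

```stata
* One-way table of means, standard deviations, and frequencies by category
tabulate agecat, summarize(deathrate)
* Same idea with a chosen set of statistics
tabstat deathrate, by(agecat) statistics(n mean sd min max)
```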
4:
. preserve
. gsort -deathrate
. list state deathrate in 1
     +--------------------+
     |   state   deathr~e |
     |--------------------|
  1. | Florida        107 |
     +--------------------+
. gsort +deathrate
. list state deathrate in 1
     +-------------------+
     |  state   deathr~e |
     |-------------------|
  1. | Alaska         40 |
     +-------------------+
. gsort -medage
. list state medage in 1
     +------------------+
     |   state   medage |
     |------------------|
  1. | Florida    34.70 |
     +------------------+
. gsort +medage
. list state medage in 1
     +----------------+
     | state   medage |
     |----------------|
  1. | Utah     24.20 |
     +----------------+
. restore
5:
. encode state2, gen(statecodes)
. codebook statecodes
-------------------------------------------------------------------------------
statecodes                                        Two-letter state abbreviation
-------------------------------------------------------------------------------
                  Type: Numeric (long)
                 Label: statecodes
                 Range: [1,50]                        Units: 1
         Unique values: 50                        Missing .: 0/50
              Examples: 10    GA
                        20    MD
                        30    NH
                        40    SC
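To see how encode mapped the abbreviations to integers, it can help to look at the value label it created and at the raw codes. A short sketch, assuming statecodes was generated as above:

```stata
* encode stores the mapping in a value label named after the new variable
label list statecodes
* Show the underlying integers instead of the labels
list state2 statecodes in 1/5, nolabel
```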