5  Appendix

5.1 Interpreting Interactions in a Regression Model Overview

5.1.1 Two-Way Interactions

5.1.1.1 General

Let our regression model follow this form:

\[ Y = A + B + A*B \]

Where Y represents our dependent/outcome variable and \(A*B\) represents the interaction between \(A\) and \(B\).

  • The regression coefficient for \(A\) shows the effect of \(A\) when \(B=0\).
  • The regression coefficient for \(B\) shows the effect of \(B\) when \(A=0\).
  • The regression coefficient for \(A*B\) shows how the effect of \(A\) on \(Y\) changes with a one-unit increase in \(B\); equivalently, it shows how the effect of \(B\) on \(Y\) changes with a one-unit increase in \(A\).
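
To see why, it helps to write the model with explicit coefficients (the \(\beta\) notation below is added here for illustration and is not part of the shorthand above):

\[ Y = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 (A \times B) \]

The effect of \(A\) on \(Y\) is then \(\beta_1 + \beta_3 B\): it equals \(\beta_1\) when \(B = 0\), and it changes by \(\beta_3\) for each one-unit increase in \(B\). The same logic applies with the roles of \(A\) and \(B\) reversed.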

5.1.1.2 Two Categorical Variables

  • Let \(A\) represent gender
    • 0=Female
    • 1=Male
  • Let \(B\) represent treatment condition
    • 0=Control
    • 1=Experimental
  • The interaction regression coefficient shows whether the effect of treatment condition is different for males and females.
  • The regression coefficient for \(A\) shows the difference in \(Y\) between males and females for the ‘control’ treatment group.
  • The regression coefficient for \(B\) shows the difference in \(Y\) between treatment and control groups for females.
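
With the dummy coding above, the group means implied by the model make these statements concrete (again writing the coefficients as \(\beta\)'s for illustration):

\[ \begin{aligned} \text{Female, Control:} \quad & \beta_0 \\ \text{Male, Control:} \quad & \beta_0 + \beta_1 \\ \text{Female, Experimental:} \quad & \beta_0 + \beta_2 \\ \text{Male, Experimental:} \quad & \beta_0 + \beta_1 + \beta_2 + \beta_3 \end{aligned} \]

So \(\beta_3\) is the difference between the treatment effect for males \((\beta_2 + \beta_3)\) and the treatment effect for females \((\beta_2)\).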

5.1.1.3 One Categorical and One Continuous Variable

  • Let \(A\) represent gender
    • 0=Female
    • 1=Male
  • Let \(B\) represent a continuous variable: age in years.
  • The interaction regression coefficient shows if the effect of age on \(Y\) is different for males and females.
  • The regression coefficient for \(A\) shows the difference between males and females when age is equal to zero.
  • The regression coefficient for \(B\) shows the effect of age for females.

5.1.1.4 Two Continuous Variables

  • Let \(A\) represent a continuous variable: IQ score.
  • Let \(B\) represent a continuous variable: Age.
  • The interaction regression coefficient shows
    • if the relationship between age and \(Y\) differs according to IQ
    • if the relationship between IQ and \(Y\) differs according to age.
  • The regression coefficient for \(A\) shows the relationship between IQ and \(Y\) when age equals zero.
  • The regression coefficient for \(B\) shows the relationship between age and \(Y\) when IQ equals zero.

5.1.2 Three-Way Interactions

The same principles apply from above. The general model:

\[ Y = A + B + C + A*B + A*C + B*C + A*B*C \]

  • The coefficient for \(A\) shows the effect of \(A\) on \(Y\) when both \(B\) and \(C\) are zero.
  • The coefficient for \(B\) shows the effect of \(B\) on \(Y\) when both \(A\) and \(C\) are zero.
  • The coefficient for \(C\) shows the effect of \(C\) on \(Y\) when both \(A\) and \(B\) are zero.
  • The coefficient for \(A*B\) shows the interaction between \(A\) and \(B\) when \(C\) is zero.
  • The coefficient for \(A*C\) shows the interaction between \(A\) and \(C\) when \(B\) is zero.
  • The coefficient for \(B*C\) shows the interaction between \(B\) and \(C\) when \(A\) is zero.
  • The coefficient for \(A*B*C\) shows whether
    • the interaction between \(A\) and \(B\) differs according to \(C\)
    • the interaction between \(A\) and \(C\) differs according to \(B\)
    • the interaction between \(B\) and \(C\) differs according to \(A\).
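
Writing the three-way model with explicit coefficients (\(\beta\) notation added here for illustration):

\[ Y = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 C + \beta_4 (A \times B) + \beta_5 (A \times C) + \beta_6 (B \times C) + \beta_7 (A \times B \times C) \]

Here the \(A \times B\) interaction equals \(\beta_4 + \beta_7 C\): it is \(\beta_4\) when \(C = 0\) and changes by \(\beta_7\) for each one-unit increase in \(C\) (and symmetrically for the other two-way interactions).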

5.2 Exercise Solutions

5.2.1 Exercise 1

To analyze data properly in SPSS, we need to follow the guidelines set out above. Open exercise1_data.sav and see which guidelines have been ignored.

5.2.1.1 Exercise 1 Solution

Too much information is contained in a single variable (CTSSurgTypeCatCodeDesc, LOS, SURGLOS, DCDate, etc.)

Errors can easily be found by sorting (errors in Year, AGE)

The same content is entered differently within a single variable (SEX, HTN, SMOKING)

Anything else?

5.2.2 Exercise 2

Open exercise2_data (an Excel file). Modify this Excel file so that it can be imported into SPSS properly. Save the file and close it.

Open the file in SPSS (import it). Export this file back into Excel, but only save the following variables: id, salary, minority.

5.2.2.1 Exercise 2 Solution

  • Delete the first three rows of data (remove heading)
  • Remove rows 23 and 24 (contains summary information)
  • Remove the formatting (fill color)
  • Save the file as Exercise2_Data_Ready

  • Close Exercise2_Data_Ready
  • Open SPSS
  • Select File -> Open -> Data
  • Under “Files of Type” select either “All Files” or “Excel” to view Exercise2_Data_Ready, select the file, then select “Open”

  • A window appears
  • Check the box so the variable names will be imported
  • Select the sheet of the Excel file that you would like to be read in, then select “Ok”

  • The Excel data should now open in the Data Editor
  • Delete any “blank” rows or columns of data (indicated by .) by highlighting them, right-clicking, and selecting “Cut”

  • Select File -> Save As
  • Let the file name be Exercise2_Data_Ready_short
  • Change the file type to Excel 97 through 2003 (*.xls)
  • Select the “Variables…” button
  • Select the “Drop All” button
  • Under the “Keep” column, check the box for id, salary, minority
  • Select “Continue”

  • Select “Save”
  • Open the new file (Exercise2_Data_Ready_short) to investigate the results
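
For reference, the import and export steps above correspond roughly to the following pasted syntax. This is a sketch: the file paths, sheet name, and some subcommand defaults are assumptions and may differ on your machine or SPSS version.

    * Import the cleaned Excel file (sheet name is an assumption).
    GET DATA
      /TYPE=XLS
      /FILE='Exercise2_Data_Ready.xls'
      /SHEET=NAME 'Sheet1'
      /READNAMES=ON.
    * Export only id, salary, and minority in Excel 97-2003 format.
    SAVE TRANSLATE OUTFILE='Exercise2_Data_Ready_short.xls'
      /TYPE=XLS
      /VERSION=8
      /FIELDNAMES
      /REPLACE
      /KEEP=id salary minority.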

5.2.3 Exercise 3

Open exercise3_data.sav and go to Variable View. Practice defining the correct attributes to each variable by following the code book.

Code book (Name; Measure; Label; Value Labels; Missing Values):

  • IDnum; Scale
  • sex; Nominal; Respondent’s Sex; 1 = Male, 2 = Female
  • race; Nominal; Race of Respondent; 1 = White, 2 = Black, 3 = Other
  • region; Nominal; Region of the United States; 1 = North East, 2 = South East, 3 = West
  • happy; Ordinal; General Happiness; 0 = NAP, 1 = Very Happy, 2 = Pretty Happy, 3 = Not too Happy, 8 = DK, 9 = NA; Missing: 0, 8, 9
  • life; Ordinal; Is Life Exciting or Dull; 0 = NAP, 1 = Exciting, 2 = Routine, 3 = Dull, 8 = DK, 9 = NA; Missing: 0, 8, 9
  • sibs; Scale; Number of Brothers and Sisters; 98 = DK, 99 = NA; Missing: 98, 99
  • childs; Scale; Number of Children; 8 = Eight or More, 9 = NA; Missing: 9
  • age; Scale; Age of Respondent; 98 = DK, 99 = NA; Missing: 0, 98, 99
  • educ; Scale; Highest Year of School Completed; 97 = NAP, 98 = DK, 99 = NA; Missing: 97, 98, 99
  • paeduc; Scale; Highest Year School, Father; 97 = NAP, 98 = DK, 99 = NA; Missing: 97, 98, 99
  • maeduc; Scale; Highest Year School, Mother; 97 = NAP, 98 = DK, 99 = NA; Missing: 97, 98, 99
  • speduc; Scale; Highest Year School, Spouse; 97 = NAP, 98 = DK, 99 = NA; Missing: 97, 98, 99
  • prestg80; Scale; Occupational Prestige Score; 0 = DK, NA, NAP; Missing: 0
  • occcat80; Nominal; Occupational Category; 1 = Managerial and Professional, 2 = Technical and Sales, 3 = Service, 4 = Farming, Forest, and Fishing, 5 = Production and Craft, 6 = General Labor

5.2.3.1 Exercise 3 Solution

  • In Variable View, the first four columns do not need to be modified
  • To modify the variable label, click in the cell that you wish to edit and start typing in the label
  • To modify the value labels, click the cell that you wish to edit and then select the box with three small dots. The following window will appear:

  • Enter the value and label, then select “Add”. Once all possible value labels are added, select “OK”
  • When value labels (or other attributes such as label or missing) repeat for a variable, you can copy and paste the attribute values. Right click on the cell you want to copy, select copy. Then right click on the cell that you would like to paste in, and select paste.

  • Enter missing values in a similar fashion (here we have discrete missing values)
  • Use the drop down menu for “Measure” to specify the correct measurement type
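
Variable attributes can also be assigned with syntax. The lines below are a sketch for a few of the variables in the code book (the same pattern extends to the rest); exact quoting and defaults may differ slightly by SPSS version.

    * Labels, value labels, missing values, and measurement level for a few variables.
    VARIABLE LABELS sex 'Respondent''s Sex' happy 'General Happiness'.
    VALUE LABELS sex 1 'Male' 2 'Female'
      /happy 0 'NAP' 1 'Very Happy' 2 'Pretty Happy' 3 'Not too Happy' 8 'DK' 9 'NA'.
    MISSING VALUES happy (0, 8, 9) /sibs (98, 99).
    VARIABLE LEVEL sex (NOMINAL) /happy (ORDINAL) /sibs age (SCALE).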

5.2.4 Exercise 4

Open exercise4_data.sav.

Compute a new variable that is the change from beginning salary to current salary for each employee.

Recode the education variable into a new variable according to the following

  • 1=High School or Less (educ<=12)
  • 2=Some College (12<educ<=16)
  • 3=Bachelor’s Degree or Higher (educ>=17)

5.2.4.1 Exercise 4 Solution

Compute a new variable that is the change from beginning salary to current salary for each employee.

  • Transform -> Compute Variable
  • Select “Reset”
  • Enter the following information
  • Target Variable: salchange
  • Double click (or use the arrow) to move salary to the Numeric Expression window
  • Use the calculator box below the numeric expression box to enter a minus sign (alternatively, you could type a minus sign) then select salbegin
  • Select OK, and the new variable will appear in the data set

Recode the education variable into a new variable according to the following

  • 1=High School or Less (educ<=12)
  • 2=Some College (12<educ<=16)
  • 3=Bachelor’s Degree or Higher (educ>=17)
  • Transform -> Recode into different variables
  • Move education (educ) into the Input Variable Output Variable window by double clicking on it or using the arrow
  • Name: EducRecode
  • Label: Leave Blank
  • Click the change button
  • Under old value, select the radio button for Range, LOWEST through value: enter 12
  • Under new value, select the radio button for Value: enter 1
  • Select Add
  • Under old value, select the radio button for Range: enter 13 through 16
  • Under new value, select the radio button for Value: enter 2
  • Select Add
  • Under old value, select the radio button for Range, value through HIGHEST: enter 17
  • Under new value, select the radio button for Value: enter 3
  • Select Add
  • Select Continue
  • Select OK
  • Check the dataset in Data View
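
The same compute and recode can be run as syntax. A minimal sketch (variable names taken from the exercise; EXECUTE forces the transformations to run immediately):

    * New change-in-salary variable and the three-category education recode.
    COMPUTE salchange = salary - salbegin.
    RECODE educ (LOWEST THRU 12=1) (13 THRU 16=2) (17 THRU HIGHEST=3) INTO EducRecode.
    EXECUTE.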

5.2.5 Exercise 5

Open exercise5_data.sav.

Select male managers. What is their average age?

(You can obtain the average age by choosing Analyze -> Descriptive Statistics -> Descriptives and moving “Age of Respondent (age)” to the right hand side.)

Use the “Split File” procedure to get the average age for each job category.

5.2.5.1 Exercise 5 Solution

Select male managers. What is their average age?

  • Check the Values column for sex and occcat80 to see which values correspond to “male” and “manager” (both are 1).
  • Data -> Select Cases
  • Under Select: Select the If Condition is Satisfied radio button and select the If button

  • Enter the following information
    • The expression box should read as follows: sex=1 & occcat80=1
    • Continue

  • Under Output: Select Filter Out Unselected Cases
  • Select OK
  • Inspect the data in Data View
  • Analyze -> Descriptive Statistics -> Descriptives

  • Select the age variable, select OK
  • Turn off the filter!
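
The select-cases steps above roughly correspond to the syntax that SPSS pastes (a sketch; the filter variable name is whatever the dialog creates, typically filter_$):

    * Keep only male managers, describe age, then turn the filter off.
    COMPUTE filter_$ = (sex = 1 & occcat80 = 1).
    FILTER BY filter_$.
    DESCRIPTIVES VARIABLES=age.
    FILTER OFF.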

Use the “Split File” procedure to get the average age for each job category.

  • Data -> Split File
  • Select Compare Groups
  • Select occcat80 (Occupational Category) and move it into the Groups Based On window by double clicking (or using the arrow)
  • Select Sort the File by Grouping Variables
  • Select OK

  • Analyze -> Descriptive Statistics -> Descriptives
  • Select the age variable and OK

  • Turn off the split file!
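
An equivalent syntax sketch for the Split File approach (the Compare Groups option corresponds to LAYERED):

    * Average age within each job category, then turn Split File off.
    SORT CASES BY occcat80.
    SPLIT FILE LAYERED BY occcat80.
    DESCRIPTIVES VARIABLES=age.
    SPLIT FILE OFF.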

5.2.6 Exercise 6

Convert exercise6_data from “Wide” format to “Long” format

5.2.6.1 Exercise 6 Solution

  • Open exercise6_data.sav
  • Select Data -> Restructure to open the Wizard
  • Select “Restructure selected variables into cases” then “Next”

  • How many variable groups do you want to restructure? Select “One” then “Next”

  • Case Group Identification should be changed to “Use selected variable” and the variable should be the ID variable
  • Variables to be transposed: Move the X variables over (X1, X2, X3)
  • Fixed Variable(s): Move Group and Age over
  • Select “Next”

  • How many index variables do you want to create? Select “one” then “Next”

  • What kind of index values? Select “Sequential Numbers” then select “Next”

  • Handling of Variables not Selected: Select “Keep and treat as fixed variable(s)”
  • System Missing or Blank Values in All Transposed Variables: Select “Create a case in the new file”
  • Leave “Case Count Variable” unchecked
  • Select “Next”

  • What do you want to do? Select “Restructure the data now”. In the future you may want to keep the syntax.
  • Select “Finish”
  • The following message appears, click “OK”

  • Inspect the data (and change “trans1” to “X”)
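
The Restructure Wizard pastes a VARSTOCASES command. A hedged sketch of the wide-to-long restructure (here the transposed variable is named X directly rather than the default trans1):

    * Wide to long: one row per ID and index value.
    VARSTOCASES
      /MAKE X FROM X1 X2 X3
      /INDEX=Index1
      /KEEP=ID Group Age
      /NULL=KEEP.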

5.2.7 Exercise 7

Convert exercise7_data from “Long” format to “Wide” format

5.2.7.1 Exercise 7 Solution

  • Open exercise7_data.sav
  • Select Data -> Restructure to open the Wizard
  • Select “Restructure selected cases into variables” then “Next”

  • Identifier Variable(s): ID
  • Index Variable(s): Index1
  • Select “Next”

  • Sort the current data? Yes
  • Select “Next”

  • Order of New Variable Groups: Group by original variable
  • Leave the other options unchecked
  • Select “Next”

  • Select “Restructure the Data Now” and “Finish”

  • The following message will appear, select “OK”. Inspect the data and save!
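
The long-to-wide restructure corresponds to a CASESTOVARS command. A sketch, assuming the data are sorted by the identifier first:

    * Long to wide: one row per ID, grouped by the original variable.
    SORT CASES BY ID Index1.
    CASESTOVARS
      /ID=ID
      /INDEX=Index1
      /GROUPBY=VARIABLE.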

5.2.8 Exercise 8

Open exercise8_data.sav

Part 1: Investigate the variable attributes. Determine which variables are categorical variables (nominal and ordinal), and which variables are continuous (scale).

Obtain the appropriate descriptive statistics for each variable. Remember, continuous variables should be investigated with Descriptives and categorical variables should be investigated with frequency tables.

Hint: Select more than one variable in the Analyze -> Descriptive Statistics -> Descriptives, or Analyze -> Descriptive Statistics -> Frequencies dialog boxes.

Part 2: Assess the distribution of the Occupational Prestige Score (“prestg80”) with both a histogram (normal curve displayed) and a Q-Q plot. Is the assumption that the population of Occupational Prestige Scores is normally distributed reasonable?

Part 3: Compare the average highest year of school completed (“educ”) for males and females.

Hint: First split the file by “sex” (Data -> Split File), then calculate the descriptive statistics. Be sure to return to the Split File menu when you are done with this question and return the dialog box to “Analyze all cases”.

Part 4: Produce a pie chart for the variable “region”. (We didn’t cover this; you can use either Chart Builder or Legacy Dialogs.)

5.2.8.1 Exercise 8 Solution

Open the dataset exercise8_data.sav

Part 1

Investigate the variable attributes. Determine which variables are categorical variables (nominal and ordinal), and which variables are continuous (scale).

  • Select the “Variable View” tab
  • Investigate the labels and measure of each variable

Obtain the appropriate descriptive statistics for each variable in the dataset. Remember, continuous variables should be investigated with 5-point summary descriptives and categorical variables should be investigated with frequency tables.

  • Select Analyze -> Descriptive Statistics -> Descriptives
  • Select the following variables: sibs, childs, age, educ, paeduc, maeduc, speduc, prestg80

  • Select “OK”
  • Notice there are only 519 respondents that have valid data points for all of the continuous variables.

Frequency Tables:

  • Select Analyze -> Descriptive Statistics -> Frequencies
  • Select the following variables: sex, region, race, happy, life, occcat80

  • Investigate the output
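
A syntax sketch equivalent to the two dialogs above (variable lists taken from the exercise):

    DESCRIPTIVES VARIABLES=sibs childs age educ paeduc maeduc speduc prestg80.
    FREQUENCIES VARIABLES=sex region race happy life occcat80.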

Part 2: Assess the distribution of the Occupational Prestige Score (“prestg80”) with both a histogram (normal curve displayed) and a Q-Q plot. Is the assumption that the population of Occupational Prestige Scores is normally distributed reasonable?

  • Histogram in Legacy Dialogs
  • Select Graphs -> Legacy Dialogs -> Histogram
  • Variable: prestg80
  • Check box to display normal curve
  • Select OK

Investigate the output

  • Q-Q Plot
  • Select Analyze -> Descriptive Statistics -> Q-Q Plots
  • Select the variable prestg80
  • Select OK

  • Investigate the output
  • Look to see how well the plotted points follow the solid diagonal line
  • It is particularly important to pay attention to the “tails” (the leftmost and rightmost points) to see if they follow the line
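
A syntax sketch for the histogram and Q-Q plot (the PPLOT subcommands shown are the usual pasted defaults and may vary by version):

    * Histogram with normal curve, then a normal Q-Q plot.
    GRAPH /HISTOGRAM(NORMAL)=prestg80.
    PPLOT
      /VARIABLES=prestg80
      /NOLOG
      /NOSTANDARDIZE
      /TYPE=Q-Q
      /FRACTION=BLOM
      /TIES=MEAN
      /DIST=NORMAL.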

Part 3: Compare the average highest year of school completed (“educ”) for males and females.

  • Set up the dataset such that the output is split by groups based on sex
  • Select Data -> Split File
  • Select “Compare Groups”
  • Select the variable sex for “Groups Based on:”
  • Select “OK”

  • Compute the 5-Point Summary Descriptives for “educ”
  • Select Analyze -> Descriptive Statistics -> Descriptives
  • Select the variable “educ”
  • Select “OK”

  • Investigate the output
  • Males have an average of 13.23 years of education
  • Females have an average of 12.63 years of education

  • Turn the split file feature off
  • Select Data -> Split File
  • Select “Analyze all cases, do not create groups” (Alternatively, “Reset” can be selected)
  • Select “OK”

Part 4: Produce a pie chart for the variable “region”. Use “Legacy Dialogs”.

  • Select Graphs -> Legacy Dialogs -> Pie
  • Under “Data in Chart Are” select “Summaries for groups of cases”
  • Select “Define”

  • Select the variable “region” for “Define Slices by:”
  • The default for “Slices Represent” is “N of cases”; leave this at the default
  • Select “OK”

  • Investigate the output
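
A syntax sketch for the legacy pie chart:

    GRAPH /PIE=COUNT BY region.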

5.3 Additional Exercises

5.3.1 Exercise A1 – Categorical Data Analysis

Question 1

Open exercisea1_data. What percent of respondents said they were “Very Happy”? What about “Not too happy”? “Pretty happy”? Use a graph to display the variable.

Question 2

Do women appear to be more or less happy than men? Would you say this apparent relationship is statistically significant?

Question 3

Create a scatter plot of respondent’s education vs. their spouses’ education. Does this relationship appear to be linear? Add a linear regression line to the plot. Inspect the correlation between the respondent’s education and their spouses’ education. Is this correlation positive or negative? Is it statistically significant?

5.3.2 Exercise A1 Solution

Question 1

Open exercisea1_data. What percent of respondents said they were “Very Happy”? What about “Not too happy”? “Pretty happy”? Use a graph to display the variable.

Solution:

  • We have one categorical variable that we would like to investigate…check the all-on-one-page handout!
  • Analyze -> Descriptive Statistics -> Frequencies

  • Enter the following information
    • Select happy
    • Select Charts
      • Under Chart Type, select Bar Chart
      • Under Chart Values, select Percentages
      • Select Continue
    • Select the box for Display Frequency Tables
    • Select OK
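
A syntax sketch equivalent to the frequency table and percentage bar chart requested above:

    FREQUENCIES VARIABLES=happy
      /BARCHART PERCENT
      /ORDER=ANALYSIS.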

Question 2

Do women appear to be more or less happy than men? Would you say this apparent relationship is statistically significant?

Solution:

  • We are going to compare two categorical variables. From our handout, we will use Pearson Chi-Square crosstabs to do this!
  • Analyze -> Descriptive Statistics -> Crosstabs
  • Enter the following information
    • Rows: sex
    • Columns: happy

  • Select the Statistics button
    • Check the box for Chi-Square
    • Select Continue

  • Select the Cells button
    • Check the box for Row under Percentages (leave the rest as default)
    • Check the box for Adjusted Standardized Residuals under Residuals (leave the rest as default)
    • Select Continue
  • Select the box for Display Clustered Bar Charts
  • Select OK

  • The Pearson Chi-Square statistic indicates that the differences between men and women are statistically significant (pvalue/asymptotic significance<.05).
  • The residuals, clustered bar chart, and row percentages can tell us where these differences arise
    • An adjusted standardized residual (absolute value) greater than two shows us where the differences between groups occur. Here, we see that “not too happy” for males and females has a residual greater than 2.
    • The row proportions indicate that there is a higher proportion of females that responded “not too happy” when compared to males.
    • The clustered bar chart also shows that there are greater numbers of women that indicate that they are “not too happy”.
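
A syntax sketch for the crosstab with the chi-square test, row percentages, adjusted standardized residuals, and clustered bar chart:

    CROSSTABS
      /TABLES=sex BY happy
      /STATISTICS=CHISQ
      /CELLS=COUNT ROW ASRESID
      /BARCHART.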

Question 3

Create a scatter plot of respondent’s education vs. their spouses’ education. Does this relationship appear to be linear? Add a linear regression line to the plot. Inspect the correlation between the respondent’s education and their spouses’ education. Is this correlation positive or negative? Is it statistically significant?

Solution:

  • Graphs -> Legacy Dialogs -> Scatter/Dot
  • Simple Scatter and Define
  • Enter the following information
    • Y Axis: speduc
    • X Axis: educ
    • Select OK
  • Check the output for the scatter plot
  • Double click the plot in the Output Viewer to open Chart Editor
  • Select the button for Add Fit Line at Total (first bar above the plot, axis with straight line plot)
  • Select Linear Fit, Apply, Close
  • Close out of chart editor (red X in the upper right corner) and the updated chart will appear in the Output Viewer.

  • Analyze -> Correlate -> Bivariate
  • Enter the following information
    • Variables: educ, speduc
    • Correlation coefficients: Pearson, Spearman
    • Significance: Two Tailed
    • Check the box for Flag significant correlations
    • Select OK
  • The output indicates that the correlation between education and spouses’ education is positive and statistically significant.
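
A syntax sketch for the scatter plot and the two correlation coefficients (the fit line is still added in the Chart Editor):

    * Scatter plot, Pearson correlation, and Spearman correlation.
    GRAPH /SCATTERPLOT(BIVAR)=educ WITH speduc.
    CORRELATIONS /VARIABLES=educ speduc /PRINT=TWOTAIL NOSIG.
    NONPAR CORR /VARIABLES=educ speduc /PRINT=SPEARMAN TWOTAIL NOSIG.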

5.3.3 Exercise A2 – Continuous Data Analysis

Open exercisea2_data.sav.

Research Question 1: Is there a relationship between a student’s socio-economic status and whether or not the student would participate in a racially insensitive joke?

What techniques would you use to investigate the relationship between SES and whether or not a student would participate in a racially insensitive joke?

Investigate this relationship graphically and statistically. What did you find?

Research Question 2: Is there a relationship between a student’s race and their post intervention behavior intention scale?

What techniques would you use to investigate a student’s race and their post intervention behavior intention scale?

Investigate this relationship graphically and statistically. What did you find?

Research Question 3: Is there a relationship between the race of a student and their socio-economic status?

What techniques would you use to investigate the relationship between race and SES?

Investigate this relationship graphically and statistically. What did you find?

5.3.4 Exercise A2 Solution

Research Question 1: Is there a relationship between a student’s socio-economic status and whether or not the student would participate in a racially insensitive joke?

What techniques would you use to investigate the relationship between SES and whether or not a student would participate in a racially insensitive joke?

ANSWER: SES is an ordinal variable with 4 levels that should be treated as a categorical variable. Whether or not a student would participate in a derogatory joke is measured with the “Joke” variable and it is a categorical variable. The appropriate statistical procedure to use to compare two categorical variables is the Chi-Square Test of Independence (crosstabs). The appropriate graphical procedure is a clustered bar chart.

Investigate this relationship graphically and statistically. What did you find?

ANSWER: There is not a statistically significant relationship between “SES” and “Joke”. We do not have enough evidence to say that there is a relationship between a student’s socio-economic status and whether or not the student would participate in a racially insensitive joke.

Research Question 2: Is there a relationship between a student’s race and their post intervention behavior intention scale? What techniques would you use to investigate a student’s race and their post intervention behavior intention scale?

ANSWER: “Race” is a categorical variable that can take on up to 9 values and a student’s post intervention behavior intention scale (“BIndBehint_post”) is a continuous variable. The appropriate statistical procedure is a one-way ANOVA. The appropriate graphical procedure is a side-by-side box plot.

Investigate this relationship graphically and statistically. What did you find?

ANSWER: There is not a statistically significant relationship between “Race” and “BIndBehint_Post”. We do not have enough evidence to say that there is a relationship between a student’s race and their post intervention behavior intention score.
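
A syntax sketch for the one-way ANOVA and side-by-side box plot used for Research Question 2, assuming Race is numerically coded and the variable names match the dataset exactly:

    * Side-by-side box plots and a one-way ANOVA of the post scale by race.
    EXAMINE VARIABLES=BIndBehint_post BY Race
      /PLOT=BOXPLOT
      /STATISTICS=NONE
      /NOTOTAL.
    ONEWAY BIndBehint_post BY Race
      /STATISTICS=DESCRIPTIVES.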

Research Question 3: Is there a relationship between the race of a student and their socio-economic status? What techniques would you use to investigate the relationship between race and SES?

ANSWER: “Race” and “SES” are both categorical predictors. The appropriate statistical procedure to use to compare two categorical variables is the Chi-Square Test of Independence (crosstabs). The appropriate graphical procedure is a clustered bar chart.

Investigate this relationship graphically and statistically. What did you find?

ANSWER: There is a statistically significant relationship between “Race” and “SES”. There is a significant relationship between a student’s SES and race. Notice the error message under the Chi-Square results table—in this case, we need to verify our statistically significant results with Fisher’s Exact Test (pvalue=.025).

5.3.5 Exercise A3 – Methodology Choice Practice

For the questions below, first determine the appropriate analysis method based on the variables of interest, and then carry out these methods within SPSS.

A) From exercisea3_data_a.sav

  1. Is there a relationship between sex (gender) and job category (jobcat)?
  2. Is there a relationship between job category (jobcat) and minority status (minority)?
  3. Is there a relationship between job category (jobcat) and salary (salary)?
  4. Is there a relationship between experience (jobtime) and salary (salary)?

B) From exercisea3_data_b.sav

  1. Is there a relationship between general happiness (happy) and occupational prestige score (prestg80)?
  2. Is there a relationship between age (age) and occupational prestige score (prestg80)?
  3. Is there a relationship between general happiness (happy) and perception of life being exciting or dull (life)?

Exercise A3 Hints!

A)

  1. Two Categorical Variables -> Clustered Bar Charts, Pearson Chi-Square Crosstabs
  2. Two Categorical Variables -> Clustered Bar Charts, Pearson Chi-Square Crosstabs
  3. One Categorical Variable (3+ Groups) & One Continuous Variable -> One Way ANOVA, Side-by-Side Boxplot
  4. Two Continuous Variables -> Pearson Correlation Coefficient, Scatterplot

B)

  1. One Categorical Variable (3+ Groups) & One Continuous Variable -> One Way ANOVA, Side-by-Side Boxplot
  2. Two Continuous Variables -> Pearson Correlation Coefficient, Scatterplot
  3. Two Categorical Variables -> Clustered Bar Charts, Pearson Chi-Square Crosstabs

5.3.6 Exercise A4 – Case Study I: Salary (Regression)

Open exercisea4_data.

Background

This data set contains information on faculty from Bowling Green State University for the 1993 to 1994 academic year (DeMaris 2004). The purpose of the exercises below is to investigate whether there was any evidence of gender inequality in faculty salaries at BGSU.

Activity 1: Describing the Dataset

Investigate the ‘Faculty’ data set using descriptive statistics, one variable graphing procedures, and bivariate procedures.

Investigate ‘Salary’ with descriptive statistics, box plot, and histogram

Investigate ‘Gender’ with a frequency table and bar chart

Investigate the average salary for males and females separately (descriptive statistics, histogram, side-by-side box plot)

Remember to split the file by the gender variable (‘male’).

The descriptive statistics table above indicates that males earn more than females on average.

Also remember to remove the ‘Split File’ option.

The Boxplot below indicates that males have a higher median salary than females, and both males and females have outliers (observations 148 and 58, respectively).

Perform an independent samples t-test

Remember that the dialogue box for the independent samples t-test is located under ‘Analyze’ then ‘Compare Means’.

The table above indicates that we cannot assume equal variances between males and females (Levene’s Test pvalue<.05). Regardless, we see that the differences between average male and female salaries are large enough to be considered statistically significant (t=10.250, df=297.227, pvalue<.001). The confidence interval for the mean difference between genders is [8550.79, 12614.47]. This is the plausible range of values for the difference between males and females.
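
A syntax sketch for the group comparison above (box plots, descriptives by group, and the independent samples t-test; the 0/1 coding of the male variable is taken from later in this exercise):

    * Salary by gender: box plots, descriptives, and the independent samples t-test.
    EXAMINE VARIABLES=salary BY male
      /PLOT=BOXPLOT
      /STATISTICS=DESCRIPTIVES
      /NOTOTAL.
    T-TEST GROUPS=male(0 1)
      /VARIABLES=salary
      /CRITERIA=CI(.95).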

Activity 2: Simple Linear Regression

The independent samples t-test is one way to model the relationship between the faculty salary (dependent variable of interest) and gender (independent variable). Faculty salary may also be a function of the marketability of the discipline the faculty member is in.

Investigate the correlation between ‘salary’ and ‘market’ and investigate a scatter plot of the two variables

In SPSS, select Analyze -> Correlate -> Bivariate

We see from the table above that there is a statistically significant correlation between faculty salary and marketability of the discipline (r=.407, pvalue<.001).

Perform a simple linear regression where ‘salary’ is the dependent/outcome variable and ‘market’ is the independent/predictor variable

This can be done multiple ways in SPSS. The first way uses the regression menu from ‘Analyze’ while the second uses the ‘General Linear Model’ menu.

Select Analyze -> Regression -> Linear

Notice that the regression menu provides the correlation between the variables included in the model.

The table below provides the R Square value and adjusted R Square value. The proportion of variance in faculty salary explained by marketability of discipline is 16.6%.

The table below indicates that the model fitted is significantly better than what we would expect by chance (F=101.771, pvalue<.001). The null hypothesis is that there is no linear relationship between faculty salary and marketability, and we reject this hypothesis.

The table above provides the parameter estimates for our model. For every one unit increase in marketability, faculty salary increases by an average of $34,545. We could also interpret the beta coefficient for marketability the following way: the effect of a .1 point increase in marketability is associated with an estimated increase in mean salary of $3,454. The constant (intercept) for the model is interpreted as the estimated mean salary when marketability is equal to zero.

Remember that the confidence intervals give us a range of reasonable values for an estimate. The 95% confidence interval for our estimate of market discipline is [$27817, $41272].

The hypothesis tests provided with the ‘t’ statistic and ‘Sig.’ columns help us decide if a particular value (usually zero) is a reasonable estimate. If our estimated beta coefficient for market discipline was zero, then market discipline would not have an effect/relationship with faculty salary. This is our null hypothesis, and we would like to reject this hypothesis. Here we find a significant relationship between market discipline and faculty salary (t=10.088, pvalue<.001).

The second method for generating results for a simple linear regression is described below. Keep in mind that this method for performing a linear regression is preferred when there are categorical predictor/independent variables or interaction terms between independent variables.

Select Analyze -> General Linear Model -> Univariate

The table below provides the descriptive statistics for faculty salary.

The table above indicates that the overall regression model is significant (F=101.771, pvalue<.001). This is indicated by the line for ‘Corrected Model’. The R Squared value is also listed in footnote a. for the table.

The parameter estimates table above provides the same information as the previous coefficients table.

Notice that the results are the same between the two methods that can be used in SPSS to perform a regression. For the remainder of the workshop we will use the second method to obtain our regression results (Analyze -> General Linear Model -> Univariate).
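
Syntax sketches for the two equivalent approaches described above (subcommand defaults trimmed for readability):

    * Method 1: the Regression menu.
    REGRESSION
      /STATISTICS COEFF CI(95) R ANOVA
      /DEPENDENT salary
      /METHOD=ENTER market.
    * Method 2: the General Linear Model (Univariate) menu.
    UNIANOVA salary WITH market
      /PRINT=DESCRIPTIVE PARAMETER
      /DESIGN=market.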

Activity 3: Simple Linear Regression Diagnostics

Perform the necessary regression diagnostics for the regression from Activity 2.

Check the linearity and homogeneity of variance assumptions

Plot the residuals against the predicted values from the model. The residuals should be randomly scattered around zero, and the variability should be constant in the plot.

The scatter plot below does not indicate that either assumption has been violated.

Check for influential points

The scatter plot from Activity 2 did not indicate that there were points of interest.

A leverage point is an unusual point that has the potential to influence the fit of the model. Sort the data set by Leverage in descending order. A rule of thumb is that a point is considered to have large leverage when its leverage value is greater than 2p/n, where p is the number of parameters in the model and n is the number of observations. Here we estimate the intercept and the slope for market, so p=2. This means that high leverage values are greater than 2*2/514=4/514=.0078. There are 53 points with high leverage.

An influential point is one whose removal from the dataset would cause a large change in the fit of the regression model. An influential point may or may not be an outlier, and it may or may not have large leverage. Usually an influential point will be an outlier and/or have large leverage. Sort the data set by the Cook’s distance variable in descending order. This will list the observations with the largest Cook’s distance first. Remember, a distance greater than 1 or 4/n=4/514=.0078 is considered large. The first 16 observations have large Cook’s distances, but we do not have cause to remove them from the data set.

Check the normality assumption for the residuals

The plot below indicates that the normality assumption is reasonable.

A QQ-plot can also be investigated (Analyze -> Descriptive Statistics -> QQ Plot)

Activity 4: Multiple Regression with a Categorical Predictor

Faculty salary appears to be a function of the marketability of the discipline the faculty member is in, but it also may be a function of gender.

Create a multiple regression model where salary is the dependent variable, and both marketability and gender are the predictors.

Select Analyze -> General Linear Model -> Univariate

Select ‘salary’ as the dependent variable, male as the fixed factor, and market as the covariate. Remember that any categorical predictor in a basic regression model should be entered as a ‘fixed factor’, while any continuous predictor is considered a ‘covariate’.

Under ‘Options’, select ‘Descriptive Statistics’, ‘Parameter Estimates’, ‘Residual Plot’

The table below indicates the number of Males and Females in the data set, along with the code that denotes the genders.

The table above indicates that the average salary for males is greater than the average salary for females ($53,499.24 compared to $42,916.60).

The table above indicates that the model fitted is significantly better than what we would expect by chance (F=85.799, pvalue<.001). The null hypothesis is that there is no linear relationship between faculty salary and the model predictors, and we reject this hypothesis. This is indicated by the line for ‘Corrected Model’. The R Squared value is also listed in footnote a. for the table. The proportion of variance in faculty salary explained jointly by marketability of discipline and gender is 25.1%. Notice that this is an increase from the previous model.

The table above provides the parameter estimates for the regression model. The difference in population mean salaries between men and women, when controlling for marketability is estimated to be $8,708.42.

Remember that Dummy variables are always interpreted in relationship to the reference category. The reference category is denoted with a coefficient value of 0 and footnote a. Here, we interpret male=0 (Female) compared to male=1 (Males). Another interpretation of the gender variable: When controlling for marketability, faculty salaries are on average $8,708.42 less for females when compared to males.

The marketability coefficient is now interpreted as the effect of marketability after accounting for gender. For every one unit increase in marketability, faculty salary increases by an average of $29,972.60 holding gender constant. We could also interpret the beta coefficient for marketability the following way: the effect of a .1 point increase in marketability is associated with an estimated increase in mean salary of $2,997 holding gender constant.

Notice that all of the predictor variables in the model are highly significant.

Note that the model fit above is also sometimes referred to as an analysis of covariance (ANCOVA) model. The inclusion of a continuous predictor (marketability) in addition to the factor gender makes this an ANCOVA model.
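
A syntax sketch for the ANCOVA-style model above (gender as a fixed factor, marketability as a covariate):

    * ANCOVA-style model: gender as a factor, marketability as a covariate.
    UNIANOVA salary BY male WITH market
      /PRINT=DESCRIPTIVE PARAMETER
      /PLOT=RESIDUALS
      /DESIGN=male market.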

Create a multiple regression model where salary is the dependent variable, and marketability, time since degree (yearsdg), and gender are the predictors. Investigate the coefficients and R-squared.

First investigate a scatter plot between salary and time since degree.

Select Analyze -> General Linear Model -> Univariate

Select ‘salary’ as the dependent variable, male as the fixed factor, and market and yearsdg as the covariates. Remember that any categorical predictor in a basic regression model should be entered as a ‘fixed factor’, while any continuous predictor is considered a ‘covariate’.

Under ‘Options’, select ‘Descriptive Statistics’, ‘Parameter Estimates’, ‘Residual Plot’

The table below indicates that the R-squared value has increased from the last model to .684, and the model is significant (F=367.562, pvalue<.001).

The estimated population mean salary for women is $2,040.21 less than for men at a given marketability and time since degree. The estimated effect of time since degree is $949 more in mean salary per year since degree (a one-unit increase is one year) when comparing faculty members of the same gender from disciplines with the same marketability. For a given time since degree and gender, a one unit increase in marketability is estimated to increase average salary by $38,402.

Notice that all of the predictor variables in the model are highly significant.

Activity 5: Multiple Regression with an Interaction

Faculty salary appears to be a function of the marketability of the discipline, time since last degree, and gender. Starting salaries could be similar for men and women, but men might receive larger increases over time. An interaction between gender and time since last degree may capture this relationship. Remember, a significant interaction implies that the effect of each variable depends on the value of the other variable—that is to say the effect of time since degree depends on gender and the effect of gender depends on time since degree.

Create a multiple regression model where salary is the dependent variable, marketability, gender, time since degree, and the interaction between gender and time since degree are the predictors.

Create a scatter plot: Select Graphs -> Legacy Dialogs -> Scatter/Dot and choose ‘Simple’ and ‘Define’. Let the y-axis be ‘salary’, the x-axis be ‘yearsdg’, and set markers by ‘male’. Select the graph in chart editor and click the box for ‘Add fit line at subgroups’. The lines for males and females are not parallel, and this is what we are investigating with the proposed interaction term.

Select Analyze -> General Linear Model -> Univariate

Select ‘salary’ as the dependent variable, ‘male’ as the fixed factor, and ‘market’ and ‘yearsdg’ as the covariates. Remember that any categorical predictor in a basic regression model should be entered as a ‘fixed factor’, while any continuous predictor is considered a ‘covariate’.

Under ‘Model’, select ‘Custom’. Under ‘Build Terms’ select ‘Main Effect’ and enter the variables male, market, yearsdg. Under ‘Build Terms’ select ‘Interaction’ and select both male and yearsdg to create the interaction term. Select ‘Continue’. Remember that main effects must always be included in a model that contains interaction terms.

Under ‘Options’, select ‘Descriptive Statistics’, ‘Parameter Estimates’, ‘Residual Plot’

The table below indicates that the model is significant (F=279.95, pvalue<.001) and the R-squared has increased from the last model to .688.

In the presence of interaction terms, the main effect terms have different interpretations. The estimated gender gap when time since degree is zero is not significant. When time since degree is 0 years, the population mean salary for women after adjusting for the other covariates in the model is estimated to be $593 more than men. Notice the confidence intervals range from negative values (women earn less at time since degree=0) to positive values (women earn more at time since degree=0).

The interaction between gender and years since degree (the change in gender gap with years since degree) is significant. For every additional year since degree completion, we see the gender gap between males and females grows by $227.153 on average when adjusting for the other covariates in the model.
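
A syntax sketch for the interaction model above (the custom design lists the main effects plus the male-by-yearsdg interaction):

    * Main effects plus the gender-by-time-since-degree interaction.
    UNIANOVA salary BY male WITH market yearsdg
      /PRINT=DESCRIPTIVE PARAMETER
      /DESIGN=male market yearsdg male*yearsdg.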

Activity 6: Multiple Regression with Diagnostics

This exercise builds on the previous model. Add faculty rank (a three level categorical predictor) to the model and run the regression with diagnostics.

Create a multiple regression model where salary is the dependent variable, marketability, gender, time since degree, faculty rank, and the interaction between gender and time since degree are the predictors.

Create a side-by-side box plot for salary by rank.

Select Analyze -> General Linear Model -> Univariate

Select ‘salary’ as the dependent variable, ‘male’ and ‘rank’ as the fixed factors, and ‘market’ and ‘yearsdg’ as the covariates. Remember that any categorical predictor in a basic regression model should be entered as a ‘fixed factor’, while any continuous predictor is considered a ‘covariate’.

Under ‘Model’, select ‘Custom’. Under ‘Build Terms’ select ‘Main Effect’ and enter the variables male, market, yearsdg, rank. Under ‘Build Terms’ select ‘Interaction’ and select both male and yearsdg to create the interaction term. Select ‘Continue’. Remember that main effects must always be included in a model that contains interaction terms.

Under ‘Save’ select ‘Unstandardized Predicted Values’ and ‘Standardized Residuals’. Under ‘Options’, select ‘Descriptive Statistics’, ‘Parameter Estimates’, ‘Residual Plot’

The table below displays the coding scheme used for the categorical predictors (factors).

The table below provides the descriptive statistics for salary broken out by gender and rank.

The table below indicates the model is significant (F=242.32, pvalue<.001) and the R-squared value is .741 (an increase from the last model).

We can see from the table below that faculty rank is a significant predictor of salary. The table above indicates that rank=1=Assistant Professor, rank=2=Associate Professor, rank=3=Full Professor. The estimated difference in population mean salary between Assistant Professors and Full Professors is $11,168 after adjusting for the other covariates in the model. Put another way: Assistant professors earn on average $11,168 less than Full Professors, all else equal. The estimated difference in population mean salary between Associate Professors and Full professors is $7,819 after adjusting for the other covariates in the model. Put another way: Associate professors earn on average $7,819 less than Full Professors, all else equal.

The residual plot below is given from the output. First investigate the predicted (x axis) vs. std. residual plot to check for the constant variance assumption. There is not strong evidence that the assumption of constant variance has been violated. Linearity can also be assessed with this plot. Next investigate the plot of observed (x axis) and predicted values (y axis) to check the linearity assumption. The points should be symmetrically distributed on a diagonal (45 degree) line if the linearity assumption is not violated (this is approximately what we see here). Note that these plots could be made manually by creating scatter plots from the saved variables (predicted, residuals).

The GLM approach to regression doesn’t allow VIFs to be calculated directly. Multicollinearity can instead be assessed by investigating the correlations or by calculating the VIFs manually. Note that pairwise correlations do not fully capture multicollinearity.

Select Analyze -> Descriptive Statistics -> QQ Plot and select the residual variable. The plot below indicates that the distribution of the error terms is approximately normal. This can also be confirmed with a histogram.
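
A syntax sketch for the full model in this activity, including the saved diagnostics. The name SPSS gives the saved standardized residual (shown here as ZRE_1) is an assumption, so check the Data Editor before running the Q-Q plot:

    * Full model with rank, saving predicted values and standardized residuals.
    UNIANOVA salary BY male rank WITH market yearsdg
      /SAVE=PRED ZRESID
      /PRINT=DESCRIPTIVE PARAMETER
      /PLOT=RESIDUALS
      /DESIGN=male rank market yearsdg male*yearsdg.
    * Q-Q plot of the saved standardized residual (check the saved variable name).
    PPLOT
      /VARIABLES=ZRE_1
      /TYPE=Q-Q
      /DIST=NORMAL.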

5.3.7 Exercise A5 – Case Study II: AIDS (Logistic Regression)

Open exercisea5_data

Background

The data set for this exercise contains information on 109 countries with a number of characteristics measured for each country. The goal of the exercise is to identify whether there may be characteristics of a country that are related to AIDS rate classification. Countries are divided into one of two AIDS rate groupings: 0 = Less than 1 in 100,000 or 1 = More than 1 in 100,000. The variable in the data which holds this information is called aidscat2.

We will fit several models with AIDS rate category as our outcome to identify potential significant predictors of AIDS rate classification. Because the model outcome is no longer a continuous measure, but instead binary, a logistic regression model will be used. The outcome for this type of model isn’t actually the values of the variable (0 or 1) but instead a calculation of the probability of having the value of one of the two categories of the outcome. The model has the form:

\[ \ln\left( \frac{p}{1 - p} \right) = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \ldots + \beta_{k}x_{k} \]

where p is the probability of the outcome variable being equal to 1. The \(\ln\left( \frac{p}{1 - p} \right)\) outcome is known as the log-odds.

Note: It is possible to change which level of the outcome variable the probability references, so you can model the probability that y=0 instead of y=1 if desired.

In all models for this exercise we will consider predicting the probability that a given country will have the higher AIDS rate classification (aidscat2=1). The objective of the models is to see whether the included predictors are significantly associated with the probability of having the higher AIDS rate classification.

Activity 1: Logistic Regression with a Continuous Predictor

For the first example, we will look at a simple example of fitting logistic regression with a continuous predictor. We will consider a single continuous predictor (log base 10 of the gross domestic product per capita, LOG_GDP).

Perform a simple logistic regression where ‘aidscat2’ is the outcome variable and ‘log_gdp’ is the independent/predictor variable.

Select Analyze -> Regression -> Binary Logistic’ to open the logistic regression dialogue box.

Select AIDSCAT2 as the binary dependent variable. Note that the category whose probability we want to model, “high” AIDS rate, is coded as a 1, and the other category, “low” AIDS rate, is coded as 0. SPSS automatically fits the highest valued category probability. If the opposite is desired the outcome variable should be recoded so the opposite category has a higher value. Select LOG_GDP as the only covariate, and click OK to fit the model. Results from fitting this model are included below.

The first table provides some information regarding the cases (rows) used to fit the model. Note that three cases were lost in the analysis due to missing data on the AIDSCAT2 variable. The final analysis sample size was 106.

The coding of the dependent variable is critical to understand. SPSS will model the probability that the “internal value” of the dependent variable is equal to 1. The “internal value” is the value that SPSS recodes the outcome to be to fit the model behind the scenes. This will not always match up to your original coding so check this table carefully. In our case the 0/1 internal coding that SPSS performs matches up with our original 0=less than 1 in 100,000 and 1=more than 1 in 100,000 coding so we will be modeling the probability of being in the “high” AIDS rate.

This initial classification table, located in “Block 0” of the output, shows how well we would do predicting the outcome by chance (i.e., not using any covariates to predict the outcome). Because more countries have the higher classification, we would predict that classification for all of the countries, and we would be correct 64.2% of the time. This table isn’t all that informative by itself; we will compare it to a similar table in the next portion of the output.

This table, also in the “Block 0” portion of the output, shows the maximum likelihood estimate of the intercept term in a logistic regression model without any covariates. This is simply the computed log-odds of the dependent variable being equal to 1.

We’ll scroll down to the “Block 1” portion of the output, which will contain the maximum likelihood estimates of the parameters in our model. These estimates describe the relationship of LOG_GDP to the dependent variable (aidscat2). First, we examine the classification table for our outcome given that we are now considering the LOG_GDP variable as a predictor. This table is similar to predicted values in linear regression. For each country in the data set the predicted probability is computed using the fitted model and values of the country’s covariate. If the predicted probability is greater than 0.5 the country is classified into the ‘high’ AIDS rate group and if it is less than 0.5 it is classified into the ‘low’ AIDS rate group. These classifications are then compared with the actual observed classifications of the countries.

Note that we are actually doing a worse job of predicting the AIDS rate when using LOG_GDP as a predictor (63.2% correct vs. 64.2% correct when we don’t consider any covariates). The predicted probabilities can be saved in the SPSS data set as an option if desired.

Now, we examine the maximum likelihood estimate of the coefficient for LOG_GDP in the logistic regression model:

The estimate of the parameter that represents the coefficient for LOG_GDP in the model is equal to 0.491, with a standard error of 0.329. The Wald statistic reported by SPSS is the squared version of the T statistic (the coefficient divided by its standard error, squared), and is referred to a chi-square distribution with 1 degree of freedom. This Wald statistic has a p-value of 0.135, which suggests that we would not reject a null hypothesis that the coefficient for LOG_GDP is equal to 0. We really don’t have evidence of a significant relationship of LOG_GDP with the AIDS rate outcome.

However, if the relationship were significant, we would conclude that a one-unit increase in LOG_GDP results in an expected increase of 0.491 in the log-odds of being in the higher AIDS rate category. The parameter estimates represent additive changes to the log-odds. If exponentiated, we get the more common odds ratio, which is the multiplicative change to the odds. Here the Exp(B) column holds the exponentiated estimates; for LOG_GDP the odds ratio is equal to 1.634. This value means that the odds of being in the higher AIDS rate category are multiplied by 1.63 with every one-unit increase in LOG_GDP.
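
A syntax sketch for the simple logistic regression above (the CRITERIA values shown are the usual pasted defaults):

    LOGISTIC REGRESSION VARIABLES aidscat2
      /METHOD=ENTER log_gdp
      /PRINT=CI(95)
      /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).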

Activity 2: Logistic Regression with a Categorical Predictor

Now, we’ll consider an example of analyzing a single categorical predictor with two levels: whether or not the country is predominantly Muslim (MUSLIM). MUSLIM is coded as 1 = yes and 0 = no, which we would recommend for any two-level predictors.

Select Analyze -> Regression -> Binary Logistic to re-enter the logistic regression dialogue box. Replace the log_gdp covariate with muslim. We need to identify the predictor as categorical so select the categorical button. Move the MUSLIM covariate into the ‘Categorical Covariates’ list.

Fit the logistic regression model by clicking on OK.

Let’s jump down to Block 1 in the output and first examine the classification table based on the model including the MUSLIM variable:

Note the substantial improvement in prediction accuracy by considering Muslim status! Now, we investigate the maximum likelihood estimate of the coefficient for MUSLIM:

The maximum likelihood estimate of the coefficient is -2.335, with a standard error of 0.537. The Wald statistic based on that estimate is 18.934, and the p-value for that Wald statistic is said to be 0.000 by SPSS (but should be reported as p < 0.001). This p-value suggests that we should reject the null hypothesis that the coefficient is equal to 0, which tells us that Muslim status has a significant relationship with the probability of being in the higher AIDS rate category. Specifically, when MUSLIM is equal to 1 (as opposed to 0), the log-odds of being in the higher category are expected to decrease by 2.335.

This estimate corresponds to an odds ratio of 0.097 (the exponentiated version of the coefficient), which says that the odds of having a higher AIDS classification for a Muslim country are 0.097 times the odds of having a higher AIDS classification for a non-Muslim country. The expected odds are multiplied by 0.097 when a country is Muslim as opposed to non-Muslim. We can also interpret this as reducing the odds of being in the higher AIDS rate category by 90.3% for Muslim countries. Notice here, when we see a decrease in the odds (odds ratio less than 1), we report 1 - Odds Ratio as the percentage (1 - .097 = .903).

Activity 3: Logistic Regression with Multiple Predictors

In this analysis, we hope to find ways to categorize countries into one of two AIDS prevalence categories, based on other data for the countries. We will also discover which pieces of information are useful in predicting AIDS prevalence, and which appear to be unassociated with this prevalence.

Set up a logistic regression model to predict AIDS prevalence category (aidscat2) by considering the following predictors: muslim, log_gdp, babymort, urban, lit_fema, lifeexpf, birth_rt, tropical. Have SPSS report confidence intervals for the odds ratios. (This is found under the ‘Options’ button in the Logistic Regression dialogue box.)

Which predictors appear useful in predicting AIDS category? Do Muslim countries still have lower odds of being in the higher AIDS prevalence category when controlling for the relationships of the other predictors with the outcome? How much lower are the odds of a Muslim country being in the higher AIDS category?

The first table shows us that only 83 countries are used to fit this model; 26 were removed from the analysis due to missing data on one or more of the variables used.

The first classification table (Block 0: Beginning Block) in the output shows you the result of classifying cases strictly by predicting them to be in the category with the largest percentage in the data set (in this case, you would predict a random case to be in the higher AIDS category, since 64.2% of the cases with a valid AIDS category are in the higher AIDS category). We would only be correct 59% of the time predicting by chance.

The ‘Model Summary’ table shows the –2 log-likelihood statistic for our model, as well as two analogs of R-squared from the multiple regression context for a logistic regression model. These are approximations of R-squared in linear regression models, and should not be reported as the same thing; they should really only be used to compare the fits of competing models fitted using the same cases. The Cox & Snell R Square approximation suggests that our predictors explain about 41% of the variation in our response (not bad). The Nagelkerke R Square is a rescaled approximation that is constrained to fall between 0 and 1.

The Block 1 classification table shows an increase in the percentage that is correctly classified (83.1% vs 59%) using the predicted probabilities and a ‘cut-off’ classification probability of 0.5.

Let’s examine the estimated coefficients for the predictors included in our model:

The B column contains the estimated coefficients in the logistic regression model, which indicate the change in the log-odds of “success” (in this case, being in the higher AIDS category) associated with a one-unit increase in each predictor. So, for example, a one-unit increase in Muslim (or being in a Muslim country) decreases the log-odds of being in the higher AIDS category by 5.987, holding all other predictors constant.

The Sig. column provides the results of a significance test for each of the parameters (or coefficients) for the predictors in the model. This shows that Muslim, lit_fema, and tropical are significant predictors of being in the higher AIDS category. If a predictor is significant, changes in the predictor have a significant relationship with the log odds of “success.” The Exp(B) column indicates the factor by which the odds of “success” are multiplied when the predictor increases by one unit, holding the other predictors constant. So, for example, a one-unit increase in Muslim will multiply the odds of being in the higher AIDS category by 0.003, or reduce the odds of being in the higher AIDS category by 99.7%. The Exp(B) factor is known as an odds ratio. The 95% confidence interval for Exp(B) will not contain 1 if the predictor is significant. An odds ratio of 1 means that one-unit changes in the predictor multiply the odds of “success” by 1, or effectively do not change the odds.
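
A syntax sketch for the multiple-predictor logistic regression above, with confidence intervals for the odds ratios requested via PRINT (MUSLIM can additionally be declared categorical with the Categorical button, which pastes a CONTRAST subcommand):

    LOGISTIC REGRESSION VARIABLES aidscat2
      /METHOD=ENTER muslim log_gdp babymort urban lit_fema lifeexpf birth_rt tropical
      /PRINT=CI(95).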

5.4 Final Project

The Data

The cars data sets contain specifications for 406 vehicles from 1970 to 1982, including fuel consumption (mpg), horsepower, weight, acceleration, origin (Europe, Japan, U.S.), and number of cylinders.

The data set contains categorical variables (such as origin), numerical discrete variables (such as number of cylinders), and continuous variables (such as weight and acceleration).

Getting Started

  1. Investigate cars_wave1.xls and cars_wave2.xls and prepare the data for SPSS
  2. Open SPSS and import cars_wave1.xls and cars_wave2.xls from Microsoft Excel.
  3. Merge cars_wave1 and cars_wave2 (add cases).
  4. Save this new SPSS file!
  5. Using the codebook below, define the proper attributes in Variable View
Variable   Position   Label                                        Measurement Level   Missing Values
ID         1          Car ID Number                                Nominal
mpg        2          Miles per Gallon                             Scale               999
engine     3          Engine Displacement (cu. inches)             Scale
horse      4          Horsepower                                   Scale
weight     5          Weight (lbs.)                                Scale
accel      6          Time to Accelerate from 0 to 60 mph (sec)    Scale
year       7          Model Year (modulo 100)                      Ordinal
origin     8          Country of Origin                            Nominal
cylinder   9          Number of Cylinders                          Ordinal

Variable   Value   Label
origin     1       American
           2       European
           3       Japanese
cylinder   3       3 Cylinders
           4       4 Cylinders
           5       5 Cylinders
           6       6 Cylinders
           8       8 Cylinders

Working with Variables

  1. Recode Origin such that 1=Domestic, 0=Foreign. Remember to recode into a different variable. Give this new variable the proper attributes in variable view.
  2. Convert Miles Per Gallon (MPG) to Liters Per 100 Kilometers
    1. Use the Compute function
    2. The formula to use: LP100K=(100*3.785)/(1.609*MPG)
  3. Export this SPSS data set to Microsoft Excel (it’s always good to have a backup!). Export all of the variables.

One Variable Procedures

  1. Get descriptive statistics for all scale variables in the data set.
  2. Get frequency tables for all categorical variables (ordinal or nominal) in the data set.
  3. Create a histogram of Horsepower.
  4. Create a histogram of Weight.
  5. Create a QQ-Plot for Weight (Analyze -> Descriptive Statistics -> QQ Plot; select Weight, leave the other settings at their defaults, and select OK)
  6. Create a bar chart for Origin.
  7. Organize the output by Year (analyze groups of cases separately using Compare Groups). Before proceeding, select only cases with Year not = 0.
    1. Investigate Horsepower (descriptive statistics)
    2. Investigate Weight (descriptive statistics)
    3. What do you see?
    4. Remember to turn the Split File command off before proceeding!

Relationship Between Continuous Y (Horsepower) and Continuous X (Weight)

  1. Create a Scatter Plot with Horsepower as the Y variable and Weight as the X variable.
    1. Add a Linear fit line.
    2. What is the relationship between Horsepower and Weight as shown in this graph?
  2. Calculate the Pearson and Spearman Correlation coefficients for the relationship between Horsepower and Vehicle Weight.
    1. What is the p-value for the Pearson correlation?
    2. What is the actual p-value, as opposed to the p-value that is displayed? To display the actual p-value for the Pearson correlation, double-click on the Pearson correlation output table and double-click on the p-value. (Remember, p-values cannot actually be equal to zero. The p-value you will see displayed, after double-clicking, will be in scientific notation.)

Relationship Between Continuous Y and Numerical Discrete/Ordinal X

  1. Before doing any analyses, select only cases with Year not = 0.
  2. Create a side-by-side boxplot of MPG vs. Year. Choose MPG as the “variable” and Year as the “category axis”.
  3. What is the general trend of MPG across years?

Relationship Between Continuous Y and Nominal X

  1. Create a side-by-side boxplot of Miles per gallon vs Country of Origin (ORIGIN). (Note: even though Origin is numeric in the data set, its values are nominal: American, European, Japanese).
  2. What is the general relationship between MPG and the Origin of the car?
  3. Create a side-by-side Boxplot of Miles per gallon vs. the recoded Country of Origin (1=Domestic, 0=Foreign).

Final Steps

  1. Export the SPSS output into Microsoft Excel
  2. Select a few tables and/or charts that you would like to present and paste them into Microsoft Word

5.4.1 Final Project Solution

The Data:

The cars data sets contain specifications for 406 vehicles from 1970 to 1982, including fuel consumption (mpg), horsepower, weight, acceleration, origin (Europe, Japan, U.S.), and number of cylinders.

The data set contains categorical variables (such as origin), numerical discrete variables (such as number of cylinders), and continuous variables (such as weight and acceleration).

Getting Started

  1. Investigate cars_wave1.xls and cars_wave2.xls and prepare the data for SPSS
    1. Remove the first few rows, which contain heading information
    2. Remove the last row that contains summary information
    3. Save and exit
  2. Open SPSS and import cars_wave1.xls and cars_wave2.xls from Microsoft Excel.
    1. Open SPSS
    2. File -> Open -> Data; select “Excel” under File Type
    3. Browse for the Excel files and select Open
    4. Keep the box checked for “Read variable names from the first row of data”
    5. Leave the worksheet selected as the default
    6. Select OK
  3. Merge cars_wave1 and cars_wave2 (add cases).
    1. Data -> Merge Files -> Add Cases
    2. Select the open data file, then select Continue
    3. The Add Cases dialog will appear
    4. There should not be any “unpaired” variables
    5. Select OK
    6. Your active data file should now have 406 cases
    7. Save this data file and close the “non active” file
  4. Save this new SPSS file!
  5. Using the codebook below, define the proper attributes in Variable View
    1. Be sure to include the missing value code for MPG
    2. You only need to modify the measurement level, variable labels, value labels, and missing values.
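
If you prefer syntax to the menus, the import, merge, and Variable View steps above can be approximated with commands like the following. This is only a sketch: the dataset names wave1 and wave2 and the output file name cars.sav are placeholders, and pasted syntax will include extra defaults (such as the worksheet name on GET DATA).

  * wave1, wave2, and cars.sav below are placeholder names.
  GET DATA /TYPE=XLS /FILE='cars_wave1.xls' /READNAMES=ON.
  DATASET NAME wave1.
  GET DATA /TYPE=XLS /FILE='cars_wave2.xls' /READNAMES=ON.
  DATASET NAME wave2.
  DATASET ACTIVATE wave1.
  ADD FILES /FILE=* /FILE=wave2.
  EXECUTE.
  * Codebook attributes: missing value code, value labels, measurement levels.
  MISSING VALUES mpg (999).
  VALUE LABELS origin 1 'American' 2 'European' 3 'Japanese'
    /cylinder 3 '3 Cylinders' 4 '4 Cylinders' 5 '5 Cylinders' 6 '6 Cylinders' 8 '8 Cylinders'.
  VARIABLE LEVEL ID origin (NOMINAL) /year cylinder (ORDINAL).
  SAVE OUTFILE='cars.sav'.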

Working with Variables:

  1. Recode Origin such that 1=Domestic, 0=Foreign. Remember to recode into a different variable. Give this new variable the proper attributes in variable view.
    1. Transform -> Recode into different variables
    2. Select Country of Origin (ORIGIN)
    3. Name = Domestic
    4. Label = Domestic Car?
    5. Select the Change button
    6. Select the Old and New Values button
    7. Old Value: Value: 1
    8. New Value: Value: 1
    9. Select Add
    10. Old Value: Value: 2
    11. New Value: Value: 0
    12. Select Add
    13. Old Value: Value: 3
    14. New Value: Value: 0
    15. Select Add
    16. Old Value: Value: System or User Missing
    17. New Value: Value: System Missing
    18. Select Add
    19. Select Continue
    20. Select OK
    21. Go to Variable View and enter 1=Domestic, 0=Foreign under Values for this new variable. Also set Decimals to 0.
  2. Convert Miles Per Gallon (MPG) to Liters Per 100 Kilometers
    1. Use the Compute function
    2. The formula to use: LP100K=(100*3.785)/(1.609*MPG)
      1. Transform -> Compute Variable
      2. Target Variable = LP100K
      3. Numerical Expression: (100*3.785)/(1.609*MPG)
      4. Select OK
      5. Go to Variable View and give this variable a label (Liters Per 100 Kilometers)
  3. Export this SPSS data set to Microsoft Excel (it’s always good to have a backup!). Export all of the variables.
    1. File -> Save As
    2. Change Files of Type to Excel
    3. Give a name and select location to save
    4. Save
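
The recode, the new LP100K variable, and the Excel export above can be written in syntax roughly as follows (the backup file name is a placeholder):

  RECODE origin (1=1) (2=0) (3=0) (MISSING=SYSMIS) INTO domestic.
  VARIABLE LABELS domestic 'Domestic Car?'.
  VALUE LABELS domestic 0 'Foreign' 1 'Domestic'.
  FORMATS domestic (F1.0).
  COMPUTE LP100K=(100*3.785)/(1.609*mpg).
  VARIABLE LABELS LP100K 'Liters Per 100 Kilometers'.
  EXECUTE.
  * cars_backup.xls is a placeholder file name.
  SAVE TRANSLATE OUTFILE='cars_backup.xls'
    /TYPE=XLS /VERSION=8 /FIELDNAMES /REPLACE.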

One Variable Procedures:

  1. Get descriptive statistics for all scale variables in the data set.
    1. Analyze -> Descriptive Statistics -> Descriptives
    2. Select
      1. mpg
      2. engine
      3. horse
      4. weight
      5. accel
      6. lp100k
    3. Select OK

  2. Get frequency tables for all categorical variables (nominal/ordinal) in the data set.
    1. Analyze -> Descriptive Statistics -> Frequencies
    2. Select
      1. year
      2. origin
      3. cylinder
      4. domestic
    3. Select OK
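
The same two sets of summaries can be requested with syntax along these lines:

  DESCRIPTIVES VARIABLES=mpg engine horse weight accel LP100K.
  FREQUENCIES VARIABLES=year origin cylinder domestic.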

  3. Create a histogram of Horsepower.
    1. Graphs -> Legacy Dialogs -> Histogram
    2. Variable: Horsepower
    3. Check the box to display normal curve
    4. Select OK
    5. Investigate output

  4. Create a histogram of Weight.
    1. Graphs -> Legacy Dialogs -> Histogram
    2. Variable: Weight
    3. Check the box to display normal curve
    4. Select OK
    5. Investigate output

  5. Create a QQ-Plot for Weight (to help assess normality)
    1. Analyze -> Descriptive Statistics -> QQ Plot
    2. Select Weight, leave others as default settings
    3. Select OK

  6. Create a bar chart for Origin.
    1. Graphs -> Legacy Dialogs -> Bar
    2. Simple, summaries for groups of cases
    3. Select Define
    4. Select Origin for the Category Axis
    5. Select OK
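
The four charts above (the two histograms, the QQ-Plot, and the bar chart) have approximate legacy-syntax equivalents like the following (pasted PPLOT syntax will include a few more default subcommands):

  GRAPH /HISTOGRAM(NORMAL)=horse.
  GRAPH /HISTOGRAM(NORMAL)=weight.
  PPLOT /VARIABLES=weight /TYPE=Q-Q /DIST=NORMAL.
  GRAPH /BAR(SIMPLE)=COUNT BY origin.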

  7. Organize the output by Year (analyze groups of cases separately using Compare Groups). Before proceeding, select only cases with Year not = 0.
    1. Investigate Horsepower (descriptive statistics)
      1. Data -> Select Cases
      2. Select If Condition is Satisfied (select If button)
      3. Enter this condition: year ~= 0
      4. Select Continue
      5. Output: Filter out unselected cases
      6. Select OK
      7. Data -> Split File
      8. Select Compare Groups
      9. Select Model Year (YEAR) for “Groups Based On”
      10. Select “Sort the file by grouping variable”
      11. Select OK
      12. Analyze -> Descriptive Statistics -> Descriptives
      13. Select Horsepower
      14. Select OK

    2. Investigate Weight (descriptive statistics)
      1. Analyze -> Descriptive Statistics -> Descriptives
      2. Select Weight
      3. Select OK

    3. What do you see happening in these two variables over time?
      1. It appears that the average horsepower and average weight are decreasing over time
    4. Remember to turn the Split File command off before proceeding!
      1. Data -> Split File
      2. Select Reset
      3. Select OK
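
In syntax, the select-cases filter, the split file, the descriptives, and the reset step above look approximately like this (filter_$ is the scratch variable SPSS creates when you paste from the Select Cases dialogue):

  USE ALL.
  COMPUTE filter_$=(year ~= 0).
  FILTER BY filter_$.
  EXECUTE.
  SORT CASES BY year.
  SPLIT FILE LAYERED BY year.
  DESCRIPTIVES VARIABLES=horse weight.
  SPLIT FILE OFF.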

Relationship Between Continuous Y (Horsepower) and Continuous X (Weight):

  1. Create a Scatter Plot with Horsepower as the Y variable and Weight as the X variable.
    1. Add a Linear fit line.
      1. Graphs -> Legacy Dialog -> Scatter/Dot
      2. Simple Scatter
      3. Select Define
      4. Y Axis: Horsepower
      5. X Axis: Weight
      6. Select OK
      7. Double click on the chart in the Output Viewer to open Chart Editor
      8. Select “Add Fit Line at Total” Button (lowest row, 5th object inward)
      9. The defaults are sufficient, so close out of the “Add Fit Line at Total” dialog
      10. Close out of chart editor

    2. What is the relationship between Horsepower and Weight as shown in this graph?
      1. There is a strong positive linear relationship
  2. Calculate the Pearson and Spearman Correlation coefficients for the relationship between Horsepower and Vehicle Weight.
    1. What is the p-value for the Pearson correlation?
      1. Analyze -> Correlate -> Bivariate
      2. Select Horsepower and Weight
      3. Select Ok
      4. The p-value is listed as .000
    2. What is the actual p-value, as opposed to the p-value that is displayed? To display the actual p-value for the Pearson correlation, double-click on the Pearson correlation output table and double-click on the p-value. (Remember, p-values cannot actually be equal to zero. The p-value you will see displayed, after double-clicking, will be in scientific notation.)
      1. 1.18068E-120
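
A syntax sketch for the scatter plot and the two correlation coefficients (the fit line still has to be added in the Chart Editor, as described above):

  GRAPH /SCATTERPLOT(BIVAR)=weight WITH horse.
  CORRELATIONS /VARIABLES=horse weight /PRINT=TWOTAIL NOSIG.
  NONPAR CORR /VARIABLES=horse weight /PRINT=SPEARMAN TWOTAIL NOSIG.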

Relationship Between Continuous Y and Numerical Discrete/Ordinal X:

  1. Before doing any analyses, select only cases with Year not = 0.
    1. Data -> Select Cases
    2. Select If Condition is Satisfied (select If button)
    3. Enter this condition: year ~= 0
    4. Select Continue
    5. Output: Filter out unselected cases
    6. Select OK
  2. Create a side-by-side boxplot of MPG vs. Year. Choose MPG as the “variable” and Year as the “category axis”.
    1. Graphs -> Legacy Dialogs -> Boxplot
    2. Simple, Summaries for groups of cases
    3. Select Define
    4. Variable: MPG
    5. Category Axis: Year
    6. Select OK

  3. What is the general trend of MPG across years?
    1. The median MPG appears to increase over time
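
The equivalent boxplot syntax is roughly:

  EXAMINE VARIABLES=mpg BY year
    /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.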

Relationship Between Continuous Y and Nominal X:

  1. Create a side-by-side boxplot of Miles per gallon vs. Country of Origin (ORIGIN). (Note: even though Origin is numeric in the data set, its values are nominal: American, European, and Japanese).
    1. Graphs -> Legacy Dialogs -> Boxplot
    2. Simple, Summaries for groups of cases
    3. Select Define
    4. Variable: MPG
    5. Category Axis: ORIGIN
    6. Select OK

  2. What is the general relationship between MPG and the Origin of the car?
    1. The median MPG appears to be larger for European and Japanese cars when compared to American cars
  3. Create a side-by-side Boxplot of Miles per gallon vs. the recoded Country of Origin (1=Domestic, 0=Foreign).
    1. Graphs -> Legacy Dialogs -> Boxplot
    2. Simple, Summaries for groups of cases
    3. Select Define
    4. Variable: MPG
    5. Category Axis: Domestic (the recoded Country of Origin variable)
    6. Select OK
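
Approximate syntax for both boxplots (assuming the recoded origin variable was named Domestic, as in the recode solution above):

  * domestic is the recoded origin variable created earlier.
  EXAMINE VARIABLES=mpg BY origin
    /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.
  EXAMINE VARIABLES=mpg BY domestic
    /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.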

  4. Create a correlation matrix and scatter plot matrix for Horsepower, Weight, and Year. How strongly are these variables correlated?
    1. Analyze -> Correlate -> Bivariate
    2. Select Horsepower, Weight, and Year
    3. Select OK
    4. Graphs -> Legacy Dialogs -> Scatter/Dot
    5. Matrix Scatter
    6. Select Define
    7. Select Horsepower, Weight, Year under Matrix Variables
    8. Select OK
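
A rough syntax equivalent for the correlation matrix and the matrix scatter plot:

  CORRELATIONS /VARIABLES=horse weight year /PRINT=TWOTAIL NOSIG.
  GRAPH /SCATTERPLOT(MATRIX)=horse weight year.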