Lab Exercise #1: Spatial Analysis of a Survey on Attitudes about Population Growth in Santa Barbara County

Background

In this dataset you should find an ESRI readable coverage of Santa Barbara County 1990 census data at the tract and block-group level. This dataset is the demographic backdrop for performing a spatial analysis of geo-referenced survey data. I conducted a survey of registered voters in Santa Barbara County in 1995 (this work was my M.A. thesis). You should also find a Word document that contains the survey instrument I used. It consisted of almost one hundred questions that characterized various demographic attributes of the respondent (age, education, income, sex, religiosity, etc.) in addition to their attitudes on a multitude of questions about population, immigration, and the social, economic, and environmental impacts of population growth. 3,000 records were randomly selected from the registered voters of Santa Barbara County. All of these people were called on the phone and agreed to respond to this survey by mail. 750 responses were returned. You should find a file which contains the coded responses of these 750-odd people. The survey instrument requested respondents to provide the closest street intersection to their home location. You will use this information to geo-code the survey. By geo-coding (aka ‘address matching’) this survey you can enable a ‘spatial’ analysis of it. The non-geo-coded responses lend themselves to many traditional statistical analyses; the geo-coded survey allows for many more analyses that incorporate spatially specific attributes. Your mission is to answer the following seventeen spatial and a-spatial questions using the information contained in this dataset.

 

Comments

This is the ‘stripped down’ version of the lab. All this document includes are the questions you are supposed to answer for the lab. It does not include “Betty Crocker” instructions on how to do the lab (i.e. the data manipulations, GIS commands, and JMP analyses). You will learn more if you figure out how to do this on your own or via collaboration with other students in the class. “Betty Crocker” explanations have been produced by students over the years and are available via the course web site. You will be involved in producing a “Betty Crocker” set of instructions for one of these labs. These are made for your use but they are an “as is” product. The seventeen questions start on the next page. Good luck.

 

 

Assessing ‘Representativeness’ of the survey respondents (1-6)

1)      Were the home locations of the respondents to this survey spatially random?

(You don’t have to do the analysis on this question; just answer it and explain how you would have done the analysis if you had to.)

2)      Were the home locations of the respondents to this survey random with respect to population density? (you do have to do the analysis for this one)

 

3)      What level of spatial aggregation (tracts or block-groups) is more appropriate for answering question #2? Explain.

 

4)      How would you test the following? a) Is the age distribution of the respondents to this survey significantly different from the age distribution of the population of Santa Barbara County?  b) Is the income of the respondents to this survey significantly different from that of the population of Santa Barbara County?  c) Are the political party, ethnicity, and gender distributions of the respondents to this survey significantly different from those of the population of Santa Barbara County? (Don’t actually do the tests; just explain your best effort if you had to.)

 

5)      What kinds of problems do you run into when trying to answer the questions posed in #4?

 

6)      Are the age and income of the respondents to this survey independent? Would you expect them to be?  Is the test you would use parametric or non-parametric?

 

 

Simple Demographic Comparisons (7-11)

7)      Based on the responses to the question ‘Abortion should remain legal as defined in “Roe v. Wade”’, are Democrats significantly more ‘Pro-Choice’ than Republicans?

 

8)      In a similar vein, are women more ‘Pro-Choice’ than men (according to this survey)?

 

9)      In a separate survey I found that women were more ‘Pro-Choice’ than men and that Catholic women were significantly ‘More, more Pro-Choice’ than Catholic men. Is this true of the respondents to this survey?  How did you test that?  If you did find the gap between Catholic men and women significantly greater than the gap between men and women in general, what would a statistician call such a phenomenon? If it were true, how would you explain it?

 

10)  Are Republicans different from non-Republicans on the responses to any of the questions about immigration?

 

11)  Is there any relationship between ‘Religiosity’ and responses to the question: ‘The earth has a finite supply of natural resources such as water, arable land, etc. which imposes a limit on the number of people which can sustainably live on it.’?

 

 

 

Factor Analysis (12-14)

      Factor analysis is a data reduction technique that allows you to ‘compress’ your analysis. As you can imagine, we could ask hundreds, if not thousands, of questions of this dataset (e.g. are men different than women on questions 1-50, are Catholics different than Protestants on questions 1-50; there’s a hundred right there). However, people will have similar responses to many of the questions. Factor analysis allows you to capture the co-variance that usually exists between questions. For example, there are five questions about immigration or immigration policy; an anti-immigration person will most likely respond in a similar manner to all of them; consequently, only one question might be necessary to ‘capture’ such a response. Factor analysis is a means of ‘capturing’ this co-variance between questions and ‘reducing’ a many-question survey to a few factors. Labeling or ‘naming’ these factors is one of the ‘arts’ of statisticians. It is now your turn to practice this ‘art’.
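To make the idea of shared co-variance concrete, here is a minimal Python sketch on simulated data (not the survey itself). It assumes three made-up immigration questions driven by one underlying attitude, plus one unrelated question. It extracts only the first principal component of the correlation matrix by power iteration, which is a simplified stand-in for the factor analysis you will run in JMP, not a substitute for it. All names (simulate, first_component_loadings, the question labels) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation of every question with every other question."""
    return [[pearson(a, b) for b in columns] for a in columns]


def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component, found by power iteration."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    if sum(v) < 0:  # the sign of an eigenvector is arbitrary; keep it positive
        v = [-x for x in v]
    return [x * math.sqrt(eigenvalue) for x in v]


def likert(value):
    """Round a continuous score onto a 1-5 response scale."""
    return max(1, min(5, round(value)))


def simulate(n=500, seed=0):
    """Simulated respondents: one latent attitude drives three questions."""
    rng = random.Random(seed)
    immigration = [[], [], []]
    unrelated = []
    for _ in range(n):
        attitude = rng.gauss(0, 1)
        for question in immigration:
            question.append(likert(3 + 1.2 * attitude + rng.gauss(0, 0.6)))
        unrelated.append(likert(3 + rng.gauss(0, 1.2)))
    return immigration + [unrelated]


names = ["immigration_q1", "immigration_q2", "immigration_q3", "unrelated_q"]
corr = correlation_matrix(simulate())
loadings = first_component_loadings(corr)
for name, loading in zip(names, loadings):
    flag = "*" if abs(loading) >= 0.40 else " "
    print(f"{name:15s} {loading:6.2f} {flag}")
```

In this simulated example the three immigration questions should load well above the 0.40 cutoff on the single component while the unrelated question should not. Real survey data will be messier, and a full factor analysis extracts and rotates several factors at once.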

 

12)  Run a factor analysis on the responses to the questions appropriate to such an analysis (we’ll decide these in class). Be sure to save each respondent’s factor scores on each of Factors 1-5. For each of Factors 1-5, list the questions with a factor contribution score of 0.40 or more and study the questions that contributed to that factor. As a result of this study, provide a name for each of the first five factors. (FYI, potential ‘names’ for factors when I analyzed this survey were: “Faith in Government”, “Belief in Adam Smith’s ‘invisible hand’”, and “Keep those Mexicans out of California”.) Try to come up with your own; it’s actually kind of fun :-)

 

13) Do all the statistical tests necessary to fill out the table below. Put an asterisk (*) in the cells that indicate any significant differences on factor scores between demographic variables. For each asterisk provide a detailed description of the nature of the significant differences and some guess as to an explanation for the differences. Your ‘guess’ is referred to as ‘theory’ in academia. If you are really fired up about this exercise, find references to support your theory.

 

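Significant Factor Score Differences (*)

            | Sex | Pol. Party | Religion | Religiosity | Income | Education | Race/Ethnicity
------------+-----+------------+----------+-------------+--------+-----------+---------------
Factor 1:   |     |            |          |             |        |           |
Factor 2:   |     |            |          |             |        |           |
Factor 3:   |     |            |          |             |        |           |
Factor 4:   |     |            |          |             |        |           |
Factor 5:   |     |            |          |             |        |           |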

 

14)  Did filling out the table and answering the questions of #13 make you appreciate factor analysis?  Explain. (If your answer is ‘No’, stop by my office for a spanking :-).)

 

Spatial Analysis: Where’s the Geography?

 

      So far, the analyses performed could have been done in a sociology department. The only ‘geographic’ analyses were the questions about randomness of the survey respondents with respect to population density and space. True spatial analysis of surveys can shed light on interesting questions about the location of the respondent’s home or workplace relative to questions in the survey. For example: Does the distance from a respondent’s home and/or work location to a light rail public transportation system influence their likelihood to support or use it? Does the population density of their home location or home city covary with attitudes about population growth and policy? Does the Hispanic proportion of their home neighborhood have any influence on their attitudes about U.S. immigration policy?
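As a rough illustration of what geo-coding buys you, the Python sketch below attaches a neighborhood attribute to each respondent and then measures how it co-varies with an attitude score. Everything in it is an assumption made for illustration: the block-group IDs, the densities, the field names (block_group, pop_density, factor1), and the scores are made up, and in the lab itself this join would happen in the GIS rather than in a script. It stops at a correlation coefficient; deciding whether a relationship is significant is a separate step.

```python
import math

# Hypothetical block-group attributes; IDs and densities are made up.
block_groups = {
    "BG-001": {"pop_density": 120.0},
    "BG-002": {"pop_density": 2500.0},
    "BG-003": {"pop_density": 8400.0},
}

# Hypothetical respondents; block_group is None when geo-coding failed.
respondents = [
    {"id": 1, "block_group": "BG-001", "factor1": -0.9},
    {"id": 2, "block_group": "BG-001", "factor1": -0.2},
    {"id": 3, "block_group": "BG-002", "factor1": 0.1},
    {"id": 4, "block_group": "BG-003", "factor1": 0.7},
    {"id": 5, "block_group": None, "factor1": 0.4},
    {"id": 6, "block_group": "BG-003", "factor1": 1.1},
]


def attach_attribute(respondents, block_groups, attribute, score_field):
    """Pair each geo-coded respondent's score with a home-area attribute."""
    pairs = []
    for person in respondents:
        area = block_groups.get(person["block_group"])
        if area is None:  # skip respondents that could not be geo-coded
            continue
        pairs.append((area[attribute], person[score_field]))
    return pairs


def pearson(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = attach_attribute(respondents, block_groups, "pop_density", "factor1")
r = pearson(pairs)
print(f"{len(pairs)} matched respondents, r = {r:.2f}")
```

Note that the respondent who could not be geo-coded simply drops out of the spatial analysis, which is worth keeping in mind when you think about bias.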

 

 

14) Test for any significant relationship between each of the factor scores (1-5) and the population density and percent non-white of the respondent’s home location. If you find any significant differences, provide an explanation.

 

 

 

15)  Another test you could do is to test for increased variance in response based on a geographic attribute. For example, suppose that people’s responses to the 5 immigration questions became increasingly extreme (i.e. more 1’s (strongly agree) and 5’s (strongly disagree)) but the mean remained the same as the Hispanic proportion of the population in the respondent’s home location increased. What kind of statistical test would you use to look for that and, if it proved significant, what would the explanation be?
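To picture the scenario, here is a small Python sketch with made-up responses (not taken from the survey) to a single immigration question, for respondents in hypothetical low- and high-Hispanic-proportion neighborhoods. It only shows that two groups can share a mean while differing sharply in spread; choosing and running the appropriate test is left to you.

```python
import statistics

# Made-up 1-5 responses (1 = strongly agree, 5 = strongly disagree).
low_hispanic_share = [3, 3, 3, 2, 4, 3, 3, 2, 4, 3]
high_hispanic_share = [1, 5, 1, 5, 3, 1, 5, 5, 1, 3]

for label, responses in [
    ("low Hispanic share", low_hispanic_share),
    ("high Hispanic share", high_hispanic_share),
]:
    mean = statistics.mean(responses)
    variance = statistics.pvariance(responses)  # population variance
    print(f"{label:20s} mean = {mean:.1f}  variance = {variance:.1f}")
```

A comparison of means alone would see no difference between these two groups, which is why a question about polarization needs a test that looks at spread.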

 

 

 

 

General Questions

 

16) Describe 10 specific problems related to this little research project. Things to consider: the sampling frame was registered voters whereas the census data describe the total population, non-response bias, etc.

 

17) Are these problems significant enough to invalidate any or all of the findings from an analysis of this data?