1)
Were the home locations of the respondents to this survey spatially random?
No, they do not appear to be random. To quantify the non-random distribution statistically, a χ² (chi-square) test compares the observed survey return locations to a random expected return distribution based on the percent of area in each census tract. The calculation would be very similar to the technique used in 2), except that the LAND_KM for each tract would be used to get the % of the
2)
Were the home locations of the respondents to this survey random with
respect to population density? (You do have to do the analysis for this one)
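To use the Chi-Square table to compare an observed distribution to a theoretical distribution, the degrees of freedom for the test are df = categories – 1 = 82 (tracts) – 1 = 81.
For a 1-tail (upper-tail) test with α = .05 (a 5% significance level) and df = 81:
The table offers df = 80 or df = 100; reading df = 100 gives the larger critical value (harder to reject the null).
Critical χ² (df = 100, α = .05, upper tail) = 124.34
(In a table indexed by the area to the right, this is the .050 column; the .950 column gives 77.93, which is the lower-tail cutoff, not the rejection threshold.)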
The Chi-squared table for the observed distribution and the χ² calculation is attached.
The resulting χ² = 281, which exceeds the critical value.
We reject the null hypothesis that the respondents are random with respect to population density.
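The goodness-of-fit calculation in 2) can be sketched in a few lines. This is a minimal sketch, assuming the observed counts and the weights (tract population for question 2, or LAND_KM for question 1) are already in two lists in the same tract order; `chi_square_gof` is a hypothetical name, and the numbers in the usage lines are illustrative, not survey data.

```python
def chi_square_gof(observed, weights):
    """Chi-square goodness of fit with expected counts proportional to weights.

    Returns (statistic, degrees of freedom, expected counts).
    """
    if len(observed) != len(weights):
        raise ValueError("need one weight per category")
    n = sum(observed)
    total_weight = sum(weights)
    # Expected count = total respondents x the category's share of the weight.
    expected = [n * w / total_weight for w in weights]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected


# Illustrative: three tracts with equal populations but unequal returns.
stat, df, expected = chi_square_gof([10, 20, 30], [1000, 1000, 1000])
print(stat, df, expected)  # 10.0 2 [20.0, 20.0, 20.0]
```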
3)
What level of spatial aggregation (tracts or block-groups) is more appropriate for answering question #2? Explain.
Tracts are the better, though still imperfect, choice. Block groups are much smaller than tracts, so the same respondents are spread over many more categories.
Given the assumptions of the Chi-Square test and the “Rule of Thumb” for categories:
If df > 1, no expected cell frequency should be < 1 (i.e. = 0), and no more than 20% of the cells should have expected frequencies < 5.
Using the tracts, there are tracts (i.e. categories) where no survey respondents were observed. So even with the larger tracts, the assumption and rule of thumb are not being met. With the smaller block groups, each category would receive an even smaller share of the respondents, so the failure to meet the assumptions would be even greater.
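The rule of thumb can be checked directly on a list of expected counts before running the test. A minimal sketch; `meets_rule_of_thumb` is a hypothetical helper, and the demonstration spreads an illustrative 400 respondents (not the survey's actual total) evenly over different numbers of units to show why finer aggregation fails sooner.

```python
def meets_rule_of_thumb(expected):
    """True if no expected count is < 1 and at most 20% of them are < 5."""
    if any(e < 1 for e in expected):
        return False
    small = sum(1 for e in expected if e < 5)
    return small <= 0.2 * len(expected)


n_respondents = 400  # illustrative, not the survey's actual total
for k in (40, 82, 300):
    expected = [n_respondents / k] * k  # equal shares, for illustration only
    print(k, round(n_respondents / k, 2), meets_rule_of_thumb(expected))
```

With these illustrative numbers, the more units the respondents are divided among, the smaller every expected count becomes.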
4)
How would you test the following?:
a) Is the age distribution
of the respondents to this survey significantly different than the age
distribution of the population of
Census tract data provide age in ranges. Using the adults in the county (i.e. people 18 and over), it is possible to calculate the expected number of survey respondents in each age range from the % of people of that age in the county. This gives an Expected count for each age range (as defined in the census data). The Observed ages from the survey are in years and would need to be reclassified into the census age ranges. A χ² test could then be performed comparing the Expected to the Observed.
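The reclassification step can be sketched as binning each respondent's age into the census ranges and then scaling the county's adult age shares to the number of respondents. The range boundaries and county shares below are hypothetical placeholders, not the actual census categories; the resulting Observed and Expected lists are what the χ² test compares.

```python
from bisect import bisect_right

# Hypothetical lower bounds of census adult age ranges: 18-24, 25-34, ..., 65+.
LOWER_BOUNDS = [18, 25, 35, 45, 55, 65]


def bin_ages(ages, lower_bounds=LOWER_BOUNDS):
    """Count ages per range; ages below the first bound (under 18) are skipped."""
    counts = [0] * len(lower_bounds)
    for age in ages:
        index = bisect_right(lower_bounds, age) - 1
        if index >= 0:
            counts[index] += 1
    return counts


def expected_counts(n, county_shares):
    """Expected respondents per range = n x share of county adults in that range."""
    return [n * share for share in county_shares]


observed = bin_ages([18, 24, 25, 40, 70, 17])
print(observed)  # [2, 1, 1, 0, 0, 1]
print(expected_counts(sum(observed), [0.15, 0.20, 0.20, 0.17, 0.12, 0.16]))
```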
b) Is the income of the
respondents to this survey significantly different than the population of
Census tract data provide income in ranges. The survey respondents also reported their income in ranges; unfortunately, the two sets of ranges do not match up completely. It would be possible to create common ranges for a χ² test, but the result would be extremely generalized. Again, the Expected would be based on the % in each income range from the census tract data, and the Observed would come from the survey answers.
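One way to build the common ranges is to merge the finer census brackets into the wider survey brackets and refuse any census bracket that straddles a survey boundary. The boundaries below are an assumption modeled on the survey levels mentioned in this document (under $10,000, $20,000 to $50,000, $50,000 to $100,000); the census brackets and counts are hypothetical.

```python
import math


def collapse_brackets(brackets, edges):
    """Sum (low, high, count) brackets into common bins [edges[i], edges[i+1]).

    Raises ValueError if a bracket straddles a common boundary, since its
    count could not be split without extra assumptions.
    """
    totals = [0] * (len(edges) - 1)
    for low, high, count in brackets:
        for i in range(len(edges) - 1):
            if edges[i] <= low and high <= edges[i + 1]:
                totals[i] += count
                break
        else:
            raise ValueError(f"bracket {low}-{high} straddles a common boundary")
    return totals


# Assumed survey boundaries; the top bin is open-ended.
SURVEY_EDGES = [0, 10_000, 20_000, 50_000, 100_000, math.inf]
# Hypothetical census brackets (low, high, households).
census = [(0, 10_000, 120), (10_000, 15_000, 60), (15_000, 20_000, 55),
          (20_000, 35_000, 140), (35_000, 50_000, 110),
          (50_000, 75_000, 90), (75_000, 100_000, 40), (100_000, math.inf, 35)]
print(collapse_brackets(census, SURVEY_EDGES))  # [120, 115, 250, 130, 35]
```

A census bracket such as $15,000 to $25,000 would raise an error here, which is exactly the case where the categories can only be matched by coarsening further.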
c) Are the political party,
ethnicity, and gender distribution of the respondents to this survey significantly
different from the population of
Ethnicity and Gender are provided in the Census tract data, but they are not broken down by Number of Adults and Number of Children. It would be possible to get a distribution for each of these attributes based on the total population, but not to restrict it to Adults, who are the target population for the survey respondents. Expected would be based on the % of each classification in the total population (i.e. the different Ethnicities for the first and Male or Female for the second). Observed would come from the survey answers. A χ² test could then be performed comparing the Expected to the Observed.
Political party is not provided in the Census tract data, but the respondents were not sampled from the whole population of the census tracts; they were sampled from the voter registration records. So, using the voter registration records, the % registered in each party can be used to calculate the Expected values. A χ² test could then be performed comparing the Expected from the voter registration records to the Observed survey answers.
5)
What kinds of problems do you run into when trying to answer the
questions posed in #4?
The data types are not the same. For example, the Age data in the Census are ordered categories (ordinal), but the Survey data are interval/ratio.
The categories are not the same. For example, the Income and Ethnicity categories differ between the Census and the survey.
The data available for the population are incomplete (i.e. the voter registration records include only residents who are registered) and may not accurately reflect the population (even though they do reflect the subset that was sampled).
6)
Are the respondents to this survey age and income independent? Would
you expect them to be? Is this a
parametric or non-parametric test?
A 1-Way ANOVA compares a Dependent variable measured at the interval level (Age) across the categories of an Independent variable (Income) that has 2 or more categories; the income levels are treated as nominal groups. This is a parametric test. The specific results are on the One-way Analysis of Age By Income printout.
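For readers without the printout, the F Ratio a 1-Way ANOVA reports is the between-group mean square divided by the within-group mean square. A minimal stdlib sketch; the p-value, which needs the F distribution, is left to the statistics package, `one_way_anova` is a hypothetical name, and the groups in the usage line are toy numbers.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F ratio, df between, df within) for a list of numeric groups."""
    values = [x for group in groups for x in group]
    grand_mean = fmean(values)
    # Between-group sum of squares: group size x squared distance of group mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


print(one_way_anova([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```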
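Age and income do not appear to be independent for these respondents (see the Tukey-Kramer results below). The mean respondent age is 48.6 years, and the Standard Error of the mean age in each income category is < 2.2 years, so each category's mean age is estimated to within roughly ±4.4 years (about 2 standard errors, an approximate 95% confidence interval). This describes the precision of the means, not the spread of individual ages, so it does not by itself show that people in the 44–53 age range filled out and returned almost all of the surveys.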
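The difference between the standard error (precision of a mean) and the standard deviation (spread of individuals) is easy to see numerically. The ages below are illustrative, not survey data; the ±2 SE band around the mean is much narrower than the band that would hold about 95% of individuals if ages were roughly normal.

```python
from statistics import fmean, stdev


def mean_precision_vs_spread(values):
    """Contrast an approximate 95% CI for the mean with a +/-2 SD band for individuals."""
    mean = fmean(values)
    sd = stdev(values)              # spread of individual values
    se = sd / len(values) ** 0.5    # precision of the mean
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci_for_mean": (mean - 2 * se, mean + 2 * se),
        "band_for_individuals": (mean - 2 * sd, mean + 2 * sd),  # assumes rough normality
    }


ages = [30, 35, 40, 45, 50, 55, 60, 65, 70]  # illustrative ages
for key, value in mean_precision_vs_spread(ages).items():
    print(key, value)
```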
Based on the number of respondents in each income category, the reported incomes appear roughly unimodal (with respect to the levels provided), peaking at level 3 ($20,000 to $50,000) with 300 respondents.
Overall, middle-income and middle-aged people responded, and I wouldn't have expected much different. People in this age range have been around, wish to “contribute” and/or “pass on knowledge”, and are still young enough to have the energy to do it. Based on the income level, these adults are not struggling to get by, nor are they so wealthy that they are on continual holiday.
One specific outcome of the One-way Analysis of Age by Income is that the mean age of the lowest income level is statistically different from that of the other levels (p < .0001). For people with income less than $10,000, the mean age (37.8533) differs significantly from that of ANY other income level: in the Tukey-Kramer comparisons, where positive values mark pairs of means that are significantly different, every other income level shows a positive (significant) difference from this group.
This may simply be a case where the lower-income respondents are also young people starting out, with lower-paying or part-time jobs.
Simple Demographic Comparisons (7-11)
7)
Based on the responses to the question: ‘Abortion should remain
legal as defined in “Roe v. Wade”?’; are Democrats significantly more
‘Pro-Choice’ than Republicans?
The specific results are on
the One-way Analysis of P3_9 By PolPrty printout.
All PolPrty means are < 3.0, where 3.0 is Neutral and 1.0 and 2.0 are Strongly Agree and Agree. Therefore, on average, all political parties agreed that abortion should remain legal as defined in “Roe v. Wade”. Based on the Tukey-Kramer comparisons, where positive values show pairs of means that are significantly different, there is a significant difference between the Democrat mean and the Republican mean.
The Democrat mean was 1.61194 and the Republican mean was 2.32781; because lower scores mean stronger agreement, Democrats are significantly more ‘Pro-Choice’ than Republicans. A 1-Way ANOVA of P3_9 by PolPrty produces an F Ratio of 16.8315, with p < .0001.
8)
Along a similar vein, are Women more ‘Pro-Choice’ than Men? (According
to this survey)
The specific results are on
the One-way Analysis of P3_9 By Sex$ printout.
Both Sex$ means are < 3.0, where 3.0 is Neutral and 1.0 and 2.0 are Strongly Agree and Agree. Therefore, on average, both sexes agreed that abortion should remain legal as defined in “Roe v. Wade”.
The F Ratio for this 1-Way ANOVA is 6.4824, giving p = 0.0110, so at the 95% confidence level there is a significant difference between males and females in their answers to question P3_9.
9)
In a separate survey I found that Women were more ‘Pro-Choice’ than men
and that Catholic women were significantly ‘More, more Pro-Choice’ than
Catholic men. Is this true of the respondents to this survey? How did you test that? If you did find the gap between Catholic men
and women significantly greater than the gap between men and women in general
what would a statistician call such a phenomenon? If it were true, how would you
explain it?
The specific results are on the One-way Analysis of P3_9 By PolPrty (subset) printout.
For the subset of Catholic respondents, a 1-Way ANOVA of P3_9 by PolPrty produces an F Ratio of 2.9280 and p = 0.0346 (the probability of an F value this large by chance). The difference based on Political Party for Catholics is less extreme than the difference for the overall respondents.
The specific results are on the One-way Analysis of P3_9 By Sex$ (subset) printout.
For the subset of Catholic respondents, a 1-Way ANOVA of P3_9 By Sex$ produces an F Ratio of 7.9974 and p = 0.0051. The difference based on Sex for Catholics is more extreme than the difference for the overall respondents. It appears that the finding that “Women were more ‘Pro-Choice’ than men and that Catholic women were significantly ‘More, more Pro-Choice’ than Catholic men” is true of the respondents to this survey as well. Strictly, comparing F Ratios from separate subsets only suggests this; the direct test is the Sex × Religion interaction term in a two-way ANOVA of P3_9. This phenomenon is called interaction, and it means that two variables enhance or diminish each other's effects. Combining the Catholic religion and Sex therefore accentuates the difference due to Sex. It may be related to Catholic teachings, doctrine, or experiences specific to Catholics.
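A direct check on whether the Catholic gap exceeds the general gap is an interaction contrast: the Catholic men-minus-women difference minus the non-Catholic men-minus-women difference, divided by its standard error. This is a large-sample sketch that swaps in a normal approximation rather than the two-way ANOVA a statistics package would run; `interaction_contrast`, the cell keys, and the scores in the usage lines are hypothetical.

```python
from statistics import NormalDist, fmean, variance


def interaction_contrast(cells):
    """cells maps (group, sex) -> scores, with groups 'catholic'/'other', sexes 'M'/'F'.

    Returns (contrast, z, two-sided p) for
    (catholic men - catholic women) - (other men - other women).
    """
    def gap(group):
        return fmean(cells[(group, "M")]) - fmean(cells[(group, "F")])

    contrast = gap("catholic") - gap("other")
    # Unequal-variance standard error built from the four cell means.
    se = sum(variance(v) / len(v) for v in cells.values()) ** 0.5
    z = contrast / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return contrast, z, p


cells = {  # hypothetical P3_9 scores (1 = Strongly Agree)
    ("catholic", "M"): [3, 3, 4, 4], ("catholic", "F"): [1, 1, 2, 2],
    ("other", "M"): [2, 2, 3, 3], ("other", "F"): [2, 2, 3, 3],
}
print(interaction_contrast(cells))
```

With real data the cells would hold far more than four scores each, which is what makes the normal approximation reasonable.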
10)
Are republicans different than non-republicans on the responses to any
of the questions about immigration?
The specific results are on
the One-way Analysis of P4B_1 through 5 By Republican printout.
Yes. All the questions yielded significant differences, with p < 0.0001 in every case.
Question 1 F Ratio = 43.1681
Question 2 F Ratio = 14.7869
Question 3 F Ratio = 30.6039
Question 4 F Ratio = 39.0164
Question 5 F Ratio = 27.4545
The two questions yielding the highest F Ratios were:
Question 1 (F = 43.1681): The
Question 4 (F = 39.0164): Federal law should be changed so that citizenship is not automatically granted to children born in the
11)
Is there any relationship between ‘Religiosity’ and responses to the
question: ‘The earth has a finite supply of natural resources such as water,
arable land, etc. which imposes a limit on the number of people which can
sustainably live on it.’
The specific results are on
the Bivariate Fit of P1_15 By RelgAct printout.
As religious involvement (religious activity) goes from 1 to 5 (Minimal to Extensive), agreement that natural resources are finite goes from Agree to Neutral. As religious involvement increases, there is less belief in the finite supply of natural resources.
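The Bivariate Fit is an ordinary least-squares line; its slope says how many scale points agreement shifts per step of religious activity. A minimal stdlib sketch (`line_fit` is a hypothetical name; the data in the usage line are illustrative and chosen so the fit is exact). Because 1 = Strongly Agree and 3 = Neutral, a positive slope here means less agreement as religious activity rises.

```python
from statistics import fmean


def line_fit(x, y):
    """Least-squares slope, intercept and Pearson r for paired data."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Illustrative: RelgAct 1..5 vs. a P1_15 score drifting from Agree (2) to Neutral (3).
print(line_fit([1, 2, 3, 4, 5], [2.0, 2.25, 2.5, 2.75, 3.0]))
```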
Factor Analysis (12-14). Factor analysis is a data reduction technique that allows you to ‘compress’ your analysis. Questions that tend to be answered the same way co-vary, and factor analysis is a means of ‘capturing’ this co-variance between questions and ‘reducing’ a many-question survey to a few factors.
12) For Factors 1-5 list the
questions with a factor contribution score of 0.40 or more and study the
questions that contributed to each factor. As a result of this study provide a
name for each of the first five factors.
Sorting the loadings for each factor in descending order (by absolute value) yields the list of questions below with factor contribution scores > 0.4000.
I list the question text for those above .5000 (or the top 6 when fewer than six pass .5000). The factor name is at the top of each list.
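The selection rule in 12), keeping loadings whose absolute value is at least 0.40 and sorting by size so that negative loadings such as P3_10 are kept, can be sketched as follows. The loadings are copied from the Factor 1 list below, except `Q_low`, a hypothetical question added to show one being dropped; `salient_loadings` is a hypothetical name.

```python
def salient_loadings(loadings, threshold=0.40):
    """Keep (question, loading) pairs with |loading| >= threshold, largest first."""
    kept = [(q, v) for q, v in loadings.items() if abs(v) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


factor1 = {"P3_1": 0.479687, "P3_10": -0.47772, "P1_13": 0.446905,
           "P2_10": 0.828251, "Q_low": 0.12}  # Q_low is hypothetical
print(salient_loadings(factor1))
```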
Factor 1: Government Intervention or “Laws will Fix It”
Question | Factor 1
P2_10 | 0.828251
P2_11 | 0.818446
P2_12 | 0.769
P2_9 | 0.72183
P3_5 | 0.695323
P2_14 | 0.665964
P3_7 | 0.631293
P1_16 | 0.554414
P3_4 | 0.540022
P3_13 | 0.512226
P3_1 | 0.479687
P3_10 | -0.47772
P1_13 | 0.446905
P3_9 | 0.433724
P1_5 | 0.426083
P2_10 Imposing restrictions on CFC emissions that cause depletion of ozone in the stratosphere was a necessary and appropriate Government action.
P2_11 The potential
consequences of global warming justify the spending of money to reduce the
emission of greenhouse gases (CO2 & CH4).
P2_12 To protect the
environment for future generations, present economic and behavioral sacrifices
are justified.
P2_9 Human activities are
the major cause of environmental degradation.
Governments of the world must formulate policy to minimize the
degradation.
P3_5 The Govt. should ensure that various types of contraceptives are available at affordable prices for all members of our society.
P2_14 Efforts, including funding, should be made to enhance the opportunity for women worldwide to achieve improved educational, economic and political status.
P3_7 To reduce teen
pregnancy, sex education should be mandatory in the schools.
P1_16 Policies regarding
environmental degradation must also address the high per capita levels of
resource consumption that are common in the industrialized nations such as the
P3_4 Govt. sponsored
educational programs can be an effective means to achieve reduction of family
size by voluntary cooperation.
P3_13 The
Factor 2: The Indigent Tax Burden or “Why Benefit the Undeserving Poor”
Question | Factor 2
P4B_1 | 0.822995
P4B_2 | 0.786595
P4B_4 | 0.76423
P4B_5 | 0.71261
P4B_3 | 0.625665
P3_2 | 0.539417
P4B_1 The
P4B_2 The U.S. should issue
a counterfeit-proof National Identification card so that only U.S. citizens
receive benefits that are restricted to U.S. citizens only.
P4B_4 Federal law should be
changed so that citizenship is not automatically granted to children born in
the
P4B_5 The
P4B_3 Immigration policies,
laws, and law enforcement are federal responsibilities; individual States
should be reimbursed for costs resulting from lack of enforcement of these laws
by the federal govt.
P3_2 Welfare support to
unwed mothers acts as an incentive to produce more children.
Factor 3: People are Good or “Keep Government out of the Bedroom”
Question | Factor 3
P1_4 | 0.693664
P1_14 | 0.688308
P3_12 | 0.559302
P2_1 | 0.554813
P3_9 | -0.49573
P2_6 | 0.494416
P2_5 | 0.476661
P2_7 | 0.458089
P3_10 | 0.457654
P1_2 | 0.439916
P1_10 | -0.42691
P1_4 Population growth is
good because it increases the supply of our most valuable resource: People.
P1_14 A growing population
is necessary for a growing economy.
P3_12 Countries that allow
or condone abortion should be denied any kind of foreign aid.
P2_1 Attempts at curbing
population growth are usually the racist schemes of the people in power.
P3_9 Abortion should remain
legal as defined in Roe vs. Wade.
P2_6 Human ingenuity has
provided improved agricultural yields, better energy utilization and other
technological innovations. This
ingenuity can be counted upon to avert the need for population control.
Factor 4: Population Resource Degradation or “People Claustrophobia”
Question | Factor 4
P1_6 | 0.743261
P1_7 | 0.735964
P1_8 | 0.71135
P1_13 | 0.64239
P1_3 | 0.638499
P1_9 | 0.629217
P1_5 | 0.567222
P1_15 | 0.525338
P1_10 | 0.494813
P1_11 | 0.494605
P1_12 | 0.482948
P2_6 | -0.46944
P2_3 | 0.430306
P2_2 | 0.406022
P1_1 | 0.401505
P1_6 The Growing population
causes increasing traffic congestion.
P1_7 Population growth
increases competition for natural resources such as land, oil, and water.
P1_8 International violence
is aggravated by issues such as immigration and competition for natural
resources that are directly related to the growing human population.
P1_13 Increasing human
population threatens the diversity and survival of many plant & animal
species.
P1_3 Population growth is a
cause of increased pollution.
P1_9 The growing population
contributes to inter-racial conflict.
P1_5 Population growth is a
cause of deforestation in the
P1_15 The earth has finite
limits of land, air, and water, which impose a ceiling on the number of people
that can live on it.
Factor 5: Limiting Reproduction or “Policies for those who Won’t Help Themselves”
Question | Factor 5
P3_6 | 0.669543
P2_2 | 0.667867
P2_3 | 0.627428
P3_3 | 0.604894
P3_8 | 0.545589
P3_14 | 0.538953
P3_11 | 0.501134
P1_10 | 0.474396
P2_4 | -0.46921
P1_11 | 0.448927
P3_13 | 0.407074
P3_6 The govt. should
provide economic incentives for seekers of public assistance to be temporarily
or permanently sterilized.
P2_2 The
P2_3 The U.S should have an
explicit and well-publicized International Population Policy.
P3_3 Incentive strategies
such as tax laws favoring small families and penalizing large families are
appropriate actions for govt. to use.
P3_8 As a condition of
public assistance, child abusers and drug addicts must accept implanting a
contraceptive such as NORPLANT.
P3_14 Coercive population
control policies such as
P3_11 The
13) Do all the statistical tests
necessary to fill out the table below. Put an asterisk (*) in the cells that
indicate any significant differences on factor scores between demographic
variables. For each asterisk provide a detailed description of the nature of
the significant differences and some guess as to an explanation for the differences.
Your ‘guess’ is referred to as ‘theory’ in academia. If you are really fired up
about this exercise find references to support your theory.
Significant Factor Score Differences (* = significant; p-values shown) | Sex | Pol. Party | Religion | Religiosity | Income | Education | Race/Ethnicity
Factor 1 | * 0.0029 | * <0.0001 | * 0.0020 | | | |
Factor 2 | | * <0.0001 | | * 0.0004 | | * 0.0491 |
Factor 3 | | * 0.0012 | * 0.0019 | * <0.0001 | | * 0.0004 | * 0.0167
Factor 4 | | | | | * 0.0394 | |
Factor 5 | * 0.0448 | | * 0.0421 | * 0.0001 | | |
For all the factor means, negative values are associated with Strongly Agree and positive values with Neutral.
Factor 1 “Laws will Fix It”
As shown above, 3
demographic variables show up with differences where the probability of this
occurring by chance is less than 5%.
These are Sex, Political Party, and Religion.
Demographic Variable | Group Name | Factor Mean for Group
Political Party | Republican | 0.39594
Sex | Male | 0.12927
Religion | The assortment of Christians | 0.00329, 0.18578, -0.00571, 0.09941
Sex | Female | -0.21434
Political Party | Democrat | -0.48187
Religion | Jewish, Agnostic, Atheist | -0.52614, -0.59407, -0.35203
From this ranking of the factor means, it is clear that the factor means have the widest gap within Political Party. That Jewish respondents and Democrats come out very similarly is interesting. I have heard it said that Jewish people tend to be Democrats; I don't know if that is true, but for this factor they have similar results. These questions almost define the differences between the parties' doctrines. It would appear from this that Jewish and Democratic respondents believe laws will fix things, and Republicans aren't so sure.
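The ‘widest gap’ comparison can be made explicit by taking, for each demographic variable, the range between its highest and lowest group means. The means are copied from the Factor 1 table above (only the groups listed there), so the sketch simply re-checks that reading; `widest_gap` is a hypothetical helper.

```python
def widest_gap(group_means):
    """Return the variable with the largest max-min spread, plus all spreads."""
    gaps = {var: max(means) - min(means) for var, means in group_means.items()}
    return max(gaps, key=gaps.get), gaps


factor1_means = {
    "Political Party": [0.39594, -0.48187],
    "Sex": [0.12927, -0.21434],
    "Religion": [0.00329, 0.18578, -0.00571, 0.09941, -0.52614, -0.59407, -0.35203],
}
best, gaps = widest_gap(factor1_means)
print(best, {k: round(v, 5) for k, v in gaps.items()})
```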
Factor 2 “Why Benefit the Undeserving Poor”
As shown above, 3 demographic variables show differences where the probability of this occurring by chance is less than 5%. These are Political Party, Religiosity and Education.
Demographic Variable | Group Name | Factor Mean for Group
Religious Activity | Extensive | 0.82405
Political Party | Other & Democrat | 0.45025, 0.29172
College | Masters Degree | 0.32600
Political Party | Republican & Independent | -0.23491, -0.13994
Religious Activity | Average | -0.39607
College | No College | -0.44929
From this ranking of the factor means, it is clear that the factor means have the widest gap within Religious Activity. “Religious Do-Gooders” and “Bleeding Heart Liberals” have the highest (most neutral) mean values, while average church-goers and respondents with no college education have the lowest (strongest agreement). “Religious Do-Gooders” and “Bleeding Heart Liberals” want everybody to be helped/aided. I guess “Bleeding Heart Liberalism” isn't taught in the high schools, and the average church-goers gave at the church.
Factor 3 “Keep Government out of the Bedroom”
As shown above, 5 different demographic variables show differences where the probability of this occurring by chance is less than 5%. These are Political Party, Religion, Religiosity, Education and Race/Ethnicity.
Demographic Variable | Group Name | Factor Mean for Group (n)
College | PhD, MD, JD | 0.48313 (20)
Religion | Non-Denominational, Jewish, Agnostic | 0.30418, 0.42428, 0.55411 (14, 25, 13)
Political Party | Independent, Other | 0.34933, 0.46708 (38, 12)
Religious Activity | Extensive | 0.23370 (143)
Race | White | 0.0540 (266)
Political Party | Democrat, Republican | 0.14169, -0.20776 (111, 141)
Religion | Other, Christian | -0.40483, -0.28825 (37, 53)
College | No College | -0.71959 (24)
Religious Activity | Minimal | -0.74057 (15)
Race | Chicano/Mexican, Latino/Hispanic | -0.9285, -0.3597 (9, 11)
The highest means for this factor were among the highly educated, Agnostic respondents and Other political parties; they are most likely to be Neutral. The lowest means were among respondents with no college, minimal religious activity, and Chicano/Mexican or Latino/Hispanic ethnicity; they are most likely to strongly agree.
Notice that at both extremes these means come from only a few respondents, yet they answered the questions correlated with Factor 3 very differently from the majority in the demographic. It is possible that if the sample sizes for these groups had been larger, their means would not have been so extreme. It may also mean that, within the minority groups for each demographic, the people with the most extreme views chose to respond.
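The small-sample caution can be quantified with the margin of error of a group mean, about 1.96 × SD/√n. Assuming, purely for illustration, that factor scores have a standard deviation of about 1 (typical of standardized scores, but not confirmed by the printouts), the sketch uses the group sizes from the Factor 3 table; `mean_margin` is a hypothetical name.

```python
def mean_margin(n, sd=1.0, z=1.96):
    """Approximate 95% margin of error for a group mean: z * sd / sqrt(n)."""
    return z * sd / n ** 0.5


# Group sizes from the Factor 3 table; sd = 1.0 is an assumption.
for group, n in [("Chicano/Mexican", 9), ("Latino/Hispanic", 11),
                 ("Minimal religious activity", 15), ("PhD, MD, JD", 20),
                 ("White", 266)]:
    print(f"{group:28s} n={n:3d}  +/- {mean_margin(n):.2f}")
```

Under that assumption, the Chicano/Mexican mean of -0.9285 carries a margin of roughly ±0.65, while the White mean is pinned to within about ±0.12.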
Factor 4 “People Claustrophobia”
As shown above, only 1 demographic variable shows a difference where the probability of this occurring by chance is less than 5%. This variable was Income.
Demographic Variable | Group Name | Factor Mean for Group
Income | $50,000 to $100,000 | 0.32273
Income | All Other Groups | -0.05538, -0.16188, -0.11549, -0.04135
The outlying income level is $50,000 to $100,000. This group is more likely to be neutral about the environmental concerns. Maybe they just bought a place in the country and now “Own” their environment.
Factor 5 “Policies for those who Won’t Help Themselves”
As shown
above, 3 demographic variables show up with differences where the probability
of this occurring by chance is less than 5%.
These are Sex, Religion and Religiosity.
Demographic Variable | Group Name | Factor Mean for Group
Religious Activity | Minimal, Some | 0.52066, 0.49975
Religion | Christian | 0.32034
Sex | Female | 0.14458
Sex | Male | -0.08722
Religious Activity | Extensive | -0.19910
Religion | Jewish, Agnostic | -0.34476, -0.44550
The positive extreme means on this factor are found among respondents with Minimal and Some religious activity and among those defining themselves as Christian. The negative extreme means are among the Agnostic and Jewish respondents and those with Extensive religious activity. The negative means may be related to these groups being more willing to take reproductive rights away from others and more concerned with the good of the group/society as a whole. On the other hand, inactive Christians may be more concerned about reproductive “due process” or simply not want government to decide these things.
14) Did filling out the table and answering the questions of #13 make you appreciate factor analysis?
Yes. Factor Analysis and Principal Components are useful for:
1) Pattern Identification
   a. Finding questions that are related to each other
   b. Identifying separate independent variables that are unrelated to each other
2) Data Reduction
   a. Reducing a large data set to more manageable proportions
   b. Capturing most of the information in a smaller set of variables (components)
   c. Lowering the number of individual statistical tests that need to be analyzed
3) Data Transformation
   a. Changing the data into uncorrelated variables (with an orthogonal rotation)
   b. Providing scores for using the new variables
   c. No collinearity among the new variables
4) Selection of Surrogate Questions
   a. A shorter survey could be composed from the questions with the highest loadings on the factors
   b. These would be the questions that are the most independent and provide the most information
5) Evaluating the original survey's structure, based on the following questions
   a. Were there a lot of questions asking the same thing?
   b. Were all the correlated questions grouped together?
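The data-reduction idea can be illustrated with principal components, which the answer above groups with factor analysis (the two are related but not identical). This sketch finds the first component of a correlation matrix by power iteration and reports the share of total variance it captures; the 2 × 2 matrix is illustrative, and `first_component` is a hypothetical name.

```python
def first_component(corr, iterations=200):
    """Largest eigenvalue/eigenvector of a correlation matrix by power iteration.

    Returns (eigenvalue, unit eigenvector, share of total variance).
    """
    k = len(corr)
    vector = [1.0] * k
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
    # Rayleigh quotient v'Rv gives the eigenvalue for a unit vector v.
    eigenvalue = sum(vector[i] * sum(corr[i][j] * vector[j] for j in range(k))
                     for i in range(k))
    # The trace of a correlation matrix equals the number of variables.
    return eigenvalue, vector, eigenvalue / k


# Two questions answered alike (r = 0.8): one component carries 90% of the variance.
print(first_component([[1.0, 0.8], [0.8, 1.0]]))
```

Component loadings, as they would appear in a loadings table, are the eigenvector entries scaled by the square root of the eigenvalue.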
Spatial Analysis: Where’s the Geography?
15) Test for any significant differences/variation
for all of the factor scores (1-5) and the population density and percent
non-white of the respondents home location. If you find any significant
differences provide an explanation?
1) Join the geocoded locations with the survey data (what I did here in ArcGIS):
a) Join surveylocneg483.shp to sbcoblockgroupdemog.shp using a spatial join, creating a new Join.shp file in which each point carries all the data from the block group it falls in (join polygons to points).
b) Export the cleaned JMP file with the factor scores to Excel, add the intersection field, save it as a .csv (comma delimited) file, rename it to .txt (ArcGIS accepts those), rename any field names ArcGIS rejects, and then use a table join to join this table to Join.shp by the Intersection field in each table.
c) Export the data, which creates a new Export.shp combining all three.
Hope this is kind of what you had in mind.
2) Analyze the data in JMP with respect to spatial location:
a) Create a new field from the block data: PopDens (Persons/Land_KM)
b) Create a new field from the block data: Non-White (Black+Amind+Asian+O_Ethnic+Hispanic)
c) Create a new field from the block data: White/Non_White
d) Analyze the factors with a line fit
Hope this is what you had in mind.
Sorry, but neither of these appeared to me to be statistically significant with regard to any of the factors. Possibly the above procedure was in error.
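The two derived fields from step 2 can be sketched per block-group record. The field names follow the ones listed in step 2; the record and the `derived_fields` helper are hypothetical. One caution worth checking against the census documentation: Hispanic origin is usually tabulated separately from race, so adding Hispanic to the race counts may double-count some people.

```python
def derived_fields(block):
    """Return (population density per km^2, percent non-white) for one record."""
    persons = block["Persons"]
    non_white = sum(block[f] for f in ("Black", "Amind", "Asian", "O_Ethnic", "Hispanic"))
    pop_dens = persons / block["Land_KM"] if block["Land_KM"] else float("nan")
    pct_non_white = 100 * non_white / persons if persons else float("nan")
    return pop_dens, pct_non_white


record = {"Persons": 1000, "Land_KM": 2.0, "Black": 50, "Amind": 10,
          "Asian": 40, "O_Ethnic": 0, "Hispanic": 100}  # hypothetical block group
print(derived_fields(record))  # (500.0, 20.0)
```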
15)
Another test you could do is to test for increased variance in response
based on a geographic attribute. What
kind of statistical test would you use to look for that and if it proved
significant, what would the explanation be?
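An F test for equality of variances: F = larger variance / smaller variance.
In the F table, read the df of the larger variance (numerator, df = N – 1) across the top and the df of the smaller variance (denominator, df = N – 1) down the side.
If the variance ratio were significant, it would mean that something other than % Hispanic is affecting respondents' answers to the questions.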
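The variance-ratio (F) test described above puts the larger sample variance on top, with each group's N – 1 as its degrees of freedom. A minimal sketch; `variance_ratio` is a hypothetical name, the scores are illustrative, and the ratio still has to be compared with an F table for the given df.

```python
from statistics import variance


def variance_ratio(a, b):
    """F = larger sample variance / smaller, with (numerator df, denominator df)."""
    var_a, var_b = variance(a), variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


# Illustrative factor scores for respondents in two kinds of tracts.
print(variance_ratio([2, 3, 4], [1, 2, 3, 4, 5]))  # (2.5, 4, 2)
```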
16) Describe 10 specific problems related to this
little research project. Things to consider: Sampling frame was registered
voters whereas census data was total population, Non-response Bias, etc.
Sampling
issues:
A) Registered voters are people who have taken the time to actually register to vote. They may have characteristics that differ from the overall population: they may tend to be more stable, homeowners, and older. Sampling from registered voters might be considered a form of “
B) Since the respondents “responded”, they had to have some motivation, because action was required on their part. This is a form of non-response (self-selection) bias, resulting in respondents who were highly motivated to answer the survey's questions. This was, after all, a very long survey.
C) The advanced vocabulary used in the questions probably skewed responses further toward the more highly educated. Words like formulate, degradation, stratosphere, etc. might have meant some people didn't even understand the questions being asked. Less highbrow wording might have led to increased response from some groups.
D) Because the sample wasn't “stratified” (i.e. the population divided into homogeneous groups and randomly sampled from each group), some groups are represented by very few individuals (think Race or Age here). It is hard to draw meaningful statistical results from such small sample sizes.
E) One question that is lacking (or maybe the answer is implied somewhere) that I would have liked to see included is whether the person is a native-born or a naturalized citizen.
Spatial
issues:
F) To correlate these data with spatial tract or block data, they need to be located in space. The methodology for collecting this information was not 100% successful. The
G) Even with the geocoded data, the spatial join to the tract and block-group polygon data was difficult. Many of the points (about 1/5) fell on the line between tracts, which also meant there were problems with the block groups. The default is to associate the point with the lower-numbered tract or block. Points on the line between a small tract and a large tract (in area) were frequently associated with the small tract, which may or may not be correct. This would tend to bias the spatial distribution even further toward high-population-density blocks.
H) The data is spatially biased but even
the cause is unclear due to the above 2 problems.
Data
Comparison issues:
I) Lack of standardized categories between the Census data and the Survey data. I am thinking here of the Ethnicity/Race and Income categories specifically. Especially in the $20,000 to $50,000 and the $50,000 to $100,000 Income categories on the survey, much information may have been lost due to the wide ranges.
J) The factors only captured about half the variability in the survey. Quite a few things would still have to be learned from individual questions.
17)
Are these problems significant enough to invalidate any or all of the findings
from an analysis of this data?
For specific groups there is enough data to evaluate their opinions (Whites, ages 45-55, Christian religions, Male & Female), but for smaller racial or age groups and less common religions I doubt that the survey would accurately reflect their views or the statistical differences in the population.