
Statistics help please!


scarletlatitude


scarletlatitude

Halp! I needs halp! 

 

I am doing my doctorate on four school districts. In order to describe them, I am looking at how to best categorize them. First I considered urban/rural/suburban, but that is really vague and no good info is available. 

 

The next thing I am trying is median household income (from the US Census). Here is the info:

 

School district (pseudonym)    Median household income (2018)
Crystal River                  $78,926
Deer Valley                    $66,350
Pine Hill                      $62,412
River Valley                   $39,371
Westwood                       $50,437

 

Average for the area is $59,114. If I did the percent difference between them... does this math make sense? Or am I just making up math? ... Or is there better math to use?

 

Crystal River = (78,926 - 59,114) / 59,114 = -33.51% (or 33% higher than average)
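
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the signed percent difference from the area figure. The incomes and the $59,114 reference are the numbers posted above; the function and variable names are only illustrative. With the formula (district - reference) / reference, a district above the reference comes out positive and one below it comes out negative.

```python
# District medians as posted in the thread (2018 US dollars).
district_medians = {
    "Crystal River": 78_926,
    "Deer Valley": 66_350,
    "Pine Hill": 62_412,
    "River Valley": 39_371,
    "Westwood": 50_437,
}
AREA_FIGURE = 59_114  # area-wide figure quoted in the post


def percent_difference(value, reference):
    """Signed percent difference of value relative to reference."""
    return (value - reference) / reference * 100


percent_diffs = {
    name: percent_difference(income, AREA_FIGURE)
    for name, income in district_medians.items()
}

# Print each district with an explicit sign so above/below is unambiguous.
for name, diff in percent_diffs.items():
    print(f"{name}: {diff:+.2f}%")
```

Printing with an explicit sign makes it harder to mix up "above" and "below" when the numbers go into a table.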


As a disclaimer, I hate statistics and struggle with wrapping my head around terms and meanings and everything. But I have a BA in sociology so had to do some stats classes and deal with stuff like this. It's also been over 2 years since I had to deal with any of this, so I could be completely out of my depth. Especially compared to a doctoral student.

 

But, you can classify the school districts by their median household income and use the comparison to the average, like you demonstrated with Crystal River, but does this accomplish anything? What is the purpose you are trying to get from your categorizations? I'm not sure where these school districts are, so rural/suburban/urban might be more useful even if they're vague, because it helps contextualize. Unless you get the context elsewhere in your study/research/whatever. 

 

As for the numbers and math you presented, wouldn't 33.51% higher be a positive percentage? Like +33.51% rather than -33.51%? I would imagine a negative percentage implying below average rather than above average. 

 

But the math seems to check out? Even though I got 59,499.2 as the average for the numbers you gave. But I'm just using a basic calculator on my computer while reading through English homework. 

 

Okay, my vocabulary is failing me, but isn't there a term to use that means how far a number deviates from the average? 


(78,926 + 66,350 + 62,412 + 39,372 + 50,437) = 297,497, and 297,497 / 5 = 59,499.4??? Am I doing this wrong?

 

78,926 - 59,499.4 = 19,426.6 

 

(19,426.6 / 59,499.4) x 100 = 32.65%

 

But if I got the average wrong then yeah your maths checks out :) 

the great acescape

Math looks correct to me, at least at first glance. You might try using formulas in Excel to double-check your math.

 

BTW, the Bureau of Labor Statistics might be another resource to use as you conduct this analysis. I used it to conduct shift-share analysis when I was in graduate school.

the great acescape
18 minutes ago, SithEmpress said:

But, you can classify the school districts by their median household income and use the comparison to the average, like you demonstrated with Crystal River, but does this accomplish anything? What is the purpose you are trying to get from your categorizations? I'm not sure where these school districts are, so rural/suburban/urban might be more useful even if they're vague, because it helps contextualize. Unless you get the context elsewhere in your study/research/whatever. 

I think that depends on what exactly you're trying to accomplish in the course of your analysis. I've used census/BLS data when conducting economic impact analyses, and insofar as developing a case based around quantitative data, it's extremely difficult to meaningfully measure urban vs suburban and rural areas. Usually, economists might use a designation like MSA (Metropolitan Statistical Area), but for an area like Atlanta, for example, that will include suburban counties in addition to urban ones. 

 

I definitely think household income is a more reliable indicator of wealth or disparities for school districts than individual or even family income. But again, that depends on what you are trying to measure.

1 minute ago, the great acescape said:

I think that depends on what exactly you're trying to accomplish in the course of your analysis. I've used census/BLS data when conducting economic impact analyses, and insofar as developing a case based around quantitative data, it's extremely difficult to meaningfully measure urban vs suburban and rural areas. Usually, economists might use a designation like MSA (Metropolitan Statistical Area), but for an area like Atlanta, for example, that will include suburban counties in addition to urban ones. 

 

I definitely think household income is a more reliable measure for school districts than individual or even family income.

That does make sense. I guess my mind was more in the realm of context and explanation rather than actual usage and classification. Also I forgot how household income affects school districts. 

1 hour ago, scarletlatitude said:

Halp! I needs halp! 

 

I am doing my doctorate on four school districts. In order to describe them, I am looking at how to best categorize them. First I considered urban/rural/suburban, but that is really vague and no good info is available. 

 

The next thing I am trying is median household income (from the US Census). Here is the info:

 

School district (pseudonym)    Median household income (2018)
Crystal River                  $78,926
Deer Valley                    $66,350
Pine Hill                      $62,412
River Valley                   $39,371
Westwood                       $50,437

 

Average for the area is $59,114. If I did the percent difference between them... does this math make sense? Or am I just making up math? ... Or is there better math to use?

 

Crystal River = (78,926 - 59,114) / 59,114 = -33.51% (or 33% higher than average)

The one thing that jumps out at me as funny-looking is that you're taking an average of median household incomes. Or are you comparing the district medians to the median of the entire area? Or are you comparing the median district incomes to the average income for the area?

 

To answer the question you asked, income is measured on a ratio scale: it has a true zero, so if your income were to double, you really would be earning twice as much of something. Contrast with temperature in degrees Fahrenheit, which is not on a ratio scale. If the temperature changes from 50 degrees Fahrenheit in March to 100 degrees Fahrenheit in July, that doesn't mean there actually is twice as much heat. The choice of where we assign 0 degrees is somewhat arbitrary. This means that yes, a percent difference is appropriate in this context.
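
A quick sketch of the temperature point, assuming absolute temperature (kelvins) is a fair stand-in for "how much heat" there is. The conversion formula is standard; the function name is only illustrative.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvins (an absolute scale with a true zero)."""
    return (temp_f - 32) * 5 / 9 + 273.15


# Ratio of the raw Fahrenheit readings, which depends on an arbitrary zero point.
naive_ratio = 100 / 50

# Ratio of the same two temperatures on an absolute scale.
absolute_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)

print(f"Ratio of Fahrenheit readings:   {naive_ratio:.2f}")
print(f"Ratio of absolute temperatures: {absolute_ratio:.2f}")
```

The Fahrenheit readings double, but on the absolute scale the increase is roughly a tenth. Income has no such arbitrary zero, so ratios and percent differences of incomes mean what they appear to mean.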

1 hour ago, scarletlatitude said:

First I considered urban/rural/suburban, but that is really vague and no good info is available. 

Shot in the dark here:

 

Maybe rather than classifying them into discrete urban/rural/suburban categories, would it work if you used population per square mile of landmass?

 

I'm guessing there's a roughly 70% chance that your answer will be "I tried that.  Of course I tried that.  I wouldn't ask for help before trying that."  I thought I'd make the suggestion in case you hadn't tried it.

scarletlatitude
1 hour ago, SithEmpress said:

But, you can classify the school districts by their median household income and use the comparison to the average, like you demonstrated with Crystal River, but does this accomplish anything? What is the purpose you are trying to get from your categorizations? I'm not sure where these school districts are, so rural/suburban/urban might be more useful even if they're vague, because it helps contextualize. Unless you get the context elsewhere in your study/research/whatever. 

 

Exactly, the context will come from the other numbers that I collect. I need academic sources to back up the preliminary data, and finding an academic source to specifically categorize rural/urban/suburban is a nightmare... especially when some sources use "urban" to mean more brown people. 

 

1 hour ago, the great acescape said:

I definitely think household income is a more reliable indicator of wealth or disparities for school districts than individual or even family income. But again, that depends on what you are trying to measure.

Eventually I am measuring the technology use of teachers. I think this data will come into play when I describe what specific tools the teachers use. So if school A has different technology than school B, I could say it may be because of the income disparity between the districts. 

 

19 minutes ago, AspieAlly613 said:

The one thing that jumps out at me as funny-looking is that you're taking an average of median household incomes. Or are you comparing the district medians to the median of the entire area? Or are you comparing the median district incomes to the average income for the area?

 

Wait... you are right! They are both medians. One is the median for the specific area, and one is the median for the whole county. Durp. >.< 

 

21 minutes ago, AspieAlly613 said:

This means that yes, a percent difference is appropriate in this context.

I knew I remembered something from statistics. :P Thanks!

 

16 minutes ago, AspieAlly613 said:

Maybe rather than classifying them into discrete urban/rural/suburban categories, would it work if you used population per square mile of landmass?

 

I could... the census website has data from 2010 (the last time they did a census). I guess that would also help to provide context... 


My advice:  run the analysis using both the density and income figures.  If they essentially show the same thing, only include one set of analysis.  If they show different things, that would be really interesting and worth specifically highlighting.
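
One rough way to check whether the two measures "show the same thing" is to compare how they rank the districts. The sketch below uses the incomes from the thread, but the density figures are made-up placeholders (nobody has posted real ones), and the helper names are only illustrative. It computes a Spearman rank correlation by hand, assuming no ties.

```python
# Median household incomes from the thread (2018 US dollars).
income = {
    "Crystal River": 78_926,
    "Deer Valley": 66_350,
    "Pine Hill": 62_412,
    "River Valley": 39_371,
    "Westwood": 50_437,
}
# HYPOTHETICAL densities (people per square mile), invented for illustration only.
density = {
    "Crystal River": 900,
    "Deer Valley": 1_500,
    "Pine Hill": 400,
    "River Valley": 3_200,
    "Westwood": 2_100,
}


def ranks(values):
    """Rank each key from 1 (smallest value) to n (largest); assumes no ties."""
    ordered = sorted(values, key=values.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


def spearman(x, y):
    """Spearman rank correlation for two dicts with the same keys and no ties."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    sum_sq = sum((rank_x[k] - rank_y[k]) ** 2 for k in x)
    return 1 - 6 * sum_sq / (n * (n ** 2 - 1))


rho = spearman(income, density)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near +1 or -1 would suggest the two measures order the districts in essentially the same (or exactly reversed) way; a value near 0 would suggest they capture different things. With only five districts, any such number is a description, not a test.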


The maths seem to be the smallest of your problems here. For me the real question is - how does the median household income help you with your topic?

 

To illustrate what I mean, the London borough of Tower Hamlets is on average the richest borough in all of the UK. It also has the highest child poverty in the UK. You see where the problem is? In this sense the median is certainly better than an arithmetic average, but the household income will still not reflect what's actually going on.
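
A toy example of the mean-versus-median point, using invented household incomes (these are not figures for Tower Hamlets or for any real district): a few very high earners pull the mean far above what a typical household makes.

```python
from statistics import mean, median

# HYPOTHETICAL household incomes for one small area: mostly modest, two very high.
household_incomes = [
    18_000, 21_000, 24_000, 26_000, 30_000, 32_000, 35_000, 250_000, 400_000,
]

area_mean = mean(household_incomes)
area_median = median(household_incomes)

print(f"Mean:   {area_mean:,.0f}")
print(f"Median: {area_median:,.0f}")
```

The median stays close to the typical household, but neither number says anything about how many households sit at the bottom, which is the part a poverty measure would capture.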

scarletlatitude
8 hours ago, AspieAlly613 said:

My advice:  run the analysis using both the density and income figures.  If they essentially show the same thing, only include one set of analysis.  If they show different things, that would be really interesting and worth specifically highlighting.

Interesting idea. It so happens that the area with the highest density has the lowest income. Hrmmmmm.....

 

6 hours ago, timewarp said:

To illustrate what I mean, the London borough of Tower Hamlets is on average the richest borough in all of the UK. It also has the highest child poverty in the UK. You see where the problem is? In this sense the median is certainly better than an arithmetic average, but the household income will still not reflect what's actually going on.

I'm going with what my committee wants right now. They want more info on the communities... I think in the end though, I'll probably delete it unless it shows some really deep connection to what I'm studying. I dunno... maybe they want me to prove that I know something about the people I am studying? *shrug*


Statistics is a tool, but you have to decide what it is you are trying to compare. You can look at the average of the medians, or the population-weighted average of the medians, or the actual average. 
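
To make the difference between those options concrete, here is a small sketch. The district medians are the ones from earlier in the thread; the household counts are invented placeholders, so the weighted result is only an illustration of the mechanics.

```python
from statistics import mean

# District medians from earlier in the thread (2018 US dollars).
medians = {
    "Crystal River": 78_926,
    "Deer Valley": 66_350,
    "Pine Hill": 62_412,
    "River Valley": 39_371,
    "Westwood": 50_437,
}
# HYPOTHETICAL household counts, invented for illustration only.
households = {
    "Crystal River": 4_000,
    "Deer Valley": 6_000,
    "Pine Hill": 5_000,
    "River Valley": 20_000,
    "Westwood": 15_000,
}

# Every district counts equally, regardless of size.
simple_average = mean(medians.values())

# Each district's median is weighted by how many households it has.
total_households = sum(households.values())
weighted_average = sum(medians[d] * households[d] for d in medians) / total_households

print(f"Unweighted average of medians:         {simple_average:,.2f}")
print(f"Household-weighted average of medians: {weighted_average:,.2f}")
```

The "actual average" (mean household income across the whole area) cannot be recovered from district medians at all; it would need household-level or aggregate income data.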

 

Maybe take a step back. What is the theory that you are trying to disprove? (E.g., *science* it.) I know it sounds pedantic, but especially with social sciences it helps to be very clear about what you are trying to measure. This is very separate from politics, where the goal is to find a set of statistics that supports your point (which is almost always possible for almost any point).

 

 

  • 4 weeks later...
scarletlatitude

*digs thread up from the bottom of the ocean* @AspieAlly613 @the great acescape @uhtred  and everyone else... 

 

So which of you all knows something about ICC, ANOVA, or LMM? Do my paragraphs below make some kind of sense?

 

Quote

Quantitative data will be derived from the responses to the TPACK-Deep survey (Yurdakul et al., 2012). See Appendix A for the survey. For the purpose of this study, I will be analyzing only the questions that represent TPK (questions 11-24) and TCK (questions 1-10). Additional survey data will be reserved for a potential follow-up study. As it is very difficult to separate TPK and TCK in classrooms (Hofer & Harris, 2012; Yurdakul et al., 2012), both will be considered for this study. It will be possible for a teacher to count as two data points. For example, a 10th grade biology teacher can count within both the “10th grade teachers” group and the “biology teachers” group.

 

I will use SPSS to determine if there are statistically significant differences between the groups.  The groups will be defined as 1) years of experience (0-5 years, 6-10 years, 10-20 years, 20+ years), 2) the different science certifications (i.e. biology, chemistry, physics, etc.), and 3) the different grade levels taught (7-12).    For this analysis, the null hypothesis will be that there is no difference between groups.  

 

First, I will run an intraclass correlation (ICC) to ascertain how strongly the individuals in the groups resemble each other.  Specifically, in SPSS, ICC would be a “two-way random” calculation because I will have sample data (Grace-Martin, n.d.).  ICC estimates the variance between clusters (groups).  If the ICC is calculated to be 0-0.5, this means the variance does not come from grouping only, and I would be able to proceed with ANOVA.

 

If ANOVA is used, a one-way analysis of variance will determine whether the different groups of teachers have different TPK or TCK levels. Multiple ANOVA will be used to compare years of experience, subject area, and grade levels with a teacher’s self-reported TPK and TCK. 

 

If ANOVA is not used, I will use a linear mixed model, which would be appropriate for data taken from teachers connected by schools, subjects, and grade levels (Fox, 2002). Linear mixed models consider variation that is explained by the independent variables (fixed effects) and variation that is not explained by the independent variables (random effects) (Winter, 2003). For this study, the fixed effects would be grade levels, subjects taught, and schools. SPSS will be used to create scatterplots with regression lines to show any relationships between the groups (Winter, 2003).
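
As a rough illustration of the first two steps in the draft above, here is a hand-rolled one-way ANOVA and the one-way intraclass correlation, ICC(1), built from its mean squares. Several assumptions apply: the scores are invented, the groups are balanced, and ICC(1) is the simple one-way version, not the "two-way random" model that the draft mentions for SPSS. Treat it as a sketch of the idea, not a replacement for the SPSS output.

```python
from statistics import mean

# HYPOTHETICAL TPK scores grouped by years of experience; invented for illustration.
# Groups are balanced (same size) so the simple ICC(1) formula below applies.
scores_by_group = {
    "0-5 years": [3.1, 3.4, 2.9, 3.6],
    "6-10 years": [3.8, 3.5, 4.0, 3.7],
    "10-20 years": [4.1, 3.9, 4.3, 4.2],
}


def one_way_anova(groups):
    """Return (ms_between, ms_within, f_statistic) for a list of groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    n_groups = len(groups)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (len(all_scores) - n_groups)
    return ms_between, ms_within, ms_between / ms_within


def icc_one_way(ms_between, ms_within, group_size):
    """ICC(1): estimated share of total variance attributable to group membership."""
    return (ms_between - ms_within) / (ms_between + (group_size - 1) * ms_within)


groups = list(scores_by_group.values())
ms_between, ms_within, f_statistic = one_way_anova(groups)
icc = icc_one_way(ms_between, ms_within, group_size=len(groups[0]))

print(f"F = {f_statistic:.2f}, ICC(1) = {icc:.2f}")
```

The F statistic still has to be compared against an F distribution with (groups - 1, N - groups) degrees of freedom to get a p-value, which is the part SPSS reports directly.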

 

6 hours ago, scarletlatitude said:

*digs thread up from the bottom of the ocean* @AspieAlly613 @the great acescape @uhtred  and everyone else... 

 

So which of you all knows something about ICC, ANOVA, or LMM? Do my paragraphs below make some kind of sense?

 

 

I'm sorry, can't help. I think each field has its own language for statistics, and I'm afraid I'd be reading Wikipedia on this.

AspieAlly613

It *looks* okay but I'm unfamiliar with those specific procedures.

 

I'll give my standard grumpy-curmudgeon take and say this:

 

If it were up to me, I'd say to make sure your statistical tests can distinguish between EACH of the following three possibilities:

 

  • The null hypothesis is true
  • The null hypothesis is false
  • The data are inconclusive

However, the scientific community does not agree with me on this, and accepts tests that only distinguish between two of the three possibilities.
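
One way to get a three-way answer, sketched here as an assumption and not as anything proposed in this thread, is to compare a confidence interval for the difference in means against a margin of differences judged too small to matter. The margin, the sample scores, and the function name are all hypothetical, and the interval uses a normal approximation that is rough for samples this small.

```python
from statistics import NormalDist, mean, stdev


def three_way_verdict(group_a, group_b, margin, confidence=0.95):
    """Classify a difference in means using a confidence interval and a margin.

    Normal-approximation interval; `margin` is the largest difference
    the analyst is willing to call negligible.
    """
    diff = mean(group_a) - mean(group_b)
    std_error = (
        stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)
    ) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    low, high = diff - z * std_error, diff + z * std_error
    if -margin < low and high < margin:
        return "no meaningful difference"  # interval sits inside the margin
    if low > 0 or high < 0:
        return "difference detected"  # interval excludes zero
    return "inconclusive"  # interval too wide to say either way


# HYPOTHETICAL scores, invented for illustration only.
clearly_apart = three_way_verdict([3.1, 3.4, 2.9, 3.6], [4.1, 3.9, 4.3, 4.2], margin=0.3)
too_noisy = three_way_verdict([3.0, 4.0, 3.5, 4.5], [3.2, 4.4, 3.1, 4.3], margin=0.3)
nearly_equal = three_way_verdict(
    [3.50, 3.52, 3.48, 3.51], [3.49, 3.51, 3.50, 3.52], margin=0.3
)

print(clearly_apart, too_noisy, nearly_equal, sep="\n")
```

The design choice that matters is the margin: it has to be picked before looking at the data, and a wide, noisy interval is reported as inconclusive instead of being read as support for the null.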


Archived

This topic is now archived and is closed to further replies.
