ASA Connect

Hypothesis testing

1. Hypothesis testing

Tanaya Kavathekar
Posted 05-22-2020 13:43
Hello,

I have a database of about 30,000 women who tracked their weight throughout pregnancy. I also have other information, such as age and BMI group (BMI before pregnancy). The ages are grouped into 5-year brackets, so in total there are 5 age groups and 4 BMI groups. The number of observations per group is uneven.

I want to test whether the weight difference differs significantly across the age/BMI groups. My null hypothesis is that the mean weight difference is the same across the 5 age groups. I am currently using ANOVA in R. The test gives me the message "Estimated effects may be unbalanced".

Am I using the correct test? Is there any other test that works best with the unbalanced data?

Thank you,
Tanaya

------------------------------
Tanaya Kavathekar
George Washington University
------------------------------
2. RE: Hypothesis testing

Chauncey Dayton
Posted 05-25-2020 06:55
Which SS type did you select - II or III? Type II provides adjusted main effects similar to running SS(A |B) and SS(B|A) in regression. With unequal n's in the cells of a two-way ANOVA, there is no unique analysis for the main effects (they are confounded). Check on-line for info on Type I to III SS in ANOVA.
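To make the SS(A|B) and SS(B|A) idea concrete, here is a minimal sketch in plain Python (not code from this thread). It assumes an additive two-factor model, in which the Type II sum of squares for each main effect is the drop in residual sum of squares when that factor is added to a model already containing the other. The toy data, the 2x2 layout, and all function names are made up for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def design(rows, use_a, use_b, n_a, n_b):
    """Intercept plus treatment-coded dummies (first level is the reference)."""
    X = []
    for a, b, _ in rows:
        x = [1.0]
        if use_a:
            x += [1.0 if a == k else 0.0 for k in range(1, n_a)]
        if use_b:
            x += [1.0 if b == k else 0.0 for k in range(1, n_b)]
        X.append(x)
    return X


def sums_of_squares(rows, n_a, n_b):
    """Sequential and adjusted main-effect SS for an additive A + B model."""
    y = [r[2] for r in rows]
    rss_null = rss(design(rows, False, False, n_a, n_b), y)
    rss_a = rss(design(rows, True, False, n_a, n_b), y)
    rss_b = rss(design(rows, False, True, n_a, n_b), y)
    rss_ab = rss(design(rows, True, True, n_a, n_b), y)
    return {
        "SS(A)": rss_null - rss_a,    # A alone, ignoring B
        "SS(A|B)": rss_b - rss_ab,    # A adjusted for B
        "SS(B)": rss_null - rss_b,    # B alone, ignoring A
        "SS(B|A)": rss_a - rss_ab,    # B adjusted for A
    }


# Hypothetical unbalanced data: (level of A, level of B, response).
unbalanced = (
    [(0, 0, y) for y in (10.0, 12.0, 11.0)]
    + [(0, 1, 14.0)]
    + [(1, 0, 13.0)]
    + [(1, 1, y) for y in (17.0, 18.0, 16.0, 17.0)]
)

for name, value in sums_of_squares(unbalanced, 2, 2).items():
    print(f"{name}: {value:.3f}")
```

With unequal cell counts the unadjusted and adjusted sums of squares disagree, which is the confounding described above. Feeding the same function a balanced layout should make SS(A) and SS(A|B) coincide.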

------------------------------
Chauncey Dayton
------------------------------

3. RE: Hypothesis testing

Blaise Egan
Posted 05-26-2020 08:23
Tanaya

This post on Cross Validated (the Stack Exchange statistics site) may be relevant to you.

https://stats.stackexchange.com/questions/76640/estimated-effects-may-be-unbalanced-message-when-running-aov-in-r-what-does-i

Blaise

------------------------------
Blaise Egan
Lead Data Scientist
British Telecommunications PLC
------------------------------

4. RE: Hypothesis testing

David Turner
Posted 05-26-2020 12:05
The adjustment implied by "Type II provides adjusted main effects similar to running SS(A |B) and SS(B|A)..." is often misinterpreted. What is really tested by both SS(A|B) and SS(B|A) are the marginal means from a table of AB cell means ADJUSTED to have no AB interaction!

For details see Turner, David L. (1990) 'An easy way to tell what you are testing in analysis of variance', Communications in Statistics - Theory and Methods, 19:12, 4807 - 4832
URL: http://dx.doi.org/10.1080/03610929008830475

------------------------------
David L. Turner
USDA Forest Service Research Statistician, Retired
------------------------------

5. RE: Hypothesis testing

Robert O'Brien
Posted 05-25-2020 08:08
Hello Tanaya,

I would suggest using lm() (i.e., regression) instead.

Robert

------------------------------
Robert O'Brien
------------------------------

6. RE: Hypothesis testing

Mark Bailey
Posted 05-25-2020 08:29
Balance is not required for estimation or testing of these effects. Balance achieves the minimum standard errors, but it might be impractical or impossible to achieve.

Your data suggests that a two-way ANOVA with an interaction term is possible. This analysis is easily performed with R.
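The point about balance and standard errors can be checked with a little arithmetic. This is only a sketch, assuming two groups with a common standard deviation sigma, so that the standard error of the difference in means is sigma * sqrt(1/n1 + 1/n2). The total of 3,000 and the splits are hypothetical numbers, not taken from the data in question.

```python
import math


def se_diff(n1, n2, sigma=1.0):
    """Standard error of the difference between two group means,
    assuming both groups share the same standard deviation sigma."""
    return sigma * math.sqrt(1.0 / n1 + 1.0 / n2)


total = 3000  # hypothetical total sample size split across two groups
for n1 in (1500, 1000, 500, 100):
    n2 = total - n1
    print(f"n1={n1:5d}  n2={n2:5d}  SE={se_diff(n1, n2):.4f}")
```

The equal split gives the smallest standard error for a fixed total, but the unequal splits still give a perfectly usable estimate, just a less precise one.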

------------------------------
Mark Bailey
Principal Analytical Training Consultant
SAS Institute, Inc.
------------------------------

7. RE: Hypothesis testing

Rene Valverde-Ventura
Posted 05-25-2020 10:32
Hello Tanaya,

I have questions rather than answers:
1. Why would you do hypothesis testing?
a. I doubt that the sample is random.
b. It seems to me that it was not a planned situation.
c. What would be the target population for the intended inference?

2. I would rather calculate some descriptive statistics.
a. With 30,000 averages, one per woman (I suppose that you are taking the average for each woman; otherwise your degrees of freedom will be inflated), and 20 groups (5x4), there are, on average, 1,500 women per group. With those data you can get an idea of the corresponding population distribution.
b. Most probably there are other grouping factors, ethnicity, etc. So that you will have still some mixture in your 20 groups.
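As a rough sketch of the descriptive approach in point 2, the snippet below tabulates the count, mean, and standard deviation for each age-by-BMI cell. The records, group labels, and the function name are invented for illustration; the only figure taken from the thread is the 30,000 / 20 = 1,500 average cell size.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records):
    """Group (age_group, bmi_group, value) records by cell and return
    n, mean, and sample SD for each cell."""
    cells = defaultdict(list)
    for age_group, bmi_group, value in records:
        cells[(age_group, bmi_group)].append(value)
    return {
        cell: {
            "n": len(values),
            "mean": mean(values),
            # Sample SD is undefined for a single observation.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        }
        for cell, values in sorted(cells.items())
    }


# Hypothetical records: (age group, BMI group, one average per woman).
records = [
    ("20-24", "normal", 12.0),
    ("20-24", "normal", 14.0),
    ("20-24", "obese", 8.0),
    ("25-29", "normal", 13.0),
    ("25-29", "normal", 11.0),
    ("25-29", "normal", 15.0),
]

print("average women per cell:", 30000 // (5 * 4))
for cell, stats in summarize(records).items():
    print(cell, stats)
```

Using one value per woman keeps the cell counts equal to the number of women, which is the degrees-of-freedom concern raised in point 2a.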

Regards,

Rene Valverde-Ventura

------------------------------
Rene Valverde-Ventura
------------------------------

8. RE: Hypothesis testing

Bruce Blaine
Posted 05-25-2020 10:47
Tanaya,
Yes on the SS calculation recommendations offered by Chauncey. But with your sample size any tiny difference will be significant so I doubt "significant" differences will tell you anything beyond that you have N=30,000. Maybe consider mean (or median, depending on how weight is distributed) values by condition with bootstrapped CIs.
Also, BMI is calculated from weight so will be a very good predictor of weight 9 months later; not sure what the BMI effect in your ANOVA model would mean.
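For the bootstrapped CIs, a minimal percentile-bootstrap sketch in plain Python might look like the following. It is one simple variant among several bootstrap interval methods. The function name, the toy values, and the defaults (2,000 resamples, 95% level, fixed seed) are assumptions made for illustration.

```python
import random
from statistics import mean, median


def bootstrap_ci(values, stat=mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of one group.

    Resamples the values with replacement n_boot times, computes the
    statistic on each resample, and returns the empirical quantiles.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    boot_stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]


# Hypothetical weight differences for one age/BMI cell.
gains = [11.2, 13.5, 9.8, 15.1, 12.4, 14.0, 10.6, 12.9, 16.3, 11.7]

print("mean CI:  ", bootstrap_ci(gains, stat=mean))
print("median CI:", bootstrap_ci(gains, stat=median))
```

Running this per cell and comparing the intervals shows the size of the differences, rather than only whether they clear a significance threshold.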
Bruce

------------------------------
Evan Blaine, PhD, PStat
Statistics Program Director
St. John Fisher College
Rochester NY
------------------------------

9. RE: Hypothesis testing

Ajit Thakur
Posted 05-25-2020 12:39
It is not clear from your e-mail what type of ANOVA you are running in R. Hopefully it is not a single-classification ANOVA but a factorial ANOVA, and you are testing not only the main effects but also the interaction (Age Group x BMI index). The imbalance warning you are getting relates mainly to this interaction term in your model. It is simply pointing out that the imbalance in your data may produce F-statistics that are not exactly F-distributed. With your large sample sizes, I would not be too worried about it. In real-world data you may never have sets that are completely balanced. Also, well-known packages such as BMDP, SAS, SPSS, and many others (I am sure both R and S+ as well) have procedures that produce pseudo F-statistics, which may be better behaved than the classical F-statistics. This may be especially true if you also have repeated measures in your model (do you?).

You have two sources of imbalance: one is in the Age factor and the other is in the BMI index. These two factors are also imbalanced between themselves (5 and 4 levels). Is it possible to have an equal number of levels for both of them by regrouping your data? If you can, that will minimize your source of imbalance. Unequal sample sizes within each factor are going to be a fact of real life, but given such large sample sizes, the impact of the imbalance will be minimal.

There may be other statisticians with other solutions. In any case, please specify your model statement and what kind of post hoc hypotheses you are testing so that the community can make other suggestions. One further point: if you are collaborating with clinicians in your research, I would use the simplest model that is still defensible, so that there will be no ambiguity among your collaborators. Hope it helps.

Ajit K. Thakur, Ph.D.
Retired Statistician

------------------------------
Ajit K. Thakur, Ph.D.
Retired Statistician
------------------------------
