ASA Connect

Central Limit Thm - using the exponential function

1. Central Limit Thm - using the exponential function

0 Recommend
Kelly Fitzpatrick
Posted 04-03-2018 17:10
Hi,

I need a little help understanding something. I was trying to apply the Central Limit Theorem to different distributions.
I first used the numbers 1 to 100 as my original data set, took samples of size 5, 10, 20, and 30 for 100 trials each, and showed the students that the means from these 100 samples are approximately normally distributed as n increases. Then I went on to use the exponential function: my data set was {e^1, e^2, e^3, e^4, ..., e^100}, and again I took samples of size 5, 10, 20, 30, ... for 100 trials each. The means of these samples never approached a normal distribution.

The theorem states that for ANY distribution, regardless of the underlying shape, the distribution of the sample means will be approximately normal when n > 30. This did not work for e^x. Any insight or thoughts would be appreciated.

Kelly

------------------------------
Kelly Fitzpatrick
Assistant Professor
County College of Morris
------------------------------
2. RE: Central Limit Thm - using the exponential function

0 Recommend
Raoul Burchette
Posted 04-04-2018 00:44
The central limit theorem says nothing about 30.

Raoul Burchette

Biostatistician III

Research and Evaluation, SCPMG

100 S. Los Robles Avenue

Pasadena, CA 91101

message phone: 626-564-3471 (8.338.3471)

email: Raoul.J.Burchette@kp.org

NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you.

3. RE: Central Limit Thm - using the exponential function

1 Recommend
Walter Hill
Posted 04-06-2018 02:31
The theorem states that the variance of the sample mean is sigma squared over N. Some of your terms are very large, and so is the variance.
You might be able to compute the exact sampling distribution for a similar but simpler distribution to get an intuition about the process.
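
As a rough illustration of this point (a sketch, not the poster's own computation), the code below assumes the pool from the original post, y = exp(1:100), sampled with replacement. It computes the population variance, the share of the pool's total carried by the single largest value, and the exact binomial distribution of how many times that largest value lands in a sample of n = 30. The variable names and the choice n = 30 are illustrative.

```r
# Pool from the original post: e^1, e^2, ..., e^100
y <- exp(1:100)
N <- length(y)

mu     <- mean(y)
sigma2 <- mean((y - mu)^2)      # population variance (divides by N, not N - 1)

n  <- 30                        # sample size per trial (assumed)
se <- sqrt(sigma2 / n)          # standard deviation of the sample mean, sigma / sqrt(n)

# Share of the pool's total that comes from the single largest value, e^100
share_max <- max(y) / sum(y)

# Sampling with replacement, the number of times e^100 appears in a sample
# of size n is Binomial(n, 1/N); this is an exact sampling distribution.
p_count <- dbinom(0:3, size = n, prob = 1 / N)
names(p_count) <- 0:3

print(c(mean = mu, se = se, share_max = share_max))
print(round(p_count, 4))
```

Because one value carries well over half of the total, each sample mean is driven largely by how many copies of e^100 the sample happens to contain, which is one way to see why the histogram of means can look lumpy at small n.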

Walter Hill
Political Science
St. Mary's College

4. RE: Central Limit Thm - using the exponential function

0 Recommend
Jonathan Siegel
Posted 04-07-2018 23:06
The distributions one encounters in many branches of science, especially biological and social sciences, can be highly skewed and irregular. The sample sizes needed to achieve convergence to an approximately normal distribution can be very, very large.

The central limit theorem only guarantees that convergence to a normal distribution will occur eventually.

This is one of the reasons it’s critical to evaluate the distribution of the data one is using and check whether assumptions such as normality reasonably apply to it for the sample sizes one is using. They often don’t.

Jonathan Siegel
Associate Director Clinical Statistics.

5. RE: Central Limit Thm - using the exponential function

0 Recommend
Chad Cross
Posted 04-04-2018 01:07
If I am understanding your data set correctly, you have a massively skewed distribution with enormous SD, so the number of simulations and the sample size you use are going to need to be much larger than you’ve used in your example in order for your sampling distribution to approach normality under the CLT. The usual suggestion of n around 30 works remarkably well for many moderate distributions, but can fail miserably with highly skewed distributions.

--
Chad L. Cross, PhD, PStat(R)
Biostatistician

6. RE: Central Limit Thm - using the exponential function

0 Recommend
David Bristol
Posted 04-04-2018 01:21
Kelly,
There is no statement about convergence for "n>30" in the CLT.
My guess is that the magic '30' is for closeness of the t-distribution to normal for n>30.
Loosely speaking, the rate of convergence for the CLT is related to the symmetry of the underlying distribution, so the exponential data requires a larger sample size.
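
A sketch of why skewness matters here, assuming independent draws (sampling with replacement) from the pool y = exp(1:100): the skewness of a mean of n independent draws equals the population skewness divided by sqrt(n), so a very skewed population needs a large n before the distribution of the mean looks symmetric. The helper name `pop_skewness` is made up for this illustration.

```r
# Population skewness: third central moment divided by sigma^3
pop_skewness <- function(v) {
  mu <- mean(v)
  mean((v - mu)^3) / mean((v - mu)^2)^1.5
}

y      <- exp(1:100)            # pool from the original post
skew_y <- pop_skewness(y)

# Skewness of the sample mean for several sample sizes (i.i.d. draws assumed)
n <- c(5, 10, 20, 30, 100, 1000)
skew_mean <- skew_y / sqrt(n)

print(data.frame(n = n, skewness_of_mean = round(skew_mean, 2)))
```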
David

--
David R. Bristol, PhD
President, Statistical Consulting Services, Inc.
1-336-293-7771
www.Statistical-Consulting-Services.com

7. RE: Central Limit Thm - using the exponential function

3 Recommend
Raul Avelar
Posted 04-04-2018 07:16
Edited by Raul Avelar 04-05-2018 05:46
Hi Kelly,

The problem you describe is that to observe CLT in action, you need a larger sample size in proportion to how dissimilar your sampling population and the normal distribution are.

Consider the following simple R code to reproduce your experiment:

x<-1:100
y<-exp(x)

y is the exponentiated variable you refer to (not the exponential distribution really, but the exponential transformation of your seed pool of 100 numbers).

[UPDATE: As pointed out by others, CLT is applicable to distributions with finite mean and variance, but if you are randomly selecting from a pool of 100 sequential numbers (i.e., x), that distribution (the uniform distribution), as well as the exponentiated transformation of it (i.e., y) meet that definition.]

Next, a simple code to take a sample WITH REPLACEMENT from your population of 100 numbers:

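simsamp <- function(n, m) {
  # m sample means, each from n draws (with replacement) from the pool y
  result <- numeric(m)
  for (i in seq_len(m)) {
    result[i] <- mean(sample(y, size = n, replace = TRUE))
  }
  hist(result, main = paste0('Sample Mean Distribution\nn=', n, '; m=', m), xlab = 'mean')
}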

Where n is the sample size in each experiment and m is the number of replications we do with the experiment in order to construct a histogram and verify convergence to normality.

Running this code for n=30, it takes around 10,000 replications to start seeing a hint of a small bump around the mean, but this is clearly a multimodal distribution with a large cluster of sample means close to the left boundary (i.e., zero).

It is clear that normality of the sample mean is not achieved for this sample size. The theorem implies that you can expect this distribution to converge to normality for larger and larger sample sizes. So, when trying n=100, we see some promise, but the nature of this distribution (bounded at zero on the left) makes that convergence slow and painful and, unfortunately, we are clearly not at normality with n=100 (though we no longer see the other big mode of values close to zero).

Finally, normality appears to be achieved for a sample size of about 1,000 (it was probably reached at some point between n=100 and n=1,000), and this is evident in a histogram from as few replications as m=100.
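
To put a number on what the histograms suggest, here is a variant of the simulation that returns the sample means instead of plotting them and reports their sample skewness (which is near 0 for a normal distribution). It is only a sketch: `sim_means`, `sample_skewness`, the seed, and m = 2000 are choices made for this illustration, not part of the code above.

```r
set.seed(2018)                  # arbitrary seed, for reproducibility only
y <- exp(1:100)

# m sample means, each from n draws with replacement
sim_means <- function(n, m, pool = y) {
  replicate(m, mean(sample(pool, size = n, replace = TRUE)))
}

# Moment-based sample skewness
sample_skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

sizes <- c(30, 100, 1000)
skews <- sapply(sizes, function(n) sample_skewness(sim_means(n, m = 2000)))
names(skews) <- paste0("n=", sizes)
print(round(skews, 2))
```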

I hope this helps.

Best Regards,

------------------------------
Raul E. Avelar, Ph.D., P.E., PMP
Associate Research Engineer
Operations and Design Division
Texas A&M Transportation Institute
------------------------------

8. RE: Central Limit Thm - using the exponential function

1 Recommend
Gerald Belton
Posted 04-04-2018 08:41
Edited by Gerald Belton 04-04-2018 08:53
Two things:

First, the Central Limit Theorem doesn't say that it works for any distribution... it works for distributions that have a mean and a finite variance. For example, it fails for the Cauchy distribution, which has no defined mean and variance.

Making a minor tweak to your example by making the exponents negative will let your demonstration work for your class.
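
A quick check that may be worth running before class, offered as a sketch and assuming the exponents are still the integers 1 to 100: the values e^(-1), ..., e^(-100) are the values e^(100), ..., e^(1) multiplied by the constant e^(-101). Rescaling every value by a constant changes the axis of the sample-mean histogram but not its shape, so with this particular pool the histogram of sample means would have the same shape as before. Drawing from the exponential distribution itself (for example with `rexp`) is a different experiment.

```r
y_pos <- exp(1:100)             # original pool
y_neg <- exp(-(1:100))          # pool with negative exponents

# exp(-k) = exp(-101) * exp(101 - k), and 101 - k runs over 100, ..., 1
same_up_to_scale <- isTRUE(all.equal(y_neg, rev(y_pos) * exp(-101)))
print(same_up_to_scale)

# Scale-free summaries therefore agree, e.g. the coefficient of variation
cv <- function(v) sqrt(mean((v - mean(v))^2)) / mean(v)
print(c(cv_pos = cv(y_pos), cv_neg = cv(y_neg)))
```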

I like to demonstrate the CLT using real data. I downloaded weather data for Raleigh from the National Weather Service website, and looked at rainfall for each day. A histogram shows that this approximately follows an exponential distribution, with a lot of days having less than 0.1" of rain, a much smaller number of days having 0.1 - 0.2", etc. Then sampling from this data, as the sample size increases, the sampling distributions look more like normal distributions.

Gerald Belton

9. RE: Central Limit Thm - using the exponential function

1 Recommend
Shonda Kuiper
Posted 04-04-2018 10:15
I think Tim Hesterberg wrote a very nice discussion on this topic:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/34906.pdf

------------------------------
Shonda Kuiper
Grinnell College
------------------------------

10. RE: Central Limit Thm - using the exponential function

1 Recommend
Jay Beder
Posted 04-04-2018 10:27
I was hoping someone would point out that you have not sampled from an exponential distribution. I'm assuming that your original numbers (1, ..., 100) represent a sample from a uniform distribution on the interval [1,100]. But if X has this uniform distribution, e^X does not have an exponential distribution.

Of course, your original data was not random, but the same issue would arise if it had been.

------------------------------
Jay Beder
Professor
University of Wisconsin-Milwaukee
------------------------------

11. RE: Central Limit Thm - using the exponential function

1 Recommend
Barbara Graham
Posted 04-05-2018 11:40
Shonda: The link works when it is preceded by 'https://'

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/34906.pdf

------------------------------
Barbara Graham
Biostatistician
Colorado State University
------------------------------

12. RE: Central Limit Thm - using the exponential function

1 Recommend
Frank Harrell
Posted 04-05-2018 12:18
Tim Hesterberg's article is spot on.

I cannot for the life of me understand why we still teach the CLT. It effectively assumes that the standard deviation is known, and really, really assumes that the standard deviation is the appropriate measure of dispersion.

It is easy to show that with a log-normal distribution the CLT gives terrible confidence coverage in both tails when N=50,000. The main problem is that the person using the CLT is computing the standard deviation on entirely the wrong scale (i.e., without logging). When we encounter data, we don't know the transformation. Bootstrapping can help. So will the use of exact confidence intervals for population quantiles.
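
A minimal sketch of the bootstrap idea mentioned above, using a percentile interval for the mean of simulated log-normal data. The sample size, seed, number of resamples B, and the log-normal parameters are assumptions chosen for illustration; this does not reproduce the N=50,000 coverage result.

```r
set.seed(1)                                   # arbitrary seed
x <- rlnorm(200, meanlog = 0, sdlog = 1.5)    # simulated skewed sample (assumed parameters)

# Percentile bootstrap: resample x with replacement and recompute the mean
B <- 2000
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
ci_boot <- quantile(boot_means, c(0.025, 0.975), names = FALSE)

# Normal-theory (CLT-based) interval for comparison; symmetric by construction
ci_clt <- mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(length(x))

print(rbind(bootstrap = ci_boot, clt = ci_clt))
```

The CLT interval is always symmetric about the sample mean, while the bootstrap interval is free to be asymmetric. The percentile bootstrap is only one of several bootstrap intervals and can itself undercover with very skewed data.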

------------------------------
Frank Harrell
Professor of Biostatistics
Vanderbilt University School of Medicine
Office of Biostatistics
FDA CDER
------------------------------

13. RE: Central Limit Thm - using the exponential function

0 Recommend
Raoul Burchette
Posted 04-08-2018 13:46
Many years ago, when I was a young student of statistics, one of my early professors emphasized the "first rule of statistics" which was to "look at the data." That is, visually inspect the data, or subsamples of it for obvious inconsistencies and possibly blatant errors such as impossible values and miscoded or misentered data, particularly data entered in the wrong--often adjacent--column, then graphically examine the distributions by histograms and scatterplots. We were also taught to examine the processes by which the data were collected and entered and about the kinds of distributions particular sources of data measurements would tend to produce. These kinds of examination would clue one in whether the data was likely skewed or not. If one did this, he or she would recognize there is a possible problem with using symmetric confidence intervals on inherently nonsymmetric data.

I find it disappointing that I was not shown other ways to calculate confidence intervals, especially nonsymmetric ones. Personally, I sometimes split the data at some measure of central tendency, such as the mean, and calculate one-sided standard deviations on each side of it. I am sure someone could come up with something better, but at least it doesn't hide the asymmetry of the underlying distribution. [Of course, I also look at skewness and kurtosis in the process of looking at descriptive statistics of the underlying collected data.]
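
One possible reading of the "one-sided standard deviations" described above, as a sketch: split the observations at the mean and take the root mean squared deviation from the mean on each side. The function name `one_sided_sd` and the exact definition (dividing by the number of points on each side, and dropping points equal to the center) are assumptions, not necessarily the formula the poster uses.

```r
one_sided_sd <- function(x, center = mean(x)) {
  below <- x[x < center]
  above <- x[x > center]
  c(lower = sqrt(mean((below - center)^2)),
    upper = sqrt(mean((above - center)^2)))
}

x <- c(1, 2, 3, 10)             # small right-skewed example; mean is 4
print(one_sided_sd(x))          # upper is larger than lower for this example
```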

I am aware that some claim selective data correction (that is, the obvious data error that one can easily identify) biases the data, and I don't doubt that it does, but I think, practically speaking, this is mostly not as big a problem as the people I heard it from were making it out to be, though this kind of data correction could be abused. But I guess it comes down to what one is trying to learn, and whether he or she trusts the work of those in the data chain that preceded him or her.

Raoul Burchette

Biostatistician III

Research and Evaluation, SCPMG

100 S. Los Robles Avenue

Pasadena, CA 91101

message phone: 626-564-3471 (8.338.3471)

email: Raoul.J.Burchette@kp.org

14. RE: Central Limit Thm - using the exponential function

0 Recommend
Francisco Soler
Posted 04-04-2018 11:22
The CLT is stated for samples drawn from a probability distribution (a PDF). Note that e^x is not a PDF. I assume you want e^(-x) for x >= 0, which is a PDF with mu = 1. Still, because the exponential distribution is extremely right-skewed, a large sample size is needed to "bring in" that right tail and get a "better looking" bell-shaped curve for the sample means.
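
A sketch checking these statements numerically, assuming the density f(x) = e^(-x) on x >= 0 (the exponential distribution with rate 1). The seed, n = 30, and the number of replications m are illustrative choices.

```r
f <- function(x) exp(-x)        # assumed density on [0, Inf)

# A PDF integrates to 1; the mean is the integral of x * f(x)
total_mass <- integrate(f, lower = 0, upper = Inf)$value
mu <- integrate(function(x) x * f(x), lower = 0, upper = Inf)$value
print(c(total_mass = total_mass, mean = mu))

# Sample means of n exponential draws, repeated m times
set.seed(42)                    # arbitrary seed
n <- 30
m <- 5000
means <- replicate(m, mean(rexp(n, rate = 1)))
hist(means, breaks = 40, main = paste0("Means of n=", n, " exponential draws"), xlab = "mean")
```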

Frank Soler

------------------------------
Francisco Soler
------------------------------

15. RE: Central Limit Thm - using the exponential function

1 Recommend
Filiep Samyn
Posted 04-04-2018 14:47
I have not looked at the variance of your exponential distribution. However, for the CLT to be applicable the variance has to be finite. For instance, the CLT does not apply to the Cauchy distribution which has infinite variance.

------------------------------
Filiep Samyn
------------------------------
