Hi Kelly,
The problem you describe arises because, to observe the CLT in action, you need a larger sample size the more your sampling population differs from a normal distribution.
Consider the following simple R code to reproduce your experiment:
x<-1:100
y<-exp(x)
y is the exponentiated variable you refer to (not really the exponential distribution, but the exponential transformation of your seed pool of 100 numbers).
[UPDATE: As pointed out by others, the CLT applies to distributions with finite mean and variance. If you are randomly selecting from a pool of 100 sequential numbers (i.e., x), that distribution (a discrete uniform distribution) and its exponential transformation (i.e., y) both meet that condition.]
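A quick way to confirm that condition, as a sketch: compute the population mean and variance of both pools and check that they are finite numbers. The helper name pop_moments is hypothetical, introduced only for this illustration, and it uses the divide-by-N (population) variance because we sample from the whole pool.

```r
x <- 1:100
y <- exp(x)

# Hypothetical helper: population (divide-by-N) mean and variance of a finite pool
pop_moments <- function(v) {
  mu <- mean(v)
  c(mean = mu, var = mean((v - mu)^2))
}

mx <- pop_moments(x)
my <- pop_moments(y)
print(mx)  # mean 50.5, variance 833.25
print(my)  # huge, but finite, numbers
# Both pools have finite mean and variance, so the CLT's condition holds for each
print(all(is.finite(c(mx, my))))
```

The two pools satisfy the same condition; what separates them is shape, not finiteness.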
Next, a simple function that repeatedly takes a sample WITH REPLACEMENT from your population of 100 numbers and plots a histogram of the sample means:
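simsamp <- function(n, m) {
  # m replications: mean of a size-n sample drawn with replacement from y
  result <- numeric(m)
  for (i in seq_len(m)) {
    result[i] <- mean(sample(y, size = n, replace = TRUE))
  }
  hist(result,
       main = paste0("Sample Mean Distribution\nn = ", n, "; m = ", m),
       xlab = "mean")
}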
Here n is the sample size in each experiment and m is the number of times we replicate the experiment in order to build a histogram and check for convergence to normality.
Running this code for n=30, it takes around 10,000 replications before a hint of a small bump around the mean starts to appear, but this is clearly a multimodal distribution, with a large cluster of sample means close to the left boundary (i.e., zero).
It is clear that normality of the sample mean is not achieved at this sample size. The theorem implies that this distribution will converge to normality as the sample size grows. When trying n=100 (as you did with your class), we see some promise, but the nature of this distribution (bounded at zero on the left and extremely skewed to the right) makes that convergence slow and painful. Unfortunately, we are clearly not at normality with n=100, though the other big mode of values close to zero is no longer visible.
Finally, the sample means look approximately normal at a sample size of about 1,000 (this probably happened somewhere between n=100 and n=1,000), and this is evident in a histogram from as few as m=100 replications.
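If you want a number rather than a histogram to track this with your class, one option (a sketch, not part of the original experiment) is to run the same simulation without plotting and compute the skewness of the simulated means, which should shrink toward 0 as n grows. The names sim_means and sample_skewness, the seed, and m = 2000 are all arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed so the run is reproducible
y <- exp(1:100)

# Hypothetical helper: same experiment as simsamp(), but returns the m sample means
sim_means <- function(n, m, pop = y) {
  replicate(m, mean(sample(pop, size = n, replace = TRUE)))
}

# Hypothetical helper: third standardized moment (skewness) of a numeric vector
sample_skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

sizes <- c(30, 100, 1000)
skews <- sapply(sizes, function(n) sample_skewness(sim_means(n, m = 2000)))
names(skews) <- paste0("n=", sizes)
print(round(skews, 2))
```

Exact values depend on the seed, but the skewness at n=1000 should come out far smaller than at n=30, matching what the histograms show.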
I hope this helps.
Best Regards,
------------------------------
Raul E. Avelar, Ph.D., P.E., PMP
Associate Research Engineer
Operations and Design Division
Texas A&M Transportation Institute
------------------------------
Original Message:
Sent: 04-04-2018 01:20
From: David Bristol
Subject: Central Limit Thm - using the exponential function
Kelly,
There is no statement about convergence for "n>30" in the CLT.
My guess is that the magic '30' comes from the closeness of the t-distribution to the normal for n>30.
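To make that guess concrete, here is a small sketch comparing two-sided 95% critical values of the t-distribution (df = n - 1) with the standard normal value; the 95% level and the list of sample sizes are assumptions chosen for illustration.

```r
# Two-sided 95% critical values: t with df = n - 1 versus the standard normal
n <- c(5, 10, 20, 30, 100)
crit <- data.frame(n = n,
                   t = qt(0.975, df = n - 1),
                   z = qnorm(0.975))
crit$gap <- crit$t - crit$z  # how far the t critical value sits above z
print(round(crit, 3))
```

The gap shrinks from about 0.82 at n = 5 to about 0.09 at n = 30, which is one plausible origin of the rule of thumb. Note that this says nothing about how fast sample means from a skewed population become normal.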
Loosely speaking, the rate of convergence in the CLT is related to the symmetry (skewness) of the underlying distribution, so the highly skewed exponential data require a larger sample size.
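One way to quantify this: for means of n independent draws (sampling with replacement), the skewness of the sample mean equals the population skewness divided by sqrt(n), and the Berry–Esseen bound on the distance to the normal likewise scales like E|X - mu|^3 / (sigma^3 sqrt(n)). The sketch below computes the population skewness of both pools and the implied skewness of the mean; the helper name pop_skewness is hypothetical.

```r
# Hypothetical helper: population skewness (third standardized moment) of a finite pool
pop_skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

g_x <- pop_skewness(1:100)       # symmetric pool: 0 up to rounding
g_y <- pop_skewness(exp(1:100))  # roughly 8.3: extremely right-skewed

# Skewness of the mean of n draws with replacement shrinks like g / sqrt(n)
n <- c(5, 10, 20, 30, 100, 1000)
print(round(data.frame(n = n, skew_x = g_x / sqrt(n), skew_y = g_y / sqrt(n)), 2))
```

For the e^x pool, the predicted skewness of the sample mean is still about 1.5 at n = 30 and about 0.8 at n = 100, dropping to about 0.26 at n = 1,000, which is consistent with the histograms described above.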
David
--
David R. Bristol, PhD
President, Statistical Consulting Services, Inc.
1-336-293-7771
Original Message------
Hi,
I needed a little help with understanding something. I was trying to apply the Central Limit Thm to different distributions.
So I first used the numbers 1 to 100 as my original data set, took samples of size 5, 10, 20, 30 for 100 trials, and showed the students that the means from these 100 samples are approximately normally distributed as n increases. Then I went on to use the exponential function: my data set was {e^1, e^2, e^3, e^4, ..., e^100}, and I again took samples of size 5, 10, 20, 30, ... for 100 trials. The means of these samples never approached a normal distribution.
The Thm states that for ANY distribution, regardless of the underlying shape, the distribution of the sample means will be approximately normal when n>30, but this did not work for e^x. Any insight or thoughts would be appreciated.
Kelly
------------------------------
Kelly Fitzpatrick
Assistant Professor
County College of Morris
------------------------------