ASA Connect


Central Limit Thm - using the exponential function

  • 1.  Central Limit Thm - using the exponential function

    Posted 04-03-2018 17:10
    Hi,

    I needed a little help understanding something. I was trying to apply the Central Limit Theorem to different distributions.
    I first used the numbers 1 to 100 as my original data set. I took samples of size 5, 10, 20 and 30, with 100 trials each, and showed the students that the means from these 100 samples are approximately normally distributed as n increases. Then I moved on to the exponential function. My data set was {e^1, e^2, e^3, e^4, ..., e^100}, and I again took samples of size 5, 10, 20, 30, ... with 100 trials each. The means of these samples never approached a normal distribution.

    The theorem states that for ANY distribution, regardless of its underlying shape, the distribution of the sample means will be approximately normal when n > 30. This did not work for e^x. Any insights or thoughts would be appreciated.
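    Below is a minimal R sketch of the experiment described above. The post does not say whether draws were made with replacement, so this sketch assumes they were. The helper name `clt_experiment` is hypothetical. For each sample size it keeps the 100 sample means from both pools and summarizes their spread. It draws histograms only in an interactive session.

```r
# Hypothetical helper: for each sample size, draw `trials` samples from `pool`
# (with replacement, which is an assumption) and keep the sample means.
clt_experiment <- function(pool, sizes = c(5, 10, 20, 30), trials = 100) {
  means <- lapply(sizes, function(n) {
    replicate(trials, mean(sample(pool, size = n, replace = TRUE)))
  })
  names(means) <- paste0("n=", sizes)
  means
}

set.seed(1)
linear_means <- clt_experiment(1:100)       # pool {1, 2, ..., 100}
exp_means    <- clt_experiment(exp(1:100))  # pool {e^1, e^2, ..., e^100}

# Numeric summary: SD of the 100 means for each sample size and pool
rbind(linear = sapply(linear_means, sd), exp = sapply(exp_means, sd))

# Histograms (interactive sessions only): top row 1:100, bottom row exp(1:100)
if (interactive()) {
  op <- par(mfrow = c(2, 4))
  for (nm in names(linear_means)) hist(linear_means[[nm]], main = paste("1:100,", nm), xlab = "mean")
  for (nm in names(exp_means)) hist(exp_means[[nm]], main = paste("exp(1:100),", nm), xlab = "mean")
  par(op)
}
```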

    Kelly

    ------------------------------
    Kelly Fitzpatrick
    Assistant Professor
    County College of Morris
    ------------------------------


  • 2.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 00:44

    The central limit theorem says nothing about 30. 
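    For reference, here is the classical (Lindeberg-Levy) form of the theorem as given in standard treatments. It requires only independent, identically distributed draws with finite variance. No particular sample size appears in it:

```latex
\text{If } X_1, X_2, \dots \text{ are i.i.d. with } \mathbb{E}[X_i] = \mu
\text{ and } 0 < \operatorname{Var}(X_i) = \sigma^2 < \infty, \text{ then}
\qquad
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma} \xrightarrow{\;d\;} N(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{where } \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i .
```

    The statement is purely asymptotic. It does not say how large n must be for the normal approximation to be good.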

    Raoul Burchette

    Biostatistician III

    Research and Evaluation, SCPMG

    100 S. Los Robles Avenue

    Pasadena, CA 91101

    message phone:  626-564-3471 (8.338.3471)

    email:  Raoul.J.Burchette@kp.org


    NOTICE TO RECIPIENT:  If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents.  If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them.  Thank you.






  • 3.  RE: Central Limit Thm - using the exponential function

    Posted 04-06-2018 02:31
    The theorem states that the variance of the sample mean will be sigma squared over N. Some of your terms are quite large, and so is the variance.
    You might be able to compute the exact sampling distribution for a similar but simpler distribution to get an intuition about the process.
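    Following this suggestion, here is a minimal R sketch that computes an exact sampling distribution for a simpler case. The three-value pool {e^1, e^2, e^3} and the helper name `exact_mean_dist` are illustrative assumptions. The sketch enumerates every ordered sample of size n drawn with replacement, so all N^n outcomes are equally likely. It then checks that the variance of the exact distribution of the mean equals sigma^2/n, where sigma^2 is the population variance (divisor N).

```r
# Hypothetical helper: exact distribution of the sample mean for samples of size n
# drawn with replacement from a small pool (all length(pool)^n outcomes enumerated).
exact_mean_dist <- function(pool, n) {
  grid  <- expand.grid(rep(list(pool), n))  # one row per ordered sample
  means <- rowMeans(grid)
  # Collapse equal means (rounded so floating-point near-duplicates merge)
  tab <- table(round(means, 10))
  data.frame(mean = as.numeric(names(tab)),
             prob = as.numeric(tab) / nrow(grid))
}

pool   <- exp(1:3)                        # a "similar but simpler" pool
sigma2 <- mean((pool - mean(pool))^2)     # population variance (divisor N)

d <- exact_mean_dist(pool, n = 4)
exact_var <- sum(d$prob * (d$mean - mean(pool))^2)
c(exact = exact_var, sigma2_over_n = sigma2 / 4)  # agree up to the rounding above
```

    Plotting `d$prob` against `d$mean` for increasing n shows how the exact distribution changes shape. Enumeration grows as N^n, so this approach only works for very small pools and sample sizes.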

    Walter Hill
    Political Science
    St. Mary's College





  • 4.  RE: Central Limit Thm - using the exponential function

    Posted 04-07-2018 23:06
    The distributions one encounters in many branches of science, especially biological and social sciences, can be highly skewed and irregular. The sample sizes needed to achieve convergence to an approximately normal distribution can be very, very large.

    The central limit theorem only guarantees that convergence to a normal distribution will occur eventually.

    This is one of the reasons it’s critical to evaluate the distribution of the data one is using and check whether assumptions such as normality reasonably apply to it for the sample sizes one is using. They often don’t.
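    The Berry-Esseen inequality is one standard way to put a number on "eventually." It bounds the largest gap between the CDF of the standardized sample mean and the standard normal CDF by C * rho / (sigma^3 * sqrt(n)), where rho = E|X - mu|^3. The R sketch below evaluates that bound for the two pools in the original question. For the constant we use C = 0.4748, the smallest published value for the i.i.d. case that we are aware of; treat it as an assumption. The helper name `be_bound` is hypothetical. Because the bound is a worst-case guarantee, it is usually conservative.

```r
# Hypothetical helper: Berry-Esseen upper bound on sup_x |F_n(x) - Phi(x)|
# for the standardized mean of n draws (with replacement) from `pool`.
be_bound <- function(pool, n, C = 0.4748) {   # C: assumed constant
  mu    <- mean(pool)
  sigma <- sqrt(mean((pool - mu)^2))          # population SD (divisor N)
  rho   <- mean(abs(pool - mu)^3)             # third absolute central moment
  C * rho / (sigma^3 * sqrt(n))
}

sizes <- c(30, 100, 1000, 10000)
rbind(linear = sapply(sizes, be_bound, pool = 1:100),
      exp    = sapply(sizes, be_bound, pool = exp(1:100)))
```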

    Jonathan Siegel
    Associate Director Clinical Statistics.





  • 5.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 01:07
    If I am understanding your data set correctly, you have a massively skewed distribution with enormous SD, so the number of simulations and the sample size you use are going to need to be much larger than you’ve used in your example in order for your sampling distribution to approach normality under the CLT. The usual suggestion of n around 30 works remarkably well for many moderate distributions, but can fail miserably with highly skewed distributions.
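    To put numbers on "massively skewed with enormous SD," here is a short R sketch that compares the two pools from the original question. It uses population moments (divisor N) and no new data.

```r
# Summary of a finite pool treated as a population
summarize_pool <- function(p) {
  mu    <- mean(p)
  sigma <- sqrt(mean((p - mu)^2))          # population SD (divisor N)
  c(mean = mu,
    sd = sigma,
    sd_over_mean = sigma / mu,             # coefficient of variation
    share_below_mean = mean(p < mu),       # fraction of pool values below the mean
    max_share_of_total = max(p) / sum(p))  # weight of the single largest value
}

pools <- list(linear = 1:100, exp = exp(1:100))
summary_table <- sapply(pools, summarize_pool)
summary_table
```

    For the exponentiated pool, 95 of the 100 values lie below the mean. The single value e^100 accounts for roughly (e - 1)/e of the total.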

    --
    Chad L. Cross, PhD, PStat(R)
    Biostatistician




  • 6.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 01:21
    Kelly,
    There is no statement about convergence for "n>30" in the CLT.
    My guess is that the magic '30' is for closeness of the t-distribution to normal for n>30.
    Loosely speaking, the rate of convergence for the CLT is related to the symmetry of the underlying distribution, so the exponential data requires a larger sample size.
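    The link between symmetry and convergence can be made concrete. For i.i.d. draws, the skewness of the sample mean equals the population skewness divided by sqrt(n). The R sketch below computes the population skewness of the exponentiated pool from the original question. It then finds the n at which the skewness of the mean would fall to a target of 0.1. That target is an arbitrary illustrative threshold, not a standard rule, and the helper names are hypothetical.

```r
# Population skewness (moments with divisor N)
pop_skewness <- function(p) {
  mu <- mean(p)
  mean((p - mu)^3) / mean((p - mu)^2)^1.5
}

# Skewness of the mean of n i.i.d. draws is gamma / sqrt(n), so the smallest n
# with skewness <= target is ceiling((gamma / target)^2).
n_for_skewness <- function(p, target = 0.1) {   # target: illustrative assumption
  ceiling((pop_skewness(p) / target)^2)
}

g_exp <- pop_skewness(exp(1:100))
c(skew_pool       = g_exp,
  skew_mean_n30   = g_exp / sqrt(30),
  skew_mean_n1000 = g_exp / sqrt(1000),
  n_needed        = n_for_skewness(exp(1:100)))
```

    For the symmetric pool 1:100, `pop_skewness` returns zero, which matches the quick convergence seen in that part of the experiment.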
    David

    --
    David R. Bristol, PhD
    President, Statistical Consulting Services, Inc.
    1-336-293-7771





  • 7.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 07:16
    Edited by Raul Avelar 04-05-2018 05:46
    Hi Kelly,

    The issue you describe is that, to see the CLT in action, you need a larger sample size the more your sampled population differs from a normal distribution.

    Consider the following simple R code to reproduce your experiment:

    x <- 1:100     # seed pool: 1, 2, ..., 100
    y <- exp(x)    # exponentiated pool: e^1, e^2, ..., e^100

    y is the exponentiated variable you refer to (not really the exponential distribution, but the exponential transformation of your seed pool of 100 numbers).

    [UPDATE: As others have pointed out, the CLT applies to distributions with finite mean and variance. If you are randomly selecting from a pool of 100 sequential numbers (i.e., x), that distribution (a discrete uniform distribution) and its exponentiated transformation (i.e., y) both meet that condition.]

    Next, a simple function that takes samples WITH REPLACEMENT from your population of 100 numbers:

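    # Draw m samples of size n (with replacement) from y and plot a histogram of their means
    simsamp <- function(n, m) {
      result <- replicate(m, mean(sample(y, size = n, replace = TRUE)))
      hist(result,
           main = paste0("Sample Mean Distribution\nn = ", n, "; m = ", m),
           xlab = "mean")
      invisible(result)
    }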

    Here n is the sample size in each experiment, and m is the number of times the experiment is replicated to build the histogram and check convergence to normality.

    Running this code for n=30, it takes around 10,000 replications to see a hint of a small bump around the mean. Even so, this is clearly a multimodal distribution, with a big bunch of sample means close to the left boundary (i.e., zero).

    It is clear that normality of the sample mean is not achieved for this sample size. The theorem implies that this distribution should converge to normality as the sample size grows. When trying n=100 (as you did with your class), we see some promise. However, the nature of this distribution (bounded at zero on the left) makes convergence slow and painful. Unfortunately, we are clearly not at normality with n=100, though the other big mode of values close to zero is gone.
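    One possible explanation for the clumped, multimodal histograms described above (our reading, not part of the original post): the largest value e^100 alone makes up about 63% of the pool total. As a result, a sample mean is driven mainly by how many times e^100 (and, to a lesser degree, e^99) is drawn. With replacement, the count K of draws equal to e^100 in a sample of size n follows Binomial(n, 1/100). The R sketch below tabulates P(K = k) for the sample sizes discussed.

```r
# Distribution of K = number of times the single largest value (e^100) is drawn
# in a sample of size n taken with replacement from a pool of 100 values.
sizes <- c(30, 100, 1000)
k <- 0:3
probs <- sapply(sizes, function(n) dbinom(k, size = n, prob = 1 / 100))
dimnames(probs) <- list(paste0("K=", k), paste0("n=", sizes))
round(probs, 3)

# Probability that a sample contains e^100 at least once
1 - (1 - 1 / 100)^sizes
```

    At n = 30, most samples contain e^100 zero times or once, which would produce separate clumps of means. By n = 1000, K is close to Poisson with mean 10, a shape much nearer to normal.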

    Finally, normality is clearly achieved for a sample size of about 1,000 (probably somewhere between n=100 and n=1,000), and this is evident in a histogram from as few as m=100 replications.
    I hope this helps.

    Best Regards,

    ------------------------------
    Raul E. Avelar, Ph.D., P.E., PMP
    Associate Research Engineer
    Operations and Design Division
    Texas A&M Transportation Institute
    ------------------------------



  • 8.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 08:41
    Edited by Gerald Belton 04-04-2018 08:53
    Two things: 
    First, the Central Limit Theorem doesn't say that it works for any distribution... it works for distributions that have a mean and a finite variance. For example, it fails for the Cauchy distribution, which has no defined mean and variance.
    Making a minor tweak to your example by making the exponents negative will let your demonstration work for your class. 
    I like to demonstrate the CLT using real data. I downloaded weather data for Raleigh from the National Weather Service website, and looked at rainfall for each day. A histogram shows that this approximately follows an exponential distribution, with a lot of days having less than 0.1" of rain, a much smaller number of days having 0.1 - 0.2", etc. Then sampling from this data, as the sample size increases, the sampling distributions look more like normal distributions.
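    A caution about the negative-exponent tweak, as we read it. Suppose it means using the pool {e^-1, e^-2, ..., e^-100}. That pool is just the original pool {e^1, ..., e^100} multiplied by the constant e^-101, so its sample means have exactly the same shape and skewness as before. The tweak seems to work only if e^(-x) is used as a density, that is, if you sample from the exponential distribution (as in Frank Soler's reply). The R sketch below checks both readings with base R's `rexp`. The seed and replication count are arbitrary.

```r
# Population skewness (moments with divisor N)
pop_skewness <- function(p) {
  mu <- mean(p)
  mean((p - mu)^3) / mean((p - mu)^2)^1.5
}

neg_pool <- exp(-(1:100))
# The "negative exponent" pool is a rescaled copy of the original pool ...
all.equal(sort(neg_pool), exp(-101) * exp(1:100))
# ... so its skewness (a scale-free measure) is the same
c(original = pop_skewness(exp(1:100)), negated = pop_skewness(neg_pool))

# Reading e^(-x) as a density instead: means of n = 30 draws from Exp(1)
set.seed(1)
exp_means <- replicate(10000, mean(rexp(30, rate = 1)))
c(mean = mean(exp_means), sd = sd(exp_means))  # theory: 1 and 1/sqrt(30)
```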
    Gerald Belton





  • 9.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 10:15
    I think Tim Hesterberg wrote a very nice discussion on this topic:
    https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/34906.pdf

    ------------------------------
    Shonda Kuiper
    Grinnell College
    ------------------------------



  • 10.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 10:27
    I was hoping someone would point out that you have not sampled from an exponential distribution.  I'm assuming that your original numbers (1, ..., 100) represent a sample from a uniform distribution on the interval [1,100].  But if X has this uniform distribution, e^X does not have an exponential distribution.

    Of course, your original data was not random, but the same issue would arise if it had been.
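    Here is a short R sketch of the distinction; the seed and sample sizes are arbitrary. Exponential variates can be generated by the inverse-transform method as -log(U), with U uniform on (0, 1). By contrast, exp(X) with X uniform on [1, 100] has density 1/(99 y) on [e, e^100]. That log-uniform shape is very different and extremely right-skewed.

```r
set.seed(42)
u <- runif(100000)

# Inverse transform: if U ~ Uniform(0, 1), then -log(U) ~ Exponential(rate = 1)
e_draws <- -log(u)
c(mean = mean(e_draws), var = var(e_draws))   # Exponential(1): mean 1, variance 1

# By contrast, exp(X) with X ~ Uniform(1, 100) is log-uniform on [e, e^100]:
# its log is uniform, and its values span about 43 orders of magnitude.
x <- runif(100000, min = 1, max = 100)
y <- exp(x)
range(log10(y))
```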

    ------------------------------
    Jay Beder
    Professor
    University of Wisconsin-Milwaukee
    ------------------------------



  • 11.  RE: Central Limit Thm - using the exponential function

    Posted 04-05-2018 11:40
    Shonda: The link works when it is preceded by 'https://'

    https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/34906.pdf




    ------------------------------
    Barbara Graham
    Biostatistician
    Colorado State University
    ------------------------------



  • 12.  RE: Central Limit Thm - using the exponential function

    Posted 04-05-2018 12:18
    Tim Hesterberg's article is spot on.

    I cannot for the life of me understand why we still teach the CLT.   It effectively assumes that the standard deviation is known, and really, really assumes that the standard deviation is the appropriate measure of dispersion.

    It is easy to show that with a lognormal distribution the CLT gives terrible confidence coverage in both tails when N=50,000. The main problem is that the person using the CLT computes the standard deviation on entirely the wrong scale (i.e., without logging). When we encounter data, we don't know the transformation. Bootstrapping can help, and so can exact confidence intervals for population quantiles.
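    Here is a small R simulation in the spirit of this point. The lognormal parameters, the sample sizes, and the replication count are illustrative assumptions chosen so the code runs quickly; they are not the N=50,000 setting mentioned above. The simulation records how often the usual 95% t-interval misses the true lognormal mean, counting each side separately. The helper name is hypothetical.

```r
# Hypothetical helper: simulate the nominal 95% t-interval for the mean of
# lognormal data and count misses on each side separately.
t_interval_misses <- function(n, reps = 2000, meanlog = 0, sdlog = 1.5) {
  true_mean <- exp(meanlog + sdlog^2 / 2)   # mean of a lognormal distribution
  below <- above <- 0
  for (r in seq_len(reps)) {
    ci <- t.test(rlnorm(n, meanlog, sdlog))$conf.int  # default 95% interval
    if (true_mean < ci[1]) below <- below + 1  # interval lies entirely above the truth
    if (true_mean > ci[2]) above <- above + 1  # interval lies entirely below the truth
  }
  c(coverage = 1 - (below + above) / reps,
    truth_below_ci = below / reps,
    truth_above_ci = above / reps)
}

set.seed(2018)
sizes <- c(20, 100, 500)
res <- t(sapply(sizes, t_interval_misses))
rownames(res) <- paste0("n=", sizes)
res   # an exact interval would miss 0.025 on each side
```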

    ------------------------------
    Frank Harrell
    Professor of Biostatistics
    Vanderbilt University School of Medicine
    Office of Biostatistics
    FDA CDER
    ------------------------------



  • 13.  RE: Central Limit Thm - using the exponential function

    Posted 04-08-2018 13:46

    Many years ago, when I was a young student of statistics, one of my early professors emphasized the "first rule of statistics," which was to "look at the data." That is, visually inspect the data, or subsamples of it, for obvious inconsistencies and possibly blatant errors. These include impossible values and miscoded or misentered data, particularly data entered in the wrong (often adjacent) column. Then graphically examine the distributions with histograms and scatterplots. We were also taught to examine the processes by which the data were collected and entered, and the kinds of distributions that particular sources of data tend to produce. These kinds of examination would clue one in to whether the data were likely skewed. Anyone who did this would recognize a possible problem with using symmetric confidence intervals on inherently nonsymmetric data.


    I find it disappointing that I was not shown other ways to calculate confidence intervals, especially nonsymmetric ones. Personally, I sometimes split the data at some measure of central tendency, such as the mean, and calculate one-sided standard deviations on each side of it. I am sure someone could come up with something better, but at least it doesn't hide the asymmetry of the underlying distribution. [Of course, I also look at skewness and kurtosis when examining the descriptive statistics of the collected data.]
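    Below is a minimal R sketch of the one-sided spread idea described above. The function name is hypothetical, and the exact definition is our assumption because the post gives no formula. The data are split at the mean, and a root-mean-square deviation is computed separately for the observations below and above it, with each side using its own count. Related measures (lower and upper semi-deviations) often use the full sample size as the divisor instead.

```r
# Hypothetical helper: root-mean-square deviation from the mean, computed
# separately for observations below and above the mean (each side's own count).
one_sided_sd <- function(x) {
  m <- mean(x)
  lower <- x[x < m]
  upper <- x[x > m]
  c(mean     = m,
    sd_lower = sqrt(mean((lower - m)^2)),
    sd_upper = sqrt(mean((upper - m)^2)))
}

one_sided_sd(1:100)        # symmetric pool: the two sides agree
one_sided_sd(exp(1:100))   # skewed pool: the upper side is far larger
```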


    I am aware that some claim that selective data correction (that is, correcting only the obvious errors that one can easily identify) biases the data, and I don't doubt that it does. Practically speaking, though, I think this is usually not as big a problem as the people I heard it from made it out to be, although this kind of data correction could be abused. I guess it comes down to what one is trying to learn, and whether one trusts the work of those earlier in the data chain.


    Raoul Burchette







  • 14.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 11:22
    The CLT applies to samples from a PDF with finite variance. Note that e^x is not a PDF. I assume you want e^(-x), which is a PDF with mu = 1. Even so, the exponential distribution is extremely right-skewed, so large samples will be needed to "bring in" that right tail. For a "better looking" bell-shaped curve, a large sample size works best.
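    Here is a quick numerical check of these facts using base R's `integrate`. The third central moment of 2 (and therefore a skewness of 2, since the variance is 1) is a standard result for Exp(1). It is included to quantify "extremely right-skewed."

```r
# e^(-x) on [0, Inf) integrates to 1, so it is a valid PDF ...
total <- integrate(function(x) exp(-x), lower = 0, upper = Inf)$value
# ... and its mean, the integral of x * e^(-x), equals 1
mu <- integrate(function(x) x * exp(-x), lower = 0, upper = Inf)$value
# Third central moment E[(X - 1)^3] = 2; with variance 1 this is also the skewness
skew <- integrate(function(x) (x - 1)^3 * exp(-x), lower = 0, upper = Inf)$value
c(total = total, mean = mu, skewness = skew)

# e^x, by contrast, has no finite integral over [0, Inf), so it cannot be a PDF there
```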

    Frank Soler

    ------------------------------
    Francisco Soler
    ------------------------------



  • 15.  RE: Central Limit Thm - using the exponential function

    Posted 04-04-2018 14:47
    I have not looked at the variance of your exponential distribution. However, for the CLT to be applicable the variance has to be finite.  For instance, the CLT does not apply to the Cauchy distribution which has infinite variance.
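    Here is a short R simulation of this failure; the sample sizes, replication count, and seed are arbitrary. The mean of n i.i.d. standard Cauchy draws is itself standard Cauchy, so the spread of the sample means does not shrink as n grows. Because the Cauchy distribution has no variance, spread is measured with the interquartile range (IQR), which equals 2 for a standard Cauchy. Normal draws are included for contrast; their means have an IQR of about 1.349/sqrt(n).

```r
set.seed(7)
reps  <- 5000
sizes <- c(10, 100, 1000)

# IQR of `reps` sample means, each computed from n draws of the generator `rdist`
iqr_of_means <- function(rdist, n) IQR(replicate(reps, mean(rdist(n))))

res <- rbind(cauchy = sapply(sizes, function(n) iqr_of_means(rcauchy, n)),
             normal = sapply(sizes, function(n) iqr_of_means(rnorm, n)))
colnames(res) <- paste0("n=", sizes)
res   # Cauchy row stays near 2; normal row shrinks like 1.349 / sqrt(n)
```

    By contrast, any finite pool, such as {e^1, ..., e^100}, has a finite variance. The CLT therefore does apply there; it just converges slowly.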

    ------------------------------
    Filiep Samyn
    ------------------------------