ASA Connect

Sample Size Question

  • 1.  Sample Size Question

    Posted 10-07-2016 15:30

    I apologize if this is a straightforward question, but it doesn't seem to fit into any formula that I am aware of.

     

    I have a situation where I have a population of x observations.  I am not investigating a quantitative feature but rather something qualitative. Since there is no specific quantitative parameter or metric that I am investigating, it seems to me that effect size and standard power formulas aren't applicable.

    In my particular case, upon examining these observations, we found that their time frame was much longer not only than what was expected but also than what logically makes sense. We thus want to go back and dig into them to answer the question of why they were so long. It's not practical to review and research all of them, so I plan on taking a random sample.  I don't have a good hypothesis as to what the possible explanation may be, and it's quite possible that there are multiple explanations.

    My question is how large do I need my random sample to be to ensure that my results are representative of my population?

    In this particular case, I have 112 observations, but I'd appreciate feedback in general of how to determine the appropriate size of the random sample in such a case to be confident that it is representative of the population.

    Thank you,

     

    Jonathan Burns

     



  • 2.  RE: Sample Size Question

    Posted 10-10-2016 01:58

    Jonathan -

    I think the principles should be the same, or actually analogous: you need to reduce bias as much as possible, and then have a large enough sample size such that you cover underlying variability. 

    Perhaps you could stratify (categorize) your 112 observations - hopefully being "representative" of your population themselves - and examine enough, randomly, from each group that you no longer find "surprises."  This would have to be an iterative approach.  Without a corresponding estimate (or guesstimate) of a standard deviation for each stratum, I cannot think of any way to do better.
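
The stratified, iterative review described above could be sketched roughly as follows. This is only an illustration under assumptions: the strata labels "A", "B", "C", the batch size of 5, and the helper name `draw_stratified_batch` are hypothetical and do not come from the thread.

```python
import random
from collections import defaultdict

def draw_stratified_batch(strata_by_id, already_reviewed, per_stratum, rng):
    """Pick up to `per_stratum` not-yet-reviewed IDs at random from each stratum."""
    groups = defaultdict(list)
    for obs_id, stratum in strata_by_id.items():
        if obs_id not in already_reviewed:
            groups[stratum].append(obs_id)
    batch = {}
    for stratum in sorted(groups):
        ids = groups[stratum]
        k = min(per_stratum, len(ids))  # a stratum may run out of cases
        batch[stratum] = sorted(rng.sample(ids, k))
    return batch

# Hypothetical labels: 112 observations assigned to three made-up strata.
strata_by_id = {i: ("A", "B", "C")[i % 3] for i in range(1, 113)}
rng = random.Random(2016)  # fixed seed so the draw can be reproduced
reviewed = set()

first_batch = draw_stratified_batch(strata_by_id, reviewed, per_stratum=5, rng=rng)
for ids in first_batch.values():
    reviewed.update(ids)

# If the first batch still turns up "surprises", draw another from what is left.
second_batch = draw_stratified_batch(strata_by_id, reviewed, per_stratum=5, rng=rng)
print(first_batch)
print(second_batch)
```

Each call only draws from observations not yet reviewed, so the batches never overlap and the loop can stop as soon as a batch yields no new explanations.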

    Best wishes.  

    ------------------------------
    James Knaub
    Lead Mathematical Statistician
    Retired



  • 3.  RE: Sample Size Question

    Posted 10-10-2016 08:47

    Sounds like you are dealing with qualitative research. Take a look at the set of papers in this issue (first one highlighted below) http://www.tandfonline.com/toc/tsrm20/18/6

    Also papers in screenshot below.

    ______________________

    Cruz Velasco

    Biostatistics Program, Department of Pediatrics

    UAMS College of Medicine & Arkansas Children Research Institute

     







  • 4.  RE: Sample Size Question

    Posted 10-10-2016 10:58

    There is no such thing as a straightforward sample size question.  However, I am having difficulty deciphering what you have as "data" and what you'd like to hypothesize and test.  Is it possible for you to give more details without divulging too many details?

     

    Susan E. Spruill

    Susan E. Spruill, PStat®

    Statistical Consultant, President

    Applied Statistics and Consulting

    828-467-9184 (phone)

    Professional Statistician accredited by the American Statistical Association

    www.appstatsconsulting.com

     






  • 5.  RE: Sample Size Question

    Posted 10-10-2016 13:25

    Although you say you are not investigating a quantitative feature, you indicate that the observations' time frames are much longer than either expected or reasonable. That already introduces a quantitative element. Do the 112 time frames vary in how long they are? And was the shortest of the 112 time frames still much longer than either expected or reasonable?

    ------------------------------
    Eric Siegel, MS
    Research Associate
    Department of Biostatistics
    Univ. Arkansas Medical Sciences



  • 6.  RE: Sample Size Question

    Posted 10-10-2016 15:08
    Any sample that is randomly selected is representative of the
    population. What you want to do, I am guessing, is show that some
    qualitative feature is always present or never present, or present in at
    least one case. In these settings, it helps to remember the rule of three.

    https://en.wikipedia.org/wiki/Rule_of_three_%28statistics%29
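
As a rough illustration of the rule of three mentioned above: if a feature is seen in none of n randomly sampled cases, an approximate 95% upper confidence bound on its true proportion is 3/n. The sketch below compares that approximation with the exact binomial bound. The function names are hypothetical, and the binomial model assumes independent draws, so with a small finite population such as 112 cases sampled without replacement it is best read as an approximation.

```python
def rule_of_three_upper(n):
    """Approximate 95% upper bound on a proportion after 0 events in n trials."""
    return 3.0 / n

def exact_upper(n, confidence=0.95):
    """Exact binomial upper bound after 0 events in n trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

# Compare the two bounds for a few illustrative sample sizes.
for n in (10, 20, 30, 50):
    print(n, round(rule_of_three_upper(n), 3), round(exact_upper(n), 3))
```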

    If my comments are off base, then perhaps you can specify in more detail
    what this qualitative feature is that you are investigating.

    Steve Simon, blog.pmean.com




  • 7.  RE: Sample Size Question

    Posted 10-11-2016 16:21

    Stephen D. Simon said, "Any sample that is randomly selected is
    representative of the population."

    I disagree. For example, if you randomly select ten students in a
    particular university, you could conceivably end up with all of them
    basketball players, with heights not at all representative of the student
    body.

    The reason random sampling is important is that the mathematical theorems
    that justify most statistical techniques require random sampling.
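
The chance that a random sample is unrepresentative in one specific way can be computed directly. As a minimal sketch, assume k of the N cases share some feature (for example, one particular explanation) and n cases are drawn by simple random sampling without replacement; the probability that the sample contains none of them is C(N-k, n) / C(N, n). The function name and the example values of k and n are hypothetical.

```python
from math import comb

def prob_sample_misses(N, k, n):
    """Probability that a simple random sample of n from N contains
    none of the k cases that share a given feature (hypergeometric)."""
    # comb returns 0 when n > N - k, so the probability is then 0.
    return comb(N - k, n) / comb(N, n)

# Illustrative values only: 112 cases, 10 of which share a feature.
for n in (10, 20, 30):
    print(n, round(prob_sample_misses(112, 10, n), 3))
```

The probability falls as n grows but is never exactly zero until n exceeds N - k, which is the sense in which a random sample can still miss part of the population by chance.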

    ------------------------------
    Martha Smith
    University of Texas



  • 8.  RE: Sample Size Question

    Posted 10-10-2016 15:11

    If I understand you correctly, you have a population of 112 responses that look suspicious. You'd like to go back to the 112 people who gave them to you and ask them about the numbers they put there. This would yield an explanation in each case. You don't know what the explanations are likely to be and you don't know how many different explanations there may be.

    You don't have the resources to go to all 112 and check them out so you want to take a sample but you don't know how to choose a sample size. Is that it?

    I think you need a pilot sample. I would choose 10 at random and check them out first to get an idea of the work involved, see whether you get 1 explanation or 10 different ones etc.

    Blaise F Egan

    ------------------------------
    Blaise Egan
    Lead Data Scientist
    British Telecommunications PLC



  • 9.  RE: Sample Size Question

    Posted 10-11-2016 05:11

    If the time frame is the characteristic of interest and if it is measured on a continuous scale (e.g., minutes), why can't you apply the standard formulae? Granted, it does not contain information about the underlying causes, but isn't that the typical statistical problem: unknown population characteristic, known sample, estimate and generalize to the population? If you estimate the time frame to a predetermined degree of accuracy, that alone will make your sample representative enough to study the underlying causes.
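
One common form of the standard formula for estimating a mean, with a finite population correction, is n0 = (z * sigma / E)^2 followed by n = n0 / (1 + (n0 - 1) / N). The sketch below applies it to N = 112. The standard deviation (30 minutes) and margin of error (10 minutes) are placeholder assumptions, not values from the thread, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(N, sigma, margin, confidence=0.95):
    """Sample size to estimate a mean within +/- margin, for a population of N.

    sigma is a guessed standard deviation; the result is only as good as that guess.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    n0 = (z * sigma / margin) ** 2                    # infinite-population size
    n = n0 / (1.0 + (n0 - 1.0) / N)                   # finite population correction
    return math.ceil(n)

# Placeholder inputs: sigma = 30 minutes, margin = 10 minutes.
print(sample_size_for_mean(N=112, sigma=30.0, margin=10.0))
```

Because the population is small, the correction matters: the required sample can never exceed N, and it shrinks noticeably relative to the uncorrected n0.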

    ------------------------------
    Yiannis Bassiakos
    Associate Professor
    University Of Athens, Economics Department



  • 10.  RE: Sample Size Question

    Posted 10-11-2016 05:43

    Jonathan,

    It sounds to me like you are not testing a hypothesis but trying to derive one, in which case I think the question of sample size is moot. 

    As you are not testing a hypothesis you do not need to specify a priori how many samples you will look at to derive one. You can simply look at more and more until some apparent explanation becomes clear.

    However you may wish to leave a sufficient number of samples not looked at as you derive the hypothesis in order to use them to later test it. It is of course impossible to say ahead of time what a "sufficient number" will be.

    Almost certainly the greater the number you need to look at to derive the hypothesis, the more you'll need to test it.
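
Setting aside untouched cases before deriving a hypothesis could look something like the sketch below. The holdout size of 40 is an arbitrary placeholder (as noted above, a "sufficient number" cannot be known ahead of time), and the function name is hypothetical.

```python
import random

def split_explore_holdout(ids, n_holdout, seed):
    """Randomly split IDs into an exploration list and an untouched holdout list."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# 112 observation IDs; 40 held back for later testing (placeholder size).
explore, holdout = split_explore_holdout(range(1, 113), n_holdout=40, seed=2016)
print(len(explore), len(holdout))
```

The exploration list is already in random order, so it can be reviewed from the top until an explanation emerges, leaving the holdout cases unseen for testing.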

    Tom Parke

    Berry Consultants

    ------------------------------
    Ian Parke



  • 11.  RE: Sample Size Question

    Posted 11-07-2016 12:54

    It sounds like you have 112 observations of a variable called "time".  I'd start by just looking at the distribution of those 112. 

    ------------------------------
    Emil M Friedman, PhD
    emilfriedman@gmail.com
    http://www.statisticalconsulting.org