  • 1.  Developing and Validating a Scoring Tool

    Posted 09-26-2012 14:21

    Hello:

    Can anyone provide me with guidance on how to validate a scoring tool?

    We are trying to develop a scoring tool (or rather, improve an existing tool) to capture the severity of a clinical condition called incontinence-associated dermatitis (IAD).

    The objective of the tool would be to track clinically relevant changes in the condition and help categorize IAD cases as mild, moderate, severe, or very severe. We ultimately want a tool for use in future comparative studies.

    I have read about face/content validity; criterion/construct-related validity; inter-rater reliability or reproducibility and test-retest reliability...all of which sound relevant. 

    The questions are:
    How do I go about proving my tool is valid in the eyes of the clinical users and the FDA reviewer?
    What statistics or tests do I use to show validity?
    What are the criteria for validity?
    How do I determine the sample size for the study or studies that I need to prove validity/reliability?

    Thanks much!


    -------------------------------------------
    Shelley-Ann Walters
    3M
    -------------------------------------------


  • 2.  RE:Developing and Validating a Scoring Tool

    Posted 09-26-2012 15:35
    Hello, reliability measures how well groups of raters agree when rating a series (N) of subjects with the instrument. For continuous data, the intraclass correlation (ICC), computed via an ANOVA, is often used; for multichotomous data, kappa. Thus, for reliability you would have to conduct a study using test subjects and raters trained in the use of the instrument. Following Fleiss and others, look for an ICC or kappa >= 0.7.
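
    For the categorical case, one possible sketch is Fleiss' kappa, a multi-rater version of kappa. This is only one choice among several (a weighted kappa may suit ordered severity classes better), and the ratings below are made up purely for illustration; the function name and data layout are assumptions, not part of any particular package.

```python
from collections import Counter

CATEGORIES = ["mild", "moderate", "severe", "very severe"]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of subjects, each a list of category labels
    (one label per rater). Every subject needs the same number of raters."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number of ratings")

    # counts[i][cat] = number of raters who put subject i in category cat
    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over subjects
    p_obs = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_subjects

    # Agreement expected by chance, from the overall category proportions
    total = n_subjects * n_raters
    p_exp = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)
    if p_exp == 1:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical data: 6 cases, each classified by the same 4 raters
example = [
    ["mild", "mild", "mild", "moderate"],
    ["moderate", "moderate", "severe", "moderate"],
    ["severe", "severe", "severe", "severe"],
    ["very severe", "severe", "very severe", "very severe"],
    ["mild", "moderate", "mild", "mild"],
    ["severe", "very severe", "severe", "moderate"],
]

print(round(fleiss_kappa(example, CATEGORIES), 2))
```

    The result would then be compared against the 0.7 benchmark mentioned above.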

    Validity is an assessment of whether the instrument measures what it purports to measure. This is often arrived at by comparing the "conclusions" of your instrument with standards in the field.

    John

    -------------------------------------------
    John Bartko
    Consulting Biostatistician
    -------------------------------------------








  • 3.  RE:Developing and Validating a Scoring Tool

    Posted 09-27-2012 15:02
    Thanks, John, for your relevant and succinct feedback. I think ICC would be the most appropriate statistic to use. I looked at Wikipedia, which directed me to R's icc function. I think the relevant options for the icc function would be to compute either the one-way random single measure of agreement or a two-way random single measure for both agreement and consistency. This is all a bit overwhelming, but the criterion of an ICC >= 0.7 for good agreement/consistency is nice to know.
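
    To make the three single-measure options concrete, here is a minimal sketch that computes them from the ANOVA mean squares. It is written in plain Python rather than R, the function name icc_single is hypothetical, and the small 6-subject by 4-rater table is illustrative only. The labels ICC(1,1), ICC(2,1), and ICC(3,1) are assumed here to correspond to the one-way, two-way agreement, and two-way consistency options.

```python
def icc_single(scores):
    """Single-measure ICCs from an n-subjects x k-raters table of scores.

    Returns the one-way random ICC(1,1), the two-way absolute-agreement
    ICC(2,1), and the two-way consistency ICC(3,1)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_within = (ss_total - ss_rows) / (n * (k - 1))  # one-way residual
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    return {
        "ICC(1,1)": (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within),
        "ICC(2,1)": (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n),
        "ICC(3,1)": (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error),
    }


# Illustrative table: rows are subjects, columns are raters
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

for name, value in icc_single(ratings).items():
    print(name, round(value, 2))
```

    On this table the three values come out to roughly 0.17, 0.29, and 0.71. The raters rank the subjects similarly but differ in overall level, which lowers the agreement versions far more than the consistency version, so the choice of "flavor" matters for the 0.7 benchmark.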
    Thanks again!


    -------------------------------------------
    Shelley-Ann Walters
    3M
    -------------------------------------------








  • 4.  RE:Developing and Validating a Scoring Tool

    Posted 09-28-2012 16:01
    If you are going in that direction, also check out the two references below. The first addresses various options in software packages, and the second is how I would cite your ICC work unless you are utilizing a specific published variation.

    David

    • Richard N. MacLennan (November 1993). "Interrater Reliability with SPSS for Windows 5.0". The American Statistician (American Statistical Association) 47 (4): 292-296.
    • P. E. Shrout & Joseph L. Fleiss (1979). "Intraclass Correlations: Uses in Assessing Rater Reliability". Psychological Bulletin 86 (2): 420-428.

    -------------------------------------------
    David Reasner
    Albemarle Scientific Consulting LLC
    -------------------------------------------








  • 5.  RE:Developing and Validating a Scoring Tool

    Posted 09-28-2012 16:40
    Thanks David, those are helpful references. The second one, Shrout and Fleiss, is the definitive publication on defining the various types of ICC. Perhaps those definitions can be found in textbooks, but some time ago when a reviewer of an article asked us what "flavor" of ICC we were using (there are several), we hunted and hunted in the article literature (not texts) and finally found that reference.

    Best wishes,

    Nayak



    -------------------------------------------
    Nayak Polissar
    Principal Statistician
    The Mountain-Whisper-Light Statistics
    -------------------------------------------








  • 6.  RE:Developing and Validating a Scoring Tool

    Posted 10-02-2012 11:28
    Thank you, everyone, for your feedback. It is very much appreciated!
    -------------------------------------------
    Shelley-Ann Walters
    3M
    -------------------------------------------








  • 7.  RE:Developing and Validating a Scoring Tool

    Posted 09-26-2012 15:44

    Dear Shelley-Ann,

    It would be worth looking at the FDA's PRO guidance (Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims).  While the FDA emphasizes content validity (i.e., the patient perspective) rather than evidence based on relations to other variables, the release of the guidance triggered both an academic and industry response that is still ongoing.  In addition to several dedicated conferences, ISPOR and other organizations have responded to the guidance with white papers, conference tracks, etc.  Psychometrics is separate from biometrics and you will find a long-standing literature on the topics that you mention.  It may be worth working with a consultancy (e.g., RTI) where researchers deal routinely with these topics and FDA expectations.  I've also found Ron Hays' work very helpful and will paste his link below.

    Best regards,
    David

    http://www.chime.ucla.edu/directory/Hays.htm

    -------------------------------------------
    David Reasner
    Albemarle Scientific Consulting LLC
    www.AlbemarleScientific.com
    -------------------------------------------








  • 8.  RE:Developing and Validating a Scoring Tool

    Posted 09-27-2012 10:31
    Hello Shelley-Ann,

    I could not agree more with David's comments on the FDA's PRO guidance and Dr. Ron Hays' work.

    For scale development, I have found this small book by Dr. Robert F. DeVellis very useful in my own work:

    DeVellis RF. Scale development: theory and applications, 3rd edn. Thousand Oaks, CA: Sage Publications 2012.
     
    For scale validation, there have historically been many types of validity, and here is a good summary by Dr. Bruno D. Zumbo:

    Zumbo BD. Validity: foundational issues and statistical methodology. In Rao CR and Sinharay S. (Eds.). Psychometrics, Handbook of Statistics, Vol. 26. Amsterdam, The Netherlands: Elsevier 2007; 45-79.

    And here is another article on this topic:

    Cook DA, Beckman TJ.  Current Concepts in Validity and Reliability for Psychometric Instruments: Theory and Application. 
    Am J Med. 2006 Feb;119(2):166.e7-16. Review.

    As to your sample size question, I think there is no easy or consistent answer. According to several authors (e.g., DeVellis RF; Embretson SE & Reise SP, Item Response Theory for Psychologists, New York, NY: Psychology Press 2000), a few hundred should suffice.

    But I think the most challenging part of your plan is that you want to develop a tool that can 'track clinically relevant changes' and is 'for use in future comparative studies'. As you may know, it is very difficult to define a 'minimum clinically important difference' (MCID). In addition, longitudinal studies that use scales or instruments ('tools' in your words) raise another big concern: you need to show 'longitudinal measurement invariance' (there are many sources on this topic, such as Brown TA. Confirmatory Factor Analysis for Applied Research. New York, NY: The Guilford Press 2006; 252-266).



    Sincerely yours,

    Chengwu Yang (杨成武)
    ______________________
    Chengwu Yang, MD, MS, PhD
    Assistant Professor of Biostatistics
    Department of Public Health Sciences
    College of Medicine, The Pennsylvania State University
    A210, ASB 3400H, 600 Centerview Drive, Hershey, PA 17033
    Email: yangc@psu.edu; Phone: 717-531-3016; Fax: 717-531-0146
    http://profiles.psu.edu/profiles/ProfileDetails.aspx?From=SE&Person=244
    -------------------------------------------





  • 9.  RE:Developing and Validating a Scoring Tool

    Posted 09-27-2012 15:03
    Thank you; thank you especially for taking the time to write down your citations.   

    I am hoping to find out the minimum work I need to do to get a valid tool developed and validated.  The bare minimum : )

    In terms of study design and analysis, that will no doubt depend on the criteria for proving validity. I am leaning towards using ICC for inter- and intra-rater agreement. I am limited in the total number of raters: up to 20 is feasible to recruit. However, another aspect of sample size would be the number of cases to present to the raters. Since I am hoping the tool will distinguish 4 classes of IAD condition, perhaps 3 cases per class, for a total of 12 cases.

    So can I comfortably consider 20 raters scoring 12 different patients' IAD condition (running the gamut of severity) as sufficient for my needs?
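
    One rough way to explore this question is to simulate how much an estimated ICC would vary with 12 cases and 20 raters. The sketch below is not a formal sample-size calculation: it assumes scores behave like a continuous "case effect + rater effect + noise" sum, and the three standard deviations are hypothetical values chosen so that the true ICC is near 0.7. The function names are illustrative only.

```python
import random
import statistics


def icc_2_1(scores):
    """Two-way random, absolute-agreement, single-measure ICC."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in scores)
    ss_cols = n * sum(
        (sum(r[j] for r in scores) / n - grand) ** 2 for j in range(k)
    )
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def icc_spread(n_cases, n_raters, sd_case=1.0, sd_rater=0.3, sd_noise=0.6,
               reps=400, seed=1):
    """Simulate repeated studies and return the 2.5th percentile, median,
    and 97.5th percentile of the estimated ICC. All sd_* values are assumed."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        cases = [rng.gauss(0, sd_case) for _ in range(n_cases)]
        raters = [rng.gauss(0, sd_rater) for _ in range(n_raters)]
        scores = [[c + r + rng.gauss(0, sd_noise) for r in raters]
                  for c in cases]
        estimates.append(icc_2_1(scores))
    cuts = statistics.quantiles(estimates, n=40)  # cut points every 2.5%
    return cuts[0], statistics.median(estimates), cuts[-1]


for n_cases in (12, 48):
    low, mid, high = icc_spread(n_cases, n_raters=20)
    print(n_cases, "cases:", round(low, 2), round(mid, 2), round(high, 2))
```

    Comparing the spread for 12 cases with the spread for a larger number of cases gives a feel for how precisely the ICC could be pinned down relative to a 0.7 threshold, under these assumed variances.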

    The question of MCID (the minimum clinically important difference) is one I would have to put some thought into, but I guess the tool's ability to distinguish among the 4 severity classes (mild, moderate, severe, and very severe) is related to that question. Could MCID then be determined from the validation study?

    Any further thoughts? 

    Thanks much!


    -------------------------------------------
    Shelley-Ann Walters
    3M
    -------------------------------------------








  • 10.  RE:Developing and Validating a Scoring Tool

    Posted 09-26-2012 20:52
    Please tell us more about the intended end-users. Would this tool take the form of a questionnaire that the patient fills out himself or herself? Or would this tool be something that the doctor or nurse fills out while inspecting the patient? Also, will the tool have quality-of-life-type questions and/or psychosocial questions? Or will it be entirely clinical and work along the lines of something we'd find in Versions 3 or 4 of the Common Terminology Criteria for Adverse Events?

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------