
  • 1.  The meaning of the difference between values

    Posted 02-09-2013 19:41
    This message has been cross-posted to the following eGroups: Statistical Education Section and Statistical Consulting Section.
    -------------------------------------------

    Hi everybody,


    100 examiners took a test together with a simple survey. We constructed a model to predict the examiners' answers from the test questions and the survey questions. Thus, there are three different sets of answers: their actual answers, the predicted answers, and the true answers.

    Could you explain the meaning of the differences between these answers (the errors)?

    1. We compare the predicted answers to the actual answers.

    ->  Obtain the error rate of the model.
    ->  Validate the model.

    2. We compare the predicted answers to the expected answers (ground truth).

    ->  Is the model more or less accurate than the examiners (?)


    3. We compare the actual answers to the expected answers (ground truth).

    ->  If the prediction error is not small (around 0.2) and some actual answers differ from the expected answers, can we conclude that some examiners' mistakes raise the prediction error of the model?
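
    To make the three comparisons concrete, here is a minimal sketch that computes each pairwise disagreement rate. The answer lists are hypothetical toy data (one entry per examiner-question pair), and `disagreement_rate` is an illustrative helper, not part of the actual analysis.

```python
def disagreement_rate(a, b):
    """Fraction of positions where two equal-length answer lists differ."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy data: one entry per (examiner, question) pair.
actual    = ["A", "B", "C", "A", "D", "B", "C", "A", "B", "D"]  # Y
predicted = ["A", "B", "C", "B", "D", "B", "A", "A", "B", "D"]  # hat Y
truth     = ["A", "B", "C", "B", "D", "C", "C", "A", "B", "D"]  # true answers

# Comparison 1: how well the model reproduces what examiners actually said.
model_vs_actual = disagreement_rate(predicted, actual)
# Comparison 2: how often the model's answer is incorrect.
model_vs_truth = disagreement_rate(predicted, truth)
# Comparison 3: how often the examiners themselves are incorrect.
actual_vs_truth = disagreement_rate(actual, truth)

print(model_vs_actual, model_vs_truth, actual_vs_truth)
```

    Note that equal rates do not mean the same cases are involved: in this toy data the model and the examiners are wrong equally often, but not on the same entries.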


    Or, how can I utilize the true answers to provide more information? 




    I would really appreciate any comments or suggestions from the group.

    Best Regards,

    Mina Yoo



  • 2.  RE:The meaning of the difference between values

    Posted 02-10-2013 10:21

    Dear Mina,

     

    Without answering your question directly (or at least starting with your last question), each item should have a relationship to the underlying construct (e.g., "knowledge of X").  You can explore the relationship using either classical test theory or item response theory as your approach.  As a quick start, you could check the item characteristic curve, where you plot the cumulative incidence of a correct response to Item Y against the total score.  Roughly, the total score represents the observed ability, which reflects the underlying construct (theta).  Naturally, higher total scores (and, presumably, higher underlying knowledge of X) should be associated with an increasing probability of a correct response.  Where the "expected" answers differ from the "predicted" answers (plot both, or even all possible responses), there will be a disruption of the relationship, and the disruption should be reflected in the plot.  For example, a mistake in the scoring key or an item that measures a different construct should look different from the other items.
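
    As a rough illustration of the quick check described above, the sketch below tabulates, for one item, the proportion of correct responses at each total score (one simple reading of an empirical item characteristic curve). The 0/1 score matrix is hypothetical toy data, and `empirical_icc` is an illustrative name, not a function from any package.

```python
from collections import defaultdict


def empirical_icc(scored, item):
    """Proportion correct on `item` at each observed total score.

    scored: list of per-examiner lists of 0/1 item scores.
    Returns a dict {total_score: proportion correct on the item}.
    """
    hits = defaultdict(int)    # correct responses to the item, by total score
    counts = defaultdict(int)  # number of examiners, by total score
    for row in scored:
        total = sum(row)
        counts[total] += 1
        hits[total] += row[item]
    return {t: hits[t] / counts[t] for t in sorted(counts)}


# Hypothetical toy data: 6 examiners (rows) by 3 items (columns), 1 = correct.
scored = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]

curve = empirical_icc(scored, item=0)
for total, prop in curve.items():
    print(total, prop)
```

    Scoring the same responses against a different key (for example, the predicted answers instead of the expected ones) and tabulating again would give a second curve to compare; an item whose curve does not rise with the total score is the kind of disruption to look for.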

     

    Best regards,

    David

    -------------------------------------------
    David Reasner
    President and Founder
    Albemarle Scientific Consulting LLC
    -------------------------------------------








  • 3.  RE:The meaning of the difference between values

    Posted 02-11-2013 10:33
    (Cross-posted as above)

    Mina, I'm not sure you have defined the three types of answers clearly.  Are "predicted answers" what you expect the examiners to say, regardless of what the truth is?  Are "true answers" and "expected answers (ground truth)" synonymous?  If not, we now have four different types of answers, not three.

    The purpose of question (1) appears to be prediction of what the examiners will say.  What is the purpose of question (2)?  Are there two different models, one to predict the true answers and one to predict the examiners' answers?
    -------------------------------------------
    Emil M Friedman, PhD
    emil.friedman@alum.mit.edu (forwards to day job)
    emilfrie@alumni.princeton.edu (home)
    http://www.statisticalconsulting.org


  • 4.  RE:The meaning of the difference between values

    Posted 02-11-2013 12:52

    Dear Friedman,

    Thank you for your reply.


    I constructed a model (random forest) using the questions related to analytic abilities and demographic survey questions such as degree level, etc. The model predicts a final answer (hat Y) based on an examiner's test and survey responses (X).

    The true answer means the correct answer to each question. Thus we have the examiners' answers (Y), the fitted (predicted) answers (hat Y), and the true answers.

    In general, we don't know the true value, or we treat Y as the truth. Here, Y is not the true (correct) answer. Thus, I think we can give a more detailed explanation of the model error (hat Y - Y) by using the extra information, the true answers.

    I am not sure how to approach this problem and how to define the meaning of the differences, true Y - Y and true Y - hat Y. 
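
    One possible way to relate the three quantities, sketched under the assumption that answers are categorical, is to cross-classify every examiner-question pair by whether Y matches the true answer and whether hat Y matches Y or the true answer. The data and the category labels below are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter


def classify(y, y_hat, y_true):
    """Label one (examiner answer, predicted answer, true answer) triple."""
    if y == y_true:
        if y_hat == y:
            return "examiner correct, model matches"
        return "examiner correct, model differs"
    if y_hat == y:
        return "examiner wrong, model reproduces mistake"
    if y_hat == y_true:
        return "examiner wrong, model gives correct answer"
    return "examiner wrong, model gives another wrong answer"


def breakdown(actual, predicted, truth):
    """Count how many triples fall into each category."""
    return Counter(classify(y, p, t) for y, p, t in zip(actual, predicted, truth))


# Hypothetical toy data: one entry per (examiner, question) pair.
actual    = ["A", "B", "C", "A", "D", "B", "C", "A", "B", "D"]  # Y
predicted = ["A", "B", "C", "B", "D", "B", "A", "A", "B", "D"]  # hat Y
truth     = ["A", "B", "C", "B", "D", "C", "C", "A", "B", "D"]  # true answers

counts = breakdown(actual, predicted, truth)
for label, n in counts.items():
    print(f"{n:2d}  {label}")

# Model errors (hat Y != Y), split by whether the examiner was also wrong.
errors_when_examiner_correct = counts["examiner correct, model differs"]
errors_when_examiner_wrong = (
    counts["examiner wrong, model gives correct answer"]
    + counts["examiner wrong, model gives another wrong answer"]
)
```

    If most of the model error falls in the "examiner wrong" categories, that would be consistent with examiner mistakes inflating the prediction error; if it falls mostly where the examiner was correct, the model itself is the more likely source.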

    Thank you for your consideration and time.

    Have a great week!!

    Sincerely,
    Mina Yoo