covariates with measurement error

  • 1.  covariates with measurement error

    Posted 05-08-2013 17:26
    I'd appreciate a few opinions or recommendations on an appropriate statistical methodology.

    I'm working on a project where a cancer tumor sample is taken and measurements are made of a marker. The marker is evaluated by a pathologist looking under a microscope. The values reported in this data are like 45%, 50%, 55%, 60%, and so on: always integer values, rounded to the nearest 5. The data were collected long before a statistician got involved. There are no data available on inter- or intra-pathologist (observer) variability. (About 8 variables are reported in this manner by the pathologist.)

    The client would like to have some regressions with that baseline marker variable (and a few others) prepared for a long term outcome for the patient (e.g. death or tumor progression).

    Two questions
    1. Would this covariate be treated like any other, such as gender, age, or ethnic origin, in Cox or logistic regressions?
    2. What other method of analysis may be appropriate?

    Thanks

    -------------------------------------------
    Chris Barker, Ph.D.
    www,barkerstats.com

    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    -------------------------------------------


  • 2.  RE:covariates with measurement error

    Posted 05-09-2013 03:47


    -------------------------------------------
    Ehsan Motazedi
    -------------------------------------------

    Hello,

    If I have understood your problem correctly, your covariates have been measured with noise, and you would like to minimize the effect of this noise on your analysis. In that case, one way may be to group the scores into fewer categories (e.g. <25% -> 1; 25%-75% -> 2; >75% -> 3, or something similar, perhaps based on medical opinion) and then treat the new covariates as ordinal or categorical.

    Another way may be to run the analysis in the conventional manner and then perform a sensitivity analysis to investigate the effect of random noise on your outcomes and inference. I hope this is of some help. Good luck with the analysis!
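
    A minimal sketch of the grouping idea, assuming the example cut points mentioned above (25% and 75%). The function name and the sample scores are hypothetical; real cut points would need clinical input.

```python
def group_score(pct, low=25, high=75):
    """Map a percentage score to an ordinal group.

    Follows the example cut points from the post:
    below `low` -> 1, `low` through `high` -> 2, above `high` -> 3.
    """
    if pct < low:
        return 1
    if pct <= high:
        return 2
    return 3

# Hypothetical marker scores, reported in steps of 5.
scores = [0, 20, 25, 45, 50, 75, 80, 100]
groups = [group_score(s) for s in scores]
```

    The grouped variable can then enter the regression as an ordinal or categorical covariate.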



    ---------------------------------
    Ehsan Motazedi, MSc.
    Biostatistician
    www.erasmusmc.nl
    ---------------------------------  






  • 3.  RE:covariates with measurement error

    Posted 05-09-2013 08:23
    But changing ratio data into ordinal data results in a *loss* of information.

    -------------------------------------------
    Wayne Fischer
    Statistician
    University of Texas Medical Branch
    -------------------------------------------








  • 4.  RE:covariates with measurement error

    Posted 05-09-2013 08:48


    One could explicitly model the rounding. For instance:

    X^{obs}_t = X^{true}_t + u_t,  where

               u_t = round(X^{true}_t) - X^{true}_t  ~  Uniform(-2.5%, 2.5%)

    Y_t = \beta_0 + \beta_1 X^{true}_t + \epsilon_t

    and apply methodology as explained in, say,  Fuller (2006) Measurement Error Models.
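
    As a rough illustration of the size of the rounding component alone, the sketch below simulates hypothetical true marker values, rounds them to the nearest 5, and looks at the resulting error. The uniform spread of the true values is an assumption made purely for illustration, and the sketch ignores any additional error from eyeballing.

```python
import random
import statistics

random.seed(0)

def round_to_5(x):
    """Round a percentage to the nearest multiple of 5."""
    return 5 * round(x / 5)

# Hypothetical true marker values, spread evenly over 0-100%.
true_x = [random.uniform(0, 100) for _ in range(100_000)]
obs_x = [round_to_5(x) for x in true_x]

# Rounding error: observed minus true.
errors = [o - t for o, t in zip(obs_x, true_x)]

max_abs_error = max(abs(e) for e in errors)   # bounded by 2.5
err_var = statistics.pvariance(errors)        # theory: 5**2 / 12
true_var = statistics.pvariance(true_x)

# Share of the marker's spread that rounding error accounts for
# (depends entirely on the assumed spread of the true values).
error_share = err_var / true_var
```

    Under these assumptions the rounding error has a variance near 25/12, which is small next to the spread of the marker itself; a narrower true range would make the share larger.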

    Good luck,
    Nagaraj
    -------------------------------------------
    Nagaraj Neerchal
    Professor and Chair
    UMBC
    -------------------------------------------








  • 5.  RE:covariates with measurement error

    Posted 05-09-2013 10:20
    Thank you. Let me clarify that the feature being measured (essentially a percentage of the cell being stained) is continuous. Software, etc. exists to make those measurements. The pathologists may not have, or be able to afford, the more accurate measuring tools, so "eyeballing" is the method of choice.

    As suggested, I could simulate the data and the covariates/grouping.



    -------------------------------------------
    Chris Barker, Ph.D.

    www,barkerstats.com

    -------------------------------------------








  • 6.  RE:covariates with measurement error

    Posted 05-09-2013 08:53
    That loss would occur in almost every denoising operation; a compromise must be made between the "signal-to-noise ratio" and the "signal strength".

    -------------------------------------------
    Ehsan Motazedi, MSc.
    Biostatistician
    www.erasmusmc.nl
    -------------------------------------------







  • 7.  RE:covariates with measurement error

    Posted 05-09-2013 12:32
    Chris, thinking about your questions:

    Is it always one pathologist, or are there multiple pathologists with a single result for each marker on each cancer tumor sample? (You mentioned no information on inter/intra-pathologist variability, so that would imply not having more than one pathologist rate each marker on each sample, but it's good to verify.) If there are multiple pathologists on the study, do they align with particular markers, particular subsets of patients, or just whichever pathologist was assigned?

    If it is always one pathologist, the results will generalize to patients but not to other pathologists. That would be addressed in the discussion. If there are multiple pathologists involved, you might want to reflect that in your statistical model. Random effects for pathologists could allow you to generalize to patients and pathologists, but only if there are enough pathologists to make that credible.

    Regarding the marker scores, treating them as continuous variables the way one treats age implies equal spacing: the difference between 50% and 55% is the same as the difference between 55% and 60%. So you need to determine whether this assumption is reasonable. Another option is to treat the marker scores as ordinal data (which they are) with a series of cumulative dummy variables, but you would need a sufficient number of data points; that number would depend on how many marker scores occur in the data. For example, if the scores range from 0% through 100% you would need more data points than if they are restricted to 30% through 70%.
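
    One way to picture the cumulative dummy coding is sketched below. The function name and the set of score levels are hypothetical; the point is that each distinct score above the lowest one costs one extra parameter.

```python
def cumulative_dummies(score, levels):
    """Return one 0/1 indicator per threshold: 1 if score >= that level.

    `levels` is the sorted list of distinct scores in the data.
    The lowest level is the reference and gets no indicator.
    """
    return [1 if score >= lvl else 0 for lvl in levels[1:]]

# Hypothetical set of distinct scores seen in the data.
levels = [45, 50, 55, 60]

row_45 = cumulative_dummies(45, levels)
row_55 = cumulative_dummies(55, levels)
row_60 = cumulative_dummies(60, levels)

# Scores running from 0% to 100% in steps of 5 need many more indicators.
full_range = list(range(0, 101, 5))
n_params_full = len(cumulative_dummies(0, full_range))
```

    With this coding each coefficient measures the step from one score level to the next, so unequal spacing between levels shows up directly in the estimates.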

    Finally, have you looked into ROC curve analysis?

    -------------------------------------------
    Alicia Toledano
    President
    Biostatistics Consulting, LLC
    -------------------------------------------








  • 8.  RE:covariates with measurement error

    Posted 05-09-2013 14:32

    Hi, thanks. I meet with the client tomorrow (she is visiting the U.S. for a week), and your questions are on my list.

    I'm developing a "statistical prediction model", and treating the variable as ordinal is a reasonable option. However, as you mentioned, my training dataset has barely enough data to estimate a regression model that includes the ordinal variable and the few other variables that are to be included.



    -------------------------------------------
    Chris Barker, Ph.D.
    President - San Francisco Bay Area Chapter of the American Statistical Association
    www,barkerstats.com

    -------------------------------------------








  • 9.  RE:covariates with measurement error

    Posted 05-09-2013 15:10
    I would be surprised if you found much difference in the results after running a simulation with
    newY = oldY + rv.uniform(-2.5, 2.5).
    Rounding to the nearest 5 adds noise of up to 2.5 on either side.
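
    A sketch of such a jitter simulation in Python is shown here, under loud assumptions: the data are simulated, the jitter is applied to the rounded marker values, and a simple least-squares slope stands in for the Cox or logistic models discussed in this thread. All names and numbers are hypothetical.

```python
import random
import statistics

random.seed(1)

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: true marker values, their rounded versions,
# and an outcome that depends linearly on the true marker.
n = 500
true_x = [random.uniform(20, 80) for _ in range(n)]
obs_x = [5 * round(x / 5) for x in true_x]
y = [1.0 + 0.1 * x + random.gauss(0, 2) for x in true_x]

# Slope from the rounded values as recorded.
slope_obs = ols_slope(obs_x, y)

# Refit many times after adding uniform(-2.5, 2.5) jitter.
jittered_slopes = []
for _ in range(200):
    jit_x = [x + random.uniform(-2.5, 2.5) for x in obs_x]
    jittered_slopes.append(ols_slope(jit_x, y))

slope_jit_mean = statistics.fmean(jittered_slopes)
slope_shift = slope_jit_mean - slope_obs
```

    If `slope_shift` is small relative to the standard error of the slope, the rounding is unlikely to drive the conclusions; with a much narrower range of marker values the shift could be larger.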

    What would you do if you had a classroom test with 21 questions that were right/wrong? Would you use the sum of the items as a ratio level of measurement?


    Another way to find out whether it makes a difference is to treat the variable as ordinal and as interval level in the same run of CATREG and compare the results.

    My gut feeling without knowing a lot more about the actual data is that you would not have to use up the extra df to treat it as ordinal. 

    Before treating the variable as merely ordinal, I would try an inverse distribution function to transform the proportion to a z score and see if it makes a contribution.
    zscore = idf.normal(proportion, 0, 1).
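
    The same transform can be sketched in Python with the standard library. The clipping step is an added assumption, not part of the suggestion above: scores of 0% and 100% have no finite z score, so they are pulled in by an arbitrary `eps` first.

```python
from statistics import NormalDist

def to_z(pct, eps=0.5):
    """Convert a percentage score to a standard normal z score.

    Scores are clipped to [eps, 100 - eps] percentage points first,
    because 0% and 100% would map to minus and plus infinity.
    `eps` is an arbitrary illustrative choice.
    """
    clipped = min(max(pct, eps), 100 - eps)
    return NormalDist(mu=0, sigma=1).inv_cdf(clipped / 100)

# Hypothetical marker scores.
z_scores = {p: to_z(p) for p in (0, 5, 50, 95, 100)}
```

    Equal 5-point steps in the percentage are not equal steps in z: the transform stretches the scale near 0% and 100% and compresses it near 50%.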

    What do you see if you do a scatter plot visualization of pairs of IVs and the untransformed DV? In your output does a loess fit look better than a linear fit?

    Of course it makes a difference what the 8 variables mean, but if they are DVs, can you gain df by treating them as 8 repeated measures of the same construct rather than 8 different constructs?


    HTH



    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------








  • 10.  RE:covariates with measurement error

    Posted 05-09-2013 19:28
    Hi Chris: Given that the biomarker values are converted into percentages and then reported, one could think of transforming the percentages in two or three different ways. Before discussing them, I would like to mention that it is hard to believe that the lab does not have any other data. It is compulsory that all lab results be meticulously recorded in their books. Anyway, my thoughts are as follows:

    The simplest way is to use the probabilities Pi corresponding to the percentages as values of a continuous variable.

    Another way is to transform these percentages into "probits", as is done in quantal bioassays. Adding 5 to the z score makes all values positive.

    A third choice is to convert the data into categories, just as we categorize gender or age data in logistic or Cox regression analysis.

    Transforming the percentages into probits is probably the best solution for handling your data.


    -------------------------------------------
    Sheela Talwalker, Ph.D.
    CSO, T'Walker Consulting
    -------------------------------------------








  • 11.  RE:covariates with measurement error

    Posted 05-09-2013 21:43
    An additional choice would be to express the percentages as proportions between 0 and 1, and subject them to Beta regression. 

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences of Biostatistics
    -------------------------------------------