ASA Connect

  • 1.  How to best compute descriptives for time-varying predictors

    Posted 05-25-2020 14:28
    Hi everyone, 

    In a discrete-time survival analysis setting, where subjects are measured for a varying number of time intervals, what is the best way to compute descriptive statistics for time-varying predictors?   Imagine a simplified setting like this: 

    Subject   Time Interval   Event Status   Predictor Value   Number of Intervals
       1        [t1, t2)           0               10                   2
       1        [t2, t3)           1               12
       2        [t1, t2)           0                9                   3
       2        [t2, t3)           0                8
       2        [t3, t4)           0               11
       3        [t1, t2)           0                7                   2
       3        [t2, t3)           0               10


    In this setting, Subject 1 is followed up for 2 time intervals and experiences the (non-recurrent) event of interest sometime during the second interval (since Event = 0 in the first interval and Event = 1 in the second interval). The value of the predictor variable for Subject 1 is 10 in the first interval and 12 in the second interval. (Assume this value is collected at the beginning of each interval.) Subject 2 is followed up for 3 time intervals and does not experience the event by the end of follow-up; this subject contributes 3 predictor values. Subject 3 is followed up for 2 time intervals without experiencing the event and contributes 2 predictor values.

    If we wanted to report a mean value for the predictor variable across the entire sample of 3 subjects, would it make sense to first compute the mean of the predictor variable for each subject (e.g., Subject 1 has a mean value of (10 + 12)/2 = 11, Subject 2 has a mean value of (9 + 8 + 11)/3 ≈ 9.3, Subject 3 has a mean value of (7 + 10)/2 = 8.5) and then compute a weighted mean of the resulting values, where the weights reflect the number of time intervals of follow-up for each subject: 2/(2 + 3 + 2) * 11 + 3/(2 + 3 + 2) * 9.3 + 2/(2 + 3 + 2) * 8.5 ≈ 9.56?

    Similarly, if we wanted to compute the standard deviation for the predictor variable across the entire sample of 3 subjects, would it make sense to compute it as a weighted standard deviation, using a formula such as the one given here (https://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf)?
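
    A minimal sketch of that weighted standard deviation applied to the three subject means, assuming the linked formula is the usual weighted SD with the (N' - 1)/N' correction, where N' is the number of nonzero weights. The function name `weighted_sd` is made up for illustration. The pooled SD of all 7 interval-level values is computed alongside for comparison.

```python
import math
from statistics import stdev

# Per-subject means and interval counts taken from the example table.
subject_means = [11.0, 28 / 3, 8.5]   # subjects 1, 2, 3
weights = [2, 3, 2]                   # number of intervals per subject


def weighted_sd(values, weights):
    """Weighted SD with the (N' - 1) / N' correction.

    N' is the number of nonzero weights (an assumption about the
    linked formula, not something confirmed in this thread).
    """
    total_w = sum(weights)
    mean_w = sum(w * x for x, w in zip(values, weights)) / total_w
    n_nonzero = sum(1 for w in weights if w != 0)
    ss = sum(w * (x - mean_w) ** 2 for x, w in zip(values, weights))
    return math.sqrt(ss / ((n_nonzero - 1) / n_nonzero * total_w))


# Weighted SD of the subject means: between-subject spread only.
sd_between = weighted_sd(subject_means, weights)

# Ordinary sample SD of all 7 interval-level values.
all_values = [10, 12, 9, 8, 11, 7, 10]
sd_pooled = stdev(all_values)

print(round(sd_between, 2), round(sd_pooled, 2))  # 1.18 1.72
```

    In this toy example the weighted SD of the subject means is smaller than the pooled SD, because averaging within each subject first removes the within-subject variation. Which of the two is the relevant descriptive depends on what the reported number is meant to summarize.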
     
    My concern is that if I just compute a mean of all predictor values across all subjects, i.e., (10 + 12 + 9 + 8 + 11 + 7 + 10)/7 = 9.57, that mean won't reflect the fact that subjects have multiple observations on the predictor variable of interest and the number of observations changes from subject to subject.  I guess we could compute  (10 + 12 + 9 + 8 + 11 + 7 + 10)/7 = 9.57 after all and then report it as representing a typical value of the predictor variable across all subjects in the sample and all time intervals?  Whereas 2/(2 + 3 + 2) * 11   + 3/(2 + 3 + 2)  * 9.3  +  2/(2 + 3 + 2)* 8.5 =  9.56 would represent a typical value of the predictor variable across the entire follow-up duration for the subjects in the study?
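
    The arithmetic for the two candidate means can be checked with a short sketch (variable names are illustrative only). It suggests that, when the weights are the numbers of intervals, the weighted mean of subject means is algebraically the same as the pooled mean of all interval-level values (67/7); the 9.56 versus 9.57 gap above comes only from rounding Subject 2's mean to 9.3. An equally weighted mean of subject means is included for contrast, since that one does differ.

```python
from statistics import mean

# Predictor values per subject, from the example table.
predictor = {1: [10, 12], 2: [9, 8, 11], 3: [7, 10]}

# Pooled mean over all 7 subject-intervals.
pooled_mean = mean([v for vals in predictor.values() for v in vals])

# Per-subject means and interval counts.
subject_means = {s: mean(vals) for s, vals in predictor.items()}
n_intervals = {s: len(vals) for s, vals in predictor.items()}
total_intervals = sum(n_intervals.values())

# Weighted mean of subject means, weights = number of intervals.
interval_weighted_mean = (
    sum(n_intervals[s] * subject_means[s] for s in predictor) / total_intervals
)

# Same calculation with Subject 2's mean rounded to 9.3, as in the post.
rounded_version = (2 * 11 + 3 * 9.3 + 2 * 8.5) / 7

# Equal weight per subject, regardless of follow-up length.
equal_weighted_mean = mean(list(subject_means.values()))

print(round(pooled_mean, 2))             # 9.57
print(round(interval_weighted_mean, 2))  # 9.57
print(round(rounded_version, 2))         # 9.56
print(round(equal_weighted_mean, 2))     # 9.61
```

    So the substantive choice in this example appears to be between weighting every subject-interval equally (9.57) and weighting every subject equally (9.61).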

    I tried finding some references on how people tend to do this type of summarization for time-varying predictors measured on subjects with different numbers of intervals but couldn't find anything helpful.  

    Any suggestions on how to best handle this would be much appreciated! 

    Many thanks, 

    Isabella

    ------------------------------
    Isabella R. Ghement, Ph.D.
    Ghement Statistical Consulting Company Ltd.
    ------------------------------


  • 2.  RE: How to best compute descriptives for time-varying predictors

    Posted 05-26-2020 08:41

    Three thoughts - 

    1. In time series analysis, we often find situations with different numbers of intervals or with unevenly spaced intervals. This is a common problem in statistical astrophysics, whence most of my experience derives, because inclement weather makes observation impossible on certain days, and those days are distributed completely at random with respect to the astronomical phenomena at hand (a rare genuine occurrence of missing completely at random!). You may wish to consider re-framing the data as evenly spaced, with all intervals the same length for every subject, but with missing data. The suggestion here is to treat it not as an interval problem but as a missing data problem with common intervals for all subjects. In the particular example given, all subjects would have the same three intervals but not all would have observations at every point. This makes it a missing value problem, for which there are a number of excellent references. 

    2. Where there are a sufficient number of observations - not found in the illustrative example given but common enough in practice - bootstrapping can be an effective tool for establishing statistical parameters and models. A weak model is developed using randomly selected observations. This process is repeated many times, developing a distribution of results. 

    3. The effectiveness of imputation and other models in time series with missing data can be evaluated by identifying the missingness mechanism and then suppressing some records in a complete time series using the same mechanism. For example: suppose the time series of interest has 20% missing completely at random. We can take a different, complete time series in the same body of data and suppress 20% completely at random. Various modeling methods are applied to the data with artificially missing records, and each is compared to the known but suppressed actual values. This can provide some insight as to which methods will perform best when applied to the data of interest. 

    Evaluation of Imputation Methods - MCAR. From Corliss, D. J., and Brookshaw, R., 2009.
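
    Points 1 and 3 can be sketched roughly as follows. The first part re-frames the example data from the original question on a common grid of three intervals, with `None` marking an unobserved interval. The second part masks 20% of a complete series completely at random and scores two simple imputation methods against the suppressed values. The complete series, the seed, the two imputation methods, and all function names are hypothetical choices made for illustration, not taken from the cited evaluation.

```python
import math
import random

# Long-format example data: (subject, interval index, predictor value).
long_rows = [(1, 1, 10), (1, 2, 12), (2, 1, 9), (2, 2, 8),
             (2, 3, 11), (3, 1, 7), (3, 2, 10)]

# Point 1: common grid of intervals; None marks an unobserved interval.
n_grid = max(k for _, k, _ in long_rows)
wide = {}
for subject, k, value in long_rows:
    wide.setdefault(subject, [None] * n_grid)[k - 1] = value


def impute_mean(series):
    """Fill gaps with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]


def impute_locf(series):
    """Last observation carried forward; leading gaps take the first observed value."""
    last = next(x for x in series if x is not None)
    out = []
    for x in series:
        last = last if x is None else x
        out.append(last)
    return out


def rmse_on_masked(truth, imputed, masked_idx):
    """Root mean squared error over the artificially suppressed positions."""
    sq = sum((truth[i] - imputed[i]) ** 2 for i in masked_idx)
    return math.sqrt(sq / len(masked_idx))


# Point 3: hypothetical complete series (trend plus noise), assumed for illustration.
rng = random.Random(0)
complete = [10 + 0.5 * t + rng.gauss(0, 1) for t in range(20)]

# Suppress 20% of the records completely at random.
masked_idx = sorted(rng.sample(range(len(complete)), k=4))
masked = [None if i in masked_idx else x for i, x in enumerate(complete)]

methods = {"mean": impute_mean, "locf": impute_locf}
scores = {name: rmse_on_masked(complete, f(masked), masked_idx)
          for name, f in methods.items()}
print(wide)
print(scores)
```

    In practice the masking step would be repeated many times and applied to series from the same body of data, so that the comparison does not hinge on one random draw.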


    ------------------------------
    David J Corliss, PhD
    Director, Peace-Work www.peace-work.org
    davidjcorliss@peace-work.org
    ------------------------------



  • 3.  RE: How to best compute descriptives for time-varying predictors

    Posted 05-26-2020 08:54
    Hi Isabella,

    I usually think of variables at different time periods as different variables that are best summarized in a plot where you can visualize the trajectory over time, but that is because I was first trained as an economist, working with a few variables over an extended number of periods. In that context, an overall average is useful to visualize if the variables are stationary. If you need to aggregate and compare values over time, maybe you would want to look into how price indexes are computed, since they are weighted averages of prices using weights of an "average" basket of goods that changes once in a while.
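
    Treating each time period as its own variable could look like the sketch below, which tabulates the number of subjects observed, the mean, and the SD of the predictor separately for each interval of the example in the original question. These per-interval rows are what a trajectory plot would display. Names are illustrative only.

```python
from statistics import mean, stdev

# Long-format example data: (subject, interval index, predictor value).
long_rows = [(1, 1, 10), (1, 2, 12), (2, 1, 9), (2, 2, 8),
             (2, 3, 11), (3, 1, 7), (3, 2, 10)]

# Group predictor values by interval.
by_interval = {}
for _, k, value in long_rows:
    by_interval.setdefault(k, []).append(value)

# One summary row per interval; SD is undefined with a single observation.
summary = {}
for k in sorted(by_interval):
    vals = by_interval[k]
    summary[k] = {
        "n": len(vals),
        "mean": mean(vals),
        "sd": stdev(vals) if len(vals) > 1 else None,
    }

for k, row in summary.items():
    print(k, row)
```

    Reporting n alongside each mean makes it visible that later intervals rest on fewer subjects.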

    It's not much, but I hope this helps a little.
    Arthur

    ------------------------------
    Arthur Carbonare De Avila
    ------------------------------



  • 4.  RE: How to best compute descriptives for time-varying predictors

    Posted 05-26-2020 09:17
    Hi Isabella:

    The subjects in the study are always of interest for a variety of reasons.
    But, when you get to a complicated model like you have here, typically,
    I create a range of hypothetical subjects and make predictions for them.
    And, based on those predictions, I perform the inference rather than 
    on the subjects that were actually in the study, i.e., the real subjects
    inform the model and then the model is used for the inference.

    Rodney

    ------------------------------
    Rodney Sparapani
    ------------------------------