Hi everyone,
In a discrete-time survival analysis setting, where subjects are measured for a varying number of time intervals, what is the best way to compute descriptive statistics for time-varying predictors? Imagine a simplified setting like this:

Subject  Time Interval  Event Status  Predictor Value  Number of Intervals
1        [t1, t2)       0             10               2
1        [t2, t3)       1             12
2        [t1, t2)       0             9                3
2        [t2, t3)       0             8
2        [t3, t4)       0             11
3        [t1, t2)       0             7                2
3        [t2, t3)       0             10

In this setting, Subject 1 is followed up for 2 time intervals and experiences the (non-recurrent) event of interest sometime during the second interval (since Event = 0 in the first interval and Event = 1 in the second interval). The value of the predictor variable for Subject 1 is 10 in the first interval and 12 in the second interval. (Assume this value is collected at the beginning of each interval.) Subject 2 is followed up for 3 time intervals and does not experience the event by the end of follow-up; this subject contributes 3 predictor values. Subject 3 is followed up for 2 time intervals without experiencing the event and contributes 2 predictor values.
If we wanted to report a mean value for the predictor variable across the entire sample of 3 subjects, would it make sense to first compute the mean of the predictor variable for each subject (e.g., Subject 1 has a mean value of (10 + 12)/2 = 11, Subject 2 has a mean value of (9 + 8 + 11)/3 = 9.3, Subject 3 has a mean value of (7 + 10)/2 = 8.5) and then compute a weighted mean of the resulting values, where the weights reflect the number of time intervals of follow-up for each subject: 2/(2 + 3 + 2) * 11 + 3/(2 + 3 + 2) * 9.3 + 2/(2 + 3 + 2) * 8.5 = 9.56?
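
To make the arithmetic easy to check, here is a minimal Python sketch of the interval-weighted mean of the subject means, computed next to the simple mean of all 7 interval-level values. The variable names (`values`, `n_intervals`, `subject_means`) are just illustrative choices, and the data are the toy numbers from the example above.

```python
# Predictor values for each subject, one value per time interval
# (toy data copied from the example; names are illustrative only).
values = {1: [10, 12], 2: [9, 8, 11], 3: [7, 10]}

# Number of follow-up intervals and mean predictor value per subject.
n_intervals = {s: len(v) for s, v in values.items()}
subject_means = {s: sum(v) / len(v) for s, v in values.items()}
total_intervals = sum(n_intervals.values())

# Weighted mean of subject means, weights = share of total intervals.
weighted_mean = sum(
    n_intervals[s] / total_intervals * subject_means[s] for s in values
)

# Simple mean of all interval-level predictor values.
pooled_mean = sum(x for v in values.values() for x in v) / total_intervals

print(round(weighted_mean, 2), round(pooled_mean, 2))
```

One thing this seems to show: when the subject means are kept unrounded, the two quantities are the same number (67/7), because weighting each subject mean by its number of intervals just recovers the sum of all interval-level values. The 9.56 versus 9.57 gap appears to come only from rounding Subject 2's mean to 9.3.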
Similarly, if we wanted to compute the standard deviation for the predictor variable across the entire sample of 3 subjects, would it make sense to compute it as a weighted standard deviation, using a formula such as the one here:
https://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf?
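
For concreteness, a rough sketch of that calculation in Python follows. It assumes the linked page defines the weighted standard deviation as sqrt( sum w_i (x_i - xbar_w)^2 / ( ((N' - 1)/N') * sum w_i ) ), where xbar_w is the weighted mean and N' is the number of nonzero weights; please check this against the page itself. The function name `weighted_sd` is hypothetical, and applying it to the subject means with interval counts as weights is just one possible reading of the question.

```python
import math
import statistics


def weighted_sd(x, w):
    """Weighted SD under the assumed formula described in the lead-in."""
    n_nonzero = sum(1 for wi in w if wi != 0)
    if n_nonzero < 2:
        raise ValueError("need at least two nonzero weights")
    w_sum = sum(w)
    mean_w = sum(wi * xi for xi, wi in zip(x, w)) / w_sum
    numerator = sum(wi * (xi - mean_w) ** 2 for xi, wi in zip(x, w))
    denominator = (n_nonzero - 1) / n_nonzero * w_sum
    return math.sqrt(numerator / denominator)


# Subject means (unrounded) and number of intervals per subject.
subject_means = [11, 28 / 3, 8.5]
weights = [2, 3, 2]
sd_weighted = weighted_sd(subject_means, weights)

# For comparison: ordinary sample SD of all 7 interval-level values.
sd_pooled = statistics.stdev([10, 12, 9, 8, 11, 7, 10])

print(round(sd_weighted, 2), round(sd_pooled, 2))
```

Under these assumptions the two numbers answer different questions: the weighted version describes the spread of the subject-level means, while the pooled version describes the spread of individual interval-level values, which also includes within-subject variation.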
My concern is that if I just compute a mean of all predictor values across all subjects, i.e., (10 + 12 + 9 + 8 + 11 + 7 + 10)/7 = 9.57, that mean won't reflect the fact that subjects have multiple observations on the predictor variable of interest and the number of observations changes from subject to subject. I guess we could compute (10 + 12 + 9 + 8 + 11 + 7 + 10)/7 = 9.57 after all and then report it as representing a typical value of the predictor variable across all subjects in the sample and all time intervals? Whereas 2/(2 + 3 + 2) * 11 + 3/(2 + 3 + 2) * 9.3 + 2/(2 + 3 + 2) * 8.5 = 9.56 would represent a typical value of the predictor variable across the entire follow-up duration for the subjects in the study?
I tried finding some references on how people tend to do this type of summarization for time-varying predictors measured on subjects with different numbers of intervals but couldn't find anything helpful.
Any suggestions on how to best handle this would be much appreciated!
Many thanks,
Isabella
------------------------------
Isabella R. Ghement, Ph.D.
Ghement Statistical Consulting Company Ltd.
------------------------------