Hi everyone,

In a discrete-time survival analysis setting, where subjects are measured for a varying number of time intervals, what is the best way to compute descriptive statistics for time-varying predictors? Imagine a simplified setting like this:

| Subject | Time Interval | Event Status | Predictor Value | Number of Intervals |
|---|---|---|---|---|
| 1 | [t1, t2) | 0 | 10 | 2 |
| 1 | [t2, t3) | 1 | 12 | |
| 2 | [t1, t2) | 0 | 9 | 3 |
| 2 | [t2, t3) | 0 | 8 | |
| 2 | [t3, t4) | 0 | 11 | |
| 3 | [t1, t2) | 0 | 7 | 2 |
| 3 | [t2, t3) | 0 | 10 | |

In this setting, Subject 1 is followed up for 2 time intervals and experiences the (non-recurrent) event of interest sometime during the second interval (since Event = 0 in the first interval and Event = 1 in the second interval). The value of the predictor variable for Subject 1 is 10 in the first interval and 12 in the second interval. (Assume this value is collected at the beginning of each interval.) Subject 2 is followed up for 3 time intervals and does not experience the event by the end of his follow-up; this subject contributes 3 predictor values, and so on.

If we wanted to report a mean value for the predictor variable across the entire sample of 3 subjects, would it make sense to first compute the mean of the predictor variable for each subject (e.g., Subject 1 has a mean value of **(10 + 12)/2 = 11**, Subject 2 has a mean value of **(9 + 8 + 11)/3 = 9.3**, Subject 3 has a mean value of **(7 + 10)/2 = 8.5**) and then compute a weighted mean of the resulting values, where the weights reflect the number of time intervals of follow-up for each subject:

**2/(2 + 3 + 2) * 11 + 3/(2 + 3 + 2) * 9.3 + 2/(2 + 3 + 2) * 8.5 = 9.56**?

Similarly, if we wanted to compute the standard deviation for the predictor variable across the entire sample of 3 subjects, would it make sense to compute it as a weighted standard deviation, using a formula such as the one here:

https://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf?
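To make the weighted standard deviation idea concrete, here is a minimal Python sketch. It rests on two assumptions of mine: that the formula is the one I read on the linked NIST page, i.e. sqrt( sum of w_i (x_i - weighted mean)^2 / ( (N' - 1)/N' * sum of w_i ) ) with N' the number of non-zero weights, and that the x_i are the per-subject means and the w_i are the numbers of intervals. The function name `weighted_sd` is just an illustrative choice.

```python
import math


def weighted_sd(values, weights):
    """Weighted standard deviation (formula assumed from the linked NIST page)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    n_nonzero = sum(1 for w in weights if w != 0)  # N' in the formula
    if n_nonzero < 2:
        raise ValueError("need at least two non-zero weights")
    w_sum = sum(weights)
    # Weighted mean of the values.
    w_mean = sum(w * x for x, w in zip(values, weights)) / w_sum
    # Weighted sum of squared deviations from the weighted mean.
    numerator = sum(w * (x - w_mean) ** 2 for x, w in zip(values, weights))
    denominator = (n_nonzero - 1) / n_nonzero * w_sum
    return math.sqrt(numerator / denominator)


# Assumption: summarize per-subject means, weighted by follow-up length.
subject_means = [(10 + 12) / 2, (9 + 8 + 11) / 3, (7 + 10) / 2]
n_intervals = [2, 3, 2]

sd = weighted_sd(subject_means, n_intervals)
print(round(sd, 2))
```

With all weights equal to 1, this expression reduces to the usual sample standard deviation (divisor n - 1), which is a quick sanity check on the sketch.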

My concern is that if I just compute a mean of all predictor values across all subjects, i.e., **(10 + 12 + 9 + 8 + 11 + 7 + 10)/7 = 9.57**, that mean won't reflect the fact that subjects have multiple observations on the predictor variable of interest and that the number of observations changes from subject to subject. I guess we could compute **(10 + 12 + 9 + 8 + 11 + 7 + 10)/7 = 9.57** after all and then report it as representing a **typical value of the predictor variable across all subjects in the sample and all time intervals**? Whereas **2/(2 + 3 + 2) * 11 + 3/(2 + 3 + 2) * 9.3 + 2/(2 + 3 + 2) * 8.5 = 9.56** would represent a **typical value of the predictor variable across the entire follow-up duration for the subjects in the study**?

I tried finding some references on how people tend to do this type of summarization for time-varying predictors measured on subjects with different numbers of intervals but couldn't find anything helpful.
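For anyone who wants to reproduce the numbers, here is a small Python sketch of the summaries discussed above, using the values from the example table. The variable names are purely illustrative, and I have added the unweighted mean of the per-subject means (each subject counted once) only as a point of comparison.

```python
# Person-period data from the example table:
# subject -> predictor values, one per time interval of follow-up.
predictor = {
    1: [10, 12],
    2: [9, 8, 11],
    3: [7, 10],
}

# Per-subject means, kept unrounded.
subject_means = {s: sum(v) / len(v) for s, v in predictor.items()}

# Weights: number of intervals each subject contributes.
n_intervals = {s: len(v) for s, v in predictor.items()}
total_intervals = sum(n_intervals.values())

# Interval-weighted mean of the per-subject means.
weighted_mean = sum(
    n_intervals[s] / total_intervals * subject_means[s] for s in predictor
)

# Pooled mean over all subject-interval records.
all_values = [x for v in predictor.values() for x in v]
pooled_mean = sum(all_values) / len(all_values)

# Unweighted mean of the per-subject means (each subject counts once).
subject_level_mean = sum(subject_means.values()) / len(subject_means)

print(round(weighted_mean, 2), round(pooled_mean, 2), round(subject_level_mean, 2))
# prints: 9.57 9.57 9.61
```

As far as I can tell, when the weights are the interval counts and the per-subject means are not rounded, the weighted mean is algebraically the same as the pooled mean, since the sum of n_i times the subject mean is just the sum of that subject's values. The 9.56 above appears to come only from rounding Subject 2's mean to 9.3.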

Any suggestions on how to best handle this would be much appreciated!

Many thanks,

Isabella

------------------------------

Isabella R. Ghement, Ph.D.

Ghement Statistical Consulting Company Ltd.

------------------------------