The key idea is that you create a population for the "other" covariates. So to compare
those with X1=a and X1=b, create a data set "pop" containing some population for the other
variables (X2 and X3). Then make two data sets temp1 = cbind(pop, X1='a') and temp2 =
cbind(pop, X1='b'), and get the population survival curve for each of them using
survexp(). Plot them together. The population can be any distribution you want, such
that you can tell a sensible story about what the curves represent: the distribution found
in the data set, one that is more balanced, less balanced, only old people, .... In
response to your question of using "only prototypical" subjects as the population the
answer is that of course you can. You simply need to be able to say what it represents.
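A minimal sketch of the X1 comparison described above, under stated assumptions: the data frame d is simulated (its effect sizes and censoring scheme are invented for illustration), and the averaging is done explicitly with survfit() and rowMeans() rather than through survexp(), so that each step is visible. The population here is simply the X2 and X3 values observed in the data set.

```r
library(survival)

# Simulated data, purely illustrative (not from the thread)
set.seed(1)
n <- 300
d <- data.frame(
  X1 = factor(sample(c("a", "b"), n, replace = TRUE)),
  X2 = factor(sample(c("A", "B"), n, replace = TRUE)),
  X3 = round(runif(n, 25, 75))            # e.g. age
)
lp <- 0.7 * (d$X1 == "b") + 0.4 * (d$X2 == "B") + 0.03 * (d$X3 - 50)
event_time <- rexp(n, rate = 0.1 * exp(lp))
cens_time  <- runif(n, 5, 20)
d$time   <- pmin(event_time, cens_time)
d$status <- as.integer(event_time <= cens_time)

coxfit <- coxph(Surv(time, status) ~ X1 + X2 + X3, data = d)

# Population for the "other" covariates: here, the observed X2/X3 values
pop <- d[, c("X2", "X3")]

# Same population, with X1 set to each level in turn
temp1 <- pop; temp1$X1 <- factor("a", levels = levels(d$X1))
temp2 <- pop; temp2$X1 <- factor("b", levels = levels(d$X1))

# One predicted curve per row of newdata, then an equal-weight average
sf_a <- survfit(coxfit, newdata = temp1)
sf_b <- survfit(coxfit, newdata = temp2)
avg_a <- rowMeans(sf_a$surv)
avg_b <- rowMeans(sf_b$surv)

plot(sf_a$time, avg_a, type = "s", ylim = c(0, 1),
     xlab = "Time", ylab = "Population-averaged survival")
lines(sf_b$time, avg_b, type = "s", lty = 2)
legend("bottomleft", legend = c("X1 = a", "X1 = b"), lty = 1:2)
```

Swapping pop for a different data frame (more balanced, only older subjects, and so on) changes what the two curves represent, not the mechanics.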
For your continuous variable X3 below, the first question is what population to use
for X1 and X2, call it pop12, and the second is what curves you want to draw. An
effective graph will choose a small number of "representative" values from X3. Too many
and the plot is too busy, too few and it's not sufficiently varied. But each curve will be
the same process: temp = cbind(pop12, X3 = some value); survexp(coxfit, newdata=temp).
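The same process for the continuous variable might look like the sketch below. The data are again simulated for illustration, the helper avg_curve() is a hypothetical name, and the averaging uses survfit() with rowMeans() as an explicit stand-in for the survexp() call. The values 30, 40 and 50 are the ages mentioned in the original question; pop12 is taken to be the observed X1/X2 distribution, which is only one of many defensible choices.

```r
library(survival)

# Simulated data, purely illustrative (not from the thread)
set.seed(1)
n <- 300
d <- data.frame(
  X1 = factor(sample(c("a", "b"), n, replace = TRUE)),
  X2 = factor(sample(c("A", "B"), n, replace = TRUE)),
  X3 = round(runif(n, 25, 75))
)
lp <- 0.7 * (d$X1 == "b") + 0.4 * (d$X2 == "B") + 0.03 * (d$X3 - 50)
event_time <- rexp(n, rate = 0.1 * exp(lp))
cens_time  <- runif(n, 5, 20)
d$time   <- pmin(event_time, cens_time)
d$status <- as.integer(event_time <= cens_time)

coxfit <- coxph(Surv(time, status) ~ X1 + X2 + X3, data = d)

# Hypothetical helper: average the per-row predicted curves with equal weights
avg_curve <- function(fit, newdata) {
  sf <- survfit(fit, newdata = newdata)
  list(time = sf$time, surv = rowMeans(sf$surv))
}

# Population for X1 and X2 (assumption: the observed joint distribution)
pop12 <- d[, c("X1", "X2")]

# A small number of representative X3 values
x3_values <- c(30, 40, 50)
curves <- lapply(x3_values, function(v) {
  temp <- cbind(pop12, X3 = v)   # same population, X3 fixed at v
  avg_curve(coxfit, temp)
})

plot(curves[[1]]$time, curves[[1]]$surv, type = "s", ylim = c(0, 1),
     xlab = "Time", ylab = "Population-averaged survival")
for (i in 2:length(curves)) {
  lines(curves[[i]]$time, curves[[i]]$surv, type = "s", lty = i)
}
legend("bottomleft", legend = paste("X3 =", x3_values),
       lty = seq_along(x3_values))
```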
Terry Therneau
-----------------------------
Thank you very much to everyone who responded to my initial inquiry about adjusted
survival curves.
Terry, I finally had a chance to review your excellent CRAN vignette regarding adjusted
survival curves and wanted to follow up with a few questions.
Let's say I am interested in constructing an adjusted "conditional" survival curve using
the methodology in the vignette. The way I understand this methodology, it implies two
steps: 1) model and 2) balance.
For 1), a Cox proportional hazards regression model is fitted to the sample data. For 2),
the model is used to obtain a survival curve for each individual in the sample (i.e., for
each configuration of predictor variables represented in the sample) and then the adjusted
"conditional" survival curve is obtained by performing a simple averaging of the obtained
survival curves.
Now, let's say that the Cox proportional hazards regression model includes 3 predictors
(i.e., X1, X2 and X3) such that X1 and X2 are both categorical with two levels each (i.e.,
levels a and b for X1; levels A and B for X2) and X3 is continuous. Let's also say that
we are interested in a variation of the adjusted "conditional" survival curves that would
enable us to visualize the effect of each of these predictor variables in turn on
survival (rather than the combined effect of all three predictor variables on overall
survival).
For the effect of X1, we could separate our sample into two sub-samples: i) subjects
where X1 = a and ii) subjects where X1 = b. We could then use the Cox proportional
hazards regression model to separately construct survival curves for all configurations of
values for X2 and X3 present in each of the two sub-samples. Simple averaging of those
survival curves across the subjects in each sub-sample would yield two adjusted
"conditional" survival curves - one for each level of X1.
For the effect of X2, we would proceed in a similar fashion as described for X1, except
that the two sub-samples would correspond to subjects where X2 = A and X2 = B,
respectively, and the construction of survival curves would pertain to all configurations
of values of X1 and X3 present in the two sub-samples.
For the effect of X3, things are trickier if we only care about some pre-specified values
of X3 (rather than all observed values of X3 present in the sample). For example, if X3
stands for Age, the pre-specified values of Age might be 30, 40 and 50 years. For a
relatively small study, it's entirely possible that none of the subjects in the study
actually have any of those ages (or some of those ages). So the question is: How would we
proceed in this situation, especially when it comes to looking at the effects of X1 and X2
on survival?
For this situation, could we use a slightly different reasoning where, instead of
averaging survival curves over all subjects in the sample with various observed
configurations of predictor variables, we would average survival curves over
"prototypical" subjects with idealized configurations of predictor variables? In the
context of the example given, we could define the following "prototypical" subjects:
X1 = a, X2 = A, X3 = 30 (where X3 = Age)
X1 = a, X2 = B, X3 = 30
X1 = b, X2 = A, X3 = 30
X1 = b, X2 = B, X3 = 30
X1 = a, X2 = A, X3 = 40
etc.
Each of these idealized configurations of predictor variables would yield a single
survival curve. Eliciting the effect of X1 would mean averaging the survival curves
corresponding to subjects with (i) X1 = a for whom X2 can be either A or B and X3 can be
either 30, 40, 50 and (ii) X1 = b for whom X2 can be either A or B and X3 can be either
30, 40, 50, etc. This would yield average survival curves across "prototypical"
subjects with either X1 = a or X1 = b for whom X2 can be either A or B and X3 can be
either 30, 40, 50 (rather than average survival curves in the entire cohort of patients
with either X1 = a or X1 = b).
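To make the "prototypical subjects" idea concrete, here is a hedged sketch in which the population is a balanced grid of X2 and X3 values built with expand.grid(). The data are simulated for illustration, avg_curve() is a hypothetical helper, and the equal weighting of the six grid rows is an assumption; the resulting curves describe that artificial, equally weighted population rather than the study cohort.

```r
library(survival)

# Simulated data, purely illustrative (not from the thread)
set.seed(1)
n <- 300
d <- data.frame(
  X1 = factor(sample(c("a", "b"), n, replace = TRUE)),
  X2 = factor(sample(c("A", "B"), n, replace = TRUE)),
  X3 = round(runif(n, 25, 75))
)
lp <- 0.7 * (d$X1 == "b") + 0.4 * (d$X2 == "B") + 0.03 * (d$X3 - 50)
event_time <- rexp(n, rate = 0.1 * exp(lp))
cens_time  <- runif(n, 5, 20)
d$time   <- pmin(event_time, cens_time)
d$status <- as.integer(event_time <= cens_time)

coxfit <- coxph(Surv(time, status) ~ X1 + X2 + X3, data = d)

# Hypothetical helper: equal-weight average of the per-row predicted curves
avg_curve <- function(fit, newdata) {
  sf <- survfit(fit, newdata = newdata)
  list(time = sf$time, surv = rowMeans(sf$surv))
}

# "Prototypical" population: every combination of X2 in {A, B} and X3 in {30, 40, 50}
proto <- expand.grid(X2 = c("A", "B"), X3 = c(30, 40, 50))

# Effect of X1: same prototypical population, X1 fixed at each level
temp_a <- cbind(proto, X1 = factor("a", levels = c("a", "b")))
temp_b <- cbind(proto, X1 = factor("b", levels = c("a", "b")))
curve_a <- avg_curve(coxfit, temp_a)
curve_b <- avg_curve(coxfit, temp_b)

plot(curve_a$time, curve_a$surv, type = "s", ylim = c(0, 1),
     xlab = "Time", ylab = "Average survival, prototypical population")
lines(curve_b$time, curve_b$surv, type = "s", lty = 2)
legend("bottomleft", legend = c("X1 = a", "X1 = b"), lty = 1:2)
```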
Does this make sense? And if it doesn't make sense, is there a better way to deal with
adjusted survival curves in situations where one of the predictor variables is continuous
and we are interested in learning more about its effect on survival by comparing adjusted
survival curves at some of its pre-specified values?
Thanks a lot,
Isabella
------------------------------
Isabella R. Ghement, Ph.D.
Ghement Statistical Consulting Company Ltd.
E-mail: isabella@ghement.ca
Web: www.ghement.ca
Tel: 604-767-1250
------------------------------
Original Message:
Sent: 10-18-2016 07:46