
  • 1.  Time Series Simulations

    Posted 02-17-2012 17:00

    Hi everyone,

    I am working on a consulting problem which requires the simulation of time series data via bootstrapping and would like to get some feedback from the consulting list regarding the suitability of the method I would like to use to simulate these time series. 

    The simulations will revolve around the following ideas:

    1)      We have an original time series y which includes missing values, left-censored values and outliers. 

    2)      We will fit a regression model of the form "linear trend + seasonal component" to the y data.  (Presumably, this type of model will accommodate the left-censored values and the outliers.)

    3)      We will extract the residuals from this model fit and use some form of bootstrap to create B replicate sets of residuals.

    4)      We need to create bootstrap time series of the form "linear trend + bootstrap residuals" and feed them into the simulations.  (The simulations will eventually look at the power to detect a linear trend based on the bootstrap time series.)
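
    Steps 2) to 4) above can be sketched roughly as follows. This is only an illustration under stated assumptions: ordinary least squares stands in for the fitting method, the seasonal component is a single sine/cosine pair, censoring is ignored, and the i.i.d. resampling is a placeholder for a dependence-aware bootstrap. The function names and the toy data are hypothetical.

```python
import numpy as np

def design_matrix(t, period):
    """Columns: intercept, linear trend, one sine/cosine pair for seasonality."""
    w = 2.0 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w)])

def fit_trend_seasonal(t, y, period):
    """Least-squares fit using the non-missing observations only (assumption: OLS)."""
    ok = ~np.isnan(y)
    X = design_matrix(t, period)
    beta, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
    return beta

def bootstrap_series(t, beta, resid_star):
    """Step 4: linear trend (intercept + slope * t) plus bootstrap residuals."""
    return beta[0] + beta[1] * t + resid_star

# Toy data (hypothetical): 120 monthly values with a known slope and three gaps.
rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)
y = 2.0 + 0.05 * t + np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)
y[[5, 40, 77]] = np.nan

beta = fit_trend_seasonal(t, y, period=12)
resid = y - design_matrix(t, 12) @ beta      # NaN wherever y is missing

# Placeholder only: i.i.d. resampling ignores serial correlation.
observed = resid[~np.isnan(resid)]
resid_star = rng.choice(observed, size=t.size, replace=True)
y_star = bootstrap_series(t, beta, resid_star)
```

    Repeating the last three lines B times would give the B bootstrap series that feed the power simulation.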

    For step 2), I am planning on using a nonparametric regression method (e.g., rank regression) in order to extract the residuals required by step 3).   Because the original time series has missing values, I will apply this method only to the non-missing values of the series and then make sure I construct a set of augmented residuals that will consist of the residuals obtained from the complete data with missing values inserted wherever we had a missing observation in the original time series.  (Is there a better way to do this?)
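
    A minimal sketch of the "augmented residuals" idea, assuming a Theil-Sen fit (median of pairwise slopes) as a simple nonparametric stand-in for rank regression and leaving out the seasonal component for brevity. The helper names are hypothetical, and censored values are not handled here.

```python
import numpy as np

def theil_sen(t, y):
    """Median of all pairwise slopes, then a median-based intercept."""
    i, j = np.triu_indices(t.size, k=1)
    slope = np.median((y[j] - y[i]) / (t[j] - t[i]))
    intercept = np.median(y - slope * t)
    return intercept, slope

def augmented_residuals(t, y):
    """Fit on complete cases; return full-length residuals with NaN at the gaps."""
    ok = ~np.isnan(y)
    intercept, slope = theil_sen(t[ok], y[ok])
    resid = np.full(y.shape, np.nan)
    resid[ok] = y[ok] - (intercept + slope * t[ok])
    return resid, (intercept, slope)

# Toy example (hypothetical): exact line with one missing value and one outlier.
t = np.arange(10, dtype=float)
y = 1.0 + 0.5 * t
y[3] = np.nan      # missing observation
y[7] = 50.0        # outlier
resid, (a, b) = augmented_residuals(t, y)
```

    The residual vector keeps the original length and time alignment, so the missing-value pattern of y carries over to the residuals unchanged, while the outlier shows up as a single large residual instead of pulling the fitted line.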

    For step 3), I could use either ARMA bootstrap or maximum entropy bootstrap to take into account the potential serial correlation of the residuals.  (Block bootstrap could also be an option.)  As far as I know, neither of these two methods was designed to accommodate missing values.   Is it OK to apply either of these two bootstrapping methods to the residuals corresponding to the complete observations and then insert missing values for the missing observations?   If not, what other approach would be suitable in this situation?  For both bootstrap methods, there is a concern that missing values, outliers and censored data values may distort the results - but, at the end of the day, it is important to create time series that include all of the special features of the original time series.   
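
    For concreteness, here is a sketch of the "bootstrap the complete residuals, then reinsert the missing values" idea, using the moving block bootstrap because it is the simplest of the options mentioned to write down. The function names and the block length are illustrative assumptions. Note that squeezing the observed residuals together before resampling treats values on either side of a gap as adjacent, which changes the lag structure around each gap.

```python
import numpy as np

def moving_block_bootstrap(x, block_len, rng):
    """Resample overlapping blocks of consecutive values and concatenate them."""
    n = x.size
    n_blocks = -(-n // block_len)                       # ceil(n / block_len)
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
    return x[idx]

def bootstrap_with_gaps(resid, block_len, rng):
    """Bootstrap the observed residuals, then restore NaN at the original gaps."""
    ok = ~np.isnan(resid)
    out = np.full(resid.shape, np.nan)
    out[ok] = moving_block_bootstrap(resid[ok], block_len, rng)
    return out

# Toy residuals (hypothetical) with three missing positions.
rng = np.random.default_rng(1)
resid = rng.normal(size=50)
resid[[4, 20, 21]] = np.nan

B = 200
reps = np.array([bootstrap_with_gaps(resid, block_len=5, rng=rng) for _ in range(B)])
```

    Each row of reps is one bootstrap residual series with the same missing-value pattern as the original.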

    Thank you in advance for any insights you may be able to provide. 

    Kind regards,

    Isabella  

     

    -------------------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Co.
    -------------------------------------------


  • 2.  RE: Time Series Simulations

    Posted 02-17-2012 18:56
    I am not sure about the details of the approach you are planning, especially the part about generating just 8 sets of residuals.  Having written books on bootstrapping, I am acquainted with the existing literature on the bootstrap.  There are frequency domain approaches, but the primary approaches are time domain.  The time domain approaches can be divided into two types: (1) model based and (2) nonparametric.  The model based approach uses a particular parametric form; ARIMA models with fixed orders p, d and q would be one example.  You fit the model by maximum likelihood and compute the residuals.  Then you bootstrap the residuals.  For any bootstrap set of residuals, you add them back to the fitted data points to get a bootstrap version of the time series.  Fit the model to the bootstrap data to get bootstrap estimates of the model parameters.  Repeat the process many times and you get a bootstrap distribution for the parameters, from which bootstrap confidence intervals can be generated.  The nonparametric approach involves bootstrapping blocks of consecutive data points from the time series.  The most commonly used version of this is called the moving block bootstrap.  Good coverage of the methods can be found in the following books:
    1. Davison and Hinkley (1997)
    2. Chernick (2008) and 
    3. Lahiri (2003).
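
    As a rough illustration of the model-based recipe (fit, resample residuals, rebuild the series, refit), here is a sketch for the simplest case, a zero-mean AR(1) model. It makes several assumptions that go beyond the description above: the fit is by conditional least squares rather than full maximum likelihood, the bootstrap series is rebuilt recursively from the resampled innovations, and the interval is a plain percentile interval. All names and the simulated data are hypothetical.

```python
import numpy as np

def fit_ar1(x):
    """Conditional least-squares estimate of phi in x[t] = phi * x[t-1] + e[t]."""
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def ar1_residual_bootstrap(x, n_boot, rng):
    """Bootstrap distribution of phi from resampled, centered innovations."""
    phi_hat = fit_ar1(x)
    e = x[1:] - phi_hat * x[:-1]        # estimated innovations
    e = e - e.mean()                    # center before resampling
    phis = np.empty(n_boot)
    for b in range(n_boot):
        e_star = rng.choice(e, size=x.size - 1, replace=True)
        x_star = np.empty_like(x)
        x_star[0] = x[0]                # start from the observed first value
        for t in range(1, x.size):
            x_star[t] = phi_hat * x_star[t - 1] + e_star[t - 1]
        phis[b] = fit_ar1(x_star)       # refit on the bootstrap series
    return phi_hat, phis

# Simulated AR(1) series (hypothetical) with true phi = 0.6.
rng = np.random.default_rng(42)
x = np.zeros(300)
for t in range(1, x.size):
    x[t] = 0.6 * x[t - 1] + rng.normal()

phi_hat, phis = ar1_residual_bootstrap(x, n_boot=500, rng=rng)
ci = np.percentile(phis, [2.5, 97.5])   # percentile bootstrap interval for phi
```

    The array phis plays the role of the bootstrap distribution of the parameter, and ci is one simple way to turn it into a confidence interval.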

    Lahiri's book is the most detailed, as it deals solely with dependent data.  Now your example has some wrinkles that are not part of the common literature: outliers and censored data.  If by outliers you just mean that some extreme values occur because of heavy-tailed residuals, you would treat the outliers no differently than any other observation.
    Regarding the left censoring, I am not sure what you mean.  Are you talking about censoring the time of occurrence or the value of the observation?  If it is the time, then I would think there would be interval censoring: left censoring would mean that the observation occurred prior to time t', but since the observations are ordered in time, it would also have to occur after the time of the preceding observation, say tp.  So the time would fall in the interval (tp, t').

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------