Bootstrap prediction interval

Is there any bootstrap technique available to compute prediction intervals for point predictions obtained, e.g., by linear regression or another regression method (k-nearest neighbours, regression trees, etc.)?

Somehow I feel that the sometimes-proposed approach of just bootstrapping the point prediction (see e.g. Prediction intervals for kNN regression) does not provide a prediction interval but a confidence interval.

An example in R

# STEP 1: GENERATE DATA

set.seed(34345)

n <- 100 
x <- runif(n)
y <- 1 + 0.2*x + rnorm(n)
data <- data.frame(x, y)


# STEP 2: COMPUTE CLASSIC 95%-PREDICTION INTERVAL
fit <- lm(y ~ x)
plot(fit) # not shown but looks fine with respect to all relevant aspects

# Classic prediction interval based on standard error of forecast
predict(fit, list(x = 0.1), interval = "p")
# -0.6588168 3.093755

# Classic confidence interval based on standard error of estimation
predict(fit, list(x = 0.1), interval = "c")
# 0.893388 1.54155


# STEP 3: NOW BY BOOTSTRAP
B <- 1000
pred <- numeric(B)
for (i in 1:B) {
  boot <- sample(n, n, replace = TRUE)
  fit.b <- lm(y ~ x, data = data[boot,])
  pred[i] <- predict(fit.b, list(x = 0.1))
}
quantile(pred, c(0.025, 0.975))
# 0.8699302 1.5399179

Obviously, the 95% basic bootstrap interval matches the 95% confidence interval, not the 95% prediction interval. So my question: how to do it properly?

bootstrap prediction-interval

— Michael M

At least in the case of ordinary least squares, you are going to need more than just point predictions. You want to use the estimated residual error to construct prediction intervals, too.

— Kodiologist
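
The comment above can be illustrated with a rough sketch on the question's simulated data. It bootstraps cases exactly as in STEP 3, but additionally adds one resampled residual to each bootstrap point prediction. This is only an illustration of the idea, assuming iid errors; the names `pred_ci`, `pred_pi`, `ci_boot` and `pi_boot` are introduced here and are not part of the original post.

```r
# Sketch: point-prediction bootstrap vs. point prediction plus a resampled residual.
set.seed(34345)
n <- 100
x <- runif(n)
y <- 1 + 0.2 * x + rnorm(n)
data <- data.frame(x, y)

B <- 1000
pred_ci <- numeric(B)  # bootstrap point predictions only (estimation uncertainty)
pred_pi <- numeric(B)  # point prediction plus one resampled residual
for (i in seq_len(B)) {
  boot  <- sample(n, n, replace = TRUE)
  fit_b <- lm(y ~ x, data = data[boot, ])
  pred_ci[i] <- predict(fit_b, newdata = data.frame(x = 0.1))
  # Add noise for the new observation by drawing one residual of this refit
  pred_pi[i] <- pred_ci[i] + sample(residuals(fit_b), 1)
}

ci_boot <- quantile(pred_ci, c(0.025, 0.975))  # behaves like a confidence interval
pi_boot <- quantile(pred_pi, c(0.025, 0.975))  # much wider: includes residual error
ci_boot
pi_boot
```

The second interval should be of roughly the same width as the classic prediction interval from STEP 2, because it accounts for the noise in the new observation as well as the uncertainty in the fitted line.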

Related: stats.stackexchange.com/q/44860

@duplo: Thanks for pointing this out. The correct length of the classical prediction interval relies directly on the normality assumption for the errors, so if it is too optimistic, then a bootstrapped version derived from it will quite surely be too optimistic as well. I wonder whether there is a general bootstrap method that works in regression (not necessarily OLS).

— Michael M

I think conformal inference may be what you want. It allows you to construct resampling-based prediction intervals that have valid finite-sample coverage and do not over-cover too much. There is a good paper available at arxiv.org/pdf/1604.04173.pdf, which can be read as an introduction to the topic, and an R package available from github.com/ryantibs/conformal

— Simon Boge Brant
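
As a rough illustration of the idea in the comment above, here is a minimal base-R sketch of split conformal prediction, one simple variant of conformal inference, applied to the question's simulated data. It deliberately does not use the `conformal` package, so nothing here reflects that package's API; the 50/50 split, the absolute-residual score and all variable names are assumptions made for this sketch.

```r
# Sketch: split conformal prediction interval with a linear model as the base fit.
set.seed(34345)
n <- 100
x <- runif(n)
y <- 1 + 0.2 * x + rnorm(n)
data <- data.frame(x, y)

alpha <- 0.05
train_idx <- sample(n, n / 2)          # half for fitting, half for calibration
train <- data[train_idx, ]
calib <- data[-train_idx, ]

fit <- lm(y ~ x, data = train)

# Conformity scores: absolute residuals on the held-out calibration half
scores <- abs(calib$y - predict(fit, newdata = calib))
n_cal  <- length(scores)

# Use the ceiling((n_cal + 1) * (1 - alpha))-th smallest score as the half-width
k <- ceiling((n_cal + 1) * (1 - alpha))
stopifnot(k <= n_cal)                  # otherwise the interval would be infinite
q_hat <- sort(scores)[k]

y_hat   <- unname(predict(fit, newdata = data.frame(x = 0.1)))
conf_pi <- c(lower = y_hat - q_hat, upper = y_hat + q_hat)
conf_pi
```

Any regression method can replace `lm` here, since the calibration step only needs out-of-sample predictions. The price is that only half of the data is used for fitting.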

The method laid out below is the one described in Section 6.3.3 of Davison and Hinkley (1997), Bootstrap Methods and Their Application. Thanks to Glen_b and his comment here. Given that there are several questions on Cross Validated on this topic, I thought it was worth writing up.

The linear regression model is:

$\begin{align} Y_i &= X_i\beta+\epsilon_i \end{align}$

We have data, $i=1,2,\ldots,N$, which we use to estimate $\beta$ as:

$\begin{align} \hat{\beta}_{\text{OLS}} &= \left( X'X \right)^{-1}X'Y \end{align}$

Now, we want to predict what $Y$ will be for a new data point, given that we know $X$ for it. This is the prediction problem. Let's call the new $X$ (which we know) $X_{N+1}$ and the new $Y$ (which we would like to predict) $Y_{N+1}$. The usual prediction (if we are assuming that the $\epsilon_i$ are iid and uncorrelated with $X$) is:

$\begin{align} Y^p_{N+1} &= X_{N+1}\hat{\beta}_{\text{OLS}} \end{align}$

The forecast error made by this prediction is:

$\begin{align} e^p_{N+1} &= Y_{N+1}-Y^p_{N+1} \end{align}$

We can rewrite this equation as:

$\begin{align} Y_{N+1} &= Y^p_{N+1} + e^p_{N+1} \end{align}$

Now, $Y^p_{N+1}$ we have already calculated. So, if we want to bound $Y_{N+1}$ in an interval, say, 90% of the time, all we need to do is estimate consistently the $5^{th}$ and $95^{th}$ percentiles/quantiles of $e^p_{N+1}$ , call them $e^5,e^{95}$ , and the prediction interval will be $\left[Y^p_{N+1}+e^5,Y^p_{N+1}+e^{95} \right]$ .

How to estimate the quantiles/percentiles of $e^p_{N+1}$ ? Well, we can write:

$\begin{align} e^p_{N+1} &= Y_{N+1}-Y^p_{N+1}\\ &= X_{N+1}\beta + \epsilon_{N+1} - X_{N+1}\hat{\beta}_{\text{OLS}}\\ &= X_{N+1}\left( \beta-\hat{\beta}_{\text{OLS}} \right) + \epsilon_{N+1} \end{align}$

The strategy will be to sample (in a bootstrap kind of way) many times from $e^p_{N+1}$ and then calculate percentiles in the usual way. So, maybe we will sample 10,000 times from $e^p_{N+1}$ , and then estimate the $5^{th}$ and $95^{th}$ percentiles as the $500^{th}$ and $9,500^{th}$ smallest members of the sample.

To draw on $X_{N+1}\left( \beta-\hat{\beta}_{\text{OLS}} \right)$ , we can bootstrap errors (cases would be fine, too, but we are assuming iid errors anyway). So, on each bootstrap replication, you draw $N$ times with replacement from the variance-adjusted residuals (see the next paragraph) to get $\epsilon^*_i$ , then make new $Y^*_i=X_i\hat{\beta}_{\text{OLS}}+\epsilon^*_i$ , then run OLS on the new dataset, $\left(Y^*,X \right)$ , to get this replication's $\beta^*_r$ . Finally, this replication's draw on $X_{N+1}\left( \beta-\hat{\beta}_{\text{OLS}} \right)$ is $X_{N+1}\left( \hat{\beta}_{\text{OLS}}-\beta^*_r \right)$ : in the bootstrap world, $\hat{\beta}_{\text{OLS}}$ plays the role of the true $\beta$ and $\beta^*_r$ plays the role of its estimate.

Given we are assuming iid $\epsilon$ , the natural way to sample from the $\epsilon_{N+1}$ part of the equation is to use the residuals we have from the regression, $\left\{ e_1,e_2,\ldots,e_N \right\}$ . Residuals have different and generally too small variances, so we will want to sample from $\left\{ s_1-\overline{s},s_2-\overline{s},\ldots,s_N-\overline{s} \right\}$ , the variance-corrected residuals, where $s_i=e_i/\sqrt{1-h_i}$ and $h_i$ is the leverage of observation $i$ .
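
A short derivation of why the raw residuals are too small may help here. It assumes, as the rest of this answer does, iid errors with a common variance, written $\sigma^2$ below, and it writes $H=X\left( X'X \right)^{-1}X'$ for the hat matrix whose diagonal entries are the leverages $h_i$ (the symbols $\sigma^2$ and $H$ are introduced only for this illustration). Since $\left( I-H \right)X=0$ and $I-H$ is symmetric and idempotent:

$\begin{align} e &= Y-X\hat{\beta}_{\text{OLS}} = \left( I-H \right)Y = \left( I-H \right)\epsilon \\ \operatorname{Var}\left( e \right) &= \sigma^2\left( I-H \right) \\ \operatorname{Var}\left( e_i \right) &= \sigma^2\left( 1-h_i \right) \\ \operatorname{Var}\left( e_i/\sqrt{1-h_i} \right) &= \sigma^2 \end{align}$

So each raw residual has variance below $\sigma^2$ by the factor $1-h_i$ , and dividing by $\sqrt{1-h_i}$ puts all of them back on the scale of the errors.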

And, finally, the algorithm for making a 90% prediction interval for $Y_{N+1}$ , given that $X$ is $X_{N+1}$ is:

1. Make the prediction $Y^p_{N+1}=X_{N+1}\hat{\beta}_{\text{OLS}}$ .
2. Make the variance-adjusted residuals, $\left\{ s_1-\overline{s},s_2-\overline{s},\ldots,s_N-\overline{s}\right\}$ , where $s_i=e_i/\sqrt{1-h_i}$ .
3. For replications $r=1,2,\ldots,R$ :
   - Draw $N$ times with replacement on the adjusted residuals to make bootstrap residuals $\left\{\epsilon^*_1,\epsilon^*_2,\ldots,\epsilon^*_N \right\}$
   - Generate bootstrap $Y^*=X\hat{\beta}_{\text{OLS}}+\epsilon^*$
   - Calculate the bootstrap OLS estimator for this replication, $\beta^*_r=\left( X'X \right)^{-1}X'Y^*$
   - Obtain bootstrap residuals from this replication, $e^*_r=Y^*-X\beta^*_r$
   - Calculate bootstrap variance-adjusted residuals from this replication, $s^*-\overline{s^*}$
   - Draw one of the bootstrap variance-adjusted residuals from this replication, $\epsilon^*_{N+1,r}$
   - Calculate this replication's draw on $e^p_{N+1}$ , $e^{p*}_r=X_{N+1}\left( \hat{\beta}_{\text{OLS}}-\beta^*_r \right)+\epsilon^*_{N+1,r}$
4. Find the $5^{th}$ and $95^{th}$ percentiles of the draws $e^{p*}_r$ , $e^5,e^{95}$ .
5. The 90% prediction interval for $Y_{N+1}$ is $\left[Y^p_{N+1}+e^5,Y^p_{N+1}+e^{95} \right]$ .

Here is R code:

# This script gives an example of the procedure to construct a prediction interval
# for a linear regression model using a bootstrap method.  The method is the one
# described in Section 6.3.3 of Davison and Hinkley (1997),
# _Bootstrap Methods and Their Application_.


#rm(list=ls())
set.seed(12344321)
# No extra packages are needed: everything below uses base R and stats.

# Generate bivariate regression data
x <- runif(n=100,min=0,max=100)
y <- 1 + x + (rexp(n=100,rate=0.25)-4)

my.reg <- lm(y~x)
summary(my.reg)

# Predict y for x=78:
y.p <- coef(my.reg)["(Intercept)"] + coef(my.reg)["x"]*78
y.p

# Create adjusted residuals
leverage <- influence(my.reg)$hat
my.s.resid <- residuals(my.reg)/sqrt(1-leverage)
my.s.resid <- my.s.resid - mean(my.s.resid)


reg <- my.reg
s <- my.s.resid

the.replication <- function(reg,s,x_Np1=0){
  # Make bootstrap residuals
  ep.star <- sample(s,size=length(reg$residuals),replace=TRUE)

  # Make bootstrap Y
  y.star <- fitted(reg)+ep.star

  # Do bootstrap regression
  x <- model.frame(reg)[,2]
  bs.reg <- lm(y.star~x)

  # Create bootstrapped adjusted residuals
  bs.lev <- influence(bs.reg)$hat
  bs.s   <- residuals(bs.reg)/sqrt(1-bs.lev)
  bs.s   <- bs.s - mean(bs.s)

  # Calculate draw on prediction error
  # Use the `reg` argument (not the global my.reg) so the function is self-contained
  xb.xb <- coef(reg)["(Intercept)"] - coef(bs.reg)["(Intercept)"]
  xb.xb <- xb.xb + (coef(reg)["x"] - coef(bs.reg)["x"])*x_Np1
  return(unname(xb.xb + sample(bs.s,size=1)))
}

# Do bootstrap with 10,000 replications
ep.draws <- replicate(n=10000,the.replication(reg=my.reg,s=my.s.resid,x_Np1=78))

# Create prediction interval
y.p+quantile(ep.draws,probs=c(0.05,0.95))

# prediction interval using normal assumption
predict(my.reg,newdata=data.frame(x=78),interval="prediction",level=0.90)


# Quick and dirty Monte Carlo to see which prediction interval is better
# That is, what are the 5th and 95th percentiles of Y_{N+1}
# 
# To do it properly, I guess we would want to do the whole procedure above
# 10,000 times and then see what percentage of the time each prediction 
# interval covered Y_{N+1}

y.np1 <- 1 + 78 + (rexp(n=10000,rate=0.25)-4)
quantile(y.np1,probs=c(0.05,0.95))
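
The comment in the script suggests repeating the whole procedure many times and checking how often each interval covers $Y_{N+1}$. A rough sketch of that check follows. `boot_pi` is a hypothetical helper, not part of the original script; it re-implements the algorithm above with matrix algebra for speed, using the fact that the design matrix is fixed across residual-bootstrap replications, so $\left( X'X \right)^{-1}X'$ and the leverages only need to be computed once. The 300 simulated data sets and 500 replications per data set are arbitrary choices to keep the run time short.

```r
# Sketch: empirical coverage of the bootstrap and normal-theory 90% intervals.
boot_pi <- function(x, y, x_new, R = 500, level = 0.90) {
  X <- cbind(1, x)
  A <- solve(crossprod(X), t(X))            # (X'X)^{-1} X'
  h <- rowSums(X * t(A))                    # leverages, diag of X (X'X)^{-1} X'
  beta_hat <- drop(A %*% y)
  fit <- drop(X %*% beta_hat)
  s <- (y - fit) / sqrt(1 - h)              # variance-adjusted residuals
  s <- s - mean(s)
  x_row <- c(1, x_new)
  draws <- replicate(R, {
    y_star    <- fit + sample(s, length(s), replace = TRUE)
    beta_star <- drop(A %*% y_star)
    s_star    <- (y_star - drop(X %*% beta_star)) / sqrt(1 - h)
    s_star    <- s_star - mean(s_star)
    sum(x_row * (beta_hat - beta_star)) + sample(s_star, 1)
  })
  a <- (1 - level) / 2
  sum(x_row * beta_hat) + quantile(draws, c(a, 1 - a), names = FALSE)
}

set.seed(1)
n_sim <- 300
covered_boot <- covered_norm <- logical(n_sim)
for (k in seq_len(n_sim)) {
  # Same data-generating process as in the script above
  x <- runif(100, min = 0, max = 100)
  y <- 1 + x + (rexp(100, rate = 0.25) - 4)
  y_new <- 1 + 78 + (rexp(1, rate = 0.25) - 4)

  pi_b <- boot_pi(x, y, x_new = 78)
  pi_n <- predict(lm(y ~ x), newdata = data.frame(x = 78),
                  interval = "prediction", level = 0.90)[1, c("lwr", "upr")]

  covered_boot[k] <- pi_b[1] <= y_new && y_new <= pi_b[2]
  covered_norm[k] <- pi_n[1] <= y_new && y_new <= pi_n[2]
}
coverage <- c(bootstrap = mean(covered_boot), normal = mean(covered_norm))
coverage
```

Overall coverage is only part of the picture with skewed errors like these: an interval can cover about 90% of the time while missing far more often on one side than the other, so it is worth counting misses below and above the interval separately as well.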

— Bill

Thank you for the useful, detailed explanations. Following these lines, I think that a general technique outside OLS (tree-based techniques, nearest neighbour, etc.) won't be easily available, right?

— Michael M

There is this one for random forests: stats.stackexchange.com/questions/49750/… which sounds similar.

— Bill

As far as I can tell, if you abstract $X\beta$ to $f(X, \theta)$ , this technique works for any model.

— shadowtalker
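
A rough sketch of what that abstraction could look like, using `loess` from base R as a stand-in for a generic $f(X, \theta)$ . This is an illustration under strong assumptions, not a validated method: the data-generating process, the `span = 0.3` setting and the helper `fit_model` are all made up for this example, and the residuals are only centred, with no leverage-style variance correction. If the model overfits, its residuals understate the noise and the interval will tend to be too narrow.

```r
# Sketch: residual bootstrap prediction interval around a non-OLS fit (loess).
set.seed(2024)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)
dat   <- data.frame(x, y)
x_new <- data.frame(x = 5)

# Hypothetical wrapper: swap in any other regression method here
fit_model <- function(d) loess(y ~ x, data = d, span = 0.3)

fit      <- fit_model(dat)
y_p      <- unname(predict(fit, newdata = x_new))
fitted_y <- predict(fit)                    # fitted values at the observed x
res      <- dat$y - fitted_y
res      <- res - mean(res)                 # centred only, no leverage adjustment

R <- 500
draws <- replicate(R, {
  # Rebuild a bootstrap response from the fit plus resampled residuals
  d_star   <- data.frame(x = dat$x,
                         y = fitted_y + sample(res, n, replace = TRUE))
  fit_star <- fit_model(d_star)
  res_star <- d_star$y - predict(fit_star)
  res_star <- res_star - mean(res_star)
  # Estimation error at x_new plus one draw for the new observation's noise
  (y_p - unname(predict(fit_star, newdata = x_new))) + sample(res_star, 1)
})

pi_90 <- y_p + quantile(draws, c(0.05, 0.95), names = FALSE)
pi_90
```

The structure mirrors the OLS algorithm step for step, with refitting the model replacing the closed-form estimator.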

How do you generalise the "variance adjusted residuals" - the OLS approach relies on the leverage - is there a leverage calculation for an arbitrary f(X) estimator?

— David Waterworth