Ordinary least squares vs. total least squares
Let us first consider the simplest case of a single predictor variable $x$ (for simplicity, let both $x$ and $y$ be centered, so that there is no intercept). OLS fits the equation $y=\beta x$ by minimizing squared distances between observed values of $y$ and predicted values $\hat y$. TLS fits the same equation by minimizing squared distances between the $(x,y)$ points and their projection on the line. In this simplest case the TLS line is simply the first principal component of the 2D data. To find $\beta$, do PCA on the $(x,y)$ points, i.e. construct the $2\times 2$ covariance matrix $\boldsymbol\Sigma$ and find its first eigenvector $\mathbf v=(v_x, v_y)$; then $\beta=v_y/v_x$.
In Matlab:
v = pca([x y]);    % x and y are centered column vectors
beta = v(2,1)/v(1,1);
In R:
v <- prcomp(cbind(x,y))$rotation
beta <- v[2,1]/v[1,1]
By the way, this will yield the correct slope even if $x$ and $y$ were not centered (because built-in PCA functions perform centering automatically). To recover the intercept, compute $\beta_0=\bar y - \beta \bar x$.
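As a concrete illustration, here is a minimal self-contained R sketch (with simulated data; the variable names are illustrative) that computes the TLS slope and intercept as described above and compares them with OLS:
set.seed(42)
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)
v <- prcomp(cbind(x, y))$rotation    # PCA on the (x,y) points
beta  <- v[2,1] / v[1,1]             # TLS slope from the first eigenvector
beta0 <- mean(y) - beta * mean(x)    # intercept: beta0 = ybar - beta*xbar
c(beta0, beta)                       # TLS intercept and slope
coef(lm(y ~ x))                      # OLS intercept and slope, for comparison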
OLS vs. TLS, multiple regression
Given a dependent variable $y$ and many independent variables $x_i$ (again, all centered for simplicity), regression fits an equation
$$y=\beta_1 x_1 + \ldots + \beta_p x_p.$$
OLS does the fit by minimizing the squared errors between observed values of $y$ and predicted values $\hat y$. TLS does the fit by minimizing the squared distances between the observed $(\mathbf x, y)\in\mathbb R^{p+1}$ points and the closest points on the regression plane/hyperplane.
Note that there is no "regression line" anymore! The equation above specifies a hyperplane: it's a 2D plane if there are two predictors, 3D hyperplane if there are three predictors, etc. So the solution above does not work: we cannot get the TLS solution by taking the first PC only (which is a line). Still, the solution can be easily obtained via PCA.
As before, PCA is performed on the $(\mathbf x, y)$ points. This yields $p+1$ eigenvectors in the columns of $\mathbf V$. The first $p$ eigenvectors define the $p$-dimensional hyperplane $\mathcal H$ that we need; the last (number $p+1$) eigenvector $\mathbf v_{p+1}$ is orthogonal to it. The question is how to transform the basis of $\mathcal H$ given by the first $p$ eigenvectors into the $\boldsymbol\beta$ coefficients.
Observe that if we set $x_i=0$ for all $i\ne k$ and only $x_k=1$, then $\hat y=\beta_k$, i.e. the vector
$$(0,\ldots,1,\ldots,\beta_k)\in\mathcal H$$
lies in the hyperplane $\mathcal H$. On the other hand, we know that the last eigenvector
$$\mathbf v_{p+1}=(v_1,\ldots,v_{p+1})\perp\mathcal H$$
is orthogonal to it. Hence their dot product must be zero:
$$v_k+\beta_k v_{p+1}=0 \quad\Rightarrow\quad \beta_k=-v_k/v_{p+1}.$$
In Matlab:
v = pca([X y]);    % X is a centered n-by-p matrix, y is an n-by-1 column vector
beta = -v(1:end-1,end)/v(end,end);
In R:
v <- prcomp(cbind(X,y))$rotation
beta <- -v[-ncol(v),ncol(v)] / v[ncol(v),ncol(v)]
Again, this will yield the correct slopes even if $\mathbf x$ and $y$ were not centered (because built-in PCA functions perform centering automatically). To recover the intercept, compute $\beta_0=\bar y - \bar{\mathbf x}\boldsymbol\beta$.
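In R, continuing the snippet above (with uncentered X and y; prcomp centers internally), one way to recover the intercept is:
beta0 <- mean(y) - sum(colMeans(X) * beta)    # beta0 = ybar - xbar %*% beta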
As a sanity check, notice that this solution coincides with the previous one in the case of only a single predictor $x$. Indeed, the $(x,y)$ space is then 2D, and so, given that the first PCA eigenvector is orthogonal to the second (last) one, $v^{(1)}_y/v^{(1)}_x = -v^{(2)}_x/v^{(2)}_y$.
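This equivalence is easy to check numerically in R (with simulated x and y; the names are illustrative):
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
v <- prcomp(cbind(x, y))$rotation
v[2,1] / v[1,1]      # slope from the first eigenvector (simple-case formula)
-v[1,2] / v[2,2]     # slope from the last eigenvector (general formula); identical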
Closed form solution for TLS
Surprisingly, it turns out that there is a closed-form equation for $\boldsymbol\beta$. The argument below is taken from Sabine Van Huffel's book "The Total Least Squares Problem" (section 2.3.2).
Let $\mathbf X$ and $\mathbf y$ be the centered data matrices. The last PCA eigenvector $\mathbf v_{p+1}$ is an eigenvector of the covariance matrix of $[\mathbf X\ \mathbf y]$, and hence also of $[\mathbf X\ \mathbf y]^\top[\mathbf X\ \mathbf y]$; let $\sigma^2_{p+1}$ be the corresponding eigenvalue of the latter (i.e. the squared smallest singular value of $[\mathbf X\ \mathbf y]$). If $\mathbf v_{p+1}$ is an eigenvector, then so is any rescaling of it, in particular $-\mathbf v_{p+1}/v_{p+1}=(\boldsymbol\beta\ \ {-1})^\top$, using $\beta_k=-v_k/v_{p+1}$ from above. Writing down the eigenvector equation:
$$\begin{pmatrix}\mathbf X^\top\mathbf X & \mathbf X^\top\mathbf y\\ \mathbf y^\top\mathbf X & \mathbf y^\top\mathbf y\end{pmatrix}\begin{pmatrix}\boldsymbol\beta\\ -1\end{pmatrix}=\sigma^2_{p+1}\begin{pmatrix}\boldsymbol\beta\\ -1\end{pmatrix},$$
and computing the product on the left, we immediately get that
$$\boldsymbol\beta_\mathrm{TLS}=(\mathbf X^\top\mathbf X-\sigma^2_{p+1}\mathbf I)^{-1}\mathbf X^\top\mathbf y,$$
which is strongly reminiscent of the familiar OLS expression
$$\boldsymbol\beta_\mathrm{OLS}=(\mathbf X^\top\mathbf X)^{-1}\mathbf X^\top\mathbf y.$$
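To see this formula in action, here is a small R sketch (simulated data, illustrative names) that computes the TLS coefficients both via the last PCA eigenvector and via the closed-form expression and checks that they agree:
set.seed(1)
n <- 200; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, -2, 0.5) + rnorm(n)

Xc <- scale(X, scale = FALSE)          # center the predictors
yc <- scale(y, scale = FALSE)          # center the response
Z  <- cbind(Xc, yc)

v <- prcomp(Z)$rotation                # PCA on [X y]
beta_pca <- -v[1:p, p+1] / v[p+1, p+1] # coefficients from the last eigenvector

sigma2 <- min(svd(Z)$d)^2              # squared smallest singular value of [X y]
beta_closed <- solve(t(Xc) %*% Xc - sigma2 * diag(p), t(Xc) %*% yc)

cbind(beta_pca, beta_closed)           # the two columns coincide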
Multivariate multiple regression
The same formula can be generalized to the multivariate case, but even defining what multivariate TLS does would require some algebra; see the Wikipedia article on total least squares. Multivariate OLS regression is equivalent to a bunch of univariate OLS regressions, one for each dependent variable, but this is not so in the TLS case.