The following three formulas are well known; they can be found in many books on linear regression, and they are not difficult to derive.
$$\beta_1 = \frac{r_{YX_1} - r_{YX_2}\,r_{X_1X_2}}{1 - r_{X_1X_2}^2}$$

$$\beta_2 = \frac{r_{YX_2} - r_{YX_1}\,r_{X_1X_2}}{1 - r_{X_1X_2}^2}$$

$$R^2 = \frac{r_{YX_1}^2 + r_{YX_2}^2 - 2\,r_{YX_1}\,r_{YX_2}\,r_{X_1X_2}}{1 - r_{X_1X_2}^2}$$
If you substitute the two betas into your equation $R^2 = r_{YX_1}\beta_1 + r_{YX_2}\beta_2$, you will get the above formula for $R^2$.
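If you don't want to push the algebra by hand, here is a quick symbolic check of that substitution (a sketch in sympy; the shorthand names `r1`, `r2`, `r12` for $r_{YX_1}$, $r_{YX_2}$, $r_{X_1X_2}$ are mine):

```python
# Symbolic check: substituting the two betas into R^2 = r1*beta1 + r2*beta2
# reproduces the classical R^2 formula above.
import sympy as sp

r1, r2, r12 = sp.symbols('r1 r2 r12')

beta1 = (r1 - r2*r12) / (1 - r12**2)
beta2 = (r2 - r1*r12) / (1 - r12**2)

R2_from_betas = r1*beta1 + r2*beta2
R2_classical  = (r1**2 + r2**2 - 2*r1*r2*r12) / (1 - r12**2)

print(sp.simplify(R2_from_betas - R2_classical))  # prints 0
```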
Here is a geometric "insight". Below are two pictures showing the regression of $Y$ on $X_1$ and $X_2$. This kind of representation is known as variables-as-vectors in subject space (please read up on what it is about). The pictures are drawn after all three variables were centered, so that (1) every vector's length equals the standard deviation of the respective variable, and (2) the angle (or rather its cosine) between every two vectors equals the correlation between the respective variables.
$\hat{Y}$ is the regression prediction (the orthogonal projection of $Y$ onto "plane X"); $e$ is the error term; $\cos\angle{Y\hat{Y}} = |\hat{Y}|/|Y|$ is the multiple correlation coefficient.
The left picture depicts the skew coordinates of $\hat{Y}$ on the variables $X_1$ and $X_2$. We know that such coordinates relate to the regression coefficients. Namely, the coordinates are $b_1|X_1| = b_1\sigma_{X_1}$ and $b_2|X_2| = b_2\sigma_{X_2}$.
And the right picture shows the corresponding perpendicular coordinates. We know that such coordinates relate to the zero-order correlation coefficients (these are the cosines of orthogonal projections). If $r_1$ is the correlation between $Y$ and $X_1$, and $r_1^*$ is the correlation between $\hat{Y}$ and $X_1$, then the coordinate is $r_1|Y| = r_1\sigma_Y = r_1^*|\hat{Y}| = r_1^*\sigma_{\hat{Y}}$. Likewise for the other coordinate, $r_2|Y| = r_2\sigma_Y = r_2^*|\hat{Y}| = r_2^*\sigma_{\hat{Y}}$.
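As a numerical illustration of that last identity, here is a small sketch (numpy; the data and all variable names are mine, made up for the example). Vector norms stand in for standard deviations, since the two differ only by a common factor of $\sqrt{n}$:

```python
# Numerical check that the perpendicular coordinate of Y on the X1 axis
# equals that of Yhat: r1*|Y| = r1*_star*|Yhat|.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                      # two made-up predictors
y = 1.5*X[:, 0] - 0.7*X[:, 1] + rng.normal(size=n)

# Center all three variables, as in the pictures.
X = X - X.mean(axis=0)
y = y - y.mean()

b, *_ = np.linalg.lstsq(X, y, rcond=None)        # regression coefficients
yhat = X @ b                                     # projection of y onto "plane X"

cos = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
r1      = cos(y,    X[:, 0])                     # corr(Y, X1): a cosine, since centered
r1_star = cos(yhat, X[:, 0])                     # corr(Yhat, X1)

print(r1 * np.linalg.norm(y))                    # r1 |Y|
print(r1_star * np.linalg.norm(yhat))            # r1* |Yhat|  -- the same number
```

The two printed values agree because the residual $Y - \hat{Y}$ is orthogonal to $X_1$, so $Y$ and $\hat{Y}$ project onto the $X_1$ axis identically.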
So far these were general explanations of the vector representation of linear regression. Now we turn to the task of showing how it leads to $R^2 = r_1\beta_1 + r_2\beta_2$.
First of all, recall that in their question @Corone put forward the condition that the expression holds when all three variables are standardized, that is, not just centered but also scaled to variance 1. Then (i.e. implying $|X_1| = |X_2| = |Y| = 1$ to be the "working parts" of the vectors) we have coordinates equal to: $b_1|X_1| = \beta_1$; $b_2|X_2| = \beta_2$; $r_1|Y| = r_1$; $r_2|Y| = r_2$; as well as $R = |\hat{Y}|/|Y| = |\hat{Y}|$. Redraw, under these conditions, just the "plane X" part of the pictures above:
On the picture, we have a pair of perpendicular coordinates and a pair of skew coordinates of the same vector $\hat{Y}$ of length $R$. There exists a general rule for obtaining perpendicular coordinates from skew ones (or back): $P = SC$, where $P$ is a points $\times$ axes matrix of perpendicular coordinates, $S$ is the same-sized matrix of skew ones, and $C$ is the axes $\times$ axes symmetric matrix of angles (cosines) between the nonorthogonal axes.
$X_1$ and $X_2$ are the axes in our case, with $r_{12}$ being the cosine between them. So $r_1 = \beta_1 + \beta_2 r_{12}$ and $r_2 = \beta_1 r_{12} + \beta_2$.
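(These two equations are just the normal equations in disguise.) Here is a numerical sketch of the $P = SC$ conversion on standardized data (numpy again; data and names are mine):

```python
# Check that the skew coordinates (beta1, beta2) times the cosine matrix C
# reproduce the perpendicular coordinates (r1, r2).
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6*x1 + 0.8*rng.normal(size=n)             # correlated with x1
y  = 0.8*x1 + 0.4*x2 + rng.normal(size=n)

# Standardize: center, then scale to unit vector length, so that
# |X1| = |X2| = |Y| = 1 and dot products are correlations.
std = lambda v: (v - v.mean()) / np.linalg.norm(v - v.mean())
x1, x2, y = std(x1), std(x2), std(y)

r1, r2, r12 = y @ x1, y @ x2, x1 @ x2
beta, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)

S = beta                                         # skew coordinates (beta1, beta2)
C = np.array([[1, r12],
              [r12, 1]])                         # cosines between the axes
print(S @ C)                                     # perpendicular coordinates...
print(np.array([r1, r2]))                        # ...equal (r1, r2)
```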
Substitute these $r$s expressed via the $\beta$s into @Corone's statement $R^2 = r_1\beta_1 + r_2\beta_2$, and you'll get that $R^2 = \beta_1^2 + \beta_2^2 + 2\beta_1\beta_2 r_{12}$, which is true, because that is exactly how the diagonal of a parallelogram (tinted on the picture) is expressed via its adjacent sides (the quantity $\beta_1\beta_2 r_{12}$ being the scalar product).
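The parallelogram identity itself can be checked numerically, too (a sketch with made-up data and names):

```python
# Check that |Yhat|^2 = beta1^2 + beta2^2 + 2*beta1*beta2*r12,
# i.e. the law of cosines for the diagonal of the parallelogram.
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5*x1 + rng.normal(size=n)
y  = x1 + 0.5*x2 + rng.normal(size=n)

std = lambda v: (v - v.mean()) / np.linalg.norm(v - v.mean())
x1, x2, y = std(x1), std(x2), std(y)

r12 = x1 @ x2
(b1, b2), *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
yhat = b1*x1 + b2*x2                             # the diagonal of the parallelogram

print(yhat @ yhat)                               # |Yhat|^2 = R^2, since |Y| = 1
print(b1**2 + b2**2 + 2*b1*b2*r12)               # law-of-cosines expression: the same
```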
The same thing is true for any number of predictors $X$. Unfortunately, it is impossible to draw such pictures with many predictors.
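The algebra survives, though: for any number of standardized predictors, $R^2 = \sum_i r_i\beta_i$, since $\mathbf{r}^\top\boldsymbol{\beta} = Y^\top X\boldsymbol{\beta} = Y^\top\hat{Y} = |\hat{Y}|^2$ (the last step holds because $Y - \hat{Y}$ is orthogonal to $\hat{Y}$). A numerical sketch with several predictors (made-up data; names are mine):

```python
# Check that sum_i r_i * beta_i = |Yhat|^2 = R^2 for p standardized predictors.
import numpy as np

rng = np.random.default_rng(3)
n, p = 1000, 6
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated predictors
y = X @ rng.normal(size=p) + rng.normal(size=n)

std = lambda v: (v - v.mean()) / np.linalg.norm(v - v.mean())
X = np.column_stack([std(X[:, j]) for j in range(p)])
y = std(y)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = X.T @ y                                      # the p zero-order correlations

yhat = X @ beta
print(r @ beta)                                  # sum_i r_i beta_i
print(yhat @ yhat)                               # |Yhat|^2 = R^2 -- the same
```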