Polynomial contrasts for regression

I cannot understand the use of polynomial contrasts in regression fitting. In particular, I am referring to the encoding used by R to express an interval variable (an ordinal variable with equally spaced levels), described at this page.

In the example on that page, if I understood correctly, R fits a model for an interval variable and returns some coefficients that weight its linear, quadratic, or cubic trend. Hence, the fitted model should be:

${\rm write} = 52.7870 + 14.2587X - 0.9680X^2 - 0.1554X^3,$

where $X$ should take the values $1$, $2$, $3$, or $4$ according to the different levels of the interval variable.

Is this correct? And, if so, what then is the purpose of polynomial contrasts?

r regression contrasts

— Pippo

No, the coefficients are for orthogonal polynomial terms: you have written the model for raw polynomial terms. Replace $X$, $X^2$, & $X^3$ with the values for $L$, $Q$, & $C$ respectively (from the look-up table).

— Scortchi - Reinstate Monica

Dear @Scortchi, thank you for your reply. I guess I understand what you mean, but then I honestly have not understood how these orthogonal polynomial terms work. :P

— Pippo

As a notational matter, what you have isn't quite the appropriate fitted model. You either need a 'hat' over write (or E[write]), which denotes the predicted value of write or the expected value of write; or you need a '+ e' at the end to indicate the residuals.

— gung - Reinstate Monica

@Scortchi What is, or how do you find, the "look-up table"?

— Antoni Parellada

@AntoniParellada: It's the table in the page the OP linked to: ats.ucla.edu/stat/r/library/contrast_coding.htm#ORTHOGONAL & is obtained with contr.poly in R.

— Scortchi - Reinstate Monica

Answers:

To recap (and in case the OP's hyperlinks fail in the future), we are looking at the dataset hsb2, which looks like this:

   id     female race ses schtyp prog read write math science socst
1  70        0    4   1      1    1   57    52   41      47    57
2 121        1    4   2      1    3   68    59   53      63    61
...
199 118      1    4   2      1    1   55    62   58      58    61
200 137      1    4   3      1    2   63    65   65      53    61

which can be imported here.

We turn the variable read into an ordered / ordinal variable:

hsb2$readcat <- cut(hsb2$read, 4, ordered_result = TRUE)
(means = tapply(hsb2$write, hsb2$readcat, mean))
 (28,40]  (40,52]  (52,64]  (64,76] 
42.77273 49.97849 56.56364 61.83333

Now we are all set to just run a regular ANOVA - yes, it is R, and we basically have a continuous dependent variable, write, and an explanatory variable with multiple levels, readcat. In R we can use lm(write ~ readcat, hsb2)

1. Generating the contrast matrix:

There are four different levels to the ordered variable readcat, so we will have $n-1=3$ contrasts.

table(hsb2$readcat)

(28,40] (40,52] (52,64] (64,76] 
     22      93      55      30

First, let's go for the money and take a look at the built-in R function:

contr.poly(4)
             .L   .Q         .C
[1,] -0.6708204  0.5 -0.2236068
[2,] -0.2236068 -0.5  0.6708204
[3,]  0.2236068 -0.5 -0.6708204
[4,]  0.6708204  0.5  0.2236068
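As a minimal sketch of where this matrix enters the model: assuming R's default contrast options have not been changed, an ordered factor is coded with contr.poly automatically. The factor f and its level names below are made up for illustration and are not part of hsb2.

```r
# Toy ordered factor with four levels (hypothetical labels, not the hsb2 data).
f <- factor(c("low", "mid", "high", "top"),
            levels = c("low", "mid", "high", "top"),
            ordered = TRUE)

# Default contrast types; the entry named "ordered" applies to ordered factors.
default_contrasts <- getOption("contrasts")

# Contrast matrix R would use for f inside lm(): one column fewer than levels.
cm <- contrasts(f)

# Compare numerically with contr.poly(4), ignoring row and column names.
same_as_poly <- isTRUE(all.equal(cm, contr.poly(4), check.attributes = FALSE))

print(default_contrasts)
print(cm)
print(same_as_poly)
```

This is why lm(write ~ readcat, hsb2) reports coefficients labelled .L, .Q and .C without any explicit request for polynomial contrasts.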

Now let's dissect what went on under the hood:

scores = 1:4  # 1 2 3 4 These are the four levels of the explanatory variable.
y = scores - mean(scores) # scores - 2.5

$y = \small [-1.5, -0.5, 0.5, 1.5]$

$\small \text{seq_len(n) - 1} = [0, 1, 2, 3]$

n = 4; X <- outer(y, seq_len(n) - 1, "^") # n = 4 in this case

$\small\begin{bmatrix} 1&-1.5&2.25&-3.375\\1&-0.5&0.25&-0.125\\1&0.5&0.25&0.125\\1&1.5&2.25&3.375 \end{bmatrix}$

What happened there? The call outer(a, b, "^") raises the elements of a to the elements of b, so that the first column results from the operations $\small(-1.5)^0$ , $\small(-0.5)^0$ , $\small 0.5^0$ and $\small 1.5^0$ ; the second column from $\small(-1.5)^1$ , $\small(-0.5)^1$ , $\small0.5^1$ and $\small1.5^1$ ; the third from $\small(-1.5)^2=2.25$ , $\small(-0.5)^2 = 0.25$ , $\small0.5^2 = 0.25$ and $\small1.5^2 = 2.25$ ; and the fourth from $\small(-1.5)^3=-3.375$ , $\small(-0.5)^3=-0.125$ , $\small0.5^3=0.125$ and $\small1.5^3=3.375$ .

Next we do a $QR$ orthonormal decomposition of this matrix and take the compact representation of Q (c_Q = qr(X)$qr). Some of the inner workings of the functions used for QR factorization in R in this post are further explained here.

$\small\begin{bmatrix} -2&0&-2.5&0\\0.5&-2.236&0&-4.584\\0.5&0.447&2&0\\0.5&0.894&-0.9296&-1.342 \end{bmatrix}$

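Of this matrix we keep only the diagonal (z = c_Q * (row(c_Q) == col(c_Q))). What lies on the diagonal? The diagonal entries of the $\bf R$ part of the $QR$ decomposition, which, $\bf R$ being upper triangular, are also its eigenvalues.

Next we call raw = qr.qy(qr(X), z), whose result can be replicated "manually" in two steps: first, turning the compact form of $Q$, i.e. qr(X)$qr, into $Q$ itself with Q = qr.Q(qr(X)); second, carrying out the matrix multiplication $Qz$, as in Q %*% z.

Crucially, multiplying $\bf Q$ by the diagonal matrix of eigenvalues of $\bf R$ only rescales each column, so it does not change the orthogonality of the constituent column vectors. Each rescaled column equals the corresponding raw power minus its projection onto the lower-order columns, so the multiplication $Qz$ decreases the values in the higher order polynomial columns: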

Matrix of Eigenvalues of R
     [,1]      [,2] [,3]      [,4]
[1,]   -2  0.000000    0  0.000000
[2,]    0 -2.236068    0  0.000000
[3,]    0  0.000000    2  0.000000
[4,]    0  0.000000    0 -1.341641

Compare the values in the later column vectors (quadratic and cubic) before and after the $QR$ factorization operations, and to the unaffected first two columns.

Before QR factorization operations (raw powers; col. vec. not all mutually orthogonal)
     [,1] [,2] [,3]   [,4]
[1,]    1 -1.5 2.25 -3.375
[2,]    1 -0.5 0.25 -0.125
[3,]    1  0.5 0.25  0.125
[4,]    1  1.5 2.25  3.375


After QR operations (mutually orthogonal col. vec.)
     [,1] [,2] [,3] [,4]
[1,]    1 -1.5    1 -0.3
[2,]    1 -0.5   -1  0.9
[3,]    1  0.5   -1 -0.9
[4,]    1  1.5    1  0.3
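The QR steps above can be checked in a few lines. The sketch below rebuilds X, confirms that qr.qy() gives the same result as forming Q explicitly and multiplying by z, and compares the higher-order columns of raw with the residuals left after projecting each raw power onto the lower-order columns. The names qrX, raw_manual, resid_quadratic and resid_cubic are introduced here only for illustration.

```r
scores <- 1:4
y <- scores - mean(scores)             # centred scores: -1.5 -0.5 0.5 1.5
n <- 4
X <- outer(y, seq_len(n) - 1, "^")     # columns y^0, y^1, y^2, y^3

qrX <- qr(X)
c_Q <- qrX$qr                          # compact form: R in the upper triangle
z <- c_Q * (row(c_Q) == col(c_Q))      # keep only the diagonal of R

raw <- qr.qy(qrX, z)                   # computes Q %*% z without forming Q
Q <- qr.Q(qrX)                         # explicit orthonormal Q
raw_manual <- Q %*% z                  # the same product done by hand

# Residual of each raw power after projecting onto the lower-order columns.
resid_quadratic <- qr.resid(qr(X[, 1:2]), X[, 3])
resid_cubic <- qr.resid(qr(X[, 1:3]), X[, 4])

print(isTRUE(all.equal(raw, raw_manual)))
print(isTRUE(all.equal(raw[, 3], resid_quadratic)))
print(isTRUE(all.equal(raw[, 4], resid_cubic)))
```

Read this way, the rescaling by the diagonal of R is a compact route to Gram-Schmidt style residuals, which is why the first two columns are unchanged and the later ones shrink.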

Finally we call (Z <- sweep(raw, 2L, apply(raw, 2L, function(x) sqrt(sum(x^2))), "/", check.margin = FALSE)), turning the columns of the matrix raw into orthonormal vectors:

Orthonormal vectors (orthonormal basis of R^4)
     [,1]       [,2] [,3]       [,4]
[1,]  0.5 -0.6708204  0.5 -0.2236068
[2,]  0.5 -0.2236068 -0.5  0.6708204
[3,]  0.5  0.2236068 -0.5 -0.6708204
[4,]  0.5  0.6708204  0.5  0.2236068

This function simply "normalizes" the matrix by dividing ("/") each element, columnwise, by $\small\sqrt{\sum_\text{col.} x_i^2}$ . So it can be decomposed into two steps: $(\text{i})$ apply(raw, 2, function(x) sqrt(sum(x^2))), resulting in 2 2.236 2 1.342, which are the denominators for each column in $(\text{ii})$ , where every element in a column is divided by the corresponding value from $(\text{i})$ .

At this point the column vectors form an orthonormal basis of $\mathbb{R}^4$ , until we get rid of the first column, which will be the intercept, and we have reproduced the result of contr.poly(4):

$\small\begin{bmatrix} -0.6708204&0.5&-0.2236068\\-0.2236068&-0.5&0.6708204\\0.2236068&-0.5&-0.6708204\\0.6708204&0.5&0.2236068 \end{bmatrix}$

The columns of this matrix are orthonormal, as can be shown by sqrt(sum(Z[,3]^2)) = 1 and Z[,3] %*% Z[,4] = 0, for example (incidentally, the same goes for the rows of the full 4 x 4 matrix Z). And each column derives from raising the initial $\text{scores - mean}$ to the $1$ -st, $2$ -nd and $3$ -rd power, respectively, followed by the orthogonalization and normalization above - i.e. linear, quadratic and cubic.
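The whole construction can be condensed into a short, self-contained check. This is a sketch of the steps described in this answer rather than a copy of R's source; gram_cols, gram_rows and matches are names introduced here for illustration.

```r
y <- 1:4 - mean(1:4)
X <- outer(y, 0:3, "^")                          # raw powers of centred scores
qrX <- qr(X)
z <- diag(diag(qrX$qr))                          # diagonal of R as a matrix
raw <- qr.qy(qrX, z)                             # orthogonal, not yet unit length
Z <- sweep(raw, 2L, sqrt(colSums(raw^2)), "/")   # scale columns to unit length

gram_cols <- crossprod(Z)                        # t(Z) %*% Z
gram_rows <- tcrossprod(Z)                       # Z %*% t(Z)

# Dropping the constant column should reproduce contr.poly(4) numerically.
matches <- isTRUE(all.equal(Z[, -1], contr.poly(4), check.attributes = FALSE))

print(round(gram_cols, 10))
print(matches)
```

Both Gram matrices come out as the identity because Z is a square matrix with orthonormal columns, which forces its rows to be orthonormal as well.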

2. Which contrasts (columns) contribute significantly to explain the differences between levels in the explanatory variable?

We can just run the ANOVA and look at the summary...

summary(lm(write ~ readcat, hsb2))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  52.7870     0.6339  83.268   <2e-16 ***
readcat.L    14.2587     1.4841   9.607   <2e-16 ***
readcat.Q    -0.9680     1.2679  -0.764    0.446    
readcat.C    -0.1554     1.0062  -0.154    0.877

... to see that there is a linear effect of readcat on write. The original group means (computed with tapply at the beginning of the post) can be reproduced from the coefficients as:

coeff = coefficients(lm(write ~ readcat, hsb2))
C = contr.poly(4)
(recovered = c(coeff %*% c(1, C[1,]),
               coeff %*% c(1, C[2,]),
               coeff %*% c(1, C[3,]),
               coeff %*% c(1, C[4,])))
[1] 42.77273 49.97849 56.56364 61.83333
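The four dot products can also be written as a single matrix product. The sketch below is self-contained, so instead of refitting the model it hard-codes the coefficients as printed (rounded) in the summary output; the recovered values therefore agree with the group means only up to that rounding. The names design and recovered_means are introduced here for illustration.

```r
# Coefficients copied from the summary output (rounded to four decimals).
coeff <- c(52.7870, 14.2587, -0.9680, -0.1554)

# One row per level of readcat: intercept column plus the three contrasts.
design <- cbind(1, contr.poly(4))

# Fitted value for each level = row of the design matrix times coefficients.
recovered_means <- drop(design %*% coeff)
print(round(recovered_means, 2))
```

With the fitted model at hand, the same idea would read cbind(1, contr.poly(4)) %*% coefficients(lm(write ~ readcat, hsb2)).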


Being orthogonal contrasts, the sum of their components adds to zero, $\displaystyle \sum_{i=1}^t a_i = 0$ for $a_1,\cdots,a_t$ constants, and the dot product of any two of them is zero. If we could visualize them, they would look something like this:

The idea behind orthogonal contrasts is that the inferences that we can extract (in this case generating coefficients via a linear regression) will be the result of independent aspects of the data. This would not be the case if we simply used $X^0, X^1, \cdots, X^n$ as contrasts.
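A quick numerical illustration of this point, as a sketch: the polynomial contrasts sum to zero and have zero dot products with each other, whereas the raw powers of the level scores are strongly correlated. The names raw_powers and raw_cor are introduced here for illustration.

```r
C <- contr.poly(4)

col_sums <- colSums(C)     # each contrast sums to (numerically) zero
gram <- crossprod(C)       # identity matrix: unit length, zero dot products

x <- 1:4
raw_powers <- cbind(x = x, x2 = x^2, x3 = x^3)
raw_cor <- cor(raw_powers) # off-diagonal correlations are far from zero

print(round(col_sums, 10))
print(round(gram, 10))
print(round(raw_cor, 3))
```

With correlated raw powers, the estimate for one term changes when another term is added or dropped; with orthogonal contrasts each trend component is estimated separately from the others.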

Graphically, this is much easier to understand. Compare the actual means by groups in large square black blocks to the predicted values, and see why a straight line approximation with minimal contribution of quadratic and cubic polynomials (with curves only approximated with loess) is optimal:

If, just for effect, the coefficients of the ANOVA had been as large for the other approximations (quadratic and cubic) as for the linear contrast, the nonsensical plot that follows would depict more clearly the polynomial plots of each "contribution":

The code is here.

— Antoni Parellada

+1 Wow. Can this answer (I haven't read it till the end so far) be seen as an answer to my old, forgotten question too stats.stackexchange.com/q/63639/3277?

— ttnphns

(+1) @ttnphns: Arguably it'd fit even better there.

— Scortchi - Reinstate Monica

Just a tip: You might want to comment me there with a link to here; or issue an answer there - which I am likely to accept.

— ttnphns

@ttnphns and @Scortchi Thank you! I spent quite some time trying to make sense of these concepts, and didn't expect much reaction. So it is a very positive surprise. I think there are some wrinkles to iron out in regards to explaining the qr.qy() function, but I'll definitely try to see if I can say something minimally coherent about your question as soon as I have some time.

— Antoni Parellada

@Elvis I did try to choose a good summary sentence and place it somewhere in the post. I think this is a good point, and calls for a nice mathematical explanation, but it may be too much at this point to elaborate further.

— Antoni Parellada

I will use your example to explain how it works. Using polynomial contrasts with four groups yields the following.

$\begin{aligned} E\,write_1 &= \mu -0.67L + 0.5Q -0.22C\\ E\,write_2 &= \mu -0.22L -0.5Q + 0.67C\\ E\,write_3 &= \mu + 0.22L -0.5Q -0.67C\\ E\,write_4 &= \mu + 0.67L + 0.5Q + 0.22C \end{aligned}$

Here the first equation applies to the group with the lowest reading scores and the fourth to the group with the best reading scores. We can compare these equations to the one given by ordinary linear regression (supposing $read_i$ is continuous):

$E\,write_i=\mu+read_iL + read_i^2Q+read_i^3C$

Usually, instead of $L,Q,C$ you would have $\beta_1, \beta_2, \beta_3$ , written in first position. But this notation resembles the one with polynomial contrasts: the numbers in front of $L, Q, C$ take the place of $read_i, read_i^2, read_i^3$ . You can see that the coefficients before $L$ follow a linear trend, those before $Q$ a quadratic one, and those before $C$ a cubic one.

Then R estimates the parameters $\mu, L,Q,C$ and gives you

$\widehat{\mu}=52.79, \widehat{L}=14.26, \widehat{Q}=-0.97, \widehat{C}=-0.16$

where $\widehat{\mu}=\frac{1}{4}\sum_{i=1}^4E\,write_i$ and the estimated coefficients $\widehat{\mu}, \widehat{L}, \widehat{Q}, \widehat{C}$ are something like the estimates in ordinary linear regression. So from the output you can see whether the estimated coefficients are significantly different from zero, and hence anticipate some kind of linear, quadratic or cubic trend.

In that example only $\widehat{L}$ is significantly non-zero. So your conclusion could be: we see that better scoring in writing depends linearly on the reading score, but there is no significant quadratic or cubic effect.
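The link between the four group means and the four estimates can be checked numerically. The sketch below hard-codes the group means reported for this dataset earlier in the thread and relies on the fact that the model has as many parameters as groups, so the fitted values are exactly the group means. The names group_means, mu_hat and trend_hat are introduced here for illustration.

```r
# Mean of write within each reading category, as reported for hsb2.
group_means <- c(42.77273, 49.97849, 56.56364, 61.83333)

C <- contr.poly(4)                 # columns .L, .Q, .C (orthonormal)

# Intercept: plain average of the four group means.
mu_hat <- mean(group_means)

# Because the contrast columns are orthonormal and sum to zero, each trend
# estimate is the dot product of its contrast column with the group means.
trend_hat <- drop(crossprod(C, group_means))

print(round(mu_hat, 2))
print(round(trend_hat, 2))
```

So $\widehat{L}$ , $\widehat{Q}$ and $\widehat{C}$ are weighted sums of the group means, with weights taken from the corresponding contrast column.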

— Fimba