Reversing ridge regression: given response matrix and regression coefficients, find suitable predictors

Consider a standard OLS regression problem $\newcommand{\Y}{\mathbf Y}\newcommand{\X}{\mathbf X}\newcommand{\B}{\boldsymbol\beta}\DeclareMathOperator*{argmin}{argmin}$: I have matrices $\Y$ and $\X$ and I want to find $\B$ to minimize

$L=\|\Y-\X\B\|^2.$

The solution is given by

$\hat\B=\argmin_\B\{L\} = (\X^\top\X)^+\X^\top \Y.$

I can also pose a "reverse" problem: given $\Y$ and $\B^*$, find $\hat\X$ that would yield $\hat\B\approx \B^*$, i.e. would minimize $\|\argmin_\B\{L\}-\B^*\|^2$. In words, I have the response matrix $\Y$ and the coefficient vector $\B^*$, and I want to find the predictor matrix that would yield coefficients close to $\B^*$. This is, of course, also an OLS regression problem, with solution

$\hat\X = \argmin_\X\Big\{\|\argmin_\B\{L\}-\B^*\|^2\Big\} = \Y\B^\top(\B\B^\top)^{+}.$
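
As a quick sanity check on this closed form (a sketch under the assumptions that $n<p<q$ and that both $\X$ and $\Y$ have full row rank $n$; these assumptions are added here for illustration), note that the identity $\B^\top(\B\B^\top)^{+}=\B^{+}$ holds for any matrix, so the formula is just $\hat\X=\Y\B^{+}$. If $\B$ was itself produced by the forward problem, the original $\X$ is recovered:

$\B=\X^{+}\Y \;\Rightarrow\; \B^{+}=(\X^{+}\Y)^{+}=\Y^{+}\X \;\Rightarrow\; \Y\B^{+}=\Y\Y^{+}\X=\X.$

The middle step uses $(AB)^{+}=B^{+}A^{+}$, which is valid here because $\X^{+}$ has full column rank and $\Y$ has full row rank; the last step uses $\Y\Y^{+}=\mathbf I_n$.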

Clarification update: As @GeoMatt22 explained in his answer, if $\Y$ is a vector (i.e. there is only one response variable), then this $\hat\X$ will be rank one, and the reverse problem is massively underdetermined. In my case, $\Y$ is actually a matrix (i.e. there are many response variables; it is a multivariate regression). So $\X$ is $n\times p$, $\Y$ is $n\times q$ and $\B$ is $p\times q$.

I am interested in solving the "reverse" problem for ridge regression. Namely, my loss function is now

$L=\|\Y-\X\B\|^2+\mu\|\B\|^2$

and the solution is

$\hat\B=\argmin_\B\{L\}=(\X^\top \X+\mu\mathbf I)^{-1}\X^\top \Y.$

The "reverse" problem is to find

$\hat\X = \argmin_\X\Big\{\|\argmin_\B\{L\}-\B^*\|^2\Big\} = \;?$

Again, I have a response matrix $\Y$ and a coefficient vector $\B^*$, and I want to find a predictor matrix that would yield coefficients close to $\B^*$.

Actually there are two related formulations:

1. Find $\hat\X$ given $\Y$, $\B^*$ and $\mu$.
2. Find $\hat\X$ and $\hat\mu$ given $\Y$ and $\B^*$.

Does either of them have a direct solution?

Here is a brief Matlab excerpt to illustrate the problem:

% generate some data
n = 10; % number of samples
p = 20; % number of predictors
q = 30; % number of responses
Y = rand(n,q);
X = rand(n,p);
mu = 0;
I = eye(p);

% solve the forward problem: find beta given y,X,mu
betahat = pinv(X'*X + mu*I) * X'*Y;

% backward problem: find X given y,beta,mu
% this formula works correctly only when mu=0
Xhat =  Y*betahat'*pinv(betahat*betahat');

% verify if Xhat indeed yields betahat
betahathat = pinv(Xhat'*Xhat + mu*I)*Xhat'*Y;
max(abs(betahathat(:) - betahat(:)))

This code outputs zero if mu=0, but not otherwise.

regression least-squares ridge-regression

— amoeba says Reinstate Monica

Since $B$ and $\mu$ are given, they do not affect the variation in the loss. So in (1) you are still doing OLS. (2) is just as simple, because the loss can be made arbitrarily small by taking $\hat\mu$ arbitrarily negative, within the bounds of whatever constraints you care to impose on it. That reduces you to case (1).

— whuber

@whuber Thanks. I think I did not explain it clearly enough. Consider (1). $B$ and $\mu$ are given (call the given coefficients $B^*$), but I need to find $X$ that would yield ridge regression coefficients close to $B^*$. In other words, I want to find $X$ minimizing

$\Big\|\operatorname*{argmin}_B\big\{ L_\mathrm{ridge}(X,B)\big\} - B^*\Big\|^2.$

I don't see why this should be OLS.

— amoeba says Reinstate Monica

It is as if I have $f(v,w)$ and want to find $v$ such that $\operatorname{argmin}_w f(v,w)$ is close to a given $w^*$. That is not the same as finding $\operatorname{argmin}_v f(v,w^*)$.

— amoeba says Reinstate Monica
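
A scalar toy example (an illustration added here, not from the thread) makes the distinction concrete. Take the hypothetical function $f(v,w)=(v-w)^2+w^2$:

$\operatorname{argmin}_w f(v,w)=\tfrac{v}{2}, \qquad \operatorname{argmin}_v f(v,w^*)=w^*.$

Making the inner minimizer equal $w^*$ requires $v=2w^*$, whereas minimizing $f$ over $v$ at fixed $w^*$ gives $v=w^*$; the two answers coincide only when $w^*=0$.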

The exposition in your post is confusing on that point, because evidently you are not actually using $L$ as a loss function. Could you elaborate on the specifics of problems (1) and (2) in the post?

— whuber

@hxd1011 Many columns in X is usually called "multiple regression"; many columns in Y is usually called "multivariate regression".

— amoeba says Reinstate Monica

Now that the question has converged on a more precise formulation of the problem of interest, I have found a solution for case 1 (known ridge parameter). This should also help with case 2 (not exactly an analytical solution, but a simple formula and some constraints).

Summary: Neither of the two inverse-problem formulations has a unique answer. In case 2, where the ridge parameter $\mu\equiv\omega^2$ is unknown, there are infinitely many solutions $X_\omega$, for $\omega\in[0,\omega_\max]$. In case 1, where $\omega$ is given, there are a finite number of solutions for $X_\omega$, due to ambiguity in the singular-value spectrum.

(The derivation is a bit lengthy, so TL;DR: there is working Matlab code at the end.)

Under-determined case ("OLS")

The forward problem is

$\min_B\|XB-Y\|^2$

where $X\in\mathbb{R}^{n\times p}$, $B\in\mathbb{R}^{p\times q}$, and $Y\in\mathbb{R}^{n\times q}$.

Based on the updated question, we will assume $n<p<q$, so $B$ is under-determined given $X$ and $Y$. As in the question, we will assume the "default" (minimum $L_2$-norm) solution

$B=X^+Y$

where $X^+$ is the pseudoinverse of $X$.

From the singular value decomposition (SVD) of $X$, given by*

$X=USV^T=US_0V_0^T$

the pseudoinverse can be computed as**

$X^+=VS^+U^T=V_0S_0^{-1}U^T$

(*The first expression uses the full SVD, while the second uses the reduced SVD. **For simplicity I assume $X$ has full rank, i.e. $S_0^{-1}$ exists.)

So the forward problem has the solution

$B\equiv X^+Y=\left(V_0S_0^{-1}U^T\right)Y$

For future reference, I note that $S_0=\mathrm{diag}(\sigma_0)$, where $\sigma_0>0$ is the vector of singular values.

In the inverse problem, we are given $Y$ and $B$. We know that $B$ came from the above process, but we do not know $X$. The task is then to determine the appropriate $X$.

As noted in the updated question, in this case we can recover $X$ using essentially the same approach, i.e.

$X_0=YB^+$

now using the pseudoinverse of $B$.

Over-determined case (ridge estimator)

In the "OLS" case, the under-determined problem was solved by choosing the minimum-norm solution, i.e. our "unique" solution was implicitly regularized.

Rather than choosing the minimum-norm solution, here we introduce a parameter $\omega$ to control "how small" the norm should be, i.e. we use ridge regression.

In this case, we have a series of forward problems for $\beta_k$, $k=1,\ldots,q$, given by

$\min_\beta\|X\beta-y_k\|^2+\omega^2\|\beta\|^2$

Collecting the different left- and right-hand-side vectors into

$B_{\omega}=[\beta_1,\ldots,\beta_q] \quad,\quad Y=[y_1,\ldots,y_q]$

this collection of problems can be reduced to the following "OLS" problem

$\min_B\|\mathsf{X}_\omega B-\mathsf{Y}\|^2$

where we have introduced the augmented matrices

$\mathsf{X}_\omega=\begin{bmatrix}X \\ \omega I\end{bmatrix} \quad , \quad \mathsf{Y}=\begin{bmatrix}Y \\ 0 \end{bmatrix}$

In this over-determined case, the solution is still given by the pseudoinverse

$B_\omega = \mathsf{X}_\omega^+\mathsf{Y}$

but the pseudoinverse is now changed, resulting in*

$B_\omega = \left(V_0S_\omega^{-2}U^T\right) Y$

where the new "singularity spectrum" matrix has (inverse) diagonal**

$\sigma_\omega^2 = \frac{\sigma_0^2+\omega^2}{\sigma_0}$

(*The calculation required to derive this is omitted for brevity; it is similar to the usual SVD exposition of ridge regression for the $p\leq n$ case. **Here the entries of the $\sigma_\omega$ vector are expressed in terms of the $\sigma_0$ vector, where all operations are entry-wise.)
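
The omitted calculation can be sketched as follows (a reconstruction, assuming $\omega>0$, $n<p$ and full-rank $X$, with the full SVD $X=USV^T$ as above; $s_i$ is notation introduced here for the $i$-th entry of $\sigma_0$). Since the augmented matrix has full column rank, its pseudoinverse gives the familiar ridge formula, and the SVD diagonalizes it:

$B_\omega=(\mathsf{X}_\omega^T\mathsf{X}_\omega)^{-1}\mathsf{X}_\omega^T\mathsf{Y}=(X^TX+\omega^2I)^{-1}X^TY=V\,(S^TS+\omega^2I)^{-1}S^T\,U^TY.$

The $p\times n$ matrix $(S^TS+\omega^2I)^{-1}S^T$ has diagonal entries $s_i/(s_i^2+\omega^2)$ and zeros in its last $p-n$ rows, so only the first $n$ columns of $V$ (that is, $V_0$) contribute:

$B_\omega=V_0\,\mathrm{diag}\!\left(\frac{\sigma_0}{\sigma_0^2+\omega^2}\right)U^TY=V_0S_\omega^{-2}U^TY.$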

Now in this problem we can still formally recover a "base solution" as

$X_\omega=YB_\omega^+$

but this is not a true solution anymore.

However, the analogy still holds in that this "solution" has SVD

$X_\omega=US_\omega^2V_0^T$

with the singular values $\sigma_\omega^2$ given above.

So we can derive a quadratic equation relating the desired singular values $\sigma_0$ to the recoverable singular values $\sigma_\omega^2$ and the regularization parameter $\omega$ . The solution is then

$\sigma_0=\bar{\sigma} \pm \Delta\sigma \quad , \quad \bar{\sigma} = \tfrac{1}{2}\sigma_\omega^2 \quad , \quad \Delta\sigma = \sqrt{\left(\bar{\sigma}+\omega\right)\left(\bar{\sigma}-\omega\right)}$
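
The quadratic in question follows directly from the definition of $\sigma_\omega^2$ (a short worked derivation, entry-wise in the singular values):

$\sigma_\omega^2=\frac{\sigma_0^2+\omega^2}{\sigma_0} \;\Rightarrow\; \sigma_0^2-\sigma_\omega^2\,\sigma_0+\omega^2=0 \;\Rightarrow\; \sigma_0=\frac{\sigma_\omega^2\pm\sqrt{\sigma_\omega^4-4\omega^2}}{2}=\bar{\sigma}\pm\sqrt{\bar{\sigma}^2-\omega^2}.$

The two roots multiply to $\omega^2$, so one lies at or above $\omega$ and the other at or below it. Read this way (an inference from the algebra, not a claim made in the original answer), the $+$ root should be the correct one for a singular value with $\sigma_0>\omega$ and the $-$ root for one with $\sigma_0<\omega$.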

The Matlab demo below (tested online via Octave) shows that this solution method appears to work in practice as well as theory. The last line shows that all the singular values of $X$ are in the reconstruction $\bar{\sigma}\pm\Delta\sigma$ , but I have not completely figured out which root to take (sgn = $+$ vs. $-$ ). For $\omega=0$ it will always be the $+$ root. This generally seems to hold for "small" $\omega$ , whereas for "large" $\omega$ the $-$ root seems to take over. (Demo below is set to "large" case currently.)

% Matlab demo of "Reverse Ridge Regression"
n = 3; p = 5; q = 8; w = 1*sqrt(1e+1); sgn = -1;
Y = rand(n,q); X = rand(n,p);
I = eye(p); Z = zeros(p,q);
err = @(a,b)norm(a(:)-b(:),Inf);

B = pinv([X;w*I])*[Y;Z];
Xhat0 = Y*pinv(B);
dBres0 = err( pinv([Xhat0;w*I])*[Y;Z] , B )

[Uw,Sw2,Vw0] = svd(Xhat0, 'econ');

sw2 = diag(Sw2); s0mid = sw2/2;
ds0 = sqrt(max( 0 , s0mid.^2 - w^2 ));
s0 = s0mid + sgn * ds0;
Xhat = Uw*diag(s0)*Vw0';

dBres = err( pinv([Xhat;w*I])*[Y;Z] , B )
dXerr = err( Xhat , X )
sigX = svd(X)', sigHat = [s0mid+ds0,s0mid-ds0]' % all there, but which sign?

I cannot say how robust this solution is, as inverse problems are generally ill-posed, and analytical solutions can be very fragile. However cursory experiments polluting $B$ with Gaussian noise (i.e. so it has full rank $p$ vs. reduced rank $n$ ) seem to indicate the method is reasonably well behaved.

As for problem 2 (i.e. $\omega$ unknown), the above gives at least an upper bound on $\omega$ . For the quadratic discriminant to be non-negative we must have

$\omega \leq \omega_{\max} = \bar{\sigma}_n = \min[\tfrac{1}{2}\sigma_\omega^2]$

For the quadratic-root sign ambiguity, the following code snippet shows that independent of sign, any $\hat{X}$ will give the same forward $B$ ridge-solution, even when $\sigma_0$ differs from $\mathrm{SVD}[X]$ .

Xrnd=Uw*diag(s0mid+sign(randn(n,1)).*ds0)*Vw0'; % random signs
dBrnd=err(pinv([Xrnd;w*I])*[Y;Z],B) % B is always consistent ...
dXrnd=err(Xrnd,X) % ... even when X is not
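
The reason the sign choice does not affect $B$ can be seen in one line (a sketch using the formulas above; $\sigma_\pm$ is notation introduced here for the two roots $\bar\sigma\pm\Delta\sigma$). The forward ridge solution depends on each singular value only through the factor $\sigma/(\sigma^2+\omega^2)$, and both roots satisfy $\sigma_\pm^2+\omega^2=\sigma_\omega^2\,\sigma_\pm$, so

$\frac{\sigma_+}{\sigma_+^2+\omega^2}=\frac{\sigma_-}{\sigma_-^2+\omega^2}=\frac{1}{\sigma_\omega^2}.$

Any mixture of signs across the $n$ singular values therefore reproduces the same $B_\omega$, which would give up to $2^n$ candidate matrices $\hat{X}$ sharing the singular vectors of the base solution.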

— GeoMatt22

+11. Thanks a lot for all the effort that you put into answering this question and for all the discussion that we had. This seems to answer my question entirely. I felt that simply accepting your answer is not enough in this case; this deserves much more than two upvotes that this answer currently has. Cheers.

— amoeba says Reinstate Monica

@amoeba thanks! I am glad it was helpful. I think I will post a comment on whuber's answer you link asking if he thinks it is appropriate and/or if there is a better answer to use. (Note he prefaces his SVD discussion with the proviso $p\leq n$, i.e. an over-determined $X$.)

— GeoMatt22

@GeoMatt22 my comment on original question says using pinv is not a good thing, do you agree?

— Haitao Du

@hxd1011 In general you (almost) never want to explicitly invert a matrix numerically, and this holds also for the pseudo-inverse. The two reasons I used it here are 1) consistency with the mathematical equations + amoeba's demo code, and 2) for the case of underdetermined systems, the default Matlab "slash" solutions can differ from the pinv ones. Almost all of the cases in my code could be replaced by the appropriate \ or / commands, which are generally to be preferred. (These allow Matlab to decide the most effective direct solver.)

— GeoMatt22

@hxd1011 to clarify on point 2 of my previous comment, from the link in your comment on the original question: "If the rank of A is less than the number of columns in A, then x = A\B is not necessarily the minimum norm solution. The more computationally expensive x = pinv(A)*B computes the minimum norm least-squares solution.".

— GeoMatt22