Definition and Convergence of Iteratively Reweighted Least Squares

I've been using iteratively reweighted least squares (IRLS) to minimize functions of the following form,

$J(m) = \sum_{i=1}^{N} \rho \left(\left| x_i - m \right|\right)$

where $N$ is the number of instances of $x_i \in \mathbb{R}$, $m \in \mathbb{R}$ is the estimate that I want, and $\rho$ is a suitable penalty function. Let's say it is convex (though not necessarily strictly) and differentiable for now. A good example of such a $\rho$ is the Huber loss function.

What I've been doing is differentiating $J(m)$ with respect to $m$ (and manipulating) to obtain

$\frac{dJ}{dm}= \sum_{i=1}^{N} \frac{\rho'\left( \left|x_i-m\right|\right) }{\left|x_i-m\right|} \left( x_i-m \right)$

and iteratively solving this by setting it equal to 0 and fixing the weights at iteration $k$ to $w_i(k) = \frac{\rho'\left( \left|x_i-m{(k)}\right|\right) }{\left|x_i-m{(k)}\right|}$ (note that the apparent singularity at $x_i=m{(k)}$ is really a removable singularity for all the $\rho$'s I might care about). Then I obtain

$\sum_{i=1}^{N} w_i(k) \left( x_i-m{(k+1)} \right)=0$

and I solve to obtain $m(k+1) = \frac{\sum_{i=1}^{N} w_i(k) x_i}{ \sum_{i=1}^{N} w_i(k)}$.

I repeat this fixed-point algorithm until "convergence". I will note that if you reach a fixed point, you are optimal, since the derivative is 0 and the function is convex.
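The iteration described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the Huber loss with a threshold `delta` and the sample mean as the starting value; the function names, the tolerance and the iteration cap are hypothetical choices, not part of the original procedure.

```python
import numpy as np

def huber_weights(r, delta):
    """IRLS weights rho'(|r|)/|r| for the Huber loss (the limit at r = 0 is 1)."""
    a = np.abs(r)
    # Quadratic branch (|r| <= delta): weight 1. Linear branch: delta / |r|.
    # np.maximum keeps the denominator away from zero on the quadratic branch.
    return np.where(a <= delta, 1.0, delta / np.maximum(a, delta))

def irls_location(x, delta=1.0, m0=None, tol=1e-10, max_iter=1000):
    """Iterate m(k+1) = sum(w_i x_i) / sum(w_i) until successive values agree."""
    x = np.asarray(x, dtype=float)
    m = x.mean() if m0 is None else float(m0)  # assumed starting value
    for _ in range(max_iter):
        w = huber_weights(x - m, delta)
        m_new = np.sum(w * x) / np.sum(w)
        if abs(m_new - m) <= tol:
            return m_new
        m = m_new
    return m

x = np.array([0.0, 0.5, 1.0, 100.0])
m_hat = irls_location(x, delta=1.0)
print(m_hat)
```

At a fixed point the clipped residuals `np.clip(x - m_hat, -delta, delta)` sum to zero, which is the first-order condition for the Huber objective.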

I have two questions about this procedure:

1. Is this the standard IRLS algorithm? After reading several papers on the topic (and they were scattered and vague about what IRLS is), this is the most consistent definition of the algorithm I could find. I can post the papers if people want, but I didn't want to bias anyone here. Of course, you can generalize this basic technique to many other types of problems involving vector $x_i$'s and arguments other than $\left|x_i-m{(k)}\right|$, provided the argument is a norm of an affine function of your parameters. Any help or insight on this would be great.
2. Convergence seems to work in practice, but I have a few concerns about it. I have yet to see a proof of it. After some simple Matlab simulations I see that one iteration of this is not a contraction mapping (I generated two random instances of $m$, computed $\frac{\left|m_1(k+1) - m_2(k+1)\right|}{\left|m_1(k)-m_2(k)\right|}$, and saw that this is occasionally greater than 1). The mapping defined by several consecutive iterations is not strictly a contraction mapping either, but the probability of the Lipschitz constant being above 1 gets very low. So is there a notion of a contraction mapping in probability? What machinery would I use to prove that this converges? Does it even converge?
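The contraction check mentioned in the second question could be reproduced along the following lines. This is a hedged sketch rather than the original Matlab experiment: the Huber weights, the contaminated normal sample, the uniform sampling of the two starting values and all names are assumptions made for illustration.

```python
import numpy as np

def irls_step(m, x, delta=1.0):
    """One IRLS update m(k) -> m(k+1) using Huber weights."""
    a = np.abs(x - m)
    w = np.where(a <= delta, 1.0, delta / np.maximum(a, delta))
    return np.sum(w * x) / np.sum(w)

def contraction_ratios(x, n_pairs=10000, delta=1.0, seed=0):
    """Ratios |m1(k+1) - m2(k+1)| / |m1(k) - m2(k)| for random pairs (m1, m2)."""
    rng = np.random.default_rng(seed)
    lo, hi = x.min(), x.max()
    ratios = []
    for _ in range(n_pairs):
        m1, m2 = rng.uniform(lo, hi, size=2)
        if m1 == m2:  # skip degenerate pairs to avoid dividing by zero
            continue
        step_gap = abs(irls_step(m1, x, delta) - irls_step(m2, x, delta))
        ratios.append(step_gap / abs(m1 - m2))
    return np.array(ratios)

# Hypothetical data: a main cluster plus a small cluster of outliers.
data_rng = np.random.default_rng(1)
x = np.concatenate([data_rng.normal(0.0, 1.0, 50), data_rng.normal(10.0, 1.0, 5)])
r = contraction_ratios(x)
print("largest ratio:", r.max(), "share above 1:", np.mean(r > 1.0))
```

A largest ratio above 1 for some pair would confirm that a single step is not a contraction on the sampled range, while the share above 1 gives a rough idea of how often that happens.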

Any guidance at all is helpful.

Edit: I like the paper on IRLS for sparse recovery / compressive sensing by Daubechies et al. 2008, "Iteratively Re-weighted Least Squares Minimization for Sparse Recovery", on arXiv, but it seems to focus mostly on weights for nonconvex problems. My case is considerably simpler.

— Chris A.

Looking at the wiki page on IRWLS, I struggle to see the difference between the procedure you describe and IRWLS (they use $|y_i-\pmb x_i'\pmb\beta|^2$ as their particular $\rho$ function). Can you explain in what ways you think the algorithm you propose is different from IRWLS?

— user603

I never stated that it was different, and if I implied it, I didn't mean to.

— Chris A.

As for your first question, one should define "standard", or acknowledge that a "canonical model" has become established. As a comment indicated, it appears at least that the way you use IRWLS is rather standard.

As for your second question, "contraction mapping in probability" could be linked (however informally) to the convergence of recursive stochastic algorithms. From what I have read, there is a huge literature on the subject, mainly in engineering. In economics we use a tiny bit of it, especially the seminal works of Lennart Ljung (the first paper was Ljung (1977)), which showed that the convergence (or not) of a recursive stochastic algorithm can be determined by the stability (or not) of a related ordinary differential equation.

(What follows has been reworked after a fruitful discussion with the OP in the comments.)

Convergence

I will use as reference Saber Elaydi, "An Introduction to Difference Equations", 2005, 3rd ed. The analysis is conditional on some given data sample, so the $x$'s are treated as fixed.

The first-order condition for the minimization of the objective function, viewed as a recursive function in $m$ ,

$m(k+1) = \sum_{i=1}^{N} v_i[m(k)] x_i, \;\; v_i[m(k)] \equiv \frac{w_i[m(k)]}{ \sum_{i=1}^{N} w_i[m(k)]} \qquad [1]$

has a fixed point (the argmin of the objective function). By Theorem 1.13, pp. 27-28 of Elaydi, if the first derivative with respect to $m$ of the RHS of $[1]$, evaluated at the fixed point $m^*$ (denote it $A'(m^*)$), is smaller than unity in absolute value, then $m^*$ is asymptotically stable (AS). Moreover, by Theorem 4.3, p. 179, this also implies that the fixed point is uniformly AS (UAS).
"Asymptotically stable" means that for some range of values around the fixed point, a neighborhood $(m^* \pm \gamma)$, not necessarily small in size, the fixed point is attractive, and so if the algorithm gives values in this neighborhood, it will converge. The property being "uniform" means that the boundary of this neighborhood, and hence its size, is independent of the initial value of the algorithm. The fixed point becomes globally UAS if $\gamma = \infty$.
So in our case, if we prove that

$|A'(m^*)|\equiv \left|\sum_{i=1}^{N} \frac{\partial v_i(m^*)}{\partial m}x_i\right| <1 \qquad [2]$

we have proven the UAS property, but without global convergence. Then we can either try to establish that the neighborhood of attraction is in fact the whole extended real numbers, or, that the specific starting value the OP uses as mentioned in the comments (and it is standard in IRLS methodology), i.e. the sample mean of the $x$ 's, $\bar x$ , always belongs to the neighborhood of attraction of the fixed point.

We calculate the derivative

$\frac{\partial v_i(m^*)}{\partial m} = \frac {\frac{\partial w_i(m^*)}{\partial m}\sum_{i=1}^{N} w_i(m^*)-w_i(m^*)\sum_{i=1}^{N}\frac{\partial w_i(m^*)}{\partial m}}{\left(\sum_{i=1}^{N} w_i(m^*)\right)^2}$

$=\frac 1{\sum_{i=1}^{N} w_i(m^*)}\cdot\left[\frac{\partial w_i(m^*)}{\partial m}-v_i(m^*)\sum_{i=1}^{N}\frac{\partial w_i(m^*)}{\partial m}\right]$

Then

$A'(m^*) = \frac 1{\sum_{i=1}^{N} w_i(m^*)}\cdot\left[\sum_{i=1}^{N}\frac{\partial w_i(m^*)}{\partial m}x_i-\left(\sum_{i=1}^{N}\frac{\partial w_i(m^*)}{\partial m}\right)\sum_{i=1}^{N}v_i(m^*)x_i\right]$

$=\frac 1{\sum_{i=1}^{N} w_i(m^*)}\cdot\left[\sum_{i=1}^{N}\frac{\partial w_i(m^*)}{\partial m}x_i-\left(\sum_{i=1}^{N}\frac{\partial w_i(m^*)}{\partial m}\right)m^*\right]$

and

$|A'(m^*)| <1 \Rightarrow \left|\sum_{i=1}^{N}\frac{\partial w_i(m^*)}{\partial m}(x_i-m^*)\right| < \left|\sum_{i=1}^{N} w_i(m^*)\right| \qquad [3]$

We have

$\begin{aligned}\frac{\partial w_i(m^*)}{\partial m} &=\frac{-\rho''(|x_i-m^*|)\cdot \frac {x_i-m^*}{|x_i-m^*|}|x_i-m^*|+\frac {x_i-m^*}{|x_i-m^*|}\rho'(|x_i-m^*|)}{|x_i-m^*|^2} \\ \\ &=\frac {x_i-m^*}{|x_i-m^*|^3}\rho'(|x_i-m^*|) - \rho''(|x_i-m^*|)\cdot \frac {x_i-m^*}{|x_i-m^*|^2} \\ \\ &=\frac {x_i-m^*}{|x_i-m^*|^2}\cdot \left[\frac {\rho'(|x_i-m^*|)}{|x_i-m^*|}-\rho''(|x_i-m^*|)\right]\\ \\ &=\frac {x_i-m^*}{|x_i-m^*|^2}\cdot \left[w_i(m^*)-\rho''(|x_i-m^*|)\right] \end{aligned}$

Inserting this into $[3]$ we have

$\left|\sum_{i=1}^{N}\frac {x_i-m^*}{|x_i-m^*|^2}\cdot \left[w_i(m^*)-\rho''(|x_i-m^*|)\right](x_i-m^*)\right| < \left|\sum_{i=1}^{N} w_i(m^*)\right|$

$\Rightarrow \left|\sum_{i=1}^{N}w_i(m^*)-\sum_{i=1}^{N}\rho''(|x_i-m^*|)\right| < \left|\sum_{i=1}^{N} w_i(m^*)\right| \qquad [4]$

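This is the condition that must be satisfied for the fixed point to be UAS. Since in our case the penalty function is convex, the sums involved are non-negative, and the sum of the weights is strictly positive. Provided $\sum_{i=1}^{N}\rho''(|x_i-m^*|) > 0$, condition $[4]$ is therefore equivalent to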

$\sum_{i=1}^{N}\rho''(|x_i-m^*|) < 2\sum_{i=1}^{N}w_i(m^*) \qquad [5]$
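For reference, substituting the expression for $\partial w_i(m^*)/\partial m$ into the formula for $A'(m^*)$ gives the derivative in closed form. This only restates the steps already taken, using that $(x_i-m^*)^2/|x_i-m^*|^2 = 1$ for every $x_i \neq m^*$:

```latex
A'(m^*)
  = \frac{\sum_{i=1}^{N}\left[w_i(m^*) - \rho''(|x_i-m^*|)\right]}{\sum_{i=1}^{N} w_i(m^*)}
  = 1 - \frac{\sum_{i=1}^{N}\rho''(|x_i-m^*|)}{\sum_{i=1}^{N} w_i(m^*)}
```

Read this way, $|A'(m^*)|<1$ holds exactly when $0 < \sum_{i=1}^{N}\rho''(|x_i-m^*|) < 2\sum_{i=1}^{N}w_i(m^*)$, so the lower bound matters only in the degenerate case where every $\rho''$ term vanishes.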

If $\rho(|x_i-m|)$ is Huber's loss function, then we have a quadratic ( $q$ ) and a linear ( $l$ ) branch,

$\rho(|x_i-m|)=\begin{cases} (1/2)|x_i- m|^2 & |x_i-m|\leq \delta \\ \\ \delta\big(|x_i-m|-\delta/2\big) & |x_i-m|> \delta \end{cases}$

and

$\rho'(|x_i-m|)=\begin{cases} |x_i- m| & |x_i-m|\leq \delta \\ \\ \delta & |x_i-m|> \delta \end{cases}$

$\rho''(|x_i-m|)=\begin{cases} 1 & |x_i-m|\leq \delta \\ \\ 0 & |x_i-m|> \delta \end{cases}$

$\begin{cases} w_{i,q}(m) =1 & |x_i-m|\leq \delta \\ \\ w_{i,l}(m) =\frac {\delta}{|x_i-m|} <1 & |x_i-m|> \delta \end{cases}$

Since we do not know how many of the $|x_i-m^*|$ 's place us in the quadratic branch and how many in the linear, we decompose condition $[5]$ as ( $N_q + N_l = N$ )

$\sum_{i=1}^{N_q}\rho_q''+\sum_{i=1}^{N_l}\rho_l'' < 2\left[\sum_{i=1}^{N_q}w_{i,q} +\sum_{i=1}^{N_l}w_{i,l}\right]$

$\Rightarrow N_q + 0 < 2\left[N_q +\sum_{i=1}^{N_l}w_{i,l}\right] \Rightarrow 0 < N_q+2\sum_{i=1}^{N_l}w_{i,l}$

which holds. So for the Huber loss function the fixed point of the algorithm is uniformly asymptotically stable, irrespective of the $x$ 's. We note that the first derivative is smaller than unity in absolute value for any $m$ , not just the fixed point.
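As a rough numerical sanity check of the Huber case, one can compare a finite-difference estimate of $A'(m^*)$ with the value implied by the derivation, $1 - N_q/\sum_i w_i(m^*)$. The sketch below is illustrative only: the data, `delta`, the step size `h` and the function names are assumptions, and it requires that no observation sits exactly on the boundary $|x_i - m^*| = \delta$.

```python
import numpy as np

def huber_w(m, x, delta):
    """Huber IRLS weights w_i(m): 1 on the quadratic branch, delta/|x_i - m| otherwise."""
    a = np.abs(x - m)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, delta))

def A(m, x, delta):
    """Right-hand side of the recursion [1]: the weighted mean of the x's."""
    w = huber_w(m, x, delta)
    return np.sum(w * x) / np.sum(w)

def fixed_point(x, delta, iters=500):
    """Run the recursion from the sample mean for a fixed number of steps."""
    m = x.mean()
    for _ in range(iters):
        m = A(m, x, delta)
    return m

x = np.array([0.0, 0.5, 1.0, 100.0])  # hypothetical sample with one outlier
delta = 1.0
m_star = fixed_point(x, delta)

# Central finite difference of A at the fixed point.
h = 1e-6
slope_fd = (A(m_star + h, x, delta) - A(m_star - h, x, delta)) / (2 * h)

# Value implied by the derivation: 1 - N_q / sum_i w_i(m*).
n_q = np.sum(np.abs(x - m_star) <= delta)
slope_formula = 1.0 - n_q / np.sum(huber_w(m_star, x, delta))

print(slope_fd, slope_formula)
```

For this sample the two numbers should agree closely and lie well inside $(-1, 1)$, consistent with local stability of the fixed point.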

What we should do now is either prove that the UAS property is also global, or that, if $m(0) = \bar x$ then $m(0)$ belongs to the neighborhood of attraction of $m^*$ .

— Alecos Papadopoulos

Thanks for the response. Give me some time to analyze this answer.

— Chris A.

Certainly. After all, the question waited 20 months.

— Alecos Papadopoulos

Yeah, I was reminded of the problem and decided to put up a bounty. :)

— Chris A.

Lucky me. I wasn't there 20 months ago - I would have taken up this question, bounty or not.

— Alecos Papadopoulos

Thanks so much for this response. It's looking like, so far, that you've earned the bounty. BTW, your indexing on the derivative of $v_i$ w.r.t. $m$ is notationally weird. Couldn't the summations on the second line of this use another variable, such as $j$?

— Chris A.