$$\min_{w,b} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i f_{w,b}(x_i)\right)\right) + \lambda \lVert w \rVert^2$$
where $x_i$ is the feature vector and $y_i \in \{-1, +1\}$ the target label for example $i$ from your training set. This function originates from the joint likelihood over all training examples, which explains its probabilistic nature even though we use it for classification. In the equation, $w$ is your weight vector and $b$ your bias. I trust that you know what $f_{w,b}(x_i)$ is (for plain logistic regression it is the linear score $w^\top x_i + b$). The last term in the minimization problem is the regularization term, which, among other things, controls the generalization of the model.
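If it helps to see the objective in code, here is a minimal NumPy sketch of it, assuming the usual linear score $f_{w,b}(x) = w^\top x + b$ and labels $y_i \in \{-1, +1\}$ (the function name and arguments are mine):

```python
import numpy as np

def regularized_logistic_loss(w, b, X, y, lam):
    """L2-regularized logistic loss for labels y in {-1, +1}.

    X: (n, d) feature matrix, y: (n,) labels,
    w: (d,) weights, b: scalar bias, lam: regularization strength.
    """
    margins = y * (X @ w + b)              # y_i * f_{w,b}(x_i)
    # np.logaddexp(0, -m) computes log(1 + exp(-m)) without overflow
    data_term = np.logaddexp(0.0, -margins).sum()
    return data_term + lam * np.dot(w, w)  # + lambda * ||w||^2
```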
Assuming all your $x$ are normalized, for example by dividing by the magnitude of $x$, it is quite easy to see which variables are more important: those whose weights are large in magnitude compared to the others, whether on the positive or the negative side. They influence the loss the most.
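As a sketch of that recipe, using made-up toy data where only two of five variables actually matter, one could standardize the features and rank the fitted weights by magnitude (note that scikit-learn parametrizes regularization as $C = 1/\lambda$):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 examples, 5 features, labels in {-1, +1};
# only features 0 and 2 drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=200))

X_scaled = StandardScaler().fit_transform(X)  # put features on a common scale
clf = LogisticRegression(penalty="l2", C=1.0).fit(X_scaled, y)

# With comparable feature scales, |w_j| ranks the variables by influence.
importance_order = np.argsort(-np.abs(clf.coef_.ravel()))
print(importance_order)  # features 0 and 2 should come first
```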
If you are keen on finding the variables which really are important, and in the process don't mind kicking a few out, you can $\ell_1$-regularize your loss function:
$$\min_{w,b} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i f_{w,b}(x_i)\right)\right) + \lambda \lVert w \rVert_1$$
The derivatives of the regularizer (strictly, its subgradients, since $\lvert w_j \rvert$ is not differentiable at zero) are quite straightforward, so I will not spell them out here. Using this form of regularization with an appropriate $\lambda$ will drive the less important elements of $w$ to exactly zero while leaving the others nonzero.
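A small sketch of this effect on the same kind of toy data as above; with the `liblinear` solver and a small `C` (i.e. a large $\lambda$), several coefficients come out exactly zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Same toy setup as before: only features 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=200))
X_scaled = StandardScaler().fit_transform(X)

# liblinear supports the l1 penalty; a small C means strong
# regularization and hence more exact zeros in w.
l1_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_clf.fit(X_scaled, y)

print(l1_clf.coef_)  # the unimportant coefficients are exactly 0
print("surviving variables:", np.flatnonzero(l1_clf.coef_.ravel()))
```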
I hope this helps. Ask if you have any further questions.