Pseudo R-squared formula for GLMs

28

I found a formula for pseudo $R^2$ in the book Extending the Linear Model with R by Julian J. Faraway (p. 59):

$1-\frac{\text{ResidualDeviance}}{\text{NullDeviance}}$

Is this a common formula for pseudo $R^2$ for GLMs?

r regression generalized-linear-model r-squared

— MarkDollar

22

There are a large number of pseudo-$R^2$s for GLiMs. The excellent UCLA statistics help site has a comprehensive overview of them. The one you list is called McFadden's pseudo-$R^2$. Relative to UCLA's typology, it is like $R^2$ in the sense that it indexes the improvement of the fitted model over the null model. Some statistical software, notably SPSS if I recall correctly, prints McFadden's pseudo-$R^2$ by default with the results of some analyses such as logistic regression, so I suspect it is quite common, although the Cox & Snell and Nagelkerke pseudo-$R^2$s may be even more so. However, McFadden's pseudo-$R^2$ does not have all of the properties of $R^2$ (no pseudo-$R^2$ does). If someone is interested in using a pseudo-$R^2$ to understand a model, I strongly recommend reading this excellent CV thread: Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)? (For what it's worth, $R^2$ itself is slipperier than people realize, a great demonstration of which can be seen in @whuber's answer in the thread: Is $R^2$ useful or dangerous?)
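As a rough illustration of the three measures named above, here is a minimal R sketch for a logistic regression. The data are simulated and every object name (`fit`, `null`, `r2_mcfadden`, and so on) is an assumption made for this example, not something taken from the UCLA page or from any package.

```r
# Simulated binary data; all names are illustrative
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit  <- glm(y ~ x, family = binomial)   # fitted model
null <- glm(y ~ 1, family = binomial)   # intercept-only (null) model

ll_fit  <- as.numeric(logLik(fit))
ll_null <- as.numeric(logLik(null))

# McFadden: 1 - logLik(fit) / logLik(null)
r2_mcfadden <- 1 - ll_fit / ll_null

# Cox & Snell: 1 - (L_null / L_fit)^(2/n)
r2_coxsnell <- 1 - exp((2 / n) * (ll_null - ll_fit))

# Nagelkerke: Cox & Snell divided by its maximum attainable value
r2_nagelkerke <- r2_coxsnell / (1 - exp((2 / n) * ll_null))

# For ungrouped 0/1 data the saturated log-likelihood is 0, so the
# deviance ratio from the question coincides with McFadden's measure
r2_deviance <- 1 - fit$deviance / fit$null.deviance
```

For other families or grouped binomial data the deviance ratio and the log-likelihood ratio generally differ, so the last equality should not be assumed there.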

— gung - Reinstate Monica

I wonder whether all these pseudo-R2 measures are designed specifically for logistic regression only, or whether they also generalize to Poisson and gamma GLMs. I found a different R2 formula for each possible GLM in

Cameron, A. C., & Windmeijer, F. A. G. (1997). An R-squared measure of goodness of fit for some common nonlinear regression models. Journal of Econometrics, 77(2), 329-342.

— Jens

@Jens Some of them do seem specific to logistic regression, but others use the deviance, which you can get from any GLiM.

— gung - Reinstate Monica

1

Note that McFadden's $R^2$ is usually defined in terms of the log-likelihood, which is only defined up to an additive constant, and not in terms of the deviance as in the OP's question. Without a specification of the additive constant, McFadden's $R^2$ is not well defined. The deviance is a unique choice of the additive constant, which, in my mind, is the most appropriate choice if the generalization should be comparable to $R^2$ from linear models.
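To make the role of the additive constant concrete, here is a short derivation. The symbols $\ell_{\text{fit}}$, $\ell_{0}$ and $\ell_{\text{sat}}$ (log-likelihoods of the fitted, null and saturated models) and $D$, $D_0$ (residual and null deviance) are notation introduced for this sketch, and the dispersion parameter is assumed to be 1.

```latex
\begin{align*}
D &= 2\,(\ell_{\text{sat}} - \ell_{\text{fit}}), \qquad
D_0 = 2\,(\ell_{\text{sat}} - \ell_{0}) \\
1 - \frac{D}{D_0}
  &= 1 - \frac{\ell_{\text{sat}} - \ell_{\text{fit}}}{\ell_{\text{sat}} - \ell_{0}}
   = \frac{\ell_{\text{fit}} - \ell_{0}}{\ell_{\text{sat}} - \ell_{0}} \\
R^2_{\text{McFadden}} &= 1 - \frac{\ell_{\text{fit}}}{\ell_{0}}
\end{align*}
```

Adding a constant $c$ to every log-likelihood leaves the deviance ratio unchanged, because $c$ cancels in each difference, but it turns McFadden's expression into $1 - (\ell_{\text{fit}} + c)/(\ell_{0} + c)$. The two agree exactly when $\ell_{\text{sat}} = 0$, which holds for ungrouped binary data.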

— NRH

Given that GLMs are fit using iteratively reweighted least squares, as in bwlewis.github.io/GLM, what would actually be the objection to calculating a weighted R2 on the GLM link scale, using the 1/variance weights as weights (which glm returns in the weights slot of a glm fit)?
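One possible reading of this suggestion, sketched in R under stated assumptions: take the working weights and the working response from the converged IRLS fit and compute an ordinary weighted R2 on the link scale. The simulated data and the names `z`, `eta`, `w` and `r2_weighted` are illustrative, and this is not an established definition of a pseudo-R2.

```r
set.seed(1)
x <- runif(100)
y <- rpois(100, lambda = exp(0.5 + 1.5 * x))
fit <- glm(y ~ x, family = poisson)

w   <- fit$weights                              # IRLS working weights at convergence
eta <- fit$linear.predictors                    # fitted values on the link scale
z   <- eta + residuals(fit, type = "working")   # working response

# Weighted R2 of the working response around its weighted mean
z_bar <- weighted.mean(z, w)
r2_weighted <- 1 - sum(w * (z - eta)^2) / sum(w * (z - z_bar)^2)

# Cross-check: weighted least squares of z on x reproduces the GLM fit
wls <- lm(z ~ x, weights = w)
```

Because the working response and weights depend on the fitted model itself, this quantity is not directly comparable across models in the way the deviance ratio is.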

— Tom Wenseleers

9

R gives null and residual deviance in the output to glm so that you can make exactly this sort of comparison (see the last two lines below).

> x = log(1:10)
> y = 1:10
> glm(y ~ x, family = poisson)

Call:  glm(formula = y ~ x, family = poisson)

Coefficients:
(Intercept)            x  
  5.564e-13    1.000e+00  

Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
Null Deviance:      16.64 
Residual Deviance: 2.887e-15    AIC: 37.97

You can also pull these values out of the object with model$null.deviance and model$deviance
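A minimal sketch of that extraction, reusing the same toy data. The object names `model` and `pseudo_r2` are chosen here for illustration.

```r
x <- log(1:10)
y <- 1:10
model <- glm(y ~ x, family = poisson)

# Deviance-based pseudo R-squared from the two stored components
pseudo_r2 <- 1 - model$deviance / model$null.deviance
```

Since y equals exp(x) exactly in this example, the residual deviance is numerically zero and the ratio is essentially 1. Real data will give something strictly between 0 and 1.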

— David J. Harris

Ah, okay. I was just answering the question as written. I'd have added more, but I'm not 100% sure how the null deviance is calculated myself (it has something to do with a saturated model's log likelihood, but I don't remember enough of the details about saturation to be confident that I could give good intuitions)
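For a Poisson model the relationship can be checked directly. The sketch below assumes simulated data and uses illustrative names. The saturated model is taken to be the one whose fitted mean equals each observed count, and the null deviance is compared with the deviance of an intercept-only fit.

```r
set.seed(123)
x <- runif(50)
y <- rpois(50, lambda = exp(1 + x))

fit  <- glm(y ~ x, family = poisson)
null <- glm(y ~ 1, family = poisson)   # intercept-only model

# Saturated Poisson model: fitted mean equals each observed count
ll_sat  <- sum(dpois(y, lambda = y, log = TRUE))
ll_null <- as.numeric(logLik(null))
ll_fit  <- as.numeric(logLik(fit))

# Deviance = 2 * (saturated log-likelihood - model log-likelihood)
null_dev_by_hand  <- 2 * (ll_sat - ll_null)
resid_dev_by_hand <- 2 * (ll_sat - ll_fit)
```

Both hand-computed values should match `fit$null.deviance` and `fit$deviance`, and `null$deviance` should equal `fit$null.deviance`.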

— David J. Harris

I don't have it in the glm output (family poisson or quasipoisson).

— Curious

@Tomas see my edits. I don't know if I was mistaken 2 years ago or if the default output has changed since then.

— David J. Harris

Tomas, the information is produced by summary.glm. Whether that definition of an $R^2$ is common would require some kind of survey to establish. I would say it's not especially rare, in that I've seen it before, but not something that is necessarily widely used.

— Glen_b -Reinstate Monica

1

Read the question. Do you think you answer it? The question was not "where can I get the components of the formula?".

— Curious

6

The formula you propose was put forward by Maddala (1983) and Magee (1990) to estimate R squared in logistic models, so I don't think it applies to all GLMs (see Modern Regression Methods by Thomas P. Ryan, p. 266).

If you build a fake data set, you will see that it underestimates R squared, for a Gaussian GLM for example.

I think that for a Gaussian GLM you can use the basic (lm) R squared formula:

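# R-squared and adjusted R-squared for a Gaussian GLM, computed as in lm
R2gauss <- function(y, model) {
    moy <- mean(y)                         # mean of the response
    N <- length(y)                         # number of observations
    p <- length(model$coefficients) - 1    # number of predictors, intercept excluded
    # predict() returns values on the link scale, so this assumes an identity link
    SSres <- sum((y - predict(model))^2)   # residual sum of squares
    SStot <- sum((y - moy)^2)              # total sum of squares
    R2 <- 1 - (SSres / SStot)
    Rajust <- 1 - (((1 - R2) * (N - 1)) / (N - p - 1))   # adjusted R-squared
    return(data.frame(R2, Rajust, SSres, SStot))
}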

And for the logistic case (the binomial family in R) I would use the formula you propose:

R2logit <- function(y, model) {
    # Deviance-based pseudo R-squared; y is unused and kept only to mirror R2gauss
    R2 <- 1 - (model$deviance / model$null.deviance)
    return(R2)
}

So far, for Poisson GLMs I have used the equation from this post:

https://stackoverflow.com/questions/23067475/how-do-i-obtain-pseudo-r2-measures-in-stata-when-using-glm-regression

There is also a great article on pseudo R2 available on ResearchGate. Here is the link:

https://www.researchgate.net/publication/222802021_Pseudo_R-squared_measures_for_Poisson_regression_models_with_over-_or_underdispersion

I hope this helps.

— Nico Coallier

Just fit a GLM model with family=gaussian(link=identity) and check the value of 1-summary(GLM)$deviance/summary(GLM)$null.deviance and you will see that the R2 does match the R2 value of a regular OLS regression, so the above answer is correct! See also my post here - stats.stackexchange.com/questions/412580/…
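The check described in this comment can be sketched as follows. The data are simulated and the names `ols`, `r2_ols` and `r2_dev` are illustrative. For a Gaussian family with identity link the deviance is the residual sum of squares and the null deviance is the total sum of squares, so the two quantities should agree.

```r
set.seed(2024)
x <- rnorm(60)
y <- 1 + 2 * x + rnorm(60)

ols <- lm(y ~ x)
GLM <- glm(y ~ x, family = gaussian(link = "identity"))

r2_ols <- summary(ols)$r.squared
r2_dev <- 1 - GLM$deviance / GLM$null.deviance   # deviance-based R2
```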

— Tom Wenseleers

3

The R package modEvA calculates D-Squared as 1 - (mod$deviance/mod$null.deviance) as mentioned by David J. Harris

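set.seed(1)
# x must exist before y is generated, since the Poisson mean depends on x
x <- runif(n=10, min=0, max=1.5)
data <- data.frame(x=x, y=rpois(n=10, lambda=exp(1 + 0.2 * x)))

mod <- glm(y~x,data,family = poisson)

# Both calls return the same value; the original post reported
# [1] 0.01133757 for its own random draw
1- (mod$deviance/mod$null.deviance)
library(modEvA);modEvA::Dsquared(mod)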

The D-Squared or explained Deviance of the model is introduced in (Guisan & Zimmermann 2000) https://doi.org/10.1016/S0304-3800(00)00354-9

— user2673238