Why does independence imply zero correlation?

16

First of all, I am not asking this:

Why does zero correlation not imply independence?

That is addressed (rather nicely) here: /math/444408/why-does-zero-correlation-not-imply-independence

What I'm asking is the opposite ... say two variables are completely independent of each other.

Couldn't they have a tiny bit of correlation by accident?

Shouldn't it be ... independence implies VERY LITTLE correlation?

— Joshua Ronis

5

Even independent variables will have a nonzero sample correlation nearly every time, although it will still be close to zero.

— jsk

10

@jsk makes a good point: you may be confusing the sample correlation with the expected correlation.

— David

1

@David Can you explain? I'm still very much a beginner in statistics.

— Joshua Ronis

3

@JoshuaRonis The sample correlation is the correlation you observe when working with a set of data. You use it to estimate the "true" correlation between the two variables. The larger the sample, the better the estimate. For example, the outcomes of two dice are uncorrelated, yet if you roll them together ten times you might still observe a correlation (due to random chance). Note that there is no preference for a positive or a negative value (i.e., you have an equal chance of either).
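The dice example can be sketched in R. This is a minimal illustration, not code from the thread: the helper name `dice_cor`, the seed, and the number of repetitions are all assumptions chosen for the sketch.

```r
# Illustrative sketch: sample correlation between two independent fair dice.
set.seed(1)  # arbitrary seed, used only for reproducibility

# Sample correlation from n joint rolls of two independent six-sided dice
# (dice_cor is a hypothetical helper name)
dice_cor <- function(n) {
  die1 <- sample(1:6, n, replace = TRUE)
  die2 <- sample(1:6, n, replace = TRUE)
  cor(die1, die2)
}

# Repeat the ten-roll experiment many times
r10 <- replicate(10000, dice_cor(10))

mean(r10, na.rm = TRUE)       # average sample correlation, expected near 0
mean(r10 > 0, na.rm = TRUE)   # share of experiments with positive r
mean(r10 < 0, na.rm = TRUE)   # share of experiments with negative r
```

Individual ten-roll experiments can give values far from zero, but positive and negative values should turn up about equally often. The `na.rm = TRUE` guards against the rare case where one die shows the same face in all ten rolls, for which `cor` returns `NA`.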

— David

1

Not a duplicate, but a related discussion: Does non-zero correlation imply dependence?

— SecretAgentMan

36

By the definition of the correlation coefficient, if two variables are independent their correlation is zero. So it cannot happen that they have some correlation by accident!

$\rho_{X,Y}=\frac{\operatorname{E}[XY]-\operatorname{E}[X]\operatorname{E}[Y]}{\sqrt{\operatorname{E}[X^2]-[\operatorname{E}[X]]^2}~\sqrt{\operatorname{E}[Y^2]- [\operatorname{E}[Y]]^2}}$

If $X$ and $Y$ are independent, then $\operatorname{E}[XY]= \operatorname{E}[X]\operatorname{E}[Y]$. Hence the numerator of $\rho_{X,Y}$ is zero in this case.
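For completeness, here is a short derivation of that identity. It assumes $X$ and $Y$ have a joint density $f_{X,Y}$ with marginals $f_X$ and $f_Y$ and finite second moments; this notation is an assumption and is not part of the original answer. In the discrete case the integrals become sums.

```latex
\begin{aligned}
\operatorname{E}[XY]
  &= \iint x\,y\, f_{X,Y}(x,y)\,dx\,dy \\
  &= \iint x\,y\, f_X(x)\, f_Y(y)\,dx\,dy
     && \text{(independence: } f_{X,Y} = f_X f_Y\text{)} \\
  &= \left(\int x\, f_X(x)\,dx\right)\left(\int y\, f_Y(y)\,dy\right) \\
  &= \operatorname{E}[X]\,\operatorname{E}[Y].
\end{aligned}
```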

So, if you don't change the meaning of correlation as defined here, it is not possible, unless you clarify what you mean by correlation.

— OmG

2

And yet there are charts that clearly show a correlation between the number of pirates and mean global temperature. As another comment pointed out, one has to be careful about sample size, not to mention 'accidental' appearances.

— Carl Witthoft

@OmG "if you don't change the meaning of correlation, as mentioned here": when I read the OP's question, I get a very different meaning of "correlation". To me, "Couldn't they have a tiny bit of correlation by accident?" asks whether you could find "a tiny bit of correlation by accident".

— industry7

1

@industry7 I see, but it should be defined in a formal way. As stated it is qualitative, and we can't discuss it here.

— OmG

@CarlWitthoft The number of pirates and mean global temperature are not independent. They have common causes (e.g., time, development, modernization, etc.) that create a dependence between them. "Independent" doesn't mean "not causally related"; it means "unassociated", and those charts clearly show an association.

— Noah

@Noah I fear a WHOOSH just occurred. venganza.org

— Carl Witthoft

19

Comment on sample correlation. In comparing two small independent samples of the same size, the sample correlation is often noticeably different from $r = 0.$ [Nothing here contradicts @OmG's Answer (+1) on the population correlation $\rho.$]

Consider correlations between a million pairs of independent samples of size $n = 5$ from the exponential distribution with rate $1.$

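set.seed(616)
# One million sample correlations, each from two independent Exp(1) samples of size 5
r = replicate(10^6, cor(rexp(5), rexp(5)))
mean(abs(r) > .5)   # proportion of pairs with |r| > 0.5
[1] 0.386212
mean(r)             # average sample correlation
[1] -0.0005904455

# Histogram of r with red reference lines at r = -0.5 and r = 0.5
hist(r, prob=TRUE, breaks=40, col="skyblue2")
  abline(v=c(-.5,.5), col="red", lwd=2)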

For example, here is the scatterplot of the first of the million pairs of samples of size $5,$ for which $r = -0.5716.$

There is nothing special about the exponential distribution in this regard. Changing the parent distribution to standard normal gave the following results.

set.seed(2019)
...
mean(abs(r) > .5)
[1] 0.391061
mean(r)
[1] 1.43269e-05

By contrast, here is the histogram of correlations for pairs of normal samples of size $n = 20.$
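A rough sketch of how such a histogram for $n = 20$ could be produced, following the same pattern as the code above. The seed, the number of replications, and the name `r20` are assumptions for illustration, not the values behind the original figure.

```r
# Sketch only: seed and replication count are assumptions, not the
# values used for the original histogram.
set.seed(2020)

# Sample correlations from pairs of independent standard normal samples of size 20
r20 <- replicate(10^5, cor(rnorm(20), rnorm(20)))

mean(abs(r20) > .5)  # proportion of pairs with |r| > 0.5
mean(r20)            # average sample correlation

# Histogram on the full range [-1, 1] with reference lines at +/- 0.5
hist(r20, prob = TRUE, breaks = 40, col = "skyblue2", xlim = c(-1, 1))
abline(v = c(-.5, .5), col = "red", lwd = 2)
```

With samples of size 20 the proportion beyond $\pm 0.5$ should come out well below the roughly 39% seen for $n = 5$, because the distribution of $r$ is much more concentrated around zero.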

Note: Other pages on this site discuss the distribution of $r$ in more detail; one of them is this Q&A.

— BruceET

6

For small sample sizes you may well find a sample correlation that is "noticeably" different from zero, but you are unlikely to find one that is significantly different from zero. Even if your point estimate is far from zero, you have too little data to claim with confidence that you are seeing a nonzero correlation rather than mere chance. With only 5 pairs, even a correlation coefficient greater than 0.8 may not be significantly different from 0.
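The claim about 5 pairs can be checked with the usual $t$ test of $H_0\colon \rho = 0$, which assumes bivariate normal data. In this sketch the helper name `critical_r` and the 5% two-sided level are assumptions, not part of the original comment.

```r
# Smallest |r| that is significant at level alpha for n pairs, based on
# the t test of H0: rho = 0 (assumes bivariate normal data).
# critical_r is a hypothetical helper name.
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

critical_r(5)    # about 0.878
critical_r(20)   # about 0.444

# Two-sided p-value for an observed r = 0.8 with n = 5 pairs
r_obs <- 0.8
n <- 5
t_obs <- r_obs * sqrt(n - 2) / sqrt(1 - r_obs^2)
p_value <- 2 * pt(-abs(t_obs), df = n - 2)
p_value          # larger than 0.05
```

Under these assumptions, $|r|$ must exceed about 0.878 to reach significance at the 5% level with 5 pairs, so an observed $r = 0.8$ falls short.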

— Nuclear Wang

11

Simple answer: if 2 variables are independent, the population correlation is zero, whereas the sample correlation will typically be small, but non-zero.

That is because the sample is not a perfect representation of the population.

The larger the sample, the better it represents the population, so the smaller the correlation you'll have. For an infinite sample, the correlation would be zero.
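A small simulation illustrates how the sample correlation shrinks as the sample grows. It is a sketch under assumed settings: standard normal data, 200 replications per sample size, and an arbitrary seed, none of which come from the original answer.

```r
set.seed(42)  # arbitrary seed
sizes <- c(10, 100, 1000, 10000)

# Typical magnitude of the sample correlation for each sample size:
# the mean of |r| over 200 pairs of independent standard normal samples
mean_abs_r <- sapply(sizes, function(n) {
  mean(abs(replicate(200, cor(rnorm(n), rnorm(n)))))
})

data.frame(n = sizes, mean_abs_r = mean_abs_r)
```

The typical magnitude of $r$ should fall steadily as $n$ increases, roughly in proportion to $1/\sqrt{n}$, without ever being exactly zero for a finite sample.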

— Dave

1

The precise formulation would be that for any $p$ and $\epsilon$, there is some $n$ such that if the sample size is greater than $n$, then the probability of the correlation being greater than $\epsilon$ is less than $p$.

— Acccumulation

Yes, absolutely correct! I tried to keep my answer as simple and conceptual as possible.

— Dave

1

Maybe this is helpful for some people sharing the same intuitive understanding. We've all seen something like this:

These data are presumably independent but clearly exhibit correlation ($r = 0.66$). "I thought independence implies zero correlation!" the student says.

As others have already pointed out, the sample values are correlated, but that does not mean the population has nonzero correlation.

Of course, these two should be independent—given Nicolas Cage appeared in a record-setting 10 films this year, we shouldn't be closing the local pool for the summer for safety purposes.

But when we check how many people drown this year, there is a small chance that a record-setting 1000 people drown this year.

Getting such a correlation is unlikely. Maybe one in a thousand. But it's possible, even though the two are independent. And this is just one case. Consider that there are millions of possible events to measure out there, and you can see that the chance of some two of them happening to give a high correlation is quite high (hence the existence of graphs such as the one above).
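The "many possible pairs" argument can be sketched by generating a large number of mutually independent series and searching for the most correlated pair. The series count, the length of ten observations, the normal distribution, and the seed are all assumptions chosen for illustration.

```r
set.seed(7)       # arbitrary seed
n_obs <- 10       # observations per series (e.g. ten yearly values)
n_series <- 1000  # number of mutually independent series

# Each column is an independent standard normal series
x <- matrix(rnorm(n_obs * n_series), nrow = n_obs)

# All pairwise sample correlations between distinct columns
cors <- cor(x)
pairwise <- cors[upper.tri(cors)]

length(pairwise)           # 499500 distinct pairs
max(abs(pairwise))         # largest correlation found purely by chance
mean(abs(pairwise) > .66)  # share of pairs with |r| > 0.66
```

Although every series is independent of every other, a search over this many pairs turns up correlations well beyond the $r = 0.66$ quoted above.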

Another way to look at it is that guaranteeing that two independent variables will always give uncorrelated sample values is itself restrictive. Given two independent dice, and the results of the first, there is a certain (sizable) set of results for the second die which will give some nonzero correlation. To restrict the second die's results to give zero correlation with the first is a clear violation of independence, as the first die's rolls are now affecting the distribution of the second's results.

— Simon Alford