Has anyone solved PTLOS Exercise 4.1?


19

This exercise is given in Probability Theory: The Logic of Science by Edwin Jaynes, 2003. A partial solution can be found here. I have worked out a somewhat more general partial solution, and was wondering if anyone else has solved it. I will wait a while before posting my answer, to give others a go.

Okay, so suppose we have $n$ mutually exclusive and exhaustive hypotheses, denoted by $H_i\ (i=1,\dots,n)$. Further suppose we have $m$ data sets, denoted by $D_j\ (j=1,\dots,m)$. The likelihood ratio for the $i$-th hypothesis is given by:

$$LR(H_i)=\frac{P(D_1 D_2 \dots D_m \mid H_i)}{P(D_1 D_2 \dots D_m \mid \overline{H}_i)}$$

Note that these are conditional probabilities. Now suppose that, given the $i$-th hypothesis $H_i$, the $m$ data sets are independent, so that we have:

$$P(D_1 D_2 \dots D_m \mid H_i)=\prod_{j=1}^{m}P(D_j \mid H_i)\quad(i=1,\dots,n)\qquad\text{Condition 1}$$

Now it would be rather convenient if the denominator also factored in this way, so that we would have:

$$P(D_1 D_2 \dots D_m \mid \overline{H}_i)=\prod_{j=1}^{m}P(D_j \mid \overline{H}_i)\quad(i=1,\dots,n)\qquad\text{Condition 2}$$

For in this case the likelihood ratio splits into a product of smaller factors, one for each data set, so that we have:

$$LR(H_i)=\prod_{j=1}^{m}\frac{P(D_j \mid H_i)}{P(D_j \mid \overline{H}_i)}$$

So in this case each data set will "vote for $H_i$" or "vote against $H_i$" independently of the other data sets.

The exercise is to prove that if $n>2$ (more than two hypotheses), there is no non-trivial way in which this factoring can occur. That is, if you assume that Condition 1 and Condition 2 hold, then at most one of the factors

$$\frac{P(D_1 \mid H_i)}{P(D_1 \mid \overline{H}_i)}\;\frac{P(D_2 \mid H_i)}{P(D_2 \mid \overline{H}_i)}\;\cdots\;\frac{P(D_m \mid H_i)}{P(D_m \mid \overline{H}_i)}$$

is different from 1, and thus only one data set can contribute to the likelihood ratio.

Personally I find this result very interesting, because it basically shows that multiple hypothesis testing is nothing but a series of binary hypothesis tests.
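As a concrete illustration of the two conditions (not part of the exercise): in the binary case $n=2$, Condition 1 forces Condition 2 to hold, so the likelihood ratio factorizes. A minimal Python sketch, with all probability values invented for the example:

```python
from itertools import product

# Toy binary case: n = 2 hypotheses, m = 2 binary data sets.
h = [0.3, 0.7]                      # P(H_i), made-up priors
# P(D_j = 1 | H_i): rows = hypothesis i, columns = data set j.
d = [[0.9, 0.2],
     [0.4, 0.6]]

def p_data_given_h(outcome, i):
    """P(D_1 = outcome[0], D_2 = outcome[1] | H_i), using Condition 1."""
    p = 1.0
    for j, x in enumerate(outcome):
        p *= d[i][j] if x == 1 else 1.0 - d[i][j]
    return p

def p_dj_given_not_h(j, x, i):
    """P(D_j = x | H_i false): a mixture over the remaining hypotheses."""
    num = sum(h[k] * (d[k][j] if x == 1 else 1.0 - d[k][j])
              for k in range(len(h)) if k != i)
    return num / sum(h[k] for k in range(len(h)) if k != i)

def p_data_given_not_h(outcome, i):
    """P(D_1, D_2 | H_i false), computed from the joint mixture."""
    num = sum(h[k] * p_data_given_h(outcome, k)
              for k in range(len(h)) if k != i)
    return num / sum(h[k] for k in range(len(h)) if k != i)

# With n = 2, Condition 2 holds: the joint under the complement factorizes.
for outcome in product([0, 1], repeat=2):
    joint = p_data_given_not_h(outcome, 0)
    factored = (p_dj_given_not_h(0, outcome[0], 0)
                * p_dj_given_not_h(1, outcome[1], 0))
    assert abs(joint - factored) < 1e-12
print("n = 2: Condition 2 holds for every outcome")
```

Running the same mixture computation with $n>2$ generally breaks the factorization, which is exactly what the exercise asks us to prove.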


I'm a bit confused about the index in $\overline{H}_i$; is $\overline{H}_i=\arg\max_{h\neq H_i}P(D_1,\dots,D_m \mid h)$? Or is it $\overline{H}_i=\arg\max_{h\in\{H_1,\dots,H_n\}}P(D_1,\dots,D_m \mid h)$? Seems like it ought to be the latter, but then I'm not sure why the subscript. Or maybe I'm missing something else entirely :)
JMS

@JMS - $\overline{H}_i$ means the logical statement "$H_i$ is false", i.e. that one of the other hypotheses is true. So in "Boolean algebra" we have $\overline{H}_i=H_1+H_2+\dots+H_{i-1}+H_{i+1}+\dots+H_n$ (because the hypotheses are mutually exclusive and exhaustive).
probabilityislogic

I feel like there has to be a more intuitive solution than the algebra given in Sanders' partial solution. If the data are independent given each of the hypotheses, then this continues to hold when the priors of the hypotheses are varied. And somehow, the result is that the same must apply for the conclusion...
charles.y.zheng

@charles - I know exactly how you feel. I thought I could derive it using some qualitative inconsistency (reductio ad absurdum), but I couldn't do it. I could extend Sanders' maths though. And it is Condition 2 which is "the dodgy one" in terms of what the result means.
probabilityislogic

@probabilityislogic "it basically shows that multiple hypothesis testing is nothing but a series of binary hypothesis tests." Please, could you expand on this sentence? By reading page 98 of Jaynes' book, I understand that you can reduce testing of $H_1,\dots,H_n$ to testing $H_1$ against each other hypothesis and then somehow normalize to get the posterior for $H_1$, but I do not understand why this would follow from the result of Exercise 4.1.
Martin Drozdik

Answers:


7

The reason we accepted eq. 4.28 (in the book; your Condition 1) was that we assumed that, given a certain hypothesis $H_a$ and background information $X$, the data are independent; in other words, for any $D_i$ and $D_j$ with $i\neq j$:

$$P(D_i \mid D_j H_a X)=P(D_i \mid H_a X)\tag{1}$$

Non-extensibility beyond the binary case can therefore be discussed like this: if we assume eq. 1 to be true, is eq. 2 also true?

$$P(D_i \mid D_j \overline{H}_a X)\stackrel{?}{=}P(D_i \mid \overline{H}_a X)\tag{2}$$
First let's look at the left side of eq. 2, using the multiplication rule:

$$P(D_i \mid D_j \overline{H}_a X)=\frac{P(D_i D_j \overline{H}_a \mid X)}{P(D_j \overline{H}_a \mid X)}\tag{3}$$

Since the $n$ hypotheses $\{H_1,\dots,H_n\}$ are assumed mutually exclusive and exhaustive, we can write:

$$\overline{H}_a=\sum_{b\neq a}H_b$$

So eq. 3 becomes:

$$P(D_i \mid D_j \overline{H}_a X)=\frac{\sum_{b\neq a}P(D_i \mid D_j H_b X)\,P(D_j H_b \mid X)}{\sum_{b\neq a}P(D_j H_b \mid X)}=\frac{\sum_{b\neq a}P(D_i \mid H_b X)\,P(D_j H_b \mid X)}{\sum_{b\neq a}P(D_j H_b \mid X)}$$
When there are only two hypotheses, the sum contains a single term (there is only one $b\neq a$), so the equal terms in the numerator and denominator, $P(D_j H_b \mid X)$, cancel out and eq. 2 is proved correct, since $H_b=\overline{H}_a$. Therefore equation 4.29 can be derived from equation 4.28 in the book. But when we have more than two hypotheses, this doesn't happen. For example, if we have three hypotheses $\{H_1,H_2,H_3\}$, the equation above becomes:

$$P(D_i \mid D_j \overline{H}_1 X)=\frac{P(D_i \mid H_2 X)\,P(D_j H_2 \mid X)+P(D_i \mid H_3 X)\,P(D_j H_3 \mid X)}{P(D_j H_2 \mid X)+P(D_j H_3 \mid X)}$$
In other words:

$$P(D_i \mid D_j \overline{H}_1 X)=\frac{P(D_i \mid H_2 X)}{1+\frac{P(D_j H_3 \mid X)}{P(D_j H_2 \mid X)}}+\frac{P(D_i \mid H_3 X)}{1+\frac{P(D_j H_2 \mid X)}{P(D_j H_3 \mid X)}}$$

The only way this equation can yield eq. 2 is for both denominators to equal 1, i.e. both fractions in the denominators must equal zero. But that is impossible.
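This argument can also be sanity-checked numerically. The sketch below (with invented numbers) constructs three hypotheses under which the data are independent, so eq. 1 holds by construction, and then compares the two sides of eq. 2 directly:

```python
# Three hypotheses, two binary data sets; eq. 1 holds by construction
# (the data are independent given each hypothesis). Numbers are illustrative.
h = [0.2, 0.3, 0.5]                 # P(H_a | X)
d = [[0.9, 0.1],                    # P(D_j = 1 | H_a X): rows a, columns j
     [0.4, 0.7],
     [0.2, 0.5]]

def pd(j, x, a):
    """P(D_j = x | H_a X)."""
    return d[a][j] if x == 1 else 1.0 - d[a][j]

def mixture(vals_weights):
    """Weighted average: sum(w*v) / sum(w)."""
    num = sum(w * v for v, w in vals_weights)
    den = sum(w for _, w in vals_weights)
    return num / den

a = 0  # test the complement of H_1 (index 0), with D_i = D_1 = 1, D_j = D_2 = 1
others = [b for b in range(3) if b != a]

# Right side of eq. 2: P(D_i | H_a-bar X), a mixture over the other hypotheses.
rhs = mixture([(pd(0, 1, b), h[b]) for b in others])

# Left side of eq. 2: P(D_i | D_j H_a-bar X); conditioning on D_j reweights
# the other hypotheses by how well each one predicted D_j.
lhs = mixture([(pd(0, 1, b), h[b] * pd(1, 1, b)) for b in others])

print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}")  # they differ: eq. 2 fails
```

With only two hypotheses the mixtures on each side collapse to the single remaining hypothesis and the two sides coincide, exactly as the cancellation argument above says.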

1
I think the fourth equation is incorrect. We should have $P(D_i D_j H_b \mid X)=P(D_i H_b \mid X)\,P(D_j \mid H_b X)$
probabilityislogic

Thank you very much probabilityislogic, I was able to correct the solution. What do you think now?
astroboy

I just don't understand how Jaynes says: "Those who fail to distinguish between logical independence and causal independence would suppose that (4.29) is always valid".
astroboy

I think I found the answer to my last comment: right after the sentence above Jaynes says: "provided only that no Di exerts a physical influence on any other Dj". So essentially Jaynes is saying that even if they don't have physical influence, there is a logical limitation that doesn't allow the generalization to more than two hypotheses.
astroboy

After reading the text again I feel my last comment was not a good answer. As I understand it now, Jaynes wanted to say: "those who fail to distinguish between logical independence and causal independence" would argue that $D_i$ and $D_j$ are assumed to have no physical influence on each other; thus they have causal independence, which for them implies logical independence over any set of hypotheses. So they find all this discussion meaningless and simply proceed to generalize the binary case.
astroboy

1

Okay, so rather than go and re-derive Sanders' equation (5), I will just state it here. Condition 1 and Condition 2 imply the following equality:

$$\prod_{j=1}^{m}\left(\sum_{k\neq i}h_k d_{jk}\right)=\left(\sum_{k\neq i}h_k\right)^{m-1}\left(\sum_{k\neq i}h_k\prod_{j=1}^{m}d_{jk}\right)$$

where

$$d_{jk}=P(D_j \mid H_k, I)\qquad h_k=P(H_k \mid I)$$
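As a sanity check (not part of the proof), this identity can be verified numerically: it holds in a degenerate configuration where only $D_1$ distinguishes the hypotheses (so that Conditions 1 and 2 are compatible), and fails for generic $d_{jk}$. The numbers below are invented for illustration:

```python
import math

def lhs_rhs(h, d, i):
    """Both sides of Sanders' equation (5) for hypothesis i,
    with h[k] = P(H_k | I) and d[j][k] = P(D_j | H_k, I)."""
    ks = [k for k in range(len(h)) if k != i]
    m = len(d)
    S = sum(h[k] for k in ks)
    lhs = math.prod(sum(h[k] * d[j][k] for k in ks) for j in range(m))
    rhs = S ** (m - 1) * sum(h[k] * math.prod(d[j][k] for j in range(m))
                             for k in ks)
    return lhs, rhs

h = [0.2, 0.3, 0.5]

# Degenerate case: only D_1 distinguishes the hypotheses
# (d[j][k] constant in k for j >= 2), so only one data set "votes".
d_deg = [[0.9, 0.4, 0.2],
         [0.6, 0.6, 0.6],
         [0.3, 0.3, 0.3]]
lhs, rhs = lhs_rhs(h, d_deg, 0)
assert abs(lhs - rhs) < 1e-12
print("degenerate case: identity holds")

# Generic case: the identity fails, as the exercise leads us to expect.
d_gen = [[0.9, 0.4, 0.2],
         [0.6, 0.7, 0.1],
         [0.3, 0.8, 0.5]]
lhs, rhs = lhs_rhs(h, d_gen, 0)
print("generic case: identity holds?", abs(lhs - rhs) < 1e-12)
```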

Now we can specialise to the case $m=2$ (two data sets) by taking $D_1^{(1)}\equiv D_1$ and relabeling $D_2^{(1)}\equiv D_2 D_3 \dots D_m$. Note that these two data sets still satisfy Condition 1 and Condition 2, so the result above applies to them as well. Expanding in the case $m=2$ we get:

$$\left(\sum_{k\neq i}h_k d_{1k}\right)\left(\sum_{l\neq i}h_l d_{2l}\right)=\left(\sum_{k\neq i}h_k\right)\left(\sum_{l\neq i}h_l d_{1l}d_{2l}\right)$$

$$\sum_{k\neq i}\sum_{l\neq i}h_k h_l d_{1k}d_{2l}=\sum_{k\neq i}\sum_{l\neq i}h_k h_l d_{1l}d_{2l}$$

$$\sum_{k\neq i}\sum_{l\neq i}h_k h_l d_{2l}\left(d_{1k}-d_{1l}\right)=0\quad(i=1,\dots,n)$$

The term $(d_{1a}-d_{1b})$ occurs twice in the above double summation: once when $k=a$ and $l=b$, and once again when $k=b$ and $l=a$. This occurs as long as $a,b\neq i$. The coefficients of the two occurrences are given by $d_{2b}$ and $d_{2a}$ respectively. Now, because there are $n$ of these equations (one for each $i$), we can actually remove the restriction $k,l\neq i$. To illustrate, take $i=1$: this gives us all of the conditions except those with $a=1,b=2$ and $b=1,a=2$. Now take $i=3$, and we recover these two conditions as well (note that this assumes at least three hypotheses). So the equation can be re-written as:

$$\sum_{l>k}h_k h_l\left(d_{2l}-d_{2k}\right)\left(d_{1k}-d_{1l}\right)=0$$

Now each of the $h_i$ terms must be greater than zero, for otherwise we are dealing with $n_1<n$ hypotheses, and the answer can be reformulated in terms of $n_1$. So these can be removed from the above set of conditions:

$$\sum_{l>k}\left(d_{2l}-d_{2k}\right)\left(d_{1k}-d_{1l}\right)=0$$

Thus, there are $\frac{n(n-1)}{2}$ conditions that must be satisfied, and each condition implies one of two "sub-conditions": that $d_{jk}=d_{jl}$ for either $j=1$ or $j=2$ (but not necessarily both). Now consider the set of all of the unique pairs $(k,l)$ with $d_{jk}=d_{jl}$. If we were to take $n-1$ of these pairs for one of the $j$, then we would have all the numbers $1,\dots,n$ in the set, and $d_{j1}=d_{j2}=\dots=d_{j,n-1}=d_{j,n}$. This is because the first pair has 2 elements, and each additional pair brings at least one additional element to the set*

But note that because there are $\frac{n(n-1)}{2}$ conditions, we must choose at least the smallest integer greater than or equal to $\frac{1}{2}\times\frac{n(n-1)}{2}=\frac{n(n-1)}{4}$ of them for one of $j=1$ or $j=2$. If $n>4$ then the number of terms chosen is greater than $n-1$. If $n=4$ or $n=3$ then we must choose exactly $n-1$ terms. This implies that $d_{j1}=d_{j2}=\dots=d_{j,n-1}=d_{j,n}$. Only with two hypotheses ($n=2$) does this not occur. But from the last equation in Sanders' article this equality condition implies:

$$P(D_j \mid \overline{H}_i)=\frac{\sum_{k\neq i}d_{jk}h_k}{\sum_{k\neq i}h_k}=\frac{d_{ji}\sum_{k\neq i}h_k}{\sum_{k\neq i}h_k}=d_{ji}=P(D_j \mid H_i)$$

Thus, in the likelihood ratio we have:

$$\frac{P(D_1^{(1)} \mid H_i)}{P(D_1^{(1)} \mid \overline{H}_i)}=\frac{P(D_1 \mid H_i)}{P(D_1 \mid \overline{H}_i)}=1\quad\text{OR}\quad\frac{P(D_2^{(1)} \mid H_i)}{P(D_2^{(1)} \mid \overline{H}_i)}=\frac{P(D_2 D_3 \dots D_m \mid H_i)}{P(D_2 D_3 \dots D_m \mid \overline{H}_i)}=1$$

To complete the proof, note that if the second condition holds, the result is already proved: only one ratio can be different from 1. If the first condition holds, then we can repeat the above analysis, relabeling $D_1^{(2)}\equiv D_2$ and $D_2^{(2)}\equiv D_3\dots D_m$. Then we would have $D_1,D_2$ not contributing, or $D_2$ being the only contributor. We would then have a third relabeling when "$D_1,D_2$ not contributing" holds, and so on. Thus, only one data set can contribute to the likelihood ratio when Condition 1 and Condition 2 hold and there are more than two hypotheses.

*NOTE: An additional pair might bring no new terms, but this would be offset by a pair which brought 2 new terms. E.g. take $d_{j1}=d_{j2}$ first [+2], then $d_{j1}=d_{j3}$ [+1] and $d_{j2}=d_{j3}$ [+0], but the next pair must have $d_{jk}=d_{jl}$ for both $k,l\notin\{1,2,3\}$, which adds two terms [+2]. If $n=4$ then we don't need to choose any more, but for the "other" $j$ we must choose the 3 pairs which are not $(1,2),(2,3),(1,3)$. These are $(1,4),(2,4),(3,4)$, and thus the equality holds, because all the numbers $(1,2,3,4)$ are in the set.


I am beginning to doubt the accuracy of this proof. The result in Sanders' maths implies only $n$ non-linear constraints on the $d_{jk}$. This makes $d_{jk}$ have only $n$ degrees of freedom instead of $2n$. However, to get to the $\frac{n(n-1)}{2}$ conditions a different argument is required.
probabilityislogic

0

For the record, here is a somewhat more extensive proof. It also contains some background information. Maybe this is helpful for others studying the topic.

The main idea of the proof is to show that Jaynes' conditions 1 and 2 imply that

$$P(D_{m_k} \mid H_i X)=P(D_{m_k} \mid X),$$

for all but one data set $m_k=1,\dots,m$. It then shows that for all these data sets, we also have

$$P(D_{m_k} \mid \overline{H}_i X)=P(D_{m_k} \mid X).$$

Thus we have, for all but one data set,

$$\frac{P(D_{m_k} \mid H_i X)}{P(D_{m_k} \mid \overline{H}_i X)}=\frac{P(D_{m_k} \mid X)}{P(D_{m_k} \mid X)}=1.$$
The reason that I wanted to include the proof here is that some of the steps involved are not at all obvious, and one needs to take care not to use anything other than conditions 1 and 2 and the product rule (as many of the other proofs implicitly do). The link above includes all these steps in detail. It is on my Google Drive and I will make sure it stays accessible.


Welcome to Cross Validated. Thank you for your answer. Can you please edit your answer to expand it, in order to include the main points of the link you provide? It will be more helpful both for people searching this site and in case the link breaks. By the way, take the opportunity to take the Tour, if you haven't done it already. See also some tips on How to Answer, on formatting help and on writing down equations using LaTeX / MathJax.
Ertxiem - reinstate Monica

Thanks for your comment. I edited the post and sketched the main steps of the proof.
dennis
Licensed under cc by-sa 3.0 with attribution required.