How to prove that a grammar is unambiguous?

25

My problem is: how can I prove that a grammar is unambiguous? I have the following grammar:

$S → statement ∣ \mbox{if } expression \mbox{ then } S ∣ \mbox{if } expression \mbox{ then } S \mbox{ else } S$

and I rewrote it as the following grammar, which I think is unambiguous and correct:

$S → S_1 ∣ S_2$
$S_1 → \mbox{if } expression \mbox{ then } S ∣ \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_1$
$S_2 → \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_2 ∣ statement$

I know that an unambiguous grammar has exactly one parse tree for every word it generates.

— user1594

20

There is (at least) one way to prove unambiguity of a grammar $G = (N,T,\delta,S)$ for a language $L$. It consists of two steps:

1. Prove $L \subseteq \mathcal{L}(G)$.
2. Prove $[z^n]S_G(z) = |L_n|$.

The first step is pretty clear: show that the grammar generates (at least) the words you want, that is, correctness.

The second step shows that $G$ has as many syntax trees for words of length $n$ as $L$ has words of length $n$; together with step 1, this implies unambiguity. It uses the structure function of $G$, which goes back to Chomsky and Schützenberger [1], namely

$\qquad \displaystyle S_G(z) = \sum_{n=0}^\infty t_nz^n$

with $t_n = [z^n]S_G(z)$ the number of syntax trees $G$ has for words of length $n$. Of course, you need to know $|L_n|$ for this to work.

The nice thing is that $S_G$ is (usually) easy to obtain for context-free languages, although finding a closed form for $t_n$ can be difficult. Transform $G$ into an equation system of functions with one variable per nonterminal:

$\qquad \displaystyle \left[ A(z) = \sum\limits_{(A, a_0 \dots a_k) \in \delta} \ \prod\limits_{i=0}^{k} \ \tau(a_i)\ : A \in N \right] \text{ with } \tau(a) = \begin{cases} a(z) &, a \in N \\ z &, a \in T \\ \end{cases}.$

This may look daunting, but it is really only a syntactic transformation, as will become clear in the example. The idea is that generated terminal symbols are counted in the exponent of $z$, and because the system has the same form as $G$, $z^n$ occurs in the sum as often as words with $n$ terminals can be generated by $G$. Check Kuich [2] for details.
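As a small illustration that is not part of the original answer, the transformation can be applied to the ambiguous grammar from the question. This assumes that each of statement, if, expression, then and else is treated as a single terminal symbol, so each contributes one factor $z$:

$\qquad \displaystyle S(z) = z + z^3 S(z) + z^4 S(z)^2$

The three summands correspond to the three rules: $z$ for $S \to statement$, $z^3 S(z)$ for the rule with three terminals and one occurrence of $S$, and $z^4 S(z)^2$ for the rule with four terminals and two occurrences of $S$.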

Solving this equation system (computer algebra!) yields $S(z) = S_G(z)$; now you "only" have to extract the coefficients (in closed, general form). The TCS Cheat Sheet and computer algebra can often do so.

Example

Consider the simple grammar $G$ with rules

$\qquad \displaystyle S \to aSa \mid bSb \mid \varepsilon$

It is clear that $\mathcal{L}(G) = \{ww^R \mid w \in \{a,b\}^*\}$ (step 1, proof by induction). There are $2^{\frac{n}{2}}$ palindromes of length $n$ if $n$ is even, $0$ otherwise.

Setting up the equation system yields

$\qquad \displaystyle S(z) = 2z^2S(z) + 1$

whose solution is

$\qquad \displaystyle S_G(z) = \frac{1}{1-2z^2}$ .

The coefficients of $S_G$ coincide with the numbers of palindromes, so $G$ is unambiguous.
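The comparison of coefficients can also be checked numerically for small $n$. The following is a minimal sketch, not the answer's own method: it iterates the equation system on power series truncated at degree `max_len` until nothing changes. The function name `structure_coefficients`, the dictionary encoding of the grammar, and the iteration cap are assumptions made for illustration. Agreement up to a bound is a sanity check, not a proof.

```python
def structure_coefficients(rules, terminals, start, max_len):
    """Return [t_0, ..., t_max_len], the first coefficients of S_G(z)."""
    size = max_len + 1
    one = [1] + [0] * max_len                  # the constant series 1
    z = [int(i == 1) for i in range(size)]     # the series z

    def mul(p, q):
        # product of two series, truncated after z^max_len
        r = [0] * size
        for i, a in enumerate(p):
            if a:
                for j, b in enumerate(q[: size - i]):
                    r[i + j] += a * b
        return r

    series = {A: [0] * size for A in rules}
    for _ in range(10 * size * len(rules)):    # iteration cap: an arbitrary assumption
        new = {}
        for A, alternatives in rules.items():
            total = [0] * size
            for rhs in alternatives:
                prod = one
                for sym in rhs:                # tau(a): z for terminals, A(z) otherwise
                    prod = mul(prod, z if sym in terminals else series[sym])
                total = [x + y for x, y in zip(total, prod)]
            new[A] = total
        if new == series:                      # fixed point of the truncated system
            return series[start]
        series = new
    raise RuntimeError("no fixed point reached within the iteration cap")


# S -> aSa | bSb | ε  (the empty tuple encodes ε)
palindromes = {"S": [("a", "S", "a"), ("b", "S", "b"), ()]}
t = structure_coefficients(palindromes, {"a", "b"}, "S", 8)
expected = [2 ** (n // 2) if n % 2 == 0 else 0 for n in range(9)]
print(t == expected)
```

If some word had infinitely many syntax trees, the iteration would never stabilise, which is why the sketch gives up after a fixed number of rounds.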

[1] The Algebraic Theory of Context-Free Languages by Chomsky, Schützenberger (1963)
[2] On the entropy of context-free languages by Kuich (1970)

— Raphael

3

As you know @Raphael, ambiguity is undecidable, so at least one of your steps cannot be mechanised. Any idea which one? Getting a closed form for $t_n$?

— Martin Berger

2

The equation system may not be solvable algorithmically if the degree is too high, and extracting exact coefficients from generating functions can be (too) hard. In "practice", though, one often deals with grammars of small "degree" (note that Chomsky normal form leads to equation systems of small degree), and there are methods to obtain at least $\sim$-asymptotics for the coefficients; this may be enough to establish ambiguity. Note that in order to prove unambiguity, showing $S_L(z) = S_G(z)$ without extracting coefficients is sufficient. Proving this identity may be hard, though.

— Raphael

Thank you @Raphael. Do you know of any texts that develop in detail how undecidability comes into play even if one uses e.g. Chomsky normal form? (I can't get hold of Kuich.)

— Martin Berger

@MartinBerger I just rediscovered your comment in my todo list; sorry for the long silence. There are three steps which (I think) are not computable in general: 1) Determine $S_G$. 2) Compute $|L_n|$. 3) Determine $[z^n]S_G(z)$. In particular, what representation of $L$ to use for 2)?

— Raphael

Why is representation of $L$ a problem? We can use any of the multiple ways of representing CFGs for compilers for example. Maybe you mean how to represent $L_n$?

— Martin Berger

6

This is a good question, but some Googling would have told you that there is no general method for deciding ambiguity, so you need to make your question more specific.

— reinierpost

2

The OP asks for proof techniques, not algorithms.

— Raphael

I think so, too; it might be mentioned in the question.

— reinierpost

1

Google is not an oracle of truth, because knowledge is not democratic, and Google results are. I wouldn't count on Google in this case, because people often copy from one another without checking the correctness of what they copy. Without showing a proof, they might be wrong.

— SasQ

5

@SasQ: You read my words too literally. What Google gives me is the URLs to articles that explain things.

— reinierpost

4

For some grammars, a proof by induction (over word length) is possible.

Consider for example a grammar $G$ over $\Sigma = \{a,b\}$ given by the following rules:

$\qquad \displaystyle S \to aSa \mid bSb \mid \varepsilon$

All words of length $\leq 1$ in $L(G)$ -- there's only $\varepsilon$ -- have only one left-derivation.

Assume that all words of length $< n$ for some $n \in \mathbb{N}$ have only one left-derivation.

Now consider arbitrary $w = w_1 w' w_n \in L(G) \cap \Sigma^n$ for some $n > 0$ . Clearly, $w_1 \in \Sigma$ . If $w_1 = a$ , we know that the first rule in every left-derivation has to be $S \to aSa$ ; if $w_1 = b$ , it has to be $S \to bSb$ . This covers all cases. By induction hypothesis, we know that there is exactly one left-derivation for $w'$ . In combination, we conclude that there is exactly one left-derivation for $w$ as well.

This becomes harder if

there are multiple non-terminals,
the grammar is not linear, and/or
the grammar is left-recursive.

It may help to strengthen the claim to all sentential forms (if the grammar has no unproductive non-terminals) and "root" non-terminals.

I think the conversion to Greibach normal form maintains (un)ambiguity, so applying this step first may take care of left-recursion nicely.

The key is to identify one feature of every word that fixes (at least) one derivation step. The rest follows inductively.
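The induction argument for the palindrome grammar is constructive, and a minimal sketch can make that visible. The hypothetical function `unique_derivation` below (a name chosen for illustration) rebuilds the derivation of a word: at each step the first symbol forces the rule, so the function never has a choice to make, which is exactly why there is only one derivation.

```python
def unique_derivation(word):
    """Return the rule sequence deriving `word` from S, or None if word is not in L(G)."""
    if word == "":
        return ["S -> ε"]                      # the only rule that produces the empty word
    if len(word) >= 2 and word[0] == word[-1] and word[0] in "ab":
        # the first symbol fixes the rule; recurse on w' (the word without both ends)
        rest = unique_derivation(word[1:-1])
        if rest is not None:
            return [f"S -> {word[0]}S{word[0]}"] + rest
    return None


print(unique_derivation("abba"))
```

For grammars where no single feature of the word fixes the next step, a function like this would need to branch, and each branch point is a place where the induction has to argue that only one option can succeed.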

— Raphael

3

Basically, it's a child generation problem. Start with the first expression, and generate its children. Keep doing it recursively (DFS), and after quite a few iterations, see if you can generate the same expanded expression from two different children. If you are able to do that, it's ambiguous. There is no way to determine the running time of this algorithm though. Assume it's safe after generating maybe 30 levels of children :) (Of course it could bomb on the 31st.)

— Karthik Kumar Viswanathan

1

The OP asks for proof techniques, not algorithms.

— Raphael

2

That can't possibly be a way to prove whether a grammar is ambiguous or not. As a matter of fact, when that bombing happens is undecidable.

— Sнаđошƒаӽ