Regex validating regex [closed]

Closed. This question is off-topic. It is not currently accepting answers.


Closed 2 years ago.

Create a regex that accepts a regex string as input and checks whether it is valid. Basically, your regex should be able to validate itself. (It must not validate any invalid regex, so you can't just use .* ;)

Your flavor must be fully supported by a well-known implementation (Perl, sed, grep, gawk, etc.), and it must fully support whatever that implementation supports. [Don't worry about the lawyer speak; I'm just trying to close any possible loopholes for the smart***s.]

I'd make this code golf, but I'm worried it would give an edge to those who know and use feature-poor flavors. Or are my worries unfounded?


— Mateen Ulhaq
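A minimal Ruby sketch of the requirement (an illustration, not part of the original question): treat the engine's own parser as the ground truth and list the inputs on which a candidate validator gives a different verdict. It assumes Ruby's engine is the target flavor; engine_accepts? and disagreements are hypothetical helper names. The trivial .* validator disagrees as soon as any invalid pattern appears in the samples.

```ruby
# Ground truth: ask Ruby's own regex parser whether the pattern compiles.
def engine_accepts?(pattern)
  Regexp.new(pattern)
  true
rescue RegexpError
  false
end

# Return the sample inputs on which a candidate validator regex gives a
# different verdict than the engine itself.
def disagreements(validator, samples)
  samples.reject { |s| validator.match?(s) == engine_accepts?(s) }
end

naive = /\A.*\z/m # accepts every string, including invalid patterns
p disagreements(naive, ["a*", "(", "*", "[D-A]"]) # prints ["(", "*", "[D-A]"]
```

An empty result over a large, varied sample set is evidence (not proof) that a validator matches the engine.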

Impossible. Nested brackets make the grammar of regexes context-free rather than regular (rewriting them in Polish notation still requires a stack).

— ratchet freak
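To illustrate the stack argument: checking that brackets pair up means remembering an unbounded number of still-open brackets, which a stack does naturally and a finite automaton cannot. A minimal Ruby sketch; balanced? is a hypothetical name, and it deliberately ignores escapes such as \( and everything else about regex syntax, so it only demonstrates the bracket-matching part.

```ruby
# Map each closing bracket to the opener it must match.
CLOSERS = { ")" => "(", "]" => "[" }.freeze
OPENERS = CLOSERS.values.freeze

# True if every bracket is closed by the matching kind, in last-in-first-out
# order. Escaped brackets are NOT handled; this only shows the stack discipline.
def balanced?(str)
  stack = []
  str.each_char do |ch|
    if OPENERS.include?(ch)
      stack.push(ch)
    elsif CLOSERS.key?(ch)
      return false unless stack.pop == CLOSERS[ch]
    end
  end
  stack.empty?
end

p balanced?("(a[b]c)") # prints true
p balanced?("(]")      # prints false
```

The stack can grow as deep as the input nests, which is exactly the unbounded memory a regular language lacks.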

@ratchet Augh, you might be right.

— Mateen Ulhaq

There are some extensions to regular languages that might allow matching brackets, but I don't know how to do it.

— ratchet freak

It has to be possible with Perl regexes.

— Peter Taylor

@BrianVandenberg Regular expressions as implemented in modern languages are pretty much all non-regular... As soon as you add backreferences, you can match non-regular languages. Moreover, both Perl/PCRE and .NET are powerful enough to match correct nesting.

— Martin Ender
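Ruby's engine, which the Ruby answer below targets, has comparable power through subexpression calls: \g<name> re-enters a named group and may recurse. A minimal sketch that matches only strings whose parentheses are balanced; BALANCED_PARENS is a hypothetical name and escapes are ignored.

```ruby
# (?<b>...) defines a group; \g<b> calls it again from inside itself.
# Each recursive call is preceded by a literal "(", so it always consumes input.
BALANCED_PARENS = /\A(?<b>(?:[^()]|\(\g<b>\))*)\z/

%w[a(b(c)d)e (() ())( ()()].each do |s|
  puts "#{s.inspect}: #{BALANCED_PARENS.match?(s)}"
end
# prints true, false, false, true for the four samples
```

This is the same mechanism the Ruby answer relies on when it calls \g<main> and \g<cclass> inside their own definitions.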

Ruby

I tried to match the actual syntax of Ruby's regex flavor as closely as possible, but there are a few quirks: it accepts a few lookbehinds that are actually invalid (such as (?<=(?<!))), and it accepts empty character ranges such as D-A. The latter could be fixed for ASCII, but the regex is long enough as it is.

# Compile with the x (extended) flag: unescaped whitespace and # comments are ignored.
\A(?<main>
    (?!
        \{(\d+)?,(\d+)?\} # do not match lone counted repetition
    )
    (?:
        [^()\[\]\\*+?|<'] | # anything but metacharacters
        (?<cclass>
            \[ \^? (?: # character class
                (?: # character class
                    [^\[\]\\-] | # anything but square brackets, backslashes or dashes
                    \g<esc> |
                    \[ : \^? (?: # POSIX char-class
                        alnum | alpha | word | blank | cntrl | x?digit | graph | lower | print | punct | space | upper
                    ) : \] |
                    - (?!
                        \\[dwhsDWHS]
                    ) # range / dash not followed by a character class
                )+ |
                \g<cclass> # more than one bracket as delimiter
            ) \]
        ) |
        (?<esc>
            \\[^cuxkg] | # any escaped character
            \\x \h\h? | # hex escape
            \\u \h{4} | # Unicode escape
            \\c . # control escape
        ) |
        \\[kg] (?:
            < \w[^>]* (?: > | \Z) |
            ' \w[^']* (?: ' | \Z)
        )? | # named backrefs
        (?<! (?<! \\) \\[kg]) [<'] | # don't match < or ' if preceded by \k or \g
        \| (?! \g<rep> ) | # alternation
        \( (?: # group
            (?:
                \?
                (?:
                    [>:=!] | # atomic / non-capturing / lookahead
                    (?<namedg>
                        < [_a-zA-Z][^>]* > |
                        ' [_a-zA-Z][^']* ' # named group
                    ) |
                    [xmi-]+: # regex options
                )
            )?
            \g<main>*
        ) \) |
        \(\?<[!=] (?<lbpat>
            (?! \{(\d+)?,(\d+)?\} )
            [^()\[\]\\*+?] |
            \g<esc>  (?<! \\[zZ]) |
            \g<cclass> |
            \( (?: # group
                (?:
                    \?: |
                    \? \g<namedg> |
                    \? <[!=]
                )?
                \g<lbpat>*
            ) \) |
            \(\?\# [^)]* \)
        )* \)
        |
        \(\? [xmi-]+ \) # option group
        (?! \g<rep> ) 
        |
        \(\?\# [^)]*+ \) # comment
        (?! \g<rep> )
    )+
    (?<rep>
        (?:
            [*+?] | # repetition
            \{(\d+)?,(\d+)?\} # counted repetition
        )
        [+?]? # with a possessive/lazy modifier
    )?
)*\Z

Unreadable version:

\A(?<main>(?!\{(\d+)?,(\d+)?\})(?:[^()\[\]\\*+?|<']|(?<cclass>\[\^?(?:(?:[^\[\]\\-]|\g<esc>|\[:\^?(?:alnum|alpha|word|blank|cntrl|x?digit|graph|lower|print|punct|space|upper):\]|-(?!\\[dwhsDWHS]))+|\g<cclass>)\])|(?<esc>\\[^cuxkg]|\\x\h\h?|\\u\h{4}|\\c.)|\\[kg](?:<\w[^>]*(?:>|\Z)|'\w[^']*(?:'|\Z))?|(?<!(?<!\\)\\[kg])[<']|\|(?!\g<rep>)|\((?:(?:\?(?:[>:=!]|(?<namedg><[_a-zA-Z][^>]*>|'[_a-zA-Z][^']*')|[xmi-]+:))?\g<main>*)\)|\(\?<[!=](?<lbpat>(?!\{(\d+)?,(\d+)?\})[^()\[\]\\*+?]|\g<esc>(?<!\\[zZ])|\g<cclass>|\((?:(?:\?:|\?\g<namedg>|\?<[!=])?\g<lbpat>*)\)|\(\?#[^)]*\))*\)|\(\?[xmi-]+\)(?!\g<rep>)|\(\?#[^)]*+\)(?!\g<rep>))+(?<rep>(?:[*+?]|\{(\d+)?,(\d+)?\})[+?]?)?)*\Z

— Lowjacker
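The answer notes that empty ranges such as D-A could be ruled out for ASCII at the cost of length. A minimal Ruby sketch of what that might look like for a bare x-y range over printable ASCII, one alternative per start character, each allowing only end characters at or after it. This is an assumption about how such a fix could be written: the answer's regex contains nothing like it, and a real fix would also have to handle escaped endpoints such as \x41 or \].

```ruby
# Build a regex that matches "x-y" only when x <= y, for printable ASCII
# (0x20..0x7E). Endpoints are written as \xHH so no character needs escaping.
def nonempty_range_regex
  alternatives = (0x20..0x7E).map do |n|
    format("\\x%02X-[\\x%02X-\\x7E]", n, n)
  end
  Regexp.new("\\A(?:#{alternatives.join('|')})\\z")
end

re = nonempty_range_regex
p re.match?("A-D")  # prints true: non-empty range
p re.match?("D-A")  # prints false: empty range is rejected
p re.source.length  # 95 alternatives add up to well over a thousand characters
```

The length of the generated source shows why the author called the fix impractical inside an already long regex.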

Aren't both versions unreadable?

— Kibbee

@Kibbee The first one is reasonably readable if you know regex well.

— Lowjacker

How does this make sure that no invalid numeric settings get through?

— Martin Ender

I guess it doesn't. Then again, that's not its only limitation (see above). Some of these could be fixed, but the regex would become ridiculously long.

— Lowjacker