Replacing a pattern with case matching using sed

14

I have source code spread across several files.

It contains the pattern abcdef, which I need to replace with pqrstuvxyz.
The pattern could be Abcdef (sentence case), in which case it needs to be replaced with Pqrstuvxyz.
The pattern could be AbCdEf (toggle case), in which case it needs to be replaced with PqRsTuVxYz.

In short, I need to match the case of the source pattern and apply the correspondingly cased destination pattern.

How can I do this with sed or any other tool?

text-processing sed awk

— user1263746

And what if it's ABcDeF?

— Stéphane Chazelas

PQrStUvxyz - I get your point.

— user1263746

So if ABcDeF -> PQrStUvxyz, then surely AbCdEf -> PqRsTuvxyz would be logically consistent. If case is to be copied from one string to the other, what should happen when the replacement string is longer?

— Graeme

Let's trim the replacement to "pqrstu" for brevity.

— user1263746

9

Portable solution using sed:

sed '
:1
/[aA][bB][cC][dD][eE][fF]/!b
s//\
&\
pqrstu\
PQRSTU\
/;:2
s/\n[[:lower:]]\(.*\n\)\(.\)\(.*\n\).\(.*\n\)/\2\
\1\3\4/;s/\n[^[:lower:]]\(.*\n\).\(.*\n\)\(.\)\(.*\n\)/\3\
\1\2\4/;t2
s/\n.*\n//;b1'

It is a bit easier with GNU sed:

search=abcdef replace=pqrstuvwx
sed -r ":1;/$search/I!b;s//\n&&&\n$replace\n/;:2
    s/\n[[:lower:]](.*\n)(.)(.*\n)/\l\2\n\1\3/
    s/\n[^[:lower:]](.*\n)(.)(.*\n)/\u\2\n\1\3/;t2
    s/\n.*\n(.*)\n/\1/g;b1"

By using &&& above, we reuse the case pattern of the matched string for the rest of the replacement, so ABcdef is changed to PQrstuVWx and AbCdEf to PqRsTuVwX. Change it to & to affect only the case of the first 6 characters.

(Note that it may not do what you want, or may run into an infinite loop, if the replacement can itself be subject to substitution, for instance when substituting foo for foo, or bcd for abcd.)
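
As a sketch of the & variant just mentioned, the GNU sed script can be wrapped in a shell function. The function name match_case_prefix and its argument order are assumptions made for illustration, and the script still relies on GNU extensions (-r, the I flag, \l and \u). The search word is interpolated into the regex unescaped, so this assumes it contains no regex metacharacters or slashes.

```shell
# Hypothetical wrapper around the GNU sed script above, using & instead of &&&.
# $1 = search word, $2 = replacement; text is read from stdin.
match_case_prefix() {
  sed -r ":1;/$1/I!b;s//\n&\n$2\n/;:2
    s/\n[[:lower:]](.*\n)(.)(.*\n)/\l\2\n\1\3/
    s/\n[^[:lower:]](.*\n)(.)(.*\n)/\u\2\n\1\3/;t2
    s/\n.*\n(.*)\n/\1/g;b1"
}

printf '%s\n' ABcdef AbCdEf | match_case_prefix abcdef pqrstuvwx
# PQrstuvwx
# PqRsTuvwx
```

Only the first six characters follow the case of the match; the trailing vwx is copied from the replacement unchanged, which is what the final capture group in the last s command is for.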

— Stéphane Chazelas

8

Portable solution using awk:

awk -v find=abcdef -v rep=pqrstu '{
  lwr=tolower($0)
  offset=index(lwr, tolower(find))

  if( offset > 0 ) {
    # text before the match; a start index of 0 is handled differently by some awks
    printf "%s", substr($0, 1, offset-1)
    len=length(find)

    for( i=0; i<len; i++ ) {
      out=substr(rep, i+1, 1)

      if( substr($0, offset+i, 1) == substr(lwr, offset+i, 1) )
        printf "%s", tolower(out)
      else
        printf "%s", toupper(out)
    }

    printf "%s\n", substr($0, offset+len)
  }
  else
    print    # no match: pass the line through unchanged
}'

Example input:

other abcdef other
other Abcdef other
other AbCdEf other

Example output:

other pqrstu other
other Pqrstu other
other PqRsTu other

Update

As pointed out in the comments, the above replaces only the first instance of find on each line. To replace all instances:

awk -v find=abcdef -v rep=pqrstu '{
  input=$0
  lwr=tolower(input)
  offset=index(lwr, tolower(find))

  if( offset > 0 ) {
    while( offset > 0 ) {

      # text before the match; a start index of 0 is handled differently by some awks
      printf "%s", substr(input, 1, offset-1)
      len=length(find)

      for( i=0; i<len; i++ ) {
        out=substr(rep, i+1, 1)

        if( substr(input, offset+i, 1) == substr(lwr, offset+i, 1) )
          printf "%s", tolower(out)
        else
          printf "%s", toupper(out)
      }

      input=substr(input, offset+len)
      lwr=substr(lwr, offset+len)
      offset=index(lwr, tolower(find))
    }

    print input
  }
  else
    print    # no match: pass the line through unchanged
}'

Example input:

other abcdef other ABCdef other
other Abcdef other abcDEF
other AbCdEf other aBCdEf other

Example output:

other pqrstu other PQRstu other
other Pqrstu other pqrSTU
other PqRsTu other pQRsTu other
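
Since the question mentions source spread across several files, here is a minimal sketch of applying the same awk logic to a list of files in place. The helper name replace_in_files and the temporary-file approach are assumptions, not part of the original answer; this variant also prints lines that contain no match so that the rest of each file is preserved.

```shell
# Hypothetical helper: replace_in_files FIND REP FILE...
replace_in_files() {
  find=$1 rep=$2
  shift 2
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    if awk -v find="$find" -v rep="$rep" '{
      input = $0
      lwr = tolower(input)
      len = length(find)
      offset = index(lwr, tolower(find))
      while (offset > 0) {
        printf "%s", substr(input, 1, offset - 1)   # text before the match
        for (i = 0; i < len; i++) {
          out = substr(rep, i + 1, 1)
          # lowercase in the source -> lowercase in the output, else uppercase
          if (substr(input, offset + i, 1) == substr(lwr, offset + i, 1))
            printf "%s", tolower(out)
          else
            printf "%s", toupper(out)
        }
        input = substr(input, offset + len)
        lwr = substr(lwr, offset + len)
        offset = index(lwr, tolower(find))
      }
      print input   # remainder of the line, or the whole line if nothing matched
    }' "$f" > "$tmp"; then
      cat "$tmp" > "$f"   # overwrite contents, keeping the file permissions
    fi
    rm -f "$tmp"
  done
}
```

A call such as replace_in_files abcdef pqrstu src/*.c would then rewrite each listed file; the path src/*.c is only an example. Note that awk -v interprets backslash escapes in the values, so this assumes plain words.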

— Graeme

Note that it processes only one instance per line.

— Stéphane Chazelas

@StephaneChazelas Updated to handle multiple instances.

— Graeme

6

You could use perl. Straight from the FAQ, quoting perldoc perlfaq6:

How do I substitute case-insensitively on the LHS while preserving case on the RHS?

Here's a lovely Perlish solution by Larry Rosler. It exploits properties of bitwise xor on ASCII strings.

   $_= "this is a TEsT case";

   $old = 'test';
   $new = 'success';

   s{(\Q$old\E)}
   { uc $new | (uc $1 ^ $1) .
           (uc(substr $1, -1) ^ substr $1, -1) x
           (length($new) - length $1)
   }egi;

   print;

And here it is as a subroutine, modeled after the above:

       sub preserve_case($$) {
               my ($old, $new) = @_;
               my $mask = uc $old ^ $old;

               uc $new | $mask .
                       substr($mask, -1) x (length($new) - length($old))
       }

       $string = "this is a TEsT case";
       $string =~ s/(test)/preserve_case($1, "success")/egi;
       print "$string\n";

This prints:

           this is a SUcCESS case

As an alternative, to keep the case of the replacement word if it is longer than the original, you can use this code, by Jeff Pinyan:

   sub preserve_case {
           my ($from, $to) = @_;
           my ($lf, $lt) = map length, @_;

           if ($lt < $lf) { $from = substr $from, 0, $lt }
           else { $from .= substr $to, $lf }

           return uc $to | ($from ^ uc $from);
   }

This changes the sentence to "this is a SUcCess case."

Just to show that C programmers can write C in any programming language, if you prefer a more C-like solution, the following script makes the substitution have the same case, letter by letter, as the original. (It also happens to run about 240% slower than the Perlish solution.) If the substitution has more characters than the string being substituted, the case of the last character is used for the rest of the substitution.

   # Original by Nathan Torkington, massaged by Jeffrey Friedl
   #
   sub preserve_case($$)
   {
           my ($old, $new) = @_;
           my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
           my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
           my ($len) = $oldlen < $newlen ? $oldlen : $newlen;

           for ($i = 0; $i < $len; $i++) {
                   if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
                           $state = 0;
                   } elsif (lc $c eq $c) {
                           substr($new, $i, 1) = lc(substr($new, $i, 1));
                           $state = 1;
                   } else {
                           substr($new, $i, 1) = uc(substr($new, $i, 1));
                           $state = 2;
                   }
           }
           # finish up with any remaining new (for when new is longer than old)
           if ($newlen > $oldlen) {
                   if ($state == 1) {
                           substr($new, $oldlen) = lc(substr($new, $oldlen));
                   } elsif ($state == 2) {
                           substr($new, $oldlen) = uc(substr($new, $oldlen));
                   }
           }
           return $new;
   }

— devnull

Note that it is limited to ASCII letters.

— Stéphane Chazelas

5

If you trim the replacement to pqrstu, try this:

Input:

abcdef
Abcdef
AbCdEf
ABcDeF

Output:

$ perl -lpe 's/$_/$_^lc($_)^"pqrstu"/ei' file
pqrstu
Pqrstu
PqRsTu
PQrStU

If you want to replace with pqrstuvxyz, maybe this:

$ perl -lne '@c=unpack("(A4)*",$_);
    $_ =~ s/$_/$_^lc($_)^"pqrstu"/ei;
    $c[0] =~ s/$c[0]/$c[0]^lc($c[0])^"vxyz"/ei;
    print $_,$c[0]' file
pqrstuvxyz
PqrstuVxyz
PqRsTuVxYz
PQrStUVXyZ

I cannot find any rule that maps ABcDeF -> PQrStUvxyz.

— cuonglm

Note that it is limited to ASCII letters.

— Stéphane Chazelas

3

This will do what you described.

sed -i.bak -e "s/abcdef/pqrstuvxyz/g" \
 -e "s/AbCdEf/PqRsTuVxYz/g" \
 -e "s/Abcdef/Pqrstuvxyz/g" files/src

— Unx