227

How do I get the string after a specific substring?

For example, I want to get the string after "world" in my_string="hello python world , i'm a beginner "

python string

400

The easiest way is probably just to split on your target word:

my_string="hello python world , i'm a beginner "
print(my_string.split("world", 1)[1])

split takes the word (or character) to split on and, optionally, a limit on the number of splits.

In this example it splits on "world" and limits the operation to a single split.
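
A minimal sketch of why the limit matters, using a second, made-up string (repeated) in which the target appears twice. The variable names are only illustrative.

```python
my_string = "hello python world , i'm a beginner "

# With a limit of 1 the string is cut once, at the first "world".
parts = my_string.split("world", 1)
after = parts[1]

# Hypothetical input with two occurrences of the target word.
repeated = "a world b world c"
unlimited = repeated.split("world")    # cut at every occurrence
limited = repeated.split("world", 1)   # cut at the first occurrence only

print(repr(after))
print(unlimited)
print(limited)
```

Without the limit, index 1 holds only the text between the first two occurrences, so the rest of the string would be lost.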

— Joran Beasley

If I want to split a text on the word 'low' and it contains 'lower' before it, this will not work!

— Leonardo Hermoso

1

You would simply split twice: target.split('lower',1)[-1].split('low',1)[-1]

— Joran Beasley

What if the sentence was "hello python Megaworld world , i'm a beginner"? How can I make it look at the whole word and not at part of another word such as 'Megaworld'? Thanks

— pbou

1

Then the string you search for is " world " ... or use a regex with word boundaries.

— Joran Beasley
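
One way the word-boundary suggestion could look, as a sketch rather than the commenter's own code. It assumes the target is a plain word, so \b on both sides is meaningful.

```python
import re

sentence = "hello python Megaworld world , i'm a beginner"

# \b matches only at a word boundary, so the "world" inside
# "Megaworld" is skipped and the standalone word is found.
match = re.search(r"\bworld\b", sentence)
after = sentence[match.end():] if match else ""

# For comparison: a plain split stops inside "Megaworld".
naive = sentence.split("world", 1)[1]

print(repr(after))
print(repr(naive))
```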

6

my_string.partition("world")[-1] (or ...[2]) is faster.

— Martijn Pieters

66

s1 = "hello python world , i'm a beginner "
s2 = "world"

print(s1[s1.index(s2) + len(s2):])

If you want to handle the case where s2 is not present in s1, use s1.find(s2) instead of index. If that call returns -1, then s2 is not in s1.
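
A minimal sketch of the find variant. The helper name text_after and its default parameter are hypothetical, chosen here for illustration.

```python
def text_after(s1, s2, default=""):
    """Return the part of s1 after the first s2, or default if s2 is absent."""
    pos = s1.find(s2)  # -1 when s2 is not in s1; index() would raise ValueError
    if pos == -1:
        return default
    return s1[pos + len(s2):]


s1 = "hello python world , i'm a beginner "
print(repr(text_after(s1, "world")))
print(repr(text_after(s1, "Monty")))
```

Checking for -1 explicitly matters: using it directly in a slice would silently count from the end of the string.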

— arshajii

You get distinct ids (several thousand apart) ... I'm not sure you aren't creating unnecessary substrings with this.

— Joran Beasley

@JoranBeasley, we only call index(), len() and slice. There is no reason for index() and len() to create substrings, and if they do (which I find hard to believe), that is just an unnecessary implementation detail. The same goes for slice: there is no reason for it to create substrings other than the one returned.

— shx2

@shx2 print( s1[s1.index(s2) + len(s2):] is s1[s1.index(s2) + len(s2):])

— Joran Beasley

@JoranBeasley, what point are you trying to make with this snippet? That different objects are returned on multiple calls? By "unnecessary substrings" I mean substrings other than the one returned, i.e. substrings that do not need to be created in order to derive the result.

— shx2

57

I'm surprised nobody mentioned partition.

def substring_after(s, delim):
    return s.partition(delim)[2]

IMHO, this solution is more readable than @arshajii's. Other than that, I think @arshajii's is the best because it is the fastest: it does not create any unnecessary copies/substrings.

— shx2

2

This is a nice solution, and it handles the case where the substring is not part of the base string nicely.

— mattmc3

You get distinct ids (several thousand apart) ... I'm not sure you aren't creating unnecessary substrings with this (and I'm too lazy to profile it properly).

— Joran Beasley

1

@JoranBeasley, it clearly does not create unnecessary substrings. I think you misread my answer.

— shx2

(the same goes for arashi's, I think ...)

— Joran Beasley

3

Also, this is faster than str.split(..., 1).

— Martijn Pieters

20

You want to use str.partition():

>>> my_string.partition("world")[2]
" , i'm a beginner "

because this option is faster than the alternatives.

Note that this produces an empty string if the delimiter is missing:

>>> my_string.partition("Monty")[2]  # delimiter missing
''

If you want to get the original string back in that case, test whether the second value returned from str.partition() is non-empty:

prefix, success, result = my_string.partition(delimiter)
if not success: result = prefix
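
The two behaviours above can be wrapped in small helpers. This is only a sketch, and the names after_or_empty and after_or_original are hypothetical.

```python
def after_or_empty(s, delimiter):
    # Third element of the 3-tuple; '' when the delimiter is missing.
    return s.partition(delimiter)[2]


def after_or_original(s, delimiter):
    # The middle element is the delimiter itself, or '' when it was not found.
    prefix, found, result = s.partition(delimiter)
    return result if found else prefix


my_string = "hello python world , i'm a beginner "
print(repr(after_or_empty(my_string, "Monty")))
print(repr(after_or_original(my_string, "Monty")))
```

Note that str.partition() raises ValueError for an empty delimiter, so callers passing user input may want to check for that first.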

You could also use str.split() with a limit of 1:

>>> my_string.split("world", 1)[-1]
" , i'm a beginner "
>>> my_string.split("Monty", 1)[-1]  # delimiter missing
"hello python world , i'm a beginner "

However, this option is slower. In the best-case scenario, str.partition() is about 15% faster than str.split():

                                missing        first         lower         upper          last
      str.partition(...)[2]:  [3.745 usec]  [0.434 usec]  [1.533 usec]  <3.543 usec>  [4.075 usec]
str.partition(...) and test:   3.793 usec    0.445 usec    1.597 usec    3.208 usec    4.170 usec
      str.split(..., 1)[-1]:  <3.817 usec>  <0.518 usec>  <1.632 usec>  [3.191 usec]  <4.173 usec>
            % best vs worst:         1.9%         16.2%          6.1%          9.9%          2.3%

This shows timings per execution, with inputs where the delimiter is either missing (worst-case scenario), placed first (best-case scenario), or in the lower half, upper half or last position. The fastest time is marked with [...] and <...> marks the worst.

The above table was produced by a comprehensive time trial of all three options, using the script below. I ran the tests on Python 3.7.4 on a 2017 model 15" MacBook Pro with a 2.9 GHz Intel Core i7 and 16 GB of RAM.

This script generates random sentences with and without a randomly selected delimiter present and, if present, at different positions in the generated sentence. It runs the tests in random order with repeats (producing the fairest results by accounting for random OS events taking place during testing), and then prints a table of the results:

import random
from itertools import product
from operator import itemgetter
from pathlib import Path
from timeit import Timer

setup = "from __main__ import sentence as s, delimiter as d"
tests = {
    "str.partition(...)[2]": "r = s.partition(d)[2]",
    "str.partition(...) and test": (
        "prefix, success, result = s.partition(d)\n"
        "if not success: result = prefix"
    ),
    "str.split(..., 1)[-1]": "r = s.split(d, 1)[-1]",
}

placement = "missing first lower upper last".split()
delimiter_count = 3

wordfile = Path("/usr/dict/words")  # Linux
if not wordfile.exists():
    # macos
    wordfile = Path("/usr/share/dict/words")
words = [w.strip() for w in wordfile.open()]

def gen_sentence(delimiter, where="missing", l=1000):
    """Generate a random sentence of length l

    The delimiter is incorporated according to the value of where:

    "missing": no delimiter
    "first":   delimiter is the first word
    "lower":   delimiter is present in the first half
    "upper":   delimiter is present in the second half
    "last":    delimiter is the last word

    """
    possible = [w for w in words if delimiter not in w]
    sentence = random.choices(possible, k=l)
    half = l // 2
    if where == "first":
        # best case, at the start
        sentence[0] = delimiter
    elif where == "lower":
        # lower half
        sentence[random.randrange(1, half)] = delimiter
    elif where == "upper":
        sentence[random.randrange(half, l)] = delimiter
    elif where == "last":
        sentence[-1] = delimiter
    # else: worst case, no delimiter

    return " ".join(sentence)

delimiters = random.choices(words, k=delimiter_count)
timings = {}
sentences = [
    # where, delimiter, sentence
    (w, d, gen_sentence(d, w)) for d, w in product(delimiters, placement)
]
test_mix = [
    # label, test, where, delimiter sentence
    (*t, *s) for t, s in product(tests.items(), sentences)
]
random.shuffle(test_mix)

for i, (label, test, where, delimiter, sentence) in enumerate(test_mix, 1):
    print(f"\rRunning timed tests, {i:2d}/{len(test_mix)}", end="")
    t = Timer(test, setup)
    number, _ = t.autorange()
    results = t.repeat(5, number)
    # best time for this specific random sentence and placement
    timings.setdefault(
        label, {}
    ).setdefault(
        where, []
    ).append(min(dt / number for dt in results))

print()

scales = [(1.0, 'sec'), (0.001, 'msec'), (1e-06, 'usec'), (1e-09, 'nsec')]
width = max(map(len, timings))
rows = []
bestrow = dict.fromkeys(placement, (float("inf"), None))
worstrow = dict.fromkeys(placement, (float("-inf"), None))

for row, label in enumerate(tests):
    columns = []
    worst = float("-inf")
    for p in placement:
        timing = min(timings[label][p])
        if timing < bestrow[p][0]:
            bestrow[p] = (timing, row)
        if timing > worstrow[p][0]:
            worstrow[p] = (timing, row)
        worst = max(timing, worst)
        columns.append(timing)

    scale, unit = next((s, u) for s, u in scales if worst >= s)
    rows.append(
        [f"{label:>{width}}:", *(f" {c / scale:.3f} {unit} " for c in columns)]
    )

colwidth = max(len(c) for r in rows for c in r[1:])
print(' ' * (width + 1), *(p.center(colwidth) for p in placement), sep="  ")
for r, row in enumerate(rows):
    for c, p in enumerate(placement, 1):
        if bestrow[p][1] == r:
            row[c] = f"[{row[c][1:-1]}]"
        elif worstrow[p][1] == r:
            row[c] = f"<{row[c][1:-1]}>"
    print(*row, sep="  ")

percentages = []
for p in placement:
    best, worst = bestrow[p][0], worstrow[p][0]
    ratio = ((worst - best) / worst)
    percentages.append(f"{ratio:{colwidth - 1}.1%} ")

print("% best vs worst:".rjust(width + 1), *percentages, sep="  ")

— Martijn Pieters

Great answer! Especially because you give the reason why this is better :P

— Joran Beasley

18

If you want to do this using regex, you could simply use a non-capturing group to match the word "world" and then grab everything after it, like so:

(?:world).*

The example string is tested here.
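
A sketch of the pattern in use, with two additions that are assumptions of this example rather than part of the answer: a capturing group for the remainder, and a guard for the case where the word is missing. The function name after_with_regex is hypothetical.

```python
import re


def after_with_regex(text):
    # (?:world) matches the word without capturing it; (.*) captures the rest.
    # re.DOTALL lets "." match newlines too, so a multi-line remainder is kept.
    match = re.search(r"(?:world)(.*)", text, flags=re.DOTALL)
    # re.search returns None when nothing matches, so guard before .group().
    return match.group(1) if match else None


print(repr(after_with_regex("hello python world , i'm a beginner ")))
print(after_with_regex("hello python"))
```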

— Tadgh

28

Some people, when confronted with a problem, think "I know, I'll use a regular expression." ... Now you have 2 problems ...

— Joran Beasley

2

Haha, my mistake, I thought this was tagged regex so I tried to give a regex answer. Oh well, it's there now.

— Tadgh

1

It's all good ... it's certainly one way of skinning this cat ... overkill for this problem though (imho)

— Joran Beasley

The non-capturing group link no longer points to the right thing.

— Apteryx

1

For those interested, here is the full code: result = re.search(r"(?:world)(.*)", "hello python world , i'm a beginner ").group(1)

— RaduS

5

You can use a package called "substring". Just type "pip install substring". You can get a substring just by specifying the start and end characters/indices.

For example:

import substring

s = substring.substringByChar("abcdefghijklmnop", startChar="d", endChar="n")

print(s)

Output:

s = defghijklmn

— ศรีรามเวติ

3

It's an old question, but I faced the very same scenario: I needed to split a string using the word "low" as the delimiter. The problem for me was that the same string also contained the words "below" and "lower".

I solved it using the re module, this way:

import re

string = '...below...as higher prices mean lower demand to be expected. Generally, a high reading is seen as negative (or bearish), while a low reading is seen as positive (or bullish) for the Korean Won.'

Use re.split with a regex to match the exact word:

stringafterword = re.split('\\blow\\b',string)[-1]
print(stringafterword)
' reading is seen as positive (or bullish) for the Korean Won.'

The generic code is:

re.split('\\bTHE_WORD_YOU_WANT\\b',string)[-1]
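
A sketch of that generic form as a function. The name after_whole_word and the first flag are hypothetical, and the sample text is made up. It assumes the word starts and ends with letters, digits or underscores, since \b is only meaningful next to such characters.

```python
import re


def after_whole_word(text, word, first=True):
    # re.escape keeps characters such as "." or "(" in word literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    if first:
        # maxsplit=1 cuts once, so [-1] is everything after the FIRST match.
        return re.split(pattern, text, maxsplit=1)[-1]
    # No limit: [-1] is whatever follows the LAST match.
    return re.split(pattern, text)[-1]


text = "below the line, lower demand, a low reading, then a low close"
print(repr(after_whole_word(text, "low")))
print(repr(after_whole_word(text, "low", first=False)))
```

Without a limit, re.split(...)[-1] returns the text after the last occurrence of the word, which may or may not be what is wanted. If the word is absent, both variants return the whole input.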

Hope this helps someone!

— Leonardo Hermoso

1

Perhaps you could also just use: string.partition(" low ")[2]? (Note the spaces on either side of low.)

— Mtl Dev

1

Try this general approach:

import re
my_string="hello python world , i'm a beginner "
p = re.compile("world(.*)")
print (p.findall(my_string))

#[" , i'm a beginner "]

— Hadij

1

Python 3.9 adds a new removeprefix method:

>>> 'TestHook'.removeprefix('Test')
'Hook'
>>> 'BaseTestCase'.removeprefix('Test')
'BaseTestCase'

Documentation: https://docs.python.org/3.9/library/stdtypes.html#str.removeprefix
Announcement: https://docs.python.org/3.9/whatsnew/3.9.html
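
As the second example above shows, removeprefix only strips text at the very start of the string. A sketch of an equivalent for interpreters older than 3.9 follows (the helper name remove_prefix is hypothetical), compared with partition on the string from the question.

```python
def remove_prefix(s, prefix):
    # Mirrors the behaviour of str.removeprefix (Python 3.9+):
    # strip prefix only if s starts with it, otherwise return s unchanged.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


s = "hello python world , i'm a beginner "

# "world" is not at the start, so nothing is removed here.
unchanged = remove_prefix(s, "world")

# To get the text after a substring in the middle, partition still applies.
after = s.partition("world")[2]

print(repr(unchanged))
print(repr(after))
```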

— gntskn
