Multiprocessing - Pipe vs Queue

What are the fundamental differences between queues and pipes in Python's multiprocessing package?

In what scenarios should one be chosen over the other? When is it advantageous to use Pipe()? When is it advantageous to use Queue()?

— Jonathan

A Pipe() can only have two endpoints.
A Queue() can have multiple producers and consumers.
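A minimal sketch of the two-endpoint model, assuming Python 3: a parent and one child each hold one end of a single Pipe(). Because Pipe() is duplex by default, both ends can send() and recv(). The function names echo_upper and ping_pong are hypothetical, chosen only for illustration.

```python
from multiprocessing import Pipe, Process


def echo_upper(conn):
    # Child side: reply to each string with an upper-case copy until None arrives.
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(msg.upper())
    conn.close()


def ping_pong(words):
    # Pipe() returns exactly two Connection objects; with the default
    # duplex=True each end can both send and receive.
    parent_conn, child_conn = Pipe()
    child = Process(target=echo_upper, args=(child_conn,))
    child.start()
    replies = []
    for word in words:
        parent_conn.send(word)
        replies.append(parent_conn.recv())
    parent_conn.send(None)  # sentinel: ask the child to stop
    child.join()
    parent_conn.close()
    return replies


if __name__ == "__main__":
    print(ping_pong(["pipe", "queue"]))  # ['PIPE', 'QUEUE']
```

A third process should not share either end: the multiprocessing documentation warns that data in a pipe may become corrupted if two processes or threads read from or write to the same end at the same time.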

When to use them

If you need more than two points to communicate, use a Queue().
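A sketch of the many-to-many case, assuming Python 3: two producer processes and three consumer processes share one task Queue(), and the consumers report back through a second Queue(). The names producer, consumer and fan_out, and the None sentinel convention, are illustrative choices rather than anything the library prescribes.

```python
from multiprocessing import Process, Queue


def producer(tasks, start, count):
    # Each producer puts its own block of integers onto the shared task queue.
    for n in range(start, start + count):
        tasks.put(n)


def consumer(tasks, results):
    # Each consumer squares numbers until it receives the None sentinel.
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put(n * n)


def fan_out(num_producers=2, num_consumers=3, per_producer=5):
    tasks, results = Queue(), Queue()
    producers = [Process(target=producer, args=(tasks, i * per_producer, per_producer))
                 for i in range(num_producers)]
    consumers = [Process(target=consumer, args=(tasks, results))
                 for _ in range(num_consumers)]
    for p in producers + consumers:
        p.start()
    for p in producers:
        p.join()  # all real tasks are now in the queue, ahead of the sentinels
    for _ in consumers:
        tasks.put(None)  # one sentinel per consumer
    # Drain results before joining consumers: joining a process whose queued
    # items have not been consumed yet can deadlock.
    squares = sorted(results.get() for _ in range(num_producers * per_producer))
    for c in consumers:
        c.join()
    return squares


if __name__ == "__main__":
    print(fan_out())  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```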

If you need absolute performance, a Pipe() is faster, because Queue() is built on top of Pipe().

Performance benchmarking

Let's assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using Pipe() and Queue(). This is on a ThinkPad T61 running Ubuntu 11.10 and Python 2.7.2.

FYI, I threw in results for JoinableQueue() as a bonus. JoinableQueue() accounts for tasks when queue.task_done() is called (it doesn't even know about the specific task; it just counts the unfinished tasks in the queue), so that queue.join() knows the work is finished.
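A small sketch of that bookkeeping, assuming Python 3: every put() raises the queue's count of unfinished tasks, every task_done() lowers it, and join() returns once the count reaches zero. The worker never exits on its own, so it is started as a daemon, like reader_proc() in the JoinableQueue benchmark below. The names worker and sum_with_joinable_queue are hypothetical.

```python
from multiprocessing import JoinableQueue, Process, Value


def worker(queue, total):
    # Loops forever; marks each item done only after it has been handled.
    while True:
        item = queue.get()
        with total.get_lock():
            total.value += item
        queue.task_done()  # one task_done() per item taken with get()


def sum_with_joinable_queue(items):
    queue = JoinableQueue()
    total = Value("i", 0)  # shared C int with its own lock
    w = Process(target=worker, args=(queue, total), daemon=True)
    w.start()
    for item in items:
        queue.put(item)  # increments the unfinished-task count
    queue.join()  # blocks until task_done() has been called for every put()
    return total.value


if __name__ == "__main__":
    print(sum_with_joinable_queue(range(10)))  # 45
```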

The code for each is at the bottom of this answer.

mpenning@mpenning-T61:~$ python multi_pipe.py 
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py 
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpenning@mpenning-T61:~$ python multi_joinablequeue.py 
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$

In summary, Pipe() is about three times faster than a Queue(). Don't even think about JoinableQueue() unless you really need its benefits.
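Working the ratios out from the 1,000,000-message rows of the timings above:

```latex
\frac{t_{\text{Queue}}}{t_{\text{Pipe}}} = \frac{10.161\ \text{s}}{3.173\ \text{s}} \approx 3.20,
\qquad
\frac{t_{\text{JoinableQueue}}}{t_{\text{Pipe}}} = \frac{15.853\ \text{s}}{3.173\ \text{s}} \approx 5.00,
\qquad
\log_{10} 3.20 \approx 0.51
```

So on this machine Queue() cost roughly 3.2 times as much as Pipe() per message, which on a log scale is about half an order of magnitude.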

Bonus material 2

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary under many conditions, but infrequently fails with certain inputs.

Normally we get clues to the failure when the entire Python process crashes; however, you don't get unsolicited crash tracebacks printed to the console if the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash information is to wrap the entire multiprocessing function in a try/except and use traceback.print_exc():

import traceback
def run(self, args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except Exception:
        print("FATAL: reader({0}) exited while multiprocessing".format(args))
        traceback.print_exc()

Now, when you find a crash, you see something like:

FATAL: reader([{'crash': 'this'}]) exited while multiprocessing
Traceback (most recent call last):
  File "foo.py", line 19, in __init__
    self.run(args)
  File "foo.py", line 46, in run
KeyError: 'that'
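The same idea can be packaged as a decorator for functions handed to multiprocessing.Pool. This is a sketch, assuming Python 3; log_crashes, reader and read_all are hypothetical names. Unlike the snippet above, it re-raises after printing, so Pool.map() still reports the failure to the parent.

```python
import functools
import sys
import traceback
from multiprocessing import Pool


def log_crashes(func):
    # Hypothetical helper: report which call failed plus its traceback, then re-raise.
    @functools.wraps(func)  # keeps the original name so Pool can pickle the function
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            print("FATAL: {0}{1!r} exited while multiprocessing".format(func.__name__, args),
                  file=sys.stderr)
            traceback.print_exc()  # printed from inside the worker process
            raise  # Pool.map() re-raises the same exception in the parent
    return wrapper


@log_crashes
def reader(record):
    return record["that"]


def read_all(records):
    with Pool(2) as pool:
        return pool.map(reader, records)


if __name__ == "__main__":
    print(read_all([{"that": 1}, {"that": 2}]))  # [1, 2]
```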

Source code:

"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time

def reader_proc(pipe):
    ## Read from the pipe; this will be spawned as a separate Process
    p_output, p_input = pipe
    p_input.close()    # We are only reading
    while True:
        msg = p_output.recv()    # Read from the output pipe and do nothing
        if msg=='DONE':
            break

def writer(count, p_input):
    for ii in range(0, count):
        p_input.send(ii)             # Write 'count' numbers into the input pipe
    p_input.send('DONE')

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        # Pipe() is duplex by default; this benchmark only sends one way:  p_input ------> p_output
        p_output, p_input = Pipe()  # writer() writes to p_input from _this_ process
        reader_p = Process(target=reader_proc, args=((p_output, p_input),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process

        p_output.close()       # We no longer need this part of the Pipe()
        _start = time.time()
        writer(count, p_input) # Send a lot of stuff to reader_proc()
        p_input.close()
        reader_p.join()
        print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_queue.py
"""

from multiprocessing import Process, Queue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = Queue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:             
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=(pqueue,))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader_proc()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to Queue() took {1} seconds".format(count, 
            (time.time() - _start)))

"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        queue.task_done()

def writer(count, queue):
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
        # reader_proc() reads from jqueue as a different process...
        reader_p = Process(target=reader_proc, args=(jqueue,))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process
        _start = time.time()
        writer(count, jqueue) # Send a lot of stuff to reader_proc() (in different process)
        jqueue.join()         # Wait for the reader to finish
        print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count, 
            (time.time() - _start)))

— Mike Pennington

@Jonathan "In summary, Pipe() is about three times faster than a Queue()"

— James Brady

Excellent! Good answer, and nice that you provided benchmarks! I only have two tiny quibbles: (1) "orders of magnitude faster" is a bit of an overstatement. The difference is 3x, which is about a third of one order of magnitude. Just saying. ;-) And (2) a fairer comparison would be running N workers, each communicating with the main thread via a point-to-point pipe.

— JJC
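A rough sketch of the arrangement described in that comment, assuming Python 3.3+ for multiprocessing.connection.wait(): each of N workers gets its own one-way Pipe() back to the parent, which waits on all the receiving ends at once. Timing code is left out, and the names worker and gather_from_pipes are hypothetical.

```python
from multiprocessing import Pipe, Process
from multiprocessing.connection import wait


def worker(conn, worker_id, count):
    # Each worker owns the sending end of its own point-to-point pipe.
    for n in range(count):
        conn.send((worker_id, n))
    conn.send(None)  # sentinel: this worker has finished
    conn.close()


def gather_from_pipes(num_workers=4, count=1000):
    readers, procs = [], []
    for worker_id in range(num_workers):
        r, w = Pipe(duplex=False)  # r can only recv(), w can only send()
        p = Process(target=worker, args=(w, worker_id, count))
        p.start()
        w.close()  # the parent keeps only the receiving end
        readers.append(r)
        procs.append(p)
    received = 0
    while readers:
        for r in wait(readers):  # blocks until at least one pipe has data
            msg = r.recv()
            if msg is None:
                readers.remove(r)
                r.close()
            else:
                received += 1
    for p in procs:
        p.join()
    return received


if __name__ == "__main__":
    print(gather_from_pipes())  # 4000
```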

Re your "Bonus material"... Yep. If you're subclassing Process, put the bulk of the 'run' method inside a try block. That's also a useful way to log exceptions. To replicate the normal exception output: sys.stderr.write(''.join(traceback.format_exception(*sys.exc_info())))

— travc
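A sketch of that suggestion, assuming Python 3: a Process subclass wraps the body of run() in try/except and writes the usual traceback text to stderr. LoggingProcess and crash are hypothetical names. Exiting with sys.exit(1) is one choice that sets the child's exitcode to 1 without a second traceback; the parent can check exitcode after join() to see which child failed.

```python
import sys
import traceback
from multiprocessing import Process


class LoggingProcess(Process):
    # Hypothetical subclass: the whole body of run() sits inside try/except.
    def run(self):
        try:
            super().run()  # invokes the target= callable passed to the constructor
        except Exception:
            # Reproduce the normal "Traceback (most recent call last): ..." output.
            sys.stderr.write("".join(traceback.format_exception(*sys.exc_info())))
            sys.exit(1)  # report failure through exitcode


def crash():
    raise KeyError("that")


if __name__ == "__main__":
    p = LoggingProcess(target=crash)
    p.start()
    p.join()
    print(p.exitcode)  # 1
```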

@alexpinho98 - but you'd need some out-of-band data, and an associated signalling mode, to indicate that what you are sending is not regular data but error data. Seeing as the originating process is already in an unpredictable state, this may be too much to ask.

— scytale
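One way to get that out-of-band signalling is to tag every message, so an error report can never be mistaken for data. A sketch, assuming Python 3; tagged_worker, invert and collect are hypothetical names. It only helps while the worker is still healthy enough to run its except block, which is exactly the caveat raised above.

```python
import traceback
from multiprocessing import Process, Queue


def tagged_worker(func, items, out):
    # Every message is a (tag, payload) pair, so an error report cannot be
    # mistaken for an ordinary result.
    for item in items:
        try:
            out.put(("ok", func(item)))
        except Exception:
            out.put(("error", traceback.format_exc()))
    out.put(("done", None))  # end-of-stream marker


def invert(x):
    return 1 / x


def collect(items):
    out = Queue()
    p = Process(target=tagged_worker, args=(invert, items, out))
    p.start()
    results, errors = [], []
    while True:
        tag, payload = out.get()
        if tag == "done":
            break
        (results if tag == "ok" else errors).append(payload)
    p.join()  # safe: everything the worker queued has been read
    return results, errors


if __name__ == "__main__":
    results, errors = collect([1, 2, 0, 4])
    print(results)  # [1.0, 0.5, 0.25]
    print(len(errors))  # 1
```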

@JJC To quibble with your quibble, 3x is about half an order of magnitude, not a third: sqrt(10) ≈ 3.

— jab

One additional feature of Queue() worth noting is the feeder thread. This section of the documentation notes: "When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe." An unlimited number of items (or up to maxsize items, if a maxsize is given) can be inserted into a Queue() without any call to queue.put() blocking. This lets you store multiple items in a Queue() until your program is ready to process them.
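A single-process sketch of that behaviour, assuming Python 3: with the default maxsize (no limit), put_nowait() keeps succeeding, while a bounded Queue(maxsize) raises queue.Full once it holds maxsize items. The helper name fill is hypothetical.

```python
import queue  # multiprocessing.Queue raises the standard queue.Full / queue.Empty
from multiprocessing import Queue


def fill(maxsize, attempts):
    # Count how many put_nowait() calls succeed before the queue reports Full.
    q = Queue(maxsize)  # maxsize <= 0 means "no limit"
    accepted = 0
    for i in range(attempts):
        try:
            q.put_nowait(i)  # never waits: returns at once or raises queue.Full
        except queue.Full:
            break
        accepted += 1
    # Drain the queue so its feeder thread has nothing left to flush at exit.
    for _ in range(accepted):
        q.get()
    return accepted


if __name__ == "__main__":
    print(fill(0, 1000))  # 1000
    print(fill(3, 10))    # 3
```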

Pipe(), on the other hand, only has a finite amount of storage for items that have been sent to one connection but not yet received by the other. Once this storage is used up, calls to connection.send() will block until there is space to write the entire item. This stalls the thread doing the writing until some other thread reads from the pipe. Connection objects give you access to the underlying file descriptor (via fileno()). On *nix systems, you can prevent connection.send() calls from blocking by using the os.set_blocking() function. However, this will cause problems if you try to send a single item that does not fit in the pipe's buffer. Recent versions of Linux let you increase the size of a pipe's buffer, but the maximum size allowed varies with the system configuration. You should therefore never rely on Pipe() to buffer data: calls to connection.send() can block until the data is read from the pipe elsewhere.
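A Unix-only sketch of the non-blocking variant, assuming Python 3: nothing reads from the pipe, so once the kernel buffer is full, send_bytes() raises BlockingIOError instead of waiting. It relies on Pipe(duplex=False) being backed by os.pipe() on POSIX; the helper name messages_until_full and the 256-byte payload are illustrative choices, and the count it returns depends on the OS and its configuration.

```python
import os
from multiprocessing import Pipe


def messages_until_full(chunk_size=256):
    # Unix-only: with duplex=False, Pipe() is backed by os.pipe() on POSIX.
    reader, writer = Pipe(duplex=False)
    os.set_blocking(writer.fileno(), False)  # send_bytes() now raises instead of waiting
    # Small messages keep each write well under PIPE_BUF, so a write either
    # fits whole or fails, rather than leaving half a message in the pipe.
    payload = b"x" * chunk_size
    sent = 0
    try:
        while True:  # nobody reads, so the kernel buffer eventually fills up
            writer.send_bytes(payload)
            sent += 1
    except BlockingIOError:
        pass
    finally:
        writer.close()
        reader.close()
    return sent


if __name__ == "__main__" and os.name == "posix":
    print(messages_until_full())  # varies by OS and pipe buffer size
```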

In summary, a Queue() is a better choice than a Pipe() when you need to buffer data, even if you only need to communicate between two points.

— Roger Iyengar