How do I implement "one-to-many" and "many-to-many" sequence prediction in Keras?

I am struggling to interpret the difference in Keras code between one-to-many sequence labeling (e.g. classification of single images) and many-to-many sequence labeling (e.g. classification of image sequences). I frequently see two different kinds of code:

Type 1 is where no TimeDistributed is applied, like this:

model=Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid", input_shape=[1, 56,14]))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Reshape((56*14,)))
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))

Type 2 is where TimeDistributed is applied, like this:

model = Sequential()

model.add(InputLayer(input_shape=(5, 224, 224, 3)))
model.add(TimeDistributed(Convolution2D(64, (3, 3))))
model.add(TimeDistributed(MaxPooling2D((2,2), strides=(2,2))))
model.add(LSTM(10))
model.add(Dense(3))

My questions are:

Is my assumption correct that Type 1 is one-to-many and Type 2 is many-to-many? Or does TimeDistributed have no relevance in this respect?
In either the one-to-many or the many-to-many case, is the last dense layer supposed to be 1 node "long" (emitting only one value at a time), with the preceding recurrent layer responsible for determining how many 1-long values to emit? Or is the last dense layer supposed to consist of N nodes, where N = max sequence length? If so, what is the point of using an RNN here, when we could produce a similar output with N parallel "vanilla" estimators?
How do I define the number of timesteps in RNNs? Is it somehow correlated with the output sequence length, or is it just a hyperparameter to tune?
In the case of my Type 1 example above, what is the point of applying an LSTM when the model emits only one class prediction (out of the possible nb_classes)? What happens if one omits the LSTM layer?

— Hendrik

Can you provide the summary of both models?

— Fadi Bakoura

Answers:

The point of using any recurrent layer is to make the output depend not on a single item in isolation, but on a sequence of items: the output of the layer's operation on one item in the sequence is the result of both that item and every item before it in the sequence. The number of timesteps defines how long such a sequence is, that is, how many items are handled together in one sequence and influence each other's resulting output.
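To make the dependency on earlier items concrete, here is a minimal NumPy sketch. It uses a plain tanh recurrence rather than a real LSTM, and the function name `simple_rnn` and all sizes are assumptions chosen only for illustration. Perturbing one item leaves the outputs before it untouched and changes the outputs from that timestep onward.

```python
import numpy as np

def simple_rnn(sequence, w_in, w_rec):
    """Run a plain tanh recurrence and return the hidden state after every timestep."""
    state = np.zeros(w_rec.shape[0])
    states = []
    for item in sequence:  # one loop iteration per timestep
        state = np.tanh(item @ w_in + state @ w_rec)
        states.append(state)
    return np.stack(states)

rng = np.random.default_rng(0)
w_in = rng.normal(size=(3, 4))   # 3 input features -> 4 hidden units
w_rec = rng.normal(size=(4, 4))  # hidden-to-hidden weights
seq = rng.normal(size=(6, 3))    # a sequence of 6 timesteps

changed = seq.copy()
changed[2] += 1.0                # perturb only the item at timestep 2

before = simple_rnn(seq, w_in, w_rec)
after = simple_rnn(changed, w_in, w_rec)

# Outputs for timesteps 0 and 1 cannot see the perturbed item.
earlier_unchanged = np.allclose(before[:2], after[:2])
# Outputs from timestep 2 onward carry the perturbation forward through the state.
later_changed = not np.allclose(before[2:], after[2:])
print(earlier_unchanged, later_changed)
```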

An LSTM layer accepts input of the form (number_of_timesteps, dimension_of_each_item). If the parameter return_sequences is set to False, which is the default, the layer "compounds" the inputs of all timesteps into a single output. If you consider a sequence of, say, 10 items, an LSTM layer with return_sequences set to False produces a single output item from that sequence, and the attributes of this single item are a result of all the items (timesteps) in the sequence. This is what you want in the case of a many-to-one design.

An LSTM layer with return_sequences set to True produces an output for every item (timestep) in the input sequence. At any timestep, the output depends not only on the item currently being processed, but also on the previous items in the sequence. This is what you want in the case of a many-to-many design.
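The difference between the two settings is easiest to see in the output shapes. The following is a small sketch, assuming a recent Keras installation where a layer can be called directly on a NumPy array; the batch size, sequence length, feature count and unit count are arbitrary.

```python
import numpy as np
from keras import layers

# A batch of 2 sequences, each with 10 timesteps of 8 features (arbitrary sizes).
x = np.random.rand(2, 10, 8).astype("float32")

# return_sequences=False (the default): one output vector per sequence.
last_only = layers.LSTM(4)(x)

# return_sequences=True: one output vector per timestep.
every_step = layers.LSTM(4, return_sequences=True)(x)

print(tuple(last_only.shape))   # (2, 4)
print(tuple(every_step.shape))  # (2, 10, 4)
```

The time axis disappears in the first case and is preserved in the second, which is why the first fits many-to-one designs and the second fits many-to-many designs.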

Since an LSTM layer takes a sequence of items as input, any layer placed before an LSTM layer in your model needs to produce a sequence as output. In the case of your Type 1 model, the first few layers do not operate on sequences, but on a single item at a time. They therefore do not produce a sequence of items for the LSTM to operate on.

Using TimeDistributed makes it possible for a layer to operate on every item in a sequence without the items affecting each other. TimeDistributed layers thus operate on sequences of items, but there is no recurrence.
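As a rough illustration of that idea, the NumPy sketch below applies the same dense transformation to each timestep separately. The helper `time_distributed` is a hypothetical stand-in written for this example, not the Keras wrapper itself. Perturbing one timestep changes only the output at that timestep.

```python
import numpy as np

def time_distributed(fn, sequence):
    """Apply fn to every timestep separately; no state is carried between steps."""
    return np.stack([fn(item) for item in sequence])

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))  # 3 input features -> 2 output features

def dense(item):
    # The same weights are shared by all timesteps.
    return item @ weights

seq = rng.normal(size=(5, 3))      # a sequence of 5 timesteps
changed = seq.copy()
changed[2] += 1.0                  # perturb only the item at timestep 2

before = time_distributed(dense, seq)
after = time_distributed(dense, changed)

# Flag the timesteps whose output changed.
differs = ~np.isclose(before, after).all(axis=1)
print(differs.tolist())  # [False, False, True, False, False]
```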

In the case of your Type 2 model, the first layers produce a sequence of length 5, and the operations performed on each item in the sequence are independent of each other, since the layers wrapped in TimeDistributed are non-recurrent. Because the LSTM layer uses the default setting return_sequences=False, it produces a single output for each such sequence of 5 items.
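A possible runnable variant of the Type 2 idea is sketched below. Two things here are assumptions that are not in the question's code: the frames are shrunk to 32x32 and the filter count to 8 to keep the sketch light, and a `TimeDistributed(Flatten())` step is added because Keras LSTM layers expect 3D input of shape (batch, timesteps, features), while the pooled feature maps still have height, width and channel axes.

```python
import keras
from keras import layers

# Smaller frames than the 224x224 used in the question (assumption, for speed).
frames, height, width, channels = 5, 32, 32, 3

inputs = keras.Input(shape=(frames, height, width, channels))
# The wrapped layers are applied to each of the 5 frames independently.
x = layers.TimeDistributed(layers.Conv2D(8, (3, 3), activation="relu"))(inputs)
x = layers.TimeDistributed(layers.MaxPooling2D((2, 2), strides=(2, 2)))(x)
# Added step (assumption): flatten each frame's feature map into one vector.
x = layers.TimeDistributed(layers.Flatten())(x)
# return_sequences defaults to False: one vector summarising all 5 frames.
x = layers.LSTM(10)(x)
outputs = layers.Dense(3, activation="softmax")(x)

model = keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 3)
```

The model maps a sequence of 5 frames to a single 3-class prediction, which is a many-to-one arrangement.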

The final number of output nodes in your model depends entirely on the use case. A single node is suitable for something like binary classification, or for producing some kind of score.

— mevoki

I think you may be able to use some earlier work of mine. In this code I generate sine waves (of random wavelength and phase), train an LSTM on a sequence of points from each sine wave, and have it output a sequence of 150 points that completes the wave.
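The way each wave is split into model inputs and target can be sketched in NumPy as follows. This mirrors the slicing used in the generators of the full script; the array sizes and the `horizon` name are assumptions for illustration, and a slice is used instead of an index plus reshape to keep the time axis.

```python
import numpy as np

horizon = 150  # number of points the model has to predict
# Dummy data with shape (waves, timesteps, features).
data = np.arange(2 * 400 * 5, dtype=float).reshape(2, 400, 5)

encoder_input = data[:, :-horizon, :]              # everything except the last 150 points
decoder_input = data[:, -horizon - 1:-horizon, :]  # the single point just before the last 150
target = data[:, -horizon:, :]                     # the last 150 points

print(encoder_input.shape, decoder_input.shape, target.shape)
# (2, 250, 5) (2, 1, 5) (2, 150, 5)
```

The decoder's first input is the last point the encoder has seen, so the predicted sequence continues directly from the observed part of the wave.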

This is the model:

    features_num=5 
    latent_dim=40

    ##
    encoder_inputs = Input(shape=(None, features_num))
    encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoder_inputs)
    encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=True)(encoded)

    encoder = Model(inputs=encoder_inputs, outputs=encoded)
    ##

    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]

    decoder_inputs=Input(shape=(1, features_num))
    decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)

    decoder_dense = Dense(features_num)

    all_outputs = []
    inputs = decoder_inputs


    states_1=encoder_states
    # Placeholder values:
    states_2=states_1; states_3=states_1; states_4=states_1

    for _ in range(1):
        # Run the decoder on the first timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)

        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]


    for _ in range(149):
        # Run the decoder on each timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)

        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]


    # Concatenate all predictions
    decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)   

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

    #model = load_model('pre_model.h5')


    print(model.summary())
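The two loops in the model above unroll the decoder 150 times, feeding each prediction back in as the next input. Stripped of the Keras details, the pattern looks roughly like the sketch below. `rollout` and `toy_step` are hypothetical names, and `toy_step` is only a stand-in for the stacked decoder LSTMs plus the dense layer.

```python
import numpy as np

def rollout(step_fn, first_input, state, n_steps):
    """Feed each prediction back in as the next input and collect all predictions."""
    outputs = []
    current = first_input
    for _ in range(n_steps):
        current, state = step_fn(current, state)
        outputs.append(current)
    return np.stack(outputs)

# Hypothetical stand-in for the trained decoder: adds the state to the input.
def toy_step(value, state):
    return value + state, state

predictions = rollout(toy_step, first_input=np.array([0.0]),
                      state=np.array([1.0]), n_steps=150)
print(predictions.shape)  # (150, 1)
```

Because every step consumes the previous step's output, the number of loop iterations, not the width of the final dense layer, determines the length of the generated sequence.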

And this is the script in its entirety:

from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed,Lambda, Dropout, Activation ,RepeatVector
from keras.callbacks import ModelCheckpoint 
import numpy as np

from keras.layers import Lambda
from keras import backend as K

from keras.models import load_model

import os


features_num=5 
latent_dim=40

##
encoder_inputs = Input(shape=(None, features_num))
encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoder_inputs)
encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
encoded = LSTM(latent_dim, return_state=False ,return_sequences=True)(encoded)
encoded = LSTM(latent_dim, return_state=True)(encoded)

encoder = Model(inputs=encoder_inputs, outputs=encoded)
##

encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs=Input(shape=(1, features_num))
decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)

decoder_dense = Dense(features_num)

all_outputs = []
inputs = decoder_inputs

# Place holder values:
states_1=encoder_states
states_2=states_1; states_3=states_1; states_4=states_1

for _ in range(1):
    # Run the decoder on one timestep
    outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
    outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
    outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
    outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)

    # Store the current prediction (we will concatenate all predictions later)
    outputs = decoder_dense(outputs_4)
    all_outputs.append(outputs)
    # Reinject the outputs as inputs for the next loop iteration
    # as well as update the states
    inputs = outputs
    states_1 = [state_h_1, state_c_1]
    states_2 = [state_h_2, state_c_2]
    states_3 = [state_h_3, state_c_3]
    states_4 = [state_h_4, state_c_4]


for _ in range(149):
    # Run the decoder on one timestep
    outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
    outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
    outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
    outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)

    # Store the current prediction (we will concatenate all predictions later)
    outputs = decoder_dense(outputs_4)
    all_outputs.append(outputs)
    # Reinject the outputs as inputs for the next loop iteration
    # as well as update the states
    inputs = outputs
    states_1 = [state_h_1, state_c_1]
    states_2 = [state_h_2, state_c_2]
    states_3 = [state_h_3, state_c_3]
    states_4 = [state_h_4, state_c_4]


# Concatenate all predictions
decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)   

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

#model = load_model('pre_model.h5')


print(model.summary())


model.compile(loss='mean_squared_error', optimizer='adam')


def create_wavelength(min_wavelength, max_wavelength, fluxes_in_wavelength, category )  :         
#category :: 0 - train ; 2 - validate ; 4- test. 1;3;5 - dead space
    c=(category+np.random.random())/6         
    k = fluxes_in_wavelength
#
    base= (np.trunc(k*np.random.random()*(max_wavelength-min_wavelength))       +k*min_wavelength)  /k
    answer=base+c/k
    return (answer)       

def make_line(length,category):
    shift= np.random.random()
    wavelength = create_wavelength(30,10,1,category)
    a=np.arange(length)
    answer=np.sin(a/wavelength+shift)
    return answer

def make_data(seq_num,seq_len,dim,category):
    data=np.array([]).reshape(0,seq_len,dim)
    for i in range (seq_num):
        mini_data=np.array([]).reshape(0,seq_len)
        for j in range (dim):
            line = make_line(seq_len,category)
            line=line.reshape(1,seq_len)            
            mini_data=np.append(mini_data,line,axis=0)
        mini_data=np.swapaxes(mini_data,1,0)
        mini_data=mini_data.reshape(1,seq_len,dim)      
        data=np.append(data,mini_data,axis=0)
    return (data)


def train_generator():
    while True:
        sequence_length = np.random.randint(150, 300)+150       
        data=make_data(1000,sequence_length,features_num,0) # category=0 in train


    #   decoder_target_data is the same as decoder_input_data but offset by one timestep

        encoder_input_data = data[:,:-150,:] # all but last 150 

        decoder_input_data = data[:,-151,:] # the one before the last 150.
        decoder_input_data=decoder_input_data.reshape((decoder_input_data.shape[0],1,decoder_input_data.shape[1]))


        decoder_target_data = (data[:, -150:, :]) # last 150        
        yield [encoder_input_data, decoder_input_data], decoder_target_data
def val_generator():
    while True:

        sequence_length = np.random.randint(150, 300)+150       
        data=make_data(1000,sequence_length,features_num,2) # category=2 in val

        encoder_input_data = data[:,:-150,:] # all but last 150 

        decoder_input_data = data[:,-151,:] # the one before the last 150.
        decoder_input_data=decoder_input_data.reshape((decoder_input_data.shape[0],1,decoder_input_data.shape[1]))

        decoder_target_data = (data[:, -150:, :]) # last 150        
        yield [encoder_input_data, decoder_input_data], decoder_target_data

filepath_for_w= 'flux_p2p_s2s_model.h5' 
checkpointer=ModelCheckpoint(filepath_for_w, monitor='val_loss', verbose=0, save_best_only=True, mode='auto', period=1)     
model.fit_generator(train_generator(),callbacks=[checkpointer], steps_per_epoch=30, epochs=2000, verbose=1,validation_data=val_generator(),validation_steps=30)
model.save(filepath_for_w)




def predict_wave(input_wave,input_for_decoder):  # input wave= x[n,:,:], ie points except the last 150; each wave has feature_num features. run this function for all such instances (=n)   
    #print (input_wave.shape)
    #print (input_for_decoder.shape)
    pred= model.predict([input_wave,input_for_decoder])

    return pred

def predict_many_waves_from_input(x):   
    x, x2=x # x == encoder_input_data ; x2 == decoder_input_data

    instance_num= x.shape[0]


    multi_predict_collection=np.zeros((x.shape[0],150,x.shape[2]))

    for n in range(instance_num):
        input_wave=x[n,:,:].reshape(1,x.shape[1],x.shape[2])
        input_for_decoder=x2[n,:,:].reshape(1,x2.shape[1],x2.shape[2])
        wave_prediction=predict_wave(input_wave,input_for_decoder)
        multi_predict_collection[n,:,:]=wave_prediction
    return (multi_predict_collection)

def test_maker():
    if True:        
        sequence_length = np.random.randint(150, 300)+150       
        data=make_data(470,sequence_length,features_num,4) # category=4 in test

        encoder_input_data = data[:,:-150,:] # all but last 150 

        decoder_input_data = data[:,-151,:] # the one before the last 150.
        decoder_input_data=decoder_input_data.reshape((decoder_input_data.shape[0],1,decoder_input_data.shape[1]))

        decoder_target_data = (data[:, -150:, :]) # last 150        
        return [encoder_input_data, decoder_input_data],    decoder_target_data

x,y= test_maker()   



a=predict_many_waves_from_input (x) # is that right..?
x=x[0] # keep the wave (generated data except last 150 time points) 
print (x.shape)
print (y.shape)
print (a.shape)

np.save ('a.npy',a)
np.save ('y.npy',y)
np.save ('x.npy',x)



print (np.mean(np.absolute(y[:,:,0]-a[:,:,0])))
print (np.mean(np.absolute(y[:,:,1]-a[:,:,1])))
print (np.mean(np.absolute(y[:,:,2]-a[:,:,2])))
print (np.mean(np.absolute(y[:,:,3]-a[:,:,3])))
print (np.mean(np.absolute(y[:,:,4]-a[:,:,4])))

— Lafayette