SOLVED: Build fixed length 3D array for LSTM processing given variable length 3D array

Asif Khan:

I have a dataframe like this:

User_id  trip_id  segmentid  velocity      Transportation_Mode
10         1          1      26.89540395        bus
10         3          1      28.27382321        bus
10         3          2      3.580025333        walk
11         4          1      3.056558794        walk
11         4          2      5.74621102         bus
11         4          3      1.26180075         walk
11         7          1      31.75670826        car
11         7          2      8.572398839        bus
11        10          1      6.314346629        car
11        10          2      12.86523982        bus

I have to predict the sequence of transportation modes using an LSTM network. My question is based on the post "LSTM preprocessing: Build 3d arrays from pandas data frame based on ID", but the difference is that I have multiple features here. I need to create two numpy arrays: X, which is a 3D array, and y, which is a 2D array. There are multiple sequences grouped on (User_id, trip_id), where each group is one sequence. For the LSTM, all sequences need to have a fixed length, so I need to pad every group with 0 until all groups have the same length. For this purpose, I got the maximum group length in the variable mx using the code given below.

# Encode the transportation modes as integer category codes
dataset['Transportation_Mode'] = dataset['Transportation_Mode'].astype('category').cat.codes
gb = dataset.groupby(['User_id', 'trip_id'])  # Create Groupby object
mx = gb.size().max()  # Find the size of the largest group
print('mx=', mx)

mx is the length of the longest group; all shorter sequences are padded with 0 up to this length.

Following the post mentioned above, the array x for a single feature ('segmentid') is created as follows.

x = np.array([np.pad(frame['segmentid'].values,
                     pad_width=(0, mx - len(frame)),
                     mode='constant', constant_values=0)
              for _, frame in gb]).reshape(-1, mx, 1)

I am looking for the simplest way of creating the array x for multiple features (segmentid and velocity in our case). One solution is to create one array for each input feature (segmentid and velocity) and merge them, but I think there should be a simpler way of doing this.
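
One possible approach, given only as a minimal sketch: select all feature columns of each group at once, pad the resulting 2D block along its row axis only, and stack the blocks. The helper name pad_rows, the features list, and the toy dataframe (the rows from the table above, with velocities rounded) are assumptions made for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Toy data copied from the table in the question (velocities rounded).
dataset = pd.DataFrame({
    "User_id": [10, 10, 10, 11, 11, 11, 11, 11, 11, 11],
    "trip_id": [1, 3, 3, 4, 4, 4, 7, 7, 10, 10],
    "segmentid": [1, 1, 2, 1, 2, 3, 1, 2, 1, 2],
    "velocity": [26.9, 28.3, 3.6, 3.1, 5.7, 1.3, 31.8, 8.6, 6.3, 12.9],
    "Transportation_Mode": ["bus", "bus", "walk", "walk", "bus",
                            "walk", "car", "bus", "car", "bus"],
})
dataset["Transportation_Mode"] = (
    dataset["Transportation_Mode"].astype("category").cat.codes
)

features = ["segmentid", "velocity"]           # hypothetical name: input columns
gb = dataset.groupby(["User_id", "trip_id"])   # one group per sequence
mx = gb.size().max()                           # length of the longest sequence


def pad_rows(values, length):
    """Zero-pad an array at the end of axis 0 until it has `length` rows."""
    pad_width = [(0, length - len(values))] + [(0, 0)] * (values.ndim - 1)
    return np.pad(values, pad_width, mode="constant", constant_values=0)


# X: one (mx, n_features) block per group, stacked into a 3D array.
X = np.stack([pad_rows(frame[features].to_numpy(dtype=float), mx)
              for _, frame in gb])
# y: one padded label sequence of length mx per group, stacked into a 2D array.
y = np.stack([pad_rows(frame["Transportation_Mode"].to_numpy(), mx)
              for _, frame in gb])

print(X.shape)  # (5, 3, 2)
print(y.shape)  # (5, 3)
```

Note that padding y with 0 collides with category code 0 (here 'bus'), so it may be worth shifting the codes by one or masking the padded time steps when training.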

Posted in S.E.F
via StackOverflow & StackExchange Atomic Web Robots