Monday, 14 October 2019

What is the relationship between timesteps and the number of hidden units in an LSTM?

Timesteps = the length of the input sequence. For example, if you want to give an LSTM a sentence as input, your timesteps could be either the number of words or the number of characters, depending on the level you want to work at.
Number of hidden units = (well) the number of hidden units, i.e. the size of the vector the layer outputs at each timestep. Sometimes people call this the number of LSTM cells. The input at each timestep goes through every unit.
For example, if your sequences are sentences of 10 words each and you are working at the word level, your timesteps would be 10. The number of hidden units is your choice (it could be 99 or whatever number you like).
The outputs of a single LSTM layer will be of shape (number-of-sequences, number-of-units) if you do not return sequences, or of shape (number-of-sequences, number-of-timesteps, number-of-units) if you do.
For example, if you have 5 sentences, each containing 10 words (or padded so that they all have length 10), and your single LSTM layer has 15 units, your outputs will be of shape (5, 15) if you do not return sequences. If you do, you will get an output of shape (5, 10, 15).
In the example above, considering the first sentence, the first word goes through the 15 LSTM units, so you get an array of 15 floats for it. The same process occurs for all 10 words in the sentence, with each timestep also receiving the state carried over from the previous one. So in the end, you actually get an output of shape (10, 15) for the first sentence. Here again, if you choose not to return sequences, you will only get the 10th array of 15 floats (shape (1, 15)), which is the output at the last word. In contrast, if you do return sequences, you will get all 10 arrays of 15 floats, one per word, i.e. one per timestep.
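To make these shapes concrete, here is a minimal from-scratch sketch of an LSTM forward pass in plain Python. It is not how any particular library implements the layer: the weights are random, the embedding size of 8 is an arbitrary assumption, and the names make_lstm_weights and lstm_forward are made up for this illustration. The return_sequences flag mirrors the choice described above.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm_weights(input_dim, units, seed=0):
    """Random weights for the four gates: input (i), forget (f), candidate (g), output (o)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: {
            "W": matrix(units, input_dim),  # input-to-hidden weights
            "U": matrix(units, units),      # hidden-to-hidden weights
            "b": [0.0] * units,             # bias
        }
        for gate in ("i", "f", "g", "o")
    }

def lstm_forward(sequence, weights, units, return_sequences=False):
    """Run one sequence (a list of timesteps, each a list of floats) through the layer."""
    h = [0.0] * units  # hidden state, one value per unit
    c = [0.0] * units  # cell state, one value per unit
    outputs = []
    for x in sequence:  # one iteration per timestep
        pre = {}
        for gate, p in weights.items():
            pre[gate] = [
                sum(w * xi for w, xi in zip(p["W"][k], x))
                + sum(u * hj for u, hj in zip(p["U"][k], h))
                + p["b"][k]
                for k in range(units)
            ]
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c = [f[k] * c[k] + i[k] * g[k] for k in range(units)]
        h = [o[k] * math.tanh(c[k]) for k in range(units)]
        outputs.append(h)  # `units` floats for this timestep
    return outputs if return_sequences else h

# 5 sentences, 10 words each, 15 units; embed_dim = 8 is an assumed embedding size.
num_sentences, timesteps, embed_dim, units = 5, 10, 8, 15
rng = random.Random(1)
batch = [
    [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(timesteps)]
    for _ in range(num_sentences)
]
weights = make_lstm_weights(embed_dim, units)

all_steps = [lstm_forward(s, weights, units, return_sequences=True) for s in batch]
last_only = [lstm_forward(s, weights, units) for s in batch]

print(len(all_steps), len(all_steps[0]), len(all_steps[0][0]))  # 5 10 15
print(len(last_only), len(last_only[0]))                        # 5 15
print(all_steps[0][-1] == last_only[0])                         # True
```

The last line checks that, for the first sentence, the output without returning sequences is exactly the 10th row of the full (10, 15) output. Note that the number of timesteps never appears in the weights: the same 15 units are reused at every word, which is why the two numbers can be chosen independently.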
Note: When I mention ‘word’, I refer to words that have been embedded as numbers.
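As a rough illustration of what "embedded as numbers" can look like, the sketch below maps each word to an integer id and then looks up a vector for it. The sentence, the embedding size of 4, and the random table are all assumptions made for the example; in practice the table would be learned or pretrained.

```python
import random

sentence = "the cat sat on the mat".split()

# Assign each distinct word an integer id (alphabetical order, for reproducibility).
vocab = {word: idx for idx, word in enumerate(sorted(set(sentence)))}

embed_dim = 4  # assumed embedding size
rng = random.Random(0)
# One random vector per vocabulary entry; a stand-in for a learned embedding table.
embedding_table = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in vocab]

token_ids = [vocab[w] for w in sentence]
embedded = [embedding_table[i] for i in token_ids]

print(token_ids)                       # [4, 0, 3, 2, 4, 1]
print(len(embedded), len(embedded[0]))  # 6 4
```

Here the sentence becomes a (6, 4) array: 6 timesteps, each a vector of 4 floats. That array is what a single sequence looks like when it reaches the LSTM layer.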

