I tried to solve Experiment 3a described in the original LSTM paper here: http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf with tensorflow LSTM and failed
From the paper: The task is to observe and then classify input sequences. There are two classes, each occurring with probability 0.5. There is only one input line. Only the rst N real-valued sequence elements convey relevant information about the class. Sequence elements at positions t > N are generated by a Gaussian with mean zero and variance 0.2.
The net architecture that he described in the paper:
"We use a 3-layer net with 1 input unit, 1 output unit, and 3 cell blocks of size 1. The output layer receives connections only from memory cells. Memory cells and gate units receive inputs from input units, memory cells and gate units, and have bias weights. Gate units and output unit are logistic sigmoid in [0; 1], h in [-1; 1], and g in [-2; 2]"
I tried to reproduce it with LSTM with 3 hidden units for T=100 and N=3 but failed.
I used online training (i.e. update the weights after each sequence) as described in the original paper
The core of my code was as follow:
self.batch_size = batch_size = config.batch_size
hidden_size = 3
self._input_data = tf.placeholder(tf.float32, (1, T))
self._targets = tf.placeholder(tf.float32, [1, 1])
lstm_cell = rnn_cell.BasicLSTMCell(hidden_size , forget_bias=1.0)
cell = rnn_cell.MultiRNNCell([lstm_cell] * 1)
self._initial_state = cell.zero_state(1, tf.float32)
weights_hidden = tf.constant(1.0, shape= [config.num_features, config.n_hidden])
prepare the input
inputs = 
for k in range(num_steps):
nextitem = tf.matmul(tf.reshape(self._input_data[:, k], [1, 1]) , weights_hidden)
outputs, states = rnn.rnn(cell, inputs, initial_state=self._initial_state)