Hi, I'm trying to train an LSTM with PyTorch on a time-series dataset that I have in Spark.
The Spark DataFrame is constructed so that every row contains one training sample and its label. The training data sits in my 'features' column, a nested array of floats with shape (lookback_window, number_of_features); the 'label' column is a simple scalar.
training_df.schema =
StructType([
    StructField('features', ArrayType(ArrayType(FloatType(), True), True), False),
    StructField('label', DoubleType(), True)
])
When I iterate over the loader returned by make_torch_dataloader, each sample comes back as a dictionary containing only 'label'; the 'features' key is missing entirely.
Any idea what the issue is, or how I should restructure my features data so that this works?