Question about feature transformation

I was looking at the last assignment and thinking of adding moving averages as features in the transformation section. However, with a look-back window of, for example, 7 or 21 days, the first rows of each moving-average column will be NaN, because there aren't enough earlier days to fill the window (with a plain trailing mean that is the first 6 or 20 rows, i.e. window minus 1). If this feature is fed into a neural network, is it okay to just leave those values as NaN, or is there a standard way to deal with them in practice?
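A minimal pandas sketch shows where the NaNs come from. It assumes a daily series stored in a DataFrame with a hypothetical `close` column and uses `Series.rolling(...).mean()` with its default `min_periods` (equal to the window size), so a value appears only once a full window of rows is available.

```python
import numpy as np
import pandas as pd

# Hypothetical daily price series; 'close' is an assumed column name.
dates = pd.date_range("2024-01-01", periods=30, freq="D")
df = pd.DataFrame({"close": np.arange(30, dtype=float)}, index=dates)

# Trailing moving averages over 7 and 21 days.
# Default min_periods == window, so the first (window - 1) rows are NaN.
df["ma_7"] = df["close"].rolling(window=7).mean()
df["ma_21"] = df["close"].rolling(window=21).mean()

print(df["ma_7"].isna().sum())   # 6
print(df["ma_21"].isna().sum())  # 20
```

If the average is additionally shifted by one day (so that day t only sees days before t), one more leading row becomes NaN.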

You can simply drop them, and you can drop the corresponding rows in the test set too (which means you will have fewer training and test rows). You can't backward fill: that would copy the first complete average, which is computed from later days, into the earlier rows and so leak future information into them. And if you grabbed the last 21 rows of the training dataset and concatenated them onto the test set to compute its first averages, that might cause some data leakage too.
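The "just drop them" approach can be sketched as follows, assuming a hypothetical `close` column, a chronological 80/20 split, and a hypothetical helper `add_moving_averages`. The moving averages are computed separately within each split, so the test-set averages use only test-set rows, and the warm-up rows are dropped with `dropna()` rather than back-filled.

```python
import numpy as np
import pandas as pd


def add_moving_averages(frame: pd.DataFrame, windows=(7, 21)) -> pd.DataFrame:
    """Hypothetical helper: add trailing moving averages and drop the NaN warm-up rows."""
    out = frame.copy()
    for w in windows:
        out[f"ma_{w}"] = out["close"].rolling(window=w).mean()
    # Drop the first max(windows) - 1 rows instead of back-filling them:
    # .bfill() would copy the first full-window average (built from later
    # days) into earlier rows, which leaks future information.
    return out.dropna()


# Hypothetical daily series; the 'close' column and 80/20 split are assumptions.
dates = pd.date_range("2024-01-01", periods=200, freq="D")
df = pd.DataFrame({"close": np.arange(200, dtype=float)}, index=dates)

# Split chronologically (no shuffling), then build features within each split.
split = int(len(df) * 0.8)
train = add_moving_averages(df.iloc[:split])
test = add_moving_averages(df.iloc[split:])

print(len(train), len(test))  # 140 20
```

The cost of dropping is visible here: with a 21-day window, each split loses its first 20 rows, which is half of this 40-row test set.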