I am confused about whether feature engineering should be performed before or after the train-test split, specifically for features computed over a rolling window. I understand that forward-looking features lead to data leakage, but is the same true the other way round? Can backward-looking features, such as rolling-window transformations, be engineered before the train-test split?
If so, why does the volatility feature example under extraction split the data first and only then compute the feature?
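Here is a minimal sketch of the comparison being asked about. It uses a synthetic daily price series and defines volatility as the 20-day rolling standard deviation of log returns. The series, the window length, the split point and the helper `rolling_volatility` are all hypothetical, not taken from the example mentioned in the question. The feature is computed once on the full series (then split) and once on the test slice alone (split first):

```python
import numpy as np
import pandas as pd

# Hypothetical price series: a synthetic geometric random walk (always positive)
rng = np.random.default_rng(0)
prices = pd.Series(
    100 * np.exp(rng.normal(0, 0.01, 300).cumsum()),
    index=pd.date_range("2020-01-01", periods=300, freq="D"),
)

WINDOW = 20   # assumed rolling-window length
SPLIT = 240   # assumed position of the first test observation


def rolling_volatility(p: pd.Series, window: int) -> pd.Series:
    """Backward-looking: row t uses only the log returns up to and including t."""
    log_ret = np.log(p).diff()
    return log_ret.rolling(window).std()


# Option A: compute on the full series, then split
test_a = rolling_volatility(prices, WINDOW).iloc[SPLIT:]

# Option B: split first, then compute on the test slice alone
test_b = rolling_volatility(prices.iloc[SPLIT:], WINDOW)

# Where both are defined, the values match
both = test_b.notna()
print(np.allclose(test_a[both], test_b[both]))  # True

# Option B loses a warm-up period at the start of the test set
print(test_a.isna().sum(), test_b.isna().sum())  # 0 20
```

In this sketch the two versions agree wherever both are defined, because each value depends only on data at or before its own timestamp. Computing before the split simply fills the first test rows with history from the training period, which is past data. The same reasoning does not carry over to transformations that are fitted on the data, such as standardising with the full-sample mean and standard deviation, where test-period statistics would leak into the training rows.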