Feature Engineering before/after train-test split

saakshimore · December 16, 2023, 6:40am

I am confused about whether feature engineering should be performed before or after the train-test split, specifically for features computed over a rolling window. I understand that forward-looking features lead to data leakage, but is the same true the other way round? Can backward-looking transformation features be engineered before the train-test split?

If yes, why does the volatility example under feature extraction split the data first and then compute this feature?
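To make the rolling-window mechanics concrete, here is a minimal sketch. It is not the volatility example from the course. It assumes a hypothetical price series, a window of 3 log returns, and an arbitrary split index, and it compares computing a trailing volatility on the full series and then splitting (option A) against splitting first and computing on the test slice alone (option B).

```python
import math
import statistics

def log_returns(prices):
    """Log return r_t = ln(p_t / p_{t-1}); the first price has no return."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def trailing_volatility(returns, window):
    """Sample std of the `window` most recent returns ending at each t.

    Entry t only reads returns[t - window + 1 : t + 1], i.e. data up to t.
    The first window - 1 entries are None (not enough history yet).
    """
    out = []
    for t in range(len(returns)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(returns[t - window + 1 : t + 1]))
    return out

# Hypothetical prices, window and split point, for illustration only.
prices = [100, 101, 99, 102, 104, 103, 105, 107, 106, 108, 110, 109]
returns = log_returns(prices)  # 11 returns
window = 3
split = 6  # returns[:6] -> train, returns[6:] -> test

# Option A: compute on the full series, then split.
vol_full = trailing_volatility(returns, window)
test_a = vol_full[split:]

# Option B: split first, then compute on the test slice alone.
test_b = trailing_volatility(returns[split:], window)

print(test_a)
print(test_b)
```

In this sketch the two options agree on every test row that has a full window of test-period history. They differ only in the first `window - 1` test rows: option A fills them using the preceding training-period returns (past data, not future data), while option B leaves them empty.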

d.snow · December 16, 2023, 2:18pm

It all depends on the method. A log transform operates at a single point in time, so you can apply it before or after the split. For all other methods that are not computed at a single point in time, you have to split first, then fit on the training set and transform the test set.
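A minimal sketch of the distinction in the reply above. The feature values and split index are hypothetical, and standardization (mean/std scaling) is used here only as one example of a method that is not computed at a single point in time; the reply does not name a specific method.

```python
import math
import statistics

# Hypothetical feature values and split point, for illustration only.
values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
split = 6
train, test = values[:split], values[split:]

# Point-in-time transform: each output depends only on its own input,
# so transforming before or after the split gives the same test values.
log_before = [math.log(v) for v in values][split:]
log_after = [math.log(v) for v in test]

# Fitted transform (standardization): mean and std are parameters learned
# from data, so fit them on the training set only, then apply to the test set.
mu, sigma = statistics.mean(train), statistics.stdev(train)
train_scaled = [(v - mu) / sigma for v in train]
test_scaled = [(v - mu) / sigma for v in test]

# Fitting on the full series instead lets the test values influence the
# parameters applied to the training set, which is a form of leakage.
mu_all, sigma_all = statistics.mean(values), statistics.stdev(values)
train_scaled_leaky = [(v - mu_all) / sigma_all for v in train]
```

The log values match regardless of when the split happens, whereas the standardized training values change depending on whether the test rows were included when fitting `mu` and `sigma`.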

saakshimore · December 16, 2023, 4:38pm

My question is about the logic of splitting before applying a backward-looking transformation. Looking at the past introduces no data leakage, so why do we need to split first?