Question about Forecasting

Xiyao_Fan · May 2, 2022, 2:35am

Hi everyone,

There are many forecasting methods, including traditional ones such as Exponential Smoothing, ARIMA, SARIMA, and SARIMAX, and machine learning ones such as LightGBM, Linear Regression, and Random Forest.

I learned that, generally speaking, machine learning methods perform better than traditional methods, but when I am faced with a specific dataset and problem, I am still not sure which method I should use. So I am wondering how you guys approach this choice?

Thank you very much!

ZhaoxuanLai · May 2, 2022, 3:02am

Hi Xiyao,

In my opinion, it depends on the data and the problem you face. Machine learning methods usually rely heavily on engineered features, for instance lagged values and moving averages. As a result, feature engineering is crucial for machine learning methods to produce good predictions. In machine learning, to some extent, it is the data that is decisive, not the model or the algorithm. Hope this helps!
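To make the lag and moving-average idea concrete, here is a minimal sketch using pandas. The function name `make_lag_features`, the lag choices, the window size, and the sample numbers are all assumptions for illustration, not something from the course material.

```python
import pandas as pd

def make_lag_features(series: pd.Series, lags=(1, 2, 3), window=3) -> pd.DataFrame:
    """Build lagged values and a trailing moving average from one series."""
    frame = pd.DataFrame({"y": series})
    for lag in lags:
        # lag_k holds the value observed k steps before the target row
        frame[f"lag_{lag}"] = series.shift(lag)
    # Shift by one first so the average only uses values before the target row
    frame[f"ma_{window}"] = series.shift(1).rolling(window).mean()
    # The first rows have no complete history, so drop them
    return frame.dropna()

# Made-up sample data
sales = pd.Series([10.0, 12.0, 13.0, 15.0, 14.0, 16.0, 18.0])
features = make_lag_features(sales)
print(features)
```

The resulting table can be passed to a model such as LightGBM or Random Forest, with `y` as the target and the other columns as inputs.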

d.snow · May 2, 2022, 2:00pm

There are multiple types of supervised machine learning models. Some, like RNNs, can produce single or multiple outputs (101.44, 102.50, 103.5), and you can feed them pure TS data (the feature engineering is automated). Others, like LightGBM models, produce a single output, and you have to do some feature engineering yourself. All of these models can be used for classification or regression tasks. So you just have to ask yourself what data you have and what the task is, and select the best model accordingly (sometimes you can test multiple approaches).
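As a rough illustration of the single-output versus multiple-output distinction, the sketch below slices a raw series into (past window, future targets) pairs. The helper `make_windows`, its parameters, and most of the price values are assumptions made up for the example. With `horizon=1` each sample has one target, and with `horizon=3` each sample has three.

```python
def make_windows(series, input_len=3, horizon=1):
    """Slice a series into (past window, future targets) training pairs."""
    pairs = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        past = series[start:start + input_len]
        future = series[start + input_len:start + input_len + horizon]
        pairs.append((past, future))
    return pairs

# Made-up price series for illustration
prices = [100.0, 100.8, 101.44, 102.5, 103.5, 104.1]

single_step = make_windows(prices, input_len=3, horizon=1)  # one target per sample
multi_step = make_windows(prices, input_len=3, horizon=3)   # three targets per sample

print(single_step)
print(multi_step)
```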

Hongzheng · May 17, 2022, 3:24am

Hello Xiyao,

I don’t think one method is always better than another. It depends on the type of problem and the type of data you are analyzing. If you are not sure which one is better for a specific problem, my suggestion is to try all the models that you think may be the best, and remember to resample multiple times. Then compare the models using their results averaged across the resamples.
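Here is one way such a comparison could look, as a sketch only. I am assuming rolling-origin (expanding window) splits rather than random resampling, because shuffling a time series would let future values leak into training. The two forecasters are deliberately simple stand-ins for real models, and the data is made up.

```python
from statistics import mean

def naive_forecast(history):
    """Predict the next value as the last observed value."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` values."""
    return mean(history[-window:])

def rolling_origin_mae(series, forecaster, min_train=4):
    """Average one-step-ahead absolute error over expanding training windows."""
    errors = []
    for split in range(min_train, len(series)):
        train, actual = series[:split], series[split]
        errors.append(abs(forecaster(train) - actual))
    return mean(errors)

# Made-up series for illustration
series = [10.0, 12.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0, 21.0]

scores = {
    "naive": rolling_origin_mae(series, naive_forecast),
    "moving_average": rolling_origin_mae(series, moving_average_forecast),
}
best = min(scores, key=scores.get)  # lower average error is better
print(scores, best)
```

Any model with a "fit on the past, predict the next step" interface could be dropped in place of the two stand-in forecasters.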

At the same time, while comparing models, it is more important to expand the training set as much as you can. At present, I think the data affects the results more than the choice of model does.

Finally, you can also apply what we learned about feature engineering in this lesson. Through feature engineering, you can generate new features, which can have an effect similar to expanding the training set.

jchen444 · May 19, 2022, 3:49pm

Hi Xiyao,
I think the model you use should depend on the type of data you have. The selection of a method depends on many factors: the context of the forecast, the relevance and availability of historical data, the degree of accuracy desired, the time period to be forecast, the cost/benefit of the forecast to the company, and the time available for making the analysis. So it is really hard to tell which model to use unless you show us the data.