Questions about NLP

I have a question regarding my previous project, which focused on detecting fake news. In the model evaluation part, I used traditional machine learning models like NB, SVM, XGBoost, and KNN, and some deep learning models like LSTM and BERT. In the validation part, NB always performed better than the rest of the traditional machine learning models, while BERT achieved the highest accuracy among the deep learning models. I'm wondering what the math behind these models is that allows them to perform better, or whether it's not the math at all but the dataset I used.

It is hard to give a definitive answer without knowing what the data looks like. Generally, deep learning models perform better on unstructured data: after you have vectorized the text corpus (turned the text into numbers), the dataset is typically high dimensional (often more features than rows), and deep learning models tend to handle high-dimensional data well.
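
To make the dimensionality point concrete, here is a minimal sketch (assuming a scikit-learn TF-IDF vectorizer as a stand-in for whatever vectorizer you actually used, and a made-up toy corpus) showing how even a tiny set of documents ends up with more columns than rows after vectorization:

```python
# Minimal sketch: TF-IDF vectorization typically yields far more features than rows.
# The corpus below is a placeholder, not your actual data.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "breaking news: markets rally after report",
    "celebrity spotted at secret meeting, sources say",
    "study finds no link between the two events",
    "officials deny claims circulating on social media",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# Shape is (n_documents, vocabulary_size); even this tiny corpus has more
# columns (unique terms) than rows (documents).
print(X.shape)
```

With a real corpus the vocabulary can easily reach tens of thousands of terms, so the feature count dwarfs the row count.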

In my experience, NB models shouldn't really be able to outperform XGBoost, unless you have a very small dataset, in which case the ranking could just be a result of noisy predictions. See the sketch below for one way to check that.
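
One way to tell whether the NB vs. XGBoost gap is real or just noise is to look at the fold-to-fold spread under repeated cross-validation. This is a rough sketch, assuming scikit-learn and the xgboost package, with synthetic data standing in for your vectorized corpus (all names and parameters here are illustrative, not your actual setup):

```python
# Sketch: compare NB and XGBoost with repeated cross-validation and report
# the spread of scores. Synthetic data is a placeholder for the real features.
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
models = [
    ("NB", GaussianNB()),
    ("XGBoost", XGBClassifier(n_estimators=200, eval_metric="logloss")),
]

for name, model in models:
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    # If the difference in means is smaller than the standard deviation across
    # folds, the "winner" may just be noise from a small dataset.
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the two mean accuracies sit well within each other's spread, I wouldn't read much into NB coming out on top.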