Neural Network (How to improve model's predictability?)

CamilleLin · February 23, 2022, 7:01pm

Hey, everyone!
I have a question about neural networks, which we haven’t discussed in detail. I’m using a neural network model for my HW2, but I’m running into a problem with improving my model’s predictability. Since I’m new to Data Science and Machine Learning, I might not use the correct terminology. I hope what I say below makes sense.
Say I have my model randomly choose 25% of the data set as testing data, and I repeat this multiple times. My understanding is that I use all variables except charges to predict charges.
I have converted ‘gender’, ‘smoker’, and ‘region’ into numbers and applied standard scaling, because, for example, the gender variable (0/1) is extremely small compared with charges (around $20k). I also added a feature: when I input new data in the same format (say, age: 25, bmi: 20, etc.), the model returns the estimated charges.
My question is: what is the next step to improve my model so that it can handle some extreme cases?
I’m pretty sure there are a few different approaches to this problem, but I don’t know exactly what I should do. Thanks!
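
For reference, the split-and-scale setup described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the actual HW2 code: the helper names (`split_indices`, `fit_scaler`, `scale`) and the sample ages are made up, and it assumes the scaling statistics are computed from the training rows only and then reused for the test rows.

```python
import random
import statistics


def split_indices(n_rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out `test_fraction` of them for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def fit_scaler(values):
    """Return (mean, std) computed from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for a constant column
    return mean, std


def scale(values, mean, std):
    """Standard scaling: subtract the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in values]


# Made-up example column (ages); not taken from the real data set.
ages = [19, 25, 33, 41, 52, 60, 28, 47]

train_idx, test_idx = split_indices(len(ages), test_fraction=0.25, seed=1)
mean, std = fit_scaler([ages[i] for i in train_idx])

train_scaled = scale([ages[i] for i in train_idx], mean, std)
test_scaled = scale([ages[i] for i in test_idx], mean, std)  # reuse training statistics
```

Changing `seed` on each run gives a different random 25% test set, which matches the "do this multiple times" idea.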

d.snow · February 23, 2022, 11:10pm

That is a good question. Let’s see if someone can help you (points will be given for an answer, of course). If no one responds, I will discuss it with you on Thursday (and I will add a summary here).

d.snow · February 25, 2022, 12:00pm

One thing you can do to help is to apply a log transformation to the charges variable. After you predict the log of charges, you can convert the predictions back to the original scale before you compute the error metrics (like mean squared error). You are probably only seeing extreme differences because some charges are very high, so the absolute error looks large for them. So instead of mean squared error, you could use MAPE or SMAPE, which look at the percentage difference.
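
As a rough illustration of the suggestion above, here is a minimal sketch using only the Python standard library. The function names and the two sample charges are made up for the example, the natural log is assumed (so charges must be positive), and SMAPE is computed with one common definition, 2|a - p| / (|a| + |p|); other variants exist.

```python
import math


def to_log(charges):
    """Natural-log transform of the target; assumes every charge is positive."""
    return [math.log(c) for c in charges]


def from_log(log_charges):
    """Convert log-scale predictions back to the original scale."""
    return [math.exp(v) for v in log_charges]


def mse(actual, predicted):
    """Mean squared error, in squared units of the target."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE using the 2|a - p| / (|a| + |p|) definition."""
    terms = [2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 100 * sum(terms) / len(terms)


# Made-up values: one low charge and one very high charge, each predicted 10% off.
actual = [2000.0, 50000.0]
predicted = [2200.0, 45000.0]

print(mse(actual, predicted))    # dominated by the high charge
print(mape(actual, predicted))   # both rows contribute equally
print(smape(actual, predicted))
```

In this toy case both predictions are off by the same 10%, yet almost all of the squared error comes from the 50,000 row, which is why a percentage-based metric can give a fairer picture of the extreme cases. In a real model you would train on `to_log(charges)` and pass the predictions through `from_log` before computing these metrics.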