Question regarding correlation

I’m not sure which dataset to use when computing correlations:


correlation = dataset.corr()

I have dataset1 with the original values and dataset2 with a log-transformed target variable. When I compute the correlation on both datasets, the correlation matrices give quite different results: the log-transformed dataset2 shows a higher correlation for the same pair of variables than dataset1 does.
(I think the default method for .corr() is the Pearson correlation coefficient.)
If I want to examine the correlation between variables, does it matter which dataset or correlation method I use?

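Hi! You only need to log-transform the target variable (charges). Pearson correlation measures linear association, so the transformed target may show a higher correlation if the log makes its relationship with the other variables closer to linear. Feel free to compute the correlation before or after the transformation. It's your choice.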
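To see how the dataset and the method interact, here is a minimal sketch on synthetic data. The column names `age` and `charges`, the exponential relationship, and the noise level are all assumptions made for illustration; they are not taken from the actual dataset. It compares Pearson and Spearman correlations before and after log-transforming the target.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): the target grows exponentially with the
# feature, with multiplicative noise. This is NOT the real dataset.
rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(18, 65, n)
charges = 1000 * np.exp(0.08 * age + rng.normal(0, 0.3, n))

dataset1 = pd.DataFrame({"age": age, "charges": charges})
# dataset2: same data, but with the target log-transformed
dataset2 = dataset1.assign(charges=np.log(dataset1["charges"]))

# Pearson measures linear association, so it changes with the transform
pearson_raw = dataset1.corr(method="pearson").loc["age", "charges"]
pearson_log = dataset2.corr(method="pearson").loc["age", "charges"]

# Spearman works on ranks, which a monotonic transform like log preserves
spearman_raw = dataset1.corr(method="spearman").loc["age", "charges"]
spearman_log = dataset2.corr(method="spearman").loc["age", "charges"]

print(f"Pearson  raw: {pearson_raw:.3f}  log: {pearson_log:.3f}")
print(f"Spearman raw: {spearman_raw:.3f}  log: {spearman_log:.3f}")
```

In this sketch the Pearson value rises after the log transform because the relationship becomes linear, while the Spearman value is the same for both datasets. If you only care about whether two variables move together monotonically, Spearman makes the choice of dataset irrelevant; if you care about linear association (for example, before fitting a linear model on the log target), Pearson on the transformed data is the more relevant number.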