robbyjo

Regression with Negative R-Squared and How to Fix It

Aug 4th, 2024
# from https://www.quora.com/When-is-R-squared-negative
set.seed(100)

# x and y are two independent random walks (cumulative sums of Gaussian noise)
df <- data.frame(x = cumsum(rnorm(100)),
                 y = cumsum(rnorm(100)))
# RJ's addition: uncomment to shuffle the rows before splitting, so the
# training and test rows cover the same range of the walks
#df <- df[sample(1:NROW(df), size=NROW(df)), ] # Shuffler

## split between training and test ----
train_df <- df[1:80,]
test_df <- df[81:100,]

### fit OLS ----
lm.bias <- lm(y ~ x, data=train_df)
summary(lm.bias)$r.squared ## R^2 = 31.49% (in-sample)

### create an R^2 function ----
r2 <- function(data, model, response){
  preds <- predict(model, newdata=data)
  actuals <- data[,response]
  mean_response <- mean(actuals)
  null_mse <- mean( (mean_response - actuals)^2 )  # null model: always predicts the mean of `actuals`
  model_mse <- mean( (preds - actuals)^2 )
  r_squared <- (null_mse - model_mse) / null_mse   # negative when model_mse > null_mse
  return(r_squared)
}

r2(train_df, lm.bias, "y")  ## same as summary(): R^2 = 31.49% (in-sample); after reshuffling: R^2 = 33.96%

### calculate out-of-sample R^2 ----
r2(test_df, lm.bias, "y")  ## **NEGATIVE** R^2 of -103% (out-of-sample); after reshuffling: R^2 = 13.74%
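
The r2() function above compares the model's mean squared error with that of a null model that always predicts the mean, so the result drops below zero whenever the model does worse than the mean. As a minimal sketch of that arithmetic on toy numbers (r2_vec is a hypothetical helper, not part of the original paste, that takes plain vectors instead of a data frame and a fitted model):

```r
# Hypothetical helper (not in the original paste): the same R^2 definition
# as r2(), but computed directly from vectors of actuals and predictions.
r2_vec <- function(actuals, preds) {
  null_mse  <- mean((mean(actuals) - actuals)^2)  # error of "always predict the mean"
  model_mse <- mean((preds - actuals)^2)          # error of the supplied predictions
  1 - model_mse / null_mse                        # same as (null_mse - model_mse) / null_mse
}

actuals <- c(1, 2, 3)

r2_vec(actuals, actuals)                 # perfect predictions -> 1
r2_vec(actuals, rep(mean(actuals), 3))   # predicting the mean -> 0
r2_vec(actuals, c(3, 2, 1))              # worse than the mean -> -3
```

In the last call the predictions run in the opposite direction from the actuals, so the model MSE (8/3) is four times the null MSE (2/3) and R^2 comes out at 1 - 4 = -3. The sequential split of the two random walks above is in a similar situation: a line fitted on rows 1 to 80 is being scored on rows 81 to 100, where the walks may have drifted somewhere else.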