ktbyte

Hw5 - Dataset Exploration

Nov 2nd, 2016
AIHW5

This homework is project-oriented and uses your new data set. We will begin in class, and it is due next Monday:

1. Identify 5 to 15 variables that you can measure in at least 30 rows, preferably 100+ rows. For numerical variables, plot their histograms. Are they normally distributed? Some of your variables might be qualitative (Yes/No) or in buckets (Low/Medium/High). What is the distribution (table(var)) of those variables?
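
As a rough sketch of what item 1 asks for, the Python below prints a text histogram, a skewness value (near 0 for a symmetric, roughly normal variable), and category counts similar in spirit to table(var). It uses only the standard library. The variables `heights` and `level` and the helper names are made up for illustration; substitute columns from your own data set.

```python
import random
import statistics
from collections import Counter

def text_histogram(values, bins=10, width=40):
    """Return one text line per bin: lower edge, count, and a bar of '#'."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    peak = max(counts)
    return [f"{lo + i * step:8.2f} {c:4d} {'#' * round(width * c / peak)}"
            for i, c in enumerate(counts)]

def skewness(values):
    """Skewness (population formula): about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

random.seed(0)
# Hypothetical data: one numeric variable and one bucketed variable.
heights = [random.gauss(170, 10) for _ in range(200)]
level = [random.choice(["Low", "Medium", "High"]) for _ in range(200)]

print("\n".join(text_histogram(heights)))
print("skewness:", round(skewness(heights), 3))
print(Counter(level))  # counts per category
```

A skewness close to 0 and a bell-shaped histogram are only informal evidence of normality; they do not replace looking at the plot yourself.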

2. For variables that aren't normally distributed, would a transformation (exponential, logarithmic, log-log) make the data closer to normal?
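
One way to check item 2 is to compare skewness before and after a transformation. The sketch below assumes a made-up, right-skewed variable called `income` and applies a logarithm; it is an illustration of the idea, not a rule for which transformation to pick.

```python
import math
import random
import statistics

def skewness(values):
    """Skewness (population formula): about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

random.seed(1)
# Hypothetical right-skewed variable (log-normal by construction).
income = [random.lognormvariate(10, 1) for _ in range(500)]
log_income = [math.log(v) for v in income]  # log requires strictly positive values

print("skewness before:", round(skewness(income), 3))
print("skewness after log:", round(skewness(log_income), 3))
```

If the skewness moves toward 0 and the histogram looks more bell-shaped, the transformed variable is probably the better one to use in a regression.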

3. Write, in plain English, 5 to 10 relationships you could study, phrased as questions or hypotheses. For example, "I think more x is correlated with more y" or "I don't think changes in x affect changes in y".

4. Are there any significant outliers or high-leverage points? After inspecting those points, do you want to remove them from the model (do they represent cases that you don't want in your data)? If so, remove them for the rest of these problems.
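
For item 4, leverage in a regression with a single predictor can be computed directly as h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The sketch below uses made-up x values and flags points above twice the average leverage, which is a common rule of thumb and an assumption here, not a cutoff given in the assignment. It covers leverage only; outliers in y still need a look at the residuals.

```python
import statistics

def leverage(xs):
    """Leverage of each observation in a simple (one-predictor) regression."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical predictor values; the last one sits far from the rest.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
h = leverage(xs)

# Average leverage is (p + 1) / n; with p = 1 predictor that is 2 / n.
threshold = 2 * 2 / len(xs)  # assumed rule of thumb: twice the average
flagged = [i for i, value in enumerate(h) if value > threshold]

for i in flagged:
    print(f"row {i}: x = {xs[i]}, leverage = {h[i]:.3f}")
```

Flagged rows are candidates for inspection, not automatic removal.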

5. Run several multivariate (multiple-predictor) linear regressions. What are some of the answers to your questions? Does the sign of any slope change depending on which variables you include or remove (that is, are the independent variables correlated with each other)?
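
The sign change that item 5 asks about can be reproduced on simulated data. In the sketch below, `x1` and `x2` are made-up, strongly correlated predictors, and `ols` is a small helper written for this example (it solves the normal equations directly; a statistics package would normally do this for you). The slope on `x1` is positive when `x1` is the only predictor and negative once `x2` is included.

```python
import random

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(2)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.3) for a in x1]  # x2 is strongly correlated with x1
# By construction, y falls with x1 and rises with x2.
y = [-1 * a + 2 * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

alone = ols([x1], y)      # y regressed on x1 only
both = ols([x1, x2], y)   # y regressed on x1 and x2

print("slope on x1, x1 only:   ", round(alone[1], 3))
print("slope on x1, x1 and x2: ", round(both[1], 3))
```

When `x2` is left out, `x1` picks up part of its effect, which is why the sign flips.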

6. For each linear regression between your dependent variable and one independent variable, plot the residuals. Do you see any pattern in them? For example, is there autocorrelation? Is there a non-linear pattern? Consult pages 92-99 of the ISLR textbook.
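
As a numeric companion to the residual plot in item 6, the sketch below fits a straight line to made-up data that is actually quadratic and then computes the Durbin-Watson statistic on the residuals (values near 2 suggest no lag-1 autocorrelation; values well below 2 suggest positive autocorrelation). Durbin-Watson is one assumed choice of diagnostic, not one named in the assignment, and the data and helper names are hypothetical.

```python
import statistics

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical data with a non-linear (quadratic) relationship, ordered by x.
xs = list(range(20))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
dw = durbin_watson(residuals)

print("residuals:", [round(e, 1) for e in residuals])
print("Durbin-Watson:", round(dw, 3))
```

The residuals here are positive at both ends and negative in the middle, the U shape that signals a missed non-linear term.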

7. Write a summary ("an abstract") of your research findings in English, which you will present in the next class:
7a. What were you studying? What can you try to predict with your model?
7b. What was your finding? What was surprising? How significant was your finding (t-score/p-value)?
7c. In your study, it would be hard to separate correlation from causation. How could you design an experiment where you could measure whether your independent variables CAUSED your dependent variables to be a certain way? For example, how could you tell that increases in college spending actually increase graduation rates, rather than the other way around?
7d. If you were to design an experiment, are there other "confounding variables" which might be correlated with both your independent and dependent variables? How could you test whether those confounding variables caused BOTH your independent and dependent variables?
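
For the significance question in 7b, the t-score of a slope is the estimate divided by its standard error. The sketch below computes it for a one-predictor regression on made-up data. The p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for larger samples; the exact value from a statistics package will differ somewhat for small n.

```python
import math
import random
import statistics

def slope_t_test(xs, ys):
    """Return (slope, standard error, t-score, approximate two-sided p-value)."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    t = slope / se
    # Normal approximation to the t distribution (assumption; fine for large n).
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return slope, se, t, p

random.seed(3)
# Hypothetical data: y rises by about 0.5 per unit of x, plus noise.
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [2 + 0.5 * x + random.gauss(0, 1) for x in xs]

slope, se, t, p = slope_t_test(xs, ys)
print(f"slope = {slope:.3f}, SE = {se:.3f}, t = {t:.2f}, approx p = {p:.4g}")
```

A large absolute t-score (and small p-value) says the slope is unlikely to be zero; it says nothing about whether x causes y, which is the point of 7c and 7d.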