lowv1

Untitled

Mar 22nd, 2019
# Assignment 1
# Section 2

# Clear all objects currently in memory
rm(list=ls())

# ------------------------------------------------------------------------------------
# Q2.1

# Set the working directory
setwd("C:/Users/Vivien/Desktop/Third Year/Semester 1/Econometrics 2/Assignments/Assignment 1")

# Import the dataset
studies_df=read.csv("A1_Data.csv")

# Store the raw marks under shorter names that we can reuse
assg=studies_df$assg
exam=studies_df$exam

# Convert assg(i) and exam(i) into percentages (creating new columns in the table)
studies_df$AssgPercent=(assg/30)*100
studies_df$ExamPercent=(exam/70)*100

# ------------------------------------------------------------------------------------
# Obtain descriptive statistics for assg(i) and exam(i); non-percentage form

# Get a table of summary statistics (assg(i))
statstable1 = rbind(mean(assg),   # Specify mean
                    median(assg), # Specify median
                    sd(assg),     # Specify standard deviation
                    min(assg),    # Specify minimum
                    max(assg))    # Specify maximum

# Give names to each of the rows in the stats table
rownames(statstable1) = c("Mean","Median","SD","Min","Max")

# Give the first column of the stats table a name
colnames(statstable1) = "Assignment Mark"

# Print table of summary statistics to 4 d.p.
print(round(statstable1,4))

# Get a table of summary statistics (exam(i))
statstable2 = rbind(mean(exam),   # Specify mean
                    median(exam), # Specify median
                    sd(exam),     # Specify standard deviation
                    min(exam),    # Specify minimum
                    max(exam))    # Specify maximum

# Give names to each of the rows in the stats table
rownames(statstable2) = c("Mean","Median","SD","Min","Max")

# Give the first column of the stats table a name
colnames(statstable2) = "Exam Mark"

# Print table of summary statistics to 4 d.p.
print(round(statstable2,4))
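
# A possible way to avoid repeating the rbind() block for every variable: a small
# helper that builds the same five-row table for any numeric vector. This is only a
# sketch; summary_table and demo_marks are hypothetical names, and the marks are
# made up so that the example runs without A1_Data.csv.
summary_table = function(x, label) {
  out = rbind(mean(x), median(x), sd(x), min(x), max(x))
  rownames(out) = c("Mean","Median","SD","Min","Max")
  colnames(out) = label
  out
}
demo_marks = c(12, 18, 25, 27, 30) # Made-up assignment marks (out of 30)
print(round(summary_table(demo_marks, "Demo Mark"), 4))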

# Obtain histogram for assg(i)
hist(studies_df$assg)

# Produce a histogram, with some fancier colors
hist(studies_df$assg, # Specify dataset
     main = "Assignment Mark", # Title of histogram
     xlab = "Total Mark Across Four Assignments", # Name for x-axis
     breaks = 20, # More breakpoints in the histogram
     col = "yellow") # Yellow coloured histogram!

# Obtain histogram for exam(i)
hist(studies_df$exam)

# Produce a histogram, with some fancier colors
hist(studies_df$exam, # Specify dataset
     main = "Exam Mark", # Title of histogram
     xlab = "Final Exam Mark", # Name for x-axis
     breaks = 20, # More breakpoints in the histogram
     col = "pink") # Pink coloured histogram!

# ------------------------------------------------------------------------------------
# Obtain descriptive statistics for assg(i) and exam(i); percentage form

# Store the percentage marks under shorter names that we can reuse
assg2=studies_df$AssgPercent
exam2=studies_df$ExamPercent

# Get a table of summary statistics (assg(i))
statstable3 = rbind(mean(assg2),   # Specify mean
                    median(assg2), # Specify median
                    sd(assg2),     # Specify standard deviation
                    min(assg2),    # Specify minimum
                    max(assg2))    # Specify maximum

# Give names to each of the rows in the stats table
rownames(statstable3) = c("Mean","Median","SD","Min","Max")

# Give the first column of the stats table a name
colnames(statstable3) = "Assignment Mark (%)"

# Print table of summary statistics to 3 d.p.
print(round(statstable3,3))

# Get a table of summary statistics (exam(i))
statstable4 = rbind(mean(exam2),   # Specify mean
                    median(exam2), # Specify median
                    sd(exam2),     # Specify standard deviation
                    min(exam2),    # Specify minimum
                    max(exam2))    # Specify maximum

# Give names to each of the rows in the stats table
rownames(statstable4) = c("Mean","Median","SD","Min","Max")

# Give the first column of the stats table a name
colnames(statstable4) = "Exam Mark (%)"

# Print table of summary statistics to 3 d.p.
print(round(statstable4,3))

# Obtain histogram for assg(i)
hist(studies_df$AssgPercent)

# Produce a histogram, with some fancier colors
hist(studies_df$AssgPercent, # Specify dataset
     main = "Assignment Mark", # Title of histogram
     xlab = "Total Percentage Mark Across Four Assignments", # Name for x-axis
     col = "yellow") # Yellow coloured histogram!

# Obtain histogram for exam(i)
hist(studies_df$ExamPercent)

# Produce a histogram, with some fancier colors
hist(studies_df$ExamPercent, # Specify dataset
     main = "Exam Mark", # Title of histogram
     xlab = "Final Percentage Exam Mark", # Name for x-axis
     breaks = 20, # More breakpoints in the histogram
     col = "pink") # Pink coloured histogram!

# The histogram for 'Assignment Mark' has a long left tail and is hence negatively skewed.
# This is consistent with Mean < Median.

# The histogram for 'Exam Mark' is more symmetric and is possibly close to a normal
# distribution (ignoring the outlier on the left).
# This is consistent with the Mean being approximately equal to the Median.
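
# The mean-versus-median reading of skewness can be backed by a numeric check. The
# sketch below computes the moment-based sample skewness with base R only; a
# negative value indicates a longer left tail. sample_skewness and left_tailed are
# hypothetical names, and the marks are made up rather than taken from A1_Data.csv.
sample_skewness = function(x) {
  d = x - mean(x) # Deviations from the mean
  mean(d^3) / (mean(d^2)^(3/2)) # Third moment scaled by the cubed standard deviation
}
left_tailed = c(2, 20, 24, 26, 27, 28, 29, 30) # Made-up marks with one very low value
print(mean(left_tailed) < median(left_tailed)) # Is the mean below the median?
print(sample_skewness(left_tailed) < 0) # Is the skewness negative?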

# ------------------------------------------------------------------------------------
# Q2.2

# Run an OLS regression
eqn1=lm(ExamPercent~AssgPercent,data=studies_df)
print(summary(eqn1))

# Statistical interpretation
# Intercept
# B0: a student who obtains an overall assignment mark of 0% will, on average,
# obtain an overall exam mark of 26.3553%.

# Coefficient of assg
# B1: a 1 percentage point increase in the assignment mark is associated, on average,
# with a 0.7258 percentage point increase in the overall exam mark.

# Causal interpretation
# Intercept
# B0: obtaining an overall assignment mark of 0% causes a student to obtain
# an overall exam mark of 26.3553%.

# Coefficient of assg
# B1: a 1 percentage point increase in the assignment mark causes a 0.7258
# percentage point increase in the overall exam mark.
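
# Because the regression is run on percentages, its coefficients are measured in
# percentage points. A sketch of how they map back to raw marks, assuming assg is
# out of 30 and exam is out of 70 as in Q2.1:
#   intercept_raw = 0.7 * intercept_pct and slope_raw = (70/30) * slope_pct.
# sim_assg, sim_exam, fit_raw and fit_pct are hypothetical names; the data are simulated.
set.seed(1)
sim_assg = runif(50, 10, 30) # Simulated assignment marks (out of 30)
sim_exam = 15 + 1.2*sim_assg + rnorm(50, sd = 5) # Simulated exam marks (out of 70)
fit_raw = lm(sim_exam ~ sim_assg) # Regression on raw marks
fit_pct = lm(I(sim_exam/70*100) ~ I(sim_assg/30*100)) # Regression on percentages
print(all.equal(unname(coef(fit_raw)[1]), unname(0.7*coef(fit_pct)[1])))
print(all.equal(unname(coef(fit_raw)[2]), unname((70/30)*coef(fit_pct)[2])))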

# ------------------------------------------------------------------------------------
# Q2.3

# Remove all cases where flag.0(i)=1 (cases where a student obtained a 0 for an assessment)
eqn2=lm(ExamPercent~AssgPercent,data=subset(studies_df,flag.0==0))
print(summary(eqn2))

# The coefficient of assg(i) has increased by 0.2121, from 0.7258 to 0.9379.
# This indicates that, among students who obtained a mark greater than zero on every assessment,
# the estimated marginal benefit of an extra assignment mark on the overall exam mark is 0.2121
# larger than the estimate from the full sample, which includes students who failed or did not
# complete an assessment.

# I believe that this is a sensible/innocuous decision because it allows us to obtain a more accurate
# causal interpretation of the effect of an extra assignment mark on the overall exam mark.
# Ultimately, we are interested in the relationship between assignment marks and the overall exam mark.
# Students who failed or did not complete an assessment would act
# as outliers in our regression analysis and would reduce the accuracy of our analysis.
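
# A sketch of how the change in the slope between the full and restricted samples
# can be computed directly rather than read off the two summaries. sim_df and its
# columns are simulated stand-ins for studies_df (hypothetical data); only the
# mechanics, subset() plus coef(), carry over to the assignment data.
set.seed(2)
sim_df = data.frame(AssgPercent = runif(60, 40, 100))
sim_df$ExamPercent = 10 + 0.8*sim_df$AssgPercent + rnorm(60, sd = 8)
sim_df$flag.0 = rep(c(0, 1), times = c(55, 5)) # Flag the last 5 rows
fit_all = lm(ExamPercent~AssgPercent, data = sim_df) # Full sample
fit_sub = lm(ExamPercent~AssgPercent, data = subset(sim_df, flag.0 == 0)) # Flagged rows dropped
slope_change = coef(fit_sub)["AssgPercent"] - coef(fit_all)["AssgPercent"]
print(round(slope_change, 4)) # Change in the slope after dropping flagged rows
print(nobs(fit_all) - nobs(fit_sub)) # Number of observations dropped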

# ------------------------------------------------------------------------------------
# Q2.4

# No, I do not believe that the regression model in Q2.3 suffers from simultaneous causality.
# Though it is plausible that a change in the assignment mark will cause a change in the overall exam mark
# (e.g. those who work harder do better on the exam/overall),
# it is not plausible that a change in the exam mark will affect the assignment mark.
# This is because, by the time of the exam, all students have completed their assignments
# and have already received finalised assignment marks (which cannot change in response to the exam mark).

# ------------------------------------------------------------------------------------
# Q2.5

# See notes

# ------------------------------------------------------------------------------------
# Q2.6

# If we did not assume that assignments are undertaken individually, then it would be
# difficult to say that there is a relationship between views(i) and assg(i).
# A student's level of participation on Ed will not necessarily affect the assg mark when the work is done in a group,
# so we would not be strictly controlling for individual ability.

# ------------------------------------------------------------------------------------
# Q2.7

# Add views(i) to the regression
eqn3=lm(ExamPercent~AssgPercent+views,data=subset(studies_df,flag.0==0))
print(summary(eqn3))

# Intercept
# A student who achieves an overall assignment mark of 0% and has viewed 0 threads on Ed
# will on average obtain an overall exam mark of 24.0259%.
# This is not a valid interpretation because we have excluded students who obtained a 0 on any assessment,
# and an overall assignment mark of 0 implies that the student obtained a 0 for all assignments.
# The intercept has increased by 2.5562 (from 21.4697 to 24.0259).

# Coefficient of assg
# Controlling for the number of threads the student has viewed on Ed,
# obtaining 1 extra percentage point on the assignments will on average result in a
# 0.7173 percentage point increase in the overall exam mark (the marginal benefit of an
# extra assignment mark on the overall exam mark).

# Coefficient of views
# Controlling for the total mark over all four assignments, viewing 1 extra thread on Ed
# will on average result in a 0.0093 percentage point increase in the overall exam mark.

# Because there is a difference between Delta1 and Beta1, it is clear that an omitted variable bias (OVB) problem exists.
# A regression of exam(i) on assg(i) alone will not have a causal interpretation.
# If we require a causal estimate, then we need to develop a strategy to obtain one.
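
# The gap between the short-regression and long-regression slopes follows the
# omitted variable bias identity: short slope = long slope + (coefficient on the
# omitted variable) * (slope from regressing the omitted variable on the included one).
# The sketch below checks this identity on simulated data; sim_views, sim_a and sim_e
# are hypothetical stand-ins for views, AssgPercent and ExamPercent.
set.seed(3)
sim_views = rpois(80, 40) # Simulated thread views
sim_a = 50 + 0.5*sim_views + rnorm(80, sd = 10) # Assignment mark correlated with views
sim_e = 20 + 0.6*sim_a + 0.1*sim_views + rnorm(80, sd = 8) # Exam mark depends on both
short = lm(sim_e ~ sim_a) # Omits views
long = lm(sim_e ~ sim_a + sim_views) # Includes views
aux = lm(sim_views ~ sim_a) # Omitted variable regressed on the included one
implied = coef(long)["sim_a"] + coef(long)["sim_views"]*coef(aux)["sim_a"]
print(all.equal(unname(coef(short)["sim_a"]), unname(implied)))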

# ------------------------------------------------------------------------------------
# Q2.8