---
title: "Problem Set 7"
author: "H. Busshoff & P. Heinemann"
date: "17 June 2018"
output:
  word_document: default
  pdf_document: default
  html_document:
    df_print: paged
---

# Part 1: Background

## 1

Police presence and crime are likely to be simultaneously determined: cities with high crime rates tend to employ more police. The correlation between police presence and crime therefore cannot easily be attributed to a one-directional causal link.

## 2

On July 18th, 1994, a terrorist attack hit the main Jewish center in Argentina. A week later, the government assigned police protection to every Jewish and Muslim building in the country. The authors use data on motor vehicle thefts per block in different neighbourhoods before and after the attack, together with information on the location of Jewish institutions in these neighbourhoods.

## 3

The groups are different city blocks: some received treatment (blocks where Jewish institutions are located and which were therefore assigned police protection after the attack) and some did not (blocks without such an institution).
We expect car theft to fall in treated blocks and in blocks close to them, with the effect fading as the distance to a treated block increases.


## 4

Common trend assumption: in the absence of additional police, car thefts in the different blocks would have evolved in parallel.
Given this assumption, the difference between how strongly theft changed in treated blocks and how strongly it changed in non-treated blocks after the attack can be attributed to the causal effect of police presence on car theft.
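
In symbols, with $\bar{y}$ denoting group means of car thefts, this is the standard difference-in-differences estimand:

$$\hat{\delta} = \left(\bar{y}^{\,T}_{post} - \bar{y}^{\,T}_{pre}\right) - \left(\bar{y}^{\,C}_{post} - \bar{y}^{\,C}_{pre}\right),$$

where $T$ and $C$ index treated and control blocks. Under the common trend assumption, the second bracket removes the counterfactual change the treated blocks would have experienced anyway.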

## 5

The authors find evidence that police presence lowers crime: relative to the control group, car thefts fall by 75 percent in the blocks where the protected institutions are located. The effect is local; police presence in a given block does not reduce car theft one or two blocks away.

# Part 2: Empirics

Importing the data

```{r}
df <- read.csv("C:/Users/Hannah/Desktop/ps7_crimerates.csv")
```


Loading packages

```{r}
library(tidyr)
library(dplyr)
library(psych)
library(skimr)
library(ggplot2)
library(mice)
library(miceadds)
```

Let's have a brief look at the data:

```{r}
skim(df)
```

## 1

Recoding variables to distinguish the relevant dimensions.

```{r}
# D = 1 for treated blocks (blocks at distance 0, i.e. with a protected institution)
df$D = 0
df$D[df$distance == 0] = 1
# T = 1 for the post-treatment period (week 16 onwards)
df$T = 0
df$T[df$week >= 16] = 1
```

## 2

Computing all conditional means

```{r}
# 2 x 2 matrix of mean car thefts: rows index D (0/1), columns index T (0/1)
e = data.frame(matrix(NA, ncol = 2, nrow = 2))

for (i in 0:1) {
  for (j in 0:1) {
    e[i + 1, j + 1] = mean(df$NumCarThefts[df$D == i & df$T == j])
  }
}
```

Deriving the causal estimate

```{r}
# Difference-in-differences: change in treated blocks minus change in control blocks
treatment = e[2, 2] - e[2, 1] - (e[1, 2] - e[1, 1])
treatment
```

The treatment reduces the average number of car thefts per block by approximately 0.018.

## 3

Alternatively, compute the causal effect via a linear regression with standard errors clustered at the block level:

```{r}
lm1 = lm.cluster(data = df, NumCarThefts ~ D + T + D:T, cluster = "blockID")
summary(lm1)
```

This gives the same estimate, as it should.
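
The equivalence is mechanical. In the saturated regression

$$\mathbb{E}[y_{it} \mid D_i, T_t] = \beta_0 + \beta_1 D_i + \beta_2 T_t + \delta \, (D_i \times T_t),$$

the four conditional means are $\beta_0$, $\beta_0 + \beta_1$, $\beta_0 + \beta_2$, and $\beta_0 + \beta_1 + \beta_2 + \delta$, so the interaction coefficient $\delta$ is exactly the double difference of cell means computed above; clustering changes only the standard errors, not the point estimate.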

## 4

```{r message=TRUE, warning=FALSE, paged.print=FALSE}
df$block = as.factor(df$blockID)
lm2 = lm.cluster(data = df, NumCarThefts ~ D + T + D:T + block, cluster = "block")
coef(lm2)[c("(Intercept)", "D", "T", "D:T")]
```

The coefficient on the interaction term does not change.

## 5

The bank and public-institution dummies are already subsumed in the block fixed effects, since they do not vary over time, as the following check shows.
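
Formally, a regressor that is constant within blocks, $x_{it} = x_i$, is a linear combination of the block dummies,

$$x_{it} = \sum_{b} x_b \, \mathbf{1}\{block_i = b\},$$

and is therefore perfectly collinear with the block fixed effects; its coefficient cannot be separately identified.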


```{r}
# Within-block variance of the dummies over time; zero variance means time-invariant
x = df %>%
  group_by(blockID, week) %>%
  summarise(
    bank = mean(dumBank),
    inst = mean(dumPublicInstitution)
  )
y = x %>%
  group_by(blockID) %>%
  summarise(
    varbank = var(bank),
    varinst = var(inst)
  )

skim(y)
```

## 6

```{r}
df$weekD = as.factor(df$week)
lm3 = lm.cluster(data = df, NumCarThefts ~ D + T + D:T + block + weekD, cluster = "block")
coef(lm3)[c("(Intercept)", "D", "T", "D:T")]
```

No, the coefficient does not change.

## 7

Exploring the common trend assumption

```{r message=FALSE, warning=FALSE}
y = df %>%
  group_by(D, week) %>%
  summarise(y = mean(NumCarThefts))

y$D = as.factor(y$D)

ggplot(data = y, aes(x = week, y = y, color = D)) +
  geom_line(aes(group = D))
```


## 8

Testing the common trend assumption via a regression

```{r}
reg4 = lm(NumCarThefts ~ D + T + D:T + week + week:D, data = df)
summary(reg4)
```

The interaction of week and the treatment dummy is not significant; we therefore find no evidence of a violation of the common trend assumption.
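
The regression probes the assumption by allowing a treatment-specific linear time trend:

$$y_{it} = \beta_0 + \beta_1 D_i + \beta_2 T_t + \delta\,(D_i \times T_t) + \gamma_1\, week_t + \gamma_2\,(week_t \times D_i) + \varepsilon_{it};$$

under common trends $\gamma_2 = 0$, which is what the insignificant interaction is consistent with.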

Alternatively, interact the week indicators with the treatment dummy:

```{r}
reg5 = lm(NumCarThefts ~ T + D + weekD + weekD:D, data = df)

anova(reg5)
```

There is no evidence that the common trend assumption is violated.

# Part 3: Synthetic controls

## 1

Importing the data

```{r}
df1 <- read.csv("C:/Users/Hannah/Downloads/ps7_birthrates.csv")

library(ggplot2)
library(zoo)
# Build a quarterly date variable from year and quarter
df1$date = with(df1, interaction(year, quarter))
df1$date1 = as.Date(as.yearqtr(df1$date, format = "%Y.%q"))
df1$date2 = as.numeric(df1$date1)

ggplot(data = df1, aes(x = date1, y = birthrate, color = countyname)) +
  geom_line(aes(group = countyname)) +
  theme(axis.text.x = element_text(angle = 90)) + geom_point() +
  geom_vline(xintercept = as.numeric(as.Date("1995-04-01")), linetype = 4) +
  geom_vline(xintercept = as.numeric(as.Date("1996-01-01")), linetype = 4)
```

No, the common trend assumption seems to be violated.


## 2

```{r}
# D marks the treated county (Oklahoma)
df1$D = 0
df1$D[df1$countyname == 'oklahoma'] = 1
df1$D = as.factor(df1$D)

x = df1 %>%
  group_by(D, date1) %>%
  summarise(births = mean(birthrate))

ggplot(data = x, aes(x = date1, y = births, color = D)) +
  geom_line(aes(group = D)) +
  theme(axis.text.x = element_text(angle = 90)) + geom_point() +
  geom_vline(xintercept = as.numeric(as.Date("1995-04-01")), linetype = 4) +
  geom_vline(xintercept = as.numeric(as.Date("1996-01-01")), linetype = 4)
```

Yes, the common trend now looks much more plausible.

## 3

Loading the package for synthetic controls and fitting the model:

```{r}
library(Synth)

# Synth needs a numeric unit identifier alongside the county names
df1$countyname = as.factor(df1$countyname)
df1$SD = as.numeric(df1$countyname)
df1$countyname = as.character(df1$countyname)

a = unique(df1$date2[df1$year == 1992 & df1$quarter == 1])
b = unique(df1$date2[df1$year == 1995 & df1$quarter == 4])
c = unique(df1$date2[df1$year == 1990 & df1$quarter == 1])

d = unique(df1$date2[df1$date2 <= b])                    # all pre-treatment periods
e = unique(df1$date2[df1$date2 >= a & df1$date2 <= b])
f = unique(df1$date2[df1$year == 1990 & df1$quarter == 1])

# Unit 7 is the treated county (Oklahoma); the others form the donor pool
dataprepout1 = dataprep(foo = df1,
                        predictors = "schoolenrollment",
                        predictors.op = "mean",
                        dependent = "birthrate",
                        unit.variable = "SD",
                        unit.names.variable = "countyname",
                        time.variable = "date2",
                        treatment.identifier = 7,
                        controls.identifier = c(1:6, 8:12),
                        time.optimize.ssr = d,
                        time.predictors.prior = f)

synth.out1 <- synth(data.prep.obj = dataprepout1, method = "BFGS")

synth.tables <- synth.tab(dataprep.res = dataprepout1, synth.res = synth.out1)

synth.tables
```

## 4

Plotting the treated county against its synthetic control:

```{r}
x = synth.tables$tab.w

# Attach each control county's synthetic-control weight
df1$weight = 0
for (ii in c(1:6, 8:12)) {
  df1$weight[df1$SD == ii] = x$w.weights[x$unit.numbers == ii]
}

# Weighted birthrates sum to the synthetic control; the treated county keeps its own series
df1$y = df1$birthrate * df1$weight
df1$y[df1$D == 1] = df1$birthrate[df1$D == 1]

y = df1 %>%
  group_by(date1, D) %>%
  summarise(births = sum(y))

ggplot(data = y, aes(x = date1, y = births, color = D)) +
  geom_line(aes(group = D)) +
  geom_vline(xintercept = as.numeric(as.Date("1996-01-01")), linetype = 4)
```

## 5

Assessing the significance of the effect via randomization inference. We take all counties (without the weighting scheme) as the units of observation:

```{r}
df1$T = 0
df1$T[as.numeric(as.Date(df1$date1)) >= as.numeric(as.Date("1996-01-01"))] = 1

# Pre- and post-treatment mean birthrate for every county
fisher = df1 %>%
  group_by(countyname) %>%
  summarise(
    pretreatment = mean(birthrate[T == 0]),
    posttreatment = mean(birthrate[T == 1])
  )

fisher$delta = fisher$posttreatment - fisher$pretreatment

# Under H0 the treatment effect is zero for every county,
# so any county could equally well have been the treated one.

a = sum(fisher$delta)

# There are 12 possible treatment assignments (N choose N_T = choose(12, 1))

j = 12

X = data.frame(matrix(NA, ncol = 3, nrow = j))
names(X) = c("Treatment Effect H0", "Absolute Value", "Relative Size")

# Observed effect for the treated county (unit 7), in absolute value
ref = abs(fisher$delta[7] - 1/11 * sum(fisher$delta[c(1:6, 8:12)]))

for (i in c(1:6, 8:12)) {
  b = fisher$delta[i]
  X[i, 1] = b - (a - b) * 1/11          # placebo effect for county i
  X[i, 2] = abs(X[i, 1])
  X[i, 3] = as.numeric(X[i, 2] >= ref)  # 1 if placebo effect at least as large
}

# Randomization-inference p-value (columns are renamed, so index by position)
p = (1 + sum(X[[3]][c(1:6, 8:12)])) / 12
p
```

The effect is (weakly) significant: the computed p-value is 0.083.
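
The p-value above follows the standard randomization-inference expression

$$p = \frac{1 + \#\{\, j : |\hat{\delta}^{\,placebo}_j| \ge |\hat{\delta}^{\,obs}| \,\}}{N}, \qquad N = 12,$$

so when no placebo effect is at least as large as the observed one, $p = 1/12 \approx 0.083$, the smallest value attainable with twelve counties.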