Advertisement
Guest User

Untitled

a guest
Aug 28th, 2016
67
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 8.32 KB | None | 0 0
  1. ########
  2. ## About myself...
  3.  
  4. ##########
  5. ## how does one call a function programatically?
  6. ## with call and do.call
  7.  
  8. ## say I have a function to add two numbers
  9. ```{r}
  10. myf <- function(x, y) x + y
  11. myf(1, 1)
  12. ```
  13.  
  14. ## do.call evaluates the function "what", with the arguments "args" (supplied as a list)
  15. ```{r}
  16. args(do.call)
  17. do.call(myf, list(1, 1))
  18. ```
  19.  
  20. ## call prepares it but does not evaluate it, and instead takes arguments separately through the ... argument
  21. args(call)
  22. (c1 <- call("myf", 1, 1))
  23. eval(c1)
  24.  
  25. ## I'll focus on do.call
  26. ## For better or for worse, I also will ignore the "envir" option which allows the call to be executed in a particular environment.
  27. ## For better, because it would take longer than I have to address this in enough detail to be meaningful.
  28. ## For worse because every so often really weird things can happen if you don't understand what this means.
  29.  
  30. ##########
  31. ## most common use (perhaps) is as part of split-apply-combine, as so named and made famous by Hadley Wickham
  32. ## this is what we used to do before he wrote the plyr package, and still is occassionally useful
  33. ## say we have data that we want to split by group and apply a function to
  34.  
  35. ## here's a data frame with five groups, 10 observations of a and b for each group
  36. set.seed(5)
  37. d <- data.frame(group=rep(LETTERS[1:5], each=10), a=10 + round(rnorm(50, 1)), b=11 + round(rnorm(50,1)))
  38. ## split it by group
  39. ds <- split(d, d$group)
  40. ## apply a function to each part that gets mean a, mean b, confidence interval of difference for paired t test
  41. out <- lapply(ds, function(di) with(di, c(mean.a=mean(a), mean.b=mean(b), t.confint=t.test(a, b, paired=TRUE)$conf.int)))
  42. do.call(rbind, out)
  43.  
  44. #####################
  45. ## more on how it works, with lm (because it shows you the call itself in the output)
  46.  
  47. ## let's make and plot some data (avoiding dollar signs...)
  48. set.seed(1)
  49. d <- data.frame(x=round(runif(50, 0, 10), 1))
  50. d <- within(d, {
  51. y <- round(1 + 0.9*x + rnorm(n, 0, 4), 1)
  52. })
  53. with(d, plot(x, y))
  54.  
  55. ## normally would fit model like this
  56. lm(y ~ x, data=d)
  57.  
  58. ## so with do.call, let's try this...
  59. do.call(lm, list(y~x, data=d))
  60.  
  61. ## wow, what was that!
  62. ## two things: 1) function call is passed in as value of argument, and 2) parameters are evaluated before sending to do.call
  63. ## instead send the function as a character and wrap parameters we don't want evaluated first in "quote", which simply returns its
  64. ## argument when evaluated
  65. do.call("lm", list(y~x, data=quote(d)))
  66.  
  67. ## Both, however, give the correct result and can be used later with summary, coef, etc, as needed
  68. ## The fact that parameters are evaluated first, though, is potentially really handy!
  69.  
  70. #######
  71. ## this is my second main use for do.call
  72. ## to programatically fit different formulas
  73.  
  74. ## can store formula in a character variable,
  75. ## could be a vector of options that we loop through
  76. f <- "y ~ x"
  77.  
  78. ## will work without do.call, but call refers to "f"
  79. (m1 <- lm(f, data=d))
  80.  
  81. ## do. call instead evaluates the f first, so call is as expected
  82. (m2 <- do.call("lm", list(as.formula(f), data=quote(d))))
  83.  
  84.  
  85. ## doesn't matter in most contexts, but makes me a little nervous sometimes
  86. ## say we change f, then update the model (not that I use update a lot...)
  87. f <- "y ~ 1"
  88. update(m1)
  89. update(m2)
  90. ## there are situations when you might want either, I guess, but it feels cleaner to
  91. ## me to have the actual formula I used stored rather than a generic "f".
  92. ## Though if you think about it, I prefer the opposite for the data argument, so...
  93.  
  94.  
  95. #################
  96. ## using do.call for optimal use of optim
  97.  
  98. ## see also: stats4::mle and bbmle::mle2
  99. ## which makes some (possibly all) of this unneeded
  100. ## still, handy to see how parameters can be passed around from function to function
  101.  
  102. ## making data in global environment
  103. ## in real life, this would be in a data frame
  104. ## could explore how to write functions that use variables from within data frame but not today...
  105.  
  106. rm(list=ls())
  107. set.seed(1)
  108. n <- 50
  109. x <- round(runif(n, 0, 10), 1)
  110. y <- round(1 + 0.9*x + rnorm(n, 0, 4), 1)
  111. plot(x, y)
  112.  
  113. ## say we want to find the best linear fit, and standard deviation, assuming
  114. ## data is normally distributed around linear fit with constant variance
  115. ## y_i ~ N(a + bx_i, s^2)
  116. ## want to get parameters a, b, and s that maximize corresponding likelihood
  117. m1 <- lm(y ~ x)
  118. summary(m1)
  119.  
  120. ## I'll report log(s) instead of s itself, so that range can be over whole real line
  121. setNames(c(coef(m1), log(sqrt(sum(residuals(m1)^2)/n))), c("int", "slope", "logsd"))
  122.  
  123. ## say we want to do this "by hand"
  124. ## as an example of what to do if had more complex likelihood function
  125. ## we want to minimize the negative log likelihood
  126. negloglik <- function(int, slope, logsd, x, y) {
  127. -sum(dnorm(y, mean=int + slope * x, sd=exp(logsd), log=TRUE))
  128. }
  129.  
  130. ## using optim is an option to do this
  131. ## but optim wants the parameters as one parameter at the beginning
  132. args(optim)
  133.  
  134. ## method one
  135. ## positional, lengthy
  136. nll1 <- function(par, x, y) {
  137. negloglik(int=par[1], slope=par[2], logsd=par[3], x=x, y=y)
  138. }
  139. optim(par=c(int=1, slope=1, logsd=log(4)), nll1, x=x, y=y)$par
  140.  
  141. ## method two
  142. ## uses do.call to put them in call automatically
  143. nll2 <- function(par, x, y) {
  144. do.call(negloglik, c(as.list(par), list(x=x, y=y)))
  145. }
  146. optim(par=c(int=1, slope=1, logsd=log(4)), nll2, x=x, y=y)$par
  147.  
  148. #####
  149. ## aside: combining lists
  150.  
  151. ## combining lists seems like it should be straightforward
  152. ## here are two lists, let's combine them
  153. a1 <- list(x=1, y=2)
  154. a2 <- list(a=c(10,11), b=c(11,12))
  155. c(a1, a2)
  156.  
  157. ## but this only works if both of the things are lists,
  158. ## if not, it doesn't do what we want
  159. d <- 1:3
  160. c(a2, d=d)
  161. list(a2, d=d)
  162. c(a2, list(d=d))
  163.  
  164. ## the other function I used is `as.list`, which takes a vector
  165. ## and makes it into a list instead. Contrast with just `list`.
  166. a <- c(x=1, y=2)
  167. as.list(a)
  168. list(a)
  169. #####
  170.  
  171. ## method three
  172. ## uses do.call to optimize over some, not others
  173. ## this is really why I learned about this, to be able to profile the likelihood more easily
  174. nll3 <- function(par, x, y, ...) {
  175. do.call(negloglik, c(as.list(par), list(x=x, y=y), ...))
  176. }
  177. optim(par=c(int=1, slope=1), nll3, x=x, y=y, logsd=log(4))$par
  178.  
  179. ## method four
  180. ## more generic, as all parameters could get wrapped into the ...
  181. nll4 <- function(par, ...) {
  182. do.call(negloglik, c(as.list(par), list(...)))
  183. }
  184. optim(par=c(int=1, slope=1), nll4, x=x, y=y, logsd=log(4))$par
  185.  
  186. #######
  187. ## OK, now we're going to "show off..."
  188. ## not that you would choose these for this example
  189. ## but might be handy in other kinds of circumstances
  190.  
  191. ## method five
  192. ## a helper function for optim
  193. optimhelp <- function(par, FUN, ...) {
  194. do.call(FUN, c(as.list(par), list(...)))
  195. }
  196. optim(par=c(int=1, slope=1), optimhelp, FUN=negloglik, x=x, y=y, logsd=log(4))$par
  197.  
  198.  
  199. ## method six factory to make really generic
  200. ## ignoring issues with environments and closures right now...
  201. optfactory <- function(FUN) {
  202. function(par, ...) { do.call(FUN, c(as.list(par), list(...))) }
  203. }
  204.  
  205. nllpar <- optfactory(negloglik)
  206. optim(par=c(int=1, slope=1), nllpar, x=x, y=y, logsd=log(4))$par
  207.  
  208. ## method seven, build a new version of optim
  209. ## this automatically allows extra parameters to be sent to optim as well!
  210. optim2 <- function(par, fn, ...) {
  211. ff <- function(par, ...) { do.call(fn, c(as.list(par), list(...))) }
  212. optim(par=par, fn=ff, ...)
  213. }
  214. optim2(par=c(int=1, slope=1), fn=negloglik, x=x, y=y, logsd=log(4))$par
  215.  
  216. ## I suppose you could oneliner this too...
  217. optim(par=c(int=1, slope=1), fn=function(par, ...) { do.call(negloglik, c(as.list(par), list(...))) }, x=x, y=y, logsd=log(4))$par
  218.  
  219. ## or, can just include them as parameters,
  220. ## probably the clearest way to write the code for users
  221. optim2b <- function(par, fn, gr=NULL, ...,
  222. method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"),
  223. lower = -Inf, upper = Inf, control = list(), hessian = FALSE) {
  224. ff <- function(par, ...) { do.call(fn, c(as.list(par), list(...))) }
  225. optim(par=par, fn=ff, gr=gr, ..., method=method, lower=lower, upper=upper, control=control, hessian=hessian)
  226. }
  227. optim2b(par=c(int=1, slope=1), fn=negloglik, x=x, y=y, logsd=log(4), hessian=TRUE)[c("par", "hessian")]
  228.  
  229. ## or, for kicks, could show how to separate parameters for the two functions
  230. optim2c <- function(par, fn, extra, ...) {
  231. ff <- function(par, extra) { do.call(fn, c(as.list(par), extra)) }
  232. optim(par=par, fn=ff, extra=extra, ...)
  233. }
  234. optim2c(par=c(int=1, slope=1), fn=negloglik, extra=list(x=x, y=y, logsd=log(4)), hessian=TRUE)[c("par", "hessian")]
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement