Catnip MDP: optimal Bayesian solver

Jun 10th, 2018
Costs:

- [I Love Catnip! 2 oz Liquid Catnip Spray, by Pet MasterMind](http://www.amazon.com/Catnip-Pet-MasterMind-Premium-Natural/dp/B016R4EBKW/): \$8.96
- [Honeysuckle Spray for Cats (2 oz)](https://www.amazon.com/dp/B000788TP8/): \$7.99
- [Valerian Root Extract](https://www.amazon.com/dp/B002H2VYNG/): \$11.86 (another possibility: [Nature's Answer super-concentrated 1 fluid oz](https://www.vitacost.com/natures-answer-valerian-root-alcohol-free-1-fl-oz), \$7)
- [Arata Silvervine 12x0.5g, Cat's Favorite, Best Value](https://www.amazon.com/dp/B008PNNMHW/): \$10.95 + \$6.82 (S&H from Japan) = \$17.77, or \$2.96/g

Load Bol et al 2017, for predicting a cat's response to one drug based on its responses to the others:

~~~{.R}
bol2017 <- read.csv("https://www.gwern.net/docs/catnip/2017-bol-cats.csv")
bol2017[,5:8] <- (bol2017[,5:8] > 0) # treat '5'/'10' (weak/strong response) as binary
bol2017Responses <- subset(bol2017, select=c("Catnip", "Valerian", "Silver.vine", "Tatarian.honeysuckle"))
colnames(bol2017Responses) <- c("Catnip", "Valerian", "Silvervine", "Honeysuckle") # rename for consistency
library(skimr)
skim(bol2017Responses)
# ... variable    missing complete   n mean                    count
#     Catnip            1       99 100 0.68 TRU: 67, FAL: 32, NA: 1
#     Honeysuckle       3       97 100 0.53 TRU: 51, FAL: 46, NA: 3
#     Silvervine        0      100 100 0.79 TRU: 79, FAL: 21, NA: 0
#     Valerian          4       96 100 0.47 FAL: 51, TRU: 45, NA: 4
~~~

A greedy one-step policy using the simple marginal probabilities, minimizing each drug's expected immediate loss Cost × (1 − P), chooses the test sequence catnip/valerian/silvervine/honeysuckle:

Drug        Cost (\$)    P    Loss
----------- --------- ---- -------
Catnip           8.96 0.67  2.9568
Valerian         7.00 0.47  3.7100
Silvervine      17.77 0.79  3.7317
Honeysuckle      7.99 0.53  3.7553

~~~{.R}
## per-drug purchase costs; the 'Loss' column holds the cash cost of buying each drug
## (using the cheaper $7 valerian option):
costs <- read.csv(stdin(), header=TRUE, colClasses=c("character", "numeric"))
Drug,Loss
Catnip,8.96
Valerian,7.00
Silvervine,17.77
Honeysuckle,7.99
~~~

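The greedy ranking above can be reproduced with a short computation: each drug's expected immediate loss is Cost × (1 − P), and the greedy policy simply sorts ascending by it. A quick Python check, with the costs and marginal probabilities copied from the table above:

```python
# Greedy one-step ordering: sort drugs by expected immediate loss, Cost * (1 - P).
# Costs and marginal response probabilities are those tabulated above.
drugs = {
    "Catnip":      (8.96,  0.67),
    "Valerian":    (7.00,  0.47),
    "Silvervine":  (17.77, 0.79),
    "Honeysuckle": (7.99,  0.53),
}

greedy = sorted(drugs, key=lambda d: drugs[d][0] * (1 - drugs[d][1]))
for d in greedy:
    cost, p = drugs[d]
    print(f"{d:12s} {cost:6.2f} {p:.2f} {cost * (1 - p):.4f}")
# ordering: Catnip, Valerian, Silvervine, Honeysuckle
```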
Define a catnip-testing MDP: we are testing 4 cat drugs with intercorrelated responses, we want to find 1 drug which works, and we want to minimize the money spent (since we must pay for each drug tried, at the costs defined above).
What is the optimal test sequence, minimizing expected loss before finding 1 working drug?
We can solve it by backwards induction on the decision tree, using Bayesian logistic regression to predict the posterior probability of each remaining drug working, conditional on the previously-tried drugs having failed:

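The recursion is easier to see in a stripped-down version of the problem: if we (incorrectly) treated the 4 responses as independent with fixed marginal probabilities, the expected loss of a fixed test sequence would be c₁ + (1−p₁)·(c₂ + (1−p₂)·(…)), and with only 4 drugs we can brute-force all 24 orderings. A hypothetical Python sketch of this independence-assuming simplification (not the real solver, which re-estimates each probability after every failure):

```python
from itertools import permutations

# Simplified MDP: treat the 4 responses as independent with fixed marginal
# probabilities, so a fixed sequence's expected loss is
#   c1 + (1-p1)*(c2 + (1-p2)*(...)).
drugs = {"Catnip": (8.96, 0.67), "Valerian": (7.00, 0.47),
         "Silvervine": (17.77, 0.79), "Honeysuckle": (7.99, 0.53)}

def expected_loss(sequence):
    loss = 0.0
    p_reach = 1.0  # probability that every earlier drug failed
    for drug in sequence:
        cost, p = drugs[drug]
        loss += p_reach * cost  # pay for this test only if we reach it
        p_reach *= (1 - p)
    return loss

best = min(permutations(drugs), key=expected_loss)
print(best, round(expected_loss(best), 2))
```

Under independence the best order is catnip/valerian/honeysuckle/silvervine, at an expected ~\$14.13; the full solver below, which exploits the correlations, instead arrives at catnip/honeysuckle/silvervine/valerian, so the correlation structure matters.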
~~~{.R}
library(rstanarm)
library(memoise)
library(boot) # for inv.logit

## Posterior probability that drug 'action' works, conditional on the failures
## observed so far, estimated by Bayesian logistic regression on the Bol et al 2017 data:
predictBayes <- memoize(function (action, observations, df) {
    print(paste("Observations in predictBayes:", observations))
    formula <- as.formula(paste(action, paste(c(names(observations), "1"), collapse=" + "), sep=" ~ "))
    b <- stan_glm(formula, family=binomial(), algorithm="optimizing", data=df)
    print(b)
    if (length(observations)==0) { mean(inv.logit(predict(b))) } else
        { inv.logit(predict(b, newdata=as.data.frame(observations))[1]) } })

## Expected loss of the optimal policy, given a history of observed failures,
## solved by backwards induction over the untried drugs:
f <- memoize(function(observations) {
    allActions <- colnames(bol2017Responses)
    actions <- allActions[!allActions %in% names(observations)] # untried drugs
    if (length(actions) == 0) { return(0) } else {

        actionCosts <- costs[costs$Drug %in% actions,]$Loss

        probabilities <- numeric(length(actions))
        for (i in 1:length(actions)) {
            probabilities[i] <- predictBayes(actions[i], observations, bol2017Responses)
        }

        lossActions <- numeric(length(actions))
        for (i in 1:length(actions)) {
            ## we pay the cost of testing each drug; if it works, we stop (no further
            ## loss); if it fails, we add a FALSE observation and look for an optimal
            ## path through the remaining drugs by calling 'f' recursively:
            obs <- list(a=FALSE); names(obs) <- actions[i]
            obs <- c(observations, obs)
            newHistory <- obs[order(names(obs))] # sort history by name so equivalent histories share one memoised call
            lossActions[i] <- actionCosts[i] + probabilities[i]*0 + ((1-probabilities[i]) * f(newHistory))
        }
        print(data.frame(Actions=actions, P=probabilities, Loss=lossActions))
        print(paste("Optimal action: ", actions[which.min(lossActions)], "; given observations: ", names(observations), observations))
        return(min(lossActions)) }
    })
# f(list(Catnip=FALSE, Honeysuckle=FALSE, Silvervine=FALSE))
f(list())

## No data, optimal choice is catnip:
#       Actions            P        Loss
# 1      Catnip 0.6810641502 15.79007271
# 2    Valerian 0.4731509696 16.92706363
# 3  Silvervine 0.7879218896 20.92153367
# 4 Honeysuckle 0.5276423201 16.51408849

## With catnip resistance, optimal choice is honeysuckle:
#
#       Actions            P        Loss
# 1    Valerian 0.2005122547 23.27730700
# 2  Silvervine 0.7202520977 21.49101933
# 3 Honeysuckle 0.3287608075 21.41519278

## With catnip/honeysuckle resistance, optimal choice is silvervine, and if silvervine doesn't work, only valerian is left:
#
#      Actions            P        Loss
# 1   Valerian 0.1695397115 21.75727933
# 2 Silvervine 0.6813415396 20.00060922

## Final optimal policy: catnip, honeysuckle, silvervine, valerian; aside from catnip, almost the reverse of the greedy policy.
~~~
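One detail worth noting in the solver: each new history is sorted before the recursive call, so that (as the code comment says) the memoised `f` avoids unnecessarily different but equivalent calls, keying on the *set* of failed drugs rather than the order they were tried in. A quick Python count of the cache sizes for n = 4 drugs (hypothetical counting code, not part of the solver):

```python
from itertools import permutations

drugs = ["Catnip", "Valerian", "Silvervine", "Honeysuckle"]

# Without canonicalization, every ordered prefix of every permutation is a
# distinct memo key:
ordered = {perm[:k] for perm in permutations(drugs) for k in range(len(drugs) + 1)}
# Sorting the history collapses equivalent orderings into one key per subset:
unordered = {tuple(sorted(p)) for p in ordered}

print(len(ordered), len(unordered))  # 65 vs 16
```

Since each memoised call to `f` triggers several `stan_glm` fits inside `predictBayes`, collapsing 65 distinct histories down to 16 is a worthwhile saving.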